All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-07 13:55 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-07 13:55 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Robin Murphy, Michael Ellerman,
	Joerg Roedel, Joel Stanley, Jason Gunthorpe, Alex Williamson,
	Oliver O'Halloran, kvm-ppc, kvm, Jason Gunthorpe,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin

Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
the Type1 VFIO driver. Recent development though has added a coherency
capability check to the generic part of VFIO and essentially disabled
VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().

This adds an iommu_ops stub which reports support for cache
coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
this provides minimum code for the probing to not crash.

Because now we have to set iommu_ops to the system (bus_set_iommu() or
iommu_device_register()), this requires the POWERNV PCI setup to happen
after bus_register(&pci_bus_type) which is postcore_initcall
TODO: check if it still works, read sha1, for more details:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387

Because setting the ops triggers probing, this does not work well with
iommu_group_add_device(), hence the move to iommu_probe_device().

Because iommu_probe_device() does not take the group (which is why
we had the helper in the first place), this adds
pci_controller_ops::device_group.

So, basically there is one iommu_device per PHB and devices are added to
groups indirectly via series of calls inside the IOMMU code.

pSeries is out of scope here (a minor fix needed for barely supported
platform in regard to VFIO).

The previous discussion is here:
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---


does it make sense to have this many callbacks, or
the generic IOMMU code can safely operate without some
(given I add some more checks for !NULL)? thanks,


---
 arch/powerpc/include/asm/iommu.h          |   2 +
 arch/powerpc/include/asm/pci-bridge.h     |   7 ++
 arch/powerpc/kernel/iommu.c               | 106 +++++++++++++++++++++-
 arch/powerpc/kernel/pci-common.c          |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  40 ++++++++
 5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd4..4bdae0ee29d0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 		enum dma_data_direction *direction);
 extern void iommu_tce_kill(struct iommu_table *tbl,
 		unsigned long entry, unsigned long pages);
+
+extern const struct iommu_ops spapr_tce_iommu_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..338a45b410b4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/ioport.h>
 #include <linux/numa.h>
+#include <linux/iommu.h>
 
 struct device_node;
 
@@ -44,6 +45,9 @@ struct pci_controller_ops {
 #endif
 
 	void		(*shutdown)(struct pci_controller *hose);
+
+	struct iommu_group *(*device_group)(struct pci_controller *hose,
+					    struct pci_dev *pdev);
 };
 
 /*
@@ -131,6 +135,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	/* iommu_ops support */
+	struct iommu_device	iommu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 7e56ddb3e0b9..c4c7eb596fef 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1138,6 +1138,8 @@ EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
+	int ret;
+
 	/*
 	 * The sysfs entries should be populated before
 	 * binding IOMMU group. If sysfs entries isn't
@@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 	pr_debug("%s: Adding %s to iommu group %d\n",
 		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
 
-	return iommu_group_add_device(table_group->group, dev);
+	ret = iommu_probe_device(dev);
+	dev_info(dev, "probed with %d\n", ret);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
@@ -1176,4 +1181,103 @@ void iommu_del_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
+
+/*
+ * A simple iommu_ops to allow less cruft in generic VFIO code.
+ */
+static bool spapr_tce_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *domain;
+
+	if (type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return NULL;
+
+	domain->geometry.aperture_start = 0;
+	domain->geometry.aperture_end = ~0ULL;
+	domain->geometry.force_aperture = true;
+
+	return domain;
+}
+
+static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct pci_controller *hose;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+	return &hose->iommu;
+}
+
+static void spapr_tce_iommu_release_device(struct device *dev)
+{
+}
+
+static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
+{
+	return false;
+}
+
+static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
+{
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	if (!hose->controller_ops.device_group)
+		return ERR_PTR(-ENOENT);
+
+	return hose->controller_ops.device_group(hose, pdev);
+}
+
+static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
+				      struct device *dev)
+{
+	return 0;
+}
+
+static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
+				       struct device *dev)
+{
+}
+
+const struct iommu_ops spapr_tce_iommu_ops = {
+	.capable = spapr_tce_iommu_capable,
+	.domain_alloc = spapr_tce_iommu_domain_alloc,
+	.probe_device = spapr_tce_iommu_probe_device,
+	.release_device = spapr_tce_iommu_release_device,
+	.device_group = spapr_tce_iommu_device_group,
+	.is_attach_deferred = spapr_tce_iommu_is_attach_deferred,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev = spapr_tce_iommu_attach_dev,
+		.detach_dev = spapr_tce_iommu_detach_dev,
+	}
+};
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 068410cd54a3..72ca5afba0c0 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1714,4 +1714,4 @@ static int __init discover_phbs(void)
 
 	return 0;
 }
-core_initcall(discover_phbs);
+postcore_initcall_sync(discover_phbs);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5e65d983e257..d5139d003794 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2912,6 +2912,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
 	}
 }
 
+static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose,
+						struct pci_dev *pdev)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+
+	if (WARN_ON(!phb))
+		return ERR_PTR(-ENODEV);
+
+	pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8));
+	if (!pe)
+		return ERR_PTR(-ENODEV);
+
+	if (!pe->table_group.group)
+		return ERR_PTR(-ENODEV);
+
+	return pe->table_group.group;
+}
+
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
@@ -2922,6 +2941,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.setup_bridge		= pnv_pci_fixup_bridge_resources,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.device_group		= pnv_pci_device_group,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
@@ -2932,6 +2952,20 @@ static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
 	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
+static struct attribute *spapr_tce_iommu_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group spapr_tce_iommu_group = {
+	.name = "spapr-tce-iommu",
+	.attrs = spapr_tce_iommu_attrs,
+};
+
+static const struct attribute_group *spapr_tce_iommu_groups[] = {
+	&spapr_tce_iommu_group,
+	NULL,
+};
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
@@ -3199,6 +3233,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* create pci_dn's for DT nodes under this PHB */
 	pci_devs_phb_init_dynamic(hose);
+
+	iommu_device_sysfs_add(&hose->iommu, hose->parent,
+			       spapr_tce_iommu_groups, "iommu%lld", phb_id);
+	rc = iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops, hose->parent);
+	if (rc)
+		pr_warn("iommu_device_register returned %ld\n", rc);
 }
 
 void __init pnv_pci_init_ioda2_phb(struct device_node *np)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-07 13:55 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-07 13:55 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Joerg Roedel, kvm, Fabiano Rosas, Alexey Kardashevskiy,
	Daniel Henrique Barboza, Nicholas Piggin, kvm-ppc,
	Jason Gunthorpe, Alex Williamson, Oliver O'Halloran,
	Joel Stanley, Jason Gunthorpe, Murilo Opsfelder Araujo,
	Robin Murphy

Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
the Type1 VFIO driver. Recent development though has added a coherency
capability check to the generic part of VFIO and essentially disabled
VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().

This adds an iommu_ops stub which reports support for cache
coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
this provides minimum code for the probing to not crash.

Because now we have to set iommu_ops to the system (bus_set_iommu() or
iommu_device_register()), this requires the POWERNV PCI setup to happen
after bus_register(&pci_bus_type) which is postcore_initcall
TODO: check if it still works, read sha1, for more details:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387

Because setting the ops triggers probing, this does not work well with
iommu_group_add_device(), hence the move to iommu_probe_device().

Because iommu_probe_device() does not take the group (which is why
we had the helper in the first place), this adds
pci_controller_ops::device_group.

So, basically there is one iommu_device per PHB and devices are added to
groups indirectly via series of calls inside the IOMMU code.

pSeries is out of scope here (a minor fix needed for barely supported
platform in regard to VFIO).

The previous discussion is here:
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---


does it make sense to have this many callbacks, or
the generic IOMMU code can safely operate without some
(given I add some more checks for !NULL)? thanks,


---
 arch/powerpc/include/asm/iommu.h          |   2 +
 arch/powerpc/include/asm/pci-bridge.h     |   7 ++
 arch/powerpc/kernel/iommu.c               | 106 +++++++++++++++++++++-
 arch/powerpc/kernel/pci-common.c          |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  40 ++++++++
 5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd4..4bdae0ee29d0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 		enum dma_data_direction *direction);
 extern void iommu_tce_kill(struct iommu_table *tbl,
 		unsigned long entry, unsigned long pages);
+
+extern const struct iommu_ops spapr_tce_iommu_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..338a45b410b4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/ioport.h>
 #include <linux/numa.h>
+#include <linux/iommu.h>
 
 struct device_node;
 
@@ -44,6 +45,9 @@ struct pci_controller_ops {
 #endif
 
 	void		(*shutdown)(struct pci_controller *hose);
+
+	struct iommu_group *(*device_group)(struct pci_controller *hose,
+					    struct pci_dev *pdev);
 };
 
 /*
@@ -131,6 +135,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	/* iommu_ops support */
+	struct iommu_device	iommu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 7e56ddb3e0b9..c4c7eb596fef 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1138,6 +1138,8 @@ EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
+	int ret;
+
 	/*
 	 * The sysfs entries should be populated before
 	 * binding IOMMU group. If sysfs entries isn't
@@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 	pr_debug("%s: Adding %s to iommu group %d\n",
 		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
 
-	return iommu_group_add_device(table_group->group, dev);
+	ret = iommu_probe_device(dev);
+	dev_info(dev, "probed with %d\n", ret);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
@@ -1176,4 +1181,103 @@ void iommu_del_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
+
+/*
+ * A simple iommu_ops to allow less cruft in generic VFIO code.
+ */
+static bool spapr_tce_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *domain;
+
+	if (type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return NULL;
+
+	domain->geometry.aperture_start = 0;
+	domain->geometry.aperture_end = ~0ULL;
+	domain->geometry.force_aperture = true;
+
+	return domain;
+}
+
+static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct pci_controller *hose;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+	return &hose->iommu;
+}
+
+static void spapr_tce_iommu_release_device(struct device *dev)
+{
+}
+
+static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
+{
+	return false;
+}
+
+static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
+{
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	if (!hose->controller_ops.device_group)
+		return ERR_PTR(-ENOENT);
+
+	return hose->controller_ops.device_group(hose, pdev);
+}
+
+static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
+				      struct device *dev)
+{
+	return 0;
+}
+
+static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
+				       struct device *dev)
+{
+}
+
+const struct iommu_ops spapr_tce_iommu_ops = {
+	.capable = spapr_tce_iommu_capable,
+	.domain_alloc = spapr_tce_iommu_domain_alloc,
+	.probe_device = spapr_tce_iommu_probe_device,
+	.release_device = spapr_tce_iommu_release_device,
+	.device_group = spapr_tce_iommu_device_group,
+	.is_attach_deferred = spapr_tce_iommu_is_attach_deferred,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev = spapr_tce_iommu_attach_dev,
+		.detach_dev = spapr_tce_iommu_detach_dev,
+	}
+};
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 068410cd54a3..72ca5afba0c0 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1714,4 +1714,4 @@ static int __init discover_phbs(void)
 
 	return 0;
 }
-core_initcall(discover_phbs);
+postcore_initcall_sync(discover_phbs);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5e65d983e257..d5139d003794 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2912,6 +2912,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
 	}
 }
 
+static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose,
+						struct pci_dev *pdev)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+
+	if (WARN_ON(!phb))
+		return ERR_PTR(-ENODEV);
+
+	pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8));
+	if (!pe)
+		return ERR_PTR(-ENODEV);
+
+	if (!pe->table_group.group)
+		return ERR_PTR(-ENODEV);
+
+	return pe->table_group.group;
+}
+
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
@@ -2922,6 +2941,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.setup_bridge		= pnv_pci_fixup_bridge_resources,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.device_group		= pnv_pci_device_group,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
@@ -2932,6 +2952,20 @@ static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
 	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
+static struct attribute *spapr_tce_iommu_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group spapr_tce_iommu_group = {
+	.name = "spapr-tce-iommu",
+	.attrs = spapr_tce_iommu_attrs,
+};
+
+static const struct attribute_group *spapr_tce_iommu_groups[] = {
+	&spapr_tce_iommu_group,
+	NULL,
+};
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
@@ -3199,6 +3233,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* create pci_dn's for DT nodes under this PHB */
 	pci_devs_phb_init_dynamic(hose);
+
+	iommu_device_sysfs_add(&hose->iommu, hose->parent,
+			       spapr_tce_iommu_groups, "iommu%lld", phb_id);
+	rc = iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops, hose->parent);
+	if (rc)
+		pr_warn("iommu_device_register returned %ld\n", rc);
 }
 
 void __init pnv_pci_init_ioda2_phb(struct device_node *np)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-07 13:55 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-07 13:55 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Alexey Kardashevskiy, Robin Murphy, Michael Ellerman,
	Joerg Roedel, Joel Stanley, Jason Gunthorpe, Alex Williamson,
	Oliver O'Halloran, kvm-ppc, kvm, Jason Gunthorpe,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin

Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
the Type1 VFIO driver. Recent development though has added a coherency
capability check to the generic part of VFIO and essentially disabled
VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().

This adds an iommu_ops stub which reports support for cache
coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
this provides minimum code for the probing to not crash.

Because now we have to set iommu_ops to the system (bus_set_iommu() or
iommu_device_register()), this requires the POWERNV PCI setup to happen
after bus_register(&pci_bus_type) which is postcore_initcall
TODO: check if it still works, read sha1, for more details:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?idU37fcb319d016ce387

Because setting the ops triggers probing, this does not work well with
iommu_group_add_device(), hence the move to iommu_probe_device().

Because iommu_probe_device() does not take the group (which is why
we had the helper in the first place), this adds
pci_controller_ops::device_group.

So, basically there is one iommu_device per PHB and devices are added to
groups indirectly via series of calls inside the IOMMU code.

pSeries is out of scope here (a minor fix needed for barely supported
platform in regard to VFIO).

The previous discussion is here:
https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

Fixes: e8ae0e140c05 ("vfio: Require that devices support DMA cache coherence")
Fixes: 70693f470848 ("vfio: Set DMA ownership for VFIO devices")
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Fabiano Rosas <farosas@linux.ibm.com>
Cc: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---


does it make sense to have this many callbacks, or
the generic IOMMU code can safely operate without some
(given I add some more checks for !NULL)? thanks,


---
 arch/powerpc/include/asm/iommu.h          |   2 +
 arch/powerpc/include/asm/pci-bridge.h     |   7 ++
 arch/powerpc/kernel/iommu.c               | 106 +++++++++++++++++++++-
 arch/powerpc/kernel/pci-common.c          |   2 +-
 arch/powerpc/platforms/powernv/pci-ioda.c |  40 ++++++++
 5 files changed, 155 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 7e29c73e3dd4..4bdae0ee29d0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -215,6 +215,8 @@ extern long iommu_tce_xchg_no_kill(struct mm_struct *mm,
 		enum dma_data_direction *direction);
 extern void iommu_tce_kill(struct iommu_table *tbl,
 		unsigned long entry, unsigned long pages);
+
+extern const struct iommu_ops spapr_tce_iommu_ops;
 #else
 static inline void iommu_register_group(struct iommu_table_group *table_group,
 					int pci_domain_number,
diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
index c85f901227c9..338a45b410b4 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/ioport.h>
 #include <linux/numa.h>
+#include <linux/iommu.h>
 
 struct device_node;
 
@@ -44,6 +45,9 @@ struct pci_controller_ops {
 #endif
 
 	void		(*shutdown)(struct pci_controller *hose);
+
+	struct iommu_group *(*device_group)(struct pci_controller *hose,
+					    struct pci_dev *pdev);
 };
 
 /*
@@ -131,6 +135,9 @@ struct pci_controller {
 	struct irq_domain	*dev_domain;
 	struct irq_domain	*msi_domain;
 	struct fwnode_handle	*fwnode;
+
+	/* iommu_ops support */
+	struct iommu_device	iommu;
 };
 
 /* These are used for config access before all the PCI probing
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 7e56ddb3e0b9..c4c7eb596fef 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1138,6 +1138,8 @@ EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 {
+	int ret;
+
 	/*
 	 * The sysfs entries should be populated before
 	 * binding IOMMU group. If sysfs entries isn't
@@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
 	pr_debug("%s: Adding %s to iommu group %d\n",
 		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
 
-	return iommu_group_add_device(table_group->group, dev);
+	ret = iommu_probe_device(dev);
+	dev_info(dev, "probed with %d\n", ret);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_add_device);
 
@@ -1176,4 +1181,103 @@ void iommu_del_device(struct device *dev)
 	iommu_group_remove_device(dev);
 }
 EXPORT_SYMBOL_GPL(iommu_del_device);
+
+/*
+ * A simple iommu_ops to allow less cruft in generic VFIO code.
+ */
+static bool spapr_tce_iommu_capable(enum iommu_cap cap)
+{
+	switch (cap) {
+	case IOMMU_CAP_CACHE_COHERENCY:
+		return true;
+	default:
+		break;
+	}
+
+	return false;
+}
+
+static struct iommu_domain *spapr_tce_iommu_domain_alloc(unsigned int type)
+{
+	struct iommu_domain *domain;
+
+	if (type != IOMMU_DOMAIN_BLOCKED)
+		return NULL;
+
+	domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+	if (!domain)
+		return NULL;
+
+	domain->geometry.aperture_start = 0;
+	domain->geometry.aperture_end = ~0ULL;
+	domain->geometry.force_aperture = true;
+
+	return domain;
+}
+
+static struct iommu_device *spapr_tce_iommu_probe_device(struct device *dev)
+{
+	struct pci_dev *pdev;
+	struct pci_controller *hose;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+	return &hose->iommu;
+}
+
+static void spapr_tce_iommu_release_device(struct device *dev)
+{
+}
+
+static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
+{
+	return false;
+}
+
+static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
+{
+	struct pci_controller *hose;
+	struct pci_dev *pdev;
+
+	/* Weirdly iommu_device_register() assigns the same ops to all buses */
+	if (!dev_is_pci(dev))
+		return ERR_PTR(-EPERM);
+
+	pdev = to_pci_dev(dev);
+	hose = pdev->bus->sysdata;
+
+	if (!hose->controller_ops.device_group)
+		return ERR_PTR(-ENOENT);
+
+	return hose->controller_ops.device_group(hose, pdev);
+}
+
+static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
+				      struct device *dev)
+{
+	return 0;
+}
+
+static void spapr_tce_iommu_detach_dev(struct iommu_domain *dom,
+				       struct device *dev)
+{
+}
+
+const struct iommu_ops spapr_tce_iommu_ops = {
+	.capable = spapr_tce_iommu_capable,
+	.domain_alloc = spapr_tce_iommu_domain_alloc,
+	.probe_device = spapr_tce_iommu_probe_device,
+	.release_device = spapr_tce_iommu_release_device,
+	.device_group = spapr_tce_iommu_device_group,
+	.is_attach_deferred = spapr_tce_iommu_is_attach_deferred,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev = spapr_tce_iommu_attach_dev,
+		.detach_dev = spapr_tce_iommu_detach_dev,
+	}
+};
+
 #endif /* CONFIG_IOMMU_API */
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 068410cd54a3..72ca5afba0c0 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1714,4 +1714,4 @@ static int __init discover_phbs(void)
 
 	return 0;
 }
-core_initcall(discover_phbs);
+postcore_initcall_sync(discover_phbs);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5e65d983e257..d5139d003794 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2912,6 +2912,25 @@ static void pnv_pci_ioda_dma_bus_setup(struct pci_bus *bus)
 	}
 }
 
+static struct iommu_group *pnv_pci_device_group(struct pci_controller *hose,
+						struct pci_dev *pdev)
+{
+	struct pnv_phb *phb = hose->private_data;
+	struct pnv_ioda_pe *pe;
+
+	if (WARN_ON(!phb))
+		return ERR_PTR(-ENODEV);
+
+	pe = pnv_pci_bdfn_to_pe(phb, pdev->devfn | (pdev->bus->number << 8));
+	if (!pe)
+		return ERR_PTR(-ENODEV);
+
+	if (!pe->table_group.group)
+		return ERR_PTR(-ENODEV);
+
+	return pe->table_group.group;
+}
+
 static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.dma_dev_setup		= pnv_pci_ioda_dma_dev_setup,
 	.dma_bus_setup		= pnv_pci_ioda_dma_bus_setup,
@@ -2922,6 +2941,7 @@ static const struct pci_controller_ops pnv_pci_ioda_controller_ops = {
 	.setup_bridge		= pnv_pci_fixup_bridge_resources,
 	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
 	.shutdown		= pnv_pci_ioda_shutdown,
+	.device_group		= pnv_pci_device_group,
 };
 
 static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
@@ -2932,6 +2952,20 @@ static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {
 	.shutdown		= pnv_pci_ioda_shutdown,
 };
 
+static struct attribute *spapr_tce_iommu_attrs[] = {
+	NULL,
+};
+
+static struct attribute_group spapr_tce_iommu_group = {
+	.name = "spapr-tce-iommu",
+	.attrs = spapr_tce_iommu_attrs,
+};
+
+static const struct attribute_group *spapr_tce_iommu_groups[] = {
+	&spapr_tce_iommu_group,
+	NULL,
+};
+
 static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 					 u64 hub_id, int ioda_type)
 {
@@ -3199,6 +3233,12 @@ static void __init pnv_pci_init_ioda_phb(struct device_node *np,
 
 	/* create pci_dn's for DT nodes under this PHB */
 	pci_devs_phb_init_dynamic(hose);
+
+	iommu_device_sysfs_add(&hose->iommu, hose->parent,
+			       spapr_tce_iommu_groups, "iommu%lld", phb_id);
+	rc = iommu_device_register(&hose->iommu, &spapr_tce_iommu_ops, hose->parent);
+	if (rc)
+		pr_warn("iommu_device_register returned %ld\n", rc);
 }
 
 void __init pnv_pci_init_ioda2_phb(struct device_node *np)
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-07 13:55 ` Alexey Kardashevskiy
  (?)
@ 2022-07-07 15:10   ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-07 15:10 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
> 
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
> 
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
> 
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
> 
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
> 
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
> 
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.

> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,

I wouldn't worry about it..

> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>  	pr_debug("%s: Adding %s to iommu group %d\n",
>  		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>  
> -	return iommu_group_add_device(table_group->group, dev);
> +	ret = iommu_probe_device(dev);
> +	dev_info(dev, "probed with %d\n", ret);

For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?

> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> +       return false;
> +}

I think you can NULL this op:

static bool iommu_is_attach_deferred(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	if (ops->is_attach_deferred)
		return ops->is_attach_deferred(dev);

	return false;
}

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);
> +
> +	pdev = to_pci_dev(dev);
> +	hose = pdev->bus->sysdata;
> +
> +	if (!hose->controller_ops.device_group)
> +		return ERR_PTR(-ENOENT);
> +
> +	return hose->controller_ops.device_group(hose, pdev);
> +}

Is this missing a refcount get on the group?

> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	return 0;
> +}

It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.

It will turn into a security problem to allow kernel mappings to leak
past this point.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-07 15:10   ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-07 15:10 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy

On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
> 
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
> 
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
> 
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
> 
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
> 
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
> 
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.

> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,

I wouldn't worry about it..

> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>  	pr_debug("%s: Adding %s to iommu group %d\n",
>  		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>  
> -	return iommu_group_add_device(table_group->group, dev);
> +	ret = iommu_probe_device(dev);
> +	dev_info(dev, "probed with %d\n", ret);

For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?

> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> +       return false;
> +}

I think you can NULL this op:

static bool iommu_is_attach_deferred(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	if (ops->is_attach_deferred)
		return ops->is_attach_deferred(dev);

	return false;
}

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);
> +
> +	pdev = to_pci_dev(dev);
> +	hose = pdev->bus->sysdata;
> +
> +	if (!hose->controller_ops.device_group)
> +		return ERR_PTR(-ENOENT);
> +
> +	return hose->controller_ops.device_group(hose, pdev);
> +}

Is this missing a refcount get on the group?

> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	return 0;
> +}

It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.

It will turn into a security problem to allow kernel mappings to leak
past this point.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-07 15:10   ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-07 15:10 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> the Type1 VFIO driver. Recent development though has added a coherency
> capability check to the generic part of VFIO and essentially disabled
> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
> 
> This adds an iommu_ops stub which reports support for cache
> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
> this provides minimum code for the probing to not crash.
> 
> Because now we have to set iommu_ops to the system (bus_set_iommu() or
> iommu_device_register()), this requires the POWERNV PCI setup to happen
> after bus_register(&pci_bus_type) which is postcore_initcall
> TODO: check if it still works, read sha1, for more details:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?idU37fcb319d016ce387
> 
> Because setting the ops triggers probing, this does not work well with
> iommu_group_add_device(), hence the move to iommu_probe_device().
> 
> Because iommu_probe_device() does not take the group (which is why
> we had the helper in the first place), this adds
> pci_controller_ops::device_group.
> 
> So, basically there is one iommu_device per PHB and devices are added to
> groups indirectly via series of calls inside the IOMMU code.
> 
> pSeries is out of scope here (a minor fix needed for barely supported
> platform in regard to VFIO).
> 
> The previous discussion is here:
> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/

I think this is basically OK, for what it is. It looks like there is
more some-day opportunity to make use of the core infrastructure though.

> does it make sense to have this many callbacks, or
> the generic IOMMU code can safely operate without some
> (given I add some more checks for !NULL)? thanks,

I wouldn't worry about it..

> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>  	pr_debug("%s: Adding %s to iommu group %d\n",
>  		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>  
> -	return iommu_group_add_device(table_group->group, dev);
> +	ret = iommu_probe_device(dev);
> +	dev_info(dev, "probed with %d\n", ret);

For another day, but it seems a bit strange to call iommu_probe_device() like this?
Shouldn't one of the existing call sites cover this? The one in
of_iommu.c perhaps?

> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
> +{
> +       return false;
> +}

I think you can NULL this op:

static bool iommu_is_attach_deferred(struct device *dev)
{
	const struct iommu_ops *ops = dev_iommu_ops(dev);

	if (ops->is_attach_deferred)
		return ops->is_attach_deferred(dev);

	return false;
}

> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
> +{
> +	struct pci_controller *hose;
> +	struct pci_dev *pdev;
> +
> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
> +	if (!dev_is_pci(dev))
> +		return ERR_PTR(-EPERM);
> +
> +	pdev = to_pci_dev(dev);
> +	hose = pdev->bus->sysdata;
> +
> +	if (!hose->controller_ops.device_group)
> +		return ERR_PTR(-ENOENT);
> +
> +	return hose->controller_ops.device_group(hose, pdev);
> +}

Is this missing a refcount get on the group?

> +
> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> +				      struct device *dev)
> +{
> +	return 0;
> +}

It is important when this returns that the iommu translation is all
emptied. There should be no left over translations from the DMA API at
this point. I have no idea how power works in this regard, but it
should be explained why this is safe in a comment at a minimum.

It will turn into a security problem to allow kernel mappings to leak
past this point.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-07 15:10   ` Jason Gunthorpe
  (?)
@ 2022-07-08  5:00     ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  4:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 7/8/22 01:10, Jason Gunthorpe wrote:
> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>> the Type1 VFIO driver. Recent development though has added a coherency
>> capability check to the generic part of VFIO and essentially disabled
>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>
>> This adds an iommu_ops stub which reports support for cache
>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
>> this provides minimum code for the probing to not crash.
>>
>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>> after bus_register(&pci_bus_type) which is postcore_initcall
>> TODO: check if it still works, read sha1, for more details:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?idU37fcb319d016ce387
>>
>> Because setting the ops triggers probing, this does not work well with
>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>
>> Because iommu_probe_device() does not take the group (which is why
>> we had the helper in the first place), this adds
>> pci_controller_ops::device_group.
>>
>> So, basically there is one iommu_device per PHB and devices are added to
>> groups indirectly via series of calls inside the IOMMU code.
>>
>> pSeries is out of scope here (a minor fix needed for barely supported
>> platform in regard to VFIO).
>>
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> I think this is basically OK, for what it is. It looks like there is
> more some-day opportunity to make use of the core infrastructure though.
> 
>> does it make sense to have this many callbacks, or
>> the generic IOMMU code can safely operate without some
>> (given I add some more checks for !NULL)? thanks,
> 
> I wouldn't worry about it..
> 
>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>>   	pr_debug("%s: Adding %s to iommu group %d\n",
>>   		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>>   
>> -	return iommu_group_add_device(table_group->group, dev);
>> +	ret = iommu_probe_device(dev);
>> +	dev_info(dev, "probed with %d\n", ret);
> 
> For another day, but it seems a bit strange to call iommu_probe_device() like this?
> Shouldn't one of the existing call sites cover this? The one in
> of_iommu.c perhaps?


It looks to me that of_iommu.c expects the iommu setup to happen before 
linux starts as linux looks for #iommu-cells or iommu-map properties in 
the device tree. The powernv firmware (aka skiboot) does not do this and 
it is linux which manages iommu groups.


>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>> +{
>> +       return false;
>> +}
> 
> I think you can NULL this op:
> 
> static bool iommu_is_attach_deferred(struct device *dev)
> {
> 	const struct iommu_ops *ops = dev_iommu_ops(dev);
> 
> 	if (ops->is_attach_deferred)
> 		return ops->is_attach_deferred(dev);
> 
> 	return false;
> }
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
>> +
>> +	pdev = to_pci_dev(dev);
>> +	hose = pdev->bus->sysdata;
>> +
>> +	if (!hose->controller_ops.device_group)
>> +		return ERR_PTR(-ENOENT);
>> +
>> +	return hose->controller_ops.device_group(hose, pdev);
>> +}
> 
> Is this missing a refcount get on the group?
> 
>> +
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	return 0;
>> +}
> 
> It is important when this returns that the iommu translation is all
> emptied. There should be no left over translations from the DMA API at
> this point. I have no idea how power works in this regard, but it
> should be explained why this is safe in a comment at a minimum.
>
 > It will turn into a security problem to allow kernel mappings to leak
 > past this point.
 >

I've added for v2 checking for no valid mappings for a device (or, more 
precisely, in the associated iommu_group), this domain does not need 
checking, right?

In general, is "domain" something from hardware or it is a software 
concept? Thanks,


> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  5:00     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  5:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 7/8/22 01:10, Jason Gunthorpe wrote:
> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>> the Type1 VFIO driver. Recent development though has added a coherency
>> capability check to the generic part of VFIO and essentially disabled
>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>
>> This adds an iommu_ops stub which reports support for cache
>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
>> this provides minimum code for the probing to not crash.
>>
>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>> after bus_register(&pci_bus_type) which is postcore_initcall
>> TODO: check if it still works, read sha1, for more details:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>>
>> Because setting the ops triggers probing, this does not work well with
>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>
>> Because iommu_probe_device() does not take the group (which is why
>> we had the helper in the first place), this adds
>> pci_controller_ops::device_group.
>>
>> So, basically there is one iommu_device per PHB and devices are added to
>> groups indirectly via series of calls inside the IOMMU code.
>>
>> pSeries is out of scope here (a minor fix needed for barely supported
>> platform in regard to VFIO).
>>
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> I think this is basically OK, for what it is. It looks like there is
> more some-day opportunity to make use of the core infrastructure though.
> 
>> does it make sense to have this many callbacks, or
>> the generic IOMMU code can safely operate without some
>> (given I add some more checks for !NULL)? thanks,
> 
> I wouldn't worry about it..
> 
>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>>   	pr_debug("%s: Adding %s to iommu group %d\n",
>>   		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>>   
>> -	return iommu_group_add_device(table_group->group, dev);
>> +	ret = iommu_probe_device(dev);
>> +	dev_info(dev, "probed with %d\n", ret);
> 
> For another day, but it seems a bit strange to call iommu_probe_device() like this?
> Shouldn't one of the existing call sites cover this? The one in
> of_iommu.c perhaps?


It looks to me that of_iommu.c expects the iommu setup to happen before 
linux starts as linux looks for #iommu-cells or iommu-map properties in 
the device tree. The powernv firmware (aka skiboot) does not do this and 
it is linux which manages iommu groups.


>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>> +{
>> +       return false;
>> +}
> 
> I think you can NULL this op:
> 
> static bool iommu_is_attach_deferred(struct device *dev)
> {
> 	const struct iommu_ops *ops = dev_iommu_ops(dev);
> 
> 	if (ops->is_attach_deferred)
> 		return ops->is_attach_deferred(dev);
> 
> 	return false;
> }
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
>> +
>> +	pdev = to_pci_dev(dev);
>> +	hose = pdev->bus->sysdata;
>> +
>> +	if (!hose->controller_ops.device_group)
>> +		return ERR_PTR(-ENOENT);
>> +
>> +	return hose->controller_ops.device_group(hose, pdev);
>> +}
> 
> Is this missing a refcount get on the group?
> 
>> +
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	return 0;
>> +}
> 
> It is important when this returns that the iommu translation is all
> emptied. There should be no left over translations from the DMA API at
> this point. I have no idea how power works in this regard, but it
> should be explained why this is safe in a comment at a minimum.
>
 > It will turn into a security problem to allow kernel mappings to leak
 > past this point.
 >

I've added for v2 checking for no valid mappings for a device (or, more 
precisely, in the associated iommu_group), this domain does not need 
checking, right?

In general, is "domain" something from hardware or it is a software 
concept? Thanks,


> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  5:00     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  5:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy



On 7/8/22 01:10, Jason Gunthorpe wrote:
> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>> the Type1 VFIO driver. Recent development though has added a coherency
>> capability check to the generic part of VFIO and essentially disabled
>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>
>> This adds an iommu_ops stub which reports support for cache
>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
>> this provides minimum code for the probing to not crash.
>>
>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>> after bus_register(&pci_bus_type) which is postcore_initcall
>> TODO: check if it still works, read sha1, for more details:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>>
>> Because setting the ops triggers probing, this does not work well with
>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>
>> Because iommu_probe_device() does not take the group (which is why
>> we had the helper in the first place), this adds
>> pci_controller_ops::device_group.
>>
>> So, basically there is one iommu_device per PHB and devices are added to
>> groups indirectly via series of calls inside the IOMMU code.
>>
>> pSeries is out of scope here (a minor fix needed for barely supported
>> platform in regard to VFIO).
>>
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> I think this is basically OK, for what it is. It looks like there is
> more some-day opportunity to make use of the core infrastructure though.
> 
>> does it make sense to have this many callbacks, or
>> the generic IOMMU code can safely operate without some
>> (given I add some more checks for !NULL)? thanks,
> 
> I wouldn't worry about it..
> 
>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>>   	pr_debug("%s: Adding %s to iommu group %d\n",
>>   		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>>   
>> -	return iommu_group_add_device(table_group->group, dev);
>> +	ret = iommu_probe_device(dev);
>> +	dev_info(dev, "probed with %d\n", ret);
> 
> For another day, but it seems a bit strange to call iommu_probe_device() like this?
> Shouldn't one of the existing call sites cover this? The one in
> of_iommu.c perhaps?


It looks to me that of_iommu.c expects the iommu setup to happen before 
linux starts as linux looks for #iommu-cells or iommu-map properties in 
the device tree. The powernv firmware (aka skiboot) does not do this and 
it is linux which manages iommu groups.


>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>> +{
>> +       return false;
>> +}
> 
> I think you can NULL this op:
> 
> static bool iommu_is_attach_deferred(struct device *dev)
> {
> 	const struct iommu_ops *ops = dev_iommu_ops(dev);
> 
> 	if (ops->is_attach_deferred)
> 		return ops->is_attach_deferred(dev);
> 
> 	return false;
> }
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
>> +
>> +	pdev = to_pci_dev(dev);
>> +	hose = pdev->bus->sysdata;
>> +
>> +	if (!hose->controller_ops.device_group)
>> +		return ERR_PTR(-ENOENT);
>> +
>> +	return hose->controller_ops.device_group(hose, pdev);
>> +}
> 
> Is this missing a refcount get on the group?
> 
>> +
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	return 0;
>> +}
> 
> It is important when this returns that the iommu translation is all
> emptied. There should be no left over translations from the DMA API at
> this point. I have no idea how power works in this regard, but it
> should be explained why this is safe in a comment at a minimum.
>
 > It will turn into a security problem to allow kernel mappings to leak
 > past this point.
 >

I've added for v2 checking for no valid mappings for a device (or, more 
precisely, in the associated iommu_group), this domain does not need 
checking, right?

In general, is "domain" something from hardware or it is a software 
concept? Thanks,


> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08  5:00     ` Alexey Kardashevskiy
  (?)
@ 2022-07-08  6:34       ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> 
> 
> On 7/8/22 01:10, Jason Gunthorpe wrote:
>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>> the Type1 VFIO driver. Recent development though has added a coherency
>>> capability check to the generic part of VFIO and essentially disabled
>>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>>
>>> This adds an iommu_ops stub which reports support for cache
>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI 
>>> devices,
>>> this provides minimum code for the probing to not crash.
>>>
>>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>>> after bus_register(&pci_bus_type) which is postcore_initcall
>>> TODO: check if it still works, read sha1, for more details:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>>>
>>> Because setting the ops triggers probing, this does not work well with
>>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>>
>>> Because iommu_probe_device() does not take the group (which is why
>>> we had the helper in the first place), this adds
>>> pci_controller_ops::device_group.
>>>
>>> So, basically there is one iommu_device per PHB and devices are added to
>>> groups indirectly via series of calls inside the IOMMU code.
>>>
>>> pSeries is out of scope here (a minor fix needed for barely supported
>>> platform in regard to VFIO).
>>>
>>> The previous discussion is here:
>>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>>
>> I think this is basically OK, for what it is. It looks like there is
>> more some-day opportunity to make use of the core infrastructure though.
>>
>>> does it make sense to have this many callbacks, or
>>> the generic IOMMU code can safely operate without some
>>> (given I add some more checks for !NULL)? thanks,
>>
>> I wouldn't worry about it..
>>
>>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group 
>>> *table_group, struct device *dev)
>>>       pr_debug("%s: Adding %s to iommu group %d\n",
>>>            __func__, dev_name(dev),  
>>> iommu_group_id(table_group->group));
>>> -    return iommu_group_add_device(table_group->group, dev);
>>> +    ret = iommu_probe_device(dev);
>>> +    dev_info(dev, "probed with %d\n", ret);
>>
>> For another day, but it seems a bit strange to call 
>> iommu_probe_device() like this?
>> Shouldn't one of the existing call sites cover this? The one in
>> of_iommu.c perhaps?
> 
> 
> It looks to me that of_iommu.c expects the iommu setup to happen before 
> linux starts as linux looks for #iommu-cells or iommu-map properties in 
> the device tree. The powernv firmware (aka skiboot) does not do this and 
> it is linux which manages iommu groups.
> 
> 
>>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>>> +{
>>> +       return false;
>>> +}
>>
>> I think you can NULL this op:
>>
>> static bool iommu_is_attach_deferred(struct device *dev)
>> {
>>     const struct iommu_ops *ops = dev_iommu_ops(dev);
>>
>>     if (ops->is_attach_deferred)
>>         return ops->is_attach_deferred(dev);
>>
>>     return false;
>> }
>>
>>> +static struct iommu_group *spapr_tce_iommu_device_group(struct 
>>> device *dev)
>>> +{
>>> +    struct pci_controller *hose;
>>> +    struct pci_dev *pdev;
>>> +
>>> +    /* Weirdly iommu_device_register() assigns the same ops to all 
>>> buses */
>>> +    if (!dev_is_pci(dev))
>>> +        return ERR_PTR(-EPERM);
>>> +
>>> +    pdev = to_pci_dev(dev);
>>> +    hose = pdev->bus->sysdata;
>>> +
>>> +    if (!hose->controller_ops.device_group)
>>> +        return ERR_PTR(-ENOENT);
>>> +
>>> +    return hose->controller_ops.device_group(hose, pdev);
>>> +}
>>
>> Is this missing a refcount get on the group?
>>
>>> +
>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>> +                      struct device *dev)
>>> +{
>>> +    return 0;
>>> +}
>>
>> It is important when this returns that the iommu translation is all
>> emptied. There should be no left over translations from the DMA API at
>> this point. I have no idea how power works in this regard, but it
>> should be explained why this is safe in a comment at a minimum.
>>
>  > It will turn into a security problem to allow kernel mappings to leak
>  > past this point.
>  >
> 
> I've added for v2 checking for no valid mappings for a device (or, more 
> precisely, in the associated iommu_group), this domain does not need 
> checking, right?


Uff, not that simple. Looks like once a device is in a group, its 
dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess then 
there is a way to set those to NULL or do something similar to let
dma_map_direct() from kernel/dma/mapping.c return "true", is not there?

For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is 
fine to do nothing as tce_iommu_take_ownership() and 
tce_iommu_take_ownership_ddw() take care of not having active DMA 
mappings. Thanks,


> 
> In general, is "domain" something from hardware or it is a software 
> concept? Thanks,
> 
> 
>> Jason
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  6:34       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy



On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> 
> 
> On 7/8/22 01:10, Jason Gunthorpe wrote:
>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>> the Type1 VFIO driver. Recent development though has added a coherency
>>> capability check to the generic part of VFIO and essentially disabled
>>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>>
>>> This adds an iommu_ops stub which reports support for cache
>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI 
>>> devices,
>>> this provides minimum code for the probing to not crash.
>>>
>>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>>> after bus_register(&pci_bus_type) which is postcore_initcall
>>> TODO: check if it still works, read sha1, for more details:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>>>
>>> Because setting the ops triggers probing, this does not work well with
>>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>>
>>> Because iommu_probe_device() does not take the group (which is why
>>> we had the helper in the first place), this adds
>>> pci_controller_ops::device_group.
>>>
>>> So, basically there is one iommu_device per PHB and devices are added to
>>> groups indirectly via series of calls inside the IOMMU code.
>>>
>>> pSeries is out of scope here (a minor fix needed for barely supported
>>> platform in regard to VFIO).
>>>
>>> The previous discussion is here:
>>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>>
>> I think this is basically OK, for what it is. It looks like there is
>> more some-day opportunity to make use of the core infrastructure though.
>>
>>> does it make sense to have this many callbacks, or
>>> the generic IOMMU code can safely operate without some
>>> (given I add some more checks for !NULL)? thanks,
>>
>> I wouldn't worry about it..
>>
>>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group 
>>> *table_group, struct device *dev)
>>>       pr_debug("%s: Adding %s to iommu group %d\n",
>>>            __func__, dev_name(dev),  
>>> iommu_group_id(table_group->group));
>>> -    return iommu_group_add_device(table_group->group, dev);
>>> +    ret = iommu_probe_device(dev);
>>> +    dev_info(dev, "probed with %d\n", ret);
>>
>> For another day, but it seems a bit strange to call 
>> iommu_probe_device() like this?
>> Shouldn't one of the existing call sites cover this? The one in
>> of_iommu.c perhaps?
> 
> 
> It looks to me that of_iommu.c expects the iommu setup to happen before 
> linux starts as linux looks for #iommu-cells or iommu-map properties in 
> the device tree. The powernv firmware (aka skiboot) does not do this and 
> it is linux which manages iommu groups.
> 
> 
>>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>>> +{
>>> +       return false;
>>> +}
>>
>> I think you can NULL this op:
>>
>> static bool iommu_is_attach_deferred(struct device *dev)
>> {
>>     const struct iommu_ops *ops = dev_iommu_ops(dev);
>>
>>     if (ops->is_attach_deferred)
>>         return ops->is_attach_deferred(dev);
>>
>>     return false;
>> }
>>
>>> +static struct iommu_group *spapr_tce_iommu_device_group(struct 
>>> device *dev)
>>> +{
>>> +    struct pci_controller *hose;
>>> +    struct pci_dev *pdev;
>>> +
>>> +    /* Weirdly iommu_device_register() assigns the same ops to all 
>>> buses */
>>> +    if (!dev_is_pci(dev))
>>> +        return ERR_PTR(-EPERM);
>>> +
>>> +    pdev = to_pci_dev(dev);
>>> +    hose = pdev->bus->sysdata;
>>> +
>>> +    if (!hose->controller_ops.device_group)
>>> +        return ERR_PTR(-ENOENT);
>>> +
>>> +    return hose->controller_ops.device_group(hose, pdev);
>>> +}
>>
>> Is this missing a refcount get on the group?
>>
>>> +
>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>> +                      struct device *dev)
>>> +{
>>> +    return 0;
>>> +}
>>
>> It is important when this returns that the iommu translation is all
>> emptied. There should be no left over translations from the DMA API at
>> this point. I have no idea how power works in this regard, but it
>> should be explained why this is safe in a comment at a minimum.
>>
>  > It will turn into a security problem to allow kernel mappings to leak
>  > past this point.
>  >
> 
> I've added for v2 checking for no valid mappings for a device (or, more 
> precisely, in the associated iommu_group), this domain does not need 
> checking, right?


Uff, not that simple. Looks like once a device is in a group, its 
dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess then 
there is a way to set those to NULL or do something similar to let
dma_map_direct() from kernel/dma/mapping.c return "true", is not there?

For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is 
fine to do nothing as tce_iommu_take_ownership() and 
tce_iommu_take_ownership_ddw() take care of not having active DMA 
mappings. Thanks,


> 
> In general, is "domain" something from hardware or it is a software 
> concept? Thanks,
> 
> 
>> Jason
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  6:34       ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> 
> 
> On 7/8/22 01:10, Jason Gunthorpe wrote:
>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>> the Type1 VFIO driver. Recent development though has added a coherency
>>> capability check to the generic part of VFIO and essentially disabled
>>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>>
>>> This adds an iommu_ops stub which reports support for cache
>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI 
>>> devices,
>>> this provides minimum code for the probing to not crash.
>>>
>>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>>> after bus_register(&pci_bus_type) which is postcore_initcall
>>> TODO: check if it still works, read sha1, for more details:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?idU37fcb319d016ce387
>>>
>>> Because setting the ops triggers probing, this does not work well with
>>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>>
>>> Because iommu_probe_device() does not take the group (which is why
>>> we had the helper in the first place), this adds
>>> pci_controller_ops::device_group.
>>>
>>> So, basically there is one iommu_device per PHB and devices are added to
>>> groups indirectly via series of calls inside the IOMMU code.
>>>
>>> pSeries is out of scope here (a minor fix needed for barely supported
>>> platform in regard to VFIO).
>>>
>>> The previous discussion is here:
>>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
>>
>> I think this is basically OK, for what it is. It looks like there is
>> more some-day opportunity to make use of the core infrastructure though.
>>
>>> does it make sense to have this many callbacks, or
>>> the generic IOMMU code can safely operate without some
>>> (given I add some more checks for !NULL)? thanks,
>>
>> I wouldn't worry about it..
>>
>>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group 
>>> *table_group, struct device *dev)
>>>       pr_debug("%s: Adding %s to iommu group %d\n",
>>>            __func__, dev_name(dev),  
>>> iommu_group_id(table_group->group));
>>> -    return iommu_group_add_device(table_group->group, dev);
>>> +    ret = iommu_probe_device(dev);
>>> +    dev_info(dev, "probed with %d\n", ret);
>>
>> For another day, but it seems a bit strange to call 
>> iommu_probe_device() like this?
>> Shouldn't one of the existing call sites cover this? The one in
>> of_iommu.c perhaps?
> 
> 
> It looks to me that of_iommu.c expects the iommu setup to happen before 
> linux starts as linux looks for #iommu-cells or iommu-map properties in 
> the device tree. The powernv firmware (aka skiboot) does not do this and 
> it is linux which manages iommu groups.
> 
> 
>>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>>> +{
>>> +       return false;
>>> +}
>>
>> I think you can NULL this op:
>>
>> static bool iommu_is_attach_deferred(struct device *dev)
>> {
>>     const struct iommu_ops *ops = dev_iommu_ops(dev);
>>
>>     if (ops->is_attach_deferred)
>>         return ops->is_attach_deferred(dev);
>>
>>     return false;
>> }
>>
>>> +static struct iommu_group *spapr_tce_iommu_device_group(struct 
>>> device *dev)
>>> +{
>>> +    struct pci_controller *hose;
>>> +    struct pci_dev *pdev;
>>> +
>>> +    /* Weirdly iommu_device_register() assigns the same ops to all 
>>> buses */
>>> +    if (!dev_is_pci(dev))
>>> +        return ERR_PTR(-EPERM);
>>> +
>>> +    pdev = to_pci_dev(dev);
>>> +    hose = pdev->bus->sysdata;
>>> +
>>> +    if (!hose->controller_ops.device_group)
>>> +        return ERR_PTR(-ENOENT);
>>> +
>>> +    return hose->controller_ops.device_group(hose, pdev);
>>> +}
>>
>> Is this missing a refcount get on the group?
>>
>>> +
>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>> +                      struct device *dev)
>>> +{
>>> +    return 0;
>>> +}
>>
>> It is important when this returns that the iommu translation is all
>> emptied. There should be no left over translations from the DMA API at
>> this point. I have no idea how power works in this regard, but it
>> should be explained why this is safe in a comment at a minimum.
>>
>  > It will turn into a security problem to allow kernel mappings to leak
>  > past this point.
>  >
> 
> I've added for v2 checking for no valid mappings for a device (or, more 
> precisely, in the associated iommu_group), this domain does not need 
> checking, right?


Uff, not that simple. Looks like once a device is in a group, its 
dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess then 
there is a way to set those to NULL or do something similar to let
dma_map_direct() from kernel/dma/mapping.c return "true", is not there?

For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is 
fine to do nothing as tce_iommu_take_ownership() and 
tce_iommu_take_ownership_ddw() take care of not having active DMA 
mappings. Thanks,


> 
> In general, is "domain" something from hardware or it is a software 
> concept? Thanks,
> 
> 
>> Jason
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08  6:34       ` Alexey Kardashevskiy
  (?)
@ 2022-07-08  7:32         ` Tian, Kevin
  -1 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-08  7:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> Sent: Friday, July 8, 2022 2:35 PM
> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> >
> >
> > On 7/8/22 01:10, Jason Gunthorpe wrote:
> >> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> >>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> >>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> >>> the Type1 VFIO driver. Recent development though has added a
> coherency
> >>> capability check to the generic part of VFIO and essentially disabled
> >>> VFIO on PPC64; the similar story about
> iommu_group_dma_owner_claimed().
> >>>
> >>> This adds an iommu_ops stub which reports support for cache
> >>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
> >>> devices,
> >>> this provides minimum code for the probing to not crash.

stale comment since this patch doesn't use bus_set_iommu() now.

> >>> +
> >>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> >>> +                      struct device *dev)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>
> >> It is important when this returns that the iommu translation is all
> >> emptied. There should be no left over translations from the DMA API at
> >> this point. I have no idea how power works in this regard, but it
> >> should be explained why this is safe in a comment at a minimum.
> >>
> >  > It will turn into a security problem to allow kernel mappings to leak
> >  > past this point.
> >  >
> >
> > I've added for v2 checking for no valid mappings for a device (or, more
> > precisely, in the associated iommu_group), this domain does not need
> > checking, right?
> 
> 
> Uff, not that simple. Looks like once a device is in a group, its
> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
> then
> there is a way to set those to NULL or do something similar to let
> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?

dev->dma_ops is NULL as long as you don't handle DMA domain type
here and don't call iommu_setup_dma_ops().

Given this only supports blocking domain then above should be irrelevant.

> 
> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
> fine to do nothing as tce_iommu_take_ownership() and
> tce_iommu_take_ownership_ddw() take care of not having active DMA
> mappings. Thanks,
> 
> 
> >
> > In general, is "domain" something from hardware or it is a software
> > concept? Thanks,
> >

'domain' is a software concept to represent the hardware I/O page
table. A blocking domain means all DMAs from a device attached to
this domain are blocked/rejected (equivalent to an empty I/O page
table), usually enforced in the .attach_dev() callback. 

Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  7:32         ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-08  7:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jason Gunthorpe
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> Sent: Friday, July 8, 2022 2:35 PM
> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
> >
> >
> > On 7/8/22 01:10, Jason Gunthorpe wrote:
> >> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
> >>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
> >>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
> >>> the Type1 VFIO driver. Recent development though has added a
> coherency
> >>> capability check to the generic part of VFIO and essentially disabled
> >>> VFIO on PPC64; the similar story about
> iommu_group_dma_owner_claimed().
> >>>
> >>> This adds an iommu_ops stub which reports support for cache
> >>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
> >>> devices,
> >>> this provides minimum code for the probing to not crash.

stale comment since this patch doesn't use bus_set_iommu() now.

> >>> +
> >>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
> >>> +                      struct device *dev)
> >>> +{
> >>> +    return 0;
> >>> +}
> >>
> >> It is important when this returns that the iommu translation is all
> >> emptied. There should be no left over translations from the DMA API at
> >> this point. I have no idea how power works in this regard, but it
> >> should be explained why this is safe in a comment at a minimum.
> >>
> >  > It will turn into a security problem to allow kernel mappings to leak
> >  > past this point.
> >  >
> >
> > I've added for v2 checking for no valid mappings for a device (or, more
> > precisely, in the associated iommu_group), this domain does not need
> > checking, right?
> 
> 
> Uff, not that simple. Looks like once a device is in a group, its
> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
> then
> there is a way to set those to NULL or do something similar to let
> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?

dev->dma_ops is NULL as long as you don't handle DMA domain type
here and don't call iommu_setup_dma_ops().

Given this only supports blocking domain then above should be irrelevant.

> 
> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
> fine to do nothing as tce_iommu_take_ownership() and
> tce_iommu_take_ownership_ddw() take care of not having active DMA
> mappings. Thanks,
> 
> 
> >
> > In general, is "domain" something from hardware or it is a software
> > concept? Thanks,
> >

'domain' is a software concept to represent the hardware I/O page
table. A blocking domain means all DMAs from a device attached to
this domain are blocked/rejected (equivalent to an empty I/O page
table), usually enforced in the .attach_dev() callback. 

Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  7:32         ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-08  7:32 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

PiBGcm9tOiBBbGV4ZXkgS2FyZGFzaGV2c2tpeSA8YWlrQG96bGFicy5ydT4NCj4gU2VudDogRnJp
ZGF5LCBKdWx5IDgsIDIwMjIgMjozNSBQTQ0KPiBPbiA3LzgvMjIgMTU6MDAsIEFsZXhleSBLYXJk
YXNoZXZza2l5IHdyb3RlOg0KPiA+DQo+ID4NCj4gPiBPbiA3LzgvMjIgMDE6MTAsIEphc29uIEd1
bnRob3JwZSB3cm90ZToNCj4gPj4gT24gVGh1LCBKdWwgMDcsIDIwMjIgYXQgMTE6NTU6NTJQTSAr
MTAwMCwgQWxleGV5IEthcmRhc2hldnNraXkgd3JvdGU6DQo+ID4+PiBIaXN0b3JpY2FsbHkgUFBD
NjQgbWFuYWdlZCB0byBhdm9pZCB1c2luZyBpb21tdV9vcHMuIFRoZSBWRklPIGRyaXZlcg0KPiA+
Pj4gdXNlcyBhIFNQQVBSIFRDRSBzdWItZHJpdmVyIGFuZCBhbGwgaW9tbXVfb3BzIHVzZXMgd2Vy
ZSBrZXB0IGluDQo+ID4+PiB0aGUgVHlwZTEgVkZJTyBkcml2ZXIuIFJlY2VudCBkZXZlbG9wbWVu
dCB0aG91Z2ggaGFzIGFkZGVkIGENCj4gY29oZXJlbmN5DQo+ID4+PiBjYXBhYmlsaXR5IGNoZWNr
IHRvIHRoZSBnZW5lcmljIHBhcnQgb2YgVkZJTyBhbmQgZXNzZW50aWFsbHkgZGlzYWJsZWQNCj4g
Pj4+IFZGSU8gb24gUFBDNjQ7IHRoZSBzaW1pbGFyIHN0b3J5IGFib3V0DQo+IGlvbW11X2dyb3Vw
X2RtYV9vd25lcl9jbGFpbWVkKCkuDQo+ID4+Pg0KPiA+Pj4gVGhpcyBhZGRzIGFuIGlvbW11X29w
cyBzdHViIHdoaWNoIHJlcG9ydHMgc3VwcG9ydCBmb3IgY2FjaGUNCj4gPj4+IGNvaGVyZW5jeS4g
QmVjYXVzZSBidXNfc2V0X2lvbW11KCkgdHJpZ2dlcnMgSU9NTVUgcHJvYmluZyBvZiBQQ0kNCj4g
Pj4+IGRldmljZXMsDQo+ID4+PiB0aGlzIHByb3ZpZGVzIG1pbmltdW0gY29kZSBmb3IgdGhlIHBy
b2JpbmcgdG8gbm90IGNyYXNoLg0KDQpzdGFsZSBjb21tZW50IHNpbmNlIHRoaXMgcGF0Y2ggZG9l
c24ndCB1c2UgYnVzX3NldF9pb21tdSgpIG5vdy4NCg0KPiA+Pj4gKw0KPiA+Pj4gK3N0YXRpYyBp
bnQgc3BhcHJfdGNlX2lvbW11X2F0dGFjaF9kZXYoc3RydWN0IGlvbW11X2RvbWFpbiAqZG9tLA0K
PiA+Pj4gK8KgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoCBzdHJ1Y3Qg
ZGV2aWNlICpkZXYpDQo+ID4+PiArew0KPiA+Pj4gK8KgwqDCoCByZXR1cm4gMDsNCj4gPj4+ICt9
DQo+ID4+DQo+ID4+IEl0IGlzIGltcG9ydGFudCB3aGVuIHRoaXMgcmV0dXJucyB0aGF0IHRoZSBp
b21tdSB0cmFuc2xhdGlvbiBpcyBhbGwNCj4gPj4gZW1wdGllZC4gVGhlcmUgc2hvdWxkIGJlIG5v
IGxlZnQgb3ZlciB0cmFuc2xhdGlvbnMgZnJvbSB0aGUgRE1BIEFQSSBhdA0KPiA+PiB0aGlzIHBv
aW50LiBJIGhhdmUgbm8gaWRlYSBob3cgcG93ZXIgd29ya3MgaW4gdGhpcyByZWdhcmQsIGJ1dCBp
dA0KPiA+PiBzaG91bGQgYmUgZXhwbGFpbmVkIHdoeSB0aGlzIGlzIHNhZmUgaW4gYSBjb21tZW50
IGF0IGEgbWluaW11bS4NCj4gPj4NCj4gPiAgPiBJdCB3aWxsIHR1cm4gaW50byBhIHNlY3VyaXR5
IHByb2JsZW0gdG8gYWxsb3cga2VybmVsIG1hcHBpbmdzIHRvIGxlYWsNCj4gPiAgPiBwYXN0IHRo
aXMgcG9pbnQuDQo+ID4gID4NCj4gPg0KPiA+IEkndmUgYWRkZWQgZm9yIHYyIGNoZWNraW5nIGZv
ciBubyB2YWxpZCBtYXBwaW5ncyBmb3IgYSBkZXZpY2UgKG9yLCBtb3JlDQo+ID4gcHJlY2lzZWx5
LCBpbiB0aGUgYXNzb2NpYXRlZCBpb21tdV9ncm91cCksIHRoaXMgZG9tYWluIGRvZXMgbm90IG5l
ZWQNCj4gPiBjaGVja2luZywgcmlnaHQ/DQo+IA0KPiANCj4gVWZmLCBub3QgdGhhdCBzaW1wbGUu
IExvb2tzIGxpa2Ugb25jZSBhIGRldmljZSBpcyBpbiBhIGdyb3VwLCBpdHMNCj4gZG1hX29wcyBp
cyBzZXQgdG8gaW9tbXVfZG1hX29wcyBhbmQgSU9NTVUgY29kZSBvd25zIERNQS4gSSBndWVzcw0K
PiB0aGVuDQo+IHRoZXJlIGlzIGEgd2F5IHRvIHNldCB0aG9zZSB0byBOVUxMIG9yIGRvIHNvbWV0
aGluZyBzaW1pbGFyIHRvIGxldA0KPiBkbWFfbWFwX2RpcmVjdCgpIGZyb20ga2VybmVsL2RtYS9t
YXBwaW5nLmMgcmV0dXJuICJ0cnVlIiwgaXMgbm90IHRoZXJlPw0KDQpkZXYtPmRtYV9vcHMgaXMg
TlVMTCBhcyBsb25nIGFzIHlvdSBkb24ndCBoYW5kbGUgRE1BIGRvbWFpbiB0eXBlDQpoZXJlIGFu
ZCBkb24ndCBjYWxsIGlvbW11X3NldHVwX2RtYV9vcHMoKS4NCg0KR2l2ZW4gdGhpcyBvbmx5IHN1
cHBvcnRzIGJsb2NraW5nIGRvbWFpbiB0aGVuIGFib3ZlIHNob3VsZCBiZSBpcnJlbGV2YW50Lg0K
DQo+IA0KPiBGb3Igbm93IEknbGwgYWRkIGEgY29tbWVudCBpbiBzcGFwcl90Y2VfaW9tbXVfYXR0
YWNoX2RldigpIHRoYXQgaXQgaXMNCj4gZmluZSB0byBkbyBub3RoaW5nIGFzIHRjZV9pb21tdV90
YWtlX293bmVyc2hpcCgpIGFuZA0KPiB0Y2VfaW9tbXVfdGFrZV9vd25lcnNoaXBfZGR3KCkgdGFr
ZSBjYXJlIG9mIG5vdCBoYXZpbmcgYWN0aXZlIERNQQ0KPiBtYXBwaW5ncy4gVGhhbmtzLA0KPiAN
Cj4gDQo+ID4NCj4gPiBJbiBnZW5lcmFsLCBpcyAiZG9tYWluIiBzb21ldGhpbmcgZnJvbSBoYXJk
d2FyZSBvciBpdCBpcyBhIHNvZnR3YXJlDQo+ID4gY29uY2VwdD8gVGhhbmtzLA0KPiA+DQoNCidk
b21haW4nIGlzIGEgc29mdHdhcmUgY29uY2VwdCB0byByZXByZXNlbnQgdGhlIGhhcmR3YXJlIEkv
TyBwYWdlDQp0YWJsZS4gQSBibG9ja2luZyBkb21haW4gbWVhbnMgYWxsIERNQXMgZnJvbSBhIGRl
dmljZSBhdHRhY2hlZCB0bw0KdGhpcyBkb21haW4gYXJlIGJsb2NrZWQvcmVqZWN0ZWQgKGVxdWl2
YWxlbnQgdG8gYW4gZW1wdHkgSS9PIHBhZ2UNCnRhYmxlKSwgdXN1YWxseSBlbmZvcmNlZCBpbiB0
aGUgLmF0dGFjaF9kZXYoKSBjYWxsYmFjay4gDQoNClllcywgYSBjb21tZW50IGZvciB3aHkgaGF2
aW5nIGEgTlVMTCAuYXR0YWNoX2RldigpIGlzIE9LIGlzIHdlbGNvbWVkLg0KDQpUaGFua3MNCktl
dmluDQo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08  7:32         ` Tian, Kevin
  (?)
@ 2022-07-08  9:45           ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  9:45 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 17:32, Tian, Kevin wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Sent: Friday, July 8, 2022 2:35 PM
>> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/8/22 01:10, Jason Gunthorpe wrote:
>>>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>>>> the Type1 VFIO driver. Recent development though has added a
>> coherency
>>>>> capability check to the generic part of VFIO and essentially disabled
>>>>> VFIO on PPC64; the similar story about
>> iommu_group_dma_owner_claimed().
>>>>>
>>>>> This adds an iommu_ops stub which reports support for cache
>>>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
>>>>> devices,
>>>>> this provides minimum code for the probing to not crash.
> 
> stale comment since this patch doesn't use bus_set_iommu() now.
> 
>>>>> +
>>>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>>>> +                      struct device *dev)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> It is important when this returns that the iommu translation is all
>>>> emptied. There should be no left over translations from the DMA API at
>>>> this point. I have no idea how power works in this regard, but it
>>>> should be explained why this is safe in a comment at a minimum.
>>>>
>>>   > It will turn into a security problem to allow kernel mappings to leak
>>>   > past this point.
>>>   >
>>>
>>> I've added for v2 checking for no valid mappings for a device (or, more
>>> precisely, in the associated iommu_group), this domain does not need
>>> checking, right?
>>
>>
>> Uff, not that simple. Looks like once a device is in a group, its
>> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
>> then
>> there is a way to set those to NULL or do something similar to let
>> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?
> 
> dev->dma_ops is NULL as long as you don't handle DMA domain type
> here and don't call iommu_setup_dma_ops().
> 
> Given this only supports blocking domain then above should be irrelevant.
> 
>>
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
>> fine to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA
>> mappings. Thanks,
>>
>>
>>>
>>> In general, is "domain" something from hardware or it is a software
>>> concept? Thanks,
>>>
> 
> 'domain' is a software concept to represent the hardware I/O page
> table. A blocking domain means all DMAs from a device attached to
> this domain are blocked/rejected (equivalent to an empty I/O page
> table), usually enforced in the .attach_dev() callback.
> 
> Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.


Making it NULL makes __iommu_attach_device() fail, .attach_dev() needs 
to return 0 in this crippled environment. Thanks for explaining the 
rest, good food for thoughts.



> 
> Thanks
> Kevin

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  9:45           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  9:45 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy



On 08/07/2022 17:32, Tian, Kevin wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Sent: Friday, July 8, 2022 2:35 PM
>> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/8/22 01:10, Jason Gunthorpe wrote:
>>>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>>>> the Type1 VFIO driver. Recent development though has added a
>> coherency
>>>>> capability check to the generic part of VFIO and essentially disabled
>>>>> VFIO on PPC64; the similar story about
>> iommu_group_dma_owner_claimed().
>>>>>
>>>>> This adds an iommu_ops stub which reports support for cache
>>>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
>>>>> devices,
>>>>> this provides minimum code for the probing to not crash.
> 
> stale comment since this patch doesn't use bus_set_iommu() now.
> 
>>>>> +
>>>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>>>> +                      struct device *dev)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> It is important when this returns that the iommu translation is all
>>>> emptied. There should be no left over translations from the DMA API at
>>>> this point. I have no idea how power works in this regard, but it
>>>> should be explained why this is safe in a comment at a minimum.
>>>>
>>>   > It will turn into a security problem to allow kernel mappings to leak
>>>   > past this point.
>>>   >
>>>
>>> I've added for v2 checking for no valid mappings for a device (or, more
>>> precisely, in the associated iommu_group), this domain does not need
>>> checking, right?
>>
>>
>> Uff, not that simple. Looks like once a device is in a group, its
>> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
>> then
>> there is a way to set those to NULL or do something similar to let
>> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?
> 
> dev->dma_ops is NULL as long as you don't handle DMA domain type
> here and don't call iommu_setup_dma_ops().
> 
> Given this only supports blocking domain then above should be irrelevant.
> 
>>
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
>> fine to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA
>> mappings. Thanks,
>>
>>
>>>
>>> In general, is "domain" something from hardware or it is a software
>>> concept? Thanks,
>>>
> 
> 'domain' is a software concept to represent the hardware I/O page
> table. A blocking domain means all DMAs from a device attached to
> this domain are blocked/rejected (equivalent to an empty I/O page
> table), usually enforced in the .attach_dev() callback.
> 
> Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.


Making it NULL makes __iommu_attach_device() fail, .attach_dev() needs 
to return 0 in this crippled environment. Thanks for explaining the 
rest, good food for thoughts.



> 
> Thanks
> Kevin

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08  9:45           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08  9:45 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 17:32, Tian, Kevin wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Sent: Friday, July 8, 2022 2:35 PM
>> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/8/22 01:10, Jason Gunthorpe wrote:
>>>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>>>> the Type1 VFIO driver. Recent development though has added a
>> coherency
>>>>> capability check to the generic part of VFIO and essentially disabled
>>>>> VFIO on PPC64; the similar story about
>> iommu_group_dma_owner_claimed().
>>>>>
>>>>> This adds an iommu_ops stub which reports support for cache
>>>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
>>>>> devices,
>>>>> this provides minimum code for the probing to not crash.
> 
> stale comment since this patch doesn't use bus_set_iommu() now.
> 
>>>>> +
>>>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>>>> +                      struct device *dev)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> It is important when this returns that the iommu translation is all
>>>> emptied. There should be no left over translations from the DMA API at
>>>> this point. I have no idea how power works in this regard, but it
>>>> should be explained why this is safe in a comment at a minimum.
>>>>
>>>   > It will turn into a security problem to allow kernel mappings to leak
>>>   > past this point.
>>>   >
>>>
>>> I've added for v2 checking for no valid mappings for a device (or, more
>>> precisely, in the associated iommu_group), this domain does not need
>>> checking, right?
>>
>>
>> Uff, not that simple. Looks like once a device is in a group, its
>> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
>> then
>> there is a way to set those to NULL or do something similar to let
>> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?
> 
> dev->dma_ops is NULL as long as you don't handle DMA domain type
> here and don't call iommu_setup_dma_ops().
> 
> Given this only supports blocking domain then above should be irrelevant.
> 
>>
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
>> fine to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA
>> mappings. Thanks,
>>
>>
>>>
>>> In general, is "domain" something from hardware or it is a software
>>> concept? Thanks,
>>>
> 
> 'domain' is a software concept to represent the hardware I/O page
> table. A blocking domain means all DMAs from a device attached to
> this domain are blocked/rejected (equivalent to an empty I/O page
> table), usually enforced in the .attach_dev() callback.
> 
> Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.


Making it NULL makes __iommu_attach_device() fail, .attach_dev() needs 
to return 0 in this crippled environment. Thanks for explaining the 
rest, good food for thoughts.



> 
> Thanks
> Kevin

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08  9:45           ` Alexey Kardashevskiy
  (?)
@ 2022-07-08 10:18             ` Tian, Kevin
  -1 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-08 10:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> Sent: Friday, July 8, 2022 5:46 PM
> 
> >>>
> >>> In general, is "domain" something from hardware or it is a software
> >>> concept? Thanks,
> >>>
> >
> > 'domain' is a software concept to represent the hardware I/O page
> > table. A blocking domain means all DMAs from a device attached to
> > this domain are blocked/rejected (equivalent to an empty I/O page
> > table), usually enforced in the .attach_dev() callback.
> >
> > Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.
> 
> 
> Making it NULL makes __iommu_attach_device() fail, .attach_dev() needs
> to return 0 in this crippled environment. Thanks for explaining the
> rest, good food for thoughts.
> 

Yeah, that's a typo. 😊

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 10:18             ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-08 10:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jason Gunthorpe
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> Sent: Friday, July 8, 2022 5:46 PM
> 
> >>>
> >>> In general, is "domain" something from hardware or it is a software
> >>> concept? Thanks,
> >>>
> >
> > 'domain' is a software concept to represent the hardware I/O page
> > table. A blocking domain means all DMAs from a device attached to
> > this domain are blocked/rejected (equivalent to an empty I/O page
> > table), usually enforced in the .attach_dev() callback.
> >
> > Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.
> 
> 
> Making it NULL makes __iommu_attach_device() fail, .attach_dev() needs
> to return 0 in this crippled environment. Thanks for explaining the
> rest, good food for thoughts.
> 

Yeah, that's a typo. 😊

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 10:18             ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-08 10:18 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

PiBGcm9tOiBBbGV4ZXkgS2FyZGFzaGV2c2tpeSA8YWlrQG96bGFicy5ydT4NCj4gU2VudDogRnJp
ZGF5LCBKdWx5IDgsIDIwMjIgNTo0NiBQTQ0KPiANCj4gPj4+DQo+ID4+PiBJbiBnZW5lcmFsLCBp
cyAiZG9tYWluIiBzb21ldGhpbmcgZnJvbSBoYXJkd2FyZSBvciBpdCBpcyBhIHNvZnR3YXJlDQo+
ID4+PiBjb25jZXB0PyBUaGFua3MsDQo+ID4+Pg0KPiA+DQo+ID4gJ2RvbWFpbicgaXMgYSBzb2Z0
d2FyZSBjb25jZXB0IHRvIHJlcHJlc2VudCB0aGUgaGFyZHdhcmUgSS9PIHBhZ2UNCj4gPiB0YWJs
ZS4gQSBibG9ja2luZyBkb21haW4gbWVhbnMgYWxsIERNQXMgZnJvbSBhIGRldmljZSBhdHRhY2hl
ZCB0bw0KPiA+IHRoaXMgZG9tYWluIGFyZSBibG9ja2VkL3JlamVjdGVkIChlcXVpdmFsZW50IHRv
IGFuIGVtcHR5IEkvTyBwYWdlDQo+ID4gdGFibGUpLCB1c3VhbGx5IGVuZm9yY2VkIGluIHRoZSAu
YXR0YWNoX2RldigpIGNhbGxiYWNrLg0KPiA+DQo+ID4gWWVzLCBhIGNvbW1lbnQgZm9yIHdoeSBo
YXZpbmcgYSBOVUxMIC5hdHRhY2hfZGV2KCkgaXMgT0sgaXMgd2VsY29tZWQuDQo+IA0KPiANCj4g
TWFraW5nIGl0IE5VTEwgbWFrZXMgX19pb21tdV9hdHRhY2hfZGV2aWNlKCkgZmFpbCwgLmF0dGFj
aF9kZXYoKSBuZWVkcw0KPiB0byByZXR1cm4gMCBpbiB0aGlzIGNyaXBwbGVkIGVudmlyb25tZW50
LiBUaGFua3MgZm9yIGV4cGxhaW5pbmcgdGhlDQo+IHJlc3QsIGdvb2QgZm9vZCBmb3IgdGhvdWdo
dHMuDQo+IA0KDQpZZWFoLCB0aGF0J3MgYSB0eXBvLiDwn5iKDQo

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08  6:34       ` Alexey Kardashevskiy
  (?)
@ 2022-07-08 11:55         ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 11:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:

> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
> to do nothing as tce_iommu_take_ownership() and
> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.

That will still cause a security problem because
tce_iommu_take_ownership()/etc are called too late. This is the moment
in the flow when the ownershift must change away from the DMA API that
power implements and to VFIO, not later.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 11:55         ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 11:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy

On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:

> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
> to do nothing as tce_iommu_take_ownership() and
> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.

That will still cause a security problem because
tce_iommu_take_ownership()/etc are called too late. This is the moment
in the flow when the ownershift must change away from the DMA API that
power implements and to VFIO, not later.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 11:55         ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 11:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:

> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
> to do nothing as tce_iommu_take_ownership() and
> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.

That will still cause a security problem because
tce_iommu_take_ownership()/etc are called too late. This is the moment
in the flow when the ownershift must change away from the DMA API that
power implements and to VFIO, not later.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08 11:55         ` Jason Gunthorpe
  (?)
@ 2022-07-08 13:10           ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08 13:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 21:55, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> 
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>> to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> 
> That will still cause a security problem because
> tce_iommu_take_ownership()/etc are called too late. This is the moment
> in the flow when the ownershift must change away from the DMA API that
> power implements and to VFIO, not later.

It is getting better and better :)

On POWERNV, at the boot time the platforms sets up PHBs, enables bypass, 
creates groups and attaches devices. As for now attaching devices to the 
default domain (which is BLOCKED) fails the not-being-use check as 
enabled bypass means "everything is mapped for DMA". So at this point 
the default domain has to be IOMMU_DOMAIN_IDENTITY or 
IOMMU_DOMAIN_UNMANAGED so later on VFIO can move devices to 
IOMMU_DOMAIN_BLOCKED. Am I missing something?


> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:10           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08 13:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy



On 08/07/2022 21:55, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> 
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>> to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> 
> That will still cause a security problem because
> tce_iommu_take_ownership()/etc are called too late. This is the moment
> in the flow when the ownershift must change away from the DMA API that
> power implements and to VFIO, not later.

It is getting better and better :)

On POWERNV, at the boot time the platforms sets up PHBs, enables bypass, 
creates groups and attaches devices. As for now attaching devices to the 
default domain (which is BLOCKED) fails the not-being-use check as 
enabled bypass means "everything is mapped for DMA". So at this point 
the default domain has to be IOMMU_DOMAIN_IDENTITY or 
IOMMU_DOMAIN_UNMANAGED so later on VFIO can move devices to 
IOMMU_DOMAIN_BLOCKED. Am I missing something?


> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:10           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08 13:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 21:55, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> 
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>> to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> 
> That will still cause a security problem because
> tce_iommu_take_ownership()/etc are called too late. This is the moment
> in the flow when the ownershift must change away from the DMA API that
> power implements and to VFIO, not later.

It is getting better and better :)

On POWERNV, at the boot time the platforms sets up PHBs, enables bypass, 
creates groups and attaches devices. As for now attaching devices to the 
default domain (which is BLOCKED) fails the not-being-use check as 
enabled bypass means "everything is mapped for DMA". So at this point 
the default domain has to be IOMMU_DOMAIN_IDENTITY or 
IOMMU_DOMAIN_UNMANAGED so later on VFIO can move devices to 
IOMMU_DOMAIN_BLOCKED. Am I missing something?


> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08 13:10           ` Alexey Kardashevskiy
  (?)
@ 2022-07-08 13:19             ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 08, 2022 at 11:10:07PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 08/07/2022 21:55, Jason Gunthorpe wrote:
> > On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> > 
> > > For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
> > > to do nothing as tce_iommu_take_ownership() and
> > > tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> > 
> > That will still cause a security problem because
> > tce_iommu_take_ownership()/etc are called too late. This is the moment
> > in the flow when the ownershift must change away from the DMA API that
> > power implements and to VFIO, not later.
> 
> It is getting better and better :)
> 
> On POWERNV, at the boot time the platforms sets up PHBs, enables bypass,
> creates groups and attaches devices. As for now attaching devices to the
> default domain (which is BLOCKED) fails the not-being-use check as enabled
> bypass means "everything is mapped for DMA". So at this point the default
> domain has to be IOMMU_DOMAIN_IDENTITY or IOMMU_DOMAIN_UNMANAGED so later on
> VFIO can move devices to IOMMU_DOMAIN_BLOCKED. Am I missing something?

For power the default domain should be NULL

NULL means that the platform is using the group to provide its DMA
ops. IIRC this patch was already setup correctly to do this?

The transition from NULL to blocking must isolate the group so all DMA
is blocked. blocking to NULL should re-estbalish platform DMA API
control.

The default domain should be non-NULL when the normal dma-iommu stuff is
providing the DMA API.

So, I think it is already setup properly, it is just the question of
what to do when entering/leaving blocking mode.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:19             ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy

On Fri, Jul 08, 2022 at 11:10:07PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 08/07/2022 21:55, Jason Gunthorpe wrote:
> > On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> > 
> > > For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
> > > to do nothing as tce_iommu_take_ownership() and
> > > tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> > 
> > That will still cause a security problem because
> > tce_iommu_take_ownership()/etc are called too late. This is the moment
> > in the flow when the ownershift must change away from the DMA API that
> > power implements and to VFIO, not later.
> 
> It is getting better and better :)
> 
> On POWERNV, at the boot time the platforms sets up PHBs, enables bypass,
> creates groups and attaches devices. As for now attaching devices to the
> default domain (which is BLOCKED) fails the not-being-use check as enabled
> bypass means "everything is mapped for DMA". So at this point the default
> domain has to be IOMMU_DOMAIN_IDENTITY or IOMMU_DOMAIN_UNMANAGED so later on
> VFIO can move devices to IOMMU_DOMAIN_BLOCKED. Am I missing something?

For power the default domain should be NULL

NULL means that the platform is using the group to provide its DMA
ops. IIRC this patch was already setup correctly to do this?

The transition from NULL to blocking must isolate the group so all DMA
is blocked. blocking to NULL should re-estbalish platform DMA API
control.

The default domain should be non-NULL when the normal dma-iommu stuff is
providing the DMA API.

So, I think it is already setup properly, it is just the question of
what to do when entering/leaving blocking mode.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:19             ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 08, 2022 at 11:10:07PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 08/07/2022 21:55, Jason Gunthorpe wrote:
> > On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> > 
> > > For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
> > > to do nothing as tce_iommu_take_ownership() and
> > > tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> > 
> > That will still cause a security problem because
> > tce_iommu_take_ownership()/etc are called too late. This is the moment
> > in the flow when the ownershift must change away from the DMA API that
> > power implements and to VFIO, not later.
> 
> It is getting better and better :)
> 
> On POWERNV, at the boot time the platforms sets up PHBs, enables bypass,
> creates groups and attaches devices. As for now attaching devices to the
> default domain (which is BLOCKED) fails the not-being-use check as enabled
> bypass means "everything is mapped for DMA". So at this point the default
> domain has to be IOMMU_DOMAIN_IDENTITY or IOMMU_DOMAIN_UNMANAGED so later on
> VFIO can move devices to IOMMU_DOMAIN_BLOCKED. Am I missing something?

For power the default domain should be NULL

NULL means that the platform is using the group to provide its DMA
ops. IIRC this patch was already setup correctly to do this?

The transition from NULL to blocking must isolate the group so all DMA
is blocked. blocking to NULL should re-estbalish platform DMA API
control.

The default domain should be non-NULL when the normal dma-iommu stuff is
providing the DMA API.

So, I think it is already setup properly, it is just the question of
what to do when entering/leaving blocking mode.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08 13:19             ` Jason Gunthorpe
  (?)
@ 2022-07-08 13:32               ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08 13:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 23:19, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 11:10:07PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 08/07/2022 21:55, Jason Gunthorpe wrote:
>>> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
>>>
>>>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>>>> to do nothing as tce_iommu_take_ownership() and
>>>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
>>>
>>> That will still cause a security problem because
>>> tce_iommu_take_ownership()/etc are called too late. This is the moment
>>> in the flow when the ownershift must change away from the DMA API that
>>> power implements and to VFIO, not later.
>>
>> It is getting better and better :)
>>
>> On POWERNV, at the boot time the platforms sets up PHBs, enables bypass,
>> creates groups and attaches devices. As for now attaching devices to the
>> default domain (which is BLOCKED) fails the not-being-use check as enabled
>> bypass means "everything is mapped for DMA". So at this point the default
>> domain has to be IOMMU_DOMAIN_IDENTITY or IOMMU_DOMAIN_UNMANAGED so later on
>> VFIO can move devices to IOMMU_DOMAIN_BLOCKED. Am I missing something?
> 
> For power the default domain should be NULL
> 
> NULL means that the platform is using the group to provide its DMA
> ops. IIRC this patch was already setup correctly to do this?
> 
> The transition from NULL to blocking must isolate the group so all DMA
> is blocked. blocking to NULL should re-estbalish platform DMA API
> control.
> 
> The default domain should be non-NULL when the normal dma-iommu stuff is
> providing the DMA API.
> 
> So, I think it is already setup properly, it is just the question of
> what to do when entering/leaving blocking mode.



Well, the patch calls iommu_probe_device() which calls 
iommu_alloc_default_domain() which creates IOMMU_DOMAIN_BLOCKED (==0) as 
nothing initialized iommu_def_domain_type. Need a different default type 
(and return NULL when IOMMU API tries creating this type)?



> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:32               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08 13:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy



On 08/07/2022 23:19, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 11:10:07PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 08/07/2022 21:55, Jason Gunthorpe wrote:
>>> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
>>>
>>>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>>>> to do nothing as tce_iommu_take_ownership() and
>>>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
>>>
>>> That will still cause a security problem because
>>> tce_iommu_take_ownership()/etc are called too late. This is the moment
>>> in the flow when the ownershift must change away from the DMA API that
>>> power implements and to VFIO, not later.
>>
>> It is getting better and better :)
>>
>> On POWERNV, at the boot time the platforms sets up PHBs, enables bypass,
>> creates groups and attaches devices. As for now attaching devices to the
>> default domain (which is BLOCKED) fails the not-being-use check as enabled
>> bypass means "everything is mapped for DMA". So at this point the default
>> domain has to be IOMMU_DOMAIN_IDENTITY or IOMMU_DOMAIN_UNMANAGED so later on
>> VFIO can move devices to IOMMU_DOMAIN_BLOCKED. Am I missing something?
> 
> For power the default domain should be NULL
> 
> NULL means that the platform is using the group to provide its DMA
> ops. IIRC this patch was already setup correctly to do this?
> 
> The transition from NULL to blocking must isolate the group so all DMA
> is blocked. blocking to NULL should re-estbalish platform DMA API
> control.
> 
> The default domain should be non-NULL when the normal dma-iommu stuff is
> providing the DMA API.
> 
> So, I think it is already setup properly, it is just the question of
> what to do when entering/leaving blocking mode.



Well, the patch calls iommu_probe_device() which calls 
iommu_alloc_default_domain() which creates IOMMU_DOMAIN_BLOCKED (==0) as 
nothing initialized iommu_def_domain_type. Need a different default type 
(and return NULL when IOMMU API tries creating this type)?



> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:32               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-08 13:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 23:19, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 11:10:07PM +1000, Alexey Kardashevskiy wrote:
>>
>>
>> On 08/07/2022 21:55, Jason Gunthorpe wrote:
>>> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
>>>
>>>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>>>> to do nothing as tce_iommu_take_ownership() and
>>>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
>>>
>>> That will still cause a security problem because
>>> tce_iommu_take_ownership()/etc are called too late. This is the moment
>>> in the flow when the ownershift must change away from the DMA API that
>>> power implements and to VFIO, not later.
>>
>> It is getting better and better :)
>>
>> On POWERNV, at the boot time the platforms sets up PHBs, enables bypass,
>> creates groups and attaches devices. As for now attaching devices to the
>> default domain (which is BLOCKED) fails the not-being-use check as enabled
>> bypass means "everything is mapped for DMA". So at this point the default
>> domain has to be IOMMU_DOMAIN_IDENTITY or IOMMU_DOMAIN_UNMANAGED so later on
>> VFIO can move devices to IOMMU_DOMAIN_BLOCKED. Am I missing something?
> 
> For power the default domain should be NULL
> 
> NULL means that the platform is using the group to provide its DMA
> ops. IIRC this patch was already setup correctly to do this?
> 
> The transition from NULL to blocking must isolate the group so all DMA
> is blocked. blocking to NULL should re-estbalish platform DMA API
> control.
> 
> The default domain should be non-NULL when the normal dma-iommu stuff is
> providing the DMA API.
> 
> So, I think it is already setup properly, it is just the question of
> what to do when entering/leaving blocking mode.



Well, the patch calls iommu_probe_device() which calls 
iommu_alloc_default_domain() which creates IOMMU_DOMAIN_BLOCKED (=0) as 
nothing initialized iommu_def_domain_type. Need a different default type 
(and return NULL when IOMMU API tries creating this type)?



> 
> Jason

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08 13:32               ` Alexey Kardashevskiy
  (?)
@ 2022-07-08 13:59                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 08, 2022 at 11:32:58PM +1000, Alexey Kardashevskiy wrote:
> > For power the default domain should be NULL
> > 
> > NULL means that the platform is using the group to provide its DMA
> > ops. IIRC this patch was already setup correctly to do this?
> > 
> > The transition from NULL to blocking must isolate the group so all DMA
> > is blocked. blocking to NULL should re-estbalish platform DMA API
> > control.
> > 
> > The default domain should be non-NULL when the normal dma-iommu stuff is
> > providing the DMA API.
> > 
> > So, I think it is already setup properly, it is just the question of
> > what to do when entering/leaving blocking mode.
> 
> Well, the patch calls iommu_probe_device() which calls
> iommu_alloc_default_domain() which creates IOMMU_DOMAIN_BLOCKED
> (==0) as

Yes, we always create a blocking domain during probe, but it isn't
used until required

> nothing initialized iommu_def_domain_type. Need a different default type
> (and return NULL when IOMMU API tries creating this type)?

iommu_alloc_default_domain() should fail on power because none of the
domain types it tries to create are supported. This should result in a
NULL group->default_domain

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:59                 ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy

On Fri, Jul 08, 2022 at 11:32:58PM +1000, Alexey Kardashevskiy wrote:
> > For power the default domain should be NULL
> > 
> > NULL means that the platform is using the group to provide its DMA
> > ops. IIRC this patch was already setup correctly to do this?
> > 
> > The transition from NULL to blocking must isolate the group so all DMA
> > is blocked. blocking to NULL should re-estbalish platform DMA API
> > control.
> > 
> > The default domain should be non-NULL when the normal dma-iommu stuff is
> > providing the DMA API.
> > 
> > So, I think it is already setup properly, it is just the question of
> > what to do when entering/leaving blocking mode.
> 
> Well, the patch calls iommu_probe_device() which calls
> iommu_alloc_default_domain() which creates IOMMU_DOMAIN_BLOCKED
> (==0) as

Yes, we always create a blocking domain during probe, but it isn't
used until required

> nothing initialized iommu_def_domain_type. Need a different default type
> (and return NULL when IOMMU API tries creating this type)?

iommu_alloc_default_domain() should fail on power because none of the
domain types it tries to create are supported. This should result in a
NULL group->default_domain

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-08 13:59                 ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-08 13:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 08, 2022 at 11:32:58PM +1000, Alexey Kardashevskiy wrote:
> > For power the default domain should be NULL
> > 
> > NULL means that the platform is using the group to provide its DMA
> > ops. IIRC this patch was already setup correctly to do this?
> > 
> > The transition from NULL to blocking must isolate the group so all DMA
> > is blocked. blocking to NULL should re-estbalish platform DMA API
> > control.
> > 
> > The default domain should be non-NULL when the normal dma-iommu stuff is
> > providing the DMA API.
> > 
> > So, I think it is already setup properly, it is just the question of
> > what to do when entering/leaving blocking mode.
> 
> Well, the patch calls iommu_probe_device() which calls
> iommu_alloc_default_domain() which creates IOMMU_DOMAIN_BLOCKED
> (=0) as

Yes, we always create a blocking domain during probe, but it isn't
used until required

> nothing initialized iommu_def_domain_type. Need a different default type
> (and return NULL when IOMMU API tries creating this type)?

iommu_alloc_default_domain() should fail on power because none of the
domain types it tries to create are supported. This should result in a
NULL group->default_domain

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08 11:55         ` Jason Gunthorpe
  (?)
@ 2022-07-09  2:58           ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-09  2:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 08/07/2022 21:55, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> 
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>> to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> 
> That will still cause a security problem because
> tce_iommu_take_ownership()/etc are called too late. This is the moment
> in the flow when the ownershift must change away from the DMA API that
> power implements and to VFIO, not later.

Trying to do that.

vfio_group_set_container:
     iommu_group_claim_dma_owner
     driver->ops->attach_group

iommu_group_claim_dma_owner sets a domain to a group. Good. But it 
attaches devices, not groups. Bad.

driver->ops->attach_group on POWER attaches a group so VFIO claims 
ownership over a group, not devices. Underlying API 
(pnv_ioda2_take_ownership()) does not need to keep track of the state, 
it is one group, one ownership transfer, easy.


What is exactly the reason why iommu_group_claim_dma_owner() cannot stay 
inside Type1 (sorry if it was explained, I could have missed)?



Also, from another mail, you said iommu_alloc_default_domain() should 
fail on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or 
the whole iommu_group_claim_dma_owner() thing falls apart.
And iommu_ops::domain_alloc() is not told if it is asked to create a 
default domain, it only takes a type.



-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-09  2:58           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-09  2:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson



On 08/07/2022 21:55, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> 
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>> to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> 
> That will still cause a security problem because
> tce_iommu_take_ownership()/etc are called too late. This is the moment
> in the flow when the ownershift must change away from the DMA API that
> power implements and to VFIO, not later.

Trying to do that.

vfio_group_set_container:
     iommu_group_claim_dma_owner
     driver->ops->attach_group

iommu_group_claim_dma_owner sets a domain to a group. Good. But it 
attaches devices, not groups. Bad.

driver->ops->attach_group on POWER attaches a group so VFIO claims 
ownership over a group, not devices. Underlying API 
(pnv_ioda2_take_ownership()) does not need to keep track of the state, 
it is one group, one ownership transfer, easy.


What is exactly the reason why iommu_group_claim_dma_owner() cannot stay 
inside Type1 (sorry if it was explained, I could have missed)?



Also, from another mail, you said iommu_alloc_default_domain() should 
fail on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or 
the whole iommu_group_claim_dma_owner() thing falls apart.
And iommu_ops::domain_alloc() is not told if it is asked to create a 
default domain, it only takes a type.



-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-09  2:58           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-09  2:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 08/07/2022 21:55, Jason Gunthorpe wrote:
> On Fri, Jul 08, 2022 at 04:34:55PM +1000, Alexey Kardashevskiy wrote:
> 
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is fine
>> to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA mappings.
> 
> That will still cause a security problem because
> tce_iommu_take_ownership()/etc are called too late. This is the moment
> in the flow when the ownershift must change away from the DMA API that
> power implements and to VFIO, not later.

Trying to do that.

vfio_group_set_container:
     iommu_group_claim_dma_owner
     driver->ops->attach_group

iommu_group_claim_dma_owner sets a domain to a group. Good. But it 
attaches devices, not groups. Bad.

driver->ops->attach_group on POWER attaches a group so VFIO claims 
ownership over a group, not devices. Underlying API 
(pnv_ioda2_take_ownership()) does not need to keep track of the state, 
it is one group, one ownership transfer, easy.


What is exactly the reason why iommu_group_claim_dma_owner() cannot stay 
inside Type1 (sorry if it was explained, I could have missed)?



Also, from another mail, you said iommu_alloc_default_domain() should 
fail on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or 
the whole iommu_group_claim_dma_owner() thing falls apart.
And iommu_ops::domain_alloc() is not told if it is asked to create a 
default domain, it only takes a type.



-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-09  2:58           ` Alexey Kardashevskiy
  (?)
@ 2022-07-10  6:29             ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-10  6:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson

On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
 
> driver->ops->attach_group on POWER attaches a group so VFIO claims ownership
> over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does
> not need to keep track of the state, it is one group, one ownership
> transfer, easy.

It should not change, I think you can just map the attach_dev to the group?

> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
> inside Type1 (sorry if it was explained, I could have missed)?

It has nothing to do with type1 - the ownership system is designed to
exclude other in-kernel drivers from using the group at the same time
vfio is using the group. power still needs this protection regardless
of if is using the formal iommu api or not.

> Also, from another mail, you said iommu_alloc_default_domain() should fail
> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the whole
> iommu_group_claim_dma_owner() thing falls apart.

Yes

> And iommu_ops::domain_alloc() is not told if it is asked to create a default
> domain, it only takes a type.

"default domain" refers to the default type pased to domain_alloc(),
it will never be blocking, so it will always fail on power.

"default domain" is better understood as the domain used by the DMA
API

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-10  6:29             ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-10  6:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson

On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
 
> driver->ops->attach_group on POWER attaches a group so VFIO claims ownership
> over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does
> not need to keep track of the state, it is one group, one ownership
> transfer, easy.

It should not change, I think you can just map the attach_dev to the group?

> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
> inside Type1 (sorry if it was explained, I could have missed)?

It has nothing to do with type1 - the ownership system is designed to
exclude other in-kernel drivers from using the group at the same time
vfio is using the group. power still needs this protection regardless
of if is using the formal iommu api or not.

> Also, from another mail, you said iommu_alloc_default_domain() should fail
> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the whole
> iommu_group_claim_dma_owner() thing falls apart.

Yes

> And iommu_ops::domain_alloc() is not told if it is asked to create a default
> domain, it only takes a type.

"default domain" refers to the default type pased to domain_alloc(),
it will never be blocking, so it will always fail on power.

"default domain" is better understood as the domain used by the DMA
API

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-10  6:29             ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-10  6:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson

On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
 
> driver->ops->attach_group on POWER attaches a group so VFIO claims ownership
> over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does
> not need to keep track of the state, it is one group, one ownership
> transfer, easy.

It should not change, I think you can just map the attach_dev to the group?

> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
> inside Type1 (sorry if it was explained, I could have missed)?

It has nothing to do with type1 - the ownership system is designed to
exclude other in-kernel drivers from using the group at the same time
vfio is using the group. power still needs this protection regardless
of if is using the formal iommu api or not.

> Also, from another mail, you said iommu_alloc_default_domain() should fail
> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the whole
> iommu_group_claim_dma_owner() thing falls apart.

Yes

> And iommu_ops::domain_alloc() is not told if it is asked to create a default
> domain, it only takes a type.

"default domain" refers to the default type pased to domain_alloc(),
it will never be blocking, so it will always fail on power.

"default domain" is better understood as the domain used by the DMA
API

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-10  6:29             ` Jason Gunthorpe
  (?)
@ 2022-07-10 12:32               ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-10 12:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 10/07/2022 16:29, Jason Gunthorpe wrote:
> On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
>   
>> driver->ops->attach_group on POWER attaches a group so VFIO claims ownership
>> over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does
>> not need to keep track of the state, it is one group, one ownership
>> transfer, easy.
> 
> It should not change, I think you can just map the attach_dev to the group?

There are multiple devices in a group, cannot just map 1:1.


>> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
>> inside Type1 (sorry if it was explained, I could have missed)?
> 
> It has nothing to do with type1 - the ownership system is designed to
> exclude other in-kernel drivers from using the group at the same time
> vfio is using the group. power still needs this protection regardless
> of if is using the formal iommu api or not.

POWER deals with it in vfio_iommu_driver_ops::attach_group.

>> Also, from another mail, you said iommu_alloc_default_domain() should fail
>> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the whole
>> iommu_group_claim_dma_owner() thing falls apart.
> 
> Yes
> 
>> And iommu_ops::domain_alloc() is not told if it is asked to create a default
>> domain, it only takes a type.
> 
> "default domain" refers to the default type pased to domain_alloc(),
> it will never be blocking, so it will always fail on power.
> "default domain" is better understood as the domain used by the DMA
> API

The DMA API on POWER does not use iommu_ops, it is dma_iommu_ops from 
arch/powerpc/kernel/dma-iommu.c from before 2005. so the default domain 
is type == 0 where 0 == BLOCKED.


-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-10 12:32               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-10 12:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson



On 10/07/2022 16:29, Jason Gunthorpe wrote:
> On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
>   
>> driver->ops->attach_group on POWER attaches a group so VFIO claims ownership
>> over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does
>> not need to keep track of the state, it is one group, one ownership
>> transfer, easy.
> 
> It should not change, I think you can just map the attach_dev to the group?

There are multiple devices in a group, cannot just map 1:1.


>> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
>> inside Type1 (sorry if it was explained, I could have missed)?
> 
> It has nothing to do with type1 - the ownership system is designed to
> exclude other in-kernel drivers from using the group at the same time
> vfio is using the group. power still needs this protection regardless
> of if is using the formal iommu api or not.

POWER deals with it in vfio_iommu_driver_ops::attach_group.

>> Also, from another mail, you said iommu_alloc_default_domain() should fail
>> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the whole
>> iommu_group_claim_dma_owner() thing falls apart.
> 
> Yes
> 
>> And iommu_ops::domain_alloc() is not told if it is asked to create a default
>> domain, it only takes a type.
> 
> "default domain" refers to the default type pased to domain_alloc(),
> it will never be blocking, so it will always fail on power.
> "default domain" is better understood as the domain used by the DMA
> API

The DMA API on POWER does not use iommu_ops, it is dma_iommu_ops from 
arch/powerpc/kernel/dma-iommu.c from before 2005. so the default domain 
is type == 0 where 0 == BLOCKED.


-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-10 12:32               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-10 12:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 10/07/2022 16:29, Jason Gunthorpe wrote:
> On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
>   
>> driver->ops->attach_group on POWER attaches a group so VFIO claims ownership
>> over a group, not devices. Underlying API (pnv_ioda2_take_ownership()) does
>> not need to keep track of the state, it is one group, one ownership
>> transfer, easy.
> 
> It should not change, I think you can just map the attach_dev to the group?

There are multiple devices in a group, cannot just map 1:1.


>> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
>> inside Type1 (sorry if it was explained, I could have missed)?
> 
> It has nothing to do with type1 - the ownership system is designed to
> exclude other in-kernel drivers from using the group at the same time
> vfio is using the group. power still needs this protection regardless
> of if is using the formal iommu api or not.

POWER deals with it in vfio_iommu_driver_ops::attach_group.

>> Also, from another mail, you said iommu_alloc_default_domain() should fail
>> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the whole
>> iommu_group_claim_dma_owner() thing falls apart.
> 
> Yes
> 
>> And iommu_ops::domain_alloc() is not told if it is asked to create a default
>> domain, it only takes a type.
> 
> "default domain" refers to the default type pased to domain_alloc(),
> it will never be blocking, so it will always fail on power.
> "default domain" is better understood as the domain used by the DMA
> API

The DMA API on POWER does not use iommu_ops, it is dma_iommu_ops from 
arch/powerpc/kernel/dma-iommu.c from before 2005. so the default domain 
is type = 0 where 0 = BLOCKED.


-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-10 12:32               ` Alexey Kardashevskiy
  (?)
@ 2022-07-11 13:24                 ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-11 13:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 10/07/2022 22:32, Alexey Kardashevskiy wrote:
> 
> 
> On 10/07/2022 16:29, Jason Gunthorpe wrote:
>> On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
>>> driver->ops->attach_group on POWER attaches a group so VFIO claims 
>>> ownership
>>> over a group, not devices. Underlying API 
>>> (pnv_ioda2_take_ownership()) does
>>> not need to keep track of the state, it is one group, one ownership
>>> transfer, easy.
>>
>> It should not change, I think you can just map the attach_dev to the 
>> group?
> 
> There are multiple devices in a group, cannot just map 1:1.
> 
> 
>>> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
>>> inside Type1 (sorry if it was explained, I could have missed)?
>>
>> It has nothing to do with type1 - the ownership system is designed to
>> exclude other in-kernel drivers from using the group at the same time
>> vfio is using the group. power still needs this protection regardless
>> of if is using the formal iommu api or not.
> 
> POWER deals with it in vfio_iommu_driver_ops::attach_group.


I really think that for 5.19 we should really move this blocked domain 
business to Type1 like this:

https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf

Thanks,


>>> Also, from another mail, you said iommu_alloc_default_domain() should 
>>> fail
>>> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the 
>>> whole
>>> iommu_group_claim_dma_owner() thing falls apart.
>>
>> Yes
>>
>>> And iommu_ops::domain_alloc() is not told if it is asked to create a 
>>> default
>>> domain, it only takes a type.
>>
>> "default domain" refers to the default type pased to domain_alloc(),
>> it will never be blocking, so it will always fail on power.
>> "default domain" is better understood as the domain used by the DMA
>> API
> 
> The DMA API on POWER does not use iommu_ops, it is dma_iommu_ops from 
> arch/powerpc/kernel/dma-iommu.c from before 2005. so the default domain 
> is type == 0 where 0 == BLOCKED.
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-11 13:24                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-11 13:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson



On 10/07/2022 22:32, Alexey Kardashevskiy wrote:
> 
> 
> On 10/07/2022 16:29, Jason Gunthorpe wrote:
>> On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
>>> driver->ops->attach_group on POWER attaches a group so VFIO claims 
>>> ownership
>>> over a group, not devices. Underlying API 
>>> (pnv_ioda2_take_ownership()) does
>>> not need to keep track of the state, it is one group, one ownership
>>> transfer, easy.
>>
>> It should not change, I think you can just map the attach_dev to the 
>> group?
> 
> There are multiple devices in a group, cannot just map 1:1.
> 
> 
>>> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
>>> inside Type1 (sorry if it was explained, I could have missed)?
>>
>> It has nothing to do with type1 - the ownership system is designed to
>> exclude other in-kernel drivers from using the group at the same time
>> vfio is using the group. power still needs this protection regardless
>> of if is using the formal iommu api or not.
> 
> POWER deals with it in vfio_iommu_driver_ops::attach_group.


I really think that for 5.19 we should really move this blocked domain 
business to Type1 like this:

https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf

Thanks,


>>> Also, from another mail, you said iommu_alloc_default_domain() should 
>>> fail
>>> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the 
>>> whole
>>> iommu_group_claim_dma_owner() thing falls apart.
>>
>> Yes
>>
>>> And iommu_ops::domain_alloc() is not told if it is asked to create a 
>>> default
>>> domain, it only takes a type.
>>
>> "default domain" refers to the default type pased to domain_alloc(),
>> it will never be blocking, so it will always fail on power.
>> "default domain" is better understood as the domain used by the DMA
>> API
> 
> The DMA API on POWER does not use iommu_ops, it is dma_iommu_ops from 
> arch/powerpc/kernel/dma-iommu.c from before 2005. so the default domain 
> is type == 0 where 0 == BLOCKED.
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-11 13:24                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-11 13:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 10/07/2022 22:32, Alexey Kardashevskiy wrote:
> 
> 
> On 10/07/2022 16:29, Jason Gunthorpe wrote:
>> On Sat, Jul 09, 2022 at 12:58:00PM +1000, Alexey Kardashevskiy wrote:
>>> driver->ops->attach_group on POWER attaches a group so VFIO claims 
>>> ownership
>>> over a group, not devices. Underlying API 
>>> (pnv_ioda2_take_ownership()) does
>>> not need to keep track of the state, it is one group, one ownership
>>> transfer, easy.
>>
>> It should not change, I think you can just map the attach_dev to the 
>> group?
> 
> There are multiple devices in a group, cannot just map 1:1.
> 
> 
>>> What is exactly the reason why iommu_group_claim_dma_owner() cannot stay
>>> inside Type1 (sorry if it was explained, I could have missed)?
>>
>> It has nothing to do with type1 - the ownership system is designed to
>> exclude other in-kernel drivers from using the group at the same time
>> vfio is using the group. power still needs this protection regardless
>> of if is using the formal iommu api or not.
> 
> POWER deals with it in vfio_iommu_driver_ops::attach_group.


I really think that for 5.19 we should really move this blocked domain 
business to Type1 like this:

https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf

Thanks,


>>> Also, from another mail, you said iommu_alloc_default_domain() should 
>>> fail
>>> on power but at least IOMMU_DOMAIN_BLOCKED must be supported, or the 
>>> whole
>>> iommu_group_claim_dma_owner() thing falls apart.
>>
>> Yes
>>
>>> And iommu_ops::domain_alloc() is not told if it is asked to create a 
>>> default
>>> domain, it only takes a type.
>>
>> "default domain" refers to the default type pased to domain_alloc(),
>> it will never be blocking, so it will always fail on power.
>> "default domain" is better understood as the domain used by the DMA
>> API
> 
> The DMA API on POWER does not use iommu_ops, it is dma_iommu_ops from 
> arch/powerpc/kernel/dma-iommu.c from before 2005. so the default domain 
> is type = 0 where 0 = BLOCKED.
> 

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-11 13:24                 ` Alexey Kardashevskiy
  (?)
@ 2022-07-11 18:46                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-11 18:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson

On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:

> I really think that for 5.19 we should really move this blocked domain
> business to Type1 like this:
> 
> https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf

This creates the same security bug for power we are discussing here. If you
don't want to fix it then lets just merge this iommu_ops patch as is rather than
mangle the core code.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-11 18:46                   ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-11 18:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson

On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:

> I really think that for 5.19 we should really move this blocked domain
> business to Type1 like this:
> 
> https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf

This creates the same security bug for power we are discussing here. If you
don't want to fix it then lets just merge this iommu_ops patch as is rather than
mangle the core code.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-11 18:46                   ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-11 18:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson

On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:

> I really think that for 5.19 we should really move this blocked domain
> business to Type1 like this:
> 
> https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf

This creates the same security bug for power we are discussing here. If you
don't want to fix it then lets just merge this iommu_ops patch as is rather than
mangle the core code.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-11 18:46                   ` Jason Gunthorpe
  (?)
@ 2022-07-12  2:27                     ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-12  2:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 7/12/22 04:46, Jason Gunthorpe wrote:
> On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:
> 
>> I really think that for 5.19 we should really move this blocked domain
>> business to Type1 like this:
>>
>> https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf
> 
> This creates the same security bug for power we are discussing here. If you

How so? attach_dev() on power makes uninitalizes DMA setup for the group 
on the hardware level, any other DMA user won't be able to initiate DMA.


> don't want to fix it then lets just merge this iommu_ops patch as is rather than
> mangle the core code.

The core code should not be assuming iommu_ops != NULL, Type1 should, I 
thought it is the whole point of having Type1, why is not it the case 
anymore?


-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-12  2:27                     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-12  2:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson



On 7/12/22 04:46, Jason Gunthorpe wrote:
> On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:
> 
>> I really think that for 5.19 we should really move this blocked domain
>> business to Type1 like this:
>>
>> https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf
> 
> This creates the same security bug for power we are discussing here. If you

How so? attach_dev() on power makes uninitalizes DMA setup for the group 
on the hardware level, any other DMA user won't be able to initiate DMA.


> don't want to fix it then lets just merge this iommu_ops patch as is rather than
> mangle the core code.

The core code should not be assuming iommu_ops != NULL, Type1 should, I 
thought it is the whole point of having Type1, why is not it the case 
anymore?


-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-12  2:27                     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-12  2:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson



On 7/12/22 04:46, Jason Gunthorpe wrote:
> On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:
> 
>> I really think that for 5.19 we should really move this blocked domain
>> business to Type1 like this:
>>
>> https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf
> 
> This creates the same security bug for power we are discussing here. If you

How so? attach_dev() on power makes uninitalizes DMA setup for the group 
on the hardware level, any other DMA user won't be able to initiate DMA.


> don't want to fix it then lets just merge this iommu_ops patch as is rather than
> mangle the core code.

The core code should not be assuming iommu_ops != NULL, Type1 should, I 
thought it is the whole point of having Type1, why is not it the case 
anymore?


-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-12  2:27                     ` Alexey Kardashevskiy
  (?)
@ 2022-07-12  5:44                       ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-12  5:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson

On Tue, Jul 12, 2022 at 12:27:17PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 7/12/22 04:46, Jason Gunthorpe wrote:
> > On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:
> > 
> > > I really think that for 5.19 we should really move this blocked domain
> > > business to Type1 like this:
> > > 
> > > https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf
> > 
> > This creates the same security bug for power we are discussing here. If you
> 
> How so? attach_dev() on power makes uninitalizes DMA setup for the group on
> the hardware level, any other DMA user won't be able to initiate DMA.

We removed all the code from VFIO that prevented dma driver conflicts
and lowered into the new APIs. You have to use these new APIs or
there are problems with exclusivity of the group.

The previous code that was allowing power to work safely doesn't exist
any more, which is why you can't just ignore these apis for
type2.

They have nothing to do with the vfio 'type', they are all about
arbitrating who gets to use the group or not and making a safe hand
off protocol from one group owner to the other. Since power says it
has groups it must implement the sharing protocol for groups.

> > don't want to fix it then lets just merge this iommu_ops patch as is rather than
> > mangle the core code.
> 
> The core code should not be assuming iommu_ops != NULL, Type1 should, I
> thought it is the whole point of having Type1, why is not it the case
> anymore?

Architectures should not be creating iommu groups without providing
proper iommu subsystem support. The half baked use of the iommu
subsystem in power is the problem here.

Adding the ops and starting to use the subsystem properly is the
correct thing to do, even if you can't complete every corner right
now. At least the issues are limited to arch code and can be fixed by
arch maintainers.

I think the patch you have here is fine to fix vfio on power and it
should simply be merged for v5.19 and power folks can further work on
this in the later cycles.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-12  5:44                       ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-12  5:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Joerg Roedel, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy, David Gibson

On Tue, Jul 12, 2022 at 12:27:17PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 7/12/22 04:46, Jason Gunthorpe wrote:
> > On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:
> > 
> > > I really think that for 5.19 we should really move this blocked domain
> > > business to Type1 like this:
> > > 
> > > https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf
> > 
> > This creates the same security bug for power we are discussing here. If you
> 
> How so? attach_dev() on power makes uninitalizes DMA setup for the group on
> the hardware level, any other DMA user won't be able to initiate DMA.

We removed all the code from VFIO that prevented dma driver conflicts
and lowered into the new APIs. You have to use these new APIs or
there are problems with exclusivity of the group.

The previous code that was allowing power to work safely doesn't exist
any more, which is why you can't just ignore these apis for
type2.

They have nothing to do with the vfio 'type', they are all about
arbitrating who gets to use the group or not and making a safe hand
off protocol from one group owner to the other. Since power says it
has groups it must implement the sharing protocol for groups.

> > don't want to fix it then lets just merge this iommu_ops patch as is rather than
> > mangle the core code.
> 
> The core code should not be assuming iommu_ops != NULL, Type1 should, I
> thought it is the whole point of having Type1, why is not it the case
> anymore?

Architectures should not be creating iommu groups without providing
proper iommu subsystem support. The half baked use of the iommu
subsystem in power is the problem here.

Adding the ops and starting to use the subsystem properly is the
correct thing to do, even if you can't complete every corner right
now. At least the issues are limited to arch code and can be fixed by
arch maintainers.

I think the patch you have here is fine to fix vfio on power and it
should simply be merged for v5.19 and power folks can further work on
this in the later cycles.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-12  5:44                       ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-12  5:44 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Joerg Roedel,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin, David Gibson

On Tue, Jul 12, 2022 at 12:27:17PM +1000, Alexey Kardashevskiy wrote:
> 
> 
> On 7/12/22 04:46, Jason Gunthorpe wrote:
> > On Mon, Jul 11, 2022 at 11:24:32PM +1000, Alexey Kardashevskiy wrote:
> > 
> > > I really think that for 5.19 we should really move this blocked domain
> > > business to Type1 like this:
> > > 
> > > https://github.com/aik/linux/commit/96f80c8db03b181398ad355f6f90e574c3ada4bf
> > 
> > This creates the same security bug for power we are discussing here. If you
> 
> How so? attach_dev() on power makes uninitalizes DMA setup for the group on
> the hardware level, any other DMA user won't be able to initiate DMA.

We removed all the code from VFIO that prevented dma driver conflicts
and lowered into the new APIs. You have to use these new APIs or
there are problems with exclusivity of the group.

The previous code that was allowing power to work safely doesn't exist
any more, which is why you can't just ignore these apis for
type2.

They have nothing to do with the vfio 'type', they are all about
arbitrating who gets to use the group or not and making a safe hand
off protocol from one group owner to the other. Since power says it
has groups it must implement the sharing protocol for groups.

> > don't want to fix it then lets just merge this iommu_ops patch as is rather than
> > mangle the core code.
> 
> The core code should not be assuming iommu_ops != NULL, Type1 should, I
> thought it is the whole point of having Type1, why is not it the case
> anymore?

Architectures should not be creating iommu groups without providing
proper iommu subsystem support. The half baked use of the iommu
subsystem in power is the problem here.

Adding the ops and starting to use the subsystem properly is the
correct thing to do, even if you can't complete every corner right
now. At least the issues are limited to arch code and can be fixed by
arch maintainers.

I think the patch you have here is fine to fix vfio on power and it
should simply be merged for v5.19 and power folks can further work on
this in the later cycles.

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-08  7:32         ` Tian, Kevin
  (?)
@ 2022-07-29  2:21           ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-29  2:21 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 17:32, Tian, Kevin wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Sent: Friday, July 8, 2022 2:35 PM
>> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/8/22 01:10, Jason Gunthorpe wrote:
>>>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>>>> the Type1 VFIO driver. Recent development though has added a
>> coherency
>>>>> capability check to the generic part of VFIO and essentially disabled
>>>>> VFIO on PPC64; the similar story about
>> iommu_group_dma_owner_claimed().
>>>>>
>>>>> This adds an iommu_ops stub which reports support for cache
>>>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
>>>>> devices,
>>>>> this provides minimum code for the probing to not crash.
> 
> stale comment since this patch doesn't use bus_set_iommu() now.
> 
>>>>> +
>>>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>>>> +                      struct device *dev)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> It is important when this returns that the iommu translation is all
>>>> emptied. There should be no left over translations from the DMA API at
>>>> this point. I have no idea how power works in this regard, but it
>>>> should be explained why this is safe in a comment at a minimum.
>>>>
>>>   > It will turn into a security problem to allow kernel mappings to leak
>>>   > past this point.
>>>   >
>>>
>>> I've added for v2 checking for no valid mappings for a device (or, more
>>> precisely, in the associated iommu_group), this domain does not need
>>> checking, right?
>>
>>
>> Uff, not that simple. Looks like once a device is in a group, its
>> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
>> then
>> there is a way to set those to NULL or do something similar to let
>> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?
> 
> dev->dma_ops is NULL as long as you don't handle DMA domain type
> here and don't call iommu_setup_dma_ops().
> 
> Given this only supports blocking domain then above should be irrelevant.
> 
>>
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
>> fine to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA
>> mappings. Thanks,
>>
>>
>>>
>>> In general, is "domain" something from hardware or it is a software
>>> concept? Thanks,
>>>
> 
> 'domain' is a software concept to represent the hardware I/O page
> table. 


About this. If a platform has a concept of explicit DMA windows (2 or 
more), is it one domain with 2 windows or 2 domains with one window each?

If it is 2 windows, iommu_domain_ops misses windows manipulation 
callbacks (I vaguely remember it being there for embedded PPC64 but 
cannot find it quickly).

If it is 1 window per a domain, then can a device be attached to 2 
domains at least in theory (I suspect not)?

On server POWER CPUs, each DMA window is backed by an independent IOMMU 
page table. (reminder) A window is a bus address range where devices are 
allowed to DMA to/from ;)

Thanks,


> A blocking domain means all DMAs from a device attached to
> this domain are blocked/rejected (equivalent to an empty I/O page
> table), usually enforced in the .attach_dev() callback.
> 
> Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.
> 
> Thanks
> Kevin

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  2:21           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-29  2:21 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, Robin Murphy



On 08/07/2022 17:32, Tian, Kevin wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Sent: Friday, July 8, 2022 2:35 PM
>> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/8/22 01:10, Jason Gunthorpe wrote:
>>>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>>>> the Type1 VFIO driver. Recent development though has added a
>> coherency
>>>>> capability check to the generic part of VFIO and essentially disabled
>>>>> VFIO on PPC64; the similar story about
>> iommu_group_dma_owner_claimed().
>>>>>
>>>>> This adds an iommu_ops stub which reports support for cache
>>>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
>>>>> devices,
>>>>> this provides minimum code for the probing to not crash.
> 
> stale comment since this patch doesn't use bus_set_iommu() now.
> 
>>>>> +
>>>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>>>> +                      struct device *dev)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> It is important when this returns that the iommu translation is all
>>>> emptied. There should be no left over translations from the DMA API at
>>>> this point. I have no idea how power works in this regard, but it
>>>> should be explained why this is safe in a comment at a minimum.
>>>>
>>>   > It will turn into a security problem to allow kernel mappings to leak
>>>   > past this point.
>>>   >
>>>
>>> I've added for v2 checking for no valid mappings for a device (or, more
>>> precisely, in the associated iommu_group), this domain does not need
>>> checking, right?
>>
>>
>> Uff, not that simple. Looks like once a device is in a group, its
>> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
>> then
>> there is a way to set those to NULL or do something similar to let
>> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?
> 
> dev->dma_ops is NULL as long as you don't handle DMA domain type
> here and don't call iommu_setup_dma_ops().
> 
> Given this only supports blocking domain then above should be irrelevant.
> 
>>
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
>> fine to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA
>> mappings. Thanks,
>>
>>
>>>
>>> In general, is "domain" something from hardware or it is a software
>>> concept? Thanks,
>>>
> 
> 'domain' is a software concept to represent the hardware I/O page
> table. 


About this. If a platform has a concept of explicit DMA windows (2 or 
more), is it one domain with 2 windows or 2 domains with one window each?

If it is 2 windows, iommu_domain_ops misses windows manipulation 
callbacks (I vaguely remember it being there for embedded PPC64 but 
cannot find it quickly).

If it is 1 window per a domain, then can a device be attached to 2 
domains at least in theory (I suspect not)?

On server POWER CPUs, each DMA window is backed by an independent IOMMU 
page table. (reminder) A window is a bus address range where devices are 
allowed to DMA to/from ;)

Thanks,


> A blocking domain means all DMAs from a device attached to
> this domain are blocked/rejected (equivalent to an empty I/O page
> table), usually enforced in the .attach_dev() callback.
> 
> Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.
> 
> Thanks
> Kevin

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  2:21           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-29  2:21 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: linuxppc-dev, Robin Murphy, Michael Ellerman, Rodel, Jorg,
	Joel Stanley, Alex Williamson, Oliver O'Halloran, kvm-ppc,
	kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin



On 08/07/2022 17:32, Tian, Kevin wrote:
>> From: Alexey Kardashevskiy <aik@ozlabs.ru>
>> Sent: Friday, July 8, 2022 2:35 PM
>> On 7/8/22 15:00, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 7/8/22 01:10, Jason Gunthorpe wrote:
>>>> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>>>>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>>>>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>>>>> the Type1 VFIO driver. Recent development though has added a
>> coherency
>>>>> capability check to the generic part of VFIO and essentially disabled
>>>>> VFIO on PPC64; the similar story about
>> iommu_group_dma_owner_claimed().
>>>>>
>>>>> This adds an iommu_ops stub which reports support for cache
>>>>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI
>>>>> devices,
>>>>> this provides minimum code for the probing to not crash.
> 
> stale comment since this patch doesn't use bus_set_iommu() now.
> 
>>>>> +
>>>>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>>>>> +                      struct device *dev)
>>>>> +{
>>>>> +    return 0;
>>>>> +}
>>>>
>>>> It is important when this returns that the iommu translation is all
>>>> emptied. There should be no left over translations from the DMA API at
>>>> this point. I have no idea how power works in this regard, but it
>>>> should be explained why this is safe in a comment at a minimum.
>>>>
>>>   > It will turn into a security problem to allow kernel mappings to leak
>>>   > past this point.
>>>   >
>>>
>>> I've added for v2 checking for no valid mappings for a device (or, more
>>> precisely, in the associated iommu_group), this domain does not need
>>> checking, right?
>>
>>
>> Uff, not that simple. Looks like once a device is in a group, its
>> dma_ops is set to iommu_dma_ops and IOMMU code owns DMA. I guess
>> then
>> there is a way to set those to NULL or do something similar to let
>> dma_map_direct() from kernel/dma/mapping.c return "true", is not there?
> 
> dev->dma_ops is NULL as long as you don't handle DMA domain type
> here and don't call iommu_setup_dma_ops().
> 
> Given this only supports blocking domain then above should be irrelevant.
> 
>>
>> For now I'll add a comment in spapr_tce_iommu_attach_dev() that it is
>> fine to do nothing as tce_iommu_take_ownership() and
>> tce_iommu_take_ownership_ddw() take care of not having active DMA
>> mappings. Thanks,
>>
>>
>>>
>>> In general, is "domain" something from hardware or it is a software
>>> concept? Thanks,
>>>
> 
> 'domain' is a software concept to represent the hardware I/O page
> table. 


About this. If a platform has a concept of explicit DMA windows (2 or 
more), is it one domain with 2 windows or 2 domains with one window each?

If it is 2 windows, iommu_domain_ops misses windows manipulation 
callbacks (I vaguely remember it being there for embedded PPC64 but 
cannot find it quickly).

If it is 1 window per a domain, then can a device be attached to 2 
domains at least in theory (I suspect not)?

On server POWER CPUs, each DMA window is backed by an independent IOMMU 
page table. (reminder) A window is a bus address range where devices are 
allowed to DMA to/from ;)

Thanks,


> A blocking domain means all DMAs from a device attached to
> this domain are blocked/rejected (equivalent to an empty I/O page
> table), usually enforced in the .attach_dev() callback.
> 
> Yes, a comment for why having a NULL .attach_dev() is OK is welcomed.
> 
> Thanks
> Kevin

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-29  2:21           ` Alexey Kardashevskiy
  (?)
@ 2022-07-29  2:53             ` Oliver O'Halloran
  -1 siblings, 0 replies; 75+ messages in thread
From: Oliver O'Halloran @ 2022-07-29  2:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Tian, Kevin, Jason Gunthorpe, linuxppc-dev, Robin Murphy,
	Michael Ellerman, Rodel, Jorg, Joel Stanley, Alex Williamson,
	kvm-ppc, kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> *snip*
>
> About this. If a platform has a concept of explicit DMA windows (2 or
> more), is it one domain with 2 windows or 2 domains with one window each?
>
> If it is 2 windows, iommu_domain_ops misses windows manipulation
> callbacks (I vaguely remember it being there for embedded PPC64 but
> cannot find it quickly).
>
> If it is 1 window per a domain, then can a device be attached to 2
> domains at least in theory (I suspect not)?
>
> On server POWER CPUs, each DMA window is backed by an independent IOMMU
> page table. (reminder) A window is a bus address range where devices are
> allowed to DMA to/from ;)

I've always thought of windows as being entries to a top-level "iommu
page table" for the device / domain. The fact each window is backed by
a separate IOMMU page table shouldn't really be relevant outside the
arch/platform.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  2:53             ` Oliver O'Halloran
  0 siblings, 0 replies; 75+ messages in thread
From: Oliver O'Halloran @ 2022-07-29  2:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Tian, Kevin, Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson, Joel Stanley,
	Jason Gunthorpe, Robin Murphy

On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> *snip*
>
> About this. If a platform has a concept of explicit DMA windows (2 or
> more), is it one domain with 2 windows or 2 domains with one window each?
>
> If it is 2 windows, iommu_domain_ops misses windows manipulation
> callbacks (I vaguely remember it being there for embedded PPC64 but
> cannot find it quickly).
>
> If it is 1 window per a domain, then can a device be attached to 2
> domains at least in theory (I suspect not)?
>
> On server POWER CPUs, each DMA window is backed by an independent IOMMU
> page table. (reminder) A window is a bus address range where devices are
> allowed to DMA to/from ;)

I've always thought of windows as being entries to a top-level "iommu
page table" for the device / domain. The fact each window is backed by
a separate IOMMU page table shouldn't really be relevant outside the
arch/platform.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  2:53             ` Oliver O'Halloran
  0 siblings, 0 replies; 75+ messages in thread
From: Oliver O'Halloran @ 2022-07-29  2:53 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Tian, Kevin, Jason Gunthorpe, linuxppc-dev, Robin Murphy,
	Michael Ellerman, Rodel, Jorg, Joel Stanley, Alex Williamson,
	kvm-ppc, kvm, Daniel Henrique Barboza, Fabiano Rosas,
	Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>
> *snip*
>
> About this. If a platform has a concept of explicit DMA windows (2 or
> more), is it one domain with 2 windows or 2 domains with one window each?
>
> If it is 2 windows, iommu_domain_ops misses windows manipulation
> callbacks (I vaguely remember it being there for embedded PPC64 but
> cannot find it quickly).
>
> If it is 1 window per a domain, then can a device be attached to 2
> domains at least in theory (I suspect not)?
>
> On server POWER CPUs, each DMA window is backed by an independent IOMMU
> page table. (reminder) A window is a bus address range where devices are
> allowed to DMA to/from ;)

I've always thought of windows as being entries to a top-level "iommu
page table" for the device / domain. The fact each window is backed by
a separate IOMMU page table shouldn't really be relevant outside the
arch/platform.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-29  2:53             ` Oliver O'Halloran
  (?)
@ 2022-07-29  3:10               ` Tian, Kevin
  -1 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-29  3:10 UTC (permalink / raw)
  To: Oliver O'Halloran, Alexey Kardashevskiy
  Cc: Jason Gunthorpe, linuxppc-dev, Robin Murphy, Michael Ellerman,
	Rodel, Jorg, Joel Stanley, Alex Williamson, kvm-ppc, kvm,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin

> From: Oliver O'Halloran <oohall@gmail.com>
> Sent: Friday, July 29, 2022 10:53 AM
> 
> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >
> > *snip*
> >
> > About this. If a platform has a concept of explicit DMA windows (2 or
> > more), is it one domain with 2 windows or 2 domains with one window
> each?
> >
> > If it is 2 windows, iommu_domain_ops misses windows manipulation
> > callbacks (I vaguely remember it being there for embedded PPC64 but
> > cannot find it quickly).
> >
> > If it is 1 window per a domain, then can a device be attached to 2
> > domains at least in theory (I suspect not)?
> >
> > On server POWER CPUs, each DMA window is backed by an independent
> IOMMU
> > page table. (reminder) A window is a bus address range where devices are
> > allowed to DMA to/from ;)
> 
> I've always thought of windows as being entries to a top-level "iommu
> page table" for the device / domain. The fact each window is backed by
> a separate IOMMU page table shouldn't really be relevant outside the
> arch/platform.

Yes. This is what was agreed when discussing how to integrate iommufd
with POWER [1].

One domain represents one address space.

Windows are just constraints on the address space for what ranges can
be mapped.

having two page tables underlying is just kind of POWER specific format.

Thanks
Kevin

[1] https://lore.kernel.org/all/Yns+TCSa6hWbU7wZ@yekko/

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  3:10               ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-29  3:10 UTC (permalink / raw)
  To: Oliver O'Halloran, Alexey Kardashevskiy
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson, Joel Stanley,
	Jason Gunthorpe, Robin Murphy

> From: Oliver O'Halloran <oohall@gmail.com>
> Sent: Friday, July 29, 2022 10:53 AM
> 
> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> >
> > *snip*
> >
> > About this. If a platform has a concept of explicit DMA windows (2 or
> > more), is it one domain with 2 windows or 2 domains with one window
> each?
> >
> > If it is 2 windows, iommu_domain_ops misses windows manipulation
> > callbacks (I vaguely remember it being there for embedded PPC64 but
> > cannot find it quickly).
> >
> > If it is 1 window per a domain, then can a device be attached to 2
> > domains at least in theory (I suspect not)?
> >
> > On server POWER CPUs, each DMA window is backed by an independent
> IOMMU
> > page table. (reminder) A window is a bus address range where devices are
> > allowed to DMA to/from ;)
> 
> I've always thought of windows as being entries to a top-level "iommu
> page table" for the device / domain. The fact each window is backed by
> a separate IOMMU page table shouldn't really be relevant outside the
> arch/platform.

Yes. This is what was agreed when discussing how to integrate iommufd
with POWER [1].

One domain represents one address space.

Windows are just constraints on the address space for what ranges can
be mapped.

having two page tables underlying is just kind of POWER specific format.

Thanks
Kevin

[1] https://lore.kernel.org/all/Yns+TCSa6hWbU7wZ@yekko/

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  3:10               ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-29  3:10 UTC (permalink / raw)
  To: Oliver O'Halloran, Alexey Kardashevskiy
  Cc: Jason Gunthorpe, linuxppc-dev, Robin Murphy, Michael Ellerman,
	Rodel, Jorg, Joel Stanley, Alex Williamson, kvm-ppc, kvm,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin

PiBGcm9tOiBPbGl2ZXIgTydIYWxsb3JhbiA8b29oYWxsQGdtYWlsLmNvbT4NCj4gU2VudDogRnJp
ZGF5LCBKdWx5IDI5LCAyMDIyIDEwOjUzIEFNDQo+IA0KPiBPbiBGcmksIEp1bCAyOSwgMjAyMiBh
dCAxMjoyMSBQTSBBbGV4ZXkgS2FyZGFzaGV2c2tpeSA8YWlrQG96bGFicy5ydT4gd3JvdGU6DQo+
ID4NCj4gPiAqc25pcCoNCj4gPg0KPiA+IEFib3V0IHRoaXMuIElmIGEgcGxhdGZvcm0gaGFzIGEg
Y29uY2VwdCBvZiBleHBsaWNpdCBETUEgd2luZG93cyAoMiBvcg0KPiA+IG1vcmUpLCBpcyBpdCBv
bmUgZG9tYWluIHdpdGggMiB3aW5kb3dzIG9yIDIgZG9tYWlucyB3aXRoIG9uZSB3aW5kb3cNCj4g
ZWFjaD8NCj4gPg0KPiA+IElmIGl0IGlzIDIgd2luZG93cywgaW9tbXVfZG9tYWluX29wcyBtaXNz
ZXMgd2luZG93cyBtYW5pcHVsYXRpb24NCj4gPiBjYWxsYmFja3MgKEkgdmFndWVseSByZW1lbWJl
ciBpdCBiZWluZyB0aGVyZSBmb3IgZW1iZWRkZWQgUFBDNjQgYnV0DQo+ID4gY2Fubm90IGZpbmQg
aXQgcXVpY2tseSkuDQo+ID4NCj4gPiBJZiBpdCBpcyAxIHdpbmRvdyBwZXIgYSBkb21haW4sIHRo
ZW4gY2FuIGEgZGV2aWNlIGJlIGF0dGFjaGVkIHRvIDINCj4gPiBkb21haW5zIGF0IGxlYXN0IGlu
IHRoZW9yeSAoSSBzdXNwZWN0IG5vdCk/DQo+ID4NCj4gPiBPbiBzZXJ2ZXIgUE9XRVIgQ1BVcywg
ZWFjaCBETUEgd2luZG93IGlzIGJhY2tlZCBieSBhbiBpbmRlcGVuZGVudA0KPiBJT01NVQ0KPiA+
IHBhZ2UgdGFibGUuIChyZW1pbmRlcikgQSB3aW5kb3cgaXMgYSBidXMgYWRkcmVzcyByYW5nZSB3
aGVyZSBkZXZpY2VzIGFyZQ0KPiA+IGFsbG93ZWQgdG8gRE1BIHRvL2Zyb20gOykNCj4gDQo+IEkn
dmUgYWx3YXlzIHRob3VnaHQgb2Ygd2luZG93cyBhcyBiZWluZyBlbnRyaWVzIHRvIGEgdG9wLWxl
dmVsICJpb21tdQ0KPiBwYWdlIHRhYmxlIiBmb3IgdGhlIGRldmljZSAvIGRvbWFpbi4gVGhlIGZh
Y3QgZWFjaCB3aW5kb3cgaXMgYmFja2VkIGJ5DQo+IGEgc2VwYXJhdGUgSU9NTVUgcGFnZSB0YWJs
ZSBzaG91bGRuJ3QgcmVhbGx5IGJlIHJlbGV2YW50IG91dHNpZGUgdGhlDQo+IGFyY2gvcGxhdGZv
cm0uDQoNClllcy4gVGhpcyBpcyB3aGF0IHdhcyBhZ3JlZWQgd2hlbiBkaXNjdXNzaW5nIGhvdyB0
byBpbnRlZ3JhdGUgaW9tbXVmZA0Kd2l0aCBQT1dFUiBbMV0uDQoNCk9uZSBkb21haW4gcmVwcmVz
ZW50cyBvbmUgYWRkcmVzcyBzcGFjZS4NCg0KV2luZG93cyBhcmUganVzdCBjb25zdHJhaW50cyBv
biB0aGUgYWRkcmVzcyBzcGFjZSBmb3Igd2hhdCByYW5nZXMgY2FuDQpiZSBtYXBwZWQuDQoNCmhh
dmluZyB0d28gcGFnZSB0YWJsZXMgdW5kZXJseWluZyBpcyBqdXN0IGtpbmQgb2YgUE9XRVIgc3Bl
Y2lmaWMgZm9ybWF0Lg0KDQpUaGFua3MNCktldmluDQoNClsxXSBodHRwczovL2xvcmUua2VybmVs
Lm9yZy9hbGwvWW5zK1RDU2E2aFdiVTd3WkB5ZWtrby8NCg=

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-29  3:10               ` Tian, Kevin
  (?)
@ 2022-07-29  3:50                 ` Alexey Kardashevskiy
  -1 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-29  3:50 UTC (permalink / raw)
  To: Tian, Kevin, Oliver O'Halloran
  Cc: Jason Gunthorpe, linuxppc-dev, Robin Murphy, Michael Ellerman,
	Rodel, Jorg, Joel Stanley, Alex Williamson, kvm-ppc, kvm,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin



On 29/07/2022 13:10, Tian, Kevin wrote:
>> From: Oliver O'Halloran <oohall@gmail.com>
>> Sent: Friday, July 29, 2022 10:53 AM
>>
>> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>> *snip*
>>>
>>> About this. If a platform has a concept of explicit DMA windows (2 or
>>> more), is it one domain with 2 windows or 2 domains with one window
>> each?
>>>
>>> If it is 2 windows, iommu_domain_ops misses windows manipulation
>>> callbacks (I vaguely remember it being there for embedded PPC64 but
>>> cannot find it quickly).
>>>
>>> If it is 1 window per a domain, then can a device be attached to 2
>>> domains at least in theory (I suspect not)?
>>>
>>> On server POWER CPUs, each DMA window is backed by an independent
>> IOMMU
>>> page table. (reminder) A window is a bus address range where devices are
>>> allowed to DMA to/from ;)
>>
>> I've always thought of windows as being entries to a top-level "iommu
>> page table" for the device / domain. The fact each window is backed by
>> a separate IOMMU page table shouldn't really be relevant outside the
>> arch/platform.
> 
> Yes. This is what was agreed when discussing how to integrate iommufd
> with POWER [1].
> 
> One domain represents one address space.
> 
> Windows are just constraints on the address space for what ranges can
> be mapped.
> 
> having two page tables underlying is just kind of POWER specific format.


It is a POWER specific thing with one not-so-obvious consequence of each 
window having an independent page size (fixed at the moment or creation) 
and (most likely) different page size, like, 4K vs. 2M.


> 
> Thanks
> Kevin
> 
> [1] https://lore.kernel.org/all/Yns+TCSa6hWbU7wZ@yekko/

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  3:50                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-29  3:50 UTC (permalink / raw)
  To: Tian, Kevin, Oliver O'Halloran
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson, Joel Stanley,
	Jason Gunthorpe, Robin Murphy



On 29/07/2022 13:10, Tian, Kevin wrote:
>> From: Oliver O'Halloran <oohall@gmail.com>
>> Sent: Friday, July 29, 2022 10:53 AM
>>
>> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>> *snip*
>>>
>>> About this. If a platform has a concept of explicit DMA windows (2 or
>>> more), is it one domain with 2 windows or 2 domains with one window
>> each?
>>>
>>> If it is 2 windows, iommu_domain_ops misses windows manipulation
>>> callbacks (I vaguely remember it being there for embedded PPC64 but
>>> cannot find it quickly).
>>>
>>> If it is 1 window per a domain, then can a device be attached to 2
>>> domains at least in theory (I suspect not)?
>>>
>>> On server POWER CPUs, each DMA window is backed by an independent
>> IOMMU
>>> page table. (reminder) A window is a bus address range where devices are
>>> allowed to DMA to/from ;)
>>
>> I've always thought of windows as being entries to a top-level "iommu
>> page table" for the device / domain. The fact each window is backed by
>> a separate IOMMU page table shouldn't really be relevant outside the
>> arch/platform.
> 
> Yes. This is what was agreed when discussing how to integrate iommufd
> with POWER [1].
> 
> One domain represents one address space.
> 
> Windows are just constraints on the address space for what ranges can
> be mapped.
> 
> having two page tables underlying is just kind of POWER specific format.


It is a POWER specific thing with one not-so-obvious consequence of each 
window having an independent page size (fixed at the moment or creation) 
and (most likely) different page size, like, 4K vs. 2M.


> 
> Thanks
> Kevin
> 
> [1] https://lore.kernel.org/all/Yns+TCSa6hWbU7wZ@yekko/

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  3:50                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 75+ messages in thread
From: Alexey Kardashevskiy @ 2022-07-29  3:50 UTC (permalink / raw)
  To: Tian, Kevin, Oliver O'Halloran
  Cc: Jason Gunthorpe, linuxppc-dev, Robin Murphy, Michael Ellerman,
	Rodel, Jorg, Joel Stanley, Alex Williamson, kvm-ppc, kvm,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin



On 29/07/2022 13:10, Tian, Kevin wrote:
>> From: Oliver O'Halloran <oohall@gmail.com>
>> Sent: Friday, July 29, 2022 10:53 AM
>>
>> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>> *snip*
>>>
>>> About this. If a platform has a concept of explicit DMA windows (2 or
>>> more), is it one domain with 2 windows or 2 domains with one window
>> each?
>>>
>>> If it is 2 windows, iommu_domain_ops misses windows manipulation
>>> callbacks (I vaguely remember it being there for embedded PPC64 but
>>> cannot find it quickly).
>>>
>>> If it is 1 window per a domain, then can a device be attached to 2
>>> domains at least in theory (I suspect not)?
>>>
>>> On server POWER CPUs, each DMA window is backed by an independent
>> IOMMU
>>> page table. (reminder) A window is a bus address range where devices are
>>> allowed to DMA to/from ;)
>>
>> I've always thought of windows as being entries to a top-level "iommu
>> page table" for the device / domain. The fact each window is backed by
>> a separate IOMMU page table shouldn't really be relevant outside the
>> arch/platform.
> 
> Yes. This is what was agreed when discussing how to integrate iommufd
> with POWER [1].
> 
> One domain represents one address space.
> 
> Windows are just constraints on the address space for what ranges can
> be mapped.
> 
> having two page tables underlying is just kind of POWER specific format.


It is a POWER specific thing with one not-so-obvious consequence of each 
window having an independent page size (fixed at the moment or creation) 
and (most likely) different page size, like, 4K vs. 2M.


> 
> Thanks
> Kevin
> 
> [1] https://lore.kernel.org/all/Yns+TCSa6hWbU7wZ@yekko/

-- 
Alexey

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-29  3:50                 ` Alexey Kardashevskiy
  (?)
@ 2022-07-29  4:24                   ` Tian, Kevin
  -1 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-29  4:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Oliver O'Halloran
  Cc: Jason Gunthorpe, linuxppc-dev, Robin Murphy, Michael Ellerman,
	Rodel, Jorg, Joel Stanley, Alex Williamson, kvm-ppc, kvm,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> Sent: Friday, July 29, 2022 11:50 AM
> 
> 
> On 29/07/2022 13:10, Tian, Kevin wrote:
> >> From: Oliver O'Halloran <oohall@gmail.com>
> >> Sent: Friday, July 29, 2022 10:53 AM
> >>
> >> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru>
> wrote:
> >>>
> >>> *snip*
> >>>
> >>> About this. If a platform has a concept of explicit DMA windows (2 or
> >>> more), is it one domain with 2 windows or 2 domains with one window
> >> each?
> >>>
> >>> If it is 2 windows, iommu_domain_ops misses windows manipulation
> >>> callbacks (I vaguely remember it being there for embedded PPC64 but
> >>> cannot find it quickly).
> >>>
> >>> If it is 1 window per a domain, then can a device be attached to 2
> >>> domains at least in theory (I suspect not)?
> >>>
> >>> On server POWER CPUs, each DMA window is backed by an independent
> >> IOMMU
> >>> page table. (reminder) A window is a bus address range where devices
> are
> >>> allowed to DMA to/from ;)
> >>
> >> I've always thought of windows as being entries to a top-level "iommu
> >> page table" for the device / domain. The fact each window is backed by
> >> a separate IOMMU page table shouldn't really be relevant outside the
> >> arch/platform.
> >
> > Yes. This is what was agreed when discussing how to integrate iommufd
> > with POWER [1].
> >
> > One domain represents one address space.
> >
> > Windows are just constraints on the address space for what ranges can
> > be mapped.
> >
> > having two page tables underlying is just kind of POWER specific format.
> 
> 
> It is a POWER specific thing with one not-so-obvious consequence of each
> window having an independent page size (fixed at the moment or creation)
> and (most likely) different page size, like, 4K vs. 2M.
> 
> 

page size is anyway decided by the iommu driver. Same for other vendors.
the caller (e.g. vfio) just tries to map as many contiguous pages as
possible but in the end it's iommu driver to decide the actual size.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  4:24                   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-29  4:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Oliver O'Halloran
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, linuxppc-dev,
	Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson, Joel Stanley,
	Jason Gunthorpe, Robin Murphy

> From: Alexey Kardashevskiy <aik@ozlabs.ru>
> Sent: Friday, July 29, 2022 11:50 AM
> 
> 
> On 29/07/2022 13:10, Tian, Kevin wrote:
> >> From: Oliver O'Halloran <oohall@gmail.com>
> >> Sent: Friday, July 29, 2022 10:53 AM
> >>
> >> On Fri, Jul 29, 2022 at 12:21 PM Alexey Kardashevskiy <aik@ozlabs.ru>
> wrote:
> >>>
> >>> *snip*
> >>>
> >>> About this. If a platform has a concept of explicit DMA windows (2 or
> >>> more), is it one domain with 2 windows or 2 domains with one window
> >> each?
> >>>
> >>> If it is 2 windows, iommu_domain_ops misses windows manipulation
> >>> callbacks (I vaguely remember it being there for embedded PPC64 but
> >>> cannot find it quickly).
> >>>
> >>> If it is 1 window per a domain, then can a device be attached to 2
> >>> domains at least in theory (I suspect not)?
> >>>
> >>> On server POWER CPUs, each DMA window is backed by an independent
> >> IOMMU
> >>> page table. (reminder) A window is a bus address range where devices
> are
> >>> allowed to DMA to/from ;)
> >>
> >> I've always thought of windows as being entries to a top-level "iommu
> >> page table" for the device / domain. The fact each window is backed by
> >> a separate IOMMU page table shouldn't really be relevant outside the
> >> arch/platform.
> >
> > Yes. This is what was agreed when discussing how to integrate iommufd
> > with POWER [1].
> >
> > One domain represents one address space.
> >
> > Windows are just constraints on the address space for what ranges can
> > be mapped.
> >
> > having two page tables underlying is just kind of POWER specific format.
> 
> 
> It is a POWER specific thing with one not-so-obvious consequence of each
> window having an independent page size (fixed at the moment or creation)
> and (most likely) different page size, like, 4K vs. 2M.
> 
> 

page size is anyway decided by the iommu driver. Same for other vendors.
the caller (e.g. vfio) just tries to map as many contiguous pages as
possible but in the end it's iommu driver to decide the actual size.

^ permalink raw reply	[flat|nested] 75+ messages in thread

* RE: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29  4:24                   ` Tian, Kevin
  0 siblings, 0 replies; 75+ messages in thread
From: Tian, Kevin @ 2022-07-29  4:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Oliver O'Halloran
  Cc: Jason Gunthorpe, linuxppc-dev, Robin Murphy, Michael Ellerman,
	Rodel, Jorg, Joel Stanley, Alex Williamson, kvm-ppc, kvm,
	Daniel Henrique Barboza, Fabiano Rosas, Murilo Opsfelder Araujo,
	Nicholas Piggin

PiBGcm9tOiBBbGV4ZXkgS2FyZGFzaGV2c2tpeSA8YWlrQG96bGFicy5ydT4NCj4gU2VudDogRnJp
ZGF5LCBKdWx5IDI5LCAyMDIyIDExOjUwIEFNDQo+IA0KPiANCj4gT24gMjkvMDcvMjAyMiAxMzox
MCwgVGlhbiwgS2V2aW4gd3JvdGU6DQo+ID4+IEZyb206IE9saXZlciBPJ0hhbGxvcmFuIDxvb2hh
bGxAZ21haWwuY29tPg0KPiA+PiBTZW50OiBGcmlkYXksIEp1bHkgMjksIDIwMjIgMTA6NTMgQU0N
Cj4gPj4NCj4gPj4gT24gRnJpLCBKdWwgMjksIDIwMjIgYXQgMTI6MjEgUE0gQWxleGV5IEthcmRh
c2hldnNraXkgPGFpa0BvemxhYnMucnU+DQo+IHdyb3RlOg0KPiA+Pj4NCj4gPj4+ICpzbmlwKg0K
PiA+Pj4NCj4gPj4+IEFib3V0IHRoaXMuIElmIGEgcGxhdGZvcm0gaGFzIGEgY29uY2VwdCBvZiBl
eHBsaWNpdCBETUEgd2luZG93cyAoMiBvcg0KPiA+Pj4gbW9yZSksIGlzIGl0IG9uZSBkb21haW4g
d2l0aCAyIHdpbmRvd3Mgb3IgMiBkb21haW5zIHdpdGggb25lIHdpbmRvdw0KPiA+PiBlYWNoPw0K
PiA+Pj4NCj4gPj4+IElmIGl0IGlzIDIgd2luZG93cywgaW9tbXVfZG9tYWluX29wcyBtaXNzZXMg
d2luZG93cyBtYW5pcHVsYXRpb24NCj4gPj4+IGNhbGxiYWNrcyAoSSB2YWd1ZWx5IHJlbWVtYmVy
IGl0IGJlaW5nIHRoZXJlIGZvciBlbWJlZGRlZCBQUEM2NCBidXQNCj4gPj4+IGNhbm5vdCBmaW5k
IGl0IHF1aWNrbHkpLg0KPiA+Pj4NCj4gPj4+IElmIGl0IGlzIDEgd2luZG93IHBlciBhIGRvbWFp
biwgdGhlbiBjYW4gYSBkZXZpY2UgYmUgYXR0YWNoZWQgdG8gMg0KPiA+Pj4gZG9tYWlucyBhdCBs
ZWFzdCBpbiB0aGVvcnkgKEkgc3VzcGVjdCBub3QpPw0KPiA+Pj4NCj4gPj4+IE9uIHNlcnZlciBQ
T1dFUiBDUFVzLCBlYWNoIERNQSB3aW5kb3cgaXMgYmFja2VkIGJ5IGFuIGluZGVwZW5kZW50DQo+
ID4+IElPTU1VDQo+ID4+PiBwYWdlIHRhYmxlLiAocmVtaW5kZXIpIEEgd2luZG93IGlzIGEgYnVz
IGFkZHJlc3MgcmFuZ2Ugd2hlcmUgZGV2aWNlcw0KPiBhcmUNCj4gPj4+IGFsbG93ZWQgdG8gRE1B
IHRvL2Zyb20gOykNCj4gPj4NCj4gPj4gSSd2ZSBhbHdheXMgdGhvdWdodCBvZiB3aW5kb3dzIGFz
IGJlaW5nIGVudHJpZXMgdG8gYSB0b3AtbGV2ZWwgImlvbW11DQo+ID4+IHBhZ2UgdGFibGUiIGZv
ciB0aGUgZGV2aWNlIC8gZG9tYWluLiBUaGUgZmFjdCBlYWNoIHdpbmRvdyBpcyBiYWNrZWQgYnkN
Cj4gPj4gYSBzZXBhcmF0ZSBJT01NVSBwYWdlIHRhYmxlIHNob3VsZG4ndCByZWFsbHkgYmUgcmVs
ZXZhbnQgb3V0c2lkZSB0aGUNCj4gPj4gYXJjaC9wbGF0Zm9ybS4NCj4gPg0KPiA+IFllcy4gVGhp
cyBpcyB3aGF0IHdhcyBhZ3JlZWQgd2hlbiBkaXNjdXNzaW5nIGhvdyB0byBpbnRlZ3JhdGUgaW9t
bXVmZA0KPiA+IHdpdGggUE9XRVIgWzFdLg0KPiA+DQo+ID4gT25lIGRvbWFpbiByZXByZXNlbnRz
IG9uZSBhZGRyZXNzIHNwYWNlLg0KPiA+DQo+ID4gV2luZG93cyBhcmUganVzdCBjb25zdHJhaW50
cyBvbiB0aGUgYWRkcmVzcyBzcGFjZSBmb3Igd2hhdCByYW5nZXMgY2FuDQo+ID4gYmUgbWFwcGVk
Lg0KPiA+DQo+ID4gaGF2aW5nIHR3byBwYWdlIHRhYmxlcyB1bmRlcmx5aW5nIGlzIGp1c3Qga2lu
ZCBvZiBQT1dFUiBzcGVjaWZpYyBmb3JtYXQuDQo+IA0KPiANCj4gSXQgaXMgYSBQT1dFUiBzcGVj
aWZpYyB0aGluZyB3aXRoIG9uZSBub3Qtc28tb2J2aW91cyBjb25zZXF1ZW5jZSBvZiBlYWNoDQo+
IHdpbmRvdyBoYXZpbmcgYW4gaW5kZXBlbmRlbnQgcGFnZSBzaXplIChmaXhlZCBhdCB0aGUgbW9t
ZW50IG9yIGNyZWF0aW9uKQ0KPiBhbmQgKG1vc3QgbGlrZWx5KSBkaWZmZXJlbnQgcGFnZSBzaXpl
LCBsaWtlLCA0SyB2cy4gMk0uDQo+IA0KPiANCg0KcGFnZSBzaXplIGlzIGFueXdheSBkZWNpZGVk
IGJ5IHRoZSBpb21tdSBkcml2ZXIuIFNhbWUgZm9yIG90aGVyIHZlbmRvcnMuDQp0aGUgY2FsbGVy
IChlLmcuIHZmaW8pIGp1c3QgdHJpZXMgdG8gbWFwIGFzIG1hbnkgY29udGlndW91cyBwYWdlcyBh
cw0KcG9zc2libGUgYnV0IGluIHRoZSBlbmQgaXQncyBpb21tdSBkcml2ZXIgdG8gZGVjaWRlIHRo
ZSBhY3R1YWwgc2l6ZS4NCg=

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
  2022-07-29  4:24                   ` Tian, Kevin
  (?)
@ 2022-07-29 12:09                     ` Jason Gunthorpe
  -1 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-29 12:09 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alexey Kardashevskiy, Oliver O'Halloran, linuxppc-dev,
	Robin Murphy, Michael Ellerman, Rodel, Jorg, Joel Stanley,
	Alex Williamson, kvm-ppc, kvm, Daniel Henrique Barboza,
	Fabiano Rosas, Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 29, 2022 at 04:24:36AM +0000, Tian, Kevin wrote:
> > It is a POWER specific thing with one not-so-obvious consequence of each
> > window having an independent page size (fixed at the moment or creation)
> > and (most likely) different page size, like, 4K vs. 2M.
> 
> page size is anyway decided by the iommu driver. Same for other vendors.
> the caller (e.g. vfio) just tries to map as many contiguous pages as
> possible but in the end it's iommu driver to decide the actual size.

An iommu_domain with a non-uniform page_size is weird, but it could be
made to work as some device-specific thing. The only thing the core
codes care about is the minimum alignment, so you'd just have to set
it to 4k and mapping the wrong thing to the 2M area would fail.

domains with their io page size != PAGE_SIZE are already very special
things that require special handling by userspace..

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29 12:09                     ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-29 12:09 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Rodel, Jorg, kvm, Fabiano Rosas, Alexey Kardashevskiy,
	Robin Murphy, Daniel Henrique Barboza, Nicholas Piggin,
	Murilo Opsfelder Araujo, kvm-ppc, Alex Williamson,
	Oliver O'Halloran, Joel Stanley, linuxppc-dev

On Fri, Jul 29, 2022 at 04:24:36AM +0000, Tian, Kevin wrote:
> > It is a POWER specific thing with one not-so-obvious consequence of each
> > window having an independent page size (fixed at the moment or creation)
> > and (most likely) different page size, like, 4K vs. 2M.
> 
> page size is anyway decided by the iommu driver. Same for other vendors.
> the caller (e.g. vfio) just tries to map as many contiguous pages as
> possible but in the end it's iommu driver to decide the actual size.

An iommu_domain with a non-uniform page_size is weird, but it could be
made to work as some device-specific thing. The only thing the core
codes care about is the minimum alignment, so you'd just have to set
it to 4k and mapping the wrong thing to the 2M area would fail.

domains with their io page size != PAGE_SIZE are already very special
things that require special handling by userspace..

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains
@ 2022-07-29 12:09                     ` Jason Gunthorpe
  0 siblings, 0 replies; 75+ messages in thread
From: Jason Gunthorpe @ 2022-07-29 12:09 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alexey Kardashevskiy, Oliver O'Halloran, linuxppc-dev,
	Robin Murphy, Michael Ellerman, Rodel, Jorg, Joel Stanley,
	Alex Williamson, kvm-ppc, kvm, Daniel Henrique Barboza,
	Fabiano Rosas, Murilo Opsfelder Araujo, Nicholas Piggin

On Fri, Jul 29, 2022 at 04:24:36AM +0000, Tian, Kevin wrote:
> > It is a POWER specific thing with one not-so-obvious consequence of each
> > window having an independent page size (fixed at the moment or creation)
> > and (most likely) different page size, like, 4K vs. 2M.
> 
> page size is anyway decided by the iommu driver. Same for other vendors.
> the caller (e.g. vfio) just tries to map as many contiguous pages as
> possible but in the end it's iommu driver to decide the actual size.

An iommu_domain with a non-uniform page_size is weird, but it could be
made to work as some device-specific thing. The only thing the core
codes care about is the minimum alignment, so you'd just have to set
it to 4k and mapping the wrong thing to the 2M area would fail.

domains with their io page size != PAGE_SIZE are already very special
things that require special handling by userspace..

Jason

^ permalink raw reply	[flat|nested] 75+ messages in thread

end of thread, other threads:[~2022-07-29 12:10 UTC | newest]

Thread overview: 75+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-07 13:55 [PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains Alexey Kardashevskiy
2022-07-07 13:55 ` Alexey Kardashevskiy
2022-07-07 13:55 ` Alexey Kardashevskiy
2022-07-07 15:10 ` Jason Gunthorpe
2022-07-07 15:10   ` Jason Gunthorpe
2022-07-07 15:10   ` Jason Gunthorpe
2022-07-08  4:58   ` Alexey Kardashevskiy
2022-07-08  5:00     ` Alexey Kardashevskiy
2022-07-08  5:00     ` Alexey Kardashevskiy
2022-07-08  6:34     ` Alexey Kardashevskiy
2022-07-08  6:34       ` Alexey Kardashevskiy
2022-07-08  6:34       ` Alexey Kardashevskiy
2022-07-08  7:32       ` Tian, Kevin
2022-07-08  7:32         ` Tian, Kevin
2022-07-08  7:32         ` Tian, Kevin
2022-07-08  9:45         ` Alexey Kardashevskiy
2022-07-08  9:45           ` Alexey Kardashevskiy
2022-07-08  9:45           ` Alexey Kardashevskiy
2022-07-08 10:18           ` Tian, Kevin
2022-07-08 10:18             ` Tian, Kevin
2022-07-08 10:18             ` Tian, Kevin
2022-07-29  2:21         ` Alexey Kardashevskiy
2022-07-29  2:21           ` Alexey Kardashevskiy
2022-07-29  2:21           ` Alexey Kardashevskiy
2022-07-29  2:53           ` Oliver O'Halloran
2022-07-29  2:53             ` Oliver O'Halloran
2022-07-29  2:53             ` Oliver O'Halloran
2022-07-29  3:10             ` Tian, Kevin
2022-07-29  3:10               ` Tian, Kevin
2022-07-29  3:10               ` Tian, Kevin
2022-07-29  3:50               ` Alexey Kardashevskiy
2022-07-29  3:50                 ` Alexey Kardashevskiy
2022-07-29  3:50                 ` Alexey Kardashevskiy
2022-07-29  4:24                 ` Tian, Kevin
2022-07-29  4:24                   ` Tian, Kevin
2022-07-29  4:24                   ` Tian, Kevin
2022-07-29 12:09                   ` Jason Gunthorpe
2022-07-29 12:09                     ` Jason Gunthorpe
2022-07-29 12:09                     ` Jason Gunthorpe
2022-07-08 11:55       ` Jason Gunthorpe
2022-07-08 11:55         ` Jason Gunthorpe
2022-07-08 11:55         ` Jason Gunthorpe
2022-07-08 13:10         ` Alexey Kardashevskiy
2022-07-08 13:10           ` Alexey Kardashevskiy
2022-07-08 13:10           ` Alexey Kardashevskiy
2022-07-08 13:19           ` Jason Gunthorpe
2022-07-08 13:19             ` Jason Gunthorpe
2022-07-08 13:19             ` Jason Gunthorpe
2022-07-08 13:32             ` Alexey Kardashevskiy
2022-07-08 13:32               ` Alexey Kardashevskiy
2022-07-08 13:32               ` Alexey Kardashevskiy
2022-07-08 13:59               ` Jason Gunthorpe
2022-07-08 13:59                 ` Jason Gunthorpe
2022-07-08 13:59                 ` Jason Gunthorpe
2022-07-09  2:58         ` Alexey Kardashevskiy
2022-07-09  2:58           ` Alexey Kardashevskiy
2022-07-09  2:58           ` Alexey Kardashevskiy
2022-07-10  6:29           ` Jason Gunthorpe
2022-07-10  6:29             ` Jason Gunthorpe
2022-07-10  6:29             ` Jason Gunthorpe
2022-07-10 12:32             ` Alexey Kardashevskiy
2022-07-10 12:32               ` Alexey Kardashevskiy
2022-07-10 12:32               ` Alexey Kardashevskiy
2022-07-11 13:24               ` Alexey Kardashevskiy
2022-07-11 13:24                 ` Alexey Kardashevskiy
2022-07-11 13:24                 ` Alexey Kardashevskiy
2022-07-11 18:46                 ` Jason Gunthorpe
2022-07-11 18:46                   ` Jason Gunthorpe
2022-07-11 18:46                   ` Jason Gunthorpe
2022-07-12  2:27                   ` Alexey Kardashevskiy
2022-07-12  2:27                     ` Alexey Kardashevskiy
2022-07-12  2:27                     ` Alexey Kardashevskiy
2022-07-12  5:44                     ` Jason Gunthorpe
2022-07-12  5:44                       ` Jason Gunthorpe
2022-07-12  5:44                       ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.