linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] iommu: Per-group default domain type
@ 2020-01-01  5:26 Lu Baolu
  2020-01-01  5:26 ` [RFC PATCH 1/4] driver core: Add iommu_passthrough to struct device Lu Baolu
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-01  5:26 UTC (permalink / raw)
  To: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: ashok.raj, jacob.jun.pan, kevin.tian, Christoph Hellwig,
	Robin Murphy, iommu, linux-kernel, Lu Baolu

An IOMMU group represents the smallest set of devices that are considered
to be isolated. All devices belonging to an IOMMU group share a default
domain for DMA APIs. There are two types of default domain: IOMMU_DOMAIN_DMA
and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while the
latter means IOMMU by-pass.

Currently, the default domain type for the IOMMU groups is determined
globally. All IOMMU groups use a single default domain type. The global
default domain type can be adjusted by kernel build configuration or
kernel parameters.

More and more users are looking forward to a fine grained default domain
type. For example, with the global default domain type set to translation,
the OEM verndors or end users might want some trusted and fast-speed devices
to bypass IOMMU for performance gains. On the other hand, with global
default domain type set to by-pass, some devices with limited system
memory addressing capability might want IOMMU translation to remove the
bounce buffer overhead.

This series proposes per-group default domain type to meet these demands.
It adds a per-device iommu_passthrough attribute. By setting this
attribute, end users or device vendors are able to tell the IOMMU subsystem
that this device is willing to use a default domain of IOMMU_DOMAIN_IDENTITY.
The IOMMU device probe procedure is reformed to pre-allocate groups for
all devices on a specific bus before adding the devices into the groups.
This enables the IOMMU device probe precedure to determine a per-group
default domain type before allocating IOMMU domains and attaching them
to devices.

Please help to review it. Your comments and suggestions are appricated.

Best regards,
baolu 

Lu Baolu (4):
  driver core: Add iommu_passthrough to struct device
  PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
  iommu: Preallocate iommu group when probing devices
  iommu: Determine default domain type before allocating domain

 .../admin-guide/kernel-parameters.txt         |   5 +
 drivers/iommu/iommu.c                         | 127 ++++++++++++++----
 drivers/pci/pci.c                             |  34 +++++
 drivers/pci/pci.h                             |   1 +
 drivers/pci/probe.c                           |   2 +
 include/linux/device.h                        |   3 +
 6 files changed, 143 insertions(+), 29 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/4] driver core: Add iommu_passthrough to struct device
  2020-01-01  5:26 [RFC PATCH 0/4] iommu: Per-group default domain type Lu Baolu
@ 2020-01-01  5:26 ` Lu Baolu
  2020-01-01  5:26 ` [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough Lu Baolu
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-01  5:26 UTC (permalink / raw)
  To: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: ashok.raj, jacob.jun.pan, kevin.tian, Christoph Hellwig,
	Robin Murphy, iommu, linux-kernel, Lu Baolu

Add iommu_passthrough to struct device. This enables the iommu
subsystem to prepare an identity domain for the device so that
the DMA IOVA will be translated to the same physical address.
This field could be set in various subsystems, such as PCI,
according to the device/firmware properties or kernel command
parameters.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index 96ff76731e93..763d2d078d34 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1247,6 +1247,8 @@ struct dev_links_info {
  *		  sync_state() callback.
  * @dma_coherent: this particular device is dma coherent, even if the
  *		architecture supports non-coherent devices.
+ * @iommu_passthrough: this particular device need to by pass the IOMMU DMA
+ *		translation.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -1347,6 +1349,7 @@ struct device {
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	bool			dma_coherent:1;
 #endif
+	bool			iommu_passthrough:1;
 };
 
 static inline struct device *kobj_to_dev(struct kobject *kobj)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
  2020-01-01  5:26 [RFC PATCH 0/4] iommu: Per-group default domain type Lu Baolu
  2020-01-01  5:26 ` [RFC PATCH 1/4] driver core: Add iommu_passthrough to struct device Lu Baolu
@ 2020-01-01  5:26 ` Lu Baolu
  2020-01-18  0:18   ` Bjorn Helgaas
  2020-01-21 14:17   ` Bjorn Helgaas
  2020-01-01  5:26 ` [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices Lu Baolu
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-01  5:26 UTC (permalink / raw)
  To: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: ashok.raj, jacob.jun.pan, kevin.tian, Christoph Hellwig,
	Robin Murphy, iommu, linux-kernel, Lu Baolu

The new parameter takes a list of devices separated by a semicolon.
Each device specified will have its iommu_passthrough bit in struct
device set. This is very similar to the existing 'disable_acs_redir'
parameter.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 +++
 drivers/pci/pci.c                             | 34 +++++++++++++++++++
 drivers/pci/pci.h                             |  1 +
 drivers/pci/probe.c                           |  2 ++
 4 files changed, 42 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..d3edc2cb6696 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3583,6 +3583,11 @@
 				may put more devices in an IOMMU group.
 		force_floating	[S390] Force usage of floating interrupts.
 		nomio		[S390] Do not use MIO instructions.
+		iommu_passthrough=<pci_dev>[; ...]
+				Specify one or more PCI devices (in the format
+				specified above) separated by semicolons.
+				Each device specified will bypass IOMMU DMA
+				translation.
 
 	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
 			Management.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 90dbd7c70371..05bf3f4acc36 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_fixup_cardbus);
 
+static const char *iommu_passthrough_param;
+bool pci_iommu_passthrough_match(struct pci_dev *dev)
+{
+	int ret = 0;
+	const char *p = iommu_passthrough_param;
+
+	if (!p)
+		return false;
+
+	while (*p) {
+		ret = pci_dev_str_match(dev, p, &p);
+		if (ret < 0) {
+			pr_info_once("PCI: Can't parse iommu_passthrough parameter: %s\n",
+				     iommu_passthrough_param);
+
+			break;
+		} else if (ret == 1) {
+			pci_info(dev, "PCI: IOMMU passthrough\n");
+			return true;
+		}
+
+		if (*p != ';' && *p != ',') {
+			/* End of param or invalid format */
+			break;
+		}
+		p++;
+	}
+
+	return false;
+}
+
 static int __init pci_setup(char *str)
 {
 	while (str) {
@@ -6462,6 +6493,8 @@ static int __init pci_setup(char *str)
 				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
 			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
 				disable_acs_redir_param = str + 18;
+			} else if (!strncmp(str, "iommu_passthrough=", 18)) {
+				iommu_passthrough_param = str + 18;
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
@@ -6486,6 +6519,7 @@ static int __init pci_realloc_setup_params(void)
 	resource_alignment_param = kstrdup(resource_alignment_param,
 					   GFP_KERNEL);
 	disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
+	iommu_passthrough_param = kstrdup(iommu_passthrough_param, GFP_KERNEL);
 
 	return 0;
 }
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a0a53bd05a0b..95f6af06aba6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -288,6 +288,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
 struct pci_bus *pci_bus_get(struct pci_bus *bus);
 void pci_bus_put(struct pci_bus *bus);
+bool pci_iommu_passthrough_match(struct pci_dev *dev);
 
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 512cb4312ddd..4c571ee75621 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2404,6 +2404,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 
 	dev->state_saved = false;
 
+	dev->dev.iommu_passthrough = pci_iommu_passthrough_match(dev);
+
 	pci_init_capabilities(dev);
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-01  5:26 [RFC PATCH 0/4] iommu: Per-group default domain type Lu Baolu
  2020-01-01  5:26 ` [RFC PATCH 1/4] driver core: Add iommu_passthrough to struct device Lu Baolu
  2020-01-01  5:26 ` [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough Lu Baolu
@ 2020-01-01  5:26 ` Lu Baolu
  2020-01-17 10:21   ` Joerg Roedel
  2020-01-01  5:26 ` [RFC PATCH 4/4] iommu: Determine default domain type before allocating domain Lu Baolu
  2020-01-20  9:44 ` [RFC PATCH 0/4] iommu: Per-group default domain type John Garry
  4 siblings, 1 reply; 19+ messages in thread
From: Lu Baolu @ 2020-01-01  5:26 UTC (permalink / raw)
  To: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: ashok.raj, jacob.jun.pan, kevin.tian, Christoph Hellwig,
	Robin Murphy, iommu, linux-kernel, Lu Baolu

This splits iommu group allocation from adding devices. This makes
it possible to determine the default domain type for each group as
all devices belonging to the group have been determined.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 92 +++++++++++++++++++++++++++++++------------
 1 file changed, 66 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fdd40756dbc1..716326a2ee5b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -49,6 +49,7 @@ struct group_device {
 	struct list_head list;
 	struct device *dev;
 	char *name;
+	bool added;
 };
 
 struct iommu_group_attribute {
@@ -176,7 +177,6 @@ int iommu_probe_device(struct device *dev)
 	const struct iommu_ops *ops = dev->bus->iommu_ops;
 	int ret;
 
-	WARN_ON(dev->iommu_group);
 	if (!ops)
 		return -EINVAL;
 
@@ -686,13 +686,28 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
 int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 {
 	int ret, i = 0;
-	struct group_device *device;
+	struct group_device *device = NULL;
 
-	device = kzalloc(sizeof(*device), GFP_KERNEL);
-	if (!device)
-		return -ENOMEM;
+	mutex_lock(&group->mutex);
+	list_for_each_entry(device, &group->devices, list) {
+		if (device->dev == dev)
+			break;
+	}
+	mutex_unlock(&group->mutex);
 
-	device->dev = dev;
+	if (!device || device->dev != dev) {
+		device = kzalloc(sizeof(*device), GFP_KERNEL);
+		if (!device)
+			return -ENOMEM;
+
+		device->dev = dev;
+		mutex_lock(&group->mutex);
+		list_add_tail(&device->list, &group->devices);
+		mutex_unlock(&group->mutex);
+	} else if (device->added) {
+		kobject_get(group->devices_kobj);
+		return 0;
+	}
 
 	ret = sysfs_create_link(&dev->kobj, &group->kobj, "iommu_group");
 	if (ret)
@@ -728,13 +743,14 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	iommu_group_create_direct_mappings(group, dev);
 
 	mutex_lock(&group->mutex);
-	list_add_tail(&device->list, &group->devices);
 	if (group->domain)
 		ret = __iommu_attach_device(group->domain, dev);
 	mutex_unlock(&group->mutex);
 	if (ret)
 		goto err_put_group;
 
+	device->added = true;
+
 	/* Notify any listeners about change to group. */
 	blocking_notifier_call_chain(&group->notifier,
 				     IOMMU_GROUP_NOTIFY_ADD_DEVICE, dev);
@@ -746,16 +762,16 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
 	return 0;
 
 err_put_group:
-	mutex_lock(&group->mutex);
-	list_del(&device->list);
-	mutex_unlock(&group->mutex);
-	dev->iommu_group = NULL;
 	kobject_put(group->devices_kobj);
 err_free_name:
 	kfree(device->name);
 err_remove_link:
 	sysfs_remove_link(&dev->kobj, "iommu_group");
 err_free_device:
+	mutex_lock(&group->mutex);
+	list_del(&device->list);
+	mutex_unlock(&group->mutex);
+	dev->iommu_group = NULL;
 	kfree(device);
 	dev_err(dev, "Failed to add to iommu group %d: %d\n", group->id, ret);
 	return ret;
@@ -1339,6 +1355,34 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
 	return group;
 }
 
+static int alloc_iommu_group(struct device *dev, void *data)
+{
+	const struct iommu_ops *ops = dev->bus->iommu_ops;
+	struct group_device *device;
+	struct iommu_group *group;
+
+	if (!ops || WARN_ON(dev->iommu_group))
+		return -EINVAL;
+
+	device = kzalloc(sizeof(*device), GFP_KERNEL);
+	if (!device)
+		return -ENOMEM;
+
+	group = ops->device_group(dev);
+	if (WARN_ON_ONCE(IS_ERR_OR_NULL(group))) {
+		kfree(device);
+		return -EINVAL;
+	}
+
+	device->dev = dev;
+	dev->iommu_group = group;
+	mutex_lock(&group->mutex);
+	list_add_tail(&device->list, &group->devices);
+	mutex_unlock(&group->mutex);
+
+	return 0;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -1351,23 +1395,15 @@ struct iommu_group *fsl_mc_device_group(struct device *dev)
  */
 struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 {
-	const struct iommu_ops *ops = dev->bus->iommu_ops;
-	struct iommu_group *group;
+	struct iommu_group *group = dev->iommu_group;
 	int ret;
 
-	group = iommu_group_get(dev);
-	if (group)
-		return group;
-
-	if (!ops)
-		return ERR_PTR(-EINVAL);
-
-	group = ops->device_group(dev);
-	if (WARN_ON_ONCE(group == NULL))
-		return ERR_PTR(-EINVAL);
-
-	if (IS_ERR(group))
-		return group;
+	if (!group) {
+		ret = alloc_iommu_group(dev, NULL);
+		if (ret)
+			return ERR_PTR(ret);
+		group = dev->iommu_group;
+	}
 
 	/*
 	 * Try to allocate a default domain - needs support from the
@@ -1501,6 +1537,10 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
 	if (err)
 		goto out_free;
 
+	err = bus_for_each_dev(bus, NULL, NULL, alloc_iommu_group);
+	if (err)
+		goto out_err;
+
 	err = bus_for_each_dev(bus, NULL, NULL, add_iommu_group);
 	if (err)
 		goto out_err;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 4/4] iommu: Determine default domain type before allocating domain
  2020-01-01  5:26 [RFC PATCH 0/4] iommu: Per-group default domain type Lu Baolu
                   ` (2 preceding siblings ...)
  2020-01-01  5:26 ` [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices Lu Baolu
@ 2020-01-01  5:26 ` Lu Baolu
  2020-01-20  9:44 ` [RFC PATCH 0/4] iommu: Per-group default domain type John Garry
  4 siblings, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-01  5:26 UTC (permalink / raw)
  To: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: ashok.raj, jacob.jun.pan, kevin.tian, Christoph Hellwig,
	Robin Murphy, iommu, linux-kernel, Lu Baolu

Determine the default domain type for each group and use it to
allocate the iommu domain.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/iommu.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 716326a2ee5b..fc1df1acbd25 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -43,6 +43,7 @@ struct iommu_group {
 	int id;
 	struct iommu_domain *default_domain;
 	struct iommu_domain *domain;
+	unsigned int def_domain_type;
 };
 
 struct group_device {
@@ -1383,6 +1384,33 @@ static int alloc_iommu_group(struct device *dev, void *data)
 	return 0;
 }
 
+static void get_group_def_domain_type(struct iommu_group *group)
+{
+	struct group_device *tmp = NULL;
+
+	mutex_lock(&group->mutex);
+	list_for_each_entry(tmp, &group->devices, list) {
+		struct device *dev = tmp->dev;
+
+		/*
+		 * If there are any untrusted devices in the group, force
+		 * IOMMU_DOMAIN_DMA to prevent DMA attack from malicious
+		 * devices.
+		 */
+		if (dev_is_pci(dev) && to_pci_dev(dev)->untrusted) {
+			group->def_domain_type = IOMMU_DOMAIN_DMA;
+			break;
+		}
+
+		if (dev->iommu_passthrough)
+			group->def_domain_type = IOMMU_DOMAIN_IDENTITY;
+	}
+	mutex_unlock(&group->mutex);
+
+	if (!group->def_domain_type)
+		group->def_domain_type = iommu_def_domain_type;
+}
+
 /**
  * iommu_group_get_for_dev - Find or create the IOMMU group for a device
  * @dev: target device
@@ -1412,13 +1440,14 @@ struct iommu_group *iommu_group_get_for_dev(struct device *dev)
 	if (!group->default_domain) {
 		struct iommu_domain *dom;
 
-		dom = __iommu_domain_alloc(dev->bus, iommu_def_domain_type);
-		if (!dom && iommu_def_domain_type != IOMMU_DOMAIN_DMA) {
+		get_group_def_domain_type(group);
+		dom = __iommu_domain_alloc(dev->bus, group->def_domain_type);
+		if (!dom && group->def_domain_type != IOMMU_DOMAIN_DMA) {
 			dom = __iommu_domain_alloc(dev->bus, IOMMU_DOMAIN_DMA);
 			if (dom) {
 				dev_warn(dev,
 					 "failed to allocate default IOMMU domain of type %u; falling back to IOMMU_DOMAIN_DMA",
-					 iommu_def_domain_type);
+					 group->def_domain_type);
 			}
 		}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-01  5:26 ` [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices Lu Baolu
@ 2020-01-17 10:21   ` Joerg Roedel
  2020-01-18  2:18     ` Lu Baolu
  2020-01-19  6:29     ` Lu Baolu
  0 siblings, 2 replies; 19+ messages in thread
From: Joerg Roedel @ 2020-01-17 10:21 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Bjorn Helgaas, ashok.raj, jacob.jun.pan,
	kevin.tian, Christoph Hellwig, Robin Murphy, iommu, linux-kernel

On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
> This splits iommu group allocation from adding devices. This makes
> it possible to determine the default domain type for each group as
> all devices belonging to the group have been determined.

I think its better to keep group allocation as it is and just defer
default domain allocation after each device is in its group. But take
care about the device hotplug path which might add new devices to groups
which already have a default domain, or add new groups that might need a
default domain too.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
  2020-01-01  5:26 ` [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough Lu Baolu
@ 2020-01-18  0:18   ` Bjorn Helgaas
  2020-01-18  2:04     ` Lu Baolu
  2020-01-21 14:17   ` Bjorn Helgaas
  1 sibling, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2020-01-18  0:18 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Greg Kroah-Hartman, ashok.raj, jacob.jun.pan,
	kevin.tian, Christoph Hellwig, Robin Murphy, iommu, linux-kernel

On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote:
> The new parameter takes a list of devices separated by a semicolon.
> Each device specified will have its iommu_passthrough bit in struct
> device set. This is very similar to the existing 'disable_acs_redir'
> parameter.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 +++
>  drivers/pci/pci.c                             | 34 +++++++++++++++++++
>  drivers/pci/pci.h                             |  1 +
>  drivers/pci/probe.c                           |  2 ++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..d3edc2cb6696 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3583,6 +3583,11 @@
>  				may put more devices in an IOMMU group.
>  		force_floating	[S390] Force usage of floating interrupts.
>  		nomio		[S390] Do not use MIO instructions.
> +		iommu_passthrough=<pci_dev>[; ...]
> +				Specify one or more PCI devices (in the format
> +				specified above) separated by semicolons.
> +				Each device specified will bypass IOMMU DMA
> +				translation.
>  
>  	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
>  			Management.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 90dbd7c70371..05bf3f4acc36 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
>  }
>  EXPORT_SYMBOL(pci_fixup_cardbus);
>  
> +static const char *iommu_passthrough_param;
> +bool pci_iommu_passthrough_match(struct pci_dev *dev)
> +{
> +	int ret = 0;
> +	const char *p = iommu_passthrough_param;
> +
> +	if (!p)
> +		return false;
> +
> +	while (*p) {
> +		ret = pci_dev_str_match(dev, p, &p);
> +		if (ret < 0) {
> +			pr_info_once("PCI: Can't parse iommu_passthrough parameter: %s\n",
> +				     iommu_passthrough_param);
> +
> +			break;
> +		} else if (ret == 1) {
> +			pci_info(dev, "PCI: IOMMU passthrough\n");
> +			return true;
> +		}
> +
> +		if (*p != ';' && *p != ',') {
> +			/* End of param or invalid format */
> +			break;
> +		}
> +		p++;
> +	}
> +
> +	return false;
> +}

This duplicates a lot of the code in pci_disable_acs_redir().  That
needs to be factored out somehow so we don't duplicate it.

Bjorn

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
  2020-01-18  0:18   ` Bjorn Helgaas
@ 2020-01-18  2:04     ` Lu Baolu
  0 siblings, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-18  2:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Joerg Roedel, Greg Kroah-Hartman, ashok.raj,
	jacob.jun.pan, kevin.tian, Christoph Hellwig, Robin Murphy,
	iommu, linux-kernel

Hi Bjorn,

On 1/18/20 8:18 AM, Bjorn Helgaas wrote:
> On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote:
>> The new parameter takes a list of devices separated by a semicolon.
>> Each device specified will have its iommu_passthrough bit in struct
>> device set. This is very similar to the existing 'disable_acs_redir'
>> parameter.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   .../admin-guide/kernel-parameters.txt         |  5 +++
>>   drivers/pci/pci.c                             | 34 +++++++++++++++++++
>>   drivers/pci/pci.h                             |  1 +
>>   drivers/pci/probe.c                           |  2 ++
>>   4 files changed, 42 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index ade4e6ec23e0..d3edc2cb6696 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3583,6 +3583,11 @@
>>   				may put more devices in an IOMMU group.
>>   		force_floating	[S390] Force usage of floating interrupts.
>>   		nomio		[S390] Do not use MIO instructions.
>> +		iommu_passthrough=<pci_dev>[; ...]
>> +				Specify one or more PCI devices (in the format
>> +				specified above) separated by semicolons.
>> +				Each device specified will bypass IOMMU DMA
>> +				translation.
>>   
>>   	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
>>   			Management.
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 90dbd7c70371..05bf3f4acc36 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
>>   }
>>   EXPORT_SYMBOL(pci_fixup_cardbus);
>>   
>> +static const char *iommu_passthrough_param;
>> +bool pci_iommu_passthrough_match(struct pci_dev *dev)
>> +{
>> +	int ret = 0;
>> +	const char *p = iommu_passthrough_param;
>> +
>> +	if (!p)
>> +		return false;
>> +
>> +	while (*p) {
>> +		ret = pci_dev_str_match(dev, p, &p);
>> +		if (ret < 0) {
>> +			pr_info_once("PCI: Can't parse iommu_passthrough parameter: %s\n",
>> +				     iommu_passthrough_param);
>> +
>> +			break;
>> +		} else if (ret == 1) {
>> +			pci_info(dev, "PCI: IOMMU passthrough\n");
>> +			return true;
>> +		}
>> +
>> +		if (*p != ';' && *p != ',') {
>> +			/* End of param or invalid format */
>> +			break;
>> +		}
>> +		p++;
>> +	}
>> +
>> +	return false;
>> +}
> 
> This duplicates a lot of the code in pci_disable_acs_redir().  That
> needs to be factored out somehow so we don't duplicate it.
> 

Sure. I will try to refactor the code in the next version.

> Bjorn
> 

Best regards,
baolu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-17 10:21   ` Joerg Roedel
@ 2020-01-18  2:18     ` Lu Baolu
  2020-01-19  6:29     ` Lu Baolu
  1 sibling, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-18  2:18 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: baolu.lu, Greg Kroah-Hartman, Bjorn Helgaas, ashok.raj,
	jacob.jun.pan, kevin.tian, Christoph Hellwig, Robin Murphy,
	iommu, linux-kernel

Hi Joerg,

On 1/17/20 6:21 PM, Joerg Roedel wrote:
> On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
>> This splits iommu group allocation from adding devices. This makes
>> it possible to determine the default domain type for each group as
>> all devices belonging to the group have been determined.
> 
> I think its better to keep group allocation as it is and just defer
> default domain allocation after each device is in its group. But take
> care about the device hotplug path which might add new devices to groups
> which already have a default domain, or add new groups that might need a
> default domain too.
Thanks for the comment. It looks good to me. I will try to do it in the
next version.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-17 10:21   ` Joerg Roedel
  2020-01-18  2:18     ` Lu Baolu
@ 2020-01-19  6:29     ` Lu Baolu
  2020-01-21 12:45       ` Robin Murphy
  1 sibling, 1 reply; 19+ messages in thread
From: Lu Baolu @ 2020-01-19  6:29 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: baolu.lu, Greg Kroah-Hartman, Bjorn Helgaas, ashok.raj,
	jacob.jun.pan, kevin.tian, Christoph Hellwig, Robin Murphy,
	iommu, linux-kernel

Hi Joerg,

On 1/17/20 6:21 PM, Joerg Roedel wrote:
> On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
>> This splits iommu group allocation from adding devices. This makes
>> it possible to determine the default domain type for each group as
>> all devices belonging to the group have been determined.
> 
> I think its better to keep group allocation as it is and just defer
> default domain allocation after each device is in its group. But take

I tried defering default domain allocation, but it seems not possible.

The call path of adding devices into their groups:

iommu_probe_device
-> ops->add_device(dev)
    -> (iommu vendor driver) iommu_group_get_for_dev(dev)

After doing this, the vendor driver will get the default domain and
apply dma_ops according to the domain type. If we defer the domain
allocation, they will get a NULL default domain and cause panic in
the vendor driver.

Any suggestions?

Best regards,
baolu

> care about the device hotplug path which might add new devices to groups
> which already have a default domain, or add new groups that might need a
> default domain too.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] iommu: Per-group default domain type
  2020-01-01  5:26 [RFC PATCH 0/4] iommu: Per-group default domain type Lu Baolu
                   ` (3 preceding siblings ...)
  2020-01-01  5:26 ` [RFC PATCH 4/4] iommu: Determine default domain type before allocating domain Lu Baolu
@ 2020-01-20  9:44 ` John Garry
  2020-01-21  0:43   ` Lu Baolu
  4 siblings, 1 reply; 19+ messages in thread
From: John Garry @ 2020-01-20  9:44 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: kevin.tian, ashok.raj, linux-kernel, iommu, jacob.jun.pan,
	Robin Murphy, Christoph Hellwig

On 01/01/2020 05:26, Lu Baolu wrote:
> An IOMMU group represents the smallest set of devices that are considered
> to be isolated. All devices belonging to an IOMMU group share a default
> domain for DMA APIs. There are two types of default domain: IOMMU_DOMAIN_DMA
> and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while the
> latter means IOMMU by-pass.
> 
> Currently, the default domain type for the IOMMU groups is determined
> globally. All IOMMU groups use a single default domain type. The global
> default domain type can be adjusted by kernel build configuration or
> kernel parameters.
> 
> More and more users are looking forward to a fine grained default domain
> type. For example, with the global default domain type set to translation,
> the OEM verndors or end users might want some trusted and fast-speed devices
> to bypass IOMMU for performance gains. On the other hand, with global
> default domain type set to by-pass, some devices with limited system
> memory addressing capability might want IOMMU translation to remove the
> bounce buffer overhead.

Hi Lu Baolu,

Do you think that it would be a more common usecase to want 
kernel-managed devices to be passthrough for performance reasons and 
some select devices to be in DMA domain, like those with limited address 
cap or whose drivers request huge amounts of memory?

I just think it would be more manageable to set kernel commandline 
parameters for this, i.e. those select few which want DMA domain.

Thanks,
John

> 
> This series proposes per-group default domain type to meet these demands.
> It adds a per-device iommu_passthrough attribute. By setting this
> attribute, end users or device vendors are able to tell the IOMMU subsystem
> that this device is willing to use a default domain of IOMMU_DOMAIN_IDENTITY.
> The IOMMU device probe procedure is reformed to pre-allocate groups for
> all devices on a specific bus before adding the devices into the groups.
> This enables the IOMMU device probe precedure to determine a per-group
> default domain type before allocating IOMMU domains and attaching them
> to devices.
> 
> Please help to review it. Your comments and suggestions are appricated.
> 
> Best regards,
> baolu
> 
> Lu Baolu (4):
>    driver core: Add iommu_passthrough to struct device
>    PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
>    iommu: Preallocate iommu group when probing devices
>    iommu: Determine default domain type before allocating domain
> 
>   .../admin-guide/kernel-parameters.txt         |   5 +
>   drivers/iommu/iommu.c                         | 127 ++++++++++++++----
>   drivers/pci/pci.c                             |  34 +++++
>   drivers/pci/pci.h                             |   1 +
>   drivers/pci/probe.c                           |   2 +
>   include/linux/device.h                        |   3 +
>   6 files changed, 143 insertions(+), 29 deletions(-)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] iommu: Per-group default domain type
  2020-01-20  9:44 ` [RFC PATCH 0/4] iommu: Per-group default domain type John Garry
@ 2020-01-21  0:43   ` Lu Baolu
  2020-01-21 10:14     ` John Garry
  0 siblings, 1 reply; 19+ messages in thread
From: Lu Baolu @ 2020-01-21  0:43 UTC (permalink / raw)
  To: John Garry, Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: baolu.lu, kevin.tian, ashok.raj, linux-kernel, iommu,
	jacob.jun.pan, Robin Murphy, Christoph Hellwig

Hi John,

On 1/20/20 5:44 PM, John Garry wrote:
> On 01/01/2020 05:26, Lu Baolu wrote:
>> An IOMMU group represents the smallest set of devices that are considered
>> to be isolated. All devices belonging to an IOMMU group share a default
>> domain for DMA APIs. There are two types of default domain: 
>> IOMMU_DOMAIN_DMA
>> and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while the
>> latter means IOMMU by-pass.
>>
>> Currently, the default domain type for the IOMMU groups is determined
>> globally. All IOMMU groups use a single default domain type. The global
>> default domain type can be adjusted by kernel build configuration or
>> kernel parameters.
>>
>> More and more users are looking forward to a fine grained default domain
>> type. For example, with the global default domain type set to 
>> translation,
>> the OEM verndors or end users might want some trusted and fast-speed 
>> devices
>> to bypass IOMMU for performance gains. On the other hand, with global
>> default domain type set to by-pass, some devices with limited system
>> memory addressing capability might want IOMMU translation to remove the
>> bounce buffer overhead.
> 
> Hi Lu Baolu,
> 
> Do you think that it would be a more common usecase to want 
> kernel-managed devices to be passthrough for performance reasons and 
> some select devices to be in DMA domain, like those with limited address 
> cap or whose drivers request huge amounts of memory?
> 
> I just think it would be more manageable to set kernel commandline 
> parameters for this, i.e. those select few which want DMA domain.
>

It's just two sides of a coin. Currently, iommu subsystem make DMA
domain by default, that's the reason why I selected to let user set
which devices are willing to use identity domains.


> Thanks,
> John

Best regards,
baolu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] iommu: Per-group default domain type
  2020-01-21  0:43   ` Lu Baolu
@ 2020-01-21 10:14     ` John Garry
  2020-01-22  4:58       ` Lu Baolu
  0 siblings, 1 reply; 19+ messages in thread
From: John Garry @ 2020-01-21 10:14 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: kevin.tian, ashok.raj, linux-kernel, iommu, jacob.jun.pan,
	Robin Murphy, Christoph Hellwig, Sai Praneeth Prakhya

On 21/01/2020 00:43, Lu Baolu wrote:
>>> An IOMMU group represents the smallest set of devices that are 
>>> considered
>>> to be isolated. All devices belonging to an IOMMU group share a default
>>> domain for DMA APIs. There are two types of default domain: 
>>> IOMMU_DOMAIN_DMA
>>> and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while the
>>> latter means IOMMU by-pass.
>>>
>>> Currently, the default domain type for the IOMMU groups is determined
>>> globally. All IOMMU groups use a single default domain type. The global
>>> default domain type can be adjusted by kernel build configuration or
>>> kernel parameters.
>>>
>>> More and more users are looking forward to a fine grained default domain
>>> type. For example, with the global default domain type set to 
>>> translation,
>>> the OEM verndors or end users might want some trusted and fast-speed 
>>> devices
>>> to bypass IOMMU for performance gains. On the other hand, with global
>>> default domain type set to by-pass, some devices with limited system
>>> memory addressing capability might want IOMMU translation to remove the
>>> bounce buffer overhead.
>>
>> Hi Lu Baolu,
>>
>> Do you think that it would be a more common usecase to want 
>> kernel-managed devices to be passthrough for performance reasons and 
>> some select devices to be in DMA domain, like those with limited 
>> address cap or whose drivers request huge amounts of memory?
>>
>> I just think it would be more manageable to set kernel commandline 
>> parameters for this, i.e. those select few which want DMA domain.
>>

Hi Baolu,

> 
> It's just two sides of a coin. Currently, iommu subsystem make DMA
> domain by default, that's the reason why I selected to let user set
> which devices are willing to use identity domains.
> 

OK, understood.

There was an alternate solution here which would allow per-group type to 
be updated via sysfs:

https://lore.kernel.org/linux-iommu/cover.1566353521.git.sai.praneeth.prakhya@intel.com/

Any idea what happened to that?

Cheers,
John


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-19  6:29     ` Lu Baolu
@ 2020-01-21 12:45       ` Robin Murphy
  2020-01-22  5:39         ` Lu Baolu
  0 siblings, 1 reply; 19+ messages in thread
From: Robin Murphy @ 2020-01-21 12:45 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Greg Kroah-Hartman, Bjorn Helgaas, ashok.raj, jacob.jun.pan,
	kevin.tian, Christoph Hellwig, iommu, linux-kernel

On 19/01/2020 6:29 am, Lu Baolu wrote:
> Hi Joerg,
> 
> On 1/17/20 6:21 PM, Joerg Roedel wrote:
>> On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
>>> This splits iommu group allocation from adding devices. This makes
>>> it possible to determine the default domain type for each group as
>>> all devices belonging to the group have been determined.
>>
>> I think its better to keep group allocation as it is and just defer
>> default domain allocation after each device is in its group. But take
> 
> I tried defering default domain allocation, but it seems not possible.
> 
> The call path of adding devices into their groups:
> 
> iommu_probe_device
> -> ops->add_device(dev)
>     -> (iommu vendor driver) iommu_group_get_for_dev(dev)
> 
> After doing this, the vendor driver will get the default domain and
> apply dma_ops according to the domain type. If we defer the domain
> allocation, they will get a NULL default domain and cause panic in
> the vendor driver.
> 
> Any suggestions?

https://lore.kernel.org/linux-iommu/6dbbfc10-3247-744c-ae8d-443a336e0c50@linux.intel.com/

Haven't we been here before? ;)

Since we can't (safely or reasonably) change a group's default domain 
after ops->add_device() has returned, and in general it gets impractical 
to evaluate "all device in a group" once you look beyond &pci_bus_type 
(or consider hotplug as mentioned), then AFAICS there's no reasonable 
way to get away from the default domain type being defined by the first 
device to attach. But in practice it's hardly a problem anyway - if 
every device in a given group requests the same domain type then it 
doesn't matter which comes first, and if they don't then we ultimately 
end up with an impossible set of constraints, so are doomed to do the 
'wrong' thing regardless.

Thus unless anyone wants to start redefining the whole group concept to 
separate the notions of ID aliasing and peer-to-peer isolation (which 
still wouldn't necessarily help), I think this user override effectively 
boils down to just another flavour of iommu_request_*_for_dev(), and as 
such comes right back to the "just pass the bloody device to 
ops->domain_alloc() and resolve everything up-front" argument.

Robin.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
  2020-01-01  5:26 ` [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough Lu Baolu
  2020-01-18  0:18   ` Bjorn Helgaas
@ 2020-01-21 14:17   ` Bjorn Helgaas
  2020-01-22  4:49     ` Lu Baolu
  1 sibling, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2020-01-21 14:17 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Joerg Roedel, Greg Kroah-Hartman, ashok.raj, jacob.jun.pan,
	kevin.tian, Christoph Hellwig, Robin Murphy, iommu, linux-kernel,
	linux-pci

[+cc linux-pci, thread at https://lore.kernel.org/r/20200101052648.14295-1-baolu.lu@linux.intel.com]

On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote:
> The new parameter takes a list of devices separated by a semicolon.
> Each device specified will have its iommu_passthrough bit in struct
> device set. This is very similar to the existing 'disable_acs_redir'
> parameter.

Almost all of this patchset is in drivers/iommu.  Should the parameter
be "iommu ..." instead of "pci=iommu_passthrough=..."?

There is already an "iommu.passthrough=" argument.  Would this fit
better there?  Since the iommu_passthrough bit is generic, it seems
like you anticipate similar situations for non-PCI devices.

> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  5 +++
>  drivers/pci/pci.c                             | 34 +++++++++++++++++++
>  drivers/pci/pci.h                             |  1 +
>  drivers/pci/probe.c                           |  2 ++
>  4 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..d3edc2cb6696 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3583,6 +3583,11 @@
>  				may put more devices in an IOMMU group.
>  		force_floating	[S390] Force usage of floating interrupts.
>  		nomio		[S390] Do not use MIO instructions.
> +		iommu_passthrough=<pci_dev>[; ...]
> +				Specify one or more PCI devices (in the format
> +				specified above) separated by semicolons.
> +				Each device specified will bypass IOMMU DMA
> +				translation.
>  
>  	pcie_aspm=	[PCIE] Forcibly enable or disable PCIe Active State Power
>  			Management.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 90dbd7c70371..05bf3f4acc36 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6401,6 +6401,37 @@ void __weak pci_fixup_cardbus(struct pci_bus *bus)
>  }
>  EXPORT_SYMBOL(pci_fixup_cardbus);
>  
> +static const char *iommu_passthrough_param;
> +bool pci_iommu_passthrough_match(struct pci_dev *dev)
> +{
> +	int ret = 0;
> +	const char *p = iommu_passthrough_param;
> +
> +	if (!p)
> +		return false;
> +
> +	while (*p) {
> +		ret = pci_dev_str_match(dev, p, &p);
> +		if (ret < 0) {
> +			pr_info_once("PCI: Can't parse iommu_passthrough parameter: %s\n",
> +				     iommu_passthrough_param);
> +
> +			break;
> +		} else if (ret == 1) {
> +			pci_info(dev, "PCI: IOMMU passthrough\n");
> +			return true;
> +		}
> +
> +		if (*p != ';' && *p != ',') {
> +			/* End of param or invalid format */
> +			break;
> +		}
> +		p++;
> +	}
> +
> +	return false;
> +}
> +
>  static int __init pci_setup(char *str)
>  {
>  	while (str) {
> @@ -6462,6 +6493,8 @@ static int __init pci_setup(char *str)
>  				pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
>  			} else if (!strncmp(str, "disable_acs_redir=", 18)) {
>  				disable_acs_redir_param = str + 18;
> +			} else if (!strncmp(str, "iommu_passthrough=", 18)) {
> +				iommu_passthrough_param = str + 18;
>  			} else {
>  				pr_err("PCI: Unknown option `%s'\n", str);
>  			}
> @@ -6486,6 +6519,7 @@ static int __init pci_realloc_setup_params(void)
>  	resource_alignment_param = kstrdup(resource_alignment_param,
>  					   GFP_KERNEL);
>  	disable_acs_redir_param = kstrdup(disable_acs_redir_param, GFP_KERNEL);
> +	iommu_passthrough_param = kstrdup(iommu_passthrough_param, GFP_KERNEL);
>  
>  	return 0;
>  }
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index a0a53bd05a0b..95f6af06aba6 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -288,6 +288,7 @@ void pci_reassigndev_resource_alignment(struct pci_dev *dev);
>  void pci_disable_bridge_window(struct pci_dev *dev);
>  struct pci_bus *pci_bus_get(struct pci_bus *bus);
>  void pci_bus_put(struct pci_bus *bus);
> +bool pci_iommu_passthrough_match(struct pci_dev *dev);
>  
>  /* PCIe link information */
>  #define PCIE_SPEED2STR(speed) \
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 512cb4312ddd..4c571ee75621 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2404,6 +2404,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>  
>  	dev->state_saved = false;
>  
> +	dev->dev.iommu_passthrough = pci_iommu_passthrough_match(dev);
> +
>  	pci_init_capabilities(dev);
>  
>  	/*
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough
  2020-01-21 14:17   ` Bjorn Helgaas
@ 2020-01-22  4:49     ` Lu Baolu
  0 siblings, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-22  4:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: baolu.lu, Joerg Roedel, Greg Kroah-Hartman, ashok.raj,
	jacob.jun.pan, kevin.tian, Christoph Hellwig, Robin Murphy,
	iommu, linux-kernel, linux-pci

Hi Bjorn,

On 1/21/20 10:17 PM, Bjorn Helgaas wrote:
> [+cc linux-pci, thread athttps://lore.kernel.org/r/20200101052648.14295-1-baolu.lu@linux.intel.com]
> 
> On Wed, Jan 01, 2020 at 01:26:46PM +0800, Lu Baolu wrote:
>> The new parameter takes a list of devices separated by a semicolon.
>> Each device specified will have its iommu_passthrough bit in struct
>> device set. This is very similar to the existing 'disable_acs_redir'
>> parameter.
> Almost all of this patchset is in drivers/iommu.  Should the parameter
> be "iommu ..." instead of "pci=iommu_passthrough=..."?
> 
> There is already an "iommu.passthrough=" argument.  Would this fit
> better there?  Since the iommu_passthrough bit is generic, it seems
> like you anticipate similar situations for non-PCI devices.
> 

Yes. Fair enough.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/4] iommu: Per-group default domain type
  2020-01-21 10:14     ` John Garry
@ 2020-01-22  4:58       ` Lu Baolu
  0 siblings, 0 replies; 19+ messages in thread
From: Lu Baolu @ 2020-01-22  4:58 UTC (permalink / raw)
  To: John Garry, Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas
  Cc: baolu.lu, kevin.tian, ashok.raj, linux-kernel, iommu,
	jacob.jun.pan, Robin Murphy, Christoph Hellwig,
	Sai Praneeth Prakhya

Hi,

On 1/21/20 6:14 PM, John Garry wrote:
> On 21/01/2020 00:43, Lu Baolu wrote:
>>>> An IOMMU group represents the smallest set of devices that are 
>>>> considered
>>>> to be isolated. All devices belonging to an IOMMU group share a default
>>>> domain for DMA APIs. There are two types of default domain: 
>>>> IOMMU_DOMAIN_DMA
>>>> and IOMMU_DOMAIN_IDENTITY. The former means IOMMU translation, while 
>>>> the
>>>> latter means IOMMU by-pass.
>>>>
>>>> Currently, the default domain type for the IOMMU groups is determined
>>>> globally. All IOMMU groups use a single default domain type. The global
>>>> default domain type can be adjusted by kernel build configuration or
>>>> kernel parameters.
>>>>
>>>> More and more users are looking forward to a fine grained default 
>>>> domain
>>>> type. For example, with the global default domain type set to 
>>>> translation,
>>>> the OEM verndors or end users might want some trusted and fast-speed 
>>>> devices
>>>> to bypass IOMMU for performance gains. On the other hand, with global
>>>> default domain type set to by-pass, some devices with limited system
>>>> memory addressing capability might want IOMMU translation to remove the
>>>> bounce buffer overhead.
>>>
>>> Hi Lu Baolu,
>>>
>>> Do you think that it would be a more common usecase to want 
>>> kernel-managed devices to be passthrough for performance reasons and 
>>> some select devices to be in DMA domain, like those with limited 
>>> address cap or whose drivers request huge amounts of memory?
>>>
>>> I just think it would be more manageable to set kernel commandline 
>>> parameters for this, i.e. those select few which want DMA domain.
>>>
> 
> Hi Baolu,
> 
>>
>> It's just two sides of a coin. Currently, iommu subsystem make DMA
>> domain by default, that's the reason why I selected to let user set
>> which devices are willing to use identity domains.
>>
> 
> OK, understood.
> 
> There was an alternate solution here which would allow per-group type to 
> be updated via sysfs:
> 
> https://lore.kernel.org/linux-iommu/cover.1566353521.git.sai.praneeth.prakhya@intel.com/ 
> 

Yes. My patch set just tries to do this statically during boot time.

> 
> Any idea what happened to that?

No idea. Sai might have more information. :-)

Best regards,
baolu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-21 12:45       ` Robin Murphy
@ 2020-01-22  5:39         ` Lu Baolu
  2020-01-23 14:55           ` Robin Murphy
  0 siblings, 1 reply; 19+ messages in thread
From: Lu Baolu @ 2020-01-22  5:39 UTC (permalink / raw)
  To: Robin Murphy, Joerg Roedel
  Cc: baolu.lu, Greg Kroah-Hartman, Bjorn Helgaas, ashok.raj,
	jacob.jun.pan, kevin.tian, Christoph Hellwig, iommu,
	linux-kernel

Hi Robin,

On 1/21/20 8:45 PM, Robin Murphy wrote:
> On 19/01/2020 6:29 am, Lu Baolu wrote:
>> Hi Joerg,
>>
>> On 1/17/20 6:21 PM, Joerg Roedel wrote:
>>> On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
>>>> This splits iommu group allocation from adding devices. This makes
>>>> it possible to determine the default domain type for each group as
>>>> all devices belonging to the group have been determined.
>>>
>>> I think its better to keep group allocation as it is and just defer
>>> default domain allocation after each device is in its group. But take
>>
>> I tried defering default domain allocation, but it seems not possible.
>>
>> The call path of adding devices into their groups:
>>
>> iommu_probe_device
>> -> ops->add_device(dev)
>>     -> (iommu vendor driver) iommu_group_get_for_dev(dev)
>>
>> After doing this, the vendor driver will get the default domain and
>> apply dma_ops according to the domain type. If we defer the domain
>> allocation, they will get a NULL default domain and cause panic in
>> the vendor driver.
>>
>> Any suggestions?
> 
> https://lore.kernel.org/linux-iommu/6dbbfc10-3247-744c-ae8d-443a336e0c50@linux.intel.com/ 
> 
> 
> Haven't we been here before? ;)
> 
> Since we can't (safely or reasonably) change a group's default domain 
> after ops->add_device() has returned, and in general it gets impractical 
> to evaluate "all device in a group" once you look beyond &pci_bus_type 
> (or consider hotplug as mentioned), then AFAICS there's no reasonable 
> way to get away from the default domain type being defined by the first 
> device to attach.

Yes, agreed.

> But in practice it's hardly a problem anyway - if 
> every device in a given group requests the same domain type then it 
> doesn't matter which comes first, and if they don't then we ultimately 
> end up with an impossible set of constraints, so are doomed to do the 
> 'wrong' thing regardless.

The third case is, for example, three devices A, B, C in a group. The
first device A is neutral about which type of default domain type is
used. So the iommu framework will use a static default domain. But the
device B requires to use a specific one which is different from the
default. Currently, this is handled in the vendor iommu driver and one
motivation of this patch set is to handle this in the generic layer.

> 
> Thus unless anyone wants to start redefining the whole group concept to 
> separate the notions of ID aliasing and peer-to-peer isolation (which 
> still wouldn't necessarily help), I think this user override effectively 
> boils down to just another flavour of iommu_request_*_for_dev(), and as 
> such comes right back to the "just pass the bloody device to 
> ops->domain_alloc() and resolve everything up-front" argument.
> 
> Robin.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices
  2020-01-22  5:39         ` Lu Baolu
@ 2020-01-23 14:55           ` Robin Murphy
  0 siblings, 0 replies; 19+ messages in thread
From: Robin Murphy @ 2020-01-23 14:55 UTC (permalink / raw)
  To: Lu Baolu, Joerg Roedel
  Cc: Greg Kroah-Hartman, Bjorn Helgaas, ashok.raj, jacob.jun.pan,
	kevin.tian, Christoph Hellwig, iommu, linux-kernel

On 22/01/2020 5:39 am, Lu Baolu wrote:
> Hi Robin,
> 
> On 1/21/20 8:45 PM, Robin Murphy wrote:
>> On 19/01/2020 6:29 am, Lu Baolu wrote:
>>> Hi Joerg,
>>>
>>> On 1/17/20 6:21 PM, Joerg Roedel wrote:
>>>> On Wed, Jan 01, 2020 at 01:26:47PM +0800, Lu Baolu wrote:
>>>>> This splits iommu group allocation from adding devices. This makes
>>>>> it possible to determine the default domain type for each group as
>>>>> all devices belonging to the group have been determined.
>>>>
>>>> I think its better to keep group allocation as it is and just defer
>>>> default domain allocation after each device is in its group. But take
>>>
>>> I tried defering default domain allocation, but it seems not possible.
>>>
>>> The call path of adding devices into their groups:
>>>
>>> iommu_probe_device
>>> -> ops->add_device(dev)
>>>     -> (iommu vendor driver) iommu_group_get_for_dev(dev)
>>>
>>> After doing this, the vendor driver will get the default domain and
>>> apply dma_ops according to the domain type. If we defer the domain
>>> allocation, they will get a NULL default domain and cause panic in
>>> the vendor driver.
>>>
>>> Any suggestions?
>>
>> https://lore.kernel.org/linux-iommu/6dbbfc10-3247-744c-ae8d-443a336e0c50@linux.intel.com/ 
>>
>>
>> Haven't we been here before? ;)
>>
>> Since we can't (safely or reasonably) change a group's default domain 
>> after ops->add_device() has returned, and in general it gets 
>> impractical to evaluate "all device in a group" once you look beyond 
>> &pci_bus_type (or consider hotplug as mentioned), then AFAICS there's 
>> no reasonable way to get away from the default domain type being 
>> defined by the first device to attach.
> 
> Yes, agreed.
> 
>> But in practice it's hardly a problem anyway - if every device in a 
>> given group requests the same domain type then it doesn't matter which 
>> comes first, and if they don't then we ultimately end up with an 
>> impossible set of constraints, so are doomed to do the 'wrong' thing 
>> regardless.
> 
> The third case is, for example, three devices A, B, C in a group. The
> first device A is neutral about which type of default domain type is
> used. So the iommu framework will use a static default domain. But the
> device B requires to use a specific one which is different from the
> default. Currently, this is handled in the vendor iommu driver and one
> motivation of this patch set is to handle this in the generic layer.

Yes, I wasn't explicitly considering that particular case, but it mostly 
falls out more or less the same way. Given that multi-device groups 
*should* be relatively rare, for the user override it seems reasonable 
to expect the user to see when devices get grouped and specify all of 
them to achieve the desired result; the trusted/untrusted attribute 
definitely shouldn't differ within any given group; and 
opportunistically replacing passthrough domains with translation domains 
for DMA-limited devices can only ever be a best-effort thing without 
consistent results, since at best that still comes down to which driver 
probed and called dma_set_mask() first.

Platform-specific exceptions like in device_def_domain_type() probably 
do want to stay in the individual drivers, but rolling that up into 
default domain allocation would be neat, and functionally no worse than 
the existing process.

In principle we could fairly easily delay allocating a group's default 
domain until the first driver bind event. It wouldn't help universally - 
in the absolute worst case, device B might only be created at all by 
device A's driver probing - and it might need careful coordination in 
areas like the bus->dma_configure() flow, but it could at least help 
accommodate the more common PCI case.

Robin.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-01-23 14:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-01  5:26 [RFC PATCH 0/4] iommu: Per-group default domain type Lu Baolu
2020-01-01  5:26 ` [RFC PATCH 1/4] driver core: Add iommu_passthrough to struct device Lu Baolu
2020-01-01  5:26 ` [RFC PATCH 2/4] PCI: Add "pci=iommu_passthrough=" parameter for iommu passthrough Lu Baolu
2020-01-18  0:18   ` Bjorn Helgaas
2020-01-18  2:04     ` Lu Baolu
2020-01-21 14:17   ` Bjorn Helgaas
2020-01-22  4:49     ` Lu Baolu
2020-01-01  5:26 ` [RFC PATCH 3/4] iommu: Preallocate iommu group when probing devices Lu Baolu
2020-01-17 10:21   ` Joerg Roedel
2020-01-18  2:18     ` Lu Baolu
2020-01-19  6:29     ` Lu Baolu
2020-01-21 12:45       ` Robin Murphy
2020-01-22  5:39         ` Lu Baolu
2020-01-23 14:55           ` Robin Murphy
2020-01-01  5:26 ` [RFC PATCH 4/4] iommu: Determine default domain type before allocating domain Lu Baolu
2020-01-20  9:44 ` [RFC PATCH 0/4] iommu: Per-group default domain type John Garry
2020-01-21  0:43   ` Lu Baolu
2020-01-21 10:14     ` John Garry
2020-01-22  4:58       ` Lu Baolu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).