linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
@ 2021-06-21 23:52 Douglas Anderson
  2021-06-21 23:52 ` [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers Douglas Anderson
                   ` (7 more replies)
  0 siblings, 8 replies; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, Andy Gross,
	Bartosz Golaszewski, Dan Williams, Geert Uytterhoeven,
	Heikki Krogerus, Randy Dunlap, linux-kernel


This patch attempts to put forward a proposal for enabling non-strict
DMA on a device-by-device basis. The patch series requests non-strict
DMA for the Qualcomm SDHCI controller as a first device to enable,
getting a nice bump in performance with what's believed to be a very
small drop in security / safety (see the patch for the full argument).

As part of this patch series I am end up slightly cleaning up some of
the interactions between the PCI subsystem and the IOMMU subsystem but
I don't go all the way to fully remove all the tentacles. Specifically
this patch series only concerns itself with a single aspect: strict
vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
to talk about / reason about for more subsystems compared to overall
deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will
undoubtedly need coordination between many maintainers to land. I
believe it's fully bisectable but later patches in the series
definitely depend on earlier ones. Sorry for the long CC list. :(


Douglas Anderson (6):
  drivers: base: Add the concept of "pre_probe" to drivers
  drivers: base: Add bits to struct device to control iommu strictness
  PCI: Indicate that we want to force strict DMA for untrusted devices
  iommu: Combine device strictness requests with the global default
  iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
  mmc: sdhci-msm: Request non-strict IOMMU mode

 drivers/base/dd.c             | 10 +++++--
 drivers/iommu/dma-iommu.c     |  2 +-
 drivers/iommu/iommu.c         | 56 +++++++++++++++++++++++++++--------
 drivers/mmc/host/sdhci-msm.c  |  8 +++++
 drivers/pci/probe.c           |  4 ++-
 include/linux/device.h        | 11 +++++++
 include/linux/device/driver.h |  9 ++++++
 include/linux/iommu.h         |  2 ++
 8 files changed, 85 insertions(+), 17 deletions(-)

-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
@ 2021-06-21 23:52 ` Douglas Anderson
  2021-06-24 13:35   ` Greg KH
  2021-06-21 23:52 ` [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness Douglas Anderson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, Geert Uytterhoeven,
	linux-kernel

Right now things are a bit awkward if a driver would like a chance to
run before some of the more "automatic" things (pinctrl, DMA, IOMMUs,
...) happen to a device. This patch aims to fix that problem by
introducing the concept of a "pre_probe" function that drivers can
implement to run before the "automatic" stuff.

Why would you want to run before the "automatic" stuff? The incentive
in my case is that I want to be able to fill in some boolean flags in
the "struct device" before the IOMMU init runs. It appears that the
strictness vs. non-strictness of a device's iommu config is determined
once at init time and can't be changed afterwards. However, I would
like to avoid hardcoding the rules for strictness in the IOMMU
driver. Instead I'd like to let individual drivers be able to make
informed decisions about the appropriateness of strictness
vs. non-strictness.

The desire for running code pre_probe is likely not limited to my use
case. I believe that the list "qcom_smmu_client_of_match" is hacked
into the iommu driver specifically because there was no real good
framework for this. For the existing list it wasn't _quite_ as ugly as
my needs since the decision could be made solely on compatible string,
but it still feels like it would have been better for individual
drivers to run code and setup some state rather than coding up a big
list in the IOMMU driver.

Even without this patch, I believe it is possible for a driver to run
before the "automatic" things by registering for
"BUS_NOTIFY_BIND_DRIVER" in its init call, though I haven't personally
tested this. Using the notifier is a bit awkward, though, and I'd
rather avoid it. Also, using "BUS_NOTIFY_BIND_DRIVER" would require
drivers to stop using the convenience module_platform_driver() helper
and roll a bunch of boilerplate code.

NOTE: the pre_probe here is listed in the driver structure. As a side
effect of this it will be passed a "struct device *" rather than the
more specific device type (like the "struct platform_device *" that
most platform devices get passed to their probe). Presumably this
won't cause trouble and it's a lot less code to write but if we need
to make it more symmetric that's also possible by touching more files.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/base/dd.c             | 10 ++++++++--
 include/linux/device/driver.h |  9 +++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ecd7cf848daf..9a13bff8dafa 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -549,10 +549,16 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 re_probe:
 	dev->driver = drv;
 
+	if (drv->pre_probe) {
+		ret = drv->pre_probe(dev);
+		if (ret)
+			goto probe_failed_pre_dma;
+	}
+
 	/* If using pinctrl, bind pins now before probing */
 	ret = pinctrl_bind_pins(dev);
 	if (ret)
-		goto pinctrl_bind_failed;
+		goto probe_failed_pre_dma;
 
 	if (dev->bus->dma_configure) {
 		ret = dev->bus->dma_configure(dev);
@@ -639,7 +645,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
-pinctrl_bind_failed:
+probe_failed_pre_dma:
 	device_links_no_driver(dev);
 	devres_release_all(dev);
 	arch_teardown_dma_ops(dev);
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..f7305dd6ceb1 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -57,6 +57,14 @@ enum probe_type {
  * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
+ * @pre_probe:	Called after a device has been bound to a driver but before
+ *		anything "automatic" (pinctrl, DMA, IOMMUs, ...) has been
+ *		setup. This is mostly a chance for the driver to do things
+ *		that might need to be run before any of those automatic
+ *		processes. The vast majority of devices don't need to
+ *		implement this. Note that there is no "post_remove" at the
+ *		moment. If you need to undo something that you did in
+ *		pre_probe() you can use devres.
  * @probe:	Called to query the existence of a specific device,
  *		whether this driver can work with it, and bind the driver
  *		to a specific device.
@@ -105,6 +113,7 @@ struct device_driver {
 	const struct of_device_id	*of_match_table;
 	const struct acpi_device_id	*acpi_match_table;
 
+	int (*pre_probe) (struct device *dev);
 	int (*probe) (struct device *dev);
 	void (*sync_state)(struct device *dev);
 	int (*remove) (struct device *dev);
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
  2021-06-21 23:52 ` [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers Douglas Anderson
@ 2021-06-21 23:52 ` Douglas Anderson
  2021-06-24 13:36   ` Greg KH
  2021-06-21 23:52 ` [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices Douglas Anderson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, Bartosz Golaszewski,
	Dan Williams, Heikki Krogerus, Randy Dunlap, linux-kernel

How to control the "strictness" of an IOMMU is a bit of a mess right
now. As far as I can tell, right now:
* You can set the default to "non-strict" and some devices (right now,
  only PCI devices) can request to run in "strict" mode.
* You can set the default to "strict" and no devices in the system are
  allowed to run as "non-strict".

I believe this needs to be improved a bit. Specifically:
* We should be able to default to "strict" mode but let devices that
  claim to be fairly low risk request that they be run in "non-strict"
  mode.
* We should allow devices outside of PCIe to request "strict" mode if
  the system default is "non-strict".

I believe the correct way to do this is two bits in "struct
device". One allows a device to force things to "strict" mode and the
other allows a device to _request_ "non-strict" mode. The asymmetry
here is on purpose. Generally if anything in the system makes a
request for strictness of something then we want it strict. Thus
drivers can only request (but not force) non-strictness.

It's expected that the strictness fields can be filled in by the bus
code like in the patch ("PCI: Indicate that we want to force strict
DMA for untrusted devices") or by using the new pre_probe concept
introduced in the patch ("drivers: base: Add the concept of
"pre_probe" to drivers").

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 include/linux/device.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index f1a00040fa53..c1b985e10c47 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -449,6 +449,15 @@ struct dev_links_info {
  *		and optionall (if the coherent mask is large enough) also
  *		for dma allocations.  This flag is managed by the dma ops
  *		instance from ->dma_supported.
+ * @force_strict_iommu: If set to %true then we should force this device to
+ *			iommu.strict regardless of the other defaults in the
+ *			system. Only has an effect if an IOMMU is in place.
+ * @request_non_strict_iommu: If set to %true and there are no other known
+ *			      reasons to make the iommu.strict for this device,
+ *			      then default to non-strict mode. This implies
+ *			      some belief that the DMA master for this device
+ *			      won't abuse the DMA path to compromise the kernel.
+ *			      Only has an effect if an IOMMU is in place.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -557,6 +566,8 @@ struct device {
 #ifdef CONFIG_DMA_OPS_BYPASS
 	bool			dma_ops_bypass : 1;
 #endif
+	bool			force_strict_iommu:1;
+	bool			request_non_strict_iommu:1;
 };
 
 /**
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
  2021-06-21 23:52 ` [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers Douglas Anderson
  2021-06-21 23:52 ` [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness Douglas Anderson
@ 2021-06-21 23:52 ` Douglas Anderson
  2021-06-24 13:38   ` Greg KH
  2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, linux-kernel

At the moment the generic IOMMU framework reaches into the PCIe device
to check the "untrusted" state and uses this information to figure out
if it should be running the IOMMU in strict or non-strict mode. Let's
instead set the new boolean in "struct device" to indicate when we
want forced strictness.

NOTE: we still continue to set the "untrusted" bit in PCIe since that
apparently is used for more than just IOMMU strictness. It probably
makes sense for a later patchset to clarify all of the other needs we
have for "untrusted" PCIe devices (perhaps add more booleans into the
"struct device") so we can fully eliminate the need for the IOMMU
framework to reach into a PCIe device.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/pci/probe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 275204646c68..8d81f0fb3e50 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1572,8 +1572,10 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 	 * untrusted as well.
 	 */
 	parent = pci_upstream_bridge(dev);
-	if (parent && (parent->untrusted || parent->external_facing))
+	if (parent && (parent->untrusted || parent->external_facing)) {
 		dev->untrusted = true;
+		dev->dev.force_strict_iommu = true;
+	}
 }
 
 /**
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
                   ` (2 preceding siblings ...)
  2021-06-21 23:52 ` [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices Douglas Anderson
@ 2021-06-21 23:52 ` Douglas Anderson
  2021-06-22  2:03   ` Lu Baolu
                     ` (3 more replies)
  2021-06-21 23:52 ` [PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict Douglas Anderson
                   ` (3 subsequent siblings)
  7 siblings, 4 replies; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, linux-kernel

In the patch ("drivers: base: Add bits to struct device to control
iommu strictness") we add the ability for devices to tell us about
their IOMMU strictness requirements. Let's now take that into account
in the IOMMU layer.

A few notes here:
* Presumably this is always how iommu_get_dma_strict() was intended to
  behave. Had this not been the intention then it never would have
  taken a domain as a parameter.
* The iommu_set_dma_strict() feels awfully non-symmetric now. That
  function sets the _default_ strictness globally in the system
  whereas iommu_get_dma_strict() returns the value for a given domain
  (falling back to the default). Presumably, at least, the fact that
  iommu_set_dma_strict() doesn't take a domain makes this obvious.

The function iommu_get_dma_strict() should now make it super obvious
where strictness comes from and who overides who. Though the function
changed a bunch to make the logic clearer, the only two new rules
should be:
* Devices can force strictness for themselves, overriding the cmdline
  "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
* Devices can request non-strictness for themselves, assuming there
  was no cmdline "iommu.strict=1" or a call to
  iommu_set_dma_strict(true).

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/iommu/iommu.c | 56 +++++++++++++++++++++++++++++++++----------
 include/linux/iommu.h |  2 ++
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0c84a4c06110 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -28,8 +28,19 @@
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 
+enum iommu_strictness {
+	IOMMU_DEFAULT_STRICTNESS = -1,
+	IOMMU_NOT_STRICT = 0,
+	IOMMU_STRICT = 1,
+};
+static inline enum iommu_strictness bool_to_strictness(bool strictness)
+{
+	return (enum iommu_strictness)strictness;
+}
+
 static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
+static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
 static u32 iommu_cmd_line __read_mostly;
 
 struct iommu_group {
@@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
 };
 
 #define IOMMU_CMD_LINE_DMA_API		BIT(0)
-#define IOMMU_CMD_LINE_STRICT		BIT(1)
 
 static int iommu_alloc_default_domain(struct iommu_group *group,
 				      struct device *dev);
@@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);
 
 static int __init iommu_dma_setup(char *str)
 {
-	int ret = kstrtobool(str, &iommu_dma_strict);
+	bool strict;
+	int ret = kstrtobool(str, &strict);
 
 	if (!ret)
-		iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
+		cmdline_dma_strict = bool_to_strictness(strict);
 	return ret;
 }
 early_param("iommu.strict", iommu_dma_setup);
 
 void iommu_set_dma_strict(bool strict)
 {
-	if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-		iommu_dma_strict = strict;
+	/* A driver can request strictness but not the other way around */
+	if (driver_dma_strict != IOMMU_STRICT)
+		driver_dma_strict = bool_to_strictness(strict);
 }
 
 bool iommu_get_dma_strict(struct iommu_domain *domain)
 {
-	/* only allow lazy flushing for DMA domains */
-	if (domain->type == IOMMU_DOMAIN_DMA)
-		return iommu_dma_strict;
+	/* Non-DMA domains or anyone forcing it to strict makes it strict */
+	if (domain->type != IOMMU_DOMAIN_DMA ||
+	    cmdline_dma_strict == IOMMU_STRICT ||
+	    driver_dma_strict == IOMMU_STRICT ||
+	    domain->force_strict)
+		return true;
+
+	/* Anyone requesting non-strict (if no forces) makes it non-strict */
+	if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
+	    driver_dma_strict == IOMMU_NOT_STRICT ||
+	    domain->request_non_strict)
+		return false;
+
+	/* Nobody said anything, so it's strict by default */
 	return true;
 }
 EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
@@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
 
 static int iommu_group_alloc_default_domain(struct bus_type *bus,
 					    struct iommu_group *group,
-					    unsigned int type)
+					    unsigned int type,
+					    struct device *dev)
 {
 	struct iommu_domain *dom;
 
@@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
 	if (!dom)
 		return -ENOMEM;
 
+	/* Save the strictness requests from the device */
+	if (dev && type == IOMMU_DOMAIN_DMA) {
+		dom->request_non_strict = dev->request_non_strict_iommu;
+		dom->force_strict = dev->force_strict_iommu;
+	}
+
 	group->default_domain = dom;
 	if (!group->domain)
 		group->domain = dom;
@@ -1550,7 +1580,7 @@ static int iommu_alloc_default_domain(struct iommu_group *group,
 
 	type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;
 
-	return iommu_group_alloc_default_domain(dev->bus, group, type);
+	return iommu_group_alloc_default_domain(dev->bus, group, type, dev);
 }
 
 /**
@@ -1721,7 +1751,7 @@ static void probe_alloc_default_domain(struct bus_type *bus,
 	if (!gtype.type)
 		gtype.type = iommu_def_domain_type;
 
-	iommu_group_alloc_default_domain(bus, group, gtype.type);
+	iommu_group_alloc_default_domain(bus, group, gtype.type, NULL);
 
 }
 
@@ -3130,7 +3160,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
 	}
 
 	/* Sets group->default_domain to the newly allocated domain */
-	ret = iommu_group_alloc_default_domain(dev->bus, group, type);
+	ret = iommu_group_alloc_default_domain(dev->bus, group, type, dev);
 	if (ret)
 		goto out;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..0bddef77f415 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -81,6 +81,8 @@ struct iommu_domain_geometry {
 
 struct iommu_domain {
 	unsigned type;
+	bool force_strict:1;
+	bool request_non_strict:1;
 	const struct iommu_ops *ops;
 	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
 	iommu_fault_handler_t handler;
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
                   ` (3 preceding siblings ...)
  2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
@ 2021-06-21 23:52 ` Douglas Anderson
  2021-06-21 23:52 ` [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode Douglas Anderson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, linux-kernel

We now have a way for PCIe devices to force iommu.strict through the
"struct device" and that's now hooked up. Let's remove the special
case for PCIe devices.

NOTE: there are still other places in this file that make decisions
based on the PCIe "untrusted" status. This patch only handles removing
the one related to iommu.strict. Removing the other cases is left as
an exercise to the reader.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..e50c06ce1a6b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -368,7 +368,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 
 	init_iova_domain(iovad, 1UL << order, base_pfn);
 
-	if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
+	if (!cookie->fq_domain &&
 	    domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
 		if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
 					  iommu_dma_entry_dtor))
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
                   ` (4 preceding siblings ...)
  2021-06-21 23:52 ` [PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict Douglas Anderson
@ 2021-06-21 23:52 ` Douglas Anderson
  2021-06-24 13:43   ` Greg KH
  2021-06-22 11:35 ` [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Robin Murphy
  2021-06-22 17:39 ` John Garry
  7 siblings, 1 reply; 34+ messages in thread
From: Douglas Anderson @ 2021-06-21 23:52 UTC (permalink / raw)
  To: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Douglas Anderson, Andy Gross, linux-kernel

IOMMUs can be run in "strict" mode or in "non-strict" mode. The
quick-summary difference between the two is that in "strict" mode we
wait until everything is flushed out when we unmap DMA memory. In
"non-strict" we don't.

Using the IOMMU in "strict" mode is more secure/safer but slower
because we have to sit and wait for flushes while we're unmapping. To
explain a bit why "non-strict" mode is unsafe, let's imagine two
examples.

An example of "non-strict" being insecure when reading from a device:
a) Linux driver maps memory for DMA.
b) Linux driver starts DMA on the device.
c) Device write to RAM subject to bounds checking done by IOMMU.
d) Device finishes writing to RAM and signals transfer is finished.
e) Linux driver starts unmapping DMA memory but doesn't flush.
f) Linux driver validates that the data in memory looks sane and that
   accessing it won't cause the driver to, for instance, overflow a
   buffer.
g) Device takes advantage of knowledge of how the Linux driver works
   and sneaks in a modification to the data after the validation but
   before the IOMMU unmap flush finishes.
h) Device has now caused the Linux driver to access memory it
   shouldn't.

An example of "non-strict" being insecure when writing to a device:
a) Linux driver writes data intended for the device to RAM.
b) Linux driver maps memory for DMA.
c) Linux driver starts DMA on the device.
d) Device reads from RAM subject to bounds checking done by IOMMU.
e) Device finishes reading from RAM and signals transfer is finished.
f) Linux driver starts unmapping DMA memory but doesn't flush.
g) Linux driver frees memory and returns it to the pool.
h) Memory is allocated for another purpose.
i) Device takes advantage of the period of time before IOMMU flush to
   read memory that it shouldn't have had access to.

As you can see from the above examples, using the iommu in
"non-strict" mode might not sound _too_ scary (the window of badness
is small and the exposed memory is small) but there is certainly
risk. Let's evaluate the risk by breaking it down into two problems
that IOMMUs are supposed to be protecting us against:

Case 1: IOMMUs prevent malicious code running on the peripheral (maybe
a malicious peripheral or maybe someone exploited a benign peripheral)
from turning into an exploit of the Linux kernel. This is particularly
important if the peripheral has loadable / updatable firmware or if
the peripheral has some type of general purpose processor and is
processing untrusted inputs. It's also important if the device is
something that can be easily plugged into the host and the device has
direct DMA access itself, like a PCIe device.

Case 2: IOMMUs limit the severity of a class of software bugs in the
kernel. If we misconfigure a peripheral by accident then instead of
the peripheral clobbering random memory due to a bug we might get an
IOMMU error.

Now that we understand the issue and the risks, let's evaluate whether
we really need "strict" mode for the Qualcomm SDHCI controllers. I
will make the argument that we don't _need_ strict mode for them. Why?
* The SDHCI controller on Qualcomm SoCs doesn't appear to have
  loadable / updatable firmware and, assuming it's got some firmware
  baked into it, I see no evidence that the firmware could be
  compromised.
* Even though, for external SD cards in particular, the controller is
  dealing with "untrusted" inputs, it's dealing with them in a very
  controlled way.  It seems unlikely that a rogue SD card would be
  able to present something to the SDHCI controller that would cause
  it to DMA to/from an address other than one the kernel told it
  about.
* Although it would be nice to catch more software bugs, once the
  Linux driver has been debugged and stressed the value is not very
  high. If the IOMMU caught something like this the system would be in
  a pretty bad shape anyway (we don't really recover from IOMMU
  errors) and the only benefit would be a better spotlight on what
  went wrong.

Now we have a good understanding of the benefits of "strict" mode for
our SDHCI controllers, let's look at some performance numbers. I used
"dd" to measure read speeds from eMMC on a sc7180-trogdor-lazor
board. Basic test command (while booted from USB):
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/mmcblk1 of=/dev/null bs=4M count=512

I attempted to run my tests for enough iterations that results
stabilized and weren't too noisy. Tests were run with patches picked
to the chromeos-5.4 tree (sanity checked against v5.13-rc7). I also
attempted to compare to other attempts to address IOMMU problems
and/or attempts to bump the cpufreq up to solve this problem:
- eMMC datasheet spec: 300 MB/s "Typical Sequential Performance"
  NOTE: we're driving the bus at 192 MHz instead of 200 Mhz so we might
  not be able to achieve the full 300 MB/s.
- Baseline: 210.9 MB/s
- Baseline + peg cpufreq to max: 284.3 MB/s
- This patch: 279.6 MB/s
- This patch + peg cpufreq to max: 288.1 MB/s
- Joel's IO Wait fix [1]: 258.4 MB/s
- Joel's IO Wait fix [1] + peg cpufreq to max: 287.8 MB/s
- TLBIVA patches [2] + [3]: 214.7 MB/s
- TLBIVA patches [2] + [3] + peg cpufreq to max: 285.7 MB/s
- This patch plus Joel's [1]: 280.2 MB/s
- This patch plus Joel's [1] + peg...: 279.0 MB/s
  NOTE: I suspect something in the system was thermal throttling since
  there's a heat wave right now.

I also spent a little bit of time trying to see if I could get the
IOMMU flush for MMC out of the critical path but was unable to figure
out how to do this and get good performance.

Overall I'd say that the performance results above show:
* It's really not straightforward to point at "one thing" that is
  making our eMMC performance bad.
* It's certainly possible to get pretty good eMMC performance even
  without this patch.
* This patch makes it much easier to get good eMMC performance.
* No other solutions that I found resulted in quite as good eMMC
  performance as having this patch.

Given all the above (security safety concerns are minimal and it's a
nice performance win), I'm proposing that running SDHCI on Qualcomm
SoCs in non-strict mode is the right thing to do until such point in
time as someone can come up with a better solution to get good SD/eMMC
performance without it.

NOTES:
* It's likely that arguments similar to the above can be made for
  other SDHCI controllers. However, given that this is something that
  can have an impact on security it feels like we want each SDHCI
  controller to opt-in. I believe it is conceivable, for instance,
  that some SDHCI controllers might have loadable or updatable
  firmware.
* It's also likely other peripherals will want this to get the quick
  performance win. That also should be fine, though anyone landing a
  similar patch should be very careful that it is low risk for all
  users of a given peripheral.
* Conceivably if even this patch is considered too "high risk", we
  could limit this to just non-removable cards (like eMMC) by just
  checking the device tree. This is one nice advantage of using the
  pre_probe() to set this.

[1] https://lore.kernel.org/r/20210618040639.3113489-1-joel@joelfernandes.org
[2] https://lore.kernel.org/r/1623850736-389584-1-git-send-email-quic_c_gdjako@quicinc.com/
[3] https://lore.kernel.org/r/cover.1623981933.git.saiprakash.ranjan@codeaurora.org/

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

 drivers/mmc/host/sdhci-msm.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
index e44b7a66b73c..33ef5e6941d7 100644
--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -2465,6 +2465,13 @@ static inline void sdhci_msm_get_of_property(struct platform_device *pdev,
 }
 
 
+static int sdhci_msm_pre_probe(struct device *dev)
+{
+	dev->request_non_strict_iommu = true;
+
+	return 0;
+}
+
 static int sdhci_msm_probe(struct platform_device *pdev)
 {
 	struct sdhci_host *host;
@@ -2811,6 +2818,7 @@ static struct platform_driver sdhci_msm_driver = {
 		   .of_match_table = sdhci_msm_dt_match,
 		   .pm = &sdhci_msm_pm_ops,
 		   .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+		   .pre_probe = sdhci_msm_pre_probe,
 	},
 };
 
-- 
2.32.0.288.g62a8d224e6-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
@ 2021-06-22  2:03   ` Lu Baolu
  2021-06-22 16:53     ` Doug Anderson
  2021-06-22  2:55   ` Saravana Kannan
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 34+ messages in thread
From: Lu Baolu @ 2021-06-22  2:03 UTC (permalink / raw)
  To: Douglas Anderson, gregkh, rafael, rafael.j.wysocki, will,
	robin.murphy, joro, bjorn.andersson, ulf.hansson, adrian.hunter,
	bhelgaas
  Cc: baolu.lu, robdclark, linux-kernel, saravanak, linux-arm-msm,
	linux-mmc, quic_c_gdjako, iommu, linux-pci, joel, rajatja,
	sonnyrao, vbadigan

On 6/22/21 7:52 AM, Douglas Anderson wrote:
> @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
>   
>   static int iommu_group_alloc_default_domain(struct bus_type *bus,
>   					    struct iommu_group *group,
> -					    unsigned int type)
> +					    unsigned int type,
> +					    struct device *dev)
>   {
>   	struct iommu_domain *dom;
>   
> @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
>   	if (!dom)
>   		return -ENOMEM;
>   
> +	/* Save the strictness requests from the device */
> +	if (dev && type == IOMMU_DOMAIN_DMA) {
> +		dom->request_non_strict = dev->request_non_strict_iommu;
> +		dom->force_strict = dev->force_strict_iommu;
> +	}
> +

An iommu default domain might be used by multiple devices which might
have different "strict" attributions. Then who could override who?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
  2021-06-22  2:03   ` Lu Baolu
@ 2021-06-22  2:55   ` Saravana Kannan
  2021-06-22 16:40     ` Doug Anderson
  2021-06-22 11:49   ` Robin Murphy
  2021-06-22 18:45   ` Rajat Jain
  3 siblings, 1 reply; 34+ messages in thread
From: Saravana Kannan @ 2021-06-22  2:55 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas, robdclark,
	linux-arm-msm, linux-pci, quic_c_gdjako, iommu, sonnyrao,
	saiprakash.ranjan, linux-mmc, vbadigan, rajatja, joel,
	linux-kernel

On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> In the patch ("drivers: base: Add bits to struct device to control
> iommu strictness") we add the ability for devices to tell us about
> their IOMMU strictness requirements. Let's now take that into account
> in the IOMMU layer.
>
> A few notes here:
> * Presumably this is always how iommu_get_dma_strict() was intended to
>   behave. Had this not been the intention then it never would have
>   taken a domain as a parameter.
> * The iommu_set_dma_strict() feels awfully non-symmetric now. That
>   function sets the _default_ strictness globally in the system
>   whereas iommu_get_dma_strict() returns the value for a given domain
>   (falling back to the default). Presumably, at least, the fact that
>   iommu_set_dma_strict() doesn't take a domain makes this obvious.
>
> The function iommu_get_dma_strict() should now make it super obvious
> where strictness comes from and who overides who. Though the function
> changed a bunch to make the logic clearer, the only two new rules
> should be:
> * Devices can force strictness for themselves, overriding the cmdline
>   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> * Devices can request non-strictness for themselves, assuming there
>   was no cmdline "iommu.strict=1" or a call to
>   iommu_set_dma_strict(true).
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  drivers/iommu/iommu.c | 56 +++++++++++++++++++++++++++++++++----------
>  include/linux/iommu.h |  2 ++
>  2 files changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d5df5..0c84a4c06110 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -28,8 +28,19 @@
>  static struct kset *iommu_group_kset;
>  static DEFINE_IDA(iommu_group_ida);
>
> +enum iommu_strictness {
> +       IOMMU_DEFAULT_STRICTNESS = -1,
> +       IOMMU_NOT_STRICT = 0,
> +       IOMMU_STRICT = 1,
> +};
> +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> +{
> +       return (enum iommu_strictness)strictness;
> +}
> +
>  static unsigned int iommu_def_domain_type __read_mostly;
> -static bool iommu_dma_strict __read_mostly = true;
> +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
>  static u32 iommu_cmd_line __read_mostly;
>
>  struct iommu_group {
> @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
>  };
>
>  #define IOMMU_CMD_LINE_DMA_API         BIT(0)
> -#define IOMMU_CMD_LINE_STRICT          BIT(1)
>
>  static int iommu_alloc_default_domain(struct iommu_group *group,
>                                       struct device *dev);
> @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);
>
>  static int __init iommu_dma_setup(char *str)
>  {
> -       int ret = kstrtobool(str, &iommu_dma_strict);
> +       bool strict;
> +       int ret = kstrtobool(str, &strict);
>
>         if (!ret)
> -               iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> +               cmdline_dma_strict = bool_to_strictness(strict);
>         return ret;
>  }
>  early_param("iommu.strict", iommu_dma_setup);
>
>  void iommu_set_dma_strict(bool strict)
>  {
> -       if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> -               iommu_dma_strict = strict;
> +       /* A driver can request strictness but not the other way around */
> +       if (driver_dma_strict != IOMMU_STRICT)
> +               driver_dma_strict = bool_to_strictness(strict);
>  }
>
>  bool iommu_get_dma_strict(struct iommu_domain *domain)
>  {
> -       /* only allow lazy flushing for DMA domains */
> -       if (domain->type == IOMMU_DOMAIN_DMA)
> -               return iommu_dma_strict;
> +       /* Non-DMA domains or anyone forcing it to strict makes it strict */
> +       if (domain->type != IOMMU_DOMAIN_DMA ||
> +           cmdline_dma_strict == IOMMU_STRICT ||
> +           driver_dma_strict == IOMMU_STRICT ||
> +           domain->force_strict)
> +               return true;
> +
> +       /* Anyone requesting non-strict (if no forces) makes it non-strict */
> +       if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> +           driver_dma_strict == IOMMU_NOT_STRICT ||
> +           domain->request_non_strict)
> +               return false;
> +
> +       /* Nobody said anything, so it's strict by default */

If iommu.strict is not set in the command line, upstream treats it as
iommu.strict=1. Meaning, no drivers can override it.

If I understand it correctly, with your series, if iommu.strict=1 is
not set, drivers can override the "default strict mode" and ask for
non-strict mode for their domain. So if this series gets in and future
driver changes start asking for non-strict mode, systems that are
expected to operate in fully strict mode will now have devices
operating in non-strict mode.

That's breaking backward compatibility for the kernel command line
param. It looks like what you really need is to change iommu.strict
from 0/1 to lazy (previously 0), strict preferred, strict enforced
(previously 1) and you need to default it to "enforced".

Alternately (and potentially a better option?), you really should be
changing/extending dev_is_untrusted() so that it applies for any
struct device (not just PCI device) and then have this overridden in
DT (or ACPI or any firmware) to indicate a specific device is safe to
use non-strict mode on. What you are trying to capture (if the device
safe enough) really isn't a function of the DMA device's driver, but a
function of the DMA device.



>         return true;
>  }
>  EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
> @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
>
>  static int iommu_group_alloc_default_domain(struct bus_type *bus,
>                                             struct iommu_group *group,
> -                                           unsigned int type)
> +                                           unsigned int type,
> +                                           struct device *dev)
>  {
>         struct iommu_domain *dom;
>
> @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
>         if (!dom)
>                 return -ENOMEM;
>
> +       /* Save the strictness requests from the device */
> +       if (dev && type == IOMMU_DOMAIN_DMA) {
> +               dom->request_non_strict = dev->request_non_strict_iommu;
> +               dom->force_strict = dev->force_strict_iommu;
> +       }
> +
>         group->default_domain = dom;
>         if (!group->domain)
>                 group->domain = dom;
> @@ -1550,7 +1580,7 @@ static int iommu_alloc_default_domain(struct iommu_group *group,
>
>         type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;
>
> -       return iommu_group_alloc_default_domain(dev->bus, group, type);
> +       return iommu_group_alloc_default_domain(dev->bus, group, type, dev);
>  }
>
>  /**
> @@ -1721,7 +1751,7 @@ static void probe_alloc_default_domain(struct bus_type *bus,
>         if (!gtype.type)
>                 gtype.type = iommu_def_domain_type;
>
> -       iommu_group_alloc_default_domain(bus, group, gtype.type);
> +       iommu_group_alloc_default_domain(bus, group, gtype.type, NULL);
>
>  }
>
> @@ -3130,7 +3160,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>         }
>
>         /* Sets group->default_domain to the newly allocated domain */
> -       ret = iommu_group_alloc_default_domain(dev->bus, group, type);
> +       ret = iommu_group_alloc_default_domain(dev->bus, group, type, dev);
>         if (ret)
>                 goto out;
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..0bddef77f415 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -81,6 +81,8 @@ struct iommu_domain_geometry {
>
>  struct iommu_domain {
>         unsigned type;
> +       bool force_strict:1;
> +       bool request_non_strict:1;
>         const struct iommu_ops *ops;
>         unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
>         iommu_fault_handler_t handler;
> --
> 2.32.0.288.g62a8d224e6-goog
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
                   ` (5 preceding siblings ...)
  2021-06-21 23:52 ` [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode Douglas Anderson
@ 2021-06-22 11:35 ` Robin Murphy
  2021-06-22 16:06   ` Doug Anderson
  2021-06-22 17:39 ` John Garry
  7 siblings, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2021-06-22 11:35 UTC (permalink / raw)
  To: Douglas Anderson, gregkh, rafael, rafael.j.wysocki, will, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Andy Gross, Bartosz Golaszewski, Dan Williams,
	Geert Uytterhoeven, Heikki Krogerus, Randy Dunlap, linux-kernel

Hi Doug,

On 2021-06-22 00:52, Douglas Anderson wrote:
> 
> This patch attempts to put forward a proposal for enabling non-strict
> DMA on a device-by-device basis. The patch series requests non-strict
> DMA for the Qualcomm SDHCI controller as a first device to enable,
> getting a nice bump in performance with what's believed to be a very
> small drop in security / safety (see the patch for the full argument).
> 
> As part of this patch series I am end up slightly cleaning up some of
> the interactions between the PCI subsystem and the IOMMU subsystem but
> I don't go all the way to fully remove all the tentacles. Specifically
> this patch series only concerns itself with a single aspect: strict
> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> to talk about / reason about for more subsystems compared to overall
> deciding what it means for a device to be "external" or "untrusted".
> 
> If something like this patch series ends up being landable, it will
> undoubtedly need coordination between many maintainers to land. I
> believe it's fully bisectable but later patches in the series
> definitely depend on earlier ones. Sorry for the long CC list. :(

Unfortunately, this doesn't work. In normal operation, the default 
domains should be established long before individual drivers are even 
loaded (if they are modules), let alone anywhere near probing. The fact 
that iommu_probe_device() sometimes gets called far too late off the 
back of driver probe is an unfortunate artefact of the original 
probe-deferral scheme, and causes other problems like potentially 
malformed groups - I've been forming a plan to fix that for a while now, 
so I for one really can't condone anything trying to rely on it. 
Non-deterministic behaviour based on driver probe order for multi-device 
groups is part of the existing problem, and your proposal seems equally 
vulnerable to that too.

FWIW we already have a go-faster knob for people who want to tweak the 
security/performance compromise for specific devices, namely the sysfs 
interface for changing a group's domain type before binding the relevant 
driver(s). Is that something you could use in your application, say from 
an initramfs script?

Thanks,
Robin.

> Douglas Anderson (6):
>    drivers: base: Add the concept of "pre_probe" to drivers
>    drivers: base: Add bits to struct device to control iommu strictness
>    PCI: Indicate that we want to force strict DMA for untrusted devices
>    iommu: Combine device strictness requests with the global default
>    iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
>    mmc: sdhci-msm: Request non-strict IOMMU mode
> 
>   drivers/base/dd.c             | 10 +++++--
>   drivers/iommu/dma-iommu.c     |  2 +-
>   drivers/iommu/iommu.c         | 56 +++++++++++++++++++++++++++--------
>   drivers/mmc/host/sdhci-msm.c  |  8 +++++
>   drivers/pci/probe.c           |  4 ++-
>   include/linux/device.h        | 11 +++++++
>   include/linux/device/driver.h |  9 ++++++
>   include/linux/iommu.h         |  2 ++
>   8 files changed, 85 insertions(+), 17 deletions(-)
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
  2021-06-22  2:03   ` Lu Baolu
  2021-06-22  2:55   ` Saravana Kannan
@ 2021-06-22 11:49   ` Robin Murphy
  2021-06-22 18:45   ` Rajat Jain
  3 siblings, 0 replies; 34+ messages in thread
From: Robin Murphy @ 2021-06-22 11:49 UTC (permalink / raw)
  To: Douglas Anderson, gregkh, rafael, rafael.j.wysocki, will, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, linux-kernel

On 2021-06-22 00:52, Douglas Anderson wrote:
> In the patch ("drivers: base: Add bits to struct device to control
> iommu strictness") we add the ability for devices to tell us about
> their IOMMU strictness requirements. Let's now take that into account
> in the IOMMU layer.
> 
> A few notes here:
> * Presumably this is always how iommu_get_dma_strict() was intended to
>    behave. Had this not been the intention then it never would have
>    taken a domain as a parameter.

FWIW strictness does have the semantic of being a per-domain property, 
but mostly in the sense that it's only relevant to IOMMU_DOMAIN_DMA 
domains, so the main thing was encapsulating that check rather than 
duplicating it all over callsites.

> * The iommu_set_dma_strict() feels awfully non-symmetric now. That
>    function sets the _default_ strictness globally in the system
>    whereas iommu_get_dma_strict() returns the value for a given domain
>    (falling back to the default). Presumably, at least, the fact that
>    iommu_set_dma_strict() doesn't take a domain makes this obvious.

It *is* asymmetric - one is for IOMMU core code and individual driver 
internals to know whether they need to do whatever bits of setting up a 
flush queue for a given domain they are responsible for, while the other 
is specifically for two drivers to force the global default in order to 
preserve legacy driver-specific behaviour. Maybe that should have been 
called something like iommu_set_dma_default_strict instead... :/

Robin.

> The function iommu_get_dma_strict() should now make it super obvious
> where strictness comes from and who overides who. Though the function
> changed a bunch to make the logic clearer, the only two new rules
> should be:
> * Devices can force strictness for themselves, overriding the cmdline
>    "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> * Devices can request non-strictness for themselves, assuming there
>    was no cmdline "iommu.strict=1" or a call to
>    iommu_set_dma_strict(true).
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>   drivers/iommu/iommu.c | 56 +++++++++++++++++++++++++++++++++----------
>   include/linux/iommu.h |  2 ++
>   2 files changed, 45 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d5df5..0c84a4c06110 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -28,8 +28,19 @@
>   static struct kset *iommu_group_kset;
>   static DEFINE_IDA(iommu_group_ida);
>   
> +enum iommu_strictness {
> +	IOMMU_DEFAULT_STRICTNESS = -1,
> +	IOMMU_NOT_STRICT = 0,
> +	IOMMU_STRICT = 1,
> +};
> +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> +{
> +	return (enum iommu_strictness)strictness;
> +}
> +
>   static unsigned int iommu_def_domain_type __read_mostly;
> -static bool iommu_dma_strict __read_mostly = true;
> +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
>   static u32 iommu_cmd_line __read_mostly;
>   
>   struct iommu_group {
> @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
>   };
>   
>   #define IOMMU_CMD_LINE_DMA_API		BIT(0)
> -#define IOMMU_CMD_LINE_STRICT		BIT(1)
>   
>   static int iommu_alloc_default_domain(struct iommu_group *group,
>   				      struct device *dev);
> @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);
>   
>   static int __init iommu_dma_setup(char *str)
>   {
> -	int ret = kstrtobool(str, &iommu_dma_strict);
> +	bool strict;
> +	int ret = kstrtobool(str, &strict);
>   
>   	if (!ret)
> -		iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> +		cmdline_dma_strict = bool_to_strictness(strict);
>   	return ret;
>   }
>   early_param("iommu.strict", iommu_dma_setup);
>   
>   void iommu_set_dma_strict(bool strict)
>   {
> -	if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> -		iommu_dma_strict = strict;
> +	/* A driver can request strictness but not the other way around */
> +	if (driver_dma_strict != IOMMU_STRICT)
> +		driver_dma_strict = bool_to_strictness(strict);
>   }
>   
>   bool iommu_get_dma_strict(struct iommu_domain *domain)
>   {
> -	/* only allow lazy flushing for DMA domains */
> -	if (domain->type == IOMMU_DOMAIN_DMA)
> -		return iommu_dma_strict;
> +	/* Non-DMA domains or anyone forcing it to strict makes it strict */
> +	if (domain->type != IOMMU_DOMAIN_DMA ||
> +	    cmdline_dma_strict == IOMMU_STRICT ||
> +	    driver_dma_strict == IOMMU_STRICT ||
> +	    domain->force_strict)
> +		return true;
> +
> +	/* Anyone requesting non-strict (if no forces) makes it non-strict */
> +	if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> +	    driver_dma_strict == IOMMU_NOT_STRICT ||
> +	    domain->request_non_strict)
> +		return false;
> +
> +	/* Nobody said anything, so it's strict by default */
>   	return true;
>   }
>   EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
> @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
>   
>   static int iommu_group_alloc_default_domain(struct bus_type *bus,
>   					    struct iommu_group *group,
> -					    unsigned int type)
> +					    unsigned int type,
> +					    struct device *dev)
>   {
>   	struct iommu_domain *dom;
>   
> @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
>   	if (!dom)
>   		return -ENOMEM;
>   
> +	/* Save the strictness requests from the device */
> +	if (dev && type == IOMMU_DOMAIN_DMA) {
> +		dom->request_non_strict = dev->request_non_strict_iommu;
> +		dom->force_strict = dev->force_strict_iommu;
> +	}
> +
>   	group->default_domain = dom;
>   	if (!group->domain)
>   		group->domain = dom;
> @@ -1550,7 +1580,7 @@ static int iommu_alloc_default_domain(struct iommu_group *group,
>   
>   	type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;
>   
> -	return iommu_group_alloc_default_domain(dev->bus, group, type);
> +	return iommu_group_alloc_default_domain(dev->bus, group, type, dev);
>   }
>   
>   /**
> @@ -1721,7 +1751,7 @@ static void probe_alloc_default_domain(struct bus_type *bus,
>   	if (!gtype.type)
>   		gtype.type = iommu_def_domain_type;
>   
> -	iommu_group_alloc_default_domain(bus, group, gtype.type);
> +	iommu_group_alloc_default_domain(bus, group, gtype.type, NULL);
>   
>   }
>   
> @@ -3130,7 +3160,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>   	}
>   
>   	/* Sets group->default_domain to the newly allocated domain */
> -	ret = iommu_group_alloc_default_domain(dev->bus, group, type);
> +	ret = iommu_group_alloc_default_domain(dev->bus, group, type, dev);
>   	if (ret)
>   		goto out;
>   
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..0bddef77f415 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -81,6 +81,8 @@ struct iommu_domain_geometry {
>   
>   struct iommu_domain {
>   	unsigned type;
> +	bool force_strict:1;
> +	bool request_non_strict:1;
>   	const struct iommu_ops *ops;
>   	unsigned long pgsize_bitmap;	/* Bitmap of page sizes in use */
>   	iommu_fault_handler_t handler;
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 11:35 ` [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Robin Murphy
@ 2021-06-22 16:06   ` Doug Anderson
  2021-06-22 20:02     ` Rob Herring
  2021-06-22 22:10     ` Robin Murphy
  0 siblings, 2 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 16:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Joerg Roedel, Bjorn Andersson, Ulf Hansson,
	Adrian Hunter, Bjorn Helgaas, Rob Clark, linux-arm-msm,
	linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Saravana Kannan,
	Joel Fernandes, Andy Gross, Bartosz Golaszewski, Dan Williams,
	Geert Uytterhoeven, Heikki Krogerus, Randy Dunlap, LKML

Hi,

On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> Hi Doug,
>
> On 2021-06-22 00:52, Douglas Anderson wrote:
> >
> > This patch attempts to put forward a proposal for enabling non-strict
> > DMA on a device-by-device basis. The patch series requests non-strict
> > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > getting a nice bump in performance with what's believed to be a very
> > small drop in security / safety (see the patch for the full argument).
> >
> > As part of this patch series I am end up slightly cleaning up some of
> > the interactions between the PCI subsystem and the IOMMU subsystem but
> > I don't go all the way to fully remove all the tentacles. Specifically
> > this patch series only concerns itself with a single aspect: strict
> > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > to talk about / reason about for more subsystems compared to overall
> > deciding what it means for a device to be "external" or "untrusted".
> >
> > If something like this patch series ends up being landable, it will
> > undoubtedly need coordination between many maintainers to land. I
> > believe it's fully bisectable but later patches in the series
> > definitely depend on earlier ones. Sorry for the long CC list. :(
>
> Unfortunately, this doesn't work. In normal operation, the default
> domains should be established long before individual drivers are even
> loaded (if they are modules), let alone anywhere near probing. The fact
> that iommu_probe_device() sometimes gets called far too late off the
> back of driver probe is an unfortunate artefact of the original
> probe-deferral scheme, and causes other problems like potentially
> malformed groups - I've been forming a plan to fix that for a while now,
> so I for one really can't condone anything trying to rely on it.
> Non-deterministic behaviour based on driver probe order for multi-device
> groups is part of the existing problem, and your proposal seems equally
> vulnerable to that too.

Doh! :( I definitely can't say I understand the iommu subsystem
amazingly well. It was working for me, but I could believe that I was
somehow violating a rule somewhere.

I'm having a bit of a hard time understanding where the problem is
though. Is there any chance that you missed the part of my series
where I introduced a "pre_probe" step? Specifically, I see this:

* really_probe() is called w/ a driver and a device.
* -> calls dev->bus->dma_configure() w/ a "struct device *"
* -> eventually calls iommu_probe_device() w/ the device.
* -> calls iommu_alloc_default_domain() w/ the device
* -> calls iommu_group_alloc_default_domain()
* -> always allocates a new domain

...so we always have a "struct device" when a domain is allocated if
that domain is going to be associated with a device.

I will agree that iommu_probe_device() is called before the driver
probe, but unless I missed something it's after the device driver is
loaded.  ...and assuming something like patch #1 in this series looks
OK then iommu_probe_device() will be called after "pre_probe".

So assuming I'm not missing something, I'm not actually relying the
IOMMU getting init off the back of driver probe.


> FWIW we already have a go-faster knob for people who want to tweak the
> security/performance compromise for specific devices, namely the sysfs
> interface for changing a group's domain type before binding the relevant
> driver(s). Is that something you could use in your application, say from
> an initramfs script?

We've never had an initramfs script in Chrome OS. I don't know all the
history of why (I'm trying to check), but I'm nearly certain it was a
conscious decision. Probably it has to do with the fact that we're not
trying to build a generic distribution where a single boot source can
boot a huge variety of hardware. We generally have one kernel for a
class of devices. I believe avoiding the initramfs just keeps things
simpler.

I think trying to revamp Chrome OS to switch to an initramfs type
system would be a pretty big undertaking since (as I understand it)
you can't just run a little command and then return to the normal boot
flow. Once you switch to initramfs you're committing to finding /
setting up the rootfs yourself and on Chrome OS I believe that means a
whole bunch of dm-verity work.


...so probably the initramfs is a no-go for me, but I'm still crossing
my fingers that the pre_probe() might be legit if you take a second
look at it?

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-22  2:55   ` Saravana Kannan
@ 2021-06-22 16:40     ` Doug Anderson
  2021-06-22 19:50       ` Saravana Kannan
  0 siblings, 1 reply; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 16:40 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark,
	linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Joel Fernandes, LKML

Hi,

On Mon, Jun 21, 2021 at 7:56 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > In the patch ("drivers: base: Add bits to struct device to control
> > iommu strictness") we add the ability for devices to tell us about
> > their IOMMU strictness requirements. Let's now take that into account
> > in the IOMMU layer.
> >
> > A few notes here:
> > * Presumably this is always how iommu_get_dma_strict() was intended to
> >   behave. Had this not been the intention then it never would have
> >   taken a domain as a parameter.
> > * The iommu_set_dma_strict() feels awfully non-symmetric now. That
> >   function sets the _default_ strictness globally in the system
> >   whereas iommu_get_dma_strict() returns the value for a given domain
> >   (falling back to the default). Presumably, at least, the fact that
> >   iommu_set_dma_strict() doesn't take a domain makes this obvious.
> >
> > The function iommu_get_dma_strict() should now make it super obvious
> > where strictness comes from and who overides who. Though the function
> > changed a bunch to make the logic clearer, the only two new rules
> > should be:
> > * Devices can force strictness for themselves, overriding the cmdline
> >   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> > * Devices can request non-strictness for themselves, assuming there
> >   was no cmdline "iommu.strict=1" or a call to
> >   iommu_set_dma_strict(true).
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  drivers/iommu/iommu.c | 56 +++++++++++++++++++++++++++++++++----------
> >  include/linux/iommu.h |  2 ++
> >  2 files changed, 45 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 808ab70d5df5..0c84a4c06110 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -28,8 +28,19 @@
> >  static struct kset *iommu_group_kset;
> >  static DEFINE_IDA(iommu_group_ida);
> >
> > +enum iommu_strictness {
> > +       IOMMU_DEFAULT_STRICTNESS = -1,
> > +       IOMMU_NOT_STRICT = 0,
> > +       IOMMU_STRICT = 1,
> > +};
> > +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> > +{
> > +       return (enum iommu_strictness)strictness;
> > +}
> > +
> >  static unsigned int iommu_def_domain_type __read_mostly;
> > -static bool iommu_dma_strict __read_mostly = true;
> > +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> > +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> >  static u32 iommu_cmd_line __read_mostly;
> >
> >  struct iommu_group {
> > @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
> >  };
> >
> >  #define IOMMU_CMD_LINE_DMA_API         BIT(0)
> > -#define IOMMU_CMD_LINE_STRICT          BIT(1)
> >
> >  static int iommu_alloc_default_domain(struct iommu_group *group,
> >                                       struct device *dev);
> > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);
> >
> >  static int __init iommu_dma_setup(char *str)
> >  {
> > -       int ret = kstrtobool(str, &iommu_dma_strict);
> > +       bool strict;
> > +       int ret = kstrtobool(str, &strict);
> >
> >         if (!ret)
> > -               iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> > +               cmdline_dma_strict = bool_to_strictness(strict);
> >         return ret;
> >  }
> >  early_param("iommu.strict", iommu_dma_setup);
> >
> >  void iommu_set_dma_strict(bool strict)
> >  {
> > -       if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> > -               iommu_dma_strict = strict;
> > +       /* A driver can request strictness but not the other way around */
> > +       if (driver_dma_strict != IOMMU_STRICT)
> > +               driver_dma_strict = bool_to_strictness(strict);
> >  }
> >
> >  bool iommu_get_dma_strict(struct iommu_domain *domain)
> >  {
> > -       /* only allow lazy flushing for DMA domains */
> > -       if (domain->type == IOMMU_DOMAIN_DMA)
> > -               return iommu_dma_strict;
> > +       /* Non-DMA domains or anyone forcing it to strict makes it strict */
> > +       if (domain->type != IOMMU_DOMAIN_DMA ||
> > +           cmdline_dma_strict == IOMMU_STRICT ||
> > +           driver_dma_strict == IOMMU_STRICT ||
> > +           domain->force_strict)
> > +               return true;
> > +
> > +       /* Anyone requesting non-strict (if no forces) makes it non-strict */
> > +       if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> > +           driver_dma_strict == IOMMU_NOT_STRICT ||
> > +           domain->request_non_strict)
> > +               return false;
> > +
> > +       /* Nobody said anything, so it's strict by default */
>
> If iommu.strict is not set in the command line, upstream treats it as
> iommu.strict=1. Meaning, no drivers can override it.
>
> If I understand it correctly, with your series, if iommu.strict=1 is
> not set, drivers can override the "default strict mode" and ask for
> non-strict mode for their domain. So if this series gets in and future
> driver changes start asking for non-strict mode, systems that are
> expected to operate in fully strict mode will now have devices
> operating in non-strict mode.
>
> That's breaking backward compatibility for the kernel command line
> param. It looks like what you really need is to change iommu.strict
> from 0/1 to lazy (previously 0), strict preferred, strict enforced
> (previously 1) and you need to default it to "enforced".

I'm not quite sure I'd agree, but certainly it could be up for debate.
I think I'm keeping full compatibility with the kernel command line
parameter, specifically:

* iommu.strict=0: default to non-strict mode unless a driver overrides

* iommu.strict=1: force everything to strict no matter what

...both of those two things are the same before and after my patchset.

You're arguing that I'm changing the behavior of the system when no
command line parameter is present. To me this seems a little bit of a
stretch. If no command line parameter is present I'd assert that the
kernel should do some sort of sane behavior and that we don't have to
force 100% strictness if the command line parameter isn't present at
all.

I would also note that your assertion that the system is 100% strict
under the "no command line parameter" case isn't actually true as far
as I can tell. The code in mainline is a little hard to follow (for me
the code after my patch is easier to follow), but you can see that
even before my patch a call to iommu_set_dma_strict() could be used to
make the system non-strict if no command-line parameter was passed.


> Alternately (and potentially a better option?), you really should be
> changing/extending dev_is_untrusted() so that it applies for any
> struct device (not just PCI device) and then have this overridden in
> DT (or ACPI or any firmware) to indicate a specific device is safe to
> use non-strict mode on.

I was really trying _not_ to do that. I believe this has been talked
about several times, including at last year's Linux Plumbers
conference. As far as I can tell it always ends in a shouting match w/
no forward progress. There are a bunch of problems here, namely:

* Trust isn't necessarily binary. There might be peripherals that you
sort-of trust, others that you really trust, and others that you don't
trust at all. For the ones you sort-of trust there may be some things
you trust about them and other things you don't trust about them.

* The firmware isn't necessarily the best arbitrar of trust. For
instance, if the company that employs me (Google) compiled their own
firmware for a given peripheral device and they were convinced that
the peripheral firmware couldn't be compromised any more easily than
code running in the kernel itself, they might assert that the
peripheral device should be "trusted". An individual Linux hacker,
however, might not really trust the firmware blob that Google provides
and might want the device to be "untrusted".

* In the PCI subsystem I believe that "trusted" vs "untrusted" is
generally associated with whether a device is soldered down onto the
board or in some type of slot (the "external" concept). That's been
working OK for them, I think, but I'm not convinced it'd be easy to
apply everywhere. One example problem: what do you do about SD cards?
The thing doing the DMA (the SDHCI controller) is certainly "internal"
but the cards are "external". I'm making the argument in my series
that SDHCI should be considered at least trusted enough to use
non-strict DMA, but it's still technically "external" and you wouldn't
necessarily, for instance, trust the filesystem structure not to be
crafted in a malicious way so as to exploit the kernel.

> What you are trying to capture (if the device
> safe enough) really isn't a function of the DMA device's driver, but a
> function of the DMA device.

It's a function of the DMA device, but the entity in the kernel with
the most knowledge about this is the device's driver. The driver also
has the best ability to make informed decisions, perhaps looking at
the device's properties (like the "non-removable" one for SD/MMC) to
help make decisions without us having to create a new property to
describe trust and then argue about who sets it and when.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-22  2:03   ` Lu Baolu
@ 2021-06-22 16:53     ` Doug Anderson
  2021-06-22 17:01       ` Doug Anderson
  0 siblings, 1 reply; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 16:53 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark, LKML,
	Saravana Kannan, linux-arm-msm, Linux MMC List, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-pci, Joel Fernandes, Rajat Jain, Sonny Rao,
	Veerabhadrarao Badiganti

Hi,

On Mon, Jun 21, 2021 at 7:05 PM Lu Baolu <baolu.lu@linux.intel.com> wrote:
>
> On 6/22/21 7:52 AM, Douglas Anderson wrote:
> > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
> >
> >   static int iommu_group_alloc_default_domain(struct bus_type *bus,
> >                                           struct iommu_group *group,
> > -                                         unsigned int type)
> > +                                         unsigned int type,
> > +                                         struct device *dev)
> >   {
> >       struct iommu_domain *dom;
> >
> > @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
> >       if (!dom)
> >               return -ENOMEM;
> >
> > +     /* Save the strictness requests from the device */
> > +     if (dev && type == IOMMU_DOMAIN_DMA) {
> > +             dom->request_non_strict = dev->request_non_strict_iommu;
> > +             dom->force_strict = dev->force_strict_iommu;
> > +     }
> > +
>
> An iommu default domain might be used by multiple devices which might
> have different "strict" attributions. Then who could override who?

My gut instinct would be that if multiple devices were part of a given
domain that it would be combined like this:

1. Any device that requests strict makes the domain strict force strict.

2. To request non-strict all of the devices in the domain would have
to request non-strict.

To do that I'd have to change my patchset obviously, but I don't think
it should be hard. We can just keep a count of devices and a count of
the strict vs. non-strict requests? If there are no other blockers
I'll try to do that in my v2.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-22 16:53     ` Doug Anderson
@ 2021-06-22 17:01       ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 17:01 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark, LKML,
	Saravana Kannan, linux-arm-msm, Linux MMC List, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	linux-pci, Joel Fernandes, Rajat Jain, Sonny Rao,
	Veerabhadrarao Badiganti

Hi,

On Tue, Jun 22, 2021 at 9:53 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Jun 21, 2021 at 7:05 PM Lu Baolu <baolu.lu@linux.intel.com> wrote:
> >
> > On 6/22/21 7:52 AM, Douglas Anderson wrote:
> > > @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
> > >
> > >   static int iommu_group_alloc_default_domain(struct bus_type *bus,
> > >                                           struct iommu_group *group,
> > > -                                         unsigned int type)
> > > +                                         unsigned int type,
> > > +                                         struct device *dev)
> > >   {
> > >       struct iommu_domain *dom;
> > >
> > > @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
> > >       if (!dom)
> > >               return -ENOMEM;
> > >
> > > +     /* Save the strictness requests from the device */
> > > +     if (dev && type == IOMMU_DOMAIN_DMA) {
> > > +             dom->request_non_strict = dev->request_non_strict_iommu;
> > > +             dom->force_strict = dev->force_strict_iommu;
> > > +     }
> > > +
> >
> > An iommu default domain might be used by multiple devices which might
> > have different "strict" attributions. Then who could override who?
>
> My gut instinct would be that if multiple devices were part of a given
> domain that it would be combined like this:
>
> 1. Any device that requests strict makes the domain strict force strict.
>
> 2. To request non-strict all of the devices in the domain would have
> to request non-strict.
>
> To do that I'd have to change my patchset obviously, but I don't think
> it should be hard. We can just keep a count of devices and a count of
> the strict vs. non-strict requests? If there are no other blockers
> I'll try to do that in my v2.

One issue, I guess, is that we might need to transition a non-strict
domain to become strict. Is that possible? We'd end up with an extra
"flush queue" that we didn't need to allocate, but are there other
problems?

Actually, in general would it be possible to transition between strict
and non-strict at runtime as long as we called
init_iova_flush_queue()? Maybe that's a better solution than all
this--we just boot in strict mode and can just transition to
non-strict mode after-the-fact?


-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
                   ` (6 preceding siblings ...)
  2021-06-22 11:35 ` [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Robin Murphy
@ 2021-06-22 17:39 ` John Garry
  2021-06-22 19:50   ` Doug Anderson
  7 siblings, 1 reply; 34+ messages in thread
From: John Garry @ 2021-06-22 17:39 UTC (permalink / raw)
  To: Douglas Anderson, gregkh, rafael, rafael.j.wysocki, will,
	robin.murphy, joro, bjorn.andersson, ulf.hansson, adrian.hunter,
	bhelgaas
  Cc: robdclark, linux-arm-msm, linux-pci, quic_c_gdjako, iommu,
	sonnyrao, saiprakash.ranjan, linux-mmc, vbadigan, rajatja,
	saravanak, joel, Andy Gross, Bartosz Golaszewski, Dan Williams,
	Geert Uytterhoeven, Heikki Krogerus, Randy Dunlap, linux-kernel

On 22/06/2021 00:52, Douglas Anderson wrote:
> 
> This patch attempts to put forward a proposal for enabling non-strict
> DMA on a device-by-device basis. The patch series requests non-strict
> DMA for the Qualcomm SDHCI controller as a first device to enable,
> getting a nice bump in performance with what's believed to be a very
> small drop in security / safety (see the patch for the full argument).
> 
> As part of this patch series I am end up slightly cleaning up some of
> the interactions between the PCI subsystem and the IOMMU subsystem but
> I don't go all the way to fully remove all the tentacles. Specifically
> this patch series only concerns itself with a single aspect: strict
> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> to talk about / reason about for more subsystems compared to overall
> deciding what it means for a device to be "external" or "untrusted".
> 
> If something like this patch series ends up being landable, it will
> undoubtedly need coordination between many maintainers to land. I
> believe it's fully bisectable but later patches in the series
> definitely depend on earlier ones. Sorry for the long CC list. :(
> 

JFYI, In case to missed it, and I know it's not the same thing as you 
want, above, but the following series will allow you to build the kernel 
to default to lazy mode:

https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.garry@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e

So iommu.strict=0 would be no longer always required for arm64.

Thanks,
John


> 
> Douglas Anderson (6):
>    drivers: base: Add the concept of "pre_probe" to drivers
>    drivers: base: Add bits to struct device to control iommu strictness
>    PCI: Indicate that we want to force strict DMA for untrusted devices
>    iommu: Combine device strictness requests with the global default
>    iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
>    mmc: sdhci-msm: Request non-strict IOMMU mode
> 
>   drivers/base/dd.c             | 10 +++++--
>   drivers/iommu/dma-iommu.c     |  2 +-
>   drivers/iommu/iommu.c         | 56 +++++++++++++++++++++++++++--------
>   drivers/mmc/host/sdhci-msm.c  |  8 +++++
>   drivers/pci/probe.c           |  4 ++-
>   include/linux/device.h        | 11 +++++++
>   include/linux/device/driver.h |  9 ++++++
>   include/linux/iommu.h         |  2 ++
>   8 files changed, 85 insertions(+), 17 deletions(-)
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
                     ` (2 preceding siblings ...)
  2021-06-22 11:49   ` Robin Murphy
@ 2021-06-22 18:45   ` Rajat Jain
  2021-06-22 19:35     ` Doug Anderson
  3 siblings, 1 reply; 34+ messages in thread
From: Rajat Jain @ 2021-06-22 18:45 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: gregkh, rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas, robdclark,
	linux-arm-msm, linux-pci, quic_c_gdjako, iommu, sonnyrao,
	saiprakash.ranjan, linux-mmc, vbadigan, saravanak, joel,
	linux-kernel

On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> In the patch ("drivers: base: Add bits to struct device to control
> iommu strictness") we add the ability for devices to tell us about
> their IOMMU strictness requirements. Let's now take that into account
> in the IOMMU layer.
>
> A few notes here:
> * Presumably this is always how iommu_get_dma_strict() was intended to
>   behave. Had this not been the intention then it never would have
>   taken a domain as a parameter.
> * The iommu_set_dma_strict() feels awfully non-symmetric now. That
>   function sets the _default_ strictness globally in the system
>   whereas iommu_get_dma_strict() returns the value for a given domain
>   (falling back to the default). Presumably, at least, the fact that
>   iommu_set_dma_strict() doesn't take a domain makes this obvious.
>
> The function iommu_get_dma_strict() should now make it super obvious
> where strictness comes from and who overides who. Though the function
> changed a bunch to make the logic clearer, the only two new rules
> should be:
> * Devices can force strictness for themselves, overriding the cmdline
>   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> * Devices can request non-strictness for themselves, assuming there
>   was no cmdline "iommu.strict=1" or a call to
>   iommu_set_dma_strict(true).

Along the same lines, I believe a platform (device tree / ACPI) should
also be able to have a say in this. I assume in your proposal, a
platform would expose a property in device tree which the device
driver would need to parse and then use it to set these bits in the
"struct device"?

Thanks,

Rajat



>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>
>  drivers/iommu/iommu.c | 56 +++++++++++++++++++++++++++++++++----------
>  include/linux/iommu.h |  2 ++
>  2 files changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d5df5..0c84a4c06110 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -28,8 +28,19 @@
>  static struct kset *iommu_group_kset;
>  static DEFINE_IDA(iommu_group_ida);
>
> +enum iommu_strictness {
> +       IOMMU_DEFAULT_STRICTNESS = -1,
> +       IOMMU_NOT_STRICT = 0,
> +       IOMMU_STRICT = 1,
> +};
> +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> +{
> +       return (enum iommu_strictness)strictness;
> +}
> +
>  static unsigned int iommu_def_domain_type __read_mostly;
> -static bool iommu_dma_strict __read_mostly = true;
> +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
>  static u32 iommu_cmd_line __read_mostly;
>
>  struct iommu_group {
> @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
>  };
>
>  #define IOMMU_CMD_LINE_DMA_API         BIT(0)
> -#define IOMMU_CMD_LINE_STRICT          BIT(1)
>
>  static int iommu_alloc_default_domain(struct iommu_group *group,
>                                       struct device *dev);
> @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);
>
>  static int __init iommu_dma_setup(char *str)
>  {
> -       int ret = kstrtobool(str, &iommu_dma_strict);
> +       bool strict;
> +       int ret = kstrtobool(str, &strict);
>
>         if (!ret)
> -               iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> +               cmdline_dma_strict = bool_to_strictness(strict);
>         return ret;
>  }
>  early_param("iommu.strict", iommu_dma_setup);
>
>  void iommu_set_dma_strict(bool strict)
>  {
> -       if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> -               iommu_dma_strict = strict;
> +       /* A driver can request strictness but not the other way around */
> +       if (driver_dma_strict != IOMMU_STRICT)
> +               driver_dma_strict = bool_to_strictness(strict);
>  }
>
>  bool iommu_get_dma_strict(struct iommu_domain *domain)
>  {
> -       /* only allow lazy flushing for DMA domains */
> -       if (domain->type == IOMMU_DOMAIN_DMA)
> -               return iommu_dma_strict;
> +       /* Non-DMA domains or anyone forcing it to strict makes it strict */
> +       if (domain->type != IOMMU_DOMAIN_DMA ||
> +           cmdline_dma_strict == IOMMU_STRICT ||
> +           driver_dma_strict == IOMMU_STRICT ||
> +           domain->force_strict)
> +               return true;
> +
> +       /* Anyone requesting non-strict (if no forces) makes it non-strict */
> +       if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> +           driver_dma_strict == IOMMU_NOT_STRICT ||
> +           domain->request_non_strict)
> +               return false;
> +
> +       /* Nobody said anything, so it's strict by default */
>         return true;
>  }
>  EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
> @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
>
>  static int iommu_group_alloc_default_domain(struct bus_type *bus,
>                                             struct iommu_group *group,
> -                                           unsigned int type)
> +                                           unsigned int type,
> +                                           struct device *dev)
>  {
>         struct iommu_domain *dom;
>
> @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
>         if (!dom)
>                 return -ENOMEM;
>
> +       /* Save the strictness requests from the device */
> +       if (dev && type == IOMMU_DOMAIN_DMA) {
> +               dom->request_non_strict = dev->request_non_strict_iommu;
> +               dom->force_strict = dev->force_strict_iommu;
> +       }
> +
>         group->default_domain = dom;
>         if (!group->domain)
>                 group->domain = dom;
> @@ -1550,7 +1580,7 @@ static int iommu_alloc_default_domain(struct iommu_group *group,
>
>         type = iommu_get_def_domain_type(dev) ? : iommu_def_domain_type;
>
> -       return iommu_group_alloc_default_domain(dev->bus, group, type);
> +       return iommu_group_alloc_default_domain(dev->bus, group, type, dev);
>  }
>
>  /**
> @@ -1721,7 +1751,7 @@ static void probe_alloc_default_domain(struct bus_type *bus,
>         if (!gtype.type)
>                 gtype.type = iommu_def_domain_type;
>
> -       iommu_group_alloc_default_domain(bus, group, gtype.type);
> +       iommu_group_alloc_default_domain(bus, group, gtype.type, NULL);
>
>  }
>
> @@ -3130,7 +3160,7 @@ static int iommu_change_dev_def_domain(struct iommu_group *group,
>         }
>
>         /* Sets group->default_domain to the newly allocated domain */
> -       ret = iommu_group_alloc_default_domain(dev->bus, group, type);
> +       ret = iommu_group_alloc_default_domain(dev->bus, group, type, dev);
>         if (ret)
>                 goto out;
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..0bddef77f415 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -81,6 +81,8 @@ struct iommu_domain_geometry {
>
>  struct iommu_domain {
>         unsigned type;
> +       bool force_strict:1;
> +       bool request_non_strict:1;
>         const struct iommu_ops *ops;
>         unsigned long pgsize_bitmap;    /* Bitmap of page sizes in use */
>         iommu_fault_handler_t handler;
> --
> 2.32.0.288.g62a8d224e6-goog
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-22 18:45   ` Rajat Jain
@ 2021-06-22 19:35     ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 19:35 UTC (permalink / raw)
  To: Rajat Jain
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark,
	linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Saravana Kannan, Joel Fernandes, LKML

Hi,

On Tue, Jun 22, 2021 at 11:45 AM Rajat Jain <rajatja@google.com> wrote:
>
> On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson <dianders@chromium.org> wrote:
> >
> > In the patch ("drivers: base: Add bits to struct device to control
> > iommu strictness") we add the ability for devices to tell us about
> > their IOMMU strictness requirements. Let's now take that into account
> > in the IOMMU layer.
> >
> > A few notes here:
> > * Presumably this is always how iommu_get_dma_strict() was intended to
> >   behave. Had this not been the intention then it never would have
> >   taken a domain as a parameter.
> > * The iommu_set_dma_strict() feels awfully non-symmetric now. That
> >   function sets the _default_ strictness globally in the system
> >   whereas iommu_get_dma_strict() returns the value for a given domain
> >   (falling back to the default). Presumably, at least, the fact that
> >   iommu_set_dma_strict() doesn't take a domain makes this obvious.
> >
> > The function iommu_get_dma_strict() should now make it super obvious
> > where strictness comes from and who overides who. Though the function
> > changed a bunch to make the logic clearer, the only two new rules
> > should be:
> > * Devices can force strictness for themselves, overriding the cmdline
> >   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> > * Devices can request non-strictness for themselves, assuming there
> >   was no cmdline "iommu.strict=1" or a call to
> >   iommu_set_dma_strict(true).
>
> Along the same lines, I believe a platform (device tree / ACPI) should
> also be able to have a say in this. I assume in your proposal, a
> platform would expose a property in device tree which the device
> driver would need to parse and then use it to set these bits in the
> "struct device"?

Nothing would prevent creating a device tree or ACPI property that
caused either "force-strict" or "request-non-strict" from being set if
everyone agrees that it's a good idea. I wouldn't reject the idea
myself, but I do worry that we'd devolve into the usual bikeshed for
exactly how this should look. I talked about this a bit in my response
to Saravana, but basically:

* If there was some generic property, would we call it "untrusted",
"external", or something else?

* How do you describe "trust" in a generic "objective" way? It's not
really boolean and trying to describe exactly how trustworthy
something should be considered is hard.

* At least for the device tree there's a general requirement that it
describes the hardware and not so much how the software should
configure the hardware. As I understand it there is _some_ leeway here
where it's OK to describe how the hardware was designed for the OS to
configure it, but it's a pretty high bar and a hard sell. In general
the device tree isn't supposed to be used to describe "policy". In
other words: if one OS might decide on one setting and another OS on
another then it doesn't really belong in the device tree.

* In general the kernel is also not really supposed to have policy
hardcoded in either, though it feels like we can get away with having
a good default/sane policy and allowing overriding the policy with
command line parameters (like iommu.strict). In the case where
something has to be configured at bootup there's not many ways to do
better.


tl;dr: I have no plans to try to make an overarching property, but my
patch series does allow subsystems to come up with and easily
implement their own rules as it makes sense. While this might seem
hodgepodge I prefer to see it as "flexible" since I'm not convinced
that we're going to be able to come up with an overarching trust
framework.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 17:39 ` John Garry
@ 2021-06-22 19:50   ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 19:50 UTC (permalink / raw)
  To: John Garry
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark,
	linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Saravana Kannan,
	Joel Fernandes, Andy Gross, Bartosz Golaszewski, Dan Williams,
	Geert Uytterhoeven, Heikki Krogerus, Randy Dunlap, LKML

Hi,

On Tue, Jun 22, 2021 at 10:46 AM John Garry <john.garry@huawei.com> wrote:
>
> On 22/06/2021 00:52, Douglas Anderson wrote:
> >
> > This patch attempts to put forward a proposal for enabling non-strict
> > DMA on a device-by-device basis. The patch series requests non-strict
> > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > getting a nice bump in performance with what's believed to be a very
> > small drop in security / safety (see the patch for the full argument).
> >
> > As part of this patch series I am end up slightly cleaning up some of
> > the interactions between the PCI subsystem and the IOMMU subsystem but
> > I don't go all the way to fully remove all the tentacles. Specifically
> > this patch series only concerns itself with a single aspect: strict
> > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > to talk about / reason about for more subsystems compared to overall
> > deciding what it means for a device to be "external" or "untrusted".
> >
> > If something like this patch series ends up being landable, it will
> > undoubtedly need coordination between many maintainers to land. I
> > believe it's fully bisectable but later patches in the series
> > definitely depend on earlier ones. Sorry for the long CC list. :(
> >
>
> JFYI, In case to missed it, and I know it's not the same thing as you
> want, above, but the following series will allow you to build the kernel
> to default to lazy mode:
>
> https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.garry@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e
>
> So iommu.strict=0 would be no longer always required for arm64.

Excellent! I'm never a fan of command line parameters as a replacement
for Kconfig. They are great for debugging or for cases where the user
of the kernel and the person compiling the kernel are not the same
(like with off-the-shelf Linux distros) but aren't great for setting a
default for embedded environments.

I actually think that something like my patchset may be even more
important atop yours. Do you agree? If the default is non-strict it
could be extra important to be able to mark a certain device to be in
"strict" mode.

...also, unfortunately I probably won't be able to use your patchest
for my use case. I think we want the extra level of paranoia by
default and really only want to allow non-strict mode for devices that
we're really convinced are safe.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
  2021-06-22 16:40     ` Doug Anderson
@ 2021-06-22 19:50       ` Saravana Kannan
  0 siblings, 0 replies; 34+ messages in thread
From: Saravana Kannan @ 2021-06-22 19:50 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Robin Murphy, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark,
	linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Joel Fernandes, LKML

On Tue, Jun 22, 2021 at 9:46 AM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Jun 21, 2021 at 7:56 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson <dianders@chromium.org> wrote:
> > >
> > > In the patch ("drivers: base: Add bits to struct device to control
> > > iommu strictness") we add the ability for devices to tell us about
> > > their IOMMU strictness requirements. Let's now take that into account
> > > in the IOMMU layer.
> > >
> > > A few notes here:
> > > * Presumably this is always how iommu_get_dma_strict() was intended to
> > >   behave. Had this not been the intention then it never would have
> > >   taken a domain as a parameter.
> > > * The iommu_set_dma_strict() feels awfully non-symmetric now. That
> > >   function sets the _default_ strictness globally in the system
> > >   whereas iommu_get_dma_strict() returns the value for a given domain
> > >   (falling back to the default). Presumably, at least, the fact that
> > >   iommu_set_dma_strict() doesn't take a domain makes this obvious.
> > >
> > > The function iommu_get_dma_strict() should now make it super obvious
> > > where strictness comes from and who overides who. Though the function
> > > changed a bunch to make the logic clearer, the only two new rules
> > > should be:
> > > * Devices can force strictness for themselves, overriding the cmdline
> > >   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> > > * Devices can request non-strictness for themselves, assuming there
> > >   was no cmdline "iommu.strict=1" or a call to
> > >   iommu_set_dma_strict(true).
> > >
> > > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > > ---
> > >
> > >  drivers/iommu/iommu.c | 56 +++++++++++++++++++++++++++++++++----------
> > >  include/linux/iommu.h |  2 ++
> > >  2 files changed, 45 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 808ab70d5df5..0c84a4c06110 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -28,8 +28,19 @@
> > >  static struct kset *iommu_group_kset;
> > >  static DEFINE_IDA(iommu_group_ida);
> > >
> > > +enum iommu_strictness {
> > > +       IOMMU_DEFAULT_STRICTNESS = -1,
> > > +       IOMMU_NOT_STRICT = 0,
> > > +       IOMMU_STRICT = 1,
> > > +};
> > > +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> > > +{
> > > +       return (enum iommu_strictness)strictness;
> > > +}
> > > +
> > >  static unsigned int iommu_def_domain_type __read_mostly;
> > > -static bool iommu_dma_strict __read_mostly = true;
> > > +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> > > +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
> > >  static u32 iommu_cmd_line __read_mostly;
> > >
> > >  struct iommu_group {
> > > @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
> > >  };
> > >
> > >  #define IOMMU_CMD_LINE_DMA_API         BIT(0)
> > > -#define IOMMU_CMD_LINE_STRICT          BIT(1)
> > >
> > >  static int iommu_alloc_default_domain(struct iommu_group *group,
> > >                                       struct device *dev);
> > > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);
> > >
> > >  static int __init iommu_dma_setup(char *str)
> > >  {
> > > -       int ret = kstrtobool(str, &iommu_dma_strict);
> > > +       bool strict;
> > > +       int ret = kstrtobool(str, &strict);
> > >
> > >         if (!ret)
> > > -               iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> > > +               cmdline_dma_strict = bool_to_strictness(strict);
> > >         return ret;
> > >  }
> > >  early_param("iommu.strict", iommu_dma_setup);
> > >
> > >  void iommu_set_dma_strict(bool strict)
> > >  {
> > > -       if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> > > -               iommu_dma_strict = strict;
> > > +       /* A driver can request strictness but not the other way around */
> > > +       if (driver_dma_strict != IOMMU_STRICT)
> > > +               driver_dma_strict = bool_to_strictness(strict);
> > >  }
> > >
> > >  bool iommu_get_dma_strict(struct iommu_domain *domain)
> > >  {
> > > -       /* only allow lazy flushing for DMA domains */
> > > -       if (domain->type == IOMMU_DOMAIN_DMA)
> > > -               return iommu_dma_strict;
> > > +       /* Non-DMA domains or anyone forcing it to strict makes it strict */
> > > +       if (domain->type != IOMMU_DOMAIN_DMA ||
> > > +           cmdline_dma_strict == IOMMU_STRICT ||
> > > +           driver_dma_strict == IOMMU_STRICT ||
> > > +           domain->force_strict)
> > > +               return true;
> > > +
> > > +       /* Anyone requesting non-strict (if no forces) makes it non-strict */
> > > +       if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> > > +           driver_dma_strict == IOMMU_NOT_STRICT ||
> > > +           domain->request_non_strict)
> > > +               return false;
> > > +
> > > +       /* Nobody said anything, so it's strict by default */
> >
> > If iommu.strict is not set in the command line, upstream treats it as
> > iommu.strict=1. Meaning, no drivers can override it.
> >
> > If I understand it correctly, with your series, if iommu.strict=1 is
> > not set, drivers can override the "default strict mode" and ask for
> > non-strict mode for their domain. So if this series gets in and future
> > driver changes start asking for non-strict mode, systems that are
> > expected to operate in fully strict mode will now have devices
> > operating in non-strict mode.
> >
> > That's breaking backward compatibility for the kernel command line
> > param. It looks like what you really need is to change iommu.strict
> > from 0/1 to lazy (previously 0), strict preferred, strict enforced
> > (previously 1) and you need to default it to "enforced".
>
> I'm not quite sure I'd agree, but certainly it could be up for debate.
> I think I'm keeping full compatibility with the kernel command line
> parameter, specifically:
>
> * iommu.strict=0: default to non-strict mode unless a driver overrides
>
> * iommu.strict=1: force everything to strict no matter what
>
> ...both of those two things are the same before and after my patchset.
>
> You're arguing that I'm changing the behavior of the system when no
> command line parameter is present. To me this seems a little bit of a
> stretch. If no command line parameter is present I'd assert that the
> kernel should do some sort of sane behavior and that we don't have to
> force 100% strictness if the command line parameter isn't present at
> all.
>
> I would also note that your assertion that the system is 100% strict
> under the "no command line parameter" case isn't actually true as far
> as I can tell. The code in mainline is a little hard to follow (for me
> the code after my patch is easier to follow), but you can see that
> even before my patch a call to iommu_set_dma_strict() could be used to
> make the system non-strict if no command-line parameter was passed.

Well, the kernel doc says if you don't list iommu.strict, it defaults
to 1. So no one would have a reason to set all the command line
options to their default values. You can't fit that all into the
commandline anyway :) That's why I think this breaks backward
compatibility. But I'll let the IOMMU folks Ack/Nak based on this. I
just wanted to highlight the difference in behavior.

>
>
> > Alternately (and potentially a better option?), you really should be
> > changing/extending dev_is_untrusted() so that it applies for any
> > struct device (not just PCI device) and then have this overridden in
> > DT (or ACPI or any firmware) to indicate a specific device is safe to
> > use non-strict mode on.
>
> I was really trying _not_ to do that. I believe this has been talked
> about several times, including at last year's Linux Plumbers
> conference. As far as I can tell it always ends in a shouting match w/
> no forward progress.

I feel you :) DT changes can be painful. But in this case, it seems
like the right path. We'll see what other think. Looks like Rajat has
a similar view.

> There are a bunch of problems here, namely:
>
> * Trust isn't necessarily binary. There might be peripherals that you
> sort-of trust, others that you really trust, and others that you don't
> trust at all. For the ones you sort-of trust there may be some things
> you trust about them and other things you don't trust about them.

Right, then let's not create a boolean property. It also doesn't have
to be a "trust" property. It could simply be an "allow-lazy-tlb-flush"
property. Let's figure that out.

> * The firmware isn't necessarily the best arbitrar of trust. For
> instance, if the company that employs me (Google) compiled their own
> firmware for a given peripheral device and they were convinced that
> the peripheral firmware couldn't be compromised any more easily than
> code running in the kernel itself, they might assert that the
> peripheral device should be "trusted".

The peripheral manufacturer IS the right entity to decide if the
hardware is trustworthy or not. Saying you don't trust the DT from a
peripheral manufacturer is a whole other problem that isn't going to
be solved by one series.

> An individual Linux hacker,
> however, might not really trust the firmware blob that Google provides
> and might want the device to be "untrusted".

If the Linux hacker wants to overwrite the kernel in their device, I
expect they'd also be able to change the DT. And they always have
iommu.strict=1 if they don't trust the firmware.

> * In the PCI subsystem I believe that "trusted" vs "untrusted" is
> generally associated with whether a device is soldered down onto the
> board or in some type of slot (the "external" concept). That's been
> working OK for them, I think, but I'm not convinced it'd be easy to
> apply everywhere. One example problem: what do you do about SD cards?
> The thing doing the DMA (the SDHCI controller) is certainly "internal"
> but the cards are "external". I'm making the argument in my series
> that SDHCI should be considered at least trusted enough to use
> non-strict DMA, but it's still technically "external" and you wouldn't
> necessarily, for instance, trust the filesystem structure not to be
> crafted in a malicious way so as to exploit the kernel.

Right, I wasn't clear. I didn't say we should add a "trusted vs
untrusted" property to firmware. We can make it more granular and let
the DT/bus device. In this case, the DT could set it up and PCI can
override it if it wants to.

> > What you are trying to capture (if the device
> > safe enough) really isn't a function of the DMA device's driver, but a
> > function of the DMA device.
>
> It's a function of the DMA device, but the entity in the kernel with
> the most knowledge about this is the device's driver. The driver also
> has the best ability to make informed decisions, perhaps looking at
> the device's properties (like the "non-removable" one for SD/MMC) to
> help make decisions without us having to create a new property to
> describe trust and then argue about who sets it and when.

I'd definitely disagree on this point. The driver doesn't know
anything about the hardware unless it comes from DT. It might not be
an explicit property and you might infer stuff from the compatible
string, but that's still coming from DT. In your example, the QC
SD/MMC driver always sets the flag because it assumes the specific
device (compatible string it's matching) is always secure. So the
driver is just passing along information from DT. So I'm saying let's
make it a generic property that says if a device is secure enough to
allow lazy-tlb-flush.

Again, I'm not going to Nak/Ack the series, but this is my 2 cents.

-Saravana

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 16:06   ` Doug Anderson
@ 2021-06-22 20:02     ` Rob Herring
  2021-06-22 20:05       ` Saravana Kannan
  2021-06-22 22:10     ` Robin Murphy
  1 sibling, 1 reply; 34+ messages in thread
From: Rob Herring @ 2021-06-22 20:02 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Robin Murphy, Greg Kroah-Hartman, Rafael J. Wysocki,
	Rafael J. Wysocki, Will Deacon, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark,
	linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Saravana Kannan,
	Joel Fernandes, Andy Gross, Bartosz Golaszewski, Dan Williams,
	Geert Uytterhoeven, Heikki Krogerus, Randy Dunlap, LKML

On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >
> > Hi Doug,
> >
> > On 2021-06-22 00:52, Douglas Anderson wrote:
> > >
> > > This patch attempts to put forward a proposal for enabling non-strict
> > > DMA on a device-by-device basis. The patch series requests non-strict
> > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > getting a nice bump in performance with what's believed to be a very
> > > small drop in security / safety (see the patch for the full argument).
> > >
> > > As part of this patch series I am end up slightly cleaning up some of
> > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > I don't go all the way to fully remove all the tentacles. Specifically
> > > this patch series only concerns itself with a single aspect: strict
> > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > to talk about / reason about for more subsystems compared to overall
> > > deciding what it means for a device to be "external" or "untrusted".
> > >
> > > If something like this patch series ends up being landable, it will
> > > undoubtedly need coordination between many maintainers to land. I
> > > believe it's fully bisectable but later patches in the series
> > > definitely depend on earlier ones. Sorry for the long CC list. :(
> >
> > Unfortunately, this doesn't work. In normal operation, the default
> > domains should be established long before individual drivers are even
> > loaded (if they are modules), let alone anywhere near probing. The fact
> > that iommu_probe_device() sometimes gets called far too late off the
> > back of driver probe is an unfortunate artefact of the original
> > probe-deferral scheme, and causes other problems like potentially
> > malformed groups - I've been forming a plan to fix that for a while now,
> > so I for one really can't condone anything trying to rely on it.
> > Non-deterministic behaviour based on driver probe order for multi-device
> > groups is part of the existing problem, and your proposal seems equally
> > vulnerable to that too.
> 
> Doh! :( I definitely can't say I understand the iommu subsystem
> amazingly well. It was working for me, but I could believe that I was
> somehow violating a rule somewhere.
> 
> I'm having a bit of a hard time understanding where the problem is
> though. Is there any chance that you missed the part of my series
> where I introduced a "pre_probe" step? Specifically, I see this:
> 
> * really_probe() is called w/ a driver and a device.
> * -> calls dev->bus->dma_configure() w/ a "struct device *"
> * -> eventually calls iommu_probe_device() w/ the device.
> * -> calls iommu_alloc_default_domain() w/ the device
> * -> calls iommu_group_alloc_default_domain()
> * -> always allocates a new domain
> 
> ...so we always have a "struct device" when a domain is allocated if
> that domain is going to be associated with a device.
> 
> I will agree that iommu_probe_device() is called before the driver
> probe, but unless I missed something it's after the device driver is
> loaded.  ...and assuming something like patch #1 in this series looks
> OK then iommu_probe_device() will be called after "pre_probe".
> 
> So assuming I'm not missing something, I'm not actually relying the
> IOMMU getting init off the back of driver probe.
> 
> 
> > FWIW we already have a go-faster knob for people who want to tweak the
> > security/performance compromise for specific devices, namely the sysfs
> > interface for changing a group's domain type before binding the relevant
> > driver(s). Is that something you could use in your application, say from
> > an initramfs script?
> 
> We've never had an initramfs script in Chrome OS. I don't know all the
> history of why (I'm trying to check), but I'm nearly certain it was a
> conscious decision. Probably it has to do with the fact that we're not
> trying to build a generic distribution where a single boot source can
> boot a huge variety of hardware. We generally have one kernel for a
> class of devices. I believe avoiding the initramfs just keeps things
> simpler.
> 
> I think trying to revamp Chrome OS to switch to an initramfs type
> system would be a pretty big undertaking since (as I understand it)
> you can't just run a little command and then return to the normal boot
> flow. Once you switch to initramfs you're committing to finding /
> setting up the rootfs yourself and on Chrome OS I believe that means a
> whole bunch of dm-verity work.
> 
> 
> ...so probably the initramfs is a no-go for me, but I'm still crossing
> my fingers that the pre_probe() might be legit if you take a second
> look at it?

Couldn't you have a driver flag that has the same effect as twiddling 
sysfs? At the being of probe, check the flag and go set the underlying 
sysfs setting in the device.

Though you may want this to be per device, not per driver. To do that 
early, I think you'd need a DT property. I wouldn't be totally opposed 
to that and I appreciate you not starting there. :)

Rob

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 20:02     ` Rob Herring
@ 2021-06-22 20:05       ` Saravana Kannan
  2021-06-22 20:10         ` Doug Anderson
  0 siblings, 1 reply; 34+ messages in thread
From: Saravana Kannan @ 2021-06-22 20:05 UTC (permalink / raw)
  To: Rob Herring
  Cc: Doug Anderson, Robin Murphy, Greg Kroah-Hartman,
	Rafael J. Wysocki, Rafael J. Wysocki, Will Deacon, Joerg Roedel,
	Bjorn Andersson, Ulf Hansson, Adrian Hunter, Bjorn Helgaas,
	Rob Clark, linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Joel Fernandes, Andy Gross,
	Bartosz Golaszewski, Dan Williams, Geert Uytterhoeven,
	Heikki Krogerus, Randy Dunlap, LKML

On Tue, Jun 22, 2021 at 1:02 PM Rob Herring <robh@kernel.org> wrote:
>
> On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > >
> > > Hi Doug,
> > >
> > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > >
> > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > getting a nice bump in performance with what's believed to be a very
> > > > small drop in security / safety (see the patch for the full argument).
> > > >
> > > > As part of this patch series I am end up slightly cleaning up some of
> > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > this patch series only concerns itself with a single aspect: strict
> > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > to talk about / reason about for more subsystems compared to overall
> > > > deciding what it means for a device to be "external" or "untrusted".
> > > >
> > > > If something like this patch series ends up being landable, it will
> > > > undoubtedly need coordination between many maintainers to land. I
> > > > believe it's fully bisectable but later patches in the series
> > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > >
> > > Unfortunately, this doesn't work. In normal operation, the default
> > > domains should be established long before individual drivers are even
> > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > that iommu_probe_device() sometimes gets called far too late off the
> > > back of driver probe is an unfortunate artefact of the original
> > > probe-deferral scheme, and causes other problems like potentially
> > > malformed groups - I've been forming a plan to fix that for a while now,
> > > so I for one really can't condone anything trying to rely on it.
> > > Non-deterministic behaviour based on driver probe order for multi-device
> > > groups is part of the existing problem, and your proposal seems equally
> > > vulnerable to that too.
> >
> > Doh! :( I definitely can't say I understand the iommu subsystem
> > amazingly well. It was working for me, but I could believe that I was
> > somehow violating a rule somewhere.
> >
> > I'm having a bit of a hard time understanding where the problem is
> > though. Is there any chance that you missed the part of my series
> > where I introduced a "pre_probe" step? Specifically, I see this:
> >
> > * really_probe() is called w/ a driver and a device.
> > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > * -> eventually calls iommu_probe_device() w/ the device.
> > * -> calls iommu_alloc_default_domain() w/ the device
> > * -> calls iommu_group_alloc_default_domain()
> > * -> always allocates a new domain
> >
> > ...so we always have a "struct device" when a domain is allocated if
> > that domain is going to be associated with a device.
> >
> > I will agree that iommu_probe_device() is called before the driver
> > probe, but unless I missed something it's after the device driver is
> > loaded.  ...and assuming something like patch #1 in this series looks
> > OK then iommu_probe_device() will be called after "pre_probe".
> >
> > So assuming I'm not missing something, I'm not actually relying the
> > IOMMU getting init off the back of driver probe.
> >
> >
> > > FWIW we already have a go-faster knob for people who want to tweak the
> > > security/performance compromise for specific devices, namely the sysfs
> > > interface for changing a group's domain type before binding the relevant
> > > driver(s). Is that something you could use in your application, say from
> > > an initramfs script?
> >
> > We've never had an initramfs script in Chrome OS. I don't know all the
> > history of why (I'm trying to check), but I'm nearly certain it was a
> > conscious decision. Probably it has to do with the fact that we're not
> > trying to build a generic distribution where a single boot source can
> > boot a huge variety of hardware. We generally have one kernel for a
> > class of devices. I believe avoiding the initramfs just keeps things
> > simpler.
> >
> > I think trying to revamp Chrome OS to switch to an initramfs type
> > system would be a pretty big undertaking since (as I understand it)
> > you can't just run a little command and then return to the normal boot
> > flow. Once you switch to initramfs you're committing to finding /
> > setting up the rootfs yourself and on Chrome OS I believe that means a
> > whole bunch of dm-verity work.
> >
> >
> > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > my fingers that the pre_probe() might be legit if you take a second
> > look at it?
>
> Couldn't you have a driver flag that has the same effect as twiddling
> sysfs? At the being of probe, check the flag and go set the underlying
> sysfs setting in the device.

My understanding of what Robin is saying is that we'd need this info
well before the driver is even available. The pre_probe() is
effectively doing the same thing you are suggesting.

> Though you may want this to be per device, not per driver. To do that
> early, I think you'd need a DT property. I wouldn't be totally opposed
> to that and I appreciate you not starting there. :)

Which is what I'm suggest elsewhere in the thread:

https://lore.kernel.org/lkml/CAGETcx83qCZF5JN5cqXxdSFiEgfc4jYESJg-RepL2wJXJv0Eww@mail.gmail.com/

-Saravana

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 20:05       ` Saravana Kannan
@ 2021-06-22 20:10         ` Doug Anderson
  2021-06-23 13:54           ` Rob Herring
  0 siblings, 1 reply; 34+ messages in thread
From: Doug Anderson @ 2021-06-22 20:10 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Robin Murphy, Greg Kroah-Hartman, Rafael J. Wysocki,
	Rafael J. Wysocki, Will Deacon, Joerg Roedel, Bjorn Andersson,
	Ulf Hansson, Adrian Hunter, Bjorn Helgaas, Rob Clark,
	linux-arm-msm, linux-pci, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Joel Fernandes, Andy Gross,
	Bartosz Golaszewski, Dan Williams, Geert Uytterhoeven,
	Heikki Krogerus, Randy Dunlap, LKML

Hi,

On Tue, Jun 22, 2021 at 1:06 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Jun 22, 2021 at 1:02 PM Rob Herring <robh@kernel.org> wrote:
> >
> > On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > >
> > > > Hi Doug,
> > > >
> > > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > > >
> > > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > > getting a nice bump in performance with what's believed to be a very
> > > > > small drop in security / safety (see the patch for the full argument).
> > > > >
> > > > > As part of this patch series I am end up slightly cleaning up some of
> > > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > > this patch series only concerns itself with a single aspect: strict
> > > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > > to talk about / reason about for more subsystems compared to overall
> > > > > deciding what it means for a device to be "external" or "untrusted".
> > > > >
> > > > > If something like this patch series ends up being landable, it will
> > > > > undoubtedly need coordination between many maintainers to land. I
> > > > > believe it's fully bisectable but later patches in the series
> > > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > > >
> > > > Unfortunately, this doesn't work. In normal operation, the default
> > > > domains should be established long before individual drivers are even
> > > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > > that iommu_probe_device() sometimes gets called far too late off the
> > > > back of driver probe is an unfortunate artefact of the original
> > > > probe-deferral scheme, and causes other problems like potentially
> > > > malformed groups - I've been forming a plan to fix that for a while now,
> > > > so I for one really can't condone anything trying to rely on it.
> > > > Non-deterministic behaviour based on driver probe order for multi-device
> > > > groups is part of the existing problem, and your proposal seems equally
> > > > vulnerable to that too.
> > >
> > > Doh! :( I definitely can't say I understand the iommu subsystem
> > > amazingly well. It was working for me, but I could believe that I was
> > > somehow violating a rule somewhere.
> > >
> > > I'm having a bit of a hard time understanding where the problem is
> > > though. Is there any chance that you missed the part of my series
> > > where I introduced a "pre_probe" step? Specifically, I see this:
> > >
> > > * really_probe() is called w/ a driver and a device.
> > > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > > * -> eventually calls iommu_probe_device() w/ the device.
> > > * -> calls iommu_alloc_default_domain() w/ the device
> > > * -> calls iommu_group_alloc_default_domain()
> > > * -> always allocates a new domain
> > >
> > > ...so we always have a "struct device" when a domain is allocated if
> > > that domain is going to be associated with a device.
> > >
> > > I will agree that iommu_probe_device() is called before the driver
> > > probe, but unless I missed something it's after the device driver is
> > > loaded.  ...and assuming something like patch #1 in this series looks
> > > OK then iommu_probe_device() will be called after "pre_probe".
> > >
> > > So assuming I'm not missing something, I'm not actually relying the
> > > IOMMU getting init off the back of driver probe.
> > >
> > >
> > > > FWIW we already have a go-faster knob for people who want to tweak the
> > > > security/performance compromise for specific devices, namely the sysfs
> > > > interface for changing a group's domain type before binding the relevant
> > > > driver(s). Is that something you could use in your application, say from
> > > > an initramfs script?
> > >
> > > We've never had an initramfs script in Chrome OS. I don't know all the
> > > history of why (I'm trying to check), but I'm nearly certain it was a
> > > conscious decision. Probably it has to do with the fact that we're not
> > > trying to build a generic distribution where a single boot source can
> > > boot a huge variety of hardware. We generally have one kernel for a
> > > class of devices. I believe avoiding the initramfs just keeps things
> > > simpler.
> > >
> > > I think trying to revamp Chrome OS to switch to an initramfs type
> > > system would be a pretty big undertaking since (as I understand it)
> > > you can't just run a little command and then return to the normal boot
> > > flow. Once you switch to initramfs you're committing to finding /
> > > setting up the rootfs yourself and on Chrome OS I believe that means a
> > > whole bunch of dm-verity work.
> > >
> > >
> > > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > > my fingers that the pre_probe() might be legit if you take a second
> > > look at it?
> >
> > Couldn't you have a driver flag that has the same effect as twiddling
> > sysfs? At the being of probe, check the flag and go set the underlying
> > sysfs setting in the device.
>
> My understanding of what Robin is saying is that we'd need this info
> well before the driver is even available. The pre_probe() is
> effectively doing the same thing you are suggesting.

Right, I was just about to respond with the same. ;-) So overall right
now we're blocked waiting for someone to point out the error in my
logic. ;-)


> > Though you may want this to be per device, not per driver. To do that
> > early, I think you'd need a DT property. I wouldn't be totally opposed
> > to that and I appreciate you not starting there. :)
>
> Which is what I'm suggest elsewhere in the thread:
>
> https://lore.kernel.org/lkml/CAGETcx83qCZF5JN5cqXxdSFiEgfc4jYESJg-RepL2wJXJv0Eww@mail.gmail.com/

Rob: I'd be happy if you wanted to comment on that thread. If you say
that it's fine to add a generic device tree property to control
strictness then I'm more than happy to add support for it. I've been
going on the theory that you'd NAK such a property but I'm totally
good with being wrong. ;-)

I'd be more than happy if you could suggest what you'd envision such a
property to be named.


-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 16:06   ` Doug Anderson
  2021-06-22 20:02     ` Rob Herring
@ 2021-06-22 22:10     ` Robin Murphy
  2021-06-23 17:29       ` Doug Anderson
  1 sibling, 1 reply; 34+ messages in thread
From: Robin Murphy @ 2021-06-22 22:10 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Joerg Roedel, Bjorn Andersson, Ulf Hansson,
	Adrian Hunter, Bjorn Helgaas, Rob Clark, linux-arm-msm,
	linux-pci, quic_c_gdjako, list@263.net:IOMMU DRIVERS,
	Joerg Roedel, iommu, Sonny Rao, Sai Prakash Ranjan,
	Linux MMC List, Veerabhadrarao Badiganti, Rajat Jain,
	Saravana Kannan, Joel Fernandes, Andy Gross, Bartosz Golaszewski,
	Dan Williams, Geert Uytterhoeven, Heikki Krogerus, Randy Dunlap,
	LKML

On 2021-06-22 17:06, Doug Anderson wrote:
> Hi,
> 
> On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> Hi Doug,
>>
>> On 2021-06-22 00:52, Douglas Anderson wrote:
>>>
>>> This patch attempts to put forward a proposal for enabling non-strict
>>> DMA on a device-by-device basis. The patch series requests non-strict
>>> DMA for the Qualcomm SDHCI controller as a first device to enable,
>>> getting a nice bump in performance with what's believed to be a very
>>> small drop in security / safety (see the patch for the full argument).
>>>
>>> As part of this patch series I am end up slightly cleaning up some of
>>> the interactions between the PCI subsystem and the IOMMU subsystem but
>>> I don't go all the way to fully remove all the tentacles. Specifically
>>> this patch series only concerns itself with a single aspect: strict
>>> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
>>> to talk about / reason about for more subsystems compared to overall
>>> deciding what it means for a device to be "external" or "untrusted".
>>>
>>> If something like this patch series ends up being landable, it will
>>> undoubtedly need coordination between many maintainers to land. I
>>> believe it's fully bisectable but later patches in the series
>>> definitely depend on earlier ones. Sorry for the long CC list. :(
>>
>> Unfortunately, this doesn't work. In normal operation, the default
>> domains should be established long before individual drivers are even
>> loaded (if they are modules), let alone anywhere near probing. The fact
>> that iommu_probe_device() sometimes gets called far too late off the
>> back of driver probe is an unfortunate artefact of the original
>> probe-deferral scheme, and causes other problems like potentially
>> malformed groups - I've been forming a plan to fix that for a while now,
>> so I for one really can't condone anything trying to rely on it.
>> Non-deterministic behaviour based on driver probe order for multi-device
>> groups is part of the existing problem, and your proposal seems equally
>> vulnerable to that too.
> 
> Doh! :( I definitely can't say I understand the iommu subsystem
> amazingly well. It was working for me, but I could believe that I was
> somehow violating a rule somewhere.
> 
> I'm having a bit of a hard time understanding where the problem is
> though. Is there any chance that you missed the part of my series
> where I introduced a "pre_probe" step? Specifically, I see this:
> 
> * really_probe() is called w/ a driver and a device.
> * -> calls dev->bus->dma_configure() w/ a "struct device *"
> * -> eventually calls iommu_probe_device() w/ the device.

This...

> * -> calls iommu_alloc_default_domain() w/ the device
> * -> calls iommu_group_alloc_default_domain()
> * -> always allocates a new domain
> 
> ...so we always have a "struct device" when a domain is allocated if
> that domain is going to be associated with a device.
> 
> I will agree that iommu_probe_device() is called before the driver
> probe, but unless I missed something it's after the device driver is
> loaded.  ...and assuming something like patch #1 in this series looks
> OK then iommu_probe_device() will be called after "pre_probe".
> 
> So assuming I'm not missing something, I'm not actually relying the
> IOMMU getting init off the back of driver probe.

...is implicitly that. Sorry that it's not obvious.

The "proper" flow is that iommu_probe_device() is called for everything 
which already exists during the IOMMU driver's own probe when it calls 
bus_set_iommu(), then at BUS_NOTIFY_ADD_DEVICE time for everything which 
appears subsequently. The only trouble is, to observe it in action on a 
DT-based system you'd currently have to go back at least 4 years, before 
09515ef5ddad...

Basically there were two issues: firstly we need the of_xlate step 
before add_device (now probe_device) for a DT-based IOMMU driver to know 
whether it should care about the given device or not. When -EPROBE_DEFER 
was the only tool we had to ensure probe ordering, and resolving the 
"iommus" DT property the only place to decide that, delaying it all 
until driver probe time was the only reasonable option, however ugly. 
The iommu_probe_device() "replay" in {of,acpi}_iommu_configure() is 
merely doing its best to fake up the previous behaviour. Try binding a 
dummy driver to your device first, then unbind it and bind the real one, 
and you'll see that iommu_probe_device() doesn't run the second or 
subsequent times. Now that we have fw_devlink to take care of ordering, 
the main reason for this weirdness is largely gone, so I'm keen to start 
getting rid of the divergence again as far as possible. Fundamentally, 
IOMMU drivers are supposed to be aware of all devices which the kernel 
knows about, regardless of whether they have a driver available or not.

The second issue is that when we have multiple IOMMU instances, the 
initial bus_set_iommu() "replay" is only useful for the first instance, 
so devices managed by other instances which aren't up and running yet 
will be glossed over. Currently this ends up being papered over by the 
solution to the first point on DT systems, while the x86 drivers hide 
their individual IOMMU units behind a single IOMMU API "instance", so 
it's been having little impact in practice. However, improving the core 
API's multi-instance support is an increasingly pressing issue now that 
new more varied systems are showing up, and it's that which is really 
going to be driving the aforementioned changes. FWIW the plan I 
currently have is to hang things off iommu_device_register() instead.

>> FWIW we already have a go-faster knob for people who want to tweak the
>> security/performance compromise for specific devices, namely the sysfs
>> interface for changing a group's domain type before binding the relevant
>> driver(s). Is that something you could use in your application, say from
>> an initramfs script?
> 
> We've never had an initramfs script in Chrome OS. I don't know all the
> history of why (I'm trying to check), but I'm nearly certain it was a
> conscious decision. Probably it has to do with the fact that we're not
> trying to build a generic distribution where a single boot source can
> boot a huge variety of hardware. We generally have one kernel for a
> class of devices. I believe avoiding the initramfs just keeps things
> simpler.
> 
> I think trying to revamp Chrome OS to switch to an initramfs type
> system would be a pretty big undertaking since (as I understand it)
> you can't just run a little command and then return to the normal boot
> flow. Once you switch to initramfs you're committing to finding /
> setting up the rootfs yourself and on Chrome OS I believe that means a
> whole bunch of dm-verity work.
> 
> 
> ...so probably the initramfs is a no-go for me, but I'm still crossing
> my fingers that the pre_probe() might be legit if you take a second
> look at it?

That's fair enough - TBH the current sysfs interface is a pretty 
specialist sport primarily for datacentre folks who can afford to take 
down their 40GBE NIC or whatever momentarily for a longer-term payoff, 
but it was worth exploring - I'm assuming the SDHCI holds your root 
filesystem so you wouldn't be able to do the unbinding dance from real 
userspace. As I said, the idea of embedding any sort of data in 
individual client drivers is a non-starter in general since it only has 
any hope of working on DT platforms (maybe arm64 ACPI too?), and only 
for very much the wrong reasons.

If this is something primarily demanded by QCom platforms in the short 
term, I'm tempted to say just try it with more device-matching magic in 
arm-smmu-qcom. Otherwise, the idea of growing the sysfs interface to 
allow switching a DMA domain from default-strict to non-strict is 
certainly an interesting prospect. Going a step beyond that to bring up 
a flush queue 'live' without rebuilding the whole group and domain could 
get ugly when it comes to drivers' interaction with io-pgtable, but I 
think it might be *technically* feasible...

Robin.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 20:10         ` Doug Anderson
@ 2021-06-23 13:54           ` Rob Herring
  0 siblings, 0 replies; 34+ messages in thread
From: Rob Herring @ 2021-06-23 13:54 UTC (permalink / raw)
  To: Doug Anderson
  Cc: Saravana Kannan, Robin Murphy, Greg Kroah-Hartman,
	Rafael J. Wysocki, Rafael J. Wysocki, Will Deacon, Joerg Roedel,
	Bjorn Andersson, Ulf Hansson, Adrian Hunter, Bjorn Helgaas,
	Rob Clark, linux-arm-msm, PCI, quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Joel Fernandes, Andy Gross,
	Bartosz Golaszewski, Dan Williams, Geert Uytterhoeven,
	Heikki Krogerus, Randy Dunlap, LKML

On Tue, Jun 22, 2021 at 2:10 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Tue, Jun 22, 2021 at 1:06 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Tue, Jun 22, 2021 at 1:02 PM Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Tue, Jun 22, 2021 at 09:06:02AM -0700, Doug Anderson wrote:
> > > > Hi,
> > > >
> > > > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> > > > >
> > > > > Hi Doug,
> > > > >
> > > > > On 2021-06-22 00:52, Douglas Anderson wrote:
> > > > > >
> > > > > > This patch attempts to put forward a proposal for enabling non-strict
> > > > > > DMA on a device-by-device basis. The patch series requests non-strict
> > > > > > DMA for the Qualcomm SDHCI controller as a first device to enable,
> > > > > > getting a nice bump in performance with what's believed to be a very
> > > > > > small drop in security / safety (see the patch for the full argument).
> > > > > >
> > > > > > As part of this patch series I am end up slightly cleaning up some of
> > > > > > the interactions between the PCI subsystem and the IOMMU subsystem but
> > > > > > I don't go all the way to fully remove all the tentacles. Specifically
> > > > > > this patch series only concerns itself with a single aspect: strict
> > > > > > vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> > > > > > to talk about / reason about for more subsystems compared to overall
> > > > > > deciding what it means for a device to be "external" or "untrusted".
> > > > > >
> > > > > > If something like this patch series ends up being landable, it will
> > > > > > undoubtedly need coordination between many maintainers to land. I
> > > > > > believe it's fully bisectable but later patches in the series
> > > > > > definitely depend on earlier ones. Sorry for the long CC list. :(
> > > > >
> > > > > Unfortunately, this doesn't work. In normal operation, the default
> > > > > domains should be established long before individual drivers are even
> > > > > loaded (if they are modules), let alone anywhere near probing. The fact
> > > > > that iommu_probe_device() sometimes gets called far too late off the
> > > > > back of driver probe is an unfortunate artefact of the original
> > > > > probe-deferral scheme, and causes other problems like potentially
> > > > > malformed groups - I've been forming a plan to fix that for a while now,
> > > > > so I for one really can't condone anything trying to rely on it.
> > > > > Non-deterministic behaviour based on driver probe order for multi-device
> > > > > groups is part of the existing problem, and your proposal seems equally
> > > > > vulnerable to that too.
> > > >
> > > > Doh! :( I definitely can't say I understand the iommu subsystem
> > > > amazingly well. It was working for me, but I could believe that I was
> > > > somehow violating a rule somewhere.
> > > >
> > > > I'm having a bit of a hard time understanding where the problem is
> > > > though. Is there any chance that you missed the part of my series
> > > > where I introduced a "pre_probe" step? Specifically, I see this:
> > > >
> > > > * really_probe() is called w/ a driver and a device.
> > > > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > > > * -> eventually calls iommu_probe_device() w/ the device.
> > > > * -> calls iommu_alloc_default_domain() w/ the device
> > > > * -> calls iommu_group_alloc_default_domain()
> > > > * -> always allocates a new domain
> > > >
> > > > ...so we always have a "struct device" when a domain is allocated if
> > > > that domain is going to be associated with a device.
> > > >
> > > > I will agree that iommu_probe_device() is called before the driver
> > > > probe, but unless I missed something it's after the device driver is
> > > > loaded.  ...and assuming something like patch #1 in this series looks
> > > > OK then iommu_probe_device() will be called after "pre_probe".
> > > >
> > > > So assuming I'm not missing something, I'm not actually relying the
> > > > IOMMU getting init off the back of driver probe.
> > > >
> > > >
> > > > > FWIW we already have a go-faster knob for people who want to tweak the
> > > > > security/performance compromise for specific devices, namely the sysfs
> > > > > interface for changing a group's domain type before binding the relevant
> > > > > driver(s). Is that something you could use in your application, say from
> > > > > an initramfs script?
> > > >
> > > > We've never had an initramfs script in Chrome OS. I don't know all the
> > > > history of why (I'm trying to check), but I'm nearly certain it was a
> > > > conscious decision. Probably it has to do with the fact that we're not
> > > > trying to build a generic distribution where a single boot source can
> > > > boot a huge variety of hardware. We generally have one kernel for a
> > > > class of devices. I believe avoiding the initramfs just keeps things
> > > > simpler.
> > > >
> > > > I think trying to revamp Chrome OS to switch to an initramfs type
> > > > system would be a pretty big undertaking since (as I understand it)
> > > > you can't just run a little command and then return to the normal boot
> > > > flow. Once you switch to initramfs you're committing to finding /
> > > > setting up the rootfs yourself and on Chrome OS I believe that means a
> > > > whole bunch of dm-verity work.
> > > >
> > > >
> > > > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > > > my fingers that the pre_probe() might be legit if you take a second
> > > > look at it?
> > >
> > > Couldn't you have a driver flag that has the same effect as twiddling
> > > sysfs? At the being of probe, check the flag and go set the underlying
> > > sysfs setting in the device.
> >
> > My understanding of what Robin is saying is that we'd need this info
> > well before the driver is even available. The pre_probe() is
> > effectively doing the same thing you are suggesting.
>
> Right, I was just about to respond with the same. ;-) So overall right
> now we're blocked waiting for someone to point out the error in my
> logic. ;-)

Okay, I don't see how sysfs would work in that case either. You can't
assume the driver is not available until after sysfs. But I'll defer
to others...

> > > Though you may want this to be per device, not per driver. To do that
> > > early, I think you'd need a DT property. I wouldn't be totally opposed
> > > to that and I appreciate you not starting there. :)
> >
> > Which is what I'm suggest elsewhere in the thread:
> >
> > https://lore.kernel.org/lkml/CAGETcx83qCZF5JN5cqXxdSFiEgfc4jYESJg-RepL2wJXJv0Eww@mail.gmail.com/
>
> Rob: I'd be happy if you wanted to comment on that thread. If you say
> that it's fine to add a generic device tree property to control
> strictness then I'm more than happy to add support for it. I've been
> going on the theory that you'd NAK such a property but I'm totally
> good with being wrong. ;-)
>
> I'd be more than happy if you could suggest what you'd envision such a
> property to be named.

You want me to do the hard part? ;)

Would this work as a flag in iommus cell (either another cell or bit
in the existing cell)?

You could go the compatible match list route as well. At least until
you work out the kernel implementation.

Rob

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-22 22:10     ` Robin Murphy
@ 2021-06-23 17:29       ` Doug Anderson
  2021-06-24 17:23         ` Doug Anderson
  0 siblings, 1 reply; 34+ messages in thread
From: Doug Anderson @ 2021-06-23 17:29 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Joerg Roedel, Bjorn Andersson, Ulf Hansson,
	Adrian Hunter, Bjorn Helgaas, Rob Clark, linux-arm-msm,
	linux-pci, quic_c_gdjako, list@263.net:IOMMU DRIVERS, Sonny Rao,
	Sai Prakash Ranjan, Linux MMC List, Veerabhadrarao Badiganti,
	Rajat Jain, Saravana Kannan, Joel Fernandes, Andy Gross,
	Bartosz Golaszewski, Dan Williams, Geert Uytterhoeven,
	Heikki Krogerus, Randy Dunlap, LKML

Hi,

On Tue, Jun 22, 2021 at 3:10 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-06-22 17:06, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Jun 22, 2021 at 4:35 AM Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> Hi Doug,
> >>
> >> On 2021-06-22 00:52, Douglas Anderson wrote:
> >>>
> >>> This patch attempts to put forward a proposal for enabling non-strict
> >>> DMA on a device-by-device basis. The patch series requests non-strict
> >>> DMA for the Qualcomm SDHCI controller as a first device to enable,
> >>> getting a nice bump in performance with what's believed to be a very
> >>> small drop in security / safety (see the patch for the full argument).
> >>>
> >>> As part of this patch series I am end up slightly cleaning up some of
> >>> the interactions between the PCI subsystem and the IOMMU subsystem but
> >>> I don't go all the way to fully remove all the tentacles. Specifically
> >>> this patch series only concerns itself with a single aspect: strict
> >>> vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
> >>> to talk about / reason about for more subsystems compared to overall
> >>> deciding what it means for a device to be "external" or "untrusted".
> >>>
> >>> If something like this patch series ends up being landable, it will
> >>> undoubtedly need coordination between many maintainers to land. I
> >>> believe it's fully bisectable but later patches in the series
> >>> definitely depend on earlier ones. Sorry for the long CC list. :(
> >>
> >> Unfortunately, this doesn't work. In normal operation, the default
> >> domains should be established long before individual drivers are even
> >> loaded (if they are modules), let alone anywhere near probing. The fact
> >> that iommu_probe_device() sometimes gets called far too late off the
> >> back of driver probe is an unfortunate artefact of the original
> >> probe-deferral scheme, and causes other problems like potentially
> >> malformed groups - I've been forming a plan to fix that for a while now,
> >> so I for one really can't condone anything trying to rely on it.
> >> Non-deterministic behaviour based on driver probe order for multi-device
> >> groups is part of the existing problem, and your proposal seems equally
> >> vulnerable to that too.
> >
> > Doh! :( I definitely can't say I understand the iommu subsystem
> > amazingly well. It was working for me, but I could believe that I was
> > somehow violating a rule somewhere.
> >
> > I'm having a bit of a hard time understanding where the problem is
> > though. Is there any chance that you missed the part of my series
> > where I introduced a "pre_probe" step? Specifically, I see this:
> >
> > * really_probe() is called w/ a driver and a device.
> > * -> calls dev->bus->dma_configure() w/ a "struct device *"
> > * -> eventually calls iommu_probe_device() w/ the device.
>
> This...
>
> > * -> calls iommu_alloc_default_domain() w/ the device
> > * -> calls iommu_group_alloc_default_domain()
> > * -> always allocates a new domain
> >
> > ...so we always have a "struct device" when a domain is allocated if
> > that domain is going to be associated with a device.
> >
> > I will agree that iommu_probe_device() is called before the driver
> > probe, but unless I missed something it's after the device driver is
> > loaded.  ...and assuming something like patch #1 in this series looks
> > OK then iommu_probe_device() will be called after "pre_probe".
> >
> > So assuming I'm not missing something, I'm not actually relying the
> > IOMMU getting init off the back of driver probe.
>
> ...is implicitly that. Sorry that it's not obvious.
>
> The "proper" flow is that iommu_probe_device() is called for everything
> which already exists during the IOMMU driver's own probe when it calls
> bus_set_iommu(), then at BUS_NOTIFY_ADD_DEVICE time for everything which
> appears subsequently. The only trouble is, to observe it in action on a
> DT-based system you'd currently have to go back at least 4 years, before
> 09515ef5ddad...
>
> Basically there were two issues: firstly we need the of_xlate step
> before add_device (now probe_device) for a DT-based IOMMU driver to know
> whether it should care about the given device or not. When -EPROBE_DEFER
> was the only tool we had to ensure probe ordering, and resolving the
> "iommus" DT property the only place to decide that, delaying it all
> until driver probe time was the only reasonable option, however ugly.
> The iommu_probe_device() "replay" in {of,acpi}_iommu_configure() is
> merely doing its best to fake up the previous behaviour. Try binding a
> dummy driver to your device first, then unbind it and bind the real one,
> and you'll see that iommu_probe_device() doesn't run the second or
> subsequent times. Now that we have fw_devlink to take care of ordering,

I probably haven't been following fw_devlink as closely as I should,
but does it completely replace -EPROBE_DEFER? From what I can tell
right now it helps optimize the boot ordering but it's not mandatory
(AKA Linux should boot/work fine with fw_devlink=off)?


> the main reason for this weirdness is largely gone, so I'm keen to start
> getting rid of the divergence again as far as possible. Fundamentally,
> IOMMU drivers are supposed to be aware of all devices which the kernel
> knows about, regardless of whether they have a driver available or not.
>
> The second issue is that when we have multiple IOMMU instances, the
> initial bus_set_iommu() "replay" is only useful for the first instance,
> so devices managed by other instances which aren't up and running yet
> will be glossed over. Currently this ends up being papered over by the
> solution to the first point on DT systems, while the x86 drivers hide
> their individual IOMMU units behind a single IOMMU API "instance", so
> it's been having little impact in practice. However, improving the core
> API's multi-instance support is an increasingly pressing issue now that
> new more varied systems are showing up, and it's that which is really
> going to be driving the aforementioned changes. FWIW the plan I
> currently have is to hang things off iommu_device_register() instead.

I tried to follow all the above the best I can and hopefully
understood some of it. ;-)

OK, so what do you think about these points?

* I think you are arguing that IOMMU strictness requirements needs to
be known regardless of whether we have a driver for a given device or
not. So the whole pre_probe stuff is just not right.

* One thing that's been proposed is putting strictness info in the
device tree. In theory on device-tree systems you could still put
strictness info in the per-device nodes assuming that all devices are
actually described in the device tree. That sounds nice, but I can
believe it would be pretty hard to parse backward like this. AKA when
the IOMMU probes it would need to find all the device tree nodes that
would end up grouped together parse their details to find out if they
should be non-strict.

* Instead of putting the details in per-device nodes, maybe it makes
sense to accept that we should be specifying these things at the IOMMU
level? If specifying things at the device tree level then the
device-tree node of the IOMMU itself would just have a list of things
that should be strict/non-strict. ...this could potentially be merged
with a hardcoded list of things in the IOMMU driver based on the IOMMU
compatible string.

Do those sound right?

I still haven't totally grokked the ideal way to identify devices. I
guess on Qualcomm systems each device is in its own group and so could
have its own strictness levels? ...or would it be better to list by
"stream ID" or something like that?

If we do something like this then maybe that's a solution that could
land short-ish term? We would know right at init time whether a given
domain should be strict or non-strict and there'd be no requirements
to transition it.


> >> FWIW we already have a go-faster knob for people who want to tweak the
> >> security/performance compromise for specific devices, namely the sysfs
> >> interface for changing a group's domain type before binding the relevant
> >> driver(s). Is that something you could use in your application, say from
> >> an initramfs script?
> >
> > We've never had an initramfs script in Chrome OS. I don't know all the
> > history of why (I'm trying to check), but I'm nearly certain it was a
> > conscious decision. Probably it has to do with the fact that we're not
> > trying to build a generic distribution where a single boot source can
> > boot a huge variety of hardware. We generally have one kernel for a
> > class of devices. I believe avoiding the initramfs just keeps things
> > simpler.
> >
> > I think trying to revamp Chrome OS to switch to an initramfs type
> > system would be a pretty big undertaking since (as I understand it)
> > you can't just run a little command and then return to the normal boot
> > flow. Once you switch to initramfs you're committing to finding /
> > setting up the rootfs yourself and on Chrome OS I believe that means a
> > whole bunch of dm-verity work.
> >
> >
> > ...so probably the initramfs is a no-go for me, but I'm still crossing
> > my fingers that the pre_probe() might be legit if you take a second
> > look at it?
>
> That's fair enough - TBH the current sysfs interface is a pretty
> specialist sport primarily for datacentre folks who can afford to take
> down their 40GBE NIC or whatever momentarily for a longer-term payoff,
> but it was worth exploring - I'm assuming the SDHCI holds your root
> filesystem so you wouldn't be able to do the unbinding dance from real
> userspace. As I said, the idea of embedding any sort of data in
> individual client drivers is a non-starter in general since it only has
> any hope of working on DT platforms (maybe arm64 ACPI too?), and only
> for very much the wrong reasons.
>
> If this is something primarily demanded by QCom platforms in the short
> term,

At the moment I'm mostly focused on QCom platforms, but that's mostly
because they're the first board that I've dealt with that has iommus
on pretty much every peripheral under the sun. That's a _good_ thing,
but it also means it's where we're hitting all these performance
issues. I believe that the usage of "iommus everywhere" is about to
become a lot more widespread across the SoC ecosystem, especially as
they are important for virtualization and that seems to be a hot-topic
these days.

Basically: I could land a short-term hack a solution for me (and
probably this is the right thing for me to do?), but I'm very much
interested in finding a better solution, ideally something that we
could achieve in something less than a year? Am I dreaming?


> I'm tempted to say just try it with more device-matching magic in
> arm-smmu-qcom.

Yeah, I had this working (at least as far as my testing went)
downstream in the chromeos-5.4 tree. I basically just implemented
"init_context" for the arm-smmu-qcom driver and then if the "struct
device *" passed in matched the SDMMC compatible string I just ORed in
"IO_PGTABLE_QUIRK_NON_STRICT". That actually seemed to work fine in
the downstream kernel tree but not upstream. I believe it broke in
commit a250c23f15c2 ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE")
because the flush queue stopped getting initted.

I think I could revamp my patch series to continue to have the
strictness attributes in the "struct iommu_domain" but just get rid of
the bits in the "struct device" and obviously get rid of the
"pre_probe" patch. Would something like that be acceptable?


> Otherwise, the idea of growing the sysfs interface to
> allow switching a DMA domain from default-strict to non-strict is
> certainly an interesting prospect. Going a step beyond that to bring up
> a flush queue 'live' without rebuilding the whole group and domain could
> get ugly when it comes to drivers' interaction with io-pgtable, but I
> think it might be *technically* feasible...

It seems like it would be nice, but maybe with the above we could get
by without having to do this?

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers
  2021-06-21 23:52 ` [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers Douglas Anderson
@ 2021-06-24 13:35   ` Greg KH
  0 siblings, 0 replies; 34+ messages in thread
From: Greg KH @ 2021-06-24 13:35 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas, robdclark,
	linux-arm-msm, linux-pci, quic_c_gdjako, iommu, sonnyrao,
	saiprakash.ranjan, linux-mmc, vbadigan, rajatja, saravanak, joel,
	Geert Uytterhoeven, linux-kernel

On Mon, Jun 21, 2021 at 04:52:43PM -0700, Douglas Anderson wrote:
> Right now things are a bit awkward if a driver would like a chance to
> run before some of the more "automatic" things (pinctrl, DMA, IOMMUs,
> ...) happen to a device. This patch aims to fix that problem by
> introducing the concept of a "pre_probe" function that drivers can
> implement to run before the "automatic" stuff.
> 
> Why would you want to run before the "automatic" stuff? The incentive
> in my case is that I want to be able to fill in some boolean flags in
> the "struct device" before the IOMMU init runs. It appears that the
> strictness vs. non-strictness of a device's iommu config is determined
> once at init time and can't be changed afterwards. However, I would
> like to avoid hardcoding the rules for strictness in the IOMMU
> driver. Instead I'd like to let individual drivers be able to make
> informed decisions about the appropriateness of strictness
> vs. non-strictness.
> 
> The desire for running code pre_probe is likely not limited to my use
> case. I believe that the list "qcom_smmu_client_of_match" is hacked
> into the iommu driver specifically because there was no real good
> framework for this. For the existing list it wasn't _quite_ as ugly as
> my needs since the decision could be made solely on compatible string,
> but it still feels like it would have been better for individual
> drivers to run code and setup some state rather than coding up a big
> list in the IOMMU driver.
> 
> Even without this patch, I believe it is possible for a driver to run
> before the "automatic" things by registering for
> "BUS_NOTIFY_BIND_DRIVER" in its init call, though I haven't personally
> tested this. Using the notifier is a bit awkward, though, and I'd
> rather avoid it. Also, using "BUS_NOTIFY_BIND_DRIVER" would require
> drivers to stop using the convenience module_platform_driver() helper
> and roll a bunch of boilerplate code.
> 
> NOTE: the pre_probe here is listed in the driver structure. As a side
> effect of this it will be passed a "struct device *" rather than the
> more specific device type (like the "struct platform_device *" that
> most platform devices get passed to their probe). Presumably this
> won't cause trouble and it's a lot less code to write but if we need
> to make it more symmetric that's also possible by touching more files.

No, please please no.

If a bus really wants to do crud like this, it can do it in it's own
probe callback, the driver core doesn't need to mess with this.

If you need to mess with iommu values in struct device, again, do that
in the bus core for the devices on that specific bus, that's where those
values are supposed to be set anyway, right?

If the iommu drivers need to be run before a specific bus is
initialized, then fix that there, the driver core does not need to care
about this at all.

so a big NACK on this one, sorry.

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness
  2021-06-21 23:52 ` [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness Douglas Anderson
@ 2021-06-24 13:36   ` Greg KH
  2021-06-24 13:42     ` Doug Anderson
  0 siblings, 1 reply; 34+ messages in thread
From: Greg KH @ 2021-06-24 13:36 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas, robdclark,
	linux-arm-msm, linux-pci, quic_c_gdjako, iommu, sonnyrao,
	saiprakash.ranjan, linux-mmc, vbadigan, rajatja, saravanak, joel,
	Bartosz Golaszewski, Dan Williams, Heikki Krogerus, Randy Dunlap,
	linux-kernel

On Mon, Jun 21, 2021 at 04:52:44PM -0700, Douglas Anderson wrote:
> How to control the "strictness" of an IOMMU is a bit of a mess right
> now. As far as I can tell, right now:
> * You can set the default to "non-strict" and some devices (right now,
>   only PCI devices) can request to run in "strict" mode.
> * You can set the default to "strict" and no devices in the system are
>   allowed to run as "non-strict".
> 
> I believe this needs to be improved a bit. Specifically:
> * We should be able to default to "strict" mode but let devices that
>   claim to be fairly low risk request that they be run in "non-strict"
>   mode.
> * We should allow devices outside of PCIe to request "strict" mode if
>   the system default is "non-strict".
> 
> I believe the correct way to do this is two bits in "struct
> device". One allows a device to force things to "strict" mode and the
> other allows a device to _request_ "non-strict" mode. The asymmetry
> here is on purpose. Generally if anything in the system makes a
> request for strictness of something then we want it strict. Thus
> drivers can only request (but not force) non-strictness.
> 
> It's expected that the strictness fields can be filled in by the bus
> code like in the patch ("PCI: Indicate that we want to force strict
> DMA for untrusted devices") or by using the new pre_probe concept
> introduced in the patch ("drivers: base: Add the concept of
> "pre_probe" to drivers").
> 
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> 
>  include/linux/device.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index f1a00040fa53..c1b985e10c47 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -449,6 +449,15 @@ struct dev_links_info {
>   *		and optionall (if the coherent mask is large enough) also
>   *		for dma allocations.  This flag is managed by the dma ops
>   *		instance from ->dma_supported.
> + * @force_strict_iommu: If set to %true then we should force this device to
> + *			iommu.strict regardless of the other defaults in the
> + *			system. Only has an effect if an IOMMU is in place.

Why would you ever NOT want to do this?

> + * @request_non_strict_iommu: If set to %true and there are no other known
> + *			      reasons to make the iommu.strict for this device,
> + *			      then default to non-strict mode. This implies
> + *			      some belief that the DMA master for this device
> + *			      won't abuse the DMA path to compromise the kernel.
> + *			      Only has an effect if an IOMMU is in place.

This feels in contrast to the previous field you just added, how do they
both interact?  Would an enum work better?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices
  2021-06-21 23:52 ` [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices Douglas Anderson
@ 2021-06-24 13:38   ` Greg KH
  2021-06-24 13:46     ` Doug Anderson
  0 siblings, 1 reply; 34+ messages in thread
From: Greg KH @ 2021-06-24 13:38 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas, robdclark,
	linux-arm-msm, linux-pci, quic_c_gdjako, iommu, sonnyrao,
	saiprakash.ranjan, linux-mmc, vbadigan, rajatja, saravanak, joel,
	linux-kernel

On Mon, Jun 21, 2021 at 04:52:45PM -0700, Douglas Anderson wrote:
> At the moment the generic IOMMU framework reaches into the PCIe device
> to check the "untrusted" state and uses this information to figure out
> if it should be running the IOMMU in strict or non-strict mode. Let's
> instead set the new boolean in "struct device" to indicate when we
> want forced strictness.
> 
> NOTE: we still continue to set the "untrusted" bit in PCIe since that
> apparently is used for more than just IOMMU strictness. It probably
> makes sense for a later patchset to clarify all of the other needs we
> have for "untrusted" PCIe devices (perhaps add more booleans into the
> "struct device") so we can fully eliminate the need for the IOMMU
> framework to reach into a PCIe device.

It feels like the iommu code should not be messing with pci devices at
all, please don't do this.

Why does this matter?  Why wouldn't a pci device use "strict" iommu at
all times?  What happens if it does not?  Why are PCI devices special?

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness
  2021-06-24 13:36   ` Greg KH
@ 2021-06-24 13:42     ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-24 13:42 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Will Deacon, Robin Murphy,
	Joerg Roedel, Bjorn Andersson, Ulf Hansson, Adrian Hunter,
	Bjorn Helgaas, Rob Clark, linux-arm-msm, linux-pci,
	quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Saravana Kannan,
	Joel Fernandes, Bartosz Golaszewski, Dan Williams,
	Heikki Krogerus, Randy Dunlap, LKML

Hi,

On Thu, Jun 24, 2021 at 6:37 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jun 21, 2021 at 04:52:44PM -0700, Douglas Anderson wrote:
> > How to control the "strictness" of an IOMMU is a bit of a mess right
> > now. As far as I can tell, right now:
> > * You can set the default to "non-strict" and some devices (right now,
> >   only PCI devices) can request to run in "strict" mode.
> > * You can set the default to "strict" and no devices in the system are
> >   allowed to run as "non-strict".
> >
> > I believe this needs to be improved a bit. Specifically:
> > * We should be able to default to "strict" mode but let devices that
> >   claim to be fairly low risk request that they be run in "non-strict"
> >   mode.
> > * We should allow devices outside of PCIe to request "strict" mode if
> >   the system default is "non-strict".
> >
> > I believe the correct way to do this is two bits in "struct
> > device". One allows a device to force things to "strict" mode and the
> > other allows a device to _request_ "non-strict" mode. The asymmetry
> > here is on purpose. Generally if anything in the system makes a
> > request for strictness of something then we want it strict. Thus
> > drivers can only request (but not force) non-strictness.
> >
> > It's expected that the strictness fields can be filled in by the bus
> > code like in the patch ("PCI: Indicate that we want to force strict
> > DMA for untrusted devices") or by using the new pre_probe concept
> > introduced in the patch ("drivers: base: Add the concept of
> > "pre_probe" to drivers").
> >
> > Signed-off-by: Douglas Anderson <dianders@chromium.org>
> > ---
> >
> >  include/linux/device.h | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index f1a00040fa53..c1b985e10c47 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -449,6 +449,15 @@ struct dev_links_info {
> >   *           and optionall (if the coherent mask is large enough) also
> >   *           for dma allocations.  This flag is managed by the dma ops
> >   *           instance from ->dma_supported.
> > + * @force_strict_iommu: If set to %true then we should force this device to
> > + *                   iommu.strict regardless of the other defaults in the
> > + *                   system. Only has an effect if an IOMMU is in place.
>
> Why would you ever NOT want to do this?
>
> > + * @request_non_strict_iommu: If set to %true and there are no other known
> > + *                         reasons to make the iommu.strict for this device,
> > + *                         then default to non-strict mode. This implies
> > + *                         some belief that the DMA master for this device
> > + *                         won't abuse the DMA path to compromise the kernel.
> > + *                         Only has an effect if an IOMMU is in place.
>
> This feels in contrast to the previous field you just added, how do they
> both interact?  Would an enum work better?

Right that it never makes sense to set both bits so an enum could work
better, essentially it would be { dont_care, request_non_strict,
force_strict }.

In any case the whole idea of doing this at the device level looks
like it's not the right way to go anyway, so this patch and the
previous pre_probe one are essentially abandoned and I need to send
out a v2 with some different approaches.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode
  2021-06-21 23:52 ` [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode Douglas Anderson
@ 2021-06-24 13:43   ` Greg KH
  2021-06-24 14:00     ` Doug Anderson
  0 siblings, 1 reply; 34+ messages in thread
From: Greg KH @ 2021-06-24 13:43 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: rafael, rafael.j.wysocki, will, robin.murphy, joro,
	bjorn.andersson, ulf.hansson, adrian.hunter, bhelgaas, robdclark,
	linux-arm-msm, linux-pci, quic_c_gdjako, iommu, sonnyrao,
	saiprakash.ranjan, linux-mmc, vbadigan, rajatja, saravanak, joel,
	Andy Gross, linux-kernel

On Mon, Jun 21, 2021 at 04:52:48PM -0700, Douglas Anderson wrote:
> IOMMUs can be run in "strict" mode or in "non-strict" mode. The
> quick-summary difference between the two is that in "strict" mode we
> wait until everything is flushed out when we unmap DMA memory. In
> "non-strict" we don't.
> 
> Using the IOMMU in "strict" mode is more secure/safer but slower
> because we have to sit and wait for flushes while we're unmapping. To
> explain a bit why "non-strict" mode is unsafe, let's imagine two
> examples.
> 
> An example of "non-strict" being insecure when reading from a device:
> a) Linux driver maps memory for DMA.
> b) Linux driver starts DMA on the device.
> c) Device write to RAM subject to bounds checking done by IOMMU.
> d) Device finishes writing to RAM and signals transfer is finished.
> e) Linux driver starts unmapping DMA memory but doesn't flush.

Why doesn't it "flush"?

> f) Linux driver validates that the data in memory looks sane and that
>    accessing it won't cause the driver to, for instance, overflow a
>    buffer.
> g) Device takes advantage of knowledge of how the Linux driver works
>    and sneaks in a modification to the data after the validation but
>    before the IOMMU unmap flush finishes.
> h) Device has now caused the Linux driver to access memory it
>    shouldn't.

So you are now saying we need to not trust hardware?  If so, that's a
massive switch for how the kernel needs to work right?

And what driver does f) and allows g) to happen?  That would be a normal
bug anyway, why not just fix the driver?

> An example of "non-strict" being insecure when writing to a device:
> a) Linux driver writes data intended for the device to RAM.
> b) Linux driver maps memory for DMA.
> c) Linux driver starts DMA on the device.
> d) Device reads from RAM subject to bounds checking done by IOMMU.
> e) Device finishes reading from RAM and signals transfer is finished.
> f) Linux driver starts unmapping DMA memory but doesn't flush.

Why does it not flush?

What do you mean by "flush"

> g) Linux driver frees memory and returns it to the pool.

What pool?

> h) Memory is allocated for another purpose.

Allocated by what?

We have memory allocators that write over the data when freed, why not
just use this if you are concerned about this type of issue?

> i) Device takes advantage of the period of time before IOMMU flush to
>    read memory that it shouldn't have had access to.

What memory would that be?

And if you really care about these issues, are you not able to take the
"hit" for the flush all the time as that is a hardware thing, not a
software thing.  Why not just always take advantage of that, no driver
changes needed?

And this isn't going to work, again, because the "pre_probe" isn't going
to be acceptable, sorry.

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices
  2021-06-24 13:38   ` Greg KH
@ 2021-06-24 13:46     ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-24 13:46 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Will Deacon, Robin Murphy,
	Joerg Roedel, Bjorn Andersson, Ulf Hansson, Adrian Hunter,
	Bjorn Helgaas, Rob Clark, linux-arm-msm, linux-pci,
	quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Saravana Kannan,
	Joel Fernandes, LKML

Hi,

On Thu, Jun 24, 2021 at 6:38 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jun 21, 2021 at 04:52:45PM -0700, Douglas Anderson wrote:
> > At the moment the generic IOMMU framework reaches into the PCIe device
> > to check the "untrusted" state and uses this information to figure out
> > if it should be running the IOMMU in strict or non-strict mode. Let's
> > instead set the new boolean in "struct device" to indicate when we
> > want forced strictness.
> >
> > NOTE: we still continue to set the "untrusted" bit in PCIe since that
> > apparently is used for more than just IOMMU strictness. It probably
> > makes sense for a later patchset to clarify all of the other needs we
> > have for "untrusted" PCIe devices (perhaps add more booleans into the
> > "struct device") so we can fully eliminate the need for the IOMMU
> > framework to reach into a PCIe device.
>
> It feels like the iommu code should not be messing with pci devices at
> all, please don't do this.

I think it's generally agreed that having the IOMMU code reach into
the PCIe code is pretty non-ideal, but that's not something that my
patch series added. The IOMMU code already has special cases to reach
into PCIe devices to decide strictness. I was actually trying to
reduce the amount of it.

> Why does this matter?  Why wouldn't a pci device use "strict" iommu at
> all times?  What happens if it does not?  Why are PCI devices special?

This is something pre-existing in Linux. In my patch series I was
trying to make PCI devices less special and take some of the concepts
from there and expand them, but in a cleaner way. It sounds like in my
v2 I should steer away from this and leave the existing PCI hacks
alone.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode
  2021-06-24 13:43   ` Greg KH
@ 2021-06-24 14:00     ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-24 14:00 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Will Deacon, Robin Murphy,
	Joerg Roedel, Bjorn Andersson, Ulf Hansson, Adrian Hunter,
	Bjorn Helgaas, Rob Clark, linux-arm-msm, linux-pci,
	quic_c_gdjako,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Sonny Rao, Sai Prakash Ranjan, Linux MMC List,
	Veerabhadrarao Badiganti, Rajat Jain, Saravana Kannan,
	Joel Fernandes, Andy Gross, LKML

Hi,

On Thu, Jun 24, 2021 at 6:43 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jun 21, 2021 at 04:52:48PM -0700, Douglas Anderson wrote:
> > IOMMUs can be run in "strict" mode or in "non-strict" mode. The
> > quick-summary difference between the two is that in "strict" mode we
> > wait until everything is flushed out when we unmap DMA memory. In
> > "non-strict" we don't.
> >
> > Using the IOMMU in "strict" mode is more secure/safer but slower
> > because we have to sit and wait for flushes while we're unmapping. To
> > explain a bit why "non-strict" mode is unsafe, let's imagine two
> > examples.
> >
> > An example of "non-strict" being insecure when reading from a device:
> > a) Linux driver maps memory for DMA.
> > b) Linux driver starts DMA on the device.
> > c) Device write to RAM subject to bounds checking done by IOMMU.
> > d) Device finishes writing to RAM and signals transfer is finished.
> > e) Linux driver starts unmapping DMA memory but doesn't flush.
>
> Why doesn't it "flush"?

This is just how the pre-existing "iommu.strict=0" command line parameter works.


> > f) Linux driver validates that the data in memory looks sane and that
> >    accessing it won't cause the driver to, for instance, overflow a
> >    buffer.
> > g) Device takes advantage of knowledge of how the Linux driver works
> >    and sneaks in a modification to the data after the validation but
> >    before the IOMMU unmap flush finishes.
> > h) Device has now caused the Linux driver to access memory it
> >    shouldn't.
>
> So you are now saying we need to not trust hardware?  If so, that's a
> massive switch for how the kernel needs to work right?

This is a pre-existing concept in the kernel and is in fact so
prevalent that there are a bunch of inconsistent ways to configure it
(though it's being made better [1])

* On ARM64, default is strict and you can configure it with iommu.strict

* On AMD, default is non-strict and you can configure it with
amd_iommu=fullflush

* On Intel, default is non-strict and you can configure it with
intel_iommu=strict

...also pre-existing is that the kernel has special cases for
"external" PCI devices where it forces them to strict mode even if the
default is non-strict (like on Intel and AMD). I was pointed
specifically at <http://thunderclap.io/> for an example of why this
was important.


> And what driver does f) and allows g) to happen?  That would be a normal
> bug anyway, why not just fix the driver?

This one would be possible to workaround in the driver by copying the
memory somewhere else, but it violates the DMA model. Specifically
step "e)" above is supposed to mean that the driver is now in full
control of the memory, so it should be perfectly justified in assuming
that nobody else is scribbling on it.


> > An example of "non-strict" being insecure when writing to a device:
> > a) Linux driver writes data intended for the device to RAM.
> > b) Linux driver maps memory for DMA.
> > c) Linux driver starts DMA on the device.
> > d) Device reads from RAM subject to bounds checking done by IOMMU.
> > e) Device finishes reading from RAM and signals transfer is finished.
> > f) Linux driver starts unmapping DMA memory but doesn't flush.
>
> Why does it not flush?
>
> What do you mean by "flush"

"flush" means force / wait for the IOMMU unmap to fully take effect.


> > g) Linux driver frees memory and returns it to the pool.
>
> What pool?

The normal Linux memory pool.


> > h) Memory is allocated for another purpose.
>
> Allocated by what?

Someone else that wanted memory.


> We have memory allocators that write over the data when freed, why not
> just use this if you are concerned about this type of issue?
>
> > i) Device takes advantage of the period of time before IOMMU flush to
> >    read memory that it shouldn't have had access to.
>
> What memory would that be?

Depends on who got it. This could be hard to predict unless a
peripheral was trying to exploit a very specific version of Linux
where maybe it was predictable who got the memory next.


> And if you really care about these issues, are you not able to take the
> "hit" for the flush all the time as that is a hardware thing, not a
> software thing.  Why not just always take advantage of that, no driver
> changes needed?

The whole concept of strict vs. non-strict is definitely not new to my
series and I'm mostly just trying to configure it properly.


> And this isn't going to work, again, because the "pre_probe" isn't going
> to be acceptable, sorry.

Right. As discussed in the cover letter, I'm going to try to solve
this in other ways that doesn't involve pre_probe.

[1] https://lore.kernel.org/linux-iommu/1624016058-189713-1-git-send-email-john.garry@huawei.com/T/#m21bc07b9353b3ba85f2a40557645c2bcc13cbb3e

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
  2021-06-23 17:29       ` Doug Anderson
@ 2021-06-24 17:23         ` Doug Anderson
  0 siblings, 0 replies; 34+ messages in thread
From: Doug Anderson @ 2021-06-24 17:23 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rafael J. Wysocki,
	Will Deacon, Joerg Roedel, Bjorn Andersson, Ulf Hansson,
	Adrian Hunter, Bjorn Helgaas, Rob Clark, linux-arm-msm,
	linux-pci, quic_c_gdjako, list@263.net:IOMMU DRIVERS, Sonny Rao,
	Sai Prakash Ranjan, Linux MMC List, Veerabhadrarao Badiganti,
	Rajat Jain, Saravana Kannan, Joel Fernandes, Andy Gross,
	Bartosz Golaszewski, Dan Williams, Geert Uytterhoeven,
	Heikki Krogerus, Randy Dunlap, LKML

Hi,

On Wed, Jun 23, 2021 at 10:29 AM Doug Anderson <dianders@chromium.org> wrote:
>
> * Instead of putting the details in per-device nodes, maybe it makes
> sense to accept that we should be specifying these things at the IOMMU
> level? If specifying things at the device tree level then the
> device-tree node of the IOMMU itself would just have a list of things
> that should be strict/non-strict. ...this could potentially be merged
> with a hardcoded list of things in the IOMMU driver based on the IOMMU
> compatible string.
>
> Do those sound right?
>
> I still haven't totally grokked the ideal way to identify devices. I
> guess on Qualcomm systems each device is in its own group and so could
> have its own strictness levels? ...or would it be better to list by
> "stream ID" or something like that?
>
> If we do something like this then maybe that's a solution that could
> land short-ish term? We would know right at init time whether a given
> domain should be strict or non-strict and there'd be no requirements
> to transition it.

OK, so I have attempted to implement this in the Qualcomm IOMMU driver
in v2 of this series:

https://lore.kernel.org/r/20210624171759.4125094-1-dianders@chromium.org/

Hopefully that doesn't fragment the discussion too much, but it seemed
like it might help move us forward to see what this would look like in
code.

I'll also note that I removed a few people from the CC list on v2 of
the series because I'm no longer touching any code outside of the
IOMMU subsystem and I thought folks would appreciate less noise in
their inboxes. I've CCed a boatload of mailing lists though so it
should be easy to find. If I dropped you from the CC list of v2 and
you really want back on then I'm more than happy to re-add you.

-Doug

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-06-24 17:30 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-21 23:52 [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Douglas Anderson
2021-06-21 23:52 ` [PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers Douglas Anderson
2021-06-24 13:35   ` Greg KH
2021-06-21 23:52 ` [PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness Douglas Anderson
2021-06-24 13:36   ` Greg KH
2021-06-24 13:42     ` Doug Anderson
2021-06-21 23:52 ` [PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices Douglas Anderson
2021-06-24 13:38   ` Greg KH
2021-06-24 13:46     ` Doug Anderson
2021-06-21 23:52 ` [PATCH 4/6] iommu: Combine device strictness requests with the global default Douglas Anderson
2021-06-22  2:03   ` Lu Baolu
2021-06-22 16:53     ` Doug Anderson
2021-06-22 17:01       ` Doug Anderson
2021-06-22  2:55   ` Saravana Kannan
2021-06-22 16:40     ` Doug Anderson
2021-06-22 19:50       ` Saravana Kannan
2021-06-22 11:49   ` Robin Murphy
2021-06-22 18:45   ` Rajat Jain
2021-06-22 19:35     ` Doug Anderson
2021-06-21 23:52 ` [PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict Douglas Anderson
2021-06-21 23:52 ` [PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode Douglas Anderson
2021-06-24 13:43   ` Greg KH
2021-06-24 14:00     ` Doug Anderson
2021-06-22 11:35 ` [PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC Robin Murphy
2021-06-22 16:06   ` Doug Anderson
2021-06-22 20:02     ` Rob Herring
2021-06-22 20:05       ` Saravana Kannan
2021-06-22 20:10         ` Doug Anderson
2021-06-23 13:54           ` Rob Herring
2021-06-22 22:10     ` Robin Murphy
2021-06-23 17:29       ` Doug Anderson
2021-06-24 17:23         ` Doug Anderson
2021-06-22 17:39 ` John Garry
2021-06-22 19:50   ` Doug Anderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).