linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] RMRR related fixes and enhancements
@ 2019-05-16 10:08 Eric Auger
  2019-05-16 10:08 ` [PATCH v3 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Currently the Intel reserved region is attached to the
RMRR unit and when building the list of RMRR seen by a device
we link this unique reserved region without taking care of
potential multiple usage of this reserved region by several devices.

Also while reading the vtd spec it is unclear to me whether
the RMRR device scope referenced by an RMRR ACPI struct could
be a PCI-PCI bridge, in which case I think we also need to
check the device belongs to the PCI sub-hierarchy of the device
referenced in the scope. This would be true for device_has_rmrr()
and intel_iommu_get_resv_regions().

Last, the VFIO subsystem would need to compute the usable IOVA range
by querying the iommu_get_group_resv_regions() API. This would allow,
for instance, to report potential conflicts between the guest physical
address space and host reserved regions.
  
However iommu_get_group_resv_regions() currently fails to differentiate
RMRRs that are known safe for device assignment and RMRRs that must be
enforced. So we introduce a new reserved memory region type (relaxable),
reported when associated to an USB or GFX device. The last 2 patches aim
at unblocking [1] which is stuck since 4.18.

[1-5] are fixes
[6-7] are enhancements

The two parts can be considered separately if needed.

References:
[1] [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
    https://patchwork.kernel.org/patch/10425309/

Branch: This series is available at:
https://github.com/eauger/linux/tree/v5.1-rmrr-v3

History:

v2 -> v3:
s/||/&& in iommu_group_create_direct_mappings

v1 -> v2:
- introduce is_downstream_to_pci_bridge() in a separate patch, change param
  names and add kerneldoc comment
- add 6,7

Eric Auger (7):
  iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
  iommu/vt-d: Duplicate iommu_resv_region objects per device list
  iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
  iommu/vt-d: Handle RMRR with PCI bridge device scopes
  iommu/vt-d: Handle PCI bridge RMRR device scopes in
    intel_iommu_get_resv_regions
  iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  iommu/vt-d: Differentiate relaxable and non relaxable RMRRs

 drivers/acpi/arm64/iort.c   |  3 +-
 drivers/iommu/amd_iommu.c   |  7 ++--
 drivers/iommu/arm-smmu-v3.c |  2 +-
 drivers/iommu/arm-smmu.c    |  2 +-
 drivers/iommu/intel-iommu.c | 82 +++++++++++++++++++++++++------------
 drivers/iommu/iommu.c       | 19 +++++----
 include/linux/iommu.h       |  8 +++-
 7 files changed, 82 insertions(+), 41 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-16 10:08 ` [PATCH v3 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

We plan to call iommu_alloc_resv_region in a non preemptible section.
Pass a GFP flag to the function and update all the call sites.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/acpi/arm64/iort.c   | 3 ++-
 drivers/iommu/amd_iommu.c   | 7 ++++---
 drivers/iommu/arm-smmu-v3.c | 2 +-
 drivers/iommu/arm-smmu.c    | 2 +-
 drivers/iommu/intel-iommu.c | 4 ++--
 drivers/iommu/iommu.c       | 7 ++++---
 include/linux/iommu.h       | 2 +-
 7 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9058cb084b91..8313b2d1b710 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -868,7 +868,8 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 			struct iommu_resv_region *region;
 
 			region = iommu_alloc_resv_region(base + SZ_64K, SZ_64K,
-							 prot, IOMMU_RESV_MSI);
+							 prot, IOMMU_RESV_MSI,
+							 GFP_KERNEL);
 			if (region) {
 				list_add_tail(&region->list, head);
 				resv++;
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 09c9e45f7fa2..f2eb8e9cd8a6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3136,7 +3136,8 @@ static void amd_iommu_get_resv_regions(struct device *dev,
 			type = IOMMU_RESV_RESERVED;
 
 		region = iommu_alloc_resv_region(entry->address_start,
-						 length, prot, type);
+						 length, prot, type,
+						 GFP_KERNEL);
 		if (!region) {
 			dev_err(dev, "Out of memory allocating dm-regions\n");
 			return;
@@ -3146,14 +3147,14 @@ static void amd_iommu_get_resv_regions(struct device *dev,
 
 	region = iommu_alloc_resv_region(MSI_RANGE_START,
 					 MSI_RANGE_END - MSI_RANGE_START + 1,
-					 0, IOMMU_RESV_MSI);
+					 0, IOMMU_RESV_MSI, GFP_KERNEL);
 	if (!region)
 		return;
 	list_add_tail(&region->list, head);
 
 	region = iommu_alloc_resv_region(HT_RANGE_START,
 					 HT_RANGE_END - HT_RANGE_START + 1,
-					 0, IOMMU_RESV_RESERVED);
+					 0, IOMMU_RESV_RESERVED, GFP_KERNEL);
 	if (!region)
 		return;
 	list_add_tail(&region->list, head);
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4d5a694f02c2..f9b1279ef5bf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2226,7 +2226,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
 	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-					 prot, IOMMU_RESV_SW_MSI);
+					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
 	if (!region)
 		return;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0a28b3..646e76813e91 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1670,7 +1670,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
 	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-					 prot, IOMMU_RESV_SW_MSI);
+					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
 	if (!region)
 		return;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..2be36dff189a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4220,7 +4220,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 
 	length = rmrr->end_address - rmrr->base_address + 1;
 	rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
-					      IOMMU_RESV_DIRECT);
+					      IOMMU_RESV_DIRECT, GFP_KERNEL);
 	if (!rmrru->resv)
 		goto free_rmrru;
 
@@ -5489,7 +5489,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
 
 	reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
 				      IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
-				      0, IOMMU_RESV_MSI);
+				      0, IOMMU_RESV_MSI, GFP_KERNEL);
 	if (!reg)
 		return;
 	list_add_tail(&reg->list, head);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2..ae4ea5c0e6f9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -256,7 +256,7 @@ static int iommu_insert_resv_region(struct iommu_resv_region *new,
 	}
 insert:
 	region = iommu_alloc_resv_region(new->start, new->length,
-					 new->prot, new->type);
+					 new->prot, new->type, GFP_KERNEL);
 	if (!region)
 		return -ENOMEM;
 
@@ -1891,11 +1891,12 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
 
 struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
 						  size_t length, int prot,
-						  enum iommu_resv_type type)
+						  enum iommu_resv_type type,
+						  gfp_t flags)
 {
 	struct iommu_resv_region *region;
 
-	region = kzalloc(sizeof(*region), GFP_KERNEL);
+	region = kzalloc(sizeof(*region), flags);
 	if (!region)
 		return NULL;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..ba91666998fb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -364,7 +364,7 @@ extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern int iommu_request_dm_for_dev(struct device *dev);
 extern struct iommu_resv_region *
 iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot,
-			enum iommu_resv_type type);
+			enum iommu_resv_type type, gfp_t flags);
 extern int iommu_get_group_resv_regions(struct iommu_group *group,
 					struct list_head *head);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
  2019-05-16 10:08 ` [PATCH v3 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-16 10:08 ` [PATCH v3 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper Eric Auger
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

intel_iommu_get_resv_regions() aims to return the list of
reserved regions accessible by a given @device. However several
devices can access the same reserved memory region and when
building the list it is not safe to use a single iommu_resv_region
object, whose container is the RMRR. This iommu_resv_region must
be duplicated per device reserved region list.

Let's remove the struct iommu_resv_region from the RMRR unit
and allocate the iommu_resv_region directly in
intel_iommu_get_resv_regions().

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2be36dff189a..590a0e78d11d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -322,7 +322,6 @@ struct dmar_rmrr_unit {
 	u64	end_address;		/* reserved end address */
 	struct dmar_dev_scope *devices;	/* target devices */
 	int	devices_cnt;		/* target device count */
-	struct iommu_resv_region *resv; /* reserved region handle */
 };
 
 struct dmar_atsr_unit {
@@ -4205,7 +4204,6 @@ static inline void init_iommu_pm_ops(void) {}
 int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 {
 	struct acpi_dmar_reserved_memory *rmrr;
-	int prot = DMA_PTE_READ|DMA_PTE_WRITE;
 	struct dmar_rmrr_unit *rmrru;
 	size_t length;
 
@@ -4219,22 +4217,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 	rmrru->end_address = rmrr->end_address;
 
 	length = rmrr->end_address - rmrr->base_address + 1;
-	rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
-					      IOMMU_RESV_DIRECT, GFP_KERNEL);
-	if (!rmrru->resv)
-		goto free_rmrru;
 
 	rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
 				((void *)rmrr) + rmrr->header.length,
 				&rmrru->devices_cnt);
 	if (rmrru->devices_cnt && rmrru->devices == NULL)
-		goto free_all;
+		goto free_rmrru;
 
 	list_add(&rmrru->list, &dmar_rmrr_units);
 
 	return 0;
-free_all:
-	kfree(rmrru->resv);
 free_rmrru:
 	kfree(rmrru);
 out:
@@ -4452,7 +4444,6 @@ static void intel_iommu_free_dmars(void)
 	list_for_each_entry_safe(rmrru, rmrr_n, &dmar_rmrr_units, list) {
 		list_del(&rmrru->list);
 		dmar_free_dev_scope(&rmrru->devices, &rmrru->devices_cnt);
-		kfree(rmrru->resv);
 		kfree(rmrru);
 	}
 
@@ -5470,6 +5461,7 @@ static void intel_iommu_remove_device(struct device *dev)
 static void intel_iommu_get_resv_regions(struct device *device,
 					 struct list_head *head)
 {
+	int prot = DMA_PTE_READ|DMA_PTE_WRITE;
 	struct iommu_resv_region *reg;
 	struct dmar_rmrr_unit *rmrr;
 	struct device *i_dev;
@@ -5479,10 +5471,21 @@ static void intel_iommu_get_resv_regions(struct device *device,
 	for_each_rmrr_units(rmrr) {
 		for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
 					  i, i_dev) {
+			struct iommu_resv_region *resv;
+			size_t length;
+
 			if (i_dev != device)
 				continue;
 
-			list_add_tail(&rmrr->resv->list, head);
+			length = rmrr->end_address - rmrr->base_address + 1;
+			resv = iommu_alloc_resv_region(rmrr->base_address,
+						       length, prot,
+						       IOMMU_RESV_DIRECT,
+						       GFP_ATOMIC);
+			if (!resv)
+				break;
+
+			list_add_tail(&resv->list, head);
 		}
 	}
 	rcu_read_unlock();
@@ -5500,10 +5503,8 @@ static void intel_iommu_put_resv_regions(struct device *dev,
 {
 	struct iommu_resv_region *entry, *next;
 
-	list_for_each_entry_safe(entry, next, head, list) {
-		if (entry->type == IOMMU_RESV_MSI)
-			kfree(entry);
-	}
+	list_for_each_entry_safe(entry, next, head, list)
+		kfree(entry);
 }
 
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
  2019-05-16 10:08 ` [PATCH v3 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
  2019-05-16 10:08 ` [PATCH v3 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-16 10:08 ` [PATCH v3 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes Eric Auger
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Several call sites are about to check whether a device belongs
to the PCI sub-hierarchy of a candidate PCI-PCI bridge.
Introduce an helper to perform that check.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 590a0e78d11d..15c2f9677491 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev)
 	return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
+/* is_downstream_to_pci_bridge - test if a device belongs to the
+ * PCI sub-hierarchy of a candidate PCI-PCI bridge
+ *
+ * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy
+ * @bridge: the candidate PCI-PCI bridge
+ *
+ * Return: true if @dev belongs to @bridge PCI sub-hierarchy
+ */
+static bool
+is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
+{
+	struct pci_dev *pdev, *pbridge;
+
+	if (!dev_is_pci(dev) || !dev_is_pci(bridge))
+		return false;
+
+	pdev = to_pci_dev(dev);
+	pbridge = to_pci_dev(bridge);
+
+	if (pbridge->subordinate &&
+	    pbridge->subordinate->number <= pdev->bus->number &&
+	    pbridge->subordinate->busn_res.end >= pdev->bus->number)
+		return true;
+
+	return false;
+}
+
 static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
 	struct dmar_drhd_unit *drhd = NULL;
 	struct intel_iommu *iommu;
 	struct device *tmp;
-	struct pci_dev *ptmp, *pdev = NULL;
+	struct pci_dev *pdev = NULL;
 	u16 segment = 0;
 	int i;
 
@@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
 				goto out;
 			}
 
-			if (!pdev || !dev_is_pci(tmp))
-				continue;
-
-			ptmp = to_pci_dev(tmp);
-			if (ptmp->subordinate &&
-			    ptmp->subordinate->number <= pdev->bus->number &&
-			    ptmp->subordinate->busn_res.end >= pdev->bus->number)
+			if (is_downstream_to_pci_bridge(dev, tmp))
 				goto got_pdev;
 		}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
                   ` (2 preceding siblings ...)
  2019-05-16 10:08 ` [PATCH v3 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-16 10:08 ` [PATCH v3 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions Eric Auger
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

When reading the vtd specification and especially the
Reserved Memory Region Reporting Structure chapter,
it is not obvious a device scope element cannot be a
PCI-PCI bridge, in which case all downstream ports are
likely to access the reserved memory region. Let's handle
this case in device_has_rmrr.

Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain")

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- is_downstream_to_pci_bridge helper introduced in a separate patch
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 15c2f9677491..7ed820e79313 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2910,7 +2910,8 @@ static bool device_has_rmrr(struct device *dev)
 		 */
 		for_each_active_dev_scope(rmrr->devices,
 					  rmrr->devices_cnt, i, tmp)
-			if (tmp == dev) {
+			if (tmp == dev ||
+			    is_downstream_to_pci_bridge(dev, tmp)) {
 				rcu_read_unlock();
 				return true;
 			}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
                   ` (3 preceding siblings ...)
  2019-05-16 10:08 ` [PATCH v3 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-16 10:08 ` [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
  2019-05-16 10:08 ` [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger
  6 siblings, 0 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

In the case the RMRR device scope is a PCI-PCI bridge, let's check
the device belongs to the PCI sub-hierarchy.

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7ed820e79313..a36604f4900f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5496,7 +5496,8 @@ static void intel_iommu_get_resv_regions(struct device *device,
 			struct iommu_resv_region *resv;
 			size_t length;
 
-			if (i_dev != device)
+			if (i_dev != device &&
+			    !is_downstream_to_pci_bridge(device, i_dev))
 				continue;
 
 			length = rmrr->end_address - rmrr->base_address + 1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
                   ` (4 preceding siblings ...)
  2019-05-16 10:08 ` [PATCH v3 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-16 11:16   ` Jean-Philippe Brucker
  2019-05-16 12:46   ` Robin Murphy
  2019-05-16 10:08 ` [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger
  6 siblings, 2 replies; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Introduce a new type for reserved region. This corresponds
to directly mapped regions which are known to be relaxable
in some specific conditions, such as device assignment use
case. Well known examples are those used by USB controllers
providing PS/2 keyboard emulation for pre-boot BIOS and
early BOOT or RMRRs associated to IGD working in legacy mode.

Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
RMRR on graphics devices too"), those regions are currently
considered "safe" with respect to device assignment use case
which requires a non direct mapping at IOMMU physical level
(RAM GPA -> HPA mapping).

Those RMRRs currently exist and sometimes the device is
attempting to access it but this has not been considered
an issue until now.

However at the moment, iommu_get_group_resv_regions() is
not able to make any difference between directly mapped
regions: those which must be absolutely enforced and those
like above ones which are known as relaxable.

This is a blocker for reporting severe conflicts between
non relaxable RMRRs (like MSI doorbells) and guest GPA space.

With this new reserved region type we will be able to use
iommu_get_group_resv_regions() to enumerate the IOVA space
that is usable through the IOMMU API without introducing
regressions with respect to existing device assignment
use cases (USB and IGD).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

Note: At the moment the sysfs ABI is not changed. However I wonder
whether it wouldn't be preferable to report the direct region as
"direct_relaxed" there. At the moment, in case the same direct
region is used by 2 devices, one USB/GFX and another not belonging
to the previous categories, the direct region will be output twice
with "direct" type.

This would unblock Shameer's series:
[PATCH v6 0/7] vfio/type1: Add support for valid iova list management
https://patchwork.kernel.org/patch/10425309/

which failed to get pulled for 4.18 merge window due to IGD
device assignment regression.

v2 -> v3:
- fix direct type check
---
 drivers/iommu/iommu.c | 12 +++++++-----
 include/linux/iommu.h |  6 ++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ae4ea5c0e6f9..28c3d6351832 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -73,10 +73,11 @@ struct iommu_group_attribute {
 };
 
 static const char * const iommu_group_resv_type_string[] = {
-	[IOMMU_RESV_DIRECT]	= "direct",
-	[IOMMU_RESV_RESERVED]	= "reserved",
-	[IOMMU_RESV_MSI]	= "msi",
-	[IOMMU_RESV_SW_MSI]	= "msi",
+	[IOMMU_RESV_DIRECT]			= "direct",
+	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
+	[IOMMU_RESV_RESERVED]			= "reserved",
+	[IOMMU_RESV_MSI]			= "msi",
+	[IOMMU_RESV_SW_MSI]			= "msi",
 };
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
@@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
 		start = ALIGN(entry->start, pg_size);
 		end   = ALIGN(entry->start + entry->length, pg_size);
 
-		if (entry->type != IOMMU_RESV_DIRECT)
+		if (entry->type != IOMMU_RESV_DIRECT &&
+		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
 			continue;
 
 		for (addr = start; addr < end; addr += pg_size) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ba91666998fb..14a521f85f14 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -135,6 +135,12 @@ enum iommu_attr {
 enum iommu_resv_type {
 	/* Memory regions which must be mapped 1:1 at all times */
 	IOMMU_RESV_DIRECT,
+	/*
+	 * Memory regions which are advertised to be 1:1 but are
+	 * commonly considered relaxable in some conditions,
+	 * for instance in device assignment use case (USB, Graphics)
+	 */
+	IOMMU_RESV_DIRECT_RELAXABLE,
 	/* Arbitrary "never map this or give it to a device" address ranges */
 	IOMMU_RESV_RESERVED,
 	/* Hardware MSI region (untranslated) */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
  2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
                   ` (5 preceding siblings ...)
  2019-05-16 10:08 ` [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
@ 2019-05-16 10:08 ` Eric Auger
  2019-05-17  4:46   ` Lu Baolu
  6 siblings, 1 reply; 22+ messages in thread
From: Eric Auger @ 2019-05-16 10:08 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory
region type, let's report USB and GFX RMRRs as relaxable ones.

This allows to have a finer reporting at IOMMU API level of
reserved memory regions. This will be exploitable by VFIO to
define the usable IOVA range and detect potential conflicts
between the guest physical address space and host reserved
regions.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a36604f4900f..af1d65fdedfc 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5493,7 +5493,9 @@ static void intel_iommu_get_resv_regions(struct device *device,
 	for_each_rmrr_units(rmrr) {
 		for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
 					  i, i_dev) {
+			struct pci_dev *pdev = to_pci_dev(device);
 			struct iommu_resv_region *resv;
+			enum iommu_resv_type type;
 			size_t length;
 
 			if (i_dev != device &&
@@ -5501,9 +5503,13 @@ static void intel_iommu_get_resv_regions(struct device *device,
 				continue;
 
 			length = rmrr->end_address - rmrr->base_address + 1;
+
+			type = (pdev &&
+				(IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))) ?
+				IOMMU_RESV_DIRECT_RELAXABLE : IOMMU_RESV_DIRECT;
+
 			resv = iommu_alloc_resv_region(rmrr->base_address,
-						       length, prot,
-						       IOMMU_RESV_DIRECT,
+						       length, prot, type,
 						       GFP_ATOMIC);
 			if (!resv)
 				break;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 10:08 ` [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
@ 2019-05-16 11:16   ` Jean-Philippe Brucker
  2019-05-16 11:45     ` Auger Eric
  2019-05-16 12:46   ` Robin Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-16 11:16 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson

On 16/05/2019 11:08, Eric Auger wrote:
> Note: At the moment the sysfs ABI is not changed. However I wonder
> whether it wouldn't be preferable to report the direct region as
> "direct_relaxed" there. At the moment, in case the same direct
> region is used by 2 devices, one USB/GFX and another not belonging
> to the previous categories, the direct region will be output twice
> with "direct" type.
> 
> This would unblock Shameer's series:
> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
> https://patchwork.kernel.org/patch/10425309/

Thanks for doing this!

> which failed to get pulled for 4.18 merge window due to IGD
> device assignment regression.
> 
> v2 -> v3:
> - fix direct type check
> ---
>  drivers/iommu/iommu.c | 12 +++++++-----
>  include/linux/iommu.h |  6 ++++++
>  2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ae4ea5c0e6f9..28c3d6351832 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>  };
>  
>  static const char * const iommu_group_resv_type_string[] = {
> -	[IOMMU_RESV_DIRECT]	= "direct",
> -	[IOMMU_RESV_RESERVED]	= "reserved",
> -	[IOMMU_RESV_MSI]	= "msi",
> -	[IOMMU_RESV_SW_MSI]	= "msi",
> +	[IOMMU_RESV_DIRECT]			= "direct",
> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
> +	[IOMMU_RESV_RESERVED]			= "reserved",
> +	[IOMMU_RESV_MSI]			= "msi",
> +	[IOMMU_RESV_SW_MSI]			= "msi",
>  };
>  
>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>  		start = ALIGN(entry->start, pg_size);
>  		end   = ALIGN(entry->start + entry->length, pg_size);
>  
> -		if (entry->type != IOMMU_RESV_DIRECT)
> +		if (entry->type != IOMMU_RESV_DIRECT &&
> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)

I'm trying to understand why you need to create direct mappings at all
for these relaxable regions. In the host the region is needed for legacy
device features, which are disabled (and cannot be re-enabled) when
assigning the device to a guest?

Thanks,
Jean


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 11:16   ` Jean-Philippe Brucker
@ 2019-05-16 11:45     ` Auger Eric
  2019-05-16 12:43       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 22+ messages in thread
From: Auger Eric @ 2019-05-16 11:45 UTC (permalink / raw)
  To: Jean-Philippe Brucker, eric.auger.pro, joro, iommu, linux-kernel,
	dwmw2, lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson

Hi Jean-Philippe,

On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote:
> On 16/05/2019 11:08, Eric Auger wrote:
>> Note: At the moment the sysfs ABI is not changed. However I wonder
>> whether it wouldn't be preferable to report the direct region as
>> "direct_relaxed" there. At the moment, in case the same direct
>> region is used by 2 devices, one USB/GFX and another not belonging
>> to the previous categories, the direct region will be output twice
>> with "direct" type.
>>
>> This would unblock Shameer's series:
>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
>> https://patchwork.kernel.org/patch/10425309/
> 
> Thanks for doing this!
> 
>> which failed to get pulled for 4.18 merge window due to IGD
>> device assignment regression.
>>
>> v2 -> v3:
>> - fix direct type check
>> ---
>>  drivers/iommu/iommu.c | 12 +++++++-----
>>  include/linux/iommu.h |  6 ++++++
>>  2 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index ae4ea5c0e6f9..28c3d6351832 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>>  };
>>  
>>  static const char * const iommu_group_resv_type_string[] = {
>> -	[IOMMU_RESV_DIRECT]	= "direct",
>> -	[IOMMU_RESV_RESERVED]	= "reserved",
>> -	[IOMMU_RESV_MSI]	= "msi",
>> -	[IOMMU_RESV_SW_MSI]	= "msi",
>> +	[IOMMU_RESV_DIRECT]			= "direct",
>> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
>> +	[IOMMU_RESV_RESERVED]			= "reserved",
>> +	[IOMMU_RESV_MSI]			= "msi",
>> +	[IOMMU_RESV_SW_MSI]			= "msi",
>>  };
>>  
>>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
>> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>>  		start = ALIGN(entry->start, pg_size);
>>  		end   = ALIGN(entry->start + entry->length, pg_size);
>>  
>> -		if (entry->type != IOMMU_RESV_DIRECT)
>> +		if (entry->type != IOMMU_RESV_DIRECT &&
>> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
> 
> I'm trying to understand why you need to create direct mappings at all
> for these relaxable regions. In the host the region is needed for legacy
> device features, which are disabled (and cannot be re-enabled) when
> assigning the device to a guest?
This follows Kevin's comment in the thread below:
https://patchwork.kernel.org/patch/10449103/#21957279

In normal DMA API host path, those regions need to be 1-1 mapped. They
are likely to be accessed by the driver or FW at early boot phase or
even during execution, depending on features being used.

That's the reason, according to Kevin we couldn't hide them.

We just know that, in general, they are not used anymore when assigning
the device or if accesses are attempted this generally does not block
the assignment use case. For example, it is said in
https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in
legacy IGD assignment use case, there may be "a small numbers of DMAR
faults when initially assigned".

Thanks

Eric


> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 11:45     ` Auger Eric
@ 2019-05-16 12:43       ` Jean-Philippe Brucker
  2019-05-16 12:58         ` Auger Eric
  0 siblings, 1 reply; 22+ messages in thread
From: Jean-Philippe Brucker @ 2019-05-16 12:43 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson

On 16/05/2019 12:45, Auger Eric wrote:
> Hi Jean-Philippe,
> 
> On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote:
>> On 16/05/2019 11:08, Eric Auger wrote:
>>> Note: At the moment the sysfs ABI is not changed. However I wonder
>>> whether it wouldn't be preferable to report the direct region as
>>> "direct_relaxed" there. At the moment, in case the same direct
>>> region is used by 2 devices, one USB/GFX and another not belonging
>>> to the previous categories, the direct region will be output twice
>>> with "direct" type.
>>>
>>> This would unblock Shameer's series:
>>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
>>> https://patchwork.kernel.org/patch/10425309/
>>
>> Thanks for doing this!
>>
>>> which failed to get pulled for 4.18 merge window due to IGD
>>> device assignment regression.
>>>
>>> v2 -> v3:
>>> - fix direct type check
>>> ---
>>>  drivers/iommu/iommu.c | 12 +++++++-----
>>>  include/linux/iommu.h |  6 ++++++
>>>  2 files changed, 13 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index ae4ea5c0e6f9..28c3d6351832 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>>>  };
>>>  
>>>  static const char * const iommu_group_resv_type_string[] = {
>>> -	[IOMMU_RESV_DIRECT]	= "direct",
>>> -	[IOMMU_RESV_RESERVED]	= "reserved",
>>> -	[IOMMU_RESV_MSI]	= "msi",
>>> -	[IOMMU_RESV_SW_MSI]	= "msi",
>>> +	[IOMMU_RESV_DIRECT]			= "direct",
>>> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
>>> +	[IOMMU_RESV_RESERVED]			= "reserved",
>>> +	[IOMMU_RESV_MSI]			= "msi",
>>> +	[IOMMU_RESV_SW_MSI]			= "msi",
>>>  };
>>>  
>>>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
>>> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>>>  		start = ALIGN(entry->start, pg_size);
>>>  		end   = ALIGN(entry->start + entry->length, pg_size);
>>>  
>>> -		if (entry->type != IOMMU_RESV_DIRECT)
>>> +		if (entry->type != IOMMU_RESV_DIRECT &&
>>> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
>>
>> I'm trying to understand why you need to create direct mappings at all
>> for these relaxable regions. In the host the region is needed for legacy
>> device features, which are disabled (and cannot be re-enabled) when
>> assigning the device to a guest?
> This follows Kevin's comment in the thread below:
> https://patchwork.kernel.org/patch/10449103/#21957279
> 
> In normal DMA API host path, those regions need to be 1-1 mapped. They
> are likely to be accessed by the driver or FW at early boot phase or
> even during execution, depending on features being used.
> 
> That's the reason, according to Kevin we couldn't hide them.
> 
> We just know that, in general, they are not used anymore when assigning
> the device or if accesses are attempted this generally does not block
> the assignment use case. For example, it is said in
> https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in
> legacy IGD assignment use case, there may be "a small numbers of DMAR
> faults when initially assigned".

Hmm, fair enough. That doesn't sound too good, if the device might
perform arbitrary writes into guest memory once new IOMMU mappings are
in place. I was wondering if we could report some IOVA ranges as
"available but avoid if possible". If the guest has a vIOMMU, they are
easy to avoid. But I doubt they would ever get used, since probably no
one is going to instantiate a vIOMMU for a graphics device in legacy mode.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 10:08 ` [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
  2019-05-16 11:16   ` Jean-Philippe Brucker
@ 2019-05-16 12:46   ` Robin Murphy
  2019-05-16 13:23     ` Auger Eric
  1 sibling, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2019-05-16 12:46 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, will.deacon, hanjun.guo, sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

On 16/05/2019 11:08, Eric Auger wrote:
> Introduce a new type for reserved region. This corresponds
> to directly mapped regions which are known to be relaxable
> in some specific conditions, such as device assignment use
> case. Well known examples are those used by USB controllers
> providing PS/2 keyboard emulation for pre-boot BIOS and
> early BOOT or RMRRs associated to IGD working in legacy mode.
> 
> Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
> from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
> RMRR on graphics devices too"), those regions are currently
> considered "safe" with respect to device assignment use case
> which requires a non direct mapping at IOMMU physical level
> (RAM GPA -> HPA mapping).
> 
> Those RMRRs currently exist and sometimes the device is
> attempting to access it but this has not been considered
> an issue until now.
> 
> However at the moment, iommu_get_group_resv_regions() is
> not able to make any difference between directly mapped
> regions: those which must be absolutely enforced and those
> like above ones which are known as relaxable.
> 
> This is a blocker for reporting severe conflicts between
> non relaxable RMRRs (like MSI doorbells) and guest GPA space.
> 
> With this new reserved region type we will be able to use
> iommu_get_group_resv_regions() to enumerate the IOVA space
> that is usable through the IOMMU API without introducing
> regressions with respect to existing device assignment
> use cases (USB and IGD).
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> Note: At the moment the sysfs ABI is not changed. However I wonder
> whether it wouldn't be preferable to report the direct region as
> "direct_relaxed" there. At the moment, in case the same direct
> region is used by 2 devices, one USB/GFX and another not belonging
> to the previous categories, the direct region will be output twice
> with "direct" type.

Hmm, that sounds a bit off - if we have overlapping regions within the 
same domain, then we need to do some additional pre-processing to adjust 
them anyway, since any part of a relaxable region which overlaps a 
non-relaxable region cannot actually be relaxed, and so really should 
never be described as such.

> This would unblock Shameer's series:
> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
> https://patchwork.kernel.org/patch/10425309/
> 
> which failed to get pulled for 4.18 merge window due to IGD
> device assignment regression.
> 
> v2 -> v3:
> - fix direct type check
> ---
>   drivers/iommu/iommu.c | 12 +++++++-----
>   include/linux/iommu.h |  6 ++++++
>   2 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index ae4ea5c0e6f9..28c3d6351832 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>   };
>   
>   static const char * const iommu_group_resv_type_string[] = {
> -	[IOMMU_RESV_DIRECT]	= "direct",
> -	[IOMMU_RESV_RESERVED]	= "reserved",
> -	[IOMMU_RESV_MSI]	= "msi",
> -	[IOMMU_RESV_SW_MSI]	= "msi",
> +	[IOMMU_RESV_DIRECT]			= "direct",
> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",

Wasn't part of the idea that we might not need to expose these to 
userspace at all if they're only relevant to default domains, and not to 
VFIO domains or anything else userspace can get involved with?

> +	[IOMMU_RESV_RESERVED]			= "reserved",
> +	[IOMMU_RESV_MSI]			= "msi",
> +	[IOMMU_RESV_SW_MSI]			= "msi",
>   };
>   
>   #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>   		start = ALIGN(entry->start, pg_size);
>   		end   = ALIGN(entry->start + entry->length, pg_size);
>   
> -		if (entry->type != IOMMU_RESV_DIRECT)
> +		if (entry->type != IOMMU_RESV_DIRECT &&
> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
>   			continue;
>   
>   		for (addr = start; addr < end; addr += pg_size) {
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ba91666998fb..14a521f85f14 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -135,6 +135,12 @@ enum iommu_attr {
>   enum iommu_resv_type {
>   	/* Memory regions which must be mapped 1:1 at all times */
>   	IOMMU_RESV_DIRECT,
> +	/*
> +	 * Memory regions which are advertised to be 1:1 but are
> +	 * commonly considered relaxable in some conditions,
> +	 * for instance in device assignment use case (USB, Graphics)
> +	 */
> +	IOMMU_RESV_DIRECT_RELAXABLE,

What do you think of s/RELAXABLE/BOOT/ ? My understanding is that these 
regions are only considered relevant until Linux has taken full control 
of the endpoint, and having a slightly more well-defined scope than 
"some conditions" might be nice.

There's an open question of how to handle things like simple-fb and EFI 
framebuffer handover on Arm and other platforms, so I'd really like to 
be able to slot that right into this case as well (how we describe the 
relevant connections in DT/ACPI is another matter, but hey, one step at 
a time...)

Otherwise though, I too am pleased to see this, thanks! I did start 
having a go myself after talking to Alex at KVM Forum, but I got bogged 
down in the intel-iommu parts and inevitably distracted - patch #7 looks 
beautifully simpler than I'd convinced myself was possible :D

Robin.

>   	/* Arbitrary "never map this or give it to a device" address ranges */
>   	IOMMU_RESV_RESERVED,
>   	/* Hardware MSI region (untranslated) */
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 12:43       ` Jean-Philippe Brucker
@ 2019-05-16 12:58         ` Auger Eric
  2019-05-16 17:06           ` Alex Williamson
  0 siblings, 1 reply; 22+ messages in thread
From: Auger Eric @ 2019-05-16 12:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker, eric.auger.pro, joro, iommu, linux-kernel,
	dwmw2, lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson

Hi Jean-Philippe,

On 5/16/19 2:43 PM, Jean-Philippe Brucker wrote:
> On 16/05/2019 12:45, Auger Eric wrote:
>> Hi Jean-Philippe,
>>
>> On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote:
>>> On 16/05/2019 11:08, Eric Auger wrote:
>>>> Note: At the moment the sysfs ABI is not changed. However I wonder
>>>> whether it wouldn't be preferable to report the direct region as
>>>> "direct_relaxed" there. At the moment, in case the same direct
>>>> region is used by 2 devices, one USB/GFX and another not belonging
>>>> to the previous categories, the direct region will be output twice
>>>> with "direct" type.
>>>>
>>>> This would unblock Shameer's series:
>>>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
>>>> https://patchwork.kernel.org/patch/10425309/
>>>
>>> Thanks for doing this!
>>>
>>>> which failed to get pulled for 4.18 merge window due to IGD
>>>> device assignment regression.
>>>>
>>>> v2 -> v3:
>>>> - fix direct type check
>>>> ---
>>>>  drivers/iommu/iommu.c | 12 +++++++-----
>>>>  include/linux/iommu.h |  6 ++++++
>>>>  2 files changed, 13 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index ae4ea5c0e6f9..28c3d6351832 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>>>>  };
>>>>  
>>>>  static const char * const iommu_group_resv_type_string[] = {
>>>> -	[IOMMU_RESV_DIRECT]	= "direct",
>>>> -	[IOMMU_RESV_RESERVED]	= "reserved",
>>>> -	[IOMMU_RESV_MSI]	= "msi",
>>>> -	[IOMMU_RESV_SW_MSI]	= "msi",
>>>> +	[IOMMU_RESV_DIRECT]			= "direct",
>>>> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
>>>> +	[IOMMU_RESV_RESERVED]			= "reserved",
>>>> +	[IOMMU_RESV_MSI]			= "msi",
>>>> +	[IOMMU_RESV_SW_MSI]			= "msi",
>>>>  };
>>>>  
>>>>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
>>>> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>>>>  		start = ALIGN(entry->start, pg_size);
>>>>  		end   = ALIGN(entry->start + entry->length, pg_size);
>>>>  
>>>> -		if (entry->type != IOMMU_RESV_DIRECT)
>>>> +		if (entry->type != IOMMU_RESV_DIRECT &&
>>>> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
>>>
>>> I'm trying to understand why you need to create direct mappings at all
>>> for these relaxable regions. In the host the region is needed for legacy
>>> device features, which are disabled (and cannot be re-enabled) when
>>> assigning the device to a guest?
>> This follows Kevin's comment in the thread below:
>> https://patchwork.kernel.org/patch/10449103/#21957279
>>
>> In normal DMA API host path, those regions need to be 1-1 mapped. They
>> are likely to be accessed by the driver or FW at early boot phase or
>> even during execution, depending on features being used.
>>
>> That's the reason, according to Kevin we couldn't hide them.
>>
>> We just know that, in general, they are not used anymore when assigning
>> the device or if accesses are attempted this generally does not block
>> the assignment use case. For example, it is said in
>> https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in
>> legacy IGD assignment use case, there may be "a small numbers of DMAR
>> faults when initially assigned".
> 
> Hmm, fair enough. That doesn't sound too good, if the device might
> perform arbitrary writes into guest memory once new IOMMU mappings are
> in place. I was wondering if we could report some IOVA ranges as
> "available but avoid if possible".
In Shameer's series we currently reject any vfio dma_map that would fall
into an RMRR (hence the regression on existing USB/GFX use case). With
the relaxable RMRR info we could imagine to let the userspace choose
whether we want to proceed with the dma_map despite the risk or
introduce a vfio_iommu_type1 module option (turned off by default for
not regressing existing USB/GFX passthrough) that would forbid dma_map
on relaxable RMRR regions.

Thanks

Eric

 If the guest has a vIOMMU, they are
> easy to avoid. But I doubt they would ever get used, since probably no
> one is going to instantiate a vIOMMU for a graphics device in legacy mode.
> 
> Thanks,
> Jean
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 12:46   ` Robin Murphy
@ 2019-05-16 13:23     ` Auger Eric
  2019-05-16 16:53       ` Alex Williamson
  2019-05-16 17:53       ` Robin Murphy
  0 siblings, 2 replies; 22+ messages in thread
From: Auger Eric @ 2019-05-16 13:23 UTC (permalink / raw)
  To: Robin Murphy, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, will.deacon, hanjun.guo, sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Hi Robin,
On 5/16/19 2:46 PM, Robin Murphy wrote:
> On 16/05/2019 11:08, Eric Auger wrote:
>> Introduce a new type for reserved region. This corresponds
>> to directly mapped regions which are known to be relaxable
>> in some specific conditions, such as device assignment use
>> case. Well known examples are those used by USB controllers
>> providing PS/2 keyboard emulation for pre-boot BIOS and
>> early BOOT or RMRRs associated to IGD working in legacy mode.
>>
>> Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
>> from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
>> RMRR on graphics devices too"), those regions are currently
>> considered "safe" with respect to device assignment use case
>> which requires a non direct mapping at IOMMU physical level
>> (RAM GPA -> HPA mapping).
>>
>> Those RMRRs currently exist and sometimes the device is
>> attempting to access it but this has not been considered
>> an issue until now.
>>
>> However at the moment, iommu_get_group_resv_regions() is
>> not able to make any difference between directly mapped
>> regions: those which must be absolutely enforced and those
>> like above ones which are known as relaxable.
>>
>> This is a blocker for reporting severe conflicts between
>> non relaxable RMRRs (like MSI doorbells) and guest GPA space.
>>
>> With this new reserved region type we will be able to use
>> iommu_get_group_resv_regions() to enumerate the IOVA space
>> that is usable through the IOMMU API without introducing
>> regressions with respect to existing device assignment
>> use cases (USB and IGD).
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> Note: At the moment the sysfs ABI is not changed. However I wonder
>> whether it wouldn't be preferable to report the direct region as
>> "direct_relaxed" there. At the moment, in case the same direct
>> region is used by 2 devices, one USB/GFX and another not belonging
>> to the previous categories, the direct region will be output twice
>> with "direct" type.
> 
> Hmm, that sounds a bit off - if we have overlapping regions within the
> same domain, then we need to do some additional pre-processing to adjust
> them anyway, since any part of a relaxable region which overlaps a
> non-relaxable region cannot actually be relaxed, and so really should
> never be described as such.
In iommu_insert_resv_region(), we are overlapping regions of the same
type. So iommu_get_group_resv_regions() should return both the relaxable
region and non relaxable one. I should test this again using a hacked
kernel though.
> 
>> This would unblock Shameer's series:
>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
>> https://patchwork.kernel.org/patch/10425309/
>>
>> which failed to get pulled for 4.18 merge window due to IGD
>> device assignment regression.
>>
>> v2 -> v3:
>> - fix direct type check
>> ---
>>   drivers/iommu/iommu.c | 12 +++++++-----
>>   include/linux/iommu.h |  6 ++++++
>>   2 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index ae4ea5c0e6f9..28c3d6351832 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>>   };
>>     static const char * const iommu_group_resv_type_string[] = {
>> -    [IOMMU_RESV_DIRECT]    = "direct",
>> -    [IOMMU_RESV_RESERVED]    = "reserved",
>> -    [IOMMU_RESV_MSI]    = "msi",
>> -    [IOMMU_RESV_SW_MSI]    = "msi",
>> +    [IOMMU_RESV_DIRECT]            = "direct",
>> +    [IOMMU_RESV_DIRECT_RELAXABLE]        = "direct",
> 
> Wasn't part of the idea that we might not need to expose these to
> userspace at all if they're only relevant to default domains, and not to
> VFIO domains or anything else userspace can get involved with?
Fur sure, today, those regions are exposed as "direct". Isn't this
information relevant to the userspace as it would rather not use those
direct regions. We eventually chose to implement the check inside the
VFIO driver but I understood we wanted this info to be visible. Then it
is arguable whether we should expose the new relaxable type, at the pain
of making the UABI evolve.

> 
>> +    [IOMMU_RESV_RESERVED]            = "reserved",
>> +    [IOMMU_RESV_MSI]            = "msi",
>> +    [IOMMU_RESV_SW_MSI]            = "msi",
>>   };
>>     #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)        \
>> @@ -573,7 +574,8 @@ static int
>> iommu_group_create_direct_mappings(struct iommu_group *group,
>>           start = ALIGN(entry->start, pg_size);
>>           end   = ALIGN(entry->start + entry->length, pg_size);
>>   -        if (entry->type != IOMMU_RESV_DIRECT)
>> +        if (entry->type != IOMMU_RESV_DIRECT &&
>> +            entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
>>               continue;
>>             for (addr = start; addr < end; addr += pg_size) {
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index ba91666998fb..14a521f85f14 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -135,6 +135,12 @@ enum iommu_attr {
>>   enum iommu_resv_type {
>>       /* Memory regions which must be mapped 1:1 at all times */
>>       IOMMU_RESV_DIRECT,
>> +    /*
>> +     * Memory regions which are advertised to be 1:1 but are
>> +     * commonly considered relaxable in some conditions,
>> +     * for instance in device assignment use case (USB, Graphics)
>> +     */
>> +    IOMMU_RESV_DIRECT_RELAXABLE,
> 
> What do you think of s/RELAXABLE/BOOT/ ? My understanding is that these
> regions are only considered relevant until Linux has taken full control
> of the endpoint, and having a slightly more well-defined scope than
> "some conditions" might be nice.
That's not my current understanding. I think those RMRRs may be used
post-boot (especially the IGD stolen memory covered by RMRR). I
understand this depends on the video mode or FW in use by the IGD. But I
am definitively not an expert here.
> 
> There's an open question of how to handle things like simple-fb and EFI
> framebuffer handover on Arm and other platforms, so I'd really like to
> be able to slot that right into this case as well (how we describe the
> relevant connections in DT/ACPI is another matter, but hey, one step at
> a time...)
> 
> Otherwise though, I too am pleased to see this, thanks! I did start
> having a go myself after talking to Alex at KVM Forum, but I got bogged
> down in the intel-iommu parts and inevitably distracted - patch #7 looks
> beautifully simpler than I'd convinced myself was possible :D
Thank you Robin. Let's see if we can converge ...

Thanks

Eric
> 
> Robin.
> 
>>       /* Arbitrary "never map this or give it to a device" address
>> ranges */
>>       IOMMU_RESV_RESERVED,
>>       /* Hardware MSI region (untranslated) */
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 13:23     ` Auger Eric
@ 2019-05-16 16:53       ` Alex Williamson
  2019-05-16 17:53       ` Robin Murphy
  1 sibling, 0 replies; 22+ messages in thread
From: Alex Williamson @ 2019-05-16 16:53 UTC (permalink / raw)
  To: Auger Eric
  Cc: Robin Murphy, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, will.deacon, hanjun.guo, sudeep.holla,
	shameerali.kolothum.thodi

On Thu, 16 May 2019 15:23:17 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Robin,
> On 5/16/19 2:46 PM, Robin Murphy wrote:
> >> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> >> index ba91666998fb..14a521f85f14 100644
> >> --- a/include/linux/iommu.h
> >> +++ b/include/linux/iommu.h
> >> @@ -135,6 +135,12 @@ enum iommu_attr {
> >>   enum iommu_resv_type {
> >>       /* Memory regions which must be mapped 1:1 at all times */
> >>       IOMMU_RESV_DIRECT,
> >> +    /*
> >> +     * Memory regions which are advertised to be 1:1 but are
> >> +     * commonly considered relaxable in some conditions,
> >> +     * for instance in device assignment use case (USB, Graphics)
> >> +     */
> >> +    IOMMU_RESV_DIRECT_RELAXABLE,  
> > 
> > What do you think of s/RELAXABLE/BOOT/ ? My understanding is that these
> > regions are only considered relevant until Linux has taken full control
> > of the endpoint, and having a slightly more well-defined scope than
> > "some conditions" might be nice.  
> That's not my current understanding. I think those RMRRs may be used
> post-boot (especially the IGD stolen memory covered by RMRR). I
> understand this depends on the video mode or FW in use by the IGD. But I
> am definitively not an expert here.

Nor am I, but generally the distinction I'm trying to achieve is
whether the reserved region is necessary for the device operation or
for the system operation.  If we deny the IGD device its mapping to
stolen memory, then maybe the IGD device doesn't work, no big deal.  If
we deny USB its RMRR, then we assume we're only cutting off PS/2
emulation that we expect isn't used at this point anyway.  Both of these
are choices in how the driver wants to use the device.  On the other
hand if we have a system where management firmware has backdoors to
devices for system health monitoring, then declining to honor the RMRR
has larger implications.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 12:58         ` Auger Eric
@ 2019-05-16 17:06           ` Alex Williamson
  2019-05-16 17:23             ` Robin Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Alex Williamson @ 2019-05-16 17:06 UTC (permalink / raw)
  To: Auger Eric
  Cc: Jean-Philippe Brucker, eric.auger.pro, joro, iommu, linux-kernel,
	dwmw2, lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla

On Thu, 16 May 2019 14:58:08 +0200
Auger Eric <eric.auger@redhat.com> wrote:

> Hi Jean-Philippe,
> 
> On 5/16/19 2:43 PM, Jean-Philippe Brucker wrote:
> > On 16/05/2019 12:45, Auger Eric wrote:  
> >> Hi Jean-Philippe,
> >>
> >> On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote:  
> >>> On 16/05/2019 11:08, Eric Auger wrote:  
> >>>> Note: At the moment the sysfs ABI is not changed. However I wonder
> >>>> whether it wouldn't be preferable to report the direct region as
> >>>> "direct_relaxed" there. At the moment, in case the same direct
> >>>> region is used by 2 devices, one USB/GFX and another not belonging
> >>>> to the previous categories, the direct region will be output twice
> >>>> with "direct" type.
> >>>>
> >>>> This would unblock Shameer's series:
> >>>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
> >>>> https://patchwork.kernel.org/patch/10425309/  
> >>>
> >>> Thanks for doing this!
> >>>  
> >>>> which failed to get pulled for 4.18 merge window due to IGD
> >>>> device assignment regression.
> >>>>
> >>>> v2 -> v3:
> >>>> - fix direct type check
> >>>> ---
> >>>>  drivers/iommu/iommu.c | 12 +++++++-----
> >>>>  include/linux/iommu.h |  6 ++++++
> >>>>  2 files changed, 13 insertions(+), 5 deletions(-)
> >>>>
> >>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>>> index ae4ea5c0e6f9..28c3d6351832 100644
> >>>> --- a/drivers/iommu/iommu.c
> >>>> +++ b/drivers/iommu/iommu.c
> >>>> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
> >>>>  };
> >>>>  
> >>>>  static const char * const iommu_group_resv_type_string[] = {
> >>>> -	[IOMMU_RESV_DIRECT]	= "direct",
> >>>> -	[IOMMU_RESV_RESERVED]	= "reserved",
> >>>> -	[IOMMU_RESV_MSI]	= "msi",
> >>>> -	[IOMMU_RESV_SW_MSI]	= "msi",
> >>>> +	[IOMMU_RESV_DIRECT]			= "direct",
> >>>> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
> >>>> +	[IOMMU_RESV_RESERVED]			= "reserved",
> >>>> +	[IOMMU_RESV_MSI]			= "msi",
> >>>> +	[IOMMU_RESV_SW_MSI]			= "msi",
> >>>>  };
> >>>>  
> >>>>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
> >>>> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
> >>>>  		start = ALIGN(entry->start, pg_size);
> >>>>  		end   = ALIGN(entry->start + entry->length, pg_size);
> >>>>  
> >>>> -		if (entry->type != IOMMU_RESV_DIRECT)
> >>>> +		if (entry->type != IOMMU_RESV_DIRECT &&
> >>>> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)  
> >>>
> >>> I'm trying to understand why you need to create direct mappings at all
> >>> for these relaxable regions. In the host the region is needed for legacy
> >>> device features, which are disabled (and cannot be re-enabled) when
> >>> assigning the device to a guest?  
> >> This follows Kevin's comment in the thread below:
> >> https://patchwork.kernel.org/patch/10449103/#21957279
> >>
> >> In normal DMA API host path, those regions need to be 1-1 mapped. They
> >> are likely to be accessed by the driver or FW at early boot phase or
> >> even during execution, depending on features being used.
> >>
> >> That's the reason, according to Kevin we couldn't hide them.
> >>
> >> We just know that, in general, they are not used anymore when assigning
> >> the device or if accesses are attempted this generally does not block
> >> the assignment use case. For example, it is said in
> >> https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in
> >> legacy IGD assignment use case, there may be "a small numbers of DMAR
> >> faults when initially assigned".  
> > 
> > Hmm, fair enough. That doesn't sound too good, if the device might
> > perform arbitrary writes into guest memory once new IOMMU mappings are
> > in place. I was wondering if we could report some IOVA ranges as
> > "available but avoid if possible".  
> In Shameer's series we currently reject any vfio dma_map that would fall
> into an RMRR (hence the regression on existing USB/GFX use case). With
> the relaxable RMRR info we could imagine to let the userspace choose
> whether we want to proceed with the dma_map despite the risk or
> introduce a vfio_iommu_type1 module option (turned off by default for
> not regressing existing USB/GFX passthrough) that would forbid dma_map
> on relaxable RMRR regions.

Yep, the risk that Jean-Philippe mentions is real, the IGD device has
the stolen memory addresses latched into the hardware and we're unable
to change that.  What we try to do now is trap page table writes to the
device and translate them to a VM allocated stolen memory range, which
is sufficient for getting a BIOS splash screen, but we really want to
assume that the OS level driver just doesn't use the stolen memory
range.  There was a time when it seemed like we could assume the Intel
drivers were heading in that direction, but it seems that's no longer
an actual goal.  To fully support IGD assignment in a way that isn't as
fragile as it is today, we'd want to re-export the RMRR out to
userspace so that QEMU could identity map it into the VM address
space.  That's not trivial, it's only one of several issues around
IGD assignment, and we've got GVT-g (Intel vGPUs) now that don't impose
these requirements, so motivation to tackle the issue is somewhat
reduced.

With the changes here, we might want vfio to issue a warning when one
of these relaxed reserved regions is ignored and we'd probably want a
module option to opt-in to strict enforcement, where downstreams that
don't claim to support IGD assignment might enforce this by default.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 17:06           ` Alex Williamson
@ 2019-05-16 17:23             ` Robin Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2019-05-16 17:23 UTC (permalink / raw)
  To: Alex Williamson, Auger Eric
  Cc: Jean-Philippe Brucker, will.deacon, linux-kernel, iommu,
	sudeep.holla, dwmw2, eric.auger.pro

On 16/05/2019 18:06, Alex Williamson wrote:
> On Thu, 16 May 2019 14:58:08 +0200
> Auger Eric <eric.auger@redhat.com> wrote:
> 
>> Hi Jean-Philippe,
>>
>> On 5/16/19 2:43 PM, Jean-Philippe Brucker wrote:
>>> On 16/05/2019 12:45, Auger Eric wrote:
>>>> Hi Jean-Philippe,
>>>>
>>>> On 5/16/19 1:16 PM, Jean-Philippe Brucker wrote:
>>>>> On 16/05/2019 11:08, Eric Auger wrote:
>>>>>> Note: At the moment the sysfs ABI is not changed. However I wonder
>>>>>> whether it wouldn't be preferable to report the direct region as
>>>>>> "direct_relaxed" there. At the moment, in case the same direct
>>>>>> region is used by 2 devices, one USB/GFX and another not belonging
>>>>>> to the previous categories, the direct region will be output twice
>>>>>> with "direct" type.
>>>>>>
>>>>>> This would unblock Shameer's series:
>>>>>> [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
>>>>>> https://patchwork.kernel.org/patch/10425309/
>>>>>
>>>>> Thanks for doing this!
>>>>>   
>>>>>> which failed to get pulled for 4.18 merge window due to IGD
>>>>>> device assignment regression.
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - fix direct type check
>>>>>> ---
>>>>>>   drivers/iommu/iommu.c | 12 +++++++-----
>>>>>>   include/linux/iommu.h |  6 ++++++
>>>>>>   2 files changed, 13 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>>>> index ae4ea5c0e6f9..28c3d6351832 100644
>>>>>> --- a/drivers/iommu/iommu.c
>>>>>> +++ b/drivers/iommu/iommu.c
>>>>>> @@ -73,10 +73,11 @@ struct iommu_group_attribute {
>>>>>>   };
>>>>>>   
>>>>>>   static const char * const iommu_group_resv_type_string[] = {
>>>>>> -	[IOMMU_RESV_DIRECT]	= "direct",
>>>>>> -	[IOMMU_RESV_RESERVED]	= "reserved",
>>>>>> -	[IOMMU_RESV_MSI]	= "msi",
>>>>>> -	[IOMMU_RESV_SW_MSI]	= "msi",
>>>>>> +	[IOMMU_RESV_DIRECT]			= "direct",
>>>>>> +	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct",
>>>>>> +	[IOMMU_RESV_RESERVED]			= "reserved",
>>>>>> +	[IOMMU_RESV_MSI]			= "msi",
>>>>>> +	[IOMMU_RESV_SW_MSI]			= "msi",
>>>>>>   };
>>>>>>   
>>>>>>   #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
>>>>>> @@ -573,7 +574,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>>>>>>   		start = ALIGN(entry->start, pg_size);
>>>>>>   		end   = ALIGN(entry->start + entry->length, pg_size);
>>>>>>   
>>>>>> -		if (entry->type != IOMMU_RESV_DIRECT)
>>>>>> +		if (entry->type != IOMMU_RESV_DIRECT &&
>>>>>> +		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
>>>>>
>>>>> I'm trying to understand why you need to create direct mappings at all
>>>>> for these relaxable regions. In the host the region is needed for legacy
>>>>> device features, which are disabled (and cannot be re-enabled) when
>>>>> assigning the device to a guest?
>>>> This follows Kevin's comment in the thread below:
>>>> https://patchwork.kernel.org/patch/10449103/#21957279
>>>>
>>>> In normal DMA API host path, those regions need to be 1-1 mapped. They
>>>> are likely to be accessed by the driver or FW at early boot phase or
>>>> even during execution, depending on features being used.
>>>>
>>>> That's the reason, according to Kevin we couldn't hide them.
>>>>
>>>> We just know that, in general, they are not used anymore when assigning
>>>> the device or if accesses are attempted this generally does not block
>>>> the assignment use case. For example, it is said in
>>>> https://github.com/qemu/qemu/blob/master/docs/igd-assign.txt that in
>>>> legacy IGD assignment use case, there may be "a small numbers of DMAR
>>>> faults when initially assigned".
>>>
>>> Hmm, fair enough. That doesn't sound too good, if the device might
>>> perform arbitrary writes into guest memory once new IOMMU mappings are
>>> in place. I was wondering if we could report some IOVA ranges as
>>> "available but avoid if possible".
>> In Shameer's series we currently reject any vfio dma_map that would fall
>> into an RMRR (hence the regression on existing USB/GFX use case). With
>> the relaxable RMRR info we could imagine to let the userspace choose
>> whether we want to proceed with the dma_map despite the risk or
>> introduce a vfio_iommu_type1 module option (turned off by default for
>> not regressing existing USB/GFX passthrough) that would forbid dma_map
>> on relaxable RMRR regions.
> 
> Yep, the risk that Jean-Philippe mentions is real, the IGD device has
> the stolen memory addresses latched into the hardware and we're unable
> to change that.  What we try to do now is trap page table writes to the
> device and translate them to a VM allocated stolen memory range, which
> is sufficient for getting a BIOS splash screen, but we really want to
> assume that the OS level driver just doesn't use the stolen memory
> range.  There was a time when it seemed like we could assume the Intel
> drivers were heading in that direction, but it seems that's no longer
> an actual goal.  To fully support IGD assignment in a way that isn't as
> fragile as it is today, we'd want to re-export the RMRR out to
> userspace so that QEMU could identity map it into the VM address
> space.  That's not trivial, it's only one of several issues around
> IGD assignment, and we've got GVT-g (Intel vGPUs) now that don't impose
> these requirements, so motivation to tackle the issue is somewhat
> reduced.
> 
> With the changes here, we might want vfio to issue a warning when one
> of these relaxed reserved regions is ignored and we'd probably want a
> module option to opt-in to strict enforcement, where downstreams that
> don't claim to support IGD assignment might enforce this by default.

OK, I guess that resolves my thoughts about "boot" reservations .vs 
"relaxable" ones - clearly they are distinct things, we will ultimately 
want both, and only the former can be hidden from userspace (and ignored 
by VFIO). I'm happy with that; we can come back to boot regions at a 
later date :)

Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 13:23     ` Auger Eric
  2019-05-16 16:53       ` Alex Williamson
@ 2019-05-16 17:53       ` Robin Murphy
  2019-05-17  7:11         ` Auger Eric
  2019-05-20 12:45         ` Auger Eric
  1 sibling, 2 replies; 22+ messages in thread
From: Robin Murphy @ 2019-05-16 17:53 UTC (permalink / raw)
  To: Auger Eric, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, will.deacon, hanjun.guo, sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

On 16/05/2019 14:23, Auger Eric wrote:
> Hi Robin,
> On 5/16/19 2:46 PM, Robin Murphy wrote:
>> On 16/05/2019 11:08, Eric Auger wrote:
>>> Introduce a new type for reserved region. This corresponds
>>> to directly mapped regions which are known to be relaxable
>>> in some specific conditions, such as device assignment use
>>> case. Well known examples are those used by USB controllers
>>> providing PS/2 keyboard emulation for pre-boot BIOS and
>>> early BOOT or RMRRs associated to IGD working in legacy mode.
>>>
>>> Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
>>> from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
>>> RMRR on graphics devices too"), those regions are currently
>>> considered "safe" with respect to device assignment use case
>>> which requires a non direct mapping at IOMMU physical level
>>> (RAM GPA -> HPA mapping).
>>>
>>> Those RMRRs currently exist and sometimes the device is
>>> attempting to access it but this has not been considered
>>> an issue until now.
>>>
>>> However at the moment, iommu_get_group_resv_regions() is
>>> not able to make any difference between directly mapped
>>> regions: those which must be absolutely enforced and those
>>> like above ones which are known as relaxable.
>>>
>>> This is a blocker for reporting severe conflicts between
>>> non relaxable RMRRs (like MSI doorbells) and guest GPA space.
>>>
>>> With this new reserved region type we will be able to use
>>> iommu_get_group_resv_regions() to enumerate the IOVA space
>>> that is usable through the IOMMU API without introducing
>>> regressions with respect to existing device assignment
>>> use cases (USB and IGD).
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> ---
>>>
>>> Note: At the moment the sysfs ABI is not changed. However I wonder
>>> whether it wouldn't be preferable to report the direct region as
>>> "direct_relaxed" there. At the moment, in case the same direct
>>> region is used by 2 devices, one USB/GFX and another not belonging
>>> to the previous categories, the direct region will be output twice
>>> with "direct" type.
>>
>> Hmm, that sounds a bit off - if we have overlapping regions within the
>> same domain, then we need to do some additional pre-processing to adjust
>> them anyway, since any part of a relaxable region which overlaps a
>> non-relaxable region cannot actually be relaxed, and so really should
>> never be described as such.
> In iommu_insert_resv_region(), we are overlapping regions of the same
> type. So iommu_get_group_resv_regions() should return both the relaxable
> region and non relaxable one. I should test this again using a hacked
> kernel though.

We should still consider relaxable regions as being able to merge back 
in to regular direct regions to a degree - If a relaxable region falls 
entirely within a direct one then there's no point exposing it because 
the direct region *has* to take precedence and be enforced. If there is 
an incomplete overlap then we could possibly just trust consumers to see 
it and give the direct region precedence themselves, but since the 
relaxable region is our own in-kernel invention rather than firmware 
gospel I think it would be safer to truncate it to just its 
non-overlapping part.

Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
  2019-05-16 10:08 ` [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger
@ 2019-05-17  4:46   ` Lu Baolu
  2019-05-17  7:07     ` Auger Eric
  0 siblings, 1 reply; 22+ messages in thread
From: Lu Baolu @ 2019-05-17  4:46 UTC (permalink / raw)
  To: Eric Auger, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: baolu.lu, alex.williamson, shameerali.kolothum.thodi

Hi Eric,

On 5/16/19 6:08 PM, Eric Auger wrote:
> Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory
> region type, let's report USB and GFX RMRRs as relaxable ones.
> 
> This allows to have a finer reporting at IOMMU API level of
> reserved memory regions. This will be exploitable by VFIO to
> define the usable IOVA range and detect potential conflicts
> between the guest physical address space and host reserved
> regions.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>   drivers/iommu/intel-iommu.c | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index a36604f4900f..af1d65fdedfc 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5493,7 +5493,9 @@ static void intel_iommu_get_resv_regions(struct device *device,
>   	for_each_rmrr_units(rmrr) {
>   		for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
>   					  i, i_dev) {
> +			struct pci_dev *pdev = to_pci_dev(device);

Probably should be:

struct pci_dev *pdev = dev_is_pci(device) ? to_pci_dev(device) : NULL;

Best regards,
Lu Baolu


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
  2019-05-17  4:46   ` Lu Baolu
@ 2019-05-17  7:07     ` Auger Eric
  0 siblings, 0 replies; 22+ messages in thread
From: Auger Eric @ 2019-05-17  7:07 UTC (permalink / raw)
  To: Lu Baolu, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Hi Lu,

On 5/17/19 6:46 AM, Lu Baolu wrote:
> Hi Eric,
> 
> On 5/16/19 6:08 PM, Eric Auger wrote:
>> Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory
>> region type, let's report USB and GFX RMRRs as relaxable ones.
>>
>> This allows to have a finer reporting at IOMMU API level of
>> reserved memory regions. This will be exploitable by VFIO to
>> define the usable IOVA range and detect potential conflicts
>> between the guest physical address space and host reserved
>> regions.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   drivers/iommu/intel-iommu.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index a36604f4900f..af1d65fdedfc 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5493,7 +5493,9 @@ static void intel_iommu_get_resv_regions(struct
>> device *device,
>>       for_each_rmrr_units(rmrr) {
>>           for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
>>                         i, i_dev) {
>> +            struct pci_dev *pdev = to_pci_dev(device);
> 
> Probably should be:
> 
> struct pci_dev *pdev = dev_is_pci(device) ? to_pci_dev(device) : NULL;

That's correct. I will fix that asap.

Thanks!

Eric
> 
> Best regards,
> Lu Baolu
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 17:53       ` Robin Murphy
@ 2019-05-17  7:11         ` Auger Eric
  2019-05-20 12:45         ` Auger Eric
  1 sibling, 0 replies; 22+ messages in thread
From: Auger Eric @ 2019-05-17  7:11 UTC (permalink / raw)
  To: Robin Murphy, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, will.deacon, hanjun.guo, sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Hi Robin,
On 5/16/19 7:53 PM, Robin Murphy wrote:
> On 16/05/2019 14:23, Auger Eric wrote:
>> Hi Robin,
>> On 5/16/19 2:46 PM, Robin Murphy wrote:
>>> On 16/05/2019 11:08, Eric Auger wrote:
>>>> Introduce a new type for reserved region. This corresponds
>>>> to directly mapped regions which are known to be relaxable
>>>> in some specific conditions, such as device assignment use
>>>> case. Well known examples are those used by USB controllers
>>>> providing PS/2 keyboard emulation for pre-boot BIOS and
>>>> early BOOT or RMRRs associated to IGD working in legacy mode.
>>>>
>>>> Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
>>>> from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
>>>> RMRR on graphics devices too"), those regions are currently
>>>> considered "safe" with respect to device assignment use case
>>>> which requires a non direct mapping at IOMMU physical level
>>>> (RAM GPA -> HPA mapping).
>>>>
>>>> Those RMRRs currently exist and sometimes the device is
>>>> attempting to access it but this has not been considered
>>>> an issue until now.
>>>>
>>>> However at the moment, iommu_get_group_resv_regions() is
>>>> not able to make any difference between directly mapped
>>>> regions: those which must be absolutely enforced and those
>>>> like above ones which are known as relaxable.
>>>>
>>>> This is a blocker for reporting severe conflicts between
>>>> non relaxable RMRRs (like MSI doorbells) and guest GPA space.
>>>>
>>>> With this new reserved region type we will be able to use
>>>> iommu_get_group_resv_regions() to enumerate the IOVA space
>>>> that is usable through the IOMMU API without introducing
>>>> regressions with respect to existing device assignment
>>>> use cases (USB and IGD).
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> Note: At the moment the sysfs ABI is not changed. However I wonder
>>>> whether it wouldn't be preferable to report the direct region as
>>>> "direct_relaxed" there. At the moment, in case the same direct
>>>> region is used by 2 devices, one USB/GFX and another not belonging
>>>> to the previous categories, the direct region will be output twice
>>>> with "direct" type.
>>>
>>> Hmm, that sounds a bit off - if we have overlapping regions within the
>>> same domain, then we need to do some additional pre-processing to adjust
>>> them anyway, since any part of a relaxable region which overlaps a
>>> non-relaxable region cannot actually be relaxed, and so really should
>>> never be described as such.
>> In iommu_insert_resv_region(), we are overlapping regions of the same
>> type. So iommu_get_group_resv_regions() should return both the relaxable
>> region and non relaxable one. I should test this again using a hacked
>> kernel though.
> 
> We should still consider relaxable regions as being able to merge back
> in to regular direct regions to a degree - If a relaxable region falls
> entirely within a direct one then there's no point exposing it because
> the direct region *has* to take precedence and be enforced. If there is
> an incomplete overlap then we could possibly just trust consumers to see
> it and give the direct region precedence themselves, but since the
> relaxable region is our own in-kernel invention rather than firmware
> gospel I think it would be safer to truncate it to just its
> non-overlapping part.

I see your point and I agree. I will implement that.

Thanks

Eric
> 
> Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-16 17:53       ` Robin Murphy
  2019-05-17  7:11         ` Auger Eric
@ 2019-05-20 12:45         ` Auger Eric
  1 sibling, 0 replies; 22+ messages in thread
From: Auger Eric @ 2019-05-20 12:45 UTC (permalink / raw)
  To: Robin Murphy, eric.auger.pro, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, will.deacon, hanjun.guo, sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Hi Robin,

On 5/16/19 7:53 PM, Robin Murphy wrote:
> On 16/05/2019 14:23, Auger Eric wrote:
>> Hi Robin,
>> On 5/16/19 2:46 PM, Robin Murphy wrote:
>>> On 16/05/2019 11:08, Eric Auger wrote:
>>>> Introduce a new type for reserved region. This corresponds
>>>> to directly mapped regions which are known to be relaxable
>>>> in some specific conditions, such as device assignment use
>>>> case. Well known examples are those used by USB controllers
>>>> providing PS/2 keyboard emulation for pre-boot BIOS and
>>>> early BOOT or RMRRs associated to IGD working in legacy mode.
>>>>
>>>> Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
>>>> from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
>>>> RMRR on graphics devices too"), those regions are currently
>>>> considered "safe" with respect to device assignment use case
>>>> which requires a non direct mapping at IOMMU physical level
>>>> (RAM GPA -> HPA mapping).
>>>>
>>>> Those RMRRs currently exist and sometimes the device is
>>>> attempting to access it but this has not been considered
>>>> an issue until now.
>>>>
>>>> However at the moment, iommu_get_group_resv_regions() is
>>>> not able to make any difference between directly mapped
>>>> regions: those which must be absolutely enforced and those
>>>> like above ones which are known as relaxable.
>>>>
>>>> This is a blocker for reporting severe conflicts between
>>>> non relaxable RMRRs (like MSI doorbells) and guest GPA space.
>>>>
>>>> With this new reserved region type we will be able to use
>>>> iommu_get_group_resv_regions() to enumerate the IOVA space
>>>> that is usable through the IOMMU API without introducing
>>>> regressions with respect to existing device assignment
>>>> use cases (USB and IGD).
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> Note: At the moment the sysfs ABI is not changed. However I wonder
>>>> whether it wouldn't be preferable to report the direct region as
>>>> "direct_relaxed" there. At the moment, in case the same direct
>>>> region is used by 2 devices, one USB/GFX and another not belonging
>>>> to the previous categories, the direct region will be output twice
>>>> with "direct" type.
>>>
>>> Hmm, that sounds a bit off - if we have overlapping regions within the
>>> same domain, then we need to do some additional pre-processing to adjust
>>> them anyway, since any part of a relaxable region which overlaps a
>>> non-relaxable region cannot actually be relaxed, and so really should
>>> never be described as such.
>> In iommu_insert_resv_region(), we are overlapping regions of the same
>> type. So iommu_get_group_resv_regions() should return both the relaxable
>> region and non relaxable one. I should test this again using a hacked
>> kernel though.
> 
> We should still consider relaxable regions as being able to merge back
> in to regular direct regions to a degree - If a relaxable region falls
> entirely within a direct one then there's no point exposing it because
> the direct region *has* to take precedence and be enforced. If there is
> an incomplete overlap then we could possibly just trust consumers to see
> it and give the direct region precedence themselves, but since the
> relaxable region is our own in-kernel invention rather than firmware
> gospel I think it would be safer to truncate it to just its
> non-overlapping part.

While I understand your reasoning above, and I tend to agree this would
be nice to have, I have spent some time looking at the algo to
merge/split/overlay regions and I am afraid that this will bring a huge
complexity in the insertion function (I am also afraid the existing data
structs may not be well adapted overall). So I am now a bit reluctant to
add such complexty for shrinking relaxable regions that we will
eventually ignore in any case.

Thanks

Eric
> 
> Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-05-20 12:45 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16 10:08 [PATCH v3 0/7] RMRR related fixes and enhancements Eric Auger
2019-05-16 10:08 ` [PATCH v3 1/7] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
2019-05-16 10:08 ` [PATCH v3 2/7] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
2019-05-16 10:08 ` [PATCH v3 3/7] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper Eric Auger
2019-05-16 10:08 ` [PATCH v3 4/7] iommu/vt-d: Handle RMRR with PCI bridge device scopes Eric Auger
2019-05-16 10:08 ` [PATCH v3 5/7] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions Eric Auger
2019-05-16 10:08 ` [PATCH v3 6/7] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
2019-05-16 11:16   ` Jean-Philippe Brucker
2019-05-16 11:45     ` Auger Eric
2019-05-16 12:43       ` Jean-Philippe Brucker
2019-05-16 12:58         ` Auger Eric
2019-05-16 17:06           ` Alex Williamson
2019-05-16 17:23             ` Robin Murphy
2019-05-16 12:46   ` Robin Murphy
2019-05-16 13:23     ` Auger Eric
2019-05-16 16:53       ` Alex Williamson
2019-05-16 17:53       ` Robin Murphy
2019-05-17  7:11         ` Auger Eric
2019-05-20 12:45         ` Auger Eric
2019-05-16 10:08 ` [PATCH v3 7/7] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger
2019-05-17  4:46   ` Lu Baolu
2019-05-17  7:07     ` Auger Eric

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).