linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/8] RMRR related fixes and enhancements
@ 2019-05-27  8:55 Eric Auger
  2019-05-27  8:55 ` [PATCH v4 1/8] iommu: Fix a leak in iommu_insert_resv_region Eric Auger
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Currently the Intel reserved region is attached to the
RMRR unit and when building the list of RMRR seen by a device
we link this unique reserved region without taking care of
potential multiple usage of this reserved region by several devices.

Also while reading the vtd spec it is unclear to me whether
the RMRR device scope referenced by an RMRR ACPI struct could
be a PCI-PCI bridge, in which case I think we also need to
check the device belongs to the PCI sub-hierarchy of the device
referenced in the scope. This would be true for device_has_rmrr()
and intel_iommu_get_resv_regions().

Last, the VFIO subsystem would need to compute the usable IOVA range
by querying the iommu_get_group_resv_regions() API. This would allow,
for instance, to report potential conflicts between the guest physical
address space and host reserved regions.

However iommu_get_group_resv_regions() currently fails to differentiate
RMRRs that are known safe for device assignment and RMRRs that must be
enforced. So we introduce a new reserved memory region type (relaxable),
reported when associated to an USB or GFX device. The last 2 patches aim
at unblocking [1] which is stuck since 4.18.

[1-6] are fixes
[7-8] are enhancements

The two parts can be considered separately if needed.

References:
[1] [PATCH v6 0/7] vfio/type1: Add support for valid iova list management
    https://patchwork.kernel.org/patch/10425309/

Branch: This series is available at:
https://github.com/eauger/linux/tree/v5.2-rc2-rmrr-v4

History:

v3 -> v4:
- added "iommu: Fix a leak in iommu_insert_resv_region"
- introduced device_rmrr_is_relaxable and fixed to_pci_dev call
  without checking dev_is_pci
- Despite Robin suggested to hide direct relaxable behind direct
  ones, I think this would lead to a very complex implementation
  of iommu_insert_resv_region while in general the relaxable
  regions are going to be ignored by the caller. By the way I
  found a leak in this function, hence the new first patch

v2 -> v3:
s/||/&& in iommu_group_create_direct_mappings

v1 -> v2:
- introduce is_downstream_to_pci_bridge() in a separate patch, change param
  names and add kerneldoc comment
- add 6,7


Eric Auger (8):
  iommu: Fix a leak in iommu_insert_resv_region
  iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
  iommu/vt-d: Duplicate iommu_resv_region objects per device list
  iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
  iommu/vt-d: Handle RMRR with PCI bridge device scopes
  iommu/vt-d: Handle PCI bridge RMRR device scopes in
    intel_iommu_get_resv_regions
  iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  iommu/vt-d: Differentiate relaxable and non relaxable RMRRs

 .../ABI/testing/sysfs-kernel-iommu_groups     |   9 ++
 drivers/acpi/arm64/iort.c                     |   3 +-
 drivers/iommu/amd_iommu.c                     |   7 +-
 drivers/iommu/arm-smmu-v3.c                   |   2 +-
 drivers/iommu/arm-smmu.c                      |   2 +-
 drivers/iommu/intel-iommu.c                   | 127 ++++++++++++------
 drivers/iommu/iommu.c                         |  27 ++--
 include/linux/iommu.h                         |   8 +-
 8 files changed, 128 insertions(+), 57 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/8] iommu: Fix a leak in iommu_insert_resv_region
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27  8:55 ` [PATCH v4 2/8] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

In case we expand an existing region, we unlink
this latter and insert the larger one. In
that case we should free the original region after
the insertion. Also we can immediately return.

Fixes: 6c65fb318e8b ("iommu: iommu_get_group_resv_regions")

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/iommu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 67ee6623f9b2..f961f71e4ff8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -237,18 +237,21 @@ static int iommu_insert_resv_region(struct iommu_resv_region *new,
 			pos = pos->next;
 		} else if ((start >= a) && (end <= b)) {
 			if (new->type == type)
-				goto done;
+				return 0;
 			else
 				pos = pos->next;
 		} else {
 			if (new->type == type) {
 				phys_addr_t new_start = min(a, start);
 				phys_addr_t new_end = max(b, end);
+				int ret;
 
 				list_del(&entry->list);
 				entry->start = new_start;
 				entry->length = new_end - new_start + 1;
-				iommu_insert_resv_region(entry, regions);
+				ret = iommu_insert_resv_region(entry, regions);
+				kfree(entry);
+				return ret;
 			} else {
 				pos = pos->next;
 			}
@@ -261,7 +264,6 @@ static int iommu_insert_resv_region(struct iommu_resv_region *new,
 		return -ENOMEM;
 
 	list_add_tail(&region->list, pos);
-done:
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/8] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region()
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
  2019-05-27  8:55 ` [PATCH v4 1/8] iommu: Fix a leak in iommu_insert_resv_region Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27  8:55 ` [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

We plan to call iommu_alloc_resv_region in a non preemptible section.
Pass a GFP flag to the function and update all the call sites.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/acpi/arm64/iort.c   | 3 ++-
 drivers/iommu/amd_iommu.c   | 7 ++++---
 drivers/iommu/arm-smmu-v3.c | 2 +-
 drivers/iommu/arm-smmu.c    | 2 +-
 drivers/iommu/intel-iommu.c | 4 ++--
 drivers/iommu/iommu.c       | 7 ++++---
 include/linux/iommu.h       | 2 +-
 7 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index b5390b4c9ade..2d922db4978e 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -843,7 +843,8 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 			struct iommu_resv_region *region;
 
 			region = iommu_alloc_resv_region(base + SZ_64K, SZ_64K,
-							 prot, IOMMU_RESV_MSI);
+							 prot, IOMMU_RESV_MSI,
+							 GFP_KERNEL);
 			if (region) {
 				list_add_tail(&region->list, head);
 				resv++;
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 09c9e45f7fa2..f2eb8e9cd8a6 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3136,7 +3136,8 @@ static void amd_iommu_get_resv_regions(struct device *dev,
 			type = IOMMU_RESV_RESERVED;
 
 		region = iommu_alloc_resv_region(entry->address_start,
-						 length, prot, type);
+						 length, prot, type,
+						 GFP_KERNEL);
 		if (!region) {
 			dev_err(dev, "Out of memory allocating dm-regions\n");
 			return;
@@ -3146,14 +3147,14 @@ static void amd_iommu_get_resv_regions(struct device *dev,
 
 	region = iommu_alloc_resv_region(MSI_RANGE_START,
 					 MSI_RANGE_END - MSI_RANGE_START + 1,
-					 0, IOMMU_RESV_MSI);
+					 0, IOMMU_RESV_MSI, GFP_KERNEL);
 	if (!region)
 		return;
 	list_add_tail(&region->list, head);
 
 	region = iommu_alloc_resv_region(HT_RANGE_START,
 					 HT_RANGE_END - HT_RANGE_START + 1,
-					 0, IOMMU_RESV_RESERVED);
+					 0, IOMMU_RESV_RESERVED, GFP_KERNEL);
 	if (!region)
 		return;
 	list_add_tail(&region->list, head);
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4d5a694f02c2..f9b1279ef5bf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2226,7 +2226,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
 	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-					 prot, IOMMU_RESV_SW_MSI);
+					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
 	if (!region)
 		return;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0a28b3..646e76813e91 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1670,7 +1670,7 @@ static void arm_smmu_get_resv_regions(struct device *dev,
 	int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
 
 	region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
-					 prot, IOMMU_RESV_SW_MSI);
+					 prot, IOMMU_RESV_SW_MSI, GFP_KERNEL);
 	if (!region)
 		return;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a209199f3af6..2be36dff189a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4220,7 +4220,7 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 
 	length = rmrr->end_address - rmrr->base_address + 1;
 	rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
-					      IOMMU_RESV_DIRECT);
+					      IOMMU_RESV_DIRECT, GFP_KERNEL);
 	if (!rmrru->resv)
 		goto free_rmrru;
 
@@ -5489,7 +5489,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
 
 	reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
 				      IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
-				      0, IOMMU_RESV_MSI);
+				      0, IOMMU_RESV_MSI, GFP_KERNEL);
 	if (!reg)
 		return;
 	list_add_tail(&reg->list, head);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f961f71e4ff8..7dd1a57217e3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -259,7 +259,7 @@ static int iommu_insert_resv_region(struct iommu_resv_region *new,
 	}
 insert:
 	region = iommu_alloc_resv_region(new->start, new->length,
-					 new->prot, new->type);
+					 new->prot, new->type, GFP_KERNEL);
 	if (!region)
 		return -ENOMEM;
 
@@ -1893,11 +1893,12 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
 
 struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
 						  size_t length, int prot,
-						  enum iommu_resv_type type)
+						  enum iommu_resv_type type,
+						  gfp_t flags)
 {
 	struct iommu_resv_region *region;
 
-	region = kzalloc(sizeof(*region), GFP_KERNEL);
+	region = kzalloc(sizeof(*region), flags);
 	if (!region)
 		return NULL;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a815cf6f6f47..ba91666998fb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -364,7 +364,7 @@ extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
 extern int iommu_request_dm_for_dev(struct device *dev);
 extern struct iommu_resv_region *
 iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot,
-			enum iommu_resv_type type);
+			enum iommu_resv_type type, gfp_t flags);
 extern int iommu_get_group_resv_regions(struct iommu_group *group,
 					struct list_head *head);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
  2019-05-27  8:55 ` [PATCH v4 1/8] iommu: Fix a leak in iommu_insert_resv_region Eric Auger
  2019-05-27  8:55 ` [PATCH v4 2/8] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27 15:23   ` Joerg Roedel
  2019-05-27  8:55 ` [PATCH v4 4/8] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper Eric Auger
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

intel_iommu_get_resv_regions() aims to return the list of
reserved regions accessible by a given @device. However several
devices can access the same reserved memory region and when
building the list it is not safe to use a single iommu_resv_region
object, whose container is the RMRR. This iommu_resv_region must
be duplicated per device reserved region list.

Let's remove the struct iommu_resv_region from the RMRR unit
and allocate the iommu_resv_region directly in
intel_iommu_get_resv_regions().

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2be36dff189a..590a0e78d11d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -322,7 +322,6 @@ struct dmar_rmrr_unit {
 	u64	end_address;		/* reserved end address */
 	struct dmar_dev_scope *devices;	/* target devices */
 	int	devices_cnt;		/* target device count */
-	struct iommu_resv_region *resv; /* reserved region handle */
 };
 
 struct dmar_atsr_unit {
@@ -4205,7 +4204,6 @@ static inline void init_iommu_pm_ops(void) {}
 int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 {
 	struct acpi_dmar_reserved_memory *rmrr;
-	int prot = DMA_PTE_READ|DMA_PTE_WRITE;
 	struct dmar_rmrr_unit *rmrru;
 	size_t length;
 
@@ -4219,22 +4217,16 @@ int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 	rmrru->end_address = rmrr->end_address;
 
 	length = rmrr->end_address - rmrr->base_address + 1;
-	rmrru->resv = iommu_alloc_resv_region(rmrr->base_address, length, prot,
-					      IOMMU_RESV_DIRECT, GFP_KERNEL);
-	if (!rmrru->resv)
-		goto free_rmrru;
 
 	rmrru->devices = dmar_alloc_dev_scope((void *)(rmrr + 1),
 				((void *)rmrr) + rmrr->header.length,
 				&rmrru->devices_cnt);
 	if (rmrru->devices_cnt && rmrru->devices == NULL)
-		goto free_all;
+		goto free_rmrru;
 
 	list_add(&rmrru->list, &dmar_rmrr_units);
 
 	return 0;
-free_all:
-	kfree(rmrru->resv);
 free_rmrru:
 	kfree(rmrru);
 out:
@@ -4452,7 +4444,6 @@ static void intel_iommu_free_dmars(void)
 	list_for_each_entry_safe(rmrru, rmrr_n, &dmar_rmrr_units, list) {
 		list_del(&rmrru->list);
 		dmar_free_dev_scope(&rmrru->devices, &rmrru->devices_cnt);
-		kfree(rmrru->resv);
 		kfree(rmrru);
 	}
 
@@ -5470,6 +5461,7 @@ static void intel_iommu_remove_device(struct device *dev)
 static void intel_iommu_get_resv_regions(struct device *device,
 					 struct list_head *head)
 {
+	int prot = DMA_PTE_READ|DMA_PTE_WRITE;
 	struct iommu_resv_region *reg;
 	struct dmar_rmrr_unit *rmrr;
 	struct device *i_dev;
@@ -5479,10 +5471,21 @@ static void intel_iommu_get_resv_regions(struct device *device,
 	for_each_rmrr_units(rmrr) {
 		for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
 					  i, i_dev) {
+			struct iommu_resv_region *resv;
+			size_t length;
+
 			if (i_dev != device)
 				continue;
 
-			list_add_tail(&rmrr->resv->list, head);
+			length = rmrr->end_address - rmrr->base_address + 1;
+			resv = iommu_alloc_resv_region(rmrr->base_address,
+						       length, prot,
+						       IOMMU_RESV_DIRECT,
+						       GFP_ATOMIC);
+			if (!resv)
+				break;
+
+			list_add_tail(&resv->list, head);
 		}
 	}
 	rcu_read_unlock();
@@ -5500,10 +5503,8 @@ static void intel_iommu_put_resv_regions(struct device *dev,
 {
 	struct iommu_resv_region *entry, *next;
 
-	list_for_each_entry_safe(entry, next, head, list) {
-		if (entry->type == IOMMU_RESV_MSI)
-			kfree(entry);
-	}
+	list_for_each_entry_safe(entry, next, head, list)
+		kfree(entry);
 }
 
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/8] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
                   ` (2 preceding siblings ...)
  2019-05-27  8:55 ` [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27  8:55 ` [PATCH v4 5/8] iommu/vt-d: Handle RMRR with PCI bridge device scopes Eric Auger
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Several call sites are about to check whether a device belongs
to the PCI sub-hierarchy of a candidate PCI-PCI bridge.
Introduce an helper to perform that check.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 590a0e78d11d..15c2f9677491 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -736,12 +736,39 @@ static int iommu_dummy(struct device *dev)
 	return dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO;
 }
 
+/* is_downstream_to_pci_bridge - test if a device belongs to the
+ * PCI sub-hierarchy of a candidate PCI-PCI bridge
+ *
+ * @dev: candidate PCI device belonging to @bridge PCI sub-hierarchy
+ * @bridge: the candidate PCI-PCI bridge
+ *
+ * Return: true if @dev belongs to @bridge PCI sub-hierarchy
+ */
+static bool
+is_downstream_to_pci_bridge(struct device *dev, struct device *bridge)
+{
+	struct pci_dev *pdev, *pbridge;
+
+	if (!dev_is_pci(dev) || !dev_is_pci(bridge))
+		return false;
+
+	pdev = to_pci_dev(dev);
+	pbridge = to_pci_dev(bridge);
+
+	if (pbridge->subordinate &&
+	    pbridge->subordinate->number <= pdev->bus->number &&
+	    pbridge->subordinate->busn_res.end >= pdev->bus->number)
+		return true;
+
+	return false;
+}
+
 static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 {
 	struct dmar_drhd_unit *drhd = NULL;
 	struct intel_iommu *iommu;
 	struct device *tmp;
-	struct pci_dev *ptmp, *pdev = NULL;
+	struct pci_dev *pdev = NULL;
 	u16 segment = 0;
 	int i;
 
@@ -787,13 +814,7 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devf
 				goto out;
 			}
 
-			if (!pdev || !dev_is_pci(tmp))
-				continue;
-
-			ptmp = to_pci_dev(tmp);
-			if (ptmp->subordinate &&
-			    ptmp->subordinate->number <= pdev->bus->number &&
-			    ptmp->subordinate->busn_res.end >= pdev->bus->number)
+			if (is_downstream_to_pci_bridge(dev, tmp))
 				goto got_pdev;
 		}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/8] iommu/vt-d: Handle RMRR with PCI bridge device scopes
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
                   ` (3 preceding siblings ...)
  2019-05-27  8:55 ` [PATCH v4 4/8] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27  8:55 ` [PATCH v4 6/8] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions Eric Auger
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

When reading the vtd specification and especially the
Reserved Memory Region Reporting Structure chapter,
it is not obvious a device scope element cannot be a
PCI-PCI bridge, in which case all downstream ports are
likely to access the reserved memory region. Let's handle
this case in device_has_rmrr.

Fixes: ea2447f700ca ("intel-iommu: Prevent devices with RMRRs from being placed into SI Domain")

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v1 -> v2:
- is_downstream_to_pci_bridge helper introduced in a separate patch
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 15c2f9677491..7ed820e79313 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2910,7 +2910,8 @@ static bool device_has_rmrr(struct device *dev)
 		 */
 		for_each_active_dev_scope(rmrr->devices,
 					  rmrr->devices_cnt, i, tmp)
-			if (tmp == dev) {
+			if (tmp == dev ||
+			    is_downstream_to_pci_bridge(dev, tmp)) {
 				rcu_read_unlock();
 				return true;
 			}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 6/8] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
                   ` (4 preceding siblings ...)
  2019-05-27  8:55 ` [PATCH v4 5/8] iommu/vt-d: Handle RMRR with PCI bridge device scopes Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27  8:55 ` [PATCH v4 7/8] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
  2019-05-27  8:55 ` [PATCH v4 8/8] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

In the case the RMRR device scope is a PCI-PCI bridge, let's check
the device belongs to the PCI sub-hierarchy.

Fixes: 0659b8dc45a6 ("iommu/vt-d: Implement reserved region get/put callbacks")

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 7ed820e79313..a36604f4900f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5496,7 +5496,8 @@ static void intel_iommu_get_resv_regions(struct device *device,
 			struct iommu_resv_region *resv;
 			size_t length;
 
-			if (i_dev != device)
+			if (i_dev != device &&
+			    !is_downstream_to_pci_bridge(device, i_dev))
 				continue;
 
 			length = rmrr->end_address - rmrr->base_address + 1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 7/8] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
                   ` (5 preceding siblings ...)
  2019-05-27  8:55 ` [PATCH v4 6/8] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  2019-05-27  8:55 ` [PATCH v4 8/8] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Introduce a new type for reserved region. This corresponds
to directly mapped regions which are known to be relaxable
in some specific conditions, such as device assignment use
case. Well known examples are those used by USB controllers
providing PS/2 keyboard emulation for pre-boot BIOS and
early BOOT or RMRRs associated to IGD working in legacy mode.

Since commit c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs
from IOMMU API domains") and commit 18436afdc11a ("iommu/vt-d: Allow
RMRR on graphics devices too"), those regions are currently
considered "safe" with respect to device assignment use case
which requires a non direct mapping at IOMMU physical level
(RAM GPA -> HPA mapping).

Those RMRRs currently exist and sometimes the device is
attempting to access it but this has not been considered
an issue until now.

However at the moment, iommu_get_group_resv_regions() is
not able to make any difference between directly mapped
regions: those which must be absolutely enforced and those
like above ones which are known as relaxable.

This is a blocker for reporting severe conflicts between
non relaxable RMRRs (like MSI doorbells) and guest GPA space.

With this new reserved region type we will be able to use
iommu_get_group_resv_regions() to enumerate the IOVA space
that is usable through the IOMMU API without introducing
regressions with respect to existing device assignment
use cases (USB and IGD).

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v3 -> v4:
- expose the relaxable regions as "direct-relaxable" in the sysfs
- update ABI documentation

v2 -> v3:
- fix direct type check
---
 Documentation/ABI/testing/sysfs-kernel-iommu_groups |  9 +++++++++
 drivers/iommu/iommu.c                               | 12 +++++++-----
 include/linux/iommu.h                               |  6 ++++++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index 35c64e00b35c..017f5bc3920c 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -24,3 +24,12 @@ Description:    /sys/kernel/iommu_groups/reserved_regions list IOVA
 		region is described on a single line: the 1st field is
 		the base IOVA, the second is the end IOVA and the third
 		field describes the type of the region.
+
+What:		/sys/kernel/iommu_groups/reserved_regions
+Date: 		June 2019
+KernelVersion:  v5.3
+Contact: 	Eric Auger <eric.auger@redhat.com>
+Description:    In case an RMRR is used only by graphics or USB devices
+		it is now exposed as "direct-relaxable" instead of "direct".
+		In device assignment use case, for instance, those RMRR
+		are considered to be relaxable and safe.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7dd1a57217e3..130a6936d6c7 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -73,10 +73,11 @@ struct iommu_group_attribute {
 };
 
 static const char * const iommu_group_resv_type_string[] = {
-	[IOMMU_RESV_DIRECT]	= "direct",
-	[IOMMU_RESV_RESERVED]	= "reserved",
-	[IOMMU_RESV_MSI]	= "msi",
-	[IOMMU_RESV_SW_MSI]	= "msi",
+	[IOMMU_RESV_DIRECT]			= "direct",
+	[IOMMU_RESV_DIRECT_RELAXABLE]		= "direct-relaxable",
+	[IOMMU_RESV_RESERVED]			= "reserved",
+	[IOMMU_RESV_MSI]			= "msi",
+	[IOMMU_RESV_SW_MSI]			= "msi",
 };
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)		\
@@ -575,7 +576,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
 		start = ALIGN(entry->start, pg_size);
 		end   = ALIGN(entry->start + entry->length, pg_size);
 
-		if (entry->type != IOMMU_RESV_DIRECT)
+		if (entry->type != IOMMU_RESV_DIRECT &&
+		    entry->type != IOMMU_RESV_DIRECT_RELAXABLE)
 			continue;
 
 		for (addr = start; addr < end; addr += pg_size) {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ba91666998fb..14a521f85f14 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -135,6 +135,12 @@ enum iommu_attr {
 enum iommu_resv_type {
 	/* Memory regions which must be mapped 1:1 at all times */
 	IOMMU_RESV_DIRECT,
+	/*
+	 * Memory regions which are advertised to be 1:1 but are
+	 * commonly considered relaxable in some conditions,
+	 * for instance in device assignment use case (USB, Graphics)
+	 */
+	IOMMU_RESV_DIRECT_RELAXABLE,
 	/* Arbitrary "never map this or give it to a device" address ranges */
 	IOMMU_RESV_RESERVED,
 	/* Hardware MSI region (untranslated) */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 8/8] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs
  2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
                   ` (6 preceding siblings ...)
  2019-05-27  8:55 ` [PATCH v4 7/8] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
@ 2019-05-27  8:55 ` Eric Auger
  7 siblings, 0 replies; 11+ messages in thread
From: Eric Auger @ 2019-05-27  8:55 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, joro, iommu, linux-kernel, dwmw2,
	lorenzo.pieralisi, robin.murphy, will.deacon, hanjun.guo,
	sudeep.holla
  Cc: alex.williamson, shameerali.kolothum.thodi

Now we have a new IOMMU_RESV_DIRECT_RELAXABLE reserved memory
region type, let's report USB and GFX RMRRs as relaxable ones.

We introduce a new device_rmrr_is_relaxable() helper to check
whether the rmrr belongs to the relaxable category.

This allows to have a finer reporting at IOMMU API level of
reserved memory regions. This will be exploitable by VFIO to
define the usable IOVA range and detect potential conflicts
between the guest physical address space and host reserved
regions.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v3 -> v4:
- introduce device_rmrr_is_relaxable and reshuffle the comments
---
 drivers/iommu/intel-iommu.c | 55 +++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a36604f4900f..ed24a11accb8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2920,6 +2920,36 @@ static bool device_has_rmrr(struct device *dev)
 	return false;
 }
 
+/*
+ * device_rmrr_is_relaxable - Test whether the RMRR of this device
+ * is relaxable (ie. is allowed to be not enforced under some conditions)
+ *
+ * @dev: device handle
+ *
+ * We assume that PCI USB devices with RMRRs have them largely
+ * for historical reasons and that the RMRR space is not actively used post
+ * boot.  This exclusion may change if vendors begin to abuse it.
+ *
+ * The same exception is made for graphics devices, with the requirement that
+ * any use of the RMRR regions will be torn down before assigning the device
+ * to a guest.
+ *
+ * Return: true if the RMRR is relaxable
+ */
+static bool device_rmrr_is_relaxable(struct device *dev)
+{
+	struct pci_dev *pdev;
+
+	if (!dev_is_pci(dev))
+		return false;
+
+	pdev = to_pci_dev(dev);
+	if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
+		return true;
+	else
+		return false;
+}
+
 /*
  * There are a couple cases where we need to restrict the functionality of
  * devices associated with RMRRs.  The first is when evaluating a device for
@@ -2934,25 +2964,16 @@ static bool device_has_rmrr(struct device *dev)
  * We therefore prevent devices associated with an RMRR from participating in
  * the IOMMU API, which eliminates them from device assignment.
  *
- * In both cases we assume that PCI USB devices with RMRRs have them largely
- * for historical reasons and that the RMRR space is not actively used post
- * boot.  This exclusion may change if vendors begin to abuse it.
- *
- * The same exception is made for graphics devices, with the requirement that
- * any use of the RMRR regions will be torn down before assigning the device
- * to a guest.
+ * In both cases, devices which have relaxable RMRRs are not concerned by this
+ * restriction. See device_rmrr_is_relaxable comment.
  */
 static bool device_is_rmrr_locked(struct device *dev)
 {
 	if (!device_has_rmrr(dev))
 		return false;
 
-	if (dev_is_pci(dev)) {
-		struct pci_dev *pdev = to_pci_dev(dev);
-
-		if (IS_USB_DEVICE(pdev) || IS_GFX_DEVICE(pdev))
-			return false;
-	}
+	if (device_rmrr_is_relaxable(dev))
+		return false;
 
 	return true;
 }
@@ -5494,6 +5515,7 @@ static void intel_iommu_get_resv_regions(struct device *device,
 		for_each_active_dev_scope(rmrr->devices, rmrr->devices_cnt,
 					  i, i_dev) {
 			struct iommu_resv_region *resv;
+			enum iommu_resv_type type;
 			size_t length;
 
 			if (i_dev != device &&
@@ -5501,9 +5523,12 @@ static void intel_iommu_get_resv_regions(struct device *device,
 				continue;
 
 			length = rmrr->end_address - rmrr->base_address + 1;
+
+			type = device_rmrr_is_relaxable(device) ?
+				IOMMU_RESV_DIRECT_RELAXABLE : IOMMU_RESV_DIRECT;
+
 			resv = iommu_alloc_resv_region(rmrr->base_address,
-						       length, prot,
-						       IOMMU_RESV_DIRECT,
+						       length, prot, type,
 						       GFP_ATOMIC);
 			if (!resv)
 				break;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list
  2019-05-27  8:55 ` [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
@ 2019-05-27 15:23   ` Joerg Roedel
  2019-05-28 11:51     ` Auger Eric
  0 siblings, 1 reply; 11+ messages in thread
From: Joerg Roedel @ 2019-05-27 15:23 UTC (permalink / raw)
  To: Eric Auger
  Cc: eric.auger.pro, iommu, linux-kernel, dwmw2, lorenzo.pieralisi,
	robin.murphy, will.deacon, hanjun.guo, sudeep.holla,
	alex.williamson, shameerali.kolothum.thodi

On Mon, May 27, 2019 at 10:55:36AM +0200, Eric Auger wrote:
> -			list_add_tail(&rmrr->resv->list, head);
> +			length = rmrr->end_address - rmrr->base_address + 1;
> +			resv = iommu_alloc_resv_region(rmrr->base_address,
> +						       length, prot,
> +						       IOMMU_RESV_DIRECT,
> +						       GFP_ATOMIC);
> +			if (!resv)
> +				break;
> +
> +			list_add_tail(&resv->list, head);

Okay, so this happens in a rcu_read_locked section and must be atomic,
but I don't like this extra parameter to iommu_alloc_resv_region().

How about replacing the rcu-lock with the dmar_global_lock, which
protects against changes to the global rmrr list? This will make this
loop preemptible and taking the global lock is okay because this
function is in no way performance relevant.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list
  2019-05-27 15:23   ` Joerg Roedel
@ 2019-05-28 11:51     ` Auger Eric
  0 siblings, 0 replies; 11+ messages in thread
From: Auger Eric @ 2019-05-28 11:51 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: eric.auger.pro, iommu, linux-kernel, dwmw2, lorenzo.pieralisi,
	robin.murphy, will.deacon, hanjun.guo, sudeep.holla,
	alex.williamson, shameerali.kolothum.thodi

Hi Joerg,

On 5/27/19 5:23 PM, Joerg Roedel wrote:
> On Mon, May 27, 2019 at 10:55:36AM +0200, Eric Auger wrote:
>> -			list_add_tail(&rmrr->resv->list, head);
>> +			length = rmrr->end_address - rmrr->base_address + 1;
>> +			resv = iommu_alloc_resv_region(rmrr->base_address,
>> +						       length, prot,
>> +						       IOMMU_RESV_DIRECT,
>> +						       GFP_ATOMIC);
>> +			if (!resv)
>> +				break;
>> +
>> +			list_add_tail(&resv->list, head);
> 
> Okay, so this happens in a rcu_read_locked section and must be atomic,
> but I don't like this extra parameter to iommu_alloc_resv_region().
> 
> How about replacing the rcu-lock with the dmar_global_lock, which
> protects against changes to the global rmrr list? This will make this
> loop preemptible and taking the global lock is okay because this
> function is in no way performance relevant.

After studying in more details the for_each_active_dev_scope macro and
rcu_dereference_check it looks OK to me. I respinned accordingly.

Thanks

Eric
> 
> Regards,
> 
> 	Joerg
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-05-28 11:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-27  8:55 [PATCH v4 0/8] RMRR related fixes and enhancements Eric Auger
2019-05-27  8:55 ` [PATCH v4 1/8] iommu: Fix a leak in iommu_insert_resv_region Eric Auger
2019-05-27  8:55 ` [PATCH v4 2/8] iommu: Pass a GFP flag parameter to iommu_alloc_resv_region() Eric Auger
2019-05-27  8:55 ` [PATCH v4 3/8] iommu/vt-d: Duplicate iommu_resv_region objects per device list Eric Auger
2019-05-27 15:23   ` Joerg Roedel
2019-05-28 11:51     ` Auger Eric
2019-05-27  8:55 ` [PATCH v4 4/8] iommu/vt-d: Introduce is_downstream_to_pci_bridge helper Eric Auger
2019-05-27  8:55 ` [PATCH v4 5/8] iommu/vt-d: Handle RMRR with PCI bridge device scopes Eric Auger
2019-05-27  8:55 ` [PATCH v4 6/8] iommu/vt-d: Handle PCI bridge RMRR device scopes in intel_iommu_get_resv_regions Eric Auger
2019-05-27  8:55 ` [PATCH v4 7/8] iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions Eric Auger
2019-05-27  8:55 ` [PATCH v4 8/8] iommu/vt-d: Differentiate relaxable and non relaxable RMRRs Eric Auger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).