* [PATCH v5 00/10] iommu: Bounce page for untrusted devices
@ 2019-07-25  3:17 Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma Lu Baolu
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu

The Thunderbolt vulnerabilities are now public and go by the
name Thunderclap [1] [3]. This patch series aims to mitigate
those concerns.

An external PCI device is a PCI peripheral device connected
to the system through an external bus, such as Thunderbolt.
What makes it different is that it can't be trusted to the
same degree as the devices built into the system. Generally,
a trusted PCIe device will DMA into the designated buffers
and not overrun or otherwise write outside the specified
bounds. An external device, however, cannot be relied on to
behave this way.

The minimum IOMMU mapping granularity is one page (4k), so
for DMA transfers smaller than that a malicious PCIe device
can access the whole page of memory even if it does not
belong to the driver in question. This opens up the
possibility of a DMA attack. For more information about DMA
attacks mounted by an untrusted PCI/PCIe device, please
refer to [2].
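
As a concrete example, if a driver maps a 256-byte receive
buffer that happens to sit at offset 0x100 of a 4k page, the
IOMMU entry necessarily covers the whole 4k page, so the
device can also read or modify the remaining 3840 bytes of
that page, which may hold unrelated kernel data.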

This series implements bounce buffers for untrusted external
devices. Transfers are confined to isolated pages so the
IOMMU window does not cover memory outside of what the
driver expects. Previously (v3 and before), we proposed an
optimisation to only copy the head and tail of the buffer if
it spans multiple pages, and to directly map the pages in
the middle. Figure 1 gives the big picture of this solution.

                                swiotlb             System
                IOVA          bounce page           Memory
             .---------.      .---------.        .---------.
             |         |      |         |        |         |
             |         |      |         |        |         |
buffer_start .---------.      .---------.        .---------.
             |         |----->|         |*******>|         |
             |         |      |         | swiotlb|         |
             |         |      |         | mapping|         |
 IOMMU Page  '---------'      '---------'        '---------'
  Boundary   |         |                         |         |
             |         |                         |         |
             |         |                         |         |
             |         |------------------------>|         |
             |         |    IOMMU mapping        |         |
             |         |                         |         |
 IOMMU Page  .---------.                         .---------.
  Boundary   |         |                         |         |
             |         |                         |         |
             |         |------------------------>|         |
             |         |     IOMMU mapping       |         |
             |         |                         |         |
             |         |                         |         |
 IOMMU Page  .---------.      .---------.        .---------.
  Boundary   |         |      |         |        |         |
             |         |      |         |        |         |
             |         |----->|         |*******>|         |
  buffer_end '---------'      '---------' swiotlb'---------'
             |         |      |         | mapping|         |
             |         |      |         |        |         |
             '---------'      '---------'        '---------'
          Figure 1: Overview of the IOMMU bounce page (v3 and earlier)

As Robin Murphy pointed out, this ties us to using strict mode for
TLB maintenance, which may not be an overall win depending on the
balance between invalidation bandwidth vs. memcpy bandwidth. If we
use standard SWIOTLB logic to always copy the whole thing, we should
be able to release the bounce pages via the flush queue to allow
'safe' lazy unmaps. So since v4 we have switched to the standard
swiotlb logic.

                                swiotlb             System
                IOVA          bounce page           Memory
buffer_start .---------.      .---------.        .---------.
             |         |      |         |        |         |
             |         |      |         |        |         |
             |         |      |         |        .---------.physical
             |         |----->|         | ------>|         |_start  
             |         |iommu |         | swiotlb|         |
             |         | map  |         |   map  |         |
 IOMMU Page  .---------.      .---------.        '---------'
  Boundary   |         |      |         |        |         |
             |         |      |         |        |         |
             |         |----->|         |        |         |
             |         |iommu |         |        |         |
             |         | map  |         |        |         |
             |         |      |         |        |         |
 IOMMU Page  .---------.      .---------.        .---------.
  Boundary   |         |      |         |        |         |
             |         |----->|         |        |         |
             |         |iommu |         |        |         |
             |         | map  |         |        |         |
             |         |      |         |        |         |
 IOMMU Page  |         |      |         |        |         |
  Boundary   .---------.      .---------.        .---------.
             |         |      |         |------->|         |
  buffer_end '---------'      '---------' swiotlb|         |
             |         |----->|         |   map  |         |
             |         |iommu |         |        |         |
             |         | map  |         |        '---------' physical
             |         |      |         |        |         | _end    
             '---------'      '---------'        '---------'
          Figure 2: Overview of the simplified IOMMU bounce page (since v4)

The implementation of bounce buffers for untrusted devices
causes a small performance overhead, but we didn't see any
user-visible problems. Users who trust their devices enough
can remove the overhead with the kernel parameter defined in
the IOMMU driver.
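
For example, on Intel VT-d systems the option added in patch 8
of this series can be used for this purpose:

    intel_iommu=nobounce

which makes the driver treat untrusted devices like trusted
ones and skip the bounce path.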

This series introduces the following APIs for bounce pages
(a brief usage sketch follows the list):

 * iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
   - Map a buffer starting at DMA address @addr in bounce page
     manner. For buffer parts that don't cross a whole minimal
     IOMMU page, the bounce buffer policy is applied: a bounce
     page mapped by swiotlb will be used as the DMA target in
     the IOMMU page table.

 * iommu_bounce_unmap(dev, addr, size, dir, attrs)
   - Unmap the buffer mapped with iommu_bounce_map(). The bounce
     page will be torn down after the bounced data gets synced.

 * iommu_bounce_sync(dev, addr, size, dir, target)
   - Sync the bounced data in case the bounce mapped buffer is
     reused.
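
A minimal sketch of how an IOMMU driver's map/unmap path might
call these APIs (modelled on patch 10 of this series; the IOVA
allocator below is a hypothetical placeholder, not a real
kernel function):

	/* Illustration only; error handling trimmed. */
	static dma_addr_t example_bounce_map(struct device *dev,
					     phys_addr_t paddr, size_t size,
					     enum dma_data_direction dir,
					     unsigned long attrs)
	{
		dma_addr_t iova;

		/* Reserve an IOVA range large enough for 'size'. */
		iova = example_alloc_iova(dev, size);	/* placeholder */
		if (!iova)
			return DMA_MAPPING_ERROR;

		/* Bounces sub-page parts via swiotlb and installs the
		 * IOMMU mapping; returns 'iova' on success. */
		return iommu_bounce_map(dev, iova, paddr, size, dir, attrs);
	}

	static void example_bounce_unmap(struct device *dev, dma_addr_t addr,
					 size_t size,
					 enum dma_data_direction dir,
					 unsigned long attrs)
	{
		/* Syncs bounced data back (for DMA_FROM_DEVICE) and tears
		 * down both the IOMMU mapping and the bounce page. */
		iommu_bounce_unmap(dev, addr, size, dir, attrs);
	}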

The bounce page idea:
Based-on-idea-by: Mika Westerberg <mika.westerberg@intel.com>
Based-on-idea-by: Ashok Raj <ashok.raj@intel.com>
Based-on-idea-by: Alan Cox <alan.cox@intel.com>
Based-on-idea-by: Kevin Tian <kevin.tian@intel.com>
Based-on-idea-by: Robin Murphy <robin.murphy@arm.com>

The patch series has been tested by:
Tested-by: Xu Pengfei <pengfei.xu@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@intel.com>

Reference:
[1] https://thunderclap.io/
[2] https://thunderclap.io/thunderclap-paper-ndss2019.pdf
[3] https://christian.kellner.me/2019/02/27/thunderclap-and-linux/
[4] https://lkml.org/lkml/2019/3/4/644

Best regards,
Baolu

Change log:
  v4->v5:
  - The previous v4 was posted here:
    https://lkml.org/lkml/2019/6/2/187
  - Add per-device dma ops and use bounce buffer specific dma
    ops for those untrusted devices.
      devices with identity domains	-> system default dma ops
      trusted devices with dma domains	-> iommu/vt-d dma ops
      untrusted devices		 	-> bounced dma ops
  - Address various review comments received since v4.
  - This patch series is based on v5.3-rc1.

  v3->v4:
  - The previous v3 was posted here:
    https://lkml.org/lkml/2019/4/20/213
  - Discard the optimization of only mapping head and tail
    partial pages, use the standard swiotlb in order to achieve
    iotlb flush efficiency.
  - This patch series is based on the top of the vt-d branch of
    Joerg's iommu tree.

  v2->v3:
  - The previous v2 was posted here:
    https://lkml.org/lkml/2019/3/27/157
  - Reuse the existing swiotlb APIs for bounce buffer by
    extending it to support bounce page.
  - Move the bounce page APIs into the generic IOMMU layer.
  - This patch series is based on 5.1-rc1.

  v1->v2:
  - The previous v1 was posted here:
    https://lkml.org/lkml/2019/3/12/66
  - Refactor the code to remove struct bounce_param;
  - During the v1 review cycle, we discussed the possibility
    of reusing the swiotlb code to avoid code duplication, but
    we found the swiotlb implementation was not ready for the
    use of bounce page pool.
    https://lkml.org/lkml/2019/3/19/259
  - This patch series has been rebased to v5.1-rc2.

Lu Baolu (10):
  iommu/vt-d: Don't switch off swiotlb if use direct dma
  iommu/vt-d: Use per-device dma_ops
  iommu/vt-d: Cleanup after use per-device dma ops
  PCI: Add dev_is_untrusted helper
  swiotlb: Split size parameter to map/unmap APIs
  swiotlb: Zero out bounce buffer for untrusted device
  iommu: Add bounce page APIs
  iommu/vt-d: Check whether device requires bounce buffer
  iommu/vt-d: Add trace events for device dma map/unmap
  iommu/vt-d: Use bounce buffer for untrusted devices

 .../admin-guide/kernel-parameters.txt         |   5 +
 drivers/iommu/Kconfig                         |  15 +
 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/intel-iommu.c                   | 287 ++++++++++++------
 drivers/iommu/intel-trace.c                   |  14 +
 drivers/iommu/iommu.c                         | 118 +++++++
 drivers/xen/swiotlb-xen.c                     |   8 +-
 include/linux/iommu.h                         |  35 +++
 include/linux/pci.h                           |   2 +
 include/linux/swiotlb.h                       |   8 +-
 include/trace/events/intel_iommu.h            |  95 ++++++
 kernel/dma/direct.c                           |   2 +-
 kernel/dma/swiotlb.c                          |  30 +-
 13 files changed, 516 insertions(+), 104 deletions(-)
 create mode 100644 drivers/iommu/intel-trace.c
 create mode 100644 include/trace/events/intel_iommu.h

-- 
2.17.1



* [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25  5:41   ` Christoph Hellwig
  2019-07-25  3:17 ` [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops Lu Baolu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan

The direct DMA implementation depends on swiotlb. Hence, don't
switch off swiotlb, since direct DMA interfaces are used in this
driver.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bdaed2da8a55..8064af607d3b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4568,9 +4568,6 @@ static int __init platform_optin_force_iommu(void)
 		iommu_identity_mapping |= IDENTMAP_ALL;
 
 	dmar_disabled = 0;
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-	swiotlb = 0;
-#endif
 	no_iommu = 0;
 
 	return 1;
@@ -4709,9 +4706,6 @@ int __init intel_iommu_init(void)
 	}
 	up_write(&dmar_global_lock);
 
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-	swiotlb = 0;
-#endif
 	dma_ops = &intel_dma_ops;
 
 	init_iommu_pm_ops();
-- 
2.17.1



* [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25  5:44   ` Christoph Hellwig
  2019-07-25  3:17 ` [PATCH v5 03/10] iommu/vt-d: Cleanup after use per-device dma ops Lu Baolu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan

The current Intel IOMMU driver sets the system-level dma_ops,
hence every DMA API call goes through the IOMMU driver even
when the device is using an identity-mapped domain. This
applies per-device dma_ops in this driver and leaves the
system-level dma_ops for direct DMA.
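
With per-device ops in place, the generic DMA API picks the ops
per device. A simplified illustration of the dispatch (the real
helper is get_dma_ops() in include/linux/dma-mapping.h; this is
not the literal kernel code):

	static const struct dma_map_ops *example_get_dma_ops(struct device *dev)
	{
		if (dev->dma_ops)			/* set via set_dma_ops() */
			return dev->dma_ops;
		return get_arch_dma_ops(dev->bus);	/* direct DMA on x86 */
	}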

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 43 ++++++-------------------------------
 1 file changed, 7 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 8064af607d3b..11474bd2e348 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3419,43 +3419,10 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 /* Check if the dev needs to go through non-identity map and unmap process.*/
 static bool iommu_need_mapping(struct device *dev)
 {
-	int ret;
-
 	if (iommu_dummy(dev))
 		return false;
 
-	ret = identity_mapping(dev);
-	if (ret) {
-		u64 dma_mask = *dev->dma_mask;
-
-		if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
-			dma_mask = dev->coherent_dma_mask;
-
-		if (dma_mask >= dma_get_required_mask(dev))
-			return false;
-
-		/*
-		 * 32 bit DMA is removed from si_domain and fall back to
-		 * non-identity mapping.
-		 */
-		dmar_remove_one_dev_info(dev);
-		ret = iommu_request_dma_domain_for_dev(dev);
-		if (ret) {
-			struct iommu_domain *domain;
-			struct dmar_domain *dmar_domain;
-
-			domain = iommu_get_domain_for_dev(dev);
-			if (domain) {
-				dmar_domain = to_dmar_domain(domain);
-				dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN;
-			}
-			get_private_domain_for_dev(dev);
-		}
-
-		dev_info(dev, "32bit DMA uses non-identity mapping\n");
-	}
-
-	return true;
+	return !identity_mapping(dev);
 }
 
 static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
@@ -4706,8 +4673,6 @@ int __init intel_iommu_init(void)
 	}
 	up_write(&dmar_global_lock);
 
-	dma_ops = &intel_dma_ops;
-
 	init_iommu_pm_ops();
 
 	for_each_active_iommu(iommu, drhd) {
@@ -5280,6 +5245,8 @@ static int intel_iommu_add_device(struct device *dev)
 				dev_info(dev,
 					 "Device uses a private identity domain.\n");
 			}
+		} else {
+			set_dma_ops(dev, &intel_dma_ops);
 		}
 	} else {
 		if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
@@ -5295,6 +5262,8 @@ static int intel_iommu_add_device(struct device *dev)
 				dev_info(dev,
 					 "Device uses a private dma domain.\n");
 			}
+
+			set_dma_ops(dev, &intel_dma_ops);
 		}
 	}
 
@@ -5313,6 +5282,8 @@ static void intel_iommu_remove_device(struct device *dev)
 	iommu_group_remove_device(dev);
 
 	iommu_device_unlink(&iommu->iommu, dev);
+
+	set_dma_ops(dev, NULL);
 }
 
 static void intel_iommu_get_resv_regions(struct device *device,
-- 
2.17.1



* [PATCH v5 03/10] iommu/vt-d: Cleanup after use per-device dma ops
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 04/10] PCI: Add dev_is_untrusted helper Lu Baolu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan

After switching to per-device dma ops, we don't need to check
whether a device needs mapping, hence all checks of
iommu_need_mapping() are unnecessary now. Clean them up.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 50 ++++---------------------------------
 1 file changed, 5 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 11474bd2e348..a458df975c55 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2749,17 +2749,6 @@ static int __init si_domain_init(int hw)
 	return 0;
 }
 
-static int identity_mapping(struct device *dev)
-{
-	struct device_domain_info *info;
-
-	info = dev->archdata.iommu;
-	if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
-		return (info->domain == si_domain);
-
-	return 0;
-}
-
 static int domain_add_dev_info(struct dmar_domain *domain, struct device *dev)
 {
 	struct dmar_domain *ndomain;
@@ -3416,15 +3405,6 @@ static struct dmar_domain *get_private_domain_for_dev(struct device *dev)
 	return domain;
 }
 
-/* Check if the dev needs to go through non-identity map and unmap process.*/
-static bool iommu_need_mapping(struct device *dev)
-{
-	if (iommu_dummy(dev))
-		return false;
-
-	return !identity_mapping(dev);
-}
-
 static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
 				     size_t size, int dir, u64 dma_mask)
 {
@@ -3486,20 +3466,15 @@ static dma_addr_t intel_map_page(struct device *dev, struct page *page,
 				 enum dma_data_direction dir,
 				 unsigned long attrs)
 {
-	if (iommu_need_mapping(dev))
-		return __intel_map_single(dev, page_to_phys(page) + offset,
-				size, dir, *dev->dma_mask);
-	return dma_direct_map_page(dev, page, offset, size, dir, attrs);
+	return __intel_map_single(dev, page_to_phys(page) + offset,
+				  size, dir, *dev->dma_mask);
 }
 
 static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
 				     size_t size, enum dma_data_direction dir,
 				     unsigned long attrs)
 {
-	if (iommu_need_mapping(dev))
-		return __intel_map_single(dev, phys_addr, size, dir,
-				*dev->dma_mask);
-	return dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
+	return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask);
 }
 
 static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
@@ -3551,17 +3526,13 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 			     size_t size, enum dma_data_direction dir,
 			     unsigned long attrs)
 {
-	if (iommu_need_mapping(dev))
-		intel_unmap(dev, dev_addr, size);
-	else
-		dma_direct_unmap_page(dev, dev_addr, size, dir, attrs);
+	intel_unmap(dev, dev_addr, size);
 }
 
 static void intel_unmap_resource(struct device *dev, dma_addr_t dev_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-	if (iommu_need_mapping(dev))
-		intel_unmap(dev, dev_addr, size);
+	intel_unmap(dev, dev_addr, size);
 }
 
 static void *intel_alloc_coherent(struct device *dev, size_t size,
@@ -3571,9 +3542,6 @@ static void *intel_alloc_coherent(struct device *dev, size_t size,
 	struct page *page = NULL;
 	int order;
 
-	if (!iommu_need_mapping(dev))
-		return dma_direct_alloc(dev, size, dma_handle, flags, attrs);
-
 	size = PAGE_ALIGN(size);
 	order = get_order(size);
 
@@ -3607,9 +3575,6 @@ static void intel_free_coherent(struct device *dev, size_t size, void *vaddr,
 	int order;
 	struct page *page = virt_to_page(vaddr);
 
-	if (!iommu_need_mapping(dev))
-		return dma_direct_free(dev, size, vaddr, dma_handle, attrs);
-
 	size = PAGE_ALIGN(size);
 	order = get_order(size);
 
@@ -3627,9 +3592,6 @@ static void intel_unmap_sg(struct device *dev, struct scatterlist *sglist,
 	struct scatterlist *sg;
 	int i;
 
-	if (!iommu_need_mapping(dev))
-		return dma_direct_unmap_sg(dev, sglist, nelems, dir, attrs);
-
 	for_each_sg(sglist, sg, nelems, i) {
 		nrpages += aligned_nrpages(sg_dma_address(sg), sg_dma_len(sg));
 	}
@@ -3651,8 +3613,6 @@ static int intel_map_sg(struct device *dev, struct scatterlist *sglist, int nele
 	struct intel_iommu *iommu;
 
 	BUG_ON(dir == DMA_NONE);
-	if (!iommu_need_mapping(dev))
-		return dma_direct_map_sg(dev, sglist, nelems, dir, attrs);
 
 	domain = find_domain(dev);
 	if (!domain)
-- 
2.17.1



* [PATCH v5 04/10] PCI: Add dev_is_untrusted helper
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (2 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 03/10] iommu/vt-d: Cleanup after use per-device dma ops Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25  5:44   ` Christoph Hellwig
  2019-07-25  3:17 ` [PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs Lu Baolu
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu

There are several places in the kernel where it is necessary to
check whether a device is an untrusted PCI device. Add a helper
to simplify the callers.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 include/linux/pci.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 9e700d9f9f28..960352a75a10 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1029,6 +1029,7 @@ void pcibios_setup_bridge(struct pci_bus *bus, unsigned long type);
 void pci_sort_breadthfirst(void);
 #define dev_is_pci(d) ((d)->bus == &pci_bus_type)
 #define dev_is_pf(d) ((dev_is_pci(d) ? to_pci_dev(d)->is_physfn : false))
+#define dev_is_untrusted(d) ((dev_is_pci(d) ? to_pci_dev(d)->untrusted : false))
 
 /* Generic PCI functions exported to card drivers */
 
@@ -1766,6 +1767,7 @@ static inline struct pci_dev *pci_dev_get(struct pci_dev *dev) { return NULL; }
 
 #define dev_is_pci(d) (false)
 #define dev_is_pf(d) (false)
+#define dev_is_untrusted(d) (false)
 static inline bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags)
 { return false; }
 static inline int pci_irqd_intx_xlate(struct irq_domain *d,
-- 
2.17.1



* [PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (3 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 04/10] PCI: Add dev_is_untrusted helper Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25 11:47   ` Christoph Hellwig
  2019-07-25  3:17 ` [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device Lu Baolu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu

This splits the size parameter to swiotlb_tbl_map_single() and
swiotlb_tbl_unmap_single() into a mapping_size and an alloc_size
parameter, where mapping_size is the actual buffer size and
alloc_size is that size rounded up to the IOMMU page size.
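
For example, with a 4k IOMMU page size (see get_aligned_size()
in patch 7), bouncing a 0x600-byte buffer uses mapping_size =
0x600 and alloc_size = ALIGN(0x600, 0x1000) = 0x1000: a whole
bounce page is reserved (and, for an untrusted device, zeroed
by patch 6), but only 0x600 bytes are actually copied.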

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/xen/swiotlb-xen.c |  8 ++++----
 include/linux/swiotlb.h   |  8 ++++++--
 kernel/dma/direct.c       |  2 +-
 kernel/dma/swiotlb.c      | 24 +++++++++++++-----------
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index cfbe46785a3b..58d25486971e 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -400,8 +400,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 */
 	trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
 
-	map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
-				     attrs);
+	map = swiotlb_tbl_map_single(dev, start_dma_addr, phys,
+				     size, size, dir, attrs);
 	if (map == (phys_addr_t)DMA_MAPPING_ERROR)
 		return DMA_MAPPING_ERROR;
 
@@ -411,7 +411,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 * Ensure that the address returned is DMA'ble
 	 */
 	if (unlikely(!dma_capable(dev, dev_addr, size))) {
-		swiotlb_tbl_unmap_single(dev, map, size, dir,
+		swiotlb_tbl_unmap_single(dev, map, size, size, dir,
 				attrs | DMA_ATTR_SKIP_CPU_SYNC);
 		return DMA_MAPPING_ERROR;
 	}
@@ -447,7 +447,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr,
 
 	/* NOTE: We use dev_addr here, not paddr! */
 	if (is_xen_swiotlb_buffer(dev_addr))
-		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+		swiotlb_tbl_unmap_single(hwdev, paddr, size, size, dir, attrs);
 }
 
 static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 361f62bb4a8e..cde3dc18e21a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -46,13 +46,17 @@ enum dma_sync_target {
 
 extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 					  dma_addr_t tbl_dma_addr,
-					  phys_addr_t phys, size_t size,
+					  phys_addr_t phys,
+					  size_t mapping_size,
+					  size_t alloc_size,
 					  enum dma_data_direction dir,
 					  unsigned long attrs);
 
 extern void swiotlb_tbl_unmap_single(struct device *hwdev,
 				     phys_addr_t tlb_addr,
-				     size_t size, enum dma_data_direction dir,
+				     size_t mapping_size,
+				     size_t alloc_size,
+				     enum dma_data_direction dir,
 				     unsigned long attrs);
 
 extern void swiotlb_tbl_sync_single(struct device *hwdev,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 59bdceea3737..6c183326c4e6 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -297,7 +297,7 @@ void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
 	if (unlikely(is_swiotlb_buffer(phys)))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+		swiotlb_tbl_unmap_single(dev, phys, size, size, dir, attrs);
 }
 EXPORT_SYMBOL(dma_direct_unmap_page);
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9de232229063..43c88626a1f3 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -444,7 +444,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
 
 phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 				   dma_addr_t tbl_dma_addr,
-				   phys_addr_t orig_addr, size_t size,
+				   phys_addr_t orig_addr,
+				   size_t mapping_size,
+				   size_t alloc_size,
 				   enum dma_data_direction dir,
 				   unsigned long attrs)
 {
@@ -481,8 +483,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	 * For mappings greater than or equal to a page, we limit the stride
 	 * (and hence alignment) to a page size.
 	 */
-	nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
-	if (size >= PAGE_SIZE)
+	nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	if (alloc_size >= PAGE_SIZE)
 		stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
 	else
 		stride = 1;
@@ -547,7 +549,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	spin_unlock_irqrestore(&io_tlb_lock, flags);
 	if (!(attrs & DMA_ATTR_NO_WARN) && printk_ratelimit())
 		dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes), total %lu (slots), used %lu (slots)\n",
-			 size, io_tlb_nslabs, tmp_io_tlb_used);
+			 alloc_size, io_tlb_nslabs, tmp_io_tlb_used);
 	return (phys_addr_t)DMA_MAPPING_ERROR;
 found:
 	io_tlb_used += nslots;
@@ -562,7 +564,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-		swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);
+		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
 
 	return tlb_addr;
 }
@@ -571,11 +573,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
  * tlb_addr is the physical address of the bounce buffer to unmap.
  */
 void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
-			      size_t size, enum dma_data_direction dir,
-			      unsigned long attrs)
+			      size_t mapping_size, size_t alloc_size,
+			      enum dma_data_direction dir, unsigned long attrs)
 {
 	unsigned long flags;
-	int i, count, nslots = ALIGN(size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
+	int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
 	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = io_tlb_orig_addr[index];
 
@@ -585,7 +587,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	if (orig_addr != INVALID_PHYS_ADDR &&
 	    !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
-		swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);
+		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE);
 
 	/*
 	 * Return the buffer to the free list by setting the corresponding
@@ -665,14 +667,14 @@ bool swiotlb_map(struct device *dev, phys_addr_t *phys, dma_addr_t *dma_addr,
 
 	/* Oh well, have to allocate and map a bounce buffer. */
 	*phys = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
-			*phys, size, dir, attrs);
+			*phys, size, size, dir, attrs);
 	if (*phys == (phys_addr_t)DMA_MAPPING_ERROR)
 		return false;
 
 	/* Ensure that the address returned is DMA'ble */
 	*dma_addr = __phys_to_dma(dev, *phys);
 	if (unlikely(!dma_capable(dev, *dma_addr, size))) {
-		swiotlb_tbl_unmap_single(dev, *phys, size, dir,
+		swiotlb_tbl_unmap_single(dev, *phys, size, size, dir,
 			attrs | DMA_ATTR_SKIP_CPU_SYNC);
 		return false;
 	}
-- 
2.17.1



* [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (4 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25 11:49   ` Christoph Hellwig
  2019-07-25  3:17 ` [PATCH v5 07/10] iommu: Add bounce page APIs Lu Baolu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu

This is necessary to avoid exposing valid kernel data to a
malicious device: the swiotlb slot may still hold stale data
from a previous mapping, and the padding added when the
allocation is rounded up to the IOMMU page size is never
overwritten by the bounce copy.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 kernel/dma/swiotlb.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 43c88626a1f3..edc84a00b9f9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,7 @@
 #include <linux/scatterlist.h>
 #include <linux/mem_encrypt.h>
 #include <linux/set_memory.h>
+#include <linux/pci.h>
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
 #endif
@@ -562,6 +563,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 	 */
 	for (i = 0; i < nslots; i++)
 		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+
+	/* Zero out the bounce buffer if the consumer is untrusted. */
+	if (dev_is_untrusted(hwdev))
+		memset(phys_to_virt(tlb_addr), 0, alloc_size);
+
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
 		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
-- 
2.17.1



* [PATCH v5 07/10] iommu: Add bounce page APIs
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (5 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 08/10] iommu/vt-d: Check whether device requires bounce buffer Lu Baolu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan, Alan Cox, Mika Westerberg

IOMMU hardware always uses paging for DMA remapping. The
minimum mapped window is a page size. The device drivers
may map buffers not filling the whole IOMMU window. This
allows the device to access possibly unrelated memory and
various malicious devices can exploit this to perform a
DMA attack.

This introduces the bounce buffer mechanism for DMA buffers
which don't fill a minimal IOMMU page. It could be used
by various vendor specific IOMMU drivers as long as the
DMA domain is managed by the generic IOMMU layer. The
following APIs are added:

* iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
  - Map a buffer starting at DMA address @addr in bounce page
    manner. For buffer parts that don't cross a whole
    minimal IOMMU page, the bounce page policy is applied:
    a bounce page mapped by swiotlb will be used as the DMA
    target in the IOMMU page table. Otherwise, the physical
    address @paddr is mapped instead.

* iommu_bounce_unmap(dev, addr, size, dir, attrs)
  - Unmap the buffer mapped with iommu_bounce_map(). The bounce
    page will be torn down after the bounced data gets synced.

* iommu_bounce_sync(dev, addr, size, dir, target)
  - Sync the bounced data in case the bounce mapped buffer is
    reused.

The whole set of APIs is guarded by a kernel option,
IOMMU_BOUNCE_PAGE. This is useful for cases where bounce
pages are not needed, for example, embedded cases.
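
For example, with a 4k minimum IOMMU page size, a buffer at
physical address 0x12a41200 with length 0x300 is not
page-aligned, so a swiotlb bounce page backs the IOMMU entry;
a buffer at 0x12a41000 with length 0x2000 is already aligned,
so its own physical pages are mapped directly and no copy is
made (see the IS_ALIGNED() check in iommu_bounce_map() below).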

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/Kconfig |  13 +++++
 drivers/iommu/iommu.c | 118 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h |  35 +++++++++++++
 3 files changed, 166 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e15cdcd8cb3c..d7f2e09cbcf2 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -86,6 +86,19 @@ config IOMMU_DEFAULT_PASSTHROUGH
 
 	  If unsure, say N here.
 
+config IOMMU_BOUNCE_PAGE
+	bool "Use bounce page for untrusted devices"
+	depends on IOMMU_API && SWIOTLB
+	help
+	  IOMMU hardware always use paging for DMA remapping. The minimum
+	  mapped window is a page size. The device drivers may map buffers
+	  not filling whole IOMMU window. This allows device to access to
+	  possibly unrelated memory and malicious device can exploit this
+	  to perform a DMA attack. Select this to use a bounce page for the
+	  buffer which doesn't fill a whole IOMMU page.
+
+	  If unsure, say N here.
+
 config OF_IOMMU
        def_bool y
        depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0c674d80c37f..fe3815186d72 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2468,3 +2468,121 @@ int iommu_sva_get_pasid(struct iommu_sva *handle)
 	return ops->sva_get_pasid(handle);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+
+/*
+ * Bounce buffer support for external devices:
+ *
+ * IOMMU hardware always use paging for DMA remapping. The minimum mapped
+ * window is a page size. The device drivers may map buffers not filling
+ * whole IOMMU window. This allows device to access to possibly unrelated
+ * memory and malicious device can exploit this to perform a DMA attack.
+ * Use bounce pages for the buffer which doesn't fill whole IOMMU pages.
+ */
+
+static inline size_t
+get_aligned_size(struct iommu_domain *domain, size_t size)
+{
+	return ALIGN(size, 1 << __ffs(domain->pgsize_bitmap));
+}
+
+dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
+			    phys_addr_t paddr, size_t size,
+			    enum dma_data_direction dir,
+			    unsigned long attrs)
+{
+	struct iommu_domain *domain;
+	unsigned int min_pagesz;
+	phys_addr_t tlb_addr;
+	size_t aligned_size;
+	int prot = 0;
+	int ret;
+
+	domain = iommu_get_dma_domain(dev);
+	if (!domain)
+		return DMA_MAPPING_ERROR;
+
+	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+		prot |= IOMMU_READ;
+	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+		prot |= IOMMU_WRITE;
+
+	aligned_size = get_aligned_size(domain, size);
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+	/*
+	 * If both the physical buffer start address and size are
+	 * page aligned, we don't need to use a bounce page.
+	 */
+	if (!IS_ALIGNED(paddr | size, min_pagesz)) {
+		tlb_addr = swiotlb_tbl_map_single(dev,
+				__phys_to_dma(dev, io_tlb_start),
+				paddr, size, aligned_size, dir, attrs);
+		if (tlb_addr == DMA_MAPPING_ERROR)
+			return DMA_MAPPING_ERROR;
+	} else {
+		tlb_addr = paddr;
+	}
+
+	ret = iommu_map(domain, iova, tlb_addr, aligned_size, prot);
+	if (ret) {
+		if (is_swiotlb_buffer(tlb_addr))
+			swiotlb_tbl_unmap_single(dev, tlb_addr, size,
+						 aligned_size, dir, attrs);
+
+		return DMA_MAPPING_ERROR;
+	}
+
+	return iova;
+}
+EXPORT_SYMBOL_GPL(iommu_bounce_map);
+
+static inline phys_addr_t
+iova_to_tlb_addr(struct iommu_domain *domain, dma_addr_t addr)
+{
+	if (unlikely(!domain->ops || !domain->ops->iova_to_phys))
+		return 0;
+
+	return domain->ops->iova_to_phys(domain, addr);
+}
+
+void iommu_bounce_unmap(struct device *dev, dma_addr_t iova, size_t size,
+			enum dma_data_direction dir, unsigned long attrs)
+{
+	struct iommu_domain *domain;
+	phys_addr_t tlb_addr;
+	size_t aligned_size;
+
+	domain = iommu_get_dma_domain(dev);
+	if (WARN_ON(!domain))
+		return;
+
+	aligned_size = get_aligned_size(domain, size);
+	tlb_addr = iova_to_tlb_addr(domain, iova);
+	if (WARN_ON(!tlb_addr))
+		return;
+
+	iommu_unmap(domain, iova, aligned_size);
+	if (is_swiotlb_buffer(tlb_addr))
+		swiotlb_tbl_unmap_single(dev, tlb_addr, size,
+					 aligned_size, dir, attrs);
+}
+EXPORT_SYMBOL_GPL(iommu_bounce_unmap);
+
+void iommu_bounce_sync(struct device *dev, dma_addr_t addr, size_t size,
+		       enum dma_data_direction dir, enum dma_sync_target target)
+{
+	struct iommu_domain *domain;
+	phys_addr_t tlb_addr;
+
+	domain = iommu_get_dma_domain(dev);
+	if (WARN_ON(!domain))
+		return;
+
+	tlb_addr = iova_to_tlb_addr(domain, addr);
+	if (is_swiotlb_buffer(tlb_addr))
+		swiotlb_tbl_sync_single(dev, tlb_addr, size, dir, target);
+}
+EXPORT_SYMBOL_GPL(iommu_bounce_sync);
+#endif /* CONFIG_IOMMU_BOUNCE_PAGE */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fdc355ccc570..5569b84cc9be 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,8 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <linux/swiotlb.h>
+#include <linux/dma-direct.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -560,6 +562,39 @@ int iommu_sva_set_ops(struct iommu_sva *handle,
 		      const struct iommu_sva_ops *ops);
 int iommu_sva_get_pasid(struct iommu_sva *handle);
 
+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
+			    phys_addr_t paddr, size_t size,
+			    enum dma_data_direction dir,
+			    unsigned long attrs);
+void iommu_bounce_unmap(struct device *dev, dma_addr_t iova, size_t size,
+			enum dma_data_direction dir, unsigned long attrs);
+void iommu_bounce_sync(struct device *dev, dma_addr_t addr, size_t size,
+		       enum dma_data_direction dir,
+		       enum dma_sync_target target);
+#else
+static inline
+dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
+			    phys_addr_t paddr, size_t size,
+			    enum dma_data_direction dir,
+			    unsigned long attrs)
+{
+	return DMA_MAPPING_ERROR;
+}
+
+static inline
+void iommu_bounce_unmap(struct device *dev, dma_addr_t iova, size_t size,
+			enum dma_data_direction dir, unsigned long attrs)
+{
+}
+
+static inline
+void iommu_bounce_sync(struct device *dev, dma_addr_t addr, size_t size,
+		       enum dma_data_direction dir, enum dma_sync_target target)
+{
+}
+#endif /* CONFIG_IOMMU_BOUNCE_PAGE */
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
-- 
2.17.1



* [PATCH v5 08/10] iommu/vt-d: Check whether device requires bounce buffer
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (6 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 07/10] iommu: Add bounce page APIs Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap Lu Baolu
  2019-07-25  3:17 ` [PATCH v5 10/10] iommu/vt-d: Use bounce buffer for untrusted devices Lu Baolu
  9 siblings, 0 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan

This adds a helper to check whether a device needs to
use the bounce buffer. It also provides a boot time
option to disable the bounce buffer. Users who trust
their devices can use this to prevent the IOMMU driver
from using the bounce buffer and avoid its performance
overhead.

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Xu Pengfei <pengfei.xu@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +++++
 drivers/iommu/intel-iommu.c                     | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 46b826fcb5ad..8628454bd8a2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1732,6 +1732,11 @@
 			Note that using this option lowers the security
 			provided by tboot because it makes the system
 			vulnerable to DMA attacks.
+		nobounce [Default off]
+			Disable bounce buffer for untrusted devices such as
+			the Thunderbolt devices. This will treat the untrusted
+			devices as the trusted ones, hence might expose security
+			risks of DMA attacks.
 
 	intel_idle.max_cstate=	[KNL,HW,ACPI,X86]
 			0	disables intel_idle and fall back on acpi_idle.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a458df975c55..4185406b0368 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -360,6 +360,7 @@ static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
+static int intel_no_bounce;
 
 #define IDENTMAP_ALL		1
 #define IDENTMAP_GFX		2
@@ -373,6 +374,8 @@ EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
+#define device_needs_bounce(d) (!intel_no_bounce && dev_is_untrusted(d))
+
 /*
  * Iterate over elements in device_domain_list and call the specified
  * callback @fn against each element.
@@ -455,6 +458,9 @@ static int __init intel_iommu_setup(char *str)
 			printk(KERN_INFO
 				"Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n");
 			intel_iommu_tboot_noforce = 1;
+		} else if (!strncmp(str, "nobounce", 8)) {
+			pr_info("Intel-IOMMU: No bounce buffer. This could expose security risks of DMA attacks\n");
+			intel_no_bounce = 1;
 		}
 
 		str += strcspn(str, ",");
-- 
2.17.1



* [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (7 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 08/10] iommu/vt-d: Check whether device requires bounce buffer Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  2019-07-25 12:26   ` Steven Rostedt
  2019-07-25  3:17 ` [PATCH v5 10/10] iommu/vt-d: Use bounce buffer for untrusted devices Lu Baolu
  9 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan

This adds trace support for the Intel IOMMU driver. It
also declares some events which can be used to trace
when an IOVA is mapped or unmapped in a domain.
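
Once applied, the new events can be enabled through the usual
tracefs interface, for example by writing 1 to
events/intel_iommu/enable under the tracefs mount point
(commonly /sys/kernel/debug/tracing); the bounce_map_single
and bounce_unmap_single records then show up in the trace
buffer with the dev=, dev_addr=, phys_addr= and size= fields
defined below.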

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/Makefile             |  1 +
 drivers/iommu/intel-trace.c        | 14 +++++
 include/trace/events/intel_iommu.h | 95 ++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+)
 create mode 100644 drivers/iommu/intel-trace.c
 create mode 100644 include/trace/events/intel_iommu.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index f13f36ae1af6..bfe27b2755bd 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c
new file mode 100644
index 000000000000..bfb6a6e37a88
--- /dev/null
+++ b/drivers/iommu/intel-trace.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ */
+
+#include <linux/string.h>
+#include <linux/types.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/intel_iommu.h>
diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h
new file mode 100644
index 000000000000..3fdeaad93b2e
--- /dev/null
+++ b/include/trace/events/intel_iommu.h
@@ -0,0 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Intel IOMMU trace support
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu <baolu.lu@linux.intel.com>
+ */
+#ifdef CONFIG_INTEL_IOMMU
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM intel_iommu
+
+#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_INTEL_IOMMU_H
+
+#include <linux/tracepoint.h>
+#include <linux/intel-iommu.h>
+
+DECLARE_EVENT_CLASS(dma_map,
+	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+		 size_t size),
+
+	TP_ARGS(dev, dev_addr, phys_addr, size),
+
+	TP_STRUCT__entry(
+		__string(dev_name, dev_name(dev))
+		__field(dma_addr_t, dev_addr)
+		__field(phys_addr_t, phys_addr)
+		__field(size_t,	size)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name(dev));
+		__entry->dev_addr = dev_addr;
+		__entry->phys_addr = phys_addr;
+		__entry->size = size;
+	),
+
+	TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu",
+		  __get_str(dev_name),
+		  (unsigned long long)__entry->dev_addr,
+		  (unsigned long long)__entry->phys_addr,
+		  __entry->size)
+);
+
+DEFINE_EVENT(dma_map, bounce_map_single,
+	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+		 size_t size),
+	TP_ARGS(dev, dev_addr, phys_addr, size)
+);
+
+DEFINE_EVENT(dma_map, bounce_map_sg,
+	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
+		 size_t size),
+	TP_ARGS(dev, dev_addr, phys_addr, size)
+);
+
+DECLARE_EVENT_CLASS(dma_unmap,
+	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+
+	TP_ARGS(dev, dev_addr, size),
+
+	TP_STRUCT__entry(
+		__string(dev_name, dev_name(dev))
+		__field(dma_addr_t, dev_addr)
+		__field(size_t,	size)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name(dev));
+		__entry->dev_addr = dev_addr;
+		__entry->size = size;
+	),
+
+	TP_printk("dev=%s dev_addr=0x%llx size=%zu",
+		  __get_str(dev_name),
+		  (unsigned long long)__entry->dev_addr,
+		  __entry->size)
+);
+
+DEFINE_EVENT(dma_unmap, bounce_unmap_single,
+	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+	TP_ARGS(dev, dev_addr, size)
+);
+
+DEFINE_EVENT(dma_unmap, bounce_unmap_sg,
+	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
+	TP_ARGS(dev, dev_addr, size)
+);
+
+#endif /* _TRACE_INTEL_IOMMU_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
+#endif /* CONFIG_INTEL_IOMMU */
-- 
2.17.1



* [PATCH v5 10/10] iommu/vt-d: Use bounce buffer for untrusted devices
  2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
                   ` (8 preceding siblings ...)
  2019-07-25  3:17 ` [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap Lu Baolu
@ 2019-07-25  3:17 ` Lu Baolu
  9 siblings, 0 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  3:17 UTC (permalink / raw)
  To: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig
  Cc: ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Lu Baolu, Jacob Pan

The Intel VT-d hardware uses paging for DMA remapping.
The minimum mapped window is a page size. The device
drivers may map buffers not filling the whole IOMMU
window. This allows the device to access possibly
unrelated memory and a malicious device could exploit
this to perform DMA attacks. To address this, the
Intel IOMMU driver will use bounce pages for those
buffers which don't fill whole IOMMU pages.
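
A rough sketch of how the per-device ops selection is expected
to look (based on the cover letter's plan of giving untrusted
devices the bounce ops; the actual hook is the
intel_iommu_add_device() hunk of this patch and details may
differ):

	/* Sketch only, not the literal patch code. */
	static void example_select_dma_ops(struct device *dev)
	{
		if (device_needs_bounce(dev))		/* from patch 8 */
			set_dma_ops(dev, &bounce_dma_ops);
		else
			set_dma_ops(dev, &intel_dma_ops);
	}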

Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Tested-by: Xu Pengfei <pengfei.xu@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@intel.com>
---
 drivers/iommu/Kconfig       |   2 +
 drivers/iommu/intel-iommu.c | 188 +++++++++++++++++++++++++++++++++++-
 2 files changed, 188 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d7f2e09cbcf2..3baa418edc16 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -195,6 +195,8 @@ config INTEL_IOMMU
 	select IOMMU_IOVA
 	select NEED_DMA_MAP_STATE
 	select DMAR_TABLE
+	select SWIOTLB
+	select IOMMU_BOUNCE_PAGE
 	help
 	  DMA remapping (DMAR) devices support enables independent address
 	  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 4185406b0368..2cdec279ccac 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -44,6 +44,7 @@
 #include <asm/irq_remapping.h>
 #include <asm/cacheflush.h>
 #include <asm/iommu.h>
+#include <trace/events/intel_iommu.h>
 
 #include "irq_remapping.h"
 #include "intel-pasid.h"
@@ -3672,6 +3673,183 @@ static const struct dma_map_ops intel_dma_ops = {
 	.dma_supported = dma_direct_supported,
 };
 
+static dma_addr_t
+bounce_map_single(struct device *dev, phys_addr_t paddr, size_t size,
+		  enum dma_data_direction dir, unsigned long attrs,
+		  u64 dma_mask)
+{
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+	unsigned long iova_pfn;
+	unsigned long nrpages;
+	dma_addr_t ret_addr;
+
+	domain = find_domain(dev);
+	if (WARN_ON(dir == DMA_NONE || !domain))
+		return DMA_MAPPING_ERROR;
+
+	iommu = domain_get_iommu(domain);
+	nrpages = aligned_nrpages(0, size);
+	iova_pfn = intel_alloc_iova(dev, domain,
+				    dma_to_mm_pfn(nrpages), dma_mask);
+	if (!iova_pfn)
+		return DMA_MAPPING_ERROR;
+
+	ret_addr = iommu_bounce_map(dev, iova_pfn << PAGE_SHIFT,
+				    paddr, size, dir, attrs);
+	if (ret_addr == DMA_MAPPING_ERROR) {
+		free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages));
+		return DMA_MAPPING_ERROR;
+	}
+
+	trace_bounce_map_single(dev, iova_pfn << PAGE_SHIFT, paddr, size);
+
+	return ret_addr;
+}
+
+static void
+bounce_unmap_single(struct device *dev, dma_addr_t dev_addr, size_t size,
+		    enum dma_data_direction dir, unsigned long attrs)
+{
+	struct dmar_domain *domain;
+	struct intel_iommu *iommu;
+	unsigned long iova_pfn;
+	unsigned long nrpages;
+
+	domain = find_domain(dev);
+	if (WARN_ON(!domain))
+		return;
+
+	iommu_bounce_unmap(dev, dev_addr, size, dir, attrs);
+	trace_bounce_unmap_single(dev, dev_addr, size);
+
+	iommu = domain_get_iommu(domain);
+	iova_pfn = IOVA_PFN(dev_addr);
+	nrpages = aligned_nrpages(0, size);
+
+	iommu_flush_iotlb_psi(iommu, domain,
+			      mm_to_dma_pfn(iova_pfn), nrpages, 0, 0);
+	free_iova_fast(&domain->iovad, iova_pfn, dma_to_mm_pfn(nrpages));
+}
+
+static dma_addr_t
+bounce_map_page(struct device *dev, struct page *page, unsigned long offset,
+		size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+	return bounce_map_single(dev, page_to_phys(page) + offset,
+				 size, dir, attrs, *dev->dma_mask);
+}
+
+static dma_addr_t
+bounce_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size,
+		    enum dma_data_direction dir, unsigned long attrs)
+{
+	return bounce_map_single(dev, phys_addr, size,
+				 dir, attrs, *dev->dma_mask);
+}
+
+static void
+bounce_unmap_page(struct device *dev, dma_addr_t dev_addr, size_t size,
+		  enum dma_data_direction dir, unsigned long attrs)
+{
+	bounce_unmap_single(dev, dev_addr, size, dir, attrs);
+}
+
+static void
+bounce_unmap_resource(struct device *dev, dma_addr_t dev_addr, size_t size,
+		      enum dma_data_direction dir, unsigned long attrs)
+{
+	bounce_unmap_single(dev, dev_addr, size, dir, attrs);
+}
+
+static void
+bounce_unmap_sg(struct device *dev, struct scatterlist *sglist, int nelems,
+		enum dma_data_direction dir, unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sglist, sg, nelems, i)
+		bounce_unmap_page(dev, sg->dma_address,
+				  sg_dma_len(sg), dir, attrs);
+}
+
+static int
+bounce_map_sg(struct device *dev, struct scatterlist *sglist, int nelems,
+	      enum dma_data_direction dir, unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	for_each_sg(sglist, sg, nelems, i) {
+		sg->dma_address = bounce_map_page(dev, sg_page(sg),
+				sg->offset, sg->length, dir, attrs);
+		if (sg->dma_address == DMA_MAPPING_ERROR)
+			goto out_unmap;
+		sg_dma_len(sg) = sg->length;
+	}
+
+	return nelems;
+
+out_unmap:
+	bounce_unmap_sg(dev, sglist, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
+	return 0;
+}
+
+static void
+bounce_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
+			   size_t size, enum dma_data_direction dir)
+{
+	iommu_bounce_sync(dev, addr, size, dir, SYNC_FOR_CPU);
+}
+
+static void
+bounce_sync_single_for_device(struct device *dev, dma_addr_t addr,
+			      size_t size, enum dma_data_direction dir)
+{
+	iommu_bounce_sync(dev, addr, size, dir, SYNC_FOR_DEVICE);
+}
+
+static void
+bounce_sync_sg_for_cpu(struct device *dev, struct scatterlist *sglist,
+		       int nelems, enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sglist, sg, nelems, i)
+		iommu_bounce_sync(dev, sg_dma_address(sg),
+				  sg_dma_len(sg), dir, SYNC_FOR_CPU);
+}
+
+static void
+bounce_sync_sg_for_device(struct device *dev, struct scatterlist *sglist,
+			  int nelems, enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sglist, sg, nelems, i)
+		iommu_bounce_sync(dev, sg_dma_address(sg),
+				  sg_dma_len(sg), dir, SYNC_FOR_DEVICE);
+}
+
+static const struct dma_map_ops bounce_dma_ops = {
+	.alloc			= intel_alloc_coherent,
+	.free			= intel_free_coherent,
+	.map_sg			= bounce_map_sg,
+	.unmap_sg		= bounce_unmap_sg,
+	.map_page		= bounce_map_page,
+	.unmap_page		= bounce_unmap_page,
+	.sync_single_for_cpu	= bounce_sync_single_for_cpu,
+	.sync_single_for_device	= bounce_sync_single_for_device,
+	.sync_sg_for_cpu	= bounce_sync_sg_for_cpu,
+	.sync_sg_for_device	= bounce_sync_sg_for_device,
+	.map_resource		= bounce_map_resource,
+	.unmap_resource		= bounce_unmap_resource,
+	.dma_supported		= dma_direct_supported,
+};
+
 static inline int iommu_domain_cache_init(void)
 {
 	int ret = 0;
@@ -5212,7 +5390,10 @@ static int intel_iommu_add_device(struct device *dev)
 					 "Device uses a private identity domain.\n");
 			}
 		} else {
-			set_dma_ops(dev, &intel_dma_ops);
+			if (device_needs_bounce(dev))
+				set_dma_ops(dev, &bounce_dma_ops);
+			else
+				set_dma_ops(dev, &intel_dma_ops);
 		}
 	} else {
 		if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
@@ -5229,7 +5410,10 @@ static int intel_iommu_add_device(struct device *dev)
 					 "Device uses a private dma domain.\n");
 			}
 
-			set_dma_ops(dev, &intel_dma_ops);
+			if (device_needs_bounce(dev))
+				set_dma_ops(dev, &bounce_dma_ops);
+			else
+				set_dma_ops(dev, &intel_dma_ops);
 		}
 	}
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma
  2019-07-25  3:17 ` [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma Lu Baolu
@ 2019-07-25  5:41   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-07-25  5:41 UTC (permalink / raw)
  To: Lu Baolu
  Cc: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-07-25  3:17 ` [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops Lu Baolu
@ 2019-07-25  5:44   ` Christoph Hellwig
  2019-07-25  7:18     ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-07-25  5:44 UTC (permalink / raw)
  To: Lu Baolu
  Cc: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

>  /* Check if the dev needs to go through non-identity map and unmap process.*/
>  static bool iommu_need_mapping(struct device *dev)
>  {
> -	int ret;
> -
>  	if (iommu_dummy(dev))
>  		return false;
>  
> -	ret = identity_mapping(dev);
> -	if (ret) {
> -		u64 dma_mask = *dev->dma_mask;
> -
> -		if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
> -			dma_mask = dev->coherent_dma_mask;
> -
> -		if (dma_mask >= dma_get_required_mask(dev))
> -			return false;

Don't we need to keep this bit so that we still allow the IOMMU
to act if the device has a too small DMA mask to address all memory in
the system, even if it should otherwise be identity mapped?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 04/10] PCI: Add dev_is_untrusted helper
  2019-07-25  3:17 ` [PATCH v5 04/10] PCI: Add dev_is_untrusted helper Lu Baolu
@ 2019-07-25  5:44   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-07-25  5:44 UTC (permalink / raw)
  To: Lu Baolu
  Cc: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel

On Thu, Jul 25, 2019 at 11:17:11AM +0800, Lu Baolu wrote:
> There are several places in the kernel where it is necessary to
> check whether a device is an untrusted PCI device. Add a helper
> to simplify the callers.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-07-25  5:44   ` Christoph Hellwig
@ 2019-07-25  7:18     ` Lu Baolu
  2019-07-25 11:43       ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-25  7:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

Hi Christoph,

On 7/25/19 1:44 PM, Christoph Hellwig wrote:
>>   /* Check if the dev needs to go through non-identity map and unmap process.*/
>>   static bool iommu_need_mapping(struct device *dev)
>>   {
>> -	int ret;
>> -
>>   	if (iommu_dummy(dev))
>>   		return false;
>>   
>> -	ret = identity_mapping(dev);
>> -	if (ret) {
>> -		u64 dma_mask = *dev->dma_mask;
>> -
>> -		if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask)
>> -			dma_mask = dev->coherent_dma_mask;
>> -
>> -		if (dma_mask >= dma_get_required_mask(dev))
>> -			return false;
> 
> Don't we need to keep this bit so that we still allow the IOMMU
> to act if the device has a too small DMA mask to address all memory in
> the system, even if if it should otherwise be identity mapped?
> 

This check happens only when the device is using an identity-mapped
domain. If the device has a small DMA mask, swiotlb will be used for
high memory access.

This is supposed to be handled in dma_direct_map_page():

         if (unlikely(!dma_direct_possible(dev, dma_addr, size)) &&
             !swiotlb_map(dev, &phys, &dma_addr, size, dir, attrs)) {
                 report_addr(dev, dma_addr, size);
                 return DMA_MAPPING_ERROR;
         }

Best regards,
Baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-07-25  7:18     ` Lu Baolu
@ 2019-07-25 11:43       ` Christoph Hellwig
  2019-07-26  1:56         ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-07-25 11:43 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

On Thu, Jul 25, 2019 at 03:18:03PM +0800, Lu Baolu wrote:
>> Don't we need to keep this bit so that we still allow the IOMMU
>> to act if the device has a too small DMA mask to address all memory in
>> the system, even if it should otherwise be identity mapped?
>>
>
> This checking happens only when device is using an identity mapped
> domain. If the device has a small DMA mask, swiotlb will be used for
> high memory access.
>
> This is supposed to be handled in dma_direct_map_page():
>
>         if (unlikely(!dma_direct_possible(dev, dma_addr, size)) &&
>             !swiotlb_map(dev, &phys, &dma_addr, size, dir, attrs)) {
>                 report_addr(dev, dma_addr, size);
>                 return DMA_MAPPING_ERROR;
>         }

Well, yes.  But the point is that the current code uses dynamic iommu
mappings even if the device is in the identity mapped domain when the
dma mask is too small to map all memory directly.  Your change means it
will now use swiotlb, which is most likely going to be a lot more
expensive.  I don't think that this change is a good idea, and even if
we decide that it is a good idea after all, that should be done in a
separate prep patch that explains the rationale.

> Best regards,
> Baolu
---end quoted text---

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs
  2019-07-25  3:17 ` [PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs Lu Baolu
@ 2019-07-25 11:47   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-07-25 11:47 UTC (permalink / raw)
  To: Lu Baolu
  Cc: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel

On Thu, Jul 25, 2019 at 11:17:12AM +0800, Lu Baolu wrote:
> This splits the size parameter to swiotlb_tbl_map_single() and
> swiotlb_tbl_unmap_single() into an alloc_size and a mapping_size
> parameter, where the latter one is rounded up to the iommu page
> size.
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device
  2019-07-25  3:17 ` [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device Lu Baolu
@ 2019-07-25 11:49   ` Christoph Hellwig
  2019-07-26  2:21     ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-07-25 11:49 UTC (permalink / raw)
  To: Lu Baolu
  Cc: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel

> index 43c88626a1f3..edc84a00b9f9 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,7 @@
>  #include <linux/scatterlist.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/set_memory.h>
> +#include <linux/pci.h>
>  #ifdef CONFIG_DEBUG_FS
>  #include <linux/debugfs.h>
>  #endif
> @@ -562,6 +563,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>  	 */
>  	for (i = 0; i < nslots; i++)
>  		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> +
> +	/* Zero out the bounce buffer if the consumer is untrusted. */
> +	if (dev_is_untrusted(hwdev))
> +		memset(phys_to_virt(tlb_addr), 0, alloc_size);

Hmm.  Maybe we need to move the untrusted flag to struct device?
Directly poking into the pci_dev from swiotlb is a bit of a layering
violation.

> +
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>  	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>  		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);

Also, for the case where we bounce here, we only need to zero the padding
(if there is any), so I think we could optimize this a bit.
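
A minimal sketch of that optimization (hypothetical, not part of this
series; it reuses the variable names from the snippet quoted above, and
data_offset is an assumed name for where the real data sits inside the
allocated slot):

	if (dev_is_untrusted(hwdev)) {
		void *slot = phys_to_virt(tlb_addr);
		size_t data_offset = 0;	/* assumed offset of the data in the slot */
		size_t tail = alloc_size - data_offset - mapping_size;
		bool bounced = !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
			       (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL);

		if (!bounced) {
			/* Nothing is copied in: clear the whole slot. */
			memset(slot, 0, alloc_size);
		} else {
			/* The bounce copy overwrites the data; clear only the padding. */
			memset(slot, 0, data_offset);
			memset(slot + data_offset + mapping_size, 0, tail);
		}
	}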

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap
  2019-07-25  3:17 ` [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap Lu Baolu
@ 2019-07-25 12:26   ` Steven Rostedt
  2019-07-26  2:24     ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Steven Rostedt @ 2019-07-25 12:26 UTC (permalink / raw)
  To: Lu Baolu
  Cc: David Woodhouse, Joerg Roedel, Bjorn Helgaas, Christoph Hellwig,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, iommu, linux-kernel, Jacob Pan

On Thu, 25 Jul 2019 11:17:16 +0800
Lu Baolu <baolu.lu@linux.intel.com> wrote:

> This adds trace support for the Intel IOMMU driver. It
> also declares some events which could be used to trace
> the events when an IOVA is being mapped or unmapped in
> a domain.
> 
> Cc: Ashok Raj <ashok.raj@intel.com>
> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/Makefile             |  1 +
>  drivers/iommu/intel-trace.c        | 14 +++++
>  include/trace/events/intel_iommu.h | 95 ++++++++++++++++++++++++++++++
>  3 files changed, 110 insertions(+)
>  create mode 100644 drivers/iommu/intel-trace.c
>  create mode 100644 include/trace/events/intel_iommu.h

This patch looks fine, but I don't see any users for anything but
trace_bounce_map_single() and trace_bounce_unmap_single().

Other than that.

Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index f13f36ae1af6..bfe27b2755bd 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
>  obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>  obj-$(CONFIG_DMAR_TABLE) += dmar.o
>  obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
> +obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
>  obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
>  obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
>  obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
> diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c
> new file mode 100644
> index 000000000000..bfb6a6e37a88
> --- /dev/null
> +++ b/drivers/iommu/intel-trace.c
> @@ -0,0 +1,14 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Intel IOMMU trace support
> + *
> + * Copyright (C) 2019 Intel Corporation
> + *
> + * Author: Lu Baolu <baolu.lu@linux.intel.com>
> + */
> +
> +#include <linux/string.h>
> +#include <linux/types.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/intel_iommu.h>
> diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h
> new file mode 100644
> index 000000000000..3fdeaad93b2e
> --- /dev/null
> +++ b/include/trace/events/intel_iommu.h
> @@ -0,0 +1,95 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Intel IOMMU trace support
> + *
> + * Copyright (C) 2019 Intel Corporation
> + *
> + * Author: Lu Baolu <baolu.lu@linux.intel.com>
> + */
> +#ifdef CONFIG_INTEL_IOMMU
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM intel_iommu
> +
> +#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_INTEL_IOMMU_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/intel-iommu.h>
> +
> +DECLARE_EVENT_CLASS(dma_map,
> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
> +		 size_t size),
> +
> +	TP_ARGS(dev, dev_addr, phys_addr, size),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name(dev))
> +		__field(dma_addr_t, dev_addr)
> +		__field(phys_addr_t, phys_addr)
> +		__field(size_t,	size)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name(dev));
> +		__entry->dev_addr = dev_addr;
> +		__entry->phys_addr = phys_addr;
> +		__entry->size = size;
> +	),
> +
> +	TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu",
> +		  __get_str(dev_name),
> +		  (unsigned long long)__entry->dev_addr,
> +		  (unsigned long long)__entry->phys_addr,
> +		  __entry->size)
> +);
> +
> +DEFINE_EVENT(dma_map, bounce_map_single,
> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
> +		 size_t size),
> +	TP_ARGS(dev, dev_addr, phys_addr, size)
> +);
> +
> +DEFINE_EVENT(dma_map, bounce_map_sg,
> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
> +		 size_t size),
> +	TP_ARGS(dev, dev_addr, phys_addr, size)
> +);
> +
> +DECLARE_EVENT_CLASS(dma_unmap,
> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
> +
> +	TP_ARGS(dev, dev_addr, size),
> +
> +	TP_STRUCT__entry(
> +		__string(dev_name, dev_name(dev))
> +		__field(dma_addr_t, dev_addr)
> +		__field(size_t,	size)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name(dev));
> +		__entry->dev_addr = dev_addr;
> +		__entry->size = size;
> +	),
> +
> +	TP_printk("dev=%s dev_addr=0x%llx size=%zu",
> +		  __get_str(dev_name),
> +		  (unsigned long long)__entry->dev_addr,
> +		  __entry->size)
> +);
> +
> +DEFINE_EVENT(dma_unmap, bounce_unmap_single,
> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
> +	TP_ARGS(dev, dev_addr, size)
> +);
> +
> +DEFINE_EVENT(dma_unmap, bounce_unmap_sg,
> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
> +	TP_ARGS(dev, dev_addr, size)
> +);
> +
> +#endif /* _TRACE_INTEL_IOMMU_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> +#endif /* CONFIG_INTEL_IOMMU */


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-07-25 11:43       ` Christoph Hellwig
@ 2019-07-26  1:56         ` Lu Baolu
  2019-11-12  7:16           ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-07-26  1:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

Hi,

On 7/25/19 7:43 PM, Christoph Hellwig wrote:
> On Thu, Jul 25, 2019 at 03:18:03PM +0800, Lu Baolu wrote:
>>> Don't we need to keep this bit so that we still allow the IOMMU
>>> to act if the device has a too small DMA mask to address all memory in
>>> the system, even if it should otherwise be identity mapped?
>>>
>>
>> This checking happens only when device is using an identity mapped
>> domain. If the device has a small DMA mask, swiotlb will be used for
>> high memory access.
>>
>> This is supposed to be handled in dma_direct_map_page():
>>
>>          if (unlikely(!dma_direct_possible(dev, dma_addr, size)) &&
>>              !swiotlb_map(dev, &phys, &dma_addr, size, dir, attrs)) {
>>                  report_addr(dev, dma_addr, size);
>>                  return DMA_MAPPING_ERROR;
>>          }
> 
> Well, yes.  But the point is that the current code uses dynamic iommu
> mappings even if the device is in the identity mapped domain when the
> dma mask is too small to map all memory directly.  Your change means it
> will now use swiotlb which is most likely going to be a lot more

By default, we use the DMA domain. Privileged users are able to change
this with a global kernel parameter or the per-group default domain type
under discussion. In other words, use of the identity domain is a choice
of the privileged user, who should consider the possible bounce buffer
overhead.

I think the current code doesn't do the right thing. The user asks the
iommu driver to use the identity domain for a device, but the driver
forces it back to the DMA domain because of the device's address
capability.

> expensive.  I don't think that this change is a good idea, and even if
> we decide that this is a good idea after all that should be done in a
> separate prep patch that explains the rationale.

Yes. Makes sense.

Best regards,
Baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device
  2019-07-25 11:49   ` Christoph Hellwig
@ 2019-07-26  2:21     ` Lu Baolu
  0 siblings, 0 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-26  2:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel

Hi,

On 7/25/19 7:49 PM, Christoph Hellwig wrote:
>> index 43c88626a1f3..edc84a00b9f9 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -35,6 +35,7 @@
>>   #include <linux/scatterlist.h>
>>   #include <linux/mem_encrypt.h>
>>   #include <linux/set_memory.h>
>> +#include <linux/pci.h>
>>   #ifdef CONFIG_DEBUG_FS
>>   #include <linux/debugfs.h>
>>   #endif
>> @@ -562,6 +563,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>   	 */
>>   	for (i = 0; i < nslots; i++)
>>   		io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
>> +
>> +	/* Zero out the bounce buffer if the consumer is untrusted. */
>> +	if (dev_is_untrusted(hwdev))
>> +		memset(phys_to_virt(tlb_addr), 0, alloc_size);
> 
> Hmm.  Maybe we need to move the untrusted flag to struct device?
> Directly poking into the pci_dev from swiotlb is a bit of a layering
> violation.

Yes. We can consider this. But I tend to think that it's worth a
separate series. That's the reason why I defined dev_is_untrusted(). This
helper keeps the callers unchanged when the untrusted flag moves.
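
For reference, a minimal sketch of what such a helper can look like
(the exact form in patch 04/10 may differ):

	static inline bool dev_is_untrusted(struct device *dev)
	{
		/* The untrusted flag currently lives in struct pci_dev. */
		return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
	}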

> 
>> +
>>   	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>>   	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>>   		swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
> 
> Also for the case where we bounce here we only need to zero the padding
> (if there is any), so I think we could optimize this a bit.
> 

Yes. There's duplication here.

Best regards,
Baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap
  2019-07-25 12:26   ` Steven Rostedt
@ 2019-07-26  2:24     ` Lu Baolu
  0 siblings, 0 replies; 30+ messages in thread
From: Lu Baolu @ 2019-07-26  2:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	Christoph Hellwig, ashok.raj, jacob.jun.pan, alan.cox,
	kevin.tian, mika.westerberg, Ingo Molnar, Greg Kroah-Hartman,
	pengfei.xu, Konrad Rzeszutek Wilk, Marek Szyprowski,
	Robin Murphy, Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, iommu, linux-kernel, Jacob Pan

Hi,

On 7/25/19 8:26 PM, Steven Rostedt wrote:
> On Thu, 25 Jul 2019 11:17:16 +0800
> Lu Baolu <baolu.lu@linux.intel.com> wrote:
> 
>> This adds trace support for the Intel IOMMU driver. It
>> also declares some events which could be used to trace
>> the events when an IOVA is being mapped or unmapped in
>> a domain.
>>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/Makefile             |  1 +
>>   drivers/iommu/intel-trace.c        | 14 +++++
>>   include/trace/events/intel_iommu.h | 95 ++++++++++++++++++++++++++++++
>>   3 files changed, 110 insertions(+)
>>   create mode 100644 drivers/iommu/intel-trace.c
>>   create mode 100644 include/trace/events/intel_iommu.h
> 
> This patch looks fine, but I don't see the use cases for anything but
> trace_bounce_map_single() and trace_bounce_unmap_single() used.

This only adds the trace events/points for this case. We will add more later.

> 
> Other than that.
> 
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Thank you!

Best regards,
Baolu

> 
> -- Steve
> 
>>
>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>> index f13f36ae1af6..bfe27b2755bd 100644
>> --- a/drivers/iommu/Makefile
>> +++ b/drivers/iommu/Makefile
>> @@ -17,6 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
>>   obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
>>   obj-$(CONFIG_DMAR_TABLE) += dmar.o
>>   obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
>> +obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
>>   obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
>>   obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
>>   obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
>> diff --git a/drivers/iommu/intel-trace.c b/drivers/iommu/intel-trace.c
>> new file mode 100644
>> index 000000000000..bfb6a6e37a88
>> --- /dev/null
>> +++ b/drivers/iommu/intel-trace.c
>> @@ -0,0 +1,14 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Intel IOMMU trace support
>> + *
>> + * Copyright (C) 2019 Intel Corporation
>> + *
>> + * Author: Lu Baolu <baolu.lu@linux.intel.com>
>> + */
>> +
>> +#include <linux/string.h>
>> +#include <linux/types.h>
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/intel_iommu.h>
>> diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h
>> new file mode 100644
>> index 000000000000..3fdeaad93b2e
>> --- /dev/null
>> +++ b/include/trace/events/intel_iommu.h
>> @@ -0,0 +1,95 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Intel IOMMU trace support
>> + *
>> + * Copyright (C) 2019 Intel Corporation
>> + *
>> + * Author: Lu Baolu <baolu.lu@linux.intel.com>
>> + */
>> +#ifdef CONFIG_INTEL_IOMMU
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM intel_iommu
>> +
>> +#if !defined(_TRACE_INTEL_IOMMU_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_INTEL_IOMMU_H
>> +
>> +#include <linux/tracepoint.h>
>> +#include <linux/intel-iommu.h>
>> +
>> +DECLARE_EVENT_CLASS(dma_map,
>> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
>> +		 size_t size),
>> +
>> +	TP_ARGS(dev, dev_addr, phys_addr, size),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dev_name, dev_name(dev))
>> +		__field(dma_addr_t, dev_addr)
>> +		__field(phys_addr_t, phys_addr)
>> +		__field(size_t,	size)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dev_name, dev_name(dev));
>> +		__entry->dev_addr = dev_addr;
>> +		__entry->phys_addr = phys_addr;
>> +		__entry->size = size;
>> +	),
>> +
>> +	TP_printk("dev=%s dev_addr=0x%llx phys_addr=0x%llx size=%zu",
>> +		  __get_str(dev_name),
>> +		  (unsigned long long)__entry->dev_addr,
>> +		  (unsigned long long)__entry->phys_addr,
>> +		  __entry->size)
>> +);
>> +
>> +DEFINE_EVENT(dma_map, bounce_map_single,
>> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
>> +		 size_t size),
>> +	TP_ARGS(dev, dev_addr, phys_addr, size)
>> +);
>> +
>> +DEFINE_EVENT(dma_map, bounce_map_sg,
>> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, phys_addr_t phys_addr,
>> +		 size_t size),
>> +	TP_ARGS(dev, dev_addr, phys_addr, size)
>> +);
>> +
>> +DECLARE_EVENT_CLASS(dma_unmap,
>> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
>> +
>> +	TP_ARGS(dev, dev_addr, size),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(dev_name, dev_name(dev))
>> +		__field(dma_addr_t, dev_addr)
>> +		__field(size_t,	size)
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(dev_name, dev_name(dev));
>> +		__entry->dev_addr = dev_addr;
>> +		__entry->size = size;
>> +	),
>> +
>> +	TP_printk("dev=%s dev_addr=0x%llx size=%zu",
>> +		  __get_str(dev_name),
>> +		  (unsigned long long)__entry->dev_addr,
>> +		  __entry->size)
>> +);
>> +
>> +DEFINE_EVENT(dma_unmap, bounce_unmap_single,
>> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
>> +	TP_ARGS(dev, dev_addr, size)
>> +);
>> +
>> +DEFINE_EVENT(dma_unmap, bounce_unmap_sg,
>> +	TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
>> +	TP_ARGS(dev, dev_addr, size)
>> +);
>> +
>> +#endif /* _TRACE_INTEL_IOMMU_H */
>> +
>> +/* This part must be outside protection */
>> +#include <trace/define_trace.h>
>> +#endif /* CONFIG_INTEL_IOMMU */
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-07-26  1:56         ` Lu Baolu
@ 2019-11-12  7:16           ` Christoph Hellwig
  2019-11-13  2:50             ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-11-12  7:16 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

On Fri, Jul 26, 2019 at 09:56:51AM +0800, Lu Baolu wrote:
> I think the current code doesn't do the right thing. The user asks the
> iommu driver to use the identity domain for a device, but the driver
> forces it back to the DMA domain because of the device's address capability.
>
>> expensive.  I don't think that this change is a good idea, and even if
>> we decide that this is a good idea after all that should be done in a
>> separate prep patch that explains the rationale.
>
> Yes. Makes sense.

Now that the bounce code has landed it might be good time to revisit
this patch in isolation and with a better explanation.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-12  7:16           ` Christoph Hellwig
@ 2019-11-13  2:50             ` Lu Baolu
  2019-11-13  7:03               ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-11-13  2:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

Hi Christoph,

On 11/12/19 3:16 PM, Christoph Hellwig wrote:
> On Fri, Jul 26, 2019 at 09:56:51AM +0800, Lu Baolu wrote:
>> I think the current code doesn't do the right thing. The user asks the
>> iommu driver to use the identity domain for a device, but the driver
>> forces it back to the DMA domain because of the device's address capability.
>>
>>> expensive.  I don't think that this change is a good idea, and even if
>>> we decide that this is a good idea after all that should be done in a
>>> separate prep patch that explains the rationale.
>>
>> Yes. Makes sense.
> 
> Now that the bounce code has landed it might be good time to revisit
> this patch in isolation and with a better explanation.
> 

Yes. Thanks for bringing this up.

Currently, this is a blocking issue for using per-device dma ops in the
Intel IOMMU driver. Hence it blocks this driver from using the generic
iommu dma ops.

I'd like to align the Intel IOMMU driver with other vendors: use the
iommu dma ops for devices which have been selected to go through the
iommu, and use the direct dma ops for devices selected to bypass it.

One concern with this proposal is that, for devices with limited address
capability, shall we force them to use the iommu, or alternatively use
swiotlb if the user decides to let them bypass the iommu?

I understand that using swiotlb will cause some overhead due to the
bounce buffer, but the Intel IOMMU is on by default, hence any users who
run a default kernel won't suffer this. We only need to document the
overhead so that users understand it when they decide to let such
devices bypass the iommu. This is common to all vendor iommu drivers as
far as I can see.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-13  2:50             ` Lu Baolu
@ 2019-11-13  7:03               ` Christoph Hellwig
  2019-11-13  9:53                 ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-11-13  7:03 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

On Wed, Nov 13, 2019 at 10:50:27AM +0800, Lu Baolu wrote:
> Currently, this is a blocking issue for using per-device dma ops in the
> Intel IOMMU driver. Hence it blocks this driver from using the generic
> iommu dma ops.

That is in fact the reason why I bring it up :)

> I'd like to align the Intel IOMMU driver with other vendors: use the
> iommu dma ops for devices which have been selected to go through the
> iommu, and use the direct dma ops for devices selected to bypass it.
>
> One concern with this proposal is that, for devices with limited address
> capability, shall we force them to use the iommu, or alternatively use
> swiotlb if the user decides to let them bypass the iommu?
>
> I understand that using swiotlb will cause some overhead due to the
> bounce buffer, but the Intel IOMMU is on by default, hence any users who
> run a default kernel won't suffer this. We only need to document the
> overhead so that users understand it when they decide to let such
> devices bypass the iommu. This is common to all vendor iommu drivers as
> far as I can see.

Indeed.  And one idea would be to lift the code in the powerpc
dma_iommu_ops that checks a flag and uses the direct ops into the
generic dma code, plus a flag in struct device.  We can then switch
the intel iommu ops (and AMD Gart) over to it.
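
As a rough illustration of that idea (a sketch only; dma_ops_bypass is
an assumed name for the new struct device flag, and the generic code
that eventually lands may look different), the common map path could
consult the flag before calling into the IOMMU dma_map_ops:

	dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
			unsigned long offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		/* Bypass the IOMMU ops when direct mapping is safe. */
		if (dma_is_direct(ops) || dev->dma_ops_bypass)
			return dma_direct_map_page(dev, page, offset, size,
						   dir, attrs);

		return ops->map_page(dev, page, offset, size, dir, attrs);
	}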

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-13  7:03               ` Christoph Hellwig
@ 2019-11-13  9:53                 ` Christoph Hellwig
  2019-11-14  5:14                   ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-11-13  9:53 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

On Wed, Nov 13, 2019 at 08:03:12AM +0100, Christoph Hellwig wrote:
> Indeed.  And one idea would be to lift the code in the powerpc
> dma_iommu_ops that checks a flag and uses the direct ops into the
> generic dma code, plus a flag in struct device.  We can then switch
> the intel iommu ops (and AMD Gart) over to it.

Let me know what you think of the branch below.  Only compile tested
and booted on qemu with an emulated intel iommu:

	http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-13  9:53                 ` Christoph Hellwig
@ 2019-11-14  5:14                   ` Lu Baolu
  2019-11-14  8:14                     ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-11-14  5:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

Hi Christoph,

On 11/13/19 5:53 PM, Christoph Hellwig wrote:
> On Wed, Nov 13, 2019 at 08:03:12AM +0100, Christoph Hellwig wrote:
>> Indeed.  And one idea would be to lift the code in the powerpc
>> dma_iommu_ops that checks a flag and uses the direct ops into the
>> generic dma code, plus a flag in struct device.  We can then switch
>> the intel iommu ops (and AMD Gart) over to it.
> 
> Let me know what you think of the branch below.  Only compile tested
> and booted on qemu with an emulated intel iommu:
> 
> 	http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass
> 

I took a quick look at the related patches on the branch. Most of them
look good to me. But I would like to understand more about the logic below.

static int intel_dma_supported(struct device *dev, u64 mask)
{
	struct device_domain_info *info = dev->archdata.iommu;
	int ret;

	ret = dma_direct_supported(dev, mask);
	if (ret < 0)
		return ret;

	if (!info || info == DUMMY_DEVICE_DOMAIN_INFO ||
			info == DEFER_DEVICE_DOMAIN_INFO) {
		dev->dma_ops_bypass = true;
	} else if (info->domain == si_domain) {
		if (mask < dma_direct_get_required_mask(dev)) {
			dev->dma_ops_bypass = false;
			intel_iommu_set_dma_domain(dev);
			dev_info(dev, "32bit DMA uses non-identity mapping\n");
		} else {
			dev->dma_ops_bypass = true;
		}
	} else {
		dev->dma_ops_bypass = false;
	}

	return 0;
}

Could you please educate me what dma_supported() is exactly for? Will
it always get called during boot? When will it be called?

In the above implementation, why do we need to check dma_direct_supported()
at the beginning? And why

	if (!info || info == DUMMY_DEVICE_DOMAIN_INFO ||
			info == DEFER_DEVICE_DOMAIN_INFO) {
		dev->dma_ops_bypass = true;

Best regards,
baolu


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-14  5:14                   ` Lu Baolu
@ 2019-11-14  8:14                     ` Christoph Hellwig
  2019-11-15  0:57                       ` Lu Baolu
  0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2019-11-14  8:14 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

On Thu, Nov 14, 2019 at 01:14:11PM +0800, Lu Baolu wrote:
> Could you please educate me what dma_supported() is exactly for? Will
> it always get called during boot? When will it be called?

->dma_supported is called when setting either the dma_mask or
dma_coherent_mask. These days it serves two primary purposes: reject
masks that are too small to be addressed, and provide any hooks needed
in the driver based on the mask.
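
For example (a generic sketch, not specific to this series; foo_probe
is a made-up driver), setting the mask from a driver's probe routine is
what ends up invoking the ->dma_supported hook:

	static int foo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
	{
		int ret;

		/* dma_set_mask_and_coherent() consults ->dma_supported. */
		ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
		if (ret)
			ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
		if (ret)
			return ret;

		/* ... rest of the hypothetical driver's probe ... */
		return 0;
	}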

> In above implementation, why do we need to check dma_direct_supported()
> at the beginning? And why

Because the existing driver called dma_direct_supported, which I added
based on x86 arch overrides doing the same a while ago.  I suspect
it is related to addressing for tiny dma masks, but I'm not entirely
sure.  The longer term intel-iommu maintainers or x86 maintainers might
be able to shed more light how this was supposed to work and/or how
systems with the Intel IOMMU deal with e.g. ISA devices with 24-bit
addressing.

>
> 	if (!info || info == DUMMY_DEVICE_DOMAIN_INFO ||
> 			info == DEFER_DEVICE_DOMAIN_INFO) {
> 		dev->dma_ops_bypass = true;

This was supposed to transform the checks from iommu_dummy and
identity_mapping.  But I think it actually isn't entirely correct and
already went bad in the patch to remove identity_mapping.  Please check 
the branch I just re-pushed, which should be correct now.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-14  8:14                     ` Christoph Hellwig
@ 2019-11-15  0:57                       ` Lu Baolu
  2019-11-20 10:44                         ` Christoph Hellwig
  0 siblings, 1 reply; 30+ messages in thread
From: Lu Baolu @ 2019-11-15  0:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: baolu.lu, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

Hi,

On 11/14/19 4:14 PM, Christoph Hellwig wrote:
> On Thu, Nov 14, 2019 at 01:14:11PM +0800, Lu Baolu wrote:
>> Could you please educate me what dma_supported() is exactly for? Will
>> it always get called during boot? When will it be called?
> 
> ->dma_supported is called when setting either the dma_mask or
> dma_coherent_mask. These days it serves two primary purposes: reject
> masks that are too small to be addressed, and provide any hooks needed
> in the driver based on the mask.

Thanks! So ->dma_supported might not be called before the driver maps a
buffer and starts DMA. Right?

> 
>> In above implementation, why do we need to check dma_direct_supported()
>> at the beginning? And why
> 
> Because the existing driver called dma_direct_supported, which I added
>> based on x86 arch overrides doing the same a while ago.  I suspect
> it is related to addressing for tiny dma masks, but I'm not entirely
> sure.  The longer term intel-iommu maintainers or x86 maintainers might
> be able to shed more light how this was supposed to work and/or how
> systems with the Intel IOMMU deal with e.g. ISA devices with 24-bit
> addressing.

Yes. Makes sense.

> 
>>
>> 	if (!info || info == DUMMY_DEVICE_DOMAIN_INFO ||
>> 			info == DEFER_DEVICE_DOMAIN_INFO) {
>> 		dev->dma_ops_bypass = true;
> 
> This was supposed to transform the checks from iommu_dummy and
> identity_mapping.  But I think it actually isn't entirely correct and
>> already went bad in the patch to remove identity_mapping.  Please check
> the branch I just re-pushed, which should be correct now.
> 

Okay. Thanks!

Best regard,
baolu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops
  2019-11-15  0:57                       ` Lu Baolu
@ 2019-11-20 10:44                         ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2019-11-20 10:44 UTC (permalink / raw)
  To: Lu Baolu
  Cc: Christoph Hellwig, David Woodhouse, Joerg Roedel, Bjorn Helgaas,
	ashok.raj, jacob.jun.pan, alan.cox, kevin.tian, mika.westerberg,
	Ingo Molnar, Greg Kroah-Hartman, pengfei.xu,
	Konrad Rzeszutek Wilk, Marek Szyprowski, Robin Murphy,
	Jonathan Corbet, Boris Ostrovsky, Juergen Gross,
	Stefano Stabellini, Steven Rostedt, iommu, linux-kernel,
	Jacob Pan

On Fri, Nov 15, 2019 at 08:57:32AM +0800, Lu Baolu wrote:
> Hi,
>
> On 11/14/19 4:14 PM, Christoph Hellwig wrote:
>> On Thu, Nov 14, 2019 at 01:14:11PM +0800, Lu Baolu wrote:
>>> Could you please educate me what dma_supported() is exactly for? Will
>>> it always get called during boot? When will it be called?
>>
>> ->dma_supported is called when setting either the dma_mask or
>> dma_coherent_mask. These days it serves two primary purposes: reject
>> masks that are too small to be addressed, and provide any hooks needed
>> in the driver based on the mask.
>
> Thanks! So ->dma_supported might not be called before the driver maps a
> buffer and starts DMA. Right?

It is supposed to, yes.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-11-20 10:44 UTC | newest]

Thread overview: 30+ messages
2019-07-25  3:17 [PATCH v5 00/10] iommu: Bounce page for untrusted devices Lu Baolu
2019-07-25  3:17 ` [PATCH v5 01/10] iommu/vt-d: Don't switch off swiotlb if use direct dma Lu Baolu
2019-07-25  5:41   ` Christoph Hellwig
2019-07-25  3:17 ` [PATCH v5 02/10] iommu/vt-d: Use per-device dma_ops Lu Baolu
2019-07-25  5:44   ` Christoph Hellwig
2019-07-25  7:18     ` Lu Baolu
2019-07-25 11:43       ` Christoph Hellwig
2019-07-26  1:56         ` Lu Baolu
2019-11-12  7:16           ` Christoph Hellwig
2019-11-13  2:50             ` Lu Baolu
2019-11-13  7:03               ` Christoph Hellwig
2019-11-13  9:53                 ` Christoph Hellwig
2019-11-14  5:14                   ` Lu Baolu
2019-11-14  8:14                     ` Christoph Hellwig
2019-11-15  0:57                       ` Lu Baolu
2019-11-20 10:44                         ` Christoph Hellwig
2019-07-25  3:17 ` [PATCH v5 03/10] iommu/vt-d: Cleanup after use per-device dma ops Lu Baolu
2019-07-25  3:17 ` [PATCH v5 04/10] PCI: Add dev_is_untrusted helper Lu Baolu
2019-07-25  5:44   ` Christoph Hellwig
2019-07-25  3:17 ` [PATCH v5 05/10] swiotlb: Split size parameter to map/unmap APIs Lu Baolu
2019-07-25 11:47   ` Christoph Hellwig
2019-07-25  3:17 ` [PATCH v5 06/10] swiotlb: Zero out bounce buffer for untrusted device Lu Baolu
2019-07-25 11:49   ` Christoph Hellwig
2019-07-26  2:21     ` Lu Baolu
2019-07-25  3:17 ` [PATCH v5 07/10] iommu: Add bounce page APIs Lu Baolu
2019-07-25  3:17 ` [PATCH v5 08/10] iommu/vt-d: Check whether device requires bounce buffer Lu Baolu
2019-07-25  3:17 ` [PATCH v5 09/10] iommu/vt-d: Add trace events for device dma map/unmap Lu Baolu
2019-07-25 12:26   ` Steven Rostedt
2019-07-26  2:24     ` Lu Baolu
2019-07-25  3:17 ` [PATCH v5 10/10] iommu/vt-d: Use bounce buffer for untrusted devices Lu Baolu
