This series implements mitigations for the lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote Wi-Fi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce bounced DMA. The bounced
DMA ops provide an implementation of DMA ops that bounce streaming DMA in
and out of a specially allocated region. The feature on its own provides a
basic level of protection against the DMA overwriting buffer contents at
unexpected times. However, to protect against general data leakage and
system memory corruption, the system needs to provide a way to restrict
the DMA to a predefined memory region (this is usually done at the
firmware level, e.g. in ATF on some ARM platforms).

Currently, 32-bit architectures are not supported because of the need to
handle HIGHMEM, which increases code complexity and adds a further
performance penalty on such platforms. Also, bounced DMA cannot be enabled
for devices behind an IOMMU, as those require an IOMMU-aware
implementation of DMA ops and do not need this kind of mitigation anyway.
[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/

Claire Chang (4):
  dma-mapping: Add bounced DMA ops
  dma-mapping: Add bounced DMA pool
  dt-bindings: of: Add plumbing for bounced DMA pool
  of: Add plumbing for bounced DMA pool

 .../reserved-memory/reserved-memory.txt |  36 +++
 drivers/of/address.c                    |  37 +++
 drivers/of/device.c                     |   3 +
 drivers/of/of_private.h                 |   6 +
 include/linux/device.h                  |   3 +
 include/linux/dma-mapping.h             |   1 +
 kernel/dma/Kconfig                      |  17 +
 kernel/dma/Makefile                     |   1 +
 kernel/dma/bounced.c                    | 304 ++++++++++++++++++
 9 files changed, 408 insertions(+)
 create mode 100644 kernel/dma/bounced.c

-- 
2.27.0.383.g050319c2ae-goog
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
The bounced DMA ops provide an implementation of DMA ops that bounce
streaming DMA in and out of a specially allocated region. Only the
operations relevant to streaming DMA are supported.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 include/linux/device.h      |   3 +
 include/linux/dma-mapping.h |   1 +
 kernel/dma/Kconfig          |  17 +++
 kernel/dma/Makefile         |   1 +
 kernel/dma/bounced.c        | 215 ++++++++++++++++++++++++++++++++++++
 5 files changed, 237 insertions(+)
 create mode 100644 kernel/dma/bounced.c

diff --git a/include/linux/device.h b/include/linux/device.h
index 7322c51e9c0c..868b9a364003 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -588,6 +588,9 @@ struct device {
 
 	struct list_head	dma_pools;	/* dma pools (if dma'ble) */
 
+#ifdef CONFIG_DMA_BOUNCED
+	struct dma_bounced_mem	*dma_bounced_mem;
+#endif
 #ifdef CONFIG_DMA_DECLARE_COHERENT
 	struct dma_coherent_mem	*dma_mem; /* internal for coherent mem
 					     override */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 2328f451a45d..86089424dafd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -135,6 +135,7 @@ struct dma_map_ops {
 
 extern const struct dma_map_ops dma_virt_ops;
 extern const struct dma_map_ops dma_dummy_ops;
+extern const struct dma_map_ops dma_bounced_ops;
 
 #define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
 
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 1da3f44f2565..148734c8748b 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -88,6 +88,23 @@ config DMA_DIRECT_REMAP
 	select DMA_REMAP
 	select DMA_COHERENT_POOL
 
+config DMA_BOUNCED
+	bool "DMA Bounced"
+	depends on !HIGHMEM
+	select OF_RESERVED_MEM
+	help
+	  This enables support for bounced DMA pools which provide a level of
+	  DMA memory protection on systems with limited hardware protection
+	  capabilities, such as those lacking an IOMMU. It does so by bouncing
+	  the data to a specially allocated DMA-accessible protected region
+	  before mapping and unmapping. One can assign the protected memory
+	  region in the device tree by using reserved-memory.
+
+	  For more information see
+	  <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+	  and <kernel/dma/bounced.c>.
+	  If unsure, say "n".
+
 config DMA_CMA
 	bool "DMA Contiguous Memory Allocator"
 	depends on HAVE_DMA_CONTIGUOUS && CMA
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index 370f63344e9c..f5fb4f42326a 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_HAS_DMA)			+= mapping.o direct.o dummy.o
+obj-$(CONFIG_DMA_BOUNCED)		+= bounced.o
 obj-$(CONFIG_DMA_CMA)			+= contiguous.o
 obj-$(CONFIG_DMA_DECLARE_COHERENT)	+= coherent.o
 obj-$(CONFIG_DMA_VIRT_OPS)		+= virt.o
diff --git a/kernel/dma/bounced.c b/kernel/dma/bounced.c
new file mode 100644
index 000000000000..fcaabb5eccf2
--- /dev/null
+++ b/kernel/dma/bounced.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Bounced DMA support.
+ *
+ * This implements the mitigations for lack of IOMMU by bouncing the data to a
+ * specially allocated region before mapping and unmapping.
+ *
+ * Copyright 2020 Google LLC.
+ */
+#include <linux/dma-direct.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-noncoherent.h>
+#include <linux/io.h>
+#include <linux/genalloc.h>
+#include <linux/slab.h>
+
+struct dma_bounced_mem {
+	void		**orig_addr;
+	void		*virt_base;
+	dma_addr_t	device_base;
+	dma_addr_t	device_end;
+	struct gen_pool	*pool;
+	size_t		size;
+};
+
+static void dma_bounced_set_orig_virt(struct device *dev, dma_addr_t dma_addr,
+				      void *orig_addr)
+{
+	struct dma_bounced_mem *mem = dev->dma_bounced_mem;
+	int idx = (dma_addr - mem->device_base) >> PAGE_SHIFT;
+
+	if (dma_addr < mem->device_base || dma_addr >= mem->device_end)
+		return;
+
+	mem->orig_addr[idx] = orig_addr;
+}
+
+static void *dma_bounced_get_orig_virt(struct device *dev, dma_addr_t dma_addr)
+{
+	struct dma_bounced_mem *mem = dev->dma_bounced_mem;
+	int idx = (dma_addr - mem->device_base) >> PAGE_SHIFT;
+
+	if (dma_addr < mem->device_base || dma_addr >= mem->device_end)
+		return NULL;
+
+	return mem->orig_addr[idx];
+}
+
+static void *dma_bounced_get_virt(struct device *dev, dma_addr_t dma_addr)
+{
+	struct dma_bounced_mem *mem = dev->dma_bounced_mem;
+
+	if (dma_addr < mem->device_base || dma_addr >= mem->device_end)
+		return NULL;
+
+	return (dma_addr - mem->device_base) + mem->virt_base;
+}
+
+static void dma_bounced_sync_single_for_cpu(struct device *dev,
+					    dma_addr_t dma_addr, size_t size,
+					    enum dma_data_direction dir)
+{
+	void *orig_virt = dma_bounced_get_orig_virt(dev, dma_addr);
+	void *bounced_virt = dma_bounced_get_virt(dev, dma_addr);
+
+	if (!orig_virt || !bounced_virt)
+		return;
+
+	dma_direct_sync_single_for_cpu(dev, dma_addr, size, dir);
+
+	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+		memcpy(orig_virt, bounced_virt, size);
+}
+
+static void dma_bounced_sync_single_for_device(struct device *dev,
+					       dma_addr_t dma_addr, size_t size,
+					       enum dma_data_direction dir)
+{
+	void *orig_virt = dma_bounced_get_orig_virt(dev, dma_addr);
+	void *bounced_virt = dma_bounced_get_virt(dev, dma_addr);
+
+	if (!orig_virt || !bounced_virt)
+		return;
+
+	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+		memcpy(bounced_virt, orig_virt, size);
+
+	dma_direct_sync_single_for_device(dev, dma_addr, size, dir);
+}
+
+static void dma_bounced_sync_sg_for_cpu(struct device *dev,
+					struct scatterlist *sgl, int nents,
+					enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i) {
+		dma_bounced_sync_single_for_cpu(dev, sg->dma_address,
+						sg->length, dir);
+	}
+}
+
+static void dma_bounced_sync_sg_for_device(struct device *dev,
+					   struct scatterlist *sgl, int nents,
+					   enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i) {
+		dma_bounced_sync_single_for_device(dev, sg->dma_address,
+						   sg->length, dir);
+	}
+}
+
+static void dma_bounced_unmap_page(struct device *dev, dma_addr_t dma_addr,
+				   size_t size, enum dma_data_direction dir,
+				   unsigned long attrs)
+{
+	struct dma_bounced_mem *mem = dev->dma_bounced_mem;
+
+	if (dma_addr < mem->device_base || dma_addr >= mem->device_end)
+		return;
+
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_bounced_sync_single_for_cpu(dev, dma_addr, size, dir);
+
+	dma_bounced_set_orig_virt(dev, dma_addr, NULL);
+	gen_pool_free(mem->pool,
+		      (unsigned long)dma_bounced_get_virt(dev, dma_addr), size);
+}
+
+static dma_addr_t dma_bounced_map_page(struct device *dev, struct page *page,
+				       unsigned long offset, size_t size,
+				       enum dma_data_direction dir,
+				       unsigned long attrs)
+{
+	struct dma_bounced_mem *mem = dev->dma_bounced_mem;
+	dma_addr_t dma_addr;
+	void *orig_virt;
+
+	if (unlikely(!gen_pool_dma_alloc(mem->pool, size, &dma_addr)))
+		return DMA_MAPPING_ERROR;
+
+	orig_virt = page_to_virt(page) + offset;
+	dma_bounced_set_orig_virt(dev, dma_addr, orig_virt);
+
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+		dma_bounced_sync_single_for_device(dev, dma_addr, size, dir);
+
+	return dma_addr;
+}
+
+static void dma_bounced_unmap_sg(struct device *dev, struct scatterlist *sgl,
+				 int nents, enum dma_data_direction dir,
+				 unsigned long attrs)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(sgl, sg, nents, i) {
+		dma_bounced_unmap_page(dev, sg->dma_address, sg_dma_len(sg),
+				       dir, attrs);
+	}
+}
+
+static int dma_bounced_map_sg(struct device *dev, struct scatterlist *sgl,
+			      int nents, enum dma_data_direction dir,
+			      unsigned long attrs)
+{
+	int i;
+	struct scatterlist *sg;
+
+	for_each_sg(sgl, sg, nents, i) {
+		sg->dma_address = dma_bounced_map_page(
+			dev, sg_page(sg), sg->offset, sg->length, dir, attrs);
+		if (sg->dma_address == DMA_MAPPING_ERROR)
+			goto out_unmap;
+		sg_dma_len(sg) = sg->length;
+	}
+
+	return nents;
+
+out_unmap:
+	dma_bounced_unmap_sg(dev, sgl, i, dir, attrs | DMA_ATTR_SKIP_CPU_SYNC);
+	return 0;
+}
+
+static size_t dma_bounced_max_mapping_size(struct device *dev)
+{
+	return dev->dma_bounced_mem->size;
+}
+
+const struct dma_map_ops dma_bounced_ops = {
+	.alloc			= NULL,
+	.free			= NULL,
+	.mmap			= NULL,
+	.get_sgtable		= NULL,
+	.sync_single_for_cpu	= dma_bounced_sync_single_for_cpu,
+	.sync_single_for_device	= dma_bounced_sync_single_for_device,
+	.sync_sg_for_cpu	= dma_bounced_sync_sg_for_cpu,
+	.sync_sg_for_device	= dma_bounced_sync_sg_for_device,
+	.map_page		= dma_bounced_map_page,
+	.unmap_page		= dma_bounced_unmap_page,
+	.map_sg			= dma_bounced_map_sg,
+	.unmap_sg		= dma_bounced_unmap_sg,
+	.unmap_resource		= NULL,
+	.map_resource		= NULL,
+	.cache_sync		= NULL,
+	.dma_supported		= dma_direct_supported,
+	.get_required_mask	= dma_direct_get_required_mask,
+	.max_mapping_size	= dma_bounced_max_mapping_size,
+	.get_merge_boundary	= NULL,
+};
-- 
2.27.0.383.g050319c2ae-goog
Add the initialization function to create bounce buffer pools from
matching reserved-memory nodes in the device tree.

The bounce buffer pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to restrict the DMA to a predefined memory region.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 kernel/dma/bounced.c | 89 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/kernel/dma/bounced.c b/kernel/dma/bounced.c
index fcaabb5eccf2..0bfd6cf90aee 100644
--- a/kernel/dma/bounced.c
+++ b/kernel/dma/bounced.c
@@ -12,6 +12,9 @@
 #include <linux/dma-noncoherent.h>
 #include <linux/io.h>
 #include <linux/genalloc.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
 #include <linux/slab.h>
 
 struct dma_bounced_mem {
@@ -213,3 +216,89 @@ const struct dma_map_ops dma_bounced_ops = {
 	.max_mapping_size	= dma_bounced_max_mapping_size,
 	.get_merge_boundary	= NULL,
 };
+
+static int dma_bounced_device_init(struct reserved_mem *rmem,
+				   struct device *dev)
+{
+	struct dma_bounced_mem *mem;
+	int ret;
+
+	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
+	if (!mem)
+		return -ENOMEM;
+
+	mem->virt_base =
+		devm_memremap(dev, rmem->base, rmem->size, MEMREMAP_WB);
+	if (!mem->virt_base) {
+		ret = -EINVAL;
+		goto error;
+	}
+
+	mem->size = rmem->size;
+	mem->device_base = phys_to_dma(dev, rmem->base);
+	mem->device_end = mem->device_base + rmem->size;
+
+	mem->orig_addr = kcalloc(mem->size >> PAGE_SHIFT,
+				 sizeof(*mem->orig_addr), GFP_KERNEL);
+	if (!mem->orig_addr) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	mem->pool = devm_gen_pool_create(dev, PAGE_SHIFT, NUMA_NO_NODE,
+					 "bounced DMA");
+	if (!mem->pool) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	ret = gen_pool_add_virt(mem->pool, (unsigned long)mem->virt_base,
+				rmem->base, rmem->size, NUMA_NO_NODE);
+	if (ret)
+		goto error;
+
+	dev->dma_bounced_mem = mem;
+	set_dma_ops(dev, &dma_bounced_ops);
+
+	return 0;
+
+error:
+	kfree(mem);
+
+	return ret;
+}
+
+static void dma_bounced_device_release(struct reserved_mem *rmem,
+				       struct device *dev)
+{
+	struct dma_bounced_mem *mem = dev->dma_bounced_mem;
+
+	set_dma_ops(dev, NULL);
+	dev->dma_bounced_mem = NULL;
+
+	kfree(mem->orig_addr);
+	kfree(mem);
+}
+
+static const struct reserved_mem_ops rmem_dma_bounced_ops = {
+	.device_init	= dma_bounced_device_init,
+	.device_release	= dma_bounced_device_release,
+};
+
+static int __init dma_bounced_setup(struct reserved_mem *rmem)
+{
+	unsigned long node = rmem->fdt_node;
+
+	if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+	    of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+	    of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+	    of_get_flat_dt_prop(node, "no-map", NULL))
+		return -EINVAL;
+
+	rmem->ops = &rmem_dma_bounced_ops;
+	pr_info("Reserved memory: created DMA bounced memory pool at %pa, size %ld MiB\n",
+		&rmem->base, (unsigned long)rmem->size / SZ_1M);
+	return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "bounced-dma-pool", dma_bounced_setup);
-- 
2.27.0.383.g050319c2ae-goog
Introduce the new compatible string, bounced-dma-pool, for bounced DMA.
One can specify the address and length of the bounced memory region with
a bounced-dma-pool node in the device tree.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 .../reserved-memory/reserved-memory.txt | 36 +++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index 4dd20de6977f..45b3134193ea 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,24 @@ compatible (optional) - standard definition
           used as a shared pool of DMA buffers for a set of devices. It can
           be used by an operating system to instantiate the necessary pool
           management subsystem if necessary.
+        - bounced-dma-pool: This indicates a region of memory meant to be used
+          as a pool of DMA bounce buffers for a given device. When using this,
+          the no-map and reusable properties must not be set, so the operating
+          system can create a virtual mapping that will be used for
+          synchronization. Also, there must be a bounced-dma property in the
+          device node to specify the indexes of the reserved-memory nodes. One
+          can specify two reserved-memory nodes in the device tree: one with
+          shared-dma-pool to handle the coherent DMA buffer allocation, and
+          another with bounced-dma-pool for regular DMA to/from system
+          memory, which would be subject to bouncing. The main purpose of
+          bounced DMA is to mitigate the lack of DMA access control on systems
+          without an IOMMU, which could result in the DMA accessing the system
+          memory at unexpected times and/or unexpected addresses, possibly
+          leading to data leakage or corruption. The feature on its own
+          provides a basic level of protection against the DMA overwriting
+          buffer contents at unexpected times. However, to protect against
+          general data leakage and system memory corruption, the system needs
+          to provide a way to restrict the DMA to a predefined memory region.
         - vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
     - Indicates the operating system must not create a virtual mapping
@@ -117,6 +135,17 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		compatible = "acme,multimedia-memory";
 		reg = <0x77000000 0x4000000>;
 	};
+
+	wifi_bounced_dma_mem_region: wifi_bounced_dma_mem_region {
+		compatible = "bounced-dma-pool";
+		reg = <0x50000000 0x4000000>;
+	};
+
+	wifi_coherent_mem_region: wifi_coherent_mem_region {
+		compatible = "shared-dma-pool";
+		reg = <0x54000000 0x400000>;
+	};
+
 };

 /* ... */

@@ -135,4 +164,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		memory-region = <&multimedia_reserved>;
 		/* ... */
 	};
+
+	pcie_wifi: pcie_wifi@0,0 {
+		memory-region = <&wifi_bounced_dma_mem_region>,
+				<&wifi_coherent_mem_region>;
+		bounced-dma = <0>, <1>;
+		/* ... */
+	};
 };
-- 
2.27.0.383.g050319c2ae-goog
If a device is not behind an IOMMU, we look up the device node and set up
bounced DMA when the bounced-dma property is present.

One can specify two reserved-memory nodes in the device tree: one with
shared-dma-pool to handle the coherent DMA buffer allocation, and another
with bounced-dma-pool for regular DMA to/from system memory, which would
be subject to bouncing.

Signed-off-by: Claire Chang <tientzu@chromium.org>
---
 drivers/of/address.c    | 37 +++++++++++++++++++++++++++++++++++++
 drivers/of/device.c     |  3 +++
 drivers/of/of_private.h |  6 ++++++
 3 files changed, 46 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 8eea3f6e29a4..a767b80f8862 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include <linux/logic_pio.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
 #include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/sizes.h>
@@ -1009,6 +1010,42 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz
 	return ret;
 }
 
+int of_dma_set_bounce_buffer(struct device *dev)
+{
+	int length, size, ret, i;
+	u32 idx[2];
+
+	if (!dev || !dev->of_node)
+		return -EINVAL;
+
+	if (!of_get_property(dev->of_node, "bounced-dma", &length))
+		return 0;
+
+	size = length / sizeof(idx[0]);
+	if (size > ARRAY_SIZE(idx)) {
+		dev_err(dev,
+			"bounced-dma expected less than or equal to 2 indexes, but got %d\n",
+			size);
+		return -EINVAL;
+	}
+
+	ret = of_property_read_u32_array(dev->of_node, "bounced-dma", idx,
+					 size);
+
+	for (i = 0; i < size; i++) {
+		ret = of_reserved_mem_device_init_by_idx(dev, dev->of_node,
+							 idx[i]);
+		if (ret) {
+			dev_err(dev,
+				"of_reserved_mem_device_init_by_idx() failed with %d\n",
+				ret);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:	device node
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 27203bfd0b22..238beef48a50 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -169,6 +169,9 @@ int of_dma_configure(struct device *dev, struct device_node *np, bool force_dma)
 
 	arch_setup_dma_ops(dev, dma_addr, size, iommu, coherent);
 
+	if (!iommu)
+		return of_dma_set_bounce_buffer(dev);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index edc682249c00..3d1b8cca1519 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -160,12 +160,18 @@ extern int of_bus_n_size_cells(struct device_node *np);
 #ifdef CONFIG_OF_ADDRESS
 extern int of_dma_get_range(struct device_node *np, u64 *dma_addr,
 			    u64 *paddr, u64 *size);
+extern int of_dma_set_bounce_buffer(struct device *dev);
 #else
 static inline int of_dma_get_range(struct device_node *np, u64 *dma_addr,
 				   u64 *paddr, u64 *size)
 {
 	return -ENODEV;
 }
+
+static inline int of_dma_set_bounce_buffer(struct device *dev)
+{
+	return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.27.0.383.g050319c2ae-goog
On 2020-07-13 10:12, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus
> is not behind an IOMMU. As PCI-e, by design, gives the device full
> access to system memory, a vulnerability in the Wi-Fi firmware could
> easily escalate to a full system exploit (remote wifi exploits: [1a],
> [1b] that shows a full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce bounced DMA. The bounced
> DMA ops provide an implementation of DMA ops that bounce streaming DMA
> in and out of a specially allocated region. The feature on its own
> provides a basic level of protection against the DMA overwriting buffer
> contents at unexpected times. However, to protect against general data
> leakage and system memory corruption, the system needs to provide a way
> to restrict the DMA to a predefined memory region (this is usually done
> at firmware level, e.g. in ATF on some ARM platforms).

More to the point, this seems to need some fairly special interconnect
hardware too. On typical systems that just stick a TZASC directly in
front of the memory controller it would be hard to block DMA access
without also blocking CPU access. With something like Arm TZC-400 I guess
you could set up a "secure" region for most of DRAM that allows NS
accesses by NSAID from the CPUs, then similar regions for the pools with
NSAID access for both the respective device and the CPUs, but by that
point you've probably used up most of the available regions before even
considering what the firmware and TEE might want for actual Secure
memory. In short, I don't foresee this being used by very many systems.

That said, although the motivation is different, it appears to end up
being almost exactly the same end result as the POWER secure
virtualisation thingy (essentially: constrain DMA to a specific portion
of RAM). The more code can be shared with that, the better.

> Currently, 32-bit architectures are not supported because of the need to
> handle HIGHMEM, which increases code complexity and adds more
> performance penalty for such platforms. Also, bounced DMA can not be
> enabled on devices behind an IOMMU, as those require an IOMMU-aware
> implementation of DMA ops and do not require this kind of mitigation
> anyway.

Note that we do actually have the notion of bounced DMA with IOMMUs
already (to avoid leakage of unrelated data in the same page). I think
it's only implemented for intel-iommu so far, but shouldn't take much
work to generalise to iommu-dma if anyone wanted to. That's already done
a bunch of work to generalise the SWIOTLB routines to be more reusable,
so building on top of that would be highly preferable.

Thirdly, the concept of device-private bounce buffers does in fact
already exist to some degree too - there are various USB, crypto and
other devices that can only DMA to a local SRAM buffer (not to mention
subsystems doing their own bouncing for the sake of alignment/block
merging/etc.). Again, a slightly more generalised solution that makes
this a first-class notion for dma-direct itself and could help supplant
some of those hacks would be really good.

Robin.
On 2020-07-13 10:12, Claire Chang wrote:
> The bounced DMA ops provide an implementation of DMA ops that bounce
> streaming DMA in and out of a specially allocated region. Only the
> operations relevant to streaming DMA are supported.

I think there are too many implicit assumptions here - apparently that
coherent allocations will always be intercepted by
dma_*_from_dev_coherent(), and that calling into dma-direct won't
actually bounce things a second time beyond where you thought they were
going, manage coherency for a different address, and make it all go
subtly wrong. Consider "swiotlb=force", for instance...

Again, plumbing this straight into dma-direct so that SWIOTLB can simply
target a different buffer and always bounce regardless of masks would
seem a far better option.

Robin.
On Mon, Jul 13, 2020 at 12:55:43PM +0100, Robin Murphy wrote:
> On 2020-07-13 10:12, Claire Chang wrote:
>> The bounced DMA ops provide an implementation of DMA ops that bounce
>> streaming DMA in and out of a specially allocated region. Only the
>> operations relevant to streaming DMA are supported.
>
> I think there are too many implicit assumptions here - apparently that
> coherent allocations will always be intercepted by
> dma_*_from_dev_coherent(), and that calling into dma-direct won't actually
> bounce things a second time beyond where you thought they were going,
> manage coherency for a different address, and make it all go subtly wrong.
> Consider "swiotlb=force", for instance...
>
> Again, plumbing this straight into dma-direct so that SWIOTLB can simply
> target a different buffer and always bounce regardless of masks would seem
> a far better option.

I haven't really had time to read through the details, but I agree that
any bouncing scheme should reuse the swiotlb code and not invent a
parallel infrastructure.
On Mon, Jul 13, 2020 at 7:40 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-07-13 10:12, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus
> > is not behind an IOMMU. As PCI-e, by design, gives the device full
> > access to system memory, a vulnerability in the Wi-Fi firmware could
> > easily escalate to a full system exploit (remote wifi exploits: [1a],
> > [1b] that shows a full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce bounced DMA. The bounced
> > DMA ops provide an implementation of DMA ops that bounce streaming DMA
> > in and out of a specially allocated region. The feature on its own
> > provides a basic level of protection against the DMA overwriting buffer
> > contents at unexpected times. However, to protect against general data
> > leakage and system memory corruption, the system needs to provide a way
> > to restrict the DMA to a predefined memory region (this is usually done
> > at firmware level, e.g. in ATF on some ARM platforms).
>
> More to the point, this seems to need some fairly special interconnect
> hardware too. On typical systems that just stick a TZASC directly in
> front of the memory controller it would be hard to block DMA access
> without also blocking CPU access. With something like Arm TZC-400 I
> guess you could set up a "secure" region for most of DRAM that allows NS
> accesses by NSAID from the CPUs, then similar regions for the pools with
> NSAID access for both the respective device and the CPUs, but by that
> point you've probably used up most of the available regions before even
> considering what the firmware and TEE might want for actual Secure memory.
>
> In short, I don't foresee this being used by very many systems.

We're going to use this on MTK SoC with MPU (memory protection unit) to
restrict the DMA access for PCI-e Wi-Fi.

> That said, although the motivation is different, it appears to end up
> being almost exactly the same end result as the POWER secure
> virtualisation thingy (essentially: constrain DMA to a specific portion
> of RAM). The more code can be shared with that, the better.

Could you share a bit more about the POWER secure virtualisation thingy?

> > Currently, 32-bit architectures are not supported because of the need to
> > handle HIGHMEM, which increases code complexity and adds more
> > performance penalty for such platforms. Also, bounced DMA can not be
> > enabled on devices behind an IOMMU, as those require an IOMMU-aware
> > implementation of DMA ops and do not require this kind of mitigation
> > anyway.
>
> Note that we do actually have the notion of bounced DMA with IOMMUs
> already (to avoid leakage of unrelated data in the same page). I think
> it's only implemented for intel-iommu so far, but shouldn't take much
> work to generalise to iommu-dma if anyone wanted to. That's already done
> a bunch of work to generalise the SWIOTLB routines to be more reusable,
> so building on top of that would be highly preferable.

Yes, I'm aware of that and I'll try to put this on top of SWIOTLB.

> Thirdly, the concept of device-private bounce buffers does in fact
> already exist to some degree too - there are various USB, crypto and
> other devices that can only DMA to a local SRAM buffer (not to mention
> subsystems doing their own bouncing for the sake of alignment/block
> merging/etc.). Again, a slightly more generalised solution that makes
> this a first-class notion for dma-direct itself and could help supplant
> some of those hacks would be really really good.
>
> Robin.
On Tue, Jul 14, 2020 at 7:01 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jul 13, 2020 at 12:55:43PM +0100, Robin Murphy wrote:
> > On 2020-07-13 10:12, Claire Chang wrote:
> >> The bounced DMA ops provide an implementation of DMA ops that bounce
> >> streaming DMA in and out of a specially allocated region. Only the
> >> operations relevant to streaming DMA are supported.
> >
> > I think there are too many implicit assumptions here - apparently that
> > coherent allocations will always be intercepted by
> > dma_*_from_dev_coherent(), and that calling into dma-direct won't actually
> > bounce things a second time beyond where you thought they were going,
> > manage coherency for a different address, and make it all go subtly wrong.
> > Consider "swiotlb=force", for instance...
> >
> > Again, plumbing this straight into dma-direct so that SWIOTLB can simply
> > target a different buffer and always bounce regardless of masks would seem
> > a far better option.
>
> I haven't really had time to read through the details, but I agree that
> any bouncing scheme should reuse the swiotlb code and not invent a
> parallel infrastructure.

Thanks for the feedback. I'll try to reuse SWIOTLB.
On Wed, Jul 15, 2020 at 11:46 AM Claire Chang <tientzu@chromium.org> wrote:
>
> On Tue, Jul 14, 2020 at 7:01 PM Christoph Hellwig <hch@lst.de> wrote:
> >
> > On Mon, Jul 13, 2020 at 12:55:43PM +0100, Robin Murphy wrote:
> > > On 2020-07-13 10:12, Claire Chang wrote:
> > >> The bounced DMA ops provide an implementation of DMA ops that bounce
> > >> streaming DMA in and out of a specially allocated region. Only the
> > >> operations relevant to streaming DMA are supported.
> > >
> > > I think there are too many implicit assumptions here - apparently that
> > > coherent allocations will always be intercepted by
> > > dma_*_from_dev_coherent(), and that calling into dma-direct won't actually
> > > bounce things a second time beyond where you thought they were going,
> > > manage coherency for a different address, and make it all go subtly wrong.
> > > Consider "swiotlb=force", for instance...

If I understand it correctly, reusing SWIOTLB won't prevent the coherent
allocations from always being intercepted by dma_*_from_dev_coherent(),
right? Since we can't bounce the coherent memory, we still need to rely on
dma_*_from_dev_coherent() and a reserved-memory region for coherent DMA to
restrict the device DMA access.

As for calling into dma-direct, in this version, I use set_dma_ops to set
the dma_bounced_ops, so I just bypass dma-direct and SWIOTLB.
"swiotlb=force" won't bounce things a second time and the data will still
be bounced to the region set in dts.

Besides, I did a quick search and found that only two *-iommu.c directly
use dma_direct_map_page.
https://elixir.bootlin.com/linux/latest/C/ident/dma_direct_map_page
Since bounced DMA is to mitigate the lack of DMA access control on systems
without an IOMMU (see patch#4, only call of_dma_set_bounce_buffer for the
devices not behind an IOMMU), can we assume no one will use dma-direct?
(I understand that if we build bounced DMA on top of SWIOTLB, we don't
need to worry about this.)
> > >
> > > Again, plumbing this straight into dma-direct so that SWIOTLB can simply
> > > target a different buffer and always bounce regardless of masks would seem
> > > a far better option.
> >
> > I haven't really had time to read through the details, but I agree that
> > any bouncing scheme should reuse the swiotlb code and not invent a
> > parallel infrastructure.
> Thanks for the feedback. I'll try to reuse SWIOTLB.

My current plan is to first change the buffer management logic in SWIOTLB
to use gen_pool like this patch (i.e., gen_pool_dma_alloc, gen_pool_free,
etc.), and then make SWIOTLB use the device's private pool for regular DMA
to/from system memory if possible.

Does this sound right? Thanks!
On 2020-07-15 04:43, Claire Chang wrote:
> On Mon, Jul 13, 2020 at 7:40 PM Robin Murphy <robin.murphy@arm.com> wrote:
>>
>> On 2020-07-13 10:12, Claire Chang wrote:
>>> This series implements mitigations for lack of DMA access control on
>>> systems without an IOMMU, which could result in the DMA accessing the
>>> system memory at unexpected times and/or unexpected addresses, possibly
>>> leading to data leakage or corruption.
>>>
>>> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus
>>> is not behind an IOMMU. As PCI-e, by design, gives the device full
>>> access to system memory, a vulnerability in the Wi-Fi firmware could
>>> easily escalate to a full system exploit (remote wifi exploits: [1a],
>>> [1b] that shows a full chain of exploits; [2], [3]).
>>>
>>> To mitigate the security concerns, we introduce bounced DMA. The bounced
>>> DMA ops provide an implementation of DMA ops that bounce streaming DMA
>>> in and out of a specially allocated region. The feature on its own
>>> provides a basic level of protection against the DMA overwriting buffer
>>> contents at unexpected times. However, to protect against general data
>>> leakage and system memory corruption, the system needs to provide a way
>>> to restrict the DMA to a predefined memory region (this is usually done
>>> at firmware level, e.g. in ATF on some ARM platforms).
>>
>> More to the point, this seems to need some fairly special interconnect
>> hardware too. On typical systems that just stick a TZASC directly in
>> front of the memory controller it would be hard to block DMA access
>> without also blocking CPU access. With something like Arm TZC-400 I
>> guess you could set up a "secure" region for most of DRAM that allows NS
>> accesses by NSAID from the CPUs, then similar regions for the pools with
>> NSAID access for both the respective device and the CPUs, but by that
>> point you've probably used up most of the available regions before even
>> considering what the firmware and TEE might want for actual Secure memory.
>>
>> In short, I don't foresee this being used by very many systems.
> We're going to use this on MTK SoC with MPU (memory protection unit) to
> restrict the DMA access for PCI-e Wi-Fi.

OK, based on failing to really grasp the M4U and LARB side of MTK's
interconnect for the media stuff, I'm not even going to ask about that
MPU ;)

What I had in mind and didn't do a good job of actually expressing there
was that this is a strong justification for using shared code as much as
possible, to minimise the amount we have to maintain that's specific to
this relatively niche use-case (it's good for SoCs like yours that *can*
support it, but most will either have IOMMUs or simply not have the
option of doing anything).

>> That said, although the motivation is different, it appears to end up
>> being almost exactly the same end result as the POWER secure
>> virtualisation thingy (essentially: constrain DMA to a specific portion
>> of RAM). The more code can be shared with that, the better.
> Could you share a bit more about the POWER secure virtualisation thingy?

There are probably more recent resources, but what I could easily turn up
from memory (other than a lot of back-and-forth about virtio) is this old
patchset which seems to form a reasonable overview:

https://lore.kernel.org/linux-iommu/20180824162535.22798-1-bauerman@linux.ibm.com/

The main differences are that they have the luxury of being able to make
the SWIOTLB buffer accessible in-place, rather than having to control its
initial allocation, and because it's a system-wide thing they could pretty
much achieve it all with existing machinery - on second look there are
actually far fewer changes there than I thought - whereas you need the
extra work to handle it on a per-device basis.

Robin.
v2 that reuses SWIOTLB here:
https://lore.kernel.org/patchwork/cover/1280705/

Thanks,
Claire