linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH v2 1/2] dma-heap: Keep track of the heap device struct
@ 2020-08-14  6:24 John Stultz
  2020-08-14  6:24 ` [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap John Stultz
  0 siblings, 1 reply; 6+ messages in thread
From: John Stultz @ 2020-08-14  6:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Andrew F . Davis, Benjamin Gaignard,
	Liam Mark, Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Robin Murphy, linux-media, dri-devel

Keep track of the heap device struct.

This will be useful for special DMA allocations
and actions.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/dma-buf/dma-heap.c | 33 +++++++++++++++++++++++++--------
 include/linux/dma-heap.h   |  9 +++++++++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index afd22c9dbdcf..72c746755d89 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -30,6 +30,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @heap_dev		heap device struct
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -40,6 +41,7 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+	struct device *heap_dev;
 };
 
 static LIST_HEAD(heap_list);
@@ -190,10 +192,21 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
 	return heap->priv;
 }
 
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap)
+{
+	return heap->heap_dev;
+}
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
-	struct device *dev_ret;
 	unsigned int minor;
 	int ret;
 
@@ -247,16 +260,20 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 		goto err1;
 	}
 
-	dev_ret = device_create(dma_heap_class,
-				NULL,
-				heap->heap_devt,
-				NULL,
-				heap->name);
-	if (IS_ERR(dev_ret)) {
+	heap->heap_dev = device_create(dma_heap_class,
+				       NULL,
+				       heap->heap_devt,
+				       NULL,
+				       heap->name);
+	if (IS_ERR(heap->heap_dev)) {
 		pr_err("dma_heap: Unable to create device\n");
-		err_ret = ERR_CAST(dev_ret);
+		err_ret = ERR_CAST(heap->heap_dev);
 		goto err2;
 	}
+
+	/* Make sure it doesn't disappear on us */
+	heap->heap_dev = get_device(heap->heap_dev);
+
 	/* Add heap to the list */
 	mutex_lock(&heap_list_lock);
 	list_add(&heap->list, &heap_list);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 454e354d1ffb..82857e096910 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -50,6 +50,15 @@ struct dma_heap_export_info {
  */
 void *dma_heap_get_drvdata(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
  2020-08-14  6:24 [RFC][PATCH v2 1/2] dma-heap: Keep track of the heap device struct John Stultz
@ 2020-08-14  6:24 ` John Stultz
  2020-08-14  9:17   ` Daniel Vetter
  2020-08-14 16:14   ` Ezequiel Garcia
  0 siblings, 2 replies; 6+ messages in thread
From: John Stultz @ 2020-08-14  6:24 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Andrew F . Davis, Benjamin Gaignard,
	Liam Mark, Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Robin Murphy, linux-media, dri-devel

This adds a heap that allocates non-contiguous buffers that are
marked as writecombined, so they are not cached by the CPU.

This is useful, as most graphics buffers are usually not touched
by the CPU or only written into once by the CPU. So when mapping
the buffer over and over between devices, we can skip the CPU
syncing, which saves a lot of cache management overhead, greatly
improving performance.

For folk using ION, there was a ION_FLAG_CACHED flag, which
signaled if the returned buffer should be CPU cacheable or not.
With DMA-BUF heaps, we have no such flag, and by default the
current heaps (system and cma) produce CPU cachable buffers.
So for folks transitioning from ION to DMA-BUF Heaps, this fills
in some of that missing functionality.

This does have a few "ugly" bits that were required to get
the buffer properly flushed out initially which I'd like to
improve. So feedback would be very welcome!

Many thanks to Liam Mark for his help to get this working.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Andrew F. Davis <afd@ti.com>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v2:
* Fix build issue on sh reported-by: kernel test robot <lkp@intel.com>
* Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and
  for_each_sg_page() along with numerous other cleanups suggested
  by Robin Murphy
---
 drivers/dma-buf/heaps/Kconfig                |  10 +
 drivers/dma-buf/heaps/Makefile               |   1 +
 drivers/dma-buf/heaps/system_uncached_heap.c | 371 +++++++++++++++++++
 3 files changed, 382 insertions(+)
 create mode 100644 drivers/dma-buf/heaps/system_uncached_heap.c

diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index a5eef06c4226..420b0ed0a512 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -5,6 +5,16 @@ config DMABUF_HEAPS_SYSTEM
 	  Choose this option to enable the system dmabuf heap. The system heap
 	  is backed by pages from the buddy allocator. If in doubt, say Y.
 
+config DMABUF_HEAPS_SYSTEM_UNCACHED
+	bool "DMA-BUF Uncached System Heap"
+	depends on DMABUF_HEAPS
+	help
+	  Choose this option to enable the uncached system dmabuf heap. This
+	  heap is backed by pages from the buddy allocator, but pages are setup
+	  for write combining. This avoids cache management overhead, and can
+	  be faster if pages are mostly untouched by the cpu.  If in doubt,
+	  say Y.
+
 config DMABUF_HEAPS_CMA
 	bool "DMA-BUF CMA Heap"
 	depends on DMABUF_HEAPS && DMA_CMA
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cdec3da0..085685ec478f 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-y					+= heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)	+= system_heap.o
+obj-$(CONFIG_DMABUF_HEAPS_SYSTEM_UNCACHED) += system_uncached_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA)		+= cma_heap.o
diff --git a/drivers/dma-buf/heaps/system_uncached_heap.c b/drivers/dma-buf/heaps/system_uncached_heap.c
new file mode 100644
index 000000000000..3b8e699bcae7
--- /dev/null
+++ b/drivers/dma-buf/heaps/system_uncached_heap.c
@@ -0,0 +1,371 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Uncached System DMA-Heap exporter
+ *
+ * Copyright (C) 2020 Linaro Ltd.
+ *
+ * Based off of Andrew Davis' SRAM heap:
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
+ *	Andrew F. Davis <afd@ti.com>
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/highmem.h>
+#include <linux/io.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-heap.h>
+
+struct uncached_heap {
+	struct dma_heap *heap;
+};
+
+struct uncached_heap_buffer {
+	struct dma_heap *heap;
+	struct list_head attachments;
+	struct mutex lock;
+	unsigned long len;
+	struct sg_table sg_table;
+	int vmap_cnt;
+	void *vaddr;
+};
+
+struct dma_heap_attachment {
+	struct device *dev;
+	struct sg_table *table;
+	struct list_head list;
+};
+
+static struct sg_table *dup_sg_table(struct sg_table *table)
+{
+	struct sg_table *new_table;
+	int ret, i;
+	struct scatterlist *sg, *new_sg;
+
+	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
+	if (!new_table)
+		return ERR_PTR(-ENOMEM);
+
+	ret = sg_alloc_table(new_table, table->nents, GFP_KERNEL);
+	if (ret) {
+		kfree(new_table);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	new_sg = new_table->sgl;
+	for_each_sgtable_sg(table, sg, i) {
+		sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
+		new_sg = sg_next(new_sg);
+	}
+
+	return new_table;
+}
+
+static int dma_heap_attach(struct dma_buf *dmabuf,
+			   struct dma_buf_attachment *attachment)
+{
+	struct uncached_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+	struct sg_table *table;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return -ENOMEM;
+
+	table = dup_sg_table(&buffer->sg_table);
+	if (IS_ERR(table)) {
+		kfree(a);
+		return -ENOMEM;
+	}
+
+	a->table = table;
+	a->dev = attachment->dev;
+	INIT_LIST_HEAD(&a->list);
+
+	attachment->priv = a;
+
+	mutex_lock(&buffer->lock);
+	list_add(&a->list, &buffer->attachments);
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static void dma_heap_detatch(struct dma_buf *dmabuf,
+			     struct dma_buf_attachment *attachment)
+{
+	struct uncached_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a = attachment->priv;
+
+	mutex_lock(&buffer->lock);
+	list_del(&a->list);
+	mutex_unlock(&buffer->lock);
+
+	sg_free_table(a->table);
+	kfree(a->table);
+	kfree(a);
+}
+
+static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
+					     enum dma_data_direction direction)
+{
+	struct dma_heap_attachment *a = attachment->priv;
+	struct sg_table *table = a->table;
+
+	if (dma_map_sgtable(attachment->dev, table, direction, DMA_ATTR_SKIP_CPU_SYNC))
+		return ERR_PTR(-ENOMEM);
+
+	return table;
+}
+
+static void dma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
+				   struct sg_table *table,
+				   enum dma_data_direction direction)
+{
+	dma_unmap_sgtable(attachment->dev, table, direction, DMA_ATTR_SKIP_CPU_SYNC);
+}
+
+static int dma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
+{
+	struct uncached_heap_buffer *buffer = dmabuf->priv;
+	struct sg_table *table = &buffer->sg_table;
+	unsigned long addr = vma->vm_start;
+	struct sg_page_iter piter;
+	int ret;
+
+	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+	for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
+		struct page *page = sg_page_iter_page(&piter);
+
+		ret = remap_pfn_range(vma, addr, page_to_pfn(page), PAGE_SIZE,
+				      vma->vm_page_prot);
+		if (ret)
+			return ret;
+		addr += PAGE_SIZE;
+		if (addr >= vma->vm_end)
+			return 0;
+	}
+	return 0;
+}
+
+static void *dma_heap_do_vmap(struct uncached_heap_buffer *buffer)
+{
+	struct sg_table *table = &buffer->sg_table;
+	int npages = PAGE_ALIGN(buffer->len) / PAGE_SIZE;
+	struct page **pages = vmalloc(sizeof(struct page *) * npages);
+	struct page **tmp = pages;
+	struct sg_page_iter piter;
+	pgprot_t pgprot;
+	void *vaddr;
+
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
+
+	pgprot = pgprot_writecombine(PAGE_KERNEL);
+
+	for_each_sgtable_page(table, &piter, 0) {
+		WARN_ON(tmp - pages >= npages);
+		*tmp++ = sg_page_iter_page(&piter);
+	}
+
+	vaddr = vmap(pages, npages, VM_MAP, pgprot);
+	vfree(pages);
+
+	if (!vaddr)
+		return ERR_PTR(-ENOMEM);
+
+	return vaddr;
+}
+
+static void *dma_heap_buffer_vmap_get(struct uncached_heap_buffer *buffer)
+{
+	void *vaddr;
+
+	if (buffer->vmap_cnt) {
+		buffer->vmap_cnt++;
+		return buffer->vaddr;
+	}
+
+	vaddr = dma_heap_do_vmap(buffer);
+	if (IS_ERR(vaddr))
+		return vaddr;
+
+	buffer->vaddr = vaddr;
+	buffer->vmap_cnt++;
+	return vaddr;
+}
+
+static void dma_heap_buffer_vmap_put(struct uncached_heap_buffer *buffer)
+{
+	if (!--buffer->vmap_cnt) {
+		vunmap(buffer->vaddr);
+		buffer->vaddr = NULL;
+	}
+}
+
+static void *dma_heap_vmap(struct dma_buf *dmabuf)
+{
+	struct uncached_heap_buffer *buffer = dmabuf->priv;
+	void *vaddr;
+
+	mutex_lock(&buffer->lock);
+	vaddr = dma_heap_buffer_vmap_get(buffer);
+	mutex_unlock(&buffer->lock);
+
+	return vaddr;
+}
+
+static void dma_heap_vunmap(struct dma_buf *dmabuf, void *vaddr)
+{
+	struct uncached_heap_buffer *buffer = dmabuf->priv;
+
+	mutex_lock(&buffer->lock);
+	dma_heap_buffer_vmap_put(buffer);
+	mutex_unlock(&buffer->lock);
+}
+
+static void dma_heap_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct uncached_heap_buffer *buffer = dmabuf->priv;
+	struct sg_table *table;
+	struct scatterlist *sg;
+	int i;
+
+	table = &buffer->sg_table;
+	dma_unmap_sgtable(dma_heap_get_dev(buffer->heap), table, DMA_BIDIRECTIONAL, 0);
+
+	for_each_sgtable_sg(table, sg, i)
+		__free_page(sg_page(sg));
+	sg_free_table(table);
+	kfree(buffer);
+}
+
+const struct dma_buf_ops uncached_heap_buf_ops = {
+	.attach = dma_heap_attach,
+	.detach = dma_heap_detatch,
+	.map_dma_buf = dma_heap_map_dma_buf,
+	.unmap_dma_buf = dma_heap_unmap_dma_buf,
+	.mmap = dma_heap_mmap,
+	.vmap = dma_heap_vmap,
+	.vunmap = dma_heap_vunmap,
+	.release = dma_heap_dma_buf_release,
+};
+
+static int uncached_heap_allocate(struct dma_heap *heap,
+				  unsigned long len,
+				  unsigned long fd_flags,
+				  unsigned long heap_flags)
+{
+	struct uncached_heap_buffer *buffer;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct dma_buf *dmabuf;
+	struct sg_table *table;
+	struct scatterlist *sg;
+	pgoff_t pagecount;
+	pgoff_t pg;
+	int i, ret = -ENOMEM;
+
+	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&buffer->attachments);
+	mutex_init(&buffer->lock);
+	buffer->heap = heap;
+	buffer->len = len;
+
+	table = &buffer->sg_table;
+	pagecount = len / PAGE_SIZE;
+	if (sg_alloc_table(table, pagecount, GFP_KERNEL))
+		goto free_buffer;
+
+	sg = table->sgl;
+	for (pg = 0; pg < pagecount; pg++) {
+		struct page *page;
+		/*
+		 * Avoid trying to allocate memory if the process
+		 * has been killed by SIGKILL
+		 */
+		if (fatal_signal_pending(current))
+			goto free_pages;
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			goto free_pages;
+		sg_set_page(sg, page, page_size(page), 0);
+		sg = sg_next(sg);
+	}
+
+	/* create the dmabuf */
+	exp_info.ops = &uncached_heap_buf_ops;
+	exp_info.size = buffer->len;
+	exp_info.flags = fd_flags;
+	exp_info.priv = buffer;
+	dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		goto free_pages;
+	}
+
+	ret = dma_buf_fd(dmabuf, fd_flags);
+	if (ret < 0) {
+		dma_buf_put(dmabuf);
+		/* just return, as put will call release and that will free */
+		return ret;
+	}
+
+	/*
+	 * XXX This is hackish. While the buffer will be uncached, we need
+	 * to initially flush cpu cache, since the __GFP_ZERO on the
+	 * allocation means the zeroing was done by the cpu and thus it is
+	 * likely cached. Map (and implicitly flush) it out now so we don't
+	 * get corruption later on.
+	 *
+	 * Ideally we could do this without using the heap device as a dummy dev.
+	 */
+	dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
+
+	return ret;
+
+free_pages:
+	for_each_sgtable_sg(table, sg, i)
+		__free_page(sg_page(sg));
+	sg_free_table(table);
+free_buffer:
+	kfree(buffer);
+
+	return ret;
+}
+
+static struct dma_heap_ops uncached_heap_ops = {
+	.allocate = uncached_heap_allocate,
+};
+
+static int uncached_heap_create(void)
+{
+	struct uncached_heap *heap;
+	struct dma_heap_export_info exp_info;
+
+	heap = kzalloc(sizeof(*heap), GFP_KERNEL);
+	if (!heap)
+		return -ENOMEM;
+
+	exp_info.name = "system-uncached";
+	exp_info.ops = &uncached_heap_ops;
+	exp_info.priv = heap;
+	heap->heap = dma_heap_add(&exp_info);
+	if (IS_ERR(heap->heap)) {
+		int ret = PTR_ERR(heap->heap);
+
+		kfree(heap);
+		return ret;
+	}
+	dma_coerce_mask_and_coherent(dma_heap_get_dev(heap->heap), DMA_BIT_MASK(64));
+
+	return 0;
+}
+device_initcall(uncached_heap_create);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
  2020-08-14  6:24 ` [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap John Stultz
@ 2020-08-14  9:17   ` Daniel Vetter
  2020-08-14 19:57     ` John Stultz
  2020-08-14 16:14   ` Ezequiel Garcia
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel Vetter @ 2020-08-14  9:17 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, dri-devel, Liam Mark, Andrew F . Davis, Laura Abbott,
	Hridya Valsaraju, Robin Murphy, linux-media

On Fri, Aug 14, 2020 at 06:24:58AM +0000, John Stultz wrote:
> This adds a heap that allocates non-contiguous buffers that are
> marked as writecombined, so they are not cached by the CPU.
> 
> This is useful, as most graphics buffers are usually not touched
> by the CPU or only written into once by the CPU. So when mapping
> the buffer over and over between devices, we can skip the CPU
> syncing, which saves a lot of cache management overhead, greatly
> improving performance.
> 
> For folk using ION, there was a ION_FLAG_CACHED flag, which
> signaled if the returned buffer should be CPU cacheable or not.
> With DMA-BUF heaps, we have no such flag, and by default the
> current heaps (system and cma) produce CPU cachable buffers.
> So for folks transitioning from ION to DMA-BUF Heaps, this fills
> in some of that missing functionality.
> 
> This does have a few "ugly" bits that were required to get
> the buffer properly flushed out initially which I'd like to
> improve. So feedback would be very welcome!
> 
> Many thanks to Liam Mark for his help to get this working.
> 
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Andrew F. Davis <afd@ti.com>
> Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
> Cc: Liam Mark <lmark@codeaurora.org>
> Cc: Laura Abbott <labbott@kernel.org>
> Cc: Brian Starkey <Brian.Starkey@arm.com>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> v2:
> * Fix build issue on sh reported-by: kernel test robot <lkp@intel.com>
> * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and
>   for_each_sg_page() along with numerous other cleanups suggested
>   by Robin Murphy

Mildly annoying me, but where's the userspace for this? Do we have a
gralloc somewhere that works with open driver stacks and uses this?

Without that I'm somewhat afraid we'll engineer ourselves something that
looks like it should work, but doesn't really work in reality.
-Daniel

> ---
>  drivers/dma-buf/heaps/Kconfig                |  10 +
>  drivers/dma-buf/heaps/Makefile               |   1 +
>  drivers/dma-buf/heaps/system_uncached_heap.c | 371 +++++++++++++++++++
>  3 files changed, 382 insertions(+)
>  create mode 100644 drivers/dma-buf/heaps/system_uncached_heap.c
> 
> diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> index a5eef06c4226..420b0ed0a512 100644
> --- a/drivers/dma-buf/heaps/Kconfig
> +++ b/drivers/dma-buf/heaps/Kconfig
> @@ -5,6 +5,16 @@ config DMABUF_HEAPS_SYSTEM
>  	  Choose this option to enable the system dmabuf heap. The system heap
>  	  is backed by pages from the buddy allocator. If in doubt, say Y.
>  
> +config DMABUF_HEAPS_SYSTEM_UNCACHED
> +	bool "DMA-BUF Uncached System Heap"
> +	depends on DMABUF_HEAPS
> +	help
> +	  Choose this option to enable the uncached system dmabuf heap. This
> +	  heap is backed by pages from the buddy allocator, but pages are setup
> +	  for write combining. This avoids cache management overhead, and can
> +	  be faster if pages are mostly untouched by the cpu.  If in doubt,
> +	  say Y.
> +
>  config DMABUF_HEAPS_CMA
>  	bool "DMA-BUF CMA Heap"
>  	depends on DMABUF_HEAPS && DMA_CMA
> diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
> index 6e54cdec3da0..085685ec478f 100644
> --- a/drivers/dma-buf/heaps/Makefile
> +++ b/drivers/dma-buf/heaps/Makefile
> @@ -1,4 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y					+= heap-helpers.o
>  obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)	+= system_heap.o
> +obj-$(CONFIG_DMABUF_HEAPS_SYSTEM_UNCACHED) += system_uncached_heap.o
>  obj-$(CONFIG_DMABUF_HEAPS_CMA)		+= cma_heap.o
> diff --git a/drivers/dma-buf/heaps/system_uncached_heap.c b/drivers/dma-buf/heaps/system_uncached_heap.c
> new file mode 100644
> index 000000000000..3b8e699bcae7
> --- /dev/null
> +++ b/drivers/dma-buf/heaps/system_uncached_heap.c
> @@ -0,0 +1,371 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Uncached System DMA-Heap exporter
> + *
> + * Copyright (C) 2020 Linaro Ltd.
> + *
> + * Based off of Andrew Davis' SRAM heap:
> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
> + *	Andrew F. Davis <afd@ti.com>
> + */
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/err.h>
> +#include <linux/highmem.h>
> +#include <linux/io.h>
> +#include <linux/mm.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include <linux/dma-buf.h>
> +#include <linux/dma-heap.h>
> +
> +struct uncached_heap {
> +	struct dma_heap *heap;
> +};
> +
> +struct uncached_heap_buffer {
> +	struct dma_heap *heap;
> +	struct list_head attachments;
> +	struct mutex lock;
> +	unsigned long len;
> +	struct sg_table sg_table;
> +	int vmap_cnt;
> +	void *vaddr;
> +};
> +
> +struct dma_heap_attachment {
> +	struct device *dev;
> +	struct sg_table *table;
> +	struct list_head list;
> +};
> +
> +static struct sg_table *dup_sg_table(struct sg_table *table)
> +{
> +	struct sg_table *new_table;
> +	int ret, i;
> +	struct scatterlist *sg, *new_sg;
> +
> +	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
> +	if (!new_table)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = sg_alloc_table(new_table, table->nents, GFP_KERNEL);
> +	if (ret) {
> +		kfree(new_table);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	new_sg = new_table->sgl;
> +	for_each_sgtable_sg(table, sg, i) {
> +		sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
> +		new_sg = sg_next(new_sg);
> +	}
> +
> +	return new_table;
> +}
> +
> +static int dma_heap_attach(struct dma_buf *dmabuf,
> +			   struct dma_buf_attachment *attachment)
> +{
> +	struct uncached_heap_buffer *buffer = dmabuf->priv;
> +	struct dma_heap_attachment *a;
> +	struct sg_table *table;
> +
> +	a = kzalloc(sizeof(*a), GFP_KERNEL);
> +	if (!a)
> +		return -ENOMEM;
> +
> +	table = dup_sg_table(&buffer->sg_table);
> +	if (IS_ERR(table)) {
> +		kfree(a);
> +		return -ENOMEM;
> +	}
> +
> +	a->table = table;
> +	a->dev = attachment->dev;
> +	INIT_LIST_HEAD(&a->list);
> +
> +	attachment->priv = a;
> +
> +	mutex_lock(&buffer->lock);
> +	list_add(&a->list, &buffer->attachments);
> +	mutex_unlock(&buffer->lock);
> +
> +	return 0;
> +}
> +
> +static void dma_heap_detatch(struct dma_buf *dmabuf,
> +			     struct dma_buf_attachment *attachment)
> +{
> +	struct uncached_heap_buffer *buffer = dmabuf->priv;
> +	struct dma_heap_attachment *a = attachment->priv;
> +
> +	mutex_lock(&buffer->lock);
> +	list_del(&a->list);
> +	mutex_unlock(&buffer->lock);
> +
> +	sg_free_table(a->table);
> +	kfree(a->table);
> +	kfree(a);
> +}
> +
> +static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
> +					     enum dma_data_direction direction)
> +{
> +	struct dma_heap_attachment *a = attachment->priv;
> +	struct sg_table *table = a->table;
> +
> +	if (dma_map_sgtable(attachment->dev, table, direction, DMA_ATTR_SKIP_CPU_SYNC))
> +		return ERR_PTR(-ENOMEM);
> +
> +	return table;
> +}
> +
> +static void dma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
> +				   struct sg_table *table,
> +				   enum dma_data_direction direction)
> +{
> +	dma_unmap_sgtable(attachment->dev, table, direction, DMA_ATTR_SKIP_CPU_SYNC);
> +}
> +
> +static int dma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> +{
> +	struct uncached_heap_buffer *buffer = dmabuf->priv;
> +	struct sg_table *table = &buffer->sg_table;
> +	unsigned long addr = vma->vm_start;
> +	struct sg_page_iter piter;
> +	int ret;
> +
> +	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +	for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
> +		struct page *page = sg_page_iter_page(&piter);
> +
> +		ret = remap_pfn_range(vma, addr, page_to_pfn(page), PAGE_SIZE,
> +				      vma->vm_page_prot);
> +		if (ret)
> +			return ret;
> +		addr += PAGE_SIZE;
> +		if (addr >= vma->vm_end)
> +			return 0;
> +	}
> +	return 0;
> +}
> +
> +static void *dma_heap_do_vmap(struct uncached_heap_buffer *buffer)
> +{
> +	struct sg_table *table = &buffer->sg_table;
> +	int npages = PAGE_ALIGN(buffer->len) / PAGE_SIZE;
> +	struct page **pages = vmalloc(sizeof(struct page *) * npages);
> +	struct page **tmp = pages;
> +	struct sg_page_iter piter;
> +	pgprot_t pgprot;
> +	void *vaddr;
> +
> +	if (!pages)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pgprot = pgprot_writecombine(PAGE_KERNEL);
> +
> +	for_each_sgtable_page(table, &piter, 0) {
> +		WARN_ON(tmp - pages >= npages);
> +		*tmp++ = sg_page_iter_page(&piter);
> +	}
> +
> +	vaddr = vmap(pages, npages, VM_MAP, pgprot);
> +	vfree(pages);
> +
> +	if (!vaddr)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return vaddr;
> +}
> +
> +static void *dma_heap_buffer_vmap_get(struct uncached_heap_buffer *buffer)
> +{
> +	void *vaddr;
> +
> +	if (buffer->vmap_cnt) {
> +		buffer->vmap_cnt++;
> +		return buffer->vaddr;
> +	}
> +
> +	vaddr = dma_heap_do_vmap(buffer);
> +	if (IS_ERR(vaddr))
> +		return vaddr;
> +
> +	buffer->vaddr = vaddr;
> +	buffer->vmap_cnt++;
> +	return vaddr;
> +}
> +
> +static void dma_heap_buffer_vmap_put(struct uncached_heap_buffer *buffer)
> +{
> +	if (!--buffer->vmap_cnt) {
> +		vunmap(buffer->vaddr);
> +		buffer->vaddr = NULL;
> +	}
> +}
> +
> +static void *dma_heap_vmap(struct dma_buf *dmabuf)
> +{
> +	struct uncached_heap_buffer *buffer = dmabuf->priv;
> +	void *vaddr;
> +
> +	mutex_lock(&buffer->lock);
> +	vaddr = dma_heap_buffer_vmap_get(buffer);
> +	mutex_unlock(&buffer->lock);
> +
> +	return vaddr;
> +}
> +
> +static void dma_heap_vunmap(struct dma_buf *dmabuf, void *vaddr)
> +{
> +	struct uncached_heap_buffer *buffer = dmabuf->priv;
> +
> +	mutex_lock(&buffer->lock);
> +	dma_heap_buffer_vmap_put(buffer);
> +	mutex_unlock(&buffer->lock);
> +}
> +
> +static void dma_heap_dma_buf_release(struct dma_buf *dmabuf)
> +{
> +	struct uncached_heap_buffer *buffer = dmabuf->priv;
> +	struct sg_table *table;
> +	struct scatterlist *sg;
> +	int i;
> +
> +	table = &buffer->sg_table;
> +	dma_unmap_sgtable(dma_heap_get_dev(buffer->heap), table, DMA_BIDIRECTIONAL, 0);
> +
> +	for_each_sgtable_sg(table, sg, i)
> +		__free_page(sg_page(sg));
> +	sg_free_table(table);
> +	kfree(buffer);
> +}
> +
> +const struct dma_buf_ops uncached_heap_buf_ops = {
> +	.attach = dma_heap_attach,
> +	.detach = dma_heap_detatch,
> +	.map_dma_buf = dma_heap_map_dma_buf,
> +	.unmap_dma_buf = dma_heap_unmap_dma_buf,
> +	.mmap = dma_heap_mmap,
> +	.vmap = dma_heap_vmap,
> +	.vunmap = dma_heap_vunmap,
> +	.release = dma_heap_dma_buf_release,
> +};
> +
> +static int uncached_heap_allocate(struct dma_heap *heap,
> +				  unsigned long len,
> +				  unsigned long fd_flags,
> +				  unsigned long heap_flags)
> +{
> +	struct uncached_heap_buffer *buffer;
> +	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +	struct dma_buf *dmabuf;
> +	struct sg_table *table;
> +	struct scatterlist *sg;
> +	pgoff_t pagecount;
> +	pgoff_t pg;
> +	int i, ret = -ENOMEM;
> +
> +	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
> +	if (!buffer)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&buffer->attachments);
> +	mutex_init(&buffer->lock);
> +	buffer->heap = heap;
> +	buffer->len = len;
> +
> +	table = &buffer->sg_table;
> +	pagecount = len / PAGE_SIZE;
> +	if (sg_alloc_table(table, pagecount, GFP_KERNEL))
> +		goto free_buffer;
> +
> +	sg = table->sgl;
> +	for (pg = 0; pg < pagecount; pg++) {
> +		struct page *page;
> +		/*
> +		 * Avoid trying to allocate memory if the process
> +		 * has been killed by SIGKILL
> +		 */
> +		if (fatal_signal_pending(current))
> +			goto free_pages;
> +		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +		if (!page)
> +			goto free_pages;
> +		sg_set_page(sg, page, page_size(page), 0);
> +		sg = sg_next(sg);
> +	}
> +
> +	/* create the dmabuf */
> +	exp_info.ops = &uncached_heap_buf_ops;
> +	exp_info.size = buffer->len;
> +	exp_info.flags = fd_flags;
> +	exp_info.priv = buffer;
> +	dmabuf = dma_buf_export(&exp_info);
> +	if (IS_ERR(dmabuf)) {
> +		ret = PTR_ERR(dmabuf);
> +		goto free_pages;
> +	}
> +
> +	ret = dma_buf_fd(dmabuf, fd_flags);
> +	if (ret < 0) {
> +		dma_buf_put(dmabuf);
> +		/* just return, as put will call release and that will free */
> +		return ret;
> +	}
> +
> +	/*
> +	 * XXX This is hackish. While the buffer will be uncached, we need
> +	 * to initially flush cpu cache, since the __GFP_ZERO on the
> +	 * allocation means the zeroing was done by the cpu and thus it is
> +	 * likely cached. Map (and implicitly flush) it out now so we don't
> +	 * get corruption later on.
> +	 *
> +	 * Ideally we could do this without using the heap device as a dummy dev.
> +	 */
> +	dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
> +
> +	return ret;
> +
> +free_pages:
> +	for_each_sgtable_sg(table, sg, i)
> +		__free_page(sg_page(sg));
> +	sg_free_table(table);
> +free_buffer:
> +	kfree(buffer);
> +
> +	return ret;
> +}
> +
> +static struct dma_heap_ops uncached_heap_ops = {
> +	.allocate = uncached_heap_allocate,
> +};
> +
> +static int uncached_heap_create(void)
> +{
> +	struct uncached_heap *heap;
> +	struct dma_heap_export_info exp_info;
> +
> +	heap = kzalloc(sizeof(*heap), GFP_KERNEL);
> +	if (!heap)
> +		return -ENOMEM;
> +
> +	exp_info.name = "system-uncached";
> +	exp_info.ops = &uncached_heap_ops;
> +	exp_info.priv = heap;
> +	heap->heap = dma_heap_add(&exp_info);
> +	if (IS_ERR(heap->heap)) {
> +		int ret = PTR_ERR(heap->heap);
> +
> +		kfree(heap);
> +		return ret;
> +	}
> +	dma_coerce_mask_and_coherent(dma_heap_get_dev(heap->heap), DMA_BIT_MASK(64));
> +
> +	return 0;
> +}
> +device_initcall(uncached_heap_create);
> -- 
> 2.17.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
  2020-08-14  6:24 ` [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap John Stultz
  2020-08-14  9:17   ` Daniel Vetter
@ 2020-08-14 16:14   ` Ezequiel Garcia
  2020-08-14 20:14     ` John Stultz
  1 sibling, 1 reply; 6+ messages in thread
From: Ezequiel Garcia @ 2020-08-14 16:14 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Sumit Semwal, Andrew F . Davis, Benjamin Gaignard,
	Liam Mark, Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Robin Murphy, linux-media, dri-devel

Hi John,

Thanks for the patch.

On Fri, 14 Aug 2020 at 03:25, John Stultz <john.stultz@linaro.org> wrote:
>
> This adds a heap that allocates non-contiguous buffers that are
> marked as writecombined, so they are not cached by the CPU.
>

What's the rationale for exposing the memory
attribute as a new heap, instead of just introducing flags?

I guess this calls for some guidelines on what situations
call for a separate heap, and when it's just a modifier flag.

Thanks!
Ezequiel

> This is useful, as most graphics buffers are usually not touched
> by the CPU or only written into once by the CPU. So when mapping
> the buffer over and over between devices, we can skip the CPU
> syncing, which saves a lot of cache management overhead, greatly
> improving performance.
>
> For folk using ION, there was a ION_FLAG_CACHED flag, which
> signaled if the returned buffer should be CPU cacheable or not.
> With DMA-BUF heaps, we have no such flag, and by default the
> current heaps (system and cma) produce CPU cachable buffers.
> So for folks transitioning from ION to DMA-BUF Heaps, this fills
> in some of that missing functionality.
>
> This does have a few "ugly" bits that were required to get
> the buffer properly flushed out initially which I'd like to
> improve. So feedback would be very welcome!
>
> Many thanks to Liam Mark for his help to get this working.
>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Andrew F. Davis <afd@ti.com>
> Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
> Cc: Liam Mark <lmark@codeaurora.org>
> Cc: Laura Abbott <labbott@kernel.org>
> Cc: Brian Starkey <Brian.Starkey@arm.com>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
> v2:
> * Fix build issue on sh reported-by: kernel test robot <lkp@intel.com>
> * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and
>   for_each_sg_page() along with numerous other cleanups suggested
>   by Robin Murphy
> ---
>  drivers/dma-buf/heaps/Kconfig                |  10 +
>  drivers/dma-buf/heaps/Makefile               |   1 +
>  drivers/dma-buf/heaps/system_uncached_heap.c | 371 +++++++++++++++++++
>  3 files changed, 382 insertions(+)
>  create mode 100644 drivers/dma-buf/heaps/system_uncached_heap.c
>
> diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
> index a5eef06c4226..420b0ed0a512 100644
> --- a/drivers/dma-buf/heaps/Kconfig
> +++ b/drivers/dma-buf/heaps/Kconfig
> @@ -5,6 +5,16 @@ config DMABUF_HEAPS_SYSTEM
>           Choose this option to enable the system dmabuf heap. The system heap
>           is backed by pages from the buddy allocator. If in doubt, say Y.
>
> +config DMABUF_HEAPS_SYSTEM_UNCACHED
> +       bool "DMA-BUF Uncached System Heap"
> +       depends on DMABUF_HEAPS
> +       help
> +         Choose this option to enable the uncached system dmabuf heap. This
> +         heap is backed by pages from the buddy allocator, but pages are setup
> +         for write combining. This avoids cache management overhead, and can
> +         be faster if pages are mostly untouched by the cpu.  If in doubt,
> +         say Y.
> +
>  config DMABUF_HEAPS_CMA
>         bool "DMA-BUF CMA Heap"
>         depends on DMABUF_HEAPS && DMA_CMA
> diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
> index 6e54cdec3da0..085685ec478f 100644
> --- a/drivers/dma-buf/heaps/Makefile
> +++ b/drivers/dma-buf/heaps/Makefile
> @@ -1,4 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-y                                  += heap-helpers.o
>  obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)      += system_heap.o
> +obj-$(CONFIG_DMABUF_HEAPS_SYSTEM_UNCACHED) += system_uncached_heap.o
>  obj-$(CONFIG_DMABUF_HEAPS_CMA)         += cma_heap.o
> diff --git a/drivers/dma-buf/heaps/system_uncached_heap.c b/drivers/dma-buf/heaps/system_uncached_heap.c
> new file mode 100644
> index 000000000000..3b8e699bcae7
> --- /dev/null
> +++ b/drivers/dma-buf/heaps/system_uncached_heap.c
> @@ -0,0 +1,371 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Uncached System DMA-Heap exporter
> + *
> + * Copyright (C) 2020 Linaro Ltd.
> + *
> + * Based off of Andrew Davis' SRAM heap:
> + * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
> + *     Andrew F. Davis <afd@ti.com>
> + */
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/err.h>
> +#include <linux/highmem.h>
> +#include <linux/io.h>
> +#include <linux/mm.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>
> +#include <linux/dma-buf.h>
> +#include <linux/dma-heap.h>
> +
> +struct uncached_heap {
> +       struct dma_heap *heap;
> +};
> +
> +struct uncached_heap_buffer {
> +       struct dma_heap *heap;
> +       struct list_head attachments;
> +       struct mutex lock;
> +       unsigned long len;
> +       struct sg_table sg_table;
> +       int vmap_cnt;
> +       void *vaddr;
> +};
> +
> +struct dma_heap_attachment {
> +       struct device *dev;
> +       struct sg_table *table;
> +       struct list_head list;
> +};
> +
> +static struct sg_table *dup_sg_table(struct sg_table *table)
> +{
> +       struct sg_table *new_table;
> +       int ret, i;
> +       struct scatterlist *sg, *new_sg;
> +
> +       new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
> +       if (!new_table)
> +               return ERR_PTR(-ENOMEM);
> +
> +       ret = sg_alloc_table(new_table, table->nents, GFP_KERNEL);
> +       if (ret) {
> +               kfree(new_table);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       new_sg = new_table->sgl;
> +       for_each_sgtable_sg(table, sg, i) {
> +               sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
> +               new_sg = sg_next(new_sg);
> +       }
> +
> +       return new_table;
> +}
> +
> +static int dma_heap_attach(struct dma_buf *dmabuf,
> +                          struct dma_buf_attachment *attachment)
> +{
> +       struct uncached_heap_buffer *buffer = dmabuf->priv;
> +       struct dma_heap_attachment *a;
> +       struct sg_table *table;
> +
> +       a = kzalloc(sizeof(*a), GFP_KERNEL);
> +       if (!a)
> +               return -ENOMEM;
> +
> +       table = dup_sg_table(&buffer->sg_table);
> +       if (IS_ERR(table)) {
> +               kfree(a);
> +               return -ENOMEM;
> +       }
> +
> +       a->table = table;
> +       a->dev = attachment->dev;
> +       INIT_LIST_HEAD(&a->list);
> +
> +       attachment->priv = a;
> +
> +       mutex_lock(&buffer->lock);
> +       list_add(&a->list, &buffer->attachments);
> +       mutex_unlock(&buffer->lock);
> +
> +       return 0;
> +}
> +
> +static void dma_heap_detatch(struct dma_buf *dmabuf,
> +                            struct dma_buf_attachment *attachment)
> +{
> +       struct uncached_heap_buffer *buffer = dmabuf->priv;
> +       struct dma_heap_attachment *a = attachment->priv;
> +
> +       mutex_lock(&buffer->lock);
> +       list_del(&a->list);
> +       mutex_unlock(&buffer->lock);
> +
> +       sg_free_table(a->table);
> +       kfree(a->table);
> +       kfree(a);
> +}
> +
> +static struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
> +                                            enum dma_data_direction direction)
> +{
> +       struct dma_heap_attachment *a = attachment->priv;
> +       struct sg_table *table = a->table;
> +
> +       if (dma_map_sgtable(attachment->dev, table, direction, DMA_ATTR_SKIP_CPU_SYNC))
> +               return ERR_PTR(-ENOMEM);
> +
> +       return table;
> +}
> +
> +static void dma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
> +                                  struct sg_table *table,
> +                                  enum dma_data_direction direction)
> +{
> +       dma_unmap_sgtable(attachment->dev, table, direction, DMA_ATTR_SKIP_CPU_SYNC);
> +}
> +
> +static int dma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> +{
> +       struct uncached_heap_buffer *buffer = dmabuf->priv;
> +       struct sg_table *table = &buffer->sg_table;
> +       unsigned long addr = vma->vm_start;
> +       struct sg_page_iter piter;
> +       int ret;
> +
> +       vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +       for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
> +               struct page *page = sg_page_iter_page(&piter);
> +
> +               ret = remap_pfn_range(vma, addr, page_to_pfn(page), PAGE_SIZE,
> +                                     vma->vm_page_prot);
> +               if (ret)
> +                       return ret;
> +               addr += PAGE_SIZE;
> +               if (addr >= vma->vm_end)
> +                       return 0;
> +       }
> +       return 0;
> +}
> +
> +static void *dma_heap_do_vmap(struct uncached_heap_buffer *buffer)
> +{
> +       struct sg_table *table = &buffer->sg_table;
> +       int npages = PAGE_ALIGN(buffer->len) / PAGE_SIZE;
> +       struct page **pages = vmalloc(sizeof(struct page *) * npages);
> +       struct page **tmp = pages;
> +       struct sg_page_iter piter;
> +       pgprot_t pgprot;
> +       void *vaddr;
> +
> +       if (!pages)
> +               return ERR_PTR(-ENOMEM);
> +
> +       pgprot = pgprot_writecombine(PAGE_KERNEL);
> +
> +       for_each_sgtable_page(table, &piter, 0) {
> +               WARN_ON(tmp - pages >= npages);
> +               *tmp++ = sg_page_iter_page(&piter);
> +       }
> +
> +       vaddr = vmap(pages, npages, VM_MAP, pgprot);
> +       vfree(pages);
> +
> +       if (!vaddr)
> +               return ERR_PTR(-ENOMEM);
> +
> +       return vaddr;
> +}
> +
> +static void *dma_heap_buffer_vmap_get(struct uncached_heap_buffer *buffer)
> +{
> +       void *vaddr;
> +
> +       if (buffer->vmap_cnt) {
> +               buffer->vmap_cnt++;
> +               return buffer->vaddr;
> +       }
> +
> +       vaddr = dma_heap_do_vmap(buffer);
> +       if (IS_ERR(vaddr))
> +               return vaddr;
> +
> +       buffer->vaddr = vaddr;
> +       buffer->vmap_cnt++;
> +       return vaddr;
> +}
> +
> +static void dma_heap_buffer_vmap_put(struct uncached_heap_buffer *buffer)
> +{
> +       if (!--buffer->vmap_cnt) {
> +               vunmap(buffer->vaddr);
> +               buffer->vaddr = NULL;
> +       }
> +}
> +
> +static void *dma_heap_vmap(struct dma_buf *dmabuf)
> +{
> +       struct uncached_heap_buffer *buffer = dmabuf->priv;
> +       void *vaddr;
> +
> +       mutex_lock(&buffer->lock);
> +       vaddr = dma_heap_buffer_vmap_get(buffer);
> +       mutex_unlock(&buffer->lock);
> +
> +       return vaddr;
> +}
> +
> +static void dma_heap_vunmap(struct dma_buf *dmabuf, void *vaddr)
> +{
> +       struct uncached_heap_buffer *buffer = dmabuf->priv;
> +
> +       mutex_lock(&buffer->lock);
> +       dma_heap_buffer_vmap_put(buffer);
> +       mutex_unlock(&buffer->lock);
> +}
> +
> +static void dma_heap_dma_buf_release(struct dma_buf *dmabuf)
> +{
> +       struct uncached_heap_buffer *buffer = dmabuf->priv;
> +       struct sg_table *table;
> +       struct scatterlist *sg;
> +       int i;
> +
> +       table = &buffer->sg_table;
> +       dma_unmap_sgtable(dma_heap_get_dev(buffer->heap), table, DMA_BIDIRECTIONAL, 0);
> +
> +       for_each_sgtable_sg(table, sg, i)
> +               __free_page(sg_page(sg));
> +       sg_free_table(table);
> +       kfree(buffer);
> +}
> +
> +const struct dma_buf_ops uncached_heap_buf_ops = {
> +       .attach = dma_heap_attach,
> +       .detach = dma_heap_detatch,
> +       .map_dma_buf = dma_heap_map_dma_buf,
> +       .unmap_dma_buf = dma_heap_unmap_dma_buf,
> +       .mmap = dma_heap_mmap,
> +       .vmap = dma_heap_vmap,
> +       .vunmap = dma_heap_vunmap,
> +       .release = dma_heap_dma_buf_release,
> +};
> +
> +static int uncached_heap_allocate(struct dma_heap *heap,
> +                                 unsigned long len,
> +                                 unsigned long fd_flags,
> +                                 unsigned long heap_flags)
> +{
> +       struct uncached_heap_buffer *buffer;
> +       DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> +       struct dma_buf *dmabuf;
> +       struct sg_table *table;
> +       struct scatterlist *sg;
> +       pgoff_t pagecount;
> +       pgoff_t pg;
> +       int i, ret = -ENOMEM;
> +
> +       buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
> +       if (!buffer)
> +               return -ENOMEM;
> +
> +       INIT_LIST_HEAD(&buffer->attachments);
> +       mutex_init(&buffer->lock);
> +       buffer->heap = heap;
> +       buffer->len = len;
> +
> +       table = &buffer->sg_table;
> +       pagecount = len / PAGE_SIZE;
> +       if (sg_alloc_table(table, pagecount, GFP_KERNEL))
> +               goto free_buffer;
> +
> +       sg = table->sgl;
> +       for (pg = 0; pg < pagecount; pg++) {
> +               struct page *page;
> +               /*
> +                * Avoid trying to allocate memory if the process
> +                * has been killed by SIGKILL
> +                */
> +               if (fatal_signal_pending(current))
> +                       goto free_pages;
> +               page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +               if (!page)
> +                       goto free_pages;
> +               sg_set_page(sg, page, page_size(page), 0);
> +               sg = sg_next(sg);
> +       }
> +
> +       /* create the dmabuf */
> +       exp_info.ops = &uncached_heap_buf_ops;
> +       exp_info.size = buffer->len;
> +       exp_info.flags = fd_flags;
> +       exp_info.priv = buffer;
> +       dmabuf = dma_buf_export(&exp_info);
> +       if (IS_ERR(dmabuf)) {
> +               ret = PTR_ERR(dmabuf);
> +               goto free_pages;
> +       }
> +
> +       ret = dma_buf_fd(dmabuf, fd_flags);
> +       if (ret < 0) {
> +               dma_buf_put(dmabuf);
> +               /* just return, as put will call release and that will free */
> +               return ret;
> +       }
> +
> +       /*
> +        * XXX This is hackish. While the buffer will be uncached, we need
> +        * to initially flush cpu cache, since the __GFP_ZERO on the
> +        * allocation means the zeroing was done by the cpu and thus it is
> +        * likely cached. Map (and implicitly flush) it out now so we don't
> +        * get corruption later on.
> +        *
> +        * Ideally we could do this without using the heap device as a dummy dev.
> +        */
> +       dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
> +
> +       return ret;
> +
> +free_pages:
> +       for_each_sgtable_sg(table, sg, i)
> +               __free_page(sg_page(sg));
> +       sg_free_table(table);
> +free_buffer:
> +       kfree(buffer);
> +
> +       return ret;
> +}
> +
> +static struct dma_heap_ops uncached_heap_ops = {
> +       .allocate = uncached_heap_allocate,
> +};
> +
> +static int uncached_heap_create(void)
> +{
> +       struct uncached_heap *heap;
> +       struct dma_heap_export_info exp_info;
> +
> +       heap = kzalloc(sizeof(*heap), GFP_KERNEL);
> +       if (!heap)
> +               return -ENOMEM;
> +
> +       exp_info.name = "system-uncached";
> +       exp_info.ops = &uncached_heap_ops;
> +       exp_info.priv = heap;
> +       heap->heap = dma_heap_add(&exp_info);
> +       if (IS_ERR(heap->heap)) {
> +               int ret = PTR_ERR(heap->heap);
> +
> +               kfree(heap);
> +               return ret;
> +       }
> +       dma_coerce_mask_and_coherent(dma_heap_get_dev(heap->heap), DMA_BIT_MASK(64));
> +
> +       return 0;
> +}
> +device_initcall(uncached_heap_create);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
  2020-08-14  9:17   ` Daniel Vetter
@ 2020-08-14 19:57     ` John Stultz
  0 siblings, 0 replies; 6+ messages in thread
From: John Stultz @ 2020-08-14 19:57 UTC (permalink / raw)
  To: John Stultz, lkml, dri-devel, Liam Mark, Andrew F . Davis,
	Laura Abbott, Hridya Valsaraju, Robin Murphy, linux-media

On Fri, Aug 14, 2020 at 2:17 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Fri, Aug 14, 2020 at 06:24:58AM +0000, John Stultz wrote:
> > This adds a heap that allocates non-contiguous buffers that are
> > marked as writecombined, so they are not cached by the CPU.
> >
> > This is useful, as most graphics buffers are usually not touched
> > by the CPU or only written into once by the CPU. So when mapping
> > the buffer over and over between devices, we can skip the CPU
> > syncing, which saves a lot of cache management overhead, greatly
> > improving performance.
> >
> > For folk using ION, there was a ION_FLAG_CACHED flag, which
> > signaled if the returned buffer should be CPU cacheable or not.
> > With DMA-BUF heaps, we have no such flag, and by default the
> > current heaps (system and cma) produce CPU cachable buffers.
> > So for folks transitioning from ION to DMA-BUF Heaps, this fills
> > in some of that missing functionality.
> >
> > This does have a few "ugly" bits that were required to get
> > the buffer properly flushed out initially which I'd like to
> > improve. So feedback would be very welcome!
> >
> > Many thanks to Liam Mark for his help to get this working.
> >
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: Andrew F. Davis <afd@ti.com>
> > Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
> > Cc: Liam Mark <lmark@codeaurora.org>
> > Cc: Laura Abbott <labbott@kernel.org>
> > Cc: Brian Starkey <Brian.Starkey@arm.com>
> > Cc: Hridya Valsaraju <hridya@google.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> > Signed-off-by: John Stultz <john.stultz@linaro.org>
> > ---
> > v2:
> > * Fix build issue on sh reported-by: kernel test robot <lkp@intel.com>
> > * Rework to use for_each_sgtable_sg(), dma_map_sgtable(), and
> >   for_each_sg_page() along with numerous other cleanups suggested
> >   by Robin Murphy
>
> Mildly annoying me, but where's the userspace for this? Do we have a
> gralloc somewhere that works with open driver stacks and uses this?

So this is still early RFC, but I've added support to the HiKey960
gralloc, and have pending patches (following DRM rules) I pushed here:
  https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519

And avoiding the cache syncing overhead improves performance nicely on
that board.

There is also work in progress to change the codec2 implementation in
AOSP (which uses ion directly), to move over to DMA BUF heaps and as
it used the !ION_FLAG_CACHED case there this heap would be useful to
match ION's functionality.

The latest patches for that are here:
https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640
(though I'm expecting a deeper rework on those)

thanks
-john

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap
  2020-08-14 16:14   ` Ezequiel Garcia
@ 2020-08-14 20:14     ` John Stultz
  0 siblings, 0 replies; 6+ messages in thread
From: John Stultz @ 2020-08-14 20:14 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: lkml, Sumit Semwal, Andrew F . Davis, Benjamin Gaignard,
	Liam Mark, Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Robin Murphy, linux-media, dri-devel

On Fri, Aug 14, 2020 at 9:15 AM Ezequiel Garcia
<ezequiel@vanguardiasur.com.ar> wrote:
> Thanks for the patch.
>
> On Fri, 14 Aug 2020 at 03:25, John Stultz <john.stultz@linaro.org> wrote:
> >
> > This adds a heap that allocates non-contiguous buffers that are
> > marked as writecombined, so they are not cached by the CPU.
> >
>
> What's the rationale for exposing the memory
> attribute as a new heap, instead of just introducing flags?
>
> I guess this calls for some guidelines on what situations
> call for a separate heap, and when it's just a modifier flag.

YES! :) A big part of this patch is to start a discussion and feel out
what properties of heaps are generic enough to be flags, and what
aspects are unique enough to deserve having their own heap
implementation.

ION used the ION_FLAG_CACHED bit for this and considered it a generic
property (though by default all buffers were uncached). This seemed to
cause enough friction in reviews that we dropped it and used cachable
buffers for the initial DMA BUF heaps.

Further, I want to make sure we avoid the custom flag abuse that ION
saw, especially with vendor heaps. So I think having each unique
behavior being a separate heap is a reasonable stance.

That said, we added the (currently unused) heap-flags field to the
interface as there may be some attributes or modalities that are truly
generic across heaps. So if we want to add an UNCACHED flag instead,
I'm open to that.. however I want to make sure it has clear general
meaning so that its behavior is consistent across all heaps and
architectures (or produces an error if it's not supported).

thanks
-john

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-08-14 20:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14  6:24 [RFC][PATCH v2 1/2] dma-heap: Keep track of the heap device struct John Stultz
2020-08-14  6:24 ` [RFC][PATCH v2 2/2] dma-heap: Add a system-uncached heap John Stultz
2020-08-14  9:17   ` Daniel Vetter
2020-08-14 19:57     ` John Stultz
2020-08-14 16:14   ` Ezequiel Garcia
2020-08-14 20:14     ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).