linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
@ 2020-11-10  3:49 John Stultz
  2020-11-10  3:49 ` [PATCH v5 1/7] dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists John Stultz
                   ` (8 more replies)
  0 siblings, 9 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

Hey All,
  So just wanted to send my last revision of my patch series
of performance optimizations to the dma-buf system heap.

This series reworks the system heap to use sgtables, and then
consolidates the pagelist method from the heap-helpers into the
CMA heap. After which the heap-helpers logic is removed (as it
is unused). I'd still like to find a better way to avoid some of
the logic duplication in implementing the entire dma_buf_ops
handlers per heap. But unfortunately that code is tied somewhat
to how the buffer's memory is tracked. As more heaps show up I
think we'll have a better idea how to best share code, so for
now I think this is ok.

After this, the series introduces an optimization that
Ørjan Eide implemented for ION that avoids calling sync on
attachments that don't have a mapping.

Next, an optimization to use larger order pages for the system
heap. This change brings us closer to the current performance
of the ION allocation code (though there still is a gap due
to ION using a mix of deferred-freeing and page pools, I'll be
looking at integrating those eventually).

Finally, a reworked version of my uncached system heap
implementation I was submitting a few weeks back. Since it
duplicated a lot of the now reworked system heap code, I
realized it would be much simpler to add the functionality to
the system_heap implementation itself.

While not improving the core allocation performance, the
uncached heap allocations do result in *much* improved
performance on HiKey960 as it avoids a lot of flushing and
invalidating buffers that the cpu doesn't touch often.

Feedback on these would be great!

thanks
-john

New in v5:
* Added a comment explaining why the order sizes are
  chosen as they are

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org

John Stultz (7):
  dma-buf: system_heap: Rework system heap to use sgtables instead of
    pagelists
  dma-buf: heaps: Move heap-helper logic into the cma_heap
    implementation
  dma-buf: heaps: Remove heap-helpers code
  dma-buf: heaps: Skip sync if not mapped
  dma-buf: system_heap: Allocate higher order pages if available
  dma-buf: dma-heap: Keep track of the heap device struct
  dma-buf: system_heap: Add a system-uncached heap re-using the system
    heap

 drivers/dma-buf/dma-heap.c           |  33 +-
 drivers/dma-buf/heaps/Makefile       |   1 -
 drivers/dma-buf/heaps/cma_heap.c     | 324 +++++++++++++++---
 drivers/dma-buf/heaps/heap-helpers.c | 270 ---------------
 drivers/dma-buf/heaps/heap-helpers.h |  53 ---
 drivers/dma-buf/heaps/system_heap.c  | 494 ++++++++++++++++++++++++---
 include/linux/dma-heap.h             |   9 +
 7 files changed, 753 insertions(+), 431 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 1/7] dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-10  3:49 ` [PATCH v5 2/7] dma-buf: heaps: Move heap-helper logic into the cma_heap implementation John Stultz
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

In preparation for some patches to optmize the system
heap code, rework the dmabuf exporter to utilize sgtables rather
then pageslists for tracking the associated pages.

This will allow for large order page allocations, as well as
more efficient page pooling.

In doing so, the system heap stops using the heap-helpers logic
which sadly is not quite as generic as I was hoping it to be, so
this patch adds heap specific implementations of the dma_buf_ops
function handlers.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v2:
* Fix locking issue and an unused return value Reported-by:
     kernel test robot <lkp@intel.com>
     Julia Lawall <julia.lawall@lip6.fr>
* Make system_heap_buf_ops static Reported-by:
     kernel test robot <lkp@intel.com>
v3:
* Use the new sgtable mapping functions, as Suggested-by:
     Daniel Mentz <danielmentz@google.com>
v4:
* Make sys_heap static (indirectly) Reported-by:
     kernel test robot <lkp@intel.com>
* Spelling fix suggested by BrianS
---
 drivers/dma-buf/heaps/system_heap.c | 344 ++++++++++++++++++++++++----
 1 file changed, 298 insertions(+), 46 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 0bf688e3c023..5c44f9c06807 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -3,7 +3,11 @@
  * DMABUF System heap exporter
  *
  * Copyright (C) 2011 Google, Inc.
- * Copyright (C) 2019 Linaro Ltd.
+ * Copyright (C) 2019, 2020 Linaro Ltd.
+ *
+ * Portions based off of Andrew Davis' SRAM heap:
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
+ *	Andrew F. Davis <afd@ti.com>
  */
 
 #include <linux/dma-buf.h>
@@ -15,72 +19,321 @@
 #include <linux/module.h>
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
-#include <linux/sched/signal.h>
-#include <asm/page.h>
+#include <linux/vmalloc.h>
+
+static struct dma_heap *sys_heap;
 
-#include "heap-helpers.h"
+struct system_heap_buffer {
+	struct dma_heap *heap;
+	struct list_head attachments;
+	struct mutex lock;
+	unsigned long len;
+	struct sg_table sg_table;
+	int vmap_cnt;
+	void *vaddr;
+};
 
-struct dma_heap *sys_heap;
+struct dma_heap_attachment {
+	struct device *dev;
+	struct sg_table *table;
+	struct list_head list;
+};
 
-static void system_heap_free(struct heap_helper_buffer *buffer)
+static struct sg_table *dup_sg_table(struct sg_table *table)
 {
-	pgoff_t pg;
+	struct sg_table *new_table;
+	int ret, i;
+	struct scatterlist *sg, *new_sg;
+
+	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
+	if (!new_table)
+		return ERR_PTR(-ENOMEM);
+
+	ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL);
+	if (ret) {
+		kfree(new_table);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	new_sg = new_table->sgl;
+	for_each_sgtable_sg(table, sg, i) {
+		sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
+		new_sg = sg_next(new_sg);
+	}
+
+	return new_table;
+}
+
+static int system_heap_attach(struct dma_buf *dmabuf,
+			      struct dma_buf_attachment *attachment)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+	struct sg_table *table;
+
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return -ENOMEM;
+
+	table = dup_sg_table(&buffer->sg_table);
+	if (IS_ERR(table)) {
+		kfree(a);
+		return -ENOMEM;
+	}
+
+	a->table = table;
+	a->dev = attachment->dev;
+	INIT_LIST_HEAD(&a->list);
+
+	attachment->priv = a;
+
+	mutex_lock(&buffer->lock);
+	list_add(&a->list, &buffer->attachments);
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static void system_heap_detach(struct dma_buf *dmabuf,
+			       struct dma_buf_attachment *attachment)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a = attachment->priv;
+
+	mutex_lock(&buffer->lock);
+	list_del(&a->list);
+	mutex_unlock(&buffer->lock);
+
+	sg_free_table(a->table);
+	kfree(a->table);
+	kfree(a);
+}
 
-	for (pg = 0; pg < buffer->pagecount; pg++)
-		__free_page(buffer->pages[pg]);
-	kfree(buffer->pages);
+static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attachment,
+						enum dma_data_direction direction)
+{
+	struct dma_heap_attachment *a = attachment->priv;
+	struct sg_table *table = a->table;
+	int ret;
+
+	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return table;
+}
+
+static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
+				      struct sg_table *table,
+				      enum dma_data_direction direction)
+{
+	dma_unmap_sgtable(attachment->dev, table, direction, 0);
+}
+
+static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+						enum dma_data_direction direction)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+
+	mutex_lock(&buffer->lock);
+
+	if (buffer->vmap_cnt)
+		invalidate_kernel_vmap_range(buffer->vaddr, buffer->len);
+
+	list_for_each_entry(a, &buffer->attachments, list) {
+		dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
+	}
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
+					      enum dma_data_direction direction)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+
+	mutex_lock(&buffer->lock);
+
+	if (buffer->vmap_cnt)
+		flush_kernel_vmap_range(buffer->vaddr, buffer->len);
+
+	list_for_each_entry(a, &buffer->attachments, list) {
+		dma_sync_sgtable_for_device(a->dev, a->table, direction);
+	}
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	struct sg_table *table = &buffer->sg_table;
+	unsigned long addr = vma->vm_start;
+	struct sg_page_iter piter;
+	int ret;
+
+	for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
+		struct page *page = sg_page_iter_page(&piter);
+
+		ret = remap_pfn_range(vma, addr, page_to_pfn(page), PAGE_SIZE,
+				      vma->vm_page_prot);
+		if (ret)
+			return ret;
+		addr += PAGE_SIZE;
+		if (addr >= vma->vm_end)
+			return 0;
+	}
+	return 0;
+}
+
+static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
+{
+	struct sg_table *table = &buffer->sg_table;
+	int npages = PAGE_ALIGN(buffer->len) / PAGE_SIZE;
+	struct page **pages = vmalloc(sizeof(struct page *) * npages);
+	struct page **tmp = pages;
+	struct sg_page_iter piter;
+	void *vaddr;
+
+	if (!pages)
+		return ERR_PTR(-ENOMEM);
+
+	for_each_sgtable_page(table, &piter, 0) {
+		WARN_ON(tmp - pages >= npages);
+		*tmp++ = sg_page_iter_page(&piter);
+	}
+
+	vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+	vfree(pages);
+
+	if (!vaddr)
+		return ERR_PTR(-ENOMEM);
+
+	return vaddr;
+}
+
+static void *system_heap_vmap(struct dma_buf *dmabuf)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	void *vaddr;
+
+	mutex_lock(&buffer->lock);
+	if (buffer->vmap_cnt) {
+		buffer->vmap_cnt++;
+		vaddr = buffer->vaddr;
+		goto out;
+	}
+
+	vaddr = system_heap_do_vmap(buffer);
+	if (IS_ERR(vaddr))
+		goto out;
+
+	buffer->vaddr = vaddr;
+	buffer->vmap_cnt++;
+out:
+	mutex_unlock(&buffer->lock);
+
+	return vaddr;
+}
+
+static void system_heap_vunmap(struct dma_buf *dmabuf, void *vaddr)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+
+	mutex_lock(&buffer->lock);
+	if (!--buffer->vmap_cnt) {
+		vunmap(buffer->vaddr);
+		buffer->vaddr = NULL;
+	}
+	mutex_unlock(&buffer->lock);
+}
+
+static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct system_heap_buffer *buffer = dmabuf->priv;
+	struct sg_table *table;
+	struct scatterlist *sg;
+	int i;
+
+	table = &buffer->sg_table;
+	for_each_sgtable_sg(table, sg, i)
+		__free_page(sg_page(sg));
+	sg_free_table(table);
 	kfree(buffer);
 }
 
+static const struct dma_buf_ops system_heap_buf_ops = {
+	.attach = system_heap_attach,
+	.detach = system_heap_detach,
+	.map_dma_buf = system_heap_map_dma_buf,
+	.unmap_dma_buf = system_heap_unmap_dma_buf,
+	.begin_cpu_access = system_heap_dma_buf_begin_cpu_access,
+	.end_cpu_access = system_heap_dma_buf_end_cpu_access,
+	.mmap = system_heap_mmap,
+	.vmap = system_heap_vmap,
+	.vunmap = system_heap_vunmap,
+	.release = system_heap_dma_buf_release,
+};
+
 static int system_heap_allocate(struct dma_heap *heap,
 				unsigned long len,
 				unsigned long fd_flags,
 				unsigned long heap_flags)
 {
-	struct heap_helper_buffer *helper_buffer;
+	struct system_heap_buffer *buffer;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 	struct dma_buf *dmabuf;
-	int ret = -ENOMEM;
+	struct sg_table *table;
+	struct scatterlist *sg;
+	pgoff_t pagecount;
 	pgoff_t pg;
+	int i, ret = -ENOMEM;
 
-	helper_buffer = kzalloc(sizeof(*helper_buffer), GFP_KERNEL);
-	if (!helper_buffer)
+	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
+	if (!buffer)
 		return -ENOMEM;
 
-	init_heap_helper_buffer(helper_buffer, system_heap_free);
-	helper_buffer->heap = heap;
-	helper_buffer->size = len;
-
-	helper_buffer->pagecount = len / PAGE_SIZE;
-	helper_buffer->pages = kmalloc_array(helper_buffer->pagecount,
-					     sizeof(*helper_buffer->pages),
-					     GFP_KERNEL);
-	if (!helper_buffer->pages) {
-		ret = -ENOMEM;
-		goto err0;
-	}
+	INIT_LIST_HEAD(&buffer->attachments);
+	mutex_init(&buffer->lock);
+	buffer->heap = heap;
+	buffer->len = len;
 
-	for (pg = 0; pg < helper_buffer->pagecount; pg++) {
+	table = &buffer->sg_table;
+	pagecount = len / PAGE_SIZE;
+	if (sg_alloc_table(table, pagecount, GFP_KERNEL))
+		goto free_buffer;
+
+	sg = table->sgl;
+	for (pg = 0; pg < pagecount; pg++) {
+		struct page *page;
 		/*
 		 * Avoid trying to allocate memory if the process
-		 * has been killed by by SIGKILL
+		 * has been killed by SIGKILL
 		 */
 		if (fatal_signal_pending(current))
-			goto err1;
-
-		helper_buffer->pages[pg] = alloc_page(GFP_KERNEL | __GFP_ZERO);
-		if (!helper_buffer->pages[pg])
-			goto err1;
+			goto free_pages;
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			goto free_pages;
+		sg_set_page(sg, page, page_size(page), 0);
+		sg = sg_next(sg);
 	}
 
 	/* create the dmabuf */
-	dmabuf = heap_helper_export_dmabuf(helper_buffer, fd_flags);
+	exp_info.ops = &system_heap_buf_ops;
+	exp_info.size = buffer->len;
+	exp_info.flags = fd_flags;
+	exp_info.priv = buffer;
+	dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(dmabuf)) {
 		ret = PTR_ERR(dmabuf);
-		goto err1;
+		goto free_pages;
 	}
 
-	helper_buffer->dmabuf = dmabuf;
-
 	ret = dma_buf_fd(dmabuf, fd_flags);
 	if (ret < 0) {
 		dma_buf_put(dmabuf);
@@ -90,12 +343,12 @@ static int system_heap_allocate(struct dma_heap *heap,
 
 	return ret;
 
-err1:
-	while (pg > 0)
-		__free_page(helper_buffer->pages[--pg]);
-	kfree(helper_buffer->pages);
-err0:
-	kfree(helper_buffer);
+free_pages:
+	for_each_sgtable_sg(table, sg, i)
+		__free_page(sg_page(sg));
+	sg_free_table(table);
+free_buffer:
+	kfree(buffer);
 
 	return ret;
 }
@@ -107,7 +360,6 @@ static const struct dma_heap_ops system_heap_ops = {
 static int system_heap_create(void)
 {
 	struct dma_heap_export_info exp_info;
-	int ret = 0;
 
 	exp_info.name = "system";
 	exp_info.ops = &system_heap_ops;
@@ -115,9 +367,9 @@ static int system_heap_create(void)
 
 	sys_heap = dma_heap_add(&exp_info);
 	if (IS_ERR(sys_heap))
-		ret = PTR_ERR(sys_heap);
+		return PTR_ERR(sys_heap);
 
-	return ret;
+	return 0;
 }
 module_init(system_heap_create);
 MODULE_LICENSE("GPL v2");
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 2/7] dma-buf: heaps: Move heap-helper logic into the cma_heap implementation
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
  2020-11-10  3:49 ` [PATCH v5 1/7] dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-10  3:49 ` [PATCH v5 3/7] dma-buf: heaps: Remove heap-helpers code John Stultz
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

Since the heap-helpers logic ended up not being as generic as
hoped, move the heap-helpers dma_buf_ops implementations into
the cma_heap directly.

This will allow us to remove the heap_helpers code in a following
patch.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v2:
* Fix unused return value and locking issue Reported-by:
    kernel test robot <lkp@intel.com>
    Julia Lawall <julia.lawall@inria.fr>
* Make cma_heap_buf_ops static suggested by
    kernel test robot <lkp@intel.com>
* Fix uninitialized return in cma Reported-by:
    kernel test robot <lkp@intel.com>
* Minor cleanups
v3:
* Use the new sgtable mapping functions, as Suggested-by:
     Daniel Mentz <danielmentz@google.com>
v4:
* Spelling fix suggested by BrianS
---
 drivers/dma-buf/heaps/cma_heap.c | 314 ++++++++++++++++++++++++++-----
 1 file changed, 265 insertions(+), 49 deletions(-)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index e55384dc115b..5341e5e226d5 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -2,76 +2,290 @@
 /*
  * DMABUF CMA heap exporter
  *
- * Copyright (C) 2012, 2019 Linaro Ltd.
+ * Copyright (C) 2012, 2019, 2020 Linaro Ltd.
  * Author: <benjamin.gaignard@linaro.org> for ST-Ericsson.
+ *
+ * Also utilizing parts of Andrew Davis' SRAM heap:
+ * Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
+ *	Andrew F. Davis <afd@ti.com>
  */
-
 #include <linux/cma.h>
-#include <linux/device.h>
 #include <linux/dma-buf.h>
 #include <linux/dma-heap.h>
 #include <linux/dma-map-ops.h>
 #include <linux/err.h>
-#include <linux/errno.h>
 #include <linux/highmem.h>
+#include <linux/io.h>
+#include <linux/mm.h>
 #include <linux/module.h>
-#include <linux/slab.h>
 #include <linux/scatterlist.h>
-#include <linux/sched/signal.h>
+#include <linux/slab.h>
 
-#include "heap-helpers.h"
 
 struct cma_heap {
 	struct dma_heap *heap;
 	struct cma *cma;
 };
 
-static void cma_heap_free(struct heap_helper_buffer *buffer)
+struct cma_heap_buffer {
+	struct cma_heap *heap;
+	struct list_head attachments;
+	struct mutex lock;
+	unsigned long len;
+	struct page *cma_pages;
+	struct page **pages;
+	pgoff_t pagecount;
+	int vmap_cnt;
+	void *vaddr;
+};
+
+struct dma_heap_attachment {
+	struct device *dev;
+	struct sg_table table;
+	struct list_head list;
+};
+
+static int cma_heap_attach(struct dma_buf *dmabuf,
+			   struct dma_buf_attachment *attachment)
 {
-	struct cma_heap *cma_heap = dma_heap_get_drvdata(buffer->heap);
-	unsigned long nr_pages = buffer->pagecount;
-	struct page *cma_pages = buffer->priv_virt;
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+	int ret;
 
-	/* free page list */
-	kfree(buffer->pages);
-	/* release memory */
-	cma_release(cma_heap->cma, cma_pages, nr_pages);
+	a = kzalloc(sizeof(*a), GFP_KERNEL);
+	if (!a)
+		return -ENOMEM;
+
+	ret = sg_alloc_table_from_pages(&a->table, buffer->pages,
+					buffer->pagecount, 0,
+					buffer->pagecount << PAGE_SHIFT,
+					GFP_KERNEL);
+	if (ret) {
+		kfree(a);
+		return ret;
+	}
+
+	a->dev = attachment->dev;
+	INIT_LIST_HEAD(&a->list);
+
+	attachment->priv = a;
+
+	mutex_lock(&buffer->lock);
+	list_add(&a->list, &buffer->attachments);
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static void cma_heap_detach(struct dma_buf *dmabuf,
+			    struct dma_buf_attachment *attachment)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a = attachment->priv;
+
+	mutex_lock(&buffer->lock);
+	list_del(&a->list);
+	mutex_unlock(&buffer->lock);
+
+	sg_free_table(&a->table);
+	kfree(a);
+}
+
+static struct sg_table *cma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
+					     enum dma_data_direction direction)
+{
+	struct dma_heap_attachment *a = attachment->priv;
+	struct sg_table *table = &a->table;
+	int ret;
+
+	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+	if (ret)
+		return ERR_PTR(-ENOMEM);
+	return table;
+}
+
+static void cma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
+				   struct sg_table *table,
+				   enum dma_data_direction direction)
+{
+	dma_unmap_sgtable(attachment->dev, table, direction, 0);
+}
+
+static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+					     enum dma_data_direction direction)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+
+	if (buffer->vmap_cnt)
+		invalidate_kernel_vmap_range(buffer->vaddr, buffer->len);
+
+	mutex_lock(&buffer->lock);
+	list_for_each_entry(a, &buffer->attachments, list) {
+		dma_sync_sgtable_for_cpu(a->dev, &a->table, direction);
+	}
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static int cma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
+					   enum dma_data_direction direction)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+	struct dma_heap_attachment *a;
+
+	if (buffer->vmap_cnt)
+		flush_kernel_vmap_range(buffer->vaddr, buffer->len);
+
+	mutex_lock(&buffer->lock);
+	list_for_each_entry(a, &buffer->attachments, list) {
+		dma_sync_sgtable_for_device(a->dev, &a->table, direction);
+	}
+	mutex_unlock(&buffer->lock);
+
+	return 0;
+}
+
+static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct cma_heap_buffer *buffer = vma->vm_private_data;
+
+	if (vmf->pgoff > buffer->pagecount)
+		return VM_FAULT_SIGBUS;
+
+	vmf->page = buffer->pages[vmf->pgoff];
+	get_page(vmf->page);
+
+	return 0;
+}
+
+static const struct vm_operations_struct dma_heap_vm_ops = {
+	.fault = cma_heap_vm_fault,
+};
+
+static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
+		return -EINVAL;
+
+	vma->vm_ops = &dma_heap_vm_ops;
+	vma->vm_private_data = buffer;
+
+	return 0;
+}
+
+static void *cma_heap_do_vmap(struct cma_heap_buffer *buffer)
+{
+	void *vaddr;
+
+	vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
+	if (!vaddr)
+		return ERR_PTR(-ENOMEM);
+
+	return vaddr;
+}
+
+static void *cma_heap_vmap(struct dma_buf *dmabuf)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+	void *vaddr;
+
+	mutex_lock(&buffer->lock);
+	if (buffer->vmap_cnt) {
+		buffer->vmap_cnt++;
+		vaddr = buffer->vaddr;
+		goto out;
+	}
+
+	vaddr = cma_heap_do_vmap(buffer);
+	if (IS_ERR(vaddr))
+		goto out;
+
+	buffer->vaddr = vaddr;
+	buffer->vmap_cnt++;
+out:
+	mutex_unlock(&buffer->lock);
+
+	return vaddr;
+}
+
+static void cma_heap_vunmap(struct dma_buf *dmabuf, void *vaddr)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+
+	mutex_lock(&buffer->lock);
+	if (!--buffer->vmap_cnt) {
+		vunmap(buffer->vaddr);
+		buffer->vaddr = NULL;
+	}
+	mutex_unlock(&buffer->lock);
+}
+
+static void cma_heap_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct cma_heap_buffer *buffer = dmabuf->priv;
+	struct cma_heap *cma_heap = buffer->heap;
+
+	if (buffer->vmap_cnt > 0) {
+		WARN(1, "%s: buffer still mapped in the kernel\n", __func__);
+		vunmap(buffer->vaddr);
+	}
+
+	cma_release(cma_heap->cma, buffer->cma_pages, buffer->pagecount);
 	kfree(buffer);
 }
 
-/* dmabuf heap CMA operations functions */
+static const struct dma_buf_ops cma_heap_buf_ops = {
+	.attach = cma_heap_attach,
+	.detach = cma_heap_detach,
+	.map_dma_buf = cma_heap_map_dma_buf,
+	.unmap_dma_buf = cma_heap_unmap_dma_buf,
+	.begin_cpu_access = cma_heap_dma_buf_begin_cpu_access,
+	.end_cpu_access = cma_heap_dma_buf_end_cpu_access,
+	.mmap = cma_heap_mmap,
+	.vmap = cma_heap_vmap,
+	.vunmap = cma_heap_vunmap,
+	.release = cma_heap_dma_buf_release,
+};
+
 static int cma_heap_allocate(struct dma_heap *heap,
-			     unsigned long len,
-			     unsigned long fd_flags,
-			     unsigned long heap_flags)
+				  unsigned long len,
+				  unsigned long fd_flags,
+				  unsigned long heap_flags)
 {
 	struct cma_heap *cma_heap = dma_heap_get_drvdata(heap);
-	struct heap_helper_buffer *helper_buffer;
-	struct page *cma_pages;
+	struct cma_heap_buffer *buffer;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
 	size_t size = PAGE_ALIGN(len);
-	unsigned long nr_pages = size >> PAGE_SHIFT;
+	pgoff_t pagecount = size >> PAGE_SHIFT;
 	unsigned long align = get_order(size);
+	struct page *cma_pages;
 	struct dma_buf *dmabuf;
 	int ret = -ENOMEM;
 	pgoff_t pg;
 
-	if (align > CONFIG_CMA_ALIGNMENT)
-		align = CONFIG_CMA_ALIGNMENT;
-
-	helper_buffer = kzalloc(sizeof(*helper_buffer), GFP_KERNEL);
-	if (!helper_buffer)
+	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
+	if (!buffer)
 		return -ENOMEM;
 
-	init_heap_helper_buffer(helper_buffer, cma_heap_free);
-	helper_buffer->heap = heap;
-	helper_buffer->size = len;
+	INIT_LIST_HEAD(&buffer->attachments);
+	mutex_init(&buffer->lock);
+	buffer->len = size;
+
+	if (align > CONFIG_CMA_ALIGNMENT)
+		align = CONFIG_CMA_ALIGNMENT;
 
-	cma_pages = cma_alloc(cma_heap->cma, nr_pages, align, false);
+	cma_pages = cma_alloc(cma_heap->cma, pagecount, align, false);
 	if (!cma_pages)
-		goto free_buf;
+		goto free_buffer;
 
+	/* Clear the cma pages */
 	if (PageHighMem(cma_pages)) {
-		unsigned long nr_clear_pages = nr_pages;
+		unsigned long nr_clear_pages = pagecount;
 		struct page *page = cma_pages;
 
 		while (nr_clear_pages > 0) {
@@ -85,7 +299,6 @@ static int cma_heap_allocate(struct dma_heap *heap,
 			 */
 			if (fatal_signal_pending(current))
 				goto free_cma;
-
 			page++;
 			nr_clear_pages--;
 		}
@@ -93,28 +306,30 @@ static int cma_heap_allocate(struct dma_heap *heap,
 		memset(page_address(cma_pages), 0, size);
 	}
 
-	helper_buffer->pagecount = nr_pages;
-	helper_buffer->pages = kmalloc_array(helper_buffer->pagecount,
-					     sizeof(*helper_buffer->pages),
-					     GFP_KERNEL);
-	if (!helper_buffer->pages) {
+	buffer->pages = kmalloc_array(pagecount, sizeof(*buffer->pages), GFP_KERNEL);
+	if (!buffer->pages) {
 		ret = -ENOMEM;
 		goto free_cma;
 	}
 
-	for (pg = 0; pg < helper_buffer->pagecount; pg++)
-		helper_buffer->pages[pg] = &cma_pages[pg];
+	for (pg = 0; pg < pagecount; pg++)
+		buffer->pages[pg] = &cma_pages[pg];
+
+	buffer->cma_pages = cma_pages;
+	buffer->heap = cma_heap;
+	buffer->pagecount = pagecount;
 
 	/* create the dmabuf */
-	dmabuf = heap_helper_export_dmabuf(helper_buffer, fd_flags);
+	exp_info.ops = &cma_heap_buf_ops;
+	exp_info.size = buffer->len;
+	exp_info.flags = fd_flags;
+	exp_info.priv = buffer;
+	dmabuf = dma_buf_export(&exp_info);
 	if (IS_ERR(dmabuf)) {
 		ret = PTR_ERR(dmabuf);
 		goto free_pages;
 	}
 
-	helper_buffer->dmabuf = dmabuf;
-	helper_buffer->priv_virt = cma_pages;
-
 	ret = dma_buf_fd(dmabuf, fd_flags);
 	if (ret < 0) {
 		dma_buf_put(dmabuf);
@@ -125,11 +340,12 @@ static int cma_heap_allocate(struct dma_heap *heap,
 	return ret;
 
 free_pages:
-	kfree(helper_buffer->pages);
+	kfree(buffer->pages);
 free_cma:
-	cma_release(cma_heap->cma, cma_pages, nr_pages);
-free_buf:
-	kfree(helper_buffer);
+	cma_release(cma_heap->cma, cma_pages, pagecount);
+free_buffer:
+	kfree(buffer);
+
 	return ret;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 3/7] dma-buf: heaps: Remove heap-helpers code
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
  2020-11-10  3:49 ` [PATCH v5 1/7] dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists John Stultz
  2020-11-10  3:49 ` [PATCH v5 2/7] dma-buf: heaps: Move heap-helper logic into the cma_heap implementation John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-10  3:49 ` [PATCH v5 4/7] dma-buf: heaps: Skip sync if not mapped John Stultz
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

The heap-helpers code was not as generic as initially hoped
and it is now not being used, so remove it from the tree.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/dma-buf/heaps/Makefile       |   1 -
 drivers/dma-buf/heaps/heap-helpers.c | 270 ---------------------------
 drivers/dma-buf/heaps/heap-helpers.h |  53 ------
 3 files changed, 324 deletions(-)
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
 delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h

diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cdec3da0..974467791032 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -1,4 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-y					+= heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM)	+= system_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA)		+= cma_heap.o
diff --git a/drivers/dma-buf/heaps/heap-helpers.c b/drivers/dma-buf/heaps/heap-helpers.c
deleted file mode 100644
index d0696cf937af..000000000000
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ /dev/null
@@ -1,270 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/device.h>
-#include <linux/dma-buf.h>
-#include <linux/err.h>
-#include <linux/highmem.h>
-#include <linux/idr.h>
-#include <linux/list.h>
-#include <linux/slab.h>
-#include <linux/uaccess.h>
-#include <linux/vmalloc.h>
-#include <uapi/linux/dma-heap.h>
-
-#include "heap-helpers.h"
-
-void init_heap_helper_buffer(struct heap_helper_buffer *buffer,
-			     void (*free)(struct heap_helper_buffer *))
-{
-	buffer->priv_virt = NULL;
-	mutex_init(&buffer->lock);
-	buffer->vmap_cnt = 0;
-	buffer->vaddr = NULL;
-	buffer->pagecount = 0;
-	buffer->pages = NULL;
-	INIT_LIST_HEAD(&buffer->attachments);
-	buffer->free = free;
-}
-
-struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
-					  int fd_flags)
-{
-	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
-
-	exp_info.ops = &heap_helper_ops;
-	exp_info.size = buffer->size;
-	exp_info.flags = fd_flags;
-	exp_info.priv = buffer;
-
-	return dma_buf_export(&exp_info);
-}
-
-static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
-{
-	void *vaddr;
-
-	vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
-	if (!vaddr)
-		return ERR_PTR(-ENOMEM);
-
-	return vaddr;
-}
-
-static void dma_heap_buffer_destroy(struct heap_helper_buffer *buffer)
-{
-	if (buffer->vmap_cnt > 0) {
-		WARN(1, "%s: buffer still mapped in the kernel\n", __func__);
-		vunmap(buffer->vaddr);
-	}
-
-	buffer->free(buffer);
-}
-
-static void *dma_heap_buffer_vmap_get(struct heap_helper_buffer *buffer)
-{
-	void *vaddr;
-
-	if (buffer->vmap_cnt) {
-		buffer->vmap_cnt++;
-		return buffer->vaddr;
-	}
-	vaddr = dma_heap_map_kernel(buffer);
-	if (IS_ERR(vaddr))
-		return vaddr;
-	buffer->vaddr = vaddr;
-	buffer->vmap_cnt++;
-	return vaddr;
-}
-
-static void dma_heap_buffer_vmap_put(struct heap_helper_buffer *buffer)
-{
-	if (!--buffer->vmap_cnt) {
-		vunmap(buffer->vaddr);
-		buffer->vaddr = NULL;
-	}
-}
-
-struct dma_heaps_attachment {
-	struct device *dev;
-	struct sg_table table;
-	struct list_head list;
-};
-
-static int dma_heap_attach(struct dma_buf *dmabuf,
-			   struct dma_buf_attachment *attachment)
-{
-	struct dma_heaps_attachment *a;
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-	int ret;
-
-	a = kzalloc(sizeof(*a), GFP_KERNEL);
-	if (!a)
-		return -ENOMEM;
-
-	ret = sg_alloc_table_from_pages(&a->table, buffer->pages,
-					buffer->pagecount, 0,
-					buffer->pagecount << PAGE_SHIFT,
-					GFP_KERNEL);
-	if (ret) {
-		kfree(a);
-		return ret;
-	}
-
-	a->dev = attachment->dev;
-	INIT_LIST_HEAD(&a->list);
-
-	attachment->priv = a;
-
-	mutex_lock(&buffer->lock);
-	list_add(&a->list, &buffer->attachments);
-	mutex_unlock(&buffer->lock);
-
-	return 0;
-}
-
-static void dma_heap_detach(struct dma_buf *dmabuf,
-			    struct dma_buf_attachment *attachment)
-{
-	struct dma_heaps_attachment *a = attachment->priv;
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-
-	mutex_lock(&buffer->lock);
-	list_del(&a->list);
-	mutex_unlock(&buffer->lock);
-
-	sg_free_table(&a->table);
-	kfree(a);
-}
-
-static
-struct sg_table *dma_heap_map_dma_buf(struct dma_buf_attachment *attachment,
-				      enum dma_data_direction direction)
-{
-	struct dma_heaps_attachment *a = attachment->priv;
-	struct sg_table *table = &a->table;
-	int ret;
-
-	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
-	if (ret)
-		table = ERR_PTR(ret);
-	return table;
-}
-
-static void dma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
-				   struct sg_table *table,
-				   enum dma_data_direction direction)
-{
-	dma_unmap_sgtable(attachment->dev, table, direction, 0);
-}
-
-static vm_fault_t dma_heap_vm_fault(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	struct heap_helper_buffer *buffer = vma->vm_private_data;
-
-	if (vmf->pgoff > buffer->pagecount)
-		return VM_FAULT_SIGBUS;
-
-	vmf->page = buffer->pages[vmf->pgoff];
-	get_page(vmf->page);
-
-	return 0;
-}
-
-static const struct vm_operations_struct dma_heap_vm_ops = {
-	.fault = dma_heap_vm_fault,
-};
-
-static int dma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
-{
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-
-	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
-		return -EINVAL;
-
-	vma->vm_ops = &dma_heap_vm_ops;
-	vma->vm_private_data = buffer;
-
-	return 0;
-}
-
-static void dma_heap_dma_buf_release(struct dma_buf *dmabuf)
-{
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-
-	dma_heap_buffer_destroy(buffer);
-}
-
-static int dma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
-					     enum dma_data_direction direction)
-{
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-	struct dma_heaps_attachment *a;
-	int ret = 0;
-
-	mutex_lock(&buffer->lock);
-
-	if (buffer->vmap_cnt)
-		invalidate_kernel_vmap_range(buffer->vaddr, buffer->size);
-
-	list_for_each_entry(a, &buffer->attachments, list) {
-		dma_sync_sg_for_cpu(a->dev, a->table.sgl, a->table.nents,
-				    direction);
-	}
-	mutex_unlock(&buffer->lock);
-
-	return ret;
-}
-
-static int dma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
-					   enum dma_data_direction direction)
-{
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-	struct dma_heaps_attachment *a;
-
-	mutex_lock(&buffer->lock);
-
-	if (buffer->vmap_cnt)
-		flush_kernel_vmap_range(buffer->vaddr, buffer->size);
-
-	list_for_each_entry(a, &buffer->attachments, list) {
-		dma_sync_sg_for_device(a->dev, a->table.sgl, a->table.nents,
-				       direction);
-	}
-	mutex_unlock(&buffer->lock);
-
-	return 0;
-}
-
-static void *dma_heap_dma_buf_vmap(struct dma_buf *dmabuf)
-{
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-	void *vaddr;
-
-	mutex_lock(&buffer->lock);
-	vaddr = dma_heap_buffer_vmap_get(buffer);
-	mutex_unlock(&buffer->lock);
-
-	return vaddr;
-}
-
-static void dma_heap_dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
-{
-	struct heap_helper_buffer *buffer = dmabuf->priv;
-
-	mutex_lock(&buffer->lock);
-	dma_heap_buffer_vmap_put(buffer);
-	mutex_unlock(&buffer->lock);
-}
-
-const struct dma_buf_ops heap_helper_ops = {
-	.map_dma_buf = dma_heap_map_dma_buf,
-	.unmap_dma_buf = dma_heap_unmap_dma_buf,
-	.mmap = dma_heap_mmap,
-	.release = dma_heap_dma_buf_release,
-	.attach = dma_heap_attach,
-	.detach = dma_heap_detach,
-	.begin_cpu_access = dma_heap_dma_buf_begin_cpu_access,
-	.end_cpu_access = dma_heap_dma_buf_end_cpu_access,
-	.vmap = dma_heap_dma_buf_vmap,
-	.vunmap = dma_heap_dma_buf_vunmap,
-};
diff --git a/drivers/dma-buf/heaps/heap-helpers.h b/drivers/dma-buf/heaps/heap-helpers.h
deleted file mode 100644
index 805d2df88024..000000000000
--- a/drivers/dma-buf/heaps/heap-helpers.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * DMABUF Heaps helper code
- *
- * Copyright (C) 2011 Google, Inc.
- * Copyright (C) 2019 Linaro Ltd.
- */
-
-#ifndef _HEAP_HELPERS_H
-#define _HEAP_HELPERS_H
-
-#include <linux/dma-heap.h>
-#include <linux/list.h>
-
-/**
- * struct heap_helper_buffer - helper buffer metadata
- * @heap:		back pointer to the heap the buffer came from
- * @dmabuf:		backing dma-buf for this buffer
- * @size:		size of the buffer
- * @priv_virt		pointer to heap specific private value
- * @lock		mutext to protect the data in this structure
- * @vmap_cnt		count of vmap references on the buffer
- * @vaddr		vmap'ed virtual address
- * @pagecount		number of pages in the buffer
- * @pages		list of page pointers
- * @attachments		list of device attachments
- *
- * @free		heap callback to free the buffer
- */
-struct heap_helper_buffer {
-	struct dma_heap *heap;
-	struct dma_buf *dmabuf;
-	size_t size;
-
-	void *priv_virt;
-	struct mutex lock;
-	int vmap_cnt;
-	void *vaddr;
-	pgoff_t pagecount;
-	struct page **pages;
-	struct list_head attachments;
-
-	void (*free)(struct heap_helper_buffer *buffer);
-};
-
-void init_heap_helper_buffer(struct heap_helper_buffer *buffer,
-			     void (*free)(struct heap_helper_buffer *));
-
-struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
-					  int fd_flags);
-
-extern const struct dma_buf_ops heap_helper_ops;
-#endif /* _HEAP_HELPERS_H */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 4/7] dma-buf: heaps: Skip sync if not mapped
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
                   ` (2 preceding siblings ...)
  2020-11-10  3:49 ` [PATCH v5 3/7] dma-buf: heaps: Remove heap-helpers code John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-10  3:49 ` [PATCH v5 5/7] dma-buf: system_heap: Allocate higher order pages if available John Stultz
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

This patch is basically a port of Ørjan Eide's similar patch for ION
 https://lore.kernel.org/lkml/20200414134629.54567-1-orjan.eide@arm.com/

Only sync the sg-list of dma-buf heap attachment when the attachment
is actually mapped on the device.

dma-bufs may be synced at any time. It can be reached from user space
via DMA_BUF_IOCTL_SYNC, so there are no guarantees from callers on when
syncs may be attempted, and dma_buf_end_cpu_access() and
dma_buf_begin_cpu_access() may not be paired.

Since the sg_list's dma_address isn't set up until the buffer is used
on the device, and dma_map_sg() is called on it, the dma_address will be
NULL if sync is attempted on the dma-buf before it's mapped on a device.

Before v5.0 (commit 55897af63091 ("dma-direct: merge swiotlb_dma_ops
into the dma_direct code")) this was a problem as the dma-api (at least
the swiotlb_dma_ops on arm64) would use the potentially invalid
dma_address. How that failed depended on how the device handled physical
address 0. If 0 was a valid address to physical ram, that page would get
flushed a lot, while the actual pages in the buffer would not get synced
correctly. While if 0 is an invalid physical address it may cause a
fault and trigger a crash.

In v5.0 this was incidentally fixed by commit 55897af63091 ("dma-direct:
merge swiotlb_dma_ops into the dma_direct code"), as this moved the
dma-api to use the page pointer in the sg_list, and (for Ion buffers at
least) this will always be valid if the sg_list exists at all.

But, this issue is re-introduced in v5.3 with
commit 449fa54d6815 ("dma-direct: correct the physical addr in
dma_direct_sync_sg_for_cpu/device") moves the dma-api back to the old
behaviour and picks the dma_address that may be invalid.

dma-buf core doesn't ensure that the buffer is mapped on the device, and
thus have a valid sg_list, before calling the exporter's
begin_cpu_access.

Logic and commit message originally by: Ørjan Eide <orjan.eide@arm.com>

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/dma-buf/heaps/cma_heap.c    | 10 ++++++++++
 drivers/dma-buf/heaps/system_heap.c | 10 ++++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index 5341e5e226d5..028f1e8d6041 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -43,6 +43,7 @@ struct dma_heap_attachment {
 	struct device *dev;
 	struct sg_table table;
 	struct list_head list;
+	bool mapped;
 };
 
 static int cma_heap_attach(struct dma_buf *dmabuf,
@@ -67,6 +68,7 @@ static int cma_heap_attach(struct dma_buf *dmabuf,
 
 	a->dev = attachment->dev;
 	INIT_LIST_HEAD(&a->list);
+	a->mapped = false;
 
 	attachment->priv = a;
 
@@ -101,6 +103,7 @@ static struct sg_table *cma_heap_map_dma_buf(struct dma_buf_attachment *attachme
 	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
 	if (ret)
 		return ERR_PTR(-ENOMEM);
+	a->mapped = true;
 	return table;
 }
 
@@ -108,6 +111,9 @@ static void cma_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
 				   struct sg_table *table,
 				   enum dma_data_direction direction)
 {
+	struct dma_heap_attachment *a = attachment->priv;
+
+	a->mapped = false;
 	dma_unmap_sgtable(attachment->dev, table, direction, 0);
 }
 
@@ -122,6 +128,8 @@ static int cma_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 
 	mutex_lock(&buffer->lock);
 	list_for_each_entry(a, &buffer->attachments, list) {
+		if (!a->mapped)
+			continue;
 		dma_sync_sgtable_for_cpu(a->dev, &a->table, direction);
 	}
 	mutex_unlock(&buffer->lock);
@@ -140,6 +148,8 @@ static int cma_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 
 	mutex_lock(&buffer->lock);
 	list_for_each_entry(a, &buffer->attachments, list) {
+		if (!a->mapped)
+			continue;
 		dma_sync_sgtable_for_device(a->dev, &a->table, direction);
 	}
 	mutex_unlock(&buffer->lock);
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 5c44f9c06807..15b36bc862b1 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -37,6 +37,7 @@ struct dma_heap_attachment {
 	struct device *dev;
 	struct sg_table *table;
 	struct list_head list;
+	bool mapped;
 };
 
 static struct sg_table *dup_sg_table(struct sg_table *table)
@@ -84,6 +85,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
 	a->table = table;
 	a->dev = attachment->dev;
 	INIT_LIST_HEAD(&a->list);
+	a->mapped = false;
 
 	attachment->priv = a;
 
@@ -120,6 +122,7 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
 	if (ret)
 		return ERR_PTR(ret);
 
+	a->mapped = true;
 	return table;
 }
 
@@ -127,6 +130,9 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
 				      struct sg_table *table,
 				      enum dma_data_direction direction)
 {
+	struct dma_heap_attachment *a = attachment->priv;
+
+	a->mapped = false;
 	dma_unmap_sgtable(attachment->dev, table, direction, 0);
 }
 
@@ -142,6 +148,8 @@ static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 		invalidate_kernel_vmap_range(buffer->vaddr, buffer->len);
 
 	list_for_each_entry(a, &buffer->attachments, list) {
+		if (!a->mapped)
+			continue;
 		dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
 	}
 	mutex_unlock(&buffer->lock);
@@ -161,6 +169,8 @@ static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 		flush_kernel_vmap_range(buffer->vaddr, buffer->len);
 
 	list_for_each_entry(a, &buffer->attachments, list) {
+		if (!a->mapped)
+			continue;
 		dma_sync_sgtable_for_device(a->dev, a->table, direction);
 	}
 	mutex_unlock(&buffer->lock);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 5/7] dma-buf: system_heap: Allocate higher order pages if available
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
                   ` (3 preceding siblings ...)
  2020-11-10  3:49 ` [PATCH v5 4/7] dma-buf: heaps: Skip sync if not mapped John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-10  3:49 ` [PATCH v5 6/7] dma-buf: dma-heap: Keep track of the heap device struct John Stultz
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

While the system heap can return non-contiguous pages,
try to allocate larger order pages if possible.

This will allow slight performance gains and make implementing
page pooling easier.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v3:
* Use page_size() rather then opencoding it
v5:
* Add comment explaining order size rational
---
 drivers/dma-buf/heaps/system_heap.c | 89 +++++++++++++++++++++++------
 1 file changed, 71 insertions(+), 18 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 15b36bc862b1..55367266a47b 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -40,6 +40,20 @@ struct dma_heap_attachment {
 	bool mapped;
 };
 
+#define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
+				| __GFP_NORETRY) & ~__GFP_RECLAIM) \
+				| __GFP_COMP)
+#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
+static gfp_t order_flags[] = {HIGH_ORDER_GFP, LOW_ORDER_GFP, LOW_ORDER_GFP};
+/*
+ * The selection of the orders used for allocation (1MB, 64K, 4K) is designed
+ * to match with the sizes often found in IOMMUs. Using order 4 pages instead
+ * of order 0 pages can significantly improve the performance of many IOMMUs
+ * by reducing TLB pressure and time spent updating page tables.
+ */
+static const unsigned int orders[] = {8, 4, 0};
+#define NUM_ORDERS ARRAY_SIZE(orders)
+
 static struct sg_table *dup_sg_table(struct sg_table *table)
 {
 	struct sg_table *new_table;
@@ -270,8 +284,11 @@ static void system_heap_dma_buf_release(struct dma_buf *dmabuf)
 	int i;
 
 	table = &buffer->sg_table;
-	for_each_sgtable_sg(table, sg, i)
-		__free_page(sg_page(sg));
+	for_each_sg(table->sgl, sg, table->nents, i) {
+		struct page *page = sg_page(sg);
+
+		__free_pages(page, compound_order(page));
+	}
 	sg_free_table(table);
 	kfree(buffer);
 }
@@ -289,6 +306,26 @@ static const struct dma_buf_ops system_heap_buf_ops = {
 	.release = system_heap_dma_buf_release,
 };
 
+static struct page *alloc_largest_available(unsigned long size,
+					    unsigned int max_order)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < NUM_ORDERS; i++) {
+		if (size <  (PAGE_SIZE << orders[i]))
+			continue;
+		if (max_order < orders[i])
+			continue;
+
+		page = alloc_pages(order_flags[i], orders[i]);
+		if (!page)
+			continue;
+		return page;
+	}
+	return NULL;
+}
+
 static int system_heap_allocate(struct dma_heap *heap,
 				unsigned long len,
 				unsigned long fd_flags,
@@ -296,11 +333,13 @@ static int system_heap_allocate(struct dma_heap *heap,
 {
 	struct system_heap_buffer *buffer;
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	unsigned long size_remaining = len;
+	unsigned int max_order = orders[0];
 	struct dma_buf *dmabuf;
 	struct sg_table *table;
 	struct scatterlist *sg;
-	pgoff_t pagecount;
-	pgoff_t pg;
+	struct list_head pages;
+	struct page *page, *tmp_page;
 	int i, ret = -ENOMEM;
 
 	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
@@ -312,25 +351,35 @@ static int system_heap_allocate(struct dma_heap *heap,
 	buffer->heap = heap;
 	buffer->len = len;
 
-	table = &buffer->sg_table;
-	pagecount = len / PAGE_SIZE;
-	if (sg_alloc_table(table, pagecount, GFP_KERNEL))
-		goto free_buffer;
-
-	sg = table->sgl;
-	for (pg = 0; pg < pagecount; pg++) {
-		struct page *page;
+	INIT_LIST_HEAD(&pages);
+	i = 0;
+	while (size_remaining > 0) {
 		/*
 		 * Avoid trying to allocate memory if the process
 		 * has been killed by SIGKILL
 		 */
 		if (fatal_signal_pending(current))
-			goto free_pages;
-		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+			goto free_buffer;
+
+		page = alloc_largest_available(size_remaining, max_order);
 		if (!page)
-			goto free_pages;
+			goto free_buffer;
+
+		list_add_tail(&page->lru, &pages);
+		size_remaining -= page_size(page);
+		max_order = compound_order(page);
+		i++;
+	}
+
+	table = &buffer->sg_table;
+	if (sg_alloc_table(table, i, GFP_KERNEL))
+		goto free_buffer;
+
+	sg = table->sgl;
+	list_for_each_entry_safe(page, tmp_page, &pages, lru) {
 		sg_set_page(sg, page, page_size(page), 0);
 		sg = sg_next(sg);
+		list_del(&page->lru);
 	}
 
 	/* create the dmabuf */
@@ -350,14 +399,18 @@ static int system_heap_allocate(struct dma_heap *heap,
 		/* just return, as put will call release and that will free */
 		return ret;
 	}
-
 	return ret;
 
 free_pages:
-	for_each_sgtable_sg(table, sg, i)
-		__free_page(sg_page(sg));
+	for_each_sgtable_sg(table, sg, i) {
+		struct page *p = sg_page(sg);
+
+		__free_pages(p, compound_order(p));
+	}
 	sg_free_table(table);
 free_buffer:
+	list_for_each_entry_safe(page, tmp_page, &pages, lru)
+		__free_pages(page, compound_order(page));
 	kfree(buffer);
 
 	return ret;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 6/7] dma-buf: dma-heap: Keep track of the heap device struct
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
                   ` (4 preceding siblings ...)
  2020-11-10  3:49 ` [PATCH v5 5/7] dma-buf: system_heap: Allocate higher order pages if available John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-10  3:49 ` [PATCH v5 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap John Stultz
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

Keep track of the heap device struct.

This will be useful for special DMA allocations
and actions.

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 drivers/dma-buf/dma-heap.c | 33 +++++++++++++++++++++++++--------
 include/linux/dma-heap.h   |  9 +++++++++
 2 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index afd22c9dbdcf..72c746755d89 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -30,6 +30,7 @@
  * @heap_devt		heap device node
  * @list		list head connecting to list of heaps
  * @heap_cdev		heap char device
+ * @heap_dev		heap device struct
  *
  * Represents a heap of memory from which buffers can be made.
  */
@@ -40,6 +41,7 @@ struct dma_heap {
 	dev_t heap_devt;
 	struct list_head list;
 	struct cdev heap_cdev;
+	struct device *heap_dev;
 };
 
 static LIST_HEAD(heap_list);
@@ -190,10 +192,21 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
 	return heap->priv;
 }
 
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap)
+{
+	return heap->heap_dev;
+}
+
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
 	struct dma_heap *heap, *h, *err_ret;
-	struct device *dev_ret;
 	unsigned int minor;
 	int ret;
 
@@ -247,16 +260,20 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 		goto err1;
 	}
 
-	dev_ret = device_create(dma_heap_class,
-				NULL,
-				heap->heap_devt,
-				NULL,
-				heap->name);
-	if (IS_ERR(dev_ret)) {
+	heap->heap_dev = device_create(dma_heap_class,
+				       NULL,
+				       heap->heap_devt,
+				       NULL,
+				       heap->name);
+	if (IS_ERR(heap->heap_dev)) {
 		pr_err("dma_heap: Unable to create device\n");
-		err_ret = ERR_CAST(dev_ret);
+		err_ret = ERR_CAST(heap->heap_dev);
 		goto err2;
 	}
+
+	/* Make sure it doesn't disappear on us */
+	heap->heap_dev = get_device(heap->heap_dev);
+
 	/* Add heap to the list */
 	mutex_lock(&heap_list_lock);
 	list_add(&heap->list, &heap_list);
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 454e354d1ffb..82857e096910 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -50,6 +50,15 @@ struct dma_heap_export_info {
  */
 void *dma_heap_get_drvdata(struct dma_heap *heap);
 
+/**
+ * dma_heap_get_dev() - get device struct for the heap
+ * @heap: DMA-Heap to retrieve device struct from
+ *
+ * Returns:
+ * The device struct for the heap.
+ */
+struct device *dma_heap_get_dev(struct dma_heap *heap);
+
 /**
  * dma_heap_add - adds a heap to dmabuf heaps
  * @exp_info:		information needed to register this heap
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v5 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
                   ` (5 preceding siblings ...)
  2020-11-10  3:49 ` [PATCH v5 6/7] dma-buf: dma-heap: Keep track of the heap device struct John Stultz
@ 2020-11-10  3:49 ` John Stultz
  2020-11-12  5:39 ` [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation Sumit Semwal
       [not found] ` <CAF2Aj3iEUkBDyyWDT63iT_7KrquOcEo_L5rCteGF1OJg8Ux3ug@mail.gmail.com>
  8 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2020-11-10  3:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Sumit Semwal, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	linux-media, dri-devel

This adds a heap that allocates non-contiguous buffers that are
marked as writecombined, so they are not cached by the CPU.

This is useful, as most graphics buffers are usually not touched
by the CPU or only written into once by the CPU. So when mapping
the buffer over and over between devices, we can skip the CPU
syncing, which saves a lot of cache management overhead, greatly
improving performance.

For folk using ION, there was a ION_FLAG_CACHED flag, which
signaled if the returned buffer should be CPU cacheable or not.
With DMA-BUF heaps, we do not yet have such a flag, and by default
the current heaps (system and cma) produce CPU cachable buffers.
So for folks transitioning from ION to DMA-BUF Heaps, this fills
in some of that missing functionality.

There has been a suggestion to make this functionality a flag
(DMAHEAP_FLAG_UNCACHED?) on the system heap, similar to how
ION used the ION_FLAG_CACHED. But I want to make sure an
_UNCACHED flag would truely be a generic attribute across all
heaps. So far that has been unclear, so having it as a separate
heap seemes better for now. (But I'm open to discussion on this
point!)

This is a rework of earlier efforts to add a uncached system heap,
done utilizing the exisitng system heap, adding just a bit of
logic to handle the uncached case.

Feedback would be very welcome!

Many thanks to Liam Mark for his help to get this working.

Pending opensource users of this code include:
* AOSP HiKey960 gralloc:
  - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
  - Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
  - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Liam Mark <lmark@codeaurora.org>
Cc: Laura Abbott <labbott@kernel.org>
Cc: Brian Starkey <Brian.Starkey@arm.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
Cc: Ørjan Eide <orjan.eide@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Ezequiel Garcia <ezequiel@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: James Jones <jajones@nvidia.com>
Cc: linux-media@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
v4:
* Make sys_uncached_heap static, as
    Reported-by: kernel test robot <lkp@intel.com>
* Fix wrong return value, caught by smatch
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
* Ensure we call flush/invalidate_kernel_vmap_range() in the
  uncached cases to try to address feedback about VIVT caches
  from Christoph
* Reorder a few lines as suggested by BrianS
* Avoid holding the initial mapping for the lifetime of the buffer
  as suggested by BrianS
* Fix a unlikely race between allocate and updating the dma_mask
  that BrianS noticed.
---
 drivers/dma-buf/heaps/system_heap.c | 111 ++++++++++++++++++++++++----
 1 file changed, 95 insertions(+), 16 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 55367266a47b..1fc5b38cbc59 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 
 static struct dma_heap *sys_heap;
+static struct dma_heap *sys_uncached_heap;
 
 struct system_heap_buffer {
 	struct dma_heap *heap;
@@ -31,6 +32,8 @@ struct system_heap_buffer {
 	struct sg_table sg_table;
 	int vmap_cnt;
 	void *vaddr;
+
+	bool uncached;
 };
 
 struct dma_heap_attachment {
@@ -38,6 +41,8 @@ struct dma_heap_attachment {
 	struct sg_table *table;
 	struct list_head list;
 	bool mapped;
+
+	bool uncached;
 };
 
 #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
@@ -100,7 +105,7 @@ static int system_heap_attach(struct dma_buf *dmabuf,
 	a->dev = attachment->dev;
 	INIT_LIST_HEAD(&a->list);
 	a->mapped = false;
-
+	a->uncached = buffer->uncached;
 	attachment->priv = a;
 
 	mutex_lock(&buffer->lock);
@@ -130,9 +135,13 @@ static struct sg_table *system_heap_map_dma_buf(struct dma_buf_attachment *attac
 {
 	struct dma_heap_attachment *a = attachment->priv;
 	struct sg_table *table = a->table;
+	int attr = 0;
 	int ret;
 
-	ret = dma_map_sgtable(attachment->dev, table, direction, 0);
+	if (a->uncached)
+		attr = DMA_ATTR_SKIP_CPU_SYNC;
+
+	ret = dma_map_sgtable(attachment->dev, table, direction, attr);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -145,9 +154,12 @@ static void system_heap_unmap_dma_buf(struct dma_buf_attachment *attachment,
 				      enum dma_data_direction direction)
 {
 	struct dma_heap_attachment *a = attachment->priv;
+	int attr = 0;
 
+	if (a->uncached)
+		attr = DMA_ATTR_SKIP_CPU_SYNC;
 	a->mapped = false;
-	dma_unmap_sgtable(attachment->dev, table, direction, 0);
+	dma_unmap_sgtable(attachment->dev, table, direction, attr);
 }
 
 static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -161,10 +173,12 @@ static int system_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 	if (buffer->vmap_cnt)
 		invalidate_kernel_vmap_range(buffer->vaddr, buffer->len);
 
-	list_for_each_entry(a, &buffer->attachments, list) {
-		if (!a->mapped)
-			continue;
-		dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
+	if (!buffer->uncached) {
+		list_for_each_entry(a, &buffer->attachments, list) {
+			if (!a->mapped)
+				continue;
+			dma_sync_sgtable_for_cpu(a->dev, a->table, direction);
+		}
 	}
 	mutex_unlock(&buffer->lock);
 
@@ -182,10 +196,12 @@ static int system_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 	if (buffer->vmap_cnt)
 		flush_kernel_vmap_range(buffer->vaddr, buffer->len);
 
-	list_for_each_entry(a, &buffer->attachments, list) {
-		if (!a->mapped)
-			continue;
-		dma_sync_sgtable_for_device(a->dev, a->table, direction);
+	if (!buffer->uncached) {
+		list_for_each_entry(a, &buffer->attachments, list) {
+			if (!a->mapped)
+				continue;
+			dma_sync_sgtable_for_device(a->dev, a->table, direction);
+		}
 	}
 	mutex_unlock(&buffer->lock);
 
@@ -200,6 +216,9 @@ static int system_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
 	struct sg_page_iter piter;
 	int ret;
 
+	if (buffer->uncached)
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
 	for_each_sgtable_page(table, &piter, vma->vm_pgoff) {
 		struct page *page = sg_page_iter_page(&piter);
 
@@ -221,17 +240,21 @@ static void *system_heap_do_vmap(struct system_heap_buffer *buffer)
 	struct page **pages = vmalloc(sizeof(struct page *) * npages);
 	struct page **tmp = pages;
 	struct sg_page_iter piter;
+	pgprot_t pgprot = PAGE_KERNEL;
 	void *vaddr;
 
 	if (!pages)
 		return ERR_PTR(-ENOMEM);
 
+	if (buffer->uncached)
+		pgprot = pgprot_writecombine(PAGE_KERNEL);
+
 	for_each_sgtable_page(table, &piter, 0) {
 		WARN_ON(tmp - pages >= npages);
 		*tmp++ = sg_page_iter_page(&piter);
 	}
 
-	vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+	vaddr = vmap(pages, npages, VM_MAP, pgprot);
 	vfree(pages);
 
 	if (!vaddr)
@@ -326,10 +349,11 @@ static struct page *alloc_largest_available(unsigned long size,
 	return NULL;
 }
 
-static int system_heap_allocate(struct dma_heap *heap,
-				unsigned long len,
-				unsigned long fd_flags,
-				unsigned long heap_flags)
+static int system_heap_do_allocate(struct dma_heap *heap,
+				   unsigned long len,
+				   unsigned long fd_flags,
+				   unsigned long heap_flags,
+				   bool uncached)
 {
 	struct system_heap_buffer *buffer;
 	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
@@ -350,6 +374,7 @@ static int system_heap_allocate(struct dma_heap *heap,
 	mutex_init(&buffer->lock);
 	buffer->heap = heap;
 	buffer->len = len;
+	buffer->uncached = uncached;
 
 	INIT_LIST_HEAD(&pages);
 	i = 0;
@@ -399,6 +424,18 @@ static int system_heap_allocate(struct dma_heap *heap,
 		/* just return, as put will call release and that will free */
 		return ret;
 	}
+
+	/*
+	 * For uncached buffers, we need to initially flush cpu cache, since
+	 * the __GFP_ZERO on the allocation means the zeroing was done by the
+	 * cpu and thus it is likely cached. Map (and implicitly flush) and
+	 * unmap it now so we don't get corruption later on.
+	 */
+	if (buffer->uncached) {
+		dma_map_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
+		dma_unmap_sgtable(dma_heap_get_dev(heap), table, DMA_BIDIRECTIONAL, 0);
+	}
+
 	return ret;
 
 free_pages:
@@ -416,10 +453,40 @@ static int system_heap_allocate(struct dma_heap *heap,
 	return ret;
 }
 
+static int system_heap_allocate(struct dma_heap *heap,
+				unsigned long len,
+				unsigned long fd_flags,
+				unsigned long heap_flags)
+{
+	return system_heap_do_allocate(heap, len, fd_flags, heap_flags, false);
+}
+
 static const struct dma_heap_ops system_heap_ops = {
 	.allocate = system_heap_allocate,
 };
 
+static int system_uncached_heap_allocate(struct dma_heap *heap,
+					 unsigned long len,
+					 unsigned long fd_flags,
+					 unsigned long heap_flags)
+{
+	return system_heap_do_allocate(heap, len, fd_flags, heap_flags, true);
+}
+
+/* Dummy function to be used until we can call coerce_mask_and_coherent */
+static int system_uncached_heap_not_initialized(struct dma_heap *heap,
+						unsigned long len,
+						unsigned long fd_flags,
+						unsigned long heap_flags)
+{
+	return -EBUSY;
+}
+
+static struct dma_heap_ops system_uncached_heap_ops = {
+	/* After system_heap_create is complete, we will swap this */
+	.allocate = system_uncached_heap_not_initialized,
+};
+
 static int system_heap_create(void)
 {
 	struct dma_heap_export_info exp_info;
@@ -432,6 +499,18 @@ static int system_heap_create(void)
 	if (IS_ERR(sys_heap))
 		return PTR_ERR(sys_heap);
 
+	exp_info.name = "system-uncached";
+	exp_info.ops = &system_uncached_heap_ops;
+	exp_info.priv = NULL;
+
+	sys_uncached_heap = dma_heap_add(&exp_info);
+	if (IS_ERR(sys_uncached_heap))
+		return PTR_ERR(sys_uncached_heap);
+
+	dma_coerce_mask_and_coherent(dma_heap_get_dev(sys_uncached_heap), DMA_BIT_MASK(64));
+	mb(); /* make sure we only set allocate after dma_mask is set */
+	system_uncached_heap_ops.allocate = system_uncached_heap_allocate;
+
 	return 0;
 }
 module_init(system_heap_create);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
                   ` (6 preceding siblings ...)
  2020-11-10  3:49 ` [PATCH v5 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap John Stultz
@ 2020-11-12  5:39 ` Sumit Semwal
  2020-11-12  9:32   ` Daniel Vetter
       [not found] ` <CAF2Aj3iEUkBDyyWDT63iT_7KrquOcEo_L5rCteGF1OJg8Ux3ug@mail.gmail.com>
  8 siblings, 1 reply; 18+ messages in thread
From: Sumit Semwal @ 2020-11-12  5:39 UTC (permalink / raw)
  To: John Stultz, Daniel Vetter, Christian Koenig
  Cc: lkml, Liam Mark, Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Suren Baghdasaryan, Sandeep Patil, Daniel Mentz,
	Chris Goldsworthy, Ørjan Eide, Robin Murphy,
	Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list

Hi John,

On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
>
> Hey All,
>   So just wanted to send my last revision of my patch series
> of performance optimizations to the dma-buf system heap.

Thanks very much for your patches - I think the first 5 patches look good to me.

I know there was a bit of discussion over adding a new system-uncached
heap v/s using a flag to identify that; I think I prefer the separate
heap idea, but lets ask one last time if any one else has any real
objections to it.

Daniel, Christian: any comments from your side on this?

I am planning to merge this series to drm-misc this week if I hear no
objections.
>
> This series reworks the system heap to use sgtables, and then
> consolidates the pagelist method from the heap-helpers into the
> CMA heap. After which the heap-helpers logic is removed (as it
> is unused). I'd still like to find a better way to avoid some of
> the logic duplication in implementing the entire dma_buf_ops
> handlers per heap. But unfortunately that code is tied somewhat
> to how the buffer's memory is tracked. As more heaps show up I
> think we'll have a better idea how to best share code, so for
> now I think this is ok.
>
> After this, the series introduces an optimization that
> Ørjan Eide implemented for ION that avoids calling sync on
> attachments that don't have a mapping.
>
> Next, an optimization to use larger order pages for the system
> heap. This change brings us closer to the current performance
> of the ION allocation code (though there still is a gap due
> to ION using a mix of deferred-freeing and page pools, I'll be
> looking at integrating those eventually).
>
> Finally, a reworked version of my uncached system heap
> implementation I was submitting a few weeks back. Since it
> duplicated a lot of the now reworked system heap code, I
> realized it would be much simpler to add the functionality to
> the system_heap implementation itself.
>
> While not improving the core allocation performance, the
> uncached heap allocations do result in *much* improved
> performance on HiKey960 as it avoids a lot of flushing and
> invalidating buffers that the cpu doesn't touch often.
>
> Feedback on these would be great!
>
> thanks
> -john
>
> New in v5:
> * Added a comment explaining why the order sizes are
>   chosen as they are
>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: Liam Mark <lmark@codeaurora.org>
> Cc: Laura Abbott <labbott@kernel.org>
> Cc: Brian Starkey <Brian.Starkey@arm.com>
> Cc: Hridya Valsaraju <hridya@google.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Sandeep Patil <sspatil@google.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
> Cc: Ørjan Eide <orjan.eide@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Cc: Ezequiel Garcia <ezequiel@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: James Jones <jajones@nvidia.com>
> Cc: linux-media@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
>
> John Stultz (7):
>   dma-buf: system_heap: Rework system heap to use sgtables instead of
>     pagelists
>   dma-buf: heaps: Move heap-helper logic into the cma_heap
>     implementation
>   dma-buf: heaps: Remove heap-helpers code
>   dma-buf: heaps: Skip sync if not mapped
>   dma-buf: system_heap: Allocate higher order pages if available
>   dma-buf: dma-heap: Keep track of the heap device struct
>   dma-buf: system_heap: Add a system-uncached heap re-using the system
>     heap
>
>  drivers/dma-buf/dma-heap.c           |  33 +-
>  drivers/dma-buf/heaps/Makefile       |   1 -
>  drivers/dma-buf/heaps/cma_heap.c     | 324 +++++++++++++++---
>  drivers/dma-buf/heaps/heap-helpers.c | 270 ---------------
>  drivers/dma-buf/heaps/heap-helpers.h |  53 ---
>  drivers/dma-buf/heaps/system_heap.c  | 494 ++++++++++++++++++++++++---
>  include/linux/dma-heap.h             |   9 +
>  7 files changed, 753 insertions(+), 431 deletions(-)
>  delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
>  delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h
>
> --
> 2.17.1
>
Thanks much,

Best,
Sumit.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-12  5:39 ` [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation Sumit Semwal
@ 2020-11-12  9:32   ` Daniel Vetter
  2020-11-13  4:11     ` John Stultz
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2020-11-12  9:32 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: John Stultz, Daniel Vetter, Christian Koenig, lkml, Liam Mark,
	Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Suren Baghdasaryan, Sandeep Patil, Daniel Mentz,
	Chris Goldsworthy, Ørjan Eide, Robin Murphy,
	Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list

On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> Hi John,
> 
> On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> >
> > Hey All,
> >   So just wanted to send my last revision of my patch series
> > of performance optimizations to the dma-buf system heap.
> 
> Thanks very much for your patches - I think the first 5 patches look good to me.
> 
> I know there was a bit of discussion over adding a new system-uncached
> heap v/s using a flag to identify that; I think I prefer the separate
> heap idea, but lets ask one last time if any one else has any real
> objections to it.
> 
> Daniel, Christian: any comments from your side on this?

I do wonder a bit where the userspace stack for this all is, since tuning
allocators without a full stack is fairly pointless. dma-buf heaps is a
bit in a limbo situation here it feels like.

Plus I'm vary of anything related to leaking this kind of stuff beyond the
dma-api because dma api maintainers don't like us doing that. But
personally no concern on that front really, gpus need this. It's just that
we do need solid justification I think if we land this. Hence back to
first point.

Ideally first point comes in the form of benchmarking on android together
with a mesa driver (or mesa + some v4l driver or whatever it takes to
actually show the benefits, I have no idea).
-Daniel

> 
> I am planning to merge this series to drm-misc this week if I hear no
> objections.
> >
> > This series reworks the system heap to use sgtables, and then
> > consolidates the pagelist method from the heap-helpers into the
> > CMA heap. After which the heap-helpers logic is removed (as it
> > is unused). I'd still like to find a better way to avoid some of
> > the logic duplication in implementing the entire dma_buf_ops
> > handlers per heap. But unfortunately that code is tied somewhat
> > to how the buffer's memory is tracked. As more heaps show up I
> > think we'll have a better idea how to best share code, so for
> > now I think this is ok.
> >
> > After this, the series introduces an optimization that
> > Ørjan Eide implemented for ION that avoids calling sync on
> > attachments that don't have a mapping.
> >
> > Next, an optimization to use larger order pages for the system
> > heap. This change brings us closer to the current performance
> > of the ION allocation code (though there still is a gap due
> > to ION using a mix of deferred-freeing and page pools, I'll be
> > looking at integrating those eventually).
> >
> > Finally, a reworked version of my uncached system heap
> > implementation I was submitting a few weeks back. Since it
> > duplicated a lot of the now reworked system heap code, I
> > realized it would be much simpler to add the functionality to
> > the system_heap implementation itself.
> >
> > While not improving the core allocation performance, the
> > uncached heap allocations do result in *much* improved
> > performance on HiKey960 as it avoids a lot of flushing and
> > invalidating buffers that the cpu doesn't touch often.
> >
> > Feedback on these would be great!
> >
> > thanks
> > -john
> >
> > New in v5:
> > * Added a comment explaining why the order sizes are
> >   chosen as they are
> >
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>
> > Cc: Liam Mark <lmark@codeaurora.org>
> > Cc: Laura Abbott <labbott@kernel.org>
> > Cc: Brian Starkey <Brian.Starkey@arm.com>
> > Cc: Hridya Valsaraju <hridya@google.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Sandeep Patil <sspatil@google.com>
> > Cc: Daniel Mentz <danielmentz@google.com>
> > Cc: Chris Goldsworthy <cgoldswo@codeaurora.org>
> > Cc: Ørjan Eide <orjan.eide@arm.com>
> > Cc: Robin Murphy <robin.murphy@arm.com>
> > Cc: Ezequiel Garcia <ezequiel@collabora.com>
> > Cc: Simon Ser <contact@emersion.fr>
> > Cc: James Jones <jajones@nvidia.com>
> > Cc: linux-media@vger.kernel.org
> > Cc: dri-devel@lists.freedesktop.org
> >
> > John Stultz (7):
> >   dma-buf: system_heap: Rework system heap to use sgtables instead of
> >     pagelists
> >   dma-buf: heaps: Move heap-helper logic into the cma_heap
> >     implementation
> >   dma-buf: heaps: Remove heap-helpers code
> >   dma-buf: heaps: Skip sync if not mapped
> >   dma-buf: system_heap: Allocate higher order pages if available
> >   dma-buf: dma-heap: Keep track of the heap device struct
> >   dma-buf: system_heap: Add a system-uncached heap re-using the system
> >     heap
> >
> >  drivers/dma-buf/dma-heap.c           |  33 +-
> >  drivers/dma-buf/heaps/Makefile       |   1 -
> >  drivers/dma-buf/heaps/cma_heap.c     | 324 +++++++++++++++---
> >  drivers/dma-buf/heaps/heap-helpers.c | 270 ---------------
> >  drivers/dma-buf/heaps/heap-helpers.h |  53 ---
> >  drivers/dma-buf/heaps/system_heap.c  | 494 ++++++++++++++++++++++++---
> >  include/linux/dma-heap.h             |   9 +
> >  7 files changed, 753 insertions(+), 431 deletions(-)
> >  delete mode 100644 drivers/dma-buf/heaps/heap-helpers.c
> >  delete mode 100644 drivers/dma-buf/heaps/heap-helpers.h
> >
> > --
> > 2.17.1
> >
> Thanks much,
> 
> Best,
> Sumit.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-12  9:32   ` Daniel Vetter
@ 2020-11-13  4:11     ` John Stultz
  2020-11-13 20:39       ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: John Stultz @ 2020-11-13  4:11 UTC (permalink / raw)
  To: Sumit Semwal, John Stultz, Christian Koenig, lkml, Liam Mark,
	Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Suren Baghdasaryan, Sandeep Patil, Daniel Mentz,
	Chris Goldsworthy, Ørjan Eide, Robin Murphy,
	Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list
  Cc: Daniel Vetter

On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> > >
> > > Hey All,
> > >   So just wanted to send my last revision of my patch series
> > > of performance optimizations to the dma-buf system heap.
> >
> > Thanks very much for your patches - I think the first 5 patches look good to me.
> >
> > I know there was a bit of discussion over adding a new system-uncached
> > heap v/s using a flag to identify that; I think I prefer the separate
> > heap idea, but lets ask one last time if any one else has any real
> > objections to it.
> >
> > Daniel, Christian: any comments from your side on this?
>
> I do wonder a bit where the userspace stack for this all is, since tuning
> allocators without a full stack is fairly pointless. dma-buf heaps is a
> bit in a limbo situation here it feels like.

As mentioned in the system-uncached patch:
Pending opensource users of this code include:
* AOSP HiKey960 gralloc:
  - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
  - Visibly improves performance over the system heap
* AOSP Codec2 (possibly, needs more review):
  - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325

Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
able to use the current dmabuf heaps instead of ION.

So I'm not sure what you mean by limbo, other than it being in a
transition state where the interface is upstream and we're working on
moving vendors to it from ION (which is staged to be dropped in 5.11).
Part of that work is making sure we don't regress the performance
expectations.

> Plus I'm vary of anything related to leaking this kind of stuff beyond the
> dma-api because dma api maintainers don't like us doing that. But
> personally no concern on that front really, gpus need this. It's just that
> we do need solid justification I think if we land this. Hence back to
> first point.
>
> Ideally first point comes in the form of benchmarking on android together
> with a mesa driver (or mesa + some v4l driver or whatever it takes to
> actually show the benefits, I have no idea).

Tying it with mesa is a little tough as the grallocs for mesa devices
usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
allocation path for dmabuf heaps there gets a little complex as last I
tried that (when trying to get HiKey working with Lima graphics, as
gbm wouldn't allocate the contiguous buffers required by the display),
I ran into issues with the drm_hwcomposer and mesa expecting the gbm
private handle metadata in the buffer when it was passed in.

But I might take a look at it again. I got a bit lost digging through
the mesa gbm allocation paths last time.

I'll also try to see if I can find a benchmark for the codec2 code
(using dmabuf heaps with and without the uncached heap) on on db845c
(w/ mesa), as that is already working and I suspect that might be
close to what you're looking for.

thanks
-john

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-13  4:11     ` John Stultz
@ 2020-11-13 20:39       ` Daniel Vetter
  2020-11-18  2:40         ` John Stultz
  2020-12-07 22:01         ` Nicolas Dufresne
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Vetter @ 2020-11-13 20:39 UTC (permalink / raw)
  To: John Stultz
  Cc: Sumit Semwal, Christian Koenig, lkml, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list,
	Daniel Vetter

On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> > > >
> > > > Hey All,
> > > >   So just wanted to send my last revision of my patch series
> > > > of performance optimizations to the dma-buf system heap.
> > >
> > > Thanks very much for your patches - I think the first 5 patches look good to me.
> > >
> > > I know there was a bit of discussion over adding a new system-uncached
> > > heap v/s using a flag to identify that; I think I prefer the separate
> > > heap idea, but lets ask one last time if any one else has any real
> > > objections to it.
> > >
> > > Daniel, Christian: any comments from your side on this?
> >
> > I do wonder a bit where the userspace stack for this all is, since tuning
> > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > bit in a limbo situation here it feels like.
> 
> As mentioned in the system-uncached patch:
> Pending opensource users of this code include:
> * AOSP HiKey960 gralloc:
>   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
>   - Visibly improves performance over the system heap
> * AOSP Codec2 (possibly, needs more review):
>   - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> 
> Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> able to use the current dmabuf heaps instead of ION.
> 
> So I'm not sure what you mean by limbo, other than it being in a
> transition state where the interface is upstream and we're working on
> moving vendors to it from ION (which is staged to be dropped in 5.11).
> Part of that work is making sure we don't regress the performance
> expectations.

The mesa thing below, since if we test this with some downstream kernel
drivers or at least non-mesa userspace I'm somewhat worried we're just
creating a nice split world between the android gfx world and the
mesa/linux desktop gfx world.

But then that's kinda how android rolls, so *shrug*

> > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > dma-api because dma api maintainers don't like us doing that. But
> > personally no concern on that front really, gpus need this. It's just that
> > we do need solid justification I think if we land this. Hence back to
> > first point.
> >
> > Ideally first point comes in the form of benchmarking on android together
> > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > actually show the benefits, I have no idea).
> 
> Tying it with mesa is a little tough as the grallocs for mesa devices
> usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> allocation path for dmabuf heaps there gets a little complex as last I
> tried that (when trying to get HiKey working with Lima graphics, as
> gbm wouldn't allocate the contiguous buffers required by the display),
> I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> private handle metadata in the buffer when it was passed in.
> 
> But I might take a look at it again. I got a bit lost digging through
> the mesa gbm allocation paths last time.
> 
> I'll also try to see if I can find a benchmark for the codec2 code
> (using dmabuf heaps with and without the uncached heap) on on db845c
> (w/ mesa), as that is already working and I suspect that might be
> close to what you're looking for.

tbh I think trying to push for this long term is the best we can hope for.

Media is also a lot more *meh* since it's deeply fragmented and a lot less
of it upstream than on the gles/display side.

I think confirming that this at least doesn't horrible blow up on a
gralloc/gbm+mesa stack would be useful I think.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-13 20:39       ` Daniel Vetter
@ 2020-11-18  2:40         ` John Stultz
  2020-11-18  7:46           ` Daniel Vetter
  2020-12-07 22:01         ` Nicolas Dufresne
  1 sibling, 1 reply; 18+ messages in thread
From: John Stultz @ 2020-11-18  2:40 UTC (permalink / raw)
  To: John Stultz, Sumit Semwal, Christian Koenig, lkml, Liam Mark,
	Laura Abbott, Brian Starkey, Hridya Valsaraju,
	Suren Baghdasaryan, Sandeep Patil, Daniel Mentz,
	Chris Goldsworthy, Ørjan Eide, Robin Murphy,
	Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list
  Cc: Daniel Vetter

On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> > > > >
> > > > > Hey All,
> > > > >   So just wanted to send my last revision of my patch series
> > > > > of performance optimizations to the dma-buf system heap.
> > > >
> > > > Thanks very much for your patches - I think the first 5 patches look good to me.
> > > >
> > > > I know there was a bit of discussion over adding a new system-uncached
> > > > heap v/s using a flag to identify that; I think I prefer the separate
> > > > heap idea, but lets ask one last time if any one else has any real
> > > > objections to it.
> > > >
> > > > Daniel, Christian: any comments from your side on this?
> > >
> > > I do wonder a bit where the userspace stack for this all is, since tuning
> > > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > > bit in a limbo situation here it feels like.
> >
> > As mentioned in the system-uncached patch:
> > Pending opensource users of this code include:
> > * AOSP HiKey960 gralloc:
> >   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> >   - Visibly improves performance over the system heap
> > * AOSP Codec2 (possibly, needs more review):
> >   - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> >
> > Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> > able to use the current dmabuf heaps instead of ION.
> >
> > So I'm not sure what you mean by limbo, other than it being in a
> > transition state where the interface is upstream and we're working on
> > moving vendors to it from ION (which is staged to be dropped in 5.11).
> > Part of that work is making sure we don't regress the performance
> > expectations.
>
> The mesa thing below, since if we test this with some downstream kernel
> drivers or at least non-mesa userspace I'm somewhat worried we're just
> creating a nice split world between the android gfx world and the
> mesa/linux desktop gfx world.
>
> But then that's kinda how android rolls, so *shrug*
>
> > > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > > dma-api because dma api maintainers don't like us doing that. But
> > > personally no concern on that front really, gpus need this. It's just that
> > > we do need solid justification I think if we land this. Hence back to
> > > first point.
> > >
> > > Ideally first point comes in the form of benchmarking on android together
> > > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > > actually show the benefits, I have no idea).
> >
> > Tying it with mesa is a little tough as the grallocs for mesa devices
> > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> > allocation path for dmabuf heaps there gets a little complex as last I
> > tried that (when trying to get HiKey working with Lima graphics, as
> > gbm wouldn't allocate the contiguous buffers required by the display),
> > I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> > private handle metadata in the buffer when it was passed in.
> >
> > But I might take a look at it again. I got a bit lost digging through
> > the mesa gbm allocation paths last time.
> >
> > I'll also try to see if I can find a benchmark for the codec2 code
> > (using dmabuf heaps with and without the uncached heap) on on db845c
> > (w/ mesa), as that is already working and I suspect that might be
> > close to what you're looking for.
>
> tbh I think trying to push for this long term is the best we can hope for.
>
> Media is also a lot more *meh* since it's deeply fragmented and a lot less
> of it upstream than on the gles/display side.
>
> I think confirming that this at least doesn't horrible blow up on a
> gralloc/gbm+mesa stack would be useful I think.

Sorry, I'm still a little foggy on precisely what you're suggesting here.

The patch stack I have has already been used with db845c (mesa +
gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps.
So no blowing up there. And I'm working with Hridya to find a
benchmark for codec2 so we can try to show the performance delta.

However, if you're wanting a dma-buf gralloc implementation with mesa,
that may be a little tougher to do, but I guess I can give it a go.

Hopefully this will address concerns about the system-uncached heap
patch (the last two patches in this series)?

In the meantime I hope we can queue the first five patches, as it
would be nice to get the code rearranging in as there are others
trying to stage their own heaps, and I'd like to avoid dragging that
churn out for too long (in addition to improving the allocation
performance). Those changes have no ABI implications.

thanks
-john

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-18  2:40         ` John Stultz
@ 2020-11-18  7:46           ` Daniel Vetter
  2020-11-20  6:32             ` Sumit Semwal
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Vetter @ 2020-11-18  7:46 UTC (permalink / raw)
  To: John Stultz
  Cc: Sumit Semwal, Christian Koenig, lkml, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list

On Wed, Nov 18, 2020 at 3:40 AM John Stultz <john.stultz@linaro.org> wrote:
> On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> > > > > >
> > > > > > Hey All,
> > > > > >   So just wanted to send my last revision of my patch series
> > > > > > of performance optimizations to the dma-buf system heap.
> > > > >
> > > > > Thanks very much for your patches - I think the first 5 patches look good to me.
> > > > >
> > > > > I know there was a bit of discussion over adding a new system-uncached
> > > > > heap v/s using a flag to identify that; I think I prefer the separate
> > > > > heap idea, but lets ask one last time if any one else has any real
> > > > > objections to it.
> > > > >
> > > > > Daniel, Christian: any comments from your side on this?
> > > >
> > > > I do wonder a bit where the userspace stack for this all is, since tuning
> > > > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > > > bit in a limbo situation here it feels like.
> > >
> > > As mentioned in the system-uncached patch:
> > > Pending opensource users of this code include:
> > > * AOSP HiKey960 gralloc:
> > >   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> > >   - Visibly improves performance over the system heap
> > > * AOSP Codec2 (possibly, needs more review):
> > >   - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> > >
> > > Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> > > able to use the current dmabuf heaps instead of ION.
> > >
> > > So I'm not sure what you mean by limbo, other than it being in a
> > > transition state where the interface is upstream and we're working on
> > > moving vendors to it from ION (which is staged to be dropped in 5.11).
> > > Part of that work is making sure we don't regress the performance
> > > expectations.
> >
> > The mesa thing below, since if we test this with some downstream kernel
> > drivers or at least non-mesa userspace I'm somewhat worried we're just
> > creating a nice split world between the android gfx world and the
> > mesa/linux desktop gfx world.
> >
> > But then that's kinda how android rolls, so *shrug*
> >
> > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > > > dma-api because dma api maintainers don't like us doing that. But
> > > > personally no concern on that front really, gpus need this. It's just that
> > > > we do need solid justification I think if we land this. Hence back to
> > > > first point.
> > > >
> > > > Ideally first point comes in the form of benchmarking on android together
> > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > > > actually show the benefits, I have no idea).
> > >
> > > Tying it with mesa is a little tough as the grallocs for mesa devices
> > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> > > allocation path for dmabuf heaps there gets a little complex as last I
> > > tried that (when trying to get HiKey working with Lima graphics, as
> > > gbm wouldn't allocate the contiguous buffers required by the display),
> > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> > > private handle metadata in the buffer when it was passed in.
> > >
> > > But I might take a look at it again. I got a bit lost digging through
> > > the mesa gbm allocation paths last time.
> > >
> > > I'll also try to see if I can find a benchmark for the codec2 code
> > > (using dmabuf heaps with and without the uncached heap) on on db845c
> > > (w/ mesa), as that is already working and I suspect that might be
> > > close to what you're looking for.
> >
> > tbh I think trying to push for this long term is the best we can hope for.
> >
> > Media is also a lot more *meh* since it's deeply fragmented and a lot less
> > of it upstream than on the gles/display side.
> >
> > I think confirming that this at least doesn't horrible blow up on a
> > gralloc/gbm+mesa stack would be useful I think.
>
> Sorry, I'm still a little foggy on precisely what you're suggesting here.
>
> The patch stack I have has already been used with db845c (mesa +
> gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps.
> So no blowing up there. And I'm working with Hridya to find a
> benchmark for codec2 so we can try to show the performance delta.
>
> However, if you're wanting a dma-buf gralloc implementation with mesa,
> that may be a little tougher to do, but I guess I can give it a go.
>
> Hopefully this will address concerns about the system-uncached heap
> patch (the last two patches in this series)?
>
> In the meantime I hope we can queue the first five patches, as it
> would be nice to get the code rearranging in as there are others
> trying to stage their own heaps, and I'd like to avoid dragging that
> churn out for too long (in addition to improving the allocation
> performance). Those changes have no ABI implications.

Maybe I'm also misunderstanding what dma-buf heaps is used for in
Android, at least usually. I thought it's used to allocate all the
winsys/shared buffers through gralloc (at least in the blobby stacks),
to handle the allocation constraints problem. In the open stacks we
don't seem to have a platform with both mesa and v4l (or some other
codec) with "interesting" allocations constraints, so no one using
that gralloc+dma-buf heaps combo for what it was meant for. Hence why
I'm a bit vary that we're creating something here which just misses
the point a bit when we try to actually use it (in that glorious
forever-future world where an android platform has enough drivers in
upstream to do so).

For other "this solves a system problem" we tend to be quite a bit
more picky with the demonstration use case, to make sure we're
actually creating something that solves the problem in reality.

But it also looks like Android's just not there yet, so *shrug* ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-18  7:46           ` Daniel Vetter
@ 2020-11-20  6:32             ` Sumit Semwal
  2020-11-20  9:34               ` Daniel Vetter
  0 siblings, 1 reply; 18+ messages in thread
From: Sumit Semwal @ 2020-11-20  6:32 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: John Stultz, Christian Koenig, lkml, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list

Hi Daniel,


On Wed, 18 Nov 2020 at 13:16, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Nov 18, 2020 at 3:40 AM John Stultz <john.stultz@linaro.org> wrote:
> > On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> > > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> > > > > > >
> > > > > > > Hey All,
> > > > > > >   So just wanted to send my last revision of my patch series
> > > > > > > of performance optimizations to the dma-buf system heap.
> > > > > >
> > > > > > Thanks very much for your patches - I think the first 5 patches look good to me.
> > > > > >
> > > > > > I know there was a bit of discussion over adding a new system-uncached
> > > > > > heap v/s using a flag to identify that; I think I prefer the separate
> > > > > > heap idea, but lets ask one last time if any one else has any real
> > > > > > objections to it.
> > > > > >
> > > > > > Daniel, Christian: any comments from your side on this?
> > > > >
> > > > > I do wonder a bit where the userspace stack for this all is, since tuning
> > > > > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > > > > bit in a limbo situation here it feels like.
> > > >
> > > > As mentioned in the system-uncached patch:
> > > > Pending opensource users of this code include:
> > > > * AOSP HiKey960 gralloc:
> > > >   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> > > >   - Visibly improves performance over the system heap
> > > > * AOSP Codec2 (possibly, needs more review):
> > > >   - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> > > >
> > > > Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> > > > able to use the current dmabuf heaps instead of ION.
> > > >
> > > > So I'm not sure what you mean by limbo, other than it being in a
> > > > transition state where the interface is upstream and we're working on
> > > > moving vendors to it from ION (which is staged to be dropped in 5.11).
> > > > Part of that work is making sure we don't regress the performance
> > > > expectations.
> > >
> > > The mesa thing below, since if we test this with some downstream kernel
> > > drivers or at least non-mesa userspace I'm somewhat worried we're just
> > > creating a nice split world between the android gfx world and the
> > > mesa/linux desktop gfx world.
> > >
> > > But then that's kinda how android rolls, so *shrug*
> > >
> > > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > > > > dma-api because dma api maintainers don't like us doing that. But
> > > > > personally no concern on that front really, gpus need this. It's just that
> > > > > we do need solid justification I think if we land this. Hence back to
> > > > > first point.
> > > > >
> > > > > Ideally first point comes in the form of benchmarking on android together
> > > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > > > > actually show the benefits, I have no idea).
> > > >
> > > > Tying it with mesa is a little tough as the grallocs for mesa devices
> > > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> > > > allocation path for dmabuf heaps there gets a little complex as last I
> > > > tried that (when trying to get HiKey working with Lima graphics, as
> > > > gbm wouldn't allocate the contiguous buffers required by the display),
> > > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> > > > private handle metadata in the buffer when it was passed in.
> > > >
> > > > But I might take a look at it again. I got a bit lost digging through
> > > > the mesa gbm allocation paths last time.
> > > >
> > > > I'll also try to see if I can find a benchmark for the codec2 code
> > > > (using dmabuf heaps with and without the uncached heap) on on db845c
> > > > (w/ mesa), as that is already working and I suspect that might be
> > > > close to what you're looking for.
> > >
> > > tbh I think trying to push for this long term is the best we can hope for.
> > >
> > > Media is also a lot more *meh* since it's deeply fragmented and a lot less
> > > of it upstream than on the gles/display side.
> > >
> > > I think confirming that this at least doesn't horrible blow up on a
> > > gralloc/gbm+mesa stack would be useful I think.
> >
> > Sorry, I'm still a little foggy on precisely what you're suggesting here.
> >
> > The patch stack I have has already been used with db845c (mesa +
> > gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps.
> > So no blowing up there. And I'm working with Hridya to find a
> > benchmark for codec2 so we can try to show the performance delta.
> >
> > However, if you're wanting a dma-buf gralloc implementation with mesa,
> > that may be a little tougher to do, but I guess I can give it a go.
> >
> > Hopefully this will address concerns about the system-uncached heap
> > patch (the last two patches in this series)?
> >
> > In the meantime I hope we can queue the first five patches, as it
> > would be nice to get the code rearranging in as there are others
> > trying to stage their own heaps, and I'd like to avoid dragging that
> > churn out for too long (in addition to improving the allocation
> > performance). Those changes have no ABI implications.
>
> Maybe I'm also misunderstanding what dma-buf heaps is used for in
> Android, at least usually. I thought it's used to allocate all the
> winsys/shared buffers through gralloc (at least in the blobby stacks),
> to handle the allocation constraints problem. In the open stacks we
> don't seem to have a platform with both mesa and v4l (or some other
> codec) with "interesting" allocations constraints, so no one using
> that gralloc+dma-buf heaps combo for what it was meant for. Hence why
> I'm a bit vary that we're creating something here which just misses
> the point a bit when we try to actually use it (in that glorious
> forever-future world where an android platform has enough drivers in
> upstream to do so).
>
> For other "this solves a system problem" we tend to be quite a bit
> more picky with the demonstration use case, to make sure we're
> actually creating something that solves the problem in reality.
>
> But it also looks like Android's just not there yet, so *shrug* ...

For me, looking at the first 5 patches (listed below, for quick
reference), they are only doing code reorganisation and minor updates
for already existing heaps, and no ABI change, I am not able to
clearly see your objection here. To me, these seem to be required
updates that the existing system heap users can benefit from.

dma-buf: system_heap: Rework system heap to use sgtables instead of
    pagelists
  dma-buf: heaps: Move heap-helper logic into the cma_heap
    implementation
  dma-buf: heaps: Remove heap-helpers code
  dma-buf: heaps: Skip sync if not mapped
  dma-buf: system_heap: Allocate higher order pages if available

If we talk about the last two patches - the ones that add system
uncached heap, I somewhat agree that we should be able to show the
performance gains with this approach (which has been in use on ION and
in devices) using dma-buf gralloc or similar.

We can discuss the system-uncached heap when the dma-buf gralloc or
similar demonstration for performance benefits is done, but I am
inclined to push these 5 patches listed above through.

Best,
Sumit.

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-20  6:32             ` Sumit Semwal
@ 2020-11-20  9:34               ` Daniel Vetter
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Vetter @ 2020-11-20  9:34 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: John Stultz, Christian Koenig, lkml, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list

On Fri, Nov 20, 2020 at 7:32 AM Sumit Semwal <sumit.semwal@linaro.org> wrote:
>
> Hi Daniel,
>
>
> On Wed, 18 Nov 2020 at 13:16, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Nov 18, 2020 at 3:40 AM John Stultz <john.stultz@linaro.org> wrote:
> > > On Fri, Nov 13, 2020 at 12:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> > > > > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > > > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org> wrote:
> > > > > > > >
> > > > > > > > Hey All,
> > > > > > > >   So just wanted to send my last revision of my patch series
> > > > > > > > of performance optimizations to the dma-buf system heap.
> > > > > > >
> > > > > > > Thanks very much for your patches - I think the first 5 patches look good to me.
> > > > > > >
> > > > > > > I know there was a bit of discussion over adding a new system-uncached
> > > > > > > heap v/s using a flag to identify that; I think I prefer the separate
> > > > > > > heap idea, but lets ask one last time if any one else has any real
> > > > > > > objections to it.
> > > > > > >
> > > > > > > Daniel, Christian: any comments from your side on this?
> > > > > >
> > > > > > I do wonder a bit where the userspace stack for this all is, since tuning
> > > > > > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > > > > > bit in a limbo situation here it feels like.
> > > > >
> > > > > As mentioned in the system-uncached patch:
> > > > > Pending opensource users of this code include:
> > > > > * AOSP HiKey960 gralloc:
> > > > >   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> > > > >   - Visibly improves performance over the system heap
> > > > > * AOSP Codec2 (possibly, needs more review):
> > > > >   - https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> > > > >
> > > > > Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> > > > > able to use the current dmabuf heaps instead of ION.
> > > > >
> > > > > So I'm not sure what you mean by limbo, other than it being in a
> > > > > transition state where the interface is upstream and we're working on
> > > > > moving vendors to it from ION (which is staged to be dropped in 5.11).
> > > > > Part of that work is making sure we don't regress the performance
> > > > > expectations.
> > > >
> > > > The mesa thing below, since if we test this with some downstream kernel
> > > > drivers or at least non-mesa userspace I'm somewhat worried we're just
> > > > creating a nice split world between the android gfx world and the
> > > > mesa/linux desktop gfx world.
> > > >
> > > > But then that's kinda how android rolls, so *shrug*
> > > >
> > > > > > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > > > > > dma-api because dma api maintainers don't like us doing that. But
> > > > > > personally no concern on that front really, gpus need this. It's just that
> > > > > > we do need solid justification I think if we land this. Hence back to
> > > > > > first point.
> > > > > >
> > > > > > Ideally first point comes in the form of benchmarking on android together
> > > > > > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > > > > > actually show the benefits, I have no idea).
> > > > >
> > > > > Tying it with mesa is a little tough as the grallocs for mesa devices
> > > > > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> > > > > allocation path for dmabuf heaps there gets a little complex as last I
> > > > > tried that (when trying to get HiKey working with Lima graphics, as
> > > > > gbm wouldn't allocate the contiguous buffers required by the display),
> > > > > I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> > > > > private handle metadata in the buffer when it was passed in.
> > > > >
> > > > > But I might take a look at it again. I got a bit lost digging through
> > > > > the mesa gbm allocation paths last time.
> > > > >
> > > > > I'll also try to see if I can find a benchmark for the codec2 code
> > > > > (using dmabuf heaps with and without the uncached heap) on on db845c
> > > > > (w/ mesa), as that is already working and I suspect that might be
> > > > > close to what you're looking for.
> > > >
> > > > tbh I think trying to push for this long term is the best we can hope for.
> > > >
> > > > Media is also a lot more *meh* since it's deeply fragmented and a lot less
> > > > of it upstream than on the gles/display side.
> > > >
> > > > I think confirming that this at least doesn't horrible blow up on a
> > > > gralloc/gbm+mesa stack would be useful I think.
> > >
> > > Sorry, I'm still a little foggy on precisely what you're suggesting here.
> > >
> > > The patch stack I have has already been used with db845c (mesa +
> > > gbm_grallloc), with the codec2 (sw decoders) using dmabuf heaps.
> > > So no blowing up there. And I'm working with Hridya to find a
> > > benchmark for codec2 so we can try to show the performance delta.
> > >
> > > However, if you're wanting a dma-buf gralloc implementation with mesa,
> > > that may be a little tougher to do, but I guess I can give it a go.
> > >
> > > Hopefully this will address concerns about the system-uncached heap
> > > patch (the last two patches in this series)?
> > >
> > > In the meantime I hope we can queue the first five patches, as it
> > > would be nice to get the code rearranging in as there are others
> > > trying to stage their own heaps, and I'd like to avoid dragging that
> > > churn out for too long (in addition to improving the allocation
> > > performance). Those changes have no ABI implications.
> >
> > Maybe I'm also misunderstanding what dma-buf heaps is used for in
> > Android, at least usually. I thought it's used to allocate all the
> > winsys/shared buffers through gralloc (at least in the blobby stacks),
> > to handle the allocation constraints problem. In the open stacks we
> > don't seem to have a platform with both mesa and v4l (or some other
> > codec) with "interesting" allocations constraints, so no one using
> > that gralloc+dma-buf heaps combo for what it was meant for. Hence why
> > I'm a bit vary that we're creating something here which just misses
> > the point a bit when we try to actually use it (in that glorious
> > forever-future world where an android platform has enough drivers in
> > upstream to do so).
> >
> > For other "this solves a system problem" we tend to be quite a bit
> > more picky with the demonstration use case, to make sure we're
> > actually creating something that solves the problem in reality.
> >
> > But it also looks like Android's just not there yet, so *shrug* ...
>
> For me, looking at the first 5 patches (listed below, for quick
> reference), they are only doing code reorganisation and minor updates
> for already existing heaps, and no ABI change, I am not able to
> clearly see your objection here. To me, these seem to be required
> updates that the existing system heap users can benefit from.
>
> dma-buf: system_heap: Rework system heap to use sgtables instead of
>     pagelists
>   dma-buf: heaps: Move heap-helper logic into the cma_heap
>     implementation
>   dma-buf: heaps: Remove heap-helpers code
>   dma-buf: heaps: Skip sync if not mapped
>   dma-buf: system_heap: Allocate higher order pages if available
>
> If we talk about the last two patches - the ones that add system
> uncached heap, I somewhat agree that we should be able to show the
> performance gains with this approach (which has been in use on ION and
> in devices) using dma-buf gralloc or similar.
>
> We can discuss the system-uncached heap when the dma-buf gralloc or
> similar demonstration for performance benefits is done, but I am
> inclined to push these 5 patches listed above through.

Yeah makes total sense - I was arguing about the new stuff, not the refactoring.
-Daniel

>
> Best,
> Sumit.
>
> > -Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
  2020-11-13 20:39       ` Daniel Vetter
  2020-11-18  2:40         ` John Stultz
@ 2020-12-07 22:01         ` Nicolas Dufresne
  1 sibling, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2020-12-07 22:01 UTC (permalink / raw)
  To: Daniel Vetter, John Stultz
  Cc: Sumit Semwal, Christian Koenig, lkml, Liam Mark, Laura Abbott,
	Brian Starkey, Hridya Valsaraju, Suren Baghdasaryan,
	Sandeep Patil, Daniel Mentz, Chris Goldsworthy, Ørjan Eide,
	Robin Murphy, Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, DRI mailing list

Le vendredi 13 novembre 2020 à 21:39 +0100, Daniel Vetter a écrit :
> On Thu, Nov 12, 2020 at 08:11:02PM -0800, John Stultz wrote:
> > On Thu, Nov 12, 2020 at 1:32 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Thu, Nov 12, 2020 at 11:09:04AM +0530, Sumit Semwal wrote:
> > > > On Tue, 10 Nov 2020 at 09:19, John Stultz <john.stultz@linaro.org>
> > > > wrote:
> > > > > 
> > > > > Hey All,
> > > > >   So just wanted to send my last revision of my patch series
> > > > > of performance optimizations to the dma-buf system heap.
> > > > 
> > > > Thanks very much for your patches - I think the first 5 patches look
> > > > good to me.
> > > > 
> > > > I know there was a bit of discussion over adding a new system-uncached
> > > > heap v/s using a flag to identify that; I think I prefer the separate
> > > > heap idea, but lets ask one last time if any one else has any real
> > > > objections to it.
> > > > 
> > > > Daniel, Christian: any comments from your side on this?
> > > 
> > > I do wonder a bit where the userspace stack for this all is, since tuning
> > > allocators without a full stack is fairly pointless. dma-buf heaps is a
> > > bit in a limbo situation here it feels like.
> > 
> > As mentioned in the system-uncached patch:
> > Pending opensource users of this code include:
> > * AOSP HiKey960 gralloc:
> >   - https://android-review.googlesource.com/c/device/linaro/hikey/+/1399519
> >   - Visibly improves performance over the system heap
> > * AOSP Codec2 (possibly, needs more review):
> >   - 
> > https://android-review.googlesource.com/c/platform/frameworks/av/+/1360640/17/media/codec2/vndk/C2DmaBufAllocator.cpp#325
> > 
> > Additionally both the HiKey, HiKey960 grallocs  and Codec2 are already
> > able to use the current dmabuf heaps instead of ION.
> > 
> > So I'm not sure what you mean by limbo, other than it being in a
> > transition state where the interface is upstream and we're working on
> > moving vendors to it from ION (which is staged to be dropped in 5.11).
> > Part of that work is making sure we don't regress the performance
> > expectations.
> 
> The mesa thing below, since if we test this with some downstream kernel
> drivers or at least non-mesa userspace I'm somewhat worried we're just
> creating a nice split world between the android gfx world and the
> mesa/linux desktop gfx world.
> 
> But then that's kinda how android rolls, so *shrug*
> 
> > > Plus I'm vary of anything related to leaking this kind of stuff beyond the
> > > dma-api because dma api maintainers don't like us doing that. But
> > > personally no concern on that front really, gpus need this. It's just that
> > > we do need solid justification I think if we land this. Hence back to
> > > first point.
> > > 
> > > Ideally first point comes in the form of benchmarking on android together
> > > with a mesa driver (or mesa + some v4l driver or whatever it takes to
> > > actually show the benefits, I have no idea).
> > 
> > Tying it with mesa is a little tough as the grallocs for mesa devices
> > usually use gbm (gralloc.gbm or gralloc.minigbm). Swapping the
> > allocation path for dmabuf heaps there gets a little complex as last I
> > tried that (when trying to get HiKey working with Lima graphics, as
> > gbm wouldn't allocate the contiguous buffers required by the display),
> > I ran into issues with the drm_hwcomposer and mesa expecting the gbm
> > private handle metadata in the buffer when it was passed in.
> > 
> > But I might take a look at it again. I got a bit lost digging through
> > the mesa gbm allocation paths last time.
> > 
> > I'll also try to see if I can find a benchmark for the codec2 code
> > (using dmabuf heaps with and without the uncached heap) on on db845c
> > (w/ mesa), as that is already working and I suspect that might be
> > close to what you're looking for.
> 
> tbh I think trying to push for this long term is the best we can hope for.
> 
> Media is also a lot more *meh* since it's deeply fragmented and a lot less
> of it upstream than on the gles/display side.

Sorry to jump in, but I'd like to reset a bit. The Media APIs are a lot more
generic, most of the kernel API is usable without specific knowledge of the HW.
Pretty much all APIs are exercised through v4l2-ctl and v4l2-compliance on the
V4L2 side (including performance testing). It would be pretty straight forward
to demonstrate the use of DMABuf heaps (just do live resolution switching,
you'll beat the internal V4L2 allocator without even looking at DMA cache
optimization).

> 
> I think confirming that this at least doesn't horrible blow up on a
> gralloc/gbm+mesa stack would be useful I think.
> -Daniel



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation
       [not found] ` <CAF2Aj3iEUkBDyyWDT63iT_7KrquOcEo_L5rCteGF1OJg8Ux3ug@mail.gmail.com>
@ 2021-05-21 20:08   ` John Stultz
  0 siblings, 0 replies; 18+ messages in thread
From: John Stultz @ 2021-05-21 20:08 UTC (permalink / raw)
  To: Lee Jones
  Cc: lkml, Sumit Semwal, Liam Mark, Laura Abbott, Brian Starkey,
	Hridya Valsaraju, Suren Baghdasaryan, Sandeep Patil,
	Daniel Mentz, Chris Goldsworthy, Ørjan Eide, Robin Murphy,
	Ezequiel Garcia, Simon Ser, James Jones,
	open list:DMA BUFFER SHARING FRAMEWORK, dri-devel,
	Nicolas Dufresne

On Fri, May 21, 2021 at 2:40 AM Lee Jones <lee.jones@linaro.org> wrote:
> On Tue, 10 Nov 2020 at 03:49, John Stultz <john.stultz@linaro.org> wrote:
>> This series reworks the system heap to use sgtables, and then
>> consolidates the pagelist method from the heap-helpers into the
>> CMA heap. After which the heap-helpers logic is removed (as it
>> is unused). I'd still like to find a better way to avoid some of
>> the logic duplication in implementing the entire dma_buf_ops
>> handlers per heap. But unfortunately that code is tied somewhat
>> to how the buffer's memory is tracked. As more heaps show up I
>> think we'll have a better idea how to best share code, so for
>> now I think this is ok.
>>
>> After this, the series introduces an optimization that
>> Ørjan Eide implemented for ION that avoids calling sync on
>> attachments that don't have a mapping.
>>
>> Next, an optimization to use larger order pages for the system
>> heap. This change brings us closer to the current performance
>> of the ION allocation code (though there still is a gap due
>> to ION using a mix of deferred-freeing and page pools, I'll be
>> looking at integrating those eventually).
>>
>> Finally, a reworked version of my uncached system heap
>> implementation I was submitting a few weeks back. Since it
>> duplicated a lot of the now reworked system heap code, I
>> realized it would be much simpler to add the functionality to
>> the system_heap implementation itself.
>>
>> While not improving the core allocation performance, the
>> uncached heap allocations do result in *much* improved
>> performance on HiKey960 as it avoids a lot of flushing and
>> invalidating buffers that the cpu doesn't touch often.
>>
>
>
> John, did this ever make it past v5?  I don't see a follow-up.

So most of these have landed upstream already.

The one exception is the system-uncached heap implementation, as
DanielV wanted a usecase where it was beneficial to a device with an
open driver.
Unfortunately this hasn't been trivial to show with the open gpu
devices I have, but taking Nicolas Dufresne's note, we're looking to
enable v4l2 integration in AOSP on db845c, so we can hopefully show
some benefit there. The HAL integration work has been taking some time
to get working though.

So it's a bit blocked on that for now.

thanks
-john

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-05-21 20:08 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-10  3:49 [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation John Stultz
2020-11-10  3:49 ` [PATCH v5 1/7] dma-buf: system_heap: Rework system heap to use sgtables instead of pagelists John Stultz
2020-11-10  3:49 ` [PATCH v5 2/7] dma-buf: heaps: Move heap-helper logic into the cma_heap implementation John Stultz
2020-11-10  3:49 ` [PATCH v5 3/7] dma-buf: heaps: Remove heap-helpers code John Stultz
2020-11-10  3:49 ` [PATCH v5 4/7] dma-buf: heaps: Skip sync if not mapped John Stultz
2020-11-10  3:49 ` [PATCH v5 5/7] dma-buf: system_heap: Allocate higher order pages if available John Stultz
2020-11-10  3:49 ` [PATCH v5 6/7] dma-buf: dma-heap: Keep track of the heap device struct John Stultz
2020-11-10  3:49 ` [PATCH v5 7/7] dma-buf: system_heap: Add a system-uncached heap re-using the system heap John Stultz
2020-11-12  5:39 ` [PATCH v5 0/7] dma-buf: Performance improvements for system heap & a system-uncached implementation Sumit Semwal
2020-11-12  9:32   ` Daniel Vetter
2020-11-13  4:11     ` John Stultz
2020-11-13 20:39       ` Daniel Vetter
2020-11-18  2:40         ` John Stultz
2020-11-18  7:46           ` Daniel Vetter
2020-11-20  6:32             ` Sumit Semwal
2020-11-20  9:34               ` Daniel Vetter
2020-12-07 22:01         ` Nicolas Dufresne
     [not found] ` <CAF2Aj3iEUkBDyyWDT63iT_7KrquOcEo_L5rCteGF1OJg8Ux3ug@mail.gmail.com>
2021-05-21 20:08   ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).