linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Add option to mmap GEM buffers cached
@ 2021-03-07 20:28 Paul Cercueil
  2021-03-07 20:28 ` [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent Paul Cercueil
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Paul Cercueil @ 2021-03-07 20:28 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips, Paul Cercueil

Rework of my previous patchset which added support for GEM buffers
backed by non-coherent memory to the ingenic-drm driver.

Having GEM buffers backed by non-coherent memory is interesting in
the particular case where it is faster to render to a non-coherent
buffer then sync the data cache, than to render to a write-combine
buffer, and (by extension) much faster than using a shadow buffer.
This is true for instance on some Ingenic SoCs, where even simple
blits (e.g. memcpy) are about three times faster using this method.

For the record, the previous patchset was accepted for 5.10 then had
to be reverted, as it conflicted with some changes made to the DMA API.

This new patchset is pretty different as it adds the functionality to
the DRM core. The first three patches add variants to existing functions
but with the "non-coherent memory" twist, exported as GPL symbols. The
fourth patch adds a function to be used with the damage helpers.
Finally, the last patch adds support for non-coherent GEM buffers to the
ingenic-drm driver. The functionality is enabled through a module
parameter, and is disabled by default.

Cheers,
-Paul

Paul Cercueil (5):
  drm: Add and export function drm_gem_cma_create_noncoherent
  drm: Add and export function drm_gem_cma_dumb_create_noncoherent
  drm: Add and export function drm_gem_cma_mmap_noncoherent
  drm: Add and export function drm_gem_cma_sync_data
  drm/ingenic: Add option to alloc cached GEM buffers

 drivers/gpu/drm/drm_gem_cma_helper.c      | 223 +++++++++++++++++++---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 ++++-
 drivers/gpu/drm/ingenic/ingenic-drm.h     |   4 +
 drivers/gpu/drm/ingenic/ingenic-ipu.c     |  14 +-
 include/drm/drm_gem_cma_helper.h          |  13 ++
 5 files changed, 273 insertions(+), 30 deletions(-)

-- 
2.30.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent
  2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
@ 2021-03-07 20:28 ` Paul Cercueil
  2021-03-11 12:23   ` Christoph Hellwig
  2021-03-07 20:28 ` [PATCH v2 2/5] drm: Add and export function drm_gem_cma_dumb_create_noncoherent Paul Cercueil
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-07 20:28 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips, Paul Cercueil

This function can be used by drivers that need to create a GEM object
with non-coherent backing memory.

Creating non-coherent CMA objects is useful on architectures where
writing to a buffer with the non-coherent cache attribute set then
invalidating the cache is faster than writing to the same buffer with
the write-combine cache attribute set. This is the case for instance on
some Ingenic SoCs.

v2: Add inline doc about why we need this, and improve commit message

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 76 +++++++++++++++++++++-------
 include/drm/drm_gem_cma_helper.h     |  2 +
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..917b092b23c2 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -90,21 +90,10 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
 	return ERR_PTR(ret);
 }
 
-/**
- * drm_gem_cma_create - allocate an object with the given size
- * @drm: DRM device
- * @size: size of the object to allocate
- *
- * This function creates a CMA GEM object and allocates a contiguous chunk of
- * memory as backing store. The backing memory has the writecombine attribute
- * set.
- *
- * Returns:
- * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
- * error code on failure.
- */
-struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
-					      size_t size)
+static struct drm_gem_cma_object *
+drm_gem_cma_create_with_cache_param(struct drm_device *drm,
+				    size_t size,
+				    bool noncoherent)
 {
 	struct drm_gem_cma_object *cma_obj;
 	int ret;
@@ -115,8 +104,16 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
 	if (IS_ERR(cma_obj))
 		return cma_obj;
 
-	cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
-				      GFP_KERNEL | __GFP_NOWARN);
+	if (noncoherent) {
+		cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
+						       &cma_obj->paddr,
+						       DMA_TO_DEVICE,
+						       GFP_KERNEL | __GFP_NOWARN);
+
+	} else {
+		cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
+					      GFP_KERNEL | __GFP_NOWARN);
+	}
 	if (!cma_obj->vaddr) {
 		drm_dbg(drm, "failed to allocate buffer with size %zu\n",
 			 size);
@@ -130,6 +127,51 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
 	drm_gem_object_put(&cma_obj->base);
 	return ERR_PTR(ret);
 }
+
+/**
+ * drm_gem_cma_create_noncoherent - allocate an object with the given size
+ *     and non-coherent cache attribute
+ * @drm: DRM device
+ * @size: size of the object to allocate
+ *
+ * This function creates a CMA GEM object and allocates a contiguous chunk of
+ * memory as backing store. The backing memory has the noncoherent attribute
+ * set.
+ *
+ * Creating non-coherent CMA objects is useful on architectures where writing
+ * to a buffer with the non-coherent cache attribute set then invalidating the
+ * cache is faster than writing to the same buffer with the write-combine cache
+ * attribute set. This is the case for instance on some Ingenic SoCs.
+ *
+ * Returns:
+ * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_cma_object *
+drm_gem_cma_create_noncoherent(struct drm_device *drm, size_t size)
+{
+	return drm_gem_cma_create_with_cache_param(drm, size, true);
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_create_noncoherent);
+
+/**
+ * drm_gem_cma_create - allocate an object with the given size
+ * @drm: DRM device
+ * @size: size of the object to allocate
+ *
+ * This function creates a CMA GEM object and allocates a contiguous chunk of
+ * memory as backing store. The backing memory has the writecombine attribute
+ * set.
+ *
+ * Returns:
+ * A struct drm_gem_cma_object * on success or an ERR_PTR()-encoded negative
+ * error code on failure.
+ */
+struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
+					      size_t size)
+{
+	return drm_gem_cma_create_with_cache_param(drm, size, false);
+}
 EXPORT_SYMBOL_GPL(drm_gem_cma_create);
 
 /**
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..360771f5f485 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -79,6 +79,8 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv,
 /* allocate physical memory */
 struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
 					      size_t size);
+struct drm_gem_cma_object *
+drm_gem_cma_create_noncoherent(struct drm_device *drm, size_t size);
 
 extern const struct vm_operations_struct drm_gem_cma_vm_ops;
 
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 2/5] drm: Add and export function drm_gem_cma_dumb_create_noncoherent
  2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
  2021-03-07 20:28 ` [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent Paul Cercueil
@ 2021-03-07 20:28 ` Paul Cercueil
  2021-03-07 20:28 ` [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent Paul Cercueil
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Paul Cercueil @ 2021-03-07 20:28 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips, Paul Cercueil

This function can be used by drivers to create dumb buffers with
non-coherent backing memory.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 37 +++++++++++++++++++++++++---
 include/drm/drm_gem_cma_helper.h     |  5 ++++
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 917b092b23c2..d100c5f9c140 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -181,6 +181,7 @@ EXPORT_SYMBOL_GPL(drm_gem_cma_create);
  * @drm: DRM device
  * @size: size of the object to allocate
  * @handle: return location for the GEM handle
+ * @noncoherent: allocate object with non-coherent cache attribute
  *
  * This function creates a CMA GEM object, allocating a physically contiguous
  * chunk of memory as backing store. The GEM object is then added to the list
@@ -193,13 +194,13 @@ EXPORT_SYMBOL_GPL(drm_gem_cma_create);
 static struct drm_gem_cma_object *
 drm_gem_cma_create_with_handle(struct drm_file *file_priv,
 			       struct drm_device *drm, size_t size,
-			       uint32_t *handle)
+			       uint32_t *handle, bool noncoherent)
 {
 	struct drm_gem_cma_object *cma_obj;
 	struct drm_gem_object *gem_obj;
 	int ret;
 
-	cma_obj = drm_gem_cma_create(drm, size);
+	cma_obj = drm_gem_cma_create_with_cache_param(drm, size, noncoherent);
 	if (IS_ERR(cma_obj))
 		return cma_obj;
 
@@ -276,7 +277,7 @@ int drm_gem_cma_dumb_create_internal(struct drm_file *file_priv,
 		args->size = args->pitch * args->height;
 
 	cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size,
-						 &args->handle);
+						 &args->handle, false);
 	return PTR_ERR_OR_ZERO(cma_obj);
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create_internal);
@@ -309,11 +310,39 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv,
 	args->size = args->pitch * args->height;
 
 	cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size,
-						 &args->handle);
+						 &args->handle, false);
 	return PTR_ERR_OR_ZERO(cma_obj);
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create);
 
+/**
+ * drm_gem_cma_dumb_create_noncoherent - create a dumb buffer object with
+ *     non-coherent cache attribute
+ * @file_priv: DRM file-private structure to create the dumb buffer for
+ * @drm: DRM device
+ * @args: IOCTL data
+ *
+ * Same as drm_gem_cma_dumb_create, but the dumb buffer object created has
+ * the non-coherent cache attribute set.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_cma_dumb_create_noncoherent(struct drm_file *file_priv,
+					struct drm_device *drm,
+					struct drm_mode_create_dumb *args)
+{
+	struct drm_gem_cma_object *cma_obj;
+
+	args->pitch = DIV_ROUND_UP(args->width * args->bpp, 8);
+	args->size = args->pitch * args->height;
+
+	cma_obj = drm_gem_cma_create_with_handle(file_priv, drm, args->size,
+						 &args->handle, true);
+	return PTR_ERR_OR_ZERO(cma_obj);
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_dumb_create_noncoherent);
+
 const struct vm_operations_struct drm_gem_cma_vm_ops = {
 	.open = drm_gem_vm_open,
 	.close = drm_gem_vm_close,
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 360771f5f485..6b44e7492a63 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -76,6 +76,11 @@ int drm_gem_cma_dumb_create(struct drm_file *file_priv,
 			    struct drm_device *drm,
 			    struct drm_mode_create_dumb *args);
 
+/* create non-coherent memory region for DRM framebuffer */
+int drm_gem_cma_dumb_create_noncoherent(struct drm_file *file_priv,
+					struct drm_device *drm,
+					struct drm_mode_create_dumb *args);
+
 /* allocate physical memory */
 struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
 					      size_t size);
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent
  2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
  2021-03-07 20:28 ` [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent Paul Cercueil
  2021-03-07 20:28 ` [PATCH v2 2/5] drm: Add and export function drm_gem_cma_dumb_create_noncoherent Paul Cercueil
@ 2021-03-07 20:28 ` Paul Cercueil
  2021-03-11 12:26   ` Christoph Hellwig
  2021-03-07 20:28 ` [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-07 20:28 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips, Paul Cercueil

This function can be used by drivers that need to mmap dumb buffers
created with non-coherent backing memory.

v2: Use dma_to_phys() since cma_obj->paddr isn't a phys_addr_t but a
dma_addr_t.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 67 +++++++++++++++++++++++++---
 include/drm/drm_gem_cma_helper.h     |  1 +
 2 files changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index d100c5f9c140..e39b0464e19d 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -10,6 +10,7 @@
  */
 
 #include <linux/dma-buf.h>
+#include <linux/dma-direct.h>
 #include <linux/dma-mapping.h>
 #include <linux/export.h>
 #include <linux/mm.h>
@@ -42,10 +43,20 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
 	.vm_ops = &drm_gem_cma_vm_ops,
 };
 
+static const struct drm_gem_object_funcs drm_gem_cma_noncoherent_funcs = {
+	.free = drm_gem_cma_free_object,
+	.print_info = drm_gem_cma_print_info,
+	.get_sg_table = drm_gem_cma_get_sg_table,
+	.vmap = drm_gem_cma_vmap,
+	.mmap = drm_gem_cma_mmap_noncoherent,
+	.vm_ops = &drm_gem_cma_vm_ops,
+};
+
 /**
  * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
  * @drm: DRM device
  * @size: size of the object to allocate
+ * @noncoherent: if true, will use non-coherent backed memory
  *
  * This function creates and initializes a GEM CMA object of the given size,
  * but doesn't allocate any memory to back the object.
@@ -55,7 +66,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
  * error code on failure.
  */
 static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool noncoherent)
 {
 	struct drm_gem_cma_object *cma_obj;
 	struct drm_gem_object *gem_obj;
@@ -68,8 +79,12 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
 	if (!gem_obj)
 		return ERR_PTR(-ENOMEM);
 
-	if (!gem_obj->funcs)
-		gem_obj->funcs = &drm_gem_cma_default_funcs;
+	if (!gem_obj->funcs) {
+		if (noncoherent)
+			gem_obj->funcs = &drm_gem_cma_noncoherent_funcs;
+		else
+			gem_obj->funcs = &drm_gem_cma_default_funcs;
+	}
 
 	cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
 
@@ -100,7 +115,7 @@ drm_gem_cma_create_with_cache_param(struct drm_device *drm,
 
 	size = round_up(size, PAGE_SIZE);
 
-	cma_obj = __drm_gem_cma_create(drm, size);
+	cma_obj = __drm_gem_cma_create(drm, size, noncoherent);
 	if (IS_ERR(cma_obj))
 		return cma_obj;
 
@@ -503,7 +518,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 		return ERR_PTR(-EINVAL);
 
 	/* Create a CMA GEM buffer. */
-	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, false);
 	if (IS_ERR(cma_obj))
 		return ERR_CAST(cma_obj);
 
@@ -579,6 +594,48 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 }
 EXPORT_SYMBOL_GPL(drm_gem_cma_mmap);
 
+/**
+ * drm_gem_cma_mmap_noncoherent - memory-map a CMA GEM object with
+ *     non-coherent cache attribute
+ * @filp: file object
+ * @vma: VMA for the area to be mapped
+ *
+ * Just like drm_gem_cma_mmap, but for a GEM object backed by non-coherent
+ * memory.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj,
+				 struct vm_area_struct *vma)
+{
+	struct drm_gem_cma_object *cma_obj;
+	unsigned long pfn;
+	int ret;
+
+	/*
+	 * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
+	 * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
+	 * the whole buffer.
+	 */
+	vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
+	vma->vm_flags &= ~VM_PFNMAP;
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+
+	cma_obj = to_drm_gem_cma_obj(obj);
+
+	pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev, cma_obj->paddr));
+
+	ret = remap_pfn_range(vma, vma->vm_start, pfn,
+			      vma->vm_end - vma->vm_start,
+			      vma->vm_page_prot);
+	if (ret)
+		drm_gem_vm_close(vma);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_mmap_noncoherent);
+
 /**
  * drm_gem_cma_prime_import_sg_table_vmap - PRIME import another driver's
  *	scatter/gather table and get the virtual address of the buffer
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 6b44e7492a63..6a3f7e1312cc 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -107,6 +107,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 				  struct sg_table *sgt);
 int drm_gem_cma_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
 int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
+int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj, struct vm_area_struct *vma);
 
 /**
  * DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE - CMA GEM driver operations
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data
  2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
                   ` (2 preceding siblings ...)
  2021-03-07 20:28 ` [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent Paul Cercueil
@ 2021-03-07 20:28 ` Paul Cercueil
  2021-03-11 12:28   ` Christoph Hellwig
  2021-03-07 20:28 ` [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
  2021-03-08  8:41 ` [PATCH v2 0/5] Add option to mmap GEM buffers cached Thomas Zimmermann
  5 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-07 20:28 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips, Paul Cercueil

This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 43 ++++++++++++++++++++++++++++
 include/drm/drm_gem_cma_helper.h     |  5 ++++
 2 files changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index e39b0464e19d..fdae54a18670 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -18,9 +18,14 @@
 #include <linux/slab.h>
 
 #include <drm/drm.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_device.h>
 #include <drm/drm_drv.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_fb_cma_helper.h>
+#include <drm/drm_framebuffer.h>
 #include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane.h>
 #include <drm/drm_vma_manager.h>
 
 /**
@@ -684,3 +689,41 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
 	return obj;
 }
 EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
+
+/**
+ * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
+ * @dev: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_gem_cma_sync_data(struct device *dev,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state)
+{
+	const struct drm_format_info *finfo = state->fb->format;
+	struct drm_atomic_helper_damage_iter iter;
+	unsigned int offset, i;
+	struct drm_rect clip;
+	dma_addr_t daddr;
+
+	drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+	drm_atomic_for_each_plane_damage(&iter, &clip) {
+		for (i = 0; i < finfo->num_planes; i++) {
+			daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+
+			/* Ignore x1/x2 values, invalidate complete lines */
+			offset = clip.y1 * state->fb->pitches[i];
+
+			dma_sync_single_for_device(dev, daddr + offset,
+				       (clip.y2 - clip.y1) * state->fb->pitches[i],
+				       DMA_TO_DEVICE);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 6a3f7e1312cc..cdd3fb456916 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -7,6 +7,7 @@
 #include <drm/drm_gem.h>
 
 struct drm_mode_create_dumb;
+struct drm_plane_state;
 
 /**
  * struct drm_gem_cma_object - GEM object backed by CMA memory allocations
@@ -190,4 +191,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
 				       struct dma_buf_attachment *attach,
 				       struct sg_table *sgt);
 
+void drm_gem_cma_sync_data(struct device *dev,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state);
+
 #endif /* __DRM_GEM_CMA_HELPER_H__ */
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers
  2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
                   ` (3 preceding siblings ...)
  2021-03-07 20:28 ` [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
@ 2021-03-07 20:28 ` Paul Cercueil
       [not found]   ` <20210308034727.1951-1-hdanton@sina.com>
  2021-03-11 12:30   ` Christoph Hellwig
  2021-03-08  8:41 ` [PATCH v2 0/5] Add option to mmap GEM buffers cached Thomas Zimmermann
  5 siblings, 2 replies; 23+ messages in thread
From: Paul Cercueil @ 2021-03-07 20:28 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips, Paul Cercueil

With the module parameter ingenic-drm.cached_gem_buffers, it is possible
to specify that we want GEM buffers backed by non-coherent memory.

This dramatically speeds up software rendering on Ingenic SoCs, even for
tasks where write-combine memory should in theory be faster (e.g. simple
blits).

Leave it disabled by default, since it is specific to one use-case
(software rendering).

v2: Rework code to work with new DRM APIs regarding plane states

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 ++++++++++++++++++++++-
 drivers/gpu/drm/ingenic/ingenic-drm.h     |  4 ++
 drivers/gpu/drm/ingenic/ingenic-ipu.c     | 14 ++++++-
 3 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index d60e1eefc9d1..ba1ac0fcda74 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -9,6 +9,7 @@
 #include <linux/component.h>
 #include <linux/clk.h>
 #include <linux/dma-mapping.h>
+#include <linux/io.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/of_device.h>
@@ -23,6 +24,7 @@
 #include <drm/drm_color_mgmt.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_gem_cma_helper.h>
 #include <drm/drm_fb_cma_helper.h>
@@ -99,6 +101,11 @@ struct ingenic_drm {
 	struct notifier_block clock_nb;
 };
 
+static bool ingenic_drm_cached_gem_buf;
+module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, bool, 0400);
+MODULE_PARM_DESC(cached_gem_buffers,
+		 "Enable fully cached GEM buffers [default=false]");
+
 static bool ingenic_drm_writeable_reg(struct device *dev, unsigned int reg)
 {
 	switch (reg) {
@@ -410,6 +417,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
 	     old_plane_state->fb->format->format != new_plane_state->fb->format->format))
 		crtc_state->mode_changed = true;
 
+	drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
 	return 0;
 }
 
@@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct ingenic_drm *priv,
 	}
 }
 
+void ingenic_drm_sync_data(struct device *dev,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state)
+{
+	if (ingenic_drm_cached_gem_buf)
+		drm_gem_cma_sync_data(dev, old_state, state);
+}
+
 static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
 					    struct drm_atomic_state *state)
 {
 	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
+	struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state,
+									  plane);
 	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
 									  plane);
 	struct drm_crtc_state *crtc_state;
@@ -554,6 +573,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
 	u32 fourcc;
 
 	if (newstate && newstate->fb) {
+		ingenic_drm_sync_data(priv->dev, oldstate, newstate);
+
 		crtc_state = newstate->crtc->state;
 
 		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
 	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
 }
 
+static struct drm_framebuffer *
+ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file *file,
+			  const struct drm_mode_fb_cmd2 *mode_cmd)
+{
+	if (ingenic_drm_cached_gem_buf)
+		return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
+
+	return drm_gem_fb_create(dev, file, mode_cmd);
+}
+
+static int ingenic_drm_gem_cma_dumb_create(struct drm_file *file_priv,
+					   struct drm_device *drm,
+					   struct drm_mode_create_dumb *args)
+{
+	if (ingenic_drm_cached_gem_buf)
+		return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args);
+
+	return drm_gem_cma_dumb_create(file_priv, drm, args);
+}
+
 DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
 
 static const struct drm_driver ingenic_drm_driver_data = {
@@ -755,7 +796,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
 	.patchlevel		= 0,
 
 	.fops			= &ingenic_drm_fops,
-	DRM_GEM_CMA_DRIVER_OPS,
+	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
 
 	.irq_handler		= ingenic_drm_irq_handler,
 };
@@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs ingenic_drm_encoder_helper_funcs =
 };
 
 static const struct drm_mode_config_funcs ingenic_drm_mode_config_funcs = {
-	.fb_create		= drm_gem_fb_create,
+	.fb_create		= ingenic_drm_gem_fb_create,
 	.output_poll_changed	= drm_fb_helper_output_poll_changed,
 	.atomic_check		= drm_atomic_helper_check,
 	.atomic_commit		= drm_atomic_helper_commit,
@@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
 		return ret;
 	}
 
+	drm_plane_enable_fb_damage_clips(&priv->f1);
+
 	drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
 
 	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
 			return ret;
 		}
 
+		drm_plane_enable_fb_damage_clips(&priv->f0);
+
 		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
 			ret = component_bind_all(dev, drm);
 			if (ret) {
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h b/drivers/gpu/drm/ingenic/ingenic-drm.h
index 1b4347f7f084..b6bca356e024 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm.h
+++ b/drivers/gpu/drm/ingenic/ingenic-drm.h
@@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device *dev,
 			      struct drm_plane *plane, u32 fourcc);
 void ingenic_drm_plane_disable(struct device *dev, struct drm_plane *plane);
 
+void ingenic_drm_sync_data(struct device *dev,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state);
+
 extern struct platform_driver *ingenic_ipu_driver_ptr;
 
 #endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 5ae6adab8306..7826eab044ba 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -20,6 +20,7 @@
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fourcc.h>
@@ -285,6 +286,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
 					    struct drm_atomic_state *state)
 {
 	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
+	struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state,
+									  plane);
 	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
 									  plane);
 	const struct drm_format_info *finfo;
@@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
 				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
 	}
 
+	ingenic_drm_sync_data(ipu->master, oldstate, newstate);
+
 	/* New addresses will be committed in vblank handler... */
 	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
 	if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 
 	if (!new_plane_state->crtc ||
 	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
-		return 0;
+		goto out_check_damage;
 
 	/* Plane must be fully visible */
 	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 		return -EINVAL;
 
 	if (!osd_changed(new_plane_state, old_plane_state))
-		return 0;
+		goto out_check_damage;
 
 	crtc_state->mode_changed = true;
 
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 	ipu->denom_w = denom_w;
 	ipu->denom_h = denom_h;
 
+out_check_damage:
+	drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
 	return 0;
 }
 
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d)
 		return err;
 	}
 
+	drm_plane_enable_fb_damage_clips(plane);
+
 	/*
 	 * Sharpness settings range is [0,32]
 	 * 0       : nearest-neighbor
-- 
2.30.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached
  2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
                   ` (4 preceding siblings ...)
  2021-03-07 20:28 ` [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
@ 2021-03-08  8:41 ` Thomas Zimmermann
  2021-03-10 19:02   ` Paul Cercueil
  5 siblings, 1 reply; 23+ messages in thread
From: Thomas Zimmermann @ 2021-03-08  8:41 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Daniel Vetter
  Cc: Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips


[-- Attachment #1.1: Type: text/plain, Size: 2833 bytes --]

Hi Paul,

having individual functions for each mode only makes sense if the 
decision is at compile time. But in patch 5, you're working around your 
earlier design by introducing in-driver helpers that select the correct 
CMA function.

In SHMEM helpers we have the flag map_wc in the GEM structure that 
selects the pages caching mode (wc vs uncached). I think CMA should use 
this design as well. Have a map_noncoherent flag in the CMA GEM object 
and set it from the driver's implementation of gem_create_object.

And in the long run, we could try to consolidate all drivers/helpers 
mapping flags in struct drm_gem_object.

Best regards
Thomas

Am 07.03.21 um 21:28 schrieb Paul Cercueil:
> Rework of my previous patchset which added support for GEM buffers
> backed by non-coherent memory to the ingenic-drm driver.
> 
> Having GEM buffers backed by non-coherent memory is interesting in
> the particular case where it is faster to render to a non-coherent
> buffer then sync the data cache, than to render to a write-combine
> buffer, and (by extension) much faster than using a shadow buffer.
> This is true for instance on some Ingenic SoCs, where even simple
> blits (e.g. memcpy) are about three times faster using this method.
> 
> For the record, the previous patchset was accepted for 5.10 then had
> to be reverted, as it conflicted with some changes made to the DMA API.
> 
> This new patchset is pretty different as it adds the functionality to
> the DRM core. The first three patches add variants to existing functions
> but with the "non-coherent memory" twist, exported as GPL symbols. The
> fourth patch adds a function to be used with the damage helpers.
> Finally, the last patch adds support for non-coherent GEM buffers to the
> ingenic-drm driver. The functionality is enabled through a module
> parameter, and is disabled by default.
> 
> Cheers,
> -Paul
> 
> Paul Cercueil (5):
>    drm: Add and export function drm_gem_cma_create_noncoherent
>    drm: Add and export function drm_gem_cma_dumb_create_noncoherent
>    drm: Add and export function drm_gem_cma_mmap_noncoherent
>    drm: Add and export function drm_gem_cma_sync_data
>    drm/ingenic: Add option to alloc cached GEM buffers
> 
>   drivers/gpu/drm/drm_gem_cma_helper.c      | 223 +++++++++++++++++++---
>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 ++++-
>   drivers/gpu/drm/ingenic/ingenic-drm.h     |   4 +
>   drivers/gpu/drm/ingenic/ingenic-ipu.c     |  14 +-
>   include/drm/drm_gem_cma_helper.h          |  13 ++
>   5 files changed, 273 insertions(+), 30 deletions(-)
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers
       [not found]   ` <20210308034727.1951-1-hdanton@sina.com>
@ 2021-03-10 19:01     ` Paul Cercueil
       [not found]       ` <20210311022743.2542-1-hdanton@sina.com>
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-10 19:01 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Christoph Hellwig, Daniel Vetter, dri-devel, linux-kernel

Hi Hillf,

Le lun. 8 mars 2021 à 11:47, Hillf Danton <hdanton@sina.com> a écrit :
> On Sun,  7 Mar 2021 20:28:35 +0000  Paul Cercueil wrote:
>>  With the module parameter ingenic-drm.cached_gem_buffers, it is 
>> possible
>>  to specify that we want GEM buffers backed by non-coherent memory.
>> 
>>  This dramatically speeds up software rendering on Ingenic SoCs, 
>> even for
>>  tasks where write-combine memory should in theory be faster (e.g. 
>> simple
>>  blits).
> 
> Wondering if it is due to the tricks at [1].
> 
> If so, is dma_alloc_noncoherent() necessary in this patchset?

You confuse non-contiguous with non-coherent, which are two different 
things.

Cheers,
-Paul

> Christoph can you give us a concise lesson on noncoherency covering 
> at least
> noncoherent device, noncoherent memory(used in this work), no coherent
> caching(in [1]), their links to speedup, and the thumb rule to handle
> noncoherency in workdays. It feels toe curling every time I see 
> noncoherence
> going downtown with speedup hand in hand.
> 
> [1] Subject: [PATCH 6/6] media: uvcvideo: Use dma_alloc_noncontiguos 
> API
> https://lore.kernel.org/lkml/20210301085236.947011-7-hch@lst.de/#t
> 
>> 
>>  Leave it disabled by default, since it is specific to one use-case
>>  (software rendering).
>> 
>>  v2: Rework code to work with new DRM APIs regarding plane states
>> 
>>  Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>  ---
>>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49 
>> ++++++++++++++++++++++-
>>   drivers/gpu/drm/ingenic/ingenic-drm.h     |  4 ++
>>   drivers/gpu/drm/ingenic/ingenic-ipu.c     | 14 ++++++-
>>   3 files changed, 63 insertions(+), 4 deletions(-)
>> 
>>  diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
>> b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>  index d60e1eefc9d1..ba1ac0fcda74 100644
>>  --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>  +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>  @@ -9,6 +9,7 @@
>>   #include <linux/component.h>
>>   #include <linux/clk.h>
>>   #include <linux/dma-mapping.h>
>>  +#include <linux/io.h>
>>   #include <linux/module.h>
>>   #include <linux/mutex.h>
>>   #include <linux/of_device.h>
>>  @@ -23,6 +24,7 @@
>>   #include <drm/drm_color_mgmt.h>
>>   #include <drm/drm_crtc.h>
>>   #include <drm/drm_crtc_helper.h>
>>  +#include <drm/drm_damage_helper.h>
>>   #include <drm/drm_drv.h>
>>   #include <drm/drm_gem_cma_helper.h>
>>   #include <drm/drm_fb_cma_helper.h>
>>  @@ -99,6 +101,11 @@ struct ingenic_drm {
>>   	struct notifier_block clock_nb;
>>   };
>> 
>>  +static bool ingenic_drm_cached_gem_buf;
>>  +module_param_named(cached_gem_buffers, ingenic_drm_cached_gem_buf, 
>> bool, 0400);
>>  +MODULE_PARM_DESC(cached_gem_buffers,
>>  +		 "Enable fully cached GEM buffers [default=false]");
>>  +
>>   static bool ingenic_drm_writeable_reg(struct device *dev, unsigned 
>> int reg)
>>   {
>>   	switch (reg) {
>>  @@ -410,6 +417,8 @@ static int 
>> ingenic_drm_plane_atomic_check(struct drm_plane *plane,
>>   	     old_plane_state->fb->format->format != 
>> new_plane_state->fb->format->format))
>>   		crtc_state->mode_changed = true;
>> 
>>  +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
>>  +
>>   	return 0;
>>   }
>> 
>>  @@ -541,10 +550,20 @@ static void ingenic_drm_update_palette(struct 
>> ingenic_drm *priv,
>>   	}
>>   }
>> 
>>  +void ingenic_drm_sync_data(struct device *dev,
>>  +			   struct drm_plane_state *old_state,
>>  +			   struct drm_plane_state *state)
>>  +{
>>  +	if (ingenic_drm_cached_gem_buf)
>>  +		drm_gem_cma_sync_data(dev, old_state, state);
>>  +}
>>  +
>>   static void ingenic_drm_plane_atomic_update(struct drm_plane 
>> *plane,
>>   					    struct drm_atomic_state *state)
>>   {
>>   	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
>>  +	struct drm_plane_state *oldstate = 
>> drm_atomic_get_old_plane_state(state,
>>  +									  plane);
>>   	struct drm_plane_state *newstate = 
>> drm_atomic_get_new_plane_state(state,
>>   									  plane);
>>   	struct drm_crtc_state *crtc_state;
>>  @@ -554,6 +573,8 @@ static void 
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>>   	u32 fourcc;
>> 
>>   	if (newstate && newstate->fb) {
>>  +		ingenic_drm_sync_data(priv->dev, oldstate, newstate);
>>  +
>>   		crtc_state = newstate->crtc->state;
>> 
>>   		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>>  @@ -743,6 +764,26 @@ static void ingenic_drm_disable_vblank(struct 
>> drm_crtc *crtc)
>>   	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, 
>> JZ_LCD_CTRL_EOF_IRQ, 0);
>>   }
>> 
>>  +static struct drm_framebuffer *
>>  +ingenic_drm_gem_fb_create(struct drm_device *dev, struct drm_file 
>> *file,
>>  +			  const struct drm_mode_fb_cmd2 *mode_cmd)
>>  +{
>>  +	if (ingenic_drm_cached_gem_buf)
>>  +		return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
>>  +
>>  +	return drm_gem_fb_create(dev, file, mode_cmd);
>>  +}
>>  +
>>  +static int ingenic_drm_gem_cma_dumb_create(struct drm_file 
>> *file_priv,
>>  +					   struct drm_device *drm,
>>  +					   struct drm_mode_create_dumb *args)
>>  +{
>>  +	if (ingenic_drm_cached_gem_buf)
>>  +		return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, args);
>>  +
>>  +	return drm_gem_cma_dumb_create(file_priv, drm, args);
>>  +}
>>  +
>>   DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>> 
>>   static const struct drm_driver ingenic_drm_driver_data = {
>>  @@ -755,7 +796,7 @@ static const struct drm_driver 
>> ingenic_drm_driver_data = {
>>   	.patchlevel		= 0,
>> 
>>   	.fops			= &ingenic_drm_fops,
>>  -	DRM_GEM_CMA_DRIVER_OPS,
>>  
>> +	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
>> 
>>   	.irq_handler		= ingenic_drm_irq_handler,
>>   };
>>  @@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs 
>> ingenic_drm_encoder_helper_funcs =
>>   };
>> 
>>   static const struct drm_mode_config_funcs 
>> ingenic_drm_mode_config_funcs = {
>>  -	.fb_create		= drm_gem_fb_create,
>>  +	.fb_create		= ingenic_drm_gem_fb_create,
>>   	.output_poll_changed	= drm_fb_helper_output_poll_changed,
>>   	.atomic_check		= drm_atomic_helper_check,
>>   	.atomic_commit		= drm_atomic_helper_commit,
>>  @@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device 
>> *dev, bool has_components)
>>   		return ret;
>>   	}
>> 
>>  +	drm_plane_enable_fb_damage_clips(&priv->f1);
>>  +
>>   	drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>> 
>>   	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
>>  @@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device 
>> *dev, bool has_components)
>>   			return ret;
>>   		}
>> 
>>  +		drm_plane_enable_fb_damage_clips(&priv->f0);
>>  +
>>   		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
>>   			ret = component_bind_all(dev, drm);
>>   			if (ret) {
>>  diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h 
>> b/drivers/gpu/drm/ingenic/ingenic-drm.h
>>  index 1b4347f7f084..b6bca356e024 100644
>>  --- a/drivers/gpu/drm/ingenic/ingenic-drm.h
>>  +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h
>>  @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device 
>> *dev,
>>   			      struct drm_plane *plane, u32 fourcc);
>>   void ingenic_drm_plane_disable(struct device *dev, struct 
>> drm_plane *plane);
>> 
>>  +void ingenic_drm_sync_data(struct device *dev,
>>  +			   struct drm_plane_state *old_state,
>>  +			   struct drm_plane_state *state);
>>  +
>>   extern struct platform_driver *ingenic_ipu_driver_ptr;
>> 
>>   #endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */
>>  diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c 
>> b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>  index 5ae6adab8306..7826eab044ba 100644
>>  --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>  +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>  @@ -20,6 +20,7 @@
>> 
>>   #include <drm/drm_atomic.h>
>>   #include <drm/drm_atomic_helper.h>
>>  +#include <drm/drm_damage_helper.h>
>>   #include <drm/drm_drv.h>
>>   #include <drm/drm_fb_cma_helper.h>
>>   #include <drm/drm_fourcc.h>
>>  @@ -285,6 +286,8 @@ static void 
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>>   					    struct drm_atomic_state *state)
>>   {
>>   	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
>>  +	struct drm_plane_state *oldstate = 
>> drm_atomic_get_old_plane_state(state,
>>  +									  plane);
>>   	struct drm_plane_state *newstate = 
>> drm_atomic_get_new_plane_state(state,
>>   									  plane);
>>   	const struct drm_format_info *finfo;
>>  @@ -317,6 +320,8 @@ static void 
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>>   				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
>>   	}
>> 
>>  +	ingenic_drm_sync_data(ipu->master, oldstate, newstate);
>>  +
>>   	/* New addresses will be committed in vblank handler... */
>>   	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>>   	if (finfo->num_planes > 1)
>>  @@ -541,7 +546,7 @@ static int 
>> ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>> 
>>   	if (!new_plane_state->crtc ||
>>   	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
>>  -		return 0;
>>  +		goto out_check_damage;
>> 
>>   	/* Plane must be fully visible */
>>   	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
>>  @@ -558,7 +563,7 @@ static int 
>> ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>>   		return -EINVAL;
>> 
>>   	if (!osd_changed(new_plane_state, old_plane_state))
>>  -		return 0;
>>  +		goto out_check_damage;
>> 
>>   	crtc_state->mode_changed = true;
>> 
>>  @@ -592,6 +597,9 @@ static int 
>> ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>>   	ipu->denom_w = denom_w;
>>   	ipu->denom_h = denom_h;
>> 
>>  +out_check_damage:
>>  +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
>>  +
>>   	return 0;
>>   }
>> 
>>  @@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, 
>> struct device *master, void *d)
>>   		return err;
>>   	}
>> 
>>  +	drm_plane_enable_fb_damage_clips(plane);
>>  +
>>   	/*
>>   	 * Sharpness settings range is [0,32]
>>   	 * 0       : nearest-neighbor
>>  --
>>  2.30.1
> 
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached
  2021-03-08  8:41 ` [PATCH v2 0/5] Add option to mmap GEM buffers cached Thomas Zimmermann
@ 2021-03-10 19:02   ` Paul Cercueil
  2021-03-11  7:52     ` Thomas Zimmermann
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-10 19:02 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips

Hi Thomas,

Le lun. 8 mars 2021 à 9:41, Thomas Zimmermann <tzimmermann@suse.de> a 
écrit :
> Hi Paul,
> 
> having individual functions for each mode only makes sense if the 
> decision is at compile time. But in patch 5, you're working around 
> your earlier design by introducing in-driver helpers that select the 
> correct CMA function.
> 
> In SHMEM helpers we have the flag map_wc in the GEM structure that 
> selects the pages caching mode (wc vs uncached). I think CMA should 
> use this design as well. Have a map_noncoherent flag in the CMA GEM 
> object and set it from the driver's implementation of 
> gem_create_object.

Is that a new addition? That severely reduces the patchset size, which 
is perfect.

I'll send a V3 then.

Cheers,
-Paul

> And in the long run, we could try to consolidate all drivers/helpers 
> mapping flags in struct drm_gem_object.
> 
> Best regards
> Thomas
> 
> Am 07.03.21 um 21:28 schrieb Paul Cercueil:
>> Rework of my previous patchset which added support for GEM buffers
>> backed by non-coherent memory to the ingenic-drm driver.
>> 
>> Having GEM buffers backed by non-coherent memory is interesting in
>> the particular case where it is faster to render to a non-coherent
>> buffer then sync the data cache, than to render to a write-combine
>> buffer, and (by extension) much faster than using a shadow buffer.
>> This is true for instance on some Ingenic SoCs, where even simple
>> blits (e.g. memcpy) are about three times faster using this method.
>> 
>> For the record, the previous patchset was accepted for 5.10 then had
>> to be reverted, as it conflicted with some changes made to the DMA 
>> API.
>> 
>> This new patchset is pretty different as it adds the functionality to
>> the DRM core. The first three patches add variants to existing 
>> functions
>> but with the "non-coherent memory" twist, exported as GPL symbols. 
>> The
>> fourth patch adds a function to be used with the damage helpers.
>> Finally, the last patch adds support for non-coherent GEM buffers to 
>> the
>> ingenic-drm driver. The functionality is enabled through a module
>> parameter, and is disabled by default.
>> 
>> Cheers,
>> -Paul
>> 
>> Paul Cercueil (5):
>>    drm: Add and export function drm_gem_cma_create_noncoherent
>>    drm: Add and export function drm_gem_cma_dumb_create_noncoherent
>>    drm: Add and export function drm_gem_cma_mmap_noncoherent
>>    drm: Add and export function drm_gem_cma_sync_data
>>    drm/ingenic: Add option to alloc cached GEM buffers
>> 
>>   drivers/gpu/drm/drm_gem_cma_helper.c      | 223 
>> +++++++++++++++++++---
>>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 ++++-
>>   drivers/gpu/drm/ingenic/ingenic-drm.h     |   4 +
>>   drivers/gpu/drm/ingenic/ingenic-ipu.c     |  14 +-
>>   include/drm/drm_gem_cma_helper.h          |  13 ++
>>   5 files changed, 273 insertions(+), 30 deletions(-)
>> 
> 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 0/5] Add option to mmap GEM buffers cached
  2021-03-10 19:02   ` Paul Cercueil
@ 2021-03-11  7:52     ` Thomas Zimmermann
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Zimmermann @ 2021-03-11  7:52 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips


[-- Attachment #1.1: Type: text/plain, Size: 4211 bytes --]

Hi

Am 10.03.21 um 20:02 schrieb Paul Cercueil:
> Hi Thomas,
> 
> Le lun. 8 mars 2021 à 9:41, Thomas Zimmermann <tzimmermann@suse.de> a 
> écrit :
>> Hi Paul,
>>
>> having individual functions for each mode only makes sense if the 
>> decision is at compile time. But in patch 5, you're working around 
>> your earlier design by introducing in-driver helpers that select the 
>> correct CMA function.
>>
>> In SHMEM helpers we have the flag map_wc in the GEM structure that 
>> selects the pages caching mode (wc vs uncached). I think CMA should 

Re-reading this, it should rather be WC and cached.

>> use this design as well. Have a map_noncoherent flag in the CMA GEM 
>> object and set it from the driver's implementation of gem_create_object.
> 
> Is that a new addition? That severely reduces the patchset size, which 
> is perfect.

It was added a few months ago, around the time you send the first 
version of the patches at hand.

Originally, SHMEM uses write combining by default. And several drivers 
used default mapping flags instead (so usually cached). IIRC I 
streamlined everything and flipped the default. Most drivers can use 
cached mappings and only some require WC.

Recently there was also a patchset that added support for cached video 
RAM (or something like that). So at some point we could store these 
flags in drm_gem_object. For now, I'd just put a flag into 
drm_gem_cma_object.

> 
> I'll send a V3 then.

Great!

Best regards
Thomas

> 
> Cheers,
> -Paul
> 
>> And in the long run, we could try to consolidate all drivers/helpers 
>> mapping flags in struct drm_gem_object.
>>
>> Best regards
>> Thomas
>>
>> Am 07.03.21 um 21:28 schrieb Paul Cercueil:
>>> Rework of my previous patchset which added support for GEM buffers
>>> backed by non-coherent memory to the ingenic-drm driver.
>>>
>>> Having GEM buffers backed by non-coherent memory is interesting in
>>> the particular case where it is faster to render to a non-coherent
>>> buffer then sync the data cache, than to render to a write-combine
>>> buffer, and (by extension) much faster than using a shadow buffer.
>>> This is true for instance on some Ingenic SoCs, where even simple
>>> blits (e.g. memcpy) are about three times faster using this method.
>>>
>>> For the record, the previous patchset was accepted for 5.10 then had
>>> to be reverted, as it conflicted with some changes made to the DMA API.
>>>
>>> This new patchset is pretty different as it adds the functionality to
>>> the DRM core. The first three patches add variants to existing functions
>>> but with the "non-coherent memory" twist, exported as GPL symbols. The
>>> fourth patch adds a function to be used with the damage helpers.
>>> Finally, the last patch adds support for non-coherent GEM buffers to the
>>> ingenic-drm driver. The functionality is enabled through a module
>>> parameter, and is disabled by default.
>>>
>>> Cheers,
>>> -Paul
>>>
>>> Paul Cercueil (5):
>>>    drm: Add and export function drm_gem_cma_create_noncoherent
>>>    drm: Add and export function drm_gem_cma_dumb_create_noncoherent
>>>    drm: Add and export function drm_gem_cma_mmap_noncoherent
>>>    drm: Add and export function drm_gem_cma_sync_data
>>>    drm/ingenic: Add option to alloc cached GEM buffers
>>>
>>>   drivers/gpu/drm/drm_gem_cma_helper.c      | 223 +++++++++++++++++++---
>>>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  49 ++++-
>>>   drivers/gpu/drm/ingenic/ingenic-drm.h     |   4 +
>>>   drivers/gpu/drm/ingenic/ingenic-ipu.c     |  14 +-
>>>   include/drm/drm_gem_cma_helper.h          |  13 ++
>>>   5 files changed, 273 insertions(+), 30 deletions(-)
>>>
>>
>> -- 
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Maxfeldstr. 5, 90409 Nürnberg, Germany
>> (HRB 36809, AG Nürnberg)
>> Geschäftsführer: Felix Imendörffer
>>
> 
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers
       [not found]       ` <20210311022743.2542-1-hdanton@sina.com>
@ 2021-03-11 12:10         ` Paul Cercueil
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Cercueil @ 2021-03-11 12:10 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Christoph Hellwig, Daniel Vetter, dri-devel, linux-kernel



Le jeu. 11 mars 2021 à 10:27, Hillf Danton <hdanton@sina.com> a écrit 
:
> On Wed, 10 Mar 2021 19:01:01 +0000 Paul Cercueil wrote:
>> Le lun. 8 mars 2021  11:47, Hillf Danton <hdanton@sina.com> a crit :
>>>  On Sun,  7 Mar 2021 20:28:35 +0000  Paul Cercueil wrote:
>>>>   With the module parameter ingenic-drm.cached_gem_buffers, it is
>>>>  possible
>>>>   to specify that we want GEM buffers backed by non-coherent 
>>>> memory.
>>>> 
>>>>   This dramatically speeds up software rendering on Ingenic SoCs,
>>>>  even for
>>>>   tasks where write-combine memory should in theory be faster (e.g.
>>>>  simple
>>>>   blits).
>>> 
>>>  Wondering if it is due to the tricks at [1].
>>> 
>>>  If so, is dma_alloc_noncoherent() necessary in this patchset?
>> 
>> You confuse non-contiguous with non-coherent, which are two different
>> things.
> 
> You misunderstood me. From [1] we know coherent caching is arch thing,
> so your proposal is not mandatory on ARM IMHO - what baffles me is
> noncoherent back memory can speed up device, coherent ot not, 
> regardless
> of arch. Can you point me to the reasons behind your speedup?

Well, I did write in the cover letter that it would help *some* SoCs, 
so it is not meant to be a default setting. As you can see in the last 
patch, it is opt-in. Ingenic SoCs are MIPS SoCs, btw.

One thing I should add is that this speeds up *software* rendering (vs. 
write-combine). It makes perfect sense that it speeds up things like 
alpha-blending, but I can't explain why it speeds up standard blits. My 
guess is that cache line invalidation is just extremely fast on this 
SoC.

-Paul

>> 
>> Cheers,
>> -Paul
>> 
>>>  Christoph can you give us a concise lesson on noncoherency covering
>>>  at least
>>>  noncoherent device, noncoherent memory(used in this work), no 
>>> coherent
>>>  caching(in [1]), their links to speedup, and the thumb rule to 
>>> handle
>>>  noncoherency in workdays. It feels toe curling every time I see
>>>  noncoherence
>>>  going downtown with speedup hand in hand.
>>> 
>>>  [1] Subject: [PATCH 6/6] media: uvcvideo: Use 
>>> dma_alloc_noncontiguos
>>>  API
>>>  https://lore.kernel.org/lkml/20210301085236.947011-7-hch@lst.de/#t
>>> 
>>>> 
>>>>   Leave it disabled by default, since it is specific to one 
>>>> use-case
>>>>   (software rendering).
>>>> 
>>>>   v2: Rework code to work with new DRM APIs regarding plane states
>>>> 
>>>>   Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>>>>   ---
>>>>    drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 49
>>>>  ++++++++++++++++++++++-
>>>>    drivers/gpu/drm/ingenic/ingenic-drm.h     |  4 ++
>>>>    drivers/gpu/drm/ingenic/ingenic-ipu.c     | 14 ++++++-
>>>>    3 files changed, 63 insertions(+), 4 deletions(-)
>>>> 
>>>>   diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>>>  b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>>>   index d60e1eefc9d1..ba1ac0fcda74 100644
>>>>   --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>>>   +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>>>>   @@ -9,6 +9,7 @@
>>>>    #include <linux/component.h>
>>>>    #include <linux/clk.h>
>>>>    #include <linux/dma-mapping.h>
>>>>   +#include <linux/io.h>
>>>>    #include <linux/module.h>
>>>>    #include <linux/mutex.h>
>>>>    #include <linux/of_device.h>
>>>>   @@ -23,6 +24,7 @@
>>>>    #include <drm/drm_color_mgmt.h>
>>>>    #include <drm/drm_crtc.h>
>>>>    #include <drm/drm_crtc_helper.h>
>>>>   +#include <drm/drm_damage_helper.h>
>>>>    #include <drm/drm_drv.h>
>>>>    #include <drm/drm_gem_cma_helper.h>
>>>>    #include <drm/drm_fb_cma_helper.h>
>>>>   @@ -99,6 +101,11 @@ struct ingenic_drm {
>>>>    	struct notifier_block clock_nb;
>>>>    };
>>>> 
>>>>   +static bool ingenic_drm_cached_gem_buf;
>>>>   +module_param_named(cached_gem_buffers, 
>>>> ingenic_drm_cached_gem_buf,
>>>>  bool, 0400);
>>>>   +MODULE_PARM_DESC(cached_gem_buffers,
>>>>   +		 "Enable fully cached GEM buffers [default=false]");
>>>>   +
>>>>    static bool ingenic_drm_writeable_reg(struct device *dev, 
>>>> unsigned
>>>>  int reg)
>>>>    {
>>>>    	switch (reg) {
>>>>   @@ -410,6 +417,8 @@ static int
>>>>  ingenic_drm_plane_atomic_check(struct drm_plane *plane,
>>>>    	     old_plane_state->fb->format->format !=
>>>>  new_plane_state->fb->format->format))
>>>>    		crtc_state->mode_changed = true;
>>>> 
>>>>   +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
>>>>   +
>>>>    	return 0;
>>>>    }
>>>> 
>>>>   @@ -541,10 +550,20 @@ static void 
>>>> ingenic_drm_update_palette(struct
>>>>  ingenic_drm *priv,
>>>>    	}
>>>>    }
>>>> 
>>>>   +void ingenic_drm_sync_data(struct device *dev,
>>>>   +			   struct drm_plane_state *old_state,
>>>>   +			   struct drm_plane_state *state)
>>>>   +{
>>>>   +	if (ingenic_drm_cached_gem_buf)
>>>>   +		drm_gem_cma_sync_data(dev, old_state, state);
>>>>   +}
>>>>   +
>>>>    static void ingenic_drm_plane_atomic_update(struct drm_plane
>>>>  *plane,
>>>>    					    struct drm_atomic_state *state)
>>>>    {
>>>>    	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
>>>>   +	struct drm_plane_state *oldstate =
>>>>  drm_atomic_get_old_plane_state(state,
>>>>   +									  plane);
>>>>    	struct drm_plane_state *newstate =
>>>>  drm_atomic_get_new_plane_state(state,
>>>>    									  plane);
>>>>    	struct drm_crtc_state *crtc_state;
>>>>   @@ -554,6 +573,8 @@ static void
>>>>  ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>>>>    	u32 fourcc;
>>>> 
>>>>    	if (newstate && newstate->fb) {
>>>>   +		ingenic_drm_sync_data(priv->dev, oldstate, newstate);
>>>>   +
>>>>    		crtc_state = newstate->crtc->state;
>>>> 
>>>>    		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>>>>   @@ -743,6 +764,26 @@ static void 
>>>> ingenic_drm_disable_vblank(struct
>>>>  drm_crtc *crtc)
>>>>    	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
>>>>  JZ_LCD_CTRL_EOF_IRQ, 0);
>>>>    }
>>>> 
>>>>   +static struct drm_framebuffer *
>>>>   +ingenic_drm_gem_fb_create(struct drm_device *dev, struct 
>>>> drm_file
>>>>  *file,
>>>>   +			  const struct drm_mode_fb_cmd2 *mode_cmd)
>>>>   +{
>>>>   +	if (ingenic_drm_cached_gem_buf)
>>>>   +		return drm_gem_fb_create_with_dirty(dev, file, mode_cmd);
>>>>   +
>>>>   +	return drm_gem_fb_create(dev, file, mode_cmd);
>>>>   +}
>>>>   +
>>>>   +static int ingenic_drm_gem_cma_dumb_create(struct drm_file
>>>>  *file_priv,
>>>>   +					   struct drm_device *drm,
>>>>   +					   struct drm_mode_create_dumb *args)
>>>>   +{
>>>>   +	if (ingenic_drm_cached_gem_buf)
>>>>   +		return drm_gem_cma_dumb_create_noncoherent(file_priv, drm, 
>>>> args);
>>>>   +
>>>>   +	return drm_gem_cma_dumb_create(file_priv, drm, args);
>>>>   +}
>>>>   +
>>>>    DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>>>> 
>>>>    static const struct drm_driver ingenic_drm_driver_data = {
>>>>   @@ -755,7 +796,7 @@ static const struct drm_driver
>>>>  ingenic_drm_driver_data = {
>>>>    	.patchlevel		= 0,
>>>> 
>>>>    	.fops			= &ingenic_drm_fops,
>>>>   -	DRM_GEM_CMA_DRIVER_OPS,
>>>> 
>>>>  
>>>> +	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(ingenic_drm_gem_cma_dumb_create),
>>>> 
>>>>    	.irq_handler		= ingenic_drm_irq_handler,
>>>>    };
>>>>   @@ -805,7 +846,7 @@ static const struct drm_encoder_helper_funcs
>>>>  ingenic_drm_encoder_helper_funcs =
>>>>    };
>>>> 
>>>>    static const struct drm_mode_config_funcs
>>>>  ingenic_drm_mode_config_funcs = {
>>>>   -	.fb_create		= drm_gem_fb_create,
>>>>   +	.fb_create		= ingenic_drm_gem_fb_create,
>>>>    	.output_poll_changed	= drm_fb_helper_output_poll_changed,
>>>>    	.atomic_check		= drm_atomic_helper_check,
>>>>    	.atomic_commit		= drm_atomic_helper_commit,
>>>>   @@ -962,6 +1003,8 @@ static int ingenic_drm_bind(struct device
>>>>  *dev, bool has_components)
>>>>    		return ret;
>>>>    	}
>>>> 
>>>>   +	drm_plane_enable_fb_damage_clips(&priv->f1);
>>>>   +
>>>>    	drm_crtc_helper_add(&priv->crtc, 
>>>> &ingenic_drm_crtc_helper_funcs);
>>>> 
>>>>    	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
>>>>   @@ -990,6 +1033,8 @@ static int ingenic_drm_bind(struct device
>>>>  *dev, bool has_components)
>>>>    			return ret;
>>>>    		}
>>>> 
>>>>   +		drm_plane_enable_fb_damage_clips(&priv->f0);
>>>>   +
>>>>    		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
>>>>    			ret = component_bind_all(dev, drm);
>>>>    			if (ret) {
>>>>   diff --git a/drivers/gpu/drm/ingenic/ingenic-drm.h
>>>>  b/drivers/gpu/drm/ingenic/ingenic-drm.h
>>>>   index 1b4347f7f084..b6bca356e024 100644
>>>>   --- a/drivers/gpu/drm/ingenic/ingenic-drm.h
>>>>   +++ b/drivers/gpu/drm/ingenic/ingenic-drm.h
>>>>   @@ -185,6 +185,10 @@ void ingenic_drm_plane_config(struct device
>>>>  *dev,
>>>>    			      struct drm_plane *plane, u32 fourcc);
>>>>    void ingenic_drm_plane_disable(struct device *dev, struct
>>>>  drm_plane *plane);
>>>> 
>>>>   +void ingenic_drm_sync_data(struct device *dev,
>>>>   +			   struct drm_plane_state *old_state,
>>>>   +			   struct drm_plane_state *state);
>>>>   +
>>>>    extern struct platform_driver *ingenic_ipu_driver_ptr;
>>>> 
>>>>    #endif /* DRIVERS_GPU_DRM_INGENIC_INGENIC_DRM_H */
>>>>   diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>>>  b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>>>   index 5ae6adab8306..7826eab044ba 100644
>>>>   --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>>>   +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>>>>   @@ -20,6 +20,7 @@
>>>> 
>>>>    #include <drm/drm_atomic.h>
>>>>    #include <drm/drm_atomic_helper.h>
>>>>   +#include <drm/drm_damage_helper.h>
>>>>    #include <drm/drm_drv.h>
>>>>    #include <drm/drm_fb_cma_helper.h>
>>>>    #include <drm/drm_fourcc.h>
>>>>   @@ -285,6 +286,8 @@ static void
>>>>  ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>>>>    					    struct drm_atomic_state *state)
>>>>    {
>>>>    	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
>>>>   +	struct drm_plane_state *oldstate =
>>>>  drm_atomic_get_old_plane_state(state,
>>>>   +									  plane);
>>>>    	struct drm_plane_state *newstate =
>>>>  drm_atomic_get_new_plane_state(state,
>>>>    									  plane);
>>>>    	const struct drm_format_info *finfo;
>>>>   @@ -317,6 +320,8 @@ static void
>>>>  ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>>>>    				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
>>>>    	}
>>>> 
>>>>   +	ingenic_drm_sync_data(ipu->master, oldstate, newstate);
>>>>   +
>>>>    	/* New addresses will be committed in vblank handler... */
>>>>    	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 
>>>> 0);
>>>>    	if (finfo->num_planes > 1)
>>>>   @@ -541,7 +546,7 @@ static int
>>>>  ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>>>> 
>>>>    	if (!new_plane_state->crtc ||
>>>>    	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
>>>>   -		return 0;
>>>>   +		goto out_check_damage;
>>>> 
>>>>    	/* Plane must be fully visible */
>>>>    	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 
>>>> ||
>>>>   @@ -558,7 +563,7 @@ static int
>>>>  ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>>>>    		return -EINVAL;
>>>> 
>>>>    	if (!osd_changed(new_plane_state, old_plane_state))
>>>>   -		return 0;
>>>>   +		goto out_check_damage;
>>>> 
>>>>    	crtc_state->mode_changed = true;
>>>> 
>>>>   @@ -592,6 +597,9 @@ static int
>>>>  ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>>>>    	ipu->denom_w = denom_w;
>>>>    	ipu->denom_h = denom_h;
>>>> 
>>>>   +out_check_damage:
>>>>   +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
>>>>   +
>>>>    	return 0;
>>>>    }
>>>> 
>>>>   @@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device 
>>>> *dev,
>>>>  struct device *master, void *d)
>>>>    		return err;
>>>>    	}
>>>> 
>>>>   +	drm_plane_enable_fb_damage_clips(plane);
>>>>   +
>>>>    	/*
>>>>    	 * Sharpness settings range is [0,32]
>>>>    	 * 0       : nearest-neighbor
>>>>   --
>>>>   2.30.1



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent
  2021-03-07 20:28 ` [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent Paul Cercueil
@ 2021-03-11 12:23   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-03-11 12:23 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips

> +static struct drm_gem_cma_object *
> +drm_gem_cma_create_with_cache_param(struct drm_device *drm,
> +				    size_t size,
> +				    bool noncoherent)

Does this helper really make much sense?  You basically have two
function calls in it, out of which one is dependend on the parameter.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent
  2021-03-07 20:28 ` [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent Paul Cercueil
@ 2021-03-11 12:26   ` Christoph Hellwig
  2021-03-11 12:32     ` Paul Cercueil
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-03-11 12:26 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips

> +int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj,
> +				 struct vm_area_struct *vma)
> +{
> +	struct drm_gem_cma_object *cma_obj;
> +	unsigned long pfn;
> +	int ret;
> +
> +	/*
> +	 * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and set the
> +	 * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
> +	 * the whole buffer.
> +	 */
> +	vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
> +	vma->vm_flags &= ~VM_PFNMAP;
> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +
> +	cma_obj = to_drm_gem_cma_obj(obj);
> +
> +	pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev, cma_obj->paddr));
> +
> +	ret = remap_pfn_range(vma, vma->vm_start, pfn,
> +			      vma->vm_end - vma->vm_start,
> +			      vma->vm_page_prot);

dma_to_phys must not be used by drivers.

I have a proper helper for this waiting for users:

http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952

If you can confirm the helpers works for you I can try to still sneak
it to Linus for 5.12 to ease the merge pain.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data
  2021-03-07 20:28 ` [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
@ 2021-03-11 12:28   ` Christoph Hellwig
  2021-03-11 12:33     ` Paul Cercueil
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-03-11 12:28 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips

On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
> +	drm_atomic_for_each_plane_damage(&iter, &clip) {
> +		for (i = 0; i < finfo->num_planes; i++) {
> +			daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
> +
> +			/* Ignore x1/x2 values, invalidate complete lines */
> +			offset = clip.y1 * state->fb->pitches[i];
> +
> +			dma_sync_single_for_device(dev, daddr + offset,
> +				       (clip.y2 - clip.y1) * state->fb->pitches[i],
> +				       DMA_TO_DEVICE);

Are these helpers only ever used to transfer data to the device and
never from it?  If so please clearly document that.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers
  2021-03-07 20:28 ` [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
       [not found]   ` <20210308034727.1951-1-hdanton@sina.com>
@ 2021-03-11 12:30   ` Christoph Hellwig
  2021-03-11 13:40     ` Paul Cercueil
  1 sibling, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-03-11 12:30 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips

On Sun, Mar 07, 2021 at 08:28:35PM +0000, Paul Cercueil wrote:
> With the module parameter ingenic-drm.cached_gem_buffers, it is possible
> to specify that we want GEM buffers backed by non-coherent memory.

Shouldn't there be a way to discover this through a DT property?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent
  2021-03-11 12:26   ` Christoph Hellwig
@ 2021-03-11 12:32     ` Paul Cercueil
  2021-03-11 12:36       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-11 12:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips

Hi Christoph,

Le jeu. 11 mars 2021 à 12:26, Christoph Hellwig <hch@infradead.org> a 
écrit :
>>  +int drm_gem_cma_mmap_noncoherent(struct drm_gem_object *obj,
>>  +				 struct vm_area_struct *vma)
>>  +{
>>  +	struct drm_gem_cma_object *cma_obj;
>>  +	unsigned long pfn;
>>  +	int ret;
>>  +
>>  +	/*
>>  +	 * Clear the VM_PFNMAP flag that was set by drm_gem_mmap(), and 
>> set the
>>  +	 * vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want 
>> to map
>>  +	 * the whole buffer.
>>  +	 */
>>  +	vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
>>  +	vma->vm_flags &= ~VM_PFNMAP;
>>  +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
>>  +
>>  +	cma_obj = to_drm_gem_cma_obj(obj);
>>  +
>>  +	pfn = PHYS_PFN(dma_to_phys(cma_obj->base.dev->dev, 
>> cma_obj->paddr));
>>  +
>>  +	ret = remap_pfn_range(vma, vma->vm_start, pfn,
>>  +			      vma->vm_end - vma->vm_start,
>>  +			      vma->vm_page_prot);
> 
> dma_to_phys must not be used by drivers.
> 
> I have a proper helper for this waiting for users:
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952
> 
> If you can confirm the helpers works for you I can try to still sneak
> it to Linus for 5.12 to ease the merge pain.

I can try. How do I get a page pointer from a dma_addr_t?

-Paul



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data
  2021-03-11 12:28   ` Christoph Hellwig
@ 2021-03-11 12:33     ` Paul Cercueil
  2021-03-15  7:43       ` Thomas Zimmermann
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-11 12:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips



Le jeu. 11 mars 2021 à 12:28, Christoph Hellwig <hch@infradead.org> a 
écrit :
> On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
>>  +	drm_atomic_for_each_plane_damage(&iter, &clip) {
>>  +		for (i = 0; i < finfo->num_planes; i++) {
>>  +			daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
>>  +
>>  +			/* Ignore x1/x2 values, invalidate complete lines */
>>  +			offset = clip.y1 * state->fb->pitches[i];
>>  +
>>  +			dma_sync_single_for_device(dev, daddr + offset,
>>  +				       (clip.y2 - clip.y1) * state->fb->pitches[i],
>>  +				       DMA_TO_DEVICE);
> 
> Are these helpers only ever used to transfer data to the device and
> never from it?  If so please clearly document that.

Yes. In the DRM world, are there cases where we transfer data from the 
device? I assume these cases are handled by v4l2 instead.

-Paul



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent
  2021-03-11 12:32     ` Paul Cercueil
@ 2021-03-11 12:36       ` Christoph Hellwig
  2021-03-11 16:12         ` Paul Cercueil
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2021-03-11 12:36 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Christoph Hellwig, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Sam Ravnborg, od,
	dri-devel, linux-kernel, linux-mips

On Thu, Mar 11, 2021 at 12:32:27PM +0000, Paul Cercueil wrote:
> > dma_to_phys must not be used by drivers.
> > 
> > I have a proper helper for this waiting for users:
> > 
> > http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952
> > 
> > If you can confirm the helpers works for you I can try to still sneak
> > it to Linus for 5.12 to ease the merge pain.
> 
> I can try. How do I get a page pointer from a dma_addr_t?

You don't - you get it from using virt_to_page on the pointer returned
from dma_alloc_noncoherent.  That beind said to keep the API sane I
should probably add a wrapper that does that for you.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers
  2021-03-11 12:30   ` Christoph Hellwig
@ 2021-03-11 13:40     ` Paul Cercueil
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Cercueil @ 2021-03-11 13:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips



Le jeu. 11 mars 2021 à 12:30, Christoph Hellwig <hch@infradead.org> a 
écrit :
> On Sun, Mar 07, 2021 at 08:28:35PM +0000, Paul Cercueil wrote:
>>  With the module parameter ingenic-drm.cached_gem_buffers, it is 
>> possible
>>  to specify that we want GEM buffers backed by non-coherent memory.
> 
> Shouldn't there be a way to discover this through a DT property?

Good question. My original way of thinking was that as this feature 
speeds up only software rendering, this is really 
application-dependent: a modern desktop where everything is rendered 
via the GPU wouldn't benefit much from it. With that in mind, it is 
fine as a module option.

On the other hand... the "software rendering is faster with 
non-coherent buffers" really is a SoC property, since it is only true 
for some generations of Ingenic SoCs and not others. So it would make 
sense to have a DT property for it.

-Paul



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent
  2021-03-11 12:36       ` Christoph Hellwig
@ 2021-03-11 16:12         ` Paul Cercueil
  2021-03-12 16:36           ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Cercueil @ 2021-03-11 16:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips



Le jeu. 11 mars 2021 à 12:36, Christoph Hellwig <hch@infradead.org> a 
écrit :
> On Thu, Mar 11, 2021 at 12:32:27PM +0000, Paul Cercueil wrote:
>>  > dma_to_phys must not be used by drivers.
>>  >
>>  > I have a proper helper for this waiting for users:
>>  >
>>  > 
>> http://git.infradead.org/users/hch/misc.git/commitdiff/96a546e7229ec53aadbdb7936d1e5e6cb5958952
>>  >
>>  > If you can confirm the helpers works for you I can try to still 
>> sneak
>>  > it to Linus for 5.12 to ease the merge pain.
>> 
>>  I can try. How do I get a page pointer from a dma_addr_t?
> 
> You don't - you get it from using virt_to_page on the pointer returned
> from dma_alloc_noncoherent.  That beind said to keep the API sane I
> should probably add a wrapper that does that for you.

I tested using:

ret = dma_mmap_pages(cma_obj->base.dev->dev,
                     vma, vma->vm_end - vma->vm_start,
                     virt_to_page(cma_obj->vaddr));

It works fine.

I think I can use remap_pfn_range() for now, and switch to your new API 
once it's available in drm-misc-next.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent
  2021-03-11 16:12         ` Paul Cercueil
@ 2021-03-12 16:36           ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2021-03-12 16:36 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Christoph Hellwig, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Daniel Vetter, Sam Ravnborg, od,
	dri-devel, linux-kernel, linux-mips

On Thu, Mar 11, 2021 at 04:12:55PM +0000, Paul Cercueil wrote:
> ret = dma_mmap_pages(cma_obj->base.dev->dev,
>                     vma, vma->vm_end - vma->vm_start,
>                     virt_to_page(cma_obj->vaddr));
> 
> It works fine.
> 
> I think I can use remap_pfn_range() for now, and switch to your new API once
> it's available in drm-misc-next.

No, drivers must not use dma_to_phys, and they also must not include
dma-direct.h.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data
  2021-03-11 12:33     ` Paul Cercueil
@ 2021-03-15  7:43       ` Thomas Zimmermann
  2021-03-15 11:59         ` Paul Cercueil
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Zimmermann @ 2021-03-15  7:43 UTC (permalink / raw)
  To: Paul Cercueil, Christoph Hellwig
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sam Ravnborg, od, dri-devel, linux-kernel, linux-mips


[-- Attachment #1.1: Type: text/plain, Size: 1639 bytes --]

Hi

Am 11.03.21 um 13:33 schrieb Paul Cercueil:
> 
> 
> Le jeu. 11 mars 2021 à 12:28, Christoph Hellwig <hch@infradead.org> a 
> écrit :
>> On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
>>>  +    drm_atomic_for_each_plane_damage(&iter, &clip) {
>>>  +        for (i = 0; i < finfo->num_planes; i++) {
>>>  +            daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
>>>  +
>>>  +            /* Ignore x1/x2 values, invalidate complete lines */
>>>  +            offset = clip.y1 * state->fb->pitches[i];
>>>  +
>>>  +            dma_sync_single_for_device(dev, daddr + offset,
>>>  +                       (clip.y2 - clip.y1) * state->fb->pitches[i],
>>>  +                       DMA_TO_DEVICE);
>>
>> Are these helpers only ever used to transfer data to the device and
>> never from it?  If so please clearly document that.
> 
> Yes. In the DRM world, are there cases where we transfer data from the 
> device? I assume these cases are handled by v4l2 instead.

Software rendering (i.e., anything wrt dumb buffers) likely reads back 
framebuffer content during blit operations. For devices where this is a 
slow operation (e.g., PCI read) we set struct 
drm_mode_config.prefer_shadow to hint renderers to use shadow buffering.

Best regards
Thomas

> 
> -Paul
> 
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data
  2021-03-15  7:43       ` Thomas Zimmermann
@ 2021-03-15 11:59         ` Paul Cercueil
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Cercueil @ 2021-03-15 11:59 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Christoph Hellwig, Maarten Lankhorst, Maxime Ripard,
	David Airlie, Daniel Vetter, Sam Ravnborg, od, dri-devel,
	linux-kernel, linux-mips

Hi Thomas,

Le lun. 15 mars 2021 à 8:43, Thomas Zimmermann <tzimmermann@suse.de> a 
écrit :
> Hi
> 
> Am 11.03.21 um 13:33 schrieb Paul Cercueil:
>> 
>> 
>> Le jeu. 11 mars 2021 à 12:28, Christoph Hellwig <hch@infradead.org> 
>> a \x7fécrit :
>>> On Sun, Mar 07, 2021 at 08:28:34PM +0000, Paul Cercueil wrote:
>>>>  +    drm_atomic_for_each_plane_damage(&iter, &clip) {
>>>>  +        for (i = 0; i < finfo->num_planes; i++) {
>>>>  +            daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
>>>>  +
>>>>  +            /* Ignore x1/x2 values, invalidate complete lines */
>>>>  +            offset = clip.y1 * state->fb->pitches[i];
>>>>  +
>>>>  +            dma_sync_single_for_device(dev, daddr + offset,
>>>>  +                       (clip.y2 - clip.y1) * 
>>>> state->fb->pitches[i],
>>>>  +                       DMA_TO_DEVICE);
>>> 
>>> Are these helpers only ever used to transfer data to the device and
>>> never from it?  If so please clearly document that.
>> 
>> Yes. In the DRM world, are there cases where we transfer data from 
>> the \x7fdevice? I assume these cases are handled by v4l2 instead.
> 
> Software rendering (i.e., anything wrt dumb buffers) likely reads 
> back framebuffer content during blit operations. For devices where 
> this is a slow operation (e.g., PCI read) we set struct 
> drm_mode_config.prefer_shadow to hint renderers to use shadow 
> buffering.

This has been brought up a few times already. I answered that in the 
cover letter. In my case, *writes* (e.g. dumb memcpy) are also slower 
with a write-combine buffer than with a non-coherent buffer + cache 
sync. So a shadow buffer does nothing for me.

Cheers,
-Paul



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-03-15 12:01 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-07 20:28 [PATCH v2 0/5] Add option to mmap GEM buffers cached Paul Cercueil
2021-03-07 20:28 ` [PATCH v2 1/5] drm: Add and export function drm_gem_cma_create_noncoherent Paul Cercueil
2021-03-11 12:23   ` Christoph Hellwig
2021-03-07 20:28 ` [PATCH v2 2/5] drm: Add and export function drm_gem_cma_dumb_create_noncoherent Paul Cercueil
2021-03-07 20:28 ` [PATCH v2 3/5] drm: Add and export function drm_gem_cma_mmap_noncoherent Paul Cercueil
2021-03-11 12:26   ` Christoph Hellwig
2021-03-11 12:32     ` Paul Cercueil
2021-03-11 12:36       ` Christoph Hellwig
2021-03-11 16:12         ` Paul Cercueil
2021-03-12 16:36           ` Christoph Hellwig
2021-03-07 20:28 ` [PATCH v2 4/5] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
2021-03-11 12:28   ` Christoph Hellwig
2021-03-11 12:33     ` Paul Cercueil
2021-03-15  7:43       ` Thomas Zimmermann
2021-03-15 11:59         ` Paul Cercueil
2021-03-07 20:28 ` [PATCH v2 5/5] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
     [not found]   ` <20210308034727.1951-1-hdanton@sina.com>
2021-03-10 19:01     ` Paul Cercueil
     [not found]       ` <20210311022743.2542-1-hdanton@sina.com>
2021-03-11 12:10         ` Paul Cercueil
2021-03-11 12:30   ` Christoph Hellwig
2021-03-11 13:40     ` Paul Cercueil
2021-03-08  8:41 ` [PATCH v2 0/5] Add option to mmap GEM buffers cached Thomas Zimmermann
2021-03-10 19:02   ` Paul Cercueil
2021-03-11  7:52     ` Thomas Zimmermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).