linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/3] Add option to mmap GEM buffers cached
@ 2021-05-15 14:53 Paul Cercueil
  2021-05-15 14:53 ` [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil

Rework of this morning's v3 patchset.

The drm_gem_cma_prime_mmap() function that was added in v3 is now
replaced, instead we pass a "private" parameter to the
__drm_gem_cma_create() function.

Patches 2/3 and 3/3 are unmodified since v3.

Cheers,
-Paul

Paul Cercueil (3):
  drm: Add support for GEM buffers backed by non-coherent memory
  drm: Add and export function drm_gem_cma_sync_data
  drm/ingenic: Add option to alloc cached GEM buffers

 drivers/gpu/drm/drm_gem_cma_helper.c      | 93 ++++++++++++++++++++---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 +++++++++++++-
 drivers/gpu/drm/ingenic/ingenic-ipu.c     | 18 ++++-
 include/drm/drm_gem_cma_helper.h          |  8 ++
 4 files changed, 160 insertions(+), 15 deletions(-)

-- 
2.30.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory
  2021-05-15 14:53 [PATCH v4 0/3] Add option to mmap GEM buffers cached Paul Cercueil
@ 2021-05-15 14:53 ` Paul Cercueil
  2021-05-15 18:49   ` Thomas Zimmermann
  2021-05-15 14:53 ` [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
  2021-05-15 14:53 ` [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
  2 siblings, 1 reply; 8+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil

Having GEM buffers backed by non-coherent memory is interesting in the
particular case where it is faster to render to a non-coherent buffer
then sync the data cache, than to render to a write-combine buffer, and
(by extension) much faster than using a shadow buffer. This is true for
instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
are about three times faster using this method.

Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
can be set by the drivers when they create the dumb buffer.

Since this really only applies to software rendering, disable this flag
as soon as the CMA objects are exported via PRIME.

v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
    the objects are mapped, and use the new dma_mmap_pages function.

v4: Make sure map_noncoherent is always disabled when creating GEM
    objects meant to be used with dma-buf.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
 include/drm/drm_gem_cma_helper.h     |  3 +++
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..235c7a63da2b 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
  * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
  * @drm: DRM device
  * @size: size of the object to allocate
+ * @private: true if used for internal purposes
  *
  * This function creates and initializes a GEM CMA object of the given size,
  * but doesn't allocate any memory to back the object.
@@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
  * error code on failure.
  */
 static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
 {
 	struct drm_gem_cma_object *cma_obj;
 	struct drm_gem_object *gem_obj;
-	int ret;
+	int ret = 0;
 
 	if (drm->driver->gem_create_object)
 		gem_obj = drm->driver->gem_create_object(drm, size);
@@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
 
 	cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
 
-	ret = drm_gem_object_init(drm, gem_obj, size);
+	if (private) {
+		drm_gem_private_object_init(drm, gem_obj, size);
+
+		/* Always use writecombine for dma-buf mappings */
+		cma_obj->map_noncoherent = false;
+	} else {
+		ret = drm_gem_object_init(drm, gem_obj, size);
+	}
 	if (ret)
 		goto error;
 
@@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
 
 	size = round_up(size, PAGE_SIZE);
 
-	cma_obj = __drm_gem_cma_create(drm, size);
+	cma_obj = __drm_gem_cma_create(drm, size, false);
 	if (IS_ERR(cma_obj))
 		return cma_obj;
 
-	cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
-				      GFP_KERNEL | __GFP_NOWARN);
+	if (cma_obj->map_noncoherent) {
+		cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
+						       &cma_obj->paddr,
+						       DMA_TO_DEVICE,
+						       GFP_KERNEL | __GFP_NOWARN);
+	} else {
+		cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
+					      GFP_KERNEL | __GFP_NOWARN);
+	}
 	if (!cma_obj->vaddr) {
 		drm_dbg(drm, "failed to allocate buffer with size %zu\n",
 			 size);
@@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
 		return ERR_PTR(-EINVAL);
 
 	/* Create a CMA GEM buffer. */
-	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
 	if (IS_ERR(cma_obj))
 		return ERR_CAST(cma_obj);
 
@@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
 
 	cma_obj = to_drm_gem_cma_obj(obj);
 
-	ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
-			  cma_obj->paddr, vma->vm_end - vma->vm_start);
+	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+	if (!cma_obj->map_noncoherent)
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+	ret = dma_mmap_pages(cma_obj->base.dev->dev,
+			     vma, vma->vm_end - vma->vm_start,
+			     virt_to_page(cma_obj->vaddr));
 	if (ret)
 		drm_gem_vm_close(vma);
 
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..cd13508acbc1 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
  *       more than one entry but they are guaranteed to have contiguous
  *       DMA addresses.
  * @vaddr: kernel virtual address of the backing memory
+ * @map_noncoherent: if true, the GEM object is backed by non-coherent memory
  */
 struct drm_gem_cma_object {
 	struct drm_gem_object base;
@@ -24,6 +25,8 @@ struct drm_gem_cma_object {
 
 	/* For objects with DMA memory allocated by GEM CMA */
 	void *vaddr;
+
+	bool map_noncoherent;
 };
 
 #define to_drm_gem_cma_obj(gem_obj) \
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data
  2021-05-15 14:53 [PATCH v4 0/3] Add option to mmap GEM buffers cached Paul Cercueil
  2021-05-15 14:53 ` [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
@ 2021-05-15 14:53 ` Paul Cercueil
  2021-05-15 19:06   ` Thomas Zimmermann
  2021-05-15 14:53 ` [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
  2 siblings, 1 reply; 8+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil

This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.

v3: - Only sync data if using GEM objects backed by non-coherent memory.
    - Use a drm_device pointer instead of device pointer in prototype

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/drm_gem_cma_helper.c | 55 ++++++++++++++++++++++++++++
 include/drm/drm_gem_cma_helper.h     |  5 +++
 2 files changed, 60 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 235c7a63da2b..41f309e0e049 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -17,9 +17,14 @@
 #include <linux/slab.h>
 
 #include <drm/drm.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_device.h>
 #include <drm/drm_drv.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_fb_cma_helper.h>
+#include <drm/drm_framebuffer.h>
 #include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane.h>
 #include <drm/drm_vma_manager.h>
 
 /**
@@ -576,3 +581,53 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
 	return obj;
 }
 EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
+
+/**
+ * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
+ * @drm: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_gem_cma_sync_data(struct drm_device *drm,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state)
+{
+	const struct drm_format_info *finfo = state->fb->format;
+	struct drm_atomic_helper_damage_iter iter;
+	const struct drm_gem_cma_object *cma_obj;
+	unsigned int offset, i;
+	struct drm_rect clip;
+	dma_addr_t daddr;
+
+	for (i = 0; i < finfo->num_planes; i++) {
+		cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
+
+		if (cma_obj->map_noncoherent)
+			break;
+	}
+
+	/* No non-coherent buffers - no need to sync anything. */
+	if (i == finfo->num_planes)
+		return;
+
+	drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+	drm_atomic_for_each_plane_damage(&iter, &clip) {
+		for (i = 0; i < finfo->num_planes; i++) {
+			daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+
+			/* Ignore x1/x2 values, invalidate complete lines */
+			offset = clip.y1 * state->fb->pitches[i];
+
+			dma_sync_single_for_device(drm->dev, daddr + offset,
+				       (clip.y2 - clip.y1) * state->fb->pitches[i],
+				       DMA_TO_DEVICE);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index cd13508acbc1..76af066ae3a7 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -7,6 +7,7 @@
 #include <drm/drm_gem.h>
 
 struct drm_mode_create_dumb;
+struct drm_plane_state;
 
 /**
  * struct drm_gem_cma_object - GEM object backed by CMA memory allocations
@@ -185,4 +186,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
 				       struct dma_buf_attachment *attach,
 				       struct sg_table *sgt);
 
+void drm_gem_cma_sync_data(struct drm_device *drm,
+			   struct drm_plane_state *old_state,
+			   struct drm_plane_state *state);
+
 #endif /* __DRM_GEM_CMA_HELPER_H__ */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
  2021-05-15 14:53 [PATCH v4 0/3] Add option to mmap GEM buffers cached Paul Cercueil
  2021-05-15 14:53 ` [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
  2021-05-15 14:53 ` [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
@ 2021-05-15 14:53 ` Paul Cercueil
  2021-05-15 19:42   ` Thomas Zimmermann
  2 siblings, 1 reply; 8+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
	Paul Cercueil

Alloc GEM buffers backed by noncoherent memory on SoCs where it is
actually faster than write-combine.

This dramatically speeds up software rendering on these SoCs, even for
tasks where write-combine memory should in theory be faster (e.g. simple
blits).

v3: The option is now selected per-SoC instead of being a module
    parameter.

Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 ++++++++++++++++++++++-
 drivers/gpu/drm/ingenic/ingenic-ipu.c     | 18 ++++++--
 2 files changed, 68 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 09225b770bb8..5f64e8583eec 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -9,6 +9,7 @@
 #include <linux/component.h>
 #include <linux/clk.h>
 #include <linux/dma-mapping.h>
+#include <linux/io.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/of_device.h>
@@ -23,6 +24,7 @@
 #include <drm/drm_color_mgmt.h>
 #include <drm/drm_crtc.h>
 #include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_encoder.h>
 #include <drm/drm_gem_cma_helper.h>
@@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
 struct jz_soc_info {
 	bool needs_dev_clk;
 	bool has_osd;
+	bool map_noncoherent;
 	unsigned int max_width, max_height;
 	const u32 *formats_f0, *formats_f1;
 	unsigned int num_formats_f0, num_formats_f1;
@@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
 	     old_plane_state->fb->format->format != new_plane_state->fb->format->format))
 		crtc_state->mode_changed = true;
 
+	drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
 	return 0;
 }
 
@@ -544,8 +549,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
 					    struct drm_atomic_state *state)
 {
 	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
-	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
-									  plane);
+	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
 	struct drm_crtc_state *crtc_state;
 	struct ingenic_dma_hwdesc *hwdesc;
 	unsigned int width, height, cpp, offset;
@@ -553,6 +558,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
 	u32 fourcc;
 
 	if (newstate && newstate->fb) {
+		drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
+
 		crtc_state = newstate->crtc->state;
 
 		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
 	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
 }
 
+static int ingenic_drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+					     struct drm_file *file_priv,
+					     unsigned int flags,
+					     unsigned int color,
+					     struct drm_clip_rect *clips,
+					     unsigned int num_clips)
+{
+	struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
+
+	if (!priv->soc_info->map_noncoherent)
+		return 0;
+
+	return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
+					 color, clips, num_clips);
+}
+
+static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs = {
+	.destroy	= drm_gem_fb_destroy,
+	.create_handle	= drm_gem_fb_create_handle,
+	.dirty          = ingenic_drm_atomic_helper_dirtyfb,
+};
+
+static struct drm_gem_object *
+ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
+{
+	struct ingenic_drm *priv = drm_device_get_priv(drm);
+	struct drm_gem_cma_object *obj;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return ERR_PTR(-ENOMEM);
+
+	obj->map_noncoherent = priv->soc_info->map_noncoherent;
+
+	return &obj->base;
+}
+
 DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
 
 static const struct drm_driver ingenic_drm_driver_data = {
@@ -754,6 +798,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
 	.patchlevel		= 0,
 
 	.fops			= &ingenic_drm_fops,
+	.gem_create_object	= ingenic_drm_gem_create_object,
 	DRM_GEM_CMA_DRIVER_OPS,
 
 	.irq_handler		= ingenic_drm_irq_handler,
@@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
 		return ret;
 	}
 
+	drm_plane_enable_fb_damage_clips(&priv->f1);
+
 	drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
 
 	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
 			return ret;
 		}
 
+		drm_plane_enable_fb_damage_clips(&priv->f0);
+
 		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
 			ret = component_bind_all(dev, drm);
 			if (ret) {
@@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
 static const struct jz_soc_info jz4740_soc_info = {
 	.needs_dev_clk = true,
 	.has_osd = false,
+	.map_noncoherent = false,
 	.max_width = 800,
 	.max_height = 600,
 	.formats_f1 = jz4740_formats,
@@ -1255,6 +1305,7 @@ static const struct jz_soc_info jz4740_soc_info = {
 static const struct jz_soc_info jz4725b_soc_info = {
 	.needs_dev_clk = false,
 	.has_osd = true,
+	.map_noncoherent = false,
 	.max_width = 800,
 	.max_height = 600,
 	.formats_f1 = jz4725b_formats_f1,
@@ -1266,6 +1317,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
 static const struct jz_soc_info jz4770_soc_info = {
 	.needs_dev_clk = false,
 	.has_osd = true,
+	.map_noncoherent = true,
 	.max_width = 1280,
 	.max_height = 720,
 	.formats_f1 = jz4770_formats_f1,
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 3b1091e7c0cd..a4d1b500c3ad 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -20,10 +20,13 @@
 
 #include <drm/drm_atomic.h>
 #include <drm/drm_atomic_helper.h>
+#include <drm/drm_damage_helper.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_fb_cma_helper.h>
 #include <drm/drm_fourcc.h>
 #include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_gem_framebuffer_helper.h>
 #include <drm/drm_plane.h>
 #include <drm/drm_plane_helper.h>
 #include <drm/drm_property.h>
@@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
 					    struct drm_atomic_state *state)
 {
 	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
-	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
-									  plane);
+	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+	struct drm_plane_state *oldstate = drm_atomic_get_new_plane_state(state, plane);
 	const struct drm_format_info *finfo;
 	u32 ctrl, stride = 0, coef_index = 0, format = 0;
 	bool needs_modeset, upscaling_w, upscaling_h;
@@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
 				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
 	}
 
+	drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
+
 	/* New addresses will be committed in vblank handler... */
 	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
 	if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 
 	if (!new_plane_state->crtc ||
 	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
-		return 0;
+		goto out_check_damage;
 
 	/* Plane must be fully visible */
 	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 		return -EINVAL;
 
 	if (!osd_changed(new_plane_state, old_plane_state))
-		return 0;
+		goto out_check_damage;
 
 	crtc_state->mode_changed = true;
 
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
 	ipu->denom_w = denom_w;
 	ipu->denom_h = denom_h;
 
+out_check_damage:
+	drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
 	return 0;
 }
 
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d)
 		return err;
 	}
 
+	drm_plane_enable_fb_damage_clips(plane);
+
 	/*
 	 * Sharpness settings range is [0,32]
 	 * 0       : nearest-neighbor
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory
  2021-05-15 14:53 ` [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
@ 2021-05-15 18:49   ` Thomas Zimmermann
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 18:49 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips


[-- Attachment #1.1: Type: text/plain, Size: 5916 bytes --]



Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> Having GEM buffers backed by non-coherent memory is interesting in the
> particular case where it is faster to render to a non-coherent buffer
> then sync the data cache, than to render to a write-combine buffer, and
> (by extension) much faster than using a shadow buffer. This is true for
> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
> are about three times faster using this method.
> 
> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
> can be set by the drivers when they create the dumb buffer.
> 
> Since this really only applies to software rendering, disable this flag
> as soon as the CMA objects are exported via PRIME.
> 
> v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
>      the objects are mapped, and use the new dma_mmap_pages function.
> 
> v4: Make sure map_noncoherent is always disabled when creating GEM
>      objects meant to be used with dma-buf.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>

Acked-by: Thomas Zimmermann <tzimmermann@suse.de>

> ---
>   drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
>   include/drm/drm_gem_cma_helper.h     |  3 +++
>   2 files changed, 32 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 7942cf05cd93..235c7a63da2b 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
>    * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
>    * @drm: DRM device
>    * @size: size of the object to allocate
> + * @private: true if used for internal purposes
>    *
>    * This function creates and initializes a GEM CMA object of the given size,
>    * but doesn't allocate any memory to back the object.
> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
>    * error code on failure.
>    */
>   static struct drm_gem_cma_object *
> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
>   {
>   	struct drm_gem_cma_object *cma_obj;
>   	struct drm_gem_object *gem_obj;
> -	int ret;
> +	int ret = 0;
>   
>   	if (drm->driver->gem_create_object)
>   		gem_obj = drm->driver->gem_create_object(drm, size);
> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
>   
>   	cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
>   
> -	ret = drm_gem_object_init(drm, gem_obj, size);
> +	if (private) {
> +		drm_gem_private_object_init(drm, gem_obj, size);
> +
> +		/* Always use writecombine for dma-buf mappings */
> +		cma_obj->map_noncoherent = false;
> +	} else {
> +		ret = drm_gem_object_init(drm, gem_obj, size);
> +	}
>   	if (ret)
>   		goto error;
>   
> @@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
>   
>   	size = round_up(size, PAGE_SIZE);
>   
> -	cma_obj = __drm_gem_cma_create(drm, size);
> +	cma_obj = __drm_gem_cma_create(drm, size, false);
>   	if (IS_ERR(cma_obj))
>   		return cma_obj;
>   
> -	cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> -				      GFP_KERNEL | __GFP_NOWARN);
> +	if (cma_obj->map_noncoherent) {
> +		cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
> +						       &cma_obj->paddr,
> +						       DMA_TO_DEVICE,
> +						       GFP_KERNEL | __GFP_NOWARN);
> +	} else {
> +		cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> +					      GFP_KERNEL | __GFP_NOWARN);
> +	}
>   	if (!cma_obj->vaddr) {
>   		drm_dbg(drm, "failed to allocate buffer with size %zu\n",
>   			 size);
> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device 
*dev,
>   		return ERR_PTR(-EINVAL);
>   
>   	/* Create a CMA GEM buffer. */
> -	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> +	cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
>   	if (IS_ERR(cma_obj))
>   		return ERR_CAST(cma_obj);
>   
> @@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
>   
>   	cma_obj = to_drm_gem_cma_obj(obj);
>   
> -	ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
> -			  cma_obj->paddr, vma->vm_end - vma->vm_start);
> +	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> +	if (!cma_obj->map_noncoherent)
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> +	ret = dma_mmap_pages(cma_obj->base.dev->dev,
> +			     vma, vma->vm_end - vma->vm_start,
> +			     virt_to_page(cma_obj->vaddr));
>   	if (ret)
>   		drm_gem_vm_close(vma);
>   
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index 0a9711caa3e8..cd13508acbc1 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
>    *       more than one entry but they are guaranteed to have contiguous
>    *       DMA addresses.
>    * @vaddr: kernel virtual address of the backing memory
> + * @map_noncoherent: if true, the GEM object is backed by non-coherent 
memory
>    */
>   struct drm_gem_cma_object {
>   	struct drm_gem_object base;
> @@ -24,6 +25,8 @@ struct drm_gem_cma_object {
>   
>   	/* For objects with DMA memory allocated by GEM CMA */
>   	void *vaddr;
> +
> +	bool map_noncoherent;
>   };
>   
>   #define to_drm_gem_cma_obj(gem_obj) \
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data
  2021-05-15 14:53 ` [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
@ 2021-05-15 19:06   ` Thomas Zimmermann
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 19:06 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips


[-- Attachment #1.1: Type: text/plain, Size: 5094 bytes --]

Hi

Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> This function can be used by drivers that use damage clips and have
> CMA GEM objects backed by non-coherent memory. Calling this function
> in a plane's .atomic_update ensures that all the data in the backing
> memory have been written to RAM.
> 
> v3: - Only sync data if using GEM objects backed by non-coherent memory.
>      - Use a drm_device pointer instead of device pointer in prototype
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>   drivers/gpu/drm/drm_gem_cma_helper.c | 55 ++++++++++++++++++++++++++++
>   include/drm/drm_gem_cma_helper.h     |  5 +++
>   2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 235c7a63da2b..41f309e0e049 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -17,9 +17,14 @@
>   #include <linux/slab.h>
>   
>   #include <drm/drm.h>
> +#include <drm/drm_damage_helper.h>
>   #include <drm/drm_device.h>
>   #include <drm/drm_drv.h>
> +#include <drm/drm_fourcc.h>
> +#include <drm/drm_fb_cma_helper.h>

Alphabetical order:

fb < fourcc

> +#include <drm/drm_framebuffer.h>
>   #include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_plane.h>
>   #include <drm/drm_vma_manager.h>
>   
>   /**
> @@ -576,3 +581,53 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
>   	return obj;
>   }
>   EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
> +
> +/**
> + * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
> + * @drm: DRM device
> + * @old_state: Old plane state
> + * @state: New plane state
> + *
> + * This function can be used by drivers that use damage clips and have
> + * CMA GEM objects backed by non-coherent memory. Calling this function
> + * in a plane's .atomic_update ensures that all the data in the backing
> + * memory have been written to RAM.
> + */
> +void drm_gem_cma_sync_data(struct drm_device *drm,
> +			   struct drm_plane_state *old_state,
> +			   struct drm_plane_state *state)
> +{
> +	const struct drm_format_info *finfo = state->fb->format;
> +	struct drm_atomic_helper_damage_iter iter;
> +	const struct drm_gem_cma_object *cma_obj;
> +	unsigned int offset, i;
> +	struct drm_rect clip;
> +	dma_addr_t daddr;
> +
> +	for (i = 0; i < finfo->num_planes; i++) {
> +		cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
> +
> +		if (cma_obj->map_noncoherent)
> +			break;
> +	}
> +
> +	/* No non-coherent buffers - no need to sync anything. */
> +	if (i == finfo->num_planes)
> +		return;
> +
> +	drm_atomic_helper_damage_iter_init(&iter, old_state, state);
> +
> +	drm_atomic_for_each_plane_damage(&iter, &clip) {
> +		for (i = 0; i < finfo->num_planes; i++) {
> +			daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
> +
> +			/* Ignore x1/x2 values, invalidate complete lines */
> +			offset = clip.y1 * state->fb->pitches[i];
> +
> +			dma_sync_single_for_device(drm->dev, daddr + offset,
> +				       (clip.y2 - clip.y1) * state->fb->pitches[i],
> +				       DMA_TO_DEVICE);

A framebuffer can have multiple BOs with different coherency. The 
current loop syncs every BO, but you only have to sync non-coherent memory.

I suggest to merge the above test loop into this sync loop, such that 
only non-coherent BOs get synced

damage_iter_init(iter)

for_each_damage_plane(iter) {
   for (i < finfo->num_planes) {
     cma_obj = drm_fb_cma_get_gem_obj(i)
     if (!cma_obj->non_coherent)
       continue;
     dma_sync_single_for_device()
   }
}

For cache locality, it might be better to exchange the loops:

for (i < finfo->num_planes) {

   damage_iter_init(iter)
   for_each_damage_plane(iter) {


   }
}

This way, you operate on the BOs one by one.

> +		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index cd13508acbc1..76af066ae3a7 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -7,6 +7,7 @@
>   #include <drm/drm_gem.h>
>   
>   struct drm_mode_create_dumb;
> +struct drm_plane_state;
>   
>   /**
>    * struct drm_gem_cma_object - GEM object backed by CMA memory allocations
> @@ -185,4 +186,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
>   				       struct dma_buf_attachment *attach,
>   				       struct sg_table *sgt);
>   
> +void drm_gem_cma_sync_data(struct drm_device *drm,
> +			   struct drm_plane_state *old_state,
> +			   struct drm_plane_state *state);
> +

Maybe call this function drm_gem_cma_sync_non_coherent() so that it's 
clear what the sync is about.

Best regards
Thomas

>   #endif /* __DRM_GEM_CMA_HELPER_H__ */
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
  2021-05-15 14:53 ` [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
@ 2021-05-15 19:42   ` Thomas Zimmermann
  2021-05-15 20:08     ` Paul Cercueil
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 19:42 UTC (permalink / raw)
  To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
	Daniel Vetter
  Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips


[-- Attachment #1.1: Type: text/plain, Size: 10297 bytes --]

Hi

Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> Alloc GEM buffers backed by noncoherent memory on SoCs where it is
> actually faster than write-combine.
> 
> This dramatically speeds up software rendering on these SoCs, even for
> tasks where write-combine memory should in theory be faster (e.g. simple
> blits).
> 
> v3: The option is now selected per-SoC instead of being a module
>      parameter.
> 
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 ++++++++++++++++++++++-
>   drivers/gpu/drm/ingenic/ingenic-ipu.c     | 18 ++++++--
>   2 files changed, 68 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> index 09225b770bb8..5f64e8583eec 100644
> --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> @@ -9,6 +9,7 @@
>   #include <linux/component.h>
>   #include <linux/clk.h>
>   #include <linux/dma-mapping.h>
> +#include <linux/io.h>
>   #include <linux/module.h>
>   #include <linux/mutex.h>
>   #include <linux/of_device.h>
> @@ -23,6 +24,7 @@
>   #include <drm/drm_color_mgmt.h>
>   #include <drm/drm_crtc.h>
>   #include <drm/drm_crtc_helper.h>
> +#include <drm/drm_damage_helper.h>
>   #include <drm/drm_drv.h>
>   #include <drm/drm_encoder.h>
>   #include <drm/drm_gem_cma_helper.h>
> @@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
>   struct jz_soc_info {
>   	bool needs_dev_clk;
>   	bool has_osd;
> +	bool map_noncoherent;
>   	unsigned int max_width, max_height;
>   	const u32 *formats_f0, *formats_f1;
>   	unsigned int num_formats_f0, num_formats_f1;
> @@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
>   	     old_plane_state->fb->format->format != new_plane_state->fb->format->format))
>   		crtc_state->mode_changed = true;
>   
> +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +
>   	return 0;
>   }
>   
> @@ -544,8 +549,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>   					    struct drm_atomic_state *state)
>   {
>   	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
> -	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
> -									  plane);
> +	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
> +	struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
>   	struct drm_crtc_state *crtc_state;
>   	struct ingenic_dma_hwdesc *hwdesc;
>   	unsigned int width, height, cpp, offset;
> @@ -553,6 +558,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>   	u32 fourcc;
>   
>   	if (newstate && newstate->fb) {
> +		drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
> +
>   		crtc_state = newstate->crtc->state;
>   
>   		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
> @@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
>   	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
>   }
>   
> +static int ingenic_drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> +					     struct drm_file *file_priv,
> +					     unsigned int flags,
> +					     unsigned int color,
> +					     struct drm_clip_rect *clips,
> +					     unsigned int num_clips)
> +{
> +	struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
> +
> +	if (!priv->soc_info->map_noncoherent)
> +		return 0;

I'm not sure you can get away without calling 
drm_atomic_helper_dirtyfb(). The function does some things with the 
plane's damage-clips property. If you don't call it here, the plane 
might pile up unhandled clipping areas. It's better to call it and rely 
on the test in drm_gem_cma_sync_data(). See below on how to optimize this.

> +
> +	return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
> +					 color, clips, num_clips);
> +}
> +
> +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs = 
{
> +	.destroy	= drm_gem_fb_destroy,
> +	.create_handle	= drm_gem_fb_create_handle,
> +	.dirty          = ingenic_drm_atomic_helper_dirtyfb,
> +};

You don't seem to be using this anywhere. You have to implement a custom 
fb_create for drm_mode_config_funcs. [1]

BUT: I think the overall approach should be to only use this on SoCs 
with non-coherency setting. Use drm_gem_fb_create() on systems without 
non-coherency and use drm_gem_fb_create_with_dirty() on systems with 
non-coherency (i.e., have two instances of drm_mode_config_funcs). Only 
call drm_plane_enable_fb_damage_clips() on systems with non-coherency.

> +
> +static struct drm_gem_object *
> +ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
> +{
> +	struct ingenic_drm *priv = drm_device_get_priv(drm);
> +	struct drm_gem_cma_object *obj;
> +
> +	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> +	if (!obj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	obj->map_noncoherent = priv->soc_info->map_noncoherent;
> +
> +	return &obj->base;
> +}
> +
>   DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>   
>   static const struct drm_driver ingenic_drm_driver_data = {
> @@ -754,6 +798,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
>   	.patchlevel		= 0,
>   
>   	.fops			= &ingenic_drm_fops,
> +	.gem_create_object	= ingenic_drm_gem_create_object,
>   	DRM_GEM_CMA_DRIVER_OPS,
>   
>   	.irq_handler		= ingenic_drm_irq_handler,
> @@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
>   		return ret;
>   	}
>   
> +	drm_plane_enable_fb_damage_clips(&priv->f1);
> +
>   	drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>   
>   	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
> @@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
>   			return ret;
>   		}
>   
> +		drm_plane_enable_fb_damage_clips(&priv->f0);
> +
>   		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
>   			ret = component_bind_all(dev, drm);
>   			if (ret) {
> @@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
>   static const struct jz_soc_info jz4740_soc_info = {
>   	.needs_dev_clk = true,
>   	.has_osd = false,
> +	.map_noncoherent = false,
>   	.max_width = 800,
>   	.max_height = 600,
>   	.formats_f1 = jz4740_formats,
> @@ -1255,6 +1305,7 @@ static const struct jz_soc_info jz4740_soc_info = 
{
>   static const struct jz_soc_info jz4725b_soc_info = {
>   	.needs_dev_clk = false,
>   	.has_osd = true,
> +	.map_noncoherent = false,
>   	.max_width = 800,
>   	.max_height = 600,
>   	.formats_f1 = jz4725b_formats_f1,
> @@ -1266,6 +1317,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
>   static const struct jz_soc_info jz4770_soc_info = {
>   	.needs_dev_clk = false,
>   	.has_osd = true,
> +	.map_noncoherent = true,
>   	.max_width = 1280,
>   	.max_height = 720,
>   	.formats_f1 = jz4770_formats_f1,
> diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
> index 3b1091e7c0cd..a4d1b500c3ad 100644
> --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
> +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
> @@ -20,10 +20,13 @@
>   
>   #include <drm/drm_atomic.h>
>   #include <drm/drm_atomic_helper.h>
> +#include <drm/drm_damage_helper.h>
>   #include <drm/drm_drv.h>
>   #include <drm/drm_fb_cma_helper.h>
>   #include <drm/drm_fourcc.h>
>   #include <drm/drm_gem_atomic_helper.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_gem_framebuffer_helper.h>
>   #include <drm/drm_plane.h>
>   #include <drm/drm_plane_helper.h>
>   #include <drm/drm_property.h>
> @@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>   					    struct drm_atomic_state *state)
>   {
>   	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
> -	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
> -									  plane);
> +	struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
> +	struct drm_plane_state *oldstate = drm_atomic_get_new_plane_state(state, plane);

get_old_state ?

>   	const struct drm_format_info *finfo;
>   	u32 ctrl, stride = 0, coef_index = 0, format = 0;
>   	bool needs_modeset, upscaling_w, upscaling_h;
> @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>   				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
>   	}
>   
> +	drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
> +

If you want to optimize, maybe put this line behind

   if (priv->soc_info->map_noncoherent)

>   	/* New addresses will be committed in vblank handler... */
>   	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>   	if (finfo->num_planes > 1)
> @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>   
>   	if (!new_plane_state->crtc ||
>   	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
> -		return 0;
> +		goto out_check_damage;
>   
>   	/* Plane must be fully visible */
>   	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
> @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>   		return -EINVAL;
>   
>   	if (!osd_changed(new_plane_state, old_plane_state))
> -		return 0;
> +		goto out_check_damage;
>   
>   	crtc_state->mode_changed = true;
>   
> @@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>   	ipu->denom_w = denom_w;
>   	ipu->denom_h = denom_h;
>   
> +out_check_damage:
> +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +

If you implement my suggestion above, this line could also be behind

   if (priv->soc_info->map_noncoherent)

Best regards

Thomas



[1] 
https://elixir.bootlin.com/linux/v5.13-rc1/source/drivers/gpu/drm/ingenic/ingenic-drm-drv.c#L808



-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
  2021-05-15 19:42   ` Thomas Zimmermann
@ 2021-05-15 20:08     ` Paul Cercueil
  0 siblings, 0 replies; 8+ messages in thread
From: Paul Cercueil @ 2021-05-15 20:08 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips

Hi Thomas,


Le sam., mai 15 2021 at 21:42:40 +0200, Thomas Zimmermann 
<tzimmermann@suse.de> a écrit :
> Hi
> 
> Am 15.05.21 um 16:53 schrieb Paul Cercueil:
>> Alloc GEM buffers backed by noncoherent memory on SoCs where it is
>> actually faster than write-combine.
>> 
>> This dramatically speeds up software rendering on these SoCs, even 
>> for
>> tasks where write-combine memory should in theory be faster (e.g. 
>> simple
>> blits).
>> 
>> v3: The option is now selected per-SoC instead of being a module
>>      parameter.
>> 
>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> ---
>>   drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 
>> ++++++++++++++++++++++-
>>   drivers/gpu/drm/ingenic/ingenic-ipu.c     | 18 ++++++--
>>   2 files changed, 68 insertions(+), 6 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
>> b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> index 09225b770bb8..5f64e8583eec 100644
>> --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> @@ -9,6 +9,7 @@
>>   #include <linux/component.h>
>>   #include <linux/clk.h>
>>   #include <linux/dma-mapping.h>
>> +#include <linux/io.h>
>>   #include <linux/module.h>
>>   #include <linux/mutex.h>
>>   #include <linux/of_device.h>
>> @@ -23,6 +24,7 @@
>>   #include <drm/drm_color_mgmt.h>
>>   #include <drm/drm_crtc.h>
>>   #include <drm/drm_crtc_helper.h>
>> +#include <drm/drm_damage_helper.h>
>>   #include <drm/drm_drv.h>
>>   #include <drm/drm_encoder.h>
>>   #include <drm/drm_gem_cma_helper.h>
>> @@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
>>   struct jz_soc_info {
>>   	bool needs_dev_clk;
>>   	bool has_osd;
>> +	bool map_noncoherent;
>>   	unsigned int max_width, max_height;
>>   	const u32 *formats_f0, *formats_f1;
>>   	unsigned int num_formats_f0, num_formats_f1;
>> @@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct 
>> drm_plane *plane,
>>   	     old_plane_state->fb->format->format != 
>> new_plane_state->fb->format->format))
>>   		crtc_state->mode_changed = true;
>>   \x7f+	drm_atomic_helper_check_plane_damage(state, new_plane_state);
>> +
>>   	return 0;
>>   }
>>   \x7f@@ -544,8 +549,8 @@ static void 
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>>   					    struct drm_atomic_state *state)
>>   {
>>   	struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
>> -	struct drm_plane_state *newstate = 
>> drm_atomic_get_new_plane_state(state,
>> -									  plane);
>> +	struct drm_plane_state *newstate = 
>> drm_atomic_get_new_plane_state(state, plane);
>> +	struct drm_plane_state *oldstate = 
>> drm_atomic_get_old_plane_state(state, plane);
>>   	struct drm_crtc_state *crtc_state;
>>   	struct ingenic_dma_hwdesc *hwdesc;
>>   	unsigned int width, height, cpp, offset;
>> @@ -553,6 +558,8 @@ static void 
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>>   	u32 fourcc;
>>   \x7f  	if (newstate && newstate->fb) {
>> +		drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
>> +
>>   		crtc_state = newstate->crtc->state;
>>   \x7f  		addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>> @@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct 
>> drm_crtc *crtc)
>>   	regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, 
>> JZ_LCD_CTRL_EOF_IRQ, 0);
>>   }
>>   \x7f+static int ingenic_drm_atomic_helper_dirtyfb(struct 
>> drm_framebuffer *fb,
>> +					     struct drm_file *file_priv,
>> +					     unsigned int flags,
>> +					     unsigned int color,
>> +					     struct drm_clip_rect *clips,
>> +					     unsigned int num_clips)
>> +{
>> +	struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
>> +
>> +	if (!priv->soc_info->map_noncoherent)
>> +		return 0;
> 
> I'm not sure you can get away without calling 
> drm_atomic_helper_dirtyfb(). The function does some things with the 
> plane's damage-clips property. If you don't call it here, the plane 
> might pile up unhandled clipping areas. It's better to call it and 
> rely on the test in drm_gem_cma_sync_data(). See below on how to 
> optimize this.

I guess that would work if I don't enable fb damage clips if 
!map_noncoherent, but I'm missing that.

>> +
>> +	return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
>> +					 color, clips, num_clips);
>> +}
>> +
>> +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs 
>> = {
>> +	.destroy	= drm_gem_fb_destroy,
>> +	.create_handle	= drm_gem_fb_create_handle,
>> +	.dirty          = ingenic_drm_atomic_helper_dirtyfb,
>> +};
> 
> You don't seem to be using this anywhere. You have to implement a 
> custom fb_create for drm_mode_config_funcs. [1]

You are totally right, my v2 had a ingenic_drm_gem_fb_create() and it 
got dropped somehow.

> BUT: I think the overall approach should be to only use this on SoCs 
> with non-coherency setting. Use drm_gem_fb_create() on systems 
> without non-coherency and use drm_gem_fb_create_with_dirty() on 
> systems with non-coherency (i.e., have two instances of 
> drm_mode_config_funcs). Only call drm_plane_enable_fb_damage_clips() 
> on systems with non-coherency.

Yes, that was the idea - enable this according to 
(priv->soc_info->map_noncoherent) which is set only on the SoCs where 
it makes sense.

>> +
>> +static struct drm_gem_object *
>> +ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
>> +{
>> +	struct ingenic_drm *priv = drm_device_get_priv(drm);
>> +	struct drm_gem_cma_object *obj;
>> +
>> +	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>> +	if (!obj)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	obj->map_noncoherent = priv->soc_info->map_noncoherent;
>> +
>> +	return &obj->base;
>> +}
>> +
>>   DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>>   \x7f  static const struct drm_driver ingenic_drm_driver_data = {
>> @@ -754,6 +798,7 @@ static const struct drm_driver 
>> ingenic_drm_driver_data = {
>>   	.patchlevel		= 0,
>>   \x7f  	.fops			= &ingenic_drm_fops,
>> +	.gem_create_object	= ingenic_drm_gem_create_object,
>>   	DRM_GEM_CMA_DRIVER_OPS,
>>   \x7f  	.irq_handler		= ingenic_drm_irq_handler,
>> @@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, 
>> bool has_components)
>>   		return ret;
>>   	}
>>   \x7f+	drm_plane_enable_fb_damage_clips(&priv->f1);
>> +
>>   	drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>>   \x7f  	ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
>> @@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, 
>> bool has_components)
>>   			return ret;
>>   		}
>>   \x7f+		drm_plane_enable_fb_damage_clips(&priv->f0);
>> +
>>   		if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
>>   			ret = component_bind_all(dev, drm);
>>   			if (ret) {
>> @@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
>>   static const struct jz_soc_info jz4740_soc_info = {
>>   	.needs_dev_clk = true,
>>   	.has_osd = false,
>> +	.map_noncoherent = false,
>>   	.max_width = 800,
>>   	.max_height = 600,
>>   	.formats_f1 = jz4740_formats,
>> @@ -1255,6 +1305,7 @@ static const struct jz_soc_info 
>> jz4740_soc_info = {
>>   static const struct jz_soc_info jz4725b_soc_info = {
>>   	.needs_dev_clk = false,
>>   	.has_osd = true,
>> +	.map_noncoherent = false,
>>   	.max_width = 800,
>>   	.max_height = 600,
>>   	.formats_f1 = jz4725b_formats_f1,
>> @@ -1266,6 +1317,7 @@ static const struct jz_soc_info 
>> jz4725b_soc_info = {
>>   static const struct jz_soc_info jz4770_soc_info = {
>>   	.needs_dev_clk = false,
>>   	.has_osd = true,
>> +	.map_noncoherent = true,
>>   	.max_width = 1280,
>>   	.max_height = 720,
>>   	.formats_f1 = jz4770_formats_f1,
>> diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c 
>> b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> index 3b1091e7c0cd..a4d1b500c3ad 100644
>> --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> @@ -20,10 +20,13 @@
>>   \x7f  #include <drm/drm_atomic.h>
>>   #include <drm/drm_atomic_helper.h>
>> +#include <drm/drm_damage_helper.h>
>>   #include <drm/drm_drv.h>
>>   #include <drm/drm_fb_cma_helper.h>
>>   #include <drm/drm_fourcc.h>
>>   #include <drm/drm_gem_atomic_helper.h>
>> +#include <drm/drm_gem_cma_helper.h>
>> +#include <drm/drm_gem_framebuffer_helper.h>
>>   #include <drm/drm_plane.h>
>>   #include <drm/drm_plane_helper.h>
>>   #include <drm/drm_property.h>
>> @@ -285,8 +288,8 @@ static void 
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>>   					    struct drm_atomic_state *state)
>>   {
>>   	struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
>> -	struct drm_plane_state *newstate = 
>> drm_atomic_get_new_plane_state(state,
>> -									  plane);
>> +	struct drm_plane_state *newstate = 
>> drm_atomic_get_new_plane_state(state, plane);
>> +	struct drm_plane_state *oldstate = 
>> drm_atomic_get_new_plane_state(state, plane);
> 
> get_old_state ?
> 
>>   	const struct drm_format_info *finfo;
>>   	u32 ctrl, stride = 0, coef_index = 0, format = 0;
>>   	bool needs_modeset, upscaling_w, upscaling_h;
>> @@ -317,6 +320,8 @@ static void 
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>>   				JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
>>   	}
>>   \x7f+	drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
>> +
> 
> If you want to optimize, maybe put this line behind
> 
>   if (priv->soc_info->map_noncoherent)
> 
>>   	/* New addresses will be committed in vblank handler... */
>>   	ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>>   	if (finfo->num_planes > 1)
>> @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct 
>> drm_plane *plane,
>>   \x7f  	if (!new_plane_state->crtc ||
>>   	    !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
>> -		return 0;
>> +		goto out_check_damage;
>>   \x7f  	/* Plane must be fully visible */
>>   	if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
>> @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct 
>> drm_plane *plane,
>>   		return -EINVAL;
>>   \x7f  	if (!osd_changed(new_plane_state, old_plane_state))
>> -		return 0;
>> +		goto out_check_damage;
>>   \x7f  	crtc_state->mode_changed = true;
>>   \x7f@@ -592,6 +597,9 @@ static int 
>> ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>>   	ipu->denom_w = denom_w;
>>   	ipu->denom_h = denom_h;
>>   \x7f+out_check_damage:
>> +	drm_atomic_helper_check_plane_damage(state, new_plane_state);
>> +
> 
> If you implement my suggestion above, this line could also be behind
> 
>   if (priv->soc_info->map_noncoherent)

Noted, thanks.

Cheers,
-Paul

> Best regards
> 
> Thomas
> 
> 
> 
> [1] 
> https://elixir.bootlin.com/linux/v5.13-rc1/source/drivers/gpu/drm/ingenic/ingenic-drm-drv.c#L808
> 
> 
> 
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-05-15 20:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-15 14:53 [PATCH v4 0/3] Add option to mmap GEM buffers cached Paul Cercueil
2021-05-15 14:53 ` [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
2021-05-15 18:49   ` Thomas Zimmermann
2021-05-15 14:53 ` [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
2021-05-15 19:06   ` Thomas Zimmermann
2021-05-15 14:53 ` [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
2021-05-15 19:42   ` Thomas Zimmermann
2021-05-15 20:08     ` Paul Cercueil

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).