* [PATCH v4 0/3] Add option to mmap GEM buffers cached
@ 2021-05-15 14:53 ` Paul Cercueil
0 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
Paul Cercueil
Rework of this morning's v3 patchset.
The drm_gem_cma_prime_mmap() function that was added in v3 is now
replaced, instead we pass a "private" parameter to the
__drm_gem_cma_create() function.
Patches 2/3 and 3/3 are unmodified since v3.
Cheers,
-Paul
Paul Cercueil (3):
drm: Add support for GEM buffers backed by non-coherent memory
drm: Add and export function drm_gem_cma_sync_data
drm/ingenic: Add option to alloc cached GEM buffers
drivers/gpu/drm/drm_gem_cma_helper.c | 93 ++++++++++++++++++++---
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 +++++++++++++-
drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++-
include/drm/drm_gem_cma_helper.h | 8 ++
4 files changed, 160 insertions(+), 15 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v4 0/3] Add option to mmap GEM buffers cached
@ 2021-05-15 14:53 ` Paul Cercueil
0 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Paul Cercueil, linux-kernel, dri-devel, linux-mips,
Christoph Hellwig, list
Rework of this morning's v3 patchset.
The drm_gem_cma_prime_mmap() function that was added in v3 is now
replaced, instead we pass a "private" parameter to the
__drm_gem_cma_create() function.
Patches 2/3 and 3/3 are unmodified since v3.
Cheers,
-Paul
Paul Cercueil (3):
drm: Add support for GEM buffers backed by non-coherent memory
drm: Add and export function drm_gem_cma_sync_data
drm/ingenic: Add option to alloc cached GEM buffers
drivers/gpu/drm/drm_gem_cma_helper.c | 93 ++++++++++++++++++++---
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 +++++++++++++-
drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++-
include/drm/drm_gem_cma_helper.h | 8 ++
4 files changed, 160 insertions(+), 15 deletions(-)
--
2.30.2
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory
2021-05-15 14:53 ` Paul Cercueil
@ 2021-05-15 14:53 ` Paul Cercueil
-1 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
Paul Cercueil
Having GEM buffers backed by non-coherent memory is interesting in the
particular case where it is faster to render to a non-coherent buffer
then sync the data cache, than to render to a write-combine buffer, and
(by extension) much faster than using a shadow buffer. This is true for
instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
are about three times faster using this method.
Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
can be set by the drivers when they create the dumb buffer.
Since this really only applies to software rendering, disable this flag
as soon as the CMA objects are exported via PRIME.
v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
the objects are mapped, and use the new dma_mmap_pages function.
v4: Make sure map_noncoherent is always disabled when creating GEM
objects meant to be used with dma-buf.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
include/drm/drm_gem_cma_helper.h | 3 +++
2 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..235c7a63da2b 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* __drm_gem_cma_create - Create a GEM CMA object without allocating memory
* @drm: DRM device
* @size: size of the object to allocate
+ * @private: true if used for internal purposes
*
* This function creates and initializes a GEM CMA object of the given size,
* but doesn't allocate any memory to back the object.
@@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* error code on failure.
*/
static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
{
struct drm_gem_cma_object *cma_obj;
struct drm_gem_object *gem_obj;
- int ret;
+ int ret = 0;
if (drm->driver->gem_create_object)
gem_obj = drm->driver->gem_create_object(drm, size);
@@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
- ret = drm_gem_object_init(drm, gem_obj, size);
+ if (private) {
+ drm_gem_private_object_init(drm, gem_obj, size);
+
+ /* Always use writecombine for dma-buf mappings */
+ cma_obj->map_noncoherent = false;
+ } else {
+ ret = drm_gem_object_init(drm, gem_obj, size);
+ }
if (ret)
goto error;
@@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
size = round_up(size, PAGE_SIZE);
- cma_obj = __drm_gem_cma_create(drm, size);
+ cma_obj = __drm_gem_cma_create(drm, size, false);
if (IS_ERR(cma_obj))
return cma_obj;
- cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
- GFP_KERNEL | __GFP_NOWARN);
+ if (cma_obj->map_noncoherent) {
+ cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
+ &cma_obj->paddr,
+ DMA_TO_DEVICE,
+ GFP_KERNEL | __GFP_NOWARN);
+ } else {
+ cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
+ GFP_KERNEL | __GFP_NOWARN);
+ }
if (!cma_obj->vaddr) {
drm_dbg(drm, "failed to allocate buffer with size %zu\n",
size);
@@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(-EINVAL);
/* Create a CMA GEM buffer. */
- cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+ cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
if (IS_ERR(cma_obj))
return ERR_CAST(cma_obj);
@@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
cma_obj = to_drm_gem_cma_obj(obj);
- ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
- cma_obj->paddr, vma->vm_end - vma->vm_start);
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ if (!cma_obj->map_noncoherent)
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+ ret = dma_mmap_pages(cma_obj->base.dev->dev,
+ vma, vma->vm_end - vma->vm_start,
+ virt_to_page(cma_obj->vaddr));
if (ret)
drm_gem_vm_close(vma);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..cd13508acbc1 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
* more than one entry but they are guaranteed to have contiguous
* DMA addresses.
* @vaddr: kernel virtual address of the backing memory
+ * @map_noncoherent: if true, the GEM object is backed by non-coherent memory
*/
struct drm_gem_cma_object {
struct drm_gem_object base;
@@ -24,6 +25,8 @@ struct drm_gem_cma_object {
/* For objects with DMA memory allocated by GEM CMA */
void *vaddr;
+
+ bool map_noncoherent;
};
#define to_drm_gem_cma_obj(gem_obj) \
--
2.30.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory
@ 2021-05-15 14:53 ` Paul Cercueil
0 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Paul Cercueil, linux-kernel, dri-devel, linux-mips,
Christoph Hellwig, list
Having GEM buffers backed by non-coherent memory is interesting in the
particular case where it is faster to render to a non-coherent buffer
then sync the data cache, than to render to a write-combine buffer, and
(by extension) much faster than using a shadow buffer. This is true for
instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
are about three times faster using this method.
Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
can be set by the drivers when they create the dumb buffer.
Since this really only applies to software rendering, disable this flag
as soon as the CMA objects are exported via PRIME.
v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
the objects are mapped, and use the new dma_mmap_pages function.
v4: Make sure map_noncoherent is always disabled when creating GEM
objects meant to be used with dma-buf.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
include/drm/drm_gem_cma_helper.h | 3 +++
2 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 7942cf05cd93..235c7a63da2b 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* __drm_gem_cma_create - Create a GEM CMA object without allocating memory
* @drm: DRM device
* @size: size of the object to allocate
+ * @private: true if used for internal purposes
*
* This function creates and initializes a GEM CMA object of the given size,
* but doesn't allocate any memory to back the object.
@@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
* error code on failure.
*/
static struct drm_gem_cma_object *
-__drm_gem_cma_create(struct drm_device *drm, size_t size)
+__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
{
struct drm_gem_cma_object *cma_obj;
struct drm_gem_object *gem_obj;
- int ret;
+ int ret = 0;
if (drm->driver->gem_create_object)
gem_obj = drm->driver->gem_create_object(drm, size);
@@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
- ret = drm_gem_object_init(drm, gem_obj, size);
+ if (private) {
+ drm_gem_private_object_init(drm, gem_obj, size);
+
+ /* Always use writecombine for dma-buf mappings */
+ cma_obj->map_noncoherent = false;
+ } else {
+ ret = drm_gem_object_init(drm, gem_obj, size);
+ }
if (ret)
goto error;
@@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
size = round_up(size, PAGE_SIZE);
- cma_obj = __drm_gem_cma_create(drm, size);
+ cma_obj = __drm_gem_cma_create(drm, size, false);
if (IS_ERR(cma_obj))
return cma_obj;
- cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
- GFP_KERNEL | __GFP_NOWARN);
+ if (cma_obj->map_noncoherent) {
+ cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
+ &cma_obj->paddr,
+ DMA_TO_DEVICE,
+ GFP_KERNEL | __GFP_NOWARN);
+ } else {
+ cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
+ GFP_KERNEL | __GFP_NOWARN);
+ }
if (!cma_obj->vaddr) {
drm_dbg(drm, "failed to allocate buffer with size %zu\n",
size);
@@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(-EINVAL);
/* Create a CMA GEM buffer. */
- cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
+ cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
if (IS_ERR(cma_obj))
return ERR_CAST(cma_obj);
@@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
cma_obj = to_drm_gem_cma_obj(obj);
- ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
- cma_obj->paddr, vma->vm_end - vma->vm_start);
+ vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ if (!cma_obj->map_noncoherent)
+ vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+
+ ret = dma_mmap_pages(cma_obj->base.dev->dev,
+ vma, vma->vm_end - vma->vm_start,
+ virt_to_page(cma_obj->vaddr));
if (ret)
drm_gem_vm_close(vma);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index 0a9711caa3e8..cd13508acbc1 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
* more than one entry but they are guaranteed to have contiguous
* DMA addresses.
* @vaddr: kernel virtual address of the backing memory
+ * @map_noncoherent: if true, the GEM object is backed by non-coherent memory
*/
struct drm_gem_cma_object {
struct drm_gem_object base;
@@ -24,6 +25,8 @@ struct drm_gem_cma_object {
/* For objects with DMA memory allocated by GEM CMA */
void *vaddr;
+
+ bool map_noncoherent;
};
#define to_drm_gem_cma_obj(gem_obj) \
--
2.30.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data
2021-05-15 14:53 ` Paul Cercueil
@ 2021-05-15 14:53 ` Paul Cercueil
-1 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
Paul Cercueil
This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.
v3: - Only sync data if using GEM objects backed by non-coherent memory.
- Use a drm_device pointer instead of device pointer in prototype
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 55 ++++++++++++++++++++++++++++
include/drm/drm_gem_cma_helper.h | 5 +++
2 files changed, 60 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 235c7a63da2b..41f309e0e049 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -17,9 +17,14 @@
#include <linux/slab.h>
#include <drm/drm.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_device.h>
#include <drm/drm_drv.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_fb_cma_helper.h>
+#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane.h>
#include <drm/drm_vma_manager.h>
/**
@@ -576,3 +581,53 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
return obj;
}
EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
+
+/**
+ * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
+ * @drm: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_gem_cma_sync_data(struct drm_device *drm,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state)
+{
+ const struct drm_format_info *finfo = state->fb->format;
+ struct drm_atomic_helper_damage_iter iter;
+ const struct drm_gem_cma_object *cma_obj;
+ unsigned int offset, i;
+ struct drm_rect clip;
+ dma_addr_t daddr;
+
+ for (i = 0; i < finfo->num_planes; i++) {
+ cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
+
+ if (cma_obj->map_noncoherent)
+ break;
+ }
+
+ /* No non-coherent buffers - no need to sync anything. */
+ if (i == finfo->num_planes)
+ return;
+
+ drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+ drm_atomic_for_each_plane_damage(&iter, &clip) {
+ for (i = 0; i < finfo->num_planes; i++) {
+ daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+
+ /* Ignore x1/x2 values, invalidate complete lines */
+ offset = clip.y1 * state->fb->pitches[i];
+
+ dma_sync_single_for_device(drm->dev, daddr + offset,
+ (clip.y2 - clip.y1) * state->fb->pitches[i],
+ DMA_TO_DEVICE);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index cd13508acbc1..76af066ae3a7 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -7,6 +7,7 @@
#include <drm/drm_gem.h>
struct drm_mode_create_dumb;
+struct drm_plane_state;
/**
* struct drm_gem_cma_object - GEM object backed by CMA memory allocations
@@ -185,4 +186,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
struct dma_buf_attachment *attach,
struct sg_table *sgt);
+void drm_gem_cma_sync_data(struct drm_device *drm,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state);
+
#endif /* __DRM_GEM_CMA_HELPER_H__ */
--
2.30.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data
@ 2021-05-15 14:53 ` Paul Cercueil
0 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Paul Cercueil, linux-kernel, dri-devel, linux-mips,
Christoph Hellwig, list
This function can be used by drivers that use damage clips and have
CMA GEM objects backed by non-coherent memory. Calling this function
in a plane's .atomic_update ensures that all the data in the backing
memory have been written to RAM.
v3: - Only sync data if using GEM objects backed by non-coherent memory.
- Use a drm_device pointer instead of device pointer in prototype
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
drivers/gpu/drm/drm_gem_cma_helper.c | 55 ++++++++++++++++++++++++++++
include/drm/drm_gem_cma_helper.h | 5 +++
2 files changed, 60 insertions(+)
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index 235c7a63da2b..41f309e0e049 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -17,9 +17,14 @@
#include <linux/slab.h>
#include <drm/drm.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_device.h>
#include <drm/drm_drv.h>
+#include <drm/drm_fourcc.h>
+#include <drm/drm_fb_cma_helper.h>
+#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_plane.h>
#include <drm/drm_vma_manager.h>
/**
@@ -576,3 +581,53 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
return obj;
}
EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
+
+/**
+ * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
+ * @drm: DRM device
+ * @old_state: Old plane state
+ * @state: New plane state
+ *
+ * This function can be used by drivers that use damage clips and have
+ * CMA GEM objects backed by non-coherent memory. Calling this function
+ * in a plane's .atomic_update ensures that all the data in the backing
+ * memory have been written to RAM.
+ */
+void drm_gem_cma_sync_data(struct drm_device *drm,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state)
+{
+ const struct drm_format_info *finfo = state->fb->format;
+ struct drm_atomic_helper_damage_iter iter;
+ const struct drm_gem_cma_object *cma_obj;
+ unsigned int offset, i;
+ struct drm_rect clip;
+ dma_addr_t daddr;
+
+ for (i = 0; i < finfo->num_planes; i++) {
+ cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
+
+ if (cma_obj->map_noncoherent)
+ break;
+ }
+
+ /* No non-coherent buffers - no need to sync anything. */
+ if (i == finfo->num_planes)
+ return;
+
+ drm_atomic_helper_damage_iter_init(&iter, old_state, state);
+
+ drm_atomic_for_each_plane_damage(&iter, &clip) {
+ for (i = 0; i < finfo->num_planes; i++) {
+ daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
+
+ /* Ignore x1/x2 values, invalidate complete lines */
+ offset = clip.y1 * state->fb->pitches[i];
+
+ dma_sync_single_for_device(drm->dev, daddr + offset,
+ (clip.y2 - clip.y1) * state->fb->pitches[i],
+ DMA_TO_DEVICE);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
index cd13508acbc1..76af066ae3a7 100644
--- a/include/drm/drm_gem_cma_helper.h
+++ b/include/drm/drm_gem_cma_helper.h
@@ -7,6 +7,7 @@
#include <drm/drm_gem.h>
struct drm_mode_create_dumb;
+struct drm_plane_state;
/**
* struct drm_gem_cma_object - GEM object backed by CMA memory allocations
@@ -185,4 +186,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
struct dma_buf_attachment *attach,
struct sg_table *sgt);
+void drm_gem_cma_sync_data(struct drm_device *drm,
+ struct drm_plane_state *old_state,
+ struct drm_plane_state *state);
+
#endif /* __DRM_GEM_CMA_HELPER_H__ */
--
2.30.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
2021-05-15 14:53 ` Paul Cercueil
@ 2021-05-15 14:53 ` Paul Cercueil
-1 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips,
Paul Cercueil
Alloc GEM buffers backed by noncoherent memory on SoCs where it is
actually faster than write-combine.
This dramatically speeds up software rendering on these SoCs, even for
tasks where write-combine memory should in theory be faster (e.g. simple
blits).
v3: The option is now selected per-SoC instead of being a module
parameter.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 ++++++++++++++++++++++-
drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++++--
2 files changed, 68 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 09225b770bb8..5f64e8583eec 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -9,6 +9,7 @@
#include <linux/component.h>
#include <linux/clk.h>
#include <linux/dma-mapping.h>
+#include <linux/io.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/of_device.h>
@@ -23,6 +24,7 @@
#include <drm/drm_color_mgmt.h>
#include <drm/drm_crtc.h>
#include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_encoder.h>
#include <drm/drm_gem_cma_helper.h>
@@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
struct jz_soc_info {
bool needs_dev_clk;
bool has_osd;
+ bool map_noncoherent;
unsigned int max_width, max_height;
const u32 *formats_f0, *formats_f1;
unsigned int num_formats_f0, num_formats_f1;
@@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
old_plane_state->fb->format->format != new_plane_state->fb->format->format))
crtc_state->mode_changed = true;
+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
return 0;
}
@@ -544,8 +549,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_state *state)
{
struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
- plane);
+ struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+ struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
struct drm_crtc_state *crtc_state;
struct ingenic_dma_hwdesc *hwdesc;
unsigned int width, height, cpp, offset;
@@ -553,6 +558,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
u32 fourcc;
if (newstate && newstate->fb) {
+ drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
+
crtc_state = newstate->crtc->state;
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
}
+static int ingenic_drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+ struct drm_file *file_priv,
+ unsigned int flags,
+ unsigned int color,
+ struct drm_clip_rect *clips,
+ unsigned int num_clips)
+{
+ struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
+
+ if (!priv->soc_info->map_noncoherent)
+ return 0;
+
+ return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
+ color, clips, num_clips);
+}
+
+static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs = {
+ .destroy = drm_gem_fb_destroy,
+ .create_handle = drm_gem_fb_create_handle,
+ .dirty = ingenic_drm_atomic_helper_dirtyfb,
+};
+
+static struct drm_gem_object *
+ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
+{
+ struct ingenic_drm *priv = drm_device_get_priv(drm);
+ struct drm_gem_cma_object *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ obj->map_noncoherent = priv->soc_info->map_noncoherent;
+
+ return &obj->base;
+}
+
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = {
@@ -754,6 +798,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
.patchlevel = 0,
.fops = &ingenic_drm_fops,
+ .gem_create_object = ingenic_drm_gem_create_object,
DRM_GEM_CMA_DRIVER_OPS,
.irq_handler = ingenic_drm_irq_handler,
@@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
return ret;
}
+ drm_plane_enable_fb_damage_clips(&priv->f1);
+
drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
return ret;
}
+ drm_plane_enable_fb_damage_clips(&priv->f0);
+
if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
ret = component_bind_all(dev, drm);
if (ret) {
@@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
static const struct jz_soc_info jz4740_soc_info = {
.needs_dev_clk = true,
.has_osd = false,
+ .map_noncoherent = false,
.max_width = 800,
.max_height = 600,
.formats_f1 = jz4740_formats,
@@ -1255,6 +1305,7 @@ static const struct jz_soc_info jz4740_soc_info = {
static const struct jz_soc_info jz4725b_soc_info = {
.needs_dev_clk = false,
.has_osd = true,
+ .map_noncoherent = false,
.max_width = 800,
.max_height = 600,
.formats_f1 = jz4725b_formats_f1,
@@ -1266,6 +1317,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
static const struct jz_soc_info jz4770_soc_info = {
.needs_dev_clk = false,
.has_osd = true,
+ .map_noncoherent = true,
.max_width = 1280,
.max_height = 720,
.formats_f1 = jz4770_formats_f1,
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 3b1091e7c0cd..a4d1b500c3ad 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -20,10 +20,13 @@
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_fb_cma_helper.h>
#include <drm/drm_fourcc.h>
#include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_plane.h>
#include <drm/drm_plane_helper.h>
#include <drm/drm_property.h>
@@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_state *state)
{
struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
- plane);
+ struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+ struct drm_plane_state *oldstate = drm_atomic_get_new_plane_state(state, plane);
const struct drm_format_info *finfo;
u32 ctrl, stride = 0, coef_index = 0, format = 0;
bool needs_modeset, upscaling_w, upscaling_h;
@@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
}
+ drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
+
/* New addresses will be committed in vblank handler... */
ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc ||
!crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
- return 0;
+ goto out_check_damage;
/* Plane must be fully visible */
if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state))
- return 0;
+ goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
ipu->denom_w = denom_w;
ipu->denom_h = denom_h;
+out_check_damage:
+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
return 0;
}
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d)
return err;
}
+ drm_plane_enable_fb_damage_clips(plane);
+
/*
* Sharpness settings range is [0,32]
* 0 : nearest-neighbor
--
2.30.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
@ 2021-05-15 14:53 ` Paul Cercueil
0 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 14:53 UTC (permalink / raw)
To: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
David Airlie, Daniel Vetter
Cc: Paul Cercueil, linux-kernel, dri-devel, linux-mips,
Christoph Hellwig, list
Alloc GEM buffers backed by noncoherent memory on SoCs where it is
actually faster than write-combine.
This dramatically speeds up software rendering on these SoCs, even for
tasks where write-combine memory should in theory be faster (e.g. simple
blits).
v3: The option is now selected per-SoC instead of being a module
parameter.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
---
drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 ++++++++++++++++++++++-
drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++++--
2 files changed, 68 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index 09225b770bb8..5f64e8583eec 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -9,6 +9,7 @@
#include <linux/component.h>
#include <linux/clk.h>
#include <linux/dma-mapping.h>
+#include <linux/io.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/of_device.h>
@@ -23,6 +24,7 @@
#include <drm/drm_color_mgmt.h>
#include <drm/drm_crtc.h>
#include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_encoder.h>
#include <drm/drm_gem_cma_helper.h>
@@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
struct jz_soc_info {
bool needs_dev_clk;
bool has_osd;
+ bool map_noncoherent;
unsigned int max_width, max_height;
const u32 *formats_f0, *formats_f1;
unsigned int num_formats_f0, num_formats_f1;
@@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
old_plane_state->fb->format->format != new_plane_state->fb->format->format))
crtc_state->mode_changed = true;
+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
return 0;
}
@@ -544,8 +549,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_state *state)
{
struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
- struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
- plane);
+ struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+ struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
struct drm_crtc_state *crtc_state;
struct ingenic_dma_hwdesc *hwdesc;
unsigned int width, height, cpp, offset;
@@ -553,6 +558,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
u32 fourcc;
if (newstate && newstate->fb) {
+ drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
+
crtc_state = newstate->crtc->state;
addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
@@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
}
+static int ingenic_drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
+ struct drm_file *file_priv,
+ unsigned int flags,
+ unsigned int color,
+ struct drm_clip_rect *clips,
+ unsigned int num_clips)
+{
+ struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
+
+ if (!priv->soc_info->map_noncoherent)
+ return 0;
+
+ return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
+ color, clips, num_clips);
+}
+
+static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs = {
+ .destroy = drm_gem_fb_destroy,
+ .create_handle = drm_gem_fb_create_handle,
+ .dirty = ingenic_drm_atomic_helper_dirtyfb,
+};
+
+static struct drm_gem_object *
+ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
+{
+ struct ingenic_drm *priv = drm_device_get_priv(drm);
+ struct drm_gem_cma_object *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ obj->map_noncoherent = priv->soc_info->map_noncoherent;
+
+ return &obj->base;
+}
+
DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
static const struct drm_driver ingenic_drm_driver_data = {
@@ -754,6 +798,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
.patchlevel = 0,
.fops = &ingenic_drm_fops,
+ .gem_create_object = ingenic_drm_gem_create_object,
DRM_GEM_CMA_DRIVER_OPS,
.irq_handler = ingenic_drm_irq_handler,
@@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
return ret;
}
+ drm_plane_enable_fb_damage_clips(&priv->f1);
+
drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
@@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
return ret;
}
+ drm_plane_enable_fb_damage_clips(&priv->f0);
+
if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
ret = component_bind_all(dev, drm);
if (ret) {
@@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
static const struct jz_soc_info jz4740_soc_info = {
.needs_dev_clk = true,
.has_osd = false,
+ .map_noncoherent = false,
.max_width = 800,
.max_height = 600,
.formats_f1 = jz4740_formats,
@@ -1255,6 +1305,7 @@ static const struct jz_soc_info jz4740_soc_info = {
static const struct jz_soc_info jz4725b_soc_info = {
.needs_dev_clk = false,
.has_osd = true,
+ .map_noncoherent = false,
.max_width = 800,
.max_height = 600,
.formats_f1 = jz4725b_formats_f1,
@@ -1266,6 +1317,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
static const struct jz_soc_info jz4770_soc_info = {
.needs_dev_clk = false,
.has_osd = true,
+ .map_noncoherent = true,
.max_width = 1280,
.max_height = 720,
.formats_f1 = jz4770_formats_f1,
diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
index 3b1091e7c0cd..a4d1b500c3ad 100644
--- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
+++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
@@ -20,10 +20,13 @@
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>
+#include <drm/drm_damage_helper.h>
#include <drm/drm_drv.h>
#include <drm/drm_fb_cma_helper.h>
#include <drm/drm_fourcc.h>
#include <drm/drm_gem_atomic_helper.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_plane.h>
#include <drm/drm_plane_helper.h>
#include <drm/drm_property.h>
@@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
struct drm_atomic_state *state)
{
struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
- struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
- plane);
+ struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
+ struct drm_plane_state *oldstate = drm_atomic_get_new_plane_state(state, plane);
const struct drm_format_info *finfo;
u32 ctrl, stride = 0, coef_index = 0, format = 0;
bool needs_modeset, upscaling_w, upscaling_h;
@@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
}
+ drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
+
/* New addresses will be committed in vblank handler... */
ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
if (finfo->num_planes > 1)
@@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
if (!new_plane_state->crtc ||
!crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
- return 0;
+ goto out_check_damage;
/* Plane must be fully visible */
if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
@@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
return -EINVAL;
if (!osd_changed(new_plane_state, old_plane_state))
- return 0;
+ goto out_check_damage;
crtc_state->mode_changed = true;
@@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
ipu->denom_w = denom_w;
ipu->denom_h = denom_h;
+out_check_damage:
+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
+
return 0;
}
@@ -773,6 +781,8 @@ static int ingenic_ipu_bind(struct device *dev, struct device *master, void *d)
return err;
}
+ drm_plane_enable_fb_damage_clips(plane);
+
/*
* Sharpness settings range is [0,32]
* 0 : nearest-neighbor
--
2.30.2
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory
2021-05-15 14:53 ` Paul Cercueil
@ 2021-05-15 18:49 ` Thomas Zimmermann
-1 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 18:49 UTC (permalink / raw)
To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips
[-- Attachment #1.1: Type: text/plain, Size: 5916 bytes --]
Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> Having GEM buffers backed by non-coherent memory is interesting in the
> particular case where it is faster to render to a non-coherent buffer
> then sync the data cache, than to render to a write-combine buffer, and
> (by extension) much faster than using a shadow buffer. This is true for
> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
> are about three times faster using this method.
>
> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
> can be set by the drivers when they create the dumb buffer.
>
> Since this really only applies to software rendering, disable this flag
> as soon as the CMA objects are exported via PRIME.
>
> v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
> the objects are mapped, and use the new dma_mmap_pages function.
>
> v4: Make sure map_noncoherent is always disabled when creating GEM
> objects meant to be used with dma-buf.
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
> include/drm/drm_gem_cma_helper.h | 3 +++
> 2 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 7942cf05cd93..235c7a63da2b 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
> * @drm: DRM device
> * @size: size of the object to allocate
> + * @private: true if used for internal purposes
> *
> * This function creates and initializes a GEM CMA object of the given size,
> * but doesn't allocate any memory to back the object.
> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> * error code on failure.
> */
> static struct drm_gem_cma_object *
> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
> {
> struct drm_gem_cma_object *cma_obj;
> struct drm_gem_object *gem_obj;
> - int ret;
> + int ret = 0;
>
> if (drm->driver->gem_create_object)
> gem_obj = drm->driver->gem_create_object(drm, size);
> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
>
> cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
>
> - ret = drm_gem_object_init(drm, gem_obj, size);
> + if (private) {
> + drm_gem_private_object_init(drm, gem_obj, size);
> +
> + /* Always use writecombine for dma-buf mappings */
> + cma_obj->map_noncoherent = false;
> + } else {
> + ret = drm_gem_object_init(drm, gem_obj, size);
> + }
> if (ret)
> goto error;
>
> @@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
>
> size = round_up(size, PAGE_SIZE);
>
> - cma_obj = __drm_gem_cma_create(drm, size);
> + cma_obj = __drm_gem_cma_create(drm, size, false);
> if (IS_ERR(cma_obj))
> return cma_obj;
>
> - cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> - GFP_KERNEL | __GFP_NOWARN);
> + if (cma_obj->map_noncoherent) {
> + cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
> + &cma_obj->paddr,
> + DMA_TO_DEVICE,
> + GFP_KERNEL | __GFP_NOWARN);
> + } else {
> + cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> + GFP_KERNEL | __GFP_NOWARN);
> + }
> if (!cma_obj->vaddr) {
> drm_dbg(drm, "failed to allocate buffer with size %zu\n",
> size);
> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device
*dev,
> return ERR_PTR(-EINVAL);
>
> /* Create a CMA GEM buffer. */
> - cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> + cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
> if (IS_ERR(cma_obj))
> return ERR_CAST(cma_obj);
>
> @@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
>
> cma_obj = to_drm_gem_cma_obj(obj);
>
> - ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
> - cma_obj->paddr, vma->vm_end - vma->vm_start);
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> + if (!cma_obj->map_noncoherent)
> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> + ret = dma_mmap_pages(cma_obj->base.dev->dev,
> + vma, vma->vm_end - vma->vm_start,
> + virt_to_page(cma_obj->vaddr));
> if (ret)
> drm_gem_vm_close(vma);
>
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index 0a9711caa3e8..cd13508acbc1 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
> * more than one entry but they are guaranteed to have contiguous
> * DMA addresses.
> * @vaddr: kernel virtual address of the backing memory
> + * @map_noncoherent: if true, the GEM object is backed by non-coherent
memory
> */
> struct drm_gem_cma_object {
> struct drm_gem_object base;
> @@ -24,6 +25,8 @@ struct drm_gem_cma_object {
>
> /* For objects with DMA memory allocated by GEM CMA */
> void *vaddr;
> +
> + bool map_noncoherent;
> };
>
> #define to_drm_gem_cma_obj(gem_obj) \
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory
@ 2021-05-15 18:49 ` Thomas Zimmermann
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 18:49 UTC (permalink / raw)
To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
Daniel Vetter
Cc: Christoph Hellwig, list, linux-kernel, dri-devel, linux-mips
[-- Attachment #1.1: Type: text/plain, Size: 5916 bytes --]
Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> Having GEM buffers backed by non-coherent memory is interesting in the
> particular case where it is faster to render to a non-coherent buffer
> then sync the data cache, than to render to a write-combine buffer, and
> (by extension) much faster than using a shadow buffer. This is true for
> instance on some Ingenic SoCs, where even simple blits (e.g. memcpy)
> are about three times faster using this method.
>
> Add a 'map_noncoherent' flag to the drm_gem_cma_object structure, which
> can be set by the drivers when they create the dumb buffer.
>
> Since this really only applies to software rendering, disable this flag
> as soon as the CMA objects are exported via PRIME.
>
> v3: New patch. Now uses a simple 'map_noncoherent' flag to control how
> the objects are mapped, and use the new dma_mmap_pages function.
>
> v4: Make sure map_noncoherent is always disabled when creating GEM
> objects meant to be used with dma-buf.
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 38 +++++++++++++++++++++-------
> include/drm/drm_gem_cma_helper.h | 3 +++
> 2 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 7942cf05cd93..235c7a63da2b 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -46,6 +46,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> * __drm_gem_cma_create - Create a GEM CMA object without allocating memory
> * @drm: DRM device
> * @size: size of the object to allocate
> + * @private: true if used for internal purposes
> *
> * This function creates and initializes a GEM CMA object of the given size,
> * but doesn't allocate any memory to back the object.
> @@ -55,11 +56,11 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> * error code on failure.
> */
> static struct drm_gem_cma_object *
> -__drm_gem_cma_create(struct drm_device *drm, size_t size)
> +__drm_gem_cma_create(struct drm_device *drm, size_t size, bool private)
> {
> struct drm_gem_cma_object *cma_obj;
> struct drm_gem_object *gem_obj;
> - int ret;
> + int ret = 0;
>
> if (drm->driver->gem_create_object)
> gem_obj = drm->driver->gem_create_object(drm, size);
> @@ -73,7 +74,14 @@ __drm_gem_cma_create(struct drm_device *drm, size_t size)
>
> cma_obj = container_of(gem_obj, struct drm_gem_cma_object, base);
>
> - ret = drm_gem_object_init(drm, gem_obj, size);
> + if (private) {
> + drm_gem_private_object_init(drm, gem_obj, size);
> +
> + /* Always use writecombine for dma-buf mappings */
> + cma_obj->map_noncoherent = false;
> + } else {
> + ret = drm_gem_object_init(drm, gem_obj, size);
> + }
> if (ret)
> goto error;
>
> @@ -111,12 +119,19 @@ struct drm_gem_cma_object *drm_gem_cma_create(struct drm_device *drm,
>
> size = round_up(size, PAGE_SIZE);
>
> - cma_obj = __drm_gem_cma_create(drm, size);
> + cma_obj = __drm_gem_cma_create(drm, size, false);
> if (IS_ERR(cma_obj))
> return cma_obj;
>
> - cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> - GFP_KERNEL | __GFP_NOWARN);
> + if (cma_obj->map_noncoherent) {
> + cma_obj->vaddr = dma_alloc_noncoherent(drm->dev, size,
> + &cma_obj->paddr,
> + DMA_TO_DEVICE,
> + GFP_KERNEL | __GFP_NOWARN);
> + } else {
> + cma_obj->vaddr = dma_alloc_wc(drm->dev, size, &cma_obj->paddr,
> + GFP_KERNEL | __GFP_NOWARN);
> + }
> if (!cma_obj->vaddr) {
> drm_dbg(drm, "failed to allocate buffer with size %zu\n",
> size);
> @@ -432,7 +447,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device
*dev,
> return ERR_PTR(-EINVAL);
>
> /* Create a CMA GEM buffer. */
> - cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size);
> + cma_obj = __drm_gem_cma_create(dev, attach->dmabuf->size, true);
> if (IS_ERR(cma_obj))
> return ERR_CAST(cma_obj);
>
> @@ -499,8 +514,13 @@ int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
>
> cma_obj = to_drm_gem_cma_obj(obj);
>
> - ret = dma_mmap_wc(cma_obj->base.dev->dev, vma, cma_obj->vaddr,
> - cma_obj->paddr, vma->vm_end - vma->vm_start);
> + vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
> + if (!cma_obj->map_noncoherent)
> + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +
> + ret = dma_mmap_pages(cma_obj->base.dev->dev,
> + vma, vma->vm_end - vma->vm_start,
> + virt_to_page(cma_obj->vaddr));
> if (ret)
> drm_gem_vm_close(vma);
>
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index 0a9711caa3e8..cd13508acbc1 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -16,6 +16,7 @@ struct drm_mode_create_dumb;
> * more than one entry but they are guaranteed to have contiguous
> * DMA addresses.
> * @vaddr: kernel virtual address of the backing memory
> + * @map_noncoherent: if true, the GEM object is backed by non-coherent
memory
> */
> struct drm_gem_cma_object {
> struct drm_gem_object base;
> @@ -24,6 +25,8 @@ struct drm_gem_cma_object {
>
> /* For objects with DMA memory allocated by GEM CMA */
> void *vaddr;
> +
> + bool map_noncoherent;
> };
>
> #define to_drm_gem_cma_obj(gem_obj) \
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data
2021-05-15 14:53 ` Paul Cercueil
@ 2021-05-15 19:06 ` Thomas Zimmermann
-1 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 19:06 UTC (permalink / raw)
To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips
[-- Attachment #1.1: Type: text/plain, Size: 5094 bytes --]
Hi
Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> This function can be used by drivers that use damage clips and have
> CMA GEM objects backed by non-coherent memory. Calling this function
> in a plane's .atomic_update ensures that all the data in the backing
> memory have been written to RAM.
>
> v3: - Only sync data if using GEM objects backed by non-coherent memory.
> - Use a drm_device pointer instead of device pointer in prototype
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 55 ++++++++++++++++++++++++++++
> include/drm/drm_gem_cma_helper.h | 5 +++
> 2 files changed, 60 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 235c7a63da2b..41f309e0e049 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -17,9 +17,14 @@
> #include <linux/slab.h>
>
> #include <drm/drm.h>
> +#include <drm/drm_damage_helper.h>
> #include <drm/drm_device.h>
> #include <drm/drm_drv.h>
> +#include <drm/drm_fourcc.h>
> +#include <drm/drm_fb_cma_helper.h>
Alphabetical order:
fb < fourcc
> +#include <drm/drm_framebuffer.h>
> #include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_plane.h>
> #include <drm/drm_vma_manager.h>
>
> /**
> @@ -576,3 +581,53 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
> return obj;
> }
> EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
> +
> +/**
> + * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
> + * @drm: DRM device
> + * @old_state: Old plane state
> + * @state: New plane state
> + *
> + * This function can be used by drivers that use damage clips and have
> + * CMA GEM objects backed by non-coherent memory. Calling this function
> + * in a plane's .atomic_update ensures that all the data in the backing
> + * memory have been written to RAM.
> + */
> +void drm_gem_cma_sync_data(struct drm_device *drm,
> + struct drm_plane_state *old_state,
> + struct drm_plane_state *state)
> +{
> + const struct drm_format_info *finfo = state->fb->format;
> + struct drm_atomic_helper_damage_iter iter;
> + const struct drm_gem_cma_object *cma_obj;
> + unsigned int offset, i;
> + struct drm_rect clip;
> + dma_addr_t daddr;
> +
> + for (i = 0; i < finfo->num_planes; i++) {
> + cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
> +
> + if (cma_obj->map_noncoherent)
> + break;
> + }
> +
> + /* No non-coherent buffers - no need to sync anything. */
> + if (i == finfo->num_planes)
> + return;
> +
> + drm_atomic_helper_damage_iter_init(&iter, old_state, state);
> +
> + drm_atomic_for_each_plane_damage(&iter, &clip) {
> + for (i = 0; i < finfo->num_planes; i++) {
> + daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
> +
> + /* Ignore x1/x2 values, invalidate complete lines */
> + offset = clip.y1 * state->fb->pitches[i];
> +
> + dma_sync_single_for_device(drm->dev, daddr + offset,
> + (clip.y2 - clip.y1) * state->fb->pitches[i],
> + DMA_TO_DEVICE);
A framebuffer can have multiple BOs with different coherency. The
current loop syncs every BO, but you only have to sync non-coherent memory.
I suggest to merge the above test loop into this sync loop, such that
only non-coherent BOs get synced
damage_iter_init(iter)
for_each_damage_plane(iter) {
for (i < finfo->num_planes) {
cma_obj = drm_fb_cma_get_gem_obj(i)
if (!cma_obj->non_coherent)
continue;
dma_sync_single_for_device()
}
}
For cache locality, it might be better to exchange the loops:
for (i < finfo->num_planes) {
damage_iter_init(iter)
for_each_damage_plane(iter) {
}
}
This way, you operate on the BOs one by one.
> + }
> + }
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index cd13508acbc1..76af066ae3a7 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -7,6 +7,7 @@
> #include <drm/drm_gem.h>
>
> struct drm_mode_create_dumb;
> +struct drm_plane_state;
>
> /**
> * struct drm_gem_cma_object - GEM object backed by CMA memory allocations
> @@ -185,4 +186,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
> struct dma_buf_attachment *attach,
> struct sg_table *sgt);
>
> +void drm_gem_cma_sync_data(struct drm_device *drm,
> + struct drm_plane_state *old_state,
> + struct drm_plane_state *state);
> +
Maybe call this function drm_gem_cma_sync_non_coherent() so that it's
clear what the sync is about.
Best regards
Thomas
> #endif /* __DRM_GEM_CMA_HELPER_H__ */
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data
@ 2021-05-15 19:06 ` Thomas Zimmermann
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 19:06 UTC (permalink / raw)
To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
Daniel Vetter
Cc: Christoph Hellwig, list, linux-kernel, dri-devel, linux-mips
[-- Attachment #1.1: Type: text/plain, Size: 5094 bytes --]
Hi
Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> This function can be used by drivers that use damage clips and have
> CMA GEM objects backed by non-coherent memory. Calling this function
> in a plane's .atomic_update ensures that all the data in the backing
> memory have been written to RAM.
>
> v3: - Only sync data if using GEM objects backed by non-coherent memory.
> - Use a drm_device pointer instead of device pointer in prototype
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 55 ++++++++++++++++++++++++++++
> include/drm/drm_gem_cma_helper.h | 5 +++
> 2 files changed, 60 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 235c7a63da2b..41f309e0e049 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -17,9 +17,14 @@
> #include <linux/slab.h>
>
> #include <drm/drm.h>
> +#include <drm/drm_damage_helper.h>
> #include <drm/drm_device.h>
> #include <drm/drm_drv.h>
> +#include <drm/drm_fourcc.h>
> +#include <drm/drm_fb_cma_helper.h>
Alphabetical order:
fb < fourcc
> +#include <drm/drm_framebuffer.h>
> #include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_plane.h>
> #include <drm/drm_vma_manager.h>
>
> /**
> @@ -576,3 +581,53 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *dev,
> return obj;
> }
> EXPORT_SYMBOL(drm_gem_cma_prime_import_sg_table_vmap);
> +
> +/**
> + * drm_gem_cma_sync_data - Sync GEM object to non-coherent backing memory
> + * @drm: DRM device
> + * @old_state: Old plane state
> + * @state: New plane state
> + *
> + * This function can be used by drivers that use damage clips and have
> + * CMA GEM objects backed by non-coherent memory. Calling this function
> + * in a plane's .atomic_update ensures that all the data in the backing
> + * memory have been written to RAM.
> + */
> +void drm_gem_cma_sync_data(struct drm_device *drm,
> + struct drm_plane_state *old_state,
> + struct drm_plane_state *state)
> +{
> + const struct drm_format_info *finfo = state->fb->format;
> + struct drm_atomic_helper_damage_iter iter;
> + const struct drm_gem_cma_object *cma_obj;
> + unsigned int offset, i;
> + struct drm_rect clip;
> + dma_addr_t daddr;
> +
> + for (i = 0; i < finfo->num_planes; i++) {
> + cma_obj = drm_fb_cma_get_gem_obj(state->fb, i);
> +
> + if (cma_obj->map_noncoherent)
> + break;
> + }
> +
> + /* No non-coherent buffers - no need to sync anything. */
> + if (i == finfo->num_planes)
> + return;
> +
> + drm_atomic_helper_damage_iter_init(&iter, old_state, state);
> +
> + drm_atomic_for_each_plane_damage(&iter, &clip) {
> + for (i = 0; i < finfo->num_planes; i++) {
> + daddr = drm_fb_cma_get_gem_addr(state->fb, state, i);
> +
> + /* Ignore x1/x2 values, invalidate complete lines */
> + offset = clip.y1 * state->fb->pitches[i];
> +
> + dma_sync_single_for_device(drm->dev, daddr + offset,
> + (clip.y2 - clip.y1) * state->fb->pitches[i],
> + DMA_TO_DEVICE);
A framebuffer can have multiple BOs with different coherency. The
current loop syncs every BO, but you only have to sync non-coherent memory.
I suggest to merge the above test loop into this sync loop, such that
only non-coherent BOs get synced
damage_iter_init(iter)
for_each_damage_plane(iter) {
for (i < finfo->num_planes) {
cma_obj = drm_fb_cma_get_gem_obj(i)
if (!cma_obj->non_coherent)
continue;
dma_sync_single_for_device()
}
}
For cache locality, it might be better to exchange the loops:
for (i < finfo->num_planes) {
damage_iter_init(iter)
for_each_damage_plane(iter) {
}
}
This way, you operate on the BOs one by one.
> + }
> + }
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_cma_sync_data);
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index cd13508acbc1..76af066ae3a7 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -7,6 +7,7 @@
> #include <drm/drm_gem.h>
>
> struct drm_mode_create_dumb;
> +struct drm_plane_state;
>
> /**
> * struct drm_gem_cma_object - GEM object backed by CMA memory allocations
> @@ -185,4 +186,8 @@ drm_gem_cma_prime_import_sg_table_vmap(struct drm_device *drm,
> struct dma_buf_attachment *attach,
> struct sg_table *sgt);
>
> +void drm_gem_cma_sync_data(struct drm_device *drm,
> + struct drm_plane_state *old_state,
> + struct drm_plane_state *state);
> +
Maybe call this function drm_gem_cma_sync_non_coherent() so that it's
clear what the sync is about.
Best regards
Thomas
> #endif /* __DRM_GEM_CMA_HELPER_H__ */
>
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
2021-05-15 14:53 ` Paul Cercueil
@ 2021-05-15 19:42 ` Thomas Zimmermann
-1 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 19:42 UTC (permalink / raw)
To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
Daniel Vetter
Cc: Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips
[-- Attachment #1.1: Type: text/plain, Size: 10297 bytes --]
Hi
Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> Alloc GEM buffers backed by noncoherent memory on SoCs where it is
> actually faster than write-combine.
>
> This dramatically speeds up software rendering on these SoCs, even for
> tasks where write-combine memory should in theory be faster (e.g. simple
> blits).
>
> v3: The option is now selected per-SoC instead of being a module
> parameter.
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 ++++++++++++++++++++++-
> drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++++--
> 2 files changed, 68 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> index 09225b770bb8..5f64e8583eec 100644
> --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> @@ -9,6 +9,7 @@
> #include <linux/component.h>
> #include <linux/clk.h>
> #include <linux/dma-mapping.h>
> +#include <linux/io.h>
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/of_device.h>
> @@ -23,6 +24,7 @@
> #include <drm/drm_color_mgmt.h>
> #include <drm/drm_crtc.h>
> #include <drm/drm_crtc_helper.h>
> +#include <drm/drm_damage_helper.h>
> #include <drm/drm_drv.h>
> #include <drm/drm_encoder.h>
> #include <drm/drm_gem_cma_helper.h>
> @@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
> struct jz_soc_info {
> bool needs_dev_clk;
> bool has_osd;
> + bool map_noncoherent;
> unsigned int max_width, max_height;
> const u32 *formats_f0, *formats_f1;
> unsigned int num_formats_f0, num_formats_f1;
> @@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
> old_plane_state->fb->format->format != new_plane_state->fb->format->format))
> crtc_state->mode_changed = true;
>
> + drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +
> return 0;
> }
>
> @@ -544,8 +549,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
> struct drm_atomic_state *state)
> {
> struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
> - struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
> - plane);
> + struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
> + struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
> struct drm_crtc_state *crtc_state;
> struct ingenic_dma_hwdesc *hwdesc;
> unsigned int width, height, cpp, offset;
> @@ -553,6 +558,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
> u32 fourcc;
>
> if (newstate && newstate->fb) {
> + drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
> +
> crtc_state = newstate->crtc->state;
>
> addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
> @@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
> regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
> }
>
> +static int ingenic_drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv,
> + unsigned int flags,
> + unsigned int color,
> + struct drm_clip_rect *clips,
> + unsigned int num_clips)
> +{
> + struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
> +
> + if (!priv->soc_info->map_noncoherent)
> + return 0;
I'm not sure you can get away without calling
drm_atomic_helper_dirtyfb(). The function does some things with the
plane's damage-clips property. If you don't call it here, the plane
might pile up unhandled clipping areas. It's better to call it and rely
on the test in drm_gem_cma_sync_data(). See below on how to optimize this.
> +
> + return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
> + color, clips, num_clips);
> +}
> +
> +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs =
{
> + .destroy = drm_gem_fb_destroy,
> + .create_handle = drm_gem_fb_create_handle,
> + .dirty = ingenic_drm_atomic_helper_dirtyfb,
> +};
You don't seem to be using this anywhere. You have to implement a custom
fb_create for drm_mode_config_funcs. [1]
BUT: I think the overall approach should be to only use this on SoCs
with non-coherency setting. Use drm_gem_fb_create() on systems without
non-coherency and use drm_gem_fb_create_with_dirty() on systems with
non-coherency (i.e., have two instances of drm_mode_config_funcs). Only
call drm_plane_enable_fb_damage_clips() on systems with non-coherency.
> +
> +static struct drm_gem_object *
> +ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
> +{
> + struct ingenic_drm *priv = drm_device_get_priv(drm);
> + struct drm_gem_cma_object *obj;
> +
> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> + if (!obj)
> + return ERR_PTR(-ENOMEM);
> +
> + obj->map_noncoherent = priv->soc_info->map_noncoherent;
> +
> + return &obj->base;
> +}
> +
> DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>
> static const struct drm_driver ingenic_drm_driver_data = {
> @@ -754,6 +798,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
> .patchlevel = 0,
>
> .fops = &ingenic_drm_fops,
> + .gem_create_object = ingenic_drm_gem_create_object,
> DRM_GEM_CMA_DRIVER_OPS,
>
> .irq_handler = ingenic_drm_irq_handler,
> @@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
> return ret;
> }
>
> + drm_plane_enable_fb_damage_clips(&priv->f1);
> +
> drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>
> ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
> @@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
> return ret;
> }
>
> + drm_plane_enable_fb_damage_clips(&priv->f0);
> +
> if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
> ret = component_bind_all(dev, drm);
> if (ret) {
> @@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
> static const struct jz_soc_info jz4740_soc_info = {
> .needs_dev_clk = true,
> .has_osd = false,
> + .map_noncoherent = false,
> .max_width = 800,
> .max_height = 600,
> .formats_f1 = jz4740_formats,
> @@ -1255,6 +1305,7 @@ static const struct jz_soc_info jz4740_soc_info =
{
> static const struct jz_soc_info jz4725b_soc_info = {
> .needs_dev_clk = false,
> .has_osd = true,
> + .map_noncoherent = false,
> .max_width = 800,
> .max_height = 600,
> .formats_f1 = jz4725b_formats_f1,
> @@ -1266,6 +1317,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
> static const struct jz_soc_info jz4770_soc_info = {
> .needs_dev_clk = false,
> .has_osd = true,
> + .map_noncoherent = true,
> .max_width = 1280,
> .max_height = 720,
> .formats_f1 = jz4770_formats_f1,
> diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
> index 3b1091e7c0cd..a4d1b500c3ad 100644
> --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
> +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
> @@ -20,10 +20,13 @@
>
> #include <drm/drm_atomic.h>
> #include <drm/drm_atomic_helper.h>
> +#include <drm/drm_damage_helper.h>
> #include <drm/drm_drv.h>
> #include <drm/drm_fb_cma_helper.h>
> #include <drm/drm_fourcc.h>
> #include <drm/drm_gem_atomic_helper.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_gem_framebuffer_helper.h>
> #include <drm/drm_plane.h>
> #include <drm/drm_plane_helper.h>
> #include <drm/drm_property.h>
> @@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
> struct drm_atomic_state *state)
> {
> struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
> - struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
> - plane);
> + struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
> + struct drm_plane_state *oldstate = drm_atomic_get_new_plane_state(state, plane);
get_old_state ?
> const struct drm_format_info *finfo;
> u32 ctrl, stride = 0, coef_index = 0, format = 0;
> bool needs_modeset, upscaling_w, upscaling_h;
> @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
> JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
> }
>
> + drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
> +
If you want to optimize, maybe put this line behind
if (priv->soc_info->map_noncoherent)
> /* New addresses will be committed in vblank handler... */
> ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
> if (finfo->num_planes > 1)
> @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>
> if (!new_plane_state->crtc ||
> !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
> - return 0;
> + goto out_check_damage;
>
> /* Plane must be fully visible */
> if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
> @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
> return -EINVAL;
>
> if (!osd_changed(new_plane_state, old_plane_state))
> - return 0;
> + goto out_check_damage;
>
> crtc_state->mode_changed = true;
>
> @@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
> ipu->denom_w = denom_w;
> ipu->denom_h = denom_h;
>
> +out_check_damage:
> + drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +
If you implement my suggestion above, this line could also be behind
if (priv->soc_info->map_noncoherent)
Best regards
Thomas
[1]
https://elixir.bootlin.com/linux/v5.13-rc1/source/drivers/gpu/drm/ingenic/ingenic-drm-drv.c#L808
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
@ 2021-05-15 19:42 ` Thomas Zimmermann
0 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2021-05-15 19:42 UTC (permalink / raw)
To: Paul Cercueil, Maarten Lankhorst, Maxime Ripard, David Airlie,
Daniel Vetter
Cc: Christoph Hellwig, list, linux-kernel, dri-devel, linux-mips
[-- Attachment #1.1: Type: text/plain, Size: 10297 bytes --]
Hi
Am 15.05.21 um 16:53 schrieb Paul Cercueil:
> Alloc GEM buffers backed by noncoherent memory on SoCs where it is
> actually faster than write-combine.
>
> This dramatically speeds up software rendering on these SoCs, even for
> tasks where write-combine memory should in theory be faster (e.g. simple
> blits).
>
> v3: The option is now selected per-SoC instead of being a module
> parameter.
>
> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> ---
> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56 ++++++++++++++++++++++-
> drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++++--
> 2 files changed, 68 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> index 09225b770bb8..5f64e8583eec 100644
> --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
> @@ -9,6 +9,7 @@
> #include <linux/component.h>
> #include <linux/clk.h>
> #include <linux/dma-mapping.h>
> +#include <linux/io.h>
> #include <linux/module.h>
> #include <linux/mutex.h>
> #include <linux/of_device.h>
> @@ -23,6 +24,7 @@
> #include <drm/drm_color_mgmt.h>
> #include <drm/drm_crtc.h>
> #include <drm/drm_crtc_helper.h>
> +#include <drm/drm_damage_helper.h>
> #include <drm/drm_drv.h>
> #include <drm/drm_encoder.h>
> #include <drm/drm_gem_cma_helper.h>
> @@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
> struct jz_soc_info {
> bool needs_dev_clk;
> bool has_osd;
> + bool map_noncoherent;
> unsigned int max_width, max_height;
> const u32 *formats_f0, *formats_f1;
> unsigned int num_formats_f0, num_formats_f1;
> @@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct drm_plane *plane,
> old_plane_state->fb->format->format != new_plane_state->fb->format->format))
> crtc_state->mode_changed = true;
>
> + drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +
> return 0;
> }
>
> @@ -544,8 +549,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
> struct drm_atomic_state *state)
> {
> struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
> - struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
> - plane);
> + struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
> + struct drm_plane_state *oldstate = drm_atomic_get_old_plane_state(state, plane);
> struct drm_crtc_state *crtc_state;
> struct ingenic_dma_hwdesc *hwdesc;
> unsigned int width, height, cpp, offset;
> @@ -553,6 +558,8 @@ static void ingenic_drm_plane_atomic_update(struct drm_plane *plane,
> u32 fourcc;
>
> if (newstate && newstate->fb) {
> + drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
> +
> crtc_state = newstate->crtc->state;
>
> addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
> @@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct drm_crtc *crtc)
> regmap_update_bits(priv->map, JZ_REG_LCD_CTRL, JZ_LCD_CTRL_EOF_IRQ, 0);
> }
>
> +static int ingenic_drm_atomic_helper_dirtyfb(struct drm_framebuffer *fb,
> + struct drm_file *file_priv,
> + unsigned int flags,
> + unsigned int color,
> + struct drm_clip_rect *clips,
> + unsigned int num_clips)
> +{
> + struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
> +
> + if (!priv->soc_info->map_noncoherent)
> + return 0;
I'm not sure you can get away without calling
drm_atomic_helper_dirtyfb(). The function does some things with the
plane's damage-clips property. If you don't call it here, the plane
might pile up unhandled clipping areas. It's better to call it and rely
on the test in drm_gem_cma_sync_data(). See below on how to optimize this.
> +
> + return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
> + color, clips, num_clips);
> +}
> +
> +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs =
{
> + .destroy = drm_gem_fb_destroy,
> + .create_handle = drm_gem_fb_create_handle,
> + .dirty = ingenic_drm_atomic_helper_dirtyfb,
> +};
You don't seem to be using this anywhere. You have to implement a custom
fb_create for drm_mode_config_funcs. [1]
BUT: I think the overall approach should be to only use this on SoCs
with non-coherency setting. Use drm_gem_fb_create() on systems without
non-coherency and use drm_gem_fb_create_with_dirty() on systems with
non-coherency (i.e., have two instances of drm_mode_config_funcs). Only
call drm_plane_enable_fb_damage_clips() on systems with non-coherency.
> +
> +static struct drm_gem_object *
> +ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
> +{
> + struct ingenic_drm *priv = drm_device_get_priv(drm);
> + struct drm_gem_cma_object *obj;
> +
> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
> + if (!obj)
> + return ERR_PTR(-ENOMEM);
> +
> + obj->map_noncoherent = priv->soc_info->map_noncoherent;
> +
> + return &obj->base;
> +}
> +
> DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>
> static const struct drm_driver ingenic_drm_driver_data = {
> @@ -754,6 +798,7 @@ static const struct drm_driver ingenic_drm_driver_data = {
> .patchlevel = 0,
>
> .fops = &ingenic_drm_fops,
> + .gem_create_object = ingenic_drm_gem_create_object,
> DRM_GEM_CMA_DRIVER_OPS,
>
> .irq_handler = ingenic_drm_irq_handler,
> @@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
> return ret;
> }
>
> + drm_plane_enable_fb_damage_clips(&priv->f1);
> +
> drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>
> ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
> @@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev, bool has_components)
> return ret;
> }
>
> + drm_plane_enable_fb_damage_clips(&priv->f0);
> +
> if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
> ret = component_bind_all(dev, drm);
> if (ret) {
> @@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
> static const struct jz_soc_info jz4740_soc_info = {
> .needs_dev_clk = true,
> .has_osd = false,
> + .map_noncoherent = false,
> .max_width = 800,
> .max_height = 600,
> .formats_f1 = jz4740_formats,
> @@ -1255,6 +1305,7 @@ static const struct jz_soc_info jz4740_soc_info =
{
> static const struct jz_soc_info jz4725b_soc_info = {
> .needs_dev_clk = false,
> .has_osd = true,
> + .map_noncoherent = false,
> .max_width = 800,
> .max_height = 600,
> .formats_f1 = jz4725b_formats_f1,
> @@ -1266,6 +1317,7 @@ static const struct jz_soc_info jz4725b_soc_info = {
> static const struct jz_soc_info jz4770_soc_info = {
> .needs_dev_clk = false,
> .has_osd = true,
> + .map_noncoherent = true,
> .max_width = 1280,
> .max_height = 720,
> .formats_f1 = jz4770_formats_f1,
> diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c b/drivers/gpu/drm/ingenic/ingenic-ipu.c
> index 3b1091e7c0cd..a4d1b500c3ad 100644
> --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
> +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
> @@ -20,10 +20,13 @@
>
> #include <drm/drm_atomic.h>
> #include <drm/drm_atomic_helper.h>
> +#include <drm/drm_damage_helper.h>
> #include <drm/drm_drv.h>
> #include <drm/drm_fb_cma_helper.h>
> #include <drm/drm_fourcc.h>
> #include <drm/drm_gem_atomic_helper.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_gem_framebuffer_helper.h>
> #include <drm/drm_plane.h>
> #include <drm/drm_plane_helper.h>
> #include <drm/drm_property.h>
> @@ -285,8 +288,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
> struct drm_atomic_state *state)
> {
> struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
> - struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state,
> - plane);
> + struct drm_plane_state *newstate = drm_atomic_get_new_plane_state(state, plane);
> + struct drm_plane_state *oldstate = drm_atomic_get_new_plane_state(state, plane);
get_old_state ?
> const struct drm_format_info *finfo;
> u32 ctrl, stride = 0, coef_index = 0, format = 0;
> bool needs_modeset, upscaling_w, upscaling_h;
> @@ -317,6 +320,8 @@ static void ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
> JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
> }
>
> + drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
> +
If you want to optimize, maybe put this line behind
if (priv->soc_info->map_noncoherent)
> /* New addresses will be committed in vblank handler... */
> ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
> if (finfo->num_planes > 1)
> @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>
> if (!new_plane_state->crtc ||
> !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
> - return 0;
> + goto out_check_damage;
>
> /* Plane must be fully visible */
> if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
> @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
> return -EINVAL;
>
> if (!osd_changed(new_plane_state, old_plane_state))
> - return 0;
> + goto out_check_damage;
>
> crtc_state->mode_changed = true;
>
> @@ -592,6 +597,9 @@ static int ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
> ipu->denom_w = denom_w;
> ipu->denom_h = denom_h;
>
> +out_check_damage:
> + drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +
If you implement my suggestion above, this line could also be behind
if (priv->soc_info->map_noncoherent)
Best regards
Thomas
[1]
https://elixir.bootlin.com/linux/v5.13-rc1/source/drivers/gpu/drm/ingenic/ingenic-drm-drv.c#L808
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
2021-05-15 19:42 ` Thomas Zimmermann
@ 2021-05-15 20:08 ` Paul Cercueil
-1 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 20:08 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
Christoph Hellwig, list, dri-devel, linux-kernel, linux-mips
Hi Thomas,
Le sam., mai 15 2021 at 21:42:40 +0200, Thomas Zimmermann
<tzimmermann@suse.de> a écrit :
> Hi
>
> Am 15.05.21 um 16:53 schrieb Paul Cercueil:
>> Alloc GEM buffers backed by noncoherent memory on SoCs where it is
>> actually faster than write-combine.
>>
>> This dramatically speeds up software rendering on these SoCs, even
>> for
>> tasks where write-combine memory should in theory be faster (e.g.
>> simple
>> blits).
>>
>> v3: The option is now selected per-SoC instead of being a module
>> parameter.
>>
>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> ---
>> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56
>> ++++++++++++++++++++++-
>> drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++++--
>> 2 files changed, 68 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> index 09225b770bb8..5f64e8583eec 100644
>> --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> @@ -9,6 +9,7 @@
>> #include <linux/component.h>
>> #include <linux/clk.h>
>> #include <linux/dma-mapping.h>
>> +#include <linux/io.h>
>> #include <linux/module.h>
>> #include <linux/mutex.h>
>> #include <linux/of_device.h>
>> @@ -23,6 +24,7 @@
>> #include <drm/drm_color_mgmt.h>
>> #include <drm/drm_crtc.h>
>> #include <drm/drm_crtc_helper.h>
>> +#include <drm/drm_damage_helper.h>
>> #include <drm/drm_drv.h>
>> #include <drm/drm_encoder.h>
>> #include <drm/drm_gem_cma_helper.h>
>> @@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
>> struct jz_soc_info {
>> bool needs_dev_clk;
>> bool has_osd;
>> + bool map_noncoherent;
>> unsigned int max_width, max_height;
>> const u32 *formats_f0, *formats_f1;
>> unsigned int num_formats_f0, num_formats_f1;
>> @@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct
>> drm_plane *plane,
>> old_plane_state->fb->format->format !=
>> new_plane_state->fb->format->format))
>> crtc_state->mode_changed = true;
>> \x7f+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
>> +
>> return 0;
>> }
>> \x7f@@ -544,8 +549,8 @@ static void
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>> struct drm_atomic_state *state)
>> {
>> struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
>> - struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state,
>> - plane);
>> + struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state, plane);
>> + struct drm_plane_state *oldstate =
>> drm_atomic_get_old_plane_state(state, plane);
>> struct drm_crtc_state *crtc_state;
>> struct ingenic_dma_hwdesc *hwdesc;
>> unsigned int width, height, cpp, offset;
>> @@ -553,6 +558,8 @@ static void
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>> u32 fourcc;
>> \x7f if (newstate && newstate->fb) {
>> + drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
>> +
>> crtc_state = newstate->crtc->state;
>> \x7f addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>> @@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct
>> drm_crtc *crtc)
>> regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
>> JZ_LCD_CTRL_EOF_IRQ, 0);
>> }
>> \x7f+static int ingenic_drm_atomic_helper_dirtyfb(struct
>> drm_framebuffer *fb,
>> + struct drm_file *file_priv,
>> + unsigned int flags,
>> + unsigned int color,
>> + struct drm_clip_rect *clips,
>> + unsigned int num_clips)
>> +{
>> + struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
>> +
>> + if (!priv->soc_info->map_noncoherent)
>> + return 0;
>
> I'm not sure you can get away without calling
> drm_atomic_helper_dirtyfb(). The function does some things with the
> plane's damage-clips property. If you don't call it here, the plane
> might pile up unhandled clipping areas. It's better to call it and
> rely on the test in drm_gem_cma_sync_data(). See below on how to
> optimize this.
I guess that would work if I don't enable fb damage clips if
!map_noncoherent, but I'm missing that.
>> +
>> + return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
>> + color, clips, num_clips);
>> +}
>> +
>> +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs
>> = {
>> + .destroy = drm_gem_fb_destroy,
>> + .create_handle = drm_gem_fb_create_handle,
>> + .dirty = ingenic_drm_atomic_helper_dirtyfb,
>> +};
>
> You don't seem to be using this anywhere. You have to implement a
> custom fb_create for drm_mode_config_funcs. [1]
You are totally right, my v2 had a ingenic_drm_gem_fb_create() and it
got dropped somehow.
> BUT: I think the overall approach should be to only use this on SoCs
> with non-coherency setting. Use drm_gem_fb_create() on systems
> without non-coherency and use drm_gem_fb_create_with_dirty() on
> systems with non-coherency (i.e., have two instances of
> drm_mode_config_funcs). Only call drm_plane_enable_fb_damage_clips()
> on systems with non-coherency.
Yes, that was the idea - enable this according to
(priv->soc_info->map_noncoherent) which is set only on the SoCs where
it makes sense.
>> +
>> +static struct drm_gem_object *
>> +ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
>> +{
>> + struct ingenic_drm *priv = drm_device_get_priv(drm);
>> + struct drm_gem_cma_object *obj;
>> +
>> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>> + if (!obj)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + obj->map_noncoherent = priv->soc_info->map_noncoherent;
>> +
>> + return &obj->base;
>> +}
>> +
>> DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>> \x7f static const struct drm_driver ingenic_drm_driver_data = {
>> @@ -754,6 +798,7 @@ static const struct drm_driver
>> ingenic_drm_driver_data = {
>> .patchlevel = 0,
>> \x7f .fops = &ingenic_drm_fops,
>> + .gem_create_object = ingenic_drm_gem_create_object,
>> DRM_GEM_CMA_DRIVER_OPS,
>> \x7f .irq_handler = ingenic_drm_irq_handler,
>> @@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev,
>> bool has_components)
>> return ret;
>> }
>> \x7f+ drm_plane_enable_fb_damage_clips(&priv->f1);
>> +
>> drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>> \x7f ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
>> @@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev,
>> bool has_components)
>> return ret;
>> }
>> \x7f+ drm_plane_enable_fb_damage_clips(&priv->f0);
>> +
>> if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
>> ret = component_bind_all(dev, drm);
>> if (ret) {
>> @@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
>> static const struct jz_soc_info jz4740_soc_info = {
>> .needs_dev_clk = true,
>> .has_osd = false,
>> + .map_noncoherent = false,
>> .max_width = 800,
>> .max_height = 600,
>> .formats_f1 = jz4740_formats,
>> @@ -1255,6 +1305,7 @@ static const struct jz_soc_info
>> jz4740_soc_info = {
>> static const struct jz_soc_info jz4725b_soc_info = {
>> .needs_dev_clk = false,
>> .has_osd = true,
>> + .map_noncoherent = false,
>> .max_width = 800,
>> .max_height = 600,
>> .formats_f1 = jz4725b_formats_f1,
>> @@ -1266,6 +1317,7 @@ static const struct jz_soc_info
>> jz4725b_soc_info = {
>> static const struct jz_soc_info jz4770_soc_info = {
>> .needs_dev_clk = false,
>> .has_osd = true,
>> + .map_noncoherent = true,
>> .max_width = 1280,
>> .max_height = 720,
>> .formats_f1 = jz4770_formats_f1,
>> diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> index 3b1091e7c0cd..a4d1b500c3ad 100644
>> --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> @@ -20,10 +20,13 @@
>> \x7f #include <drm/drm_atomic.h>
>> #include <drm/drm_atomic_helper.h>
>> +#include <drm/drm_damage_helper.h>
>> #include <drm/drm_drv.h>
>> #include <drm/drm_fb_cma_helper.h>
>> #include <drm/drm_fourcc.h>
>> #include <drm/drm_gem_atomic_helper.h>
>> +#include <drm/drm_gem_cma_helper.h>
>> +#include <drm/drm_gem_framebuffer_helper.h>
>> #include <drm/drm_plane.h>
>> #include <drm/drm_plane_helper.h>
>> #include <drm/drm_property.h>
>> @@ -285,8 +288,8 @@ static void
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>> struct drm_atomic_state *state)
>> {
>> struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
>> - struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state,
>> - plane);
>> + struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state, plane);
>> + struct drm_plane_state *oldstate =
>> drm_atomic_get_new_plane_state(state, plane);
>
> get_old_state ?
>
>> const struct drm_format_info *finfo;
>> u32 ctrl, stride = 0, coef_index = 0, format = 0;
>> bool needs_modeset, upscaling_w, upscaling_h;
>> @@ -317,6 +320,8 @@ static void
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>> JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
>> }
>> \x7f+ drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
>> +
>
> If you want to optimize, maybe put this line behind
>
> if (priv->soc_info->map_noncoherent)
>
>> /* New addresses will be committed in vblank handler... */
>> ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>> if (finfo->num_planes > 1)
>> @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct
>> drm_plane *plane,
>> \x7f if (!new_plane_state->crtc ||
>> !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
>> - return 0;
>> + goto out_check_damage;
>> \x7f /* Plane must be fully visible */
>> if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
>> @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct
>> drm_plane *plane,
>> return -EINVAL;
>> \x7f if (!osd_changed(new_plane_state, old_plane_state))
>> - return 0;
>> + goto out_check_damage;
>> \x7f crtc_state->mode_changed = true;
>> \x7f@@ -592,6 +597,9 @@ static int
>> ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>> ipu->denom_w = denom_w;
>> ipu->denom_h = denom_h;
>> \x7f+out_check_damage:
>> + drm_atomic_helper_check_plane_damage(state, new_plane_state);
>> +
>
> If you implement my suggestion above, this line could also be behind
>
> if (priv->soc_info->map_noncoherent)
Noted, thanks.
Cheers,
-Paul
> Best regards
>
> Thomas
>
>
>
> [1]
> https://elixir.bootlin.com/linux/v5.13-rc1/source/drivers/gpu/drm/ingenic/ingenic-drm-drv.c#L808
>
>
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers
@ 2021-05-15 20:08 ` Paul Cercueil
0 siblings, 0 replies; 16+ messages in thread
From: Paul Cercueil @ 2021-05-15 20:08 UTC (permalink / raw)
To: Thomas Zimmermann
Cc: David Airlie, linux-kernel, linux-mips, Christoph Hellwig,
dri-devel, list
Hi Thomas,
Le sam., mai 15 2021 at 21:42:40 +0200, Thomas Zimmermann
<tzimmermann@suse.de> a écrit :
> Hi
>
> Am 15.05.21 um 16:53 schrieb Paul Cercueil:
>> Alloc GEM buffers backed by noncoherent memory on SoCs where it is
>> actually faster than write-combine.
>>
>> This dramatically speeds up software rendering on these SoCs, even
>> for
>> tasks where write-combine memory should in theory be faster (e.g.
>> simple
>> blits).
>>
>> v3: The option is now selected per-SoC instead of being a module
>> parameter.
>>
>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> ---
>> drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 56
>> ++++++++++++++++++++++-
>> drivers/gpu/drm/ingenic/ingenic-ipu.c | 18 ++++++--
>> 2 files changed, 68 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> index 09225b770bb8..5f64e8583eec 100644
>> --- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
>> @@ -9,6 +9,7 @@
>> #include <linux/component.h>
>> #include <linux/clk.h>
>> #include <linux/dma-mapping.h>
>> +#include <linux/io.h>
>> #include <linux/module.h>
>> #include <linux/mutex.h>
>> #include <linux/of_device.h>
>> @@ -23,6 +24,7 @@
>> #include <drm/drm_color_mgmt.h>
>> #include <drm/drm_crtc.h>
>> #include <drm/drm_crtc_helper.h>
>> +#include <drm/drm_damage_helper.h>
>> #include <drm/drm_drv.h>
>> #include <drm/drm_encoder.h>
>> #include <drm/drm_gem_cma_helper.h>
>> @@ -57,6 +59,7 @@ struct ingenic_dma_hwdescs {
>> struct jz_soc_info {
>> bool needs_dev_clk;
>> bool has_osd;
>> + bool map_noncoherent;
>> unsigned int max_width, max_height;
>> const u32 *formats_f0, *formats_f1;
>> unsigned int num_formats_f0, num_formats_f1;
>> @@ -410,6 +413,8 @@ static int ingenic_drm_plane_atomic_check(struct
>> drm_plane *plane,
>> old_plane_state->fb->format->format !=
>> new_plane_state->fb->format->format))
>> crtc_state->mode_changed = true;
>> \x7f+ drm_atomic_helper_check_plane_damage(state, new_plane_state);
>> +
>> return 0;
>> }
>> \x7f@@ -544,8 +549,8 @@ static void
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>> struct drm_atomic_state *state)
>> {
>> struct ingenic_drm *priv = drm_device_get_priv(plane->dev);
>> - struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state,
>> - plane);
>> + struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state, plane);
>> + struct drm_plane_state *oldstate =
>> drm_atomic_get_old_plane_state(state, plane);
>> struct drm_crtc_state *crtc_state;
>> struct ingenic_dma_hwdesc *hwdesc;
>> unsigned int width, height, cpp, offset;
>> @@ -553,6 +558,8 @@ static void
>> ingenic_drm_plane_atomic_update(struct drm_plane *plane,
>> u32 fourcc;
>> \x7f if (newstate && newstate->fb) {
>> + drm_gem_cma_sync_data(&priv->drm, oldstate, newstate);
>> +
>> crtc_state = newstate->crtc->state;
>> \x7f addr = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>> @@ -742,6 +749,43 @@ static void ingenic_drm_disable_vblank(struct
>> drm_crtc *crtc)
>> regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
>> JZ_LCD_CTRL_EOF_IRQ, 0);
>> }
>> \x7f+static int ingenic_drm_atomic_helper_dirtyfb(struct
>> drm_framebuffer *fb,
>> + struct drm_file *file_priv,
>> + unsigned int flags,
>> + unsigned int color,
>> + struct drm_clip_rect *clips,
>> + unsigned int num_clips)
>> +{
>> + struct ingenic_drm *priv = drm_device_get_priv(fb->dev);
>> +
>> + if (!priv->soc_info->map_noncoherent)
>> + return 0;
>
> I'm not sure you can get away without calling
> drm_atomic_helper_dirtyfb(). The function does some things with the
> plane's damage-clips property. If you don't call it here, the plane
> might pile up unhandled clipping areas. It's better to call it and
> rely on the test in drm_gem_cma_sync_data(). See below on how to
> optimize this.
I guess that would work if I don't enable fb damage clips if
!map_noncoherent, but I'm missing that.
>> +
>> + return drm_atomic_helper_dirtyfb(fb, file_priv, flags,
>> + color, clips, num_clips);
>> +}
>> +
>> +static const struct drm_framebuffer_funcs ingenic_drm_gem_fb_funcs
>> = {
>> + .destroy = drm_gem_fb_destroy,
>> + .create_handle = drm_gem_fb_create_handle,
>> + .dirty = ingenic_drm_atomic_helper_dirtyfb,
>> +};
>
> You don't seem to be using this anywhere. You have to implement a
> custom fb_create for drm_mode_config_funcs. [1]
You are totally right, my v2 had a ingenic_drm_gem_fb_create() and it
got dropped somehow.
> BUT: I think the overall approach should be to only use this on SoCs
> with non-coherency setting. Use drm_gem_fb_create() on systems
> without non-coherency and use drm_gem_fb_create_with_dirty() on
> systems with non-coherency (i.e., have two instances of
> drm_mode_config_funcs). Only call drm_plane_enable_fb_damage_clips()
> on systems with non-coherency.
Yes, that was the idea - enable this according to
(priv->soc_info->map_noncoherent) which is set only on the SoCs where
it makes sense.
>> +
>> +static struct drm_gem_object *
>> +ingenic_drm_gem_create_object(struct drm_device *drm, size_t size)
>> +{
>> + struct ingenic_drm *priv = drm_device_get_priv(drm);
>> + struct drm_gem_cma_object *obj;
>> +
>> + obj = kzalloc(sizeof(*obj), GFP_KERNEL);
>> + if (!obj)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + obj->map_noncoherent = priv->soc_info->map_noncoherent;
>> +
>> + return &obj->base;
>> +}
>> +
>> DEFINE_DRM_GEM_CMA_FOPS(ingenic_drm_fops);
>> \x7f static const struct drm_driver ingenic_drm_driver_data = {
>> @@ -754,6 +798,7 @@ static const struct drm_driver
>> ingenic_drm_driver_data = {
>> .patchlevel = 0,
>> \x7f .fops = &ingenic_drm_fops,
>> + .gem_create_object = ingenic_drm_gem_create_object,
>> DRM_GEM_CMA_DRIVER_OPS,
>> \x7f .irq_handler = ingenic_drm_irq_handler,
>> @@ -961,6 +1006,8 @@ static int ingenic_drm_bind(struct device *dev,
>> bool has_components)
>> return ret;
>> }
>> \x7f+ drm_plane_enable_fb_damage_clips(&priv->f1);
>> +
>> drm_crtc_helper_add(&priv->crtc, &ingenic_drm_crtc_helper_funcs);
>> \x7f ret = drm_crtc_init_with_planes(drm, &priv->crtc, primary,
>> @@ -989,6 +1036,8 @@ static int ingenic_drm_bind(struct device *dev,
>> bool has_components)
>> return ret;
>> }
>> \x7f+ drm_plane_enable_fb_damage_clips(&priv->f0);
>> +
>> if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && has_components) {
>> ret = component_bind_all(dev, drm);
>> if (ret) {
>> @@ -1245,6 +1294,7 @@ static const u32 jz4770_formats_f0[] = {
>> static const struct jz_soc_info jz4740_soc_info = {
>> .needs_dev_clk = true,
>> .has_osd = false,
>> + .map_noncoherent = false,
>> .max_width = 800,
>> .max_height = 600,
>> .formats_f1 = jz4740_formats,
>> @@ -1255,6 +1305,7 @@ static const struct jz_soc_info
>> jz4740_soc_info = {
>> static const struct jz_soc_info jz4725b_soc_info = {
>> .needs_dev_clk = false,
>> .has_osd = true,
>> + .map_noncoherent = false,
>> .max_width = 800,
>> .max_height = 600,
>> .formats_f1 = jz4725b_formats_f1,
>> @@ -1266,6 +1317,7 @@ static const struct jz_soc_info
>> jz4725b_soc_info = {
>> static const struct jz_soc_info jz4770_soc_info = {
>> .needs_dev_clk = false,
>> .has_osd = true,
>> + .map_noncoherent = true,
>> .max_width = 1280,
>> .max_height = 720,
>> .formats_f1 = jz4770_formats_f1,
>> diff --git a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> index 3b1091e7c0cd..a4d1b500c3ad 100644
>> --- a/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> +++ b/drivers/gpu/drm/ingenic/ingenic-ipu.c
>> @@ -20,10 +20,13 @@
>> \x7f #include <drm/drm_atomic.h>
>> #include <drm/drm_atomic_helper.h>
>> +#include <drm/drm_damage_helper.h>
>> #include <drm/drm_drv.h>
>> #include <drm/drm_fb_cma_helper.h>
>> #include <drm/drm_fourcc.h>
>> #include <drm/drm_gem_atomic_helper.h>
>> +#include <drm/drm_gem_cma_helper.h>
>> +#include <drm/drm_gem_framebuffer_helper.h>
>> #include <drm/drm_plane.h>
>> #include <drm/drm_plane_helper.h>
>> #include <drm/drm_property.h>
>> @@ -285,8 +288,8 @@ static void
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>> struct drm_atomic_state *state)
>> {
>> struct ingenic_ipu *ipu = plane_to_ingenic_ipu(plane);
>> - struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state,
>> - plane);
>> + struct drm_plane_state *newstate =
>> drm_atomic_get_new_plane_state(state, plane);
>> + struct drm_plane_state *oldstate =
>> drm_atomic_get_new_plane_state(state, plane);
>
> get_old_state ?
>
>> const struct drm_format_info *finfo;
>> u32 ctrl, stride = 0, coef_index = 0, format = 0;
>> bool needs_modeset, upscaling_w, upscaling_h;
>> @@ -317,6 +320,8 @@ static void
>> ingenic_ipu_plane_atomic_update(struct drm_plane *plane,
>> JZ_IPU_CTRL_CHIP_EN | JZ_IPU_CTRL_LCDC_SEL);
>> }
>> \x7f+ drm_gem_cma_sync_data(ipu->drm, oldstate, newstate);
>> +
>
> If you want to optimize, maybe put this line behind
>
> if (priv->soc_info->map_noncoherent)
>
>> /* New addresses will be committed in vblank handler... */
>> ipu->addr_y = drm_fb_cma_get_gem_addr(newstate->fb, newstate, 0);
>> if (finfo->num_planes > 1)
>> @@ -541,7 +546,7 @@ static int ingenic_ipu_plane_atomic_check(struct
>> drm_plane *plane,
>> \x7f if (!new_plane_state->crtc ||
>> !crtc_state->mode.hdisplay || !crtc_state->mode.vdisplay)
>> - return 0;
>> + goto out_check_damage;
>> \x7f /* Plane must be fully visible */
>> if (new_plane_state->crtc_x < 0 || new_plane_state->crtc_y < 0 ||
>> @@ -558,7 +563,7 @@ static int ingenic_ipu_plane_atomic_check(struct
>> drm_plane *plane,
>> return -EINVAL;
>> \x7f if (!osd_changed(new_plane_state, old_plane_state))
>> - return 0;
>> + goto out_check_damage;
>> \x7f crtc_state->mode_changed = true;
>> \x7f@@ -592,6 +597,9 @@ static int
>> ingenic_ipu_plane_atomic_check(struct drm_plane *plane,
>> ipu->denom_w = denom_w;
>> ipu->denom_h = denom_h;
>> \x7f+out_check_damage:
>> + drm_atomic_helper_check_plane_damage(state, new_plane_state);
>> +
>
> If you implement my suggestion above, this line could also be behind
>
> if (priv->soc_info->map_noncoherent)
Noted, thanks.
Cheers,
-Paul
> Best regards
>
> Thomas
>
>
>
> [1]
> https://elixir.bootlin.com/linux/v5.13-rc1/source/drivers/gpu/drm/ingenic/ingenic-drm-drv.c#L808
>
>
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-05-15 20:08 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-15 14:53 [PATCH v4 0/3] Add option to mmap GEM buffers cached Paul Cercueil
2021-05-15 14:53 ` Paul Cercueil
2021-05-15 14:53 ` [PATCH v4 1/3] drm: Add support for GEM buffers backed by non-coherent memory Paul Cercueil
2021-05-15 14:53 ` Paul Cercueil
2021-05-15 18:49 ` Thomas Zimmermann
2021-05-15 18:49 ` Thomas Zimmermann
2021-05-15 14:53 ` [PATCH v4 2/3] drm: Add and export function drm_gem_cma_sync_data Paul Cercueil
2021-05-15 14:53 ` Paul Cercueil
2021-05-15 19:06 ` Thomas Zimmermann
2021-05-15 19:06 ` Thomas Zimmermann
2021-05-15 14:53 ` [PATCH v4 3/3] drm/ingenic: Add option to alloc cached GEM buffers Paul Cercueil
2021-05-15 14:53 ` Paul Cercueil
2021-05-15 19:42 ` Thomas Zimmermann
2021-05-15 19:42 ` Thomas Zimmermann
2021-05-15 20:08 ` Paul Cercueil
2021-05-15 20:08 ` Paul Cercueil
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.