* [RFC 0/4] Exynos DRM: add Picture Processor extension
       [not found] <CGME20170420091406eucas1p24c50a0015545105081257d880727386c@eucas1p2.samsung.com>
@ 2017-04-20  9:13 ` Marek Szyprowski
       [not found]   ` <CGME20170420091406eucas1p2ba4648e8e70ecca9c472017c21d654e1@eucas1p2.samsung.com>
                     ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-20  9:13 UTC (permalink / raw)
  To: dri-devel, linux-samsung-soc
  Cc: Marek Szyprowski, Inki Dae, Seung-Woo Kim, Andrzej Hajda,
	Bartlomiej Zolnierkiewicz, Tobias Jakobi, Rob Clark,
	Daniel Vetter

Dear all,

This is an updated proposal for extending the Exynos DRM API with generic
support for hardware modules that process image data from one memory buffer
to another. Typical memory-to-memory operations are rotation, scaling,
colour space conversion, or a mix of them. This is a follow-up to my previous
proposal "[RFC 0/2] New feature: Framebuffer processors", which was rejected
as "not really needed in the DRM core":
http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html

In this proposal I moved all the code to the Exynos DRM driver, so it is now
specific to Exynos DRM only. I have also changed the name from framebuffer
processor (fbproc) to picture processor (pp) to avoid confusion with the
fbdev API.

Here is a bit more information on what picture processors are:

Embedded SoCs typically include a number of hardware blocks that perform
such operations. They can be used in parallel with the main GPU module to
offload graphics or video processing from the CPU. One example use of
such modules is implementing a video overlay, which usually requires color
space conversion from NV12 (or similar) to RGB32 and scaling to the
target window size.

The proposed API is heavily inspired by the atomic KMS approach - it is also
based on DRM objects and their properties. A new DRM object is introduced:
the picture processor (called pp for convenience). Such objects have a set of
standard DRM properties, which describe the operation to be performed by
the respective hardware module. In the typical case those properties are a
source fb id and rectangle (x, y, width, height) and a destination fb id and
rectangle. Optionally, a rotation property can also be specified if
supported by the given hardware. To perform an operation on image data,
userspace provides a set of properties and their values for the given pp
object, in a similar way as objects and properties are provided for
performing an atomic page flip / mode setting.
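
As an illustration of the "set of properties and their values" idea above,
here is a minimal userspace-style sketch in plain C. It is not code from this
series: the enum values, struct names and helper functions are all
hypothetical, and the real kernel code identifies properties by
struct drm_property pointers rather than by small integers. It only shows how
a list of (property, value) pairs can be folded into one operation
description, with unknown properties rejected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property identifiers, for illustration only. */
enum pp_prop {
	PP_SRC_FB, PP_SRC_X, PP_SRC_Y, PP_SRC_W, PP_SRC_H,
	PP_DST_FB, PP_DST_X, PP_DST_Y, PP_DST_W, PP_DST_H,
	PP_ROTATION,
};

/* One entry of a property set: which property, and its new value. */
struct pp_prop_value {
	uint32_t prop;
	uint64_t value;
};

/* Hypothetical description of a single memory-to-memory operation. */
struct pp_task {
	uint32_t src_fb, dst_fb;
	uint32_t src_x, src_y, src_w, src_h;
	uint32_t dst_x, dst_y, dst_w, dst_h;
	uint32_t rotation;
};

/* Store one property value in the task; unknown properties are an error. */
static int pp_task_set(struct pp_task *t, uint32_t prop, uint64_t v)
{
	switch (prop) {
	case PP_SRC_FB:   t->src_fb = (uint32_t)v; break;
	case PP_SRC_X:    t->src_x = (uint32_t)v; break;
	case PP_SRC_Y:    t->src_y = (uint32_t)v; break;
	case PP_SRC_W:    t->src_w = (uint32_t)v; break;
	case PP_SRC_H:    t->src_h = (uint32_t)v; break;
	case PP_DST_FB:   t->dst_fb = (uint32_t)v; break;
	case PP_DST_X:    t->dst_x = (uint32_t)v; break;
	case PP_DST_Y:    t->dst_y = (uint32_t)v; break;
	case PP_DST_W:    t->dst_w = (uint32_t)v; break;
	case PP_DST_H:    t->dst_h = (uint32_t)v; break;
	case PP_ROTATION: t->rotation = (uint32_t)v; break;
	default:
		return -EINVAL;
	}
	return 0;
}

/* Apply a whole property set; stop at the first entry that is rejected. */
static int pp_task_apply(struct pp_task *t,
			 const struct pp_prop_value *set, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		int ret = pp_task_set(t, set[i].prop, set[i].value);

		if (ret)
			return ret;
	}
	return 0;
}
```

A caller would fill an array of struct pp_prop_value entries (for example a
source fb id, a destination fb id and a rotation) and pass it to
pp_task_apply(); properties that are not listed keep whatever default the
task was initialised with.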

The proposed API consists of 3 new ioctls:
- DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
  processors,
- DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
  processor,
- DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
  property set.
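
The enumeration ioctl in patch 3 follows the usual DRM "call twice"
convention: the first call reports how many picture processors exist, the
second call fills a caller-provided array. The sketch below simulates that
flow in plain userspace C so it can be compiled without the kernel headers.
The field names count_pps and pp_id_ptr are taken from the patch, but the
exact structure layout, the fake_pp_get_resources() stand-in for the real
ioctl() call, and the object ids 41 and 42 are assumptions made up for this
example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of the request; the real layout is defined by the
 * series in include/uapi/drm/exynos_drm.h. */
struct pp_get_res {
	uint32_t count_pps;	/* in: array capacity, out: number of pps */
	uint64_t pp_id_ptr;	/* in: user pointer to an array of uint32_t */
};

/* Stand-in for ioctl(fd, DRM_IOCTL_EXYNOS_PP_GET_RESOURCES, &req). */
static int fake_pp_get_resources(struct pp_get_res *req)
{
	static const uint32_t ids[] = { 41, 42 };	/* made-up object ids */
	const uint32_t count = sizeof(ids) / sizeof(ids[0]);
	uint32_t i;

	/* Copy ids only when the caller's array is large enough. */
	if (count && req->count_pps >= count) {
		uint32_t *out = (uint32_t *)(uintptr_t)req->pp_id_ptr;

		for (i = 0; i < count; i++)
			out[i] = ids[i];
	}
	req->count_pps = count;
	return 0;
}

/* Returns a malloc'ed array of pp ids and stores its length in *n.
 * Returns NULL (and *n == 0) on failure or when no pp is available. */
static uint32_t *enumerate_pps(uint32_t *n)
{
	struct pp_get_res req = { 0 };
	uint32_t *ids;

	*n = 0;

	/* First call: learn how many ids there are. */
	if (fake_pp_get_resources(&req) || req.count_pps == 0)
		return NULL;

	ids = calloc(req.count_pps, sizeof(*ids));
	if (!ids)
		return NULL;

	/* Second call: let the "kernel" fill the array. */
	req.pp_id_ptr = (uint64_t)(uintptr_t)ids;
	if (fake_pp_get_resources(&req)) {
		free(ids);
		return NULL;
	}

	*n = req.count_pps;
	return ids;
}
```

The same two-step pattern would apply to DRM_IOCTL_EXYNOS_PP_GET for the
source and destination format arrays. A robust caller should also handle the
count changing between the two calls, which this sketch leaves out.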

The proposed API is extensible. Drivers can attach their own, custom
properties to add support for more advanced picture processing (for example
blending).

This proposal aims to replace the Exynos DRM IPP (Image Post Processing)
subsystem. The IPP API is over-engineered in general, yet not really
extensible. It is also buggy and has significant design flaws: the
biggest issue is that the API covers memory-to-memory picture
operations together with CRTC writeback, and duplicates features that
belong to the video plane. Compared with the IPP subsystem, the PP framework
is smaller (778 vs 1807 lines) and allows driver simplification (the Exynos
rotator driver is smaller by over 200 lines).

Open questions:
- How to expose pp capabilities and supported formats? Currently this is done
  with a drm_exynos_pp_get structure and DRM_IOCTL_EXYNOS_PP_GET ioctl.
  However one can try to use IMMUTABLE properties for capabilities and src/dst
  format set. Rationale: recently Rob Clark proposed to create a DRM property
  with supported pixelformats and modifiers:
  http://www.spinics.net/lists/dri-devel/msg137380.html
- Is it okay to use DRM objects and properties API (DRM_IOCTL_MODE_GETPROPERTY
  and DRM_IOCTL_MODE_OBJ_GETPROPERTIES ioctls) for this purpose?

TODO:
- convert remaining Exynos DRM IPP drivers (FIMC, GScaler)
- remove Exynos DRM IPP subsystem
- (optional) provide virtual V4L2 mem2mem device on top of Exynos PP framework

Patches were tested on an Exynos 4412-based Odroid U3 board, on top of the
Linux next-20170420 kernel.

Best regards
Marek Szyprowski
Samsung R&D Institute Poland


Changelog:
v1:
- moved this feature from DRM core to Exynos DRM driver
- changed name from framebuffer processor to picture processor
- simplified code to cover only things needed by Exynos drivers
- implemented simple fifo task scheduler
- cleaned up rotator driver conversion (removed IPP remnants)


v0: http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
- initial post of "[RFC 0/2] New feature: Framebuffer processors"
- generic approach implemented in DRM core, rejected


Patch summary:

Marek Szyprowski (4):
  drm: Export functions to create custom DRM objects
  drm: Add support for vendor specific DRM objects with custom
    properties
  drm/exynos: Add Picture Processor framework
  drm/exynos: Convert Exynos Rotator driver to Picture Processor
    interface

 drivers/gpu/drm/drm_crtc_internal.h         |   4 -
 drivers/gpu/drm/drm_mode_object.c           |  11 +-
 drivers/gpu/drm/drm_property.c              |   2 +-
 drivers/gpu/drm/exynos/Kconfig              |   1 -
 drivers/gpu/drm/exynos/Makefile             |   3 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.c     |   9 +
 drivers/gpu/drm/exynos/exynos_drm_drv.h     |  15 +
 drivers/gpu/drm/exynos/exynos_drm_pp.c      | 775 ++++++++++++++++++++++++++++
 drivers/gpu/drm/exynos/exynos_drm_pp.h      | 155 ++++++
 drivers/gpu/drm/exynos/exynos_drm_rotator.c | 513 +++++-------------
 drivers/gpu/drm/exynos/exynos_drm_rotator.h |  19 -
 include/drm/drm_mode_object.h               |   6 +
 include/drm/drm_property.h                  |   7 +
 include/uapi/drm/drm_mode.h                 |   1 +
 include/uapi/drm/exynos_drm.h               |  62 +++
 15 files changed, 1166 insertions(+), 417 deletions(-)
 create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.c
 create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.h
 delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h

-- 
1.9.1


* [RFC 1/4] drm: Export functions to create custom DRM objects
       [not found]   ` <CGME20170420091406eucas1p2ba4648e8e70ecca9c472017c21d654e1@eucas1p2.samsung.com>
@ 2017-04-20  9:13     ` Marek Szyprowski
  0 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-20  9:13 UTC (permalink / raw)
  To: dri-devel, linux-samsung-soc
  Cc: Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Marek Szyprowski

Make the drm_mode_object_add() and drm_mode_object_unregister() functions
public, so that drivers can register their own DRM objects with the core.
Those objects can then be queried with the generic
DRM_IOCTL_MODE_OBJ_GETPROPERTIES ioctl.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 drivers/gpu/drm/drm_crtc_internal.h | 4 ----
 drivers/gpu/drm/drm_mode_object.c   | 2 ++
 include/drm/drm_mode_object.h       | 6 ++++++
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_crtc_internal.h b/drivers/gpu/drm/drm_crtc_internal.h
index d077c5490041..160d489f7240 100644
--- a/drivers/gpu/drm/drm_crtc_internal.h
+++ b/drivers/gpu/drm/drm_crtc_internal.h
@@ -101,14 +101,10 @@ int drm_mode_destroyblob_ioctl(struct drm_device *dev,
 int __drm_mode_object_add(struct drm_device *dev, struct drm_mode_object *obj,
 			  uint32_t obj_type, bool register_obj,
 			  void (*obj_free_cb)(struct kref *kref));
-int drm_mode_object_add(struct drm_device *dev, struct drm_mode_object *obj,
-			uint32_t obj_type);
 void drm_mode_object_register(struct drm_device *dev,
 			      struct drm_mode_object *obj);
 struct drm_mode_object *__drm_mode_object_find(struct drm_device *dev,
 					       uint32_t id, uint32_t type);
-void drm_mode_object_unregister(struct drm_device *dev,
-				struct drm_mode_object *object);
 int drm_mode_object_get_properties(struct drm_mode_object *obj, bool atomic,
 				   uint32_t __user *prop_ptr,
 				   uint64_t __user *prop_values,
diff --git a/drivers/gpu/drm/drm_mode_object.c b/drivers/gpu/drm/drm_mode_object.c
index da9a9adbcc98..052dcabe26af 100644
--- a/drivers/gpu/drm/drm_mode_object.c
+++ b/drivers/gpu/drm/drm_mode_object.c
@@ -73,6 +73,7 @@ int drm_mode_object_add(struct drm_device *dev,
 {
 	return __drm_mode_object_add(dev, obj, obj_type, true, NULL);
 }
+EXPORT_SYMBOL(drm_mode_object_add);
 
 void drm_mode_object_register(struct drm_device *dev,
 			      struct drm_mode_object *obj)
@@ -103,6 +104,7 @@ void drm_mode_object_unregister(struct drm_device *dev,
 	}
 	mutex_unlock(&dev->mode_config.idr_mutex);
 }
+EXPORT_SYMBOL(drm_mode_object_unregister);
 
 struct drm_mode_object *__drm_mode_object_find(struct drm_device *dev,
 					       uint32_t id, uint32_t type)
diff --git a/include/drm/drm_mode_object.h b/include/drm/drm_mode_object.h
index a767b4a30a6d..f91aee0a1705 100644
--- a/include/drm/drm_mode_object.h
+++ b/include/drm/drm_mode_object.h
@@ -112,6 +112,12 @@ struct drm_object_properties {
 		return "(unknown)";				\
 	}
 
+int drm_mode_object_add(struct drm_device *dev,
+			struct drm_mode_object *obj, uint32_t obj_type);
+
+void drm_mode_object_unregister(struct drm_device *dev,
+				struct drm_mode_object *object);
+
 struct drm_mode_object *drm_mode_object_find(struct drm_device *dev,
 					     uint32_t id, uint32_t type);
 void drm_mode_object_get(struct drm_mode_object *obj);
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [RFC 2/4] drm: Add support for vendor specific DRM objects with custom properties
       [not found]   ` <CGME20170420091407eucas1p2da1e16aa00e6d0bf8bd305422c3a9ba9@eucas1p2.samsung.com>
@ 2017-04-20  9:13     ` Marek Szyprowski
  0 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-20  9:13 UTC (permalink / raw)
  To: dri-devel, linux-samsung-soc
  Cc: Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Marek Szyprowski

Add a DRM_MODE_PROP_VENDOR flag, which allows creating DRM properties for
custom, vendor-specific DRM objects. It also allows creating OBJECT type
properties, which were previously reserved for ATOMIC mode sets. The flag is
also checked in the drm_object_property_get_value() function, so that
userspace gets the default value of such properties instead of going through
the atomic path.
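
The two flag checks touched by this patch can be summarised with a small
standalone sketch. The DRM_MODE_PROP_IMMUTABLE and DRM_MODE_PROP_ATOMIC
values match the existing uapi header, and DRM_MODE_PROP_VENDOR is the value
proposed in this patch. The helper names uses_atomic_getter() and
object_prop_flags_ok() are hypothetical and exist only for this example; the
kernel performs these tests inline.

```c
#include <stdbool.h>
#include <stdint.h>

#define DRM_MODE_PROP_IMMUTABLE	(1u << 2)	/* existing uapi flag */
#define DRM_MODE_PROP_ATOMIC	0x80000000u	/* existing uapi flag */
#define DRM_MODE_PROP_VENDOR	0x40000000u	/* proposed by this patch */

/* Mirrors the condition in drm_object_property_get_value(): on atomic
 * drivers the value comes from the atomic getter, unless the property is
 * immutable or vendor specific, in which case the stored value is used. */
static bool uses_atomic_getter(bool driver_is_atomic, uint32_t flags)
{
	return driver_is_atomic &&
	       !(flags & (DRM_MODE_PROP_IMMUTABLE | DRM_MODE_PROP_VENDOR));
}

/* Mirrors the WARN_ON() in drm_property_create_object(): object
 * properties must carry either the ATOMIC or the VENDOR flag. */
static bool object_prop_flags_ok(uint32_t flags)
{
	return (flags & (DRM_MODE_PROP_ATOMIC | DRM_MODE_PROP_VENDOR)) != 0;
}
```

With these definitions, a property created with only DRM_MODE_PROP_VENDOR
passes the object-property check and is read from the stored values array
even on an atomic driver.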

This change, together with the ability to register custom DRM objects from
device drivers, allows exposing driver-specific entities as DRM objects,
which can then be queried with the standard
DRM_IOCTL_MODE_OBJ_GETPROPERTIES and DRM_IOCTL_MODE_GETPROPERTY ioctls.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 drivers/gpu/drm/drm_mode_object.c | 9 +++++----
 drivers/gpu/drm/drm_property.c    | 2 +-
 include/drm/drm_property.h        | 7 +++++++
 include/uapi/drm/drm_mode.h       | 1 +
 4 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_mode_object.c b/drivers/gpu/drm/drm_mode_object.c
index 052dcabe26af..3cbefc1a7f4c 100644
--- a/drivers/gpu/drm/drm_mode_object.c
+++ b/drivers/gpu/drm/drm_mode_object.c
@@ -268,12 +268,13 @@ int drm_object_property_get_value(struct drm_mode_object *obj,
 {
 	int i;
 
-	/* read-only properties bypass atomic mechanism and still store
-	 * their value in obj->properties->values[].. mostly to avoid
-	 * having to deal w/ EDID and similar props in atomic paths:
+	/* custom vendor or read-only properties bypass atomic mechanism
+	 * and still store their value in obj->properties->values[].. mostly
+	 * to avoid having to deal w/ EDID and similar props in atomic paths:
 	 */
 	if (drm_drv_uses_atomic_modeset(property->dev) &&
-			!(property->flags & DRM_MODE_PROP_IMMUTABLE))
+			!(property->flags &
+			  (DRM_MODE_PROP_IMMUTABLE | DRM_MODE_PROP_VENDOR)))
 		return drm_atomic_get_property(obj, property, val);
 
 	for (i = 0; i < obj->properties->count; i++) {
diff --git a/drivers/gpu/drm/drm_property.c b/drivers/gpu/drm/drm_property.c
index 3e88fa24eab3..a3fd496665de 100644
--- a/drivers/gpu/drm/drm_property.c
+++ b/drivers/gpu/drm/drm_property.c
@@ -318,7 +318,7 @@ struct drm_property *drm_property_create_object(struct drm_device *dev,
 
 	flags |= DRM_MODE_PROP_OBJECT;
 
-	if (WARN_ON(!(flags & DRM_MODE_PROP_ATOMIC)))
+	if (WARN_ON(!(flags & (DRM_MODE_PROP_ATOMIC | DRM_MODE_PROP_VENDOR))))
 		return NULL;
 
 	property = drm_property_create(dev, flags, name, 1);
diff --git a/include/drm/drm_property.h b/include/drm/drm_property.h
index 13e8c17d1c79..d9a3d6450ffe 100644
--- a/include/drm/drm_property.h
+++ b/include/drm/drm_property.h
@@ -152,6 +152,13 @@ struct drm_property {
 	 *     properties. This is generally used to expose probe state to
 	 *     usersapce, e.g. the EDID, or the connector path property on DP
 	 *     MST sinks.
+	 *
+	 * DRM_MODE_PROP_VENDOR
+	 *     Set for vendor specific properties, for non-modeset vendor
+	 *     specific objects, which can be accessed by
+	 *     DRM_IOCTL_MODE_GETPROPERTY and DRM_IOCTL_MODE_OBJ_GETPROPERTIES,
+	 *     properties are not exposed to legacy userspace.
+	 *
 	 */
 	uint32_t flags;
 
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 8c67fc03d53d..2100afc1328a 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -322,6 +322,7 @@ struct drm_mode_get_connector {
  * witout being aware that this could be triggering a lengthy modeset.
  */
 #define DRM_MODE_PROP_ATOMIC        0x80000000
+#define DRM_MODE_PROP_VENDOR        0x40000000
 
 struct drm_mode_property_enum {
 	__u64 value;
-- 
1.9.1



* [RFC 3/4] drm/exynos: Add Picture Processor framework
       [not found]   ` <CGME20170420091407eucas1p281bd7bb7f7b45855cf593ec8aed6136a@eucas1p2.samsung.com>
@ 2017-04-20  9:13     ` Marek Szyprowski
  0 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-20  9:13 UTC (permalink / raw)
  To: dri-devel, linux-samsung-soc
  Cc: Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Marek Szyprowski

This patch extends the Exynos DRM API with picture processor hardware
modules. Such modules can be used for processing image data from one memory
buffer to another. Typical memory-to-memory operations are rotation, scaling,
colour space conversion, or a mix of them.

The proposed API is heavily inspired by the atomic KMS approach - it is also
based on DRM objects and their properties. A new DRM object is introduced:
the picture processor (called pp for convenience). Such objects have a set of
standard DRM properties, which describe the operation to be performed by
the respective hardware module. In the typical case those properties are a
source fb id and rectangle (x, y, width, height) and a destination fb id and
rectangle. Optionally, a rotation property can also be specified if
supported by the given hardware. To perform an operation on image data,
userspace provides a set of properties and their values for the given pp
object, in a similar way as objects and properties are provided for
performing an atomic page flip / mode setting.

The proposed API consists of 3 new ioctls:
- DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
  processors,
- DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
  processor,
- DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
  property set.

The proposed API is extensible. Drivers can attach their own, custom
properties to add support for more advanced picture processing (for example
blending).
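
One detail of the task handling in this patch is worth spelling out: a newly
allocated task starts with its source and destination width and height set to
UINT_MAX, and exynos_drm_pp_task_check() later replaces any value still equal
to UINT_MAX with the framebuffer dimension shifted left by 16 bits. The
standalone sketch below reproduces that defaulting rule. The struct and
function names are hypothetical, the field widths are assumed to be 32 bits,
and reading the shift as a 16.16 fixed-point encoding (as used for KMS plane
source coordinates) is an inference from the code, not something the patch
states.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins for the framebuffer and the task size fields. */
struct fb_size {
	uint32_t width, height;		/* in pixels */
};

struct pp_rect_size {
	uint32_t w, h;			/* pixels << 16, or UINT_MAX if unset */
};

/* Mark both dimensions as "not set by userspace". */
static void pp_size_init(struct pp_rect_size *s)
{
	s->w = UINT_MAX;
	s->h = UINT_MAX;
}

/* Replace unset dimensions with the full framebuffer size. */
static void pp_size_resolve(struct pp_rect_size *s, const struct fb_size *fb)
{
	if (s->w == UINT_MAX)
		s->w = fb->width << 16;
	if (s->h == UINT_MAX)
		s->h = fb->height << 16;
}
```

Under that reading, a task that sets only the fb ids processes the whole
framebuffer, while an explicit SRC_W or DST_W value supplied by userspace is
kept as is.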

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 drivers/gpu/drm/exynos/Makefile         |   3 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.c |   8 +
 drivers/gpu/drm/exynos/exynos_drm_drv.h |  15 +
 drivers/gpu/drm/exynos/exynos_drm_pp.c  | 775 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/exynos/exynos_drm_pp.h  | 155 +++++++
 include/uapi/drm/exynos_drm.h           |  62 +++
 6 files changed, 1017 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.c
 create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.h

diff --git a/drivers/gpu/drm/exynos/Makefile b/drivers/gpu/drm/exynos/Makefile
index f663490e949d..2632b0ee5d2d 100644
--- a/drivers/gpu/drm/exynos/Makefile
+++ b/drivers/gpu/drm/exynos/Makefile
@@ -3,7 +3,8 @@
 # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
 
 exynosdrm-y := exynos_drm_drv.o exynos_drm_crtc.o exynos_drm_fb.o \
-		exynos_drm_gem.o exynos_drm_core.o exynos_drm_plane.o
+		exynos_drm_gem.o exynos_drm_core.o exynos_drm_plane.o \
+		exynos_drm_pp.o
 
 exynosdrm-$(CONFIG_DRM_FBDEV_EMULATION) += exynos_drm_fbdev.o
 exynosdrm-$(CONFIG_DRM_EXYNOS_IOMMU) += exynos_drm_iommu.o
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 09d3c4c3c858..41942b111285 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -26,6 +26,7 @@
 #include "exynos_drm_fb.h"
 #include "exynos_drm_gem.h"
 #include "exynos_drm_plane.h"
+#include "exynos_drm_pp.h"
 #include "exynos_drm_vidi.h"
 #include "exynos_drm_g2d.h"
 #include "exynos_drm_ipp.h"
@@ -128,6 +129,12 @@ static void exynos_drm_lastclose(struct drm_device *dev)
 			DRM_AUTH | DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(EXYNOS_IPP_CMD_CTRL, exynos_drm_ipp_cmd_ctrl,
 			DRM_AUTH | DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(EXYNOS_PP_GET_RESOURCES, exynos_drm_pp_get_res,
+			DRM_AUTH | DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(EXYNOS_PP_GET, exynos_drm_pp_get,
+			DRM_AUTH | DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(EXYNOS_PP_COMMIT, exynos_drm_pp_commit,
+			DRM_AUTH | DRM_RENDER_ALLOW),
 };
 
 static const struct file_operations exynos_drm_driver_fops = {
@@ -360,6 +367,7 @@ static int exynos_drm_bind(struct device *dev)
 	drm_mode_config_init(drm);
 
 	exynos_drm_mode_config_init(drm);
+	exynos_drm_pp_init(drm);
 
 	/* setup possible_clones. */
 	cnt = 0;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h
index cb3176930596..7915200f2f7c 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h
@@ -220,6 +220,21 @@ struct exynos_drm_private {
 	u32			pending;
 	spinlock_t		lock;
 	wait_queue_head_t	wait;
+
+	/* for pp api */
+	int num_pp;
+	struct list_head pp_list;
+
+	struct drm_property *pp_src_fb;
+	struct drm_property *pp_src_x;
+	struct drm_property *pp_src_y;
+	struct drm_property *pp_src_w;
+	struct drm_property *pp_src_h;
+	struct drm_property *pp_dst_fb;
+	struct drm_property *pp_dst_x;
+	struct drm_property *pp_dst_y;
+	struct drm_property *pp_dst_w;
+	struct drm_property *pp_dst_h;
 };
 
 static inline struct device *to_dma_dev(struct drm_device *dev)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_pp.c b/drivers/gpu/drm/exynos/exynos_drm_pp.c
new file mode 100644
index 000000000000..18c4738b7679
--- /dev/null
+++ b/drivers/gpu/drm/exynos/exynos_drm_pp.c
@@ -0,0 +1,775 @@
+/*
+ * Copyright (C) 2017 Samsung Electronics Co.Ltd
+ * Authors:
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *
+ * Exynos DRM Picture Processor (PP) related functions
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+
+#include <drm/drmP.h>
+#include <drm/drm_mode.h>
+#include <uapi/drm/exynos_drm.h>
+
+#include "exynos_drm_drv.h"
+#include "exynos_drm_pp.h"
+
+struct drm_pending_exynos_pp_event {
+	struct drm_pending_event base;
+	struct drm_exynos_pp_event event;
+};
+
+/**
+ * exynos_drm_pp_init - Initialize Picture Processor extension
+ * @dev: DRM device
+ */
+int exynos_drm_pp_init(struct drm_device *dev)
+{
+	struct exynos_drm_private *priv = dev->dev_private;
+	struct drm_property *prop;
+
+	INIT_LIST_HEAD(&priv->pp_list);
+
+	prop = drm_property_create_object(dev, DRM_MODE_PROP_VENDOR,
+			"SRC_FB_ID", DRM_MODE_OBJECT_FB);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_src_fb = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"SRC_X", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_src_x = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"SRC_Y", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_src_y = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"SRC_W", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_src_w = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"SRC_H", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_src_h = prop;
+
+	prop = drm_property_create_object(dev, DRM_MODE_PROP_VENDOR,
+			"DST_FB_ID", DRM_MODE_OBJECT_FB);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_dst_fb = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"DST_X", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_dst_x = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"DST_Y", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_dst_y = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"DST_W", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_dst_w = prop;
+
+	prop = drm_property_create_range(dev, DRM_MODE_PROP_VENDOR,
+			"DST_H", 0, UINT_MAX);
+	if (!prop)
+		return -ENOMEM;
+	priv->pp_dst_h = prop;
+
+	return 0;
+}
+
+/**
+ * exynos_drm_pp_register - Register a new picture processor hardware module
+ * @dev: DRM device
+ * @pp: pp module to init
+ * @funcs: callbacks for the new pp object
+ * @caps: bitmask of pp capabilities (%DRM_EXYNOS_PP_CAP_*)
+ * @src_fmts: array of supported source fb formats (%DRM_FORMAT_*)
+ * @src_fmt_count: number of elements in @src_fmts
+ * @dst_fmts: array of supported destination fb formats (%DRM_FORMAT_*)
+ * @dst_fmt_count: number of elements in @dst_fmts
+ * @rotation: a set of supported rotation transformations
+ * @name: printf style format string, or NULL for the default name
+ *
+ * Initializes a pp module.
+ *
+ * Returns:
+ * Zero on success, error code on failure.
+ */
+int exynos_drm_pp_register(struct drm_device *dev, struct exynos_drm_pp *pp,
+		    const struct exynos_drm_pp_funcs *funcs, unsigned int caps,
+		    const uint32_t *src_fmts, unsigned int src_fmt_count,
+		    const uint32_t *dst_fmts, unsigned int dst_fmt_count,
+		    unsigned int rotation, const char *name, ...)
+{
+	static const struct drm_prop_enum_list props[] = {
+		{ __builtin_ffs(DRM_ROTATE_0) - 1,   "rotate-0" },
+		{ __builtin_ffs(DRM_ROTATE_90) - 1,  "rotate-90" },
+		{ __builtin_ffs(DRM_ROTATE_180) - 1, "rotate-180" },
+		{ __builtin_ffs(DRM_ROTATE_270) - 1, "rotate-270" },
+		{ __builtin_ffs(DRM_REFLECT_X) - 1,  "reflect-x" },
+		{ __builtin_ffs(DRM_REFLECT_Y) - 1,  "reflect-y" },
+	};
+	struct exynos_drm_private *priv = dev->dev_private;
+	struct drm_property *prop;
+	int ret;
+
+	ret = drm_mode_object_add(dev, &pp->base, DRM_EXYNOS_OBJECT_PP);
+	if (ret)
+		return ret;
+
+	spin_lock_init(&pp->lock);
+	INIT_LIST_HEAD(&pp->todo_list);
+	init_waitqueue_head(&pp->done_wq);
+	pp->base.properties = &pp->properties;
+	pp->dev = dev;
+	pp->funcs = funcs;
+	pp->capabilities = caps;
+	pp->src_format_count = src_fmt_count;
+	pp->dst_format_count = dst_fmt_count;
+
+	if (name) {
+		va_list ap;
+
+		va_start(ap, name);
+		pp->name = kvasprintf(GFP_KERNEL, name, ap);
+		va_end(ap);
+	} else {
+		pp->name = kasprintf(GFP_KERNEL, "pp-%d",
+					 priv->num_pp);
+	}
+	if (!pp->name)
+		goto free;
+
+	pp->src_format_types = kmemdup(src_fmts,
+				  sizeof(uint32_t) * src_fmt_count, GFP_KERNEL);
+	if (!pp->src_format_types)
+		goto free;
+
+	pp->dst_format_types = kmemdup(dst_fmts,
+				  sizeof(uint32_t) * dst_fmt_count, GFP_KERNEL);
+	if (!pp->dst_format_types)
+		goto free;
+
+	prop = drm_property_create_bitmask(dev, DRM_MODE_PROP_VENDOR,
+					   "rotation", props, ARRAY_SIZE(props),
+					   rotation);
+	if (!prop)
+		goto free;
+
+	pp->rotation_property = prop;
+
+	list_add_tail(&pp->head, &priv->pp_list);
+
+	drm_object_attach_property(&pp->base, priv->pp_src_fb, 0);
+	drm_object_attach_property(&pp->base, priv->pp_src_x, 0);
+	drm_object_attach_property(&pp->base, priv->pp_src_y, 0);
+	drm_object_attach_property(&pp->base, priv->pp_src_w, 0);
+	drm_object_attach_property(&pp->base, priv->pp_src_h, 0);
+	drm_object_attach_property(&pp->base, priv->pp_dst_fb, 0);
+	drm_object_attach_property(&pp->base, priv->pp_dst_x, 0);
+	drm_object_attach_property(&pp->base, priv->pp_dst_y, 0);
+	drm_object_attach_property(&pp->base, priv->pp_dst_w, 0);
+	drm_object_attach_property(&pp->base, priv->pp_dst_h, 0);
+	drm_object_attach_property(&pp->base, prop, DRM_ROTATE_0);
+
+	priv->num_pp++;
+	DRM_DEBUG_DRIVER("Registered pp %d\n", pp->base.id);
+
+	return 0;
+
+free:
+	kfree(pp->dst_format_types);
+	kfree(pp->src_format_types);
+	kfree(pp->name);
+	drm_mode_object_unregister(dev, &pp->base);
+	return -ENOMEM;
+}
+
+/**
+ * exynos_drm_pp_unregister - Unregister the picture processor module
+ * @dev: DRM device
+ * @pp: pp module
+ */
+void exynos_drm_pp_unregister(struct drm_device *dev, struct exynos_drm_pp *pp)
+{
+	BUG_ON(pp->task);
+	BUG_ON(!list_empty(&pp->todo_list));
+
+	kfree(pp->dst_format_types);
+	kfree(pp->src_format_types);
+	kfree(pp->name);
+	drm_mode_object_unregister(dev, &pp->base);
+}
+
+/**
+ * exynos_drm_pp_get_res - enumerate all pp modules
+ * @dev: DRM device
+ * @data: ioctl data
+ * @file_priv: DRM file info
+ *
+ * Construct a list of pp ids to return to the user.
+ *
+ * Called by the user via ioctl.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
+int exynos_drm_pp_get_res(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct exynos_drm_private *priv = dev->dev_private;
+	struct drm_exynos_pp_get_res *resp = data;
+	struct exynos_drm_pp *pp;
+	uint32_t __user *pp_ptr;
+	unsigned int count = priv->num_pp, copied = 0;
+
+	/*
+	 * This ioctl is called twice, once to determine how much space is
+	 * needed, and the 2nd time to fill it.
+	 */
+	if (count && resp->count_pps >= count) {
+		pp_ptr = (uint32_t __user *)
+					(unsigned long)resp->pp_id_ptr;
+
+		list_for_each_entry(pp, &priv->pp_list, head) {
+			if (put_user(pp->base.id, pp_ptr + copied))
+				return -EFAULT;
+			copied++;
+		}
+	}
+	resp->count_pps = count;
+
+	return 0;
+}
+
+static inline struct exynos_drm_pp *exynos_drm_pp_find(struct drm_device *dev,
+						       uint32_t id)
+{
+	struct exynos_drm_private *priv = dev->dev_private;
+	struct exynos_drm_pp *pp;
+
+	list_for_each_entry(pp, &priv->pp_list, head) {
+		if (pp->base.id == id)
+			return pp;
+	}
+	return NULL;
+}
+
+/**
+ * exynos_drm_pp_get - get picture processor module parameters
+ * @dev: DRM device
+ * @data: ioctl data
+ * @file_priv: DRM file info
+ *
+ * Construct a pp configuration structure to return to the user.
+ *
+ * Called by the user via ioctl.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
+int exynos_drm_pp_get(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct drm_exynos_pp_get *resp = data;
+	struct exynos_drm_pp *pp;
+	uint32_t __user *format_ptr;
+
+	pp = exynos_drm_pp_find(dev, resp->pp_id);
+	if (!pp)
+		return -ENOENT;
+
+	resp->pp_id = pp->base.id;
+	resp->capabilities = pp->capabilities;
+
+	/*
+	 * This ioctl is called twice, once to determine how much space is
+	 * needed, and the 2nd time to fill it.
+	 */
+	if (pp->src_format_count &&
+	    (resp->src_format_count >= pp->src_format_count)) {
+		format_ptr = (uint32_t __user *)
+				(unsigned long)resp->src_format_type_ptr;
+		if (copy_to_user(format_ptr, pp->src_format_types,
+				 sizeof(uint32_t) * pp->src_format_count))
+			return -EFAULT;
+	}
+	if (pp->dst_format_count &&
+	    (resp->dst_format_count >= pp->dst_format_count)) {
+		format_ptr = (uint32_t __user *)
+				(unsigned long)resp->dst_format_type_ptr;
+		if (copy_to_user(format_ptr, pp->dst_format_types,
+				 sizeof(uint32_t) * pp->dst_format_count))
+			return -EFAULT;
+	}
+	resp->src_format_count = pp->src_format_count;
+	resp->dst_format_count = pp->dst_format_count;
+
+	return 0;
+}
+
+static inline struct exynos_drm_pp_task *
+	exynos_drm_pp_task_alloc(struct exynos_drm_pp *pp)
+{
+	struct exynos_drm_pp_task *task;
+
+	task = kzalloc(sizeof(*task), GFP_KERNEL);
+	if (!task)
+		return NULL;
+
+	task->dev = pp->dev;
+	task->pp = pp;
+	task->src_w = task->dst_w = UINT_MAX;
+	task->src_h = task->dst_h = UINT_MAX;
+	task->rotation = DRM_ROTATE_0;
+
+	DRM_DEBUG_DRIVER("Allocated task %pK\n", task);
+
+	return task;
+}
+
+static void exynos_drm_pp_task_free(struct exynos_drm_pp *pp,
+				 struct exynos_drm_pp_task *task)
+{
+	DRM_DEBUG_DRIVER("Freeing task %pK\n", task);
+
+	task->pp = NULL;
+
+	if (task->src_fb) {
+		drm_framebuffer_unreference(task->src_fb);
+		task->src_fb = NULL;
+	}
+	if (task->dst_fb) {
+		drm_framebuffer_unreference(task->dst_fb);
+		task->dst_fb = NULL;
+	}
+	if (task->event) {
+		drm_event_cancel_free(pp->dev, &task->event->base);
+		task->event = NULL;
+	}
+	kfree(task);
+}
+
+static int exynos_drm_pp_task_set_property(struct exynos_drm_pp_task *task,
+		struct drm_property *prop, uint64_t prop_value)
+{
+	struct drm_device *dev = task->dev;
+	struct exynos_drm_private *priv = dev->dev_private;
+	struct exynos_drm_pp *pp = task->pp;
+	struct drm_framebuffer *fb;
+	int ret = 0;
+
+	if (prop == priv->pp_src_fb) {
+		fb = drm_framebuffer_lookup(dev, prop_value);
+		if (task->src_fb)
+			drm_framebuffer_unreference(task->src_fb);
+		task->src_fb = fb;
+	} else if (prop == priv->pp_src_x) {
+		task->src_x = prop_value;
+	} else if (prop == priv->pp_src_y) {
+		task->src_y = prop_value;
+	} else if (prop == priv->pp_src_w) {
+		task->src_w = prop_value;
+	} else if (prop == priv->pp_src_h) {
+		task->src_h = prop_value;
+	} else if (prop == priv->pp_dst_fb) {
+		fb = drm_framebuffer_lookup(dev, prop_value);
+		if (task->dst_fb)
+			drm_framebuffer_unreference(task->dst_fb);
+		task->dst_fb = fb;
+	} else if (prop == priv->pp_dst_x) {
+		task->dst_x = prop_value;
+	} else if (prop == priv->pp_dst_y) {
+		task->dst_y = prop_value;
+	} else if (prop == priv->pp_dst_w) {
+		task->dst_w = prop_value;
+	} else if (prop == priv->pp_dst_h) {
+		task->dst_h = prop_value;
+	} else if (prop == pp->rotation_property) {
+		task->rotation = prop_value;
+	} else {
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static struct drm_pending_exynos_pp_event *exynos_drm_pp_event_create(
+			struct drm_device *dev, struct drm_file *file_priv,
+			uint64_t user_data)
+{
+	struct drm_pending_exynos_pp_event *e = NULL;
+	int ret;
+
+	e = kzalloc(sizeof(*e), GFP_KERNEL);
+	if (!e)
+		return NULL;
+
+	e->event.base.type = DRM_EXYNOS_PP_EVENT;
+	e->event.base.length = sizeof(e->event);
+	e->event.user_data = user_data;
+
+	if (file_priv) {
+		ret = drm_event_reserve_init(dev, file_priv, &e->base,
+					     &e->event.base);
+		if (ret) {
+			kfree(e);
+			return NULL;
+		}
+	}
+
+	return e;
+}
+
+static void exynos_drm_pp_event_send(struct drm_device *dev,
+				  struct exynos_drm_pp *pp,
+				  struct drm_pending_exynos_pp_event *e)
+{
+	struct timeval now = ktime_to_timeval(ktime_get());
+
+	e->event.tv_sec = now.tv_sec;
+	e->event.tv_usec = now.tv_usec;
+	e->event.sequence = atomic_inc_return(&pp->sequence);
+
+	drm_send_event(dev, &e->base);
+}
+
+static inline bool drm_fb_check_format(struct drm_framebuffer *fb,
+				const uint32_t *formats, int format_counts)
+{
+	while (format_counts--)
+		if (*formats++ == fb->format->format)
+			return true;
+	return false;
+}
+
+static int exynos_drm_pp_task_check(struct exynos_drm_pp_task *task)
+{
+	struct exynos_drm_pp *pp = task->pp;
+	int ret = 0;
+
+	DRM_DEBUG_DRIVER("checking %pK\n", task);
+
+	if (!task->src_fb || !task->dst_fb)
+		return -EINVAL;
+
+	if (!drm_fb_check_format(task->src_fb, pp->src_format_types,
+				 pp->src_format_count))
+		return -EINVAL;
+
+	if (!drm_fb_check_format(task->dst_fb, pp->dst_format_types,
+				 pp->dst_format_count))
+		return -EINVAL;
+
+	if (task->src_w == UINT_MAX)
+		task->src_w = task->src_fb->width << 16;
+	if (task->src_h == UINT_MAX)
+		task->src_h = task->src_fb->height << 16;
+	if (task->dst_w == UINT_MAX)
+		task->dst_w = task->dst_fb->width << 16;
+	if (task->dst_h == UINT_MAX)
+		task->dst_h = task->dst_fb->height << 16;
+
+	if (task->src_x + task->src_w > (task->src_fb->width << 16) ||
+	    task->src_y + task->src_h > (task->src_fb->height << 16) ||
+	    task->dst_x + task->dst_w > (task->dst_fb->width << 16) ||
+	    task->dst_y + task->dst_h > (task->dst_fb->height << 16))
+		return -EINVAL;
+
+	if (!(pp->capabilities & DRM_EXYNOS_PP_CAP_CROP) &&
+	    (task->src_x || task->src_y || task->dst_x || task->dst_y))
+		return -EINVAL;
+
+	if (!(pp->capabilities & DRM_EXYNOS_PP_CAP_ROTATE) &&
+	    task->rotation != DRM_ROTATE_0)
+		return -EINVAL;
+
+	if (!(pp->capabilities & DRM_EXYNOS_PP_CAP_SCALE) &&
+	    ((!drm_rotation_90_or_270(task->rotation) &&
+	      (task->src_w != task->dst_w || task->src_h != task->dst_h)) ||
+	     (drm_rotation_90_or_270(task->rotation) &&
+	      (task->src_w != task->dst_h || task->src_h != task->dst_w))))
+		return -EINVAL;
+
+	if (!(pp->capabilities & DRM_EXYNOS_PP_CAP_CONVERT) &&
+	    task->src_fb->format->format != task->dst_fb->format->format)
+		return -EINVAL;
+
+	if (!(pp->capabilities & DRM_EXYNOS_PP_CAP_FB_MODIFIERS) &&
+	    ((task->src_fb->flags & DRM_MODE_FB_MODIFIERS) ||
+	     (task->dst_fb->flags & DRM_MODE_FB_MODIFIERS)))
+		return -EINVAL;
+
+	if (pp->funcs->check)
+		ret = pp->funcs->check(pp, task);
+
+	return ret;
+}
+
+static int exynos_drm_pp_task_cleanup(struct exynos_drm_pp_task *task)
+{
+	int ret = task->ret;
+
+	if (ret == 0 && task->event) {
+		exynos_drm_pp_event_send(task->dev, task->pp, task->event);
+		/* ensure event won't be canceled on task free */
+		task->event = NULL;
+	}
+
+	exynos_drm_pp_task_free(task->pp, task);
+	return ret;
+}
+
+static void exynos_drm_pp_cleanup_work(struct work_struct *work)
+{
+	struct exynos_drm_pp_task *task = container_of(work,
+				struct exynos_drm_pp_task, cleanup_work);
+
+	exynos_drm_pp_task_cleanup(task);
+}
+
+static void exynos_drm_pp_next_task(struct exynos_drm_pp *pp);
+
+void exynos_drm_pp_task_done(struct exynos_drm_pp_task *task, int ret)
+{
+	struct exynos_drm_pp *pp = task->pp;
+	unsigned long flags;
+
+	DRM_DEBUG_DRIVER("pp: %d, task %pK done\n", pp->base.id,
+			 task);
+
+	spin_lock_irqsave(&pp->lock, flags);
+	if (pp->task == task)
+		pp->task = NULL;
+	task->flags |= DRM_EXYNOS_PP_TASK_DONE;
+	task->ret = ret;
+	spin_unlock_irqrestore(&pp->lock, flags);
+
+	exynos_drm_pp_next_task(pp);
+	wake_up(&pp->done_wq);
+
+	if (task->flags & DRM_EXYNOS_PP_TASK_ASYNC) {
+		INIT_WORK(&task->cleanup_work, exynos_drm_pp_cleanup_work);
+		schedule_work(&task->cleanup_work);
+	}
+}
+
+static void exynos_drm_pp_next_task(struct exynos_drm_pp *pp)
+{
+	struct exynos_drm_pp_task *task;
+	unsigned long flags;
+	int ret;
+
+	DRM_DEBUG_DRIVER("pp: %d, try to run new task\n", pp->base.id);
+
+	spin_lock_irqsave(&pp->lock, flags);
+
+	if (pp->task || list_empty(&pp->todo_list)) {
+		spin_unlock_irqrestore(&pp->lock, flags);
+		return;
+	}
+
+	task = list_first_entry(&pp->todo_list, struct exynos_drm_pp_task,
+				head);
+	list_del_init(&task->head);
+	pp->task = task;
+
+	spin_unlock_irqrestore(&pp->lock, flags);
+
+	DRM_DEBUG_DRIVER("pp: %d, selected task %pK to run\n",
+			 pp->base.id, task);
+
+	ret = pp->funcs->commit(pp, task);
+	if (ret)
+		exynos_drm_pp_task_done(task, ret);
+}
+
+static void exynos_drm_pp_schedule_task(struct exynos_drm_pp *pp,
+				     struct exynos_drm_pp_task *task)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pp->lock, flags);
+	list_add_tail(&task->head, &pp->todo_list);
+	spin_unlock_irqrestore(&pp->lock, flags);
+
+	exynos_drm_pp_next_task(pp);
+}
+
+static void exynos_drm_pp_task_abort(struct exynos_drm_pp *pp,
+				  struct exynos_drm_pp_task *task)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pp->lock, flags);
+	if (task->flags & DRM_EXYNOS_PP_TASK_DONE) {
+		/* already completed task */
+		exynos_drm_pp_task_cleanup(task);
+	} else if (pp->task != task) {
+		/* task has not been scheduled for execution yet */
+		list_del_init(&task->head);
+		exynos_drm_pp_task_cleanup(task);
+	} else {
+		/*
+		 * currently processed task, call abort() and perform
+		 * cleanup with async worker
+		 */
+		task->flags |= DRM_EXYNOS_PP_TASK_ASYNC;
+		if (pp->funcs->abort)
+			pp->funcs->abort(pp, task);
+	}
+	spin_unlock_irqrestore(&pp->lock, flags);
+}
+
+/**
+ * exynos_drm_pp_commit - perform operation on picture processor object
+ * @dev: DRM device
+ * @data: ioctl data
+ * @file_priv: DRM file info
+ *
+ * Construct a pp task from the set of properties provided by the user
+ * and try to schedule it on the picture processor hardware.
+ *
+ * Called by the user via ioctl.
+ *
+ * Returns:
+ * Zero on success, negative errno on failure.
+ */
+int exynos_drm_pp_commit(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct drm_exynos_pp_commit *arg = data;
+	uint32_t __user *props_ptr =
+		(uint32_t __user *)(unsigned long)(arg->props_ptr);
+	uint64_t __user *prop_values_ptr =
+		(uint64_t __user *)(unsigned long)(arg->prop_values_ptr);
+	struct exynos_drm_pp *pp;
+	struct exynos_drm_pp_task *task;
+	int ret = 0;
+	unsigned int i;
+
+	if (arg->flags & ~DRM_EXYNOS_PP_FLAGS)
+		return -EINVAL;
+
+	if (arg->reserved)
+		return -EINVAL;
+
+	/* can't test and expect an event at the same time */
+	if ((arg->flags & DRM_EXYNOS_PP_FLAG_TEST_ONLY) &&
+			(arg->flags & DRM_EXYNOS_PP_FLAG_EVENT))
+		return -EINVAL;
+
+	pp = exynos_drm_pp_find(dev, arg->pp_id);
+	if (!pp)
+		return -ENOENT;
+
+	task = exynos_drm_pp_task_alloc(pp);
+	if (!task) {
+		/* nothing allocated yet, and the free path dereferences task */
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < arg->count_props; i++) {
+		uint32_t prop_id;
+		uint64_t prop_value;
+		struct drm_property *prop;
+
+		if (get_user(prop_id, props_ptr + i)) {
+			ret = -EFAULT;
+			goto free;
+		}
+
+		prop = drm_property_find(dev, prop_id);
+		if (!prop) {
+			ret = -ENOENT;
+			goto free;
+		}
+
+		if (copy_from_user(&prop_value, prop_values_ptr + i,
+				   sizeof(prop_value))) {
+			ret = -EFAULT;
+			goto free;
+		}
+
+		ret = exynos_drm_pp_task_set_property(task, prop, prop_value);
+		if (ret)
+			goto free;
+	}
+
+	if (arg->flags & DRM_EXYNOS_PP_FLAG_EVENT) {
+		struct drm_pending_exynos_pp_event *e;
+
+		e = exynos_drm_pp_event_create(dev, file_priv, arg->user_data);
+		if (!e) {
+			ret = -ENOMEM;
+			goto free;
+		}
+		task->event = e;
+	}
+
+	ret = exynos_drm_pp_task_check(task);
+	if (ret || arg->flags & DRM_EXYNOS_PP_FLAG_TEST_ONLY)
+		goto free;
+
+	/*
+	 * Queue task for processing on the hardware. task object will be
+	 * then freed after exynos_drm_pp_task_done()
+	 */
+	if (arg->flags & DRM_EXYNOS_PP_FLAG_NONBLOCK) {
+		DRM_DEBUG_DRIVER("pp: %d, nonblocking processing task %pK\n",
+				 task->pp->base.id, task);
+
+		task->flags |= DRM_EXYNOS_PP_TASK_ASYNC;
+		exynos_drm_pp_schedule_task(task->pp, task);
+		ret = 0;
+	} else {
+		DRM_DEBUG_DRIVER("pp: %d, processing task %pK\n", pp->base.id,
+				 task);
+		exynos_drm_pp_schedule_task(pp, task);
+		ret = wait_event_interruptible(pp->done_wq,
+					task->flags & DRM_EXYNOS_PP_TASK_DONE);
+		if (ret)
+			exynos_drm_pp_task_abort(pp, task);
+		else
+			ret = exynos_drm_pp_task_cleanup(task);
+	}
+	return ret;
+free:
+	exynos_drm_pp_task_free(pp, task);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/exynos/exynos_drm_pp.h b/drivers/gpu/drm/exynos/exynos_drm_pp.h
new file mode 100644
index 000000000000..d892097f7e89
--- /dev/null
+++ b/drivers/gpu/drm/exynos/exynos_drm_pp.h
@@ -0,0 +1,155 @@
+/*
+ * Copyright (c) 2017 Samsung Electronics Co., Ltd.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#ifndef _EXYNOS_DRM_PP_H_
+#define _EXYNOS_DRM_PP_H_
+
+#include <drm/drmP.h>
+
+struct exynos_drm_pp;
+struct exynos_drm_pp_task;
+
+/**
+ * struct exynos_drm_pp_funcs - exynos_drm_pp control functions
+ */
+struct exynos_drm_pp_funcs {
+	/**
+	 * @check:
+	 *
+	 * This is the optional hook to validate a pp task. This function
+	 * must reject any task which the hardware or driver doesn't support.
+	 * This includes but is of course not limited to:
+	 *
+	 *  - Checking that the framebuffers, scaling and placement
+	 *    requirements and so on are within the limits of the hardware.
+	 *
+	 *  - The driver does not need to repeat basic input validation like
+	 *    done in the exynos_drm_pp_task_check() function. The core does
+	 *    that before calling this hook.
+	 *
+	 * RETURNS:
+	 *
+	 * 0 on success or one of the below negative error codes:
+	 *
+	 *  - -EINVAL, if any of the above constraints are violated.
+	 */
+	int (*check)(struct exynos_drm_pp *pp,
+		     struct exynos_drm_pp_task *task);
+
+	/**
+	 * @commit:
+	 *
+	 * This is the main entry point to start framebuffer processing
+	 * in the hardware. The exynos_drm_pp_task has been already validated.
+	 * This function must not wait until the device finishes processing.
+	 * When the driver finishes processing, it has to call
+	 * exynos_drm_pp_task_done() function.
+	 *
+	 * RETURNS:
+	 *
+	 * 0 on success or negative error codes in case of failure.
+	 */
+	int (*commit)(struct exynos_drm_pp *pp,
+		      struct exynos_drm_pp_task *task);
+
+	/**
+	 * @abort:
+	 *
+	 * Informs the driver that it has to abort the currently running
+	 * task as soon as possible (i.e. as soon as it can stop the device
+	 * safely), even if the task would not have been finished by then.
+	 * After the driver performs the necessary steps, it has to call
+	 * exynos_drm_pp_task_done() (as if the task ended normally).
+	 * This function does not have to (and will usually not) wait
+	 * until the device enters a state when it can be stopped.
+	 */
+	void (*abort)(struct exynos_drm_pp *pp,
+		      struct exynos_drm_pp_task *task);
+};
+
+/**
+ * struct exynos_drm_pp - central picture processor module structure
+ */
+struct exynos_drm_pp {
+	struct drm_device *dev;
+	struct list_head head;
+
+	char *name;
+	struct drm_mode_object base;
+	const struct exynos_drm_pp_funcs *funcs;
+	unsigned int capabilities;
+	atomic_t sequence;
+
+	spinlock_t lock;
+	struct exynos_drm_pp_task *task;
+	struct list_head todo_list;
+	wait_queue_head_t done_wq;
+
+	uint32_t *src_format_types;
+	unsigned int src_format_count;
+	uint32_t *dst_format_types;
+	unsigned int dst_format_count;
+
+	struct drm_object_properties properties;
+
+	struct drm_property *rotation_property;
+};
+
+/**
+ * struct exynos_drm_pp_task - a structure describing transformation that
+ * has to be performed by the picture processor hardware module
+ */
+struct exynos_drm_pp_task {
+	struct drm_device *dev;
+	struct exynos_drm_pp *pp;
+	struct list_head head;
+
+	struct drm_framebuffer *src_fb;
+
+	/* Source values are 16.16 fixed point */
+	uint32_t src_x, src_y;
+	uint32_t src_h, src_w;
+
+	struct drm_framebuffer *dst_fb;
+
+	/* Destination values are 16.16 fixed point */
+	uint32_t dst_x, dst_y;
+	uint32_t dst_h, dst_w;
+
+	unsigned int rotation;
+
+	struct work_struct cleanup_work;
+	unsigned int flags;
+	int ret;
+
+	struct drm_pending_exynos_pp_event *event;
+};
+
+#define DRM_EXYNOS_PP_TASK_DONE		(1 << 0)
+#define DRM_EXYNOS_PP_TASK_ASYNC	(1 << 1)
+
+int exynos_drm_pp_init(struct drm_device *dev);
+
+int exynos_drm_pp_register(struct drm_device *dev, struct exynos_drm_pp *pp,
+		    const struct exynos_drm_pp_funcs *funcs, unsigned int caps,
+		    const uint32_t *src_fmts, unsigned int src_fmt_count,
+		    const uint32_t *dst_fmts, unsigned int dst_fmt_count,
+		    unsigned int rotation, const char *name, ...);
+void exynos_drm_pp_unregister(struct drm_device *dev, struct exynos_drm_pp *pp);
+
+void exynos_drm_pp_task_done(struct exynos_drm_pp_task *task, int ret);
+
+int exynos_drm_pp_get_res(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+int exynos_drm_pp_get(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
+int exynos_drm_pp_commit(struct drm_device *dev,
+			 void *data, struct drm_file *file_priv);
+
+#endif
diff --git a/include/uapi/drm/exynos_drm.h b/include/uapi/drm/exynos_drm.h
index cb3e9f9d029f..5a13287856f3 100644
--- a/include/uapi/drm/exynos_drm.h
+++ b/include/uapi/drm/exynos_drm.h
@@ -300,6 +300,46 @@ struct drm_exynos_ipp_cmd_ctrl {
 	__u32	ctrl;
 };
 
+struct drm_exynos_pp_get_res {
+	__u64 pp_id_ptr;
+	__u32 count_pps;
+};
+
+struct drm_exynos_pp_get {
+	__u32 pp_id;
+	__u32 capabilities;
+
+	__u32 src_format_count;
+	__u32 dst_format_count;
+	__u64 src_format_type_ptr;
+	__u64 dst_format_type_ptr;
+};
+
+#define DRM_EXYNOS_OBJECT_PP 0x88888888
+
+#define DRM_EXYNOS_PP_CAP_CROP		0x01
+#define DRM_EXYNOS_PP_CAP_ROTATE	0x02
+#define DRM_EXYNOS_PP_CAP_SCALE		0x04
+#define DRM_EXYNOS_PP_CAP_CONVERT	0x08
+#define DRM_EXYNOS_PP_CAP_FB_MODIFIERS	0x1000
+
+#define DRM_EXYNOS_PP_FLAG_EVENT	0x01
+#define DRM_EXYNOS_PP_FLAG_TEST_ONLY	0x02
+#define DRM_EXYNOS_PP_FLAG_NONBLOCK	0x04
+
+#define DRM_EXYNOS_PP_FLAGS (DRM_EXYNOS_PP_FLAG_EVENT |\
+		DRM_EXYNOS_PP_FLAG_TEST_ONLY | DRM_EXYNOS_PP_FLAG_NONBLOCK)
+
+struct drm_exynos_pp_commit {
+	__u32 pp_id;
+	__u32 flags;
+	__u32 count_props;
+	__u64 props_ptr;
+	__u64 prop_values_ptr;
+	__u64 reserved;
+	__u64 user_data;
+};
+
 #define DRM_EXYNOS_GEM_CREATE		0x00
 #define DRM_EXYNOS_GEM_MAP		0x01
 /* Reserved 0x03 ~ 0x05 for exynos specific gem ioctl */
@@ -317,6 +357,10 @@ struct drm_exynos_ipp_cmd_ctrl {
 #define DRM_EXYNOS_IPP_QUEUE_BUF	0x32
 #define DRM_EXYNOS_IPP_CMD_CTRL	0x33
 
+#define DRM_EXYNOS_PP_GET_RESOURCES	0x40
+#define DRM_EXYNOS_PP_GET		0x41
+#define DRM_EXYNOS_PP_COMMIT		0x42
+
 #define DRM_IOCTL_EXYNOS_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_GEM_CREATE, struct drm_exynos_gem_create)
 #define DRM_IOCTL_EXYNOS_GEM_MAP		DRM_IOWR(DRM_COMMAND_BASE + \
@@ -343,9 +387,17 @@ struct drm_exynos_ipp_cmd_ctrl {
 #define DRM_IOCTL_EXYNOS_IPP_CMD_CTRL		DRM_IOWR(DRM_COMMAND_BASE + \
 		DRM_EXYNOS_IPP_CMD_CTRL, struct drm_exynos_ipp_cmd_ctrl)
 
+#define DRM_IOCTL_EXYNOS_PP_GET_RESOURCES	DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_PP_GET_RESOURCES, struct drm_exynos_pp_get_res)
+#define DRM_IOCTL_EXYNOS_PP_GET			DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_PP_GET, struct drm_exynos_pp_get)
+#define DRM_IOCTL_EXYNOS_PP_COMMIT		DRM_IOWR(DRM_COMMAND_BASE + \
+		DRM_EXYNOS_PP_COMMIT, struct drm_exynos_pp_commit)
+
 /* EXYNOS specific events */
 #define DRM_EXYNOS_G2D_EVENT		0x80000000
 #define DRM_EXYNOS_IPP_EVENT		0x80000001
+#define DRM_EXYNOS_PP_EVENT		0x80000002
 
 struct drm_exynos_g2d_event {
 	struct drm_event	base;
@@ -366,6 +418,16 @@ struct drm_exynos_ipp_event {
 	__u32			buf_id[EXYNOS_DRM_OPS_MAX];
 };
 
+struct drm_exynos_pp_event {
+	struct drm_event	base;
+	__u64			user_data;
+	__u32			tv_sec;
+	__u32			tv_usec;
+	__u32			pp_id;
+	__u32			sequence;
+	__u64			reserved;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
1.9.1
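
For readers following the validation done in exynos_drm_pp_task_check(),
a minimal standalone sketch of the 16.16 fixed-point bounds test and of
the "no scaling" test may help. This is only an illustration under stated
assumptions: all model_* names and MODEL_* constants are hypothetical
stand-ins and not part of the patch, the sums are widened to 64 bits
(which the patch does not do), and a 90 or 270 degree rotation is taken
to swap the expected destination width and height.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DRM rotation bits used by the patch. */
#define MODEL_ROTATE_0		(1u << 0)
#define MODEL_ROTATE_90		(1u << 1)
#define MODEL_ROTATE_180	(1u << 2)
#define MODEL_ROTATE_270	(1u << 3)

/* Rectangle with all members in 16.16 fixed point. */
struct model_rect {
	uint32_t x, y, w, h;
};

/* Convert whole pixels to 16.16 fixed point (valid for values < 65536). */
static inline uint32_t model_to_fixed(uint32_t pixels)
{
	return pixels << 16;
}

/*
 * True if the rectangle fits inside a framebuffer of fb_w x fb_h pixels.
 * The sums are done in 64 bits so that x + w cannot wrap around.
 */
static inline bool model_rect_in_fb(const struct model_rect *r,
				    uint32_t fb_w, uint32_t fb_h)
{
	return (uint64_t)r->x + r->w <= ((uint64_t)fb_w << 16) &&
	       (uint64_t)r->y + r->h <= ((uint64_t)fb_h << 16);
}

/*
 * True if going from src to dst changes the image size, taking into
 * account that a 90 or 270 degree rotation swaps width and height.
 */
static inline bool model_needs_scaling(const struct model_rect *src,
				       const struct model_rect *dst,
				       unsigned int rotation)
{
	if (rotation & (MODEL_ROTATE_90 | MODEL_ROTATE_270))
		return src->w != dst->h || src->h != dst->w;

	return src->w != dst->w || src->h != dst->h;
}
```

With these helpers, a 640x480 source rotated by 90 degrees into a 480x640
destination does not count as scaling, while the same pair of rectangles
without rotation does.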


* [RFC 4/4] drm/exynos: Convert Exynos Rotator driver to Picture Processor interface
       [not found]   ` <CGME20170420091408eucas1p2ef5b57fdcafcf13fbc52763f7cb43d45@eucas1p2.samsung.com>
@ 2017-04-20  9:13     ` Marek Szyprowski
  0 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-20  9:13 UTC (permalink / raw)
  To: dri-devel, linux-samsung-soc
  Cc: Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Marek Szyprowski

Convert the Exynos Rotator driver from the Exynos IPP API to the Exynos
DRM Picture Processor API.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
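A rough sketch of the two conversions the rotator commit path performs:
dropping the fractional part of the 16.16 task coordinates, and turning
the DRM rotation bitmask into a control word. It is a standalone model,
not driver code. The MODEL_ROTATE_* and MODEL_REFLECT_* bits are assumed
to mirror the DRM rotation bits, and the MODEL_CTL_* values are made up
for illustration (the real ones live in regs-rotator.h).

```c
#include <stdint.h>

/* Assumed to mirror the DRM rotation/reflection bits. */
#define MODEL_ROTATE_0		(1u << 0)
#define MODEL_ROTATE_90		(1u << 1)
#define MODEL_ROTATE_180	(1u << 2)
#define MODEL_ROTATE_270	(1u << 3)
#define MODEL_REFLECT_X		(1u << 4)
#define MODEL_REFLECT_Y		(1u << 5)

/* Hypothetical control-register encoding, for illustration only. */
#define MODEL_CTL_FLIP_V	(1u << 0)
#define MODEL_CTL_FLIP_H	(1u << 1)
#define MODEL_CTL_ROT_90	(1u << 4)
#define MODEL_CTL_ROT_180	(2u << 4)
#define MODEL_CTL_ROT_270	(3u << 4)

/* Position and size in whole pixels. */
struct model_pos {
	uint32_t x, y, w, h;
};

/* Truncate 16.16 fixed-point task values to whole pixels. */
static inline struct model_pos model_fixed_to_pos(uint32_t x, uint32_t y,
						  uint32_t w, uint32_t h)
{
	struct model_pos pos = { x >> 16, y >> 16, w >> 16, h >> 16 };

	return pos;
}

/* Map the rotation bitmask to the (hypothetical) control word. */
static inline uint32_t model_rotation_to_ctl(unsigned int rotation)
{
	uint32_t ctl = 0;

	if (rotation & MODEL_REFLECT_Y)
		ctl |= MODEL_CTL_FLIP_V;
	if (rotation & MODEL_REFLECT_X)
		ctl |= MODEL_CTL_FLIP_H;

	/* Only one rotation angle is honoured; 0 degrees sets no bits. */
	if (rotation & MODEL_ROTATE_90)
		ctl |= MODEL_CTL_ROT_90;
	else if (rotation & MODEL_ROTATE_180)
		ctl |= MODEL_CTL_ROT_180;
	else if (rotation & MODEL_ROTATE_270)
		ctl |= MODEL_CTL_ROT_270;

	return ctl;
}
```

Note that the truncation silently discards sub-pixel offsets, so a task
with fractional coordinates is programmed as if they were rounded down.
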
 drivers/gpu/drm/exynos/Kconfig              |   1 -
 drivers/gpu/drm/exynos/exynos_drm_drv.c     |   1 +
 drivers/gpu/drm/exynos/exynos_drm_rotator.c | 513 +++++++---------------------
 drivers/gpu/drm/exynos/exynos_drm_rotator.h |  19 --
 4 files changed, 127 insertions(+), 407 deletions(-)
 delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 1d185347c64c..84c8cc2aa28d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -106,7 +106,6 @@ config DRM_EXYNOS_FIMC
 
 config DRM_EXYNOS_ROTATOR
 	bool "Rotator"
-	depends on DRM_EXYNOS_IPP
 	help
 	  Choose this option if you want to use Exynos Rotator for DRM.
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index 41942b111285..60b634726d09 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -282,6 +282,7 @@ struct exynos_drm_driver_info {
 		DRV_PTR(fimc_driver, CONFIG_DRM_EXYNOS_FIMC),
 	}, {
 		DRV_PTR(rotator_driver, CONFIG_DRM_EXYNOS_ROTATOR),
+		DRM_COMPONENT_DRIVER
 	}, {
 		DRV_PTR(gsc_driver, CONFIG_DRM_EXYNOS_GSC),
 	}, {
diff --git a/drivers/gpu/drm/exynos/exynos_drm_rotator.c b/drivers/gpu/drm/exynos/exynos_drm_rotator.c
index 79282a820ecc..d5308a23d148 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_rotator.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_rotator.c
@@ -10,6 +10,7 @@
  */
 
 #include <linux/kernel.h>
+#include <linux/component.h>
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
@@ -21,29 +22,18 @@
 #include <drm/drmP.h>
 #include <drm/exynos_drm.h>
 #include "regs-rotator.h"
+#include "exynos_drm_fb.h"
 #include "exynos_drm_drv.h"
-#include "exynos_drm_ipp.h"
+#include "exynos_drm_iommu.h"
+#include "exynos_drm_pp.h"
 
 /*
  * Rotator supports image crop/rotator and input/output DMA operations.
  * input DMA reads image data from the memory.
  * output DMA writes image data to memory.
- *
- * M2M operation : supports crop/scale/rotation/csc so on.
- * Memory ----> Rotator H/W ----> Memory.
- */
-
-/*
- * TODO
- * 1. check suspend/resume api if needed.
- * 2. need to check use case platform_device_id.
- * 3. check src/dst size with, height.
- * 4. need to add supported list in prop_list.
  */
 
 #define get_rot_context(dev)	platform_get_drvdata(to_platform_device(dev))
-#define get_ctx_from_ippdrv(ippdrv)	container_of(ippdrv,\
-					struct rot_context, ippdrv);
 #define rot_read(offset)		readl(rot->regs + (offset))
 #define rot_write(cfg, offset)	writel(cfg, rot->regs + (offset))
 
@@ -83,23 +73,20 @@ struct rot_limit_table {
 /*
  * A structure of rotator context.
  * @ippdrv: prepare initialization using ippdrv.
- * @regs_res: register resources.
  * @regs: memory mapped io registers.
  * @clock: rotator gate clock.
  * @limit_tbl: limitation of rotator.
- * @irq: irq number.
- * @cur_buf_id: current operation buffer id.
  * @suspended: suspended state.
  */
 struct rot_context {
-	struct exynos_drm_ippdrv	ippdrv;
-	struct resource	*regs_res;
+	struct exynos_drm_pp pp;
+	struct drm_device *drm_dev;
+	struct device	*dev;
 	void __iomem	*regs;
 	struct clk	*clock;
 	struct rot_limit_table	*limit_tbl;
-	int	irq;
-	int	cur_buf_id[EXYNOS_DRM_OPS_MAX];
 	bool	suspended;
+	struct exynos_drm_pp_task	*task;
 };
 
 static void rotator_reg_set_irq(struct rot_context *rot, bool enable)
@@ -138,9 +125,6 @@ static enum rot_irq_status rotator_reg_get_irq_status(struct rot_context *rot)
 static irqreturn_t rotator_irq_handler(int irq, void *arg)
 {
 	struct rot_context *rot = arg;
-	struct exynos_drm_ippdrv *ippdrv = &rot->ippdrv;
-	struct drm_exynos_ipp_cmd_node *c_node = ippdrv->c_node;
-	struct drm_exynos_ipp_event_work *event_work = c_node->event_work;
 	enum rot_irq_status irq_status;
 	u32 val;
 
@@ -152,13 +136,13 @@ static irqreturn_t rotator_irq_handler(int irq, void *arg)
 	val |= ROT_STATUS_IRQ_PENDING((u32)irq_status);
 	rot_write(val, ROT_STATUS);
 
-	if (irq_status == ROT_IRQ_STATUS_COMPLETE) {
-		event_work->ippdrv = ippdrv;
-		event_work->buf_id[EXYNOS_DRM_OPS_DST] =
-			rot->cur_buf_id[EXYNOS_DRM_OPS_DST];
-		queue_work(ippdrv->event_workq, &event_work->work);
-	} else {
-		DRM_ERROR("the SFR is set illegally\n");
+	if (rot->task) {
+		struct exynos_drm_pp_task *task = rot->task;
+
+		rot->task = NULL;
+		pm_runtime_put(rot->dev);
+		exynos_drm_pp_task_done(task,
+			irq_status == ROT_IRQ_STATUS_COMPLETE ? 0 : -EINVAL);
 	}
 
 	return IRQ_HANDLED;
@@ -214,9 +198,6 @@ static int rotator_src_set_fmt(struct device *dev, u32 fmt)
 	case DRM_FORMAT_XRGB8888:
 		val |= ROT_CONTROL_FMT_RGB888;
 		break;
-	default:
-		DRM_ERROR("invalid image format\n");
-		return -EINVAL;
 	}
 
 	rot_write(val, ROT_CONTROL);
@@ -224,33 +205,18 @@ static int rotator_src_set_fmt(struct device *dev, u32 fmt)
 	return 0;
 }
 
-static inline bool rotator_check_reg_fmt(u32 fmt)
-{
-	if ((fmt == ROT_CONTROL_FMT_YCBCR420_2P) ||
-	    (fmt == ROT_CONTROL_FMT_RGB888))
-		return true;
-
-	return false;
-}
-
-static int rotator_src_set_size(struct device *dev, int swap,
-		struct drm_exynos_pos *pos,
-		struct drm_exynos_sz *sz)
+static int rotator_src_set_buf(struct device *dev, struct drm_exynos_pos *pos,
+			       struct drm_framebuffer *fb)
 {
 	struct rot_context *rot = dev_get_drvdata(dev);
 	u32 fmt, hsize, vsize;
 	u32 val;
 
-	/* Get format */
 	fmt = rotator_reg_get_fmt(rot);
-	if (!rotator_check_reg_fmt(fmt)) {
-		DRM_ERROR("invalid format.\n");
-		return -EINVAL;
-	}
 
 	/* Align buffer size */
-	hsize = sz->hsize;
-	vsize = sz->vsize;
+	hsize = fb->width;
+	vsize = fb->height;
 	rotator_align_size(rot, fmt, &hsize, &vsize);
 
 	/* Set buffer size configuration */
@@ -263,131 +229,54 @@ static int rotator_src_set_size(struct device *dev, int swap,
 	val = ROT_SRC_CROP_SIZE_H(pos->h) | ROT_SRC_CROP_SIZE_W(pos->w);
 	rot_write(val, ROT_SRC_CROP_SIZE);
 
-	return 0;
-}
-
-static int rotator_src_set_addr(struct device *dev,
-		struct drm_exynos_ipp_buf_info *buf_info,
-		u32 buf_id, enum drm_exynos_ipp_buf_type buf_type)
-{
-	struct rot_context *rot = dev_get_drvdata(dev);
-	dma_addr_t addr[EXYNOS_DRM_PLANAR_MAX];
-	u32 val, fmt, hsize, vsize;
-	int i;
-
-	/* Set current buf_id */
-	rot->cur_buf_id[EXYNOS_DRM_OPS_SRC] = buf_id;
-
-	switch (buf_type) {
-	case IPP_BUF_ENQUEUE:
-		/* Set address configuration */
-		for_each_ipp_planar(i)
-			addr[i] = buf_info->base[i];
-
-		/* Get format */
-		fmt = rotator_reg_get_fmt(rot);
-		if (!rotator_check_reg_fmt(fmt)) {
-			DRM_ERROR("invalid format.\n");
-			return -EINVAL;
-		}
-
-		/* Re-set cb planar for NV12 format */
-		if ((fmt == ROT_CONTROL_FMT_YCBCR420_2P) &&
-		    !addr[EXYNOS_DRM_PLANAR_CB]) {
-
-			val = rot_read(ROT_SRC_BUF_SIZE);
-			hsize = ROT_GET_BUF_SIZE_W(val);
-			vsize = ROT_GET_BUF_SIZE_H(val);
-
-			/* Set cb planar */
-			addr[EXYNOS_DRM_PLANAR_CB] =
-				addr[EXYNOS_DRM_PLANAR_Y] + hsize * vsize;
-		}
-
-		for_each_ipp_planar(i)
-			rot_write(addr[i], ROT_SRC_BUF_ADDR(i));
-		break;
-	case IPP_BUF_DEQUEUE:
-		for_each_ipp_planar(i)
-			rot_write(0x0, ROT_SRC_BUF_ADDR(i));
-		break;
-	default:
-		/* Nothing to do */
-		break;
-	}
+	/* Set buffer DMA address */
+	rot_write(exynos_drm_fb_dma_addr(fb, 0), ROT_SRC_BUF_ADDR(0));
+	rot_write(exynos_drm_fb_dma_addr(fb, 1), ROT_SRC_BUF_ADDR(1));
 
 	return 0;
 }
 
-static int rotator_dst_set_transf(struct device *dev,
-		enum drm_exynos_degree degree,
-		enum drm_exynos_flip flip, bool *swap)
+static int rotator_dst_set_transf(struct device *dev, unsigned int rotation)
 {
 	struct rot_context *rot = dev_get_drvdata(dev);
 	u32 val;
 
 	/* Set transform configuration */
 	val = rot_read(ROT_CONTROL);
+
 	val &= ~ROT_CONTROL_FLIP_MASK;
 
-	switch (flip) {
-	case EXYNOS_DRM_FLIP_VERTICAL:
+	if (rotation & DRM_REFLECT_Y)
 		val |= ROT_CONTROL_FLIP_VERTICAL;
-		break;
-	case EXYNOS_DRM_FLIP_HORIZONTAL:
+	if (rotation & DRM_REFLECT_X)
 		val |= ROT_CONTROL_FLIP_HORIZONTAL;
-		break;
-	default:
-		/* Flip None */
-		break;
-	}
 
 	val &= ~ROT_CONTROL_ROT_MASK;
 
-	switch (degree) {
-	case EXYNOS_DRM_DEGREE_90:
+	if (rotation & DRM_ROTATE_90)
 		val |= ROT_CONTROL_ROT_90;
-		break;
-	case EXYNOS_DRM_DEGREE_180:
+	else if (rotation & DRM_ROTATE_180)
 		val |= ROT_CONTROL_ROT_180;
-		break;
-	case EXYNOS_DRM_DEGREE_270:
+	else if (rotation & DRM_ROTATE_270)
 		val |= ROT_CONTROL_ROT_270;
-		break;
-	default:
-		/* Rotation 0 Degree */
-		break;
-	}
 
 	rot_write(val, ROT_CONTROL);
 
-	/* Check degree for setting buffer size swap */
-	if ((degree == EXYNOS_DRM_DEGREE_90) ||
-	    (degree == EXYNOS_DRM_DEGREE_270))
-		*swap = true;
-	else
-		*swap = false;
-
 	return 0;
 }
 
-static int rotator_dst_set_size(struct device *dev, int swap,
-		struct drm_exynos_pos *pos,
-		struct drm_exynos_sz *sz)
+static int rotator_dst_set_buf(struct device *dev, struct drm_exynos_pos *pos,
+			       struct drm_framebuffer *fb)
 {
 	struct rot_context *rot = dev_get_drvdata(dev);
-	u32 val, fmt, hsize, vsize;
+	u32 fmt, hsize, vsize;
+	u32 val;
 
-	/* Get format */
 	fmt = rotator_reg_get_fmt(rot);
-	if (!rotator_check_reg_fmt(fmt)) {
-		DRM_ERROR("invalid format.\n");
-		return -EINVAL;
-	}
 
 	/* Align buffer size */
-	hsize = sz->hsize;
-	vsize = sz->vsize;
+	hsize = fb->width;
+	vsize = fb->height;
 	rotator_align_size(rot, fmt, &hsize, &vsize);
 
 	/* Set buffer size configuration */
@@ -398,227 +287,23 @@ static int rotator_dst_set_size(struct device *dev, int swap,
 	val = ROT_CROP_POS_Y(pos->y) | ROT_CROP_POS_X(pos->x);
 	rot_write(val, ROT_DST_CROP_POS);
 
-	return 0;
-}
-
-static int rotator_dst_set_addr(struct device *dev,
-		struct drm_exynos_ipp_buf_info *buf_info,
-		u32 buf_id, enum drm_exynos_ipp_buf_type buf_type)
-{
-	struct rot_context *rot = dev_get_drvdata(dev);
-	dma_addr_t addr[EXYNOS_DRM_PLANAR_MAX];
-	u32 val, fmt, hsize, vsize;
-	int i;
-
-	/* Set current buf_id */
-	rot->cur_buf_id[EXYNOS_DRM_OPS_DST] = buf_id;
-
-	switch (buf_type) {
-	case IPP_BUF_ENQUEUE:
-		/* Set address configuration */
-		for_each_ipp_planar(i)
-			addr[i] = buf_info->base[i];
-
-		/* Get format */
-		fmt = rotator_reg_get_fmt(rot);
-		if (!rotator_check_reg_fmt(fmt)) {
-			DRM_ERROR("invalid format.\n");
-			return -EINVAL;
-		}
-
-		/* Re-set cb planar for NV12 format */
-		if ((fmt == ROT_CONTROL_FMT_YCBCR420_2P) &&
-		    !addr[EXYNOS_DRM_PLANAR_CB]) {
-			/* Get buf size */
-			val = rot_read(ROT_DST_BUF_SIZE);
-
-			hsize = ROT_GET_BUF_SIZE_W(val);
-			vsize = ROT_GET_BUF_SIZE_H(val);
-
-			/* Set cb planar */
-			addr[EXYNOS_DRM_PLANAR_CB] =
-				addr[EXYNOS_DRM_PLANAR_Y] + hsize * vsize;
-		}
-
-		for_each_ipp_planar(i)
-			rot_write(addr[i], ROT_DST_BUF_ADDR(i));
-		break;
-	case IPP_BUF_DEQUEUE:
-		for_each_ipp_planar(i)
-			rot_write(0x0, ROT_DST_BUF_ADDR(i));
-		break;
-	default:
-		/* Nothing to do */
-		break;
-	}
-
-	return 0;
-}
-
-static struct exynos_drm_ipp_ops rot_src_ops = {
-	.set_fmt	=	rotator_src_set_fmt,
-	.set_size	=	rotator_src_set_size,
-	.set_addr	=	rotator_src_set_addr,
-};
-
-static struct exynos_drm_ipp_ops rot_dst_ops = {
-	.set_transf	=	rotator_dst_set_transf,
-	.set_size	=	rotator_dst_set_size,
-	.set_addr	=	rotator_dst_set_addr,
-};
-
-static int rotator_init_prop_list(struct exynos_drm_ippdrv *ippdrv)
-{
-	struct drm_exynos_ipp_prop_list *prop_list = &ippdrv->prop_list;
-
-	prop_list->version = 1;
-	prop_list->flip = (1 << EXYNOS_DRM_FLIP_VERTICAL) |
-				(1 << EXYNOS_DRM_FLIP_HORIZONTAL);
-	prop_list->degree = (1 << EXYNOS_DRM_DEGREE_0) |
-				(1 << EXYNOS_DRM_DEGREE_90) |
-				(1 << EXYNOS_DRM_DEGREE_180) |
-				(1 << EXYNOS_DRM_DEGREE_270);
-	prop_list->csc = 0;
-	prop_list->crop = 0;
-	prop_list->scale = 0;
+	/* Set buffer DMA address */
+	rot_write(exynos_drm_fb_dma_addr(fb, 0), ROT_DST_BUF_ADDR(0));
+	rot_write(exynos_drm_fb_dma_addr(fb, 1), ROT_DST_BUF_ADDR(1));
 
 	return 0;
 }
 
-static inline bool rotator_check_drm_fmt(u32 fmt)
-{
-	switch (fmt) {
-	case DRM_FORMAT_XRGB8888:
-	case DRM_FORMAT_NV12:
-		return true;
-	default:
-		DRM_DEBUG_KMS("not support format\n");
-		return false;
-	}
-}
-
-static inline bool rotator_check_drm_flip(enum drm_exynos_flip flip)
-{
-	switch (flip) {
-	case EXYNOS_DRM_FLIP_NONE:
-	case EXYNOS_DRM_FLIP_VERTICAL:
-	case EXYNOS_DRM_FLIP_HORIZONTAL:
-	case EXYNOS_DRM_FLIP_BOTH:
-		return true;
-	default:
-		DRM_DEBUG_KMS("invalid flip\n");
-		return false;
-	}
-}
-
-static int rotator_ippdrv_check_property(struct device *dev,
-		struct drm_exynos_ipp_property *property)
-{
-	struct drm_exynos_ipp_config *src_config =
-					&property->config[EXYNOS_DRM_OPS_SRC];
-	struct drm_exynos_ipp_config *dst_config =
-					&property->config[EXYNOS_DRM_OPS_DST];
-	struct drm_exynos_pos *src_pos = &src_config->pos;
-	struct drm_exynos_pos *dst_pos = &dst_config->pos;
-	struct drm_exynos_sz *src_sz = &src_config->sz;
-	struct drm_exynos_sz *dst_sz = &dst_config->sz;
-	bool swap = false;
-
-	/* Check format configuration */
-	if (src_config->fmt != dst_config->fmt) {
-		DRM_DEBUG_KMS("not support csc feature\n");
-		return -EINVAL;
-	}
-
-	if (!rotator_check_drm_fmt(dst_config->fmt)) {
-		DRM_DEBUG_KMS("invalid format\n");
-		return -EINVAL;
-	}
-
-	/* Check transform configuration */
-	if (src_config->degree != EXYNOS_DRM_DEGREE_0) {
-		DRM_DEBUG_KMS("not support source-side rotation\n");
-		return -EINVAL;
-	}
-
-	switch (dst_config->degree) {
-	case EXYNOS_DRM_DEGREE_90:
-	case EXYNOS_DRM_DEGREE_270:
-		swap = true;
-	case EXYNOS_DRM_DEGREE_0:
-	case EXYNOS_DRM_DEGREE_180:
-		/* No problem */
-		break;
-	default:
-		DRM_DEBUG_KMS("invalid degree\n");
-		return -EINVAL;
-	}
-
-	if (src_config->flip != EXYNOS_DRM_FLIP_NONE) {
-		DRM_DEBUG_KMS("not support source-side flip\n");
-		return -EINVAL;
-	}
-
-	if (!rotator_check_drm_flip(dst_config->flip)) {
-		DRM_DEBUG_KMS("invalid flip\n");
-		return -EINVAL;
-	}
-
-	/* Check size configuration */
-	if ((src_pos->x + src_pos->w > src_sz->hsize) ||
-		(src_pos->y + src_pos->h > src_sz->vsize)) {
-		DRM_DEBUG_KMS("out of source buffer bound\n");
-		return -EINVAL;
-	}
-
-	if (swap) {
-		if ((dst_pos->x + dst_pos->h > dst_sz->vsize) ||
-			(dst_pos->y + dst_pos->w > dst_sz->hsize)) {
-			DRM_DEBUG_KMS("out of destination buffer bound\n");
-			return -EINVAL;
-		}
-
-		if ((src_pos->w != dst_pos->h) || (src_pos->h != dst_pos->w)) {
-			DRM_DEBUG_KMS("not support scale feature\n");
-			return -EINVAL;
-		}
-	} else {
-		if ((dst_pos->x + dst_pos->w > dst_sz->hsize) ||
-			(dst_pos->y + dst_pos->h > dst_sz->vsize)) {
-			DRM_DEBUG_KMS("out of destination buffer bound\n");
-			return -EINVAL;
-		}
-
-		if ((src_pos->w != dst_pos->w) || (src_pos->h != dst_pos->h)) {
-			DRM_DEBUG_KMS("not support scale feature\n");
-			return -EINVAL;
-		}
-	}
-
-	return 0;
-}
-
-static int rotator_ippdrv_start(struct device *dev, enum drm_exynos_ipp_cmd cmd)
+static int rotator_start(struct device *dev)
 {
 	struct rot_context *rot = dev_get_drvdata(dev);
 	u32 val;
 
-	if (rot->suspended) {
-		DRM_ERROR("suspended state\n");
-		return -EPERM;
-	}
-
-	if (cmd != IPP_CMD_M2M) {
-		DRM_ERROR("not support cmd: %d\n", cmd);
-		return -EINVAL;
-	}
-
 	/* Set interrupt enable */
 	rotator_reg_set_irq(rot, true);
 
 	val = rot_read(ROT_CONTROL);
 	val |= ROT_CONTROL_START;
-
 	rot_write(val, ROT_CONTROL);
 
 	return 0;
@@ -692,11 +377,86 @@ static int rotator_ippdrv_start(struct device *dev, enum drm_exynos_ipp_cmd cmd)
 };
 MODULE_DEVICE_TABLE(of, exynos_rotator_match);
 
+static int rotator_commit(struct exynos_drm_pp *pp,
+			  struct exynos_drm_pp_task *task)
+{
+	struct rot_context *rot =
+			container_of(pp, struct rot_context, pp);
+	struct device *dev = rot->dev;
+	struct drm_exynos_pos src_pos = {
+		task->src_x >> 16, task->src_y >> 16,
+		task->src_w >> 16, task->src_h >> 16,
+	};
+	struct drm_exynos_pos dst_pos = {
+		task->dst_x >> 16, task->dst_y >> 16,
+		task->dst_w >> 16, task->dst_h >> 16,
+	};
+
+	pm_runtime_get_sync(dev);
+	rot->task = task;
+
+	rotator_src_set_fmt(dev, task->src_fb->format->format);
+	rotator_src_set_buf(dev, &src_pos, task->src_fb);
+	rotator_dst_set_transf(dev, task->rotation);
+	rotator_dst_set_buf(dev, &dst_pos, task->dst_fb);
+	rotator_start(dev);
+
+	return 0;
+}
+
+struct exynos_drm_pp_funcs pp_funcs = {
+	.commit = rotator_commit,
+};
+
+static const uint32_t rotator_formats[] = {
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_NV12,
+};
+
+static int rotator_bind(struct device *dev, struct device *master, void *data)
+{
+	struct rot_context *rot = dev_get_drvdata(dev);
+	struct drm_device *drm_dev = data;
+	struct exynos_drm_pp *pp = &rot->pp;
+
+	rot->drm_dev = drm_dev;
+	drm_iommu_attach_device(drm_dev, dev);
+
+	exynos_drm_pp_register(drm_dev, pp, &pp_funcs,
+			   DRM_EXYNOS_PP_CAP_CROP | DRM_EXYNOS_PP_CAP_ROTATE,
+			   rotator_formats, ARRAY_SIZE(rotator_formats),
+			   rotator_formats, ARRAY_SIZE(rotator_formats),
+			   DRM_ROTATE_0 | DRM_ROTATE_90 | DRM_ROTATE_180 |
+			   DRM_ROTATE_270 | DRM_REFLECT_X | DRM_REFLECT_Y,
+			   "rotator");
+
+	dev_info(dev, "The exynos rotator has been probed successfully\n");
+
+	return 0;
+}
+
+static void rotator_unbind(struct device *dev, struct device *master,
+			void *data)
+{
+	struct rot_context *rot = dev_get_drvdata(dev);
+	struct drm_device *drm_dev = data;
+	struct exynos_drm_pp *pp = &rot->pp;
+
+	exynos_drm_pp_unregister(drm_dev, pp);
+	drm_iommu_detach_device(rot->drm_dev, rot->dev);
+}
+
+static const struct component_ops rotator_component_ops = {
+	.bind	= rotator_bind,
+	.unbind = rotator_unbind,
+};
+
 static int rotator_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
+	struct resource	*regs_res;
 	struct rot_context *rot;
-	struct exynos_drm_ippdrv *ippdrv;
+	int irq;
 	int ret;
 
 	if (!dev->of_node) {
@@ -710,19 +470,20 @@ static int rotator_probe(struct platform_device *pdev)
 
 	rot->limit_tbl = (struct rot_limit_table *)
 				of_device_get_match_data(dev);
-	rot->regs_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	rot->regs = devm_ioremap_resource(dev, rot->regs_res);
+	rot->dev = dev;
+	regs_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	rot->regs = devm_ioremap_resource(dev, regs_res);
 	if (IS_ERR(rot->regs))
 		return PTR_ERR(rot->regs);
 
-	rot->irq = platform_get_irq(pdev, 0);
-	if (rot->irq < 0) {
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
 		dev_err(dev, "failed to get irq\n");
-		return rot->irq;
+		return irq;
 	}
 
-	ret = devm_request_threaded_irq(dev, rot->irq, NULL,
-			rotator_irq_handler, IRQF_ONESHOT, "drm_rotator", rot);
+	ret = devm_request_threaded_irq(dev, irq, NULL,	rotator_irq_handler,
+					IRQF_ONESHOT, "drm_rotator", rot);
 	if (ret < 0) {
 		dev_err(dev, "failed to request irq\n");
 		return ret;
@@ -735,30 +496,11 @@ static int rotator_probe(struct platform_device *pdev)
 	}
 
 	pm_runtime_enable(dev);
-
-	ippdrv = &rot->ippdrv;
-	ippdrv->dev = dev;
-	ippdrv->ops[EXYNOS_DRM_OPS_SRC] = &rot_src_ops;
-	ippdrv->ops[EXYNOS_DRM_OPS_DST] = &rot_dst_ops;
-	ippdrv->check_property = rotator_ippdrv_check_property;
-	ippdrv->start = rotator_ippdrv_start;
-	ret = rotator_init_prop_list(ippdrv);
-	if (ret < 0) {
-		dev_err(dev, "failed to init property list.\n");
-		goto err_ippdrv_register;
-	}
-
-	DRM_DEBUG_KMS("ippdrv[%pK]\n", ippdrv);
-
 	platform_set_drvdata(pdev, rot);
 
-	ret = exynos_drm_ippdrv_register(ippdrv);
-	if (ret < 0) {
-		dev_err(dev, "failed to register drm rotator device\n");
+	ret = component_add(dev, &rotator_component_ops);
+	if (ret)
 		goto err_ippdrv_register;
-	}
-
-	dev_info(dev, "The exynos rotator is probed successfully\n");
 
 	return 0;
 
@@ -770,11 +512,8 @@ static int rotator_probe(struct platform_device *pdev)
 static int rotator_remove(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
-	struct rot_context *rot = dev_get_drvdata(dev);
-	struct exynos_drm_ippdrv *ippdrv = &rot->ippdrv;
-
-	exynos_drm_ippdrv_unregister(ippdrv);
 
+	component_del(dev, &rotator_component_ops);
 	pm_runtime_disable(dev);
 
 	return 0;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_rotator.h b/drivers/gpu/drm/exynos/exynos_drm_rotator.h
deleted file mode 100644
index 71a0b4c0c1e8..000000000000
--- a/drivers/gpu/drm/exynos/exynos_drm_rotator.h
+++ /dev/null
@@ -1,19 +0,0 @@
-/*
- * Copyright (c) 2012 Samsung Electronics Co., Ltd.
- *
- * Authors:
- *	YoungJun Cho <yj44.cho@samsung.com>
- *	Eunchul Kim <chulspro.kim@samsung.com>
- *
- * This program is free software; you can redistribute  it and/or modify it
- * under  the terms of  the GNU General  Public License as published by the
- * Free Software Foundation;  either version 2 of the  License, or (at your
- * option) any later version.
- */
-
-#ifndef	_EXYNOS_DRM_ROTATOR_H_
-#define	_EXYNOS_DRM_ROTATOR_H_
-
-/* TODO */
-
-#endif
-- 
1.9.1
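
A note on the ">> 16" shifts in rotator_commit() above: they suggest that the
task rectangles are carried in 16.16 fixed point, the convention atomic KMS
uses for plane source coordinates. The sketch below only illustrates that
convention. The helper names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical helpers for 16.16 fixed-point coordinates; not from the patch. */

/* Whole pixels -> 16.16. Only valid for values below 65536. */
static inline uint32_t pp_fixed_from_int(uint32_t pixels)
{
	return pixels << 16;
}

/* 16.16 -> whole pixels; the fractional part is dropped (truncation). */
static inline uint32_t pp_fixed_to_int(uint32_t fixed)
{
	return fixed >> 16;
}

/* Fractional part of a 16.16 value, in units of 1/65536 pixel. */
static inline uint32_t pp_fixed_frac(uint32_t fixed)
{
	return fixed & 0xffff;
}
```

A driver that cannot handle sub-pixel positions could, under this assumption,
reject a task whose fractional part is non-zero instead of silently truncating.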

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-20  9:13 ` [RFC 0/4] Exynos DRM: add Picture Processor extension Marek Szyprowski
                     ` (3 preceding siblings ...)
       [not found]   ` <CGME20170420091408eucas1p2ef5b57fdcafcf13fbc52763f7cb43d45@eucas1p2.samsung.com>
@ 2017-04-20 10:25   ` Laurent Pinchart
  2017-04-20 11:23     ` Marek Szyprowski
  2017-04-20 19:02   ` Dave Airlie
  5 siblings, 1 reply; 34+ messages in thread
From: Laurent Pinchart @ 2017-04-20 10:25 UTC (permalink / raw)
  To: dri-devel
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim,
	Tobias Jakobi, Sakari Ailus, Marek Szyprowski

Hi Marek,

(CC'ing Sakari Ailus)

Thank you for the patches.

On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
> Dear all,
> 
> This is an updated proposal for extending EXYNOS DRM API with generic
> support for hardware modules, which can be used for processing image data
> from the one memory buffer to another. Typical memory-to-memory operations
> are: rotation, scaling, colour space conversion or mix of them. This is a
> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> processors", which has been rejected as "not really needed in the DRM
> core":
> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> 
> In this proposal I moved all the code to Exynos DRM driver, so now this
> will be specific only to Exynos DRM. I've also changed the name from
> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> with fbdev API.
> 
> Here is a bit more information what picture processors are:
> 
> Embedded SoCs are known to have a number of hardware blocks, which perform
> such operations. They can be used in paralel to the main GPU module to
> offload CPU from processing grapics or video data. One of example use of
> such modules is implementing video overlay, which usually requires color
> space conversion from NV12 (or similar) to RGB32 color space and scaling to
> target window size.
> 
> The proposed API is heavily inspired by atomic KMS approach - it is also
> based on DRM objects and their properties. A new DRM object is introduced:
> picture processor (called pp for convenience). Such objects have a set of
> standard DRM properties, which describes the operation to be performed by
> respective hardware module. In typical case those properties are a source
> fb id and rectangle (x, y, width, height) and destination fb id and
> rectangle. Optionally a rotation property can be also specified if
> supported by the given hardware. To perform an operation on image data,
> userspace provides a set of properties and their values for given fbproc
> object in a similar way as object and properties are provided for
> performing atomic page flip / mode setting.
> 
> The proposed API consists of the 3 new ioctls:
> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>   processors,
> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>   processor,
> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>   property set.
> 
> The proposed API is extensible. Drivers can attach their own, custom
> properties to add support for more advanced picture processing (for example
> blending).
> 
> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
> subsystem. IPP API is over-engineered in general, but not really extensible
> on the other side. It is also buggy, with significant design flaws - the
> biggest issue is the fact that the API covers memory-2-memory picture
> operations together with CRTC writeback and duplicating features, which
> belongs to video plane. Comparing with IPP subsystem, the PP framework is
> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
> rotator driver smaller by over 200 lines).

This seems to be the kind of hardware that is typically supported by V4L2. 
Stupid question, why DRM ?

> Open questions:
> - How to expose pp capabilities and supported formats? Currently this is
> done with a drm_exynos_pp_get structure and DRM_IOCTL_EXYNOS_PP_GET ioctl.
> However one can try to use IMMUTABLE properties for capabilities and
> src/dst format set. Rationale: recently Rob Clark proposed to create a DRM
> property with supported pixelformats and modifiers:
>   http://www.spinics.net/lists/dri-devel/msg137380.html
> - Is it okay to use DRM objects and properties API
> (DRM_IOCTL_MODE_GETPROPERTY and DRM_IOCTL_MODE_OBJ_GETPROPERTIES ioctls)
> for this purpose?
> 
> TODO:
> - convert remaining Exynos DRM IPP drivers (FIMC, GScaller)
> - remove Exynos DRM IPP subsystem
> - (optional) provide virtual V4L2 mem2mem device on top of Exynos PP
> framework
> 
> Patches were tested on Exynos 4412-based Odroid U3 board, on top of Linux
> next-20170420 kernel.
> 
> Best regards
> Marek Szyprowski
> Samsung R&D Institute Poland
> 
> 
> Changelog:
> v1:
> - moved this feature from DRM core to Exynos DRM driver
> - changed name from framebuffer processor to picture processor
> - simplified code to cover only things needed by Exynos drivers
> - implemented simple fifo task scheduler
> - cleaned up rotator driver conversion (removed IPP remainings)
> 
> 
> v0:
> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> - initial post of "[RFC 0/2] New feature: Framebuffer processors"
> - generic approach implemented in DRM core, rejected
> 
> 
> Patch summary:
> 
> Marek Szyprowski (4):
>   drm: Export functions to create custom DRM objects
>   drm: Add support for vendor specific DRM objects with custom
>     properties
>   drm/exynos: Add Picture Processor framework
>   drm/exynos: Convert Exynos Rotator driver to Picture Processor
>     interface
> 
>  drivers/gpu/drm/drm_crtc_internal.h         |   4 -
>  drivers/gpu/drm/drm_mode_object.c           |  11 +-
>  drivers/gpu/drm/drm_property.c              |   2 +-
>  drivers/gpu/drm/exynos/Kconfig              |   1 -
>  drivers/gpu/drm/exynos/Makefile             |   3 +-
>  drivers/gpu/drm/exynos/exynos_drm_drv.c     |   9 +
>  drivers/gpu/drm/exynos/exynos_drm_drv.h     |  15 +
>  drivers/gpu/drm/exynos/exynos_drm_pp.c      | 775 +++++++++++++++++++++++++
>  drivers/gpu/drm/exynos/exynos_drm_pp.h      | 155 ++++++
>  drivers/gpu/drm/exynos/exynos_drm_rotator.c | 513 +++++-------------
>  drivers/gpu/drm/exynos/exynos_drm_rotator.h |  19 -
>  include/drm/drm_mode_object.h               |   6 +
>  include/drm/drm_property.h                  |   7 +
>  include/uapi/drm/drm_mode.h                 |   1 +
>  include/uapi/drm/exynos_drm.h               |  62 +++
>  15 files changed, 1166 insertions(+), 417 deletions(-)
>  create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.c
>  create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.h
>  delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-20 10:25   ` [RFC 0/4] Exynos DRM: add Picture Processor extension Laurent Pinchart
@ 2017-04-20 11:23     ` Marek Szyprowski
  2017-04-20 12:17       ` Tobias Jakobi
  2017-04-25 22:21       ` Sakari Ailus
  0 siblings, 2 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-20 11:23 UTC (permalink / raw)
  To: Laurent Pinchart, dri-devel
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim,
	Tobias Jakobi, Sakari Ailus

Hi Laurent,

On 2017-04-20 12:25, Laurent Pinchart wrote:
> Hi Marek,
>
> (CC'ing Sakari Ailus)
>
> Thank you for the patches.
>
> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>> Dear all,
>>
>> This is an updated proposal for extending EXYNOS DRM API with generic
>> support for hardware modules, which can be used for processing image data
>> from the one memory buffer to another. Typical memory-to-memory operations
>> are: rotation, scaling, colour space conversion or mix of them. This is a
>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>> processors", which has been rejected as "not really needed in the DRM
>> core":
>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> In this proposal I moved all the code to Exynos DRM driver, so now this
>> will be specific only to Exynos DRM. I've also changed the name from
>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>> with fbdev API.
>>
>> Here is a bit more information what picture processors are:
>>
>> Embedded SoCs are known to have a number of hardware blocks, which perform
>> such operations. They can be used in paralel to the main GPU module to
>> offload CPU from processing grapics or video data. One of example use of
>> such modules is implementing video overlay, which usually requires color
>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>> target window size.
>>
>> The proposed API is heavily inspired by atomic KMS approach - it is also
>> based on DRM objects and their properties. A new DRM object is introduced:
>> picture processor (called pp for convenience). Such objects have a set of
>> standard DRM properties, which describes the operation to be performed by
>> respective hardware module. In typical case those properties are a source
>> fb id and rectangle (x, y, width, height) and destination fb id and
>> rectangle. Optionally a rotation property can be also specified if
>> supported by the given hardware. To perform an operation on image data,
>> userspace provides a set of properties and their values for given fbproc
>> object in a similar way as object and properties are provided for
>> performing atomic page flip / mode setting.
>>
>> The proposed API consists of the 3 new ioctls:
>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>    processors,
>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>    processor,
>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>    property set.
>>
>> The proposed API is extensible. Drivers can attach their own, custom
>> properties to add support for more advanced picture processing (for example
>> blending).
>>
>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>> subsystem. IPP API is over-engineered in general, but not really extensible
>> on the other side. It is also buggy, with significant design flaws - the
>> biggest issue is the fact that the API covers memory-2-memory picture
>> operations together with CRTC writeback and duplicating features, which
>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>> rotator driver smaller by over 200 lines).
> This seems to be the kind of hardware that is typically supported by V4L2.
> Stupid question, why DRM ?

Let me elaborate a bit on the reasons for implementing it in Exynos DRM:

1. we want to replace existing Exynos IPP subsystem:
  - it is used only in some internal/vendor trees, not in open-source
  - we want it to have sane and potentially extensible userspace API
  - but we don't want to lose its functionality

2. we want to have a simple API for performing a single image processing
operation:
  - typically it will be used by a compositing window manager, which means
    that some parameters of the processing might change on each vblank (the
    destination rectangle, for example). This API allows such a change on
    each operation without any additional cost. V4L2 requires reinitializing
    the queues with the new configuration on such a change, which means that
    a bunch of ioctls has to be called.
  - validating processing parameters in the V4L2 API is really complicated,
    because the parameters (format, src & dest rectangles, rotation) are set
    incrementally, so we have to either allow some impossible, transitional
    configurations or complicate the configuration steps even more (like
    calling some ioctls multiple times for both input and output). In the
    end all parameters have to be validated again just before performing the
    operation.

3. generic approach (to add it to DRM core) has been rejected:
http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html

4. this API can be considered an extended 'blit' operation; other DRM
   drivers (MGA, R128, VIA) already have ioctls for such an operation, so
   there is also a place in DRM for it
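
To illustrate point 2, here is a minimal sketch of validating a complete
request in one pass at commit time. It assumes a hypothetical, simplified
struct pp_task; none of the names below come from the patch. The bounds and
no-scaling checks are loosely modelled on the removed
rotator_ippdrv_check_property(), with the destination rectangle expressed in
destination buffer coordinates.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types; not the structures used by the patch. */
struct pp_rect { uint32_t x, y, w, h; };
struct pp_buf  { uint32_t width, height; };

enum pp_rotation { PP_ROTATE_0, PP_ROTATE_90, PP_ROTATE_180, PP_ROTATE_270 };

struct pp_task {
	struct pp_buf src_buf, dst_buf;
	struct pp_rect src, dst;
	enum pp_rotation rotation;
	bool can_scale;		/* assumed hardware capability flag */
};

static bool rect_fits(const struct pp_rect *r, const struct pp_buf *b)
{
	/* 64-bit sums so that x + w cannot wrap around */
	return (uint64_t)r->x + r->w <= b->width &&
	       (uint64_t)r->y + r->h <= b->height;
}

/* Check the whole task at once; returns 0 or -EINVAL. */
int pp_task_validate(const struct pp_task *t)
{
	bool swap;
	uint32_t out_w, out_h;

	if ((unsigned int)t->rotation > PP_ROTATE_270)
		return -EINVAL;

	if (!t->src.w || !t->src.h || !t->dst.w || !t->dst.h)
		return -EINVAL;

	if (!rect_fits(&t->src, &t->src_buf) ||
	    !rect_fits(&t->dst, &t->dst_buf))
		return -EINVAL;

	/* 90/270 degree rotation swaps width and height of the output */
	swap = t->rotation == PP_ROTATE_90 || t->rotation == PP_ROTATE_270;
	out_w = swap ? t->src.h : t->src.w;
	out_h = swap ? t->src.w : t->src.h;

	if (!t->can_scale && (t->dst.w != out_w || t->dst.h != out_h))
		return -EINVAL;

	return 0;
}
```

Because every parameter arrives together, no transitional state has to be
accepted: a 640x480 source rotated by 90 degrees either matches a 480x640
destination rectangle or the whole request is rejected.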

>> Open questions:
>> - How to expose pp capabilities and supported formats? Currently this is
>> done with a drm_exynos_pp_get structure and DRM_IOCTL_EXYNOS_PP_GET ioctl.
>> However one can try to use IMMUTABLE properties for capabilities and
>> src/dst format set. Rationale: recently Rob Clark proposed to create a DRM
>> property with supported pixelformats and modifiers:
>>    http://www.spinics.net/lists/dri-devel/msg137380.html
>> - Is it okay to use DRM objects and properties API
>> (DRM_IOCTL_MODE_GETPROPERTY and DRM_IOCTL_MODE_OBJ_GETPROPERTIES ioctls)
>> for this purpose?
>>
>> TODO:
>> - convert remaining Exynos DRM IPP drivers (FIMC, GScaller)
>> - remove Exynos DRM IPP subsystem
>> - (optional) provide virtual V4L2 mem2mem device on top of Exynos PP
>> framework
>>
>> Patches were tested on Exynos 4412-based Odroid U3 board, on top of Linux
>> next-20170420 kernel.
>>
>> Best regards
>> Marek Szyprowski
>> Samsung R&D Institute Poland
>>
>>
>> Changelog:
>> v1:
>> - moved this feature from DRM core to Exynos DRM driver
>> - changed name from framebuffer processor to picture processor
>> - simplified code to cover only things needed by Exynos drivers
>> - implemented simple fifo task scheduler
>> - cleaned up rotator driver conversion (removed IPP remainings)
>>
>>
>> v0:
>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>> - initial post of "[RFC 0/2] New feature: Framebuffer processors"
>> - generic approach implemented in DRM core, rejected
>>
>>
>> Patch summary:
>>
>> Marek Szyprowski (4):
>>    drm: Export functions to create custom DRM objects
>>    drm: Add support for vendor specific DRM objects with custom
>>      properties
>>    drm/exynos: Add Picture Processor framework
>>    drm/exynos: Convert Exynos Rotator driver to Picture Processor
>>      interface
>>
>>   drivers/gpu/drm/drm_crtc_internal.h         |   4 -
>>   drivers/gpu/drm/drm_mode_object.c           |  11 +-
>>   drivers/gpu/drm/drm_property.c              |   2 +-
>>   drivers/gpu/drm/exynos/Kconfig              |   1 -
>>   drivers/gpu/drm/exynos/Makefile             |   3 +-
>>   drivers/gpu/drm/exynos/exynos_drm_drv.c     |   9 +
>>   drivers/gpu/drm/exynos/exynos_drm_drv.h     |  15 +
>>   drivers/gpu/drm/exynos/exynos_drm_pp.c      | 775 +++++++++++++++++++++++++
>>   drivers/gpu/drm/exynos/exynos_drm_pp.h      | 155 ++++++
>>   drivers/gpu/drm/exynos/exynos_drm_rotator.c | 513 +++++-------------
>>   drivers/gpu/drm/exynos/exynos_drm_rotator.h |  19 -
>>   include/drm/drm_mode_object.h               |   6 +
>>   include/drm/drm_property.h                  |   7 +
>>   include/uapi/drm/drm_mode.h                 |   1 +
>>   include/uapi/drm/exynos_drm.h               |  62 +++
>>   15 files changed, 1166 insertions(+), 417 deletions(-)
>>   create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.c
>>   create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.h
>>   delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-20 11:23     ` Marek Szyprowski
@ 2017-04-20 12:17       ` Tobias Jakobi
  2017-04-25 22:21       ` Sakari Ailus
  1 sibling, 0 replies; 34+ messages in thread
From: Tobias Jakobi @ 2017-04-20 12:17 UTC (permalink / raw)
  To: Marek Szyprowski, Laurent Pinchart, dri-devel
  Cc: Sakari Ailus, linux-samsung-soc, Seung-Woo Kim,
	Bartlomiej Zolnierkiewicz

Hello everyone,


Marek Szyprowski wrote:
> Hi Laurent,
> 
> On 2017-04-20 12:25, Laurent Pinchart wrote:
>> Hi Marek,
>>
>> (CC'ing Sakari Ailus)
>>
>> Thank you for the patches.
>>
>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>> Dear all,
>>>
>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>> support for hardware modules, which can be used for processing image
>>> data from the one memory buffer to another. Typical memory-to-memory
>>> operations are: rotation, scaling, colour space conversion or mix of
>>> them. This is a follow-up of my previous proposal "[RFC 0/2] New
>>> feature: Framebuffer processors", which has been rejected as "not
>>> really needed in the DRM core":
>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>
>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>> will be specific only to Exynos DRM. I've also changed the name from
>>> framebuffer processor (fbproc) to picture processor (pp) to avoid
>>> confusion with fbdev API.
>>>
>>> Here is a bit more information what picture processors are:
>>>
>>> Embedded SoCs are known to have a number of hardware blocks, which
>>> perform such operations. They can be used in paralel to the main GPU
>>> module to offload CPU from processing grapics or video data. One of
>>> example use of such modules is implementing video overlay, which
>>> usually requires color space conversion from NV12 (or similar) to RGB32
>>> color space and scaling to target window size.
>>>
>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>> based on DRM objects and their properties. A new DRM object is
>>> introduced: picture processor (called pp for convenience). Such objects
>>> have a set of standard DRM properties, which describes the operation to
>>> be performed by respective hardware module. In typical case those
>>> properties are a source fb id and rectangle (x, y, width, height) and
>>> destination fb id and rectangle. Optionally a rotation property can be
>>> also specified if supported by the given hardware. To perform an
>>> operation on image data, userspace provides a set of properties and
>>> their values for given fbproc object in a similar way as object and
>>> properties are provided for performing atomic page flip / mode setting.
>>>
>>> The proposed API consists of the 3 new ioctls:
>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>    processors,
>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>    processor,
>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>    property set.
>>>
>>> The proposed API is extensible. Drivers can attach their own, custom
>>> properties to add support for more advanced picture processing (for
>>> example blending).
>>>
>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>> subsystem. IPP API is over-engineered in general, but not really
>>> extensible on the other side. It is also buggy, with significant design
>>> flaws - the biggest issue is the fact that the API covers
>>> memory-2-memory picture operations together with CRTC writeback and
>>> duplicating features, which belongs to video plane. Comparing with IPP
>>> subsystem, the PP framework is smaller (1807 vs 778 lines) and allows
>>> driver simplification (Exynos rotator driver smaller by over 200 lines).
>> This seems to be the kind of hardware that is typically supported by
>> V4L2. Stupid question, why DRM ?
> 
> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
> 
> 1. we want to replace existing Exynos IPP subsystem:
>  - it is used only in some internal/vendor trees, not in open-source
>  - we want it to have sane and potentially extensible userspace API
>  - but we don't want to loose its functionality
> 
> 2. we want to have simple API for performing single image processing
> operation:
>  - typically it will be used by compositing window manager, this means that
>    some parameters of the processing might change on each vblank (like
>    destination rectangle for example). This api allows such change on each
>    operation without any additional cost. V4L2 requires to reinitialize
>    queues with new configuration on such change, what means that a bunch of
>    ioctls has to be called.
>  - validating processing parameters in V4l2 API is really complicated,
>    because the parameters (format, src&dest rectangles, rotation) are being
>    set incrementally, so we have to either allow some impossible,
>    transitional configurations or complicate the configuration steps even
>    more (like calling some ioctls multiple times for both input and
>    output). In the end all parameters have to be again validated just
>    before performing the operation.
> 
> 3. generic approach (to add it to DRM core) has been rejected:
> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> 
> 4. this api can be considered as extended 'blit' operation, other DRM
>    drivers (MGA, R128, VIA) already have ioctls for such operation, so
>    there is also place in DRM for it
> 
I just wanted to say that I like the proposed API. I could do some
reviewing, if the API isn't discarded quickly like last time (when it
was still proposed as part of DRM core).

To comment on why this should be in DRM land: I personally could
never get comfortable with the V4L2 API. It seems to be overengineered
for the context of a blitting op, hence what Marek already said. I
prefer to work with one API, so I don't have to worry about potential
problems when two of them interact. For V4L2 this would be exchange of
buffers, where I always have to think where I have to create some
buffer, because I might be able to export it, but not import it on the
other side. In this case DRM is way simpler, I just pass around my GEM
handle and that's it. Stuff like that.

Also, I still haven't dropped this small project of mine, using such an
API for an Exynos renderer backend for the mpv media player.



With best wishes,
Tobias


>>> Open questions:
>>> - How to expose pp capabilities and supported formats? Currently this is
>>> done with a drm_exynos_pp_get structure and DRM_IOCTL_EXYNOS_PP_GET
>>> ioctl. However one can try to use IMMUTABLE properties for capabilities
>>> and src/dst format set. Rationale: recently Rob Clark proposed to create
>>> a DRM property with supported pixelformats and modifiers:
>>>    http://www.spinics.net/lists/dri-devel/msg137380.html
>>> - Is it okay to use DRM objects and properties API
>>> (DRM_IOCTL_MODE_GETPROPERTY and DRM_IOCTL_MODE_OBJ_GETPROPERTIES ioctls)
>>> for this purpose?
>>>
>>> TODO:
>>> - convert remaining Exynos DRM IPP drivers (FIMC, GScaller)
>>> - remove Exynos DRM IPP subsystem
>>> - (optional) provide virtual V4L2 mem2mem device on top of Exynos PP
>>> framework
>>>
>>> Patches were tested on Exynos 4412-based Odroid U3 board, on top of
>>> Linux next-20170420 kernel.
>>>
>>> Best regards
>>> Marek Szyprowski
>>> Samsung R&D Institute Poland
>>>
>>>
>>> Changelog:
>>> v1:
>>> - moved this feature from DRM core to Exynos DRM driver
>>> - changed name from framebuffer processor to picture processor
>>> - simplified code to cover only things needed by Exynos drivers
>>> - implemented simple fifo task scheduler
>>> - cleaned up rotator driver conversion (removed IPP remainings)
>>>
>>>
>>> v0:
>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>
>>> - initial post of "[RFC 0/2] New feature: Framebuffer processors"
>>> - generic approach implemented in DRM core, rejected
>>>
>>>
>>> Patch summary:
>>>
>>> Marek Szyprowski (4):
>>>    drm: Export functions to create custom DRM objects
>>>    drm: Add support for vendor specific DRM objects with custom
>>>      properties
>>>    drm/exynos: Add Picture Processor framework
>>>    drm/exynos: Convert Exynos Rotator driver to Picture Processor
>>>      interface
>>>
>>>   drivers/gpu/drm/drm_crtc_internal.h         |   4 -
>>>   drivers/gpu/drm/drm_mode_object.c           |  11 +-
>>>   drivers/gpu/drm/drm_property.c              |   2 +-
>>>   drivers/gpu/drm/exynos/Kconfig              |   1 -
>>>   drivers/gpu/drm/exynos/Makefile             |   3 +-
>>>   drivers/gpu/drm/exynos/exynos_drm_drv.c     |   9 +
>>>   drivers/gpu/drm/exynos/exynos_drm_drv.h     |  15 +
>>>   drivers/gpu/drm/exynos/exynos_drm_pp.c      | 775 +++++++++++++++++++++++++
>>>   drivers/gpu/drm/exynos/exynos_drm_pp.h      | 155 ++++++
>>>   drivers/gpu/drm/exynos/exynos_drm_rotator.c | 513 +++++-------------
>>>   drivers/gpu/drm/exynos/exynos_drm_rotator.h |  19 -
>>>   include/drm/drm_mode_object.h               |   6 +
>>>   include/drm/drm_property.h                  |   7 +
>>>   include/uapi/drm/drm_mode.h                 |   1 +
>>>   include/uapi/drm/exynos_drm.h               |  62 +++
>>>   15 files changed, 1166 insertions(+), 417 deletions(-)
>>>   create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.c
>>>   create mode 100644 drivers/gpu/drm/exynos/exynos_drm_pp.h
>>>   delete mode 100644 drivers/gpu/drm/exynos/exynos_drm_rotator.h
> 
> Best regards

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-20  9:13 ` [RFC 0/4] Exynos DRM: add Picture Processor extension Marek Szyprowski
                     ` (4 preceding siblings ...)
  2017-04-20 10:25   ` [RFC 0/4] Exynos DRM: add Picture Processor extension Laurent Pinchart
@ 2017-04-20 19:02   ` Dave Airlie
  2017-04-25  6:59     ` Marek Szyprowski
  5 siblings, 1 reply; 34+ messages in thread
From: Dave Airlie @ 2017-04-20 19:02 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Tobias Jakobi, linux-samsung-soc, Seung-Woo Kim, dri-devel,
	Bartlomiej Zolnierkiewicz

On 20 April 2017 at 19:13, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
> Dear all,
>
> This is an updated proposal for extending EXYNOS DRM API with generic support
> for hardware modules, which can be used for processing image data from the
> one memory buffer to another. Typical memory-to-memory operations are:
> rotation, scaling, colour space conversion or mix of them. This is
> a follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> processors", which has been rejected as "not really needed in the DRM core":
> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>
> In this proposal I moved all the code to Exynos DRM driver, so now this
> will be specific only to Exynos DRM. I've also changed the name from
> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> with fbdev API.
>
> Here is a bit more information what picture processors are:
>
> Embedded SoCs are known to have a number of hardware blocks, which perform
> such operations. They can be used in parallel to the main GPU module to
> offload the CPU from processing graphics or video data. One example use of
> such modules is implementing video overlay, which usually requires color
> space conversion from NV12 (or similar) to RGB32 color space and scaling to
> target window size.
>
> The proposed API is heavily inspired by atomic KMS approach - it is also
> based on DRM objects and their properties. A new DRM object is introduced:
> picture processor (called pp for convenience). Such objects have a set of
> standard DRM properties, which describes the operation to be performed by
> respective hardware module. In typical case those properties are a source
> fb id and rectangle (x, y, width, height) and destination fb id and
> rectangle. Optionally a rotation property can be also specified if
> supported by the given hardware. To perform an operation on image data,
> userspace provides a set of properties and their values for given fbproc
> object in a similar way as object and properties are provided for
> performing atomic page flip / mode setting.
>
> The proposed API consists of the 3 new ioctls:
> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>   processors,
> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>   processor,
> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>   property set.
>
> The proposed API is extensible. Drivers can attach their own, custom
> properties to add support for more advanced picture processing (for example
> blending).

So this looks more like a command submission API like we have for other drivers.

Is there an overarching reason why it needs to reuse the core DRM object
tracking and properties, or was that just a nice-to-have, versus something
like amdgpu chunks?

My worry about exposing objects and properties to the drivers is that I'm
sure someone could get quite inventive and end up with a forked atomic API
that we don't see, or undocumented things.

Dave.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-20 19:02   ` Dave Airlie
@ 2017-04-25  6:59     ` Marek Szyprowski
  0 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-25  6:59 UTC (permalink / raw)
  To: Dave Airlie
  Cc: dri-devel, linux-samsung-soc, Bartlomiej Zolnierkiewicz,
	Seung-Woo Kim, Tobias Jakobi

Hi Dave,

On 2017-04-20 21:02, Dave Airlie wrote:
> On 20 April 2017 at 19:13, Marek Szyprowski <m.szyprowski@samsung.com> wrote:
>> This is an updated proposal for extending EXYNOS DRM API with generic support
>> for hardware modules, which can be used for processing image data from the
>> one memory buffer to another. Typical memory-to-memory operations are:
>> rotation, scaling, colour space conversion or mix of them. This is
>> a follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>> processors", which has been rejected as "not really needed in the DRM core":
>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> In this proposal I moved all the code to Exynos DRM driver, so now this
>> will be specific only to Exynos DRM. I've also changed the name from
>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>> with fbdev API.
>>
>> Here is a bit more information what picture processors are:
>>
>> Embedded SoCs are known to have a number of hardware blocks, which perform
>> such operations. They can be used in parallel to the main GPU module to
>> offload the CPU from processing graphics or video data. One example use of
>> such modules is implementing video overlay, which usually requires color
>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>> target window size.
>>
>> The proposed API is heavily inspired by atomic KMS approach - it is also
>> based on DRM objects and their properties. A new DRM object is introduced:
>> picture processor (called pp for convenience). Such objects have a set of
>> standard DRM properties, which describes the operation to be performed by
>> respective hardware module. In typical case those properties are a source
>> fb id and rectangle (x, y, width, height) and destination fb id and
>> rectangle. Optionally a rotation property can be also specified if
>> supported by the given hardware. To perform an operation on image data,
>> userspace provides a set of properties and their values for given fbproc
>> object in a similar way as object and properties are provided for
>> performing atomic page flip / mode setting.
>>
>> The proposed API consists of the 3 new ioctls:
>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>    processors,
>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>    processor,
>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>    property set.
>>
>> The proposed API is extensible. Drivers can attach their own, custom
>> properties to add support for more advanced picture processing (for example
>> blending).
> So this looks more like a command submission API like we have for other drivers.
>
> Is there an overarching reason why it needs to reuse the core drm object
> tracking and properties, or was that just a nice to have, vs something
> like amdgpu chunks.

Thanks for your comment.

DRM objects and properties were my first choice when designing this new API,
but I'm still fairly new to DRM. I was also a bit fascinated by the atomic
KMS approach. I selected them simply to reuse the code for managing objects
and enumerating their properties from userspace. If this is not the
preferred approach, I will rewrite the code to use something custom. I
didn't know about amdgpu chunks, but from a quick look they are just a
structure to store a set of IDs and data for them. Maybe there is no need to
use strings for enumerating the properties, and limiting the API to a known
set of IDs will be more than enough in this case.
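
To make the fixed-ID idea concrete, here is a minimal sketch of a chunk-style
parameter list in plain C. All names (pp_param_id, pp_param, pp_task,
pp_task_from_params) are hypothetical; they are not taken from the posted
patches or from amdgpu. The sketch only shows how a known set of IDs could
replace string-named properties.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, fixed set of parameter IDs (not from the posted patches). */
enum pp_param_id {
	PP_PARAM_SRC_FB = 0,
	PP_PARAM_SRC_X,
	PP_PARAM_SRC_Y,
	PP_PARAM_SRC_W,
	PP_PARAM_SRC_H,
	PP_PARAM_DST_FB,
	PP_PARAM_DST_X,
	PP_PARAM_DST_Y,
	PP_PARAM_DST_W,
	PP_PARAM_DST_H,
	PP_PARAM_ROTATION,
	PP_PARAM_COUNT
};

/* One (id, value) pair as userspace would pass it in an array. */
struct pp_param {
	uint32_t id;
	uint32_t pad;	/* keeps the 64-bit value naturally aligned */
	uint64_t value;
};

/* Decoded task: one slot per known ID plus a bitmask of the IDs seen. */
struct pp_task {
	uint64_t values[PP_PARAM_COUNT];
	uint32_t set_mask;
};

/* Returns 0 on success, -1 on an unknown or duplicated ID. */
int pp_task_from_params(struct pp_task *task,
			const struct pp_param *params, size_t count)
{
	size_t i;

	assert(PP_PARAM_COUNT <= 32);	/* set_mask has 32 bits */
	memset(task, 0, sizeof(*task));

	for (i = 0; i < count; i++) {
		uint32_t id = params[i].id;

		if (id >= (uint32_t)PP_PARAM_COUNT)
			return -1;	/* ID outside the known set */
		if (task->set_mask & (1u << id))
			return -1;	/* same ID given twice */

		task->values[id] = params[i].value;
		task->set_mask |= 1u << id;
	}
	return 0;
}
```

Unknown or repeated IDs are rejected up front, so the set of accepted
parameters is exactly the enum above and nothing else.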

> My worry about exposing objects and properties to the drivers is I'm
> sure someone could get quite inventive and end up with a forked atomic
> API that we don't see, or undocumented things.

Okay. I will try a different approach then.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-20 11:23     ` Marek Szyprowski
  2017-04-20 12:17       ` Tobias Jakobi
@ 2017-04-25 22:21       ` Sakari Ailus
  2017-04-26 14:53         ` Nicolas Dufresne
                           ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Sakari Ailus @ 2017-04-25 22:21 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Sakari Ailus, linux-media

Hi Marek,

On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
> Hi Laurent,
> 
> On 2017-04-20 12:25, Laurent Pinchart wrote:
> >Hi Marek,
> >
> >(CC'ing Sakari Ailus)
> >
> >Thank you for the patches.
> >
> >On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
> >>Dear all,
> >>
> >>This is an updated proposal for extending EXYNOS DRM API with generic
> >>support for hardware modules, which can be used for processing image data
> >>from the one memory buffer to another. Typical memory-to-memory operations
> >>are: rotation, scaling, colour space conversion or mix of them. This is a
> >>follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> >>processors", which has been rejected as "not really needed in the DRM
> >>core":
> >>http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> >>
> >>In this proposal I moved all the code to Exynos DRM driver, so now this
> >>will be specific only to Exynos DRM. I've also changed the name from
> >>framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> >>with fbdev API.
> >>
> >>Here is a bit more information what picture processors are:
> >>
> >>Embedded SoCs are known to have a number of hardware blocks, which perform
> >>such operations. They can be used in parallel to the main GPU module to
> >>offload the CPU from processing graphics or video data. One example use of
> >>such modules is implementing video overlay, which usually requires color
> >>space conversion from NV12 (or similar) to RGB32 color space and scaling to
> >>target window size.
> >>
> >>The proposed API is heavily inspired by atomic KMS approach - it is also
> >>based on DRM objects and their properties. A new DRM object is introduced:
> >>picture processor (called pp for convenience). Such objects have a set of
> >>standard DRM properties, which describes the operation to be performed by
> >>respective hardware module. In typical case those properties are a source
> >>fb id and rectangle (x, y, width, height) and destination fb id and
> >>rectangle. Optionally a rotation property can be also specified if
> >>supported by the given hardware. To perform an operation on image data,
> >>userspace provides a set of properties and their values for given fbproc
> >>object in a similar way as object and properties are provided for
> >>performing atomic page flip / mode setting.
> >>
> >>The proposed API consists of the 3 new ioctls:
> >>- DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
> >>   processors,
> >>- DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
> >>   processor,
> >>- DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
> >>   property set.
> >>
> >>The proposed API is extensible. Drivers can attach their own, custom
> >>properties to add support for more advanced picture processing (for example
> >>blending).
> >>
> >>This proposal aims to replace Exynos DRM IPP (Image Post Processing)
> >>subsystem. IPP API is over-engineered in general, but not really extensible
> >>on the other side. It is also buggy, with significant design flaws - the
> >>biggest issue is the fact that the API covers memory-2-memory picture
> >>operations together with CRTC writeback and duplicating features, which
> >>belongs to video plane. Comparing with IPP subsystem, the PP framework is
> >>smaller (1807 vs 778 lines) and allows driver simplification (Exynos
> >>rotator driver smaller by over 200 lines).
> >This seems to be the kind of hardware that is typically supported by V4L2.
> >Stupid question, why DRM ?
> 
> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
> 
> 1. we want to replace existing Exynos IPP subsystem:
>  - it is used only in some internal/vendor trees, not in open-source
>  - we want it to have sane and potentially extensible userspace API
>  - but we don't want to lose its functionality
> 
> 2. we want to have simple API for performing single image processing
> operation:
>  - typically it will be used by compositing window manager, this means that
>    some parameters of the processing might change on each vblank (like
>    destination rectangle for example). This api allows such change on each
>    operation without any additional cost. V4L2 requires to reinitialize
>    queues with new configuration on such change, what means that a bunch of
>    ioctls has to be called.

What do you mean by re-initialising the queue? Format, buffers or something
else?

If you need a larger buffer than what you have already allocated, you'll
need to re-allocate, V4L2 or not.

We also lack a way to destroy individual buffers in V4L2. Fixing that would
require implementing it, plus some work in videobuf2.

Another thing is that V4L2 is very stream oriented. For most devices that's
fine as a lot of the parameters are not changeable during streaming,
especially if the pipeline is handled by multiple drivers. That said, for
devices that process data from memory to memory, changing the media bus
formats and pipeline configuration is currently not very efficient, largely
for the same reason.

The request API, which people have been working on for somewhat different
use cases, isn't in mainline yet. It would allow more efficient per-request
configuration than what is currently possible, but it has turned out to be
far from trivial to implement.

>  - validating processing parameters in V4L2 API is really complicated,
>    because the parameters (format, src&dest rectangles, rotation) are being
>    set incrementally, so we have to either allow some impossible,
>    transitional configurations or complicate the configuration steps even
>    more (like calling some ioctls multiple times for both input and
>    output). In the end all parameters have to be again validated just
>    before performing the operation.

You have to validate the parameters in any case. In an MC pipeline this takes
place when the stream is started.
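
As an illustration of validating a complete operation in a single step, here
is a minimal sketch in plain C. The structures and the pp_op_validate()
helper are hypothetical and not taken from the patches. The rule that the
destination rectangle must match the (possibly rotated) source size assumes
a rotator-like block that cannot scale.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct pp_rect {
	uint32_t x, y, w, h;
};

/* Hypothetical description of one complete memory-to-memory operation. */
struct pp_op {
	uint32_t src_w, src_h;	/* source buffer size in pixels */
	uint32_t dst_w, dst_h;	/* destination buffer size in pixels */
	struct pp_rect src;	/* region read from the source buffer */
	struct pp_rect dst;	/* region written in the destination buffer */
	uint32_t rotation;	/* degrees: 0, 90, 180 or 270 */
};

/* Non-empty rectangle that lies fully inside a buf_w x buf_h buffer. */
static int rect_fits(const struct pp_rect *r, uint32_t buf_w, uint32_t buf_h)
{
	/* Compare against (buf - offset) so that x + w cannot overflow. */
	return r->w > 0 && r->h > 0 &&
	       r->x <= buf_w && r->w <= buf_w - r->x &&
	       r->y <= buf_h && r->h <= buf_h - r->y;
}

/* Returns 0 if the whole operation is consistent, -1 otherwise. */
int pp_op_validate(const struct pp_op *op)
{
	uint32_t out_w, out_h;

	assert(op);

	if (!rect_fits(&op->src, op->src_w, op->src_h))
		return -1;
	if (!rect_fits(&op->dst, op->dst_w, op->dst_h))
		return -1;

	switch (op->rotation) {
	case 0:
	case 180:
		out_w = op->src.w;
		out_h = op->src.h;
		break;
	case 90:
	case 270:
		/* A quarter turn swaps width and height. */
		out_w = op->src.h;
		out_h = op->src.w;
		break;
	default:
		return -1;
	}

	/* Assumption: no scaling, so the sizes must match exactly. */
	if (op->dst.w != out_w || op->dst.h != out_h)
		return -1;

	return 0;
}
```

Because every parameter arrives together, no transitional state has to be
accepted; the same check would still be needed with V4L2, only later, at
stream start.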

> 
> 3. generic approach (to add it to DRM core) has been rejected:
> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html

For GPUs I generally understand the reasoning: there's a very limited number
of users of this API --- primarily because it's not an application
interface.

If, however, you have a device that falls under the scope of V4L2 (at least
API-wise), does this continue to be the case? Will there be only one or two
(or so) users for this API? Is it the case here?

Using a device-specific interface definitely has some benefits: there's no
need to think about how you would generalise the interface for other similar
devices. There's no need to consider backwards compatibility as it's not a
requirement. The drawback is that the applications that need to support
similar devices will bear the burden of having to support different APIs.

I don't mean to say that you should ram whatever under V4L2 / MC
independently of how unworkable that might be, but there are also clear
advantages in using a standardised interface such as V4L2.

V4L2 has a long history behind it and if it was designed today, I bet it
would look quite different from what it is now.

> 
> 4. this api can be considered as extended 'blit' operation, other DRM
>    drivers (MGA, R128, VIA) already have ioctls for such operation, so
>    there is also place in DRM for it

Added LMML to cc.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-25 22:21       ` Sakari Ailus
@ 2017-04-26 14:53         ` Nicolas Dufresne
  2017-04-26 15:16             ` Tobias Jakobi
  2017-04-26 16:52             ` Tobias Jakobi
  2017-04-27 13:52         ` Marek Szyprowski
  2017-05-10  1:24         ` Inki Dae
  2 siblings, 2 replies; 34+ messages in thread
From: Nicolas Dufresne @ 2017-04-26 14:53 UTC (permalink / raw)
  To: Sakari Ailus, Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Sakari Ailus, linux-media

Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit :
> Hi Marek,
> 
> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
> > Hi Laurent,
> > 
> > On 2017-04-20 12:25, Laurent Pinchart wrote:
> > > Hi Marek,
> > > 
> > > (CC'ing Sakari Ailus)
> > > 
> > > Thank you for the patches.
> > > 
> > > On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
> > > > Dear all,
> > > > 
> > > > This is an updated proposal for extending EXYNOS DRM API with generic
> > > > support for hardware modules, which can be used for processing image data
> > > > from the one memory buffer to another. Typical memory-to-memory operations
> > > > are: rotation, scaling, colour space conversion or mix of them. This is a
> > > > follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> > > > processors", which has been rejected as "not really needed in the DRM
> > > > core":
> > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> > > > 
> > > > In this proposal I moved all the code to Exynos DRM driver, so now this
> > > > will be specific only to Exynos DRM. I've also changed the name from
> > > > framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> > > > with fbdev API.
> > > > 
> > > > Here is a bit more information what picture processors are:
> > > > 
> > > > Embedded SoCs are known to have a number of hardware blocks, which perform
> > > > such operations. They can be used in parallel to the main GPU module to
> > > > offload the CPU from processing graphics or video data. One example use of
> > > > such modules is implementing video overlay, which usually requires color
> > > > space conversion from NV12 (or similar) to RGB32 color space and scaling to
> > > > target window size.
> > > > 
> > > > The proposed API is heavily inspired by atomic KMS approach - it is also
> > > > based on DRM objects and their properties. A new DRM object is introduced:
> > > > picture processor (called pp for convenience). Such objects have a set of
> > > > standard DRM properties, which describes the operation to be performed by
> > > > respective hardware module. In typical case those properties are a source
> > > > fb id and rectangle (x, y, width, height) and destination fb id and
> > > > rectangle. Optionally a rotation property can be also specified if
> > > > supported by the given hardware. To perform an operation on image data,
> > > > userspace provides a set of properties and their values for given fbproc
> > > > object in a similar way as object and properties are provided for
> > > > performing atomic page flip / mode setting.
> > > > 
> > > > The proposed API consists of the 3 new ioctls:
> > > > - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
> > > >   processors,
> > > > - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
> > > >   processor,
> > > > - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
> > > >   property set.
> > > > 
> > > > The proposed API is extensible. Drivers can attach their own, custom
> > > > properties to add support for more advanced picture processing (for example
> > > > blending).
> > > > 
> > > > This proposal aims to replace Exynos DRM IPP (Image Post Processing)
> > > > subsystem. IPP API is over-engineered in general, but not really extensible
> > > > on the other side. It is also buggy, with significant design flaws - the
> > > > biggest issue is the fact that the API covers memory-2-memory picture
> > > > operations together with CRTC writeback and duplicating features, which
> > > > belongs to video plane. Comparing with IPP subsystem, the PP framework is
> > > > smaller (1807 vs 778 lines) and allows driver simplification (Exynos
> > > > rotator driver smaller by over 200 lines).

Just a side note, we have written code in GStreamer using the Exynos 4
FIMC IPP driver. I don't know how many deployments, if any, still exist
(Exynos 4 is relatively old now), but userspace for the FIMC driver does
exist. We use it for color transformation (from tiled to linear) and
scaling. The FIMC driver is in fact quite stable in the upstream kernel
today. The GScaler V4L2 M2M driver on Exynos 5 is largely based on it and
has received some maintenance to work properly in GStreamer. Unlike with
this DRM API, you can reuse the same userspace code across multiple
platforms (which we do already). We have also integrated this driver in
Chromium in the past (not upstream though).

I am well aware that the blitter driver has not received much attention,
though. But again, V4L2 offers a generic interface to userspace
applications. Fixing this driver could enable some work like this one:

https://bugzilla.gnome.org/show_bug.cgi?id=772766

This work-in-progress feature is a generic hardware-accelerated video
mixer. It has been tested with the i.MX6 V4L2 M2M blitter driver (which I
believe is in staging right now). Again, unlike the exynos/drm approach,
this code could be reused between platforms.

In general, the problem with the DRM approach is that it only targets
displays. We often need to use these IP blocks for stream pre/post
processing outside a "playback" use case.

What I'd like to see here instead is an approach that helps both worlds
instead of trying to win control over the IP block. Renesas development
seems to lead in the right direction by creating drivers that can be
interfaced with both DRM and V4L2. For IPP and GScaler on Exynos, this would
be a greater benefit, and the code could finally be shared, with a single
place to fix when we find bugs.

> > > 
> > > This seems to be the kind of hardware that is typically supported by V4L2.
> > > Stupid question, why DRM ?
> > 
> > Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
> > 
> > 1. we want to replace existing Exynos IPP subsystem:
> >  - it is used only in some internal/vendor trees, not in open-source
> >  - we want it to have sane and potentially extensible userspace API
> >  - but we don't want to lose its functionality
> > 
> > 2. we want to have simple API for performing single image processing
> > operation:
> >  - typically it will be used by compositing window manager, this means that
> >    some parameters of the processing might change on each vblank (like
> >    destination rectangle for example). This api allows such change on each
> >    operation without any additional cost. V4L2 requires to reinitialize
> >    queues with new configuration on such change, what means that a bunch of
> >    ioctls has to be called.
> 
> What do you mean by re-initialising the queue? Format, buffers or something
> else?
> 
> If you need a larger buffer than what you have already allocated, you'll
> need to re-allocate, V4L2 or not.
> 
> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> implementing that and some work in videobuf2.
> 
> Another thing is that V4L2 is very stream oriented. For most devices that's
> fine as a lot of the parameters are not changeable during streaming,
> especially if the pipeline is handled by multiple drivers. That said, for
> devices that process data from memory to memory performing changes in the
> media bus formats and pipeline configuration is not very efficient
> currently, largely for the same reason.
> 
> The request API that people have been working on for somewhat different
> use cases isn't in mainline yet. It would allow more efficient per-request
> configuration than what is currently possible, but it has turned out to be
> far from trivial to implement.
> 
> >  - validating processing parameters in V4L2 API is really complicated,
> >    because the parameters (format, src&dest rectangles, rotation) are being
> >    set incrementally, so we have to either allow some impossible,
> >    transitional configurations or complicate the configuration steps even
> >    more (like calling some ioctls multiple times for both input and
> >    output). In the end all parameters have to be again validated just
> >    before performing the operation.
> 
> You have to validate the parameters in any case. In a MC pipeline this takes
> place when the stream is started.
> 
> > 
> > 3. generic approach (to add it to DRM core) has been rejected:
> > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> 
> For GPUs I generally understand the reasoning: there's a very limited number
> of users of this API --- primarily because it's not an application
> interface.
> 
> If you have a device that however falls under the scope of V4L2 (at least
> API-wise), does this continue to be the case? Will there be only one or two
> (or so) users for this API? Is it the case here?
> 
> Using a device specific interface definitely has some benefits: there's no
> need to think how would you generalise the interface for other similar
> devices. There's no need to consider backwards compatibility as it's not a
> requirement. The drawback is that the applications that need to support
> similar devices will bear the burden of having to support different APIs.
> 
> I don't mean to say that you should ram whatever under V4L2 / MC
> independently of how unworkable that might be, but there are also clear
> advantages in using a standardised interface such as V4L2.
> 
> V4L2 has a long history behind it and if it was designed today, I bet it
> would look quite different from what it is now.
> 
> > 
> > 4. this api can be considered as extended 'blit' operation, other DRM
> >    drivers (MGA, R128, VIA) already have ioctls for such operation, so
> >    there is also place in DRM for it

Note that I am convinced that using these custom ioctls within a
"compositor" implementation is much easier and more uniform than using a
V4L2 driver. It probably offers lower latency too. But these are non-generic
and are not a great fit for streaming purposes. The request API and probably
explicit fences may mitigate this, though. Meanwhile, there are indications
that, even though it is complex, some people already think that implementing
a compositor combining V4L2 and DRM is feasible.

http://events.linuxfoundation.org/sites/events/files/slides/als2015_wayland_weston_v2.pdf

> 
> Added LMML to cc.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-26 14:53         ` Nicolas Dufresne
@ 2017-04-26 15:16             ` Tobias Jakobi
  2017-04-26 16:52             ` Tobias Jakobi
  1 sibling, 0 replies; 34+ messages in thread
From: Tobias Jakobi @ 2017-04-26 15:16 UTC (permalink / raw)
  To: Nicolas Dufresne, Sakari Ailus, Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Sakari Ailus,
	linux-media

Hello everyone,


Nicolas Dufresne wrote:
> Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit :
>> Hi Marek,
>>
>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>> Hi Laurent,
>>>
>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>> Hi Marek,
>>>>
>>>> (CC'ing Sakari Ailus)
>>>>
>>>> Thank you for the patches.
>>>>
>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>> Dear all,
>>>>>
>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>> support for hardware modules, which can be used for processing image data
>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>> core":
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>> with fbdev API.
>>>>>
>>>>> Here is a bit more information what picture processors are:
>>>>>
>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>> offload the CPU from processing graphics or video data. One example use of
>>>>> such modules is implementing video overlay, which usually requires color
>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>> target window size.
>>>>>
>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>> standard DRM properties, which describes the operation to be performed by
>>>>> respective hardware module. In typical case those properties are a source
>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>> supported by the given hardware. To perform an operation on image data,
>>>>> userspace provides a set of properties and their values for given fbproc
>>>>> object in a similar way as object and properties are provided for
>>>>> performing atomic page flip / mode setting.
>>>>>
>>>>> The proposed API consists of the 3 new ioctls:
>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>   processors,
>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>   processor,
>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>   property set.
>>>>>
>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>> properties to add support for more advanced picture processing (for example
>>>>> blending).
>>>>>
>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>> operations together with CRTC writeback and duplicates features that
>>>>> belong to the video plane. Compared with the IPP subsystem, the PP framework
>>>>> is smaller (778 vs 1807 lines) and allows driver simplification (the Exynos
>>>>> rotator driver is smaller by over 200 lines).
> 
> Just a side note, we have written code in GStreamer using the Exynos 4
> FIMC IPP driver. I don't know how many deployments, if any, still exist
> (Exynos 4 is relatively old now), but userspace exists for the
> FIMC driver. We use this for color transformation (from tiled to
> linear) and scaling. The FIMC driver is in fact quite stable in
> upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
> largely based on it and has received some maintenance to properly work
> in GStreamer. Unlike this DRM API, you can reuse the same userspace
> code across multiple platforms (which we do already). We have also
> integrated this driver in Chromium in the past (not upstream though).
> 
> I am well aware that the blitter driver has not got much attention
> though. But again, V4L2 offers a generic interface to userspace
> applications. Fixing this driver could enable some work like this one:
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=772766
> 
> This work-in-progress feature is a generic hardware-accelerated video
> mixer. It has been tested with the i.MX6 V4L2 M2M blitter driver (which I
> believe is in staging right now). Again, unlike exynos/drm, this
> code could be reused between platforms.
> 
> In general, the problem with the DRM approach is that it only targets
> displays. We often need to use these IP blocks for stream pre/post
> processing outside a "playback" use case.

Just a short note that this is not true. You can use all this
functionality e.g. through render nodes, without needing to have a
display attached to your system.
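
As a rough illustration (a Python sketch, not part of the patch set): render
nodes appear as /dev/dri/renderD<N>, with minors starting at 128, and are
separate from the card<N> nodes used for mode setting. The helper name and the
dri_dir argument below are only assumptions made for this example.

```python
import glob
import os
import re


def find_render_nodes(dri_dir="/dev/dri"):
    """Return the render node paths under dri_dir, sorted by minor number."""
    pattern = re.compile(r"renderD(\d+)$")
    nodes = []
    for path in glob.glob(os.path.join(dri_dir, "renderD*")):
        match = pattern.search(path)
        if match:  # skip names such as "renderDx" that carry no minor number
            nodes.append((int(match.group(1)), path))
    return [path for _, path in sorted(nodes)]


if __name__ == "__main__":
    nodes = find_render_nodes()
    if not nodes:
        print("no render nodes found")
    for node in nodes:
        print(node)
```

On a system without a DRM driver loaded this simply reports that no render
nodes were found.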

With best wishes,
Tobias


> What I'd like to see here instead is an approach that helps both worlds
> instead of trying to win control over the IP block. Renesas
> development seems to lead toward the right direction by creating
> drivers that can be both interfaced in DRM and V4L2. For IPP and
> GScaler on Exynos, this would be a greater benefit and finally the code
> could be shared, having a single place to fix when we find bugs.
> 
>>>>
>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>> Stupid question, why DRM ?
>>>
>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>
>>> 1. we want to replace existing Exynos IPP subsystem:
>>>  - it is used only in some internal/vendor trees, not in open-source
>>>  - we want it to have sane and potentially extensible userspace API
>>>  - but we don't want to lose its functionality
>>>
>>> 2. we want to have simple API for performing single image processing
>>> operation:
>>>  - typically it will be used by compositing window manager, this means that
>>>    some parameters of the processing might change on each vblank (like
>>>    destination rectangle for example). This api allows such change on each
>>>    operation without any additional cost. V4L2 requires reinitializing the
>>>    queues with the new configuration on such a change, which means that a
>>>    bunch of ioctls has to be called.
>>
>> What do you mean by re-initialising the queue? Format, buffers or something
>> else?
>>
>> If you need a larger buffer than what you have already allocated, you'll
>> need to re-allocate, V4L2 or not.
>>
>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>> implementing that and some work in videobuf2.
>>
>> Another thing is that V4L2 is very stream oriented. For most devices that's
>> fine as a lot of the parameters are not changeable during streaming,
>> especially if the pipeline is handled by multiple drivers. That said, for
>> devices that process data from memory to memory performing changes in the
>> media bus formats and pipeline configuration is not very efficient
>> currently, largely for the same reason.
>>
>> The request API that people have been working on for somewhat different use
>> cases isn't in mainline yet. It would allow more efficient per-request
>> configuration than what is currently possible, but it has turned out to be
>> far from trivial to implement.
>>
>>>  - validating processing parameters in the V4L2 API is really complicated,
>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>    set incrementally, so we have to either allow some impossible,
>>>    transitional configurations or complicate the configuration steps even
>>>    more (like calling some ioctls multiple times for both input and output).
>>>    In the end all parameters have to be validated again just before
>>>    performing the operation.
>>
>> You have to validate the parameters in any case. In an MC pipeline this takes
>> place when the stream is started.
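
To make the point under discussion a bit more concrete, here is a small Python
model of validating a complete parameter set in one step, as a commit-style
interface can do. It is purely illustrative and not kernel code: the Rect
type, the function name, the set of rotation angles and the 2x scaling limit
are all made up for this example and do not describe any real hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def validate_commit(src_size, dst_size, src, dst, rotation=0, max_scale=2.0):
    """Check a whole operation at once and return a list of error strings.

    src_size and dst_size are (width, height) of the buffers; src and dst are
    Rects inside them. All limits here are hypothetical.
    """
    errors = []
    for name, rect, (bw, bh) in (("src", src, src_size), ("dst", dst, dst_size)):
        if rect.w <= 0 or rect.h <= 0:
            errors.append(f"{name}: empty rectangle")
        elif rect.x < 0 or rect.y < 0 or rect.x + rect.w > bw or rect.y + rect.h > bh:
            errors.append(f"{name}: rectangle outside buffer")
    if rotation not in (0, 90, 180, 270):
        errors.append("rotation: unsupported angle")
    elif not errors:
        # A 90/270 degree rotation swaps the source dimensions before scaling.
        sw, sh = (src.h, src.w) if rotation in (90, 270) else (src.w, src.h)
        for ratio in (dst.w / sw, dst.h / sh):
            if ratio > max_scale or ratio < 1.0 / max_scale:
                errors.append("scale: ratio out of range")
                break
    return errors
```

In this model a 640x480 source scaled to 1280x960 passes with rotation 0 but
fails the scaling check with rotation 90, so whether a rectangle is acceptable
depends on the other parameters. That is the kind of dependency that makes
parameter-by-parameter checking pass through transitional states.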
>>
>>>
>>> 3. generic approach (to add it to DRM core) has been rejected:
>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> For GPUs I generally understand the reasoning: there's a very limited number
>> of users of this API --- primarily because it's not an application
>> interface.
>>
>> If you have a device that however falls under the scope of V4L2 (at least
>> API-wise), does this continue to be the case? Will there be only one or two
>> (or so) users for this API? Is it the case here?
>>
>> Using a device specific interface definitely has some benefits: there's no
>> need to think about how you would generalise the interface for other similar
>> devices. There's no need to consider backwards compatibility as it's not a
>> requirement. The drawback is that the applications that need to support
>> similar devices will bear the burden of having to support different APIs.
>>
>> I don't mean to say that you should ram whatever under V4L2 / MC
>> independently of how unworkable that might be, but there are also clear
>> advantages in using a standardised interface such as V4L2.
>>
>> V4L2 has a long history behind it and if it was designed today, I bet it
>> would look quite different from what it is now.
>>
>>>
>>> 4. this api can be considered as an extended 'blit' operation; other DRM
>>>    drivers (MGA, R128, VIA) already have ioctls for such operations, so
>>>    there is also a place in DRM for it
> 
> Note that I am convinced that using these custom IOCTLs within a
> "compositor" implementation is much easier and more uniform compared to
> using a v4l2 driver. It probably offers lower latency. But these are
> non-generic and are not a great fit for streaming purposes. The request API
> and probably explicit fences may mitigate this though. Meanwhile, there
> is some indication that, even though it is complex, there are already some
> people who think implementing a compositor combining V4L2 and DRM
> is feasible.
> 
> http://events.linuxfoundation.org/sites/events/files/slides/als2015_wayland_weston_v2.pdf
> 
>>
>> Added LMML to cc.
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-26 14:53         ` Nicolas Dufresne
@ 2017-04-26 16:52             ` Tobias Jakobi
  2017-04-26 16:52             ` Tobias Jakobi
  1 sibling, 0 replies; 34+ messages in thread
From: Tobias Jakobi @ 2017-04-26 16:52 UTC (permalink / raw)
  To: Nicolas Dufresne, Sakari Ailus, Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Sakari Ailus, linux-media

Hello again,


Nicolas Dufresne wrote:
> Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit :
>> Hi Marek,
>>
>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>> Hi Laurent,
>>>
>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>> Hi Marek,
>>>>
>>>> (CC'ing Sakari Ailus)
>>>>
>>>> Thank you for the patches.
>>>>
>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>> Dear all,
>>>>>
>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>> support for hardware modules, which can be used for processing image data
>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>> core":
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>> with fbdev API.
>>>>>
>>>>> Here is a bit more information what picture processors are:
>>>>>
>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>> offload CPU from processing graphics or video data. One example use of
>>>>> such modules is implementing video overlay, which usually requires color
>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>> target window size.
>>>>>
>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>> standard DRM properties, which describe the operation to be performed by
>>>>> respective hardware module. In typical case those properties are a source
>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>> supported by the given hardware. To perform an operation on image data,
>>>>> userspace provides a set of properties and their values for given pp
>>>>> object in a similar way as object and properties are provided for
>>>>> performing atomic page flip / mode setting.
>>>>>
>>>>> The proposed API consists of the 3 new ioctls:
>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>   processors,
>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>   processor,
>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>   property set.
>>>>>
>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>> properties to add support for more advanced picture processing (for example
>>>>> blending).
>>>>>
>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>> operations together with CRTC writeback and duplicates features that
>>>>> belong to the video plane. Compared with the IPP subsystem, the PP framework
>>>>> is smaller (778 vs 1807 lines) and allows driver simplification (the Exynos
>>>>> rotator driver is smaller by over 200 lines).
> 
> Just a side note, we have written code in GStreamer using the Exynos 4
> FIMC IPP driver. I don't know how many deployments, if any, still exist
> (Exynos 4 is relatively old now), but userspace exists for the
> FIMC driver.

I was searching for this code, but I didn't find anything. Are you sure
you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver
from the V4L2 subsystem?


With best wishes,
Tobias



> We use this for color transformation (from tiled to
> linear) and scaling. The FIMC driver is in fact quite stable in
> upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
> largely based on it and has received some maintenance to properly work
> in GStreamer. Unlike this DRM API, you can reuse the same userspace
> code across multiple platforms (which we do already). We have also
> integrated this driver in Chromium in the past (not upstream though).
> 
> I am well aware that the blitter driver has not got much attention
> though. But again, V4L2 offers a generic interface to userspace
> applications. Fixing this driver could enable some work like this one:
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=772766
> 
> This work-in-progress feature is a generic hardware-accelerated video
> mixer. It has been tested with the i.MX6 V4L2 M2M blitter driver (which I
> believe is in staging right now). Again, unlike exynos/drm, this
> code could be reused between platforms.
> 
> In general, the problem with the DRM approach is that it only targets
> displays. We often need to use these IP blocks for stream pre/post
> processing outside a "playback" use case.
> 
> What I'd like to see here instead is an approach that helps both worlds
> instead of trying to win control over the IP block. Renesas
> development seems to lead toward the right direction by creating
> drivers that can be both interfaced in DRM and V4L2. For IPP and
> GScaler on Exynos, this would be a greater benefit and finally the code
> could be shared, having a single place to fix when we find bugs.
> 
>>>>
>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>> Stupid question, why DRM ?
>>>
>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>
>>> 1. we want to replace existing Exynos IPP subsystem:
>>>  - it is used only in some internal/vendor trees, not in open-source
>>>  - we want it to have sane and potentially extensible userspace API
>>>  - but we don't want to lose its functionality
>>>
>>> 2. we want to have simple API for performing single image processing
>>> operation:
>>>  - typically it will be used by compositing window manager, this means that
>>>    some parameters of the processing might change on each vblank (like
>>>    destination rectangle for example). This api allows such change on each
>>>    operation without any additional cost. V4L2 requires reinitializing the
>>>    queues with the new configuration on such a change, which means that a
>>>    bunch of ioctls has to be called.
>>
>> What do you mean by re-initialising the queue? Format, buffers or something
>> else?
>>
>> If you need a larger buffer than what you have already allocated, you'll
>> need to re-allocate, V4L2 or not.
>>
>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>> implementing that and some work in videobuf2.
>>
>> Another thing is that V4L2 is very stream oriented. For most devices that's
>> fine as a lot of the parameters are not changeable during streaming,
>> especially if the pipeline is handled by multiple drivers. That said, for
>> devices that process data from memory to memory performing changes in the
>> media bus formats and pipeline configuration is not very efficient
>> currently, largely for the same reason.
>>
>> The request API that people have been working on for somewhat different use
>> cases isn't in mainline yet. It would allow more efficient per-request
>> configuration than what is currently possible, but it has turned out to be
>> far from trivial to implement.
>>
>>>  - validating processing parameters in the V4L2 API is really complicated,
>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>    set incrementally, so we have to either allow some impossible,
>>>    transitional configurations or complicate the configuration steps even
>>>    more (like calling some ioctls multiple times for both input and output).
>>>    In the end all parameters have to be validated again just before
>>>    performing the operation.
>>
>> You have to validate the parameters in any case. In an MC pipeline this takes
>> place when the stream is started.
>>
>>>
>>> 3. generic approach (to add it to DRM core) has been rejected:
>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> For GPUs I generally understand the reasoning: there's a very limited number
>> of users of this API --- primarily because it's not an application
>> interface.
>>
>> If you have a device that however falls under the scope of V4L2 (at least
>> API-wise), does this continue to be the case? Will there be only one or two
>> (or so) users for this API? Is it the case here?
>>
>> Using a device specific interface definitely has some benefits: there's no
>> need to think about how you would generalise the interface for other similar
>> devices. There's no need to consider backwards compatibility as it's not a
>> requirement. The drawback is that the applications that need to support
>> similar devices will bear the burden of having to support different APIs.
>>
>> I don't mean to say that you should ram whatever under V4L2 / MC
>> independently of how unworkable that might be, but there are also clear
>> advantages in using a standardised interface such as V4L2.
>>
>> V4L2 has a long history behind it and if it was designed today, I bet it
>> would look quite different from what it is now.
>>
>>>
>>> 4. this api can be considered as an extended 'blit' operation; other DRM
>>>    drivers (MGA, R128, VIA) already have ioctls for such operations, so
>>>    there is also a place in DRM for it
> 
> Note that I am convinced that using these custom IOCTLs within a
> "compositor" implementation is much easier and more uniform compared to
> using a v4l2 driver. It probably offers lower latency. But these are
> non-generic and are not a great fit for streaming purposes. The request API
> and probably explicit fences may mitigate this though. Meanwhile, there
> is some indication that, even though it is complex, there are already some
> people who think implementing a compositor combining V4L2 and DRM
> is feasible.
> 
> http://events.linuxfoundation.org/sites/events/files/slides/als2015_wayland_weston_v2.pdf
> 
>>
>> Added LMML to cc.
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
@ 2017-04-26 16:52             ` Tobias Jakobi
  0 siblings, 0 replies; 34+ messages in thread
From: Tobias Jakobi @ 2017-04-26 16:52 UTC (permalink / raw)
  To: Nicolas Dufresne, Sakari Ailus, Marek Szyprowski
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim,
	dri-devel, Tobias Jakobi, Laurent Pinchart, Sakari Ailus,
	linux-media

Hello again,


Nicolas Dufresne wrote:
> Le mercredi 26 avril 2017 à 01:21 +0300, Sakari Ailus a écrit :
>> Hi Marek,
>>
>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>> Hi Laurent,
>>>
>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>> Hi Marek,
>>>>
>>>> (CC'ing Sakari Ailus)
>>>>
>>>> Thank you for the patches.
>>>>
>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>> Dear all,
>>>>>
>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>> support for hardware modules, which can be used for processing image data
>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>> core":
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>> with fbdev API.
>>>>>
>>>>> Here is a bit more information what picture processors are:
>>>>>
>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>> offload CPU from processing graphics or video data. One example use of
>>>>> such modules is implementing video overlay, which usually requires color
>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>> target window size.
>>>>>
>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>> standard DRM properties, which describe the operation to be performed by
>>>>> respective hardware module. In typical case those properties are a source
>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>> supported by the given hardware. To perform an operation on image data,
>>>>> userspace provides a set of properties and their values for given pp
>>>>> object in a similar way as object and properties are provided for
>>>>> performing atomic page flip / mode setting.
>>>>>
>>>>> The proposed API consists of the 3 new ioctls:
>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>   processors,
>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>   processor,
>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>   property set.
>>>>>
>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>> properties to add support for more advanced picture processing (for example
>>>>> blending).
>>>>>
>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>> operations together with CRTC writeback and duplicating features, which
>>>>> belong to the video plane. Compared with the IPP subsystem, the PP framework is
>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>> rotator driver smaller by over 200 lines).
> 
> Just a side note, we have written code in GStreamer using the Exynos 4
> FIMC IPP driver. I don't know how many deployments, if any, still exist
> (Exynos 4 is relatively old now), but userspace for the FIMC driver
> does exist.
I was searching for this code, but I didn't find anything. Are you sure
you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver
from the V4L2 subsystem?


With best wishes,
Tobias



> We use this for color transformation (from tiled to
> linear) and scaling. The FIMC driver is in fact quite stable in
> upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
> largely based on it and has received some maintenance to properly work
> in GStreamer. Unlike this DRM API, you can reuse the same userspace
> code across multiple platforms (which we do already). We have also
> integrated this driver in Chromium in the past (not upstream though).
> 
> I am well aware that the blitter driver has not got much attention
> though. But again, V4L2 offers a generic interface to userspace
> application. Fixing this driver could enable some work like this one:
> 
> https://bugzilla.gnome.org/show_bug.cgi?id=772766
> 
> This work in progress feature is a generic hardware accelerated video
> mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I
> believe is in staging right now). Again, unlike the exynos/drm, this
> code could be reused between platforms.
> 
> In general, the problem with the DRM approach is that it only targets
> displays. We often need to use these IP blocks for stream pre/post
> processing outside a "playback" use case.
> 
> What I'd like to see here is an approach that helps both worlds
> instead of trying to win control over the IP block. Renesas
> development seems to lead toward the right direction by creating
> drivers that can be both interfaced in DRM and V4L2. For IPP and
> GScaler on Exynos, this would be a greater benefit and finally the code
> could be shared, having a single place to fix when we find bugs.
> 
>>>>
>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>> Stupid question, why DRM ?
>>>
>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>
>>> 1. we want to replace existing Exynos IPP subsystem:
>>>  - it is used only in some internal/vendor trees, not in open-source
>>>  - we want it to have sane and potentially extensible userspace API
>>>  - but we don't want to lose its functionality
>>>
>>> 2. we want to have simple API for performing single image processing
>>> operation:
>>>  - typically it will be used by compositing window manager, this means that
>>>    some parameters of the processing might change on each vblank (like
>>>    destination rectangle for example). This api allows such change on each
>>>    operation without any additional cost. V4L2 requires reinitializing the
>>>    queues with the new configuration on such a change, which means that a
>>>    bunch of ioctls have to be called.
>>
>> What do you mean by re-initialising the queue? Format, buffers or something
>> else?
>>
>> If you need a larger buffer than what you have already allocated, you'll
>> need to re-allocate, V4L2 or not.
>>
>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>> implementing that and some work in videobuf2.
>>
>> Another thing is that V4L2 is very stream oriented. For most devices that's
>> fine as a lot of the parameters are not changeable during streaming,
>> especially if the pipeline is handled by multiple drivers. That said, for
>> devices that process data from memory to memory performing changes in the
>> media bus formats and pipeline configuration is not very efficient
>> currently, largely for the same reason.
>>
>> The request API that people have been working on for slightly different use
>> cases isn't in mainline yet. It would allow more efficient per-request
>> configuration than what is currently possible, but it has turned out to be
>> far from trivial to implement.
>>
>>>  - validating processing parameters in the V4L2 API is really complicated,
>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>    set incrementally, so we have to either allow some impossible,
>>>    transitional configurations or complicate the configuration steps even
>>>    more (like calling some ioctls multiple times for both input and
>>>    output). In the end all parameters have to be validated again just before
>>>    performing the operation.
>>
>> You have to validate the parameters in any case. In a MC pipeline this takes
>> place when the stream is started.
>>
>>>
>>> 3. generic approach (to add it to DRM core) has been rejected:
>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> For GPUs I generally understand the reasoning: there's a very limited number
>> of users of this API --- primarily because it's not an application
>> interface.
>>
>> If you have a device that however falls under the scope of V4L2 (at least
>> API-wise), does this continue to be the case? Will there be only one or two
>> (or so) users for this API? Is it the case here?
>>
>> Using a device specific interface definitely has some benefits: there's no
>> need to think how you would generalise the interface for other similar
>> devices. There's no need to consider backwards compatibility as it's not a
>> requirement. The drawback is that the applications that need to support
>> similar devices will bear the burden of having to support different APIs.
>>
>> I don't mean to say that you should ram whatever under V4L2 / MC
>> independently of how unworkable that might be, but there are also clear
>> advantages in using a standardised interface such as V4L2.
>>
>> V4L2 has a long history behind it and if it was designed today, I bet it
>> would look quite different from what it is now.
>>
>>>
>>> 4. this api can be considered as extended 'blit' operation, other DRM
>>>    drivers (MGA, R128, VIA) already have ioctls for such operation, so
>>>    there is also place in DRM for it
> 
> Note that I am convinced that using these custom ioctls within a
> "compositor" implementation is much easier and more uniform than
> using a V4L2 driver. It probably offers lower latency. But these are
> non-generic and are not a great fit for streaming purposes. The request
> API and probably explicit fences may mitigate this though. Meanwhile,
> there is some indication that, even though it is complex, some people
> already think implementing a compositor combining V4L2 and DRM
> is feasible.
> 
> http://events.linuxfoundation.org/sites/events/files/slides/als2015_wayland_weston_v2.pdf
> 
>>
>> Added LMML to cc.
> 
> Thanks.
> 

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-26 16:52             ` Tobias Jakobi
  (?)
@ 2017-04-26 19:18             ` Nicolas Dufresne
  2017-04-26 19:31               ` Tobias Jakobi
  -1 siblings, 1 reply; 34+ messages in thread
From: Nicolas Dufresne @ 2017-04-26 19:18 UTC (permalink / raw)
  To: Tobias Jakobi, Sakari Ailus, Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Sakari Ailus,
	linux-media

On Wednesday, 26 April 2017 at 18:52 +0200, Tobias Jakobi wrote:
> Hello again,
> 
> 
> Nicolas Dufresne wrote:
> > On Wednesday, 26 April 2017 at 01:21 +0300, Sakari Ailus wrote:
> > > Hi Marek,
> > > 
> > > On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
> > > > Hi Laurent,
> > > > 
> > > > On 2017-04-20 12:25, Laurent Pinchart wrote:
> > > > > Hi Marek,
> > > > > 
> > > > > (CC'ing Sakari Ailus)
> > > > > 
> > > > > Thank you for the patches.
> > > > > 
> > > > > On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
> > > > > > Dear all,
> > > > > > 
> > > > > > This is an updated proposal for extending EXYNOS DRM API with generic
> > > > > > support for hardware modules, which can be used for processing image data
> > > > > > from the one memory buffer to another. Typical memory-to-memory operations
> > > > > > are: rotation, scaling, colour space conversion or mix of them. This is a
> > > > > > follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> > > > > > processors", which has been rejected as "not really needed in the DRM
> > > > > > core":
> > > > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> > > > > > 
> > > > > > In this proposal I moved all the code to Exynos DRM driver, so now this
> > > > > > will be specific only to Exynos DRM. I've also changed the name from
> > > > > > framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> > > > > > with fbdev API.
> > > > > > 
> > > > > > Here is a bit more information what picture processors are:
> > > > > > 
> > > > > > Embedded SoCs are known to have a number of hardware blocks, which perform
> > > > > > > such operations. They can be used in parallel to the main GPU module to
> > > > > > > offload CPU from processing graphics or video data. One example use of
> > > > > > such modules is implementing video overlay, which usually requires color
> > > > > > space conversion from NV12 (or similar) to RGB32 color space and scaling to
> > > > > > target window size.
> > > > > > 
> > > > > > The proposed API is heavily inspired by atomic KMS approach - it is also
> > > > > > based on DRM objects and their properties. A new DRM object is introduced:
> > > > > > picture processor (called pp for convenience). Such objects have a set of
> > > > > > > standard DRM properties, which describe the operation to be performed by
> > > > > > respective hardware module. In typical case those properties are a source
> > > > > > fb id and rectangle (x, y, width, height) and destination fb id and
> > > > > > rectangle. Optionally a rotation property can be also specified if
> > > > > > supported by the given hardware. To perform an operation on image data,
> > > > > > userspace provides a set of properties and their values for given fbproc
> > > > > > object in a similar way as object and properties are provided for
> > > > > > performing atomic page flip / mode setting.
> > > > > > 
> > > > > > The proposed API consists of the 3 new ioctls:
> > > > > > - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
> > > > > >   processors,
> > > > > > - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
> > > > > >   processor,
> > > > > > - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
> > > > > >   property set.
> > > > > > 
> > > > > > The proposed API is extensible. Drivers can attach their own, custom
> > > > > > properties to add support for more advanced picture processing (for example
> > > > > > blending).
> > > > > > 
> > > > > > This proposal aims to replace Exynos DRM IPP (Image Post Processing)
> > > > > > subsystem. IPP API is over-engineered in general, but not really extensible
> > > > > > on the other side. It is also buggy, with significant design flaws - the
> > > > > > biggest issue is the fact that the API covers memory-2-memory picture
> > > > > > operations together with CRTC writeback and duplicating features, which
> > > > > > > belong to the video plane. Compared with the IPP subsystem, the PP framework is
> > > > > > smaller (1807 vs 778 lines) and allows driver simplification (Exynos
> > > > > > rotator driver smaller by over 200 lines).
> > 
> > Just a side note, we have written code in GStreamer using the Exynos 4
> > FIMC IPP driver. I don't know how many deployments, if any, still exist
> > (Exynos 4 is relatively old now), but userspace for the FIMC driver
> > does exist.
> 
> I was searching for this code, but I didn't find anything. Are you sure
> you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver
> from the V4L2 subsystem?

Oops, I managed to be unclear. Having two drivers for the same IP isn't
helping. We wrote code around the FIMC driver on the V4L2 side. This
driver:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/platform/exynos4-is/fimc-m2m.c

And this code:

https://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/sys/v4l2/gstv4l2transform.c

Unless I have misread, the proposal here is to deprecate the V4L2 side
and improve the DRM side (which I argue against in my reply).

> 
> 
> With best wishes,
> Tobias
> 
> 
> 
> > We use this for color transformation (from tiled to
> > linear) and scaling. The FIMC driver is in fact quite stable in
> > upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
> > largely based on it and has received some maintenance to properly work
> > in GStreamer. Unlike this DRM API, you can reuse the same userspace
> > code across multiple platforms (which we do already). We have also
> > integrated this driver in Chromium in the past (not upstream though).
> > 
> > I am well aware that the blitter driver has not got much attention
> > though. But again, V4L2 offers a generic interface to userspace
> > application. Fixing this driver could enable some work like this one:
> > 
> > https://bugzilla.gnome.org/show_bug.cgi?id=772766
> > 
> > This work in progress feature is a generic hardware accelerated video
> > mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I
> > believe is in staging right now). Again, unlike the exynos/drm, this
> > code could be reused between platforms.
> > 
> > In general, the problem with the DRM approach is that it only targets
> > displays. We often need to use these IP blocks for stream pre/post
> > processing outside a "playback" use case.
> > 
> > What I'd like to see here is an approach that helps both worlds
> > instead of trying to win control over the IP block. Renesas
> > development seems to lead toward the right direction by creating
> > drivers that can be both interfaced in DRM and V4L2. For IPP and
> > GScaler on Exynos, this would be a greater benefit and finally the code
> > could be shared, having a single place to fix when we find bugs.
> > 
> > > > > 
> > > > > This seems to be the kind of hardware that is typically supported by V4L2.
> > > > > Stupid question, why DRM ?
> > > > 
> > > > Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
> > > > 
> > > > 1. we want to replace existing Exynos IPP subsystem:
> > > >  - it is used only in some internal/vendor trees, not in open-source
> > > >  - we want it to have sane and potentially extensible userspace API
> > > >  - but we don't want to lose its functionality
> > > > 
> > > > 2. we want to have simple API for performing single image processing
> > > > operation:
> > > >  - typically it will be used by compositing window manager, this means that
> > > >    some parameters of the processing might change on each vblank (like
> > > >    destination rectangle for example). This api allows such change on each
> > > >    operation without any additional cost. V4L2 requires reinitializing the
> > > >    queues with the new configuration on such a change, which means that a
> > > >    bunch of ioctls have to be called.
> > > 
> > > What do you mean by re-initialising the queue? Format, buffers or something
> > > else?
> > > 
> > > If you need a larger buffer than what you have already allocated, you'll
> > > need to re-allocate, V4L2 or not.
> > > 
> > > We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> > > implementing that and some work in videobuf2.
> > > 
> > > Another thing is that V4L2 is very stream oriented. For most devices that's
> > > fine as a lot of the parameters are not changeable during streaming,
> > > especially if the pipeline is handled by multiple drivers. That said, for
> > > devices that process data from memory to memory performing changes in the
> > > media bus formats and pipeline configuration is not very efficient
> > > currently, largely for the same reason.
> > > 
> > > The request API that people have been working on for slightly different use
> > > cases isn't in mainline yet. It would allow more efficient per-request
> > > configuration than what is currently possible, but it has turned out to be
> > > far from trivial to implement.
> > > 
> > > >  - validating processing parameters in the V4L2 API is really complicated,
> > > >    because the parameters (format, src&dest rectangles, rotation) are being
> > > >    set incrementally, so we have to either allow some impossible,
> > > >    transitional configurations or complicate the configuration steps even
> > > >    more (like calling some ioctls multiple times for both input and
> > > >    output). In the end all parameters have to be validated again just before
> > > >    performing the operation.
> > > 
> > > You have to validate the parameters in any case. In a MC pipeline this takes
> > > place when the stream is started.
> > > 
> > > > 
> > > > 3. generic approach (to add it to DRM core) has been rejected:
> > > > http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> > > 
> > > For GPUs I generally understand the reasoning: there's a very limited number
> > > of users of this API --- primarily because it's not an application
> > > interface.
> > > 
> > > If you have a device that however falls under the scope of V4L2 (at least
> > > API-wise), does this continue to be the case? Will there be only one or two
> > > (or so) users for this API? Is it the case here?
> > > 
> > > Using a device specific interface definitely has some benefits: there's no
> > > need to think how you would generalise the interface for other similar
> > > devices. There's no need to consider backwards compatibility as it's not a
> > > requirement. The drawback is that the applications that need to support
> > > similar devices will bear the burden of having to support different APIs.
> > > 
> > > I don't mean to say that you should ram whatever under V4L2 / MC
> > > independently of how unworkable that might be, but there are also clear
> > > advantages in using a standardised interface such as V4L2.
> > > 
> > > V4L2 has a long history behind it and if it was designed today, I bet it
> > > would look quite different from what it is now.
> > > 
> > > > 
> > > > 4. this api can be considered as extended 'blit' operation, other DRM
> > > >    drivers (MGA, R128, VIA) already have ioctls for such operation, so
> > > >    there is also place in DRM for it
> > 
> > Note that I am convinced that using these custom ioctls within a
> > "compositor" implementation is much easier and more uniform than
> > using a V4L2 driver. It probably offers lower latency. But these are
> > non-generic and are not a great fit for streaming purposes. The request
> > API and probably explicit fences may mitigate this though. Meanwhile,
> > there is some indication that, even though it is complex, some people
> > already think implementing a compositor combining V4L2 and DRM
> > is feasible.
> > 
> > http://events.linuxfoundation.org/sites/events/files/slides/als2015_wayland_weston_v2.pdf
> > 
> > > 
> > > Added LMML to cc.
> > 
> > Thanks.
> > 
> 
> 

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-26 19:18             ` Nicolas Dufresne
@ 2017-04-26 19:31               ` Tobias Jakobi
  2017-04-26 19:36                 ` Nicolas Dufresne
  0 siblings, 1 reply; 34+ messages in thread
From: Tobias Jakobi @ 2017-04-26 19:31 UTC (permalink / raw)
  To: Nicolas Dufresne, Sakari Ailus, Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Sakari Ailus,
	linux-media

Hey,

Nicolas Dufresne wrote:
> On Wednesday, 26 April 2017 at 18:52 +0200, Tobias Jakobi wrote:
>> Hello again,
>>
>>
>> Nicolas Dufresne wrote:
>>> On Wednesday, 26 April 2017 at 01:21 +0300, Sakari Ailus wrote:
>>>> Hi Marek,
>>>>
>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>> Hi Laurent,
>>>>>
>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>> Hi Marek,
>>>>>>
>>>>>> (CC'ing Sakari Ailus)
>>>>>>
>>>>>> Thank you for the patches.
>>>>>>
>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>> core":
>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>
>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>> with fbdev API.
>>>>>>>
>>>>>>> Here is a bit more information what picture processors are:
>>>>>>>
>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>> offload CPU from processing graphics or video data. One example use of
>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>> target window size.
>>>>>>>
>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>>> standard DRM properties, which describe the operation to be performed by
>>>>>>> respective hardware module. In typical case those properties are a source
>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>>> object in a similar way as object and properties are provided for
>>>>>>> performing atomic page flip / mode setting.
>>>>>>>
>>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>   processors,
>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>>   processor,
>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>>   property set.
>>>>>>>
>>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>>> properties to add support for more advanced picture processing (for example
>>>>>>> blending).
>>>>>>>
>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>>> operations together with CRTC writeback and duplicating features, which
>>>>>>> belong to the video plane. Compared with the IPP subsystem, the PP framework is
>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>>> rotator driver smaller by over 200 lines).
>>>
>>> Just a side note, we have written code in GStreamer using the Exynos 4
>>> FIMC IPP driver. I don't know how many deployments, if any, still exist
>>> (Exynos 4 is relatively old now), but userspace for the FIMC driver
>>> does exist.
>>
>> I was searching for this code, but I didn't find anything. Are you sure
>> you really mean the FIMC IPP in Exynos DRM, and not just the FIMC driver
>> from the V4L2 subsystem?
> 
> Oops, I managed to be unclear. Having two drivers for the same IP isn't
> helping. We wrote code around the FIMC driver on the V4L2 side. This
> driver:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/platform/exynos4-is/fimc-m2m.c
> 
> And this code:
> 
> https://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/sys/v4l2/gstv4l2transform.c
> 
> Unless I have misread, the proposal here is to deprecate the V4L2 side
> and improve the DRM side (which I argue against in my reply).
I'm pretty sure you have misread Marek's description of the patchset.
The picture processor API should replace/deprecate the IPP API that is
currently implemented in the Exynos DRM.

In particular this affects the following files:
- drivers/gpu/drm/exynos/exynos_drm_ipp.{c,h}
- drivers/gpu/drm/exynos/exynos_drm_fimc.{c,h}
- drivers/gpu/drm/exynos/exynos_drm_gsc.{c,h}
- drivers/gpu/drm/exynos/exynos_drm_rotator.{c,h}

I know of only two places where the IPP API is actually used: Tizen and my
experimental mpv backend.

With best wishes,
Tobias



> 
>>
>>
>> With best wishes,
>> Tobias
>>
>>
>>
>>> We use this for color transformation (from tiled to
>>> linear) and scaling. The FIMC driver is in fact quite stable in
>>> upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
>>> largely based on it and has received some maintenance to properly work
>>> in GStreamer. Unlike this DRM API, you can reuse the same userspace
>>> code across multiple platforms (which we do already). We have also
>>> integrated this driver in Chromium in the past (not upstream though).
>>>
>>> I am well aware that the blitter driver has not got much attention
>>> though. But again, V4L2 offers a generic interface to userspace
>>> application. Fixing this driver could enable some work like this one:
>>>
>>> https://bugzilla.gnome.org/show_bug.cgi?id=772766
>>>
>>> This work in progress feature is a generic hardware accelerated video
>>> mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I
>>> believe is in staging right now). Again, unlike the exynos/drm, this
>>> code could be reused between platforms.
>>>
>>> In general, the problem with the DRM approach is that it only targets
>>> displays. We often need to use these IP blocks for stream pre/post
>>> processing outside a "playback" use case.
>>>
>>> What I'd like to see here is an approach that helps both worlds
>>> instead of trying to win control over the IP block. Renesas
>>> development seems to lead toward the right direction by creating
>>> drivers that can be both interfaced in DRM and V4L2. For IPP and
>>> GScaler on Exynos, this would be a greater benefit and finally the code
>>> could be shared, having a single place to fix when we find bugs.
>>>
>>>>>>
>>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>>> Stupid question, why DRM ?
>>>>>
>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>>
>>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>>  - it is used only in some internal/vendor trees, not in open-source
>>>>>  - we want it to have sane and potentially extensible userspace API
>>>>>  - but we don't want to lose its functionality
>>>>>
>>>>> 2. we want to have simple API for performing single image processing
>>>>> operation:
>>>>>  - typically it will be used by compositing window manager, this means that
>>>>>    some parameters of the processing might change on each vblank (like
>>>>>    destination rectangle for example). This api allows such change on each
>>>>>    operation without any additional cost. V4L2 requires reinitializing the
>>>>>    queues with the new configuration on such a change, which means that a
>>>>>    bunch of ioctls have to be called.
>>>>
>>>> What do you mean by re-initialising the queue? Format, buffers or something
>>>> else?
>>>>
>>>> If you need a larger buffer than what you have already allocated, you'll
>>>> need to re-allocate, V4L2 or not.
>>>>
>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>>>> implementing that and some work in videobuf2.
>>>>
>>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>>> fine as a lot of the parameters are not changeable during streaming,
>>>> especially if the pipeline is handled by multiple drivers. That said, for
>>>> devices that process data from memory to memory performing changes in the
>>>> media bus formats and pipeline configuration is not very efficient
>>>> currently, largely for the same reason.
>>>>
>>>> The request API that people have been working on for slightly different use
>>>> cases isn't in mainline yet. It would allow more efficient per-request
>>>> configuration than what is currently possible, but it has turned out to be
>>>> far from trivial to implement.
>>>>
>>>>>  - validating processing parameters in the V4L2 API is really complicated,
>>>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>>>    set incrementally, so we have to either allow some impossible,
>>>>>    transitional configurations or complicate the configuration steps even
>>>>>    more (like calling some ioctls multiple times for both input and
>>>>>    output). In the end all parameters have to be validated again just before
>>>>>    performing the operation.
>>>>
>>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>>> place when the stream is started.
>>>>
>>>>>
>>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>
>>>> For GPUs I generally understand the reasoning: there's a very limited number
>>>> of users of this API --- primarily because it's not an application
>>>> interface.
>>>>
>>>> If you have a device that however falls under the scope of V4L2 (at least
>>>> API-wise), does this continue to be the case? Will there be only one or two
>>>> (or so) users for this API? Is it the case here?
>>>>
>>>> Using a device specific interface definitely has some benefits: there's no
>>>> need to think how you would generalise the interface for other similar
>>>> devices. There's no need to consider backwards compatibility as it's not a
>>>> requirement. The drawback is that the applications that need to support
>>>> similar devices will bear the burden of having to support different APIs.
>>>>
>>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>>> independently of how unworkable that might be, but there are also clear
>>>> advantages in using a standardised interface such as V4L2.
>>>>
>>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>>> would look quite different from what it is now.
>>>>
>>>>>
>>>>> 4. this api can be considered as extended 'blit' operation, other DRM
>>>>> drivers
>>>>>    (MGA, R128, VIA) already have ioctls for such operation, so there is also
>>>>>    place in DRM for it
>>>
>>> Note that I am convince that using these custom IOCTL within a
>>> "compositor" implementation is much easier and uniform compared to
>>> using a v4l2 driver. It probably offers lower latency. But these are
>>> non-generic and are not a great fit for streaming purpose. Request API
>>> and probably explicit fence may mitigate this though. Meanwhile, there
>>> is some indication that even though complex, there is already some
>>> people that do think implementing a compositor combining V4L2 and DRM
>>> is feasible.
>>>
>>> http://events.linuxfoundation.org/sites/events/files/slides/als2015_wayland_weston_v2.pdf
>>>
>>>>
>>>> Added LMML to cc.
>>>
>>> Thanks.
>>>
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-26 19:31               ` Tobias Jakobi
@ 2017-04-26 19:36                 ` Nicolas Dufresne
  0 siblings, 0 replies; 34+ messages in thread
From: Nicolas Dufresne @ 2017-04-26 19:36 UTC (permalink / raw)
  To: Tobias Jakobi, Sakari Ailus, Marek Szyprowski
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Sakari Ailus,
	linux-media

On Wednesday, 26 April 2017 at 21:31 +0200, Tobias Jakobi wrote:
> I'm pretty sure you have misread Marek's description of the patchset.
> The picture processor API should replaced/deprecate the IPP API that is
> currently implemented in the Exynos DRM.
> 
> In particular this affects the following files:
> - drivers/gpu/drm/exynos/exynos_drm_ipp.{c,h}
> - drivers/gpu/drm/exynos/exynos_drm_fimc.{c,h}
> - drivers/gpu/drm/exynos/exynos_drm_gsc.{c,h}
> - drivers/gpu/drm/exynos/exynos_drm_rotator.{c,h}
> 
> I know only two places where the IPP API is actually used. Tizen and my
> experimental mpv backend.

Sorry for the noise then.

regards,
Nicolas

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-25 22:21       ` Sakari Ailus
  2017-04-26 14:53         ` Nicolas Dufresne
@ 2017-04-27 13:52         ` Marek Szyprowski
  2017-05-10  1:24         ` Inki Dae
  2 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-27 13:52 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Tobias Jakobi,
	Sakari Ailus, linux-media

Hi Sakari,

On 2017-04-26 00:21, Sakari Ailus wrote:
> Hi Marek,
>
> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>> Hi Laurent,
>>
>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>> Hi Marek,
>>>
>>> (CC'ing Sakari Ailus)
>>>
>>> Thank you for the patches.
>>>
>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>> Dear all,
>>>>
>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>> support for hardware modules, which can be used for processing image data
>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>> processors", which has been rejected as "not really needed in the DRM
>>>> core":
>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>
>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>> with fbdev API.
>>>>
>>>> Here is a bit more information what picture processors are:
>>>>
>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>> such operations. They can be used in paralel to the main GPU module to
>>>> offload CPU from processing grapics or video data. One of example use of
>>>> such modules is implementing video overlay, which usually requires color
>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>> target window size.
>>>>
>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>> picture processor (called pp for convenience). Such objects have a set of
>>>> standard DRM properties, which describes the operation to be performed by
>>>> respective hardware module. In typical case those properties are a source
>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>> rectangle. Optionally a rotation property can be also specified if
>>>> supported by the given hardware. To perform an operation on image data,
>>>> userspace provides a set of properties and their values for given fbproc
>>>> object in a similar way as object and properties are provided for
>>>> performing atomic page flip / mode setting.
>>>>
>>>> The proposed API consists of the 3 new ioctls:
>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>    processors,
>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>    processor,
>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>    property set.
>>>>
>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>> properties to add support for more advanced picture processing (for example
>>>> blending).
>>>>
>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>> on the other side. It is also buggy, with significant design flaws - the
>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>> operations together with CRTC writeback and duplicating features, which
>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>> rotator driver smaller by over 200 lines).
>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>> Stupid question, why DRM ?
>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>
>> 1. we want to replace existing Exynos IPP subsystem:
>>   - it is used only in some internal/vendor trees, not in open-source
>>   - we want it to have sane and potentially extensible userspace API
>>   - but we don't want to loose its functionality
>>
>> 2. we want to have simple API for performing single image processing
>> operation:
>>   - typically it will be used by compositing window manager, this means that
>>     some parameters of the processing might change on each vblank (like
>>     destination rectangle for example). This api allows such change on each
>>     operation without any additional cost. V4L2 requires to reinitialize
>>     queues with new configuration on such change, what means that a bunch of
>>     ioctls has to be called.
> What do you mean by re-initialising the queue? Format, buffers or something
> else?

In the compositor use case, the parameters that change most frequently are
the position and/or size of the source and/or destination rectangle.
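
To make that concrete, here is a minimal sketch of a per-operation parameter
set that is checked as a whole before it is submitted. The names used
(struct pp_rect, struct pp_task, pp_task_valid) are hypothetical and are not
taken from the patches; the real constraints depend on the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration only, not part of the proposed UAPI. */
struct pp_rect {
	uint32_t x, y, w, h;
};

struct pp_task {
	uint32_t src_w, src_h;	/* source buffer size in pixels */
	uint32_t dst_w, dst_h;	/* destination buffer size in pixels */
	struct pp_rect src;	/* area read from the source buffer */
	struct pp_rect dst;	/* area written in the destination buffer */
	unsigned int rotation;	/* degrees: 0, 90, 180 or 270 */
};

/* A rectangle is usable if it is non-empty and lies inside the buffer. */
static bool rect_fits(const struct pp_rect *r, uint32_t w, uint32_t h)
{
	/* 64-bit sums so that x + w and y + h cannot wrap around. */
	return r->w && r->h &&
	       (uint64_t)r->x + r->w <= w &&
	       (uint64_t)r->y + r->h <= h;
}

/* Validate the complete description of one operation in a single step. */
bool pp_task_valid(const struct pp_task *t)
{
	if (t->rotation != 0 && t->rotation != 90 &&
	    t->rotation != 180 && t->rotation != 270)
		return false;

	return rect_fits(&t->src, t->src_w, t->src_h) &&
	       rect_fits(&t->dst, t->dst_w, t->dst_h);
}
```

Because the whole description is available at once, moving or resizing the
destination rectangle for the next frame only means filling in a new
struct pp_task; no intermediate, half-configured state has to be accepted.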

> If you need a larger buffer than what you have already allocated, you'll
> need to re-allocate, V4L2 or not.
>
> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> implementing that and some work in videobuf2.

Well, if we used V4L2, buffers would always come as dmabuf objects. There
is a hard limit on the number of buffers that can be imported into a v4l2/vb2
queue to get buffer ids. This also makes processing the buffers in the
compositor harder, because you would need to reinitialize the v4l2 queues
to get a new set of v4l2/vb2 buffer ids.

> Another thing is that V4L2 is very stream oriented. For most devices that's
> fine as a lot of the parameters are not changeable during streaming,
> especially if the pipeline is handled by multiple drivers. That said, for
> devices that process data from memory to memory performing changes in the
> media bus formats and pipeline configuration is not very efficient
> currently, largely for the same reason.
>
> The request API that people have been working for a bit different use cases
> isn't in mainline yet. It would allow more efficient per-request
> configuration than what is currently possible, but it has turned out to be
> far from trivial to implement.
>
>>   - validating processing parameters in V4l2 API is really complicated,
>>     because the parameters (format, src&dest rectangles, rotation) are being
>>     set incrementally, so we have to either allow some impossible,
>> transitional
>>     configurations or complicate the configuration steps even more (like
>>     calling some ioctls multiple times for both input and output). In the end
>>     all parameters have to be again validated just before performing the
>>     operation.
> You have to validate the parameters in any case. In a MC pipeline this takes
> place when the stream is started.

Well, in the case of V4L2 one would need to stop and restart 'streaming' on
both queues of the mem2mem device just to change some transformation
parameters...

>> 3. generic approach (to add it to DRM core) has been rejected:
>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> For GPUs I generally understand the reasoning: there's a very limited number
> of users of this API --- primarily because it's not an application
> interface.
>
> If you have a device that however falls under the scope of V4L2 (at least
> API-wise), does this continue to be the case? Will there be only one or two
> (or so) users for this API? Is it the case here?
>
> Using a device specific interface definitely has some benefits: there's no
> need to think how would you generalise the interface for other similar
> devices. There's no need to consider backwards compatibility as it's not a
> requirement. The drawback is that the applications that need to support
> similar devices will bear the burden of having to support different APIs.
>
> I don't mean to say that you should ram whatever under V4L2 / MC
> independently of how unworkable that might be, but there are also clear
> advantages in using a standardised interface such as V4L2.
>
> V4L2 has a long history behind it and if it was designed today, I bet it
> would look quite different from what it is now.

IMHO V4L2 has become both a bit over-engineered because of 'backwards
compatibility' and, on the other hand, too limited to support really complex
hardware (iirc none of the top mobile Android HW vendors use V4L2 for their
camera subsystems). This is however a completely separate topic.

>> 4. this api can be considered as extended 'blit' operation, other DRM
>> drivers
>>     (MGA, R128, VIA) already have ioctls for such operation, so there is also
>>     place in DRM for it
> Added LMML to cc.
>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-26 15:16             ` Tobias Jakobi
@ 2017-04-27 13:52               ` Marek Szyprowski
  -1 siblings, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-04-27 13:52 UTC (permalink / raw)
  To: Tobias Jakobi, Nicolas Dufresne, Sakari Ailus
  Cc: Laurent Pinchart, dri-devel, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, Sakari Ailus,
	linux-media

Hi Tobias and Nicolas,

On 2017-04-26 17:16, Tobias Jakobi wrote:
> Nicolas Dufresne wrote:
>> On Wednesday, 26 April 2017 at 01:21 +0300, Sakari Ailus wrote:
>>> Hi Marek,
>>>
>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>> Hi Laurent,
>>>>
>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>> Hi Marek,
>>>>>
>>>>> (CC'ing Sakari Ailus)
>>>>>
>>>>> Thank you for the patches.
>>>>>
>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>> support for hardware modules, which can be used for processing image data
>>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>> core":
>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>
>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>> with fbdev API.
>>>>>>
>>>>>> Here is a bit more information what picture processors are:
>>>>>>
>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>> such operations. They can be used in paralel to the main GPU module to
>>>>>> offload CPU from processing grapics or video data. One of example use of
>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>> target window size.
>>>>>>
>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>> standard DRM properties, which describes the operation to be performed by
>>>>>> respective hardware module. In typical case those properties are a source
>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>> object in a similar way as object and properties are provided for
>>>>>> performing atomic page flip / mode setting.
>>>>>>
>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>    processors,
>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>    processor,
>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>    property set.
>>>>>>
>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>> properties to add support for more advanced picture processing (for example
>>>>>> blending).
>>>>>>
>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>> operations together with CRTC writeback and duplicating features, which
>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>> rotator driver smaller by over 200 lines).
>> Just a side note, we have written code in GStreamer using the Exnynos 4
>> FIMC IPP driver. I don't know how many, if any, deployment still exist
>> (Exynos 4 is relatively old now), but there exist userspace for the
>> FIMC driver. We use this for color transformation (from tiled to
>> linear) and scaling. The FIMC driver is in fact quite stable in
>> upstream kernel today. The GScaler V4L2 M2M driver on Exynos 5 is
>> largely based on it and has received some maintenance to properly work
>> in GStreamer. unlike this DRM API, you can reuse the same userspace
>> code across multiple platforms (which we do already). We have also
>> integrated this driver in Chromium in the past (not upstream though).
>>
>> I am well aware that the blitter driver has not got much attention
>> though. But again, V4L2 offers a generic interface to userspace
>> application. Fixing this driver could enable some work like this one:
>>
>> https://bugzilla.gnome.org/show_bug.cgi?id=772766
>>
>> This work in progress feature is a generic hardware accelerated video
>> mixer. It has been tested with IMX.6 v4l2 m2m blitter driver (which I
>> believe is in staging right now). Again, unlike the exynos/drm, this
>> code could be reused between platforms.
>>
>> In general, the problem with the DRM approach is that it only targets
>> displays. We often need to use these IP block for stream pre/post
>> processing outside a "playback" use case.
> just a short note that this is not true. You can use all this
> functionality e.g. through render nodes, without needing to have a
> display attached to your system.

Yes. As an alternative I also plan to provide a generic V4L2-style mem2mem
device on top of this Exynos DRM interface. This will also help on the newer
Exynos SoCs, which don't even have GScaler. This way we can also easily get
rid of having two drivers for the GScaler hw. FIMC will probably stay in V4L2
because of its camera-related functions.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-04-25 22:21       ` Sakari Ailus
  2017-04-26 14:53         ` Nicolas Dufresne
  2017-04-27 13:52         ` Marek Szyprowski
@ 2017-05-10  1:24         ` Inki Dae
  2017-05-10  5:38           ` Tomasz Figa
  2 siblings, 1 reply; 34+ messages in thread
From: Inki Dae @ 2017-05-10  1:24 UTC (permalink / raw)
  To: Sakari Ailus, Marek Szyprowski
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim,
	dri-devel, Tobias Jakobi, Laurent Pinchart, Sakari Ailus,
	linux-media



On 2017-04-26 07:21, Sakari Ailus wrote:
> Hi Marek,
> 
> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>> Hi Laurent,
>>
>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>> Hi Marek,
>>>
>>> (CC'ing Sakari Ailus)
>>>
>>> Thank you for the patches.
>>>
>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>> Dear all,
>>>>
>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>> support for hardware modules, which can be used for processing image data
>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>> processors", which has been rejected as "not really needed in the DRM
>>>> core":
>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>
>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>> with fbdev API.
>>>>
>>>> Here is a bit more information what picture processors are:
>>>>
>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>> such operations. They can be used in paralel to the main GPU module to
>>>> offload CPU from processing grapics or video data. One of example use of
>>>> such modules is implementing video overlay, which usually requires color
>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>> target window size.
>>>>
>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>> picture processor (called pp for convenience). Such objects have a set of
>>>> standard DRM properties, which describes the operation to be performed by
>>>> respective hardware module. In typical case those properties are a source
>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>> rectangle. Optionally a rotation property can be also specified if
>>>> supported by the given hardware. To perform an operation on image data,
>>>> userspace provides a set of properties and their values for given fbproc
>>>> object in a similar way as object and properties are provided for
>>>> performing atomic page flip / mode setting.
>>>>
>>>> The proposed API consists of the 3 new ioctls:
>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>   processors,
>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>   processor,
>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>   property set.
>>>>
>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>> properties to add support for more advanced picture processing (for example
>>>> blending).
>>>>
>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>> on the other side. It is also buggy, with significant design flaws - the
>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>> operations together with CRTC writeback and duplicating features, which
>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>> rotator driver smaller by over 200 lines).
>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>> Stupid question, why DRM ?
>>
>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>
>> 1. we want to replace existing Exynos IPP subsystem:
>>  - it is used only in some internal/vendor trees, not in open-source
>>  - we want it to have sane and potentially extensible userspace API
>>  - but we don't want to loose its functionality
>>
>> 2. we want to have simple API for performing single image processing
>> operation:
>>  - typically it will be used by compositing window manager, this means that
>>    some parameters of the processing might change on each vblank (like
>>    destination rectangle for example). This api allows such change on each
>>    operation without any additional cost. V4L2 requires to reinitialize
>>    queues with new configuration on such change, what means that a bunch of
>>    ioctls has to be called.
> 
> What do you mean by re-initialising the queue? Format, buffers or something
> else?
> 
> If you need a larger buffer than what you have already allocated, you'll
> need to re-allocate, V4L2 or not.
> 
> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> implementing that and some work in videobuf2.
> 
> Another thing is that V4L2 is very stream oriented. For most devices that's
> fine as a lot of the parameters are not changeable during streaming,
> especially if the pipeline is handled by multiple drivers. That said, for
> devices that process data from memory to memory performing changes in the
> media bus formats and pipeline configuration is not very efficient
> currently, largely for the same reason.
> 
> The request API that people have been working for a bit different use cases
> isn't in mainline yet. It would allow more efficient per-request
> configuration than what is currently possible, but it has turned out to be
> far from trivial to implement.
> 
>>  - validating processing parameters in V4l2 API is really complicated,
>>    because the parameters (format, src&dest rectangles, rotation) are being
>>    set incrementally, so we have to either allow some impossible,
>> transitional
>>    configurations or complicate the configuration steps even more (like
>>    calling some ioctls multiple times for both input and output). In the end
>>    all parameters have to be again validated just before performing the
>>    operation.
> 
> You have to validate the parameters in any case. In a MC pipeline this takes
> place when the stream is started.
> 
>>
>> 3. generic approach (to add it to DRM core) has been rejected:
>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> 
> For GPUs I generally understand the reasoning: there's a very limited number
> of users of this API --- primarily because it's not an application
> interface.
> 
> If you have a device that however falls under the scope of V4L2 (at least
> API-wise), does this continue to be the case? Will there be only one or two
> (or so) users for this API? Is it the case here?
> 
> Using a device specific interface definitely has some benefits: there's no
> need to think how would you generalise the interface for other similar
> devices. There's no need to consider backwards compatibility as it's not a
> requirement. The drawback is that the applications that need to support
> similar devices will bear the burden of having to support different APIs.
> 
> I don't mean to say that you should ram whatever under V4L2 / MC
> independently of how unworkable that might be, but there are also clear
> advantages in using a standardised interface such as V4L2.
> 
> V4L2 has a long history behind it and if it was designed today, I bet it
> would look quite different from what it is now.

That's true. V4L2 has a definite benefit in that it provides a standard Linux ABI, which DRM does not as of now.

However, I think that is the only benefit we would get from V4L2. Using V4L2 complicates the platform software stack: to display an image on the screen with scaling or color space conversion, we have to open both a video device node and a card device node, and we also need to export a DMA buffer from one side and import it on the other side using DMABUF.

It may not be directly related, but V4L2 also has a performance problem: every QBUF/DQBUF request maps and unmaps the DMA buffer, as you already know. :)
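
As a rough illustration of that mapping cost, here is a toy Python model. It is not kernel or V4L2 code, and every name in it (DmaBuffer, per_request_mapping, cached_mapping) is made up for this sketch. It only counts how often a buffer would be mapped and unmapped if that happens on every queue/dequeue pair, compared with keeping one mapping for the whole session.

```python
class DmaBuffer:
    """Toy stand-in for a DMA buffer; it only counts map/unmap calls."""

    def __init__(self):
        self.maps = 0
        self.unmaps = 0
        self.mapped = False

    def map(self):
        # Count a mapping only when the buffer is not mapped yet.
        if not self.mapped:
            self.mapped = True
            self.maps += 1

    def unmap(self):
        # Count an unmapping only when the buffer is currently mapped.
        if self.mapped:
            self.mapped = False
            self.unmaps += 1


def per_request_mapping(buf, frames):
    """Map at queue time and unmap at dequeue time, for every frame."""
    for _ in range(frames):
        buf.map()    # hypothetical queue step
        buf.unmap()  # hypothetical dequeue step


def cached_mapping(buf, frames):
    """Map once, reuse the mapping for all frames, unmap at teardown."""
    buf.map()
    for _ in range(frames):
        pass  # each frame reuses the existing mapping
    buf.unmap()


a, b = DmaBuffer(), DmaBuffer()
per_request_mapping(a, frames=60)
cached_mapping(b, frames=60)
print(a.maps, a.unmaps)  # 60 60
print(b.maps, b.unmaps)  # 1 1
```

The model says nothing about how expensive a single mapping is on real hardware; it only shows that the number of map/unmap operations grows with the frame count in the first case and stays constant in the second.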

In addition, recent display subsystems on ARM SoCs tend to include pre/post-processing hardware in the display controller: OMAP, Exynos8895 and MSM, as far as I know.


Thanks,
Inki Dae

> 
>>
>> 4. this api can be considered as extended 'blit' operation, other DRM
>>    drivers (MGA, R128, VIA) already have ioctls for such operation, so
>>    there is also place in DRM for it
> 
> Added LMML to cc.
> 


* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  1:24         ` Inki Dae
@ 2017-05-10  5:38           ` Tomasz Figa
  2017-05-10  6:27             ` Inki Dae
  0 siblings, 1 reply; 34+ messages in thread
From: Tomasz Figa @ 2017-05-10  5:38 UTC (permalink / raw)
  To: linux-media
  Cc: Sakari Ailus, Marek Szyprowski, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Laurent Pinchart, Sakari Ailus, Inki Dae

Hi Everyone,

On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>
>
> On 2017-04-26 07:21, Sakari Ailus wrote:
>> Hi Marek,
>>
>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>> Hi Laurent,
>>>
>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>> Hi Marek,
>>>>
>>>> (CC'ing Sakari Ailus)
>>>>
>>>> Thank you for the patches.
>>>>
>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>> Dear all,
>>>>>
>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>> support for hardware modules, which can be used for processing image data
>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>> core":
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>> with fbdev API.
>>>>>
>>>>> Here is a bit more information what picture processors are:
>>>>>
>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>> offload CPU from processing graphics or video data. One example use of
>>>>> such modules is implementing video overlay, which usually requires color
>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>> target window size.
>>>>>
>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>> standard DRM properties, which describes the operation to be performed by
>>>>> respective hardware module. In typical case those properties are a source
>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>> supported by the given hardware. To perform an operation on image data,
>>>>> userspace provides a set of properties and their values for given fbproc
>>>>> object in a similar way as object and properties are provided for
>>>>> performing atomic page flip / mode setting.
>>>>>
>>>>> The proposed API consists of the 3 new ioctls:
>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>   processors,
>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>   processor,
>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>   property set.
>>>>>
>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>> properties to add support for more advanced picture processing (for example
>>>>> blending).
>>>>>
>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>> operations together with CRTC writeback and duplicating features, which
>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>> rotator driver smaller by over 200 lines).
>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>> Stupid question, why DRM ?
>>>
>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>
>>> 1. we want to replace existing Exynos IPP subsystem:
>>>  - it is used only in some internal/vendor trees, not in open-source
>>>  - we want it to have sane and potentially extensible userspace API
>>>  - but we don't want to lose its functionality
>>>
>>> 2. we want to have simple API for performing single image processing
>>> operation:
>>>  - typically it will be used by compositing window manager, this means that
>>>    some parameters of the processing might change on each vblank (like
>>>    destination rectangle for example). This api allows such change on each
>>>    operation without any additional cost. V4L2 requires to reinitialize
>>>    queues with new configuration on such change, what means that a bunch of
>>>    ioctls has to be called.
>>
>> What do you mean by re-initialising the queue? Format, buffers or something
>> else?
>>
>> If you need a larger buffer than what you have already allocated, you'll
>> need to re-allocate, V4L2 or not.
>>
>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>> implementing that and some work in videobuf2.
>>
>> Another thing is that V4L2 is very stream oriented. For most devices that's
>> fine as a lot of the parameters are not changeable during streaming,
>> especially if the pipeline is handled by multiple drivers. That said, for
>> devices that process data from memory to memory performing changes in the
>> media bus formats and pipeline configuration is not very efficient
>> currently, largely for the same reason.
>>
>> The request API that people have been working for a bit different use cases
>> isn't in mainline yet. It would allow more efficient per-request
>> configuration than what is currently possible, but it has turned out to be
>> far from trivial to implement.
>>
>>>  - validating processing parameters in V4l2 API is really complicated,
>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>    set incrementally, so we have to either allow some impossible,
>>>    transitional configurations or complicate the configuration steps even
>>>    more (like calling some ioctls multiple times for both input and
>>>    output). In the end all parameters have to be again validated just
>>>    before performing the operation.
>>
>> You have to validate the parameters in any case. In a MC pipeline this takes
>> place when the stream is started.
>>
>>>
>>> 3. generic approach (to add it to DRM core) has been rejected:
>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>
>> For GPUs I generally understand the reasoning: there's a very limited number
>> of users of this API --- primarily because it's not an application
>> interface.
>>
>> If you have a device that however falls under the scope of V4L2 (at least
>> API-wise), does this continue to be the case? Will there be only one or two
>> (or so) users for this API? Is it the case here?
>>
>> Using a device specific interface definitely has some benefits: there's no
>> need to think how would you generalise the interface for other similar
>> devices. There's no need to consider backwards compatibility as it's not a
>> requirement. The drawback is that the applications that need to support
>> similar devices will bear the burden of having to support different APIs.
>>
>> I don't mean to say that you should ram whatever under V4L2 / MC
>> independently of how unworkable that might be, but there are also clear
>> advantages in using a standardised interface such as V4L2.
>>
>> V4L2 has a long history behind it and if it was designed today, I bet it
>> would look quite different from what it is now.
>
> That's true. V4L2 has a definite benefit in that it provides a standard Linux ABI, which DRM does not as of now.
>
> However, I think that is the only benefit we would get from V4L2. Using V4L2 complicates the platform software stack: to display an image on the screen with scaling or color space conversion, we have to open both a video device node and a card device node, and we also need to export a DMA buffer from one side and import it on the other side using DMABUF.
>
> It may not be directly related, but V4L2 also has a performance problem: every QBUF/DQBUF request maps and unmaps the DMA buffer, as you already know. :)
>
> In addition, recent display subsystems on ARM SoCs tend to include pre/post-processing hardware in the display controller: OMAP, Exynos8895 and MSM, as far as I know.
>

I agree with many of the arguments given by Inki above and earlier by
Marek. However, they apply to the existing V4L2 implementation,
not to V4L2 as an idea in general, and I believe a comparison against a
completely new API that doesn't even exist in the kernel tree and
userspace yet (only as patches on the list) is not fair.

I strongly (if that's of any value) stand on Sakari's side and also
agree with the DRM maintainers. V4L2 is already there, provides a general
interface for userspace and already supports the kind of devices
Marek mentions. Sure, it might have several issues, but please give me
an example of a subsystem/interface/code that doesn't have any.
Instead of taking the easy (in the short term) path, with a bit more
effort we can get something that in the long run should end up much
better.

Best regards,
Tomasz

>
> Thanks,
> Inki Dae
>
>>
>>>
>>> 4. this api can be considered as extended 'blit' operation, other DRM
>>>    drivers (MGA, R128, VIA) already have ioctls for such operation, so
>>>    there is also place in DRM for it
>>
>> Added LMML to cc.
>>


* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  5:38           ` Tomasz Figa
@ 2017-05-10  6:27             ` Inki Dae
  2017-05-10  6:38                 ` Tomasz Figa
  2017-05-10  7:55               ` Daniel Vetter
  0 siblings, 2 replies; 34+ messages in thread
From: Inki Dae @ 2017-05-10  6:27 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: Sakari Ailus, Marek Szyprowski, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Laurent Pinchart, Sakari Ailus

Hi Tomasz,

On 2017-05-10 14:38, Tomasz Figa wrote:
> Hi Everyone,
> 
> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>>
>>
>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>> Hi Marek,
>>>
>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>> Hi Laurent,
>>>>
>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>> Hi Marek,
>>>>>
>>>>> (CC'ing Sakari Ailus)
>>>>>
>>>>> Thank you for the patches.
>>>>>
>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>> Dear all,
>>>>>>
>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>> support for hardware modules, which can be used for processing image data
>>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>> core":
>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>
>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>> with fbdev API.
>>>>>>
>>>>>> Here is a bit more information what picture processors are:
>>>>>>
>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>> offload CPU from processing graphics or video data. One example use of
>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>> target window size.
>>>>>>
>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>> standard DRM properties, which describes the operation to be performed by
>>>>>> respective hardware module. In typical case those properties are a source
>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>> object in a similar way as object and properties are provided for
>>>>>> performing atomic page flip / mode setting.
>>>>>>
>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>   processors,
>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>   processor,
>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>   property set.
>>>>>>
>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>> properties to add support for more advanced picture processing (for example
>>>>>> blending).
>>>>>>
>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>> operations together with CRTC writeback and duplicating features, which
>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>> rotator driver smaller by over 200 lines).
>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>> Stupid question, why DRM ?
>>>>
>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>
>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>  - it is used only in some internal/vendor trees, not in open-source
>>>>  - we want it to have sane and potentially extensible userspace API
>>>>  - but we don't want to lose its functionality
>>>>
>>>> 2. we want to have simple API for performing single image processing
>>>> operation:
>>>>  - typically it will be used by compositing window manager, this means that
>>>>    some parameters of the processing might change on each vblank (like
>>>>    destination rectangle for example). This api allows such change on each
>>>>    operation without any additional cost. V4L2 requires to reinitialize
>>>>    queues with new configuration on such change, what means that a bunch of
>>>>    ioctls has to be called.
>>>
>>> What do you mean by re-initialising the queue? Format, buffers or something
>>> else?
>>>
>>> If you need a larger buffer than what you have already allocated, you'll
>>> need to re-allocate, V4L2 or not.
>>>
>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>>> implementing that and some work in videobuf2.
>>>
>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>> fine as a lot of the parameters are not changeable during streaming,
>>> especially if the pipeline is handled by multiple drivers. That said, for
>>> devices that process data from memory to memory performing changes in the
>>> media bus formats and pipeline configuration is not very efficient
>>> currently, largely for the same reason.
>>>
>>> The request API that people have been working for a bit different use cases
>>> isn't in mainline yet. It would allow more efficient per-request
>>> configuration than what is currently possible, but it has turned out to be
>>> far from trivial to implement.
>>>
>>>>  - validating processing parameters in V4l2 API is really complicated,
>>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>>    set incrementally, so we have to either allow some impossible,
>>>>    transitional configurations or complicate the configuration steps even
>>>>    more (like calling some ioctls multiple times for both input and
>>>>    output). In the end all parameters have to be again validated just
>>>>    before performing the operation.
>>>
>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>> place when the stream is started.
>>>
>>>>
>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>
>>> For GPUs I generally understand the reasoning: there's a very limited number
>>> of users of this API --- primarily because it's not an application
>>> interface.
>>>
>>> If you have a device that however falls under the scope of V4L2 (at least
>>> API-wise), does this continue to be the case? Will there be only one or two
>>> (or so) users for this API? Is it the case here?
>>>
>>> Using a device specific interface definitely has some benefits: there's no
>>> need to think how would you generalise the interface for other similar
>>> devices. There's no need to consider backwards compatibility as it's not a
>>> requirement. The drawback is that the applications that need to support
>>> similar devices will bear the burden of having to support different APIs.
>>>
>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>> independently of how unworkable that might be, but there are also clear
>>> advantages in using a standardised interface such as V4L2.
>>>
>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>> would look quite different from what it is now.
>>
>> That's true. V4L2 has a definite benefit in that it provides a standard Linux ABI, which DRM does not as of now.
>>
>> However, I think that is the only benefit we would get from V4L2. Using V4L2 complicates the platform software stack: to display an image on the screen with scaling or color space conversion, we have to open both a video device node and a card device node, and we also need to export a DMA buffer from one side and import it on the other side using DMABUF.
>>
>> It may not be directly related, but V4L2 also has a performance problem: every QBUF/DQBUF request maps and unmaps the DMA buffer, as you already know. :)
>>
>> In addition, recent display subsystems on ARM SoCs tend to include pre/post-processing hardware in the display controller: OMAP, Exynos8895 and MSM, as far as I know.
>>
> 
> I agree with many of the arguments given by Inki above and earlier by
> Marek. However, they apply to the existing V4L2 implementation,
> not to V4L2 as an idea in general, and I believe a comparison against a
> completely new API that doesn't even exist in the kernel tree and
> userspace yet (only as patches on the list) is not fair.

Below is the user space code that uses the Exynos DRM post-processor driver (the IPP driver):
https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen

Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
P.S. Other DRM drivers in mainline already have the same or a similar API.

We will also publish the user space code that uses the new API later.


Thanks,
Inki Dae

> 
> I strongly (if that's of any value) stand on Sakari's side and also
> agree with the DRM maintainers. V4L2 is already there, provides a general
> interface for userspace and already supports the kind of devices
> Marek mentions. Sure, it might have several issues, but please give me
> an example of a subsystem/interface/code that doesn't have any.
> Instead of taking the easy (in the short term) path, with a bit more
> effort we can get something that in the long run should end up much
> better.
> 
> Best regards,
> Tomasz
> 
>>
>> Thanks,
>> Inki Dae
>>
>>>
>>>>
>>>> 4. this api can be considered as extended 'blit' operation, other DRM
>>>>    drivers (MGA, R128, VIA) already have ioctls for such operation, so
>>>>    there is also place in DRM for it
>>>
>>> Added LMML to cc.
>>>


* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  6:27             ` Inki Dae
@ 2017-05-10  6:38                 ` Tomasz Figa
  2017-05-10  7:55               ` Daniel Vetter
  1 sibling, 0 replies; 34+ messages in thread
From: Tomasz Figa @ 2017-05-10  6:38 UTC (permalink / raw)
  To: Inki Dae
  Cc: linux-media, Sakari Ailus, Marek Szyprowski, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Laurent Pinchart, Sakari Ailus

On Wed, May 10, 2017 at 2:27 PM, Inki Dae <inki.dae@samsung.com> wrote:
> Hi Tomasz,
>
> On 2017-05-10 14:38, Tomasz Figa wrote:
>> Hi Everyone,
>>
>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>>>
>>>
>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>>> Hi Marek,
>>>>
>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>> Hi Laurent,
>>>>>
>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>> Hi Marek,
>>>>>>
>>>>>> (CC'ing Sakari Ailus)
>>>>>>
>>>>>> Thank you for the patches.
>>>>>>
>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>> core":
>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>
>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>> with fbdev API.
>>>>>>>
>>>>>>> Here is a bit more information what picture processors are:
>>>>>>>
>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>> offload CPU from processing graphics or video data. One example use of
>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>> target window size.
>>>>>>>
>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>>> standard DRM properties, which describes the operation to be performed by
>>>>>>> respective hardware module. In typical case those properties are a source
>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>>> object in a similar way as object and properties are provided for
>>>>>>> performing atomic page flip / mode setting.
>>>>>>>
>>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>   processors,
>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>>   processor,
>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>>   property set.
>>>>>>>
>>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>>> properties to add support for more advanced picture processing (for example
>>>>>>> blending).
>>>>>>>
>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>>> operations together with CRTC writeback and duplicating features, which
>>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>>> rotator driver smaller by over 200 lines).
>>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>>> Stupid question, why DRM ?
>>>>>
>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>>
>>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>>  - it is used only in some internal/vendor trees, not in open-source
>>>>>  - we want it to have sane and potentially extensible userspace API
>>>>>  - but we don't want to lose its functionality
>>>>>
>>>>> 2. we want to have simple API for performing single image processing
>>>>> operation:
>>>>>  - typically it will be used by compositing window manager, this means that
>>>>>    some parameters of the processing might change on each vblank (like
>>>>>    destination rectangle for example). This api allows such change on each
>>>>>    operation without any additional cost. V4L2 requires to reinitialize
>>>>>    queues with new configuration on such change, what means that a bunch of
>>>>>    ioctls has to be called.
>>>>
>>>> What do you mean by re-initialising the queue? Format, buffers or something
>>>> else?
>>>>
>>>> If you need a larger buffer than what you have already allocated, you'll
>>>> need to re-allocate, V4L2 or not.
>>>>
>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>>>> implementing that and some work in videobuf2.
>>>>
>>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>>> fine as a lot of the parameters are not changeable during streaming,
>>>> especially if the pipeline is handled by multiple drivers. That said, for
>>>> devices that process data from memory to memory performing changes in the
>>>> media bus formats and pipeline configuration is not very efficient
>>>> currently, largely for the same reason.
>>>>
>>>> The request API that people have been working for a bit different use cases
>>>> isn't in mainline yet. It would allow more efficient per-request
>>>> configuration than what is currently possible, but it has turned out to be
>>>> far from trivial to implement.
>>>>
>>>>>  - validating processing parameters in V4l2 API is really complicated,
>>>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>>>    set incrementally, so we have to either allow some impossible,
>>>>>    transitional configurations or complicate the configuration steps even
>>>>>    more (like calling some ioctls multiple times for both input and
>>>>>    output). In the end all parameters have to be again validated just
>>>>>    before performing the operation.
>>>>
>>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>>> place when the stream is started.
>>>>
>>>>>
>>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>
>>>> For GPUs I generally understand the reasoning: there's a very limited number
>>>> of users of this API --- primarily because it's not an application
>>>> interface.
>>>>
>>>> If you have a device that however falls under the scope of V4L2 (at least
>>>> API-wise), does this continue to be the case? Will there be only one or two
>>>> (or so) users for this API? Is it the case here?
>>>>
>>>> Using a device specific interface definitely has some benefits: there's no
>>>> need to think how would you generalise the interface for other similar
>>>> devices. There's no need to consider backwards compatibility as it's not a
>>>> requirement. The drawback is that the applications that need to support
>>>> similar devices will bear the burden of having to support different APIs.
>>>>
>>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>>> independently of how unworkable that might be, but there are also clear
>>>> advantages in using a standardised interface such as V4L2.
>>>>
>>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>>> would look quite different from what it is now.
>>>
>>> That's true. V4L2 has a definite benefit in that it provides a standard Linux ABI, which DRM does not as of now.
>>>
>>> However, I think that is the only benefit we would get from V4L2. Using V4L2 complicates the platform software stack: to display an image on the screen with scaling or color space conversion, we have to open both a video device node and a card device node, and we also need to export a DMA buffer from one side and import it on the other side using DMABUF.
>>>
>>> It may not be directly related, but V4L2 also has a performance problem: every QBUF/DQBUF request maps and unmaps the DMA buffer, as you already know. :)
>>>
>>> In addition, recent display subsystems on ARM SoCs tend to include pre/post-processing hardware in the display controller: OMAP, Exynos8895 and MSM, as far as I know.
>>>
>>
>> I agree with many of the arguments given by Inki above and earlier by
>> Marek. However, they apply to the already existing V4L2 implementation,
>> not to V4L2 as an idea in general, and I believe a comparison against a
>> completely new API that doesn't even exist in the kernel tree and
>> userspace yet (only in terms of patches on the list) is not fair.
>
> Below is a user space component that uses the Exynos DRM post-processor driver (the IPP driver).
> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen
>

Right, but that API is really Exynos-specific, while V4L2 was designed
from the start as a generic one and is maintained as such.

> Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
> PS: other DRM drivers in mainline already have the same or a similar API.

This kind of contradicts the response Marek received from the DRM
community about his proposal. Which drivers in particular do you have in
mind?

>
> We will also open up the user space that uses the new API later.

There is also already user space which uses V4L2 for this, as well as V4L2
drivers for hardware similar to that targeted by Marek's proposal,
including GStreamer support and the iMX6 devices that Nicolas mentioned
before.

Best regards,
Tomasz

>
>
> Thanks,
> Inki Dae
>
>>
>> I strongly (if that's of any value) stand on Sakari's side and also
>> agree with the DRM maintainers. V4L2 is already there, provides a general
>> interface for userspace and already supports the kind of devices
>> Marek mentions. Sure, it might have several issues, but please give me
>> an example of a subsystem/interface/code that doesn't have any.
>> Instead of taking the easy (in the short term) path, with a bit more
>> effort we can get something that in the long run should end up much
>> better.
>>
>> Best regards,
>> Tomasz
>>
>>>
>>> Thanks,
>>> Inki Dae
>>>
>>>>
>>>>>
>>>>> 4. this api can be considered as extended 'blit' operation, other DRM
>>>>> drivers
>>>>>    (MGA, R128, VIA) already have ioctls for such operation, so there is also
>>>>>    place in DRM for it
>>>>
>>>> Added LMML to cc.
>>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
@ 2017-05-10  6:38                 ` Tomasz Figa
  0 siblings, 0 replies; 34+ messages in thread
From: Tomasz Figa @ 2017-05-10  6:38 UTC (permalink / raw)
  To: Inki Dae
  Cc: linux-samsung-soc, Bartlomiej Zolnierkiewicz, Seung-Woo Kim,
	dri-devel, Tobias Jakobi, Sakari Ailus, Laurent Pinchart,
	Sakari Ailus, linux-media, Marek Szyprowski

On Wed, May 10, 2017 at 2:27 PM, Inki Dae <inki.dae@samsung.com> wrote:
> Hi Tomasz,
>
> On 2017-05-10 14:38, Tomasz Figa wrote:
>> Hi Everyone,
>>
>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>>>
>>>
>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>>> Hi Marek,
>>>>
>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>> Hi Laurent,
>>>>>
>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>> Hi Marek,
>>>>>>
>>>>>> (CC'ing Sakari Ailus)
>>>>>>
>>>>>> Thank you for the patches.
>>>>>>
>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>> core":
>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>
>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>> with fbdev API.
>>>>>>>
>>>>>>> Here is a bit more information what picture processors are:
>>>>>>>
>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>> offload CPU from processing graphics or video data. One example use of
>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>> target window size.
>>>>>>>
>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>>> standard DRM properties, which describes the operation to be performed by
>>>>>>> respective hardware module. In typical case those properties are a source
>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>>> object in a similar way as object and properties are provided for
>>>>>>> performing atomic page flip / mode setting.
>>>>>>>
>>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>   processors,
>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>>   processor,
>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>>   property set.
>>>>>>>
>>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>>> properties to add support for more advanced picture processing (for example
>>>>>>> blending).
>>>>>>>
>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>>> operations together with CRTC writeback and duplicating features, which
>>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>>> rotator driver smaller by over 200 lines).
>>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>>> Stupid question, why DRM ?
>>>>>
>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>>
>>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>>  - it is used only in some internal/vendor trees, not in open-source
>>>>>  - we want it to have sane and potentially extensible userspace API
>>>>>  - but we don't want to lose its functionality
>>>>>
>>>>> 2. we want to have simple API for performing single image processing
>>>>> operation:
>>>>>  - typically it will be used by compositing window manager, this means that
>>>>>    some parameters of the processing might change on each vblank (like
>>>>>    destination rectangle for example). This api allows such change on each
>>>>>    operation without any additional cost. V4L2 requires to reinitialize
>>>>>    queues with new configuration on such change, what means that a bunch of
>>>>>    ioctls has to be called.
>>>>
>>>> What do you mean by re-initialising the queue? Format, buffers or something
>>>> else?
>>>>
>>>> If you need a larger buffer than what you have already allocated, you'll
>>>> need to re-allocate, V4L2 or not.
>>>>
>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>>>> implementing that and some work in videobuf2.
>>>>
>>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>>> fine as a lot of the parameters are not changeable during streaming,
>>>> especially if the pipeline is handled by multiple drivers. That said, for
>>>> devices that process data from memory to memory performing changes in the
>>>> media bus formats and pipeline configuration is not very efficient
>>>> currently, largely for the same reason.
>>>>
>>>> The request API that people have been working for a bit different use cases
>>>> isn't in mainline yet. It would allow more efficient per-request
>>>> configuration than what is currently possible, but it has turned out to be
>>>> far from trivial to implement.
>>>>
>>>>>  - validating processing parameters in V4l2 API is really complicated,
>>>>>    because the parameters (format, src&dest rectangles, rotation) are being
>>>>>    set incrementally, so we have to either allow some impossible,
>>>>> transitional
>>>>>    configurations or complicate the configuration steps even more (like
>>>>>    calling some ioctls multiple times for both input and output). In the end
>>>>>    all parameters have to be again validated just before performing the
>>>>>    operation.
>>>>
>>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>>> place when the stream is started.
>>>>
>>>>>
>>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>
>>>> For GPUs I generally understand the reasoning: there's a very limited number
>>>> of users of this API --- primarily because it's not an application
>>>> interface.
>>>>
>>>> If you have a device that however falls under the scope of V4L2 (at least
>>>> API-wise), does this continue to be the case? Will there be only one or two
>>>> (or so) users for this API? Is it the case here?
>>>>
>>>> Using a device specific interface definitely has some benefits: there's no
>>>> need to think how would you generalise the interface for other similar
>>>> devices. There's no need to consider backwards compatibility as it's not a
>>>> requirement. The drawback is that the applications that need to support
>>>> similar devices will bear the burden of having to support different APIs.
>>>>
>>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>>> independently of how unworkable that might be, but there are also clear
>>>> advantages in using a standardised interface such as V4L2.
>>>>
>>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>>> would look quite different from what it is now.
>>>
>>> It's true. There is definitely a benefit with V4L2 because V4L2 provides a Linux standard ABI - as of now DRM does not.
>>>
>>> However, I think that is the only benefit we could get through V4L2. Using V4L2 makes the platform's software stack complicated - we have to open both a video device node and a card device node to display an image on the screen while scaling it or converting its color space, and we also need to export a DMA buffer from one side and import it to the other side using DMABUF.
>>>
>>> It may not be related to this, but V4L2 also has a performance problem - every QBUF/DQBUF request maps/unmaps the DMA buffer, as you already know. :)
>>>
>>> In addition, display subsystems on recent ARM SoCs tend to include pre/post-processing hardware in the display controller - OMAP, Exynos8895 and MSM, as far as I know.
>>>
>>
>> I agree with many of the arguments given by Inki above and earlier by
>> Marek. However, they apply to the already existing V4L2 implementation,
>> not to V4L2 as an idea in general, and I believe a comparison against a
>> completely new API that doesn't even exist in the kernel tree and
>> userspace yet (only in terms of patches on the list) is not fair.
>
> Below is a user space component that uses the Exynos DRM post-processor driver (the IPP driver).
> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen
>

Right, but that API is really Exynos-specific, while V4L2 was designed
from the start as a generic one and is maintained as such.

> Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
> PS: other DRM drivers in mainline already have the same or a similar API.

This kind of contradicts the response Marek received from the DRM
community about his proposal. Which drivers in particular do you have in
mind?

>
> We will also open up the user space that uses the new API later.

There is also already user space which uses V4L2 for this, as well as V4L2
drivers for hardware similar to that targeted by Marek's proposal,
including GStreamer support and the iMX6 devices that Nicolas mentioned
before.

Best regards,
Tomasz

>
>
> Thanks,
> Inki Dae
>
>>
>> I strongly (if that's of any value) stand on Sakari's side and also
>> agree with the DRM maintainers. V4L2 is already there, provides a general
>> interface for userspace and already supports the kind of devices
>> Marek mentions. Sure, it might have several issues, but please give me
>> an example of a subsystem/interface/code that doesn't have any.
>> Instead of taking the easy (in the short term) path, with a bit more
>> effort we can get something that in the long run should end up much
>> better.
>>
>> Best regards,
>> Tomasz
>>
>>>
>>> Thanks,
>>> Inki Dae
>>>
>>>>
>>>>>
>>>>> 4. this api can be considered as extended 'blit' operation, other DRM
>>>>> drivers
>>>>>    (MGA, R128, VIA) already have ioctls for such operation, so there is also
>>>>>    place in DRM for it
>>>>
>>>> Added LMML to cc.
>>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  6:27             ` Inki Dae
  2017-05-10  6:38                 ` Tomasz Figa
@ 2017-05-10  7:55               ` Daniel Vetter
  2017-05-10 10:31                   ` Inki Dae
  2017-05-10 13:30                 ` Marek Szyprowski
  1 sibling, 2 replies; 34+ messages in thread
From: Daniel Vetter @ 2017-05-10  7:55 UTC (permalink / raw)
  To: Inki Dae
  Cc: Tomasz Figa, linux-media, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Sakari Ailus, Laurent Pinchart, Sakari Ailus,
	Marek Szyprowski

On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote:
> Hi Tomasz,
> 
> On 2017-05-10 14:38, Tomasz Figa wrote:
> > Hi Everyone,
> > 
> > On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
> >>
> >>
> >> On 2017-04-26 07:21, Sakari Ailus wrote:
> >>> Hi Marek,
> >>>
> >>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
> >>>> Hi Laurent,
> >>>>
> >>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
> >>>>> Hi Marek,
> >>>>>
> >>>>> (CC'ing Sakari Ailus)
> >>>>>
> >>>>> Thank you for the patches.
> >>>>>
> >>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
> >>>>>> Dear all,
> >>>>>>
> >>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
> >>>>>> support for hardware modules, which can be used for processing image data
> >>>>>> from the one memory buffer to another. Typical memory-to-memory operations
> >>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
> >>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
> >>>>>> processors", which has been rejected as "not really needed in the DRM
> >>>>>> core":
> >>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> >>>>>>
> >>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
> >>>>>> will be specific only to Exynos DRM. I've also changed the name from
> >>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
> >>>>>> with fbdev API.
> >>>>>>
> >>>>>> Here is a bit more information what picture processors are:
> >>>>>>
> >>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
> >>>>>> such operations. They can be used in parallel to the main GPU module to
> >>>>>> offload CPU from processing graphics or video data. One example use of
> >>>>>> such modules is implementing video overlay, which usually requires color
> >>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
> >>>>>> target window size.
> >>>>>>
> >>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
> >>>>>> based on DRM objects and their properties. A new DRM object is introduced:
> >>>>>> picture processor (called pp for convenience). Such objects have a set of
> >>>>>> standard DRM properties, which describes the operation to be performed by
> >>>>>> respective hardware module. In typical case those properties are a source
> >>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
> >>>>>> rectangle. Optionally a rotation property can be also specified if
> >>>>>> supported by the given hardware. To perform an operation on image data,
> >>>>>> userspace provides a set of properties and their values for given fbproc
> >>>>>> object in a similar way as object and properties are provided for
> >>>>>> performing atomic page flip / mode setting.
> >>>>>>
> >>>>>> The proposed API consists of the 3 new ioctls:
> >>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
> >>>>>>   processors,
> >>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
> >>>>>>   processor,
> >>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
> >>>>>>   property set.
> >>>>>>
> >>>>>> The proposed API is extensible. Drivers can attach their own, custom
> >>>>>> properties to add support for more advanced picture processing (for example
> >>>>>> blending).
> >>>>>>
> >>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
> >>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
> >>>>>> on the other side. It is also buggy, with significant design flaws - the
> >>>>>> biggest issue is the fact that the API covers memory-2-memory picture
> >>>>>> operations together with CRTC writeback and duplicating features, which
> >>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
> >>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
> >>>>>> rotator driver smaller by over 200 lines).
> >>>>> This seems to be the kind of hardware that is typically supported by V4L2.
> >>>>> Stupid question, why DRM ?
> >>>>
> >>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
> >>>>
> >>>> 1. we want to replace existing Exynos IPP subsystem:
> >>>>  - it is used only in some internal/vendor trees, not in open-source
> >>>>  - we want it to have sane and potentially extensible userspace API
> >>>>  - but we don't want to lose its functionality
> >>>>
> >>>> 2. we want to have simple API for performing single image processing
> >>>> operation:
> >>>>  - typically it will be used by compositing window manager, this means that
> >>>>    some parameters of the processing might change on each vblank (like
> >>>>    destination rectangle for example). This api allows such change on each
> >>>>    operation without any additional cost. V4L2 requires to reinitialize
> >>>>    queues with new configuration on such change, what means that a bunch of
> >>>>    ioctls has to be called.
> >>>
> >>> What do you mean by re-initialising the queue? Format, buffers or something
> >>> else?
> >>>
> >>> If you need a larger buffer than what you have already allocated, you'll
> >>> need to re-allocate, V4L2 or not.
> >>>
> >>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
> >>> implementing that and some work in videobuf2.
> >>>
> >>> Another thing is that V4L2 is very stream oriented. For most devices that's
> >>> fine as a lot of the parameters are not changeable during streaming,
> >>> especially if the pipeline is handled by multiple drivers. That said, for
> >>> devices that process data from memory to memory performing changes in the
> >>> media bus formats and pipeline configuration is not very efficient
> >>> currently, largely for the same reason.
> >>>
> >>> The request API that people have been working for a bit different use cases
> >>> isn't in mainline yet. It would allow more efficient per-request
> >>> configuration than what is currently possible, but it has turned out to be
> >>> far from trivial to implement.
> >>>
> >>>>  - validating processing parameters in V4l2 API is really complicated,
> >>>>    because the parameters (format, src&dest rectangles, rotation) are being
> >>>>    set incrementally, so we have to either allow some impossible,
> >>>> transitional
> >>>>    configurations or complicate the configuration steps even more (like
> >>>>    calling some ioctls multiple times for both input and output). In the end
> >>>>    all parameters have to be again validated just before performing the
> >>>>    operation.
> >>>
> >>> You have to validate the parameters in any case. In a MC pipeline this takes
> >>> place when the stream is started.
> >>>
> >>>>
> >>>> 3. generic approach (to add it to DRM core) has been rejected:
> >>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
> >>>
> >>> For GPUs I generally understand the reasoning: there's a very limited number
> >>> of users of this API --- primarily because it's not an application
> >>> interface.
> >>>
> >>> If you have a device that however falls under the scope of V4L2 (at least
> >>> API-wise), does this continue to be the case? Will there be only one or two
> >>> (or so) users for this API? Is it the case here?
> >>>
> >>> Using a device specific interface definitely has some benefits: there's no
> >>> need to think how would you generalise the interface for other similar
> >>> devices. There's no need to consider backwards compatibility as it's not a
> >>> requirement. The drawback is that the applications that need to support
> >>> similar devices will bear the burden of having to support different APIs.
> >>>
> >>> I don't mean to say that you should ram whatever under V4L2 / MC
> >>> independently of how unworkable that might be, but there are also clear
> >>> advantages in using a standardised interface such as V4L2.
> >>>
> >>> V4L2 has a long history behind it and if it was designed today, I bet it
> >>> would look quite different from what it is now.
> >>
> >> It's true. There is definitely a benefit with V4L2 because V4L2 provides a Linux standard ABI - as of now DRM does not.
> >>
> >> However, I think that is the only benefit we could get through V4L2. Using V4L2 makes the platform's software stack complicated - we have to open both a video device node and a card device node to display an image on the screen while scaling it or converting its color space, and we also need to export a DMA buffer from one side and import it to the other side using DMABUF.
> >>
> >> It may not be related to this, but V4L2 also has a performance problem - every QBUF/DQBUF request maps/unmaps the DMA buffer, as you already know. :)
> >>
> >> In addition, display subsystems on recent ARM SoCs tend to include pre/post-processing hardware in the display controller - OMAP, Exynos8895 and MSM, as far as I know.
> >>
> > 
> > I agree with many of the arguments given by Inki above and earlier by
> > Marek. However, they apply to the already existing V4L2 implementation,
> > not to V4L2 as an idea in general, and I believe a comparison against a
> > completely new API that doesn't even exist in the kernel tree and
> > userspace yet (only in terms of patches on the list) is not fair.
> 
> Below is a user space component that uses the Exynos DRM post-processor driver (the IPP driver).
> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen
> 
> Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
> PS: other DRM drivers in mainline already have the same or a similar API.
> 
> We will also open up the user space that uses the new API later.

Those drivers are different, because they just expose a hw-specific abi.
Like the current IPP interfaces exposed by drm/exynos.

I think you have 2 options:
- Extend the current IPP interface in drm/exynos with whatever new pixel
  processor modes you want. Of course this still means you need to have
  the userspace side open-source, but otherwise it's all private to exynos
  hardware and software.

- If you want something standardized otoh, go with v4l2. The issues
  you point out in v4l2 aren't uapi issues, but implementation details of
  the current videobuf2 helpers, which can be fixed. At least that's my
  understanding. It should be fairly easy to fix: simply switch from
  doing a map/unmap on every qbuf/dqbuf to caching the mappings, and use
  the streaming dma-api interfaces to do only the flush on qbuf/dqbuf (if
  it is needed at all; it should turn into a no-op).
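
To make the second option a bit more concrete, here is a minimal userspace
sketch in C that only counts operations. It is not videobuf2 code: every
name in it (demo_buf, demo_stats, demo_qbuf, demo_dqbuf, demo_run) is
hypothetical, and the counters merely stand in for the real map/unmap and
sync calls. The assumption is that building and tearing down a mapping is
the expensive part, while a sync is cheap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one buffer used by a mem2mem driver. */
struct demo_buf {
	bool mapped;		/* is a device mapping currently held? */
};

/* Counters standing in for the real (kernel-side) operations. */
struct demo_stats {
	unsigned int maps;	/* assumed expensive: create the mapping */
	unsigned int unmaps;	/* assumed expensive: destroy the mapping */
	unsigned int syncs;	/* assumed cheap: cache maintenance only */
};

static void demo_map(struct demo_buf *b, struct demo_stats *st)
{
	assert(!b->mapped);
	b->mapped = true;
	st->maps++;
}

static void demo_unmap(struct demo_buf *b, struct demo_stats *st)
{
	assert(b->mapped);
	b->mapped = false;
	st->unmaps++;
}

/* Queue a buffer: map only if no mapping is cached, then sync for device. */
static void demo_qbuf(struct demo_buf *b, struct demo_stats *st)
{
	if (!b->mapped)
		demo_map(b, st);
	st->syncs++;
}

/* Dequeue a buffer: sync for CPU; drop the mapping unless it is cached. */
static void demo_dqbuf(struct demo_buf *b, struct demo_stats *st, bool cache)
{
	st->syncs++;
	if (!cache)
		demo_unmap(b, st);
}

/* Process 'frames' frames with one buffer and return the operation counts. */
struct demo_stats demo_run(unsigned int frames, bool cache)
{
	struct demo_buf buf = { .mapped = false };
	struct demo_stats st = { 0, 0, 0 };
	unsigned int i;

	for (i = 0; i < frames; i++) {
		demo_qbuf(&buf, &st);
		demo_dqbuf(&buf, &st, cache);
	}

	/* Buffer release: a cached mapping is torn down exactly once. */
	if (buf.mapped)
		demo_unmap(&buf, &st);

	return st;
}
```

Under these assumptions, demo_run(100, false) performs 100 maps and 100
unmaps, while demo_run(100, true) performs one of each; the number of syncs
is the same in both cases. In a real driver the sync step would presumably
be something like dma_sync_sg_for_device() / dma_sync_sg_for_cpu(), but
that part is not modelled here.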

Trying to come up with a generic drm api has imo not much chance of
getting accepted anytime soon (since for the simple pixel processor
pipeline it's just duplicating v4l, and for something more generic/faster
a generic interface is always too slow).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  6:38                 ` Tomasz Figa
  (?)
@ 2017-05-10 10:29                 ` Inki Dae
  2017-05-10 13:18                     ` Daniel Vetter
  -1 siblings, 1 reply; 34+ messages in thread
From: Inki Dae @ 2017-05-10 10:29 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: linux-media, Sakari Ailus, Marek Szyprowski, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Laurent Pinchart, Sakari Ailus



On 2017-05-10 15:38, Tomasz Figa wrote:
> On Wed, May 10, 2017 at 2:27 PM, Inki Dae <inki.dae@samsung.com> wrote:
>> Hi Tomasz,
>>
>> On 2017-05-10 14:38, Tomasz Figa wrote:
>>> Hi Everyone,
>>>
>>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>>>>
>>>>
>>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>>>> Hi Marek,
>>>>>
>>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>>> Hi Laurent,
>>>>>>
>>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> (CC'ing Sakari Ailus)
>>>>>>>
>>>>>>> Thank you for the patches.
>>>>>>>
>>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>>> from the one memory buffer to another. Typical memory-to-memory operations
>>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>>> core":
>>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>>
>>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>>> with fbdev API.
>>>>>>>>
>>>>>>>> Here is a bit more information what picture processors are:
>>>>>>>>
>>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>>> offload CPU from processing graphics or video data. One example use of
>>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>>> target window size.
>>>>>>>>
>>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>>>> standard DRM properties, which describes the operation to be performed by
>>>>>>>> respective hardware module. In typical case those properties are a source
>>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>>>> object in a similar way as object and properties are provided for
>>>>>>>> performing atomic page flip / mode setting.
>>>>>>>>
>>>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>>   processors,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>>>   processor,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>>>   property set.
>>>>>>>>
>>>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>>>> properties to add support for more advanced picture processing (for example
>>>>>>>> blending).
>>>>>>>>
>>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>>>> operations together with CRTC writeback and duplicating features, which
>>>>>>>> belongs to video plane. Comparing with IPP subsystem, the PP framework is
>>>>>>>> smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>>>> rotator driver smaller by over 200 lines).
>>>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>>>> Stupid question, why DRM ?
>>>>>>
>>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>>>
>>>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>>>  - it is used only in some internal/vendor trees, not in open-source
>>>>>>  - we want it to have sane and potentially extensible userspace API
>>>>>>  - but we don't want to lose its functionality
>>>>>>
>>>>>> 2. we want to have a simple API for performing a single image processing
>>>>>> operation:
>>>>>>  - typically it will be used by a compositing window manager, which means
>>>>>>    that some parameters of the processing might change on each vblank
>>>>>>    (the destination rectangle, for example). This API allows such a change
>>>>>>    on each operation without any additional cost. V4L2 requires the queues
>>>>>>    to be reinitialized with the new configuration on such a change, which
>>>>>>    means that a bunch of ioctls have to be called.
>>>>>
>>>>> What do you mean by re-initialising the queue? Format, buffers or something
>>>>> else?
>>>>>
>>>>> If you need a larger buffer than what you have already allocated, you'll
>>>>> need to re-allocate, V4L2 or not.
>>>>>
>>>>> We do also lack a way to destroy individual buffers in V4L2. Fixing that
>>>>> is a matter of implementing it, plus some work in videobuf2.
>>>>>
>>>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>>>> fine as a lot of the parameters are not changeable during streaming,
>>>>> especially if the pipeline is handled by multiple drivers. That said, for
>>>>> devices that process data from memory to memory performing changes in the
>>>>> media bus formats and pipeline configuration is not very efficient
>>>>> currently, largely for the same reason.
>>>>>
>>>>> The request API that people have been working on for slightly different
>>>>> use cases isn't in mainline yet. It would allow more efficient per-request
>>>>> configuration than what is currently possible, but it has turned out to be
>>>>> far from trivial to implement.
>>>>>
>>>>>>  - validating processing parameters in the V4L2 API is really complicated,
>>>>>>    because the parameters (format, src & dest rectangles, rotation) are
>>>>>>    set incrementally, so we have to either allow some impossible,
>>>>>>    transitional configurations or complicate the configuration steps even
>>>>>>    more (like calling some ioctls multiple times for both input and
>>>>>>    output). In the end all parameters have to be validated again just
>>>>>>    before performing the operation.
>>>>>
>>>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>>>> place when the stream is started.
>>>>>
>>>>>>
>>>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> For GPUs I generally understand the reasoning: there's a very limited number
>>>>> of users of this API --- primarily because it's not an application
>>>>> interface.
>>>>>
>>>>> If you have a device that however falls under the scope of V4L2 (at least
>>>>> API-wise), does this continue to be the case? Will there be only one or two
>>>>> (or so) users for this API? Is it the case here?
>>>>>
>>>>> Using a device specific interface definitely has some benefits: there's no
>>>>> need to think about how you would generalise the interface for other similar
>>>>> devices. There's no need to consider backwards compatibility as it's not a
>>>>> requirement. The drawback is that the applications that need to support
>>>>> similar devices will bear the burden of having to support different APIs.
>>>>>
>>>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>>>> independently of how unworkable that might be, but there are also clear
>>>>> advantages in using a standardised interface such as V4L2.
>>>>>
>>>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>>>> would look quite different from what it is now.
>>>>
>>>> It's true. There is definitely a benefit with V4L2, because V4L2 provides a Linux standard ABI - DRM, as of now, does not.
>>>>
>>>> However, I think that is the only benefit we could get through V4L2. Using V4L2 makes the platform's software stack complicated - we have to open both a video device node and a card device node to display an image on the screen while scaling it or converting its color space, and we also need to export the DMA buffer from one side and import it on the other side using DMABUF.
>>>>
>>>> It may not be related to this, but V4L2 even has a performance problem - every QBUF/DQBUF request maps/unmaps the DMA buffer, as you already know. :)
>>>>
>>>> In addition, display subsystems on recent ARM SoCs tend to include pre/post processing hardware in the display controller - OMAP, Exynos8895 and MSM, as far as I know.
>>>>
>>>
>>> I agree with many of the arguments given by Inki above and earlier by
>>> Marek. However, they apply to the already existing V4L2 implementation,
>>> not V4L2 as the idea in general, and I believe a comparison against a
>>> completely new API that doesn't even exist in the kernel tree and
>>> userspace yet (only in terms of patches on the list) is not fair.
>>
>> Below is user space that uses the Exynos DRM post-processor driver, the IPP driver.
>> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen
>>
> 
> Right, but the API is really Exynos-specific, while V4L2 is designed
> from the start as a generic one and maintained as such.

As I mentioned before, I think this is the only benefit we could get through V4L2.

> 
>> Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
>> P.S. Other DRM drivers in mainline already have such or similar APIs.
> 
> This kind of contradicts the response Marek received from the DRM
> community about his proposal. Which drivers in particular do you have in
> mind?

You can check vmw_overlay_ioctl in the vmwgfx driver and intel_overlay_put_image_ioctl in the i915 driver. These were all I could find in mainline.
It seems the boundary between implementing a pre/post processing mem2mem driver in V4L2 or in DRM is really vague.


Thanks,
Inki Dae

> 
>>
>> We will also open the user space that uses the new API later.
> 
> There is also already user space which uses V4L2 for this and V4L2
> drivers for hardware similar to the one targeted by Marek's proposal,
> including GStreamer support and iMX6 devices that Nicolas mentioned
> before.
> 
> Best regards,
> Tomasz
> 
>>
>>
>> Thanks,
>> Inki Dae
>>
>>>
>>> I strongly (if that's of any value) stand on Sakari's side and also
>>> agree with DRM maintainers. V4L2 is already there, provides a general
>>> interface for the userspace and already supports the kind of devices
>>> Marek mentions. Sure, it might have several issues, but please give me
>>> an example of a subsystem/interface/code that doesn't have any.
>>> Instead of taking the easy (for short term) path, with a bit more
>>> effort we can get something that in the long run should end up much
>>> better.
>>>
>>> Best regards,
>>> Tomasz
>>>
>>>>
>>>> Thanks,
>>>> Inki Dae
>>>>
>>>>>
>>>>>>
>>>>>> 4. this API can be considered an extended 'blit' operation; other DRM
>>>>>>    drivers (MGA, R128, VIA) already have ioctls for such operations, so
>>>>>>    there is also a place in DRM for it
>>>>>
>>>>> Added LMML to cc.
>>>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread
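
For illustration only: point 2 in the quoted text above contrasts parameters that are set incrementally with a single property set handed over at commit time. A minimal C sketch of validating one complete request in a single pass could look like the following. The names (pp_request, rect_fits, pp_request_valid) and the "no scaling" rule are assumptions made up for this example; they are not taken from the patch series or from V4L2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for this sketch; not the structures from the patches. */
struct rect { uint32_t x, y, w, h; };

struct pp_request {
	uint32_t src_w, src_h;	/* source framebuffer size */
	uint32_t dst_w, dst_h;	/* destination framebuffer size */
	struct rect src;	/* area read from the source */
	struct rect dst;	/* area written to the destination */
	bool rotate_90;		/* 90-degree rotation swaps width and height */
};

/* True if r is non-empty and lies completely inside a w x h buffer. */
static bool rect_fits(const struct rect *r, uint32_t w, uint32_t h)
{
	return r->w > 0 && r->h > 0 &&
	       r->x <= w && r->w <= w - r->x &&
	       r->y <= h && r->h <= h - r->y;
}

/*
 * Validate the complete request in one pass. In this toy model the
 * engine cannot scale, so the destination rectangle must match the
 * source rectangle after the optional rotation.
 */
static bool pp_request_valid(const struct pp_request *req)
{
	uint32_t out_w = req->rotate_90 ? req->src.h : req->src.w;
	uint32_t out_h = req->rotate_90 ? req->src.w : req->src.h;

	return rect_fits(&req->src, req->src_w, req->src_h) &&
	       rect_fits(&req->dst, req->dst_w, req->dst_h) &&
	       req->dst.w == out_w && req->dst.h == out_h;
}
```

With incremental configuration, the state after enabling rotation but before updating the destination rectangle would fail this check, which is the "transitional configuration" problem described in the quoted text. With a single request, only the final, complete state is ever checked.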

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  7:55               ` Daniel Vetter
@ 2017-05-10 10:31                   ` Inki Dae
  2017-05-10 13:30                 ` Marek Szyprowski
  1 sibling, 0 replies; 34+ messages in thread
From: Inki Dae @ 2017-05-10 10:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Tomasz Figa, linux-media, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Sakari Ailus, Laurent Pinchart, Sakari Ailus,
	Marek Szyprowski



On 2017-05-10 at 16:55, Daniel Vetter wrote:
> On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote:
>> Hi Tomasz,
>>
>> On 2017-05-10 at 14:38, Tomasz Figa wrote:
>>> Hi Everyone,
>>>
>>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>>>>
>>>>
>>>> On 2017-04-26 at 07:21, Sakari Ailus wrote:
>>>>> Hi Marek,
>>>>>
>>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>>> Hi Laurent,
>>>>>>
>>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> (CC'ing Sakari Ailus)
>>>>>>>
>>>>>>> Thank you for the patches.
>>>>>>>
>>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>>> from one memory buffer to another. Typical memory-to-memory operations
>>>>>>>> are: rotation, scaling, colour space conversion or a mix of them. This is a
>>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>>> core":
>>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>>
>>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>>> with fbdev API.
>>>>>>>>
>>>>>>>> Here is a bit more information about what picture processors are:
>>>>>>>>
>>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>>> offload the CPU from processing graphics or video data. One example use of
>>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>>> target window size.
>>>>>>>>
>>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>>>> standard DRM properties, which describe the operation to be performed by
>>>>>>>> respective hardware module. In typical case those properties are a source
>>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>>>> object in a similar way as object and properties are provided for
>>>>>>>> performing atomic page flip / mode setting.
>>>>>>>>
>>>>>>>> The proposed API consists of 3 new ioctls:
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>>   processors,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>>>   processor,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>>>   property set.
>>>>>>>>
>>>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>>>> properties to add support for more advanced picture processing (for example
>>>>>>>> blending).
>>>>>>>>
>>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>>>> subsystem. The IPP API is over-engineered in general, yet not really
>>>>>>>> extensible. It is also buggy, with significant design flaws - the
>>>>>>>> biggest issue is that the API covers memory-to-memory picture
>>>>>>>> operations together with CRTC writeback, and duplicates features which
>>>>>>>> belong to the video plane. Compared with the IPP subsystem, the PP
>>>>>>>> framework is smaller (778 vs 1807 lines) and allows driver
>>>>>>>> simplification (the Exynos rotator driver shrinks by over 200 lines).
>>>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>>>> Stupid question, why DRM ?
>>>>>>
>>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>>>
>>>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>>>  - it is used only in some internal/vendor trees, not in open-source
>>>>>>  - we want it to have sane and potentially extensible userspace API
>>>>>>  - but we don't want to lose its functionality
>>>>>>
>>>>>> 2. we want to have a simple API for performing a single image processing
>>>>>> operation:
>>>>>>  - typically it will be used by a compositing window manager, which means
>>>>>>    that some parameters of the processing might change on each vblank
>>>>>>    (the destination rectangle, for example). This API allows such a change
>>>>>>    on each operation without any additional cost. V4L2 requires the queues
>>>>>>    to be reinitialized with the new configuration on such a change, which
>>>>>>    means that a bunch of ioctls have to be called.
>>>>>
>>>>> What do you mean by re-initialising the queue? Format, buffers or something
>>>>> else?
>>>>>
>>>>> If you need a larger buffer than what you have already allocated, you'll
>>>>> need to re-allocate, V4L2 or not.
>>>>>
>>>>> We do also lack a way to destroy individual buffers in V4L2. Fixing that
>>>>> is a matter of implementing it, plus some work in videobuf2.
>>>>>
>>>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>>>> fine as a lot of the parameters are not changeable during streaming,
>>>>> especially if the pipeline is handled by multiple drivers. That said, for
>>>>> devices that process data from memory to memory performing changes in the
>>>>> media bus formats and pipeline configuration is not very efficient
>>>>> currently, largely for the same reason.
>>>>>
>>>>> The request API that people have been working on for slightly different
>>>>> use cases isn't in mainline yet. It would allow more efficient per-request
>>>>> configuration than what is currently possible, but it has turned out to be
>>>>> far from trivial to implement.
>>>>>
>>>>>>  - validating processing parameters in the V4L2 API is really complicated,
>>>>>>    because the parameters (format, src & dest rectangles, rotation) are
>>>>>>    set incrementally, so we have to either allow some impossible,
>>>>>>    transitional configurations or complicate the configuration steps even
>>>>>>    more (like calling some ioctls multiple times for both input and
>>>>>>    output). In the end all parameters have to be validated again just
>>>>>>    before performing the operation.
>>>>>
>>>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>>>> place when the stream is started.
>>>>>
>>>>>>
>>>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>
>>>>> For GPUs I generally understand the reasoning: there's a very limited number
>>>>> of users of this API --- primarily because it's not an application
>>>>> interface.
>>>>>
>>>>> If you have a device that however falls under the scope of V4L2 (at least
>>>>> API-wise), does this continue to be the case? Will there be only one or two
>>>>> (or so) users for this API? Is it the case here?
>>>>>
>>>>> Using a device specific interface definitely has some benefits: there's no
>>>>> need to think about how you would generalise the interface for other similar
>>>>> devices. There's no need to consider backwards compatibility as it's not a
>>>>> requirement. The drawback is that the applications that need to support
>>>>> similar devices will bear the burden of having to support different APIs.
>>>>>
>>>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>>>> independently of how unworkable that might be, but there are also clear
>>>>> advantages in using a standardised interface such as V4L2.
>>>>>
>>>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>>>> would look quite different from what it is now.
>>>>
>>>> It's true. There is definitely a benefit with V4L2, because V4L2 provides a Linux standard ABI - DRM, as of now, does not.
>>>>
>>>> However, I think that is the only benefit we could get through V4L2. Using V4L2 makes the platform's software stack complicated - we have to open both a video device node and a card device node to display an image on the screen while scaling it or converting its color space, and we also need to export the DMA buffer from one side and import it on the other side using DMABUF.
>>>>
>>>> It may not be related to this, but V4L2 even has a performance problem - every QBUF/DQBUF request maps/unmaps the DMA buffer, as you already know. :)
>>>>
>>>> In addition, display subsystems on recent ARM SoCs tend to include pre/post processing hardware in the display controller - OMAP, Exynos8895 and MSM, as far as I know.
>>>>
>>>
>>> I agree with many of the arguments given by Inki above and earlier by
>>> Marek. However, they apply to the already existing V4L2 implementation,
>>> not V4L2 as the idea in general, and I believe a comparison against a
>>> completely new API that doesn't even exist in the kernel tree and
>>> userspace yet (only in terms of patches on the list) is not fair.
>>
>> Below is user space that uses the Exynos DRM post-processor driver, the IPP driver.
>> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen
>>
>> Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
>> P.S. Other DRM drivers in mainline already have such or similar APIs.
>>
>> We will also open the user space that uses the new API later.
> 
> Those drivers are different, because they just expose a hw-specific abi.
> Like the current IPP interfaces exposed by drm/exynos.
> 
> I think you have 2 options:
> - Extend the current IPP interface in drm/exynos with whatever new pixel
>   processor modes you want. Of course this still means you need to have

Yes, this is the only option we can choose as of now.


Thanks,
Inki Dae


>   the userspace side open-source, but otherwise it's all private to exynos
>   hardware and software.
> 
> - If you want something standardized otoh, go with v4l2. And the issues
>   you point out in v4l2 aren't uapi issues, but implementation details of
>   the current vbuf helpers, which can be fixed. At least that's my
>   understanding. And it should be fairly easy to fix that, simply switch
>   from doing a map/unmap for every q/deqbuf to caching the mappings and
>   use the stream dma-api interfaces to only do the flush (if needed at
>   all, should turn into a no-op) on q/deqbuf.
> 
> Trying to come up with a generic drm api has imo not much chance of
> getting accepted anytime soon (since for the simple pixel processor
> pipeline it's just duplicating v4l, and for something more generic/faster
> a generic interface is always too slow).
> -Daniel
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread
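
For illustration only: the second option quoted above suggests caching the DMA mapping and doing only a cache sync on each queue/dequeue, instead of a full map/unmap every time. The toy C model below counts both kinds of operation to show the difference. Everything in it (toy_buf, toy_qbuf, toy_dqbuf, toy_release) is a made-up userspace stand-in, not videobuf2 code; in a kernel driver the sync steps would presumably correspond to streaming DMA API calls such as dma_sync_sg_for_device() and dma_sync_sg_for_cpu().

```c
#include <stdbool.h>

/* Toy stand-in for a DMA buffer; the counters exist only for illustration. */
struct toy_buf {
	bool mapped;		/* mapping is kept across queue/dequeue cycles */
	unsigned int maps;	/* number of (expensive) map operations */
	unsigned int syncs;	/* number of (cheap) cache sync operations */
};

/* Queue a buffer: map it on first use only, then sync it for the device. */
static void toy_qbuf(struct toy_buf *b)
{
	if (!b->mapped) {
		b->mapped = true;
		b->maps++;
	}
	b->syncs++;		/* hand the buffer over to the device */
}

/* Dequeue a buffer: sync it back for the CPU but keep the mapping. */
static void toy_dqbuf(struct toy_buf *b)
{
	b->syncs++;		/* hand the buffer back to the CPU */
}

/* Drop the cached mapping once the buffer is released for good. */
static void toy_release(struct toy_buf *b)
{
	b->mapped = false;
}
```

In this model, queueing and dequeueing the same buffer three times costs one map and six syncs, where a map/unmap-per-cycle scheme would pay for three maps and three unmaps.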

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10 10:29                 ` Inki Dae
@ 2017-05-10 13:18                     ` Daniel Vetter
  0 siblings, 0 replies; 34+ messages in thread
From: Daniel Vetter @ 2017-05-10 13:18 UTC (permalink / raw)
  To: Inki Dae
  Cc: Tomasz Figa, linux-samsung-soc, Bartlomiej Zolnierkiewicz,
	Seung-Woo Kim, dri-devel, Tobias Jakobi, Sakari Ailus,
	Laurent Pinchart, Sakari Ailus, linux-media, Marek Szyprowski

On Wed, May 10, 2017 at 12:29 PM, Inki Dae <inki.dae@samsung.com> wrote:
>> This kind of contradicts the response Marek received from the DRM
>> community about his proposal. Which drivers in particular do you have in
>> mind?
>
> You can check vmw_overlay_ioctl in the vmwgfx driver and intel_overlay_put_image_ioctl in the i915 driver. These were all I could find in mainline.
> It seems the boundary between implementing a pre/post processing mem2mem driver in V4L2 or in DRM is really vague.

These aren't picture processors, but overlay plane support merged
before we had the core drm overlay support. Please do not emulate them
at all, your patch will be rejected :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC 0/4] Exynos DRM: add Picture Processor extension
  2017-05-10  7:55               ` Daniel Vetter
  2017-05-10 10:31                   ` Inki Dae
@ 2017-05-10 13:30                 ` Marek Szyprowski
  1 sibling, 0 replies; 34+ messages in thread
From: Marek Szyprowski @ 2017-05-10 13:30 UTC (permalink / raw)
  To: Daniel Vetter, Inki Dae
  Cc: Tomasz Figa, linux-media, linux-samsung-soc,
	Bartlomiej Zolnierkiewicz, Seung-Woo Kim, dri-devel,
	Tobias Jakobi, Sakari Ailus, Laurent Pinchart, Sakari Ailus

Hi Daniel,

On 2017-05-10 09:55, Daniel Vetter wrote:
> On Wed, May 10, 2017 at 03:27:02PM +0900, Inki Dae wrote:
>> Hi Tomasz,
>>
>> On 2017-05-10 14:38, Tomasz Figa wrote:
>>> Hi Everyone,
>>>
>>> On Wed, May 10, 2017 at 9:24 AM, Inki Dae <inki.dae@samsung.com> wrote:
>>>>
>>>> On 2017-04-26 07:21, Sakari Ailus wrote:
>>>>> Hi Marek,
>>>>>
>>>>> On Thu, Apr 20, 2017 at 01:23:09PM +0200, Marek Szyprowski wrote:
>>>>>> Hi Laurent,
>>>>>>
>>>>>> On 2017-04-20 12:25, Laurent Pinchart wrote:
>>>>>>> Hi Marek,
>>>>>>>
>>>>>>> (CC'ing Sakari Ailus)
>>>>>>>
>>>>>>> Thank you for the patches.
>>>>>>>
>>>>>>> On Thursday 20 Apr 2017 11:13:36 Marek Szyprowski wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> This is an updated proposal for extending EXYNOS DRM API with generic
>>>>>>>> support for hardware modules, which can be used for processing image data
>>>>>>>> from one memory buffer to another. Typical memory-to-memory operations
>>>>>>>> are: rotation, scaling, colour space conversion or mix of them. This is a
>>>>>>>> follow-up of my previous proposal "[RFC 0/2] New feature: Framebuffer
>>>>>>>> processors", which has been rejected as "not really needed in the DRM
>>>>>>>> core":
>>>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>>>>>
>>>>>>>> In this proposal I moved all the code to Exynos DRM driver, so now this
>>>>>>>> will be specific only to Exynos DRM. I've also changed the name from
>>>>>>>> framebuffer processor (fbproc) to picture processor (pp) to avoid confusion
>>>>>>>> with fbdev API.
>>>>>>>>
>>>>>>>> Here is a bit more information what picture processors are:
>>>>>>>>
>>>>>>>> Embedded SoCs are known to have a number of hardware blocks, which perform
>>>>>>>> such operations. They can be used in parallel to the main GPU module to
>>>>>>>> offload CPU from processing graphics or video data. One example use of
>>>>>>>> such modules is implementing video overlay, which usually requires color
>>>>>>>> space conversion from NV12 (or similar) to RGB32 color space and scaling to
>>>>>>>> target window size.
>>>>>>>>
>>>>>>>> The proposed API is heavily inspired by atomic KMS approach - it is also
>>>>>>>> based on DRM objects and their properties. A new DRM object is introduced:
>>>>>>>> picture processor (called pp for convenience). Such objects have a set of
>>>>>>>> standard DRM properties, which describes the operation to be performed by
>>>>>>>> respective hardware module. In typical case those properties are a source
>>>>>>>> fb id and rectangle (x, y, width, height) and destination fb id and
>>>>>>>> rectangle. Optionally a rotation property can be also specified if
>>>>>>>> supported by the given hardware. To perform an operation on image data,
>>>>>>>> userspace provides a set of properties and their values for given fbproc
>>>>>>>> object in a similar way as object and properties are provided for
>>>>>>>> performing atomic page flip / mode setting.
>>>>>>>>
>>>>>>>> The proposed API consists of the 3 new ioctls:
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET_RESOURCES: to enumerate all available picture
>>>>>>>>    processors,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_GET: to query capabilities of given picture
>>>>>>>>    processor,
>>>>>>>> - DRM_IOCTL_EXYNOS_PP_COMMIT: to perform operation described by given
>>>>>>>>    property set.
>>>>>>>>
>>>>>>>> The proposed API is extensible. Drivers can attach their own, custom
>>>>>>>> properties to add support for more advanced picture processing (for example
>>>>>>>> blending).
>>>>>>>>
>>>>>>>> This proposal aims to replace Exynos DRM IPP (Image Post Processing)
>>>>>>>> subsystem. IPP API is over-engineered in general, but not really extensible
>>>>>>>> on the other side. It is also buggy, with significant design flaws - the
>>>>>>>> biggest issue is the fact that the API covers memory-2-memory picture
>>>>>>>> operations together with CRTC writeback and duplicates features which
>>>>>>>> belong to the video plane. Compared with the IPP subsystem, the PP framework
>>>>>>>> is smaller (1807 vs 778 lines) and allows driver simplification (Exynos
>>>>>>>> rotator driver smaller by over 200 lines).
>>>>>>> This seems to be the kind of hardware that is typically supported by V4L2.
>>>>>>> Stupid question, why DRM ?
>>>>>> Let me elaborate a bit on the reasons for implementing it in Exynos DRM:
>>>>>>
>>>>>> 1. we want to replace existing Exynos IPP subsystem:
>>>>>>   - it is used only in some internal/vendor trees, not in open-source
>>>>>>   - we want it to have sane and potentially extensible userspace API
>>>>>>   - but we don't want to lose its functionality
>>>>>>
>>>>>> 2. we want to have simple API for performing single image processing
>>>>>> operation:
>>>>>>   - typically it will be used by compositing window manager, this means that
>>>>>>     some parameters of the processing might change on each vblank (like
>>>>>>     destination rectangle for example). This api allows such change on each
>>>>>>     operation without any additional cost. V4L2 requires reinitializing
>>>>>>     queues with the new configuration on such a change, which means that
>>>>>>     a bunch of ioctls have to be called.
>>>>> What do you mean by re-initialising the queue? Format, buffers or something
>>>>> else?
>>>>>
>>>>> If you need a larger buffer than what you have already allocated, you'll
>>>>> need to re-allocate, V4L2 or not.
>>>>>
>>>>> We also do lack a way to destroy individual buffers in V4L2. It'd be up to
>>>>> implementing that and some work in videobuf2.
>>>>>
>>>>> Another thing is that V4L2 is very stream oriented. For most devices that's
>>>>> fine as a lot of the parameters are not changeable during streaming,
>>>>> especially if the pipeline is handled by multiple drivers. That said, for
>>>>> devices that process data from memory to memory performing changes in the
>>>>> media bus formats and pipeline configuration is not very efficient
>>>>> currently, largely for the same reason.
>>>>>
>>>>> The request API that people have been working for a bit different use cases
>>>>> isn't in mainline yet. It would allow more efficient per-request
>>>>> configuration than what is currently possible, but it has turned out to be
>>>>> far from trivial to implement.
>>>>>
>>>>>>   - validating processing parameters in V4L2 API is really complicated,
>>>>>>     because the parameters (format, src&dest rectangles, rotation) are being
>>>>>>     set incrementally, so we have to either allow some impossible,
>>>>>>     transitional configurations or complicate the configuration steps
>>>>>>     even more (like calling some ioctls multiple times for both input
>>>>>>     and output). In the end all parameters have to be validated again
>>>>>>     just before performing the operation.
>>>>> You have to validate the parameters in any case. In a MC pipeline this takes
>>>>> place when the stream is started.
>>>>>
>>>>>> 3. generic approach (to add it to DRM core) has been rejected:
>>>>>> http://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg146286.html
>>>>> For GPUs I generally understand the reasoning: there's a very limited number
>>>>> of users of this API --- primarily because it's not an application
>>>>> interface.
>>>>>
>>>>> If you have a device that however falls under the scope of V4L2 (at least
>>>>> API-wise), does this continue to be the case? Will there be only one or two
>>>>> (or so) users for this API? Is it the case here?
>>>>>
>>>>> Using a device specific interface definitely has some benefits: there's no
>>>>> need to think how would you generalise the interface for other similar
>>>>> devices. There's no need to consider backwards compatibility as it's not a
>>>>> requirement. The drawback is that the applications that need to support
>>>>> similar devices will bear the burden of having to support different APIs.
>>>>>
>>>>> I don't mean to say that you should ram whatever under V4L2 / MC
>>>>> independently of how unworkable that might be, but there are also clear
>>>>> advantages in using a standardised interface such as V4L2.
>>>>>
>>>>> V4L2 has a long history behind it and if it was designed today, I bet it
>>>>> would look quite different from what it is now.
>>>> It's true. There is definitely a benefit to V4L2 because it provides a standard Linux ABI, which DRM does not as of now.
>>>>
>>>> However, I think that is the only benefit we would get from V4L2. Using V4L2 complicates the platform software stack: to display an image on the screen with scaling or color space conversion, we have to open both the video device node and the card device node, and we also need to export the DMA buffer from one side and import it on the other using DMABUF.
>>>>
>>>> It may not be related to this, but V4L2 also has a performance problem: every QBUF/DQBUF request maps and unmaps the DMA buffer, as you already know. :)
>>>>
>>>> In addition, display subsystems on recent ARM SoCs tend to include pre/post-processing hardware in the display controller: OMAP, Exynos8895 and MSM, as far as I know.
>>>>
>>> I agree with many of the arguments given by Inki above and earlier by
>>> Marek. However, they apply to already existing V4L2 implementation,
>>> not V4L2 as the idea in general, and I believe a comparison against a
>>> complete new API that doesn't even exist in the kernel tree and
>>> userspace yet (only in terms of patches on the list) is not fair.
>> Below is a userspace component that uses the Exynos DRM post-processor (IPP) driver.
>> https://review.tizen.org/git/?p=platform/adaptation/samsung_exynos/libtdm-exynos.git;a=blob;f=src/tdm_exynos_pp.c;h=db20e6f226d313672d1d468e06d80526ea30121c;hb=refs/heads/tizen
>>
>> Marek's patch series is just a new version of this driver, which is specific to Exynos DRM. Marek is trying to enhance this driver.
>> P.S. Other DRM drivers in mainline already have such or similar API.
>>
>> We will also open-source the userspace that uses the new API later.
> Those drivers are different, because they just expose a hw-specific abi.
> Like the current IPP interfaces exposed by drm/exynos.
>
> I think you have 2 options:
> - Extend the current IPP interface in drm/exynos with whatever new pixel
>    processor modes you want. Of course this still means you need to have
>    the userspace side open-source, but otherwise it's all private to exynos
>    hardware and software.

That's what I proposed in RFC v2 posted 2 days ago - everything is kept in
Exynos DRM driver, no changes to DRM-core at all. Maybe calling it Exynos
IPP v2 would have been a better idea.

> - If you want something standardized otoh, go with v4l2. And the issues
>    you point out in v4l2 aren't uapi issues, but implementation details of
>    the current vbuf helpers, which can be fixed. At least that's my
>    understanding. And it should be fairly easy to fix that, simply switch
>    from doing a map/unmap for every q/deqbuf to caching the mappings and
>    use the stream dma-api interfaces to only do the flush (if needed at
>    all, should turn into a no-op) on q/deqbuf.
>
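A minimal sketch of the mapping-cache idea quoted above, written as a
userspace toy model rather than videobuf2 code. All names here (toy_buf,
toy_stats, run_cycles and the q/dq helpers) are hypothetical, and the
counters only stand in for DMA map, unmap and sync operations. Nothing
below calls the kernel DMA API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for DMA map/unmap/sync calls. */
struct toy_stats {
	unsigned int maps;
	unsigned int unmaps;
	unsigned int syncs;
};

/* Hypothetical buffer; "mapped" models a cached DMA mapping. */
struct toy_buf {
	bool mapped;
};

/* Naive scheme: map on every queue, unmap on every dequeue. */
static void naive_qbuf(struct toy_buf *b, struct toy_stats *s)
{
	b->mapped = true;
	s->maps++;
}

static void naive_dqbuf(struct toy_buf *b, struct toy_stats *s)
{
	b->mapped = false;
	s->unmaps++;
}

/* Cached scheme: map on first queue only, then just sync. */
static void cached_qbuf(struct toy_buf *b, struct toy_stats *s)
{
	if (!b->mapped) {
		b->mapped = true;
		s->maps++;
	}
	s->syncs++;	/* stands in for a sync-for-device step */
}

static void cached_dqbuf(struct toy_buf *b, struct toy_stats *s)
{
	assert(b->mapped);	/* a dequeued buffer was queued before */
	s->syncs++;	/* stands in for a sync-for-cpu step */
}

/* Drop the cached mapping when the buffer is finally released. */
static void cached_release(struct toy_buf *b, struct toy_stats *s)
{
	if (b->mapped) {
		b->mapped = false;
		s->unmaps++;
	}
}

/* Run n queue/dequeue cycles on one buffer with either scheme. */
static struct toy_stats run_cycles(unsigned int n, bool cached)
{
	struct toy_stats s = { 0, 0, 0 };
	struct toy_buf b = { false };
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (cached) {
			cached_qbuf(&b, &s);
			cached_dqbuf(&b, &s);
		} else {
			naive_qbuf(&b, &s);
			naive_dqbuf(&b, &s);
		}
	}
	if (cached)
		cached_release(&b, &s);
	return s;
}
```

In this model the naive scheme performs one map and one unmap per cycle,
while the cached scheme maps and unmaps once per buffer lifetime and pays
only the sync steps per cycle. On cache-coherent hardware those syncs
would be expected to be cheap or no-ops, which is the point made in the
quoted text.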
> Trying to come up with a generic drm api has imo not much chance of
> getting accepted anytime soon (since for the simple pixel processor
> pipeline it's just duplicating v4l, and for something more generic/faster
> a generic interface is always too slow).

Okay, I don't want to make it DRM-wide. For us it is enough to have it at
Exynos DRM, like existing IPP ioctls.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2017-05-10 13:30 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20170420091406eucas1p24c50a0015545105081257d880727386c@eucas1p2.samsung.com>
2017-04-20  9:13 ` [RFC 0/4] Exynos DRM: add Picture Processor extension Marek Szyprowski
     [not found]   ` <CGME20170420091406eucas1p2ba4648e8e70ecca9c472017c21d654e1@eucas1p2.samsung.com>
2017-04-20  9:13     ` [RFC 1/4] drm: Export functions to create custom DRM objects Marek Szyprowski
     [not found]   ` <CGME20170420091407eucas1p2da1e16aa00e6d0bf8bd305422c3a9ba9@eucas1p2.samsung.com>
2017-04-20  9:13     ` [RFC 2/4] drm: Add support for vendor specific DRM objects with custom properties Marek Szyprowski
     [not found]   ` <CGME20170420091407eucas1p281bd7bb7f7b45855cf593ec8aed6136a@eucas1p2.samsung.com>
2017-04-20  9:13     ` [RFC 3/4] drm/exynos: Add Picture Processor framework Marek Szyprowski
     [not found]   ` <CGME20170420091408eucas1p2ef5b57fdcafcf13fbc52763f7cb43d45@eucas1p2.samsung.com>
2017-04-20  9:13     ` [RFC 4/4] drm/exynos: Convert Exynos Rotator driver to Picture Processor interface Marek Szyprowski
2017-04-20 10:25   ` [RFC 0/4] Exynos DRM: add Picture Processor extension Laurent Pinchart
2017-04-20 11:23     ` Marek Szyprowski
2017-04-20 12:17       ` Tobias Jakobi
2017-04-25 22:21       ` Sakari Ailus
2017-04-26 14:53         ` Nicolas Dufresne
2017-04-26 15:16           ` Tobias Jakobi
2017-04-26 15:16             ` Tobias Jakobi
2017-04-27 13:52             ` Marek Szyprowski
2017-04-27 13:52               ` Marek Szyprowski
2017-04-26 16:52           ` Tobias Jakobi
2017-04-26 16:52             ` Tobias Jakobi
2017-04-26 19:18             ` Nicolas Dufresne
2017-04-26 19:31               ` Tobias Jakobi
2017-04-26 19:36                 ` Nicolas Dufresne
2017-04-27 13:52         ` Marek Szyprowski
2017-05-10  1:24         ` Inki Dae
2017-05-10  5:38           ` Tomasz Figa
2017-05-10  6:27             ` Inki Dae
2017-05-10  6:38               ` Tomasz Figa
2017-05-10  6:38                 ` Tomasz Figa
2017-05-10 10:29                 ` Inki Dae
2017-05-10 13:18                   ` Daniel Vetter
2017-05-10 13:18                     ` Daniel Vetter
2017-05-10  7:55               ` Daniel Vetter
2017-05-10 10:31                 ` Inki Dae
2017-05-10 10:31                   ` Inki Dae
2017-05-10 13:30                 ` Marek Szyprowski
2017-04-20 19:02   ` Dave Airlie
2017-04-25  6:59     ` Marek Szyprowski
