dri-devel.lists.freedesktop.org archive mirror
* [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS
@ 2023-10-19 21:21 Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 01/17] drm/atomic: Allow get_value for immutable properties on atomic drivers Harry Wentland
                   ` (17 more replies)
  0 siblings, 18 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

This is an early RFC set for a color pipeline API, along with a
sample implementation in VKMS. All the key API bits are here.
VKMS now supports two named transfer function colorops, and we
have an IGT test that confirms that the sRGB EOTF followed by its
inverse gives the expected results to within +/- 1 codepoint at
8 bpc.

This patchset is grouped as follows:
 - Patches 1-2: a couple of general patches/fixes
 - Patches 3-5: introduce kunit to VKMS
 - Patch 6: description of motivation and details behind the
            Color Pipeline API. If you're reading nothing else
            but are interested in the topic I highly recommend
            you take a look at this.
 - Patches 7-15: Add core DRM API bits
 - Patches 16-17: VKMS implementation

There are plenty of things that I would like to see here but
haven't had a chance to look at. These will (hopefully) be
addressed in future iterations:
 - Abandon IOCTLs and discover colorops as clients iterate the pipeline
 - Add color_pipeline client cap and deprecate existing color encoding and
   color range properties.
   See https://lists.freedesktop.org/archives/dri-devel/2023-September/422643.html
 - Add CTM colorop to VKMS
 - Add custom LUT colorops to VKMS
 - Add pre-blending 3DLUT with tetrahedral interpolation to VKMS
 - How to support HW which can't bypass entire pipeline?
 - Add ability to create colorops that don't have BYPASS
 - Can we do a LOAD / COMMIT model for LUTs (and other properties)?

IGT tests can be found at
https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1

IGT patches are also being sent to the igt-dev mailing list.

libdrm changes to support the new IOCTLs are at
https://gitlab.freedesktop.org/hwentland/drm/-/merge_requests/1

If you prefer a gitlab MR for review you can find it at
https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5

A slightly different approach for a Color Pipeline API was sent by
Uma Shankar and can be found at
https://patchwork.freedesktop.org/series/123024/

The main difference is that his approach does not introduce a new DRM
core object but instead exposes color pipelines via blob properties.
There are pros and cons to both approaches.

v2:
 - Rebased on drm-misc-next
 - Introduce a VKMS Kunit so we can test LUT functionality in vkms_composer
 - Incorporate feedback in color_pipeline.rst doc
 - Add support for sRGB inverse EOTF
 - Add 2nd enumerated TF colorop to VKMS
 - Fix LUTs and some issues with applying LUTs in VKMS

Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>

Harry Wentland (17):
  drm/atomic: Allow get_value for immutable properties on atomic drivers
  drm: Don't treat 0 as -1 in drm_fixp2int_ceil
  drm/vkms: Create separate Kconfig file for VKMS
  drm/vkms: Add kunit tests for VKMS LUT handling
  drm/vkms: Avoid reading beyond LUT array
  drm/doc/rfc: Describe why prescriptive color pipeline is needed
  drm/colorop: Introduce new drm_colorop mode object
  drm/colorop: Add TYPE property
  drm/color: Add 1D Curve subtype
  drm/colorop: Add BYPASS property
  drm/colorop: Add NEXT property
  drm/colorop: Add atomic state print for drm_colorop
  drm/colorop: Add new IOCTLs to retrieve drm_colorop objects
  drm/plane: Add COLOR PIPELINE property
  drm/colorop: Add NEXT to colorop state print
  drm/vkms: Add enumerated 1D curve colorop
  drm/vkms: Add kunit tests for linear and sRGB LUTs

 Documentation/gpu/rfc/color_pipeline.rst      | 347 ++++++++
 drivers/gpu/drm/Kconfig                       |  14 +-
 drivers/gpu/drm/Makefile                      |   1 +
 drivers/gpu/drm/drm_atomic.c                  | 155 ++++
 drivers/gpu/drm/drm_atomic_helper.c           |  12 +
 drivers/gpu/drm/drm_atomic_state_helper.c     |   5 +
 drivers/gpu/drm/drm_atomic_uapi.c             | 110 +++
 drivers/gpu/drm/drm_colorop.c                 | 384 +++++++++
 drivers/gpu/drm/drm_crtc_internal.h           |   4 +
 drivers/gpu/drm/drm_ioctl.c                   |   5 +
 drivers/gpu/drm/drm_mode_config.c             |   7 +
 drivers/gpu/drm/drm_mode_object.c             |   3 +-
 drivers/gpu/drm/drm_plane_helper.c            |   2 +-
 drivers/gpu/drm/vkms/Kconfig                  |  20 +
 drivers/gpu/drm/vkms/Makefile                 |   6 +-
 drivers/gpu/drm/vkms/tests/.kunitconfig       |   4 +
 drivers/gpu/drm/vkms/tests/Makefile           |   4 +
 drivers/gpu/drm/vkms/tests/vkms_color_tests.c | 100 +++
 drivers/gpu/drm/vkms/vkms_colorop.c           |  85 ++
 drivers/gpu/drm/vkms/vkms_composer.c          |  77 +-
 drivers/gpu/drm/vkms/vkms_composer.h          |  25 +
 drivers/gpu/drm/vkms/vkms_drv.h               |   4 +
 drivers/gpu/drm/vkms/vkms_luts.c              | 802 ++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_luts.h              |  12 +
 drivers/gpu/drm/vkms/vkms_plane.c             |   2 +
 include/drm/drm_atomic.h                      |  82 ++
 include/drm/drm_atomic_uapi.h                 |   3 +
 include/drm/drm_colorop.h                     | 235 +++++
 include/drm/drm_fixed.h                       |   2 +-
 include/drm/drm_mode_config.h                 |  18 +
 include/drm/drm_plane.h                       |  10 +
 include/uapi/drm/drm.h                        |   3 +
 include/uapi/drm/drm_mode.h                   |  22 +
 33 files changed, 2530 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
 create mode 100644 drivers/gpu/drm/drm_colorop.c
 create mode 100644 drivers/gpu/drm/vkms/Kconfig
 create mode 100644 drivers/gpu/drm/vkms/tests/.kunitconfig
 create mode 100644 drivers/gpu/drm/vkms/tests/Makefile
 create mode 100644 drivers/gpu/drm/vkms/tests/vkms_color_tests.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_colorop.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_composer.h
 create mode 100644 drivers/gpu/drm/vkms/vkms_luts.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_luts.h
 create mode 100644 include/drm/drm_colorop.h

--
2.42.0



* [RFC PATCH v2 01/17] drm/atomic: Allow get_value for immutable properties on atomic drivers
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 02/17] drm: Don't treat 0 as -1 in drm_fixp2int_ceil Harry Wentland
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

drm_colorops use immutable properties for type and next.
Even though drivers create these properties at initialization,
they will need to read them back when parsing a color pipeline
for programming during an atomic check or commit operation.

This aligns the get_value call with the behavior of the
set_value call.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_mode_object.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_mode_object.c b/drivers/gpu/drm/drm_mode_object.c
index ac0d2ce3f870..c9b1cd48547a 100644
--- a/drivers/gpu/drm/drm_mode_object.c
+++ b/drivers/gpu/drm/drm_mode_object.c
@@ -351,7 +351,8 @@ static int __drm_object_property_get_value(struct drm_mode_object *obj,
 int drm_object_property_get_value(struct drm_mode_object *obj,
 				  struct drm_property *property, uint64_t *val)
 {
-	WARN_ON(drm_drv_uses_atomic_modeset(property->dev));
+	WARN_ON(drm_drv_uses_atomic_modeset(property->dev) &&
+		!(property->flags & DRM_MODE_PROP_IMMUTABLE));
 
 	return __drm_object_property_get_value(obj, property, val);
 }
-- 
2.42.0



* [RFC PATCH v2 02/17] drm: Don't treat 0 as -1 in drm_fixp2int_ceil
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 01/17] drm/atomic: Allow get_value for immutable properties on atomic drivers Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 03/17] drm/vkms: Create separate Kconfig file for VKMS Harry Wentland
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Unit testing this in VKMS shows that passing 0 into
this function returns -1, which is highly counter-
intuitive. Fix it by checking whether the input is
>= 0 instead of > 0.
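
The off-by-one can be reproduced outside the kernel with a minimal
user-space sketch. It assumes the 32.32 layout used by drm_fixed.h
(a 32-bit fractional part, with "almost one" being 2^32 - 1) and an
arithmetic right shift for negative values. The fixp2int* names are
stand-ins chosen for illustration, not the kernel helpers themselves:

```c
#include <stdint.h>

#define FIXED_POINT      32
#define FIXED_ONE        (1LL << FIXED_POINT)
#define FIXED_ALMOST_ONE (FIXED_ONE - 1)

/* Truncating conversion; assumes arithmetic shift for negative values. */
static inline int fixp2int(int64_t a)
{
	return (int)(a >> FIXED_POINT);
}

/* Behavior before this patch: an input of 0 takes the negative branch. */
static inline int fixp2int_ceil_old(int64_t a)
{
	if (a > 0)
		return fixp2int(a + FIXED_ALMOST_ONE);
	else
		return fixp2int(a - FIXED_ALMOST_ONE);
}

/* Behavior with this patch: 0 is handled by the non-negative branch. */
static inline int fixp2int_ceil_new(int64_t a)
{
	if (a >= 0)
		return fixp2int(a + FIXED_ALMOST_ONE);
	else
		return fixp2int(a - FIXED_ALMOST_ONE);
}
```

With these definitions fixp2int_ceil_old(0) evaluates
(0 - (2^32 - 1)) >> 32, which is -1, while fixp2int_ceil_new(0)
evaluates (2^32 - 1) >> 32, which is 0.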

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 include/drm/drm_fixed.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/drm_fixed.h b/include/drm/drm_fixed.h
index 6ea339d5de08..0c9f917a4d4b 100644
--- a/include/drm/drm_fixed.h
+++ b/include/drm/drm_fixed.h
@@ -95,7 +95,7 @@ static inline int drm_fixp2int_round(s64 a)
 
 static inline int drm_fixp2int_ceil(s64 a)
 {
-	if (a > 0)
+	if (a >= 0)
 		return drm_fixp2int(a + DRM_FIXED_ALMOST_ONE);
 	else
 		return drm_fixp2int(a - DRM_FIXED_ALMOST_ONE);
-- 
2.42.0



* [RFC PATCH v2 03/17] drm/vkms: Create separate Kconfig file for VKMS
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 01/17] drm/atomic: Allow get_value for immutable properties on atomic drivers Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 02/17] drm: Don't treat 0 as -1 in drm_fixp2int_ceil Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling Harry Wentland
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

This aligns with most other DRM drivers and will allow
us to add new VKMS config options without polluting
the DRM Kconfig.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/Kconfig      | 14 +-------------
 drivers/gpu/drm/vkms/Kconfig | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 13 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/Kconfig

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 48ca28a2e4ff..61ebd682c9b0 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -286,19 +286,7 @@ config DRM_VGEM
 	  as used by Mesa's software renderer for enhanced performance.
 	  If M is selected the module will be called vgem.
 
-config DRM_VKMS
-	tristate "Virtual KMS (EXPERIMENTAL)"
-	depends on DRM && MMU
-	select DRM_KMS_HELPER
-	select DRM_GEM_SHMEM_HELPER
-	select CRC32
-	default n
-	help
-	  Virtual Kernel Mode-Setting (VKMS) is used for testing or for
-	  running GPU in a headless machines. Choose this option to get
-	  a VKMS.
-
-	  If M is selected the module will be called vkms.
+source "drivers/gpu/drm/vkms/Kconfig"
 
 source "drivers/gpu/drm/exynos/Kconfig"
 
diff --git a/drivers/gpu/drm/vkms/Kconfig b/drivers/gpu/drm/vkms/Kconfig
new file mode 100644
index 000000000000..1816562381a2
--- /dev/null
+++ b/drivers/gpu/drm/vkms/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0+
+
+config DRM_VKMS
+	tristate "Virtual KMS (EXPERIMENTAL)"
+	depends on DRM && MMU
+	select DRM_KMS_HELPER
+	select DRM_GEM_SHMEM_HELPER
+	select CRC32
+	default n
+	help
+	  Virtual Kernel Mode-Setting (VKMS) is used for testing or for
+	  running GPU in a headless machines. Choose this option to get
+	  a VKMS.
+
+	  If M is selected the module will be called vkms.
-- 
2.42.0



* [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (2 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 03/17] drm/vkms: Create separate Kconfig file for VKMS Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-23 22:34   ` Arthur Grillo
  2023-10-19 21:21 ` [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array Harry Wentland
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Debugging LUT math is much easier when we can unit test
it. Add kunit functionality to VKMS and add tests for
 - get_lut_index
 - lerp_u16

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/vkms/Kconfig                  |  5 ++
 drivers/gpu/drm/vkms/Makefile                 |  2 +
 drivers/gpu/drm/vkms/tests/.kunitconfig       |  4 ++
 drivers/gpu/drm/vkms/tests/Makefile           |  4 ++
 drivers/gpu/drm/vkms/tests/vkms_color_tests.c | 64 +++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_composer.c          |  4 +-
 drivers/gpu/drm/vkms/vkms_composer.h          | 11 ++++
 7 files changed, 92 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/tests/.kunitconfig
 create mode 100644 drivers/gpu/drm/vkms/tests/Makefile
 create mode 100644 drivers/gpu/drm/vkms/tests/vkms_color_tests.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_composer.h

diff --git a/drivers/gpu/drm/vkms/Kconfig b/drivers/gpu/drm/vkms/Kconfig
index 1816562381a2..372cc5fa92f1 100644
--- a/drivers/gpu/drm/vkms/Kconfig
+++ b/drivers/gpu/drm/vkms/Kconfig
@@ -13,3 +13,8 @@ config DRM_VKMS
 	  a VKMS.
 
 	  If M is selected the module will be called vkms.
+
+config DRM_VKMS_KUNIT_TESTS
+	tristate "Tests for VKMS" if !KUNIT_ALL_TESTS
+	depends on DRM_VKMS && KUNIT
+	default KUNIT_ALL_TESTS
diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 1b28a6a32948..d3440f228f46 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -9,3 +9,5 @@ vkms-y := \
 	vkms_writeback.o
 
 obj-$(CONFIG_DRM_VKMS) += vkms.o
+
+obj-y += tests/
\ No newline at end of file
diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig b/drivers/gpu/drm/vkms/tests/.kunitconfig
new file mode 100644
index 000000000000..70e378228cbd
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/.kunitconfig
@@ -0,0 +1,4 @@
+CONFIG_KUNIT=y
+CONFIG_DRM=y
+CONFIG_DRM_VKMS=y
+CONFIG_DRM_VKMS_KUNIT_TESTS=y
diff --git a/drivers/gpu/drm/vkms/tests/Makefile b/drivers/gpu/drm/vkms/tests/Makefile
new file mode 100644
index 000000000000..761465332ff2
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0+
+
+obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += \
+	vkms_color_tests.o
\ No newline at end of file
diff --git a/drivers/gpu/drm/vkms/tests/vkms_color_tests.c b/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
new file mode 100644
index 000000000000..843b2e1d607e
--- /dev/null
+++ b/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#include <kunit/test.h>
+
+#include <drm/drm_fixed.h>
+
+#include "../vkms_composer.h"
+
+#define TEST_LUT_SIZE 16
+
+static struct drm_color_lut test_linear_array[TEST_LUT_SIZE] = {
+	{ 0x0, 0x0, 0x0, 0 },
+	{ 0x1111, 0x1111, 0x1111, 0 },
+	{ 0x2222, 0x2222, 0x2222, 0 },
+	{ 0x3333, 0x3333, 0x3333, 0 },
+	{ 0x4444, 0x4444, 0x4444, 0 },
+	{ 0x5555, 0x5555, 0x5555, 0 },
+	{ 0x6666, 0x6666, 0x6666, 0 },
+	{ 0x7777, 0x7777, 0x7777, 0 },
+	{ 0x8888, 0x8888, 0x8888, 0 },
+	{ 0x9999, 0x9999, 0x9999, 0 },
+	{ 0xaaaa, 0xaaaa, 0xaaaa, 0 },
+	{ 0xbbbb, 0xbbbb, 0xbbbb, 0 },
+	{ 0xcccc, 0xcccc, 0xcccc, 0 },
+	{ 0xdddd, 0xdddd, 0xdddd, 0 },
+	{ 0xeeee, 0xeeee, 0xeeee, 0 },
+	{ 0xffff, 0xffff, 0xffff, 0 },
+};
+
+const struct vkms_color_lut test_linear_lut = {
+	.base = test_linear_array,
+	.lut_length = TEST_LUT_SIZE,
+	.channel_value2index_ratio = 0xf000fll
+};
+
+
+static void vkms_color_test_get_lut_index(struct kunit *test)
+{
+	int i;
+
+	KUNIT_EXPECT_EQ(test, drm_fixp2int(get_lut_index(&test_linear_lut, test_linear_array[0].red)), 0);
+
+	for (i = 0; i < TEST_LUT_SIZE; i++)
+		KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&test_linear_lut, test_linear_array[i].red)), i);
+}
+
+static void vkms_color_test_lerp(struct kunit *test)
+{
+	KUNIT_EXPECT_EQ(test, lerp_u16(0x0, 0x10, 0x80000000), 0x8);
+}
+
+static struct kunit_case vkms_color_test_cases[] = {
+	KUNIT_CASE(vkms_color_test_get_lut_index),
+	KUNIT_CASE(vkms_color_test_lerp),
+	{}
+};
+
+static struct kunit_suite vkms_color_test_suite = {
+	.name = "vkms-color",
+	.test_cases = vkms_color_test_cases,
+};
+kunit_test_suite(vkms_color_test_suite);
+
+MODULE_LICENSE("GPL");
\ No newline at end of file
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 3c99fb8b54e2..a0a3a6fd2926 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -91,7 +91,7 @@ static void fill_background(const struct pixel_argb_u16 *background_color,
 }
 
 // lerp(a, b, t) = a + (b - a) * t
-static u16 lerp_u16(u16 a, u16 b, s64 t)
+u16 lerp_u16(u16 a, u16 b, s64 t)
 {
 	s64 a_fp = drm_int2fixp(a);
 	s64 b_fp = drm_int2fixp(b);
@@ -101,7 +101,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t)
 	return drm_fixp2int(a_fp + delta);
 }
 
-static s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value)
+s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value)
 {
 	s64 color_channel_fp = drm_int2fixp(channel_value);
 
diff --git a/drivers/gpu/drm/vkms/vkms_composer.h b/drivers/gpu/drm/vkms/vkms_composer.h
new file mode 100644
index 000000000000..11c5de9cc961
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_composer.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _VKMS_COMPOSER_H_
+#define _VKMS_COMPOSER_H_
+
+#include "vkms_drv.h"
+
+s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value);
+u16 lerp_u16(u16 a, u16 b, s64 t);
+
+#endif /* _VKMS_COMPOSER_H_ */
\ No newline at end of file
-- 
2.42.0



* [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (3 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-30 13:29   ` Pekka Paalanen
  2023-10-19 21:21 ` [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed Harry Wentland
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

When the floor LUT index (drm_fixp2int(lut_index)) is the last
index of the array, the ceil LUT index will point to an entry
beyond the array. Guard against this and use the value of the
floor LUT index instead.
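
The guard can be sketched in user space as follows. This is a
simplified single-channel version under stated assumptions: the LUT
is a plain u16 array, the index is a non-negative 32.32 value whose
floor is at most len - 1, and the lerp uses a plain integer multiply
instead of drm_fixp_mul. apply_lut is a hypothetical name, not the
VKMS function:

```c
#include <stddef.h>
#include <stdint.h>

#define FIXED_POINT        32
#define FIXED_ONE          (1LL << FIXED_POINT)
#define FIXED_DECIMAL_MASK (FIXED_ONE - 1)

/* lerp(a, b, t) = a + (b - a) * t, with t a 32.32 fraction in [0, 1). */
static uint16_t lerp_u16(uint16_t a, uint16_t b, int64_t t)
{
	int64_t a_fp = (int64_t)a << FIXED_POINT;
	/* integer difference times a 32.32 fraction is a 32.32 value */
	int64_t delta = ((int64_t)b - (int64_t)a) * t;

	return (uint16_t)((a_fp + delta) >> FIXED_POINT);
}

/* Interpolate a single-channel LUT at a non-negative 32.32 index. */
static uint16_t apply_lut(const uint16_t *lut, size_t len, int64_t index_fp)
{
	size_t floor_idx = (size_t)(index_fp >> FIXED_POINT);
	size_t ceil_idx = (size_t)((index_fp + FIXED_DECIMAL_MASK) >> FIXED_POINT);

	/* At the last entry there is no upper neighbor; reuse the floor entry. */
	if (floor_idx == len - 1)
		ceil_idx = floor_idx;

	return lerp_u16(lut[floor_idx], lut[ceil_idx],
			index_fp & FIXED_DECIMAL_MASK);
}
```

Without the floor_idx check, an index whose floor is len - 1 and
whose fractional part is non-zero would read lut[len], one entry
past the end of the array.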

Blurb about LUT creation and how first element should be 0x0 and
last one 0xffff.

Hold on, is that even correct? What should the ends of a LUT be?
How does UNORM work and how does it apply to LUTs?

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index a0a3a6fd2926..cf1dff162920 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -123,6 +123,8 @@ static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 chan
 				      enum lut_channel channel)
 {
 	s64 lut_index = get_lut_index(lut, channel_value);
+	u16 *floor_lut_value, *ceil_lut_value;
+	u16 floor_channel_value, ceil_channel_value;
 
 	/*
 	 * This checks if `struct drm_color_lut` has any gap added by the compiler
@@ -130,11 +132,15 @@ static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 chan
 	 */
 	static_assert(sizeof(struct drm_color_lut) == sizeof(__u16) * 4);
 
-	u16 *floor_lut_value = (__u16 *)&lut->base[drm_fixp2int(lut_index)];
-	u16 *ceil_lut_value = (__u16 *)&lut->base[drm_fixp2int_ceil(lut_index)];
+	floor_lut_value = (__u16 *)&lut->base[drm_fixp2int(lut_index)];
+	if (drm_fixp2int(lut_index) == (lut->lut_length - 1))
+		/* We're at the end of the LUT array, use same value for ceil and floor */
+		ceil_lut_value = floor_lut_value;
+	else
+		ceil_lut_value = (__u16 *)&lut->base[drm_fixp2int_ceil(lut_index)];
 
-	u16 floor_channel_value = floor_lut_value[channel];
-	u16 ceil_channel_value = ceil_lut_value[channel];
+	floor_channel_value = floor_lut_value[channel];
+	ceil_channel_value = ceil_lut_value[channel];
 
 	return lerp_u16(floor_channel_value, ceil_channel_value,
 			lut_index & DRM_FIXED_DECIMAL_MASK);
-- 
2.42.0



* [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (4 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-20 14:22   ` Sebastian Wick
  2023-11-08 12:18   ` Shankar, Uma
  2023-10-19 21:21 ` [RFC PATCH v2 07/17] drm/colorop: Introduce new drm_colorop mode object Harry Wentland
                   ` (11 subsequent siblings)
  17 siblings, 2 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

v2:
 - Update colorop visualizations to match reality (Sebastian, Alex Hung)
 - Updated wording (Pekka)
 - Change BYPASS wording to make it non-mandatory (Sebastian)
 - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
   section (Pekka)
 - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
 - Add "Driver Implementer's Guide" section (Pekka)
 - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 Documentation/gpu/rfc/color_pipeline.rst | 347 +++++++++++++++++++++++
 1 file changed, 347 insertions(+)
 create mode 100644 Documentation/gpu/rfc/color_pipeline.rst

diff --git a/Documentation/gpu/rfc/color_pipeline.rst b/Documentation/gpu/rfc/color_pipeline.rst
new file mode 100644
index 000000000000..af5f2ea29116
--- /dev/null
+++ b/Documentation/gpu/rfc/color_pipeline.rst
@@ -0,0 +1,347 @@
+========================
+Linux Color Pipeline API
+========================
+
+What problem are we solving?
+============================
+
+We would like to support complex pre- and post-blending color
+transformations in display controller hardware in order to allow for
+HW-supported HDR use-cases, as well as to provide support to
+color-managed applications, such as video or image editors.
+
+It is possible to support an HDR output on HW supporting the Colorspace
+and HDR Metadata drm_connector properties, but that requires the
+compositor or application to render and compose the content into one
+final buffer intended for display. Doing so is costly.
+
+Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and other
+operations to support color transformations. These operations are often
+implemented in fixed-function HW and therefore much more power efficient than
+performing similar operations via shaders or CPU.
+
+We would like to make use of this HW functionality to support complex color
+transformations with little or no CPU or shader load.
+
+
+How are other OSes solving this problem?
+========================================
+
+The most widely supported use-cases regard HDR content, whether video or
+gaming.
+
+Most OSes will specify the source content format (color gamut, encoding transfer
+function, and other metadata, such as max and average light levels) to a driver.
+Drivers will then program their fixed-function HW accordingly to map from a
+source content buffer's space to a display's space.
+
+When fixed-function HW is not available the compositor will assemble a shader to
+ask the GPU to perform the transformation from the source content format to the
+display's format.
+
+A compositor's mapping function and a driver's mapping function are usually
+entirely separate concepts. On OSes where a HW vendor has no insight into
+closed-source compositor code such a vendor will tune their color management
+code to visually match the compositor's. On other OSes, where both mapping
+functions are open to an implementer they will ensure both mappings match.
+
+This results in mapping algorithm lock-in, meaning that no-one alone can
+experiment with or introduce new mapping algorithms and achieve
+consistent results regardless of which implementation path is taken.
+
+Why is Linux different?
+=======================
+
+Unlike other OSes, where there is one compositor for one or more drivers, on
+Linux we have a many-to-many relationship. Many compositors; many drivers.
+In addition each compositor vendor or community has their own view of how
+color management should be done. This is what makes Linux so beautiful.
+
+This means that a HW vendor can now no longer tune their driver to one
+compositor, as tuning it to one could make it look fairly different from
+another compositor's color mapping.
+
+We need a better solution.
+
+
+Descriptive API
+===============
+
+An API that describes the source and destination colorspaces is a descriptive
+API. It describes the input and output color spaces but does not describe
+how precisely they should be mapped. Such a mapping includes many minute
+design decisions that can greatly affect the look of the final result.
+
+It is not feasible to describe such mapping with enough detail to ensure the
+same result from each implementation. In fact, these mappings are a very active
+research area.
+
+
+Prescriptive API
+================
+
+A prescriptive API does not describe the source and destination colorspaces. It
+instead prescribes a recipe for how to manipulate pixel values to arrive at the
+desired outcome.
+
+This recipe is generally an ordered list of straight-forward operations,
+with clear mathematical definitions, such as 1D LUTs, 3D LUTs, matrices,
+or other operations that can be described in a precise manner.
+
+
+The Color Pipeline API
+======================
+
+HW color management pipelines can significantly differ between HW
+vendors in terms of availability, ordering, and capabilities of HW
+blocks. This makes a common definition of color management blocks and
+their ordering nigh impossible. Instead we are defining an API that
+allows user space to discover the HW capabilities in a generic manner,
+agnostic of specific drivers and hardware.
+
+
+drm_colorop Object & IOCTLs
+===========================
+
+To support the definition of color pipelines we define the DRM core
+object type drm_colorop. Individual drm_colorop objects will be chained
+via the NEXT property of a drm_colorop to constitute a color pipeline.
+Each drm_colorop object is unique, i.e., even if multiple color
+pipelines have the same operation they won't share the same drm_colorop
+object to describe that operation.
+
+Note that drivers are not expected to map drm_colorop objects statically
+to specific HW blocks. The mapping of drm_colorop objects is entirely a
+driver-internal detail and can be as dynamic or static as a driver needs
+it to be. See more in the Driver Implementation Guide section below.
+
+Just like other DRM objects the drm_colorop objects are discovered via
+IOCTLs:
+
+DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to retrieve the
+number of all drm_colorop objects.
+
+DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one drm_colorop.
+It includes the ID for the colorop object, as well as the plane_id of
+the associated plane. All other values should be registered as
+properties.
+
+Each drm_colorop has three core properties:
+
+TYPE: The type of transformation, such as
+* enumerated curve
+* custom (uniform) 1D LUT
+* 3x3 matrix
+* 3x4 matrix
+* 3D LUT
+* etc.
+
+Depending on the type of transformation other properties will describe
+more details.
+
+BYPASS: A boolean property that can be used to easily put a block into
+bypass mode. While setting other properties might fail atomic check,
+setting the BYPASS property to true should never fail. The BYPASS
+property is not mandatory for a colorop, as long as the entire pipeline
+can get bypassed by setting the COLOR_PIPELINE on a plane to '0'.
+
+NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if this
+drm_colorop is the last in the chain.
+
+An example of a drm_colorop object might look like one of these::
+
+    /* 1D enumerated curve */
+    Color operation 42
+    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
+    ├─ "BYPASS": bool {true, false}
+    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
+    └─ "NEXT": immutable color operation ID = 43
+
+    /* custom 4k entry 1D LUT */
+    Color operation 52
+    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
+    ├─ "BYPASS": bool {true, false}
+    ├─ "LUT_1D_SIZE": immutable range = 4096
+    ├─ "LUT_1D": blob
+    └─ "NEXT": immutable color operation ID = 0
+
+    /* 17^3 3D LUT */
+    Color operation 72
+    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
+    ├─ "BYPASS": bool {true, false}
+    ├─ "LUT_3D_SIZE": immutable range = 17
+    ├─ "LUT_3D": blob
+    └─ "NEXT": immutable color operation ID = 73
+
+
+COLOR_PIPELINE Plane Property
+=============================
+
+Color Pipelines are created by a driver and advertised via a new
+COLOR_PIPELINE enum property on each plane. Values of the property
+always include '0', which is the default and means all color processing
+is disabled. Additional values will be the object IDs of the first
+drm_colorop in a pipeline. A driver can create and advertise none, one,
+or more possible color pipelines. A DRM client will select a color
+pipeline by setting the COLOR_PIPELINE property to the respective value.
+
+In the case where drivers have custom support for pre-blending color
+processing those drivers shall reject atomic commits that are trying to
+use both the custom color properties, as well as the COLOR_PIPELINE
+property.
+
+An example of a COLOR_PIPELINE property on a plane might look like this::
+
+    Plane 10
+    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
+    ├─ …
+    └─ "color_pipeline": enum {0, 42, 52} = 0
+
+
+Color Pipeline Discovery
+========================
+
+A DRM client wanting color management on a drm_plane will:
+
+1. Read all drm_colorop objects
+2. Get the COLOR_PIPELINE property of the plane
+3. Iterate all COLOR_PIPELINE enum values
+4. For each enum value walk the color pipeline (via the NEXT pointers)
+   and see if the available color operations are suitable for the
+   desired color management operations
+
+An example of chained properties to define an AMD pre-blending color
+pipeline might look like this::
+
+    Plane 10
+    ├─ "TYPE" (immutable) = Primary
+    └─ "COLOR_PIPELINE": enum {0, 44} = 0
+
+    Color operation 44
+    ├─ "TYPE" (immutable) = 1D enumerated curve
+    ├─ "BYPASS": bool
+    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
+    └─ "NEXT" (immutable) = 45
+
+    Color operation 45
+    ├─ "TYPE" (immutable) = 3x4 Matrix
+    ├─ "BYPASS": bool
+    ├─ "MATRIX_3_4": blob
+    └─ "NEXT" (immutable) = 46
+
+    Color operation 46
+    ├─ "TYPE" (immutable) = 1D enumerated curve
+    ├─ "BYPASS": bool
+    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB Inverse EOTF
+    └─ "NEXT" (immutable) = 47
+
+    Color operation 47
+    ├─ "TYPE" (immutable) = 1D LUT
+    ├─ "LUT_1D_SIZE": immutable range = 4096
+    ├─ "LUT_1D_DATA": blob
+    └─ "NEXT" (immutable) = 48
+
+    Color operation 48
+    ├─ "TYPE" (immutable) = 3D LUT
+    ├─ "LUT_3D_SIZE" (immutable) = 17
+    ├─ "LUT_3D_DATA": blob
+    └─ "NEXT" (immutable) = 49
+
+    Color operation 49
+    ├─ "TYPE" (immutable) = 1D enumerated curve
+    ├─ "BYPASS": bool
+    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
+    └─ "NEXT" (immutable) = 0
+
+
+Color Pipeline Programming
+==========================
+
+Once a DRM client has found a suitable pipeline it will:
+
+1. Set the COLOR_PIPELINE enum value to the one pointing at the first
+   drm_colorop object of the desired pipeline
+2. Set the properties for all drm_colorop objects in the pipeline to the
+   desired values, setting BYPASS to true for unused drm_colorop blocks,
+   and false for enabled drm_colorop blocks
+3. Perform atomic_check/commit as desired
+
+To configure the pipeline for an HDR10 PQ plane and blending in linear
+space, a compositor might perform an atomic commit with the following
+property values::
+
+    Plane 10
+    └─ "COLOR_PIPELINE" = 42
+
+    Color operation 42 (input CSC)
+    └─ "BYPASS" = true
+
+    Color operation 44 (DeGamma)
+    └─ "BYPASS" = true
+
+    Color operation 45 (gamut remap)
+    └─ "BYPASS" = true
+
+    Color operation 46 (shaper LUT RAM)
+    └─ "BYPASS" = true
+
+    Color operation 47 (3D LUT RAM)
+    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
+
+    Color operation 48 (blend gamma)
+    └─ "CURVE_1D_TYPE" = PQ EOTF
+
+
+Driver Implementer's Guide
+==========================
+
+What does this all mean for driver implementations? As noted above the
+colorops can map to HW directly but don't need to do so. Here are some
+suggestions on how to think about creating your color pipelines:
+
+- Try to expose pipelines that use already defined colorops, even if
+  your hardware pipeline is split differently. This allows existing
+  userspace to immediately take advantage of the hardware.
+
+- Additionally, try to expose your actual hardware blocks as colorops.
+  Define new colorop types where you believe it can offer significant
+  benefits if userspace learns to program them.
+
+- Avoid defining new colorops for compound operations with very narrow
+  scope. If you have a hardware block for a special operation that
+  cannot be split further, you can expose that as a new colorop type.
+  However, try to not define colorops for "use cases", especially if
+  they require you to combine multiple hardware blocks.
+
+- Design new colorops as prescriptive, not descriptive; by the
+  mathematical formula, not by the assumed input and output.
+
+A defined colorop type must be deterministic. Its operation can depend
+only on its properties and input and nothing else, allowed error
+tolerance notwithstanding.
+
+
+Driver Forward/Backward Compatibility
+=====================================
+
+As this is uAPI, drivers can't regress color pipelines that have been
+introduced for a given HW generation. New HW generations are free to
+abandon color pipelines advertised for previous generations.
+Nevertheless, it can be beneficial to carry support for existing color
+pipelines forward as those will likely already have support in DRM
+clients.
+
+Introducing new colorops to a pipeline is fine, as long as they can be
+disabled or are purely informational. DRM clients implementing support
+for the pipeline can always skip unknown properties as long as they can
+be confident that doing so will not cause unexpected results.
+
+If a new colorop doesn't fall into one of the above categories
+(bypassable or informational) the modified pipeline would be unusable
+for user space. In this case a new pipeline should be defined.
+
+
+References
+==========
+
+1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
\ No newline at end of file
-- 
2.42.0



* [RFC PATCH v2 07/17] drm/colorop: Introduce new drm_colorop mode object
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (5 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 08/17] drm/colorop: Add TYPE property Harry Wentland
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

This patch introduces a new drm_colorop mode object. This
object represents color transformations and can be used to
define color pipelines.

We also introduce the drm_colorop_state here, as well as
various helpers and state tracking bits.

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
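For illustration only: the duplicate, swap, and clear steps that this
patch adds for colorop state can be modelled in plain userspace C as
below. This is a simplified sketch, not the kernel code. All model_*
names are hypothetical stand-ins, locking and error pointers are left
out, and malloc/free replace the kernel allocators.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for drm_colorop_state and drm_colorop. */
struct model_colorop_state {
	bool bypass;
};

struct model_colorop {
	struct model_colorop_state *state;	/* currently committed state */
};

/* One slot of the per-commit array, like struct __drm_colorops_state. */
struct model_slot {
	struct model_colorop *ptr;
	struct model_colorop_state *state, *old_state, *new_state;
};

/* Duplicate the committed state into the slot so it can be modified. */
static struct model_colorop_state *
model_get_state(struct model_slot *slot, struct model_colorop *op)
{
	struct model_colorop_state *copy;

	if (slot->state)
		return slot->state;	/* already part of this commit */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, op->state, sizeof(*copy));

	slot->ptr = op;
	slot->state = copy;
	slot->old_state = op->state;
	slot->new_state = copy;
	return copy;
}

/* Commit: the object takes the new state, the slot keeps the old one. */
static void model_swap_state(struct model_slot *slot)
{
	slot->state = slot->old_state;
	slot->ptr->state = slot->new_state;
}

/* Clear: free whatever state the slot still owns and reset the slot. */
static void model_clear(struct model_slot *slot)
{
	free(slot->state);
	memset(slot, 0, sizeof(*slot));
}
```

The point of the model is ownership: before the swap the slot owns the
duplicated state, after the swap it owns the previously committed one,
so clearing the slot frees the right object in either case.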
 drivers/gpu/drm/Makefile            |   1 +
 drivers/gpu/drm/drm_atomic.c        |  79 +++++++++++++
 drivers/gpu/drm/drm_atomic_helper.c |  12 ++
 drivers/gpu/drm/drm_atomic_uapi.c   |  48 ++++++++
 drivers/gpu/drm/drm_colorop.c       | 169 ++++++++++++++++++++++++++++
 drivers/gpu/drm/drm_mode_config.c   |   7 ++
 drivers/gpu/drm/drm_plane_helper.c  |   2 +-
 include/drm/drm_atomic.h            |  82 ++++++++++++++
 include/drm/drm_atomic_uapi.h       |   1 +
 include/drm/drm_colorop.h           | 157 ++++++++++++++++++++++++++
 include/drm/drm_mode_config.h       |  18 +++
 include/drm/drm_plane.h             |   2 +
 include/uapi/drm/drm.h              |   3 +
 include/uapi/drm/drm_mode.h         |   1 +
 14 files changed, 581 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/drm_colorop.c
 create mode 100644 include/drm/drm_colorop.h

diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 8e1bde059170..7ba67f9775e7 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -16,6 +16,7 @@ drm-y := \
 	drm_client.o \
 	drm_client_modeset.o \
 	drm_color_mgmt.o \
+	drm_colorop.o \
 	drm_connector.o \
 	drm_crtc.o \
 	drm_displayid.o \
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index f1a503aafe5a..d55db5a06940 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -42,6 +42,7 @@
 #include <drm/drm_mode.h>
 #include <drm/drm_print.h>
 #include <drm/drm_writeback.h>
+#include <drm/drm_colorop.h>
 
 #include "drm_crtc_internal.h"
 #include "drm_internal.h"
@@ -108,6 +109,7 @@ void drm_atomic_state_default_release(struct drm_atomic_state *state)
 	kfree(state->connectors);
 	kfree(state->crtcs);
 	kfree(state->planes);
+	kfree(state->colorops);
 	kfree(state->private_objs);
 }
 EXPORT_SYMBOL(drm_atomic_state_default_release);
@@ -139,6 +141,10 @@ drm_atomic_state_init(struct drm_device *dev, struct drm_atomic_state *state)
 				sizeof(*state->planes), GFP_KERNEL);
 	if (!state->planes)
 		goto fail;
+	state->colorops = kcalloc(dev->mode_config.num_colorop,
+				  sizeof(*state->colorops), GFP_KERNEL);
+	if (!state->colorops)
+		goto fail;
 
 	/*
 	 * Because drm_atomic_state can be committed asynchronously we need our
@@ -250,6 +256,20 @@ void drm_atomic_state_default_clear(struct drm_atomic_state *state)
 		state->planes[i].new_state = NULL;
 	}
 
+	for (i = 0; i < config->num_colorop; i++) {
+		struct drm_colorop *colorop = state->colorops[i].ptr;
+
+		if (!colorop)
+			continue;
+
+		drm_colorop_atomic_destroy_state(colorop,
+						 state->colorops[i].state);
+		state->colorops[i].ptr = NULL;
+		state->colorops[i].state = NULL;
+		state->colorops[i].old_state = NULL;
+		state->colorops[i].new_state = NULL;
+	}
+
 	for (i = 0; i < state->num_private_objs; i++) {
 		struct drm_private_obj *obj = state->private_objs[i].ptr;
 
@@ -571,6 +591,65 @@ drm_atomic_get_plane_state(struct drm_atomic_state *state,
 }
 EXPORT_SYMBOL(drm_atomic_get_plane_state);
 
+
+/**
+ * drm_atomic_get_colorop_state - get colorop state
+ * @state: global atomic state object
+ * @colorop: colorop to get state object for
+ *
+ * This function returns the colorop state for the given colorop, allocating it
+ * if needed. It will also grab the relevant plane lock to make sure that the
+ * state is consistent.
+ *
+ * Returns:
+ *
+ * Either the allocated state or the error code encoded into the pointer. When
+ * the error is EDEADLK then the w/w mutex code has detected a deadlock and the
+ * entire atomic sequence must be restarted. All other errors are fatal.
+ */
+struct drm_colorop_state *
+drm_atomic_get_colorop_state(struct drm_atomic_state *state,
+			     struct drm_colorop *colorop)
+{
+	int ret, index = drm_colorop_index(colorop);
+	struct drm_colorop_state *colorop_state;
+	struct drm_plane_state *plane_state;
+
+	WARN_ON(!state->acquire_ctx);
+
+	colorop_state = drm_atomic_get_existing_colorop_state(state, colorop);
+	if (colorop_state)
+		return colorop_state;
+
+	/* TODO where is the unlock? */
+	ret = drm_modeset_lock(&colorop->plane->mutex, state->acquire_ctx);
+	if (ret)
+		return ERR_PTR(ret);
+
+	colorop_state = drm_atomic_helper_colorop_duplicate_state(colorop);
+	if (!colorop_state)
+		return ERR_PTR(-ENOMEM);
+
+	state->colorops[index].state = colorop_state;
+	state->colorops[index].ptr = colorop;
+	state->colorops[index].old_state = colorop->state;
+	state->colorops[index].new_state = colorop_state;
+	colorop_state->state = state;
+
+	drm_dbg_atomic(colorop->dev, "Added [COLOROP:%d] %p state to %p\n",
+		       colorop->base.id, colorop_state, state);
+
+	/* TODO is this necessary? */
+
+	plane_state = drm_atomic_get_plane_state(state,
+						 colorop->plane);
+	if (IS_ERR(plane_state))
+		return ERR_CAST(plane_state);
+
+	return colorop_state;
+}
+EXPORT_SYMBOL(drm_atomic_get_colorop_state);
+
 static bool
 plane_switching_crtc(const struct drm_plane_state *old_plane_state,
 		     const struct drm_plane_state *new_plane_state)
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 71d399397107..54c3d66d008f 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -2985,6 +2985,8 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
 	struct drm_crtc_state *old_crtc_state, *new_crtc_state;
 	struct drm_plane *plane;
 	struct drm_plane_state *old_plane_state, *new_plane_state;
+	struct drm_colorop *colorop;
+	struct drm_colorop_state *old_colorop_state, *new_colorop_state;
 	struct drm_crtc_commit *commit;
 	struct drm_private_obj *obj;
 	struct drm_private_state *old_obj_state, *new_obj_state;
@@ -3062,6 +3064,16 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
 		}
 	}
 
+	for_each_oldnew_colorop_in_state(state, colorop, old_colorop_state, new_colorop_state, i) {
+		WARN_ON(colorop->state != old_colorop_state);
+
+		old_colorop_state->state = state;
+		new_colorop_state->state = NULL;
+
+		state->colorops[i].state = old_colorop_state;
+		colorop->state = new_colorop_state;
+	}
+
 	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
 		WARN_ON(plane->state != old_plane_state);
 
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index 98d3b10c08ae..21da1b327ee9 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -34,6 +34,7 @@
 #include <drm/drm_drv.h>
 #include <drm/drm_writeback.h>
 #include <drm/drm_vblank.h>
+#include <drm/drm_colorop.h>
 
 #include <linux/dma-fence.h>
 #include <linux/uaccess.h>
@@ -664,6 +665,26 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
 	return 0;
 }
 
+
+static int drm_atomic_colorop_set_property(struct drm_colorop *colorop,
+		struct drm_colorop_state *state, struct drm_file *file_priv,
+		struct drm_property *property, uint64_t val)
+{
+	drm_dbg_atomic(colorop->dev,
+			"[COLOROP:%d] unknown property [PROP:%d:%s]\n",
+			colorop->base.id,
+			property->base.id, property->name);
+	return -EINVAL;
+}
+
+static int
+drm_atomic_colorop_get_property(struct drm_colorop *colorop,
+		const struct drm_colorop_state *state,
+		struct drm_property *property, uint64_t *val)
+{
+	return -EINVAL;
+}
+
 static int drm_atomic_set_writeback_fb_for_connector(
 		struct drm_connector_state *conn_state,
 		struct drm_framebuffer *fb)
@@ -926,6 +947,16 @@ int drm_atomic_get_property(struct drm_mode_object *obj,
 				plane->state, property, val);
 		break;
 	}
+	case DRM_MODE_OBJECT_COLOROP: {
+		struct drm_colorop *colorop = obj_to_colorop(obj);
+
+		if (colorop->plane)
+			WARN_ON(!drm_modeset_is_locked(&colorop->plane->mutex));
+
+		ret = drm_atomic_colorop_get_property(colorop,
+				colorop->state, property, val);
+		break;
+	}
 	default:
 		drm_dbg_atomic(dev, "[OBJECT:%d] has no properties\n", obj->id);
 		ret = -EINVAL;
@@ -1061,6 +1092,23 @@ int drm_atomic_set_property(struct drm_atomic_state *state,
 		ret = drm_atomic_plane_set_property(plane,
 				plane_state, file_priv,
 				prop, prop_value);
+
+		break;
+	}
+	case DRM_MODE_OBJECT_COLOROP: {
+		struct drm_colorop *colorop = obj_to_colorop(obj);
+		struct drm_colorop_state *colorop_state;
+
+		colorop_state = drm_atomic_get_colorop_state(state, colorop);
+		if (IS_ERR(colorop_state)) {
+			ret = PTR_ERR(colorop_state);
+			break;
+		}
+
+		ret = drm_atomic_colorop_set_property(colorop,
+				colorop_state, file_priv,
+				prop, prop_value);
+
 		break;
 	}
 	default:
diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
new file mode 100644
index 000000000000..78d6a0067f5b
--- /dev/null
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -0,0 +1,169 @@
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+#include <drm/drm_colorop.h>
+#include <drm/drm_print.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_plane.h>
+
+#include "drm_crtc_internal.h"
+
+/* TODO big colorop doc, including properties, etc. */
+
+/* Init Helpers */
+
+int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
+		     struct drm_plane *plane)
+{
+	struct drm_mode_config *config = &dev->mode_config;
+	int ret = 0;
+
+	ret = drm_mode_object_add(dev, &colorop->base, DRM_MODE_OBJECT_COLOROP);
+	if (ret)
+		return ret;
+
+	colorop->base.properties = &colorop->properties;
+	colorop->dev = dev;
+	colorop->plane = plane;
+
+	list_add_tail(&colorop->head, &config->colorop_list);
+	colorop->index = config->num_colorop++;
+
+	/* add properties */
+	return ret;
+}
+EXPORT_SYMBOL(drm_colorop_init);
+
+void __drm_atomic_helper_colorop_duplicate_state(struct drm_colorop *colorop,
+						 struct drm_colorop_state *state)
+{
+	memcpy(state, colorop->state, sizeof(*state));
+}
+
+struct drm_colorop_state *
+drm_atomic_helper_colorop_duplicate_state(struct drm_colorop *colorop)
+{
+	struct drm_colorop_state *state;
+
+	if (WARN_ON(!colorop->state))
+		return NULL;
+
+	state = kmalloc(sizeof(*state), GFP_KERNEL);
+	if (state)
+		__drm_atomic_helper_colorop_duplicate_state(colorop, state);
+
+	return state;
+}
+
+
+void drm_colorop_atomic_destroy_state(struct drm_colorop *colorop,
+				      struct drm_colorop_state *state)
+{
+	kfree(state);
+}
+
+/**
+ * __drm_colorop_destroy_state - release colorop state
+ * @state: colorop state object to release
+ *
+ * Releases all resources stored in the colorop state without actually freeing
+ * the memory of the colorop state. This is useful for drivers that subclass the
+ * colorop state.
+ */
+void __drm_colorop_destroy_state(struct drm_colorop_state *state)
+{
+	/* TODO might need this later */
+}
+
+/**
+ * drm_colorop_destroy_state - default state destroy hook
+ * @colorop: drm colorop
+ * @state: colorop state object to release
+ *
+ * Default colorop state destroy hook for drivers which don't have their own
+ * subclassed colorop state structure.
+ */
+void drm_colorop_destroy_state(struct drm_colorop *colorop,
+					   struct drm_colorop_state *state)
+{
+	kfree(state);
+}
+EXPORT_SYMBOL(drm_colorop_destroy_state);
+
+/**
+ * __drm_colorop_state_reset - resets colorop state to default values
+ * @colorop_state: atomic colorop state, must not be NULL
+ * @colorop: colorop object, must not be NULL
+ *
+ * Initializes the newly allocated @colorop_state with default
+ * values. This is useful for drivers that subclass the colorop state.
+ */
+void __drm_colorop_state_reset(struct drm_colorop_state *colorop_state,
+					   struct drm_colorop *colorop)
+{
+	colorop_state->colorop = colorop;
+}
+EXPORT_SYMBOL(__drm_colorop_state_reset);
+
+/**
+ * __drm_colorop_reset - reset state on colorop
+ * @colorop: drm colorop
+ * @colorop_state: colorop state to assign
+ *
+ * Initializes the newly allocated @colorop_state and assigns it to
+ * the &drm_colorop->state pointer of @colorop, usually required when
+ * initializing the drivers or when called from the &drm_colorop_funcs.reset
+ * hook.
+ *
+ * This is useful for drivers that subclass the colorop state.
+ */
+void __drm_colorop_reset(struct drm_colorop *colorop,
+				     struct drm_colorop_state *colorop_state)
+{
+	if (colorop_state)
+		__drm_colorop_state_reset(colorop_state, colorop);
+
+	colorop->state = colorop_state;
+}
+
+/**
+ * drm_colorop_reset - reset colorop atomic state
+ * @colorop: drm colorop
+ *
+ * Resets the atomic state for @colorop by freeing the state pointer (which might
+ * be NULL, e.g. at driver load time) and allocating a new empty state object.
+ */
+void drm_colorop_reset(struct drm_colorop *colorop)
+{
+	if (colorop->state)
+		__drm_colorop_destroy_state(colorop->state);
+
+	kfree(colorop->state);
+	colorop->state = kzalloc(sizeof(*colorop->state), GFP_KERNEL);
+
+	if (colorop->state)
+		__drm_colorop_reset(colorop, colorop->state);
+}
+EXPORT_SYMBOL(drm_colorop_reset);
diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c
index 8525ef851540..30c6fb10353b 100644
--- a/drivers/gpu/drm/drm_mode_config.c
+++ b/drivers/gpu/drm/drm_mode_config.c
@@ -29,6 +29,7 @@
 #include <drm/drm_managed.h>
 #include <drm/drm_mode_config.h>
 #include <drm/drm_print.h>
+#include <drm/drm_colorop.h>
 #include <linux/dma-resv.h>
 
 #include "drm_crtc_internal.h"
@@ -182,11 +183,15 @@ int drm_mode_getresources(struct drm_device *dev, void *data,
 void drm_mode_config_reset(struct drm_device *dev)
 {
 	struct drm_crtc *crtc;
+	struct drm_colorop *colorop;
 	struct drm_plane *plane;
 	struct drm_encoder *encoder;
 	struct drm_connector *connector;
 	struct drm_connector_list_iter conn_iter;
 
+	drm_for_each_colorop(colorop, dev)
+		drm_colorop_reset(colorop);
+
 	drm_for_each_plane(plane, dev)
 		if (plane->funcs->reset)
 			plane->funcs->reset(plane);
@@ -413,6 +418,7 @@ int drmm_mode_config_init(struct drm_device *dev)
 	INIT_LIST_HEAD(&dev->mode_config.property_list);
 	INIT_LIST_HEAD(&dev->mode_config.property_blob_list);
 	INIT_LIST_HEAD(&dev->mode_config.plane_list);
+	INIT_LIST_HEAD(&dev->mode_config.colorop_list);
 	INIT_LIST_HEAD(&dev->mode_config.privobj_list);
 	idr_init_base(&dev->mode_config.object_idr, 1);
 	idr_init_base(&dev->mode_config.tile_idr, 1);
@@ -434,6 +440,7 @@ int drmm_mode_config_init(struct drm_device *dev)
 	dev->mode_config.num_crtc = 0;
 	dev->mode_config.num_encoder = 0;
 	dev->mode_config.num_total_plane = 0;
+	dev->mode_config.num_colorop = 0;
 
 	if (IS_ENABLED(CONFIG_LOCKDEP)) {
 		struct drm_modeset_acquire_ctx modeset_ctx;
diff --git a/drivers/gpu/drm/drm_plane_helper.c b/drivers/gpu/drm/drm_plane_helper.c
index 5e95089676ff..912580eca1e5 100644
--- a/drivers/gpu/drm/drm_plane_helper.c
+++ b/drivers/gpu/drm/drm_plane_helper.c
@@ -310,4 +310,4 @@ int drm_plane_helper_atomic_check(struct drm_plane *plane, struct drm_atomic_sta
 						   DRM_PLANE_NO_SCALING,
 						   false, false);
 }
-EXPORT_SYMBOL(drm_plane_helper_atomic_check);
+EXPORT_SYMBOL(drm_plane_helper_atomic_check);
\ No newline at end of file
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
index cf8e1220a4ac..634b2827765f 100644
--- a/include/drm/drm_atomic.h
+++ b/include/drm/drm_atomic.h
@@ -30,6 +30,7 @@
 
 #include <drm/drm_crtc.h>
 #include <drm/drm_util.h>
+#include <drm/drm_colorop.h>
 
 /**
  * struct drm_crtc_commit - track modeset commits on a CRTC
@@ -157,6 +158,11 @@ struct drm_crtc_commit {
 	bool abort_completion;
 };
 
+struct __drm_colorops_state {
+	struct drm_colorop *ptr;
+	struct drm_colorop_state *state, *old_state, *new_state;
+};
+
 struct __drm_planes_state {
 	struct drm_plane *ptr;
 	struct drm_plane_state *state, *old_state, *new_state;
@@ -398,6 +404,7 @@ struct drm_atomic_state {
 	 * states.
 	 */
 	bool duplicated : 1;
+	struct __drm_colorops_state *colorops;
 	struct __drm_planes_state *planes;
 	struct __drm_crtcs_state *crtcs;
 	int num_connector;
@@ -501,6 +508,9 @@ drm_atomic_get_crtc_state(struct drm_atomic_state *state,
 struct drm_plane_state * __must_check
 drm_atomic_get_plane_state(struct drm_atomic_state *state,
 			   struct drm_plane *plane);
+struct drm_colorop_state *
+drm_atomic_get_colorop_state(struct drm_atomic_state *state,
+			     struct drm_colorop *colorop);
 struct drm_connector_state * __must_check
 drm_atomic_get_connector_state(struct drm_atomic_state *state,
 			       struct drm_connector *connector);
@@ -630,6 +640,55 @@ drm_atomic_get_new_plane_state(const struct drm_atomic_state *state,
 	return state->planes[drm_plane_index(plane)].new_state;
 }
 
+
+/**
+ * drm_atomic_get_existing_colorop_state - get colorop state, if it exists
+ * @state: global atomic state object
+ * @colorop: colorop to grab
+ *
+ * This function returns the colorop state for the given colorop, or NULL
+ * if the colorop is not part of the global atomic state.
+ *
+ * This function is deprecated; drm_atomic_get_old_colorop_state() or
+ * drm_atomic_get_new_colorop_state() should be used instead.
+ */
+static inline struct drm_colorop_state *
+drm_atomic_get_existing_colorop_state(struct drm_atomic_state *state,
+				    struct drm_colorop *colorop)
+{
+	return state->colorops[drm_colorop_index(colorop)].state;
+}
+
+/**
+ * drm_atomic_get_old_colorop_state - get old colorop state, if it exists
+ * @state: global atomic state object
+ * @colorop: colorop to grab
+ *
+ * This function returns the old colorop state for the given colorop, or
+ * NULL if the colorop is not part of the global atomic state.
+ */
+static inline struct drm_colorop_state *
+drm_atomic_get_old_colorop_state(struct drm_atomic_state *state,
+			       struct drm_colorop *colorop)
+{
+	return state->colorops[drm_colorop_index(colorop)].old_state;
+}
+
+/**
+ * drm_atomic_get_new_colorop_state - get new colorop state, if it exists
+ * @state: global atomic state object
+ * @colorop: colorop to grab
+ *
+ * This function returns the new colorop state for the given colorop, or
+ * NULL if the colorop is not part of the global atomic state.
+ */
+static inline struct drm_colorop_state *
+drm_atomic_get_new_colorop_state(struct drm_atomic_state *state,
+			       struct drm_colorop *colorop)
+{
+	return state->colorops[drm_colorop_index(colorop)].new_state;
+}
+
 /**
  * drm_atomic_get_existing_connector_state - get connector state, if it exists
  * @state: global atomic state object
@@ -877,6 +936,29 @@ void drm_state_dump(struct drm_device *dev, struct drm_printer *p);
 			     (new_crtc_state) = (__state)->crtcs[__i].new_state, \
 			     (void)(new_crtc_state) /* Only to avoid unused-but-set-variable warning */, 1))
 
+/**
+ * for_each_oldnew_colorop_in_state - iterate over all colorops in an atomic update
+ * @__state: &struct drm_atomic_state pointer
+ * @colorop: &struct drm_colorop iteration cursor
+ * @old_colorop_state: &struct drm_colorop_state iteration cursor for the old state
+ * @new_colorop_state: &struct drm_colorop_state iteration cursor for the new state
+ * @__i: int iteration cursor, for macro-internal use
+ *
+ * This iterates over all colorops in an atomic update, tracking both old and
+ * new state. This is useful in places where the state delta needs to be
+ * considered, for example in atomic check functions.
+ */
+#define for_each_oldnew_colorop_in_state(__state, colorop, old_colorop_state, new_colorop_state, __i) \
+	for ((__i) = 0;							\
+	     (__i) < (__state)->dev->mode_config.num_colorop;	\
+	     (__i)++)							\
+		for_each_if ((__state)->colorops[__i].ptr &&		\
+			     ((colorop) = (__state)->colorops[__i].ptr,	\
+			      (void)(colorop) /* Only to avoid unused-but-set-variable warning */, \
+			      (old_colorop_state) = (__state)->colorops[__i].old_state,\
+			      (new_colorop_state) = (__state)->colorops[__i].new_state, 1))
+
+
 /**
  * for_each_oldnew_plane_in_state - iterate over all planes in an atomic update
  * @__state: &struct drm_atomic_state pointer
diff --git a/include/drm/drm_atomic_uapi.h b/include/drm/drm_atomic_uapi.h
index 4c6d39d7bdb2..70a115d523cd 100644
--- a/include/drm/drm_atomic_uapi.h
+++ b/include/drm/drm_atomic_uapi.h
@@ -37,6 +37,7 @@ struct drm_crtc;
 struct drm_connector_state;
 struct dma_fence;
 struct drm_framebuffer;
+struct drm_colorop;
 
 int __must_check
 drm_atomic_set_mode_for_crtc(struct drm_crtc_state *state,
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
new file mode 100644
index 000000000000..3dd169b0317d
--- /dev/null
+++ b/include/drm/drm_colorop.h
@@ -0,0 +1,157 @@
+/*
+ * Copyright (C) 2023 Advanced Micro Devices, Inc. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: AMD
+ *
+ */
+
+#ifndef __DRM_COLOROP_H__
+#define __DRM_COLOROP_H__
+
+#include <drm/drm_mode_object.h>
+#include <drm/drm_mode.h>
+#include <drm/drm_property.h>
+
+/**
+ * struct drm_colorop_state - mutable colorop state
+ */
+struct drm_colorop_state {
+	/** @colorop: backpointer to the colorop */
+	struct drm_colorop *colorop;
+
+	/* colorop properties */
+
+	/** @state: backpointer to global drm_atomic_state */
+	struct drm_atomic_state *state;
+};
+
+/**
+ * struct drm_colorop - DRM color operation control structure
+ *
+ * A colorop represents one color operation. They can be chained via
+ * the 'next' pointer to build a color pipeline.
+ */
+struct drm_colorop {
+	/** @dev: parent DRM device */
+	struct drm_device *dev;
+
+	/**
+	 * @head:
+	 *
+	 * List of all colorops on @dev, linked from &drm_mode_config.colorop_list.
+	 * Invariant over the lifetime of @dev and therefore does not need
+	 * locking.
+	 */
+	struct list_head head;
+
+	/**
+	 * @index: Position inside &drm_mode_config.colorop_list, can be used as
+	 * an array index. It is invariant over the lifetime of the colorop.
+	 */
+	unsigned index;
+
+	/* TODO do we need a separate mutex or will we tag along with the plane mutex? */
+
+	/** @base: base mode object */
+	struct drm_mode_object base;
+
+	/**
+	 * @plane:
+	 *
+	 * The plane on which the colorop sits. A drm_colorop is always unique
+	 * to a plane.
+	 */
+	struct drm_plane *plane;
+
+	/**
+	 * @state:
+	 *
+	 * Current atomic state for this colorop.
+	 *
+	 * This is protected by @mutex. Note that nonblocking atomic commits
+	 * access the current colorop state without taking locks. Either by
+	 * going through the &struct drm_atomic_state pointers, see
+	 * for_each_oldnew_plane_in_state(), for_each_old_plane_in_state() and
+	 * for_each_new_plane_in_state(). Or through careful ordering of atomic
+	 * commit operations as implemented in the atomic helpers, see
+	 * &struct drm_crtc_commit.
+	 *
+	 * TODO keep, remove, or rewrite above plane references?
+	 */
+	struct drm_colorop_state *state;
+
+	/* colorop properties */
+
+	/** @properties: property tracking for this plane */
+	struct drm_object_properties properties;
+
+};
+
+#define obj_to_colorop(x) container_of(x, struct drm_colorop, base)
+
+/**
+ * drm_colorop_find - look up a Colorop object from its ID
+ * @dev: DRM device
+ * @file_priv: drm file to check for lease against.
+ * @id: &drm_mode_object ID
+ *
+ * This can be used to look up a colorop from its userspace ID. It is only used
+ * by drivers for legacy IOCTLs and interfaces; nowadays extensions to the KMS
+ * userspace interface should be done using &drm_property.
+ */
+static inline struct drm_colorop *drm_colorop_find(struct drm_device *dev,
+		struct drm_file *file_priv,
+		uint32_t id)
+{
+	struct drm_mode_object *mo;
+	mo = drm_mode_object_find(dev, file_priv, id, DRM_MODE_OBJECT_COLOROP);
+	return mo ? obj_to_colorop(mo) : NULL;
+}
+
+int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
+		     struct drm_plane *plane);
+
+struct drm_colorop_state *
+drm_atomic_helper_colorop_duplicate_state(struct drm_colorop *colorop);
+
+void drm_colorop_atomic_destroy_state(struct drm_colorop *colorop,
+				      struct drm_colorop_state *state);
+
+void drm_colorop_reset(struct drm_colorop *colorop);
+
+/**
+ * drm_colorop_index - find the index of a registered colorop
+ * @colorop: colorop to find index for
+ *
+ * Given a registered colorop, return the index of that colorop within a DRM
+ * device's list of colorops.
+ */
+static inline unsigned int drm_colorop_index(const struct drm_colorop *colorop)
+{
+	return colorop->index;
+}
+
+
+#define drm_for_each_colorop(colorop, dev) \
+	list_for_each_entry(colorop, &(dev)->mode_config.colorop_list, head)
+
+
+#endif /* __DRM_COLOROP_H__ */
diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
index 973119a9176b..492b8c120c80 100644
--- a/include/drm/drm_mode_config.h
+++ b/include/drm/drm_mode_config.h
@@ -505,6 +505,24 @@ struct drm_mode_config {
 	 */
 	struct list_head plane_list;
 
+	/**
+	 * @num_colorop:
+	 *
+	 * Number of colorop objects on this device.
+	 * This is invariant over the lifetime of a device and hence doesn't
+	 * need any locks.
+	 */
+	int num_colorop;
+
+	/**
+	 * @colorop_list:
+	 *
+	 * List of colorop objects linked with &drm_colorop.head. This is
+	 * invariant over the lifetime of a device and hence doesn't need any
+	 * locks.
+	 */
+	struct list_head colorop_list;
+
 	/**
 	 * @num_crtc:
 	 *
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index 79d62856defb..57bbd0cd73a9 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -227,6 +227,8 @@ struct drm_plane_state {
 	 */
 	enum drm_scaling_filter scaling_filter;
 
+	struct drm_colorop *color_pipeline;
+
 	/**
 	 * @commit: Tracks the pending commit to prevent use-after-free conditions,
 	 * and for async plane updates.
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 794c1d857677..b22adfabc677 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -1198,6 +1198,9 @@ extern "C" {
 
 #define DRM_IOCTL_SYNCOBJ_EVENTFD	DRM_IOWR(0xCF, struct drm_syncobj_eventfd)
 
+#define DRM_IOCTL_MODE_GETCOLOROPRESOURCES DRM_IOWR(0xD0, struct drm_mode_get_colorop_res)
+#define DRM_IOCTL_MODE_GETCOLOROP          DRM_IOWR(0xD1, struct drm_mode_get_colorop)
+
 /*
  * Device specific ioctls should only be in their respective headers
  * The device specific ioctl range is from 0x40 to 0x9f.
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index ea1b639bcb28..009a800676ac 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -629,6 +629,7 @@ struct drm_mode_connector_set_property {
 #define DRM_MODE_OBJECT_FB 0xfbfbfbfb
 #define DRM_MODE_OBJECT_BLOB 0xbbbbbbbb
 #define DRM_MODE_OBJECT_PLANE 0xeeeeeeee
+#define DRM_MODE_OBJECT_COLOROP 0xfafafafa
 #define DRM_MODE_OBJECT_ANY 0
 
 struct drm_mode_obj_get_properties {
-- 
2.42.0



* [RFC PATCH v2 08/17] drm/colorop: Add TYPE property
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (6 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 07/17] drm/colorop: Introduce new drm_colorop mode object Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 09/17] drm/color: Add 1D Curve subtype Harry Wentland
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Add a read-only TYPE property. The TYPE specifies the colorop
type, such as enumerated curve, 1D LUT, CTM, 3D LUT, PWL LUT,
etc.

For now we're only introducing an enumerated 1D curve type to
illustrate the concept.
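
To illustrate the read-only semantics described above, here is a
minimal standalone sketch. It is a userspace model, not kernel code:
every model_* name is hypothetical, and only the shape of the getter
follows drm_atomic_colorop_get_property() as changed in this patch.
The rejected write assumes an immutable property simply refuses
userspace writes with -EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real DRM structs. */
enum model_colorop_type {
	MODEL_COLOROP_1D_CURVE,
};

struct model_property {
	const char *name;
	int immutable;	/* models DRM_MODE_PROP_IMMUTABLE */
};

struct model_colorop {
	enum model_colorop_type type;
	const struct model_property *type_property;
};

/* Same shape as the getter in this patch: TYPE reads back colorop->type. */
static int model_colorop_get_property(const struct model_colorop *colorop,
				      const struct model_property *property,
				      uint64_t *val)
{
	if (property == colorop->type_property) {
		*val = colorop->type;
		return 0;
	}

	return -EINVAL;
}

/* Assumed model: writes to an immutable property are refused. */
static int model_colorop_set_property(struct model_colorop *colorop,
				      const struct model_property *property,
				      uint64_t val)
{
	(void)colorop;
	(void)val;

	if (property->immutable)
		return -EINVAL;

	return 0;
}
```

A client would read TYPE once while discovering the pipeline and then
decide which of the remaining properties on that colorop are relevant.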

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_atomic.c      |  4 +--
 drivers/gpu/drm/drm_atomic_uapi.c |  8 +++++-
 drivers/gpu/drm/drm_colorop.c     | 43 ++++++++++++++++++++++++++++++-
 include/drm/drm_colorop.h         | 21 ++++++++++++++-
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index d55db5a06940..524bec520287 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -636,8 +636,8 @@ drm_atomic_get_colorop_state(struct drm_atomic_state *state,
 	state->colorops[index].new_state = colorop_state;
 	colorop_state->state = state;
 
-	drm_dbg_atomic(colorop->dev, "Added [COLOROP:%d] %p state to %p\n",
-		       colorop->base.id, colorop_state, state);
+	drm_dbg_atomic(colorop->dev, "Added [COLOROP:%d:%d] %p state to %p\n",
+		       colorop->base.id, colorop->type, colorop_state, state);
 
 	/* TODO is this necessary? */
 
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index 21da1b327ee9..f22bd8671236 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -682,7 +682,13 @@ drm_atomic_colorop_get_property(struct drm_colorop *colorop,
 		const struct drm_colorop_state *state,
 		struct drm_property *property, uint64_t *val)
 {
-	return -EINVAL;
+	if (property == colorop->type_property) {
+		*val = colorop->type;
+	} else {
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 static int drm_atomic_set_writeback_fb_for_connector(
diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
index 78d6a0067f5b..33e7dbf4dbe4 100644
--- a/drivers/gpu/drm/drm_colorop.c
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -32,12 +32,17 @@
 
 /* TODO big colorop doc, including properties, etc. */
 
+static const struct drm_prop_enum_list drm_colorop_type_enum_list[] = {
+	{ DRM_COLOROP_1D_CURVE, "1D Curve" },
+};
+
 /* Init Helpers */
 
 int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
-		     struct drm_plane *plane)
+		     struct drm_plane *plane, enum drm_colorop_type type)
 {
 	struct drm_mode_config *config = &dev->mode_config;
+	struct drm_property *prop;
 	int ret = 0;
 
 	ret = drm_mode_object_add(dev, &colorop->base, DRM_MODE_OBJECT_COLOROP);
@@ -46,12 +51,28 @@ int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
 
 	colorop->base.properties = &colorop->properties;
 	colorop->dev = dev;
+	colorop->type = type;
 	colorop->plane = plane;
 
 	list_add_tail(&colorop->head, &config->colorop_list);
 	colorop->index = config->num_colorop++;
 
 	/* add properties */
+
+	/* type */
+	prop = drm_property_create_enum(dev,
+					DRM_MODE_PROP_IMMUTABLE | DRM_MODE_PROP_ATOMIC,
+					"TYPE", drm_colorop_type_enum_list,
+					ARRAY_SIZE(drm_colorop_type_enum_list));
+	if (!prop)
+		return -ENOMEM;
+
+	colorop->type_property = prop;
+
+	drm_object_attach_property(&colorop->base,
+				   colorop->type_property,
+				   colorop->type);
+
 	return ret;
 }
 EXPORT_SYMBOL(drm_colorop_init);
@@ -167,3 +188,23 @@ void drm_colorop_reset(struct drm_colorop *colorop)
 		__drm_colorop_reset(colorop, colorop->state);
 }
 EXPORT_SYMBOL(drm_colorop_reset);
+
+
+static const char * const colorop_type_name[] = {
+	[DRM_COLOROP_1D_CURVE] = "1D Curve",
+};
+
+/**
+ * drm_get_colorop_type_name - return a string for colorop type
+ * @range: colorop type to compute name of
+ *
+ * In contrast to the other drm_get_*_name functions this one here returns a
+ * const pointer and hence is threadsafe.
+ */
+const char *drm_get_colorop_type_name(enum drm_colorop_type type)
+{
+	if (WARN_ON(type >= ARRAY_SIZE(colorop_type_name)))
+		return "unknown";
+
+	return colorop_type_name[type];
+}
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
index 3dd169b0317d..22a217372428 100644
--- a/include/drm/drm_colorop.h
+++ b/include/drm/drm_colorop.h
@@ -30,6 +30,10 @@
 #include <drm/drm_mode.h>
 #include <drm/drm_property.h>
 
+enum drm_colorop_type {
+	DRM_COLOROP_1D_CURVE
+};
+
 /**
  * struct drm_colorop_state - mutable colorop state
  */
@@ -103,6 +107,21 @@ struct drm_colorop {
 	/** @properties: property tracking for this plane */
 	struct drm_object_properties properties;
 
+	/**
+	 * @type:
+	 *
+	 * Read-only
+	 * Type of color operation
+	 */
+	enum drm_colorop_type type;
+
+	/**
+	 * @type_property:
+	 *
+	 * Read-only "TYPE" enum property for specifying the type of
+	 * this color operation. The type is enum drm_colorop_type.
+	 */
+	struct drm_property *type_property;
 };
 
 #define obj_to_colorop(x) container_of(x, struct drm_colorop, base)
@@ -127,7 +146,7 @@ static inline struct drm_colorop *drm_colorop_find(struct drm_device *dev,
 }
 
 int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
-		     struct drm_plane *plane);
+		     struct drm_plane *plane, enum drm_colorop_type type);
 
 struct drm_colorop_state *
 drm_atomic_helper_colorop_duplicate_state(struct drm_colorop *colorop);
-- 
2.42.0



* [RFC PATCH v2 09/17] drm/color: Add 1D Curve subtype
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (7 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 08/17] drm/colorop: Add TYPE property Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 10/17] drm/colorop: Add BYPASS property Harry Wentland
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_atomic_uapi.c | 18 ++++++++++----
 drivers/gpu/drm/drm_colorop.c     | 39 +++++++++++++++++++++++++++++++
 include/drm/drm_colorop.h         | 20 ++++++++++++++++
 3 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index f22bd8671236..52b9b48e5757 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -670,11 +670,17 @@ static int drm_atomic_colorop_set_property(struct drm_colorop *colorop,
 		struct drm_colorop_state *state, struct drm_file *file_priv,
 		struct drm_property *property, uint64_t val)
 {
-	drm_dbg_atomic(colorop->dev,
-			"[COLOROP:%d] unknown property [PROP:%d:%s]]\n",
-			colorop->base.id,
-			property->base.id, property->name);
-	return -EINVAL;
+	if (property == colorop->curve_1d_type_property) {
+		state->curve_1d_type = val;
+	} else {
+		drm_dbg_atomic(colorop->dev,
+			       "[COLOROP:%d:%d] unknown property [PROP:%d:%s]]\n",
+			       colorop->base.id, colorop->type,
+			       property->base.id, property->name);
+		return -EINVAL;
+	}
+
+	return 0;
 }
 
 static int
@@ -684,6 +690,8 @@ drm_atomic_colorop_get_property(struct drm_colorop *colorop,
 {
 	if (property == colorop->type_property) {
 		*val = colorop->type;
+	} else if (property == colorop->curve_1d_type_property) {
+		*val = state->curve_1d_type;
 	} else {
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
index 33e7dbf4dbe4..8d8f9461950f 100644
--- a/drivers/gpu/drm/drm_colorop.c
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -36,6 +36,11 @@ static const struct drm_prop_enum_list drm_colorop_type_enum_list[] = {
 	{ DRM_COLOROP_1D_CURVE, "1D Curve" },
 };
 
+static const struct drm_prop_enum_list drm_colorop_curve_1d_type_enum_list[] = {
+	{ DRM_COLOROP_1D_CURVE_SRGB_EOTF, "sRGB EOTF" },
+	{ DRM_COLOROP_1D_CURVE_SRGB_INV_EOTF, "sRGB Inverse EOTF" },
+};
+
 /* Init Helpers */
 
 int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
@@ -73,6 +78,20 @@ int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
 				   colorop->type_property,
 				   colorop->type);
 
+	/* curve_1d_type */
+	/* TODO move to mode_config? */
+	prop = drm_property_create_enum(dev, DRM_MODE_PROP_ATOMIC,
+					"CURVE_1D_TYPE",
+					drm_colorop_curve_1d_type_enum_list,
+					ARRAY_SIZE(drm_colorop_curve_1d_type_enum_list));
+	if (!prop)
+		return -ENOMEM;
+
+	colorop->curve_1d_type_property = prop;
+	drm_object_attach_property(&colorop->base,
+				   colorop->curve_1d_type_property,
+				   0);
+
 	return ret;
 }
 EXPORT_SYMBOL(drm_colorop_init);
@@ -194,6 +213,11 @@ static const char * const colorop_type_name[] = {
 	[DRM_COLOROP_1D_CURVE] = "1D Curve",
 };
 
+static const char * const colorop_curve_1d_type_name[] = {
+	[DRM_COLOROP_1D_CURVE_SRGB_EOTF] = "sRGB EOTF",
+	[DRM_COLOROP_1D_CURVE_SRGB_INV_EOTF] = "sRGB Inverse EOTF",
+};
+
 /**
  * drm_get_colorop_type_name - return a string for colorop type
  * @range: colorop type to compute name of
@@ -208,3 +232,18 @@ const char *drm_get_colorop_type_name(enum drm_colorop_type type)
 
 	return colorop_type_name[type];
 }
+
+/**
+ * drm_get_colorop_curve_1d_type_name - return a string for 1D curve type
+ * @type: 1d curve type to compute name of
+ *
+ * In contrast to the other drm_get_*_name functions this one here returns a
+ * const pointer and hence is threadsafe.
+ */
+const char *drm_get_colorop_curve_1d_type_name(enum drm_colorop_curve_1d_type type)
+{
+	if (WARN_ON(type >= ARRAY_SIZE(colorop_curve_1d_type_name)))
+		return "unknown";
+
+	return colorop_curve_1d_type_name[type];
+}
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
index 22a217372428..7701b61ff7e9 100644
--- a/include/drm/drm_colorop.h
+++ b/include/drm/drm_colorop.h
@@ -34,6 +34,11 @@ enum drm_colorop_type {
 	DRM_COLOROP_1D_CURVE
 };
 
+enum drm_colorop_curve_1d_type {
+	DRM_COLOROP_1D_CURVE_SRGB_EOTF,
+	DRM_COLOROP_1D_CURVE_SRGB_INV_EOTF
+};
+
 /**
  * struct drm_colorop_state - mutable colorop state
  */
@@ -43,6 +48,13 @@ struct drm_colorop_state {
 
 	/* colorop properties */
 
+	/**
+	 * @curve_1d_type:
+	 *
+	 * Type of 1D curve.
+	 */
+	enum drm_colorop_curve_1d_type curve_1d_type;
+
 	/** @state: backpointer to global drm_atomic_state */
 	struct drm_atomic_state *state;
 };
@@ -122,6 +134,14 @@ struct drm_colorop {
 	 * this color operation. The type is enum drm_colorop_type.
 	 */
 	struct drm_property *type_property;
+
+	/**
+	 * @curve_1d_type:
+	 *
+	 * Sub-type for DRM_COLOROP_1D_CURVE type.
+	 */
+	struct drm_property *curve_1d_type_property;
+
 };
 
 #define obj_to_colorop(x) container_of(x, struct drm_colorop, base)
-- 
2.42.0



* [RFC PATCH v2 10/17] drm/colorop: Add BYPASS property
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (8 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 09/17] drm/color: Add 1D Curve subtype Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 11/17] drm/colorop: Add NEXT property Harry Wentland
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

We want to be able to bypass each colorop at all times.
Introduce a new BYPASS boolean property for this.
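
A minimal standalone sketch of the intended semantics follows. It is a
userspace model rather than kernel code: the model_* names and the
stand-in "invert" operation are hypothetical and exist only to show
that a bypassed colorop passes its input through unchanged, and that a
freshly reset state starts out bypassed as in
__drm_colorop_state_reset() in this patch.

```c
#include <stdbool.h>

/* Hypothetical model of per-colorop state; not the kernel struct. */
struct model_colorop_state {
	bool bypass;
};

/* As in the reset helper in this patch: a fresh state starts bypassed. */
static void model_colorop_state_reset(struct model_colorop_state *state)
{
	state->bypass = true;
}

/* Stand-in operation for illustration only: inverts a normalized value. */
static double model_op_invert(double x)
{
	return 1.0 - x;
}

/* A bypassed colorop returns its input; otherwise the operation runs. */
static double model_colorop_apply(const struct model_colorop_state *state,
				  double x)
{
	if (state->bypass)
		return x;

	return model_op_invert(x);
}
```

Defaulting to bypass means a compositor that never touches a colorop
gets an identity pipeline.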

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_atomic_uapi.c |  6 +++++-
 drivers/gpu/drm/drm_colorop.c     | 15 +++++++++++++++
 include/drm/drm_colorop.h         | 20 ++++++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index 52b9b48e5757..a8f7a8a6639a 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -670,7 +670,9 @@ static int drm_atomic_colorop_set_property(struct drm_colorop *colorop,
 		struct drm_colorop_state *state, struct drm_file *file_priv,
 		struct drm_property *property, uint64_t val)
 {
-	if (property == colorop->curve_1d_type_property) {
+	if (property == colorop->bypass_property) {
+		state->bypass = val;
+	} else if (property == colorop->curve_1d_type_property) {
 		state->curve_1d_type = val;
 	} else {
 		drm_dbg_atomic(colorop->dev,
@@ -690,6 +692,8 @@ drm_atomic_colorop_get_property(struct drm_colorop *colorop,
 {
 	if (property == colorop->type_property) {
 		*val = colorop->type;
+	} else if (property == colorop->bypass_property) {
+		*val = state->bypass;
 	} else if (property == colorop->curve_1d_type_property) {
 		*val = state->curve_1d_type;
 	} else {
diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
index 8d8f9461950f..ff6331fe5d5e 100644
--- a/drivers/gpu/drm/drm_colorop.c
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -78,6 +78,18 @@ int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
 				   colorop->type_property,
 				   colorop->type);
 
+	/* bypass */
+	/* TODO can we reuse the mode_config->active_prop? */
+	prop = drm_property_create_bool(dev, DRM_MODE_PROP_ATOMIC,
+					"BYPASS");
+	if (!prop)
+		return -ENOMEM;
+
+	colorop->bypass_property = prop;
+	drm_object_attach_property(&colorop->base,
+				   colorop->bypass_property,
+				   1);
+
 	/* curve_1d_type */
 	/* TODO move to mode_config? */
 	prop = drm_property_create_enum(dev, DRM_MODE_PROP_ATOMIC,
@@ -100,6 +112,8 @@ void __drm_atomic_helper_colorop_duplicate_state(struct drm_colorop *colorop,
 						 struct drm_colorop_state *state)
 {
 	memcpy(state, colorop->state, sizeof(*state));
+
+	state->bypass = true;
 }
 
 struct drm_colorop_state *
@@ -164,6 +178,7 @@ void __drm_colorop_state_reset(struct drm_colorop_state *colorop_state,
 					   struct drm_colorop *colorop)
 {
 	colorop_state->colorop = colorop;
+	colorop_state->bypass = true;
 }
 EXPORT_SYMBOL(__drm_colorop_state_reset);
 
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
index 7701b61ff7e9..69636f6752a0 100644
--- a/include/drm/drm_colorop.h
+++ b/include/drm/drm_colorop.h
@@ -48,6 +48,14 @@ struct drm_colorop_state {
 
 	/* colorop properties */
 
+	/**
+	 * @bypass:
+	 *
+	 * True if colorop shall be bypassed. False if colorop is
+	 * enabled.
+	 */
+	bool bypass;
+
 	/**
 	 * @curve_1d_type:
 	 *
@@ -135,6 +143,18 @@ struct drm_colorop {
 	 */
 	struct drm_property *type_property;
 
+	/**
+	 * @bypass_property:
+	 *
+	 * Boolean property to control enablement of the color
+	 * operation. Setting bypass to "true" shall always be supported
+	 * in order to allow compositors to quickly fall back to
+	 * alternate methods of color processing. This is important
+	 * since setting color operations can fail due to unique
+	 * HW constraints.
+	 */
+	struct drm_property *bypass_property;
+
 	/**
 	 * @curve_1d_type:
 	 *
-- 
2.42.0



* [RFC PATCH v2 11/17] drm/colorop: Add NEXT property
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (9 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 10/17] drm/colorop: Add BYPASS property Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 12/17] drm/colorop: Add atomic state print for drm_colorop Harry Wentland
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

We'll construct color pipelines out of drm_colorop objects by
chaining them via the NEXT property. NEXT holds the object ID of
the next drm_colorop in the pipeline, or 0 if we're at the end
of the pipeline.
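
As a rough illustration of the chaining described above, a client could walk a pipeline by following NEXT until it reads 0. This is only a sketch under assumptions: struct colorop_info, find_colorop() and pipeline_length() are hypothetical names that are not part of this series, and the table stands in for values a client would read from the NEXT property of each colorop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace view of a colorop: its object ID and NEXT value. */
struct colorop_info {
	uint32_t id;
	uint32_t next;	/* object ID of the next colorop, 0 at the end */
};

/* Look up a colorop by object ID; returns NULL if the ID is unknown. */
static const struct colorop_info *
find_colorop(const struct colorop_info *ops, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].id == id)
			return &ops[i];
	return NULL;
}

/*
 * Count the colorops reachable from @first by following NEXT.
 * Returns -1 on a dangling ID or on a chain longer than @n (a cycle).
 */
static int pipeline_length(const struct colorop_info *ops, size_t n,
			   uint32_t first)
{
	int len = 0;
	uint32_t id = first;

	while (id != 0) {
		const struct colorop_info *op = find_colorop(ops, n, id);

		if (!op || (size_t)len >= n)
			return -1;
		len++;
		id = op->next;
	}
	return len;
}
```

The length bound is there so that a malformed chain cannot loop forever; a well-formed pipeline always terminates at NEXT == 0.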

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_colorop.c | 27 +++++++++++++++++++++++++++
 include/drm/drm_colorop.h     | 12 ++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
index ff6331fe5d5e..bc1250718baf 100644
--- a/drivers/gpu/drm/drm_colorop.c
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -104,6 +104,15 @@ int drm_colorop_init(struct drm_device *dev, struct drm_colorop *colorop,
 				   colorop->curve_1d_type_property,
 				   0);
 
+	prop = drm_property_create_object(dev, DRM_MODE_PROP_IMMUTABLE | DRM_MODE_PROP_ATOMIC,
+			"NEXT", DRM_MODE_OBJECT_COLOROP);
+	if (!prop)
+		return -ENOMEM;
+	colorop->next_property = prop;
+	drm_object_attach_property(&colorop->base,
+				   colorop->next_property,
+				   0);
+
 	return ret;
 }
 EXPORT_SYMBOL(drm_colorop_init);
@@ -262,3 +271,21 @@ const char *drm_get_colorop_curve_1d_type_name(enum drm_colorop_curve_1d_type ty
 
 	return colorop_curve_1d_type_name[type];
 }
+
+/**
+ * drm_colorop_set_next_property - sets the next pointer
+ * @colorop: drm colorop
+ * @next: next colorop
+ *
+ * Should be used when constructing the color pipeline
+ */
+void drm_colorop_set_next_property(struct drm_colorop *colorop, struct drm_colorop *next)
+{
+	if (!colorop->next_property)
+		return;
+
+	drm_object_property_set_value(&colorop->base,
+				      colorop->next_property,
+				      next ? next->base.id : 0);
+}
+EXPORT_SYMBOL(drm_colorop_set_next_property);
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
index 69636f6752a0..1ddd0e65fe36 100644
--- a/include/drm/drm_colorop.h
+++ b/include/drm/drm_colorop.h
@@ -162,10 +162,20 @@ struct drm_colorop {
 	 */
 	struct drm_property *curve_1d_type_property;
 
+	/**
+	 * @next_property:
+	 *
+	 * Read-only property pointing to the next colorop in the pipeline
+	 */
+	struct drm_property *next_property;
+
 };
 
 #define obj_to_colorop(x) container_of(x, struct drm_colorop, base)
 
+
+
+
 /**
  * drm_crtc_find - look up a Colorop object from its ID
  * @dev: DRM device
@@ -212,5 +222,7 @@ static inline unsigned int drm_colorop_index(const struct drm_colorop *colorop)
 #define drm_for_each_colorop(colorop, dev) \
 	list_for_each_entry(colorop, &(dev)->mode_config.colorop_list, head)
 
+void drm_colorop_set_next_property(struct drm_colorop *colorop, struct drm_colorop *next);
+
 
 #endif /* __DRM_COLOROP_H__ */
-- 
2.42.0



* [RFC PATCH v2 12/17] drm/colorop: Add atomic state print for drm_colorop
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (10 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 11/17] drm/colorop: Add NEXT property Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 13/17] drm/colorop: Add new IOCTLs to retrieve drm_colorop objects Harry Wentland
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_atomic.c | 29 +++++++++++++++++++++++++++++
 include/drm/drm_colorop.h    |  5 +++++
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 524bec520287..15bd18c9e2be 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -792,6 +792,19 @@ static int drm_atomic_plane_check(const struct drm_plane_state *old_plane_state,
 	return 0;
 }
 
+
+
+static void drm_atomic_colorop_print_state(struct drm_printer *p,
+		const struct drm_colorop_state *state)
+{
+	struct drm_colorop *colorop = state->colorop;
+
+	drm_printf(p, "colorop[%u]:\n", colorop->base.id);
+	drm_printf(p, "\ttype=%s\n", drm_get_colorop_type_name(colorop->type));
+	drm_printf(p, "\tbypass=%u\n", state->bypass);
+	drm_printf(p, "\tcurve_1d_type=%s\n", drm_get_colorop_curve_1d_type_name(state->curve_1d_type));
+}
+
 static void drm_atomic_plane_print_state(struct drm_printer *p,
 		const struct drm_plane_state *state)
 {
@@ -812,6 +825,13 @@ static void drm_atomic_plane_print_state(struct drm_printer *p,
 		   drm_get_color_encoding_name(state->color_encoding));
 	drm_printf(p, "\tcolor-range=%s\n",
 		   drm_get_color_range_name(state->color_range));
+#if 0
+	drm_printf(p, "\tcolor-pipeline=%s\n",
+		   drm_get_color_pipeline_name(state->color_pipeline));
+#else
+	drm_printf(p, "\tcolor-pipeline=%d\n",
+		   state->color_pipeline ? state->color_pipeline->base.id : 0);
+#endif
 
 	if (plane->funcs->atomic_print_state)
 		plane->funcs->atomic_print_state(p, state);
@@ -1848,6 +1868,7 @@ static void __drm_state_dump(struct drm_device *dev, struct drm_printer *p,
 			     bool take_locks)
 {
 	struct drm_mode_config *config = &dev->mode_config;
+	struct drm_colorop *colorop;
 	struct drm_plane *plane;
 	struct drm_crtc *crtc;
 	struct drm_connector *connector;
@@ -1856,6 +1877,14 @@ static void __drm_state_dump(struct drm_device *dev, struct drm_printer *p,
 	if (!drm_drv_uses_atomic_modeset(dev))
 		return;
 
+	list_for_each_entry(colorop, &config->colorop_list, head) {
+		if (take_locks)
+			drm_modeset_lock(&colorop->plane->mutex, NULL);
+		drm_atomic_colorop_print_state(p, colorop->state);
+		if (take_locks)
+			drm_modeset_unlock(&colorop->plane->mutex);
+	}
+
 	list_for_each_entry(plane, &config->plane_list, head) {
 		if (take_locks)
 			drm_modeset_lock(&plane->mutex, NULL);
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
index 1ddd0e65fe36..622a671d2458 100644
--- a/include/drm/drm_colorop.h
+++ b/include/drm/drm_colorop.h
@@ -222,6 +222,11 @@ static inline unsigned int drm_colorop_index(const struct drm_colorop *colorop)
 #define drm_for_each_colorop(colorop, dev) \
 	list_for_each_entry(colorop, &(dev)->mode_config.colorop_list, head)
 
+const char *drm_get_color_pipeline_name(struct drm_colorop *colorop);
+
+const char *drm_get_colorop_type_name(enum drm_colorop_type type);
+const char *drm_get_colorop_curve_1d_type_name(enum drm_colorop_curve_1d_type type);
+
 void drm_colorop_set_next_property(struct drm_colorop *colorop, struct drm_colorop *next);
 
 
-- 
2.42.0



* [RFC PATCH v2 13/17] drm/colorop: Add new IOCTLs to retrieve drm_colorop objects
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (11 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 12/17] drm/colorop: Add atomic state print for drm_colorop Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 14/17] drm/plane: Add COLOR PIPELINE property Harry Wentland
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Since we created a new DRM object type we need new IOCTLs (and
new libdrm functions) to retrieve objects of that type.

TODO: Can we make these IOCTLs and libdrm functions generic
to allow for new DRM objects in the future without the need
for new IOCTLs and libdrm functions?
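
For reference, the resource IOCTL follows the usual two-call pattern: the first call reports how many IDs exist, the second fills a buffer of that size. The sketch below only illustrates that pattern. mock_getcolorop_res() is a hypothetical stand-in for the real ioctl, the IDs 40..42 are made up, and struct get_colorop_res merely mirrors the layout of struct drm_mode_get_colorop_res from this patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Mirrors the layout of struct drm_mode_get_colorop_res from this patch. */
struct get_colorop_res {
	uint64_t colorop_id_ptr;
	uint32_t count_colorops;
};

/*
 * Mock of the GETCOLOROPRESOURCES handler (not the real ioctl): copies at
 * most count_colorops IDs, then always reports the total number available.
 */
static int mock_getcolorop_res(struct get_colorop_res *res)
{
	static const uint32_t ids[] = { 40, 41, 42 };	/* made-up IDs */
	uint32_t *out = (uint32_t *)(uintptr_t)res->colorop_id_ptr;
	uint32_t count = 0;

	for (size_t i = 0; i < sizeof(ids) / sizeof(ids[0]); i++) {
		if (count < res->count_colorops)
			out[count] = ids[i];
		count++;
	}
	res->count_colorops = count;
	return 0;
}

/* Two-call pattern: query the count, allocate, then fetch the IDs. */
static uint32_t *get_colorop_ids(uint32_t *count_out)
{
	struct get_colorop_res res = { 0 };
	uint32_t *ids;

	*count_out = 0;
	if (mock_getcolorop_res(&res) || res.count_colorops == 0)
		return NULL;

	ids = calloc(res.count_colorops, sizeof(*ids));
	if (!ids)
		return NULL;

	res.colorop_id_ptr = (uintptr_t)ids;
	if (mock_getcolorop_res(&res)) {
		free(ids);
		return NULL;
	}
	*count_out = res.count_colorops;
	return ids;	/* caller frees */
}
```

In the first call the pointer is 0 and the count is 0, so nothing is written; only the returned count is used.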

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_colorop.c       | 51 +++++++++++++++++++++++++++++
 drivers/gpu/drm/drm_crtc_internal.h |  4 +++
 drivers/gpu/drm/drm_ioctl.c         |  5 +++
 include/uapi/drm/drm_mode.h         | 21 ++++++++++++
 4 files changed, 81 insertions(+)

diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
index bc1250718baf..1afd5fbe8776 100644
--- a/drivers/gpu/drm/drm_colorop.c
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -32,6 +32,57 @@
 
 /* TODO big colorop doc, including properties, etc. */
 
+/* IOCTLs */
+
+int drm_mode_getcolorop_res(struct drm_device *dev, void *data,
+			    struct drm_file *file_priv)
+{
+	struct drm_mode_get_colorop_res *colorop_resp = data;
+	struct drm_colorop *colorop;
+	uint32_t __user *colorop_ptr;
+	int count = 0;
+
+	if (!drm_core_check_feature(dev, DRIVER_MODESET))
+		return -EOPNOTSUPP;
+
+	colorop_ptr = u64_to_user_ptr(colorop_resp->colorop_id_ptr);
+
+	/*
+	 * This ioctl is called twice, once to determine how much space is
+	 * needed, and the 2nd time to fill it.
+	 */
+	drm_for_each_colorop(colorop, dev) {
+		if (drm_lease_held(file_priv, colorop->base.id)) {
+			if (count < colorop_resp->count_colorops &&
+			    put_user(colorop->base.id, colorop_ptr + count))
+				return -EFAULT;
+			count++;
+		}
+	}
+	colorop_resp->count_colorops = count;
+
+	return 0;
+}
+
+int drm_mode_getcolorop(struct drm_device *dev, void *data,
+		        struct drm_file *file_priv)
+{
+	struct drm_mode_get_colorop *colorop_resp = data;
+	struct drm_colorop *colorop;
+
+	if (!drm_core_check_feature(dev, DRIVER_MODESET))
+		return -EOPNOTSUPP;
+
+	colorop = drm_colorop_find(dev, file_priv, colorop_resp->colorop_id);
+	if (!colorop)
+		return -ENOENT;
+
+	colorop_resp->colorop_id = colorop->base.id;
+	colorop_resp->plane_id = colorop->plane ? colorop->plane->base.id : 0;
+
+	return 0;
+}
+
 static const struct drm_prop_enum_list drm_colorop_type_enum_list[] = {
 	{ DRM_COLOROP_1D_CURVE, "1D Curve" },
 };
diff --git a/drivers/gpu/drm/drm_crtc_internal.h b/drivers/gpu/drm/drm_crtc_internal.h
index 8556c3b3ff88..252cd7e607e3 100644
--- a/drivers/gpu/drm/drm_crtc_internal.h
+++ b/drivers/gpu/drm/drm_crtc_internal.h
@@ -278,6 +278,10 @@ int drm_mode_getplane(struct drm_device *dev,
 		      void *data, struct drm_file *file_priv);
 int drm_mode_setplane(struct drm_device *dev,
 		      void *data, struct drm_file *file_priv);
+int drm_mode_getcolorop_res(struct drm_device *dev, void *data,
+			    struct drm_file *file_priv);
+int drm_mode_getcolorop(struct drm_device *dev, void *data,
+		        struct drm_file *file_priv);
 int drm_mode_cursor_ioctl(struct drm_device *dev,
 			  void *data, struct drm_file *file_priv);
 int drm_mode_cursor2_ioctl(struct drm_device *dev,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 77590b0f38fa..8a4b7d8d8a0b 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -717,6 +717,11 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_GET_LEASE, drm_mode_get_lease_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF(DRM_IOCTL_MODE_REVOKE_LEASE, drm_mode_revoke_lease_ioctl, DRM_MASTER),
+
+	DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCOLOROPRESOURCES, drm_mode_getcolorop_res, 0),
+	/* TODO do we need GETCOLOROP? */
+	DRM_IOCTL_DEF(DRM_IOCTL_MODE_GETCOLOROP, drm_mode_getcolorop, 0),
+
 };
 
 #define DRM_CORE_IOCTL_COUNT	ARRAY_SIZE(drm_ioctls)
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 009a800676ac..5c71eb011181 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -357,6 +357,27 @@ struct drm_mode_get_plane {
 	__u64 format_type_ptr;
 };
 
+struct drm_mode_get_colorop_res {
+	__u64 colorop_id_ptr;
+	__u32 count_colorops;
+};
+
+
+/**
+ * struct drm_mode_get_colorop - Get colorop metadata.
+ *
+ * Userspace can perform a GETCOLOROP ioctl to retrieve information about a
+ * colorop.
+ */
+struct drm_mode_get_colorop {
+	/**
+	 * @colorop_id: Object ID of the colorop whose information should be
+	 * retrieved. Set by caller.
+	 */
+	__u32 colorop_id;
+	__u32 plane_id;
+};
+
 struct drm_mode_get_plane_res {
 	__u64 plane_id_ptr;
 	__u32 count_planes;
-- 
2.42.0



* [RFC PATCH v2 14/17] drm/plane: Add COLOR PIPELINE property
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (12 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 13/17] drm/colorop: Add new IOCTLs to retrieve drm_colorop objects Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 15/17] drm/colorop: Add NEXT to colorop state print Harry Wentland
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

We're adding a new enum COLOR PIPELINE property. It has one
entry per color pipeline, and each entry's value is the DRM
object ID of the first drm_colorop of that pipeline. A value
of 0 disables the entire color pipeline.

Userspace can use this to discover the available color
pipelines, as well as set the desired one. The color
pipelines are programmed via properties on the actual
drm_colorop objects.
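
To make the value semantics concrete, here is a small sketch of how a client might check a candidate COLOR PIPELINE value against the enum entries it discovered. The struct and function names are hypothetical, and treating 0 as always acceptable is an assumption drawn from the description above rather than something this patch enforces in this form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of one enum entry of the COLOR PIPELINE property. */
struct pipeline_enum_entry {
	uint64_t value;		/* object ID of the pipeline's first colorop */
	const char *name;
};

/*
 * Return true if @value may be set on the property: either 0 (no pipeline,
 * i.e. bypass) or the first-colorop ID of one of the advertised pipelines.
 */
static bool color_pipeline_value_ok(const struct pipeline_enum_entry *entries,
				    size_t n, uint64_t value)
{
	if (value == 0)
		return true;

	for (size_t i = 0; i < n; i++)
		if (entries[i].value == value)
			return true;

	return false;
}
```

Once a pipeline is selected this way, the individual stages are still programmed through the properties of the drm_colorop objects themselves.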

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_atomic.c              | 46 +++++++++++++++++++++++
 drivers/gpu/drm/drm_atomic_state_helper.c |  5 +++
 drivers/gpu/drm/drm_atomic_uapi.c         | 44 ++++++++++++++++++++++
 include/drm/drm_atomic_uapi.h             |  2 +
 include/drm/drm_plane.h                   |  8 ++++
 5 files changed, 105 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 15bd18c9e2be..781bd3aa1849 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1472,6 +1472,52 @@ drm_atomic_add_affected_planes(struct drm_atomic_state *state,
 }
 EXPORT_SYMBOL(drm_atomic_add_affected_planes);
 
+/**
+ * drm_atomic_add_affected_colorops - add colorops for plane
+ * @state: atomic state
+ * @plane: DRM plane
+ *
+ * This function walks the current configuration and adds all colorops
+ * currently used by @plane to the atomic configuration @state. This is useful
+ * when an atomic commit also needs to check all currently enabled colorops on
+ * @plane, e.g. when changing the mode. It's also useful when re-enabling a plane
+ * to avoid special code to force-enable all colorops.
+ *
+ * Since acquiring a colorop state will always also acquire the w/w mutex of the
+ * current plane for that colorop (if there is any) adding all the colorop states for
+ * a plane will not reduce parallelism of atomic updates.
+ *
+ * Returns:
+ * 0 on success or can fail with -EDEADLK or -ENOMEM. When the error is EDEADLK
+ * then the w/w mutex code has detected a deadlock and the entire atomic
+ * sequence must be restarted. All other errors are fatal.
+ */
+int
+drm_atomic_add_affected_colorops(struct drm_atomic_state *state,
+				 struct drm_plane *plane)
+{
+	struct drm_colorop *colorop;
+	struct drm_colorop_state *colorop_state;
+
+	WARN_ON(!drm_atomic_get_new_plane_state(state, plane));
+
+	drm_dbg_atomic(plane->dev,
+		       "Adding all current colorops for [plane:%d:%s] to %p\n",
+		       plane->base.id, plane->name, state);
+
+	drm_for_each_colorop(colorop, plane->dev) {
+		if (colorop->plane != plane)
+			continue;
+
+		colorop_state = drm_atomic_get_colorop_state(state, colorop);
+		if (IS_ERR(colorop_state))
+			return PTR_ERR(colorop_state);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(drm_atomic_add_affected_colorops);
+
 /**
  * drm_atomic_check_only - check whether a given config would work
  * @state: atomic configuration to check
diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c b/drivers/gpu/drm/drm_atomic_state_helper.c
index 784e63d70a42..3c5f2c8e33d0 100644
--- a/drivers/gpu/drm/drm_atomic_state_helper.c
+++ b/drivers/gpu/drm/drm_atomic_state_helper.c
@@ -267,6 +267,11 @@ void __drm_atomic_helper_plane_state_reset(struct drm_plane_state *plane_state,
 			plane_state->color_range = val;
 	}
 
+	if (plane->color_pipeline_property) {
+		/* default is always NULL, i.e., bypass */
+		plane_state->color_pipeline = NULL;
+	}
+
 	if (plane->zpos_property) {
 		if (!drm_object_property_get_default_value(&plane->base,
 							   plane->zpos_property,
diff --git a/drivers/gpu/drm/drm_atomic_uapi.c b/drivers/gpu/drm/drm_atomic_uapi.c
index a8f7a8a6639a..c6629fdaa114 100644
--- a/drivers/gpu/drm/drm_atomic_uapi.c
+++ b/drivers/gpu/drm/drm_atomic_uapi.c
@@ -256,6 +256,38 @@ drm_atomic_set_fb_for_plane(struct drm_plane_state *plane_state,
 }
 EXPORT_SYMBOL(drm_atomic_set_fb_for_plane);
 
+
+/**
+ * drm_atomic_set_colorop_for_plane - set colorop for plane
+ * @plane_state: atomic state object for the plane
+ * @colorop: colorop to use for the plane
+ *
+ * Selects the color pipeline for a plane by pointing the plane state at the
+ * first colorop of that pipeline. A NULL @colorop selects no pipeline, i.e.,
+ * bypass. Unlike framebuffers, colorops are not reference counted here, so
+ * this only logs the change and updates the pointer in the state object.
+ */
+void
+drm_atomic_set_colorop_for_plane(struct drm_plane_state *plane_state,
+				 struct drm_colorop *colorop)
+{
+	struct drm_plane *plane = plane_state->plane;
+
+	if (colorop)
+		drm_dbg_atomic(plane->dev,
+			       "Set [COLOROP:%d] for [PLANE:%d:%s] state %p\n",
+			       colorop->base.id, plane->base.id, plane->name,
+			       plane_state);
+	else
+		drm_dbg_atomic(plane->dev,
+			       "Set [NOCOLOROP] for [PLANE:%d:%s] state %p\n",
+			       plane->base.id, plane->name, plane_state);
+
+	plane_state->color_pipeline = colorop;
+}
+EXPORT_SYMBOL(drm_atomic_set_colorop_for_plane);
+
+
 /**
  * drm_atomic_set_crtc_for_connector - set CRTC for connector
  * @conn_state: atomic state object for the connector
@@ -581,6 +613,16 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane,
 		state->color_encoding = val;
 	} else if (property == plane->color_range_property) {
 		state->color_range = val;
+	} else if (property == plane->color_pipeline_property) {
+		/* find DRM colorop object */
+		struct drm_colorop *colorop = NULL;
+		colorop = drm_colorop_find(dev, file_priv, val);
+
+		if (val && !colorop)
+			return -EACCES;
+
+		/* set it on drm_plane_state */
+		drm_atomic_set_colorop_for_plane(state, colorop);
 	} else if (property == config->prop_fb_damage_clips) {
 		ret = drm_atomic_replace_property_blob_from_id(dev,
 					&state->fb_damage_clips,
@@ -647,6 +689,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
 		*val = state->color_encoding;
 	} else if (property == plane->color_range_property) {
 		*val = state->color_range;
+	} else if (property == plane->color_pipeline_property) {
+		*val = (state->color_pipeline) ? state->color_pipeline->base.id : 0;
 	} else if (property == config->prop_fb_damage_clips) {
 		*val = (state->fb_damage_clips) ?
 			state->fb_damage_clips->base.id : 0;
diff --git a/include/drm/drm_atomic_uapi.h b/include/drm/drm_atomic_uapi.h
index 70a115d523cd..436315523326 100644
--- a/include/drm/drm_atomic_uapi.h
+++ b/include/drm/drm_atomic_uapi.h
@@ -50,6 +50,8 @@ drm_atomic_set_crtc_for_plane(struct drm_plane_state *plane_state,
 			      struct drm_crtc *crtc);
 void drm_atomic_set_fb_for_plane(struct drm_plane_state *plane_state,
 				 struct drm_framebuffer *fb);
+void drm_atomic_set_colorop_for_plane(struct drm_plane_state *plane_state,
+				      struct drm_colorop *colorop);
 int __must_check
 drm_atomic_set_crtc_for_connector(struct drm_connector_state *conn_state,
 				  struct drm_crtc *crtc);
diff --git a/include/drm/drm_plane.h b/include/drm/drm_plane.h
index 57bbd0cd73a9..e65074f266c0 100644
--- a/include/drm/drm_plane.h
+++ b/include/drm/drm_plane.h
@@ -745,6 +745,14 @@ struct drm_plane {
 	 */
 	struct drm_property *color_range_property;
 
+	/**
+	 * @color_pipeline_property:
+	 *
+	 * Optional "COLOR_PIPELINE" enum property for specifying
+	 * a color pipeline to use on the plane.
+	 */
+	struct drm_property *color_pipeline_property;
+
 	/**
 	 * @scaling_filter_property: property to apply a particular filter while
 	 * scaling.
-- 
2.42.0



* [RFC PATCH v2 15/17] drm/colorop: Add NEXT to colorop state print
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (13 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 14/17] drm/plane: Add COLOR PIPELINE property Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 16/17] drm/vkms: Add enumerated 1D curve colorop Harry Wentland
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/drm_atomic.c  |  1 +
 drivers/gpu/drm/drm_colorop.c | 42 +++++++++++++++++++++++++++++++++++
 include/drm/drm_colorop.h     |  2 ++
 3 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 781bd3aa1849..cfe9199a15d2 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -803,6 +803,7 @@ static void drm_atomic_colorop_print_state(struct drm_printer *p,
 	drm_printf(p, "\ttype=%s\n", drm_get_colorop_type_name(colorop->type));
 	drm_printf(p, "\tbypass=%u\n", state->bypass);
 	drm_printf(p, "\tcurve_1d_type=%s\n", drm_get_colorop_curve_1d_type_name(state->curve_1d_type));
+	drm_printf(p, "\tnext=%u\n", drm_colorop_get_next_property(colorop));
 }
 
 static void drm_atomic_plane_print_state(struct drm_printer *p,
diff --git a/drivers/gpu/drm/drm_colorop.c b/drivers/gpu/drm/drm_colorop.c
index 1afd5fbe8776..ff6f938cc28c 100644
--- a/drivers/gpu/drm/drm_colorop.c
+++ b/drivers/gpu/drm/drm_colorop.c
@@ -340,3 +340,45 @@ void drm_colorop_set_next_property(struct drm_colorop *colorop, struct drm_color
 				      next->base.id);
 }
 EXPORT_SYMBOL(drm_colorop_set_next_property);
+
+/**
+ * drm_colorop_get_next_property - gets the next colorop ID
+ * @colorop: drm colorop
+ *
+ * Returns:
+ * The DRM object ID of the next colorop, or 0 if there is none
+ */
+uint32_t drm_colorop_get_next_property(struct drm_colorop *colorop)
+{
+	uint64_t next_id = 0;
+
+	if (!colorop->next_property)
+		return 0;
+
+	drm_object_property_get_value(&colorop->base,
+				      colorop->next_property,
+				      &next_id);
+
+	return (uint32_t) next_id;
+}
+EXPORT_SYMBOL(drm_colorop_get_next_property);
+
+
+/**
+ * drm_colorop_get_next - gets the next colorop
+ * @colorop: drm colorop
+ *
+ * Returns:
+ * The next colorop in the pipeline, or NULL if @colorop is the last one
+ */
+struct drm_colorop *drm_colorop_get_next(struct drm_colorop *colorop)
+{
+	uint64_t next_id = drm_colorop_get_next_property(colorop);
+
+	if (!next_id)
+		return NULL;
+
+	return drm_colorop_find(colorop->dev, NULL, next_id);
+
+}
+EXPORT_SYMBOL(drm_colorop_get_next);
\ No newline at end of file
diff --git a/include/drm/drm_colorop.h b/include/drm/drm_colorop.h
index 622a671d2458..2ba506a0ea4d 100644
--- a/include/drm/drm_colorop.h
+++ b/include/drm/drm_colorop.h
@@ -228,6 +228,8 @@ const char *drm_get_colorop_type_name(enum drm_colorop_type type);
 const char *drm_get_colorop_curve_1d_type_name(enum drm_colorop_curve_1d_type type);
 
 void drm_colorop_set_next_property(struct drm_colorop *colorop, struct drm_colorop *next);
+uint32_t drm_colorop_get_next_property(struct drm_colorop *colorop);
+struct drm_colorop *drm_colorop_get_next(struct drm_colorop *colorop);
 
 
 #endif /* __DRM_COLOROP_H__ */
-- 
2.42.0



* [RFC PATCH v2 16/17] drm/vkms: Add enumerated 1D curve colorop
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (14 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 15/17] drm/colorop: Add NEXT to colorop state print Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-10-19 21:21 ` [RFC PATCH v2 17/17] drm/vkms: Add kunit tests for linear and sRGB LUTs Harry Wentland
  2023-11-08 11:54 ` [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Shankar, Uma
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

This patch introduces a VKMS color pipeline that includes two
drm_colorops for named transfer functions. For now the only ones
supported are sRGB EOTF, sRGB Inverse EOTF, and a Linear TF.
We will expand this in the future but I don't want to do so
without accompanying IGT tests.

We introduce a new vkms_luts.c file that hard-codes sRGB EOTF,
sRGB Inverse EOTF, and a linear EOTF LUT. These have been
generated with 256 entries each as IGT is currently testing
only 8 bpc surfaces. We will likely need higher precision
but I'm reluctant to make that change without a clear indication
that we need it. We'll revisit and, if necessary, regenerate
the LUTs when we have IGT tests for higher precision buffers.
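
As a sketch of how a 256-entry LUT can be applied to wider channel values, the snippet below maps a 16-bit input through a LUT using linear interpolation between neighbouring entries. It is an illustration under assumptions, not the code from this patch: fill_linear_lut() and apply_lut() are hypothetical names, and only the linear (identity) curve is generated here because it can be verified exactly with integer math.

```c
#include <stdint.h>

#define LUT_SIZE 256

/* Fill a LUT with a linear (identity) curve: entry i is i/255 in 16 bits. */
static void fill_linear_lut(uint16_t lut[LUT_SIZE])
{
	for (int i = 0; i < LUT_SIZE; i++)
		lut[i] = (uint16_t)(i * 257);	/* 255 * 257 == 65535 */
}

/* Map a 16-bit channel value through the LUT with linear interpolation. */
static uint16_t apply_lut(const uint16_t lut[LUT_SIZE], uint16_t in)
{
	/* Position in the LUT, scaled by 65535 to keep the fractional part. */
	uint32_t pos = (uint32_t)in * (LUT_SIZE - 1);
	uint32_t idx = pos / 65535;
	uint32_t frac = pos % 65535;
	int64_t lo, hi;

	/* The last entry has no right-hand neighbour to interpolate with. */
	if (idx >= LUT_SIZE - 1)
		return lut[LUT_SIZE - 1];

	lo = lut[idx];
	hi = lut[idx + 1];
	return (uint16_t)(lo + (hi - lo) * (int64_t)frac / 65535);
}
```

With the linear LUT every 16-bit input maps back to itself, which makes it a convenient sanity check before trying a non-trivial curve such as the sRGB EOTF.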

v2:
 - Add commit description
 - Fix sRGB EOTF LUT definition
 - Add linear and sRGB inverse EOTF LUTs

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
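
Note on the .channel_value2index_ratio = 0xff00ff constant used for all
three LUTs: it appears to be (LUT_SIZE - 1) / 0xffff in 32.32 fixed point,
i.e. the factor that maps a 16-bit channel value to a fractional LUT index.
Below is a standalone sketch of that mapping plus a linear interpolation
between neighbouring entries. The names value2index_ratio and lut_lookup
are hypothetical, the code works on a plain u16 array rather than struct
drm_color_lut, and the round-to-nearest step is a choice made for this
sketch; the in-kernel apply_lut_to_channel_value() may round differently.

```c
#include <stdint.h>

#define FIXP_SHIFT 32				/* assumed 32.32 fixed point */
#define FIXP_ONE   ((int64_t)1 << FIXP_SHIFT)

/* (lut_length - 1) / 0xffff as a 32.32 fixed-point number. */
static int64_t value2index_ratio(uint32_t lut_length)
{
	return ((int64_t)(lut_length - 1) << FIXP_SHIFT) / 0xffff;
}

/* Map a 16-bit value to a fractional index and interpolate linearly. */
static uint16_t lut_lookup(const uint16_t *lut, uint32_t lut_length,
			   uint16_t value)
{
	int64_t index_fp = (int64_t)value * value2index_ratio(lut_length);
	uint32_t lo = (uint32_t)(index_fp >> FIXP_SHIFT);
	int64_t frac = index_fp & (FIXP_ONE - 1);
	int64_t a, b;

	/* Guard the upper edge so lut[lo + 1] is never out of bounds. */
	if (lo >= lut_length - 1)
		return lut[lut_length - 1];

	a = lut[lo];
	b = lut[lo + 1];

	/* a + (b - a) * frac, rounded to the nearest integer. */
	return (uint16_t)(((a << FIXP_SHIFT) + (b - a) * frac + FIXP_ONE / 2)
			  >> FIXP_SHIFT);
}
```

With lut_length = 256 the ratio evaluates to 0xff00ff, matching the
constant in vkms_luts.c. With a linear table (entry i = i * 0x101) the
lookup returns the inputs 0, 0x8080 and 0xffff unchanged.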
 drivers/gpu/drm/vkms/Makefile        |   4 +-
 drivers/gpu/drm/vkms/vkms_colorop.c  |  85 +++
 drivers/gpu/drm/vkms/vkms_composer.c |  46 ++
 drivers/gpu/drm/vkms/vkms_drv.h      |   4 +
 drivers/gpu/drm/vkms/vkms_luts.c     | 802 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_luts.h     |  12 +
 drivers/gpu/drm/vkms/vkms_plane.c    |   2 +
 7 files changed, 954 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_colorop.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_luts.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_luts.h

diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index d3440f228f46..eb208f3e6780 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -6,7 +6,9 @@ vkms-y := \
 	vkms_formats.o \
 	vkms_crtc.o \
 	vkms_composer.o \
-	vkms_writeback.o
+	vkms_writeback.o \
+	vkms_colorop.o \
+	vkms_luts.o
 
 obj-$(CONFIG_DRM_VKMS) += vkms.o
 
diff --git a/drivers/gpu/drm/vkms/vkms_colorop.c b/drivers/gpu/drm/vkms/vkms_colorop.c
new file mode 100644
index 000000000000..9a26b9fdc4a2
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_colorop.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/slab.h>
+#include <drm/drm_colorop.h>
+#include <drm/drm_print.h>
+#include <drm/drm_property.h>
+#include <drm/drm_plane.h>
+
+#define MAX_COLOR_PIPELINES 5
+
+static int vkms_initialize_tf_pipeline(struct drm_plane *plane, struct drm_prop_enum_list *list)
+{
+
+	struct drm_colorop *op, *prev_op;
+	struct drm_device *dev = plane->dev;
+	int ret;
+
+	/* 1st op: 1d curve */
+	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
+	if (!op) {
+		DRM_ERROR("KMS: Failed to allocate colorop\n");
+		return -ENOMEM;
+	}
+
+	ret = drm_colorop_init(dev, op, plane, DRM_COLOROP_1D_CURVE);
+	if (ret)
+		return ret;
+
+	list->type = op->base.id;
+	list->name = kasprintf(GFP_KERNEL, "Color Pipeline %d", op->base.id);
+
+	prev_op = op;
+
+	/* 2nd op: 1d curve */
+	op = kzalloc(sizeof(struct drm_colorop), GFP_KERNEL);
+	if (!op) {
+		DRM_ERROR("KMS: Failed to allocate colorop\n");
+		return -ENOMEM;
+	}
+
+	ret = drm_colorop_init(dev, op, plane, DRM_COLOROP_1D_CURVE);
+	if (ret)
+		return ret;
+
+	drm_colorop_set_next_property(prev_op, op);
+
+	return 0;
+}
+
+int vkms_initialize_colorops(struct drm_plane *plane)
+{
+	struct drm_device *dev = plane->dev;
+	struct drm_property *prop;
+	struct drm_prop_enum_list pipelines[MAX_COLOR_PIPELINES];
+	int len = 0;
+	int ret;
+
+	/* Add "Bypass" (i.e. NULL) pipeline */
+	pipelines[len].type = 0;
+	pipelines[len].name = "Bypass";
+	len++;
+
+	/* Add pipeline consisting of transfer functions */
+	ret = vkms_initialize_tf_pipeline(plane, &(pipelines[len]));
+	if (ret)
+		return ret;
+	len++;
+
+	/* Create COLOR_PIPELINE property and attach */
+	prop = drm_property_create_enum(dev, DRM_MODE_PROP_ATOMIC,
+					"COLOR_PIPELINE",
+					pipelines, len);
+	if (!prop)
+		return -ENOMEM;
+
+	plane->color_pipeline_property = prop;
+
+	drm_object_attach_property(&plane->base, prop, 0);
+
+	/* TODO do we even need this? */
+	if (plane->state)
+		plane->state->color_pipeline = NULL;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index cf1dff162920..73b7d5e94021 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -12,6 +12,8 @@
 #include <linux/minmax.h>
 
 #include "vkms_drv.h"
+#include "vkms_composer.h"
+#include "vkms_luts.h"
 
 static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
 {
@@ -163,6 +165,47 @@ static void apply_lut(const struct vkms_crtc_state *crtc_state, struct line_buff
 	}
 }
 
+static void pre_blend_color_transform(const struct vkms_plane_state *plane_state, struct line_buffer *output_buffer)
+{
+	struct drm_colorop *colorop = plane_state->base.base.color_pipeline;
+
+	while (colorop) {
+		struct drm_colorop_state *colorop_state;
+
+		if (!colorop)
+			return;
+
+		/* TODO this is probably wrong */
+		colorop_state = colorop->state;
+
+		if (!colorop_state)
+			return;
+
+		for (size_t x = 0; x < output_buffer->n_pixels; x++) {
+			struct pixel_argb_u16 *pixel = &output_buffer->pixels[x];
+
+			if (colorop->type == DRM_COLOROP_1D_CURVE &&
+				colorop_state->bypass == false) {
+				switch (colorop_state->curve_1d_type) {
+					case DRM_COLOROP_1D_CURVE_SRGB_INV_EOTF:
+						pixel->r = apply_lut_to_channel_value(&srgb_inv_eotf, pixel->r, LUT_RED);
+						pixel->g = apply_lut_to_channel_value(&srgb_inv_eotf, pixel->g, LUT_GREEN);
+						pixel->b = apply_lut_to_channel_value(&srgb_inv_eotf, pixel->b, LUT_BLUE);
+						break;
+					case DRM_COLOROP_1D_CURVE_SRGB_EOTF:
+					default:
+						pixel->r = apply_lut_to_channel_value(&srgb_eotf, pixel->r, LUT_RED);
+						pixel->g = apply_lut_to_channel_value(&srgb_eotf, pixel->g, LUT_GREEN);
+						pixel->b = apply_lut_to_channel_value(&srgb_eotf, pixel->b, LUT_BLUE);
+						break;
+				}
+			}
+		}
+
+		colorop = drm_colorop_get_next(colorop);
+	}
+}
+
 /**
  * blend - blend the pixels from all planes and compute crc
  * @wb: The writeback frame buffer metadata
@@ -200,6 +243,9 @@ static void blend(struct vkms_writeback_job *wb,
 				continue;
 
 			vkms_compose_row(stage_buffer, plane[i], y_pos);
+
+			pre_blend_color_transform(plane[i], stage_buffer);
+
 			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
 					    output_buffer);
 		}
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 8f5710debb1e..2bcc24c196a2 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -170,4 +170,8 @@ void vkms_writeback_row(struct vkms_writeback_job *wb, const struct line_buffer
 /* Writeback */
 int vkms_enable_writeback_connector(struct vkms_device *vkmsdev);
 
+/* Colorops */
+int vkms_initialize_colorops(struct drm_plane *plane);
+
+
 #endif /* _VKMS_DRV_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_luts.c b/drivers/gpu/drm/vkms/vkms_luts.c
new file mode 100644
index 000000000000..6553d6d442b4
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_luts.c
@@ -0,0 +1,802 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <drm/drm_mode.h>
+
+#include "vkms_drv.h"
+#include "vkms_luts.h"
+
+static struct drm_color_lut linear_array[LUT_SIZE] = {
+	{ 0x0, 0x0, 0x0, 0 },
+	{ 0x101, 0x101, 0x101, 0 },
+	{ 0x202, 0x202, 0x202, 0 },
+	{ 0x303, 0x303, 0x303, 0 },
+	{ 0x404, 0x404, 0x404, 0 },
+	{ 0x505, 0x505, 0x505, 0 },
+	{ 0x606, 0x606, 0x606, 0 },
+	{ 0x707, 0x707, 0x707, 0 },
+	{ 0x808, 0x808, 0x808, 0 },
+	{ 0x909, 0x909, 0x909, 0 },
+	{ 0xa0a, 0xa0a, 0xa0a, 0 },
+	{ 0xb0b, 0xb0b, 0xb0b, 0 },
+	{ 0xc0c, 0xc0c, 0xc0c, 0 },
+	{ 0xd0d, 0xd0d, 0xd0d, 0 },
+	{ 0xe0e, 0xe0e, 0xe0e, 0 },
+	{ 0xf0f, 0xf0f, 0xf0f, 0 },
+	{ 0x1010, 0x1010, 0x1010, 0 },
+	{ 0x1111, 0x1111, 0x1111, 0 },
+	{ 0x1212, 0x1212, 0x1212, 0 },
+	{ 0x1313, 0x1313, 0x1313, 0 },
+	{ 0x1414, 0x1414, 0x1414, 0 },
+	{ 0x1515, 0x1515, 0x1515, 0 },
+	{ 0x1616, 0x1616, 0x1616, 0 },
+	{ 0x1717, 0x1717, 0x1717, 0 },
+	{ 0x1818, 0x1818, 0x1818, 0 },
+	{ 0x1919, 0x1919, 0x1919, 0 },
+	{ 0x1a1a, 0x1a1a, 0x1a1a, 0 },
+	{ 0x1b1b, 0x1b1b, 0x1b1b, 0 },
+	{ 0x1c1c, 0x1c1c, 0x1c1c, 0 },
+	{ 0x1d1d, 0x1d1d, 0x1d1d, 0 },
+	{ 0x1e1e, 0x1e1e, 0x1e1e, 0 },
+	{ 0x1f1f, 0x1f1f, 0x1f1f, 0 },
+	{ 0x2020, 0x2020, 0x2020, 0 },
+	{ 0x2121, 0x2121, 0x2121, 0 },
+	{ 0x2222, 0x2222, 0x2222, 0 },
+	{ 0x2323, 0x2323, 0x2323, 0 },
+	{ 0x2424, 0x2424, 0x2424, 0 },
+	{ 0x2525, 0x2525, 0x2525, 0 },
+	{ 0x2626, 0x2626, 0x2626, 0 },
+	{ 0x2727, 0x2727, 0x2727, 0 },
+	{ 0x2828, 0x2828, 0x2828, 0 },
+	{ 0x2929, 0x2929, 0x2929, 0 },
+	{ 0x2a2a, 0x2a2a, 0x2a2a, 0 },
+	{ 0x2b2b, 0x2b2b, 0x2b2b, 0 },
+	{ 0x2c2c, 0x2c2c, 0x2c2c, 0 },
+	{ 0x2d2d, 0x2d2d, 0x2d2d, 0 },
+	{ 0x2e2e, 0x2e2e, 0x2e2e, 0 },
+	{ 0x2f2f, 0x2f2f, 0x2f2f, 0 },
+	{ 0x3030, 0x3030, 0x3030, 0 },
+	{ 0x3131, 0x3131, 0x3131, 0 },
+	{ 0x3232, 0x3232, 0x3232, 0 },
+	{ 0x3333, 0x3333, 0x3333, 0 },
+	{ 0x3434, 0x3434, 0x3434, 0 },
+	{ 0x3535, 0x3535, 0x3535, 0 },
+	{ 0x3636, 0x3636, 0x3636, 0 },
+	{ 0x3737, 0x3737, 0x3737, 0 },
+	{ 0x3838, 0x3838, 0x3838, 0 },
+	{ 0x3939, 0x3939, 0x3939, 0 },
+	{ 0x3a3a, 0x3a3a, 0x3a3a, 0 },
+	{ 0x3b3b, 0x3b3b, 0x3b3b, 0 },
+	{ 0x3c3c, 0x3c3c, 0x3c3c, 0 },
+	{ 0x3d3d, 0x3d3d, 0x3d3d, 0 },
+	{ 0x3e3e, 0x3e3e, 0x3e3e, 0 },
+	{ 0x3f3f, 0x3f3f, 0x3f3f, 0 },
+	{ 0x4040, 0x4040, 0x4040, 0 },
+	{ 0x4141, 0x4141, 0x4141, 0 },
+	{ 0x4242, 0x4242, 0x4242, 0 },
+	{ 0x4343, 0x4343, 0x4343, 0 },
+	{ 0x4444, 0x4444, 0x4444, 0 },
+	{ 0x4545, 0x4545, 0x4545, 0 },
+	{ 0x4646, 0x4646, 0x4646, 0 },
+	{ 0x4747, 0x4747, 0x4747, 0 },
+	{ 0x4848, 0x4848, 0x4848, 0 },
+	{ 0x4949, 0x4949, 0x4949, 0 },
+	{ 0x4a4a, 0x4a4a, 0x4a4a, 0 },
+	{ 0x4b4b, 0x4b4b, 0x4b4b, 0 },
+	{ 0x4c4c, 0x4c4c, 0x4c4c, 0 },
+	{ 0x4d4d, 0x4d4d, 0x4d4d, 0 },
+	{ 0x4e4e, 0x4e4e, 0x4e4e, 0 },
+	{ 0x4f4f, 0x4f4f, 0x4f4f, 0 },
+	{ 0x5050, 0x5050, 0x5050, 0 },
+	{ 0x5151, 0x5151, 0x5151, 0 },
+	{ 0x5252, 0x5252, 0x5252, 0 },
+	{ 0x5353, 0x5353, 0x5353, 0 },
+	{ 0x5454, 0x5454, 0x5454, 0 },
+	{ 0x5555, 0x5555, 0x5555, 0 },
+	{ 0x5656, 0x5656, 0x5656, 0 },
+	{ 0x5757, 0x5757, 0x5757, 0 },
+	{ 0x5858, 0x5858, 0x5858, 0 },
+	{ 0x5959, 0x5959, 0x5959, 0 },
+	{ 0x5a5a, 0x5a5a, 0x5a5a, 0 },
+	{ 0x5b5b, 0x5b5b, 0x5b5b, 0 },
+	{ 0x5c5c, 0x5c5c, 0x5c5c, 0 },
+	{ 0x5d5d, 0x5d5d, 0x5d5d, 0 },
+	{ 0x5e5e, 0x5e5e, 0x5e5e, 0 },
+	{ 0x5f5f, 0x5f5f, 0x5f5f, 0 },
+	{ 0x6060, 0x6060, 0x6060, 0 },
+	{ 0x6161, 0x6161, 0x6161, 0 },
+	{ 0x6262, 0x6262, 0x6262, 0 },
+	{ 0x6363, 0x6363, 0x6363, 0 },
+	{ 0x6464, 0x6464, 0x6464, 0 },
+	{ 0x6565, 0x6565, 0x6565, 0 },
+	{ 0x6666, 0x6666, 0x6666, 0 },
+	{ 0x6767, 0x6767, 0x6767, 0 },
+	{ 0x6868, 0x6868, 0x6868, 0 },
+	{ 0x6969, 0x6969, 0x6969, 0 },
+	{ 0x6a6a, 0x6a6a, 0x6a6a, 0 },
+	{ 0x6b6b, 0x6b6b, 0x6b6b, 0 },
+	{ 0x6c6c, 0x6c6c, 0x6c6c, 0 },
+	{ 0x6d6d, 0x6d6d, 0x6d6d, 0 },
+	{ 0x6e6e, 0x6e6e, 0x6e6e, 0 },
+	{ 0x6f6f, 0x6f6f, 0x6f6f, 0 },
+	{ 0x7070, 0x7070, 0x7070, 0 },
+	{ 0x7171, 0x7171, 0x7171, 0 },
+	{ 0x7272, 0x7272, 0x7272, 0 },
+	{ 0x7373, 0x7373, 0x7373, 0 },
+	{ 0x7474, 0x7474, 0x7474, 0 },
+	{ 0x7575, 0x7575, 0x7575, 0 },
+	{ 0x7676, 0x7676, 0x7676, 0 },
+	{ 0x7777, 0x7777, 0x7777, 0 },
+	{ 0x7878, 0x7878, 0x7878, 0 },
+	{ 0x7979, 0x7979, 0x7979, 0 },
+	{ 0x7a7a, 0x7a7a, 0x7a7a, 0 },
+	{ 0x7b7b, 0x7b7b, 0x7b7b, 0 },
+	{ 0x7c7c, 0x7c7c, 0x7c7c, 0 },
+	{ 0x7d7d, 0x7d7d, 0x7d7d, 0 },
+	{ 0x7e7e, 0x7e7e, 0x7e7e, 0 },
+	{ 0x7f7f, 0x7f7f, 0x7f7f, 0 },
+	{ 0x8080, 0x8080, 0x8080, 0 },
+	{ 0x8181, 0x8181, 0x8181, 0 },
+	{ 0x8282, 0x8282, 0x8282, 0 },
+	{ 0x8383, 0x8383, 0x8383, 0 },
+	{ 0x8484, 0x8484, 0x8484, 0 },
+	{ 0x8585, 0x8585, 0x8585, 0 },
+	{ 0x8686, 0x8686, 0x8686, 0 },
+	{ 0x8787, 0x8787, 0x8787, 0 },
+	{ 0x8888, 0x8888, 0x8888, 0 },
+	{ 0x8989, 0x8989, 0x8989, 0 },
+	{ 0x8a8a, 0x8a8a, 0x8a8a, 0 },
+	{ 0x8b8b, 0x8b8b, 0x8b8b, 0 },
+	{ 0x8c8c, 0x8c8c, 0x8c8c, 0 },
+	{ 0x8d8d, 0x8d8d, 0x8d8d, 0 },
+	{ 0x8e8e, 0x8e8e, 0x8e8e, 0 },
+	{ 0x8f8f, 0x8f8f, 0x8f8f, 0 },
+	{ 0x9090, 0x9090, 0x9090, 0 },
+	{ 0x9191, 0x9191, 0x9191, 0 },
+	{ 0x9292, 0x9292, 0x9292, 0 },
+	{ 0x9393, 0x9393, 0x9393, 0 },
+	{ 0x9494, 0x9494, 0x9494, 0 },
+	{ 0x9595, 0x9595, 0x9595, 0 },
+	{ 0x9696, 0x9696, 0x9696, 0 },
+	{ 0x9797, 0x9797, 0x9797, 0 },
+	{ 0x9898, 0x9898, 0x9898, 0 },
+	{ 0x9999, 0x9999, 0x9999, 0 },
+	{ 0x9a9a, 0x9a9a, 0x9a9a, 0 },
+	{ 0x9b9b, 0x9b9b, 0x9b9b, 0 },
+	{ 0x9c9c, 0x9c9c, 0x9c9c, 0 },
+	{ 0x9d9d, 0x9d9d, 0x9d9d, 0 },
+	{ 0x9e9e, 0x9e9e, 0x9e9e, 0 },
+	{ 0x9f9f, 0x9f9f, 0x9f9f, 0 },
+	{ 0xa0a0, 0xa0a0, 0xa0a0, 0 },
+	{ 0xa1a1, 0xa1a1, 0xa1a1, 0 },
+	{ 0xa2a2, 0xa2a2, 0xa2a2, 0 },
+	{ 0xa3a3, 0xa3a3, 0xa3a3, 0 },
+	{ 0xa4a4, 0xa4a4, 0xa4a4, 0 },
+	{ 0xa5a5, 0xa5a5, 0xa5a5, 0 },
+	{ 0xa6a6, 0xa6a6, 0xa6a6, 0 },
+	{ 0xa7a7, 0xa7a7, 0xa7a7, 0 },
+	{ 0xa8a8, 0xa8a8, 0xa8a8, 0 },
+	{ 0xa9a9, 0xa9a9, 0xa9a9, 0 },
+	{ 0xaaaa, 0xaaaa, 0xaaaa, 0 },
+	{ 0xabab, 0xabab, 0xabab, 0 },
+	{ 0xacac, 0xacac, 0xacac, 0 },
+	{ 0xadad, 0xadad, 0xadad, 0 },
+	{ 0xaeae, 0xaeae, 0xaeae, 0 },
+	{ 0xafaf, 0xafaf, 0xafaf, 0 },
+	{ 0xb0b0, 0xb0b0, 0xb0b0, 0 },
+	{ 0xb1b1, 0xb1b1, 0xb1b1, 0 },
+	{ 0xb2b2, 0xb2b2, 0xb2b2, 0 },
+	{ 0xb3b3, 0xb3b3, 0xb3b3, 0 },
+	{ 0xb4b4, 0xb4b4, 0xb4b4, 0 },
+	{ 0xb5b5, 0xb5b5, 0xb5b5, 0 },
+	{ 0xb6b6, 0xb6b6, 0xb6b6, 0 },
+	{ 0xb7b7, 0xb7b7, 0xb7b7, 0 },
+	{ 0xb8b8, 0xb8b8, 0xb8b8, 0 },
+	{ 0xb9b9, 0xb9b9, 0xb9b9, 0 },
+	{ 0xbaba, 0xbaba, 0xbaba, 0 },
+	{ 0xbbbb, 0xbbbb, 0xbbbb, 0 },
+	{ 0xbcbc, 0xbcbc, 0xbcbc, 0 },
+	{ 0xbdbd, 0xbdbd, 0xbdbd, 0 },
+	{ 0xbebe, 0xbebe, 0xbebe, 0 },
+	{ 0xbfbf, 0xbfbf, 0xbfbf, 0 },
+	{ 0xc0c0, 0xc0c0, 0xc0c0, 0 },
+	{ 0xc1c1, 0xc1c1, 0xc1c1, 0 },
+	{ 0xc2c2, 0xc2c2, 0xc2c2, 0 },
+	{ 0xc3c3, 0xc3c3, 0xc3c3, 0 },
+	{ 0xc4c4, 0xc4c4, 0xc4c4, 0 },
+	{ 0xc5c5, 0xc5c5, 0xc5c5, 0 },
+	{ 0xc6c6, 0xc6c6, 0xc6c6, 0 },
+	{ 0xc7c7, 0xc7c7, 0xc7c7, 0 },
+	{ 0xc8c8, 0xc8c8, 0xc8c8, 0 },
+	{ 0xc9c9, 0xc9c9, 0xc9c9, 0 },
+	{ 0xcaca, 0xcaca, 0xcaca, 0 },
+	{ 0xcbcb, 0xcbcb, 0xcbcb, 0 },
+	{ 0xcccc, 0xcccc, 0xcccc, 0 },
+	{ 0xcdcd, 0xcdcd, 0xcdcd, 0 },
+	{ 0xcece, 0xcece, 0xcece, 0 },
+	{ 0xcfcf, 0xcfcf, 0xcfcf, 0 },
+	{ 0xd0d0, 0xd0d0, 0xd0d0, 0 },
+	{ 0xd1d1, 0xd1d1, 0xd1d1, 0 },
+	{ 0xd2d2, 0xd2d2, 0xd2d2, 0 },
+	{ 0xd3d3, 0xd3d3, 0xd3d3, 0 },
+	{ 0xd4d4, 0xd4d4, 0xd4d4, 0 },
+	{ 0xd5d5, 0xd5d5, 0xd5d5, 0 },
+	{ 0xd6d6, 0xd6d6, 0xd6d6, 0 },
+	{ 0xd7d7, 0xd7d7, 0xd7d7, 0 },
+	{ 0xd8d8, 0xd8d8, 0xd8d8, 0 },
+	{ 0xd9d9, 0xd9d9, 0xd9d9, 0 },
+	{ 0xdada, 0xdada, 0xdada, 0 },
+	{ 0xdbdb, 0xdbdb, 0xdbdb, 0 },
+	{ 0xdcdc, 0xdcdc, 0xdcdc, 0 },
+	{ 0xdddd, 0xdddd, 0xdddd, 0 },
+	{ 0xdede, 0xdede, 0xdede, 0 },
+	{ 0xdfdf, 0xdfdf, 0xdfdf, 0 },
+	{ 0xe0e0, 0xe0e0, 0xe0e0, 0 },
+	{ 0xe1e1, 0xe1e1, 0xe1e1, 0 },
+	{ 0xe2e2, 0xe2e2, 0xe2e2, 0 },
+	{ 0xe3e3, 0xe3e3, 0xe3e3, 0 },
+	{ 0xe4e4, 0xe4e4, 0xe4e4, 0 },
+	{ 0xe5e5, 0xe5e5, 0xe5e5, 0 },
+	{ 0xe6e6, 0xe6e6, 0xe6e6, 0 },
+	{ 0xe7e7, 0xe7e7, 0xe7e7, 0 },
+	{ 0xe8e8, 0xe8e8, 0xe8e8, 0 },
+	{ 0xe9e9, 0xe9e9, 0xe9e9, 0 },
+	{ 0xeaea, 0xeaea, 0xeaea, 0 },
+	{ 0xebeb, 0xebeb, 0xebeb, 0 },
+	{ 0xecec, 0xecec, 0xecec, 0 },
+	{ 0xeded, 0xeded, 0xeded, 0 },
+	{ 0xeeee, 0xeeee, 0xeeee, 0 },
+	{ 0xefef, 0xefef, 0xefef, 0 },
+	{ 0xf0f0, 0xf0f0, 0xf0f0, 0 },
+	{ 0xf1f1, 0xf1f1, 0xf1f1, 0 },
+	{ 0xf2f2, 0xf2f2, 0xf2f2, 0 },
+	{ 0xf3f3, 0xf3f3, 0xf3f3, 0 },
+	{ 0xf4f4, 0xf4f4, 0xf4f4, 0 },
+	{ 0xf5f5, 0xf5f5, 0xf5f5, 0 },
+	{ 0xf6f6, 0xf6f6, 0xf6f6, 0 },
+	{ 0xf7f7, 0xf7f7, 0xf7f7, 0 },
+	{ 0xf8f8, 0xf8f8, 0xf8f8, 0 },
+	{ 0xf9f9, 0xf9f9, 0xf9f9, 0 },
+	{ 0xfafa, 0xfafa, 0xfafa, 0 },
+	{ 0xfbfb, 0xfbfb, 0xfbfb, 0 },
+	{ 0xfcfc, 0xfcfc, 0xfcfc, 0 },
+	{ 0xfdfd, 0xfdfd, 0xfdfd, 0 },
+	{ 0xfefe, 0xfefe, 0xfefe, 0 },
+	{ 0xffff, 0xffff, 0xffff, 0 },
+};
+
+const struct vkms_color_lut linear_eotf = {
+	.base = linear_array,
+	.lut_length = LUT_SIZE,
+	.channel_value2index_ratio = 0xff00ffll
+};
+
+
+static struct drm_color_lut srgb_array[LUT_SIZE] = {
+	{ 0x0, 0x0, 0x0, 0 },
+	{ 0x13, 0x13, 0x13, 0 },
+	{ 0x27, 0x27, 0x27, 0 },
+	{ 0x3b, 0x3b, 0x3b, 0 },
+	{ 0x4f, 0x4f, 0x4f, 0 },
+	{ 0x63, 0x63, 0x63, 0 },
+	{ 0x77, 0x77, 0x77, 0 },
+	{ 0x8b, 0x8b, 0x8b, 0 },
+	{ 0x9f, 0x9f, 0x9f, 0 },
+	{ 0xb3, 0xb3, 0xb3, 0 },
+	{ 0xc6, 0xc6, 0xc6, 0 },
+	{ 0xdb, 0xdb, 0xdb, 0 },
+	{ 0xf0, 0xf0, 0xf0, 0 },
+	{ 0x107, 0x107, 0x107, 0 },
+	{ 0x11f, 0x11f, 0x11f, 0 },
+	{ 0x139, 0x139, 0x139, 0 },
+	{ 0x153, 0x153, 0x153, 0 },
+	{ 0x16f, 0x16f, 0x16f, 0 },
+	{ 0x18c, 0x18c, 0x18c, 0 },
+	{ 0x1aa, 0x1aa, 0x1aa, 0 },
+	{ 0x1ca, 0x1ca, 0x1ca, 0 },
+	{ 0x1eb, 0x1eb, 0x1eb, 0 },
+	{ 0x20d, 0x20d, 0x20d, 0 },
+	{ 0x231, 0x231, 0x231, 0 },
+	{ 0x256, 0x256, 0x256, 0 },
+	{ 0x27d, 0x27d, 0x27d, 0 },
+	{ 0x2a4, 0x2a4, 0x2a4, 0 },
+	{ 0x2ce, 0x2ce, 0x2ce, 0 },
+	{ 0x2f9, 0x2f9, 0x2f9, 0 },
+	{ 0x325, 0x325, 0x325, 0 },
+	{ 0x352, 0x352, 0x352, 0 },
+	{ 0x381, 0x381, 0x381, 0 },
+	{ 0x3b2, 0x3b2, 0x3b2, 0 },
+	{ 0x3e4, 0x3e4, 0x3e4, 0 },
+	{ 0x418, 0x418, 0x418, 0 },
+	{ 0x44d, 0x44d, 0x44d, 0 },
+	{ 0x484, 0x484, 0x484, 0 },
+	{ 0x4bc, 0x4bc, 0x4bc, 0 },
+	{ 0x4f6, 0x4f6, 0x4f6, 0 },
+	{ 0x531, 0x531, 0x531, 0 },
+	{ 0x56e, 0x56e, 0x56e, 0 },
+	{ 0x5ad, 0x5ad, 0x5ad, 0 },
+	{ 0x5ed, 0x5ed, 0x5ed, 0 },
+	{ 0x62f, 0x62f, 0x62f, 0 },
+	{ 0x672, 0x672, 0x672, 0 },
+	{ 0x6b7, 0x6b7, 0x6b7, 0 },
+	{ 0x6fe, 0x6fe, 0x6fe, 0 },
+	{ 0x746, 0x746, 0x746, 0 },
+	{ 0x791, 0x791, 0x791, 0 },
+	{ 0x7dc, 0x7dc, 0x7dc, 0 },
+	{ 0x82a, 0x82a, 0x82a, 0 },
+	{ 0x879, 0x879, 0x879, 0 },
+	{ 0x8ca, 0x8ca, 0x8ca, 0 },
+	{ 0x91d, 0x91d, 0x91d, 0 },
+	{ 0x971, 0x971, 0x971, 0 },
+	{ 0x9c7, 0x9c7, 0x9c7, 0 },
+	{ 0xa1f, 0xa1f, 0xa1f, 0 },
+	{ 0xa79, 0xa79, 0xa79, 0 },
+	{ 0xad4, 0xad4, 0xad4, 0 },
+	{ 0xb32, 0xb32, 0xb32, 0 },
+	{ 0xb91, 0xb91, 0xb91, 0 },
+	{ 0xbf2, 0xbf2, 0xbf2, 0 },
+	{ 0xc54, 0xc54, 0xc54, 0 },
+	{ 0xcb9, 0xcb9, 0xcb9, 0 },
+	{ 0xd1f, 0xd1f, 0xd1f, 0 },
+	{ 0xd88, 0xd88, 0xd88, 0 },
+	{ 0xdf2, 0xdf2, 0xdf2, 0 },
+	{ 0xe5e, 0xe5e, 0xe5e, 0 },
+	{ 0xecc, 0xecc, 0xecc, 0 },
+	{ 0xf3c, 0xf3c, 0xf3c, 0 },
+	{ 0xfad, 0xfad, 0xfad, 0 },
+	{ 0x1021, 0x1021, 0x1021, 0 },
+	{ 0x1096, 0x1096, 0x1096, 0 },
+	{ 0x110e, 0x110e, 0x110e, 0 },
+	{ 0x1187, 0x1187, 0x1187, 0 },
+	{ 0x1203, 0x1203, 0x1203, 0 },
+	{ 0x1280, 0x1280, 0x1280, 0 },
+	{ 0x12ff, 0x12ff, 0x12ff, 0 },
+	{ 0x1380, 0x1380, 0x1380, 0 },
+	{ 0x1404, 0x1404, 0x1404, 0 },
+	{ 0x1489, 0x1489, 0x1489, 0 },
+	{ 0x1510, 0x1510, 0x1510, 0 },
+	{ 0x1599, 0x1599, 0x1599, 0 },
+	{ 0x1624, 0x1624, 0x1624, 0 },
+	{ 0x16b2, 0x16b2, 0x16b2, 0 },
+	{ 0x1741, 0x1741, 0x1741, 0 },
+	{ 0x17d2, 0x17d2, 0x17d2, 0 },
+	{ 0x1865, 0x1865, 0x1865, 0 },
+	{ 0x18fb, 0x18fb, 0x18fb, 0 },
+	{ 0x1992, 0x1992, 0x1992, 0 },
+	{ 0x1a2c, 0x1a2c, 0x1a2c, 0 },
+	{ 0x1ac8, 0x1ac8, 0x1ac8, 0 },
+	{ 0x1b65, 0x1b65, 0x1b65, 0 },
+	{ 0x1c05, 0x1c05, 0x1c05, 0 },
+	{ 0x1ca7, 0x1ca7, 0x1ca7, 0 },
+	{ 0x1d4b, 0x1d4b, 0x1d4b, 0 },
+	{ 0x1df1, 0x1df1, 0x1df1, 0 },
+	{ 0x1e99, 0x1e99, 0x1e99, 0 },
+	{ 0x1f44, 0x1f44, 0x1f44, 0 },
+	{ 0x1ff0, 0x1ff0, 0x1ff0, 0 },
+	{ 0x209f, 0x209f, 0x209f, 0 },
+	{ 0x2150, 0x2150, 0x2150, 0 },
+	{ 0x2203, 0x2203, 0x2203, 0 },
+	{ 0x22b8, 0x22b8, 0x22b8, 0 },
+	{ 0x2370, 0x2370, 0x2370, 0 },
+	{ 0x2429, 0x2429, 0x2429, 0 },
+	{ 0x24e5, 0x24e5, 0x24e5, 0 },
+	{ 0x25a3, 0x25a3, 0x25a3, 0 },
+	{ 0x2663, 0x2663, 0x2663, 0 },
+	{ 0x2726, 0x2726, 0x2726, 0 },
+	{ 0x27ea, 0x27ea, 0x27ea, 0 },
+	{ 0x28b1, 0x28b1, 0x28b1, 0 },
+	{ 0x297a, 0x297a, 0x297a, 0 },
+	{ 0x2a45, 0x2a45, 0x2a45, 0 },
+	{ 0x2b13, 0x2b13, 0x2b13, 0 },
+	{ 0x2be3, 0x2be3, 0x2be3, 0 },
+	{ 0x2cb5, 0x2cb5, 0x2cb5, 0 },
+	{ 0x2d89, 0x2d89, 0x2d89, 0 },
+	{ 0x2e60, 0x2e60, 0x2e60, 0 },
+	{ 0x2f39, 0x2f39, 0x2f39, 0 },
+	{ 0x3014, 0x3014, 0x3014, 0 },
+	{ 0x30f2, 0x30f2, 0x30f2, 0 },
+	{ 0x31d2, 0x31d2, 0x31d2, 0 },
+	{ 0x32b4, 0x32b4, 0x32b4, 0 },
+	{ 0x3398, 0x3398, 0x3398, 0 },
+	{ 0x347f, 0x347f, 0x347f, 0 },
+	{ 0x3569, 0x3569, 0x3569, 0 },
+	{ 0x3654, 0x3654, 0x3654, 0 },
+	{ 0x3742, 0x3742, 0x3742, 0 },
+	{ 0x3832, 0x3832, 0x3832, 0 },
+	{ 0x3925, 0x3925, 0x3925, 0 },
+	{ 0x3a1a, 0x3a1a, 0x3a1a, 0 },
+	{ 0x3b11, 0x3b11, 0x3b11, 0 },
+	{ 0x3c0b, 0x3c0b, 0x3c0b, 0 },
+	{ 0x3d07, 0x3d07, 0x3d07, 0 },
+	{ 0x3e05, 0x3e05, 0x3e05, 0 },
+	{ 0x3f06, 0x3f06, 0x3f06, 0 },
+	{ 0x400a, 0x400a, 0x400a, 0 },
+	{ 0x410f, 0x410f, 0x410f, 0 },
+	{ 0x4218, 0x4218, 0x4218, 0 },
+	{ 0x4322, 0x4322, 0x4322, 0 },
+	{ 0x442f, 0x442f, 0x442f, 0 },
+	{ 0x453f, 0x453f, 0x453f, 0 },
+	{ 0x4650, 0x4650, 0x4650, 0 },
+	{ 0x4765, 0x4765, 0x4765, 0 },
+	{ 0x487c, 0x487c, 0x487c, 0 },
+	{ 0x4995, 0x4995, 0x4995, 0 },
+	{ 0x4ab1, 0x4ab1, 0x4ab1, 0 },
+	{ 0x4bcf, 0x4bcf, 0x4bcf, 0 },
+	{ 0x4cf0, 0x4cf0, 0x4cf0, 0 },
+	{ 0x4e13, 0x4e13, 0x4e13, 0 },
+	{ 0x4f39, 0x4f39, 0x4f39, 0 },
+	{ 0x5061, 0x5061, 0x5061, 0 },
+	{ 0x518b, 0x518b, 0x518b, 0 },
+	{ 0x52b9, 0x52b9, 0x52b9, 0 },
+	{ 0x53e8, 0x53e8, 0x53e8, 0 },
+	{ 0x551b, 0x551b, 0x551b, 0 },
+	{ 0x5650, 0x5650, 0x5650, 0 },
+	{ 0x5787, 0x5787, 0x5787, 0 },
+	{ 0x58c1, 0x58c1, 0x58c1, 0 },
+	{ 0x59fd, 0x59fd, 0x59fd, 0 },
+	{ 0x5b3c, 0x5b3c, 0x5b3c, 0 },
+	{ 0x5c7e, 0x5c7e, 0x5c7e, 0 },
+	{ 0x5dc2, 0x5dc2, 0x5dc2, 0 },
+	{ 0x5f09, 0x5f09, 0x5f09, 0 },
+	{ 0x6052, 0x6052, 0x6052, 0 },
+	{ 0x619e, 0x619e, 0x619e, 0 },
+	{ 0x62ec, 0x62ec, 0x62ec, 0 },
+	{ 0x643d, 0x643d, 0x643d, 0 },
+	{ 0x6591, 0x6591, 0x6591, 0 },
+	{ 0x66e7, 0x66e7, 0x66e7, 0 },
+	{ 0x6840, 0x6840, 0x6840, 0 },
+	{ 0x699b, 0x699b, 0x699b, 0 },
+	{ 0x6afa, 0x6afa, 0x6afa, 0 },
+	{ 0x6c5a, 0x6c5a, 0x6c5a, 0 },
+	{ 0x6dbe, 0x6dbe, 0x6dbe, 0 },
+	{ 0x6f24, 0x6f24, 0x6f24, 0 },
+	{ 0x708c, 0x708c, 0x708c, 0 },
+	{ 0x71f8, 0x71f8, 0x71f8, 0 },
+	{ 0x7366, 0x7366, 0x7366, 0 },
+	{ 0x74d6, 0x74d6, 0x74d6, 0 },
+	{ 0x764a, 0x764a, 0x764a, 0 },
+	{ 0x77c0, 0x77c0, 0x77c0, 0 },
+	{ 0x7938, 0x7938, 0x7938, 0 },
+	{ 0x7ab4, 0x7ab4, 0x7ab4, 0 },
+	{ 0x7c32, 0x7c32, 0x7c32, 0 },
+	{ 0x7db3, 0x7db3, 0x7db3, 0 },
+	{ 0x7f36, 0x7f36, 0x7f36, 0 },
+	{ 0x80bc, 0x80bc, 0x80bc, 0 },
+	{ 0x8245, 0x8245, 0x8245, 0 },
+	{ 0x83d1, 0x83d1, 0x83d1, 0 },
+	{ 0x855f, 0x855f, 0x855f, 0 },
+	{ 0x86f0, 0x86f0, 0x86f0, 0 },
+	{ 0x8884, 0x8884, 0x8884, 0 },
+	{ 0x8a1a, 0x8a1a, 0x8a1a, 0 },
+	{ 0x8bb4, 0x8bb4, 0x8bb4, 0 },
+	{ 0x8d50, 0x8d50, 0x8d50, 0 },
+	{ 0x8eee, 0x8eee, 0x8eee, 0 },
+	{ 0x9090, 0x9090, 0x9090, 0 },
+	{ 0x9234, 0x9234, 0x9234, 0 },
+	{ 0x93db, 0x93db, 0x93db, 0 },
+	{ 0x9585, 0x9585, 0x9585, 0 },
+	{ 0x9732, 0x9732, 0x9732, 0 },
+	{ 0x98e1, 0x98e1, 0x98e1, 0 },
+	{ 0x9a93, 0x9a93, 0x9a93, 0 },
+	{ 0x9c48, 0x9c48, 0x9c48, 0 },
+	{ 0x9e00, 0x9e00, 0x9e00, 0 },
+	{ 0x9fbb, 0x9fbb, 0x9fbb, 0 },
+	{ 0xa178, 0xa178, 0xa178, 0 },
+	{ 0xa338, 0xa338, 0xa338, 0 },
+	{ 0xa4fb, 0xa4fb, 0xa4fb, 0 },
+	{ 0xa6c1, 0xa6c1, 0xa6c1, 0 },
+	{ 0xa88a, 0xa88a, 0xa88a, 0 },
+	{ 0xaa56, 0xaa56, 0xaa56, 0 },
+	{ 0xac24, 0xac24, 0xac24, 0 },
+	{ 0xadf5, 0xadf5, 0xadf5, 0 },
+	{ 0xafc9, 0xafc9, 0xafc9, 0 },
+	{ 0xb1a0, 0xb1a0, 0xb1a0, 0 },
+	{ 0xb37a, 0xb37a, 0xb37a, 0 },
+	{ 0xb557, 0xb557, 0xb557, 0 },
+	{ 0xb736, 0xb736, 0xb736, 0 },
+	{ 0xb919, 0xb919, 0xb919, 0 },
+	{ 0xbafe, 0xbafe, 0xbafe, 0 },
+	{ 0xbce6, 0xbce6, 0xbce6, 0 },
+	{ 0xbed2, 0xbed2, 0xbed2, 0 },
+	{ 0xc0c0, 0xc0c0, 0xc0c0, 0 },
+	{ 0xc2b0, 0xc2b0, 0xc2b0, 0 },
+	{ 0xc4a4, 0xc4a4, 0xc4a4, 0 },
+	{ 0xc69b, 0xc69b, 0xc69b, 0 },
+	{ 0xc895, 0xc895, 0xc895, 0 },
+	{ 0xca91, 0xca91, 0xca91, 0 },
+	{ 0xcc91, 0xcc91, 0xcc91, 0 },
+	{ 0xce93, 0xce93, 0xce93, 0 },
+	{ 0xd098, 0xd098, 0xd098, 0 },
+	{ 0xd2a1, 0xd2a1, 0xd2a1, 0 },
+	{ 0xd4ac, 0xd4ac, 0xd4ac, 0 },
+	{ 0xd6ba, 0xd6ba, 0xd6ba, 0 },
+	{ 0xd8cb, 0xd8cb, 0xd8cb, 0 },
+	{ 0xdadf, 0xdadf, 0xdadf, 0 },
+	{ 0xdcf7, 0xdcf7, 0xdcf7, 0 },
+	{ 0xdf11, 0xdf11, 0xdf11, 0 },
+	{ 0xe12e, 0xe12e, 0xe12e, 0 },
+	{ 0xe34e, 0xe34e, 0xe34e, 0 },
+	{ 0xe571, 0xe571, 0xe571, 0 },
+	{ 0xe796, 0xe796, 0xe796, 0 },
+	{ 0xe9bf, 0xe9bf, 0xe9bf, 0 },
+	{ 0xebeb, 0xebeb, 0xebeb, 0 },
+	{ 0xee1a, 0xee1a, 0xee1a, 0 },
+	{ 0xf04c, 0xf04c, 0xf04c, 0 },
+	{ 0xf281, 0xf281, 0xf281, 0 },
+	{ 0xf4b9, 0xf4b9, 0xf4b9, 0 },
+	{ 0xf6f4, 0xf6f4, 0xf6f4, 0 },
+	{ 0xf932, 0xf932, 0xf932, 0 },
+	{ 0xfb73, 0xfb73, 0xfb73, 0 },
+	{ 0xfdb7, 0xfdb7, 0xfdb7, 0 },
+	{ 0xffff, 0xffff, 0xffff, 0 },
+};
+
+const struct vkms_color_lut srgb_eotf = {
+	.base = srgb_array,
+	.lut_length = LUT_SIZE,
+	.channel_value2index_ratio = 0xff00ffll
+};
+
+static struct drm_color_lut srgb_inv_array[LUT_SIZE] = {
+	{ 0x0, 0x0, 0x0, 0 },
+	{ 0xcc2, 0xcc2, 0xcc2, 0 },
+	{ 0x15be, 0x15be, 0x15be, 0 },
+	{ 0x1c56, 0x1c56, 0x1c56, 0 },
+	{ 0x21bd, 0x21bd, 0x21bd, 0 },
+	{ 0x2666, 0x2666, 0x2666, 0 },
+	{ 0x2a8a, 0x2a8a, 0x2a8a, 0 },
+	{ 0x2e4c, 0x2e4c, 0x2e4c, 0 },
+	{ 0x31c0, 0x31c0, 0x31c0, 0 },
+	{ 0x34f6, 0x34f6, 0x34f6, 0 },
+	{ 0x37f9, 0x37f9, 0x37f9, 0 },
+	{ 0x3acf, 0x3acf, 0x3acf, 0 },
+	{ 0x3d80, 0x3d80, 0x3d80, 0 },
+	{ 0x4010, 0x4010, 0x4010, 0 },
+	{ 0x4284, 0x4284, 0x4284, 0 },
+	{ 0x44dd, 0x44dd, 0x44dd, 0 },
+	{ 0x4720, 0x4720, 0x4720, 0 },
+	{ 0x494e, 0x494e, 0x494e, 0 },
+	{ 0x4b69, 0x4b69, 0x4b69, 0 },
+	{ 0x4d73, 0x4d73, 0x4d73, 0 },
+	{ 0x4f6e, 0x4f6e, 0x4f6e, 0 },
+	{ 0x5159, 0x5159, 0x5159, 0 },
+	{ 0x5337, 0x5337, 0x5337, 0 },
+	{ 0x5509, 0x5509, 0x5509, 0 },
+	{ 0x56cf, 0x56cf, 0x56cf, 0 },
+	{ 0x588a, 0x588a, 0x588a, 0 },
+	{ 0x5a3b, 0x5a3b, 0x5a3b, 0 },
+	{ 0x5be2, 0x5be2, 0x5be2, 0 },
+	{ 0x5d80, 0x5d80, 0x5d80, 0 },
+	{ 0x5f16, 0x5f16, 0x5f16, 0 },
+	{ 0x60a4, 0x60a4, 0x60a4, 0 },
+	{ 0x6229, 0x6229, 0x6229, 0 },
+	{ 0x63a8, 0x63a8, 0x63a8, 0 },
+	{ 0x6520, 0x6520, 0x6520, 0 },
+	{ 0x6691, 0x6691, 0x6691, 0 },
+	{ 0x67fc, 0x67fc, 0x67fc, 0 },
+	{ 0x6961, 0x6961, 0x6961, 0 },
+	{ 0x6ac0, 0x6ac0, 0x6ac0, 0 },
+	{ 0x6c19, 0x6c19, 0x6c19, 0 },
+	{ 0x6d6e, 0x6d6e, 0x6d6e, 0 },
+	{ 0x6ebd, 0x6ebd, 0x6ebd, 0 },
+	{ 0x7008, 0x7008, 0x7008, 0 },
+	{ 0x714d, 0x714d, 0x714d, 0 },
+	{ 0x728f, 0x728f, 0x728f, 0 },
+	{ 0x73cc, 0x73cc, 0x73cc, 0 },
+	{ 0x7504, 0x7504, 0x7504, 0 },
+	{ 0x7639, 0x7639, 0x7639, 0 },
+	{ 0x776a, 0x776a, 0x776a, 0 },
+	{ 0x7897, 0x7897, 0x7897, 0 },
+	{ 0x79c1, 0x79c1, 0x79c1, 0 },
+	{ 0x7ae7, 0x7ae7, 0x7ae7, 0 },
+	{ 0x7c09, 0x7c09, 0x7c09, 0 },
+	{ 0x7d28, 0x7d28, 0x7d28, 0 },
+	{ 0x7e44, 0x7e44, 0x7e44, 0 },
+	{ 0x7f5d, 0x7f5d, 0x7f5d, 0 },
+	{ 0x8073, 0x8073, 0x8073, 0 },
+	{ 0x8186, 0x8186, 0x8186, 0 },
+	{ 0x8296, 0x8296, 0x8296, 0 },
+	{ 0x83a4, 0x83a4, 0x83a4, 0 },
+	{ 0x84ae, 0x84ae, 0x84ae, 0 },
+	{ 0x85b6, 0x85b6, 0x85b6, 0 },
+	{ 0x86bc, 0x86bc, 0x86bc, 0 },
+	{ 0x87bf, 0x87bf, 0x87bf, 0 },
+	{ 0x88bf, 0x88bf, 0x88bf, 0 },
+	{ 0x89be, 0x89be, 0x89be, 0 },
+	{ 0x8ab9, 0x8ab9, 0x8ab9, 0 },
+	{ 0x8bb3, 0x8bb3, 0x8bb3, 0 },
+	{ 0x8cab, 0x8cab, 0x8cab, 0 },
+	{ 0x8da0, 0x8da0, 0x8da0, 0 },
+	{ 0x8e93, 0x8e93, 0x8e93, 0 },
+	{ 0x8f84, 0x8f84, 0x8f84, 0 },
+	{ 0x9073, 0x9073, 0x9073, 0 },
+	{ 0x9161, 0x9161, 0x9161, 0 },
+	{ 0x924c, 0x924c, 0x924c, 0 },
+	{ 0x9335, 0x9335, 0x9335, 0 },
+	{ 0x941d, 0x941d, 0x941d, 0 },
+	{ 0x9503, 0x9503, 0x9503, 0 },
+	{ 0x95e7, 0x95e7, 0x95e7, 0 },
+	{ 0x96c9, 0x96c9, 0x96c9, 0 },
+	{ 0x97aa, 0x97aa, 0x97aa, 0 },
+	{ 0x9889, 0x9889, 0x9889, 0 },
+	{ 0x9966, 0x9966, 0x9966, 0 },
+	{ 0x9a42, 0x9a42, 0x9a42, 0 },
+	{ 0x9b1c, 0x9b1c, 0x9b1c, 0 },
+	{ 0x9bf5, 0x9bf5, 0x9bf5, 0 },
+	{ 0x9ccc, 0x9ccc, 0x9ccc, 0 },
+	{ 0x9da1, 0x9da1, 0x9da1, 0 },
+	{ 0x9e76, 0x9e76, 0x9e76, 0 },
+	{ 0x9f49, 0x9f49, 0x9f49, 0 },
+	{ 0xa01a, 0xa01a, 0xa01a, 0 },
+	{ 0xa0ea, 0xa0ea, 0xa0ea, 0 },
+	{ 0xa1b9, 0xa1b9, 0xa1b9, 0 },
+	{ 0xa286, 0xa286, 0xa286, 0 },
+	{ 0xa352, 0xa352, 0xa352, 0 },
+	{ 0xa41d, 0xa41d, 0xa41d, 0 },
+	{ 0xa4e7, 0xa4e7, 0xa4e7, 0 },
+	{ 0xa5af, 0xa5af, 0xa5af, 0 },
+	{ 0xa676, 0xa676, 0xa676, 0 },
+	{ 0xa73c, 0xa73c, 0xa73c, 0 },
+	{ 0xa801, 0xa801, 0xa801, 0 },
+	{ 0xa8c5, 0xa8c5, 0xa8c5, 0 },
+	{ 0xa987, 0xa987, 0xa987, 0 },
+	{ 0xaa48, 0xaa48, 0xaa48, 0 },
+	{ 0xab09, 0xab09, 0xab09, 0 },
+	{ 0xabc8, 0xabc8, 0xabc8, 0 },
+	{ 0xac86, 0xac86, 0xac86, 0 },
+	{ 0xad43, 0xad43, 0xad43, 0 },
+	{ 0xadff, 0xadff, 0xadff, 0 },
+	{ 0xaeba, 0xaeba, 0xaeba, 0 },
+	{ 0xaf74, 0xaf74, 0xaf74, 0 },
+	{ 0xb02d, 0xb02d, 0xb02d, 0 },
+	{ 0xb0e5, 0xb0e5, 0xb0e5, 0 },
+	{ 0xb19c, 0xb19c, 0xb19c, 0 },
+	{ 0xb252, 0xb252, 0xb252, 0 },
+	{ 0xb307, 0xb307, 0xb307, 0 },
+	{ 0xb3bb, 0xb3bb, 0xb3bb, 0 },
+	{ 0xb46f, 0xb46f, 0xb46f, 0 },
+	{ 0xb521, 0xb521, 0xb521, 0 },
+	{ 0xb5d3, 0xb5d3, 0xb5d3, 0 },
+	{ 0xb683, 0xb683, 0xb683, 0 },
+	{ 0xb733, 0xb733, 0xb733, 0 },
+	{ 0xb7e2, 0xb7e2, 0xb7e2, 0 },
+	{ 0xb890, 0xb890, 0xb890, 0 },
+	{ 0xb93d, 0xb93d, 0xb93d, 0 },
+	{ 0xb9ea, 0xb9ea, 0xb9ea, 0 },
+	{ 0xba96, 0xba96, 0xba96, 0 },
+	{ 0xbb40, 0xbb40, 0xbb40, 0 },
+	{ 0xbbea, 0xbbea, 0xbbea, 0 },
+	{ 0xbc94, 0xbc94, 0xbc94, 0 },
+	{ 0xbd3c, 0xbd3c, 0xbd3c, 0 },
+	{ 0xbde4, 0xbde4, 0xbde4, 0 },
+	{ 0xbe8b, 0xbe8b, 0xbe8b, 0 },
+	{ 0xbf31, 0xbf31, 0xbf31, 0 },
+	{ 0xbfd7, 0xbfd7, 0xbfd7, 0 },
+	{ 0xc07b, 0xc07b, 0xc07b, 0 },
+	{ 0xc120, 0xc120, 0xc120, 0 },
+	{ 0xc1c3, 0xc1c3, 0xc1c3, 0 },
+	{ 0xc266, 0xc266, 0xc266, 0 },
+	{ 0xc308, 0xc308, 0xc308, 0 },
+	{ 0xc3a9, 0xc3a9, 0xc3a9, 0 },
+	{ 0xc449, 0xc449, 0xc449, 0 },
+	{ 0xc4e9, 0xc4e9, 0xc4e9, 0 },
+	{ 0xc589, 0xc589, 0xc589, 0 },
+	{ 0xc627, 0xc627, 0xc627, 0 },
+	{ 0xc6c5, 0xc6c5, 0xc6c5, 0 },
+	{ 0xc763, 0xc763, 0xc763, 0 },
+	{ 0xc7ff, 0xc7ff, 0xc7ff, 0 },
+	{ 0xc89b, 0xc89b, 0xc89b, 0 },
+	{ 0xc937, 0xc937, 0xc937, 0 },
+	{ 0xc9d2, 0xc9d2, 0xc9d2, 0 },
+	{ 0xca6c, 0xca6c, 0xca6c, 0 },
+	{ 0xcb06, 0xcb06, 0xcb06, 0 },
+	{ 0xcb9f, 0xcb9f, 0xcb9f, 0 },
+	{ 0xcc37, 0xcc37, 0xcc37, 0 },
+	{ 0xcccf, 0xcccf, 0xcccf, 0 },
+	{ 0xcd66, 0xcd66, 0xcd66, 0 },
+	{ 0xcdfd, 0xcdfd, 0xcdfd, 0 },
+	{ 0xce93, 0xce93, 0xce93, 0 },
+	{ 0xcf29, 0xcf29, 0xcf29, 0 },
+	{ 0xcfbe, 0xcfbe, 0xcfbe, 0 },
+	{ 0xd053, 0xd053, 0xd053, 0 },
+	{ 0xd0e7, 0xd0e7, 0xd0e7, 0 },
+	{ 0xd17a, 0xd17a, 0xd17a, 0 },
+	{ 0xd20d, 0xd20d, 0xd20d, 0 },
+	{ 0xd2a0, 0xd2a0, 0xd2a0, 0 },
+	{ 0xd331, 0xd331, 0xd331, 0 },
+	{ 0xd3c3, 0xd3c3, 0xd3c3, 0 },
+	{ 0xd454, 0xd454, 0xd454, 0 },
+	{ 0xd4e4, 0xd4e4, 0xd4e4, 0 },
+	{ 0xd574, 0xd574, 0xd574, 0 },
+	{ 0xd603, 0xd603, 0xd603, 0 },
+	{ 0xd692, 0xd692, 0xd692, 0 },
+	{ 0xd720, 0xd720, 0xd720, 0 },
+	{ 0xd7ae, 0xd7ae, 0xd7ae, 0 },
+	{ 0xd83c, 0xd83c, 0xd83c, 0 },
+	{ 0xd8c9, 0xd8c9, 0xd8c9, 0 },
+	{ 0xd955, 0xd955, 0xd955, 0 },
+	{ 0xd9e1, 0xd9e1, 0xd9e1, 0 },
+	{ 0xda6d, 0xda6d, 0xda6d, 0 },
+	{ 0xdaf8, 0xdaf8, 0xdaf8, 0 },
+	{ 0xdb83, 0xdb83, 0xdb83, 0 },
+	{ 0xdc0d, 0xdc0d, 0xdc0d, 0 },
+	{ 0xdc97, 0xdc97, 0xdc97, 0 },
+	{ 0xdd20, 0xdd20, 0xdd20, 0 },
+	{ 0xdda9, 0xdda9, 0xdda9, 0 },
+	{ 0xde31, 0xde31, 0xde31, 0 },
+	{ 0xdeb9, 0xdeb9, 0xdeb9, 0 },
+	{ 0xdf41, 0xdf41, 0xdf41, 0 },
+	{ 0xdfc8, 0xdfc8, 0xdfc8, 0 },
+	{ 0xe04f, 0xe04f, 0xe04f, 0 },
+	{ 0xe0d5, 0xe0d5, 0xe0d5, 0 },
+	{ 0xe15b, 0xe15b, 0xe15b, 0 },
+	{ 0xe1e0, 0xe1e0, 0xe1e0, 0 },
+	{ 0xe266, 0xe266, 0xe266, 0 },
+	{ 0xe2ea, 0xe2ea, 0xe2ea, 0 },
+	{ 0xe36f, 0xe36f, 0xe36f, 0 },
+	{ 0xe3f3, 0xe3f3, 0xe3f3, 0 },
+	{ 0xe476, 0xe476, 0xe476, 0 },
+	{ 0xe4f9, 0xe4f9, 0xe4f9, 0 },
+	{ 0xe57c, 0xe57c, 0xe57c, 0 },
+	{ 0xe5fe, 0xe5fe, 0xe5fe, 0 },
+	{ 0xe680, 0xe680, 0xe680, 0 },
+	{ 0xe702, 0xe702, 0xe702, 0 },
+	{ 0xe783, 0xe783, 0xe783, 0 },
+	{ 0xe804, 0xe804, 0xe804, 0 },
+	{ 0xe884, 0xe884, 0xe884, 0 },
+	{ 0xe905, 0xe905, 0xe905, 0 },
+	{ 0xe984, 0xe984, 0xe984, 0 },
+	{ 0xea04, 0xea04, 0xea04, 0 },
+	{ 0xea83, 0xea83, 0xea83, 0 },
+	{ 0xeb02, 0xeb02, 0xeb02, 0 },
+	{ 0xeb80, 0xeb80, 0xeb80, 0 },
+	{ 0xebfe, 0xebfe, 0xebfe, 0 },
+	{ 0xec7b, 0xec7b, 0xec7b, 0 },
+	{ 0xecf9, 0xecf9, 0xecf9, 0 },
+	{ 0xed76, 0xed76, 0xed76, 0 },
+	{ 0xedf2, 0xedf2, 0xedf2, 0 },
+	{ 0xee6f, 0xee6f, 0xee6f, 0 },
+	{ 0xeeeb, 0xeeeb, 0xeeeb, 0 },
+	{ 0xef66, 0xef66, 0xef66, 0 },
+	{ 0xefe2, 0xefe2, 0xefe2, 0 },
+	{ 0xf05d, 0xf05d, 0xf05d, 0 },
+	{ 0xf0d7, 0xf0d7, 0xf0d7, 0 },
+	{ 0xf152, 0xf152, 0xf152, 0 },
+	{ 0xf1cc, 0xf1cc, 0xf1cc, 0 },
+	{ 0xf245, 0xf245, 0xf245, 0 },
+	{ 0xf2bf, 0xf2bf, 0xf2bf, 0 },
+	{ 0xf338, 0xf338, 0xf338, 0 },
+	{ 0xf3b0, 0xf3b0, 0xf3b0, 0 },
+	{ 0xf429, 0xf429, 0xf429, 0 },
+	{ 0xf4a1, 0xf4a1, 0xf4a1, 0 },
+	{ 0xf519, 0xf519, 0xf519, 0 },
+	{ 0xf590, 0xf590, 0xf590, 0 },
+	{ 0xf608, 0xf608, 0xf608, 0 },
+	{ 0xf67e, 0xf67e, 0xf67e, 0 },
+	{ 0xf6f5, 0xf6f5, 0xf6f5, 0 },
+	{ 0xf76b, 0xf76b, 0xf76b, 0 },
+	{ 0xf7e1, 0xf7e1, 0xf7e1, 0 },
+	{ 0xf857, 0xf857, 0xf857, 0 },
+	{ 0xf8cd, 0xf8cd, 0xf8cd, 0 },
+	{ 0xf942, 0xf942, 0xf942, 0 },
+	{ 0xf9b7, 0xf9b7, 0xf9b7, 0 },
+	{ 0xfa2b, 0xfa2b, 0xfa2b, 0 },
+	{ 0xfaa0, 0xfaa0, 0xfaa0, 0 },
+	{ 0xfb14, 0xfb14, 0xfb14, 0 },
+	{ 0xfb88, 0xfb88, 0xfb88, 0 },
+	{ 0xfbfb, 0xfbfb, 0xfbfb, 0 },
+	{ 0xfc6e, 0xfc6e, 0xfc6e, 0 },
+	{ 0xfce1, 0xfce1, 0xfce1, 0 },
+	{ 0xfd54, 0xfd54, 0xfd54, 0 },
+	{ 0xfdc6, 0xfdc6, 0xfdc6, 0 },
+	{ 0xfe39, 0xfe39, 0xfe39, 0 },
+	{ 0xfeaa, 0xfeaa, 0xfeaa, 0 },
+	{ 0xff1c, 0xff1c, 0xff1c, 0 },
+	{ 0xff8d, 0xff8d, 0xff8d, 0 },
+	{ 0xffff, 0xffff, 0xffff, 0 },
+};
+
+const struct vkms_color_lut srgb_inv_eotf = {
+	.base = srgb_inv_array,
+	.lut_length = LUT_SIZE,
+	.channel_value2index_ratio = 0xff00ffll
+};
diff --git a/drivers/gpu/drm/vkms/vkms_luts.h b/drivers/gpu/drm/vkms/vkms_luts.h
new file mode 100644
index 000000000000..053512a643f7
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_luts.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _VKMS_LUTS_H_
+#define _VKMS_LUTS_H_
+
+#define LUT_SIZE 256
+
+extern const struct vkms_color_lut linear_eotf;
+extern const struct vkms_color_lut srgb_eotf;
+extern const struct vkms_color_lut srgb_inv_eotf;
+
+#endif /* _VKMS_LUTS_H_ */
\ No newline at end of file
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index e5c625ab8e3e..8520ee0534d1 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -215,5 +215,7 @@ struct vkms_plane *vkms_plane_init(struct vkms_device *vkmsdev,
 	drm_plane_create_rotation_property(&plane->base, DRM_MODE_ROTATE_0,
 					   DRM_MODE_ROTATE_MASK | DRM_MODE_REFLECT_MASK);
 
+	vkms_initialize_colorops(&plane->base);
+
 	return plane;
 }
-- 
2.42.0



* [RFC PATCH v2 17/17] drm/vkms: Add kunit tests for linear and sRGB LUTs
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (15 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 16/17] drm/vkms: Add enumerated 1D curve colorop Harry Wentland
@ 2023-10-19 21:21 ` Harry Wentland
  2023-11-08 11:54 ` [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Shankar, Uma
  17 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-10-19 21:21 UTC (permalink / raw)
  To: dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Christopher Braga, Pekka Paalanen, Hector Martin, Xaver Hugl,
	Joshua Ashton

Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Simon Ser <contact@emersion.fr>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Melissa Wen <mwen@igalia.com>
Cc: Jonas Ådahl <jadahl@redhat.com>
Cc: Sebastian Wick <sebastian.wick@redhat.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Alexander Goins <agoins@nvidia.com>
Cc: Joshua Ashton <joshua@froggi.es>
Cc: Michel Dänzer <mdaenzer@redhat.com>
Cc: Aleix Pol <aleixpol@kde.org>
Cc: Xaver Hugl <xaver.hugl@gmail.com>
Cc: Victoria Brekenfeld <victoria@system76.com>
Cc: Sima <daniel@ffwll.ch>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Naseer Ahmed <quic_naseer@quicinc.com>
Cc: Christopher Braga <quic_cbraga@quicinc.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Arthur Grillo <arthurgrillo@riseup.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: Sasha McIntosh <sashamcintosh@google.com>
---
 drivers/gpu/drm/vkms/tests/vkms_color_tests.c | 38 ++++++++++++++++++-
 drivers/gpu/drm/vkms/vkms_composer.c          | 13 +------
 drivers/gpu/drm/vkms/vkms_composer.h          | 14 +++++++
 3 files changed, 52 insertions(+), 13 deletions(-)
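
For reviewers wondering why the new get_lut_index expectations use
drm_fixp2int_ceil(), here is a small userspace model of the arithmetic.
It is only a sketch: the helper names are made up for illustration, and
it uses exact integer multiplication, whereas the kernel's
drm_fixp_mul() may differ in the low fractional bits. It assumes the
32.32 layout from drm_fixed.h and that channel_value2index_ratio is
(lut_length - 1) / 0xffff rounded down.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: assumes the 32.32 layout of drm_fixed.h. */
#define FIXP_SHIFT 32
#define FIXP_ONE   ((int64_t)1 << FIXP_SHIFT)

/* (lut_length - 1) / 0xffff in 32.32 fixed point, rounded down. */
static int64_t value2index_ratio(int64_t lut_length)
{
	return ((lut_length - 1) * FIXP_ONE) / 0xffff;
}

/*
 * Fixed-point LUT index of a 16-bit channel value. This is an exact
 * integer model; the kernel helper may round the low bits differently.
 */
static int64_t model_lut_index(int64_t ratio, uint16_t channel_value)
{
	return (int64_t)channel_value * ratio;
}

/* Round down / round up to an integer, for non-negative inputs. */
static int fixp_floor(int64_t a)
{
	return (int)(a >> FIXP_SHIFT);
}

static int fixp_ceil(int64_t a)
{
	return (int)((a + FIXP_ONE - 1) >> FIXP_SHIFT);
}

/* Mirrors the expectations added to vkms_color_test_get_lut_index(). */
static void check_expected_indices(void)
{
	const int64_t r256 = value2index_ratio(256);

	assert(r256 == 0xff00ff);                  /* srgb_eotf, srgb_inv_eotf */
	assert(value2index_ratio(16) == 0xf000f);  /* test_linear_lut */

	/* 0x101 * 0xff00ff == 2^32 - 1: one epsilon short of index 1. */
	assert(model_lut_index(r256, 0x101) == FIXP_ONE - 1);
	assert(fixp_floor(model_lut_index(r256, 0x101)) == 0);
	assert(fixp_ceil(model_lut_index(r256, 0x101)) == 1);

	assert(fixp_ceil(model_lut_index(r256, 0x0)) == 0);
	assert(fixp_ceil(model_lut_index(r256, 0x202)) == 2);
	assert(fixp_ceil(model_lut_index(r256, 0xfefe)) == 0xfe);
	assert(fixp_ceil(model_lut_index(r256, 0xffff)) == 0xff);
}
```

Because the ratio is truncated, an exact 8 bpc codepoint such as 0x101
lands just below its integer index, so rounding down would give the
previous LUT entry; rounding up recovers the intended one.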

diff --git a/drivers/gpu/drm/vkms/tests/vkms_color_tests.c b/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
index 843b2e1d607e..14decb5d1b64 100644
--- a/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
+++ b/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
@@ -5,6 +5,7 @@
 #include <drm/drm_fixed.h>
 
 #include "../vkms_composer.h"
+#include "../vkms_luts.h"
 
 #define TEST_LUT_SIZE 16
 
@@ -33,7 +34,6 @@ const struct vkms_color_lut test_linear_lut = {
 	.channel_value2index_ratio = 0xf000fll
 };
 
-
 static void vkms_color_test_get_lut_index(struct kunit *test)
 {
 	int i;
@@ -42,6 +42,19 @@ static void vkms_color_test_get_lut_index(struct kunit *test)
 
 	for (i = 0; i < TEST_LUT_SIZE; i++)
 		KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&test_linear_lut, test_linear_array[i].red)), i);
+
+	KUNIT_EXPECT_EQ(test, drm_fixp2int(get_lut_index(&srgb_eotf, 0x0)), 0x0);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_eotf, 0x0)), 0x0);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_eotf, 0x101)), 0x1);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_eotf, 0x202)), 0x2);
+
+	KUNIT_EXPECT_EQ(test, drm_fixp2int(get_lut_index(&srgb_inv_eotf, 0x0)), 0x0);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_inv_eotf, 0x0)), 0x0);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_inv_eotf, 0x101)), 0x1);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_inv_eotf, 0x202)), 0x2);
+
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_eotf, 0xfefe)), 0xfe);
+	KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&srgb_eotf, 0xffff)), 0xff);
 }
 
 static void vkms_color_test_lerp(struct kunit *test)
@@ -49,9 +62,32 @@ static void vkms_color_test_lerp(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, lerp_u16(0x0, 0x10, 0x80000000), 0x8);
 }
 
+static void vkms_color_test_linear(struct kunit *test)
+{
+	for (int i = 0; i < LUT_SIZE; i++) {
+		int linear = apply_lut_to_channel_value(&linear_eotf, i * 0x101, LUT_RED);
+		KUNIT_EXPECT_EQ(test, DIV_ROUND_CLOSEST(linear, 0x101), i);
+	}
+}
+
+static void vkms_color_srgb_inv_srgb(struct kunit *test)
+{
+	u16 srgb, final;
+
+	for (int i = 0; i < LUT_SIZE; i++) {
+		srgb = apply_lut_to_channel_value(&srgb_eotf, i * 0x101, LUT_RED);
+		final = apply_lut_to_channel_value(&srgb_inv_eotf, srgb, LUT_RED);
+
+		KUNIT_EXPECT_GE(test, final / 0x101, i - 1);
+		KUNIT_EXPECT_LE(test, final / 0x101, i + 1);
+	}
+}
+
 static struct kunit_case vkms_color_test_cases[] = {
 	KUNIT_CASE(vkms_color_test_get_lut_index),
 	KUNIT_CASE(vkms_color_test_lerp),
+	KUNIT_CASE(vkms_color_test_linear),
+	KUNIT_CASE(vkms_color_srgb_inv_srgb),
 	{}
 };
 
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 73b7d5e94021..24c984f2876f 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -110,18 +110,7 @@ s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value)
 	return drm_fixp_mul(color_channel_fp, lut->channel_value2index_ratio);
 }
 
-/*
- * This enum is related to the positions of the variables inside
- * `struct drm_color_lut`, so the order of both needs to be the same.
- */
-enum lut_channel {
-	LUT_RED = 0,
-	LUT_GREEN,
-	LUT_BLUE,
-	LUT_RESERVED
-};
-
-static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 channel_value,
+u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 channel_value,
 				      enum lut_channel channel)
 {
 	s64 lut_index = get_lut_index(lut, channel_value);
diff --git a/drivers/gpu/drm/vkms/vkms_composer.h b/drivers/gpu/drm/vkms/vkms_composer.h
index 11c5de9cc961..d92497c555eb 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.h
+++ b/drivers/gpu/drm/vkms/vkms_composer.h
@@ -8,4 +8,18 @@
 s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value);
 u16 lerp_u16(u16 a, u16 b, s64 t);
 
+/*
+ * This enum is related to the positions of the variables inside
+ * `struct drm_color_lut`, so the order of both needs to be the same.
+ */
+enum lut_channel {
+	LUT_RED = 0,
+	LUT_GREEN,
+	LUT_BLUE,
+	LUT_RESERVED
+};
+
+u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 channel_value,
+			       enum lut_channel channel);
+
 #endif /* _VKMS_COMPOSER_H_ */
\ No newline at end of file
-- 
2.42.0



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-19 21:21 ` [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed Harry Wentland
@ 2023-10-20 14:22   ` Sebastian Wick
  2023-10-20 14:57     ` Pekka Paalanen
  2023-11-08 12:18   ` Shankar, Uma
  1 sibling, 1 reply; 49+ messages in thread
From: Sebastian Wick @ 2023-10-20 14:22 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Aleix Pol, Shashank Sharma,
	wayland-devel, Jonas Ådahl, Uma Shankar, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Christopher Braga, Pekka Paalanen,
	Hector Martin, Xaver Hugl, Joshua Ashton

Thanks for continuing to work on this!

On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> v2:
>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>  - Updated wording (Pekka)
>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>    section (Pekka)
>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>  - Add "Driver Implementer's Guide" section (Pekka)
>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> 
> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Jonas Ådahl <jadahl@redhat.com>
> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Cc: Alexander Goins <agoins@nvidia.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Michel Dänzer <mdaenzer@redhat.com>
> Cc: Aleix Pol <aleixpol@kde.org>
> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> Cc: Victoria Brekenfeld <victoria@system76.com>
> Cc: Sima <daniel@ffwll.ch>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> Cc: Sasha McIntosh <sashamcintosh@google.com>
> ---
>  Documentation/gpu/rfc/color_pipeline.rst | 347 +++++++++++++++++++++++
>  1 file changed, 347 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> 
> diff --git a/Documentation/gpu/rfc/color_pipeline.rst b/Documentation/gpu/rfc/color_pipeline.rst
> new file mode 100644
> index 000000000000..af5f2ea29116
> --- /dev/null
> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> @@ -0,0 +1,347 @@
> +========================
> +Linux Color Pipeline API
> +========================
> +
> +What problem are we solving?
> +============================
> +
> +We would like to support complex pre- and post-blending color
> +transformations in display controller hardware in order to allow for
> +HW-supported HDR use-cases, as well as to provide support to
> +color-managed applications, such as video or image editors.
> +
> +It is possible to support an HDR output on HW supporting the Colorspace
> +and HDR Metadata drm_connector properties, but that requires the
> +compositor or application to render and compose the content into one
> +final buffer intended for display. Doing so is costly.
> +
> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and other
> +operations to support color transformations. These operations are often
> +implemented in fixed-function HW and therefore much more power efficient than
> +performing similar operations via shaders or CPU.
> +
> +We would like to make use of this HW functionality to support complex color
> +transformations with no, or minimal CPU or shader load.
> +
> +
> +How are other OSes solving this problem?
> +========================================
> +
> +The most widely supported use-cases regard HDR content, whether video or
> +gaming.
> +
> +Most OSes will specify the source content format (color gamut, encoding transfer
> +function, and other metadata, such as max and average light levels) to a driver.
> +Drivers will then program their fixed-function HW accordingly to map from a
> +source content buffer's space to a display's space.
> +
> +When fixed-function HW is not available the compositor will assemble a shader to
> +ask the GPU to perform the transformation from the source content format to the
> +display's format.
> +
> +A compositor's mapping function and a driver's mapping function are usually
> +entirely separate concepts. On OSes where a HW vendor has no insight into
> +closed-source compositor code such a vendor will tune their color management
> +code to visually match the compositor's. On other OSes, where both mapping
> +functions are open to an implementer they will ensure both mappings match.
> +
> +This results in mapping algorithm lock-in, meaning that no-one alone can
> +experiment with or introduce new mapping algorithms and achieve
> +consistent results regardless of which implementation path is taken.
> +
> +Why is Linux different?
> +=======================
> +
> +Unlike other OSes, where there is one compositor for one or more drivers, on
> +Linux we have a many-to-many relationship. Many compositors; many drivers.
> +In addition each compositor vendor or community has their own view of how
> +color management should be done. This is what makes Linux so beautiful.
> +
> +This means that a HW vendor can now no longer tune their driver to one
> +compositor, as tuning it to one could make it look fairly different from
> +another compositor's color mapping.
> +
> +We need a better solution.
> +
> +
> +Descriptive API
> +===============
> +
> +An API that describes the source and destination colorspaces is a descriptive
> +API. It describes the input and output color spaces but does not describe
> +how precisely they should be mapped. Such a mapping includes many minute
> +design decisions that can greatly affect the look of the final result.
> +
> +It is not feasible to describe such mapping with enough detail to ensure the
> +same result from each implementation. In fact, these mappings are a very active
> +research area.
> +
> +
> +Prescriptive API
> +================
> +
> +A prescriptive API does not describe the source and destination colorspaces.
> +Instead it prescribes a recipe for how to manipulate pixel values to arrive at
> +the desired outcome.
> +
> +This recipe is generally an ordered list of straight-forward operations,
> +with clear mathematical definitions, such as 1D LUTs, 3D LUTs, matrices,
> +or other operations that can be described in a precise manner.
> +
> +
> +The Color Pipeline API
> +======================
> +
> +HW color management pipelines can significantly differ between HW
> +vendors in terms of availability, ordering, and capabilities of HW
> +blocks. This makes a common definition of color management blocks and
> +their ordering nigh impossible. Instead we are defining an API that
> +allows user space to discover the HW capabilities in a generic manner,
> +agnostic of specific drivers and hardware.
> +
> +
> +drm_colorop Object & IOCTLs
> +===========================
> +
> +To support the definition of color pipelines we define the DRM core
> +object type drm_colorop. Individual drm_colorop objects will be chained
> +via the NEXT property of a drm_colorop to constitute a color pipeline.
> +Each drm_colorop object is unique, i.e., even if multiple color
> +pipelines have the same operation they won't share the same drm_colorop
> +object to describe that operation.
> +
> +Note that drivers are not expected to map drm_colorop objects statically
> +to specific HW blocks. The mapping of drm_colorop objects is entirely a
> +driver-internal detail and can be as dynamic or static as a driver needs
> +it to be. See more in the Driver Implementer's Guide section below.
> +
> +Just like other DRM objects the drm_colorop objects are discovered via
> +IOCTLs:
> +
> +DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to retrieve the
> +number of all drm_colorop objects.
> +
> +DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one drm_colorop.
> +It includes the ID for the colorop object, as well as the plane_id of
> +the associated plane. All other values should be registered as
> +properties.
> +
> +Each drm_colorop has three core properties:
> +
> +TYPE: The type of transformation, such as
> +* enumerated curve
> +* custom (uniform) 1D LUT
> +* 3x3 matrix
> +* 3x4 matrix
> +* 3D LUT
> +* etc.
> +
> +Depending on the type of transformation other properties will describe
> +more details.
> +
> +BYPASS: A boolean property that can be used to easily put a block into
> +bypass mode. While setting other properties might fail atomic check,
> +setting the BYPASS property to true should never fail. The BYPASS

It hurts me to say as someone who is going to deal with this in user
space but I think we should drop the requirement to never fail setting a
pipeline to bypass mode with !ALLOW_MODESET.

On IRC there was a discussion with Sima where he explained that atomic
checks always check from current state (C) to a new state (B). This
doesn't imply B->C will succeed as well. So to make the guarantee
possible we'd have to change all drivers to be able to check from
arbitrary state A to arbitrary state B and then check both C->B and
B->C (or let user space do it).

Let's leave this can of worms for another time and then solve it not
just for the color pipeline but for any state.

> +property is not mandatory for a colorop, as long as the entire pipeline
> +can get bypassed by setting the COLOR_PIPELINE on a plane to '0'.
> +
> +NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if this
> +drm_colorop is the last in the chain.
> +
> +An example of a drm_colorop object might look like one of these::
> +
> +    /* 1D enumerated curve */
> +    Color operation 42
> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> +    ├─ "BYPASS": bool {true, false}
> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> +    └─ "NEXT": immutable color operation ID = 43
> +
> +    /* custom 4k entry 1D LUT */
> +    Color operation 52
> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> +    ├─ "BYPASS": bool {true, false}
> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> +    ├─ "LUT_1D": blob
> +    └─ "NEXT": immutable color operation ID = 0
> +
> +    /* 17^3 3D LUT */
> +    Color operation 72
> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
> +    ├─ "BYPASS": bool {true, false}
> +    ├─ "LUT_3D_SIZE": immutable range = 17
> +    ├─ "LUT_3D": blob
> +    └─ "NEXT": immutable color operation ID = 73
> +
> +
> +COLOR_PIPELINE Plane Property
> +=============================
> +
> +Color Pipelines are created by a driver and advertised via a new
> +COLOR_PIPELINE enum property on each plane. Values of the property
> +always include '0', which is the default and means all color processing
> +is disabled. Additional values will be the object IDs of the first
> +drm_colorop in a pipeline. A driver can create and advertise none, one,
> +or more possible color pipelines. A DRM client will select a color
> +pipeline by setting the COLOR_PIPELINE property to the respective value.
> +
> +In the case where drivers have custom support for pre-blending color
> +processing those drivers shall reject atomic commits that are trying to
> +use both the custom color properties, as well as the COLOR_PIPELINE
> +property.

I think we all agree that we need a CAP even for the pre-blending
pipeline anyway because of COLOR_ENCODING etc. So this should probably
be more general: when a client sets the CAP that exposes the color
pipeline, all other pre-blending color processing properties need to be
removed and all driver-internal pre-blending color processing must be
disabled.

> +
> +An example of a COLOR_PIPELINE property on a plane might look like this::
> +
> +    Plane 10
> +    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> +    ├─ …
> +    └─ "color_pipeline": enum {0, 42, 52} = 0
> +
> +
> +Color Pipeline Discovery
> +========================
> +
> +A DRM client wanting color management on a drm_plane will:
> +
> +1. Read all drm_colorop objects
> +2. Get the COLOR_PIPELINE property of the plane
> +3. Iterate all COLOR_PIPELINE enum values
> +4. For each enum value walk the color pipeline (via the NEXT pointers)
> +   and see if the available color operations are suitable for the
> +   desired color management operations
> +
> +An example of chained properties to define an AMD pre-blending color
> +pipeline might look like this::
> +
> +    Plane 10
> +    ├─ "TYPE" (immutable) = Primary
> +    └─ "COLOR_PIPELINE": enum {0, 44} = 0
> +
> +    Color operation 44
> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> +    ├─ "BYPASS": bool
> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> +    └─ "NEXT" (immutable) = 45
> +
> +    Color operation 45
> +    ├─ "TYPE" (immutable) = 3x4 Matrix
> +    ├─ "BYPASS": bool
> +    ├─ "MATRIX_3_4": blob
> +    └─ "NEXT" (immutable) = 46
> +
> +    Color operation 46
> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> +    ├─ "BYPASS": bool
> +    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB Inverse EOTF
> +    └─ "NEXT" (immutable) = 47
> +
> +    Color operation 47
> +    ├─ "TYPE" (immutable) = 1D LUT
> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> +    ├─ "LUT_1D_DATA": blob
> +    └─ "NEXT" (immutable) = 48
> +
> +    Color operation 48
> +    ├─ "TYPE" (immutable) = 3D LUT
> +    ├─ "LUT_3D_SIZE" (immutable) = 17
> +    ├─ "LUT_3D_DATA": blob
> +    └─ "NEXT" (immutable) = 49
> +
> +    Color operation 49
> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> +    ├─ "BYPASS": bool
> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> +    └─ "NEXT" (immutable) = 0
> +
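
To make sure I read the discovery steps the way they are intended, here
is how I picture the client side. This is only a sketch: the
sketch_colorop struct and every function name are hypothetical stand-ins
for whatever a client builds after reading the colorop objects and their
TYPE and NEXT properties; none of them are kernel or libdrm API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side mirror of a colorop; not a kernel or libdrm type. */
enum sketch_colorop_type {
	SKETCH_CURVE_1D,
	SKETCH_LUT_1D,
	SKETCH_MATRIX_3X4,
	SKETCH_LUT_3D,
};

struct sketch_colorop {
	uint32_t id;                    /* object ID */
	enum sketch_colorop_type type;  /* TYPE property */
	uint32_t next;                  /* NEXT property, 0 ends the chain */
};

static const struct sketch_colorop *
find_colorop(const struct sketch_colorop *ops, size_t n_ops, uint32_t id)
{
	for (size_t i = 0; i < n_ops; i++)
		if (ops[i].id == id)
			return &ops[i];
	return NULL;
}

/*
 * Walk the pipeline starting at first_id and report whether the wanted
 * types occur in the given order. Other colorops may sit in between.
 * The step counter bounds the walk in case NEXT ever forms a cycle.
 */
static bool pipeline_offers(const struct sketch_colorop *ops, size_t n_ops,
			    uint32_t first_id,
			    const enum sketch_colorop_type *wanted,
			    size_t n_wanted)
{
	size_t matched = 0;
	size_t steps = 0;

	for (uint32_t id = first_id; id != 0 && steps < n_ops; steps++) {
		const struct sketch_colorop *op = find_colorop(ops, n_ops, id);

		if (!op)
			return false;	/* NEXT points at an unknown object */
		if (matched < n_wanted && op->type == wanted[matched])
			matched++;
		id = op->next;
	}
	return matched == n_wanted;
}

/* Return the first COLOR_PIPELINE enum value that fits, or 0 for none. */
static uint32_t pick_pipeline(const struct sketch_colorop *ops, size_t n_ops,
			      const uint32_t *enum_values, size_t n_values,
			      const enum sketch_colorop_type *wanted,
			      size_t n_wanted)
{
	for (size_t i = 0; i < n_values; i++) {
		if (enum_values[i] == 0)
			continue;	/* 0 means no color processing */
		if (pipeline_offers(ops, n_ops, enum_values[i], wanted, n_wanted))
			return enum_values[i];
	}
	return 0;
}
```

With the six colorops from the AMD example above (44 through 49) and
enum values {0, 44}, asking for "1D curve, then 3D LUT, then 1D curve"
would select pipeline 44, with the colorops in between left in bypass.
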
> +
> +Color Pipeline Programming
> +==========================
> +
> +Once a DRM client has found a suitable pipeline it will:
> +
> +1. Set the COLOR_PIPELINE enum value to the one pointing at the first
> +   drm_colorop object of the desired pipeline
> +2. Set the properties for all drm_colorop objects in the pipeline to the
> +   desired values, setting BYPASS to true for unused drm_colorop blocks,
> +   and false for enabled drm_colorop blocks
> +3. Perform atomic_check/commit as desired
> +
> +To configure the pipeline for an HDR10 PQ plane and blending in linear
> +space, a compositor might perform an atomic commit with the following
> +property values::
> +
> +    Plane 10
> +    └─ "COLOR_PIPELINE" = 42
> +
> +    Color operation 42 (input CSC)
> +    └─ "BYPASS" = true
> +
> +    Color operation 44 (DeGamma)
> +    └─ "BYPASS" = true
> +
> +    Color operation 45 (gamut remap)
> +    └─ "BYPASS" = true
> +
> +    Color operation 46 (shaper LUT RAM)
> +    └─ "BYPASS" = true
> +
> +    Color operation 47 (3D LUT RAM)
> +    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
> +
> +    Color operation 48 (blend gamma)
> +    └─ "CURVE_1D_TYPE" = PQ EOTF
> +
> +
> +Driver Implementer's Guide
> +==========================
> +
> +What does this all mean for driver implementations? As noted above the
> +colorops can map to HW directly but don't need to do so. Here are some
> +suggestions on how to think about creating your color pipelines:
> +
> +- Try to expose pipelines that use already defined colorops, even if
> +  your hardware pipeline is split differently. This allows existing
> +  userspace to immediately take advantage of the hardware.
> +
> +- Additionally, try to expose your actual hardware blocks as colorops.
> +  Define new colorop types where you believe it can offer significant
> +  benefits if userspace learns to program them.
> +
> +- Avoid defining new colorops for compound operations with very narrow
> +  scope. If you have a hardware block for a special operation that
> +  cannot be split further, you can expose that as a new colorop type.
> +  However, try to not define colorops for "use cases", especially if
> +  they require you to combine multiple hardware blocks.
> +
> +- Design new colorops as prescriptive, not descriptive; by the
> +  mathematical formula, not by the assumed input and output.
> +
> +A defined colorop type must be deterministic. Its operation can depend
> +only on its properties and input and nothing else, allowed error
> +tolerance notwithstanding.

Maybe add that the exact behavior or formula of the element must be
documented entirely.

> +
> +
> +Driver Forward/Backward Compatibility
> +=====================================
> +
> +As this is uAPI drivers can't regress color pipelines that have been
> +introduced for a given HW generation. New HW generations are free to
> +abandon color pipelines advertised for previous generations.
> +Nevertheless, it can be beneficial to carry support for existing color
> +pipelines forward as those will likely already have support in DRM
> +clients.
> +
> +Introducing new colorops to a pipeline is fine, as long as they can be
> +disabled or are purely informational. DRM clients implementing support
> +for the pipeline can always skip unknown properties as long as they can
> +be confident that doing so will not cause unexpected results.
> +
> +If a new colorop doesn't fall into one of the above categories
> +(bypassable or informational) the modified pipeline would be unusable
> +for user space. In this case a new pipeline should be defined.

How can user space detect an informational element? Should we just add a
BYPASS property to informational elements, make it read only and set to
true maybe? Or something more descriptive?
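
To make the question concrete, this is roughly the check I imagine user
space doing under the read-only BYPASS idea. It is purely a sketch of
one possible reading, nothing here is agreed uAPI, and the struct, enum
and function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-colorop facts a client has gathered; not a real API. */
struct sketch_op_info {
	bool type_known;        /* client understands this TYPE value */
	bool has_bypass;        /* colorop exposes a BYPASS property */
	bool bypass_immutable;  /* BYPASS is read-only */
	bool bypass_value;      /* current BYPASS value */
};

enum sketch_verdict {
	SKETCH_PROGRAM,   /* known op, client can program it */
	SKETCH_BYPASS,    /* unknown op, but client can switch it off */
	SKETCH_INERT,     /* unknown op with read-only BYPASS = true */
	SKETCH_UNUSABLE,  /* unknown op that cannot be neutralised */
};

static enum sketch_verdict classify_op(const struct sketch_op_info *op)
{
	if (op->type_known)
		return SKETCH_PROGRAM;
	if (op->has_bypass && !op->bypass_immutable)
		return SKETCH_BYPASS;
	if (op->has_bypass && op->bypass_immutable && op->bypass_value)
		return SKETCH_INERT;
	return SKETCH_UNUSABLE;
}

/* A pipeline is usable only if no colorop in it is SKETCH_UNUSABLE. */
static bool pipeline_usable(const struct sketch_op_info *ops, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (classify_op(&ops[i]) == SKETCH_UNUSABLE)
			return false;
	return true;
}
```

An unknown colorop with no BYPASS at all, or with a read-only BYPASS
that is false, makes the whole pipeline unusable in this model, which is
the case where the doc says a new pipeline should be defined.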

> +
> +
> +References
> +==========
> +
> +1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
> \ No newline at end of file
> -- 
> 2.42.0
> 



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-20 14:22   ` Sebastian Wick
@ 2023-10-20 14:57     ` Pekka Paalanen
  2023-10-20 15:23       ` Harry Wentland
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-10-20 14:57 UTC (permalink / raw)
  To: Sebastian Wick
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga, Aleix Pol,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Hector Martin,
	Xaver Hugl, Joshua Ashton


On Fri, 20 Oct 2023 16:22:56 +0200
Sebastian Wick <sebastian.wick@redhat.com> wrote:

> Thanks for continuing to work on this!
> 
> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > v2:
> >  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> >  - Updated wording (Pekka)
> >  - Change BYPASS wording to make it non-mandatory (Sebastian)
> >  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >    section (Pekka)
> >  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> >  - Add "Driver Implementer's Guide" section (Pekka)
> >  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > 

...

> > +Driver Forward/Backward Compatibility
> > +=====================================
> > +
> > +As this is uAPI drivers can't regress color pipelines that have been
> > +introduced for a given HW generation. New HW generations are free to
> > +abandon color pipelines advertised for previous generations.
> > +Nevertheless, it can be beneficial to carry support for existing color
> > +pipelines forward as those will likely already have support in DRM
> > +clients.
> > +
> > +Introducing new colorops to a pipeline is fine, as long as they can be
> > +disabled or are purely informational. DRM clients implementing support
> > +for the pipeline can always skip unknown properties as long as they can
> > +be confident that doing so will not cause unexpected results.
> > +
> > +If a new colorop doesn't fall into one of the above categories
> > +(bypassable or informational) the modified pipeline would be unusable
> > +for user space. In this case a new pipeline should be defined.  
> 
> How can user space detect an informational element? Should we just add a
> BYPASS property to informational elements, make it read only and set to
> true maybe? Or something more descriptive?

Read-only BYPASS set to true would be fine by me, I guess.

I think we also need a definition of "informational".

Counter-example 1: a colorop that represents a non-configurable
YUV<->RGB conversion. Maybe it determines its operation from FB pixel
format. It cannot be set to bypass, it cannot be configured, and it
will alter color values.

Counter-example 2: image size scaling colorop. It might not be
configurable, it is controlled by the plane CRTC_* and SRC_*
properties. You still need to understand what it does, so you can
arrange the scaling to work correctly. (Do not want to scale an image
with PQ-encoded values as Josh demonstrated in XDC.)

Counter-example 3: image sampling colorop. Averages FB originated color
values to produce a color sample. Again do not want to do this with
PQ-encoded values.


Thanks,
pq



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-20 14:57     ` Pekka Paalanen
@ 2023-10-20 15:23       ` Harry Wentland
  2023-10-23  8:12         ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Harry Wentland @ 2023-10-20 15:23 UTC (permalink / raw)
  To: Pekka Paalanen, Sebastian Wick
  Cc: Aleix Pol, Sasha McIntosh, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Victoria Brekenfeld,
	dri-devel, wayland-devel, Melissa Wen, Michel Dänzer,
	Jonas Ådahl, Joshua Ashton, Naseer Ahmed, Uma Shankar,
	Christopher Braga, Arthur Grillo



On 2023-10-20 10:57, Pekka Paalanen wrote:
> On Fri, 20 Oct 2023 16:22:56 +0200
> Sebastian Wick <sebastian.wick@redhat.com> wrote:
> 
>> Thanks for continuing to work on this!
>>
>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
>>> v2:
>>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>  - Updated wording (Pekka)
>>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>    section (Pekka)
>>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>  - Add "Driver Implementer's Guide" section (Pekka)
>>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>>
> 
> ...
> 
>>> +Driver Forward/Backward Compatibility
>>> +=====================================
>>> +
>>> +As this is uAPI drivers can't regress color pipelines that have been
>>> +introduced for a given HW generation. New HW generations are free to
>>> +abandon color pipelines advertised for previous generations.
>>> +Nevertheless, it can be beneficial to carry support for existing color
>>> +pipelines forward as those will likely already have support in DRM
>>> +clients.
>>> +
>>> +Introducing new colorops to a pipeline is fine, as long as they can be
>>> +disabled or are purely informational. DRM clients implementing support
>>> +for the pipeline can always skip unknown properties as long as they can
>>> +be confident that doing so will not cause unexpected results.
>>> +
>>> +If a new colorop doesn't fall into one of the above categories
>>> +(bypassable or informational) the modified pipeline would be unusable
>>> +for user space. In this case a new pipeline should be defined.  
>>
>> How can user space detect an informational element? Should we just add a
>> BYPASS property to informational elements, make it read only and set to
>> true maybe? Or something more descriptive?
> 
> Read-only BYPASS set to true would be fine by me, I guess.
> 

Don't you mean set to false? An informational element will always do
something, so it can't be bypassed.

> I think we also need a definition of "informational".
> 
> Counter-example 1: a colorop that represents a non-configurable

Not sure what's "counter" for these examples?

> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> format. It cannot be set to bypass, it cannot be configured, and it
> will alter color values.
> 
> Counter-example 2: image size scaling colorop. It might not be
> configurable, it is controlled by the plane CRTC_* and SRC_*
> properties. You still need to understand what it does, so you can
> arrange the scaling to work correctly. (Do not want to scale an image
> with PQ-encoded values as Josh demonstrated in XDC.)
> 

IMO the position of the scaling operation is the thing that's important
here as the color pipeline won't define scaling properties.

> Counter-example 3: image sampling colorop. Averages FB originated color
> values to produce a color sample. Again do not want to do this with
> PQ-encoded values.
> 

Wouldn't this only happen during a scaling op?

Harry

> 
> Thanks,
> pq


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-20 15:23       ` Harry Wentland
@ 2023-10-23  8:12         ` Pekka Paalanen
  2023-10-25 20:16           ` Alex Goins
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-10-23  8:12 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Uma Shankar, Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton

[-- Attachment #1: Type: text/plain, Size: 4036 bytes --]

On Fri, 20 Oct 2023 11:23:28 -0400
Harry Wentland <harry.wentland@amd.com> wrote:

> On 2023-10-20 10:57, Pekka Paalanen wrote:
> > On Fri, 20 Oct 2023 16:22:56 +0200
> > Sebastian Wick <sebastian.wick@redhat.com> wrote:
> >   
> >> Thanks for continuing to work on this!
> >>
> >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:  
> >>> v2:
> >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> >>>  - Updated wording (Pekka)
> >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >>>    section (Pekka)
> >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> >>>  - Add "Driver Implementer's Guide" section (Pekka)
> >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> >>>  
> > 
> > ...
> >   
> >>> +Driver Forward/Backward Compatibility
> >>> +=====================================
> >>> +
> >>> +As this is uAPI drivers can't regress color pipelines that have been
> >>> +introduced for a given HW generation. New HW generations are free to
> >>> +abandon color pipelines advertised for previous generations.
> >>> +Nevertheless, it can be beneficial to carry support for existing color
> >>> +pipelines forward as those will likely already have support in DRM
> >>> +clients.
> >>> +
> >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> >>> +disabled or are purely informational. DRM clients implementing support
> >>> +for the pipeline can always skip unknown properties as long as they can
> >>> +be confident that doing so will not cause unexpected results.
> >>> +
> >>> +If a new colorop doesn't fall into one of the above categories
> >>> +(bypassable or informational) the modified pipeline would be unusable
> >>> +for user space. In this case a new pipeline should be defined.    
> >>
> >> How can user space detect an informational element? Should we just add a
> >> BYPASS property to informational elements, make it read only and set to
> >> true maybe? Or something more descriptive?  
> > 
> > Read-only BYPASS set to true would be fine by me, I guess.
> >   
> 
> Don't you mean set to false? An informational element will always do
> something, so it can't be bypassed.

Yeah, this is why we need a definition. I understand "informational" to
not change pixel values in any way. Previously I had some weird idea
that scaling doesn't alter color, but of course it may.


> > I think we also need a definition of "informational".
> > 
> > Counter-example 1: a colorop that represents a non-configurable  
> 
> Not sure what's "counter" for these examples?
> 
> > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > format. It cannot be set to bypass, it cannot be configured, and it
> > will alter color values.
> > 
> > Counter-example 2: image size scaling colorop. It might not be
> > configurable, it is controlled by the plane CRTC_* and SRC_*
> > properties. You still need to understand what it does, so you can
> > arrange the scaling to work correctly. (Do not want to scale an image
> > with PQ-encoded values as Josh demonstrated in XDC.)
> >   
> 
> IMO the position of the scaling operation is the thing that's important
> here as the color pipeline won't define scaling properties.
> 
> > Counter-example 3: image sampling colorop. Averages FB originated color
> > values to produce a color sample. Again do not want to do this with
> > PQ-encoded values.
> >   
> 
> Wouldn't this only happen during a scaling op?

There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
coordinates can be fractional, which makes nearest vs. bilinear
sampling have a difference even if there is no scaling.

There is also the question of chroma siting with sub-sampled YUV. I
don't know how that actually works, or how it theoretically should work.


Thanks,
pq

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling
  2023-10-19 21:21 ` [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling Harry Wentland
@ 2023-10-23 22:34   ` Arthur Grillo
  0 siblings, 0 replies; 49+ messages in thread
From: Arthur Grillo @ 2023-10-23 22:34 UTC (permalink / raw)
  To: Harry Wentland, dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Christopher Braga, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Pekka Paalanen, Hector Martin, Xaver Hugl, Joshua Ashton



On 19/10/23 18:21, Harry Wentland wrote:
> Debugging LUT math is much easier when we can unit test
> it. Add kunit functionality to VKMS and add tests for
>  - get_lut_index
>  - lerp_u16
> 
> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Jonas Ådahl <jadahl@redhat.com>
> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Cc: Alexander Goins <agoins@nvidia.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Michel Dänzer <mdaenzer@redhat.com>
> Cc: Aleix Pol <aleixpol@kde.org>
> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> Cc: Victoria Brekenfeld <victoria@system76.com>
> Cc: Sima <daniel@ffwll.ch>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> Cc: Sasha McIntosh <sashamcintosh@google.com>
> ---
>  drivers/gpu/drm/vkms/Kconfig                  |  5 ++
>  drivers/gpu/drm/vkms/Makefile                 |  2 +
>  drivers/gpu/drm/vkms/tests/.kunitconfig       |  4 ++
>  drivers/gpu/drm/vkms/tests/Makefile           |  4 ++
>  drivers/gpu/drm/vkms/tests/vkms_color_tests.c | 64 +++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_composer.c          |  4 +-
>  drivers/gpu/drm/vkms/vkms_composer.h          | 11 ++++
>  7 files changed, 92 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/tests/.kunitconfig
>  create mode 100644 drivers/gpu/drm/vkms/tests/Makefile
>  create mode 100644 drivers/gpu/drm/vkms/tests/vkms_color_tests.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_composer.h
> 
> diff --git a/drivers/gpu/drm/vkms/Kconfig b/drivers/gpu/drm/vkms/Kconfig
> index 1816562381a2..372cc5fa92f1 100644
> --- a/drivers/gpu/drm/vkms/Kconfig
> +++ b/drivers/gpu/drm/vkms/Kconfig
> @@ -13,3 +13,8 @@ config DRM_VKMS
>  	  a VKMS.
>  
>  	  If M is selected the module will be called vkms.
> +
> +config DRM_VKMS_KUNIT_TESTS
> +	tristate "Tests for VKMS" if !KUNIT_ALL_TESTS
> +	depends on DRM_VKMS && KUNIT
> +	default KUNIT_ALL_TESTS
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 1b28a6a32948..d3440f228f46 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -9,3 +9,5 @@ vkms-y := \
>  	vkms_writeback.o
>  
>  obj-$(CONFIG_DRM_VKMS) += vkms.o
> +
> +obj-y += tests/
> \ No newline at end of file
> diff --git a/drivers/gpu/drm/vkms/tests/.kunitconfig b/drivers/gpu/drm/vkms/tests/.kunitconfig
> new file mode 100644
> index 000000000000..70e378228cbd
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/tests/.kunitconfig
> @@ -0,0 +1,4 @@
> +CONFIG_KUNIT=y
> +CONFIG_DRM=y
> +CONFIG_DRM_VKMS=y
> +CONFIG_DRM_VKMS_KUNIT_TESTS=y
> diff --git a/drivers/gpu/drm/vkms/tests/Makefile b/drivers/gpu/drm/vkms/tests/Makefile
> new file mode 100644
> index 000000000000..761465332ff2
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/tests/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0+
> +
> +obj-$(CONFIG_DRM_VKMS_KUNIT_TESTS) += \
> +	vkms_color_tests.o
> \ No newline at end of file
> diff --git a/drivers/gpu/drm/vkms/tests/vkms_color_tests.c b/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
> new file mode 100644
> index 000000000000..843b2e1d607e
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/tests/vkms_color_tests.c
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#include <kunit/test.h>
> +
> +#include <drm/drm_fixed.h>
> +
> +#include "../vkms_composer.h"
> +
> +#define TEST_LUT_SIZE 16
> +
> +static struct drm_color_lut test_linear_array[TEST_LUT_SIZE] = {
> +	{ 0x0, 0x0, 0x0, 0 },
> +	{ 0x1111, 0x1111, 0x1111, 0 },
> +	{ 0x2222, 0x2222, 0x2222, 0 },
> +	{ 0x3333, 0x3333, 0x3333, 0 },
> +	{ 0x4444, 0x4444, 0x4444, 0 },
> +	{ 0x5555, 0x5555, 0x5555, 0 },
> +	{ 0x6666, 0x6666, 0x6666, 0 },
> +	{ 0x7777, 0x7777, 0x7777, 0 },
> +	{ 0x8888, 0x8888, 0x8888, 0 },
> +	{ 0x9999, 0x9999, 0x9999, 0 },
> +	{ 0xaaaa, 0xaaaa, 0xaaaa, 0 },
> +	{ 0xbbbb, 0xbbbb, 0xbbbb, 0 },
> +	{ 0xcccc, 0xcccc, 0xcccc, 0 },
> +	{ 0xdddd, 0xdddd, 0xdddd, 0 },
> +	{ 0xeeee, 0xeeee, 0xeeee, 0 },
> +	{ 0xffff, 0xffff, 0xffff, 0 },
> +};
> +
> +const struct vkms_color_lut test_linear_lut = {
> +	.base = test_linear_array,
> +	.lut_length = TEST_LUT_SIZE,
> +	.channel_value2index_ratio = 0xf000fll
> +};
> +
> +
> +static void vkms_color_test_get_lut_index(struct kunit *test)
> +{
> +	int i;
> +
> +	KUNIT_EXPECT_EQ(test, drm_fixp2int(get_lut_index(&test_linear_lut, test_linear_array[0].red)), 0);
> +
> +	for (i = 0; i < TEST_LUT_SIZE; i++)
> +		KUNIT_EXPECT_EQ(test, drm_fixp2int_ceil(get_lut_index(&test_linear_lut, test_linear_array[i].red)), i);
> +}
> +
> +static void vkms_color_test_lerp(struct kunit *test)
> +{
> +	KUNIT_EXPECT_EQ(test, lerp_u16(0x0, 0x10, 0x80000000), 0x8);
> +}
> +
> +static struct kunit_case vkms_color_test_cases[] = {
> +	KUNIT_CASE(vkms_color_test_get_lut_index),
> +	KUNIT_CASE(vkms_color_test_lerp),
> +	{}
> +};
> +
> +static struct kunit_suite vkms_color_test_suite = {
> +	.name = "vkms-color",
> +	.test_cases = vkms_color_test_cases,
> +};
> +kunit_test_suite(vkms_color_test_suite);
> +
> +MODULE_LICENSE("GPL");
> \ No newline at end of file
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 3c99fb8b54e2..a0a3a6fd2926 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -91,7 +91,7 @@ static void fill_background(const struct pixel_argb_u16 *background_color,
>  }
>  
>  // lerp(a, b, t) = a + (b - a) * t
> -static u16 lerp_u16(u16 a, u16 b, s64 t)
> +u16 lerp_u16(u16 a, u16 b, s64 t)
>  {
>  	s64 a_fp = drm_int2fixp(a);
>  	s64 b_fp = drm_int2fixp(b);
> @@ -101,7 +101,7 @@ static u16 lerp_u16(u16 a, u16 b, s64 t)
>  	return drm_fixp2int(a_fp + delta);
>  }
>  
> -static s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value)
> +s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value)
>  {
>  	s64 color_channel_fp = drm_int2fixp(channel_value);
>  
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.h b/drivers/gpu/drm/vkms/vkms_composer.h
> new file mode 100644
> index 000000000000..11c5de9cc961
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_composer.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +
> +#ifndef _VKMS_COMPOSER_H_
> +#define _VKMS_COMPOSER_H_
> +
> +#include "vkms_drv.h"
> +
> +s64 get_lut_index(const struct vkms_color_lut *lut, u16 channel_value);
> +u16 lerp_u16(u16 a, u16 b, s64 t);

Not that exposing these functions is inherently wrong, but it might be
better to follow the documentation's suggestion for testing static
functions[0].

[0]: https://www.kernel.org/doc/html/latest/dev-tools/kunit/usage.html#testing-static-functions

Best Regards,
~Arthur Grillo

> +
> +#endif /* _VKMS_COMPOSER_H_ */
> \ No newline at end of file
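
For readers following the LUT math, the arithmetic behind the two test cases
in the quoted patch can be reproduced in user space. This is only a sketch:
the names with a _sketch suffix are made up for illustration, and the
multiply is a simplified integer-times-Q32.32 product rather than the
kernel's drm_fixp_mul(), so low-order bits may differ from what VKMS
computes.

```c
#include <assert.h>
#include <stdint.h>

#define FIXP_SHIFT 32				/* Q32.32, as in drm_fixed.h */
#define FIXP_ONE   ((int64_t)1 << FIXP_SHIFT)

/*
 * lerp(a, b, t) = a + (b - a) * t, with t given in Q32.32.
 * Simplification: (b - a) is a plain integer, so multiplying it by t
 * already yields a Q32.32 value.
 */
static uint16_t lerp_u16_sketch(uint16_t a, uint16_t b, int64_t t)
{
	int64_t a_fp, delta_fp;

	assert(t >= 0 && t <= FIXP_ONE);
	a_fp = (int64_t)a << FIXP_SHIFT;
	delta_fp = ((int64_t)b - (int64_t)a) * t;

	/* For t in [0, 1] the sum is never negative, so the shift is safe. */
	return (uint16_t)((a_fp + delta_fp) >> FIXP_SHIFT);
}

/*
 * Fractional LUT index in Q32.32: channel_value * ratio, where ratio is
 * (lut_length - 1) / 0xffff in Q32.32. For a 16-entry LUT that ratio is
 * 0xf000f after rounding down.
 */
static int64_t lut_index_sketch(int64_t ratio, uint16_t channel_value)
{
	return (int64_t)channel_value * ratio;
}

/* Round a non-negative Q32.32 value up to the next integer. */
static int64_t fixp_ceil_sketch(int64_t x)
{
	assert(x >= 0);
	return (x + FIXP_ONE - 1) >> FIXP_SHIFT;
}
```

Because the ratio is rounded down, the index computed for entry i lands just
below i, which is presumably why the quoted test compares against the ceiling
rather than the truncated value.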

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-23  8:12         ` Pekka Paalanen
@ 2023-10-25 20:16           ` Alex Goins
  2023-10-26  8:57             ` Pekka Paalanen
  2023-11-07 16:52             ` Harry Wentland
  0 siblings, 2 replies; 49+ messages in thread
From: Alex Goins @ 2023-10-25 20:16 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Uma Shankar, Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton

[-- Attachment #1: Type: text/plain, Size: 11920 bytes --]

Thank you Harry and all other contributors for your work on this. Responses
inline -

On Mon, 23 Oct 2023, Pekka Paalanen wrote:

> On Fri, 20 Oct 2023 11:23:28 -0400
> Harry Wentland <harry.wentland@amd.com> wrote:
> 
> > On 2023-10-20 10:57, Pekka Paalanen wrote:
> > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > Sebastian Wick <sebastian.wick@redhat.com> wrote:
> > >   
> > >> Thanks for continuing to work on this!
> > >>
> > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:  
> > >>> v2:
> > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > >>>  - Updated wording (Pekka)
> > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > >>>    section (Pekka)
> > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > >
> > > ...
> > >
> > >>> +An example of a drm_colorop object might look like one of these::
> > >>> +
> > >>> +    /* 1D enumerated curve */
> > >>> +    Color operation 42
> > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > >>> +    ├─ "BYPASS": bool {true, false}
> > >>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > >>> +    └─ "NEXT": immutable color operation ID = 43

I know these are just examples, but I would also like to suggest the possibility
of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
compared to setting an identity in some cases depending on the hardware. See
below for more on this, RE: implicit format conversions.

Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
offline discussions that it would nonetheless be helpful to expose enumerated
curves in order to hide the vendor-specific complexities of programming
segmented LUTs from clients. In that case, we would simply refer to the
enumerated curve when calculating/choosing segmented LUT entries.

Another thing that came up in offline discussions is that we could use multiple
color operations to program a single operation in hardware. As I understand it,
AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
"HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
we could combine them into a singular LUT in software, such that you can combine
e.g. segmented PQ EOTF with night light. One caveat is that you will lose
precision from the custom LUT where it overlaps with the linear section of the
enumerated curve, but that is unavoidable and shouldn't be an issue in most
use-cases.
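
To make the idea concrete, here is a rough sketch of baking "enumerated
curve, then custom LUT" into one table in software. Everything in it is
hypothetical: example_curve() is just a stand-in for an enumerated curve,
the LUTs are uniformly spaced doubles on [0.0, 1.0] rather than a segmented
hardware format, and none of the names correspond to an existing driver.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an enumerated curve; a real driver would use e.g. PQ EOTF. */
static double example_curve(double x)
{
	return x * x;
}

/* Sample a uniformly spaced 1D LUT on [0.0, 1.0] with linear interpolation. */
static double sample_lut(const double *lut, size_t len, double x)
{
	double pos, frac;
	size_t idx;

	assert(len >= 2);
	if (x < 0.0)
		x = 0.0;
	if (x > 1.0)
		x = 1.0;

	pos = x * (double)(len - 1);
	idx = (size_t)pos;
	if (idx > len - 2)
		idx = len - 2;	/* keep idx + 1 inside the table */
	frac = pos - (double)idx;

	return lut[idx] + (lut[idx + 1] - lut[idx]) * frac;
}

/* Bake "curve, then custom LUT" into a single LUT with out_len entries. */
static void combine_curve_and_lut(double (*curve)(double),
				  const double *custom, size_t custom_len,
				  double *out, size_t out_len)
{
	size_t i;

	assert(out_len >= 2);
	for (i = 0; i < out_len; i++) {
		double x = (double)i / (double)(out_len - 1);

		out[i] = sample_lut(custom, custom_len, curve(x));
	}
}
```

The precision caveat mentioned above shows up here as well: the custom LUT is
only sampled at the points the curve maps the combined table's inputs to.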

Actually, the current examples in the proposal don't include a multiplier color
op, which might be useful. For AMD as above, but also for NVIDIA as the
following issue arises:

As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
to in floating point varies depending on the source content. If it's SDR
content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
is that this is also what AMD's "HDR Multiplier" stage is used for, is that
correct?

From the given enumerated curves, it's not clear how they would map to the
above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
value of 125.0? That may work, but it tends towards the "descriptive" notion of
assuming the source content, which may not be accurate in all cases. This is
also an issue for the custom 1D LUT, as the blob will need to be converted to
FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
value be, given that we no longer have any hint as to the source content?

I think a multiplier color op solves all of these issues. Named curves and
custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
be adjusted by the multiplier. For 80 nit SDR content, set it to 1, for 400
nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc. 
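
As a quick illustration of the numbers above, here is a sketch of what such a
multiplier op would compute. It assumes the convention used in this message
that a pipeline value of 1.0 corresponds to 80 nits; the function names are
made up for the example and are not part of the proposal.

```c
#include <assert.h>

/* Assumption taken from the text above: 1.0 in the pipeline is 80 nits. */
#define NITS_PER_UNIT 80.0

/* Hypothetical multiplier colorop: scales one channel value. */
static double multiplier_colorop(double value, double multiplier)
{
	assert(multiplier >= 0.0);
	return value * multiplier;
}

/* Luminance a pipeline value stands for under the 80 nit convention. */
static double pipeline_value_to_nits(double value)
{
	return value * NITS_PER_UNIT;
}
```

With a curve or LUT output of 1.0, multipliers of 1, 5 and 125 land on 80,
400 and 10,000 nits respectively under that assumption.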

> > >>> +
> > >>> +    /* custom 4k entry 1D LUT */
> > >>> +    Color operation 52
> > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> > >>> +    ├─ "BYPASS": bool {true, false}
> > >>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> > >>> +    ├─ "LUT_1D": blob
> > >>> +    └─ "NEXT": immutable color operation ID = 0
> > > 
> > > ...
> > >   
> > >>> +Driver Forward/Backward Compatibility
> > >>> +=====================================
> > >>> +
> > >>> +As this is uAPI drivers can't regress color pipelines that have been
> > >>> +introduced for a given HW generation. New HW generations are free to
> > >>> +abandon color pipelines advertised for previous generations.
> > >>> +Nevertheless, it can be beneficial to carry support for existing color
> > >>> +pipelines forward as those will likely already have support in DRM
> > >>> +clients.
> > >>> +
> > >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> > >>> +disabled or are purely informational. DRM clients implementing support
> > >>> +for the pipeline can always skip unknown properties as long as they can
> > >>> +be confident that doing so will not cause unexpected results.
> > >>> +
> > >>> +If a new colorop doesn't fall into one of the above categories
> > >>> +(bypassable or informational) the modified pipeline would be unusable
> > >>> +for user space. In this case a new pipeline should be defined.    
> > >>
> > >> How can user space detect an informational element? Should we just add a
> > >> BYPASS property to informational elements, make it read only and set to
> > >> true maybe? Or something more descriptive?  
> > > 
> > > Read-only BYPASS set to true would be fine by me, I guess.
> > >   
> > 
> > Don't you mean set to false? An informational element will always do
> > something, so it can't be bypassed.
> 
> Yeah, this is why we need a definition. I understand "informational" to
> not change pixel values in any way. Previously I had some weird idea
> that scaling doesn't alter color, but of course it may.

On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
implicit fixed-point to FP16 conversions, and vice versa.

For example, the "degamma" LUT towards the beginning of the pipeline implicitly
converts from fixed point to FP16, and some of the following operations expect
to operate in FP16. As such, if you have a fixed point input and don't bypass
those following operations, you *must not* bypass the LUT, even if you are
otherwise just programming it with the identity. Conversely, if you have a
floating point input, you *must* bypass the LUT.

Could informational elements and allowing the exclusion of the BYPASS property
be used to convey this information to the client?  For example, we could expose
one pipeline with the LUT exposed with read-only BYPASS set to false, and
sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
fixed point input. Then, expose another pipeline with the LUT missing, and an
informational "FP16" element in its place to accommodate floating point input.

That's just an example; we also have other operations in the pipeline that do
similar implicit conversions. In these cases we don't want the operations to be
bypassed individually, so instead we would expose them as mandatory in some
pipelines and missing in others, with informational elements to help inform the
client of which to choose. Is that acceptable under the current proposal?

Note that in this case, the information just has to do with what format the
pixels should be in, it doesn't correspond to any specific operation. So, I'm
not sure that BYPASS has any meaning for informational elements in this context.

> > > I think we also need a definition of "informational".
> > > 
> > > Counter-example 1: a colorop that represents a non-configurable  
> > 
> > Not sure what's "counter" for these examples?
> > 
> > > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > > format. It cannot be set to bypass, it cannot be configured, and it
> > > will alter color values.

Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
based on the principle that read-only blobs could be used to express some static
pipeline elements without the need to define a new type, but got mixed opinions.
I think this demonstrates the principle further, as clients could detect this
programmatically instead of having to special-case the informational element.

> > > 
> > > Counter-example 2: image size scaling colorop. It might not be
> > > configurable, it is controlled by the plane CRTC_* and SRC_*
> > > properties. You still need to understand what it does, so you can
> > > arrange the scaling to work correctly. (Do not want to scale an image
> > > with PQ-encoded values as Josh demonstrated in XDC.)
> > >   
> > 
> > IMO the position of the scaling operation is the thing that's important
> > here as the color pipeline won't define scaling properties.

I agree that blending should ideally be done in linear space, and I remember
that from Josh's presentation at XDC, but I don't recall the same being said for
scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
pipeline that is meant to be in PQ space (more on this below), and that was
found to achieve better results at HDR/SDR boundaries. Of course, this only
bolsters the argument that it would be helpful to have an informational "scaler"
element to understand at which stage scaling takes place.

> > > Counter-example 3: image sampling colorop. Averages FB originated color
> > > values to produce a color sample. Again do not want to do this with
> > > PQ-encoded values.
> > >   
> > 
> > Wouldn't this only happen during a scaling op?
> 
> There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
> coordinates can be fractional, which makes nearest vs. bilinear
> sampling have a difference even if there is no scaling.
> 
> There is also the question of chroma siting with sub-sampled YUV. I
> don't know how that actually works, or how it theoretically should work.

We have some operations in our pipeline that are intended to be static, i.e. a
static matrix that converts from RGB to LMS, and later another that converts
from LMS to ICtCp. There are even LUTs that are intended to be static,
converting from linear to PQ and vice versa. All of this is because the
pre-blending scaler and tone mapping operator are intended to operate in ICtCp
PQ space. Although the stated LUTs and matrices are intended to be static, they
are actually programmable. In offline discussions, it was indicated that it
would be helpful to actually expose the programmability, as opposed to exposing
them as non-bypassable blocks, as some compositors may have novel uses for them.

Despite being programmable, the LUTs are updated in a manner that is less
efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
if there was some way to tag operations according to their performance,
for example so that clients can prefer a high performance one when they
intend to do an animated transition? I recall from the XDC HDR workshop
that this is also an issue with AMD's 3DLUT, where updates can be too
slow to animate.

Thanks,
Alex Goins
NVIDIA Linux Driver Team

> Thanks,
> pq
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-25 20:16           ` Alex Goins
@ 2023-10-26  8:57             ` Pekka Paalanen
  2023-10-26 17:30               ` Sebastian Wick
  2023-11-07 16:52               ` Harry Wentland
  2023-11-07 16:52             ` Harry Wentland
  1 sibling, 2 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-10-26  8:57 UTC (permalink / raw)
  To: Alex Goins
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Uma Shankar, Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton

[-- Attachment #1: Type: text/plain, Size: 19837 bytes --]

On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
Alex Goins <agoins@nvidia.com> wrote:

> Thank you Harry and all other contributors for your work on this. Responses
> inline -
> 
> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> 
> > On Fri, 20 Oct 2023 11:23:28 -0400
> > Harry Wentland <harry.wentland@amd.com> wrote:
> >   
> > > On 2023-10-20 10:57, Pekka Paalanen wrote:  
> > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > Sebastian Wick <sebastian.wick@redhat.com> wrote:
> > > >     
> > > >> Thanks for continuing to work on this!
> > > >>
> > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:    
> > > >>> v2:
> > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > >>>  - Updated wording (Pekka)
> > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > >>>    section (Pekka)
> > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)  
> > > >
> > > > ...
> > > >  
> > > >>> +An example of a drm_colorop object might look like one of these::
> > > >>> +
> > > >>> +    /* 1D enumerated curve */
> > > >>> +    Color operation 42
> > > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > >>> +    ├─ "BYPASS": bool {true, false}
> > > >>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > >>> +    └─ "NEXT": immutable color operation ID = 43  
> 
> I know these are just examples, but I would also like to suggest the possibility
> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> compared to setting an identity in some cases depending on the hardware. See
> below for more on this, RE: implicit format conversions.
> 
> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> offline discussions that it would nonetheless be helpful to expose enumerated
> curves in order to hide the vendor-specific complexities of programming
> segmented LUTs from clients. In that case, we would simply refer to the
> enumerated curve when calculating/choosing segmented LUT entries.

That's a good idea.

> Another thing that came up in offline discussions is that we could use multiple
> color operations to program a single operation in hardware. As I understand it,
> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> we could combine them into a singular LUT in software, such that you can combine
> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> precision from the custom LUT where it overlaps with the linear section of the
> enumerated curve, but that is unavoidable and shouldn't be an issue in most
> use-cases.

Indeed.

> Actually, the current examples in the proposal don't include a multiplier color
> op, which might be useful. For AMD as above, but also for NVIDIA as the
> following issue arises:
> 
> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
> to in floating point varies depending on the source content. If it's SDR
> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> correct?

It would be against the UAPI design principles to tag content as HDR or
SDR. What you can do instead is to expose a colorop with a multiplier of
1.0 or 125.0 to match your hardware behaviour, then tell your hardware
that the input is SDR or HDR to get the expected multiplier. You will
never know what the content actually is, anyway.

Of course, if we want to have an arbitrary multiplier colorop that is
somewhat standard, as in, exposed by many drivers to ease userspace
development, you can certainly use any combination of your hardware
features you need to realize the UAPI prescribed mathematical operation.

Since we are talking about floating-point in hardware, a multiplier
does not significantly affect precision.

In order to mathematically define all colorops, I believe it is
necessary to define all colorops in terms of floating-point values (as
in math), even if they operate on fixed-point or integer. By this I
mean that if the input is 8 bpc unsigned integer pixel format for
instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
to 1.0, and the color pipeline starts with [0.0, 1.0], not [0, 255]
domain. We have to agree on this mapping for all channels on all pixel
formats. However, there is a "but" further below.
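
A minimal sketch of that raw value mapping for unsigned-normalized
channels, assuming nothing beyond what is described above. The helper
name unorm_to_nominal() is made up for illustration and is not part of
any proposed UAPI:

```c
#include <stdint.h>

/*
 * Map a raw unsigned-normalized channel value to the nominal
 * [0.0, 1.0] domain: 0 -> 0.0 and (2^bits - 1) -> 1.0.
 * Hypothetical helper, for illustration only.
 */
static double unorm_to_nominal(uint32_t raw, unsigned int bits)
{
	/* Largest code value for this bit depth; valid for 1 <= bits <= 32. */
	uint64_t max = (UINT64_C(1) << bits) - 1;

	return (double)raw / (double)max;
}
```

With this, unorm_to_nominal(255, 8) and unorm_to_nominal(1023, 10) both
give 1.0, so the pipeline sees the same domain regardless of bit depth.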

I also propose that quantization range is NOT considered in the raw
value mapping, so that we can handle quantization range in colorops
explicitly, allowing us to e.g. handle sub-blacks and super-whites when
necessary. (These are currently impossible to represent in the legacy
color properties, because everything is converted to full range and
clipped before any color operations.)
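
To illustrate what handling quantization range explicitly could look
like, here is a sketch of a range expansion step that works on nominal
values and does not clip. It assumes 8 bpc limited range with black at
code 16 and white at code 235; the function name is hypothetical:

```c
/*
 * Expand 8 bpc limited range (black = 16/255, white = 235/255 in
 * nominal values) to full range without clipping, so sub-blacks come
 * out negative and super-whites come out above 1.0.
 * Hypothetical helper, for illustration only.
 */
static double expand_limited_range(double nominal)
{
	const double black = 16.0 / 255.0;
	const double white = 235.0 / 255.0;

	return (nominal - black) / (white - black);
}
```

A raw code of 4 (nominal 4.0/255) comes out slightly negative and a raw
code of 250 comes out slightly above 1.0, which is exactly the
information that is lost when clipping happens before any color
operation.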

> From the given enumerated curves, it's not clear how they would map to the
> above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
> value of 125.0? That may work, but it tends towards the "descriptive" notion of
> assuming the source content, which may not be accurate in all cases. This is
> also an issue for the custom 1D LUT, as the blob will need to be converted to
> FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
> value be, given that we no longer have any hint as to the source content?

In my opinion, all finite non-negative transfer functions should
operate with [0.0, 1.0] domain and [0.0, 1.0] range, and that includes
all sRGB, power 2.2, and PQ curves.

If we look at BT.2100, there is no such encoding even mentioned where
125.0 would correspond to 10k cd/m². That 125.0 convention already has
a built-in assumption about what the color spaces are and what the
conversion is aiming to do. IOW, I would say that choice is opinionated
from the start. The multiplier in BT.2100 is always 10000.

Given that elements like various kinds of look-up tables inherently
assume that the domain is [0.0, 1.0] (because it is a table that
has a beginning and an end, and the usual convention is that the
beginning is zero and the end is one), I think it is best to stick to
the [0.0, 1.0] range where possible. If we go out of that range, then
we have to define how a LUT would apply in a sensible way.
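
A sketch of a 1D LUT with that convention, using linear interpolation
between entries. Clamping out-of-range inputs to the end entries is
only one possible choice and is an assumption of this example, as is
the function name:

```c
#include <stddef.h>

/*
 * Apply a 1D LUT whose first entry corresponds to input 0.0 and whose
 * last entry corresponds to input 1.0, interpolating linearly.
 * Inputs outside [0.0, 1.0] (and NaN) are clamped to the end entries
 * here; that policy is an assumption of this sketch.
 * Requires size >= 2.
 */
static double lut1d_apply(const double *lut, size_t size, double x)
{
	double pos;
	size_t i;

	if (!(x > 0.0))
		return lut[0];
	if (x >= 1.0)
		return lut[size - 1];

	pos = x * (double)(size - 1);
	i = (size_t)pos;	/* i <= size - 2 because x < 1.0 */

	return lut[i] + (pos - (double)i) * (lut[i + 1] - lut[i]);
}
```

Extrapolating from the last two entries, or mirroring for negative
inputs, would be equally valid definitions, which is why the behaviour
has to be spelled out if values can leave the unit range.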

Many TFs are intended to be defined only on [0.0, 1.0] -> [0.0, 1.0].
Some curves, like power 2.2, have a mathematical form that naturally
extends outside of that range. Power 2.2 generalizes to >1.0 input
values as is, but not for negative input values. If needed for negative
input values, it is common to use y = -TF(-x) for x < 0 mirroring.

scRGB is the prime example that intentionally uses negative channel
values. We can also have negative channel values with limited
quantization range, sometimes even intentionally (xvYCC chroma, or
PLUGE test sub-blacks). Out-of-unit-range values can also appear as a
side-effect of signal processing, and they should not get clipped
prematurely. This is a challenge for colorops that fundamentally cannot
handle out-of-unit-range values.

There are various workarounds. scRGB colorimetry can be converted into
BT.2020 primaries for example, to avoid saturation induced negative
values. Limited quantization range signal could be processed as-is,
meaning that the limited range is mapped to [16.0/255, 235.0/255]
instead of [0.0, 1.0] or so. But then, we have a complication with
transfer functions.

> I think a multiplier color op solves all of these issues. Named curves and
> custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
> be adjusted by the multiplier.

Pretty much.

> For 80 nit SDR content, set it to 1, for 400
> nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc. 

That I think is another story.

> > > >>> +
> > > >>> +    /* custom 4k entry 1D LUT */
> > > >>> +    Color operation 52
> > > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> > > >>> +    ├─ "BYPASS": bool {true, false}
> > > >>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> > > >>> +    ├─ "LUT_1D": blob
> > > >>> +    └─ "NEXT": immutable color operation ID = 0  
> > > > 
> > > > ...
> > > >     
> > > >>> +Driver Forward/Backward Compatibility
> > > >>> +=====================================
> > > >>> +
> > > >>> +As this is uAPI drivers can't regress color pipelines that have been
> > > >>> +introduced for a given HW generation. New HW generations are free to
> > > >>> +abandon color pipelines advertised for previous generations.
> > > >>> +Nevertheless, it can be beneficial to carry support for existing color
> > > >>> +pipelines forward as those will likely already have support in DRM
> > > >>> +clients.
> > > >>> +
> > > >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> > > >>> +disabled or are purely informational. DRM clients implementing support
> > > >>> +for the pipeline can always skip unknown properties as long as they can
> > > >>> +be confident that doing so will not cause unexpected results.
> > > >>> +
> > > >>> +If a new colorop doesn't fall into one of the above categories
> > > >>> +(bypassable or informational) the modified pipeline would be unusable
> > > >>> +for user space. In this case a new pipeline should be defined.      
> > > >>
> > > >> How can user space detect an informational element? Should we just add a
> > > >> BYPASS property to informational elements, make it read only and set to
> > > >> true maybe? Or something more descriptive?    
> > > > 
> > > > Read-only BYPASS set to true would be fine by me, I guess.
> > > >     
> > > 
> > > Don't you mean set to false? An informational element will always do
> > > something, so it can't be bypassed.  
> > 
> > Yeah, this is why we need a definition. I understand "informational" to
> > not change pixel values in any way. Previously I had some weird idea
> > that scaling doesn't alter color, but of course it may.  
> 
> On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
> implicit fixed-point to FP16 conversions, and vice versa.

Above, I claimed that the UAPI should be defined in nominal
floating-point values, but I wonder, would that work? Would we need to
have explicit colorops for converting from raw pixel data values into
nominal floating-point in the UAPI?

> For example, the "degamma" LUT towards the beginning of the pipeline implicitly
> converts from fixed point to FP16, and some of the following operations expect
> to operate in FP16. As such, if you have a fixed point input and don't bypass
> those following operations, you *must not* bypass the LUT, even if you are
> otherwise just programming it with the identity. Conversely, if you have a
> floating point input, you *must* bypass the LUT.

Interesting. Since the color pipeline is not(?) meant to replace pixel
format definitions which already make the difference between fixed and
floating point, wouldn't this little detail need to be taken care of by
the driver under the hood?

What if I want to use degamma colorop with a floating-point
framebuffer? Simply not possible on this hardware?

> Could informational elements and allowing the exclusion of the BYPASS property
> be used to convey this information to the client?  For example, we could expose
> one pipeline with the LUT exposed with read-only BYPASS set to false, and
> sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
> fixed point input. Then, expose another pipeline with the LUT missing, and an
> informational "FP16" element in its place to accommodate floating point input.
> 
> That's just an example; we also have other operations in the pipeline that do
> similar implicit conversions. In these cases we don't want the operations to be
> bypassed individually, so instead we would expose them as mandatory in some
> pipelines and missing in others, with informational elements to help inform the
> client of which to choose. Is that acceptable under the current proposal?
> 
> Note that in this case, the information just has to do with what format the
> pixels should be in, it doesn't correspond to any specific operation. So, I'm
> not sure that BYPASS has any meaning for informational elements in this context.

Very good questions. Do we have to expose those conversions in the UAPI
to make things work for this hardware? Meaning that we cannot assume all
colorops work in nominal floating-point from userspace perspective
(perhaps with varying degrees of precision).

> > > > I think we also need a definition of "informational".
> > > > 
> > > > Counter-example 1: a colorop that represents a non-configurable    
> > > 
> > > Not sure what's "counter" for these examples?
> > >   
> > > > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > > > format. It cannot be set to bypass, it cannot be configured, and it
> > > > will alter color values.  
> 
> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> based on the principle that read-only blobs could be used to express some static
> pipeline elements without the need to define a new type, but got mixed opinions.
> I think this demonstrates the principle further, as clients could detect this
> programmatically instead of having to special-case the informational element.

If the blob depends on the pixel format (i.e. the driver automatically
chooses a different blob per pixel format), then I think we would need
to expose all the blobs and how they correspond to pixel formats.
Otherwise ok, I guess.

However, do we want or need to make a color pipeline or colorop
conditional on pixel formats? For example, if you use a YUV 4:2:0 type
of pixel format, then you must use this pipeline and not any other. Or
floating-point type of pixel format. I did not anticipate this before,
I assumed that all color pipelines and colorops are independent of the
framebuffer pixel format. A specific colorop might have a property that
needs to agree with the framebuffer pixel format, but I didn't expect
further limitations.

"Without the need to define a new type" is something I think we need to
consider case by case. I have a hard time giving a general opinion.

> > > > 
> > > > Counter-example 2: image size scaling colorop. It might not be
> > > > configurable, it is controlled by the plane CRTC_* and SRC_*
> > > > properties. You still need to understand what it does, so you can
> > > > arrange the scaling to work correctly. (Do not want to scale an image
> > > > with PQ-encoded values as Josh demonstrated in XDC.)
> > > >     
> > > 
> > > IMO the position of the scaling operation is the thing that's important
> > > here as the color pipeline won't define scaling properties.  
> 
> I agree that blending should ideally be done in linear space, and I remember
> that from Josh's presentation at XDC, but I don't recall the same being said for
> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
> pipeline that is meant to be in PQ space (more on this below), and that was
> found to achieve better results at HDR/SDR boundaries. Of course, this only
> bolsters the argument that it would be helpful to have an informational "scaler"
> element to understand at which stage scaling takes place.

Both blending and scaling are fundamentally the same operation: you
have two or more source colors (pixels), and you want to compute a
weighted average of them following what happens in nature, that is,
physics, as that is what humans are used to.

Both blending and scaling will suffer from the same problems if the
operation is performed on values that are not light-linear: the result
of the weighted average does not correspond to physics.

The problem may be hard to observe with natural imagery, but Josh's
example shows it very clearly. Maybe that effect is sometimes useful
for some imagery in some use cases, but it is still an accidental
side-effect. You might get even better results if you don't rely on
accidental side-effects but design a separate operation for the exact
goal you have.

Mind, by scaling we mean changing image size. Not scaling color values.

> > > > Counter-example 3: image sampling colorop. Averages FB originated color
> > > > values to produce a color sample. Again do not want to do this with
> > > > PQ-encoded values.
> > > >     
> > > 
> > > Wouldn't this only happen during a scaling op?  
> > 
> > There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
> > coordinates can be fractional, which makes nearest vs. bilinear
> > sampling have a difference even if there is no scaling.
> > 
> > There is also the question of chroma siting with sub-sampled YUV. I
> > don't know how that actually works, or how it theoretically should work.  
> 
> We have some operations in our pipeline that are intended to be static, i.e. a
> static matrix that converts from RGB to LMS, and later another that converts
> from LMS to ICtCp. There are even LUTs that are intended to be static,
> converting from linear to PQ and vice versa. All of this is because the
> pre-blending scaler and tone mapping operator are intended to operate in ICtCp
> PQ space. Although the stated LUTs and matrices are intended to be static, they
> are actually programmable. In offline discussions, it was indicated that it
> would be helpful to actually expose the programmability, as opposed to exposing
> them as non-bypassable blocks, as some compositors may have novel uses for them.

Correct. Doing tone-mapping in ICtCp etc. is already policy that
userspace might or might not agree with.

Exposing static colorops will help usages that adhere to current
prevalent standards around very specific use cases. There may be
millions of devices needing exactly that processing in their usage, but
it is also quite limiting in what one can do with the hardware.

> Despite being programmable, the LUTs are updated in a manner that is less
> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> if there was some way to tag operations according to their performance,
> for example so that clients can prefer a high performance one when they
> intend to do an animated transition? I recall from the XDC HDR workshop
> that this is also an issue with AMD's 3DLUT, where updates can be too
> slow to animate.

I can certainly see such information being useful, but then we need to
somehow quantify the performance.

What I was left puzzled about after the XDC workshop is whether it is
possible to pre-load configurations in the background (slow), and then
quickly switch between them. Hardware-wise, I mean.


Thanks,
pq


* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26  8:57             ` Pekka Paalanen
@ 2023-10-26 17:30               ` Sebastian Wick
  2023-10-26 19:25                 ` Alex Goins
  2023-11-07 16:52                 ` Harry Wentland
  2023-11-07 16:52               ` Harry Wentland
  1 sibling, 2 replies; 49+ messages in thread
From: Sebastian Wick @ 2023-10-26 17:30 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga, Aleix Pol,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Hector Martin,
	Xaver Hugl, Joshua Ashton

On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> Alex Goins <agoins@nvidia.com> wrote:
> 
> > Thank you Harry and all other contributors for your work on this. Responses
> > inline -
> > 
> > On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> > 
> > > On Fri, 20 Oct 2023 11:23:28 -0400
> > > Harry Wentland <harry.wentland@amd.com> wrote:
> > >   
> > > > On 2023-10-20 10:57, Pekka Paalanen wrote:  
> > > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > > Sebastian Wick <sebastian.wick@redhat.com> wrote:
> > > > >     
> > > > >> Thanks for continuing to work on this!
> > > > >>
> > > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:    
> > > > >>> v2:
> > > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > > >>>  - Updated wording (Pekka)
> > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > > >>>    section (Pekka)
> > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)  
> > > > >
> > > > > ...
> > > > >  
> > > > >>> +An example of a drm_colorop object might look like one of these::
> > > > >>> +
> > > > >>> +    /* 1D enumerated curve */
> > > > >>> +    Color operation 42
> > > > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > >>> +    ├─ "BYPASS": bool {true, false}
> > > > >>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > > >>> +    └─ "NEXT": immutable color operation ID = 43  
> > 
> > I know these are just examples, but I would also like to suggest the possibility
> > of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> > compared to setting an identity in some cases depending on the hardware. See
> > below for more on this, RE: implicit format conversions.
> > 
> > Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> > offline discussions that it would nonetheless be helpful to expose enumerated
> > curves in order to hide the vendor-specific complexities of programming
> > segmented LUTs from clients. In that case, we would simply refer to the
> > enumerated curve when calculating/choosing segmented LUT entries.
> 
> That's a good idea.
> 
> > Another thing that came up in offline discussions is that we could use multiple
> > color operations to program a single operation in hardware. As I understand it,
> > AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> > "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> > we could combine them into a singular LUT in software, such that you can combine
> > e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> > precision from the custom LUT where it overlaps with the linear section of the
> > enumerated curve, but that is unavoidable and shouldn't be an issue in most
> > use-cases.
> 
> Indeed.
> 
> > Actually, the current examples in the proposal don't include a multiplier color
> > op, which might be useful. For AMD as above, but also for NVIDIA as the
> > following issue arises:
> > 
> > As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> > point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
> > to in floating point varies depending on the source content. If it's SDR
> > content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> > potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> > content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> > is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> > correct?
> 
> It would be against the UAPI design principles to tag content as HDR or
> SDR. What you can do instead is to expose a colorop with a multiplier of
> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> that the input is SDR or HDR to get the expected multiplier. You will
> never know what the content actually is, anyway.
> 
> Of course, if we want to have an arbitrary multiplier colorop that is
> somewhat standard, as in, exposed by many drivers to ease userspace
> development, you can certainly use any combination of your hardware
> features you need to realize the UAPI prescribed mathematical operation.
> 
> Since we are talking about floating-point in hardware, a multiplier
> does not significantly affect precision.
> 
> In order to mathematically define all colorops, I believe it is
> necessary to define all colorops in terms of floating-point values (as
> in math), even if they operate on fixed-point or integer. By this I
> mean that if the input is 8 bpc unsigned integer pixel format for
> instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
> to 1.0, and the color pipeline starts with [0.0, 1.0], not [0, 255]
> domain. We have to agree on this mapping for all channels on all pixel
> formats. However, there is a "but" further below.
> 
> I also propose that quantization range is NOT considered in the raw
> value mapping, so that we can handle quantization range in colorops
> explicitly, allowing us to e.g. handle sub-blacks and super-whites when
> necessary. (These are currently impossible to represent in the legacy
> color properties, because everything is converted to full range and
> clipped before any color operations.)
> 
> > From the given enumerated curves, it's not clear how they would map to the
> > above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
> > value of 125.0? That may work, but it tends towards the "descriptive" notion of
> > assuming the source content, which may not be accurate in all cases. This is
> > also an issue for the custom 1D LUT, as the blob will need to be converted to
> > FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
> > value be, given that we no longer have any hint as to the source content?
> 
> In my opinion, all finite non-negative transfer functions should
> operate with [0.0, 1.0] domain and [0.0, 1.0] range, and that includes
> all sRGB, power 2.2, and PQ curves.
> 
> If we look at BT.2100, there is no such encoding even mentioned where
> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
> a built-in assumption what the color spaces are and what the conversion
> is aiming to do. IOW, I would say that choice is opinionated from the
> start. The multiplier in BT.2100 is always 10000.
> 
> Given that elements like various kinds of look-up tables inherently
> assume that the domain is [0.0, 1.0] (because it is a table that
> has a beginning and an end, and the usual convention is that the
> beginning is zero and the end is one), I think it is best to stick to
> the [0.0, 1.0] range where possible. If we go out of that range, then
> we have to define how a LUT would apply in a sensible way.
> 
> Many TFs are intended to be defined only on [0.0, 1.0] -> [0.0, 1.0].
> Some curves, like power 2.2, have a mathematical form that naturally
> extends outside of that range. Power 2.2 generalizes to >1.0 input
> values as is, but not for negative input values. If needed for negative
> input values, it is common to use y = -TF(-x) for x < 0 mirroring.
> 
> scRGB is the prime example that intentionally uses negative channel
> values. We can also have negative channel values with limited
> quantization range, sometimes even intentionally (xvYCC chroma, or
> PLUGE test sub-blacks). Out-of-unit-range values can also appear as a
> side-effect of signal processing, and they should not get clipped
> prematurely. This is a challenge for colorops that fundamentally cannot
> handle out-of-unit-range values.
> 
> There are various workarounds. scRGB colorimetry can be converted into
> BT.2020 primaries for example, to avoid saturation induced negative
> values. Limited quantization range signal could be processed as-is,
> meaning that the limited range is mapped to [16.0/255, 235.0/255]
> instead of [0.0, 1.0] or so. But then, we have a complication with
> transfer functions.
> 
> > I think a multiplier color op solves all of these issues. Named curves and
> > custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
> > be adjusted by the multiplier.
> 
> Pretty much.
> 
> > For 80 nit SDR content, set it to 1, for 400
> > nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc. 
> 
> That I think is another story.
> 
> > > > >>> +
> > > > >>> +    /* custom 4k entry 1D LUT */
> > > > >>> +    Color operation 52
> > > > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> > > > >>> +    ├─ "BYPASS": bool {true, false}
> > > > >>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> > > > >>> +    ├─ "LUT_1D": blob
> > > > >>> +    └─ "NEXT": immutable color operation ID = 0  
> > > > > 
> > > > > ...
> > > > >     
> > > > >>> +Driver Forward/Backward Compatibility
> > > > >>> +=====================================
> > > > >>> +
> > > > >>> +As this is uAPI drivers can't regress color pipelines that have been
> > > > >>> +introduced for a given HW generation. New HW generations are free to
> > > > >>> +abandon color pipelines advertised for previous generations.
> > > > >>> +Nevertheless, it can be beneficial to carry support for existing color
> > > > >>> +pipelines forward as those will likely already have support in DRM
> > > > >>> +clients.
> > > > >>> +
> > > > >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> > > > >>> +disabled or are purely informational. DRM clients implementing support
> > > > >>> +for the pipeline can always skip unknown properties as long as they can
> > > > >>> +be confident that doing so will not cause unexpected results.
> > > > >>> +
> > > > >>> +If a new colorop doesn't fall into one of the above categories
> > > > >>> +(bypassable or informational) the modified pipeline would be unusable
> > > > >>> +for user space. In this case a new pipeline should be defined.      
> > > > >>
> > > > >> How can user space detect an informational element? Should we just add a
> > > > >> BYPASS property to informational elements, make it read only and set to
> > > > >> true maybe? Or something more descriptive?    
> > > > > 
> > > > > Read-only BYPASS set to true would be fine by me, I guess.
> > > > >     
> > > > 
> > > > Don't you mean set to false? An informational element will always do
> > > > something, so it can't be bypassed.  
> > > 
> > > Yeah, this is why we need a definition. I understand "informational" to
> > > not change pixel values in any way. Previously I had some weird idea
> > > that scaling doesn't alter color, but of course it may.  
> > 
> > On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
> > implicit fixed-point to FP16 conversions, and vice versa.
> 
> Above, I claimed that the UAPI should be defined in nominal
> floating-point values, but I wonder, would that work? Would we need to
> have explicit colorops for converting from raw pixel data values into
> nominal floating-point in the UAPI?
> 
> > For example, the "degamma" LUT towards the beginning of the pipeline implicitly
> > converts from fixed point to FP16, and some of the following operations expect
> > to operate in FP16. As such, if you have a fixed point input and don't bypass
> > those following operations, you *must not* bypass the LUT, even if you are
> > otherwise just programming it with the identity. Conversely, if you have a
> > floating point input, you *must* bypass the LUT.
> 
> Interesting. Since the color pipeline is not(?) meant to replace pixel
> format definitions which already make the difference between fixed and
> floating point, wouldn't this little detail need to be taken care of by
> the driver under the hood?
> 
> What if I want to use degamma colorop with a floating-point
> framebuffer? Simply not possible on this hardware?
> 
> > Could informational elements and allowing the exclusion of the BYPASS property
> > be used to convey this information to the client?  For example, we could expose
> > one pipeline with the LUT exposed with read-only BYPASS set to false, and
> > sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
> > fixed point input. Then, expose another pipeline with the LUT missing, and an
> > informational "FP16" element in its place to accommodate floating point input.
> > 
> > That's just an example; we also have other operations in the pipeline that do
> > similar implicit conversions. In these cases we don't want the operations to be
> > bypassed individually, so instead we would expose them as mandatory in some
> > pipelines and missing in others, with informational elements to help inform the
> > client of which to choose. Is that acceptable under the current proposal?
> > 
> > Note that in this case, the information just has to do with what format the
> > pixels should be in, it doesn't correspond to any specific operation. So, I'm
> > not sure that BYPASS has any meaning for informational elements in this context.
> 
> Very good questions. Do we have to expose those conversions in the UAPI
> to make things work for this hardware? Meaning that we cannot assume all
> colorops work in nominal floating-point from userspace perspective
> (perhaps with varying degrees of precision).

I had this in my original proposal I think (maybe I only thought about
it, not sure).

We really should figure this one out. Can we get away with normalized
[0,1] fp as a user space abstraction or not?

> 
> > > > > I think we also need a definition of "informational".
> > > > > 
> > > > > Counter-example 1: a colorop that represents a non-configurable    
> > > > 
> > > > Not sure what's "counter" for these examples?
> > > >   
> > > > > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > > > > format. It cannot be set to bypass, it cannot be configured, and it
> > > > > will alter color values.  
> > 
> > Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> > no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> > based on the principle that read-only blobs could be used to express some static
> > pipeline elements without the need to define a new type, but got mixed opinions.
> > I think this demonstrates the principle further, as clients could detect this
> > programmatically instead of having to special-case the informational element.
> 

I'm all for exposing fixed color ops, but I suspect that most of those
follow some standard. In those cases, instead of exposing the matrix
values, one should prefer to expose a named matrix (e.g. BT.601, BT.709,
BT.2020).

As a general rule: always expose the highest level description. Going
from a name to exact values is trivial; going from values to a name is
much harder.
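
To show the "name to values" direction, here is a sketch that derives a
YCbCr to RGB matrix from a named standard using its luma coefficients.
It assumes full-range input with Cb and Cr centered on zero; the enum
and function names are hypothetical and not proposed UAPI:

```c
/* Hypothetical names for the standards mentioned above. */
enum ycbcr_standard {
	YCBCR_BT601,
	YCBCR_BT709,
	YCBCR_BT2020,
};

/*
 * Fill m with the 3x3 matrix taking (Y, Cb, Cr) to (R, G, B), for
 * full-range Y in [0, 1] and Cb, Cr in [-0.5, 0.5].
 * Returns 0 on success, -1 for an unknown standard.
 */
static int ycbcr_to_rgb_matrix(enum ycbcr_standard std, double m[3][3])
{
	double kr, kb, kg;

	switch (std) {
	case YCBCR_BT601:
		kr = 0.299;  kb = 0.114;
		break;
	case YCBCR_BT709:
		kr = 0.2126; kb = 0.0722;
		break;
	case YCBCR_BT2020:
		kr = 0.2627; kb = 0.0593;
		break;
	default:
		return -1;
	}
	kg = 1.0 - kr - kb;

	m[0][0] = 1.0; m[0][1] = 0.0;
	m[0][2] = 2.0 * (1.0 - kr);
	m[1][0] = 1.0; m[1][1] = -2.0 * kb * (1.0 - kb) / kg;
	m[1][2] = -2.0 * kr * (1.0 - kr) / kg;
	m[2][0] = 1.0; m[2][1] = 2.0 * (1.0 - kb);
	m[2][2] = 0.0;

	return 0;
}
```

The reverse direction would mean matching nine floating-point values
against every known standard with some tolerance, and guessing the
quantization range on top of that.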

> If the blob depends on the pixel format (i.e. the driver automatically
> chooses a different blob per pixel format), then I think we would need
> to expose all the blobs and how they correspond to pixel formats.
> Otherwise ok, I guess.
> 
> However, do we want or need to make a color pipeline or colorop
> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
> of pixel format, then you must use this pipeline and not any other. Or
> floating-point type of pixel format. I did not anticipate this before,
> I assumed that all color pipelines and colorops are independent of the
> framebuffer pixel format. A specific colorop might have a property that
> needs to agree with the framebuffer pixel format, but I didn't expect
> further limitations.

We could simply fail commits when the pipeline and pixel format don't
work together. We'll probably need some kind of ingress no-op node
anyway and maybe could list pixel formats there if required to make it
easier to find a working configuration.

> "Without the need to define a new type" is something I think we need to
> consider case by case. I have a hard time giving a general opinion.
> 
> > > > > 
> > > > > Counter-example 2: image size scaling colorop. It might not be
> > > > > configurable, it is controlled by the plane CRTC_* and SRC_*
> > > > > properties. You still need to understand what it does, so you can
> > > > > arrange the scaling to work correctly. (Do not want to scale an image
> > > > > with PQ-encoded values as Josh demonstrated in XDC.)
> > > > >     
> > > > 
> > > > IMO the position of the scaling operation is the thing that's important
> > > > here as the color pipeline won't define scaling properties.  
> > 
> > I agree that blending should ideally be done in linear space, and I remember
> > that from Josh's presentation at XDC, but I don't recall the same being said for
> > scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
> > pipeline that is meant to be in PQ space (more on this below), and that was
> > found to achieve better results at HDR/SDR boundaries. Of course, this only
> > bolsters the argument that it would be helpful to have an informational "scaler"
> > element to understand at which stage scaling takes place.
> 
> Both blending and scaling are fundamentally the same operation: you
> have two or more source colors (pixels), and you want to compute a
> weighted average of them following what happens in nature, that is,
> physics, as that is what humans are used to.
> 
> Both blending and scaling will suffer from the same problems if the
> operation is performed on values that are not light-linear. The result
> of the weighted average does not correspond to physics.
> 
> The problem may be hard to observe with natural imagery, but Josh's
> example shows it very clearly. Maybe that effect is sometimes useful
> for some imagery in some use cases, but it is still an accidental
> side-effect. You might get even better results if you don't rely on
> accidental side-effects but design a separate operation for the exact
> goal you have.
> 
> Mind, by scaling we mean changing image size. Not scaling color values.
> 
> > > > > Counter-example 3: image sampling colorop. Averages FB originated color
> > > > > values to produce a color sample. Again do not want to do this with
> > > > > PQ-encoded values.
> > > > >     
> > > > 
> > > > Wouldn't this only happen during a scaling op?  
> > > 
> > > There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
> > > coordinates can be fractional, which makes nearest vs. bilinear
> > > sampling have a difference even if there is no scaling.
> > > 
> > > There is also the question of chroma siting with sub-sampled YUV. I
> > > don't know how that actually works, or how it theoretically should work.  
> > 
> > We have some operations in our pipeline that are intended to be static, i.e. a
> > static matrix that converts from RGB to LMS, and later another that converts
> > from LMS to ICtCp. There are even LUTs that are intended to be static,
> > converting from linear to PQ and vice versa. All of this is because the
> > pre-blending scaler and tone mapping operator are intended to operate in ICtCp
> > PQ space. Although the stated LUTs and matrices are intended to be static, they
> > are actually programmable. In offline discussions, it was indicated that it
> > would be helpful to actually expose the programmability, as opposed to exposing
> > them as non-bypassable blocks, as some compositors may have novel uses for them.
> 
> Correct. Doing tone-mapping in ICtCp etc. is already a policy that
> userspace might or might not agree with.
> 
> Exposing static colorops will help usages that adhere to current
> prevalent standards around very specific use cases. There may be
> millions of devices needing exactly that processing in their usage, but
> it is also quite limiting in what one can do with the hardware.
> 
> > Despite being programmable, the LUTs are updated in a manner that is less
> > efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > if there was some way to tag operations according to their performance,
> > for example so that clients can prefer a high performance one when they
> > intend to do an animated transition? I recall from the XDC HDR workshop
> > that this is also an issue with AMD's 3DLUT, where updates can be too
> > slow to animate.
> 
> I can certainly see such information being useful, but then we need to
> somehow quantize the performance.
> 
> What I was left puzzled about after the XDC workshop is whether it is
> possible to pre-load configurations in the background (slow), and then
> quickly switch between them. Hardware-wise, I mean.

We could define that pipelines with a lower ID are to be preferred over
higher IDs.

The issue is that if programming a pipeline becomes too slow to be
useful it probably should just not be made available to user space.

The prepare-commit idea for blob properties would help to make the
pipelines usable again, but until then it's probably a good idea to just
not expose those pipelines.

> 
> 
> Thanks,
> pq




* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26 17:30               ` Sebastian Wick
@ 2023-10-26 19:25                 ` Alex Goins
  2023-10-27  8:59                   ` Michel Dänzer
                                     ` (2 more replies)
  2023-11-07 16:52                 ` Harry Wentland
  1 sibling, 3 replies; 49+ messages in thread
From: Alex Goins @ 2023-10-26 19:25 UTC (permalink / raw)
  To: Sebastian Wick
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga, Aleix Pol,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Uma Shankar,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Pekka Paalanen,
	Hector Martin, Xaver Hugl, Joshua Ashton


On Thu, 26 Oct 2023, Sebastian Wick wrote:

> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> > On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > Alex Goins <agoins@nvidia.com> wrote:
> >
> > > Thank you Harry and all other contributors for your work on this. Responses
> > > inline -
> > >
> > > On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> > >
> > > > On Fri, 20 Oct 2023 11:23:28 -0400
> > > > Harry Wentland <harry.wentland@amd.com> wrote:
> > > >
> > > > > On 2023-10-20 10:57, Pekka Paalanen wrote:
> > > > > > On Fri, 20 Oct 2023 16:22:56 +0200
> > > > > > Sebastian Wick <sebastian.wick@redhat.com> wrote:
> > > > > >
> > > > > >> Thanks for continuing to work on this!
> > > > > >>
> > > > > >> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
> > > > > >>> v2:
> > > > > >>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> > > > > >>>  - Updated wording (Pekka)
> > > > > >>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> > > > > >>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> > > > > >>>    section (Pekka)
> > > > > >>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
> > > > > >>>  - Add "Driver Implementer's Guide" section (Pekka)
> > > > > >>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>> +An example of a drm_colorop object might look like one of these::
> > > > > >>> +
> > > > > >>> +    /* 1D enumerated curve */
> > > > > >>> +    Color operation 42
> > > > > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > > >>> +    ├─ "BYPASS": bool {true, false}
> > > > > >>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > > > >>> +    └─ "NEXT": immutable color operation ID = 43
> > >
> > > I know these are just examples, but I would also like to suggest the possibility
> > > of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> > > compared to setting an identity in some cases depending on the hardware. See
> > > below for more on this, RE: implicit format conversions.
> > >
> > > Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> > > offline discussions that it would nonetheless be helpful to expose enumerated
> > > curves in order to hide the vendor-specific complexities of programming
> > > segmented LUTs from clients. In that case, we would simply refer to the
> > > enumerated curve when calculating/choosing segmented LUT entries.
> >
> > That's a good idea.
> >
> > > Another thing that came up in offline discussions is that we could use multiple
> > > color operations to program a single operation in hardware. As I understand it,
> > > AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> > > "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> > > we could combine them into a singular LUT in software, such that you can combine
> > > e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> > > precision from the custom LUT where it overlaps with the linear section of the
> > > enumerated curve, but that is unavoidable and shouldn't be an issue in most
> > > use-cases.
> >
> > Indeed.
> >
> > > Actually, the current examples in the proposal don't include a multiplier color
> > > op, which might be useful. For AMD as above, but also for NVIDIA as the
> > > following issue arises:
> > >
> > > As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> > > point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
> > > to in floating point varies depending on the source content. If it's SDR
> > > content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> > > potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> > > content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> > > is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> > > correct?
> >
> > It would be against the UAPI design principles to tag content as HDR or
> > SDR. What you can do instead is to expose a colorop with a multiplier of
> > 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> > that the input is SDR or HDR to get the expected multiplier. You will
> > never know what the content actually is, anyway.

Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
UAPI. I was only describing the end result in the pipe, which would ultimately
be determined by the multiplier color op.

> >
> > Of course, if we want to have an arbitrary multiplier colorop that is
> > somewhat standard, as in, exposed by many drivers to ease userspace
> > development, you can certainly use any combination of your hardware
> > features you need to realize the UAPI prescribed mathematical operation.
> >
> > Since we are talking about floating-point in hardware, a multiplier
> > does not significantly affect precision.
> >
> > In order to mathematically define all colorops, I believe it is
> > necessary to define all colorops in terms of floating-point values (as
> > in math), even if they operate on fixed-point or integer. By this I
> > mean that if the input is 8 bpc unsigned integer pixel format for
> > instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
> > to 1.0, and the color pipeline starts with [0.0, 1.0], not [0, 255]
> > domain. We have to agree on this mapping for all channels on all pixel
> > formats. However, there is a "but" further below.

I think this makes sense insofar as how we interact with the UAPI, and that's
basically how fixed point works for us anyway. However, relating to your "but",
it doesn't avoid the issue with hardware expectations about pixel formats since
it doesn't change the underlying pixel format.
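
For reference, the nominal mapping being discussed is tiny. A Python sketch
(the function name is mine, purely illustrative):

```python
def unorm_to_nominal(raw, bits):
    """Map a raw unsigned integer channel value onto the nominal [0.0, 1.0] domain."""
    max_code = (1 << bits) - 1  # 255 for 8 bpc, 1023 for 10 bpc
    return raw / float(max_code)
```

This only says how a value is interpreted at the UAPI level. It says nothing
about whether a given hardware stage consumes fixed point or FP16, which is
the part that still needs solving.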

> >
> > I also propose that quantization range is NOT considered in the raw
> > value mapping, so that we can handle quantization range in colorops
> > explicitly, allowing us to e.g. handle sub-blacks and super-whites when
> > necessary. (These are currently impossible to represent in the legacy
> > color properties, because everything is converted to full range and
> > clipped before any color operations.)
> >
> > > From the given enumerated curves, it's not clear how they would map to the
> > > above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
> > > value of 125.0? That may work, but it tends towards the "descriptive" notion of
> > > assuming the source content, which may not be accurate in all cases. This is
> > > also an issue for the custom 1D LUT, as the blob will need to be converted to
> > > FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
> > > value be, given that we no longer have any hint as to the source content?
> >
> > In my opinion, all finite non-negative transfer functions should
> > operate with [0.0, 1.0] domain and [0.0, 1.0] range, and that includes
> > all sRGB, power 2.2, and PQ curves.

Right, I think so too, otherwise you are making assumptions about the source
content. For example, it's possible to do HDR with a basic gamma curve, so you
can't really assume that gamma should always go up to 1.0, but PQ up to 125.0.
If you did that, it would necessitate adding an "HDR Gamma" curve, which is
converging back on a "descriptive" UAPI. By leaving the final range up to the
subsequent multiplier, the client gets to choose independently from the TF,
which seems more in line with the goals of this proposal.
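
As a sketch of what that separation looks like (Python, illustrative only; the
80 nits per 1.0 figure is the convention mentioned above, and `to_nits` is a
made-up helper, not anything in the proposal):

```python
def srgb_eotf(v):
    """sRGB EOTF in its piecewise form, [0.0, 1.0] -> [0.0, 1.0]."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def pq_eotf(v):
    """PQ EOTF (SMPTE ST 2084), normalised so that 1.0 maps to 1.0."""
    m1 = 2610.0 / 16384.0
    m2 = 2523.0 / 4096.0 * 128.0
    c1 = 3424.0 / 4096.0
    c2 = 2413.0 / 4096.0 * 32.0
    c3 = 2392.0 / 4096.0 * 32.0
    p = v ** (1.0 / m2)
    return (max(p - c1, 0.0) / (c2 - c3 * p)) ** (1.0 / m1)


NITS_PER_UNIT = 80.0  # assumption: 1.0 after the multiplier corresponds to 80 nits


def to_nits(encoded, eotf, multiplier):
    """Transfer function first, then an independent multiplier colorop."""
    return eotf(encoded) * multiplier * NITS_PER_UNIT
```

With this split, `to_nits(1.0, srgb_eotf, 1.0)` gives 80 nits and
`to_nits(1.0, pq_eotf, 125.0)` gives 10,000 nits, and a plain gamma curve can
be paired with a multiplier of 125.0 without needing an "HDR Gamma" curve.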

> >
> > If we look at BT.2100, there is no such encoding even mentioned where
> > 125.0 would correspond to 10k cd/m². That 125.0 convention already has
> > a built-in assumption what the color spaces are and what the conversion
> > is aiming to do. IOW, I would say that choice is opinionated from the
> > start. The multiplier in BT.2100 is always 10000.

Be that as it may, the convention of FP16 125.0 corresponding to 10k nits is
baked in our hardware, so it's unavoidable at least for NVIDIA pipelines.

> >
> > Given that elements like various kinds of look-up tables inherently
> > assume that the domain is [0.0, 1.0] (because it is a table that
> > has a beginning and an end, and the usual convention is that the
> > beginning is zero and the end is one), I think it is best to stick to
> > the [0.0, 1.0] range where possible. If we go out of that range, then
> > we have to define how a LUT would apply in a sensible way.

In my last reply I mentioned a static (but actually programmable) LUT that is
typically used to convert FP16 linear pixels to fixed point PQ before handing
them to the scaler and tone mapping operator. You're actually right that it
indexes in the fixed point [0.0, 1.0] range for the reasons you describe, but
because the input pixels are expected to be FP16 in the [0.0, 125.0] range, it
applies a non-programmable 1/125.0 normalization factor first.

In this case, you could think of the LUT as indexing on [0.0, 125.0], but as you
point out there would need to be some way to describe that. Maybe we actually
need a fractional multiplier / divider color op. NVIDIA pipes that include this
LUT would need to include a mandatory 1/125.0 factor immediately prior to the
LUT, then the LUT can continue assuming a range of [0.0, 1.0].

Assuming you are using the hardware in a conventional way, specifying a
multiplier of 1.0 after the "degamma" LUT would then map to the 80-nit PQ range
after the static (but actually programmable) PQ LUT, whereas specifying a
multiplier of 125.0 would map to the 10,000-nit PQ range, which is what we want.
I guess it's kind of messy, but the effect would be that color ops other than
multipliers/dividers would still be in the [0.0, 1.0] domain, and any multiplier
that exceeds that range would have to be normalized by a divider before any
other color op.
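
Roughly, in Python (a toy model of the ordering only; the function names and
the linear interpolation are my assumptions, not a description of how the
hardware samples its LUT):

```python
def apply_lut_1d(lut, x):
    """Linearly interpolated 1D LUT indexed on the [0.0, 1.0] domain."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(lut) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(lut) - 1)
    frac = pos - lo
    return lut[lo] * (1.0 - frac) + lut[hi] * frac


def run_pipe(x, multiplier, lut):
    """Multiplier -> mandatory 1/125.0 divider -> LUT that still sees [0.0, 1.0]."""
    scaled = x * multiplier              # may reach [0.0, 125.0]
    normalized = scaled * (1.0 / 125.0)  # fixed factor, back to [0.0, 1.0]
    return apply_lut_1d(lut, normalized)
```

With an identity LUT, a multiplier of 125.0 reaches the top of the table,
while a multiplier of 1.0 only reaches 1/125 of it, i.e. the 80-nit portion
of a 10,000-nit range.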

> >
> > Many TFs are intended to be defined only on [0.0, 1.0] -> [0.0, 1.0].
> > Some curves, like power 2.2, have a mathematical form that naturally
> > extends outside of that range. Power 2.2 generalizes to >1.0 input
> > values as is, but not for negative input values. If needed for negative
> > input values, it is common to use y = -TF(-x) for x < 0 mirroring.
> >
> > scRGB is the prime example that intentionally uses negative channel
> > values. We can also have negative channel values with limited
> > quantization range, sometimes even intentionally (xvYCC chroma, or
> > PLUGE test sub-blacks). Out-of-unit-range values can also appear as a
> > side-effect of signal processing, and they should not get clipped
> > prematurely. This is a challenge for colorops that fundamentally cannot
> > handle out-of-unit-range values.
> >
> > There are various workarounds. scRGB colorimetry can be converted into
> > BT.2020 primaries for example, to avoid saturation induced negative
> > values. Limited quantization range signal could be processed as-is,
> > meaning that the limited range is mapped to [16.0/255, 235.0/255]
> > instead of [0.0, 1.0] or so. But then, we have a complication with
> > transfer functions.
> >
> > > I think a multiplier color op solves all of these issues. Named curves and
> > > custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
> > > be adjusted by the multiplier.
> >
> > Pretty much.
> >
> > > For 80 nit SDR content, set it to 1, for 400
> > > nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc.
> >
> > That I think is another story.
> >
> > > > > >>> +
> > > > > >>> +    /* custom 4k entry 1D LUT */
> > > > > >>> +    Color operation 52
> > > > > >>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> > > > > >>> +    ├─ "BYPASS": bool {true, false}
> > > > > >>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> > > > > >>> +    ├─ "LUT_1D": blob
> > > > > >>> +    └─ "NEXT": immutable color operation ID = 0
> > > > > >
> > > > > > ...
> > > > > >
> > > > > >>> +Driver Forward/Backward Compatibility
> > > > > >>> +=====================================
> > > > > >>> +
> > > > > >>> +As this is uAPI drivers can't regress color pipelines that have been
> > > > > >>> +introduced for a given HW generation. New HW generations are free to
> > > > > >>> +abandon color pipelines advertised for previous generations.
> > > > > >>> +Nevertheless, it can be beneficial to carry support for existing color
> > > > > >>> +pipelines forward as those will likely already have support in DRM
> > > > > >>> +clients.
> > > > > >>> +
> > > > > >>> +Introducing new colorops to a pipeline is fine, as long as they can be
> > > > > >>> +disabled or are purely informational. DRM clients implementing support
> > > > > >>> +for the pipeline can always skip unknown properties as long as they can
> > > > > >>> +be confident that doing so will not cause unexpected results.
> > > > > >>> +
> > > > > >>> +If a new colorop doesn't fall into one of the above categories
> > > > > >>> +(bypassable or informational) the modified pipeline would be unusable
> > > > > >>> +for user space. In this case a new pipeline should be defined.
> > > > > >>
> > > > > >> How can user space detect an informational element? Should we just add a
> > > > > >> BYPASS property to informational elements, make it read only and set to
> > > > > >> true maybe? Or something more descriptive?
> > > > > >
> > > > > > Read-only BYPASS set to true would be fine by me, I guess.
> > > > > >
> > > > >
> > > > > Don't you mean set to false? An informational element will always do
> > > > > something, so it can't be bypassed.
> > > >
> > > > Yeah, this is why we need a definition. I understand "informational" to
> > > > not change pixel values in any way. Previously I had some weird idea
> > > > that scaling doesn't alter color, but of course it may.
> > >
> > > On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
> > > implicit fixed-point to FP16 conversions, and vice versa.
> >
> > Above, I claimed that the UAPI should be defined in nominal
> > floating-point values, but I wonder, would that work? Would we need to
> > have explicit colorops for converting from raw pixel data values into
> > nominal floating-point in the UAPI?

Yeah, I think something like that is needed, or another solution as discussed
below. Even if we define the UAPI in terms of floating point, the actual
underlying pixel format needs to match the expectations of each stage as it
flows through the pipe.

> >
> > > For example, the "degamma" LUT towards the beginning of the pipeline implicitly
> > > converts from fixed point to FP16, and some of the following operations expect
> > > to operate in FP16. As such, if you have a fixed point input and don't bypass
> > > those following operations, you *must not* bypass the LUT, even if you are
> > > otherwise just programming it with the identity. Conversely, if you have a
> > > floating point input, you *must* bypass the LUT.
> >
> > Interesting. Since the color pipeline is not(?) meant to replace pixel
> > format definitions which already make the difference between fixed and
> > floating point, wouldn't this little detail need to be taken care of by
> > the driver under the hood?

We could take care of it under the hood in the case where the pixel format is
fixed point but the "degamma" LUT is bypassed, simply by programming it with the
identity to allow for the conversion to take place. But when the pixel format is
FP16 and the "degamma" LUT is *not* bypassed, we would need to either ignore the
LUT (bad) or fail the atomic commit. That's why we need some way to communicate
the restriction to the client, otherwise they are left guessing why the atomic
commit failed.
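
Spelled out as a decision table (a Python sketch of the logic only; the
function and the returned strings are hypothetical, not an existing driver
interface):

```python
def resolve_degamma(pixel_format_is_fp16, degamma_bypassed):
    """What a driver could do with the "degamma" LUT for each combination."""
    if pixel_format_is_fp16:
        if degamma_bypassed:
            return "bypass"            # FP16 in, no conversion needed
        return "reject-commit"         # LUT would convert already-FP16 pixels
    if degamma_bypassed:
        return "program-identity"      # hidden identity LUT keeps the conversion
    return "program-client-lut"        # normal fixed point case
```

Three of the four cases can be handled silently. Only the FP16 plus
non-bypassed case has no good answer, and that is the one the client needs to
be able to predict.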

> >
> > What if I want to use degamma colorop with a floating-point
> > framebuffer? Simply not possible on this hardware?

Right, it's not possible. The "degamma" LUT always does an implicit conversion
from fixed point to FP16, so if the pixel format is already FP16 it isn't
usable. However, the aforementioned static (actually programmable) LUT that
follows the "degamma" LUT expects FP16 pixels, so you could still use that to do
some kind of transformation. That's actually a good example of a novel use that
justifies compositors being able to program it.

> >
> > > Could informational elements and allowing the exclusion of the BYPASS property
> > > be used to convey this information to the client?  For example, we could expose
> > > one pipeline with the LUT exposed with read-only BYPASS set to false, and
> > > sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
> > > fixed point input. Then, expose another pipeline with the LUT missing, and an
> > > informational "FP16" element in its place to accommodate floating point input.
> > >
> > > That's just an example; we also have other operations in the pipeline that do
> > > similar implicit conversions. In these cases we don't want the operations to be
> > > bypassed individually, so instead we would expose them as mandatory in some
> > > pipelines and missing in others, with informational elements to help inform the
> > > client of which to choose. Is that acceptable under the current proposal?
> > >
> > > Note that in this case, the information just has to do with what format the
> > > pixels should be in, it doesn't correspond to any specific operation. So, I'm
> > > not sure that BYPASS has any meaning for informational elements in this context.
> >
> > Very good questions. Do we have to expose those conversions in the UAPI
> > to make things work for this hardware? Meaning that we cannot assume all
> > colorops work in nominal floating-point from userspace perspective
> > (perhaps with varying degrees of precision).
> 
> I had this in my original proposal I think (maybe I only thought about
> it, not sure).
> 
> We really should figure this one out. Can we get away with normalized
> [0,1] fp as a user space abstraction or not?

I think the conversion needs to be exposed at least once, at the beginning
alongside the "degamma" LUT, since the choice is influenced by an outside
factor (the input pixel format). There are subsequent intermediate conversions
as well, but that's only an issue if we allow the relevant color ops to be
bypassed individually. If we expose a multitude of pipes where the relevant ops
are either missing or mandatory in unison, we can avoid mismatched pixel formats
while maintaining the illusion of a pipe that operates entirely in floating
point.

Or, pipes could just have explicit associated input pixel format(s). The above
technique of exposing multiple pipes instead of bypassing color ops individually
would still work, and clients would just have to choose a pipe that matches the
input pixel format. That way, the actual color ops themselves could still be
defined in terms of normalized [0.0, 1.0] floating point (multipliers/dividers
excepted), and clients can continue thinking in terms of that after making the
initial selection.
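
From the client side that selection could be as simple as the following
Python sketch. The pipe descriptions and the "input_formats" field are
assumptions for illustration and are not part of the current proposal; the
format names are existing DRM fourcc names used as plain strings:

```python
# Hypothetical pipe descriptions, not real driver output.
PIPES = [
    {
        "id": 100,
        "input_formats": {"XRGB8888", "XRGB2101010"},
        # The LUT is mandatory here because it also performs the implicit
        # fixed point -> FP16 conversion.
        "ops": ["degamma LUT (no BYPASS)", "multiplier"],
    },
    {
        "id": 200,
        "input_formats": {"XBGR16161616F"},
        # Pixels are already FP16, so the LUT is simply absent.
        "ops": ["multiplier"],
    },
]


def pick_pipe(pipes, pixel_format):
    """Return the first pipe that accepts the framebuffer format, or None."""
    for pipe in pipes:
        if pixel_format in pipe["input_formats"]:
            return pipe
    return None
```

After `pick_pipe()` the client never has to think about fixed point versus
FP16 again; every remaining op can be treated as nominal floating point.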

> 
> >
> > > > > > I think we also need a definition of "informational".
> > > > > >
> > > > > > Counter-example 1: a colorop that represents a non-configurable
> > > > >
> > > > > Not sure what's "counter" for these examples?
> > > > >
> > > > > > YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> > > > > > format. It cannot be set to bypass, it cannot be configured, and it
> > > > > > will alter color values.
> > >
> > > Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> > > no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> > > based on the principle that read-only blobs could be used to express some static
> > > pipeline elements without the need to define a new type, but got mixed opinions.
> > > I think this demonstrates the principle further, as clients could detect this
> > > programmatically instead of having to special-case the informational element.
> >
> 
> I'm all for exposing fixed color ops but I suspect that most of those
> follow some standard and in those cases instead of exposing the matrix
> values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
> BT.2020).
> 
> As a general rule: always expose the highest level description. Going
> from a name to exact values is trivial, going from values to a name is
> much harder.

Good point. It would need to be a conversion between any two defined color
spaces e.g. BT.709-to-BT.2020, hence why it's much harder to go backwards.

> > If the blob depends on the pixel format (i.e. the driver automatically
> > chooses a different blob per pixel format), then I think we would need
> > to expose all the blobs and how they correspond to pixel formats.
> > Otherwise ok, I guess.
> >
> > However, do we want or need to make a color pipeline or colorop
> > conditional on pixel formats? For example, if you use a YUV 4:2:0 type
> > of pixel format, then you must use this pipeline and not any other. Or
> > floating-point type of pixel format. I did not anticipate this before,
> > I assumed that all color pipelines and colorops are independent of the
> > framebuffer pixel format. A specific colorop might have a property that
> > needs to agree with the framebuffer pixel format, but I didn't expect
> > further limitations.
> 
> We could simply fail commits when the pipeline and pixel format don't
> work together. We'll probably need some kind of ingress no-op node
> anyway and maybe could list pixel formats there if required to make it
> easier to find a working configuration.

Yeah, we could, but having to figure that out through trial and error would be
unfortunate. Per above, it might be easiest to just tag pipelines with a pixel
format instead of trying to include the pixel format conversion as a color op.

> > "Without the need to define a new type" is something I think we need to
> > consider case by case. I have a hard time giving a general opinion.
> >
> > > > > >
> > > > > > Counter-example 2: image size scaling colorop. It might not be
> > > > > > configurable, it is controlled by the plane CRTC_* and SRC_*
> > > > > > properties. You still need to understand what it does, so you can
> > > > > > arrange the scaling to work correctly. (Do not want to scale an image
> > > > > > with PQ-encoded values as Josh demonstrated in XDC.)
> > > > > >
> > > > >
> > > > > IMO the position of the scaling operation is the thing that's important
> > > > > here as the color pipeline won't define scaling properties.
> > >
> > > I agree that blending should ideally be done in linear space, and I remember
> > > that from Josh's presentation at XDC, but I don't recall the same being said for
> > > scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
> > > pipeline that is meant to be in PQ space (more on this below), and that was
> > > found to achieve better results at HDR/SDR boundaries. Of course, this only
> > > bolsters the argument that it would be helpful to have an informational "scaler"
> > > element to understand at which stage scaling takes place.
> >
> > Both blending and scaling are fundamentally the same operation: you
> > have two or more source colors (pixels), and you want to compute a
> > weighted average of them following what happens in nature, that is,
> > physics, as that is what humans are used to.
> >
> > Both blending and scaling will suffer from the same problems if the
> > operation is performed on values that are not light-linear. The result
> > of the weighted average does not correspond to physics.
> >
> > The problem may be hard to observe with natural imagery, but Josh's
> > example shows it very clearly. Maybe that effect is sometimes useful
> > for some imagery in some use cases, but it is still an accidental
> > side-effect. You might get even better results if you don't rely on
> > accidental side-effects but design a separate operation for the exact
> > goal you have.
> >
> > Mind, by scaling we mean changing image size. Not scaling color values.
> >

Fair enough, but it might not always be a choice given the hardware.

> > > > > > Counter-example 3: image sampling colorop. Averages FB originated color
> > > > > > values to produce a color sample. Again do not want to do this with
> > > > > > PQ-encoded values.
> > > > > >
> > > > >
> > > > > Wouldn't this only happen during a scaling op?
> > > >
> > > > There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
> > > > coordinates can be fractional, which makes nearest vs. bilinear
> > > > sampling have a difference even if there is no scaling.
> > > >
> > > > There is also the question of chroma siting with sub-sampled YUV. I
> > > > don't know how that actually works, or how it theoretically should work.
> > >
> > > We have some operations in our pipeline that are intended to be static, i.e. a
> > > static matrix that converts from RGB to LMS, and later another that converts
> > > from LMS to ICtCp. There are even LUTs that are intended to be static,
> > > converting from linear to PQ and vice versa. All of this is because the
> > > pre-blending scaler and tone mapping operator are intended to operate in ICtCp
> > > PQ space. Although the stated LUTs and matrices are intended to be static, they
> > > are actually programmable. In offline discussions, it was indicated that it
> > > would be helpful to actually expose the programmability, as opposed to exposing
> > > them as non-bypassable blocks, as some compositors may have novel uses for them.
> >
> > Correct. Doing tone-mapping in ICtCp etc. are already policy that
> > userspace might or might not agree with.
> >
> > Exposing static colorops will help usages that adhere to current
> > prevalent standards around very specific use cases. There may be
> > millions of devices needing exactly that processing in their usage, but
> > it is also quite limiting in what one can do with the hardware.
> >
> > > Despite being programmable, the LUTs are updated in a manner that is less
> > > efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > > if there was some way to tag operations according to their performance,
> > > for example so that clients can prefer a high performance one when they
> > > intend to do an animated transition? I recall from the XDC HDR workshop
> > > that this is also an issue with AMD's 3DLUT, where updates can be too
> > > slow to animate.
> >
> > I can certainly see such information being useful, but then we need to
> > somehow quantize the performance.

Right, which wouldn't even necessarily be universal, could depend on the given
host, GPU, etc. It could just be a relative performance indication, to give an
order of preference. That wouldn't tell you if it can or can't be animated, but
when choosing between two LUTs to animate you could prefer the higher
performance one.
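
A minimal sketch of that "order of preference" idea, assuming a purely
hypothetical per-colorop `update_rank` value (nothing like it exists in the
proposed UAPI; the struct and function names are made up for illustration):

```c
#include <stddef.h>

/* Hypothetical description of a LUT colorop; not part of any real UAPI. */
struct lut_colorop {
	unsigned int id;
	unsigned int update_rank; /* assumed: lower means cheaper to reprogram */
};

/*
 * Return the candidate that is cheapest to update, or NULL if there are
 * no candidates. Ties go to the earliest candidate in the array.
 */
static const struct lut_colorop *
pick_lut_for_animation(const struct lut_colorop *luts, size_t count)
{
	const struct lut_colorop *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (!best || luts[i].update_rank < best->update_rank)
			best = &luts[i];
	}
	return best;
}
```

The rank only orders candidates relative to each other; as noted above, it
says nothing about whether the chosen LUT can actually be animated.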

> >
> > What I was left puzzled about after the XDC workshop is that is it
> > possible to pre-load configurations in the background (slow), and then
> > quickly switch between them? Hardware-wise I mean.

This works fine for our "fast" LUTs, you just point them to a surface in video
memory and they flip to it. You could keep multiple surfaces around and flip
between them without having to reprogram them in software. We can easily do that
with enumerated curves, populating them when the driver initializes instead of
waiting for the client to request them. You can even point multiple hardware
LUTs to the same video memory surface, if they need the same curve.

> 
> We could define that pipelines with a lower ID are to be preferred over
> higher IDs.

Sure, but this isn't just an issue with a pipeline as a whole, but the
individual elements within it and how to use them in a given context.

> 
> The issue is that if programming a pipeline becomes too slow to be
> useful it probably should just not be made available to user space.

It's not that programming the pipeline is overall too slow. The LUTs we have
that are relatively slow to program are meant to be set infrequently, or even
just once, to allow the scaler and tone mapping operator to operate in fixed
point PQ space. You might still want the tone mapper, so you would choose a
pipeline that includes them, but when it comes to e.g. animating a night light,
you would want to choose a different LUT for that purpose.

> 
> The prepare-commit idea for blob properties would help to make the
> pipelines usable again, but until then it's probably a good idea to just
> not expose those pipelines.

The prepare-commit idea wouldn't work for these LUTs, because they are
programmed using methods instead of pointing them to a surface. I'm not sure
how slow that actually is; I would need to benchmark it. I think not exposing
them at all would be overkill, since it would mean you can't use the preblending
scaler or tonemapper, and animation isn't necessary for that.

The AMD 3DLUT is another example of a LUT that is slow to update, and it would
obviously be a major loss if that wasn't exposed. There just needs to be some
way for clients to know if they are going to kill performance by trying to
change it every frame.

Thanks,
Alex

> 
> >
> >
> > Thanks,
> > pq
> 
> 
> 

* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26 19:25                 ` Alex Goins
@ 2023-10-27  8:59                   ` Michel Dänzer
  2023-10-27 10:01                     ` Sebastian Wick
  2023-11-04 23:01                   ` Christopher Braga
  2023-11-07 16:52                   ` Harry Wentland
  2 siblings, 1 reply; 49+ messages in thread
From: Michel Dänzer @ 2023-10-27  8:59 UTC (permalink / raw)
  To: Alex Goins, Sebastian Wick
  Cc: Aleix Pol, Sasha McIntosh, Pekka Paalanen, Abhinav Kumar,
	Shashank Sharma, Xaver Hugl, Hector Martin, Liviu Dudau,
	Victoria Brekenfeld, dri-devel, Arthur Grillo, Melissa Wen,
	Jonas Ådahl, Uma Shankar, Joshua Ashton, wayland-devel,
	Christopher Braga, Naseer Ahmed

On 10/26/23 21:25, Alex Goins wrote:
> On Thu, 26 Oct 2023, Sebastian Wick wrote:
>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>> Alex Goins <agoins@nvidia.com> wrote:
>>>
>>>> Despite being programmable, the LUTs are updated in a manner that is less
>>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
>>>> if there was some way to tag operations according to their performance,
>>>> for example so that clients can prefer a high performance one when they
>>>> intend to do an animated transition? I recall from the XDC HDR workshop
>>>> that this is also an issue with AMD's 3DLUT, where updates can be too
>>>> slow to animate.
>>>
>>> I can certainly see such information being useful, but then we need to
>>> somehow quantize the performance.
> 
> Right, which wouldn't even necessarily be universal, could depend on the given
> host, GPU, etc. It could just be a relative performance indication, to give an
> order of preference. That wouldn't tell you if it can or can't be animated, but
> when choosing between two LUTs to animate you could prefer the higher
> performance one.
> 
>>>
>>> What I was left puzzled about after the XDC workshop is that is it
>>> possible to pre-load configurations in the background (slow), and then
>>> quickly switch between them? Hardware-wise I mean.
> 
> This works fine for our "fast" LUTs, you just point them to a surface in video
> memory and they flip to it. You could keep multiple surfaces around and flip
> between them without having to reprogram them in software. We can easily do that
> with enumerated curves, populating them when the driver initializes instead of
> waiting for the client to request them. You can even point multiple hardware
> LUTs to the same video memory surface, if they need the same curve.
> 
>>
>> We could define that pipelines with a lower ID are to be preferred over
>> higher IDs.
> 
> Sure, but this isn't just an issue with a pipeline as a whole, but the
> individual elements within it and how to use them in a given context.
> 
>>
>> The issue is that if programming a pipeline becomes too slow to be
>> useful it probably should just not be made available to user space.
> 
> It's not that programming the pipeline is overall too slow. The LUTs we have
> that are relatively slow to program are meant to be set infrequently, or even
> just once, to allow the scaler and tone mapping operator to operate in fixed
> point PQ space. You might still want the tone mapper, so you would choose a
> pipeline that includes them, but when it comes to e.g. animating a night light,
> you would want to choose a different LUT for that purpose.
> 
>>
>> The prepare-commit idea for blob properties would help to make the
>> pipelines usable again, but until then it's probably a good idea to just
>> not expose those pipelines.
> 
> The prepare-commit idea actually wouldn't work for these LUTs, because they are
> programmed using methods instead of pointing them to a surface. I'm actually not
> sure how slow it actually is, would need to benchmark it. I think not exposing
> them at all would be overkill, since it would mean you can't use the preblending
> scaler or tonemapper, and animation isn't necessary for that.
> 
> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> obviously be a major loss if that wasn't exposed. There just needs to be some
> way for clients to know if they are going to kill performance by trying to
> change it every frame.

Might a first step be to require the ALLOW_MODESET flag to be set when changing the values for a colorop which is too slow to be updated per refresh cycle?

This would tell the compositor: You can use this colorop, but you can't change its values on the fly.
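
A rough sketch of what such a check could look like, under stated
assumptions: the struct, its fields, and the function are hypothetical, and
the flag is redefined locally (with the same value as
DRM_MODE_ATOMIC_ALLOW_MODESET in the DRM UAPI headers) only so the snippet
builds on its own:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the DRM_MODE_ATOMIC_ALLOW_MODESET value, for a standalone build. */
#define SKETCH_ATOMIC_ALLOW_MODESET 0x0400

/* Hypothetical per-colorop state; these field names are assumptions. */
struct colorop_sketch {
	bool slow_update;    /* cannot be reprogrammed within one refresh cycle */
	bool values_changed; /* this commit changes the colorop's values */
};

/*
 * Reject a value change on a slow colorop unless the commit carries
 * ALLOW_MODESET. Everything else is accepted.
 */
static int check_colorop_update(const struct colorop_sketch *op,
				uint32_t commit_flags)
{
	if (op->values_changed && op->slow_update &&
	    !(commit_flags & SKETCH_ATOMIC_ALLOW_MODESET))
		return -EINVAL;
	return 0;
}
```

A compositor would then learn from a failed commit without ALLOW_MODESET
that the colorop is usable but not per-frame updatable.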


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer


* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-27  8:59                   ` Michel Dänzer
@ 2023-10-27 10:01                     ` Sebastian Wick
  2023-10-27 12:01                       ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Sebastian Wick @ 2023-10-27 10:01 UTC (permalink / raw)
  To: Michel Dänzer
  Cc: Sasha McIntosh, Pekka Paalanen, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Victoria Brekenfeld,
	dri-devel, Christopher Braga, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Joshua Ashton, Aleix Pol, wayland-devel,
	Arthur Grillo, Naseer Ahmed

On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> On 10/26/23 21:25, Alex Goins wrote:
> > On Thu, 26 Oct 2023, Sebastian Wick wrote:
> >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> >>> Alex Goins <agoins@nvidia.com> wrote:
> >>>
> >>>> Despite being programmable, the LUTs are updated in a manner that is less
> >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> >>>> if there was some way to tag operations according to their performance,
> >>>> for example so that clients can prefer a high performance one when they
> >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> >>>> slow to animate.
> >>>
> >>> I can certainly see such information being useful, but then we need to
> >>> somehow quantize the performance.
> > 
> > Right, which wouldn't even necessarily be universal, could depend on the given
> > host, GPU, etc. It could just be a relative performance indication, to give an
> > order of preference. That wouldn't tell you if it can or can't be animated, but
> > when choosing between two LUTs to animate you could prefer the higher
> > performance one.
> > 
> >>>
> >>> What I was left puzzled about after the XDC workshop is that is it
> >>> possible to pre-load configurations in the background (slow), and then
> >>> quickly switch between them? Hardware-wise I mean.
> > 
> > This works fine for our "fast" LUTs, you just point them to a surface in video
> > memory and they flip to it. You could keep multiple surfaces around and flip
> > between them without having to reprogram them in software. We can easily do that
> > with enumerated curves, populating them when the driver initializes instead of
> > waiting for the client to request them. You can even point multiple hardware
> > LUTs to the same video memory surface, if they need the same curve.
> > 
> >>
> >> We could define that pipelines with a lower ID are to be preferred over
> >> higher IDs.
> > 
> > Sure, but this isn't just an issue with a pipeline as a whole, but the
> > individual elements within it and how to use them in a given context.
> > 
> >>
> >> The issue is that if programming a pipeline becomes too slow to be
> >> useful it probably should just not be made available to user space.
> > 
> > It's not that programming the pipeline is overall too slow. The LUTs we have
> > that are relatively slow to program are meant to be set infrequently, or even
> > just once, to allow the scaler and tone mapping operator to operate in fixed
> > point PQ space. You might still want the tone mapper, so you would choose a
> > pipeline that includes them, but when it comes to e.g. animating a night light,
> > you would want to choose a different LUT for that purpose.
> > 
> >>
> >> The prepare-commit idea for blob properties would help to make the
> >> pipelines usable again, but until then it's probably a good idea to just
> >> not expose those pipelines.
> > 
> > The prepare-commit idea actually wouldn't work for these LUTs, because they are
> > programmed using methods instead of pointing them to a surface. I'm actually not
> > sure how slow it actually is, would need to benchmark it. I think not exposing
> > them at all would be overkill, since it would mean you can't use the preblending
> > scaler or tonemapper, and animation isn't necessary for that.
> > 
> > The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> > obviously be a major loss if that wasn't exposed. There just needs to be some
> > way for clients to know if they are going to kill performance by trying to
> > change it every frame.
> 
> Might a first step be to require the ALLOW_MODESET flag to be set when changing the values for a colorop which is too slow to be updated per refresh cycle?
> 
> This would tell the compositor: You can use this colorop, but you can't change its values on the fly.

I argued before that changing any color op to passthrough should never
require ALLOW_MODESET and while this is really hard to guarantee from a
driver perspective I still believe that it's better to not expose any
feature requiring ALLOW_MODESET or taking too long to program to be
useful for per-frame changes.

When user space has ways to figure out whether going back to a specific state
(in this case setting everything to bypass) is possible without ALLOW_MODESET,
we can revisit this decision. Until then, let's keep things simple and only
expose things that work reliably without ALLOW_MODESET and are fast enough
to work for per-frame changes.

Harry, Pekka: Should we document this? It obviously restricts what can
be exposed but exposing things that can't be used by user space isn't
useful.

> 
> -- 
> Earthling Michel Dänzer            |                  https://redhat.com
> Libre software enthusiast          |         Mesa and Xwayland developer
> 


* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-27 10:01                     ` Sebastian Wick
@ 2023-10-27 12:01                       ` Pekka Paalanen
  0 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-10-27 12:01 UTC (permalink / raw)
  To: Sebastian Wick
  Cc: Sasha McIntosh, Abhinav Kumar, Michel Dänzer, Xaver Hugl,
	Shashank Sharma, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Christopher Braga, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Joshua Ashton, Aleix Pol, Hector Martin, wayland-devel,
	Arthur Grillo, Naseer Ahmed

On Fri, 27 Oct 2023 12:01:32 +0200
Sebastian Wick <sebastian.wick@redhat.com> wrote:

> On Fri, Oct 27, 2023 at 10:59:25AM +0200, Michel Dänzer wrote:
> > On 10/26/23 21:25, Alex Goins wrote:  
> > > On Thu, 26 Oct 2023, Sebastian Wick wrote:  
> > >> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:  
> > >>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> > >>> Alex Goins <agoins@nvidia.com> wrote:
> > >>>  
> > >>>> Despite being programmable, the LUTs are updated in a manner that is less
> > >>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> > >>>> if there was some way to tag operations according to their performance,
> > >>>> for example so that clients can prefer a high performance one when they
> > >>>> intend to do an animated transition? I recall from the XDC HDR workshop
> > >>>> that this is also an issue with AMD's 3DLUT, where updates can be too
> > >>>> slow to animate.  
> > >>>
> > >>> I can certainly see such information being useful, but then we need to
> > >>> somehow quantize the performance.  
> > > 
> > > Right, which wouldn't even necessarily be universal, could depend on the given
> > > host, GPU, etc. It could just be a relative performance indication, to give an
> > > order of preference. That wouldn't tell you if it can or can't be animated, but
> > > when choosing between two LUTs to animate you could prefer the higher
> > > performance one.
> > >   
> > >>>
> > >>> What I was left puzzled about after the XDC workshop is that is it
> > >>> possible to pre-load configurations in the background (slow), and then
> > >>> quickly switch between them? Hardware-wise I mean.  
> > > 
> > > This works fine for our "fast" LUTs, you just point them to a surface in video
> > > memory and they flip to it. You could keep multiple surfaces around and flip
> > > between them without having to reprogram them in software. We can easily do that
> > > with enumerated curves, populating them when the driver initializes instead of
> > > waiting for the client to request them. You can even point multiple hardware
> > > LUTs to the same video memory surface, if they need the same curve.
> > >   
> > >>
> > >> We could define that pipelines with a lower ID are to be preferred over
> > >> higher IDs.  
> > > 
> > > Sure, but this isn't just an issue with a pipeline as a whole, but the
> > > individual elements within it and how to use them in a given context.
> > >   
> > >>
> > >> The issue is that if programming a pipeline becomes too slow to be
> > >> useful it probably should just not be made available to user space.  
> > > 
> > > It's not that programming the pipeline is overall too slow. The LUTs we have
> > > that are relatively slow to program are meant to be set infrequently, or even
> > > just once, to allow the scaler and tone mapping operator to operate in fixed
> > > point PQ space. You might still want the tone mapper, so you would choose a
> > > pipeline that includes them, but when it comes to e.g. animating a night light,
> > > you would want to choose a different LUT for that purpose.
> > >   
> > >>
> > >> The prepare-commit idea for blob properties would help to make the
> > >> pipelines usable again, but until then it's probably a good idea to just
> > >> not expose those pipelines.  
> > > 
> > > The prepare-commit idea actually wouldn't work for these LUTs, because they are
> > > programmed using methods instead of pointing them to a surface. I'm actually not
> > > sure how slow it actually is, would need to benchmark it. I think not exposing
> > > them at all would be overkill, since it would mean you can't use the preblending
> > > scaler or tonemapper, and animation isn't necessary for that.
> > > 
> > > The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> > > obviously be a major loss if that wasn't exposed. There just needs to be some
> > > way for clients to know if they are going to kill performance by trying to
> > > change it every frame.  
> > 
> > Might a first step be to require the ALLOW_MODESET flag to be set when changing the values for a colorop which is too slow to be updated per refresh cycle?
> > 
> > This would tell the compositor: You can use this colorop, but you can't change its values on the fly.  
> 
> I argued before that changing any color op to passthrough should never
> require ALLOW_MODESET and while this is really hard to guarantee from a
> driver perspective I still believe that it's better to not expose any
> feature requiring ALLOW_MODESET or taking too long to program to be
> useful for per-frame changes.
> 
> When user space has ways to figure out if going back to a specific state
> (in this case setting everything to bypass) without ALLOW_MODESET we can
> revisit this decision, but until then, let's keep things simple and only
> expose things that work reliably without ALLOW_MODESET and fast enough
> to work for per-frame changes.
> 
> Harry, Pekka: Should we document this? It obviously restricts what can
> be exposed but exposing things that can't be used by user space isn't
> useful.

In an ideal world... but in real world, I don't know.

Would it help if there was a list collected, with all the things in
various hardware that are known to be too heavy to reprogram every
refresh? Maybe that would allow a more educated decision?

I bet that depends also on the refresh rate.

I would probably be fine with some sort of update cost classification
on colorops, and the kernel keeping track of blobs: if userspace sets
the same blob on the same colorop that is already there (by blob ID, no
need to compare contents), then it's a no-op change.
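
To make the idea concrete, here is a small sketch. The cost classes, the
state struct, and both helpers are hypothetical names chosen for
illustration; only the "compare blob IDs, not contents" rule comes from the
paragraph above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical update cost classes; names and meanings are assumptions. */
enum update_cost {
	UPDATE_COST_PER_FRAME,  /* cheap enough to change every refresh */
	UPDATE_COST_OCCASIONAL, /* too heavy to reprogram every refresh */
};

/* Hypothetical bookkeeping the kernel would keep per colorop. */
struct colorop_state_sketch {
	enum update_cost cost;
	uint32_t blob_id; /* blob currently programmed; 0 means none */
};

/* Setting the blob that is already there is a no-op: only IDs are compared. */
static bool colorop_needs_reprogram(const struct colorop_state_sketch *s,
				    uint32_t new_blob_id)
{
	return s->blob_id != new_blob_id;
}

/* A commit is cheap if nothing is reprogrammed or the colorop is fast. */
static bool colorop_commit_is_cheap(const struct colorop_state_sketch *s,
				    uint32_t new_blob_id)
{
	return !colorop_needs_reprogram(s, new_blob_id) ||
	       s->cost == UPDATE_COST_PER_FRAME;
}
```

With this, userspace could resubmit its full state every frame and only pay
for a heavy colorop when its blob ID actually changes.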


Anyway, I really like reading Alex Goins' reply, it seems we are very
much on the same page here. :-)


Thanks,
pq


* Re: [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array
  2023-10-19 21:21 ` [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array Harry Wentland
@ 2023-10-30 13:29   ` Pekka Paalanen
  2023-11-06 20:48     ` Harry Wentland
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-10-30 13:29 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Uma Shankar, Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton

On Thu, 19 Oct 2023 17:21:21 -0400
Harry Wentland <harry.wentland@amd.com> wrote:

> When the floor LUT index (drm_fixp2int(lut_index)) is the last
> index of the array the ceil LUT index will point to an entry
> beyond the array. Make sure we guard against it and use the
> value of the floor LUT index.
> 
> Blurb about LUT creation and how first element should be 0x0 and
> last one 0xffff.
> 
> Hold on, is that even correct? What should the ends of a LUT be?
> How does UNORM work and how does it apply to LUTs?

Do you mean how should UNORM input value map to LUT entries for LUT
indexing?

I suppose UNORM 16-bit converts to nominal real values as:
- 0x0: 0.0
- 0xffff: 1.0

And in the LUT, you want 0.0 to map to the first LUT element exactly, and
1.0 to map to the last LUT element exactly, whatever interpolation may
be in use, right?

If so, it is important to make sure that, assuming linear interpolation
for instance, there is no "dead zone" at either end. Given high
interpolation precision, any step away from 0.0 or 1.0 needs to imply a
change in the real-valued output, assuming e.g. identity LUT.

If LUT has N elements, and 16-bit UNORM input value is I, then (in
naive real-valued math, so no implicit truncation between operations)

x = I / 0xffff * (N - 1)
ia = floor(x)
ib = min(ia + 1, N - 1)

f = x - floor(x)
y = (1 - f) * LUT[ia] + f * LUT[ib]


Does that help?
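
The real-valued math above can be sketched in C as follows. This is a
floating-point illustration only, not the fixed-point code VKMS uses; the
function name and the use of a `double` LUT are assumptions:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Linearly interpolated lookup of a 16-bit UNORM input in a LUT of n
 * entries (n >= 1). Input 0x0 hits lut[0] exactly and 0xffff hits
 * lut[n - 1] exactly.
 */
static double lut_lookup(const double *lut, size_t n, uint16_t input)
{
	/* x = I / 0xffff * (N - 1) */
	double x = (double)input / 65535.0 * (double)(n - 1);
	/* x is never negative, so truncation is the same as floor(x) */
	size_t ia = (size_t)x;
	/* ib = min(ia + 1, N - 1): never index past the last entry */
	size_t ib = ia + 1 < n ? ia + 1 : n - 1;
	double f = x - (double)ia;

	return (1.0 - f) * lut[ia] + f * lut[ib];
}
```

The clamp on `ib` plays the same role as the guard added in the patch: at
the last index the floor and ceil entries are the same element.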

In my mind, I'm thinking of a uniformly distributed LUT as a 1-D
texture, because that's how I have implemented them in GL. There you
have to be careful so that input values 0.0 and 1.0 map to the *center*
of the first and last texel, and not to the edges of the texture like
texture coordinates do. Then you can use the GL linear texture
interpolation as-is.
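
As an illustration of that texel-center mapping (the helper name is made
up; the formula follows from texel i of an n-texel texture having its
center at (i + 0.5) / n in normalized coordinates):

```c
#include <stddef.h>

/*
 * Map an input value v in [0.0, 1.0] to a normalized 1-D texture
 * coordinate such that 0.0 lands on the center of the first texel and
 * 1.0 on the center of the last texel of an n-texel texture.
 */
static double lut_texcoord(double v, size_t n)
{
	return (0.5 + v * (double)(n - 1)) / (double)n;
}
```

For a 4-texel LUT this maps 0.0 to 0.125 and 1.0 to 0.875, the centers of
the first and last texels, rather than to the texture edges at 0.0 and 1.0.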


Thanks,
pq


> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Jonas Ådahl <jadahl@redhat.com>
> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Cc: Alexander Goins <agoins@nvidia.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Michel Dänzer <mdaenzer@redhat.com>
> Cc: Aleix Pol <aleixpol@kde.org>
> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> Cc: Victoria Brekenfeld <victoria@system76.com>
> Cc: Sima <daniel@ffwll.ch>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> Cc: Sasha McIntosh <sashamcintosh@google.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index a0a3a6fd2926..cf1dff162920 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -123,6 +123,8 @@ static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 chan
>  				      enum lut_channel channel)
>  {
>  	s64 lut_index = get_lut_index(lut, channel_value);
> +	u16 *floor_lut_value, *ceil_lut_value;
> +	u16 floor_channel_value, ceil_channel_value;
>  
>  	/*
>  	 * This checks if `struct drm_color_lut` has any gap added by the compiler
> @@ -130,11 +132,15 @@ static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 chan
>  	 */
>  	static_assert(sizeof(struct drm_color_lut) == sizeof(__u16) * 4);
>  
> -	u16 *floor_lut_value = (__u16 *)&lut->base[drm_fixp2int(lut_index)];
> -	u16 *ceil_lut_value = (__u16 *)&lut->base[drm_fixp2int_ceil(lut_index)];
> +	floor_lut_value = (__u16 *)&lut->base[drm_fixp2int(lut_index)];
> +	if (drm_fixp2int(lut_index) == (lut->lut_length - 1))
> +		/* We're at the end of the LUT array, use same value for ceil and floor */
> +		ceil_lut_value = floor_lut_value;
> +	else
> +		ceil_lut_value = (__u16 *)&lut->base[drm_fixp2int_ceil(lut_index)];
>  
> -	u16 floor_channel_value = floor_lut_value[channel];
> -	u16 ceil_channel_value = ceil_lut_value[channel];
> +	floor_channel_value = floor_lut_value[channel];
> +	ceil_channel_value = ceil_lut_value[channel];
>  
>  	return lerp_u16(floor_channel_value, ceil_channel_value,
>  			lut_index & DRM_FIXED_DECIMAL_MASK);



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26 19:25                 ` Alex Goins
  2023-10-27  8:59                   ` Michel Dänzer
@ 2023-11-04 23:01                   ` Christopher Braga
  2023-11-07 16:52                     ` Harry Wentland
  2023-11-07 16:52                   ` Harry Wentland
  2 siblings, 1 reply; 49+ messages in thread
From: Christopher Braga @ 2023-11-04 23:01 UTC (permalink / raw)
  To: Alex Goins, Sebastian Wick
  Cc: Aleix Pol, Sasha McIntosh, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Victoria Brekenfeld,
	dri-devel, wayland-devel, Melissa Wen, Pekka Paalanen,
	Jonas Ådahl, Joshua Ashton, Michel Dänzer,
	Naseer Ahmed, Uma Shankar, Arthur Grillo

Just want to loop back to before we branched off deeper into the
programming performance talk.

On 10/26/2023 3:25 PM, Alex Goins wrote:
> On Thu, 26 Oct 2023, Sebastian Wick wrote:
> 
>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>> Alex Goins <agoins@nvidia.com> wrote:
>>>
>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>> inline -
>>>>
>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>
>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>> Harry Wentland <harry.wentland@amd.com> wrote:
>>>>>
>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>>>>
>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>
>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:
>>>>>>>>> v2:
>>>>>>>>>   - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>>>>>>>   - Updated wording (Pekka)
>>>>>>>>>   - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>>>>>>>   - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>>>>>>>     section (Pekka)
>>>>>>>>>   - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>>>>>>>   - Add "Driver Implementer's Guide" section (Pekka)
>>>>>>>>>   - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> +An example of a drm_colorop object might look like one of these::
>>>>>>>>> +
>>>>>>>>> +    /* 1D enumerated curve */
>>>>>>>>> +    Color operation 42
>>>>>>>>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>>>>>>>>> +    ├─ "BYPASS": bool {true, false}
>>>>>>>>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
>>>>>>>>> +    └─ "NEXT": immutable color operation ID = 43
>>>>
>>>> I know these are just examples, but I would also like to suggest the possibility
>>>> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
>>>> compared to setting an identity in some cases depending on the hardware. See
>>>> below for more on this, RE: implicit format conversions.
>>>>
>>>> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
>>>> offline discussions that it would nonetheless be helpful to expose enumerated
>>>> curves in order to hide the vendor-specific complexities of programming
>>>> segmented LUTs from clients. In that case, we would simply refer to the
>>>> enumerated curve when calculating/choosing segmented LUT entries.
>>>
>>> That's a good idea.
>>>
>>>> Another thing that came up in offline discussions is that we could use multiple
>>>> color operations to program a single operation in hardware. As I understand it,
>>>> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
>>>> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
>>>> we could combine them into a singular LUT in software, such that you can combine
>>>> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
>>>> precision from the custom LUT where it overlaps with the linear section of the
>>>> enumerated curve, but that is unavoidable and shouldn't be an issue in most
>>>> use-cases.
>>>
>>> Indeed.
>>>
>>>> Actually, the current examples in the proposal don't include a multiplier color
>>>> op, which might be useful. For AMD as above, but also for NVIDIA as the
>>>> following issue arises:
>>>>
>>>> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed

If possible, let's declare this as two blocks. One that informatively 
declares the conversion is present, and another for the de-gamma. This 
will help with block-reuse between vendors.

>>>> point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
>>>> to in floating point varies depending on the source content. If it's SDR
>>>> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
>>>> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
>>>> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
>>>> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
>>>> correct?
>>>
>>> It would be against the UAPI design principles to tag content as HDR or
>>> SDR. What you can do instead is to expose a colorop with a multiplier of
>>> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
>>> that the input is SDR or HDR to get the expected multiplier. You will
>>> never know what the content actually is, anyway.
> 
> Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
> UAPI, just relating to the end result in the pipe, ultimately it would be
> determined by the multiplier color op.
> 

A multiplier could work but we should give OEMs the option to 
either make it "informative" and fixed by the hardware, or fully 
configurable. With the Qualcomm pipeline how we absorb FP16 pixel 
buffers, as well as how we convert them to fixed point data actually has 
a dependency on the desired de-gamma and gamma processing. So for an 
example:

If a source pixel buffer is scRGB encoded FP16 content we would expect 
input pixel content to be up to 7.5, with the IGC output reaching 125 as 
in the NVIDIA case. Likewise gamma 2.2 encoded FP16 content would be 0-1 
in and 0-1 out.

So in the Qualcomm case the expectations are fixed depending on the use 
case.

It is sounding to me like we would need to be able to declare three 
things here:
1. Value range expectations *into* the de-gamma block. A multiplier 
wouldn't work here because it would be more of a clipping operation. I 
guess we would have to add an explicit clamping block as well.
2. Value range expectations at the *output* of the de-gamma 
processing block. Also covered by using another multiplier block.
3. Value range expectations *into* a gamma processing block. This should 
be covered by declaring a multiplier post-csc, but only assuming CSC 
output is normalized in the desired value range. A clamping block would 
be preferable because it describes what happens when it isn't.

All this is do-able, but it seems like it would require the definition 
of multiple color pipelines to expose the different limitations for 
color block configuration combinations. Additionally, would it be easy 
for user space to find the right pipeline?

>>>
>>> Of course, if we want to have an arbitrary multiplier colorop that is
>>> somewhat standard, as in, exposed by many drivers to ease userspace
>>> development, you can certainly use any combination of your hardware
>>> features you need to realize the UAPI prescribed mathematical operation.
>>>
>>> Since we are talking about floating-point in hardware, a multiplier
>>> does not significantly affect precision.
>>>
>>> In order to mathematically define all colorops, I believe it is
>>> necessary to define all colorops in terms of floating-point values (as
>>> in math), even if they operate on fixed-point or integer. By this I
>>> mean that if the input is 8 bpc unsigned integer pixel format for
>>> instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
>>> to 1.0, and the color pipeline starts with [0.0, 1.0], not [0, 255]
>>> domain. We have to agree on this mapping for all channels on all pixel
>>> formats. However, there is a "but" further below.
> 
> I think this makes sense insofar as how we interact with the UAPI, and that's
> basically how fixed point works for us anyway. However, relating to your "but",
> it doesn't avoid the issue with hardware expectations about pixel formats since
> it doesn't change the underlying pixel format.
> 
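
For reference, the raw-value mapping discussed above can be sketched in C as
follows. This is only an illustration of the convention (0 maps to 0.0, the
maximum code value maps to 1.0); the function names are made up here and are
not part of the proposed UAPI.

```c
#include <stdint.h>

/* Map an 8 bpc unsigned channel value to the nominal [0.0, 1.0] domain. */
static double unorm8_to_nominal(uint8_t raw)
{
	return (double)raw / 255.0;
}

/* Same convention for 16 bpc: 0x0 -> 0.0, 0xffff -> 1.0. */
static double unorm16_to_nominal(uint16_t raw)
{
	return (double)raw / 65535.0;
}
```

Note that this says nothing about the underlying hardware pixel format, which
is the open question raised in the reply above.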
>>>
>>> I also propose that quantization range is NOT considered in the raw
>>> value mapping, so that we can handle quantization range in colorops
>>> explicitly, allowing us to e.g. handle sub-blacks and super-whites when
>>> necessary. (These are currently impossible to represent in the legacy
>>> color properties, because everything is converted to full range and
>>> clipped before any color operations.)
>>>
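
To illustrate why keeping quantization range out of the raw mapping preserves
sub-blacks and super-whites, here is a small C sketch of an explicit range
expansion step. It assumes 8-bit limited-range luma (16 to 235); the helper
name is hypothetical and chroma (16 to 240) would need its own constants.

```c
#include <stdint.h>

/*
 * Hypothetical explicit range-expansion step for 8-bit limited-range luma.
 * Code 16 maps to 0.0 and code 235 maps to 1.0. Nothing is clipped, so
 * codes below 16 come out negative and codes above 235 come out above 1.0.
 */
static double limited8_luma_expand(uint8_t raw)
{
	return ((double)raw - 16.0) / 219.0;
}
```

Whether a later colorop can consume those out-of-range values is a separate
question.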
>>>>  From the given enumerated curves, it's not clear how they would map to the
>>>> above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
>>>> value of 125.0? That may work, but it tends towards the "descriptive" notion of
>>>> assuming the source content, which may not be accurate in all cases. This is
>>>> also an issue for the custom 1D LUT, as the blob will need to be converted to
>>>> FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
>>>> value be, given that we no longer have any hint as to the source content?
>>>
>>> In my opinion, all finite non-negative transfer functions should
>>> operate with [0.0, 1.0] domain and [0.0, 1.0] range, and that includes
>>> all sRGB, power 2.2, and PQ curves.
> 
> Right, I think so too, otherwise you are making assumptions about the source
> content. For example, it's possible to do HDR with a basic gamma curve, so you
> can't really assume that gamma should always go up to 1.0, but PQ up to 125.0.
> If you did that, it would necessitate adding an "HDR Gamma" curve, which is
> converging back on a "descriptive" UAPI. By leaving the final range up to the
> subsequent multiplier, the client gets to choose independently from the TF,
> which seems more in line with the goals of this proposal.
> 
>>>
>>> If we look at BT.2100, there is no such encoding even mentioned where
>>> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
>>> a built-in assumption what the color spaces are and what the conversion
>>> is aiming to do. IOW, I would say that choice is opinionated from the
>>> start. The multiplier in BT.2100 is always 10000.
> 
> Be that as it may, the convention of FP16 125.0 corresponding to 10k nits is
> baked in our hardware, so it's unavoidable at least for NVIDIA pipelines.
> 
>>>
>>> Given that elements like various kinds of look-up tables inherently
>>> assume that the domain is [0.0, 1.0] (because it is a table that
>>> has a beginning and an end, and the usual convention is that the
>>> beginning is zero and the end is one), I think it is best to stick to
>>> the [0.0, 1.0] range where possible. If we go out of that range, then
>>> we have to define how a LUT would apply in a sensible way.
> 
> In my last reply I mentioned a static (but actually programmable) LUT that is
> typically used to convert FP16 linear pixels to fixed point PQ before handing
> them to the scaler and tone mapping operator. You're actually right that it
> indexes in the fixed point [0.0, 1.0] range for the reasons you describe, but
> because the input pixels are expected to be FP16 in the [0.0, 125.0] range, it
> applies a non-programmable 1/125.0 normalization factor first.
> 
> In this case, you could think of the LUT as indexing on [0.0, 125.0], but as you
> point out there would need to be some way to describe that. Maybe we actually
> need a fractional multiplier / divider color op. NVIDIA pipes that include this
> LUT would need to include a mandatory 1/125.0 factor immediately prior to the
> LUT, then LUT can continue assuming a range of [0.0, 1.0].
> 
> Assuming you are using the hardware in a conventional way, specifying a
> multiplier of 1.0 after the "degamma" LUT would then map to the 80-nit PQ range
> after the static (but actually programmable) PQ LUT, whereas specifying a
> multiplier of 125.0 would map to the 10,000-nit PQ range, which is what we want.
> I guess it's kind of messy, but the effect would be that color ops other than
> multipliers/dividers would still be in the [0.0, 1.0] domain, and any multiplier
> that exceeds that range would have to be normalized by a divider before any
> other color op.
> 
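
As a rough C sketch of the arithmetic in this part of the thread: the
multiplier values follow from treating 1.0 as 80 nits, and a fixed divider
brings the result back into [0.0, 1.0] before a LUT. Both helpers and the
REFERENCE_NITS name are hypothetical, used here only to make the numbers
concrete.

```c
/* Luminance that corresponds to 1.0 in the convention discussed above. */
#define REFERENCE_NITS 80.0

/* Hypothetical helper: multiplier value for a desired peak luminance. */
static double multiplier_for_peak_nits(double peak_nits)
{
	return peak_nits / REFERENCE_NITS;
}

/* Hypothetical fixed divider applied before a LUT indexed on [0.0, 1.0]. */
static double normalize_for_lut(double value, double divider)
{
	return value / divider;
}
```

With these assumptions, 80 nits gives a multiplier of 1, 400 nits gives 5, and
10,000 nits gives 125, matching the values quoted further down.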

Hmm. A multiplier would resolve issues when input linear FP16 data has 
a different idea of what 1.0 means in terms of nits (think of Apple's 
EDR as an example). For a client to go from their definition to the 
hardware definition of 1.0 = x nits, we would need to expose what the 
pipeline sees as 1.0 though. So in this case the multiplier would be 
programmable, but the divisor is informational? It seems like the latter 
would have an influence on how the former is programmed.

>>>
>>> Many TFs are intended to be defined only on [0.0, 1.0] -> [0.0, 1.0].
>>> Some curves, like power 2.2, have a mathematical form that naturally
>>> extends outside of that range. Power 2.2 generalizes to >1.0 input
>>> values as is, but not for negative input values. If needed for negative
>>> input values, it is common to use y = -TF(-x) for x < 0 mirroring.
>>>
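
The y = -TF(-x) mirroring can be sketched in C as a wrapper around any
transfer function defined for non-negative input. The names below are
illustrative only, and tf_square is a stand-in curve (power 2.0, not 2.2)
chosen so that the example needs nothing beyond plain arithmetic.

```c
/* Any transfer function defined for x >= 0. */
typedef double (*tf_fn)(double);

/* Evaluate tf for x >= 0 and mirror it through the origin for x < 0. */
static double tf_mirrored(tf_fn tf, double x)
{
	return x < 0.0 ? -tf(-x) : tf(x);
}

/* Stand-in curve for illustration only: a pure power 2.0 function. */
static double tf_square(double x)
{
	return x * x;
}
```

For example, tf_mirrored(tf_square, -0.5) evaluates the curve at 0.5 and
negates the result.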
>>> scRGB is the prime example that intentionally uses negative channel
>>> values. We can also have negative channel values with limited
>>> quantization range, sometimes even intentionally (xvYCC chroma, or
>>> PLUGE test sub-blacks). Out-of-unit-range values can also appear as a
>>> side-effect of signal processing, and they should not get clipped
>>> prematurely. This is a challenge for colorops that fundamentally cannot
>>> handle out-of-unit-range values.
>>>
>>> There are various workarounds. scRGB colorimetry can be converted into
>>> BT.2020 primaries for example, to avoid saturation induced negative
>>> values. Limited quantization range signal could be processed as-is,
>>> meaning that the limited range is mapped to [16.0/255, 235.0/255]
>>> instead of [0.0, 1.0] or so. But then, we have a complication with
>>> transfer functions.
>>>
>>>> I think a multiplier color op solves all of these issues. Named curves and
>>>> custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
>>>> be adjusted by the multiplier.
>>>
>>> Pretty much.
>>>
>>>> For 80 nit SDR content, set it to 1, for 400
>>>> nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc.
>>>
>>> That I think is another story.
>>>
>>>>>>>>> +
>>>>>>>>> +    /* custom 4k entry 1D LUT */
>>>>>>>>> +    Color operation 52
>>>>>>>>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
>>>>>>>>> +    ├─ "BYPASS": bool {true, false}
>>>>>>>>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
>>>>>>>>> +    ├─ "LUT_1D": blob
>>>>>>>>> +    └─ "NEXT": immutable color operation ID = 0
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> +Driver Forward/Backward Compatibility
>>>>>>>>> +=====================================
>>>>>>>>> +
>>>>>>>>> +As this is uAPI drivers can't regress color pipelines that have been
>>>>>>>>> +introduced for a given HW generation. New HW generations are free to
>>>>>>>>> +abandon color pipelines advertised for previous generations.
>>>>>>>>> +Nevertheless, it can be beneficial to carry support for existing color
>>>>>>>>> +pipelines forward as those will likely already have support in DRM
>>>>>>>>> +clients.
>>>>>>>>> +
>>>>>>>>> +Introducing new colorops to a pipeline is fine, as long as they can be
>>>>>>>>> +disabled or are purely informational. DRM clients implementing support
>>>>>>>>> +for the pipeline can always skip unknown properties as long as they can
>>>>>>>>> +be confident that doing so will not cause unexpected results.
>>>>>>>>> +
>>>>>>>>> +If a new colorop doesn't fall into one of the above categories
>>>>>>>>> +(bypassable or informational) the modified pipeline would be unusable
>>>>>>>>> +for user space. In this case a new pipeline should be defined.
>>>>>>>>
>>>>>>>> How can user space detect an informational element? Should we just add a
>>>>>>>> BYPASS property to informational elements, make it read only and set to
>>>>>>>> true maybe? Or something more descriptive?
>>>>>>>
>>>>>>> Read-only BYPASS set to true would be fine by me, I guess.
>>>>>>>
>>>>>>
>>>>>> Don't you mean set to false? An informational element will always do
>>>>>> something, so it can't be bypassed.
>>>>>
>>>>> Yeah, this is why we need a definition. I understand "informational" to
>>>>> not change pixel values in any way. Previously I had some weird idea
>>>>> that scaling doesn't alter color, but of course it may.
>>>>
>>>> On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
>>>> implicit fixed-point to FP16 conversions, and vice versa.
>>>
>>> Above, I claimed that the UAPI should be defined in nominal
>>> floating-point values, but I wonder, would that work? Would we need to
>>> have explicit colorops for converting from raw pixel data values into
>>> nominal floating-point in the UAPI?
> 
> Yeah, I think something like that is needed, or another solution as discussed
> below. Even if we define the UAPI in terms of floating point, the actual
> underlying pixel format needs to match the expectations of each stage as it
> flows through the pipe.
> 

Strongly agree on this. Pixel format and block relationships definitely 
exist.

>>>
>>>> For example, the "degamma" LUT towards the beginning of the pipeline implicitly
>>>> converts from fixed point to FP16, and some of the following operations expect
>>>> to operate in FP16. As such, if you have a fixed point input and don't bypass
>>>> those following operations, you *must not* bypass the LUT, even if you are
>>>> otherwise just programming it with the identity. Conversely, if you have a
>>>> floating point input, you *must* bypass the LUT.
>>>
>>> Interesting. Since the color pipeline is not(?) meant to replace pixel
>>> format definitions which already make the difference between fixed and
>>> floating point, wouldn't this little detail need to be taken care of by
>>> the driver under the hood?
> 
> We could take care of it under the hood in the case where the pixel format is
> fixed point but the "degamma" LUT is bypassed, simply by programming it with the
> identity to allow for the conversion to take place. But when the pixel format is
> FP16 and the "degamma" LUT is *not* bypassed, we would need to either ignore the
> LUT (bad) or fail the atomic commit. That's why we need some way to communicate
> the restriction to the client, otherwise they are left guessing why the atomic
> commit failed.
> 
>>>
>>> What if I want to use degamma colorop with a floating-point
>>> framebuffer? Simply not possible on this hardware?
> 
> Right, it's not possible. The "degamma" LUT always does an implicit conversion
> from fixed point to FP16, so if the pixel format is already FP16 it isn't
> usable. However, the aforementioned static (actually programmable) LUT that
> follows the "degamma" LUT expects FP16 pixels, so you could still use that to do
> some kind of transformation. That's actually a good example of a novel use that
> justifies compositors being able to program it.
> 
>>>
>>>> Could informational elements and allowing the exclusion of the BYPASS property
>>>> be used to convey this information to the client?  For example, we could expose
>>>> one pipeline with the LUT exposed with read-only BYPASS set to false, and
>>>> sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
>>>> fixed point input. Then, expose another pipeline with the LUT missing, and an
>>>> informational "FP16" element in its place to accommodate floating point input.
>>>>
>>>> That's just an example; we also have other operations in the pipeline that do
>>>> similar implicit conversions. In these cases we don't want the operations to be
>>>> bypassed individually, so instead we would expose them as mandatory in some
>>>> pipelines and missing in others, with informational elements to help inform the
>>>> client of which to choose. Is that acceptable under the current proposal?
>>>>
>>>> Note that in this case, the information just has to do with what format the
>>>> pixels should be in, it doesn't correspond to any specific operation. So, I'm
>>>> not sure that BYPASS has any meaning for informational elements in this context.
>>>
>>> Very good questions. Do we have to expose those conversions in the UAPI
>>> to make things work for this hardware? Meaning that we cannot assume all
>>> colorops work in nominal floating-point from userspace perspective
>>> (perhaps with varying degrees of precision).
>>
>> I had this in my original proposal I think (maybe I only thought about
>> it, not sure).
>>
>> We really should figure this one out. Can we get away with normalized
>> [0,1] fp as a user space abstraction or not?
> 
> I think the conversion needs to be exposed at least just the one time at the
> beginning alongside the "degamma" LUT, since the choice is influenced by an outside
> factor (the input pixel format). There are subsequent intermediate conversions
> as well, but that's only an issue if we allow the relevant color ops to be
> bypassed individually. If we expose a multitude of pipes where the relevant ops
> are either missing or mandatory in unison, we can avoid mismatched pixel formats
> while maintaining the illusion of a pipe that operates entirely in floating
> point.
> 
> Or, pipes could just have explicit associated input pixel format(s). The above
> technique of exposing multiple pipes instead of bypassing color ops individually
> would still work, and clients would just have to choose a pipe that matches the
> input pixel format. That way, the actual color ops themselves could still be
> defined in terms of normalized [0.0, 1.0] floating point (multipliers/dividers
> excepted), and clients can continue thinking in terms of that after making the
> initial selection.
> 
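
To make the "pipes tagged with input pixel formats" idea concrete, here is a
hypothetical C sketch of how a client might pick a pipe. Nothing like
struct pipeline_desc exists in the proposal; the struct, its fields, and
pick_pipeline() are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a pipeline tagged with accepted formats. */
struct pipeline_desc {
	uint32_t id;
	const uint32_t *formats;	/* format codes accepted as input */
	size_t num_formats;
};

/* Return the first pipeline that accepts the given format, or NULL. */
static const struct pipeline_desc *
pick_pipeline(const struct pipeline_desc *pipes, size_t num_pipes,
	      uint32_t format)
{
	for (size_t i = 0; i < num_pipes; i++)
		for (size_t j = 0; j < pipes[i].num_formats; j++)
			if (pipes[i].formats[j] == format)
				return &pipes[i];
	return NULL;
}
```

After this one selection step, the client could keep treating every colorop in
the chosen pipe as operating on normalized floating point.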
>>
>>>
>>>>>>> I think we also need a definition of "informational".
>>>>>>>
>>>>>>> Counter-example 1: a colorop that represents a non-configurable
>>>>>>
>>>>>> Not sure what's "counter" for these examples?
>>>>>>
>>>>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
>>>>>>> format. It cannot be set to bypass, it cannot be configured, and it
>>>>>>> will alter color values.
>>>>
>>>> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
>>>> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
>>>> based on the principle that read-only blobs could be used to express some static
>>>> pipeline elements without the need to define a new type, but got mixed opinions.
>>>> I think this demonstrates the principle further, as clients could detect this
>>>> programmatically instead of having to special-case the informational element.
>>>
>>
>> I'm all for exposing fixed color ops but I suspect that most of those
>> follow some standard and in those cases instead of exposing the matrix
>> values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
>> BT.2020).
>>
>> As a general rule: always expose the highest level description. Going
>> from a name to exact values is trivial, going from values to a name is
>> much harder.
> 
> Good point. It would need to be a conversion between any two defined color
> spaces e.g. BT.709-to-BT.2020, hence why it's much harder to go backwards.
> 

A small advantage of providing name + values (or just blob ID) is that 
if the compositor needs to make a GPU shader that matches the hardware 
they could refer to the matrix values from the driver instead of having 
their own copy of what the standard says the conversion should be.

>>> If the blob depends on the pixel format (i.e. the driver automatically
>>> chooses a different blob per pixel format), then I think we would need
>>> to expose all the blobs and how they correspond to pixel formats.
>>> Otherwise ok, I guess.
>>>
>>> However, do we want or need to make a color pipeline or colorop
>>> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
>>> of pixel format, then you must use this pipeline and not any other. Or
>>> floating-point type of pixel format. I did not anticipate this before,
>>> I assumed that all color pipelines and colorops are independent of the
>>> framebuffer pixel format. A specific colorop might have a property that
>>> needs to agree with the framebuffer pixel format, but I didn't expect
>>> further limitations.
>>
>> We could simply fail commits when the pipeline and pixel format don't
>> work together. We'll probably need some kind of ingress no-op node
>> anyway and maybe could list pixel formats there if required to make it
>> easier to find a working configuration.
> 
> Yeah, we could, but having to figure that out through trial and error would be
> unfortunate. Per above, it might be easiest to just tag pipelines with a pixel
> format instead of trying to include the pixel format conversion as a color op.
> 

I definitely think this is going to be needed. That said, this also 
means that compositors that don't know how to configure this pipeline 
might not be able to use the format.

If we take the FP16 example again, there may be some sort of default 
programming that allows the hardware to absorb the content, but 
avoiding clipping of the content couldn't be guaranteed. We would end up 
having a functional pipeline, but the output result could end up being 
less than ideal. It really will depend on how the input content is packed.

>>> "Without the need to define a new type" is something I think we need to
>>> consider case by case. I have a hard time giving a general opinion.
>>>
>>>>>>>
>>>>>>> Counter-example 2: image size scaling colorop. It might not be
>>>>>>> configurable, it is controlled by the plane CRTC_* and SRC_*
>>>>>>> properties. You still need to understand what it does, so you can
>>>>>>> arrange the scaling to work correctly. (Do not want to scale an image
>>>>>>> with PQ-encoded values as Josh demonstrated in XDC.)
>>>>>>>
>>>>>>
>>>>>> IMO the position of the scaling operation is the thing that's important
>>>>>> here as the color pipeline won't define scaling properties.
>>>>
>>>> I agree that blending should ideally be done in linear space, and I remember
>>>> that from Josh's presentation at XDC, but I don't recall the same being said for
>>>> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
>>>> pipeline that is meant to be in PQ space (more on this below), and that was
>>>> found to achieve better results at HDR/SDR boundaries. Of course, this only
>>>> bolsters the argument that it would be helpful to have an informational "scaler"
>>>> element to understand at which stage scaling takes place.
>>>
>>> Both blending and scaling are fundamentally the same operation: you
>>> have two or more source colors (pixels), and you want to compute a
>>> weighted average of them following what happens in nature, that is,
>>> physics, as that is what humans are used to.
>>>
>>> Both blending and scaling will suffer from the same problems if the
>>> operation is performed on not light-linear values. The result of the
>>> weighted average does not correspond to physics.
>>>
>>> The problem may be hard to observe with natural imagery, but Josh's
>>> example shows it very clearly. Maybe that effect is sometimes useful
>>> for some imagery in some use cases, but it is still an accidental
>>> side-effect. You might get even better results if you don't rely on
>>> accidental side-effects but design a separate operation for the exact
>>> goal you have.
>>>
>>> Mind, by scaling we mean changing image size. Not scaling color values.
>>>
> 
> Fair enough, but it might not always be a choice given the hardware.
> 

Agreeing with Alex here. I get there is some debate over the best way to 
do this, but I think it is best to leave it up to the driver to declare 
how that is done.

>>>>>>> Counter-example 3: image sampling colorop. Averages FB originated color
>>>>>>> values to produce a color sample. Again do not want to do this with
>>>>>>> PQ-encoded values.
>>>>>>>
>>>>>>
>>>>>> Wouldn't this only happen during a scaling op?
>>>>>
>>>>> There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
>>>>> coordinates can be fractional, which makes nearest vs. bilinear
>>>>> sampling have a difference even if there is no scaling.
>>>>>
>>>>> There is also the question of chroma siting with sub-sampled YUV. I
>>>>> don't know how that actually works, or how it theoretically should work.
>>>>
>>>> We have some operations in our pipeline that are intended to be static, i.e. a
>>>> static matrix that converts from RGB to LMS, and later another that converts
>>>> from LMS to ICtCp. There are even LUTs that are intended to be static,
>>>> converting from linear to PQ and vice versa. All of this is because the
>>>> pre-blending scaler and tone mapping operator are intended to operate in ICtCp
>>>> PQ space. Although the stated LUTs and matrices are intended to be static, they
>>>> are actually programmable. In offline discussions, it was indicated that it
>>>> would be helpful to actually expose the programmability, as opposed to exposing
>>>> them as non-bypassable blocks, as some compositors may have novel uses for them.
>>>
>>> Correct. Doing tone-mapping in ICtCp etc. are already policy that
>>> userspace might or might not agree with.
>>>
>>> Exposing static colorops will help usages that adhere to current
>>> prevalent standards around very specific use cases. There may be
>>> millions of devices needing exactly that processing in their usage, but
>>> it is also quite limiting in what one can do with the hardware.
>>>
>>>> Despite being programmable, the LUTs are updated in a manner that is less
>>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
>>>> if there was some way to tag operations according to their performance,
>>>> for example so that clients can prefer a high performance one when they
>>>> intend to do an animated transition? I recall from the XDC HDR workshop
>>>> that this is also an issue with AMD's 3DLUT, where updates can be too
>>>> slow to animate.
>>>
>>> I can certainly see such information being useful, but then we need to
>>> somehow quantize the performance.
> 
> Right, which wouldn't even necessarily be universal, could depend on the given
> host, GPU, etc. It could just be a relative performance indication, to give an
> order of preference. That wouldn't tell you if it can or can't be animated, but
> when choosing between two LUTs to animate you could prefer the higher
> performance one.
> 
>>>
>>> What I was left puzzled about after the XDC workshop is whether it is
>>> possible to pre-load configurations in the background (slow), and then
>>> quickly switch between them? Hardware-wise I mean.
> 
> This works fine for our "fast" LUTs, you just point them to a surface in video
> memory and they flip to it. You could keep multiple surfaces around and flip
> between them without having to reprogram them in software. We can easily do that
> with enumerated curves, populating them when the driver initializes instead of
> waiting for the client to request them. You can even point multiple hardware
> LUTs to the same video memory surface, if they need the same curve.
> 
>>
>> We could define that pipelines with a lower ID are to be preferred over
>> higher IDs.
> 
> Sure, but this isn't just an issue with a pipeline as a whole, but the
> individual elements within it and how to use them in a given context.
> 
>>
>> The issue is that if programming a pipeline becomes too slow to be
>> useful it probably should just not be made available to user space.
> 
> It's not that programming the pipeline is overall too slow. The LUTs we have
> that are relatively slow to program are meant to be set infrequently, or even
> just once, to allow the scaler and tone mapping operator to operate in fixed
> point PQ space. You might still want the tone mapper, so you would choose a
> pipeline that includes them, but when it comes to e.g. animating a night light,
> you would want to choose a different LUT for that purpose.
> 
>>
>> The prepare-commit idea for blob properties would help to make the
>> pipelines usable again, but until then it's probably a good idea to just
>> not expose those pipelines.
> 
> The prepare-commit idea actually wouldn't work for these LUTs, because they are
> programmed using methods instead of pointing them to a surface. I'm actually not
> sure how slow it actually is, would need to benchmark it. I think not exposing
> them at all would be overkill, since it would mean you can't use the preblending
> scaler or tonemapper, and animation isn't necessary for that.
> 
> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> obviously be a major loss if that wasn't exposed. There just needs to be some
> way for clients to know if they are going to kill performance by trying to
> change it every frame.
> 
> Thanks,
> Alex
> 

To clarify, what are we defining as slow to update here? Something we 
aren't able to update within a frame (let's say at a low frame rate such 
as 30 fps for discussion's sake)? A block that requires a programming 
sequence of disable + program + enable to update? Defining performance 
seems like it can get murky if we start to consider frame concurrent 
updates among multiple color blocks as well.

Thanks,
Christopher

>>
>>>
>>>
>>> Thanks,
>>> pq
>>
>>


* Re: [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array
  2023-10-30 13:29   ` Pekka Paalanen
@ 2023-11-06 20:48     ` Harry Wentland
  0 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-11-06 20:48 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Uma Shankar, Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton



On 2023-10-30 09:29, Pekka Paalanen wrote:
> On Thu, 19 Oct 2023 17:21:21 -0400
> Harry Wentland <harry.wentland@amd.com> wrote:
> 
>> When the floor LUT index (drm_fixp2int(lut_index)) is the last
>> index of the array the ceil LUT index will point to an entry
>> beyond the array. Make sure we guard against it and use the
>> value of the floor LUT index.
>>
>> Blurb about LUT creation and how first element should be 0x0 and
>> last one 0xffff.
>>
>> Hold on, is that even correct? What should the ends of a LUT be?
>> How does UNORM work and how does it apply to LUTs?
> 
> Do you mean how should UNORM input value map to LUT entries for LUT
> indexing?
> 
> I suppose UNORM 16-bit converts to nominal real values as:
> - 0x0: 0.0
> - 0xffff: 1.0
> 
> And in LUT, you want 0.0 to map to the first LUT element exactly, and
> 1.0 to map to the last LUT element exactly, whatever
> interpolation may be in use, right?
> 
> If so, it is important to make sure that, assuming linear interpolation
> for instance, there is no "dead zone" at either end. Given high
> interpolation precision, any step away from 0.0 or 1.0 needs to imply a
> change in the real-valued output, assuming e.g. identity LUT.
> 
> If LUT has N elements, and 16-bit UNORM input value is I, then (in
> naive real-valued math, so no implicit truncation between operations)
> 
> x = I / 0xffff * (N - 1)
> ia = floor(x)
> ib = min(ia + 1, N - 1)
> 
> f = x - floor(x)
> y = (1 - f) * LUT[ia] + f * LUT[ib]
> 
> 
> Does that help?
> 

Thanks. Yes, this is what the code is doing (with this commit).

The commit description was an oversight and only reflects my initial
thoughts when coding it, before I made sure this is the right way
to go about it. I'll update it.
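
For anyone following along, the real-valued math quoted above can be sketched
in plain C like this. It is a floating-point illustration only, not the VKMS
code (which uses DRM fixed-point helpers), and lut_lookup() is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Linearly interpolate a LUT of n >= 1 entries for a 16-bit UNORM input.
 * The upper neighbour index is clamped so that an input of 0xffff reads
 * the last entry and never an entry beyond the array.
 */
static double lut_lookup(const double *lut, size_t n, uint16_t input)
{
	double x = (double)input / 65535.0 * (double)(n - 1);
	size_t ia = (size_t)x;				/* floor, as x >= 0 */
	size_t ib = ia + 1 < n ? ia + 1 : n - 1;	/* min(ia + 1, n - 1) */
	double f = x - (double)ia;

	return (1.0 - f) * lut[ia] + f * lut[ib];
}
```

The clamp on ib plays the same role as the end-of-array check added by this
patch.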

Harry

> In my mind, I'm thinking of a uniformly distributed LUT as a 1-D
> texture, because that's how I have implemented them in GL. There you
> have to be careful so that input values 0.0 and 1.0 map to the *center*
> of the first and last texel, and not to the edges of the texture like
> texture coordinates do. Then you can use the GL linear texture
> interpolation as-is.
> 
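
The texel-center mapping described here is commonly written as below. This is
a sketch of the usual formula for an n-texel 1-D texture, not code taken from
any particular compositor; lut_texcoord() is a hypothetical name.

```c
/*
 * Texture coordinate for a LUT input x in [0.0, 1.0] on an n-texel 1-D
 * texture: 0.0 lands on the center of the first texel (0.5 / n) and 1.0
 * lands on the center of the last texel ((n - 0.5) / n).
 */
static double lut_texcoord(double x, unsigned int n)
{
	return (x * (double)(n - 1) + 0.5) / (double)n;
}
```

With n = 4, inputs 0.0 and 1.0 map to coordinates 0.125 and 0.875, so linear
filtering never blends past the ends of the table.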
> 
> Thanks,
> pq
> 
> 
>> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>> Cc: Simon Ser <contact@emersion.fr>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Melissa Wen <mwen@igalia.com>
>> Cc: Jonas Ådahl <jadahl@redhat.com>
>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Cc: Alexander Goins <agoins@nvidia.com>
>> Cc: Joshua Ashton <joshua@froggi.es>
>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>> Cc: Aleix Pol <aleixpol@kde.org>
>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>> Cc: Victoria Brekenfeld <victoria@system76.com>
>> Cc: Sima <daniel@ffwll.ch>
>> Cc: Uma Shankar <uma.shankar@intel.com>
>> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
>> Cc: Christopher Braga <quic_cbraga@quicinc.com>
>> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
>> Cc: Arthur Grillo <arthurgrillo@riseup.net>
>> Cc: Hector Martin <marcan@marcan.st>
>> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
>> Cc: Sasha McIntosh <sashamcintosh@google.com>
>> ---
>>  drivers/gpu/drm/vkms/vkms_composer.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index a0a3a6fd2926..cf1dff162920 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -123,6 +123,8 @@ static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 chan
>>  				      enum lut_channel channel)
>>  {
>>  	s64 lut_index = get_lut_index(lut, channel_value);
>> +	u16 *floor_lut_value, *ceil_lut_value;
>> +	u16 floor_channel_value, ceil_channel_value;
>>  
>>  	/*
>>  	 * This checks if `struct drm_color_lut` has any gap added by the compiler
>> @@ -130,11 +132,15 @@ static u16 apply_lut_to_channel_value(const struct vkms_color_lut *lut, u16 chan
>>  	 */
>>  	static_assert(sizeof(struct drm_color_lut) == sizeof(__u16) * 4);
>>  
>> -	u16 *floor_lut_value = (__u16 *)&lut->base[drm_fixp2int(lut_index)];
>> -	u16 *ceil_lut_value = (__u16 *)&lut->base[drm_fixp2int_ceil(lut_index)];
>> +	floor_lut_value = (__u16 *)&lut->base[drm_fixp2int(lut_index)];
>> +	if (drm_fixp2int(lut_index) == (lut->lut_length - 1))
>> +		/* We're at the end of the LUT array, use same value for ceil and floor */
>> +		ceil_lut_value = floor_lut_value;
>> +	else
>> +		ceil_lut_value = (__u16 *)&lut->base[drm_fixp2int_ceil(lut_index)];
>>  
>> -	u16 floor_channel_value = floor_lut_value[channel];
>> -	u16 ceil_channel_value = ceil_lut_value[channel];
>> +	floor_channel_value = floor_lut_value[channel];
>> +	ceil_channel_value = ceil_lut_value[channel];
>>  
>>  	return lerp_u16(floor_channel_value, ceil_channel_value,
>>  			lut_index & DRM_FIXED_DECIMAL_MASK);
> 



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-25 20:16           ` Alex Goins
  2023-10-26  8:57             ` Pekka Paalanen
@ 2023-11-07 16:52             ` Harry Wentland
  1 sibling, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-11-07 16:52 UTC (permalink / raw)
  To: Alex Goins, Pekka Paalanen
  Cc: Sebastian Wick, Sasha McIntosh, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Victoria Brekenfeld,
	dri-devel, wayland-devel, Melissa Wen, Michel Dänzer,
	Jonas Ådahl, Joshua Ashton, Aleix Pol, Naseer Ahmed,
	Uma Shankar, Christopher Braga, Arthur Grillo



On 2023-10-25 16:16, Alex Goins wrote:
> Thank you Harry and all other contributors for your work on this. Responses
> inline -
> 

Thanks for your comments on this. Apologies for the late response.
I was focussing on the simpler responses to my patch set first and
left yours for last as it's the most interesting.

> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> 
>> On Fri, 20 Oct 2023 11:23:28 -0400
>> Harry Wentland <harry.wentland@amd.com> wrote:
>>
>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>   
>>>>> Thanks for continuing to work on this!
>>>>>
>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:  
>>>>>> v2:
>>>>>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>>>>  - Updated wording (Pekka)
>>>>>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>>>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>>>>    section (Pekka)
>>>>>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>>>>  - Add "Driver Implementer's Guide" section (Pekka)
>>>>>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>>>
>>>> ...
>>>>
>>>>>> +An example of a drm_colorop object might look like one of these::
>>>>>> +
>>>>>> +    /* 1D enumerated curve */
>>>>>> +    Color operation 42
>>>>>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>>>>>> +    ├─ "BYPASS": bool {true, false}
>>>>>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
>>>>>> +    └─ "NEXT": immutable color operation ID = 43
> 
> I know these are just examples, but I would also like to suggest the possibility
> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
> compared to setting an identity in some cases depending on the hardware. See
> below for more on this, RE: implicit format conversions.
> 
> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
> offline discussions that it would nonetheless be helpful to expose enumerated
> curves in order to hide the vendor-specific complexities of programming
> segmented LUTs from clients. In that case, we would simply refer to the
> enumerated curve when calculating/choosing segmented LUT entries.
> 
> Another thing that came up in offline discussions is that we could use multiple
> color operations to program a single operation in hardware. As I understand it,
> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
> we could combine them into a singular LUT in software, such that you can combine
> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
> precision from the custom LUT where it overlaps with the linear section of the
> enumerated curve, but that is unavoidable and shouldn't be an issue in most
> use-cases.
> 

FWIW, for the most part we don't have ROMs followed by custom LUTs. We have
either a ROM-based HW block or a segmented programmable LUT. In the case of the
former we will only expose named transfer functions. In the case of the latter
we expose a named TF, followed by custom LUT and merge them into one segmented
LUT.

> Actually, the current examples in the proposal don't include a multiplier color
> op, which might be useful. For AMD as above, but also for NVIDIA as the
> following issue arises:
> 

The current examples are only examples. A multiplier colorop would make a lot
of sense.

> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
> to in floating point varies depending on the source content. If it's SDR
> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
> correct?
> 

Our PQ transfer function will also map to [0.0, 125.0] without use of the HDR
multiplier. The HDR multiplier is intended to be used to scale SDR brightness
when the user moves the SDR brightness slider in the OS.
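
As an illustration of the arithmetic behind such a multiplier, here is a
minimal sketch. It assumes, as in the quoted text, that a value of 1.0
corresponds to 80 nits. The helper names are hypothetical and not part of
any proposed uAPI.

```c
#include <assert.h>

/* Assumption from the thread: a value of 1.0 corresponds to 80 nits. */
#define REFERENCE_WHITE_NITS 80.0

/* Hypothetical helper: the multiplier that maps 1.0 to target_nits. */
static double multiplier_for_peak_nits(double target_nits)
{
	assert(target_nits > 0.0);
	return target_nits / REFERENCE_WHITE_NITS;
}

/* Apply a multiplier colorop to a single channel value. */
static double apply_multiplier(double value, double multiplier)
{
	return value * multiplier;
}
```

Under that assumption, 80 nits gives a multiplier of 1, 400 nits gives 5,
and 10,000 nits gives 125.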

> From the given enumerated curves, it's not clear how they would map to the
> above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
> value of 125.0? That may work, but it tends towards the "descriptive" notion of

Yes, I think we need to be clear about the output range of a named transfer
function. While AMD and NVIDIA map PQ to [0.0, 125.0] I could see others map
it to [0.0, 1.0] (and maybe scale sRGB down to 1/125.0 or some other value).

> assuming the source content, which may not be accurate in all cases. This is
> also an issue for the custom 1D LUT, as the blob will need to be converted to
> FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
> value be, given that we no longer have any hint as to the source content?
> 

I consider input data to be in UNORM and convert that to [0.0, 1.0]. Transfer
functions (such as PQ) might then scale that beyond the [0.0, 1.0] range.

> I think a multiplier color op solves all of these issues. Named curves and
> custom 1D LUTs would by default assume a max FP16 value of 1.0, which would then
> be adjusted by the multiplier. For 80 nit SDR content, set it to 1, for 400
> nit SDR content, set it to 5, for HDR PQ content, set it to 125, etc. 
> 

The custom ROMs won't allow adjustment on AMD HW, so it would then need to be a
fixed multiplier. I would be in favor of defining the named PQ curve as

DRM_COLOROP_1D_CURVE_PQ_125_EOTF

for the [0.0, 125.0] TF, or as

DRM_COLOROP_1D_CURVE_PQ_1_EOTF

for HW that maps it to [0.0, 1.0]
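
A rough sketch of how the two conventions relate, assuming 1.0 means
80 nits in the [0.0, 125.0] variant and 10,000 nits in the [0.0, 1.0]
variant. The enum and helpers below are hypothetical and only mirror the
two names above. The PQ curve itself is not implemented here.

```c
#include <assert.h>

/* Hypothetical names mirroring the two proposed curve variants. */
enum pq_output_range {
	PQ_OUTPUT_RANGE_125,	/* output in [0.0, 125.0], 1.0 == 80 nits */
	PQ_OUTPUT_RANGE_1,	/* output in [0.0, 1.0], 1.0 == 10,000 nits */
};

static double pq_output_peak(enum pq_output_range range)
{
	return range == PQ_OUTPUT_RANGE_125 ? 125.0 : 1.0;
}

/* Convert a PQ EOTF output value from one convention to the other. */
static double pq_output_convert(double value, enum pq_output_range from,
				enum pq_output_range to)
{
	assert(value >= 0.0 && value <= pq_output_peak(from));
	return value * pq_output_peak(to) / pq_output_peak(from);
}

/* Absolute luminance in cd/m^2 represented by a PQ EOTF output value. */
static double pq_output_to_nits(double value, enum pq_output_range range)
{
	return value * (10000.0 / pq_output_peak(range));
}
```

Userspace that knows which variant a driver exposes can then insert the
matching scale factor elsewhere in its pipeline.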

>>>>>> +
>>>>>> +    /* custom 4k entry 1D LUT */
>>>>>> +    Color operation 52
>>>>>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
>>>>>> +    ├─ "BYPASS": bool {true, false}
>>>>>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
>>>>>> +    ├─ "LUT_1D": blob
>>>>>> +    └─ "NEXT": immutable color operation ID = 0
>>>>
>>>> ...
>>>>   
>>>>>> +Driver Forward/Backward Compatibility
>>>>>> +=====================================
>>>>>> +
>>>>>> +As this is uAPI drivers can't regress color pipelines that have been
>>>>>> +introduced for a given HW generation. New HW generations are free to
>>>>>> +abandon color pipelines advertised for previous generations.
>>>>>> +Nevertheless, it can be beneficial to carry support for existing color
>>>>>> +pipelines forward as those will likely already have support in DRM
>>>>>> +clients.
>>>>>> +
>>>>>> +Introducing new colorops to a pipeline is fine, as long as they can be
>>>>>> +disabled or are purely informational. DRM clients implementing support
>>>>>> +for the pipeline can always skip unknown properties as long as they can
>>>>>> +be confident that doing so will not cause unexpected results.
>>>>>> +
>>>>>> +If a new colorop doesn't fall into one of the above categories
>>>>>> +(bypassable or informational) the modified pipeline would be unusable
>>>>>> +for user space. In this case a new pipeline should be defined.    
>>>>>
>>>>> How can user space detect an informational element? Should we just add a
>>>>> BYPASS property to informational elements, make it read only and set to
>>>>> true maybe? Or something more descriptive?  
>>>>
>>>> Read-only BYPASS set to true would be fine by me, I guess.
>>>>   
>>>
>>> Don't you mean set to false? An informational element will always do
>>> something, so it can't be bypassed.
>>
>> Yeah, this is why we need a definition. I understand "informational" to
>> not change pixel values in any way. Previously I had some weird idea
>> that scaling doesn't alter color, but of course it may.
> 
> On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
> implicit fixed-point to FP16 conversions, and vice versa.
> 
> For example, the "degamma" LUT towards the beginning of the pipeline implicitly
> converts from fixed point to FP16, and some of the following operations expect
> to operate in FP16. As such, if you have a fixed point input and don't bypass
> those following operations, you *must not* bypass the LUT, even if you are
> otherwise just programming it with the identity. Conversely, if you have a
> floating point input, you *must* bypass the LUT.
> 
> Could informational elements and allowing the exclusion of the BYPASS property
> be used to convey this information to the client?  For example, we could expose
> one pipeline with the LUT exposed with read-only BYPASS set to false, and
> sandwich it with informational "Fixed Point" and "FP16" elements to accommodate
> fixed point input. Then, expose another pipeline with the LUT missing, and an
> informational "FP16" element in its place to accommodate floating point input.
> 

I wonder if an informational element at the beginning of the pipeline can
advertise the FOURCC formats this pipeline can operate on. For AMD HW we also
have certain things we can only do on RGB and not on NV12, for example.
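
To make the compatibility rule from the quoted doc text concrete, here is
a minimal sketch of how a client might decide whether it can use a
pipeline. The structs are a hypothetical userspace model, not the DRM uAPI,
and informational elements are left out because their definition is still
under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COLOROPS 64	/* arbitrary bound for this sketch */

/* Hypothetical userspace model of a colorop; not the DRM uAPI structs. */
enum colorop_type {
	COLOROP_1D_CURVE,
	COLOROP_1D_LUT,
	COLOROP_CTM_3X4,
	COLOROP_UNKNOWN,	/* a type this client does not understand */
};

struct colorop {
	enum colorop_type type;
	bool has_bypass;		/* BYPASS is exposed and writable */
	const struct colorop *next;	/* NEXT, NULL at the end of the pipeline */
};

/*
 * A client can use a pipeline if every colorop it does not understand
 * can be bypassed.
 */
static bool pipeline_usable(const struct colorop *op)
{
	unsigned int hops = 0;

	for (; op; op = op->next) {
		hops++;
		assert(hops <= MAX_COLOROPS);	/* catch a cyclic NEXT chain */
		if (op->type == COLOROP_UNKNOWN && !op->has_bypass)
			return false;
	}
	return true;
}
```

A format-advertising element at the head of the pipeline would add one
more check to this walk.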

> That's just an example; we also have other operations in the pipeline that do
> similar implicit conversions. In these cases we don't want the operations to be
> bypassed individually, so instead we would expose them as mandatory in some
> pipelines and missing in others, with informational elements to help inform the
> client of which to choose. Is that acceptable under the current proposal?
> 
> Note that in this case, the information just has to do with what format the
> pixels should be in, it doesn't correspond to any specific operation. So, I'm
> not sure that BYPASS has any meaning for informational elements in this context.
> 
>>>> I think we also need a definition of "informational".
>>>>
>>>> Counter-example 1: a colorop that represents a non-configurable  
>>>
>>> Not sure what's "counter" for these examples?
>>>
>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
>>>> format. It cannot be set to bypass, it cannot be configured, and it
>>>> will alter color values.
> 
> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> based on the principle that read-only blobs could be used to express some static
> pipeline elements without the need to define a new type, but got mixed opinions.
> I think this demonstrates the principle further, as clients could detect this
> programmatically instead of having to special-case the informational element.
> 

That's an option. But I think a "named matrix" type might make more sense so you
don't need to create a pipeline for each read-only matrix and so userspace
doesn't need to parse the read-only matrix to find out which conversion it does.
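
As a sketch of why the named form is easier on userspace, the function
below expands a name into 3x4 coefficients. The enum is hypothetical. The
Kr/Kb luma coefficients are the ones from BT.601, BT.709 and BT.2020. The
sketch assumes full-range Y'CbCr with Cb/Cr already centered on 0.0, so
the offset column is zero.

```c
#include <assert.h>

/* Hypothetical enum for a "named matrix" colorop. */
enum named_matrix {
	MATRIX_YCBCR_TO_RGB_BT601,
	MATRIX_YCBCR_TO_RGB_BT709,
	MATRIX_YCBCR_TO_RGB_BT2020,
};

/*
 * Fill a 3x4 matrix (3x3 coefficients plus an offset column) converting
 * full-range Y'CbCr, with Cb/Cr centered on 0.0, to R'G'B'.
 */
static void named_matrix_to_3x4(enum named_matrix name, double m[3][4])
{
	double kr, kb, kg;

	switch (name) {
	case MATRIX_YCBCR_TO_RGB_BT601:
		kr = 0.299; kb = 0.114;
		break;
	case MATRIX_YCBCR_TO_RGB_BT709:
		kr = 0.2126; kb = 0.0722;
		break;
	case MATRIX_YCBCR_TO_RGB_BT2020:
		kr = 0.2627; kb = 0.0593;
		break;
	default:
		assert(!"unknown named matrix");
		return;
	}
	kg = 1.0 - kr - kb;

	/* R' = Y' + 2(1 - Kr) Cr */
	m[0][0] = 1.0; m[0][1] = 0.0;
	m[0][2] = 2.0 * (1.0 - kr); m[0][3] = 0.0;
	/* G' = Y' - (Kb/Kg) 2(1 - Kb) Cb - (Kr/Kg) 2(1 - Kr) Cr */
	m[1][0] = 1.0; m[1][1] = -2.0 * (1.0 - kb) * kb / kg;
	m[1][2] = -2.0 * (1.0 - kr) * kr / kg; m[1][3] = 0.0;
	/* B' = Y' + 2(1 - Kb) Cb */
	m[2][0] = 1.0; m[2][1] = 2.0 * (1.0 - kb);
	m[2][2] = 0.0; m[2][3] = 0.0;
}
```

Going the other way, from a blob of coefficients back to "this is BT.709",
would need tolerance-based matching against every known matrix.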

>>>>
>>>> Counter-example 2: image size scaling colorop. It might not be
>>>> configurable, it is controlled by the plane CRTC_* and SRC_*
>>>> properties. You still need to understand what it does, so you can
>>>> arrange the scaling to work correctly. (Do not want to scale an image
>>>> with PQ-encoded values as Josh demonstrated in XDC.)
>>>>   
>>>
>>> IMO the position of the scaling operation is the thing that's important
>>> here as the color pipeline won't define scaling properties.
> 
> I agree that blending should ideally be done in linear space, and I remember
> that from Josh's presentation at XDC, but I don't recall the same being said for
> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
> pipeline that is meant to be in PQ space (more on this below), and that was
> found to achieve better results at HDR/SDR boundaries. Of course, this only
> bolsters the argument that it would be helpful to have an informational "scaler"
> element to understand at which stage scaling takes place.
> 

I think an informational scaler makes sense. It's interesting how different HW
vendors made different design decisions here as no OS ever really defined which
space they want scaling to be performed in.

>>>> Counter-example 3: image sampling colorop. Averages FB originated color
>>>> values to produce a color sample. Again do not want to do this with
>>>> PQ-encoded values.
>>>>   
>>>
>>> Wouldn't this only happen during a scaling op?
>>
>> There is certainly some overlap between examples 2 and 3. IIRC SRC_X/Y
>> coordinates can be fractional, which makes nearest vs. bilinear
>> sampling have a difference even if there is no scaling.
>>
>> There is also the question of chroma siting with sub-sampled YUV. I
>> don't know how that actually works, or how it theoretically should work.
> 
> We have some operations in our pipeline that are intended to be static, i.e. a
> static matrix that converts from RGB to LMS, and later another that converts
> from LMS to ICtCp. There are even LUTs that are intended to be static,
> converting from linear to PQ and vice versa. All of this is because the
> pre-blending scaler and tone mapping operator are intended to operate in ICtCp
> PQ space. Although the stated LUTs and matrices are intended to be static, they
> are actually programmable. In offline discussions, it was indicated that it
> would be helpful to actually expose the programmability, as opposed to exposing
> them as non-bypassable blocks, as some compositors may have novel uses for them.
> 
> Despite being programmable, the LUTs are updated in a manner that is less
> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> if there was some way to tag operations according to their performance,
> for example so that clients can prefer a high performance one when they
> intend to do an animated transition? I recall from the XDC HDR workshop
> that this is also an issue with AMD's 3DLUT, where updates can be too
> slow to animate.
> 

That's an interesting idea.

Harry

> Thanks,
> Alex Goins
> NVIDIA Linux Driver Team
> 
>> Thanks,
>> pq



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26  8:57             ` Pekka Paalanen
  2023-10-26 17:30               ` Sebastian Wick
@ 2023-11-07 16:52               ` Harry Wentland
  1 sibling, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-11-07 16:52 UTC (permalink / raw)
  To: Pekka Paalanen, Alex Goins
  Cc: Sebastian Wick, Sasha McIntosh, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Victoria Brekenfeld,
	dri-devel, wayland-devel, Melissa Wen, Michel Dänzer,
	Jonas Ådahl, Joshua Ashton, Aleix Pol, Naseer Ahmed,
	Uma Shankar, Christopher Braga, Arthur Grillo



On 2023-10-26 04:57, Pekka Paalanen wrote:
> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> Alex Goins <agoins@nvidia.com> wrote:
> 
>> Thank you Harry and all other contributors for your work on this. Responses
>> inline -
>>
>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>
>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>> Harry Wentland <harry.wentland@amd.com> wrote:
>>>   
>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:  
>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>>     
>>>>>> Thanks for continuing to work on this!
>>>>>>
>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:    
>>>>>>> v2:
>>>>>>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>>>>>>  - Updated wording (Pekka)
>>>>>>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>>>>>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>>>>>>    section (Pekka)
>>>>>>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>>>>>>  - Add "Driver Implementer's Guide" section (Pekka)
>>>>>>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)  
>>>>>
>>>>> ...
>>>>>  
>>>>>>> +An example of a drm_colorop object might look like one of these::
>>>>>>> +
>>>>>>> +    /* 1D enumerated curve */
>>>>>>> +    Color operation 42
>>>>>>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>>>>>>> +    ├─ "BYPASS": bool {true, false}
>>>>>>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
>>>>>>> +    └─ "NEXT": immutable color operation ID = 43  
>>
>> I know these are just examples, but I would also like to suggest the possibility
>> of an "identity" CURVE_1D_TYPE. BYPASS = true might get different results
>> compared to setting an identity in some cases depending on the hardware. See
>> below for more on this, RE: implicit format conversions.
>>
>> Although NVIDIA hardware doesn't use a ROM for enumerated curves, it came up in
>> offline discussions that it would nonetheless be helpful to expose enumerated
>> curves in order to hide the vendor-specific complexities of programming
>> segmented LUTs from clients. In that case, we would simply refer to the
>> enumerated curve when calculating/choosing segmented LUT entries.
> 
> That's a good idea.
> 
>> Another thing that came up in offline discussions is that we could use multiple
>> color operations to program a single operation in hardware. As I understand it,
>> AMD has a ROM-defined LUT, followed by a custom 4K entry LUT, followed by an
>> "HDR Multiplier". On NVIDIA we don't have these as separate hardware stages, but
>> we could combine them into a singular LUT in software, such that you can combine
>> e.g. segmented PQ EOTF with night light. One caveat is that you will lose
>> precision from the custom LUT where it overlaps with the linear section of the
>> enumerated curve, but that is unavoidable and shouldn't be an issue in most
>> use-cases.
> 
> Indeed.
> 
>> Actually, the current examples in the proposal don't include a multiplier color
>> op, which might be useful. For AMD as above, but also for NVIDIA as the
>> following issue arises:
>>
>> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
>> point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
>> to in floating point varies depending on the source content. If it's SDR
>> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
>> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
>> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
>> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
>> correct?
> 
> It would be against the UAPI design principles to tag content as HDR or
> SDR. What you can do instead is to expose a colorop with a multiplier of
> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
> that the input is SDR or HDR to get the expected multiplier. You will
> never know what the content actually is, anyway.
> 
> Of course, if we want to have an arbitrary multiplier colorop that is
> somewhat standard, as in, exposed by many drivers to ease userspace
> development, you can certainly use any combination of your hardware
> features you need to realize the UAPI prescribed mathematical operation.
> 
> Since we are talking about floating-point in hardware, a multiplier
> does not significantly affect precision.
> 
> In order to mathematically define all colorops, I believe it is
> necessary to define all colorops in terms of floating-point values (as
> in math), even if they operate on fixed-point or integer. By this I
> mean that if the input is 8 bpc unsigned integer pixel format for
> instance, 0 raw pixel channel value is mapped to 0.0 and 255 is mapped
> to 1.0, and the color pipeline starts with [0.0, 1.0], not [0, 255]
> domain. We have to agree on this mapping for all channels on all pixel
> formats. However, there is a "but" further below.
> 
> I also propose that quantization range is NOT considered in the raw
> value mapping, so that we can handle quantization range in colorops
> explicitly, allowing us to e.g. handle sub-blacks and super-whites when
> necessary. (These are currently impossible to represent in the legacy
> color properties, because everything is converted to full range and
> clipped before any color operations.)
> 

I pretty much agree with anything you say up to here. :)
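
A minimal sketch of the raw-value mapping described above, with
quantization range deliberately left out so that sub-blacks and
super-whites survive. unorm_to_nominal() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a raw unsigned-integer channel value of the given bit depth to the
 * nominal [0.0, 1.0] domain: 0 -> 0.0, all-ones -> 1.0. Quantization
 * range is deliberately ignored here.
 */
static double unorm_to_nominal(uint32_t raw, unsigned int bits)
{
	uint32_t max;

	assert(bits >= 1 && bits <= 32);
	max = bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
	assert(raw <= max);

	return (double)raw / (double)max;
}
```

For an 8 bpc limited-range buffer, black (16) lands at 16/255 and white
(235) at 235/255, leaving it to an explicit colorop to expand or keep
that range.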

>> From the given enumerated curves, it's not clear how they would map to the
>> above. Should sRGB EOTF have a max FP16 value of 1.0, and the PQ EOTF a max FP16
>> value of 125.0? That may work, but it tends towards the "descriptive" notion of
>> assuming the source content, which may not be accurate in all cases. This is
>> also an issue for the custom 1D LUT, as the blob will need to be converted to
>> FP16 in order to populate our "degamma" LUT. What should the resulting max FP16
>> value be, given that we no longer have any hint as to the source content?
> 
> In my opinion, all finite non-negative transfer functions should
> operate with [0.0, 1.0] domain and [0.0, 1.0] range, and that includes
> all sRGB, power 2.2, and PQ curves.
> 

That wouldn't work with AMD HW that encodes a PQ transfer function that
has an output range of [0.0, 125.0]. I suggest making the range a part
of the named TF definition.

> If we look at BT.2100, there is no such encoding even mentioned where
> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
> a built-in assumption what the color spaces are and what the conversion
> is aiming to do. IOW, I would say that choice is opinionated from the
> start. The multiplier in BT.2100 is always 10000.
> 

Sure, the choice is opinionated but a certain large OS vendor has had
a large influence in how HW vendors designed their color pipelines.

snip

>> On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
>> implicit fixed-point to FP16 conversions, and vice versa.
> 
> Above, I claimed that the UAPI should be defined in nominal
> floating-point values, but I wonder, would that work? Would we need to
> have explicit colorops for converting from raw pixel data values into
> nominal floating-point in the UAPI?
> 

I think it's important that we keep a level of abstraction at the driver level.
I'm not sure it would serve anyone if we defined this.

snip

>>>>> I think we also need a definition of "informational".
>>>>>
>>>>> Counter-example 1: a colorop that represents a non-configurable    
>>>>
>>>> Not sure what's "counter" for these examples?
>>>>   
>>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
>>>>> format. It cannot be set to bypass, it cannot be configured, and it
>>>>> will alter color values.  
>>
>> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
>> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
>> based on the principle that read-only blobs could be used to express some static
>> pipeline elements without the need to define a new type, but got mixed opinions.
>> I think this demonstrates the principle further, as clients could detect this
>> programmatically instead of having to special-case the informational element.
> 
> If the blob depends on the pixel format (i.e. the driver automatically
> chooses a different blob per pixel format), then I think we would need
> to expose all the blobs and how they correspond to pixel formats.
> Otherwise ok, I guess.
> 
> However, do we want or need to make a color pipeline or colorop
> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
> of pixel format, then you must use this pipeline and not any other. Or
> floating-point type of pixel format. I did not anticipate this before,
> I assumed that all color pipelines and colorops are independent of the
> framebuffer pixel format. A specific colorop might have a property that
> needs to agree with the framebuffer pixel format, but I didn't expect
> further limitations.
> 

Yes, I think we'll want that.

> "Without the need to define a new type" is something I think we need to
> consider case by case. I have a hard time giving a general opinion.
> 
>>>>>
>>>>> Counter-example 2: image size scaling colorop. It might not be
>>>>> configurable, it is controlled by the plane CRTC_* and SRC_*
>>>>> properties. You still need to understand what it does, so you can
>>>>> arrange the scaling to work correctly. (Do not want to scale an image
>>>>> with PQ-encoded values as Josh demonstrated in XDC.)
>>>>>     
>>>>
>>>> IMO the position of the scaling operation is the thing that's important
>>>> here as the color pipeline won't define scaling properties.  
>>
>> I agree that blending should ideally be done in linear space, and I remember
>> that from Josh's presentation at XDC, but I don't recall the same being said for
>> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
>> pipeline that is meant to be in PQ space (more on this below), and that was
>> found to achieve better results at HDR/SDR boundaries. Of course, this only
>> bolsters the argument that it would be helpful to have an informational "scaler"
>> element to understand at which stage scaling takes place.
> 
> Both blending and scaling are fundamentally the same operation: you
> have two or more source colors (pixels), and you want to compute a
> weighted average of them following what happens in nature, that is,
> physics, as that is what humans are used to.
> 
> Both blending and scaling will suffer from the same problems if the
> operation is performed on not light-linear values. The result of the
> weighted average does not correspond to physics.
> 
> The problem may be hard to observe with natural imagery, but Josh's
> example shows it very clearly. Maybe that effect is sometimes useful
> for some imagery in some use cases, but it is still an accidental
> side-effect. You might get even better results if you don't rely on
> accidental side-effects but design a separate operation for the exact
> goal you have.
> 

Many people looked at this problem inside AMD and probably at other
companies. Not all of them arrive at the same conclusion. The type of
image will also greatly affect what one considers better.

But it sounds like we'll need an informational scaling element at least
for compositors that care. Do we need that as a first iteration of a
working DRM/KMS solution, though? So far other OSes have not cared and
people have (probably) not complained about it.
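
To show numerically what the disagreement is about, here is a toy sketch
comparing a weighted average taken on encoded values with one taken on
light-linear values. It uses a pure gamma 2.0 decode as a stand-in for a
real transfer function (an assumption made only to keep the example
short), and it does not settle which result looks better for a given
image.

```c
#include <assert.h>

/*
 * Stand-in transfer function: a pure gamma 2.0 decode, chosen only to
 * keep the sketch short. Real content would use e.g. the sRGB or PQ EOTF.
 */
static double toy_eotf(double encoded)
{
	assert(encoded >= 0.0 && encoded <= 1.0);
	return encoded * encoded;
}

/* Average two encoded samples directly, then decode to light-linear. */
static double average_encoded_then_decode(double a, double b)
{
	return toy_eotf((a + b) / 2.0);
}

/* Decode both samples to light-linear first, then average. */
static double decode_then_average(double a, double b)
{
	return (toy_eotf(a) + toy_eotf(b)) / 2.0;
}
```

Averaging a black and a white sample gives 0.25 in light-linear terms on
the first path and 0.5 on the second. The same difference applies to a
scaler's filter taps and to blending.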

snip

>> Despite being programmable, the LUTs are updated in a manner that is less
>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
>> if there was some way to tag operations according to their performance,
>> for example so that clients can prefer a high performance one when they
>> intend to do an animated transition? I recall from the XDC HDR workshop
>> that this is also an issue with AMD's 3DLUT, where updates can be too
>> slow to animate.
> 
> I can certainly see such information being useful, but then we need to
> somehow quantize the performance.
> 
> What I was left puzzled about after the XDC workshop is whether it is
> possible to pre-load configurations in the background (slow), and then
> quickly switch between them. Hardware-wise I mean.
> 

On AMD HW, yes. How to fit that into the atomic API is a separate
question. :D

Harry

> 
> Thanks,
> pq



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26 17:30               ` Sebastian Wick
  2023-10-26 19:25                 ` Alex Goins
@ 2023-11-07 16:52                 ` Harry Wentland
  2023-11-07 21:17                   ` Sebastian Wick
  1 sibling, 1 reply; 49+ messages in thread
From: Harry Wentland @ 2023-11-07 16:52 UTC (permalink / raw)
  To: Sebastian Wick, Pekka Paalanen
  Cc: Sasha McIntosh, Abhinav Kumar, Shashank Sharma, Xaver Hugl,
	Hector Martin, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	wayland-devel, Melissa Wen, Michel Dänzer, Jonas Ådahl,
	Joshua Ashton, Aleix Pol, Naseer Ahmed, Uma Shankar,
	Christopher Braga, Arthur Grillo



On 2023-10-26 13:30, Sebastian Wick wrote:
> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>> Alex Goins <agoins@nvidia.com> wrote:
>>
>>> Thank you Harry and all other contributors for your work on this. Responses
>>> inline -
>>>
>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>
>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>> Harry Wentland <harry.wentland@amd.com> wrote:
>>>>   
>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:  
>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>>>     
>>>>>>> Thanks for continuing to work on this!
>>>>>>>
>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:    

snip

>>
>>>>>> I think we also need a definition of "informational".
>>>>>>
>>>>>> Counter-example 1: a colorop that represents a non-configurable    
>>>>>
>>>>> Not sure what's "counter" for these examples?
>>>>>   
>>>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
>>>>>> format. It cannot be set to bypass, it cannot be configured, and it
>>>>>> will alter color values.  
>>>
>>> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
>>> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
>>> based on the principle that read-only blobs could be used to express some static
>>> pipeline elements without the need to define a new type, but got mixed opinions.
>>> I think this demonstrates the principle further, as clients could detect this
>>> programmatically instead of having to special-case the informational element.
>>
> 
> I'm all for exposing fixed color ops but I suspect that most of those
> follow some standard and in those cases instead of exposing the matrix
> values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
> BT.2020).
> 

Agreed.
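
To illustrate how cheap the name-to-values direction is, here is a minimal
sketch that derives a full-range Y'CbCr to R'G'B' matrix from the luma
coefficients of a named standard. The table and function names are
hypothetical and not part of the proposed uAPI. Cb and Cr are assumed to be
centered on zero, and limited-range scaling and offsets are left out.

```python
# Luma coefficients (Kr, Kb) from the respective ITU-R recommendations.
LUMA_COEFFS = {
    "BT.601": (0.299, 0.114),
    "BT.709": (0.2126, 0.0722),
    "BT.2020": (0.2627, 0.0593),
}


def ycbcr_to_rgb_matrix(name):
    """Return a 3x3 full-range Y'CbCr -> R'G'B' matrix (columns: Y', Cb, Cr).

    Hypothetical helper for illustration only.
    """
    kr, kb = LUMA_COEFFS[name]
    kg = 1.0 - kr - kb
    return [
        [1.0, 0.0, 2.0 * (1.0 - kr)],
        [1.0, -2.0 * kb * (1.0 - kb) / kg, -2.0 * kr * (1.0 - kr) / kg],
        [1.0, 2.0 * (1.0 - kb), 0.0],
    ]


def apply_matrix(m, ycbcr):
    """Multiply a 3x3 matrix by a (Y', Cb, Cr) triple."""
    return [sum(row[i] * ycbcr[i] for i in range(3)) for row in m]
```

For BT.709 this yields the familiar 1.5748 (Cr to R') and 1.8556 (Cb to B')
coefficients. Recovering the name "BT.709" from a blob of such values would
need a tolerance-based comparison against every known standard.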

> As a general rule: always expose the highest level description. Going
> from a name to exact values is trivial, going from values to a name is
> much harder.
> 
>> If the blob depends on the pixel format (i.e. the driver automatically
>> chooses a different blob per pixel format), then I think we would need
>> to expose all the blobs and how they correspond to pixel formats.
>> Otherwise ok, I guess.
>>
>> However, do we want or need to make a color pipeline or colorop
>> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
>> of pixel format, then you must use this pipeline and not any other. Or
>> floating-point type of pixel format. I did not anticipate this before,
>> I assumed that all color pipelines and colorops are independent of the
>> framebuffer pixel format. A specific colorop might have a property that
>> needs to agree with the framebuffer pixel format, but I didn't expect
>> further limitations.
> 
> We could simply fail commits when the pipeline and pixel format don't
> work together. We'll probably need some kind of ingress no-op node
> anyway and maybe could list pixel formats there if required to make it
> easier to find a working configuration.
> 

The problem with failing commits is that userspace has no idea why they
failed. If that makes userspace fall back to SW composition for NV12 and
P010, we lose HW offloading in one of the most important power-saving
use-cases on AMD HW.

snip

>>> Despite being programmable, the LUTs are updated in a manner that is less
>>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
>>> if there was some way to tag operations according to their performance,
>>> for example so that clients can prefer a high performance one when they
>>> intend to do an animated transition? I recall from the XDC HDR workshop
>>> that this is also an issue with AMD's 3DLUT, where updates can be too
>>> slow to animate.
>>
>> I can certainly see such information being useful, but then we need to
>> somehow quantize the performance.
>>
>> What I was left puzzled about after the XDC workshop is that is it
>> possible to pre-load configurations in the background (slow), and then
>> quickly switch between them? Hardware-wise I mean.
> 
> We could define that pipelines with a lower ID are to be preferred over
> higher IDs.
> 
> The issue is that if programming a pipeline becomes too slow to be
> useful it probably should just not be made available to user space.
> 
> The prepare-commit idea for blob properties would help to make the
> pipelines usable again, but until then it's probably a good idea to just
> not expose those pipelines.
> 

It's a bit of a judgment call what's too slow, though. The value of having
a HW colorop might outweigh the cost of the programming time for some
compositors but not for others.

Harry

>>
>>
>> Thanks,
>> pq
> 
> 



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-26 19:25                 ` Alex Goins
  2023-10-27  8:59                   ` Michel Dänzer
  2023-11-04 23:01                   ` Christopher Braga
@ 2023-11-07 16:52                   ` Harry Wentland
  2 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-11-07 16:52 UTC (permalink / raw)
  To: Alex Goins, Sebastian Wick
  Cc: Aleix Pol, Sasha McIntosh, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Pekka Paalanen,
	dri-devel, Victoria Brekenfeld, Melissa Wen, Michel Dänzer,
	Jonas Ådahl, Joshua Ashton, Uma Shankar, Naseer Ahmed,
	wayland-devel, Christopher Braga, Arthur Grillo



On 2023-10-26 15:25, Alex Goins wrote:
> On Thu, 26 Oct 2023, Sebastian Wick wrote:
> 
>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>> Alex Goins <agoins@nvidia.com> wrote:
>>>
>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>> inline -
>>>>
>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>
>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>> Harry Wentland <harry.wentland@amd.com> wrote:
>>>>>
>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>>>>
>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>
>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>
>>> If we look at BT.2100, there is no such encoding even mentioned where
>>> 125.0 would correspond to 10k cd/m². That 125.0 convention already has
>>> a built-in assumption what the color spaces are and what the conversion
>>> is aiming to do. IOW, I would say that choice is opinionated from the
>>> start. The multiplier in BT.2100 is always 10000.
> 
> Be that as it may, the convention of FP16 125.0 corresponding to 10k nits is
> baked in our hardware, so it's unavoidable at least for NVIDIA pipelines.
> 

Yeah, that's not just NVIDIA, it's basically the same for AMD. I think
we can work without that assumption, but the PQ TF you get from AMD
will map to [0.0, 125.0].
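
For reference, a minimal sketch of where the 125.0 comes from, assuming 1.0
corresponds to 80 nits as discussed in this thread. The constants are the
SMPTE ST 2084 (PQ) ones; the function names are made up for illustration and
this is not how any driver implements it.

```python
# SMPTE ST 2084 (PQ) EOTF constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

PQ_PEAK_NITS = 10000.0
NITS_PER_UNIT = 80.0  # assumption: 1.0 == 80 nits, per the discussion above


def pq_eotf_nits(e):
    """Map a PQ-encoded value in [0.0, 1.0] to absolute luminance in nits."""
    e = min(max(e, 0.0), 1.0)
    p = e ** (1.0 / M2)
    return PQ_PEAK_NITS * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1.0 / M1)


def pq_eotf_scaled(e):
    """Same curve, expressed in units of 80 nits, so the peak is 125.0."""
    return pq_eotf_nits(e) / NITS_PER_UNIT
```

With the BT.2100 convention the same curve would simply be left in
[0.0, 1.0] with an implied multiplier of 10000 nits; the two differ only by
the constant factor 10000 / 80 = 125.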

snip

>>
>> We could simply fail commits when the pipeline and pixel format don't
>> work together. We'll probably need some kind of ingress no-op node
>> anyway and maybe could list pixel formats there if required to make it
>> easier to find a working configuration.
> 
> Yeah, we could, but having to figure that out through trial and error would be
> unfortunate. Per above, it might be easiest to just tag pipelines with a pixel
> format instead of trying to include the pixel format conversion as a color op.
> 

Agreed. We've been looking at libliftoff a bit, but one of the problems is
that it does a lot of atomic checks to figure out an optimal HW plane
configuration and we run out of time budget before we're able to check
all options.

Atomic check failure is really not well suited for this stuff.
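
As a rough sketch of the alternative, assuming pipelines were tagged with the
pixel formats they accept (the `Pipeline` type and its `formats` field are
hypothetical, nothing like this exists in the current patches), userspace
could narrow the candidates before spending any atomic checks:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    pipeline_id: int
    formats: frozenset  # hypothetical: fourcc names this pipeline accepts


def candidate_pipelines(pipelines, fb_format):
    """Keep only pipelines that declare support for the framebuffer format."""
    return [p for p in pipelines if fb_format in p.formats]


def checks_saved(pipelines, fb_format):
    """Number of test commits avoided compared to trying every pipeline."""
    return len(pipelines) - len(candidate_pipelines(pipelines, fb_format))
```

The remaining candidates would still need an atomic check, but the search
space no longer grows with pipelines that can never work for the format.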


>>> "Without the need to define a new type" is something I think we need to
>>> consider case by case. I have a hard time giving a general opinion.
>>>
>>>>>>>
>>>>>>> Counter-example 2: image size scaling colorop. It might not be
>>>>>>> configurable, it is controlled by the plane CRTC_* and SRC_*
>>>>>>> properties. You still need to understand what it does, so you can
>>>>>>> arrange the scaling to work correctly. (Do not want to scale an image
>>>>>>> with PQ-encoded values as Josh demonstrated in XDC.)
>>>>>>>
>>>>>>
>>>>>> IMO the position of the scaling operation is the thing that's important
>>>>>> here as the color pipeline won't define scaling properties.
>>>>
>>>> I agree that blending should ideally be done in linear space, and I remember
>>>> that from Josh's presentation at XDC, but I don't recall the same being said for
>>>> scaling. In fact, the NVIDIA pre-blending scaler exists in a stage of the
>>>> pipeline that is meant to be in PQ space (more on this below), and that was
>>>> found to achieve better results at HDR/SDR boundaries. Of course, this only
>>>> bolsters the argument that it would be helpful to have an informational "scaler"
>>>> element to understand at which stage scaling takes place.
>>>
>>> Both blending and scaling are fundamentally the same operation: you
>>> have two or more source colors (pixels), and you want to compute a
>>> weighted average of them following what happens in nature, that is,
>>> physics, as that is what humans are used to.
>>>
>>> Both blending and scaling will suffer from the same problems if the
>>> operation is performed on not light-linear values. The result of the
>>> weighted average does not correspond to physics.
>>>
>>> The problem may be hard to observe with natural imagery, but Josh's
>>> example shows it very clearly. Maybe that effect is sometimes useful
>>> for some imagery in some use cases, but it is still an accidental
>>> side-effect. You might get even better results if you don't rely on
>>> accidental side-effects but design a separate operation for the exact
>>> goal you have.
>>>
>>> Mind, by scaling we mean changing image size. Not scaling color values.
>>>
> 
> Fair enough, but it might not always be a choice given the hardware.
> 

I'm thinking of this as an informational element, not a programmable one.
Some HW could define this as programmable, but I probably wouldn't
on AMD HW.
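
To make the weighted-average argument quoted above concrete, here is a small
sketch of why the position of the scaler matters. It uses the sRGB transfer
function as a stand-in for PQ because it is short; the function names are
only for illustration.

```python
def srgb_eotf(c):
    """sRGB-encoded value in [0, 1] -> light-linear value."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_inv_eotf(l):
    """Light-linear value in [0, 1] -> sRGB-encoded value."""
    if l <= 0.0031308:
        return l * 12.92
    return 1.055 * l ** (1.0 / 2.4) - 0.055


def average_encoded(a, b):
    """What a scaler does when it filters the encoded values directly."""
    return (a + b) / 2.0


def average_linear(a, b):
    """Decode, average in light-linear space, then re-encode."""
    return srgb_inv_eotf((srgb_eotf(a) + srgb_eotf(b)) / 2.0)
```

Averaging a black and a white pixel gives an encoded 0.5 in the first case
and roughly 0.735 in the second, so the two scaler positions produce visibly
different edges. An informational scaler element would tell userspace which
of the two it is getting.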

snip

>>>
>>> What I was left puzzled about after the XDC workshop is that is it
>>> possible to pre-load configurations in the background (slow), and then
>>> quickly switch between them? Hardware-wise I mean.
> 
> This works fine for our "fast" LUTs, you just point them to a surface in video
> memory and they flip to it. You could keep multiple surfaces around and flip
> between them without having to reprogram them in software. We can easily do that
> with enumerated curves, populating them when the driver initializes instead of
> waiting for the client to request them. You can even point multiple hardware
> LUTs to the same video memory surface, if they need the same curve.
> 

Ultimately I think that's the best way to solve this problem, but it needs
HW that can do this.

snip

>>
>> The prepare-commit idea for blob properties would help to make the
>> pipelines usable again, but until then it's probably a good idea to just
>> not expose those pipelines.
> 
> The prepare-commit idea actually wouldn't work for these LUTs, because they are
> programmed using methods instead of pointing them to a surface. I'm actually not
> sure how slow it actually is, would need to benchmark it. I think not exposing
> them at all would be overkill, since it would mean you can't use the preblending
> scaler or tonemapper, and animation isn't necessary for that.
> 

I tend to agree. Maybe a "Heavy Operation" flag that tells userspace they can
use it but it might come at a significant cost.

Harry

> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
> obviously be a major loss if that wasn't exposed. There just needs to be some
> way for clients to know if they are going to kill performance by trying to
> change it every frame.
> 
> Thanks,
> Alex
> 
>>
>>>
>>>
>>> Thanks,
>>> pq
>>
>>



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-04 23:01                   ` Christopher Braga
@ 2023-11-07 16:52                     ` Harry Wentland
  0 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-11-07 16:52 UTC (permalink / raw)
  To: Christopher Braga, Alex Goins, Sebastian Wick
  Cc: Aleix Pol, Sasha McIntosh, Abhinav Kumar, Shashank Sharma,
	Xaver Hugl, Hector Martin, Liviu Dudau, Pekka Paalanen,
	dri-devel, Victoria Brekenfeld, Melissa Wen, Michel Dänzer,
	Jonas Ådahl, Joshua Ashton, Uma Shankar, Naseer Ahmed,
	wayland-devel, Arthur Grillo



On 2023-11-04 19:01, Christopher Braga wrote:
> Just want to loop back to before we branched off deeper into the programming performance talk
> 
> On 10/26/2023 3:25 PM, Alex Goins wrote:
>> On Thu, 26 Oct 2023, Sebastian Wick wrote:
>>
>>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>>> Alex Goins <agoins@nvidia.com> wrote:
>>>>
>>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>>> inline -
>>>>>
>>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>>
>>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>>> Harry Wentland <harry.wentland@amd.com> wrote:
>>>>>>
>>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>>
>>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>>> Actually, the current examples in the proposal don't include a multiplier color
>>>>> op, which might be useful. For AMD as above, but also for NVIDIA as the
>>>>> following issue arises:
>>>>>
>>>>> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> 
> If possible, let's declare this as two blocks. One that informatively declares the conversion is present, and another for the de-gamma. This will help with block-reuse between vendors.
> 
>>>>> point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
>>>>> to in floating point varies depending on the source content. If it's SDR
>>>>> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
>>>>> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
>>>>> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
>>>>> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
>>>>> correct?
>>>>
>>>> It would be against the UAPI design principles to tag content as HDR or
>>>> SDR. What you can do instead is to expose a colorop with a multiplier of
>>>> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
>>>> that the input is SDR or HDR to get the expected multiplier. You will
>>>> never know what the content actually is, anyway.
>>
>> Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
>> UAPI, just relating to the end result in the pipe, ultimately it would be
>> determined by the multiplier color op.
>>
> 
> A multiplier could work but we should give OEMs the option to either make it "informative" and fixed by the hardware, or fully configurable. With the Qualcomm pipeline, how we absorb FP16 pixel buffers, as well as how we convert them to fixed point data, actually has a dependency on the desired de-gamma and gamma processing. For example:
> 
> If a source pixel buffer is scRGB encoded FP16 content we would expect input pixel content to be up to 7.5, with the IGC output reaching 125 as in the NVIDIA case. Likewise gamma 2.2 encoded FP16 content would be 0-1 in and 0-1 out.
> 
> So in the Qualcomm case the expectations are fixed depending on the use case.
> 
> It is sounding to me like we would need to be able to declare three things here:
> 1. Value range expectations *into* the de-gamma block. A multiplier wouldn't work here because it would be more of a clipping operation. I guess we would have to add an explicit clamping block as well.
> 2. Value range expectations at the *output* of the de-gamma processing block. Also covered by using another multiplier block.
> 3. Value range expectations *into* a gamma processing block. This should be covered by declaring a multiplier post-csc, but only assuming CSC output is normalized in the desired value range. A clamping block would be preferable because it describes what happens when it isn't.
> 

What about adding informational input and output range properties
to colorops? I think Intel's PWL definitions had something like
that, but I'd have to take a look at that again. While I'm not
in favor of defining segmented LUTs in the uAPI, the input/output
ranges seem to be something of value.
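
A minimal sketch of how userspace might consume such properties, assuming
each colorop carried informational (min, max) ranges. The `ColorOp` type and
its fields are hypothetical and only meant to show the idea:

```python
from dataclasses import dataclass


@dataclass
class ColorOp:
    name: str
    in_range: tuple   # hypothetical informational (min, max) accepted at input
    out_range: tuple  # hypothetical informational (min, max) produced at output


def range_mismatches(ops):
    """Return (producer, consumer) names where output exceeds the next input range."""
    bad = []
    for prev, nxt in zip(ops, ops[1:]):
        if prev.out_range[0] < nxt.in_range[0] or prev.out_range[1] > nxt.in_range[1]:
            bad.append((prev.name, nxt.name))
    return bad
```

A degamma op producing [0.0, 125.0] followed directly by a LUT expecting
[0.0, 1.0] would be reported as a mismatch, while the same chain with a
1/125.0 multiplier in between would not. That would let a client see clipping
or normalization requirements without a dedicated clamp or multiplier block
in every pipeline.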

> All this is do-able, but it seems like it would require the definition of multiple color pipelines to expose the different limitations for color block configuration combinations. Additionally, would it be easy for user space to find the right pipeline?
> 

I'm also a little concerned that some of these proposals mean we'd
have to expose an inordinate number of color pipelines and color
pipeline selection becomes difficult and error prone.

snip

>>>> Given that elements like various kinds of look-up tables inherently
>>>> assume that the domain is [0.0, 1.0] (because it is a table that
>>>> has a beginning and an end, and the usual convention is that the
>>>> beginning is zero and the end is one), I think it is best to stick to
>>>> the [0.0, 1.0] range where possible. If we go out of that range, then
>>>> we have to define how a LUT would apply in a sensible way.
>>
>> In my last reply I mentioned a static (but actually programmable) LUT that is
>> typically used to convert FP16 linear pixels to fixed point PQ before handing
>> them to the scaler and tone mapping operator. You're actually right that it
>> indexes in the fixed point [0.0, 1.0] range for the reasons you describe, but
>> because the input pixels are expected to be FP16 in the [0.0, 125.0] range, it
>> applies a non-programmable 1/125.0 normalization factor first.
>>
>> In this case, you could think of the LUT as indexing on [0.0, 125.0], but as you
>> point out there would need to be some way to describe that. Maybe we actually
>> need a fractional multiplier / divider color op. NVIDIA pipes that include this
>> LUT would need to include a mandatory 1/125.0 factor immediately prior to the
>> LUT, then LUT can continue assuming a range of [0.0, 1.0].
>>
>> Assuming you are using the hardware in a conventional way, specifying a
>> multiplier of 1.0 after the "degamma" LUT would then map to the 80-nit PQ range
>> after the static (but actually programmable) PQ LUT, whereas specifying a
>> multiplier of 125.0 would map to the 10,000-nit PQ range, which is what we want.
>> I guess it's kind of messy, but the effect would be that color ops other than
>> multipliers/dividers would still be in the [0.0, 1.0] domain, and any multiplier
>> that exceeds that range would have to be normalized by a divider before any
>> other color op.
>>
> 
> Hmm. A multiplier would resolve issues when input linear FP16 data has different ideas on what 1.0 means in regards to nits values (think of Apple's EDR as an example). For a client to go from their definition to hardware definition of 1.0 = x nits, we would need to expose what the pipeline sees as 1.0 though. So in this case the multiplier would be programmable, but the divisor is informational? It seems like the latter would have an influence on how the former is programmed.
> 

A programmable multiplier would either need to be backed by a HW block
to perform the operation or require a driver to scale the LUT or matrix
values of an adjacent LUT or matrix block.
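
The second option could look roughly like the sketch below. It assumes the
multiplier sits directly after the LUT or matrix it is folded into, and that
the scaled values still fit what the HW entries can represent (the
`max_value` clamp is an assumption for illustration). These helpers are
hypothetical, not driver code.

```python
def fold_multiplier_into_lut(lut, multiplier, max_value=None):
    """Scale the outputs of a 1D LUT so a following multiplier becomes a no-op.

    If max_value is given, entries are clamped to it, which changes the
    result wherever the product would exceed the HW-representable maximum.
    """
    scaled = [v * multiplier for v in lut]
    if max_value is not None:
        scaled = [min(v, max_value) for v in scaled]
    return scaled


def fold_multiplier_into_matrix(matrix, multiplier):
    """Scale every coefficient (including any offset column) of a matrix."""
    return [[c * multiplier for c in row] for row in matrix]
```

A multiplier in front of a LUT is harder: it changes which entries get
indexed, so the driver would have to resample the table rather than scale
its values.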

snip

>>>>>>
>>>>>> Yeah, this is why we need a definition. I understand "informational" to
>>>>>> not change pixel values in any way. Previously I had some weird idea
>>>>>> that scaling doesn't alter color, but of course it may.
>>>>>
>>>>> On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
>>>>> implicit fixed-point to FP16 conversions, and vice versa.
>>>>
>>>> Above, I claimed that the UAPI should be defined in nominal
>>>> floating-point values, but I wonder, would that work? Would we need to
>>>> have explicit colorops for converting from raw pixel data values into
>>>> nominal floating-point in the UAPI?
>>
>> Yeah, I think something like that is needed, or another solution as discussed
>> below. Even if we define the UAPI in terms of floating point, the actual
>> underlying pixel format needs to match the expectations of each stage as it
>> flows through the pipe.
>>
> 
> Strongly agree on this. Pixel format and block relationships definitely exist.
> 

Interesting to see this isn't just an AMD thing. :)

snip

>>>>
>>>> Both blending and scaling are fundamentally the same operation: you
>>>> have two or more source colors (pixels), and you want to compute a
>>>> weighted average of them following what happens in nature, that is,
>>>> physics, as that is what humans are used to.
>>>>
>>>> Both blending and scaling will suffer from the same problems if the
>>>> operation is performed on not light-linear values. The result of the
>>>> weighted average does not correspond to physics.
>>>>
>>>> The problem may be hard to observe with natural imagery, but Josh's
>>>> example shows it very clearly. Maybe that effect is sometimes useful
>>>> for some imagery in some use cases, but it is still an accidental
>>>> side-effect. You might get even better results if you don't rely on
>>>> accidental side-effects but design a separate operation for the exact
>>>> goal you have.
>>>>
>>>> Mind, by scaling we mean changing image size. Not scaling color values.
>>>>
>>
>> Fair enough, but it might not always be a choice given the hardware.
>>
> 
> Agreeing with Alex here. I get there is some debate over the best way to do this, but I think it is best to leave it up to the driver to declare how that is done.

Same.

snip

>>>>
>>>> What I was left puzzled about after the XDC workshop is that is it
>>>> possible to pre-load configurations in the background (slow), and then
>>>> quickly switch between them? Hardware-wise I mean.
>>
>> This works fine for our "fast" LUTs, you just point them to a surface in video
>> memory and they flip to it. You could keep multiple surfaces around and flip
>> between them without having to reprogram them in software. We can easily do that
>> with enumerated curves, populating them when the driver initializes instead of
>> waiting for the client to request them. You can even point multiple hardware
>> LUTs to the same video memory surface, if they need the same curve.
>>
>>>
>>> We could define that pipelines with a lower ID are to be preferred over
>>> higher IDs.
>>
>> Sure, but this isn't just an issue with a pipeline as a whole, but the
>> individual elements within it and how to use them in a given context.
>>
>>>
>>> The issue is that if programming a pipeline becomes too slow to be
>>> useful it probably should just not be made available to user space.
>>
>> It's not that programming the pipeline is overall too slow. The LUTs we have
>> that are relatively slow to program are meant to be set infrequently, or even
>> just once, to allow the scaler and tone mapping operator to operate in fixed
>> point PQ space. You might still want the tone mapper, so you would choose a
>> pipeline that includes them, but when it comes to e.g. animating a night light,
>> you would want to choose a different LUT for that purpose.
>>
>>>
>>> The prepare-commit idea for blob properties would help to make the
>>> pipelines usable again, but until then it's probably a good idea to just
>>> not expose those pipelines.
>>
>> The prepare-commit idea actually wouldn't work for these LUTs, because they are
>> programmed using methods instead of pointing them to a surface. I'm actually not
>> sure how slow it actually is, would need to benchmark it. I think not exposing
>> them at all would be overkill, since it would mean you can't use the preblending
>> scaler or tonemapper, and animation isn't necessary for that.
>>
>> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
>> obviously be a major loss if that wasn't exposed. There just needs to be some
>> way for clients to know if they are going to kill performance by trying to
>> change it every frame.
>>
>> Thanks,
>> Alex
>>
> 
> To clarify, what are we defining as slow to update here? Something we aren't able to update within a frame (let's say at a low frame rate such as 30 fps for discussion's sake)? A block that requires a programming sequence of disable + program + enable to update? Defining performance seems like it can get murky if we start to consider frame concurrent updates among multiple color blocks as well.
> 

I think any definition of slow would need to be imprecise on some level.
In the AMD 3DLUT case programming can take around 8 ms. Some compositors
need the programming time to be well under 1 ms, even for low frame rates.
Compositors that care about latency might want to know when an operation
is likely to be too costly for them. I'm not sure we could reliably
indicate more.
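
A quick sketch of the arithmetic, using the numbers from this thread (8 ms
programming time, a 1 ms budget, and the 30 fps example). The helper names
are made up; the point is only that the same operation passes or fails
depending on which budget the compositor applies.

```python
def frame_time_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def fits_budget(programming_ms, budget_ms):
    """True if programming the colorop fits within the compositor's budget."""
    return programming_ms <= budget_ms


def frame_fraction(programming_ms, fps):
    """Share of one frame consumed by programming the colorop."""
    return programming_ms / frame_time_ms(fps)
```

An 8 ms update fits comfortably inside a 30 fps frame (about 33.3 ms, so
roughly a quarter of it), yet it is eight times over a 1 ms latency budget.
A single kernel-side threshold for "slow" would be wrong for one of those
two compositors.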

Harry

> Thanks,
> Christopher
> 
>>>
>>>>
>>>>
>>>> Thanks,
>>>> pq
>>>
>>>



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-07 16:52                 ` Harry Wentland
@ 2023-11-07 21:17                   ` Sebastian Wick
  0 siblings, 0 replies; 49+ messages in thread
From: Sebastian Wick @ 2023-11-07 21:17 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Aleix Pol, Shashank Sharma,
	wayland-devel, Jonas Ådahl, Uma Shankar, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Pekka Paalanen, Christopher Braga,
	Hector Martin, Xaver Hugl, Joshua Ashton

On Tue, Nov 07, 2023 at 11:52:11AM -0500, Harry Wentland wrote:
> 
> 
> On 2023-10-26 13:30, Sebastian Wick wrote:
> > On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
> >> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
> >> Alex Goins <agoins@nvidia.com> wrote:
> >>
> >>> Thank you Harry and all other contributors for your work on this. Responses
> >>> inline -
> >>>
> >>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
> >>>
> >>>> On Fri, 20 Oct 2023 11:23:28 -0400
> >>>> Harry Wentland <harry.wentland@amd.com> wrote:
> >>>>   
> >>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:  
> >>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
> >>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
> >>>>>>     
> >>>>>>> Thanks for continuing to work on this!
> >>>>>>>
> >>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:    
> 
> snip
> 
> >>
> >>>>>> I think we also need a definition of "informational".
> >>>>>>
> >>>>>> Counter-example 1: a colorop that represents a non-configurable    
> >>>>>
> >>>>> Not sure what's "counter" for these examples?
> >>>>>   
> >>>>>> YUV<->RGB conversion. Maybe it determines its operation from FB pixel
> >>>>>> format. It cannot be set to bypass, it cannot be configured, and it
> >>>>>> will alter color values.  
> >>>
> >>> Would it be reasonable to expose this as a 3x4 matrix with a read-only blob and
> >>> no BYPASS property? I already brought up a similar idea at the XDC HDR Workshop
> >>> based on the principle that read-only blobs could be used to express some static
> >>> pipeline elements without the need to define a new type, but got mixed opinions.
> >>> I think this demonstrates the principle further, as clients could detect this
> >>> programmatically instead of having to special-case the informational element.
> >>
> > 
> > I'm all for exposing fixed color ops but I suspect that most of those
> > follow some standard and in those cases instead of exposing the matrix
> > values one should prefer to expose a named matrix (e.g. BT.601, BT.709,
> > BT.2020).
> > 
> 
> Agreed.
> 
> > As a general rule: always expose the highest level description. Going
> > from a name to exact values is trivial, going from values to a name is
> > much harder.
> > 
> >> If the blob depends on the pixel format (i.e. the driver automatically
> >> chooses a different blob per pixel format), then I think we would need
> >> to expose all the blobs and how they correspond to pixel formats.
> >> Otherwise ok, I guess.
> >>
> >> However, do we want or need to make a color pipeline or colorop
> >> conditional on pixel formats? For example, if you use a YUV 4:2:0 type
> >> of pixel format, then you must use this pipeline and not any other. Or
> >> floating-point type of pixel format. I did not anticipate this before,
> >> I assumed that all color pipelines and colorops are independent of the
> >> framebuffer pixel format. A specific colorop might have a property that
> >> needs to agree with the framebuffer pixel format, but I didn't expect
> >> further limitations.
> > 
> > We could simply fail commits when the pipeline and pixel format don't
> > work together. We'll probably need some kind of ingress no-op node
> > anyway and maybe could list pixel formats there if required to make it
> > easier to find a working configuration.
> > 
> 
> The problem with failing commits is that user-space has no idea why it
> failed. If this means that userspace falls back to SW composition for
> NV12 and P010 it would avoid HW offloading in one of the most important
> use-cases on AMD HW for power-saving purposes.

Exposing which pixel formats work with a pipeline should be
uncontroversial, and so should be an informative scaler op.

Both can be added without a problem at a later time, so let's not make
any of that mandatory for the first version. One step after the other.

> 
> snip
> 
> >>> Despite being programmable, the LUTs are updated in a manner that is less
> >>> efficient as compared to e.g. the non-static "degamma" LUT. Would it be helpful
> >>> if there was some way to tag operations according to their performance,
> >>> for example so that clients can prefer a high performance one when they
> >>> intend to do an animated transition? I recall from the XDC HDR workshop
> >>> that this is also an issue with AMD's 3DLUT, where updates can be too
> >>> slow to animate.
> >>
> >> I can certainly see such information being useful, but then we need to
> >> somehow quantize the performance.
> >>
> >> What I was left puzzled about after the XDC workshop is that is it
> >> possible to pre-load configurations in the background (slow), and then
> >> quickly switch between them? Hardware-wise I mean.
> > 
> > We could define that pipelines with a lower ID are to be preferred over
> > higher IDs.
> > 
> > The issue is that if programming a pipeline becomes too slow to be
> > useful it probably should just not be made available to user space.
> > 
> > The prepare-commit idea for blob properties would help to make the
> > pipelines usable again, but until then it's probably a good idea to just
> > not expose those pipelines.
> > 
> 
> It's a bit of a judgment call what's too slow, though. The value of having
> a HW colorop might outweigh the cost of the programming time for some
> compositors but not for others.
> 
> Harry
> 
> >>
> >>
> >> Thanks,
> >> pq
> > 
> > 
> 



* RE: [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS
  2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
                   ` (16 preceding siblings ...)
  2023-10-19 21:21 ` [RFC PATCH v2 17/17] drm/vkms: Add kunit tests for linear and sRGB LUTs Harry Wentland
@ 2023-11-08 11:54 ` Shankar, Uma
  2023-11-08 14:32   ` Harry Wentland
  17 siblings, 1 reply; 49+ messages in thread
From: Shankar, Uma @ 2023-11-08 11:54 UTC (permalink / raw)
  To: Harry Wentland, dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Aleix Pol, Christopher Braga,
	Pekka Paalanen, Hector Martin, Xaver Hugl, Joshua Ashton



> -----Original Message-----
> From: Harry Wentland <harry.wentland@amd.com>
> Sent: Friday, October 20, 2023 2:51 AM
> To: dri-devel@lists.freedesktop.org
> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
> <harry.wentland@amd.com>; Ville Syrjala <ville.syrjala@linux.intel.com>; Pekka
> Paalanen <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian
> Wick <sebastian.wick@redhat.com>; Shashank Sharma
> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Joshua
> Ashton <joshua@froggi.es>; Michel Dänzer <mdaenzer@redhat.com>; Aleix Pol
> <aleixpol@kde.org>; Xaver Hugl <xaver.hugl@gmail.com>; Victoria Brekenfeld
> <victoria@system76.com>; Sima <daniel@ffwll.ch>; Shankar, Uma
> <uma.shankar@intel.com>; Naseer Ahmed <quic_naseer@quicinc.com>;
> Christopher Braga <quic_cbraga@quicinc.com>; Abhinav Kumar
> <quic_abhinavk@quicinc.com>; Arthur Grillo <arthurgrillo@riseup.net>; Hector
> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha
> McIntosh <sashamcintosh@google.com>
> Subject: [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS
> 
> This is an early RFC set for a color pipeline API, along with a sample
> implementation in VKMS. All the key API bits are here.
> VKMS now supports two named transfer function colorops and we have an IGT
> test that confirms that sRGB EOTF, followed by its inverse gives us expected
> results within +/- 1 8 bpc codepoint value.
> 
> This patchset is grouped as follows:
>  - Patches 1-2: couple general patches/fixes
>  - Patches 3-5: introduce kunit to VKMS
>  - Patch 6: description of motivation and details behind the
>             Color Pipeline API. If you're reading nothing else
>             but are interested in the topic I highly recommend
>             you take a look at this.
>  - Patches 7-15: Add core DRM API bits
>  - Patches 15-17: VKMS implementation
> 
> There are plenty of things that I would like to see here but haven't had a chance
> to look at. These will (hopefully) be addressed in future iterations:
>  - Abandon IOCTLs and discover colorops as clients iterate the pipeline
>  - Add color_pipeline client cap and deprecate existing color encoding and
>    color range properties.
>    See https://lists.freedesktop.org/archives/dri-devel/2023-September/422643.html
>  - Add CTM colorop to VKMS
>  - Add custom LUT colorops to VKMS
>  - Add pre-blending 3DLUT with tetrahedral interpolation to VKMS
>  - How to support HW which can't bypass entire pipeline?
>  - Add ability to create colorops that don't have BYPASS
>  - Can we do a LOAD / COMMIT model for LUTs (and other properties)?
> 
> IGT tests can be found at
> https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1
> 
> IGT patches are also being sent to the igt-dev mailing list.
> 
> libdrm changes to support the new IOCTLs are at
> https://gitlab.freedesktop.org/hwentland/drm/-/merge_requests/1
> 
> If you prefer a gitlab MR for review you can find it at
> https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5
> 
> A slightly different approach for a Color Pipeline API was sent by Uma Shankar
> and can be found at https://patchwork.freedesktop.org/series/123024/
> 
> The main difference is that his approach is not introducing a new DRM core object
> but instead exposes color pipelines via blob properties.
> There are pros and cons to both approaches.

Thanks Harry and all others who have actively contributed to the design and
discussions thus far.

Due to other commitments we couldn't participate in XDC this time. Our
apologies for that, and for the delay on our part.

We looked at the approach and are aligned to go with the property-based design,
with some suggestions, which will follow as comments on the respective patches.
We are also in the process of trying this on Intel's hardware to identify any gaps.

Regards,
Uma Shankar

> v2:
>  - Rebased on drm-misc-next
>  - Introduce a VKMS Kunit so we can test LUT functionality in vkms_composer
>  - Incorporate feedback in color_pipeline.rst doc
>  - Add support for sRGB inverse EOTF
>  - Add 2nd enumerated TF colorop to VKMS
>  - Fix LUTs and some issues with applying LUTs in VKMS
> 
> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Jonas Ådahl <jadahl@redhat.com>
> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Cc: Alexander Goins <agoins@nvidia.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Michel Dänzer <mdaenzer@redhat.com>
> Cc: Aleix Pol <aleixpol@kde.org>
> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> Cc: Victoria Brekenfeld <victoria@system76.com>
> Cc: Sima <daniel@ffwll.ch>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> Cc: Sasha McIntosh <sashamcintosh@google.com>
> 
> Harry Wentland (17):
>   drm/atomic: Allow get_value for immutable properties on atomic drivers
>   drm: Don't treat 0 as -1 in drm_fixp2int_ceil
>   drm/vkms: Create separate Kconfig file for VKMS
>   drm/vkms: Add kunit tests for VKMS LUT handling
>   drm/vkms: Avoid reading beyond LUT array
>   drm/doc/rfc: Describe why prescriptive color pipeline is needed
>   drm/colorop: Introduce new drm_colorop mode object
>   drm/colorop: Add TYPE property
>   drm/color: Add 1D Curve subtype
>   drm/colorop: Add BYPASS property
>   drm/colorop: Add NEXT property
>   drm/colorop: Add atomic state print for drm_colorop
>   drm/colorop: Add new IOCTLs to retrieve drm_colorop objects
>   drm/plane: Add COLOR PIPELINE property
>   drm/colorop: Add NEXT to colorop state print
>   drm/vkms: Add enumerated 1D curve colorop
>   drm/vkms: Add kunit tests for linear and sRGB LUTs
> 
>  Documentation/gpu/rfc/color_pipeline.rst      | 347 ++++++++
>  drivers/gpu/drm/Kconfig                       |  14 +-
>  drivers/gpu/drm/Makefile                      |   1 +
>  drivers/gpu/drm/drm_atomic.c                  | 155 ++++
>  drivers/gpu/drm/drm_atomic_helper.c           |  12 +
>  drivers/gpu/drm/drm_atomic_state_helper.c     |   5 +
>  drivers/gpu/drm/drm_atomic_uapi.c             | 110 +++
>  drivers/gpu/drm/drm_colorop.c                 | 384 +++++++++
>  drivers/gpu/drm/drm_crtc_internal.h           |   4 +
>  drivers/gpu/drm/drm_ioctl.c                   |   5 +
>  drivers/gpu/drm/drm_mode_config.c             |   7 +
>  drivers/gpu/drm/drm_mode_object.c             |   3 +-
>  drivers/gpu/drm/drm_plane_helper.c            |   2 +-
>  drivers/gpu/drm/vkms/Kconfig                  |  20 +
>  drivers/gpu/drm/vkms/Makefile                 |   6 +-
>  drivers/gpu/drm/vkms/tests/.kunitconfig       |   4 +
>  drivers/gpu/drm/vkms/tests/Makefile           |   4 +
>  drivers/gpu/drm/vkms/tests/vkms_color_tests.c | 100 +++
>  drivers/gpu/drm/vkms/vkms_colorop.c           |  85 ++
>  drivers/gpu/drm/vkms/vkms_composer.c          |  77 +-
>  drivers/gpu/drm/vkms/vkms_composer.h          |  25 +
>  drivers/gpu/drm/vkms/vkms_drv.h               |   4 +
>  drivers/gpu/drm/vkms/vkms_luts.c              | 802 ++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_luts.h              |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c             |   2 +
>  include/drm/drm_atomic.h                      |  82 ++
>  include/drm/drm_atomic_uapi.h                 |   3 +
>  include/drm/drm_colorop.h                     | 235 +++++
>  include/drm/drm_fixed.h                       |   2 +-
>  include/drm/drm_mode_config.h                 |  18 +
>  include/drm/drm_plane.h                       |  10 +
>  include/uapi/drm/drm.h                        |   3 +
>  include/uapi/drm/drm_mode.h                   |  22 +
>  33 files changed, 2530 insertions(+), 35 deletions(-)
>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
>  create mode 100644 drivers/gpu/drm/drm_colorop.c
>  create mode 100644 drivers/gpu/drm/vkms/Kconfig
>  create mode 100644 drivers/gpu/drm/vkms/tests/.kunitconfig
>  create mode 100644 drivers/gpu/drm/vkms/tests/Makefile
>  create mode 100644 drivers/gpu/drm/vkms/tests/vkms_color_tests.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_colorop.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_composer.h
>  create mode 100644 drivers/gpu/drm/vkms/vkms_luts.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_luts.h
>  create mode 100644 include/drm/drm_colorop.h
> 
> --
> 2.42.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-10-19 21:21 ` [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed Harry Wentland
  2023-10-20 14:22   ` Sebastian Wick
@ 2023-11-08 12:18   ` Shankar, Uma
  2023-11-08 13:43     ` Joshua Ashton
  2023-11-08 14:37     ` Harry Wentland
  1 sibling, 2 replies; 49+ messages in thread
From: Shankar, Uma @ 2023-11-08 12:18 UTC (permalink / raw)
  To: Harry Wentland, dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Aleix Pol, Christopher Braga,
	Pekka Paalanen, Hector Martin, Xaver Hugl, Joshua Ashton



> -----Original Message-----
> From: Harry Wentland <harry.wentland@amd.com>
> Sent: Friday, October 20, 2023 2:51 AM
> To: dri-devel@lists.freedesktop.org
> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
> <harry.wentland@amd.com>; Ville Syrjala <ville.syrjala@linux.intel.com>; Pekka
> Paalanen <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian
> Wick <sebastian.wick@redhat.com>; Shashank Sharma
> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Joshua
> Ashton <joshua@froggi.es>; Michel Dänzer <mdaenzer@redhat.com>; Aleix Pol
> <aleixpol@kde.org>; Xaver Hugl <xaver.hugl@gmail.com>; Victoria Brekenfeld
> <victoria@system76.com>; Sima <daniel@ffwll.ch>; Shankar, Uma
> <uma.shankar@intel.com>; Naseer Ahmed <quic_naseer@quicinc.com>;
> Christopher Braga <quic_cbraga@quicinc.com>; Abhinav Kumar
> <quic_abhinavk@quicinc.com>; Arthur Grillo <arthurgrillo@riseup.net>; Hector
> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha
> McIntosh <sashamcintosh@google.com>
> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> v2:
>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>  - Updated wording (Pekka)
>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>    section (Pekka)
>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>  - Add "Driver Implementer's Guide" section (Pekka)
>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
> 
> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> Cc: Simon Ser <contact@emersion.fr>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Melissa Wen <mwen@igalia.com>
> Cc: Jonas Ådahl <jadahl@redhat.com>
> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> Cc: Shashank Sharma <shashank.sharma@amd.com>
> Cc: Alexander Goins <agoins@nvidia.com>
> Cc: Joshua Ashton <joshua@froggi.es>
> Cc: Michel Dänzer <mdaenzer@redhat.com>
> Cc: Aleix Pol <aleixpol@kde.org>
> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> Cc: Victoria Brekenfeld <victoria@system76.com>
> Cc: Sima <daniel@ffwll.ch>
> Cc: Uma Shankar <uma.shankar@intel.com>
> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> Cc: Sasha McIntosh <sashamcintosh@google.com>
> ---
>  Documentation/gpu/rfc/color_pipeline.rst | 347 +++++++++++++++++++++++
>  1 file changed, 347 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> 
> diff --git a/Documentation/gpu/rfc/color_pipeline.rst b/Documentation/gpu/rfc/color_pipeline.rst
> new file mode 100644
> index 000000000000..af5f2ea29116
> --- /dev/null
> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> @@ -0,0 +1,347 @@
> +========================
> +Linux Color Pipeline API
> +========================
> +
> +What problem are we solving?
> +============================
> +
> +We would like to support pre-, and post-blending complex color
> +transformations in display controller hardware in order to allow for
> +HW-supported HDR use-cases, as well as to provide support to
> +color-managed applications, such as video or image editors.
> +
> +It is possible to support an HDR output on HW supporting the Colorspace
> +and HDR Metadata drm_connector properties, but that requires the
> +compositor or application to render and compose the content into one
> +final buffer intended for display. Doing so is costly.
> +
> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and
> +other operations to support color transformations. These operations are
> +often implemented in fixed-function HW and therefore much more power
> +efficient than performing similar operations via shaders or CPU.
> +
> +We would like to make use of this HW functionality to support complex
> +color transformations with no, or minimal CPU or shader load.
> +
> +
> +How are other OSes solving this problem?
> +========================================
> +
> +The most widely supported use-cases regard HDR content, whether video
> +or gaming.
> +
> +Most OSes will specify the source content format (color gamut, encoding
> +transfer function, and other metadata, such as max and average light
> +levels) to a driver.
> +Drivers will then program their fixed-function HW accordingly to map
> +from a source content buffer's space to a display's space.
> +
> +When fixed-function HW is not available the compositor will assemble a
> +shader to ask the GPU to perform the transformation from the source
> +content format to the display's format.
> +
> +A compositor's mapping function and a driver's mapping function are
> +usually entirely separate concepts. On OSes where a HW vendor has no
> +insight into closed-source compositor code such a vendor will tune
> +their color management code to visually match the compositor's. On
> +other OSes, where both mapping functions are open to an implementer,
> +they will ensure both mappings match.
> +
> +This results in mapping algorithm lock-in, meaning that no-one alone
> +can experiment with or introduce new mapping algorithms and achieve
> +consistent results regardless of which implementation path is taken.
> +
> +Why is Linux different?
> +=======================
> +
> +Unlike other OSes, where there is one compositor for one or more
> +drivers, on Linux we have a many-to-many relationship. Many compositors;
> +many drivers.
> +In addition each compositor vendor or community has their own view of
> +how color management should be done. This is what makes Linux so beautiful.
> +
> +This means that a HW vendor can now no longer tune their driver to one
> +compositor, as tuning it to one could make it look fairly different
> +from another compositor's color mapping.
> +
> +We need a better solution.
> +
> +
> +Descriptive API
> +===============
> +
> +An API that describes the source and destination colorspaces is a
> +descriptive API. It describes the input and output color spaces but
> +does not describe how precisely they should be mapped. Such a mapping
> +includes many minute design decisions that can greatly affect the look
> +of the final result.
> +
> +It is not feasible to describe such mapping with enough detail to
> +ensure the same result from each implementation. In fact, these
> +mappings are a very active research area.
> +
> +
> +Prescriptive API
> +================
> +
> +A prescriptive API describes not the source and destination
> +colorspaces. It instead prescribes a recipe for how to manipulate pixel
> +values to arrive at the desired outcome.
> +
> +This recipe is generally an ordered list of straight-forward
> +operations, with clear mathematical definitions, such as 1D LUTs, 3D
> +LUTs, matrices, or other operations that can be described in a precise manner.
> +
> +
> +The Color Pipeline API
> +======================
> +
> +HW color management pipelines can significantly differ between HW
> +vendors in terms of availability, ordering, and capabilities of HW
> +blocks. This makes a common definition of color management blocks and
> +their ordering nigh impossible. Instead we are defining an API that
> +allows user space to discover the HW capabilities in a generic manner,
> +agnostic of specific drivers and hardware.
> +
> +
> +drm_colorop Object & IOCTLs
> +===========================
> +
> +To support the definition of color pipelines we define the DRM core
> +object type drm_colorop. Individual drm_colorop objects will be chained
> +via the NEXT property of a drm_colorop to constitute a color pipeline.
> +Each drm_colorop object is unique, i.e., even if multiple color
> +pipelines have the same operation they won't share the same drm_colorop
> +object to describe that operation.
> +
> +Note that drivers are not expected to map drm_colorop objects
> +statically to specific HW blocks. The mapping of drm_colorop objects is
> +entirely a driver-internal detail and can be as dynamic or static as a
> +driver needs it to be. See more in the Driver Implementer's Guide
> +section below.
> +
> +Just like other DRM objects the drm_colorop objects are discovered via
> +IOCTLs:
> +
> +DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to retrieve the
> +number of all drm_colorop objects.
> +
> +DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one drm_colorop.
> +It includes the ID for the colorop object, as well as the plane_id of
> +the associated plane. All other values should be registered as
> +properties.
> +
> +Each drm_colorop has three core properties:
> +
> +TYPE: The type of transformation, such as
> +* enumerated curve
> +* custom (uniform) 1D LUT
> +* 3x3 matrix
> +* 3x4 matrix
> +* 3D LUT
> +* etc.
> +
> +Depending on the type of transformation other properties will describe
> +more details.
> +
> +BYPASS: A boolean property that can be used to easily put a block into
> +bypass mode. While setting other properties might fail atomic check,
> +setting the BYPASS property to true should never fail. The BYPASS
> +property is not mandatory for a colorop, as long as the entire pipeline
> +can get bypassed by setting the COLOR_PIPELINE on a plane to '0'.
> +
> +NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if this
> +drm_colorop is the last in the chain.
> +
> +An example of a drm_colorop object might look like one of these::
> +
> +    /* 1D enumerated curve */
> +    Color operation 42
> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> +    ├─ "BYPASS": bool {true, false}
> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}

Having a fixed-function enum for some targeted inputs/outputs may not scale to
all use cases. There are many possible colorspaces and transfer functions, so it
will not be possible to cover all of them with enum definitions. Also, this will
depend on the capabilities of the respective hardware from various vendors.

> +    └─ "NEXT": immutable color operation ID = 43	
> +
> +    /* custom 4k entry 1D LUT */
> +    Color operation 52
> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> +    ├─ "BYPASS": bool {true, false}
> +    ├─ "LUT_1D_SIZE": immutable range = 4096

For the size and capabilities of an individual LUT block, it would be good to
expose them as a blob, as defined in the blob approach we were planning earlier.
We would take just that part of the series to keep this capability detection
generic. See:
https://patchwork.freedesktop.org/patch/554855/?series=123023&rev=1

Basically, use this structure to describe LUT capability and arrangement:
struct drm_color_lut_range {
	/* DRM_MODE_LUT_* */
	__u32 flags;
	/* number of points on the curve */
	__u16 count;
	/* input/output bits per component */
	__u8 input_bpc, output_bpc;
	/* input start/end values */
	__s32 start, end;
	/* output min/max values */
	__s32 min, max;
};

If the intention is to have just one segment with 4096 entries, it can easily be
described there. Additionally, this can cater to any kind of LUT arrangement:
PWL, segmented, or logarithmic.
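
As a rough illustration of the single-segment case, here is a minimal userspace
sketch. It mirrors the fields of the structure above with <stdint.h> types under
the hypothetical name example_lut_range. The flag value, the bit depths, and the
start/end/min/max numbers are assumptions picked for illustration only; they are
not taken from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the structure quoted above; the name is hypothetical. */
struct example_lut_range {
	uint32_t flags;                 /* DRM_MODE_LUT_* in the proposal */
	uint16_t count;                 /* number of points on the curve */
	uint8_t input_bpc, output_bpc;  /* input/output bits per component */
	int32_t start, end;             /* input start/end values */
	int32_t min, max;               /* output min/max values */
};

/* Placeholder flag value, assumed for this sketch only. */
#define EXAMPLE_LUT_FLAG_NONE 0u

/* One uniform segment with 4096 points; all numbers are illustrative. */
static const struct example_lut_range example_single_segment[] = {
	{
		.flags = EXAMPLE_LUT_FLAG_NONE,
		.count = 4096,
		.input_bpc = 16,
		.output_bpc = 16,
		.start = 0,
		.end = (1 << 16) - 1,
		.min = 0,
		.max = (1 << 16) - 1,
	},
};

/* Total number of LUT entries a client would have to supply. */
static size_t example_lut_total_entries(const struct example_lut_range *segs,
					size_t nsegs)
{
	size_t total = 0;

	for (size_t i = 0; i < nsegs; i++)
		total += segs[i].count;
	return total;
}
```

A segmented or PWL arrangement would presumably just add further array
elements, one per segment, with the client summing the counts as above.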

> +    ├─ "LUT_1D": blob
> +    └─ "NEXT": immutable color operation ID = 0
> +
> +    /* 17^3 3D LUT */
> +    Color operation 72
> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
> +    ├─ "BYPASS": bool {true, false}
> +    ├─ "LUT_3D_SIZE": immutable range = 17
> +    ├─ "LUT_3D": blob
> +    └─ "NEXT": immutable color operation ID = 73
> +
> +
> +COLOR_PIPELINE Plane Property
> +=============================
> +
> +Color Pipelines are created by a driver and advertised via a new
> +COLOR_PIPELINE enum property on each plane. Values of the property
> +always include '0', which is the default and means all color processing
> +is disabled. Additional values will be the object IDs of the first
> +drm_colorop in a pipeline. A driver can create and advertise none, one,
> +or more possible color pipelines. A DRM client will select a color
> +pipeline by setting the COLOR_PIPELINE to the respective value.
> +
> +In the case where drivers have custom support for pre-blending color
> +processing those drivers shall reject atomic commits that are trying to
> +use both the custom color properties, as well as the COLOR_PIPELINE
> +property.
> +
> +An example of a COLOR_PIPELINE property on a plane might look like this::
> +
> +    Plane 10
> +    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> +    ├─ …
> +    └─ "color_pipeline": enum {0, 42, 52} = 0
> +
> +
> +Color Pipeline Discovery
> +========================
> +
> +A DRM client wanting color management on a drm_plane will:
> +
> +1. Read all drm_colorop objects
> +2. Get the COLOR_PIPELINE property of the plane
> +3. iterate all COLOR_PIPELINE enum values
> +4. for each enum value walk the color pipeline (via the NEXT pointers)
> +   and see if the available color operations are suitable for the
> +   desired color management operations
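
The discovery steps above can be sketched against a small in-memory model. This
is only an illustration under stated assumptions: struct example_colorop, the
enum, and both helpers are hypothetical names that stand in for whatever a
client caches after reading the colorop objects; they are not part of the
proposed uAPI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side model of a colorop; not part of the uAPI. */
enum example_colorop_type {
	EXAMPLE_CURVE_1D,
	EXAMPLE_LUT_1D,
	EXAMPLE_MATRIX_3X4,
	EXAMPLE_LUT_3D,
};

struct example_colorop {
	uint32_t id;                     /* object ID */
	enum example_colorop_type type;  /* value of the TYPE property */
	uint32_t next;                   /* value of NEXT, 0 ends the chain */
};

/* Look up a colorop by object ID in the client's cached list. */
static const struct example_colorop *
example_find(const struct example_colorop *ops, size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].id == id)
			return &ops[i];
	return NULL;
}

/*
 * Walk the pipeline starting at first_id and report whether it contains
 * a colorop of the wanted type. The step limit guards against a
 * malformed chain that loops back on itself.
 */
static bool example_pipeline_has(const struct example_colorop *ops, size_t n,
				 uint32_t first_id,
				 enum example_colorop_type wanted)
{
	uint32_t id = first_id;

	for (size_t steps = 0; id != 0 && steps < n; steps++) {
		const struct example_colorop *op = example_find(ops, n, id);

		if (!op)
			return false;
		if (op->type == wanted)
			return true;
		id = op->next;
	}
	return false;
}
```

A client would call example_pipeline_has() once per COLOR_PIPELINE enum value,
passing that value as first_id; the value 0 has no colorops and so never
matches.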
> +
> +An example of chained properties to define an AMD pre-blending color
> +pipeline might look like this::
> +
> +    Plane 10
> +    ├─ "TYPE" (immutable) = Primary
> +    └─ "COLOR_PIPELINE": enum {0, 44} = 0
> +
> +    Color operation 44
> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> +    ├─ "BYPASS": bool
> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> +    └─ "NEXT" (immutable) = 45
> +
> +    Color operation 45
> +    ├─ "TYPE" (immutable) = 3x4 Matrix
> +    ├─ "BYPASS": bool
> +    ├─ "MATRIX_3_4": blob
> +    └─ "NEXT" (immutable) = 46
> +
> +    Color operation 46
> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> +    ├─ "BYPASS": bool
> +    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB EOTF
> +    └─ "NEXT" (immutable) = 47
> +
> +    Color operation 47
> +    ├─ "TYPE" (immutable) = 1D LUT
> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> +    ├─ "LUT_1D_DATA": blob
> +    └─ "NEXT" (immutable) = 48
> +
> +    Color operation 48
> +    ├─ "TYPE" (immutable) = 3D LUT
> +    ├─ "LUT_3D_SIZE" (immutable) = 17
> +    ├─ "LUT_3D_DATA": blob
> +    └─ "NEXT" (immutable) = 49
> +
> +    Color operation 49
> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> +    ├─ "BYPASS": bool
> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> +    └─ "NEXT" (immutable) = 0
> +
> +
> +Color Pipeline Programming
> +==========================
> +
> +Once a DRM client has found a suitable pipeline it will:
> +
> +1. Set the COLOR_PIPELINE enum value to the one pointing at the first
> +   drm_colorop object of the desired pipeline
> +2. Set the properties for all drm_colorop objects in the pipeline to the
> +   desired values, setting BYPASS to true for unused drm_colorop blocks,
> +   and false for enabled drm_colorop blocks
> +3. Perform atomic_check/commit as desired
> +
> +To configure the pipeline for an HDR10 PQ plane and blending in linear
> +space, a compositor might perform an atomic commit with the following
> +property values::
> +
> +    Plane 10
> +    └─ "COLOR_PIPELINE" = 42
> +
> +    Color operation 42 (input CSC)
> +    └─ "BYPASS" = true
> +
> +    Color operation 44 (DeGamma)
> +    └─ "BYPASS" = true
> +
> +    Color operation 45 (gamut remap)
> +    └─ "BYPASS" = true
> +
> +    Color operation 46 (shaper LUT RAM)
> +    └─ "BYPASS" = true
> +
> +    Color operation 47 (3D LUT RAM)
> +    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
> +
> +    Color operation 48 (blend gamma)
> +    └─ "CURVE_1D_TYPE" = PQ EOTF
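
To make the BYPASS part of the programming steps concrete, here is a minimal
sketch that derives one BYPASS assignment per colorop in the selected pipeline.
The structures and example_build_bypass() are hypothetical and model only the
client-side bookkeeping; nothing here talks to a device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one colorop in the chosen pipeline. */
struct example_colorop {
	uint32_t id;    /* object ID */
	uint32_t next;  /* NEXT property, 0 ends the chain */
	bool in_use;    /* does the client want this operation? */
};

/* One pending "colorop.BYPASS = value" assignment for an atomic request. */
struct example_bypass_assignment {
	uint32_t colorop_id;
	bool bypass;
};

/*
 * Fill out[] with one BYPASS assignment per colorop in the chain starting
 * at first_id: false for operations the client uses, true otherwise.
 * Returns the number of assignments written, at most max_out.
 */
static size_t example_build_bypass(const struct example_colorop *ops, size_t n,
				   uint32_t first_id,
				   struct example_bypass_assignment *out,
				   size_t max_out)
{
	size_t count = 0;
	uint32_t id = first_id;

	while (id != 0 && count < max_out) {
		const struct example_colorop *op = NULL;

		for (size_t i = 0; i < n; i++)
			if (ops[i].id == id)
				op = &ops[i];
		if (!op)
			break;

		out[count].colorop_id = op->id;
		out[count].bypass = !op->in_use;
		count++;
		id = op->next;
	}
	return count;
}
```

In a real client each assignment would presumably become one property on the
atomic request, for example through libdrm's drmModeAtomicAddProperty(),
alongside the COLOR_PIPELINE value on the plane and the data properties of the
enabled colorops.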
> +
> +
> +Driver Implementer's Guide
> +==========================
> +
> +What does this all mean for driver implementations? As noted above the
> +colorops can map to HW directly but don't need to do so. Here are some
> +suggestions on how to think about creating your color pipelines:
> +
> +- Try to expose pipelines that use already defined colorops, even if
> +  your hardware pipeline is split differently. This allows existing
> +  userspace to immediately take advantage of the hardware.
> +
> +- Additionally, try to expose your actual hardware blocks as colorops.
> +  Define new colorop types where you believe it can offer significant
> +  benefits if userspace learns to program them.
> +
> +- Avoid defining new colorops for compound operations with very narrow
> +  scope. If you have a hardware block for a special operation that
> +  cannot be split further, you can expose that as a new colorop type.
> +  However, try to not define colorops for "use cases", especially if
> +  they require you to combine multiple hardware blocks.
> +
> +- Design new colorops as prescriptive, not descriptive; by the
> +  mathematical formula, not by the assumed input and output.
> +
> +A defined colorop type must be deterministic. Its operation can depend
> +only on its properties and input and nothing else, allowed error
> +tolerance notwithstanding.
> +
> +
> +Driver Forward/Backward Compatibility
> +=====================================
> +
> +As this is uAPI drivers can't regress color pipelines that have been
> +introduced for a given HW generation. New HW generations are free to
> +abandon color pipelines advertised for previous generations.
> +Nevertheless, it can be beneficial to carry support for existing color
> +pipelines forward as those will likely already have support in DRM
> +clients.
> +
> +Introducing new colorops to a pipeline is fine, as long as they can be
> +disabled or are purely informational. DRM clients implementing support
> +for the pipeline can always skip unknown properties as long as they can
> +be confident that doing so will not cause unexpected results.
> +
> +If a new colorop doesn't fall into one of the above categories
> +(bypassable or informational) the modified pipeline would be unusable
> +for user space. In this case a new pipeline should be defined.
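
One way a client might apply the compatibility rule above is sketched here. It
assumes a hypothetical cached model in which the client records, per colorop,
whether it recognizes the TYPE and whether a BYPASS property exists. Purely
informational colorops are left out of the model for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side view of a colorop. */
struct example_colorop {
	uint32_t id;      /* object ID */
	uint32_t next;    /* NEXT property, 0 ends the chain */
	bool type_known;  /* client understands this TYPE */
	bool has_bypass;  /* colorop exposes a BYPASS property */
};

/*
 * A pipeline is treated as usable when every colorop the client does not
 * understand can be bypassed. Missing objects and chains longer than the
 * number of cached colorops are treated as unusable.
 */
static bool example_pipeline_usable(const struct example_colorop *ops,
				    size_t n, uint32_t first_id)
{
	uint32_t id = first_id;

	for (size_t steps = 0; id != 0; steps++) {
		const struct example_colorop *op = NULL;

		if (steps >= n)
			return false;
		for (size_t i = 0; i < n; i++)
			if (ops[i].id == id)
				op = &ops[i];
		if (!op)
			return false;
		if (!op->type_known && !op->has_bypass)
			return false;
		id = op->next;
	}
	return true;
}
```

With this check a pipeline that gains a new, bypassable colorop stays usable
for an older client, while one that gains a non-bypassable unknown colorop is
skipped in favor of another pipeline.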

Thanks again for this nice documentation and capturing all the details clearly.

Regards,
Uma Shankar

> +
> +References
> +==========
> +
> +1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
> \ No newline at end of file
> --
> 2.42.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-08 12:18   ` Shankar, Uma
@ 2023-11-08 13:43     ` Joshua Ashton
  2023-11-09 10:17       ` Shankar, Uma
  2023-11-08 14:37     ` Harry Wentland
  1 sibling, 1 reply; 49+ messages in thread
From: Joshua Ashton @ 2023-11-08 13:43 UTC (permalink / raw)
  To: Shankar, Uma, Harry Wentland, dri-devel
  Cc: Sebastian Wick, Sasha McIntosh, Pekka Paalanen, Abhinav Kumar,
	Shashank Sharma, Xaver Hugl, Hector Martin, Liviu Dudau,
	Michel Dänzer, wayland-devel, Melissa Wen, Jonas Ådahl,
	Arthur Grillo, Victoria Brekenfeld, Aleix Pol, Naseer Ahmed,
	Christopher Braga



On 11/8/23 12:18, Shankar, Uma wrote:
> 
> 
>> -----Original Message-----
>> From: Harry Wentland <harry.wentland@amd.com>
>> Sent: Friday, October 20, 2023 2:51 AM
>> To: dri-devel@lists.freedesktop.org
>> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
>> <harry.wentland@amd.com>; Ville Syrjala <ville.syrjala@linux.intel.com>; Pekka
>> Paalanen <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
>> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian
>> Wick <sebastian.wick@redhat.com>; Shashank Sharma
>> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Joshua
>> Ashton <joshua@froggi.es>; Michel Dänzer <mdaenzer@redhat.com>; Aleix Pol
>> <aleixpol@kde.org>; Xaver Hugl <xaver.hugl@gmail.com>; Victoria Brekenfeld
>> <victoria@system76.com>; Sima <daniel@ffwll.ch>; Shankar, Uma
>> <uma.shankar@intel.com>; Naseer Ahmed <quic_naseer@quicinc.com>;
>> Christopher Braga <quic_cbraga@quicinc.com>; Abhinav Kumar
>> <quic_abhinavk@quicinc.com>; Arthur Grillo <arthurgrillo@riseup.net>; Hector
>> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha
>> McIntosh <sashamcintosh@google.com>
>> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
>> pipeline is needed
>>
>> v2:
>>   - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>   - Updated wording (Pekka)
>>   - Change BYPASS wording to make it non-mandatory (Sebastian)
>>   - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>     section (Pekka)
>>   - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>   - Add "Driver Implementer's Guide" section (Pekka)
>>   - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>
>> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>> Cc: Simon Ser <contact@emersion.fr>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Melissa Wen <mwen@igalia.com>
>> Cc: Jonas Ådahl <jadahl@redhat.com>
>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Cc: Alexander Goins <agoins@nvidia.com>
>> Cc: Joshua Ashton <joshua@froggi.es>
>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>> Cc: Aleix Pol <aleixpol@kde.org>
>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>> Cc: Victoria Brekenfeld <victoria@system76.com>
>> Cc: Sima <daniel@ffwll.ch>
>> Cc: Uma Shankar <uma.shankar@intel.com>
>> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
>> Cc: Christopher Braga <quic_cbraga@quicinc.com>
>> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
>> Cc: Arthur Grillo <arthurgrillo@riseup.net>
>> Cc: Hector Martin <marcan@marcan.st>
>> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
>> Cc: Sasha McIntosh <sashamcintosh@google.com>
>> ---
>>   Documentation/gpu/rfc/color_pipeline.rst | 347 +++++++++++++++++++++++
>>   1 file changed, 347 insertions(+)
>>   create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
>>
>> diff --git a/Documentation/gpu/rfc/color_pipeline.rst b/Documentation/gpu/rfc/color_pipeline.rst
>> new file mode 100644
>> index 000000000000..af5f2ea29116
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/color_pipeline.rst
>> @@ -0,0 +1,347 @@
>> +========================
>> +Linux Color Pipeline API
>> +========================
>> +
>> +What problem are we solving?
>> +============================
>> +
>> +We would like to support pre-, and post-blending complex color
>> +transformations in display controller hardware in order to allow for
>> +HW-supported HDR use-cases, as well as to provide support to
>> +color-managed applications, such as video or image editors.
>> +
>> +It is possible to support an HDR output on HW supporting the Colorspace
>> +and HDR Metadata drm_connector properties, but that requires the
>> +compositor or application to render and compose the content into one
>> +final buffer intended for display. Doing so is costly.
>> +
>> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and
>> +other operations to support color transformations. These operations are
>> +often implemented in fixed-function HW and therefore much more power
>> +efficient than performing similar operations via shaders or CPU.
>> +
>> +We would like to make use of this HW functionality to support complex
>> +color transformations with no, or minimal CPU or shader load.
>> +
>> +
>> +How are other OSes solving this problem?
>> +========================================
>> +
>> +The most widely supported use-cases regard HDR content, whether video
>> +or gaming.
>> +
>> +Most OSes will specify the source content format (color gamut, encoding
>> +transfer function, and other metadata, such as max and average light
>> +levels) to a driver.
>> +Drivers will then program their fixed-function HW accordingly to map
>> +from a source content buffer's space to a display's space.
>> +
>> +When fixed-function HW is not available the compositor will assemble a
>> +shader to ask the GPU to perform the transformation from the source
>> +content format to the display's format.
>> +
>> +A compositor's mapping function and a driver's mapping function are
>> +usually entirely separate concepts. On OSes where a HW vendor has no
>> +insight into closed-source compositor code such a vendor will tune
>> +their color management code to visually match the compositor's. On
>> +other OSes, where both mapping functions are open to an implementer,
>> +they will ensure both mappings match.
>> +
>> +This results in mapping algorithm lock-in, meaning that no-one alone
>> +can experiment with or introduce new mapping algorithms and achieve
>> +consistent results regardless of which implementation path is taken.
>> +
>> +Why is Linux different?
>> +=======================
>> +
>> +Unlike other OSes, where there is one compositor for one or more
>> +drivers, on Linux we have a many-to-many relationship. Many
>> +compositors; many drivers.
>> +In addition each compositor vendor or community has their own view of
>> +how color management should be done. This is what makes Linux so beautiful.
>> +
>> +This means that a HW vendor can now no longer tune their driver to one
>> +compositor, as tuning it to one could make it look fairly different
>> +from another compositor's color mapping.
>> +
>> +We need a better solution.
>> +
>> +
>> +Descriptive API
>> +===============
>> +
>> +An API that describes the source and destination colorspaces is a
>> +descriptive API. It describes the input and output color spaces but
>> +does not describe how precisely they should be mapped. Such a mapping
>> +includes many minute design decisions that can greatly affect the look
>> +of the final result.
>> +
>> +It is not feasible to describe such mapping with enough detail to
>> +ensure the same result from each implementation. In fact, these
>> +mappings are a very active research area.
>> +
>> +
>> +Prescriptive API
>> +================
>> +
>> +A prescriptive API describes not the source and destination
>> +colorspaces. It instead prescribes a recipe for how to manipulate pixel
>> +values to arrive at the desired outcome.
>> +
>> +This recipe is generally an ordered list of straight-forward
>> +operations, with clear mathematical definitions, such as 1D LUTs, 3D
>> +LUTs, matrices, or other operations that can be described in a precise manner.
>> +
>> +
>> +The Color Pipeline API
>> +======================
>> +
>> +HW color management pipelines can significantly differ between HW
>> +vendors in terms of availability, ordering, and capabilities of HW
>> +blocks. This makes a common definition of color management blocks and
>> +their ordering nigh impossible. Instead we are defining an API that
>> +allows user space to discover the HW capabilities in a generic manner,
>> +agnostic of specific drivers and hardware.
>> +
>> +
>> +drm_colorop Object & IOCTLs
>> +===========================
>> +
>> +To support the definition of color pipelines we define the DRM core
>> +object type drm_colorop. Individual drm_colorop objects will be chained
>> +via the NEXT property of a drm_colorop to constitute a color pipeline.
>> +Each drm_colorop object is unique, i.e., even if multiple color
>> +pipelines have the same operation they won't share the same drm_colorop
>> +object to describe that operation.
>> +
>> +Note that drivers are not expected to map drm_colorop objects
>> +statically to specific HW blocks. The mapping of drm_colorop objects is
>> +entirely a driver-internal detail and can be as dynamic or static as a
>> +driver needs it to be. See more in the Driver Implementer's Guide
>> +section below.
>> +
>> +Just like other DRM objects the drm_colorop objects are discovered via
>> +IOCTLs:
>> +
>> +DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to retrieve the
>> +number of all drm_colorop objects.
>> +
>> +DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one drm_colorop.
>> +It includes the ID for the colorop object, as well as the plane_id of
>> +the associated plane. All other values should be registered as
>> +properties.
>> +
>> +Each drm_colorop has three core properties:
>> +
>> +TYPE: The type of transformation, such as
>> +* enumerated curve
>> +* custom (uniform) 1D LUT
>> +* 3x3 matrix
>> +* 3x4 matrix
>> +* 3D LUT
>> +* etc.
>> +
>> +Depending on the type of transformation other properties will describe
>> +more details.
>> +
>> +BYPASS: A boolean property that can be used to easily put a block into
>> +bypass mode. While setting other properties might fail atomic check,
>> +setting the BYPASS property to true should never fail. The BYPASS
>> +property is not mandatory for a colorop, as long as the entire pipeline
>> +can get bypassed by setting the COLOR_PIPELINE on a plane to '0'.
>> +
>> +NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if this
>> +drm_colorop is the last in the chain.
>> +
>> +An example of a drm_colorop object might look like one of these::
>> +
>> +    /* 1D enumerated curve */
>> +    Color operation 42
>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>> +    ├─ "BYPASS": bool {true, false}
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> 
> Having the fixed function enum for some targeted input/output may not be scalable
> for all usecases. There are multiple colorspaces and transfer functions possible,
> so it will not be possible to cover all these by any enum definitions. Also, this will
> depend on the capabilities of respective hardware from various vendors.

The reason this exists is that certain HW vendors, such as AMD, have
transfer functions implemented in HW. It is important to take advantage
of these for both precision and power reasons.

Additionally, not every vendor implements bucketed/segmented LUTs the
same way, so it's not feasible to expose that in a way that's
particularly useful or not vendor-specific.

Thus we decided to have a regular 1D LUT modulated onto a known curve.
This is the only real cross-vendor solution here that allows HW curve
implementations to be taken advantage of and also works with
bucketed/segmented LUTs (including for vendors we are not aware of yet).

This also means that vendors that only support HW curves at some stages 
without an actual LUT are also serviced.

You are right that there *might* be some usecase not covered by this 
right now, and that it would need kernel churn to implement new curves, 
but unfortunately that's the compromise that we (so-far) have decided on 
in order to ensure everyone can have good, precise, power-efficient support.

It is always possible for us to extend the uAPI at a later date for 
other curves, or other properties that might expose a generic segmented 
LUT interface (such as what you have proposed for a while) for vendors 
that can support it.
(With the whole color pipeline thing, we can essentially do 'versioning' 
with that, if we wanted a new 1D LUT type.)

Thanks!
- Joshie 🐸✨

> 
>> +    └─ "NEXT": immutable color operation ID = 43	
>> +
>> +    /* custom 4k entry 1D LUT */
>> +    Color operation 52
>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
>> +    ├─ "BYPASS": bool {true, false}
>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> 
> For the size and capability of an individual LUT block, it would be good to
> expose this as a blob, as in the blob approach we were planning earlier. Only
> that part of the series would be needed to make capability detection generic. See:
> https://patchwork.freedesktop.org/patch/554855/?series=123023&rev=1
> 
> Basically, use this structure for lut capability and arrangement:
> struct drm_color_lut_range {
> 	/* DRM_MODE_LUT_* */
> 	__u32 flags;
> 	/* number of points on the curve */
> 	__u16 count;
> 	/* input/output bits per component */
> 	__u8 input_bpc, output_bpc;
> 	/* input start/end values */
> 	__s32 start, end;
> 	/* output min/max values */
> 	__s32 min, max;
> };
> 
> If the intention is to have just 1 segment with 4096, it can be easily described there.
> Additionally, this can also cater to any kind of lut arrangement, PWL, segmented or logarithmic.
> 
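For reference, here is a rough sketch of how the single 4096-entry case could be expressed with that structure. This is only an illustration: the struct below is a userspace mirror using <stdint.h> types, single_segment() is a hypothetical helper, flags is left at 0 because the DRM_MODE_LUT_* values are not shown in this thread, and I am assuming start/end and min/max are integer code values at the stated bit depths, which the quoted struct does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the quoted struct; <stdint.h> types stand in for
 * the kernel's __u32/__u16/__u8/__s32. */
struct lut_range {
	uint32_t flags;      /* DRM_MODE_LUT_* (values not shown in this thread) */
	uint16_t count;      /* number of points on the curve */
	uint8_t input_bpc;   /* input bits per component */
	uint8_t output_bpc;  /* output bits per component */
	int32_t start, end;  /* input start/end values */
	int32_t min, max;    /* output min/max values */
};

/* Hypothetical helper: one uniform segment spanning the full input range.
 * Assumes ranges are integer code values at the given bit depths. */
static struct lut_range single_segment(uint16_t count, uint8_t in_bpc,
				       uint8_t out_bpc)
{
	struct lut_range r = {0};

	assert(count >= 2);
	assert(in_bpc > 0 && in_bpc < 31);
	assert(out_bpc > 0 && out_bpc < 31);

	r.flags = 0; /* assumption: no flags set */
	r.count = count;
	r.input_bpc = in_bpc;
	r.output_bpc = out_bpc;
	r.start = 0;
	r.end = (int32_t)((1u << in_bpc) - 1);
	r.min = 0;
	r.max = (int32_t)((1u << out_bpc) - 1);
	return r;
}
```

With those assumptions, single_segment(4096, 12, 16) would describe roughly the same thing as "LUT_1D_SIZE" = 4096, plus the precision information that a bare size cannot carry.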
>> +    ├─ "LUT_1D": blob
>> +    └─ "NEXT": immutable color operation ID = 0
>> +
>> +    /* 17^3 3D LUT */
>> +    Color operation 72
>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
>> +    ├─ "BYPASS": bool {true, false}
>> +    ├─ "LUT_3D_SIZE": immutable range = 17
>> +    ├─ "LUT_3D": blob
>> +    └─ "NEXT": immutable color operation ID = 73
>> +
>> +
>> +COLOR_PIPELINE Plane Property
>> +=============================
>> +
>> +Color Pipelines are created by a driver and advertised via a new
>> +COLOR_PIPELINE enum property on each plane. Values of the property
>> +always include '0', which is the default and means all color processing
>> +is disabled. Additional values will be the object IDs of the first
>> +drm_colorop in a pipeline. A driver can create and advertise none, one,
>> +or more possible color pipelines. A DRM client will select a color
>> +pipeline by setting the COLOR_PIPELINE to the respective value.
>> +
>> +In the case where drivers have custom support for pre-blending color
>> +processing those drivers shall reject atomic commits that are trying to
>> +use both the custom color properties, as well as the COLOR_PIPELINE
>> +property.
>> +
>> +An example of a COLOR_PIPELINE property on a plane might look like this::
>> +
>> +    Plane 10
>> +    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>> +    ├─ …
>> +    └─ "color_pipeline": enum {0, 42, 52} = 0
>> +
>> +
>> +Color Pipeline Discovery
>> +========================
>> +
>> +A DRM client wanting color management on a drm_plane will:
>> +
>> +1. Read all drm_colorop objects
>> +2. Get the COLOR_PIPELINE property of the plane
>> +3. Iterate all COLOR_PIPELINE enum values
>> +4. For each enum value walk the color pipeline (via the NEXT pointers)
>> +   and see if the available color operations are suitable for the
>> +   desired color management operations
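
The discovery steps above can be sketched roughly as follows. This is a mock that works on an in-memory table, not real libdrm code: struct colorop, enum colorop_type, find_colorop() and walk_pipeline() are all hypothetical names, and a real client would fill the table from the colorop objects and properties it read back from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory copy of what a client read back per colorop. */
enum colorop_type {
	COLOROP_1D_CURVE,
	COLOROP_1D_LUT,
	COLOROP_CTM_3X4,
	COLOROP_3D_LUT,
};

struct colorop {
	uint32_t id;
	enum colorop_type type;
	uint32_t next; /* value of the NEXT property; 0 ends the chain */
};

static const struct colorop *find_colorop(const struct colorop *ops,
					  size_t n, uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].id == id)
			return &ops[i];
	return NULL;
}

/* Walk one pipeline, starting at a COLOR_PIPELINE enum value, and note
 * whether it contains a colorop of the wanted type. Returns the number
 * of colorops visited, or -1 on a dangling NEXT or a chain longer than
 * the table (which would indicate a loop). */
static int walk_pipeline(const struct colorop *ops, size_t n, uint32_t first,
			 enum colorop_type wanted, int *found)
{
	int visited = 0;

	assert(ops && found);
	*found = 0;

	for (uint32_t id = first; id != 0; ) {
		const struct colorop *op = find_colorop(ops, n, id);

		if (!op || (size_t)visited >= n)
			return -1;
		if (op->type == wanted)
			*found = 1;
		visited++;
		id = op->next;
	}
	return visited;
}
```

A client would call walk_pipeline() once per COLOR_PIPELINE enum value and keep the first pipeline that offers every operation it needs. The value 0 visits nothing, which matches its meaning of "no color processing".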
>> +
>> +An example of chained properties to define an AMD pre-blending color
>> +pipeline might look like this::
>> +
>> +    Plane 10
>> +    ├─ "TYPE" (immutable) = Primary
>> +    └─ "COLOR_PIPELINE": enum {0, 44} = 0
>> +
>> +    Color operation 44
>> +    ├─ "TYPE" (immutable) = 1D enumerated curve
>> +    ├─ "BYPASS": bool
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
>> +    └─ "NEXT" (immutable) = 45
>> +
>> +    Color operation 45
>> +    ├─ "TYPE" (immutable) = 3x4 Matrix
>> +    ├─ "BYPASS": bool
>> +    ├─ "MATRIX_3_4": blob
>> +    └─ "NEXT" (immutable) = 46
>> +
>> +    Color operation 46
>> +    ├─ "TYPE" (immutable) = 1D enumerated curve
>> +    ├─ "BYPASS": bool
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB EOTF
>> +    └─ "NEXT" (immutable) = 47
>> +
>> +    Color operation 47
>> +    ├─ "TYPE" (immutable) = 1D LUT
>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
>> +    ├─ "LUT_1D_DATA": blob
>> +    └─ "NEXT" (immutable) = 48
>> +
>> +    Color operation 48
>> +    ├─ "TYPE" (immutable) = 3D LUT
>> +    ├─ "LUT_3D_SIZE" (immutable) = 17
>> +    ├─ "LUT_3D_DATA": blob
>> +    └─ "NEXT" (immutable) = 49
>> +
>> +    Color operation 49
>> +    ├─ "TYPE" (immutable) = 1D enumerated curve
>> +    ├─ "BYPASS": bool
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
>> +    └─ "NEXT" (immutable) = 0
>> +
>> +
>> +Color Pipeline Programming
>> +==========================
>> +
>> +Once a DRM client has found a suitable pipeline it will:
>> +
>> +1. Set the COLOR_PIPELINE enum value to the one pointing at the first
>> +   drm_colorop object of the desired pipeline
>> +2. Set the properties for all drm_colorop objects in the pipeline to the
>> +   desired values, setting BYPASS to true for unused drm_colorop blocks,
>> +   and false for enabled drm_colorop blocks
>> +3. Perform atomic_check/commit as desired
>> +
>> +To configure the pipeline for an HDR10 PQ plane and blending in linear
>> +space, a compositor might perform an atomic commit with the following
>> +property values::
>> +
>> +    Plane 10
>> +    └─ "COLOR_PIPELINE" = 42
>> +
>> +    Color operation 42 (input CSC)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 44 (DeGamma)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 45 (gamut remap)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 46 (shaper LUT RAM)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 47 (3D LUT RAM)
>> +    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
>> +
>> +    Color operation 48 (blend gamma)
>> +    └─ "CURVE_1D_TYPE" = PQ EOTF
>> +
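
As a rough illustration of the programming steps, the sketch below builds the list of property writes a client might queue for one commit. It is a simplified mock: struct prop_write, enum prop and build_commit() are hypothetical, it assumes every colorop in the chain exposes BYPASS (which the document says is not mandatory), and it leaves out the per-colorop data such as LUT blobs or curve types. In a real client each entry would presumably become one drmModeAtomicAddProperty() call in libdrm, using property IDs looked up beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for real property IDs. */
enum prop { PROP_COLOR_PIPELINE, PROP_BYPASS };

struct prop_write {
	uint32_t object_id; /* plane or colorop object ID */
	enum prop prop;
	uint64_t value;
};

/* Queue COLOR_PIPELINE on the plane, then BYPASS for every colorop in
 * the chain: false for IDs listed in `used`, true for all others.
 * Returns the number of writes queued, or 0 if `cap` is too small. */
static size_t build_commit(uint32_t plane_id,
			   const uint32_t *chain, size_t chain_len,
			   const uint32_t *used, size_t used_len,
			   struct prop_write *out, size_t cap)
{
	size_t n = 0;

	assert(chain && out && chain_len > 0);
	if (cap < chain_len + 1)
		return 0;

	/* Step 1: select the pipeline by the ID of its first colorop. */
	out[n++] = (struct prop_write){ plane_id, PROP_COLOR_PIPELINE, chain[0] };

	/* Step 2: enable only the colorops the client wants to use. */
	for (size_t i = 0; i < chain_len; i++) {
		uint64_t bypass = 1;

		for (size_t j = 0; j < used_len; j++)
			if (used[j] == chain[i])
				bypass = 0;
		out[n++] = (struct prop_write){ chain[i], PROP_BYPASS, bypass };
	}
	return n;
}
```

Step 3, the atomic check or commit itself, is left out because it depends on the real DRM file descriptor and request object.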
>> +
>> +Driver Implementer's Guide
>> +==========================
>> +
>> +What does this all mean for driver implementations? As noted above the
>> +colorops can map to HW directly but don't need to do so. Here are some
>> +suggestions on how to think about creating your color pipelines:
>> +
>> +- Try to expose pipelines that use already defined colorops, even if
>> +  your hardware pipeline is split differently. This allows existing
>> +  userspace to immediately take advantage of the hardware.
>> +
>> +- Additionally, try to expose your actual hardware blocks as colorops.
>> +  Define new colorop types where you believe it can offer significant
>> +  benefits if userspace learns to program them.
>> +
>> +- Avoid defining new colorops for compound operations with very narrow
>> +  scope. If you have a hardware block for a special operation that
>> +  cannot be split further, you can expose that as a new colorop type.
>> +  However, try to not define colorops for "use cases", especially if
>> +  they require you to combine multiple hardware blocks.
>> +
>> +- Design new colorops as prescriptive, not descriptive; by the
>> +  mathematical formula, not by the assumed input and output.
>> +
>> +A defined colorop type must be deterministic. Its operation can depend
>> +only on its properties and input and nothing else, allowed error
>> +tolerance notwithstanding.
>> +
>> +
>> +Driver Forward/Backward Compatibility
>> +=====================================
>> +
>> +As this is uAPI drivers can't regress color pipelines that have been
>> +introduced for a given HW generation. New HW generations are free to
>> +abandon color pipelines advertised for previous generations.
>> +Nevertheless, it can be beneficial to carry support for existing color
>> +pipelines forward as those will likely already have support in DRM
>> +clients.
>> +
>> +Introducing new colorops to a pipeline is fine, as long as they can be
>> +disabled or are purely informational. DRM clients implementing support
>> +for the pipeline can always skip unknown properties as long as they can
>> +be confident that doing so will not cause unexpected results.
>> +
>> +If a new colorop doesn't fall into one of the above categories
>> +(bypassable or informational) the modified pipeline would be unusable
>> +for user space. In this case a new pipeline should be defined.
> 
> Thanks again for this nice documentation and capturing all the details clearly.
> 
> Regards,
> Uma Shankar
> 
>> +
>> +References
>> +==========
>> +
>> +1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
>> \ No newline at end of file
>> --
>> 2.42.0
> 




* Re: [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS
  2023-11-08 11:54 ` [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Shankar, Uma
@ 2023-11-08 14:32   ` Harry Wentland
  0 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-11-08 14:32 UTC (permalink / raw)
  To: Shankar, Uma, dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Aleix Pol, Christopher Braga,
	Pekka Paalanen, Hector Martin, Xaver Hugl, Joshua Ashton



On 2023-11-08 06:54, Shankar, Uma wrote:
> 
> 
>> -----Original Message-----
>> From: Harry Wentland <harry.wentland@amd.com>
>> Sent: Friday, October 20, 2023 2:51 AM
>> To: dri-devel@lists.freedesktop.org
>> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
>> <harry.wentland@amd.com>; Ville Syrjala <ville.syrjala@linux.intel.com>; Pekka
>> Paalanen <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
>> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian
>> Wick <sebastian.wick@redhat.com>; Shashank Sharma
>> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Joshua
>> Ashton <joshua@froggi.es>; Michel Dänzer <mdaenzer@redhat.com>; Aleix Pol
>> <aleixpol@kde.org>; Xaver Hugl <xaver.hugl@gmail.com>; Victoria Brekenfeld
>> <victoria@system76.com>; Sima <daniel@ffwll.ch>; Shankar, Uma
>> <uma.shankar@intel.com>; Naseer Ahmed <quic_naseer@quicinc.com>;
>> Christopher Braga <quic_cbraga@quicinc.com>; Abhinav Kumar
>> <quic_abhinavk@quicinc.com>; Arthur Grillo <arthurgrillo@riseup.net>; Hector
>> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha
>> McIntosh <sashamcintosh@google.com>
>> Subject: [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS
>>
>> This is an early RFC set for a color pipeline API, along with a sample
>> implementation in VKMS. All the key API bits are here.
>> VKMS now supports two named transfer function colorops and we have an IGT
>> test that confirms that sRGB EOTF, followed by its inverse gives us expected
>> results within +/- 1 8 bpc codepoint value.
>>
>> This patchset is grouped as follows:
>>  - Patches 1-2: couple general patches/fixes
>>  - Patches 3-5: introduce kunit to VKMS
>>  - Patch 6: description of motivation and details behind the
>>             Color Pipeline API. If you're reading nothing else
>>             but are interested in the topic I highly recommend
>>             you take a look at this.
>>  - Patches 7-15: Add core DRM API bits
>>  - Patches 16-17: VKMS implementation
>>
>> There are plenty of things that I would like to see here but haven't had a chance
>> to look at. These will (hopefully) be addressed in future iterations:
>>  - Abandon IOCTLs and discover colorops as clients iterate the pipeline
>>  - Add color_pipeline client cap and deprecate existing color encoding and
>>    color range properties.
>>    See https://lists.freedesktop.org/archives/dri-devel/2023-September/422643.html
>>  - Add CTM colorop to VKMS
>>  - Add custom LUT colorops to VKMS
>>  - Add pre-blending 3DLUT with tetrahedral interpolation to VKMS
>>  - How to support HW which can't bypass entire pipeline?
>>  - Add ability to create colorops that don't have BYPASS
>>  - Can we do a LOAD / COMMIT model for LUTs (and other properties)?
>>
>> IGT tests can be found at
>> https://gitlab.freedesktop.org/hwentland/igt-gpu-tools/-/merge_requests/1
>>
>> IGT patches are also being sent to the igt-dev mailing list.
>>
>> libdrm changes to support the new IOCTLs are at
>> https://gitlab.freedesktop.org/hwentland/drm/-/merge_requests/1
>>
>> If you prefer a gitlab MR for review you can find it at
>> https://gitlab.freedesktop.org/hwentland/linux/-/merge_requests/5
>>
>> A slightly different approach for a Color Pipeline API was sent by Uma Shankar
>> and can be found at https://patchwork.freedesktop.org/series/123024/
>>
>> The main difference is that his approach is not introducing a new DRM core object
>> but instead exposes color pipelines via blob properties.
>> There are pros and cons to both approaches.
> 
> Thanks Harry and all others who have actively contributed to the design and
> discussions thus far.
> 
> Due to other commitments, we couldn't participate in XDC this time, and we
> apologize for the delay on our part.
> 
> We looked at the approach and are aligned on going with the property-based
> design, with some suggestions that will follow as comments on the respective
> patches. We are also in the process of trying this on Intel's hardware to
> identify any gaps.
> 

That's great to hear. Thanks, Uma.

Harry

> Regards,
> Uma Shankar
> 
>> v2:
>>  - Rebased on drm-misc-next
>>  - Introduce a VKMS Kunit so we can test LUT functionality in vkms_composer
>>  - Incorporate feedback in color_pipeline.rst doc
>>  - Add support for sRGB inverse EOTF
>>  - Add 2nd enumerated TF colorop to VKMS
>>  - Fix LUTs and some issues with applying LUTs in VKMS
>>
>> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>> Cc: Simon Ser <contact@emersion.fr>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Melissa Wen <mwen@igalia.com>
>> Cc: Jonas Ådahl <jadahl@redhat.com>
>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Cc: Alexander Goins <agoins@nvidia.com>
>> Cc: Joshua Ashton <joshua@froggi.es>
>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>> Cc: Aleix Pol <aleixpol@kde.org>
>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>> Cc: Victoria Brekenfeld <victoria@system76.com>
>> Cc: Sima <daniel@ffwll.ch>
>> Cc: Uma Shankar <uma.shankar@intel.com>
>> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
>> Cc: Christopher Braga <quic_cbraga@quicinc.com>
>> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
>> Cc: Arthur Grillo <arthurgrillo@riseup.net>
>> Cc: Hector Martin <marcan@marcan.st>
>> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
>> Cc: Sasha McIntosh <sashamcintosh@google.com>
>>
>> Harry Wentland (17):
>>   drm/atomic: Allow get_value for immutable properties on atomic drivers
>>   drm: Don't treat 0 as -1 in drm_fixp2int_ceil
>>   drm/vkms: Create separate Kconfig file for VKMS
>>   drm/vkms: Add kunit tests for VKMS LUT handling
>>   drm/vkms: Avoid reading beyond LUT array
>>   drm/doc/rfc: Describe why prescriptive color pipeline is needed
>>   drm/colorop: Introduce new drm_colorop mode object
>>   drm/colorop: Add TYPE property
>>   drm/color: Add 1D Curve subtype
>>   drm/colorop: Add BYPASS property
>>   drm/colorop: Add NEXT property
>>   drm/colorop: Add atomic state print for drm_colorop
>>   drm/colorop: Add new IOCTLs to retrieve drm_colorop objects
>>   drm/plane: Add COLOR PIPELINE property
>>   drm/colorop: Add NEXT to colorop state print
>>   drm/vkms: Add enumerated 1D curve colorop
>>   drm/vkms: Add kunit tests for linear and sRGB LUTs
>>
>>  Documentation/gpu/rfc/color_pipeline.rst      | 347 ++++++++
>>  drivers/gpu/drm/Kconfig                       |  14 +-
>>  drivers/gpu/drm/Makefile                      |   1 +
>>  drivers/gpu/drm/drm_atomic.c                  | 155 ++++
>>  drivers/gpu/drm/drm_atomic_helper.c           |  12 +
>>  drivers/gpu/drm/drm_atomic_state_helper.c     |   5 +
>>  drivers/gpu/drm/drm_atomic_uapi.c             | 110 +++
>>  drivers/gpu/drm/drm_colorop.c                 | 384 +++++++++
>>  drivers/gpu/drm/drm_crtc_internal.h           |   4 +
>>  drivers/gpu/drm/drm_ioctl.c                   |   5 +
>>  drivers/gpu/drm/drm_mode_config.c             |   7 +
>>  drivers/gpu/drm/drm_mode_object.c             |   3 +-
>>  drivers/gpu/drm/drm_plane_helper.c            |   2 +-
>>  drivers/gpu/drm/vkms/Kconfig                  |  20 +
>>  drivers/gpu/drm/vkms/Makefile                 |   6 +-
>>  drivers/gpu/drm/vkms/tests/.kunitconfig       |   4 +
>>  drivers/gpu/drm/vkms/tests/Makefile           |   4 +
>>  drivers/gpu/drm/vkms/tests/vkms_color_tests.c | 100 +++
>>  drivers/gpu/drm/vkms/vkms_colorop.c           |  85 ++
>>  drivers/gpu/drm/vkms/vkms_composer.c          |  77 +-
>>  drivers/gpu/drm/vkms/vkms_composer.h          |  25 +
>>  drivers/gpu/drm/vkms/vkms_drv.h               |   4 +
>>  drivers/gpu/drm/vkms/vkms_luts.c              | 802 ++++++++++++++++++
>>  drivers/gpu/drm/vkms/vkms_luts.h              |  12 +
>>  drivers/gpu/drm/vkms/vkms_plane.c             |   2 +
>>  include/drm/drm_atomic.h                      |  82 ++
>>  include/drm/drm_atomic_uapi.h                 |   3 +
>>  include/drm/drm_colorop.h                     | 235 +++++
>>  include/drm/drm_fixed.h                       |   2 +-
>>  include/drm/drm_mode_config.h                 |  18 +
>>  include/drm/drm_plane.h                       |  10 +
>>  include/uapi/drm/drm.h                        |   3 +
>>  include/uapi/drm/drm_mode.h                   |  22 +
>>  33 files changed, 2530 insertions(+), 35 deletions(-)
>>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
>>  create mode 100644 drivers/gpu/drm/drm_colorop.c
>>  create mode 100644 drivers/gpu/drm/vkms/Kconfig
>>  create mode 100644 drivers/gpu/drm/vkms/tests/.kunitconfig
>>  create mode 100644 drivers/gpu/drm/vkms/tests/Makefile
>>  create mode 100644 drivers/gpu/drm/vkms/tests/vkms_color_tests.c
>>  create mode 100644 drivers/gpu/drm/vkms/vkms_colorop.c
>>  create mode 100644 drivers/gpu/drm/vkms/vkms_composer.h
>>  create mode 100644 drivers/gpu/drm/vkms/vkms_luts.c
>>  create mode 100644 drivers/gpu/drm/vkms/vkms_luts.h
>>  create mode 100644 include/drm/drm_colorop.h
>>
>> --
>> 2.42.0
> 



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-08 12:18   ` Shankar, Uma
  2023-11-08 13:43     ` Joshua Ashton
@ 2023-11-08 14:37     ` Harry Wentland
  2023-11-09 10:24       ` Shankar, Uma
  1 sibling, 1 reply; 49+ messages in thread
From: Harry Wentland @ 2023-11-08 14:37 UTC (permalink / raw)
  To: Shankar, Uma, dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Aleix Pol, Christopher Braga,
	Pekka Paalanen, Hector Martin, Xaver Hugl, Joshua Ashton



On 2023-11-08 07:18, Shankar, Uma wrote:
> 
> 
>> -----Original Message-----
>> From: Harry Wentland <harry.wentland@amd.com>
>> Sent: Friday, October 20, 2023 2:51 AM
>> To: dri-devel@lists.freedesktop.org
>> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
>> <harry.wentland@amd.com>; Ville Syrjala <ville.syrjala@linux.intel.com>; Pekka
>> Paalanen <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
>> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian
>> Wick <sebastian.wick@redhat.com>; Shashank Sharma
>> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Joshua
>> Ashton <joshua@froggi.es>; Michel Dänzer <mdaenzer@redhat.com>; Aleix Pol
>> <aleixpol@kde.org>; Xaver Hugl <xaver.hugl@gmail.com>; Victoria Brekenfeld
>> <victoria@system76.com>; Sima <daniel@ffwll.ch>; Shankar, Uma
>> <uma.shankar@intel.com>; Naseer Ahmed <quic_naseer@quicinc.com>;
>> Christopher Braga <quic_cbraga@quicinc.com>; Abhinav Kumar
>> <quic_abhinavk@quicinc.com>; Arthur Grillo <arthurgrillo@riseup.net>; Hector
>> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha
>> McIntosh <sashamcintosh@google.com>
>> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
>> pipeline is needed
>>
>> v2:
>>  - Update colorop visualizations to match reality (Sebastian, Alex Hung)
>>  - Updated wording (Pekka)
>>  - Change BYPASS wording to make it non-mandatory (Sebastian)
>>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
>>    section (Pekka)
>>  - Use PQ EOTF instead of its inverse in Pipeline Programming example (Melissa)
>>  - Add "Driver Implementer's Guide" section (Pekka)
>>  - Add "Driver Forward/Backward Compatibility" section (Sebastian, Pekka)
>>
>> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
>> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
>> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
>> Cc: Simon Ser <contact@emersion.fr>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Melissa Wen <mwen@igalia.com>
>> Cc: Jonas Ådahl <jadahl@redhat.com>
>> Cc: Sebastian Wick <sebastian.wick@redhat.com>
>> Cc: Shashank Sharma <shashank.sharma@amd.com>
>> Cc: Alexander Goins <agoins@nvidia.com>
>> Cc: Joshua Ashton <joshua@froggi.es>
>> Cc: Michel Dänzer <mdaenzer@redhat.com>
>> Cc: Aleix Pol <aleixpol@kde.org>
>> Cc: Xaver Hugl <xaver.hugl@gmail.com>
>> Cc: Victoria Brekenfeld <victoria@system76.com>
>> Cc: Sima <daniel@ffwll.ch>
>> Cc: Uma Shankar <uma.shankar@intel.com>
>> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
>> Cc: Christopher Braga <quic_cbraga@quicinc.com>
>> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
>> Cc: Arthur Grillo <arthurgrillo@riseup.net>
>> Cc: Hector Martin <marcan@marcan.st>
>> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
>> Cc: Sasha McIntosh <sashamcintosh@google.com>
>> ---
>>  Documentation/gpu/rfc/color_pipeline.rst | 347 +++++++++++++++++++++++
>>  1 file changed, 347 insertions(+)
>>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
>>
>> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
>> b/Documentation/gpu/rfc/color_pipeline.rst
>> new file mode 100644
>> index 000000000000..af5f2ea29116
>> --- /dev/null
>> +++ b/Documentation/gpu/rfc/color_pipeline.rst
>> @@ -0,0 +1,347 @@
>> +========================
>> +Linux Color Pipeline API
>> +========================
>> +
>> +What problem are we solving?
>> +============================
>> +
>> +We would like to support pre-, and post-blending complex color
>> +transformations in display controller hardware in order to allow for
>> +HW-supported HDR use-cases, as well as to provide support to
>> +color-managed applications, such as video or image editors.
>> +
>> +It is possible to support an HDR output on HW supporting the Colorspace
>> +and HDR Metadata drm_connector properties, but that requires the
>> +compositor or application to render and compose the content into one
>> +final buffer intended for display. Doing so is costly.
>> +
>> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices, and
>> +other operations to support color transformations. These operations are
>> +often implemented in fixed-function HW and therefore much more power
>> +efficient than performing similar operations via shaders or CPU.
>> +
>> +We would like to make use of this HW functionality to support complex
>> +color transformations with no, or minimal CPU or shader load.
>> +
>> +
>> +How are other OSes solving this problem?
>> +========================================
>> +
>> +The most widely supported use-cases regard HDR content, whether video
>> +or gaming.
>> +
>> +Most OSes will specify the source content format (color gamut, encoding
>> +transfer function, and other metadata, such as max and average light
>> +levels) to a driver.
>> +Drivers will then program their fixed-function HW accordingly to map
>> +from a source content buffer's space to a display's space.
>> +
>> +When fixed-function HW is not available the compositor will assemble a
>> +shader to ask the GPU to perform the transformation from the source
>> +content format to the display's format.
>> +
>> +A compositor's mapping function and a driver's mapping function are
>> +usually entirely separate concepts. On OSes where a HW vendor has no
>> +insight into closed-source compositor code such a vendor will tune
>> +their color management code to visually match the compositor's. On
>> +other OSes, where both mapping functions are open to an implementer,
>> +they will ensure both mappings match.
>> +
>> +This results in mapping algorithm lock-in, meaning that no-one alone
>> +can experiment with or introduce new mapping algorithms and achieve
>> +consistent results regardless of which implementation path is taken.
>> +
>> +Why is Linux different?
>> +=======================
>> +
>> +Unlike other OSes, where there is one compositor for one or more
>> +drivers, on Linux we have a many-to-many relationship. Many compositors;
>> +many drivers.
>> +In addition each compositor vendor or community has their own view of
>> +how color management should be done. This is what makes Linux so beautiful.
>> +
>> +This means that a HW vendor can now no longer tune their driver to one
>> +compositor, as tuning it to one could make it look fairly different
>> +from another compositor's color mapping.
>> +
>> +We need a better solution.
>> +
>> +
>> +Descriptive API
>> +===============
>> +
>> +An API that describes the source and destination colorspaces is a
>> +descriptive API. It describes the input and output color spaces but
>> +does not describe how precisely they should be mapped. Such a mapping
>> +includes many minute design decisions that can greatly affect the look
>> +of the final result.
>> +
>> +It is not feasible to describe such mapping with enough detail to
>> +ensure the same result from each implementation. In fact, these
>> +mappings are a very active research area.
>> +
>> +
>> +Prescriptive API
>> +================
>> +
>> +A prescriptive API does not describe the source and destination
>> +colorspaces. It instead prescribes a recipe for how to manipulate pixel
>> +values to arrive at the desired outcome.
>> +
>> +This recipe is generally an ordered list of straight-forward
>> +operations, with clear mathematical definitions, such as 1D LUTs, 3D
>> +LUTs, matrices, or other operations that can be described in a precise manner.
>> +
>> +
>> +The Color Pipeline API
>> +======================
>> +
>> +HW color management pipelines can significantly differ between HW
>> +vendors in terms of availability, ordering, and capabilities of HW
>> +blocks. This makes a common definition of color management blocks and
>> +their ordering nigh impossible. Instead we are defining an API that
>> +allows user space to discover the HW capabilities in a generic manner,
>> +agnostic of specific drivers and hardware.
>> +
>> +
>> +drm_colorop Object & IOCTLs
>> +===========================
>> +
>> +To support the definition of color pipelines we define the DRM core
>> +object type drm_colorop. Individual drm_colorop objects will be chained
>> +via the NEXT property of a drm_colorop to constitute a color pipeline.
>> +Each drm_colorop object is unique, i.e., even if multiple color
>> +pipelines have the same operation they won't share the same drm_colorop
>> +object to describe that operation.
>> +
>> +Note that drivers are not expected to map drm_colorop objects
>> +statically to specific HW blocks. The mapping of drm_colorop objects is
>> +entirely a driver-internal detail and can be as dynamic or static as a
>> +driver needs it to be. See more in the Driver Implementer's Guide
>> +section below.
>> +
>> +Just like other DRM objects the drm_colorop objects are discovered via
>> +IOCTLs:
>> +
>> +DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to retrieve the
>> +number of all drm_colorop objects.
>> +
>> +DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one drm_colorop.
>> +It includes the ID for the colorop object, as well as the plane_id of
>> +the associated plane. All other values should be registered as
>> +properties.
>> +
>> +Each drm_colorop has three core properties:
>> +
>> +TYPE: The type of transformation, such as
>> +* enumerated curve
>> +* custom (uniform) 1D LUT
>> +* 3x3 matrix
>> +* 3x4 matrix
>> +* 3D LUT
>> +* etc.
>> +
>> +Depending on the type of transformation other properties will describe
>> +more details.
>> +
>> +BYPASS: A boolean property that can be used to easily put a block into
>> +bypass mode. While setting other properties might fail atomic check,
>> +setting the BYPASS property to true should never fail. The BYPASS
>> +property is not mandatory for a colorop, as long as the entire pipeline
>> +can get bypassed by setting the COLOR_PIPELINE on a plane to '0'.
>> +
>> +NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if this
>> +drm_colorop is the last in the chain.
>> +
>> +An example of a drm_colorop object might look like one of these::
>> +
>> +    /* 1D enumerated curve */
>> +    Color operation 42
>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
>> +    ├─ "BYPASS": bool {true, false}
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> 
> Having the fixed function enum for some targeted input/output may not be scalable
> for all usecases. There are multiple colorspaces and transfer functions possible,
> so it will not be possible to cover all these by any enum definitions. Also, this will
> depend on the capabilities of respective hardware from various vendors.
> 

Agreed, and this is only an example of one TYPE of colorop, the "1D enumerated
curve". There is a place for a "1D LUT", that's a traditional 1D LUT, or even
a "PWL" type, if someone wants to define that.

The beauty with the DRM object and properties approach is that this is extensible
without breaking existing implementations in the kernel or userspace.

>> +    └─ "NEXT": immutable color operation ID = 43
>> +
>> +    /* custom 4k entry 1D LUT */
>> +    Color operation 52
>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
>> +    ├─ "BYPASS": bool {true, false}
>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> 
> For the size and capability of individual LUT block, it would be good to add this
> as a blob as defined in the blob approach we were planning earlier. So just taking
> that part of the series to have this capability detection generic. Refer below:
> https://patchwork.freedesktop.org/patch/554855/?series=123023&rev=1
> 
> Basically, use this structure for lut capability and arrangement:
> struct drm_color_lut_range {
> 	/* DRM_MODE_LUT_* */
> 	__u32 flags;
> 	/* number of points on the curve */
> 	__u16 count;
> 	/* input/output bits per component */
> 	__u8 input_bpc, output_bpc;
> 	/* input start/end values */
> 	__s32 start, end;
> 	/* output min/max values */
> 	__s32 min, max;
> };
> 
> If the intention is to have just 1 segment with 4096, it can be easily described there.
> Additionally, this can also cater to any kind of lut arrangement, PWL, segmented or logarithmic.
> 

Thanks for sharing this again. We've had some discussion about this and it looks
like we definitely want something to describe the range of the domain of the LUT
as well as its output values, maybe also things like clamping. Your struct seems
to cover all of that.
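
To make the single-segment case concrete, here is a rough sketch of how a
4096-entry LUT might be described with that struct. It uses a local copy of the
struct quoted above (with stdint types so it builds outside the kernel tree).
The helper name, the flags value of 0, and the choice of plain integer code
values for start/end and min/max are assumptions for illustration only; the
proposal's actual DRM_MODE_LUT_* flags and value encoding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the struct quoted above, using stdint types. */
struct lut_range {
	uint32_t flags;                /* DRM_MODE_LUT_* in the proposal */
	uint16_t count;                /* number of points on the curve */
	uint8_t input_bpc, output_bpc; /* input/output bits per component */
	int32_t start, end;            /* input start/end values */
	int32_t min, max;              /* output min/max values */
};

/*
 * Hypothetical helper: describe one uniform segment that spans the whole
 * input range. Assumes start/end and min/max are plain integer code values,
 * which is an illustration and not something the thread specifies.
 */
static struct lut_range single_segment(uint16_t count, uint8_t bpc)
{
	struct lut_range r;

	assert(bpc >= 1 && bpc <= 16);

	r.flags = 0;                 /* placeholder, no flags assumed */
	r.count = count;
	r.input_bpc = bpc;
	r.output_bpc = bpc;
	r.start = 0;
	r.end = (1 << bpc) - 1;
	r.min = 0;
	r.max = (1 << bpc) - 1;
	return r;
}
```

Calling single_segment(4096, 16) would describe the "custom 4k entry 1D LUT"
from the example; a segmented or PWL arrangement would presumably be an array
of such entries.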

>> +    ├─ "LUT_1D": blob
>> +    └─ "NEXT": immutable color operation ID = 0
>> +
>> +    /* 17^3 3D LUT */
>> +    Color operation 72
>> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
>> +    ├─ "BYPASS": bool {true, false}
>> +    ├─ "LUT_3D_SIZE": immutable range = 17
>> +    ├─ "LUT_3D": blob
>> +    └─ "NEXT": immutable color operation ID = 73
>> +
>> +
>> +COLOR_PIPELINE Plane Property
>> +=============================
>> +
>> +Color Pipelines are created by a driver and advertised via a new
>> +COLOR_PIPELINE enum property on each plane. Values of the property
>> +always include '0', which is the default and means all color processing
>> +is disabled. Additional values will be the object IDs of the first
>> +drm_colorop in a pipeline. A driver can create and advertise none, one,
>> +or more possible color pipelines. A DRM client will select a color
>> +pipeline by setting the COLOR_PIPELINE to the respective value.
>> +
>> +In the case where drivers have custom support for pre-blending color
>> +processing those drivers shall reject atomic commits that are trying to
>> +use both the custom color properties, as well as the COLOR_PIPELINE
>> +property.
>> +
>> +An example of a COLOR_PIPELINE property on a plane might look like this::
>> +
>> +    Plane 10
>> +    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>> +    ├─ …
>> +    └─ "color_pipeline": enum {0, 42, 52} = 0
>> +
>> +
>> +Color Pipeline Discovery
>> +========================
>> +
>> +A DRM client wanting color management on a drm_plane will:
>> +
>> +1. Read all drm_colorop objects
>> +2. Get the COLOR_PIPELINE property of the plane
>> +3. iterate all COLOR_PIPELINE enum values
>> +4. for each enum value walk the color pipeline (via the NEXT pointers)
>> +   and see if the available color operations are suitable for the
>> +   desired color management operations
>> +
>> +An example of chained properties to define an AMD pre-blending color
>> +pipeline might look like this::
>> +
>> +    Plane 10
>> +    ├─ "TYPE" (immutable) = Primary
>> +    └─ "COLOR_PIPELINE": enum {0, 44} = 0
>> +
>> +    Color operation 44
>> +    ├─ "TYPE" (immutable) = 1D enumerated curve
>> +    ├─ "BYPASS": bool
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
>> +    └─ "NEXT" (immutable) = 45
>> +
>> +    Color operation 45
>> +    ├─ "TYPE" (immutable) = 3x4 Matrix
>> +    ├─ "BYPASS": bool
>> +    ├─ "MATRIX_3_4": blob
>> +    └─ "NEXT" (immutable) = 46
>> +
>> +    Color operation 46
>> +    ├─ "TYPE" (immutable) = 1D enumerated curve
>> +    ├─ "BYPASS": bool
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB Inverse EOTF
>> +    └─ "NEXT" (immutable) = 47
>> +
>> +    Color operation 47
>> +    ├─ "TYPE" (immutable) = 1D LUT
>> +    ├─ "LUT_1D_SIZE": immutable range = 4096
>> +    ├─ "LUT_1D_DATA": blob
>> +    └─ "NEXT" (immutable) = 48
>> +
>> +    Color operation 48
>> +    ├─ "TYPE" (immutable) = 3D LUT
>> +    ├─ "LUT_3D_SIZE" (immutable) = 17
>> +    ├─ "LUT_3D_DATA": blob
>> +    └─ "NEXT" (immutable) = 49
>> +
>> +    Color operation 49
>> +    ├─ "TYPE" (immutable) = 1D enumerated curve
>> +    ├─ "BYPASS": bool
>> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
>> +    └─ "NEXT" (immutable) = 0
>> +
>> +
>> +Color Pipeline Programming
>> +==========================
>> +
>> +Once a DRM client has found a suitable pipeline it will:
>> +
>> +1. Set the COLOR_PIPELINE enum value to the one pointing at the first
>> +   drm_colorop object of the desired pipeline
>> +2. Set the properties for all drm_colorop objects in the pipeline to the
>> +   desired values, setting BYPASS to true for unused drm_colorop blocks,
>> +   and false for enabled drm_colorop blocks
>> +3. Perform atomic_check/commit as desired
>> +
>> +To configure the pipeline for an HDR10 PQ plane and blending in linear
>> +space, a compositor might perform an atomic commit with the following
>> +property values::
>> +
>> +    Plane 10
>> +    └─ "COLOR_PIPELINE" = 42
>> +
>> +    Color operation 42 (input CSC)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 44 (DeGamma)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 45 (gamut remap)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 46 (shaper LUT RAM)
>> +    └─ "BYPASS" = true
>> +
>> +    Color operation 47 (3D LUT RAM)
>> +    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
>> +
>> +    Color operation 48 (blend gamma)
>> +    └─ "CURVE_1D_TYPE" = PQ EOTF
>> +
>> +
>> +Driver Implementer's Guide
>> +==========================
>> +
>> +What does this all mean for driver implementations? As noted above the
>> +colorops can map to HW directly but don't need to do so. Here are some
>> +suggestions on how to think about creating your color pipelines:
>> +
>> +- Try to expose pipelines that use already defined colorops, even if
>> +  your hardware pipeline is split differently. This allows existing
>> +  userspace to immediately take advantage of the hardware.
>> +
>> +- Additionally, try to expose your actual hardware blocks as colorops.
>> +  Define new colorop types where you believe it can offer significant
>> +  benefits if userspace learns to program them.
>> +
>> +- Avoid defining new colorops for compound operations with very narrow
>> +  scope. If you have a hardware block for a special operation that
>> +  cannot be split further, you can expose that as a new colorop type.
>> +  However, try to not define colorops for "use cases", especially if
>> +  they require you to combine multiple hardware blocks.
>> +
>> +- Design new colorops as prescriptive, not descriptive; by the
>> +  mathematical formula, not by the assumed input and output.
>> +
>> +A defined colorop type must be deterministic. Its operation can depend
>> +only on its properties and input and nothing else, allowed error
>> +tolerance notwithstanding.
>> +
>> +
>> +Driver Forward/Backward Compatibility
>> +=====================================
>> +
>> +As this is uAPI drivers can't regress color pipelines that have been
>> +introduced for a given HW generation. New HW generations are free to
>> +abandon color pipelines advertised for previous generations.
>> +Nevertheless, it can be beneficial to carry support for existing color
>> +pipelines forward as those will likely already have support in DRM
>> +clients.
>> +
>> +Introducing new colorops to a pipeline is fine, as long as they can be
>> +disabled or are purely informational. DRM clients implementing support
>> +for the pipeline can always skip unknown properties as long as they can
>> +be confident that doing so will not cause unexpected results.
>> +
>> +If a new colorop doesn't fall into one of the above categories
>> +(bypassable or informational) the modified pipeline would be unusable
>> +for user space. In this case a new pipeline should be defined.
> 
> Thanks again for this nice documentation and capturing all the details clearly.
> 

Thanks for your feedback.

Harry

> Regards,
> Uma Shankar
> 
>> +
>> +References
>> +==========
>> +
>> +1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
>> \ No newline at end of file
>> --
>> 2.42.0
> 



* RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-08 13:43     ` Joshua Ashton
@ 2023-11-09 10:17       ` Shankar, Uma
  2023-11-09 11:55         ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Shankar, Uma @ 2023-11-09 10:17 UTC (permalink / raw)
  To: Joshua Ashton, Harry Wentland, dri-devel
  Cc: Sebastian Wick, Sasha McIntosh, Pekka Paalanen, Abhinav Kumar,
	Shashank Sharma, Xaver Hugl, Hector Martin, Liviu Dudau,
	Michel Dänzer, wayland-devel, Melissa Wen, Jonas Ådahl,
	Arthur Grillo, Victoria Brekenfeld, Aleix Pol, Naseer Ahmed,
	Christopher Braga



> -----Original Message-----
> From: Joshua Ashton <joshua@froggi.es>
> Sent: Wednesday, November 8, 2023 7:13 PM
> To: Shankar, Uma <uma.shankar@intel.com>; Harry Wentland
> <harry.wentland@amd.com>; dri-devel@lists.freedesktop.org
> Cc: wayland-devel@lists.freedesktop.org; Ville Syrjala
> <ville.syrjala@linux.intel.com>; Pekka Paalanen
> <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>; Melissa
> Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian Wick
> <sebastian.wick@redhat.com>; Shashank Sharma
> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Michel
> Dänzer <mdaenzer@redhat.com>; Aleix Pol <aleixpol@kde.org>; Xaver Hugl
> <xaver.hugl@gmail.com>; Victoria Brekenfeld <victoria@system76.com>; Sima
> <daniel@ffwll.ch>; Naseer Ahmed <quic_naseer@quicinc.com>; Christopher
> Braga <quic_cbraga@quicinc.com>; Abhinav Kumar
> <quic_abhinavk@quicinc.com>; Arthur Grillo <arthurgrillo@riseup.net>; Hector
> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha
> McIntosh <sashamcintosh@google.com>
> Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> 
> 
> On 11/8/23 12:18, Shankar, Uma wrote:
> >
> >
> >> -----Original Message-----
> >> From: Harry Wentland <harry.wentland@amd.com>
> >> Sent: Friday, October 20, 2023 2:51 AM
> >> To: dri-devel@lists.freedesktop.org
> >> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
> >> <harry.wentland@amd.com>; Ville Syrjala
> >> <ville.syrjala@linux.intel.com>; Pekka Paalanen
> >> <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
> >> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>;
> >> Sebastian Wick <sebastian.wick@redhat.com>; Shashank Sharma
> >> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>;
> >> Joshua Ashton <joshua@froggi.es>; Michel Dänzer
> >> <mdaenzer@redhat.com>; Aleix Pol <aleixpol@kde.org>; Xaver Hugl
> >> <xaver.hugl@gmail.com>; Victoria Brekenfeld <victoria@system76.com>;
> >> Sima <daniel@ffwll.ch>; Shankar, Uma <uma.shankar@intel.com>; Naseer
> >> Ahmed <quic_naseer@quicinc.com>; Christopher Braga
> >> <quic_cbraga@quicinc.com>; Abhinav Kumar <quic_abhinavk@quicinc.com>;
> >> Arthur Grillo <arthurgrillo@riseup.net>; Hector Martin
> >> <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha McIntosh
> >> <sashamcintosh@google.com>
> >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> >> color pipeline is needed
> >>
> >> v2:
> >>   - Update colorop visualizations to match reality (Sebastian, Alex Hung)
> >>   - Updated wording (Pekka)
> >>   - Change BYPASS wording to make it non-mandatory (Sebastian)
> >>   - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >>     section (Pekka)
> >>   - Use PQ EOTF instead of its inverse in Pipeline Programming example
> >>     (Melissa)
> >>   - Add "Driver Implementer's Guide" section (Pekka)
> >>   - Add "Driver Forward/Backward Compatibility" section (Sebastian,
> >> Pekka)
> >>
> >> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> >> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> >> Cc: Simon Ser <contact@emersion.fr>
> >> Cc: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Melissa Wen <mwen@igalia.com>
> >> Cc: Jonas Ådahl <jadahl@redhat.com>
> >> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> >> Cc: Shashank Sharma <shashank.sharma@amd.com>
> >> Cc: Alexander Goins <agoins@nvidia.com>
> >> Cc: Joshua Ashton <joshua@froggi.es>
> >> Cc: Michel Dänzer <mdaenzer@redhat.com>
> >> Cc: Aleix Pol <aleixpol@kde.org>
> >> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> >> Cc: Victoria Brekenfeld <victoria@system76.com>
> >> Cc: Sima <daniel@ffwll.ch>
> >> Cc: Uma Shankar <uma.shankar@intel.com>
> >> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> >> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> >> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> >> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> >> Cc: Hector Martin <marcan@marcan.st>
> >> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> >> Cc: Sasha McIntosh <sashamcintosh@google.com>
> >> ---
> >>   Documentation/gpu/rfc/color_pipeline.rst | 347 +++++++++++++++++++++++
> >>   1 file changed, 347 insertions(+)
> >>   create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> >>
> >> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
> >> b/Documentation/gpu/rfc/color_pipeline.rst
> >> new file mode 100644
> >> index 000000000000..af5f2ea29116
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> >> @@ -0,0 +1,347 @@
> >> +========================
> >> +Linux Color Pipeline API
> >> +========================
> >> +
> >> +What problem are we solving?
> >> +============================
> >> +
> >> +We would like to support pre-, and post-blending complex color
> >> +transformations in display controller hardware in order to allow for
> >> +HW-supported HDR use-cases, as well as to provide support to
> >> +color-managed applications, such as video or image editors.
> >> +
> >> +It is possible to support an HDR output on HW supporting the
> >> +Colorspace and HDR Metadata drm_connector properties, but that
> >> +requires the compositor or application to render and compose the
> >> +content into one final buffer intended for display. Doing so is costly.
> >> +
> >> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices,
> >> +and other operations to support color transformations. These
> >> +operations are often implemented in fixed-function HW and therefore
> >> +much more power efficient than performing similar operations via shaders or CPU.
> >> +
> >> +We would like to make use of this HW functionality to support
> >> +complex color transformations with no, or minimal CPU or shader load.
> >> +
> >> +
> >> +How are other OSes solving this problem?
> >> +========================================
> >> +
> >> +The most widely supported use-cases regard HDR content, whether
> >> +video or gaming.
> >> +
> >> +Most OSes will specify the source content format (color gamut,
> >> +encoding transfer function, and other metadata, such as max and
> >> +average light levels) to a driver.
> >> +Drivers will then program their fixed-function HW accordingly to map
> >> +from a source content buffer's space to a display's space.
> >> +
> >> +When fixed-function HW is not available the compositor will assemble
> >> +a shader to ask the GPU to perform the transformation from the
> >> +source content format to the display's format.
> >> +
> >> +A compositor's mapping function and a driver's mapping function are
> >> +usually entirely separate concepts. On OSes where a HW vendor has no
> >> +insight into closed-source compositor code such a vendor will tune
> >> +their color management code to visually match the compositor's. On
> >> +other OSes, where both mapping functions are open to an implementer,
> >> +they will ensure both mappings match.
> >> +
> >> +This results in mapping algorithm lock-in, meaning that no-one alone
> >> +can experiment with or introduce new mapping algorithms and achieve
> >> +consistent results regardless of which implementation path is taken.
> >> +
> >> +Why is Linux different?
> >> +=======================
> >> +
> >> +Unlike other OSes, where there is one compositor for one or more
> >> +drivers, on Linux we have a many-to-many relationship. Many
> >> +compositors; many drivers.
> >> +In addition each compositor vendor or community has their own view
> >> +of how color management should be done. This is what makes Linux so
> >> +beautiful.
> >> +
> >> +This means that a HW vendor can now no longer tune their driver to
> >> +one compositor, as tuning it to one could make it look fairly
> >> +different from another compositor's color mapping.
> >> +
> >> +We need a better solution.
> >> +
> >> +
> >> +Descriptive API
> >> +===============
> >> +
> >> +An API that describes the source and destination colorspaces is a
> >> +descriptive API. It describes the input and output color spaces but
> >> +does not describe how precisely they should be mapped. Such a
> >> +mapping includes many minute design decisions that can greatly affect
> >> +the look of the final result.
> >> +
> >> +It is not feasible to describe such mapping with enough detail to
> >> +ensure the same result from each implementation. In fact, these
> >> +mappings are a very active research area.
> >> +
> >> +
> >> +Prescriptive API
> >> +================
> >> +
> >> +A prescriptive API does not describe the source and destination
> >> +colorspaces. It instead prescribes a recipe for how to manipulate
> >> +pixel values to arrive at the desired outcome.
> >> +
> >> +This recipe is generally an ordered list of straight-forward
> >> +operations, with clear mathematical definitions, such as 1D LUTs, 3D
> >> +LUTs, matrices, or other operations that can be described in a precise manner.
> >> +
> >> +
> >> +The Color Pipeline API
> >> +======================
> >> +
> >> +HW color management pipelines can significantly differ between HW
> >> +vendors in terms of availability, ordering, and capabilities of HW
> >> +blocks. This makes a common definition of color management blocks
> >> +and their ordering nigh impossible. Instead we are defining an API
> >> +that allows user space to discover the HW capabilities in a generic
> >> +manner, agnostic of specific drivers and hardware.
> >> +
> >> +
> >> +drm_colorop Object & IOCTLs
> >> +===========================
> >> +
> >> +To support the definition of color pipelines we define the DRM core
> >> +object type drm_colorop. Individual drm_colorop objects will be
> >> +chained via the NEXT property of a drm_colorop to constitute a color
> >> +pipeline.
> >> +Each drm_colorop object is unique, i.e., even if multiple color
> >> +pipelines have the same operation they won't share the same
> >> +drm_colorop object to describe that operation.
> >> +
> >> +Note that drivers are not expected to map drm_colorop objects
> >> +statically to specific HW blocks. The mapping of drm_colorop objects
> >> +is entirely a driver-internal detail and can be as dynamic or static
> >> +as a driver needs it to be. See more in the Driver Implementer's
> >> +Guide section below.
> >> +
> >> +Just like other DRM objects the drm_colorop objects are discovered
> >> +via IOCTLs:
> >> +
> >> +DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to retrieve
> >> +the number of all drm_colorop objects.
> >> +
> >> +DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one drm_colorop.
> >> +It includes the ID for the colorop object, as well as the plane_id
> >> +of the associated plane. All other values should be registered as
> >> +properties.
> >> +
> >> +Each drm_colorop has three core properties:
> >> +
> >> +TYPE: The type of transformation, such as
> >> +* enumerated curve
> >> +* custom (uniform) 1D LUT
> >> +* 3x3 matrix
> >> +* 3x4 matrix
> >> +* 3D LUT
> >> +* etc.
> >> +
> >> +Depending on the type of transformation other properties will
> >> +describe more details.
> >> +
> >> +BYPASS: A boolean property that can be used to easily put a block
> >> +into bypass mode. While setting other properties might fail atomic
> >> +check, setting the BYPASS property to true should never fail. The
> >> +BYPASS property is not mandatory for a colorop, as long as the
> >> +entire pipeline can get bypassed by setting the COLOR_PIPELINE on a
> >> +plane to '0'.
> >> +
> >> +NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if
> >> +this drm_colorop is the last in the chain.
> >> +
> >> +An example of a drm_colorop object might look like one of these::
> >> +
> >> +    /* 1D enumerated curve */
> >> +    Color operation 42
> >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> >> +    ├─ "BYPASS": bool {true, false}
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> >
> > Having the fixed function enum for some targeted input/output may not
> > be scalable for all usecases. There are multiple colorspaces and
> > transfer functions possible, so it will not be possible to cover all
> > these by any enum definitions. Also, this will depend on the capabilities of
> > respective hardware from various vendors.
> 
> The reason this exists is such that certain HW vendors such as AMD have transfer
> functions implemented in HW. It is important to take advantage of these for both
> precision and power reasons.

The issue we see here is that it will be too usecase and vendor specific.
There will be BT601, BT709, BT2020, sRGB, HDR EOTF and many more. Not to forget,
we will need linearization and non-linearization enums for each of these, and also
a CTM indication to convert colorspace. Also, if the underlying hardware block is
programmable, it's not limited to colorspace management but
can be used for other color enhancements as well by a capable client.

Hence, we feel that it is bordering on being descriptive with too many possible
combinations (not easy to generalize). So, if hardware is programmable, let's
expose its capability through a blob and be generic.

For any fixed function hardware where the LUT etc. is stored in ROM and just a control/enable
bit is provided to the driver, we can define a pipeline with a vendor specific color block. This
can be identified with a flag (better ways can be discussed).

For example, on some Intel platforms, we had a fixed function to convert colorspaces
directly with a bit setting. These kinds of things should be vendor specific and not be part
of the generic userspace implementation.
For reference:
  Value  Mode                 Description
  001b   YUV601 to RGB601     YUV BT.601 to RGB BT.601 conversion
  010b   YUV709 to RGB709     YUV BT.709 to RGB BT.709 conversion
  011b   YUV2020 to RGB2020   YUV BT.2020 to RGB BT.2020 conversion
  100b   RGB709 to RGB2020    RGB BT.709 to RGB BT.2020 conversion
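
As a rough illustration of why such a block stays vendor specific, the mode
field above could be modelled in a driver-private header along these lines.
The enum and function names are hypothetical and are not taken from any
existing driver; only the four bit patterns come from the list above, and any
other value is treated as unknown.

```c
#include <stddef.h>

/* Hypothetical names for the mode values listed above. */
enum fixed_csc_mode {
	FIXED_CSC_YUV601_TO_RGB601   = 0x1, /* 001b */
	FIXED_CSC_YUV709_TO_RGB709   = 0x2, /* 010b */
	FIXED_CSC_YUV2020_TO_RGB2020 = 0x3, /* 011b */
	FIXED_CSC_RGB709_TO_RGB2020  = 0x4, /* 100b */
};

/* Return a description for a mode value, or NULL if it is not listed. */
static const char *fixed_csc_mode_name(unsigned int mode)
{
	switch (mode) {
	case FIXED_CSC_YUV601_TO_RGB601:
		return "YUV BT.601 to RGB BT.601";
	case FIXED_CSC_YUV709_TO_RGB709:
		return "YUV BT.709 to RGB BT.709";
	case FIXED_CSC_YUV2020_TO_RGB2020:
		return "YUV BT.2020 to RGB BT.2020";
	case FIXED_CSC_RGB709_TO_RGB2020:
		return "RGB BT.709 to RGB BT.2020";
	default:
		return NULL;
	}
}
```

Each value is a complete, fixed conversion with no parameters for userspace to
program, which is the property that makes it hard to express as a generic
colorop.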

> Additionally, not every vendor implements bucketed/segmented LUTs the same
> way, so it's not feasible to expose that in a way that's particularly useful or not
> vendor-specific.

If the underlying hardware is programmable, the structure which we propose to advertise
the capability of the block to userspace will be sufficient to compute the LUT coefficients.
The caps can be:
1. Number of segments in the LUT
2. Precision of the LUT
3. Starting and ending point of each segment
4. Number of samples in each segment
5. Any other flag which could be useful in this computation

This way we can compute LUTs generically and send them to the driver. This will be scalable for all
colorspaces, configurations and vendors.
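
A minimal sketch of what that generic computation could look like in a
compositor, given only the caps listed above. All names here (lut_segment,
fill_lut, curve_fn, square) are hypothetical, the input range is assumed to be
normalized to 0..1, and samples are assumed to be evenly spaced with both
segment endpoints included; real hardware may space or encode them differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability entry for one LUT segment. */
struct lut_segment {
	double start;       /* first input value covered, normalized 0..1 */
	double end;         /* last input value covered, normalized 0..1 */
	unsigned int count; /* number of samples in this segment */
};

typedef double (*curve_fn)(double);

/*
 * Sample curve at evenly spaced points inside each segment and quantize the
 * result to out_bits of precision. Returns the number of entries written,
 * or 0 if out[] is too small.
 */
static size_t fill_lut(const struct lut_segment *segs, size_t nsegs,
		       curve_fn curve, unsigned int out_bits,
		       unsigned int *out, size_t out_len)
{
	double max_code;
	size_t n = 0;

	assert(out_bits >= 1 && out_bits <= 16);
	max_code = (double)((1u << out_bits) - 1);

	for (size_t s = 0; s < nsegs; s++) {
		const struct lut_segment *seg = &segs[s];

		for (unsigned int i = 0; i < seg->count; i++) {
			double t = seg->count > 1 ?
				(double)i / (seg->count - 1) : 0.0;
			double x = seg->start + t * (seg->end - seg->start);

			if (n == out_len)
				return 0;
			/* Round to the nearest output code value. */
			out[n++] = (unsigned int)(curve(x) * max_code + 0.5);
		}
	}
	return n;
}

/* Placeholder curve (gamma 2.0) standing in for a real transfer function. */
static double square(double x)
{
	return x * x;
}
```

The same fill_lut() call would serve a single uniform segment or a
multi-segment arrangement; only the segs[] array advertised by the driver
changes.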

> Thus we decided to have a regular 1D LUT modulated onto a known curve.
> This is the only real cross-vendor solution here that allows HW curve
> implementations to be taken advantage of and also works with
> bucketing/segmented LUTs.
> (Including vendors we are not aware of yet).
> 
> This also means that vendors that only support HW curves at some stages without
> an actual LUT are also serviced.

Any fixed function vendor implementation should be supported, but with a vendor
specific color block. Trying to come up with enums which align with some underlying
hardware may not be scalable.

> You are right that there *might* be some usecase not covered by this right now,
> and that it would need kernel churn to implement new curves, but unfortunately
> that's the compromise that we (so-far) have decided on in order to ensure
> everyone can have good, precise, power-efficient support.

Yes, we are aligned on this. But we believe programmable hardware should be able to
expose its caps. Fixed function hardware should be non-generic and vendor specific.

> It is always possible for us to extend the uAPI at a later date for other curves, or
> other properties that might expose a generic segmented LUT interface (such as
> what you have proposed for a while) for vendors that can support it.
> (With the whole color pipeline thing, we can essentially do 'versioning'
> with that, if we wanted a new 1D LUT type.)

Most hardware vendors have programmable LUTs (including AMD), so it would be
good to have this as the default generic compositor implementation. And yes, any new
color block with a type can be added to the existing APIs as the need arises without
breaking compatibility.

Regards,
Uma Shankar

> 
> Thanks!
> - Joshie 🐸✨
> 
> >
> >> +    └─ "NEXT": immutable color operation ID = 43
> >> +
> >> +    /* custom 4k entry 1D LUT */
> >> +    Color operation 52
> >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> >> +    ├─ "BYPASS": bool {true, false}
> >> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> >
> > For the size and capability of individual LUT block, it would be good
> > to add this as a blob as defined in the blob approach we were planning
> > earlier. So just taking that part of the series to have this capability detection
> > generic. Refer below:
> > https://patchwork.freedesktop.org/patch/554855/?series=123023&rev=1
> >
> > Basically, use this structure for lut capability and arrangement:
> > struct drm_color_lut_range {
> > 	/* DRM_MODE_LUT_* */
> > 	__u32 flags;
> > 	/* number of points on the curve */
> > 	__u16 count;
> > 	/* input/output bits per component */
> > 	__u8 input_bpc, output_bpc;
> > 	/* input start/end values */
> > 	__s32 start, end;
> > 	/* output min/max values */
> > 	__s32 min, max;
> > };
> >
> > If the intention is to have just 1 segment with 4096, it can be easily described
> > there.
> > Additionally, this can also cater to any kind of lut arrangement, PWL, segmented
> > or logarithmic.
> >
> >> +    ├─ "LUT_1D": blob
> >> +    └─ "NEXT": immutable color operation ID = 0
> >> +
> >> +    /* 17^3 3D LUT */
> >> +    Color operation 72
> >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
> >> +    ├─ "BYPASS": bool {true, false}
> >> +    ├─ "LUT_3D_SIZE": immutable range = 17
> >> +    ├─ "LUT_3D": blob
> >> +    └─ "NEXT": immutable color operation ID = 73
> >> +
> >> +
> >> +COLOR_PIPELINE Plane Property
> >> +=============================
> >> +
> >> +Color Pipelines are created by a driver and advertised via a new
> >> +COLOR_PIPELINE enum property on each plane. Values of the property
> >> +always include '0', which is the default and means all color
> >> +processing is disabled. Additional values will be the object IDs of
> >> +the first drm_colorop in a pipeline. A driver can create and
> >> +advertise none, one, or more possible color pipelines. A DRM client
> >> +will select a color pipeline by setting the COLOR PIPELINE to the respective
> >> +value.
> >> +
> >> +In the case where drivers have custom support for pre-blending color
> >> +processing those drivers shall reject atomic commits that are trying
> >> +to use both the custom color properties, as well as the
> >> +COLOR_PIPELINE property.
> >> +
> >> +An example of a COLOR_PIPELINE property on a plane might look like this::
> >> +
> >> +    Plane 10
> >> +    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> >> +    ├─ …
> >> +    └─ "color_pipeline": enum {0, 42, 52} = 0
> >> +
> >> +
> >> +Color Pipeline Discovery
> >> +========================
> >> +
> >> +A DRM client wanting color management on a drm_plane will:
> >> +
> >> +1. Read all drm_colorop objects
> >> +2. Get the COLOR_PIPELINE property of the plane
> >> +3. iterate all COLOR_PIPELINE enum values
> >> +4. for each enum value walk the color pipeline (via the NEXT pointers)
> >> +   and see if the available color operations are suitable for the
> >> +   desired color management operations
> >> +
> >> +An example of chained properties to define an AMD pre-blending color
> >> +pipeline might look like this::
> >> +
> >> +    Plane 10
> >> +    ├─ "TYPE" (immutable) = Primary
> >> +    └─ "COLOR_PIPELINE": enum {0, 44} = 0
> >> +
> >> +    Color operation 44
> >> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> >> +    └─ "NEXT" (immutable) = 45
> >> +
> >> +    Color operation 45
> >> +    ├─ "TYPE" (immutable) = 3x4 Matrix
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "MATRIX_3_4": blob
> >> +    └─ "NEXT" (immutable) = 46
> >> +
> >> +    Color operation 46
> >> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB EOTF
> >> +    └─ "NEXT" (immutable) = 47
> >> +
> >> +    Color operation 47
> >> +    ├─ "TYPE" (immutable) = 1D LUT
> >> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> >> +    ├─ "LUT_1D_DATA": blob
> >> +    └─ "NEXT" (immutable) = 48
> >> +
> >> +    Color operation 48
> >> +    ├─ "TYPE" (immutable) = 3D LUT
> >> +    ├─ "LUT_3D_SIZE" (immutable) = 17
> >> +    ├─ "LUT_3D_DATA": blob
> >> +    └─ "NEXT" (immutable) = 49
> >> +
> >> +    Color operation 49
> >> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> >> +    └─ "NEXT" (immutable) = 0
> >> +
> >> +
> >> +Color Pipeline Programming
> >> +==========================
> >> +
> >> +Once a DRM client has found a suitable pipeline it will:
> >> +
> >> +1. Set the COLOR_PIPELINE enum value to the one pointing at the first
> >> +   drm_colorop object of the desired pipeline
> >> +2. Set the properties for all drm_colorop objects in the pipeline to the
> >> +   desired values, setting BYPASS to true for unused drm_colorop blocks,
> >> +   and false for enabled drm_colorop blocks
> >> +3. Perform atomic_check/commit as desired
> >> +
> >> +To configure the pipeline for an HDR10 PQ plane and blending in
> >> +linear space, a compositor might perform an atomic commit with the
> >> +following property values::
> >> +
> >> +    Plane 10
> >> +    └─ "COLOR_PIPELINE" = 42
> >> +
> >> +    Color operation 42 (input CSC)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 44 (DeGamma)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 45 (gamut remap)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 46 (shaper LUT RAM)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 47 (3D LUT RAM)
> >> +    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
> >> +
> >> +    Color operation 48 (blend gamma)
> >> +    └─ "CURVE_1D_TYPE" = PQ EOTF
> >> +
> >> +
> >> +Driver Implementer's Guide
> >> +==========================
> >> +
> >> +What does this all mean for driver implementations? As noted above
> >> +the colorops can map to HW directly but don't need to do so. Here
> >> +are some suggestions on how to think about creating your color pipelines:
> >> +
> >> +- Try to expose pipelines that use already defined colorops, even if
> >> +  your hardware pipeline is split differently. This allows existing
> >> +  userspace to immediately take advantage of the hardware.
> >> +
> >> +- Additionally, try to expose your actual hardware blocks as colorops.
> >> +  Define new colorop types where you believe it can offer significant
> >> +  benefits if userspace learns to program them.
> >> +
> >> +- Avoid defining new colorops for compound operations with very narrow
> >> +  scope. If you have a hardware block for a special operation that
> >> +  cannot be split further, you can expose that as a new colorop type.
> >> +  However, try to not define colorops for "use cases", especially if
> >> +  they require you to combine multiple hardware blocks.
> >> +
> >> +- Design new colorops as prescriptive, not descriptive; by the
> >> +  mathematical formula, not by the assumed input and output.
> >> +
> >> +A defined colorop type must be deterministic. Its operation can
> >> +depend only on its properties and input and nothing else, allowed
> >> +error tolerance notwithstanding.
> >> +
> >> +
> >> +Driver Forward/Backward Compatibility
> >> +=====================================
> >> +
> >> +As this is uAPI drivers can't regress color pipelines that have been
> >> +introduced for a given HW generation. New HW generations are free to
> >> +abandon color pipelines advertised for previous generations.
> >> +Nevertheless, it can be beneficial to carry support for existing
> >> +color pipelines forward as those will likely already have support in
> >> +DRM clients.
> >> +
> >> +Introducing new colorops to a pipeline is fine, as long as they can
> >> +be disabled or are purely informational. DRM clients implementing
> >> +support for the pipeline can always skip unknown properties as long
> >> +as they can be confident that doing so will not cause unexpected results.
> >> +
> >> +If a new colorop doesn't fall into one of the above categories
> >> +(bypassable or informational) the modified pipeline would be
> >> +unusable for user space. In this case a new pipeline should be defined.
> >
> > Thanks again for this nice documentation and capturing all the details clearly.
> >
> > Regards,
> > Uma Shankar
> >
> >> +
> >> +References
> >> +==========
> >> +
> >> +1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
> >> \ No newline at end of file
> >> --
> >> 2.42.0
> >
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-08 14:37     ` Harry Wentland
@ 2023-11-09 10:24       ` Shankar, Uma
  0 siblings, 0 replies; 49+ messages in thread
From: Shankar, Uma @ 2023-11-09 10:24 UTC (permalink / raw)
  To: Harry Wentland, dri-devel
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld,
	Michel Dänzer, Arthur Grillo, Sebastian Wick,
	Shashank Sharma, wayland-devel, Jonas Ådahl, Abhinav Kumar,
	Naseer Ahmed, Melissa Wen, Aleix Pol, Christopher Braga,
	Pekka Paalanen, Hector Martin, Xaver Hugl, Joshua Ashton



> -----Original Message-----
> From: Harry Wentland <harry.wentland@amd.com>
> Sent: Wednesday, November 8, 2023 8:08 PM
> To: Shankar, Uma <uma.shankar@intel.com>; dri-devel@lists.freedesktop.org
> Cc: wayland-devel@lists.freedesktop.org; Ville Syrjala
> <ville.syrjala@linux.intel.com>; Pekka Paalanen
> <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>; Melissa
> Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>; Sebastian Wick
> <sebastian.wick@redhat.com>; Shashank Sharma
> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>; Joshua
> Ashton <joshua@froggi.es>; Michel Dänzer <mdaenzer@redhat.com>; Aleix Pol
> <aleixpol@kde.org>; Xaver Hugl <xaver.hugl@gmail.com>; Victoria Brekenfeld
> <victoria@system76.com>; Sima <daniel@ffwll.ch>; Naseer Ahmed
> <quic_naseer@quicinc.com>; Christopher Braga <quic_cbraga@quicinc.com>;
> Abhinav Kumar <quic_abhinavk@quicinc.com>; Arthur Grillo
> <arthurgrillo@riseup.net>; Hector Martin <marcan@marcan.st>; Liviu Dudau
> <Liviu.Dudau@arm.com>; Sasha McIntosh <sashamcintosh@google.com>
> Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> 
> 
> On 2023-11-08 07:18, Shankar, Uma wrote:
> >
> >
> >> -----Original Message-----
> >> From: Harry Wentland <harry.wentland@amd.com>
> >> Sent: Friday, October 20, 2023 2:51 AM
> >> To: dri-devel@lists.freedesktop.org
> >> Cc: wayland-devel@lists.freedesktop.org; Harry Wentland
> >> <harry.wentland@amd.com>; Ville Syrjala
> >> <ville.syrjala@linux.intel.com>; Pekka Paalanen
> >> <pekka.paalanen@collabora.com>; Simon Ser <contact@emersion.fr>;
> >> Melissa Wen <mwen@igalia.com>; Jonas Ådahl <jadahl@redhat.com>;
> >> Sebastian Wick <sebastian.wick@redhat.com>; Shashank Sharma
> >> <shashank.sharma@amd.com>; Alexander Goins <agoins@nvidia.com>;
> >> Joshua Ashton <joshua@froggi.es>; Michel Dänzer
> >> <mdaenzer@redhat.com>; Aleix Pol <aleixpol@kde.org>; Xaver Hugl
> >> <xaver.hugl@gmail.com>; Victoria Brekenfeld <victoria@system76.com>;
> >> Sima <daniel@ffwll.ch>; Shankar, Uma <uma.shankar@intel.com>; Naseer
> >> Ahmed <quic_naseer@quicinc.com>; Christopher Braga
> >> <quic_cbraga@quicinc.com>; Abhinav Kumar <quic_abhinavk@quicinc.com>;
> >> Arthur Grillo <arthurgrillo@riseup.net>; Hector Martin
> >> <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Sasha McIntosh
> >> <sashamcintosh@google.com>
> >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> >> color pipeline is needed
> >>
> >> v2:
> >>  - Update colorop visualizations to match reality (Sebastian, Alex
> >> Hung)
> >>  - Updated wording (Pekka)
> >>  - Change BYPASS wording to make it non-mandatory (Sebastian)
> >>  - Drop cover-letter-like paragraph from COLOR_PIPELINE Plane Property
> >>    section (Pekka)
> >>  - Use PQ EOTF instead of its inverse in Pipeline Programming example
> >> (Melissa)
> >>  - Add "Driver Implementer's Guide" section (Pekka)
> >>  - Add "Driver Forward/Backward Compatibility" section (Sebastian,
> >> Pekka)
> >>
> >> Signed-off-by: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
> >> Cc: Pekka Paalanen <pekka.paalanen@collabora.com>
> >> Cc: Simon Ser <contact@emersion.fr>
> >> Cc: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Melissa Wen <mwen@igalia.com>
> >> Cc: Jonas Ådahl <jadahl@redhat.com>
> >> Cc: Sebastian Wick <sebastian.wick@redhat.com>
> >> Cc: Shashank Sharma <shashank.sharma@amd.com>
> >> Cc: Alexander Goins <agoins@nvidia.com>
> >> Cc: Joshua Ashton <joshua@froggi.es>
> >> Cc: Michel Dänzer <mdaenzer@redhat.com>
> >> Cc: Aleix Pol <aleixpol@kde.org>
> >> Cc: Xaver Hugl <xaver.hugl@gmail.com>
> >> Cc: Victoria Brekenfeld <victoria@system76.com>
> >> Cc: Sima <daniel@ffwll.ch>
> >> Cc: Uma Shankar <uma.shankar@intel.com>
> >> Cc: Naseer Ahmed <quic_naseer@quicinc.com>
> >> Cc: Christopher Braga <quic_cbraga@quicinc.com>
> >> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
> >> Cc: Arthur Grillo <arthurgrillo@riseup.net>
> >> Cc: Hector Martin <marcan@marcan.st>
> >> Cc: Liviu Dudau <Liviu.Dudau@arm.com>
> >> Cc: Sasha McIntosh <sashamcintosh@google.com>
> >> ---
> >>  Documentation/gpu/rfc/color_pipeline.rst | 347
> >> +++++++++++++++++++++++
> >>  1 file changed, 347 insertions(+)
> >>  create mode 100644 Documentation/gpu/rfc/color_pipeline.rst
> >>
> >> diff --git a/Documentation/gpu/rfc/color_pipeline.rst
> >> b/Documentation/gpu/rfc/color_pipeline.rst
> >> new file mode 100644
> >> index 000000000000..af5f2ea29116
> >> --- /dev/null
> >> +++ b/Documentation/gpu/rfc/color_pipeline.rst
> >> @@ -0,0 +1,347 @@
> >> +========================
> >> +Linux Color Pipeline API
> >> +========================
> >> +
> >> +What problem are we solving?
> >> +============================
> >> +
> >> +We would like to support pre-, and post-blending complex color
> >> +transformations in display controller hardware in order to allow for
> >> +HW-supported HDR use-cases, as well as to provide support to
> >> +color-managed applications, such as video or image editors.
> >> +
> >> +It is possible to support an HDR output on HW supporting the
> >> +Colorspace and HDR Metadata drm_connector properties, but that
> >> +requires the compositor or application to render and compose the
> >> +content into one final buffer intended for display. Doing so is costly.
> >> +
> >> +Most modern display HW offers various 1D LUTs, 3D LUTs, matrices,
> >> +and other operations to support color transformations. These
> >> +operations are often implemented in fixed-function HW and therefore
> >> +much more power efficient than performing similar operations via shaders or
> CPU.
> >> +
> >> +We would like to make use of this HW functionality to support
> >> +complex color transformations with no, or minimal CPU or shader load.
> >> +
> >> +
> >> +How are other OSes solving this problem?
> >> +========================================
> >> +
> >> +The most widely supported use-cases regard HDR content, whether
> >> +video or gaming.
> >> +
> >> +Most OSes will specify the source content format (color gamut,
> >> +encoding transfer function, and other metadata, such as max and
> >> +average light levels) to a
> >> driver.
> >> +Drivers will then program their fixed-function HW accordingly to map
> >> +from a source content buffer's space to a display's space.
> >> +
> >> +When fixed-function HW is not available the compositor will assemble
> >> +a shader to ask the GPU to perform the transformation from the
> >> +source content format to the display's format.
> >> +
> >> +A compositor's mapping function and a driver's mapping function are
> >> +usually entirely separate concepts. On OSes where a HW vendor has no
> >> +insight into closed-source compositor code such a vendor will tune
> >> +their color management code to visually match the compositor's. On
> >> +other OSes, where both mapping functions are open to an implementer
> >> +they will
> >> ensure both mappings match.
> >> +
> >> +This results in mapping algorithm lock-in, meaning that no-one alone
> >> +can experiment with or introduce new mapping algorithms and achieve
> >> +consistent results regardless of which implementation path is taken.
> >> +
> >> +Why is Linux different?
> >> +=======================
> >> +
> >> +Unlike other OSes, where there is one compositor for one or more
> >> +drivers, on Linux we have a many-to-many relationship. Many
> >> +compositors;
> >> many drivers.
> >> +In addition each compositor vendor or community has their own view
> >> +of how color management should be done. This is what makes Linux so
> beautiful.
> >> +
> >> +This means that a HW vendor can now no longer tune their driver to
> >> +one compositor, as tuning it to one could make it look fairly
> >> +different from another compositor's color mapping.
> >> +
> >> +We need a better solution.
> >> +
> >> +
> >> +Descriptive API
> >> +===============
> >> +
> >> +An API that describes the source and destination colorspaces is a
> >> +descriptive API. It describes the input and output color spaces but
> >> +does not describe how precisely they should be mapped. Such a
> >> +mapping includes many minute design decision that can greatly affect
> >> +the look of the final
> >> result.
> >> +
> >> +It is not feasible to describe such mapping with enough detail to
> >> +ensure the same result from each implementation. In fact, these
> >> +mappings are a very active research area.
> >> +
> >> +
> >> +Prescriptive API
> >> +================
> >> +
> >> +A prescriptive API describes not the source and destination
> >> +colorspaces. It instead prescribes a recipe for how to manipulate
> >> +pixel values to arrive at the desired outcome.
> >> +
> >> +This recipe is generally an ordered list of straight-forward
> >> +operations, with clear mathematical definitions, such as 1D LUTs, 3D
> >> +LUTs, matrices, or other operations that can be described in a precise manner.
> >> +
> >> +
> >> +The Color Pipeline API
> >> +======================
> >> +
> >> +HW color management pipelines can significantly differ between HW
> >> +vendors in terms of availability, ordering, and capabilities of HW
> >> +blocks. This makes a common definition of color management blocks
> >> +and their ordering nigh impossible. Instead we are defining an API
> >> +that allows user space to discover the HW capabilities in a generic
> >> +manner, agnostic of specific drivers and hardware.
> >> +
> >> +
> >> +drm_colorop Object & IOCTLs
> >> +===========================
> >> +
> >> +To support the definition of color pipelines we define the DRM core
> >> +object type drm_colorop. Individual drm_colorop objects will be
> >> +chained via the NEXT property of a drm_colorop to constitute a color
> pipeline.
> >> +Each drm_colorop object is unique, i.e., even if multiple color
> >> +pipelines have the same operation they won't share the same
> >> +drm_colorop object to describe that operation.
> >> +
> >> +Note that drivers are not expected to map drm_colorop objects
> >> +statically to specific HW blocks. The mapping of drm_colorop objects
> >> +is entirely a driver-internal detail and can be as dynamic or static
> >> +as a driver needs it to be. See more in the Driver Implementation
> >> +Guide section
> >> below.
> >> +
> >> +Just like other DRM objects the drm_colorop objects are discovered
> >> +via
> >> +IOCTLs:
> >> +
> >> +DRM_IOCTL_MODE_GETCOLOROPRESOURCES: This IOCTL is used to
> retrieve
> >> the
> >> +number of all drm_colorop objects.
> >> +
> >> +DRM_IOCTL_MODE_GETCOLOROP: This IOCTL is used to read one
> drm_colorop.
> >> +It includes the ID for the colorop object, as well as the plane_id
> >> +of the associated plane. All other values should be registered as
> >> +properties.
> >> +
> >> +Each drm_colorop has three core properties:
> >> +
> >> +TYPE: The type of transformation, such as
> >> +* enumerated curve
> >> +* custom (uniform) 1D LUT
> >> +* 3x3 matrix
> >> +* 3x4 matrix
> >> +* 3D LUT
> >> +* etc.
> >> +
> >> +Depending on the type of transformation other properties will
> >> +describe more details.
> >> +
> >> +BYPASS: A boolean property that can be used to easily put a block
> >> +into bypass mode. While setting other properties might fail atomic
> >> +check, setting the BYPASS property to true should never fail. The
> >> +BYPASS property is not mandatory for a colorop, as long as the
> >> +entire pipeline can get bypassed by setting the COLOR_PIPELINE on a plane
> to '0'.
> >> +
> >> +NEXT: The ID of the next drm_colorop in a color pipeline, or 0 if
> >> +this drm_colorop is the last in the chain.
> >> +
> >> +An example of a drm_colorop object might look like one of these::
> >> +
> >> +    /* 1D enumerated curve */
> >> +    Color operation 42
> >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> >> +    ├─ "BYPASS": bool {true, false}
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> >
> > Having the fixed function enum for some targeted input/output may not
> > be scalable for all usecases. There are multiple colorspaces and
> > transfer functions possible, so it will not be possible to cover all
> > these by any enum definitions. Also, this will depend on the capabilities of
> respective hardware from various vendors.
> >
> 
> Agreed, and this is only an example of one TYPE of colorop, the "1D enumerated
> curve". There is a place for a "1D LUT", that's a traditional 1D LUT, or even a
> "PWL" type, if someone wants to define that.
> 
> The beauty with the DRM object and properties approach is that this is extensible
> without breaking existing implementations in the kernel or userspace.

Yeah, my only concern with enums was the number of possible combinations and their
mapping onto various hardware and vendors.

So a generic userspace should rely on capability detection and programming, which
will be scalable and useful for all possible hardware and vendors.
Custom hardware can be handled by a vendor-specific block and its related HAL.
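
To make "capability detection" a bit more concrete, below is a minimal sketch of
how a generic client might decide whether an advertised pipeline is usable. It is
an illustration only and does not use any real libdrm or kernel interface: struct
colorop, enum colorop_type, find_op and pipeline_usable are hypothetical names, and
the colorops are assumed to have already been read into an in-memory array. The
client walks the NEXT chain, matches the operations it wants in order, and accepts
any other block (including types it does not know) only if that block can be put
into bypass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory model of discovered colorops, not a real uAPI. */
enum colorop_type {
	COLOROP_1D_CURVE,
	COLOROP_1D_LUT,
	COLOROP_CTM_3X3,
	COLOROP_CTM_3X4,
	COLOROP_3D_LUT,
	COLOROP_UNKNOWN,	/* any type this client does not understand */
};

struct colorop {
	uint32_t id;
	enum colorop_type type;
	bool has_bypass;	/* BYPASS property is present */
	uint32_t next;		/* NEXT property, 0 terminates the chain */
};

static const struct colorop *find_op(const struct colorop *ops, size_t n,
				     uint32_t id)
{
	for (size_t i = 0; i < n; i++)
		if (ops[i].id == id)
			return &ops[i];
	return NULL;
}

/*
 * Returns true if the pipeline starting at first_id contains the wanted
 * types in order and every remaining colorop can be bypassed.
 */
static bool pipeline_usable(const struct colorop *ops, size_t n,
			    uint32_t first_id,
			    const enum colorop_type *wanted, size_t nwanted)
{
	size_t matched = 0, steps = 0;
	uint32_t id = first_id;

	while (id != 0) {
		const struct colorop *op = find_op(ops, n, id);

		/* Dangling NEXT pointer, or more hops than objects (a loop). */
		if (!op || ++steps > n)
			return false;

		if (matched < nwanted && op->type == wanted[matched])
			matched++;
		else if (!op->has_bypass)
			return false;

		id = op->next;
	}
	return matched == nwanted;
}
```

A compositor would run this check once per COLOR_PIPELINE enum value on the plane
and pick the first pipeline that passes, falling back to shaders if none does.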

> >> +    └─ "NEXT": immutable color operation ID = 43
> >> +
> >> +    /* custom 4k entry 1D LUT */
> >> +    Color operation 52
> >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D LUT
> >> +    ├─ "BYPASS": bool {true, false}
> >> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> >
> > For the size and capability of individual LUT block, it would be good
> > to add this as a blob as defined in the blob approach we were planning
> > earlier. So just taking that part of the series to have this capability detection
> > generic. Refer below:
> > https://patchwork.freedesktop.org/patch/554855/?series=123023&rev=1
> >
> > Basically, use this structure for lut capability and arrangement:
> > struct drm_color_lut_range {
> > 	/* DRM_MODE_LUT_* */
> > 	__u32 flags;
> > 	/* number of points on the curve */
> > 	__u16 count;
> > 	/* input/output bits per component */
> > 	__u8 input_bpc, output_bpc;
> > 	/* input start/end values */
> > 	__s32 start, end;
> > 	/* output min/max values */
> > 	__s32 min, max;
> > };
> >
> > If the intention is to have just 1 segment with 4096, it can be easily described
> > there.
> > Additionally, this can also cater to any kind of lut arrangement, PWL, segmented
> > or logarithmic.
> >
> 
> Thanks for sharing this again. We've had some discussion about this and it looks
> like we definitely want something to describe the range of the domain of the LUT
> as well as its output values, maybe also things like clamping. Your struct seems to
> cover all of that.

Sure, thanks Harry.

Regards,
Uma Shankar

> >> +    ├─ "LUT_1D": blob
> >> +    └─ "NEXT": immutable color operation ID = 0
> >> +
> >> +    /* 17^3 3D LUT */
> >> +    Color operation 72
> >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 3D LUT
> >> +    ├─ "BYPASS": bool {true, false}
> >> +    ├─ "LUT_3D_SIZE": immutable range = 17
> >> +    ├─ "LUT_3D": blob
> >> +    └─ "NEXT": immutable color operation ID = 73
> >> +
> >> +
> >> +COLOR_PIPELINE Plane Property
> >> +=============================
> >> +
> >> +Color Pipelines are created by a driver and advertised via a new
> >> +COLOR_PIPELINE enum property on each plane. Values of the property
> >> +always include '0', which is the default and means all color
> >> +processing is disabled. Additional values will be the object IDs of
> >> +the first drm_colorop in a pipeline. A driver can create and
> >> +advertise none, one, or more possible color pipelines. A DRM client
> >> +will select a color pipeline by setting the COLOR PIPELINE to the respective
> >> +value.
> >> +
> >> +In the case where drivers have custom support for pre-blending color
> >> +processing those drivers shall reject atomic commits that are trying
> >> +to use both the custom color properties, as well as the
> >> +COLOR_PIPELINE property.
> >> +
> >> +An example of a COLOR_PIPELINE property on a plane might look like this::
> >> +
> >> +    Plane 10
> >> +    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> >> +    ├─ …
> >> +    └─ "color_pipeline": enum {0, 42, 52} = 0
> >> +
> >> +
> >> +Color Pipeline Discovery
> >> +========================
> >> +
> >> +A DRM client wanting color management on a drm_plane will:
> >> +
> >> +1. Read all drm_colorop objects
> >> +2. Get the COLOR_PIPELINE property of the plane
> >> +3. iterate all COLOR_PIPELINE enum values
> >> +4. for each enum value walk the color pipeline (via the NEXT pointers)
> >> +   and see if the available color operations are suitable for the
> >> +   desired color management operations
> >> +
> >> +An example of chained properties to define an AMD pre-blending color
> >> +pipeline might look like this::
> >> +
> >> +    Plane 10
> >> +    ├─ "TYPE" (immutable) = Primary
> >> +    └─ "COLOR_PIPELINE": enum {0, 44} = 0
> >> +
> >> +    Color operation 44
> >> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> >> +    └─ "NEXT" (immutable) = 45
> >> +
> >> +    Color operation 45
> >> +    ├─ "TYPE" (immutable) = 3x4 Matrix
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "MATRIX_3_4": blob
> >> +    └─ "NEXT" (immutable) = 46
> >> +
> >> +    Color operation 46
> >> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB Inverse EOTF, PQ Inverse EOTF} = sRGB EOTF
> >> +    └─ "NEXT" (immutable) = 47
> >> +
> >> +    Color operation 47
> >> +    ├─ "TYPE" (immutable) = 1D LUT
> >> +    ├─ "LUT_1D_SIZE": immutable range = 4096
> >> +    ├─ "LUT_1D_DATA": blob
> >> +    └─ "NEXT" (immutable) = 48
> >> +
> >> +    Color operation 48
> >> +    ├─ "TYPE" (immutable) = 3D LUT
> >> +    ├─ "LUT_3D_SIZE" (immutable) = 17
> >> +    ├─ "LUT_3D_DATA": blob
> >> +    └─ "NEXT" (immutable) = 49
> >> +
> >> +    Color operation 49
> >> +    ├─ "TYPE" (immutable) = 1D enumerated curve
> >> +    ├─ "BYPASS": bool
> >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, PQ EOTF} = sRGB EOTF
> >> +    └─ "NEXT" (immutable) = 0
> >> +
> >> +
> >> +Color Pipeline Programming
> >> +==========================
> >> +
> >> +Once a DRM client has found a suitable pipeline it will:
> >> +
> >> +1. Set the COLOR_PIPELINE enum value to the one pointing at the first
> >> +   drm_colorop object of the desired pipeline
> >> +2. Set the properties for all drm_colorop objects in the pipeline to the
> >> +   desired values, setting BYPASS to true for unused drm_colorop blocks,
> >> +   and false for enabled drm_colorop blocks
> >> +3. Perform atomic_check/commit as desired
> >> +
> >> +To configure the pipeline for an HDR10 PQ plane and blending in
> >> +linear space, a compositor might perform an atomic commit with the
> >> +following property values::
> >> +
> >> +    Plane 10
> >> +    └─ "COLOR_PIPELINE" = 42
> >> +
> >> +    Color operation 42 (input CSC)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 44 (DeGamma)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 45 (gamut remap)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 46 (shaper LUT RAM)
> >> +    └─ "BYPASS" = true
> >> +
> >> +    Color operation 47 (3D LUT RAM)
> >> +    └─ "LUT_3D_DATA" = Gamut mapping + tone mapping + night mode
> >> +
> >> +    Color operation 48 (blend gamma)
> >> +    └─ "CURVE_1D_TYPE" = PQ EOTF
> >> +
> >> +
> >> +Driver Implementer's Guide
> >> +==========================
> >> +
> >> +What does this all mean for driver implementations? As noted above
> >> +the colorops can map to HW directly but don't need to do so. Here
> >> +are some suggestions on how to think about creating your color pipelines:
> >> +
> >> +- Try to expose pipelines that use already defined colorops, even if
> >> +  your hardware pipeline is split differently. This allows existing
> >> +  userspace to immediately take advantage of the hardware.
> >> +
> >> +- Additionally, try to expose your actual hardware blocks as colorops.
> >> +  Define new colorop types where you believe it can offer significant
> >> +  benefits if userspace learns to program them.
> >> +
> >> +- Avoid defining new colorops for compound operations with very narrow
> >> +  scope. If you have a hardware block for a special operation that
> >> +  cannot be split further, you can expose that as a new colorop type.
> >> +  However, try to not define colorops for "use cases", especially if
> >> +  they require you to combine multiple hardware blocks.
> >> +
> >> +- Design new colorops as prescriptive, not descriptive; by the
> >> +  mathematical formula, not by the assumed input and output.
> >> +
> >> +A defined colorop type must be deterministic. Its operation can
> >> +depend only on its properties and input and nothing else, allowed
> >> +error tolerance notwithstanding.
> >> +
> >> +
> >> +Driver Forward/Backward Compatibility
> >> +=====================================
> >> +
> >> +As this is uAPI drivers can't regress color pipelines that have been
> >> +introduced for a given HW generation. New HW generations are free to
> >> +abandon color pipelines advertised for previous generations.
> >> +Nevertheless, it can be beneficial to carry support for existing
> >> +color pipelines forward as those will likely already have support in
> >> +DRM clients.
> >> +
> >> +Introducing new colorops to a pipeline is fine, as long as they can
> >> +be disabled or are purely informational. DRM clients implementing
> >> +support for the pipeline can always skip unknown properties as long
> >> +as they can be confident that doing so will not cause unexpected results.
> >> +
> >> +If a new colorop doesn't fall into one of the above categories
> >> +(bypassable or informational) the modified pipeline would be
> >> +unusable for user space. In this case a new pipeline should be defined.
> >
> > Thanks again for this nice documentation and capturing all the details clearly.
> >
> 
> Thanks for your feedback.
> 
> Harry
> 
> > Regards,
> > Uma Shankar
> >
> >> +
> >> +References
> >> +==========
> >> +
> >> +1. https://lore.kernel.org/dri-devel/QMers3awXvNCQlyhWdTtsPwkp5ie9bze_hD5nAccFW7a_RXlWjYB7MoUW_8CKLT2bSQwIXVi5H6VULYIxCdgvryZoAoJnC5lZgyK1QWn488=@emersion.fr/
> >> \ No newline at end of file
> >> --
> >> 2.42.0
> >



* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-09 10:17       ` Shankar, Uma
@ 2023-11-09 11:55         ` Pekka Paalanen
  2023-11-10 11:27           ` Shankar, Uma
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-11-09 11:55 UTC (permalink / raw)
  To: Shankar, Uma
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton


On Thu, 9 Nov 2023 10:17:11 +0000
"Shankar, Uma" <uma.shankar@intel.com> wrote:

> > -----Original Message-----
> > From: Joshua Ashton <joshua@froggi.es>
> > Sent: Wednesday, November 8, 2023 7:13 PM
> > To: Shankar, Uma <uma.shankar@intel.com>; Harry Wentland
> > <harry.wentland@amd.com>; dri-devel@lists.freedesktop.org

...

> > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> > pipeline is needed
> > 
> > 
> > 
> > On 11/8/23 12:18, Shankar, Uma wrote:  
> > >
> > >  
> > >> -----Original Message-----
> > >> From: Harry Wentland <harry.wentland@amd.com>
> > >> Sent: Friday, October 20, 2023 2:51 AM
> > >> To: dri-devel@lists.freedesktop.org

...

> > >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive
> > >> color pipeline is needed

...

> > >> +An example of a drm_colorop object might look like one of these::
> > >> +
> > >> +    /* 1D enumerated curve */
> > >> +    Color operation 42
> > >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > >> +    ├─ "BYPASS": bool {true, false}
> > >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > >
> > > Having the fixed function enum for some targeted input/output may not
> > > be scalable for all usecases. There are multiple colorspaces and
> > > transfer functions possible, so it will not be possible to cover all
> > > these by any enum definitions. Also, this will depend on the
> > > capabilities of respective hardware from various vendors.
> > 
> > The reason this exists is such that certain HW vendors such as AMD have transfer
> > functions implemented in HW. It is important to take advantage of these for both
> > precision and power reasons.  
> 
> The issue we see here is that it will be too use-case and vendor specific.
> There will be BT601, BT709, BT2020, sRGB, HDR EOTF and many more. Not to forget,
> we will need linearization and non-linearization enums for each of these.

I don't see that as a problem at all. It's not a combinatorial
explosion like input/output combinations in a single enum would be.
It's always a curve and its inverse at most.

These are KMS properties: not every driver needs to implement every
defined enum value, only those values it can and wants to support.
Userspace also sees the supported list, so it does not need trial and
error.

This is the only way to actually use hard-wired curves. The
alternative would be for userspace to submit a LUT of some type, and
the driver needs to start guessing if it matches one of the hard-wired
curves the hardware supports, which is just not feasible.

Hard-wired curves are an addition, not a replacement, to custom
curves defined by parameters or various different LUT representations.
Many of these hard-wired curves will emerge as is from common use cases.

> Also a CTM indication to convert colorspace.

Did someone propose to enumerate matrices? I would not do that, unless
you literally have hard-wired matrices in hardware and cannot do custom
matrices.

> Also, if the underlying hardware block is
> programmable, it's not limited to be used only for the colorspace management but
> can be used for other color enhancements as well by a capable client.

Yes, that's why we have other types for curves, the programmable ones.

> Hence, we feel that it is bordering on being descriptive with too many possible
> combinations (not easy to generalize). So, if hardware is programmable, let's
> expose its capability through a blob and be generic.

It's not descriptive though. It's a prescription of a mathematical
function the hardware implements as fixed-function hardware. The
function is a curve. There is no implication that the curve must be
used with specific input or output color spaces.

> For any fixed function hardware where the LUT etc. is stored in ROM and just a control/enable
> bit is provided to the driver, we can define a pipeline with a vendor specific color block. This
> can be identified with a flag (better ways can be discussed).

No, there is no need for that. A curve type will do well.

A vendor specific colorop needs vendor specific userspace code to
program *at all*. A generic curve colorop might list some curve types
the userspace does not understand, but also curve types userspace does
understand. The understood curve types can still be used by userspace.
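
As a rough illustration of that last point, here is a minimal
userspace-side sketch in C. It assumes the advertised curve types arrive
as a list of enum names; the helper names (curve_type_known,
pick_usable_curves) and the table of known curves are hypothetical and
not part of the proposed uAPI. Unknown entries are skipped, and the
known ones remain usable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: curve type names this userspace build understands. */
static const char *const known_curves[] = {
	"sRGB EOTF",
	"sRGB inverse EOTF",
	"PQ EOTF",
	"PQ inverse EOTF",
};

/* Return 1 if 'name' is a curve type this userspace knows, else 0. */
static int curve_type_known(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(known_curves) / sizeof(known_curves[0]); i++)
		if (strcmp(name, known_curves[i]) == 0)
			return 1;
	return 0;
}

/*
 * Copy the advertised curve types that userspace understands into
 * 'usable' (which must have room for n entries) and return how many
 * were copied. Unknown types are skipped, not treated as an error.
 */
static size_t pick_usable_curves(const char *const *advertised, size_t n,
				 const char **usable)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (curve_type_known(advertised[i]))
			usable[count++] = advertised[i];
	return count;
}
```

With a colorop advertising {"PQ EOTF", "some vendor curve", "sRGB EOTF"},
this would keep the first and last entries and ignore the middle one.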

> For example, on some of the Intel platforms, we had a fixed function to convert colorspaces
> directly with a bit setting. These kinds of things should be vendor specific and not be part
> of generic userspace implementation.

Why would you forbid generic userspace from making use of them?

> For reference:
> 001b  YUV601 to RGB601    YUV BT.601 to RGB BT.601 conversion.
> 010b  YUV709 to RGB709    YUV BT.709 to RGB BT.709 conversion.
> 011b  YUV2020 to RGB2020  YUV BT.2020 to RGB BT.2020 conversion.
> 100b  RGB709 to RGB2020   RGB BT.709 to RGB BT.2020 conversion.

This is nothing like the curves we talked about above.

Anyway, you can expose these fixed-function operations with a colorop
that has an enum choosing the conversion. There is no need to make it
vendor-specific at all. It's possible that only specific chips from
Intel support it, but nothing stops anyone else from implementing or
emulating the colorop if they can construct a hardware configuration
achieving the same result.

It seems there are already problems in exploding the number of
pipelines to expose, so it's best to try to avoid single-use colorops
and use enums in more generic colorops instead.

> 
> > Additionally, not every vendor implements bucketed/segmented LUTs the same
> > way, so it's not feasible to expose that in a way that's particularly useful or not
> > vendor-specific.  

Joshua, I see no problem here really. They are just another type of LUT
for a curve colorop, with a different configuration blob that can be
defined in the UAPI.

> If the underlying hardware is programmable, the structure which we propose to advertise
> the capability of the block to userspace will be sufficient to compute the LUT coefficients.
> The caps can be:
> 1. Number of segments in the LUT
> 2. Precision of the LUT
> 3. Starting and ending point of the segment
> 4. Number of samples in the segment
> 5. Any other flag which could be useful in this computation.
> 
> This way we can compute LUTs generically and send them to the driver. This will be scalable for all
> colorspaces, configurations and vendors.

Drop the mention of colorspaces, and I hope so. :-)

Color spaces don't quite exist in a prescriptive pipeline definition.

> > Thus we decided to have a regular 1D LUT modulated onto a known curve.
> > This is the only real cross-vendor solution here that allows HW curve
> > implementations to be taken advantage of and also works with
> > bucketing/segmented LUTs.
> > (Including vendors we are not aware of yet).
> > 
> > This also means that vendors that only support HW curves at some stages without
> > an actual LUT are also serviced.  
> 
> Any fixed function vendor implementation should be supported but with a vendor
> specific color block. Trying to come up with enums which aligns with some underlying
> hardware may not be scalable.

I disagree with both of you.

Who said there could be only one "degamma" block on a plane's pipeline?

If hardware is best modelled as a fixed-function selectable curve
followed by a custom curve, then expose exactly those two generic
colorops. Nothing stops a pipeline from having two curve colorops in
sequence with a disjoint set of supported types or features. If some
hardware does not have one of the curve colorops, then just don't add
the missing one in a pipeline.
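
To make that concrete, here is a minimal sketch in C of how a pipeline
could be assembled from whichever curve blocks exist. All names here
(struct colorop, struct hw_curve_blocks, build_curve_pipeline) are
hypothetical and only model the idea; they are not the structures from
this patch series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical colorop kinds; only the two discussed here. */
enum colorop_type {
	COLOROP_CURVE_1D_ENUM,	/* fixed-function, selectable curve */
	COLOROP_CURVE_1D_LUT,	/* custom, programmable curve */
};

struct colorop {
	enum colorop_type type;
	struct colorop *next;	/* NULL terminates the pipeline */
};

/* Hypothetical description of which curve blocks the hardware has. */
struct hw_curve_blocks {
	bool has_fixed_curve;
	bool has_custom_lut;
};

/*
 * Link up to two curve colorops from 'storage' in hardware order and
 * return the head of the pipeline, or NULL if neither block exists.
 * A block the hardware lacks is simply left out.
 */
static struct colorop *build_curve_pipeline(const struct hw_curve_blocks *hw,
					    struct colorop storage[2])
{
	struct colorop *head = NULL;
	struct colorop **link = &head;
	size_t used = 0;

	if (hw->has_fixed_curve) {
		storage[used].type = COLOROP_CURVE_1D_ENUM;
		storage[used].next = NULL;
		*link = &storage[used];
		link = &storage[used].next;
		used++;
	}
	if (hw->has_custom_lut) {
		storage[used].type = COLOROP_CURVE_1D_LUT;
		storage[used].next = NULL;
		*link = &storage[used];
		link = &storage[used].next;
		used++;
	}
	return head;
}
```

Hardware with both blocks gets an enumerated curve followed by a custom
LUT; hardware with only one of them gets a one-element pipeline.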


Thanks,
pq

> > You are right that there *might* be some usecase not covered by this right now,
> > and that it would need kernel churn to implement new curves, but unfortunately
> > that's the compromise that we (so-far) have decided on in order to ensure
> > everyone can have good, precise, power-efficient support.  
> 
> Yes, we are aligned on this. But we believe programmable hardware should be able to
> expose its caps. Fixed function hardware should be non-generic and vendor specific.
> 
> > It is always possible for us to extend the uAPI at a later date for other curves, or
> > other properties that might expose a generic segmented LUT interface (such as
> > what you have proposed for a while) for vendors that can support it.
> > (With the whole color pipeline thing, we can essentially do 'versioning'
> > with that, if we wanted a new 1D LUT type.)  
> 
> Most of the hardware vendors have programmable LUTs (including AMD), so it would be
> good to have this as a default generic compositor implementation. And yes, any new color
> block with a type can be added to the existing APIs as the need arises without breaking
> compatibility.
> 
> Regards,
> Uma Shankar
> 
> > 
> > Thanks!
> > - Joshie 🐸✨



* RE: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-09 11:55         ` Pekka Paalanen
@ 2023-11-10 11:27           ` Shankar, Uma
  2023-11-10 13:27             ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Shankar, Uma @ 2023-11-10 11:27 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton



> -----Original Message-----
> From: Pekka Paalanen <ppaalanen@gmail.com>
> Sent: Thursday, November 9, 2023 5:26 PM
> To: Shankar, Uma <uma.shankar@intel.com>
> Cc: Joshua Ashton <joshua@froggi.es>; Harry Wentland
> <harry.wentland@amd.com>; dri-devel@lists.freedesktop.org; Sebastian Wick
> <sebastian.wick@redhat.com>; Sasha McIntosh <sashamcintosh@google.com>;
> Abhinav Kumar <quic_abhinavk@quicinc.com>; Shashank Sharma
> <shashank.sharma@amd.com>; Xaver Hugl <xaver.hugl@gmail.com>; Hector
> Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Alexander
> Goins <agoins@nvidia.com>; Michel Dänzer <mdaenzer@redhat.com>; wayland-
> devel@lists.freedesktop.org; Melissa Wen <mwen@igalia.com>; Jonas Ådahl
> <jadahl@redhat.com>; Arthur Grillo <arthurgrillo@riseup.net>; Victoria
> Brekenfeld <victoria@system76.com>; Sima <daniel@ffwll.ch>; Aleix Pol
> <aleixpol@kde.org>; Naseer Ahmed <quic_naseer@quicinc.com>; Christopher
> Braga <quic_cbraga@quicinc.com>; Ville Syrjala <ville.syrjala@linux.intel.com>
> Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> pipeline is needed
> 
> On Thu, 9 Nov 2023 10:17:11 +0000
> "Shankar, Uma" <uma.shankar@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Joshua Ashton <joshua@froggi.es>
> > > Sent: Wednesday, November 8, 2023 7:13 PM
> > > To: Shankar, Uma <uma.shankar@intel.com>; Harry Wentland
> > > <harry.wentland@amd.com>; dri-devel@lists.freedesktop.org
> 
> ...
> 
> > > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > prescriptive color pipeline is needed
> > >
> > >
> > >
> > > On 11/8/23 12:18, Shankar, Uma wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Harry Wentland <harry.wentland@amd.com>
> > > >> Sent: Friday, October 20, 2023 2:51 AM
> > > >> To: dri-devel@lists.freedesktop.org
> 
> ...
> 
> > > >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > >> prescriptive color pipeline is needed
> 
> ...
> 
> > > >> +An example of a drm_colorop object might look like one of these::
> > > >> +
> > > >> +    /* 1D enumerated curve */
> > > >> +    Color operation 42
> > > >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > >> +    ├─ "BYPASS": bool {true, false}
> > > >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > >
> > > > Having the fixed function enum for some targeted input/output may
> > > > not be scalable for all usecases. There are multiple colorspaces
> > > > and transfer functions possible, so it will not be possible to
> > > > > cover all these by any enum definitions. Also, this will depend on
> > > > > the capabilities of respective hardware from various vendors.
> > >
> > > The reason this exists is such that certain HW vendors such as AMD
> > > have transfer functions implemented in HW. It is important to take
> > > advantage of these for both precision and power reasons.
> >
> > Issue we see here is that, it will be too usecase and vendor specific.
> > There will be BT601, BT709, BT2020, SRGB, HDR EOTF and many more. Not
> > > to forget we will need linearization and non-linearization enums for each
> > > of these.
> 
> I don't see that as a problem at all. It's not a combinatorial explosion like
> input/output combinations in a single enum would be.
> It's always a curve and its inverse at most.
> 
> It's KMS properties, not every driver needs to implement every defined enum
> value but only those values it can and wants to support.
> Userspace also sees the supported list, it does not need trial and error.
> 
> This is the only way to actually use hard-wired curves. The alternative would be
> for userspace to submit a LUT of some type, and the driver needs to start
> guessing if it matches one of the hard-wired curves the hardware supports, which
> is just not feasible.
> 
> Hard-wired curves are an addition, not a replacement, to custom curves defined
> by parameters or various different LUT representations.
> Many of these hard-wired curves will emerge as is from common use cases.

Point taken. We can go with these fixed-function curve types as long as each represents a
single mathematical operation, thereby avoiding the combination nightmare.

However, I just want to make sure that the same thing can be done with programmable
hardware. In the case above, LUT tables for the same need to be hardcoded in the driver for
various platforms (depending on capabilities, precision, number and distribution of LUTs etc.).
This is manageable, but the driver will get bloated with all kinds of hardcoded LUT tables,
which could have been easily computed by the compositor at runtime. The driver cannot compute
the tables at runtime due to the complexity of the floating-point math involved, so hardcoded
LUT tables will be the only option.

So we should just ensure that if these enums are not exposed by a driver, but a programmable
LUT block is exposed instead, userspace falls back to the programmable LUT. Having the
fixed-function enum should not become a mandatory norm to implement and expose, even for
programmable hardware.

With this we will be able to cater to both kinds of hardware with a generic userspace.
I hope this expectation is OK.
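
The fallback described here could be sketched in C roughly as follows.
The struct and enum names are hypothetical, and preferring the
enumerated curve when both options exist is an assumption based on the
precision and power argument made earlier in the thread, not something
the API mandates.

```c
#include <stdbool.h>

/* Hypothetical summary of what one pipeline advertises for a curve. */
struct curve_caps {
	bool has_enum_curve;	/* enumerated colorop lists the wanted curve */
	bool has_custom_lut;	/* programmable 1D LUT colorop is present */
};

enum curve_strategy {
	CURVE_USE_ENUM,		/* select the hard-wired curve by enum value */
	CURVE_USE_CUSTOM_LUT,	/* userspace computes the LUT and uploads it */
	CURVE_USE_NONE,		/* this pipeline cannot express the curve */
};

/*
 * Pick how userspace applies a curve on this pipeline. Assumption:
 * the hard-wired curve is preferred when available; otherwise fall
 * back to a userspace-computed LUT if a programmable block exists.
 */
static enum curve_strategy choose_curve_strategy(const struct curve_caps *caps)
{
	if (caps->has_enum_curve)
		return CURVE_USE_ENUM;
	if (caps->has_custom_lut)
		return CURVE_USE_CUSTOM_LUT;
	return CURVE_USE_NONE;
}
```

A driver for purely programmable hardware would then only need to
expose the LUT colorop, and a generic compositor would take the second
branch.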

> > Also a CTM indication to convert colorspace.
> 
> Did someone propose to enumerate matrices? I would not do that, unless you
> literally have hard-wired matrices in hardware and cannot do custom matrices.

Not currently, but there can be a fixed-function matrix for certain color space or
format conversions like BT709->BT2020 etc.
However, we see this is not proposed currently, and if it is not needed, that's fine; we
don't want to bring up another non-problem for discussion.

> > Also, if the underlying hardware block is programmable, it's not
> > limited to be used only for the colorspace management but can be used
> > for other color enhancements as well by a capable client.
> 
> Yes, that's why we have other types for curves, the programmable ones.

Got that and agree, it's fine as mentioned above.

> > Hence, we feel that it is bordering on being descriptive with too many
> > possible combinations (not easy to generalize). So, if hardware is
> > programmable, let's expose its capability through a blob and be generic.
> 
> It's not descriptive though. It's a prescription of a mathematical function the
> hardware implements as fixed-function hardware. The function is a curve. There
> is no implication that the curve must be used with specific input or output color
> spaces.

As long as we don't mix combinations it should be fine. But not all hardware may
represent these fixed functions at the granularity of a single mathematical operation.
It would be tough to represent such color blocks with a single enum.

> > For any fixed function hardware where Lut etc is stored in ROM and
> > just a control/enable bit is provided to driver, we can define a
> > pipeline with a vendor specific color block. This can be identified with
> > a flag (better ways can be discussed).
> 
> No, there is no need for that. A curve type will do well.

Agree and aligned here.

> A vendor specific colorop needs vendor specific userspace code to program *at
> all*. A generic curve colorop might list some curve types the userspace does not
> understand, but also curve types userspace does understand. The understood
> curve types can still be used by userspace.

The issue is with combined operations in hardware. If it's a single mathematical operation,
it would be easy.

> > For example, on some of the Intel platforms, we had a fixed function to
> > convert colorspaces directly with a bit setting. These kinds of things
> > should be vendor specific and not be part of generic userspace implementation.
> 
> Why would you forbid generic userspace from making use of them?

Issue is that it was not one single mathematical operation but a combination
as described below.
 
> > For reference:
> > 001b  YUV601 to RGB601    YUV BT.601 to RGB BT.601 conversion.
> > 010b  YUV709 to RGB709    YUV BT.709 to RGB BT.709 conversion.
> > 011b  YUV2020 to RGB2020  YUV BT.2020 to RGB BT.2020 conversion.
> > 100b  RGB709 to RGB2020   RGB BT.709 to RGB BT.2020 conversion.
> 
> This is nothing like the curves we talked about above.
> Anyway, you can expose these fixed-function operations with a colorop that has
> an enum choosing the conversion. There is no need to make it vendor-specific at
> all. It's possible that only specific chips from Intel support it, but nothing stops
> anyone else from implementing or emulating the colorop if they can construct a
> hardware configuration achieving the same result.
> 
> It seems there are already problems in exploding the number of pipelines to
> expose, so it's best to try to avoid single-use colorops and use enums in more
> generic colorops instead.

Yeah, this is how the hardware implements it, and it involves multiple mathematical operations
controlled by one programmable bit. These will be tough to generalize.
What the colorop type for these should be remains an open question.

It would be great if we can address this generically.

> >
> > > Additionally, not every vendor implements bucketed/segmented LUTs
> > > the same way, so it's not feasible to expose that in a way that's
> > > particularly useful or not vendor-specific.
> 
> Joshua, I see no problem here really. They are just another type of LUT for a curve
> colorop, with a different configuration blob that can be defined in the UAPI.

Yeah, agree.
And the programmable hardware can be easily exposed and generalized for all vendors,
so it should not be a concern.

> > If the underlying hardware is programmable, the structure which we
> > propose to advertise the capability of the block to userspace will be
> > sufficient to compute the LUT coefficients.
> > The caps can be :
> > 1. Number of segments in Lut
> > 2. Precision of lut
> > 3. Starting and ending point of the segment
> > 4. Number of samples in the segment.
> > 5. Any other flag which could be useful in this computation.
> >
> > This way we can compute LUT's generically and send to driver. This
> > will be scalable for all colorspaces, configurations and vendors.
> 
> Drop the mention of colorspaces, and I hope so. :-)
> 
> Color spaces don't quite exist in a prescriptive pipeline definition.

Yeah. For driver it's just a LUT for programmable hardware, OR mathematical
operation for fixed function hardware defined via enum 😊

> > > Thus we decided to have a regular 1D LUT modulated onto a known curve.
> > > This is the only real cross-vendor solution here that allows HW
> > > curve implementations to be taken advantage of and also works with
> > > bucketing/segmented LUTs.
> > > (Including vendors we are not aware of yet).
> > >
> > > This also means that vendors that only support HW curves at some
> > > stages without an actual LUT are also serviced.
> >
> > Any fixed function vendor implementation should be supported but with
> > a vendor specific color block. Trying to come up with enums which
> > aligns with some underlying hardware may not be scalable.
> 
> I disagree with both of you.
> 
> Who said there could be only one "degamma" block on a plane's pipeline?
> 
> If hardware is best modelled as a fixed-function selectable curve followed by a
> custom curve, then expose exactly those two generic colorops. Nothing stops a
> pipeline from having two curve colorops in sequence with a disjoint set of
> supported types or features. If some hardware does not have one of the curve
> colorops, then just don't add the missing one in a pipeline.

Agree, I think we are aligned now here.

Regards,
Uma Shankar

> 
> 
> Thanks,
> pq
> 
> > > You are right that there *might* be some usecase not covered by this
> > > right now, and that it would need kernel churn to implement new
> > > curves, but unfortunately that's the compromise that we (so-far)
> > > have decided on in order to ensure everyone can have good, precise,
> > > power-efficient support.
> >
> > Yes, we are aligned on this. But believe programmable hardware should
> > be able to expose its caps. Fixed function hardware should be
> > non-generic and vendor specific.
> >
> > > It is always possible for us to extend the uAPI at a later date for
> > > other curves, or other properties that might expose a generic
> > > segmented LUT interface (such as what you have proposed for a while)
> > > for vendors that can support it.
> > > (With the whole color pipeline thing, we can essentially do 'versioning'
> > > with that, if we wanted a new 1D LUT type.)
> >
> > Most of the hardware vendors have programmable luts (including AMD),
> > so it would be good to have this as a default generic compositor
> > implementation. And yes, any new color block with a type can be added
> > to the existing API's as the need arises without breaking compatibility.
> >
> > Regards,
> > Uma Shankar
> >
> > >
> > > Thanks!
> > > - Joshie 🐸✨


* Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
  2023-11-10 11:27           ` Shankar, Uma
@ 2023-11-10 13:27             ` Pekka Paalanen
  0 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-11-10 13:27 UTC (permalink / raw)
  To: Shankar, Uma
  Cc: Sasha McIntosh, Liviu Dudau, Victoria Brekenfeld, dri-devel,
	Michel Dänzer, Arthur Grillo, Christopher Braga,
	Sebastian Wick, Shashank Sharma, wayland-devel, Jonas Ådahl,
	Abhinav Kumar, Naseer Ahmed, Melissa Wen, Aleix Pol,
	Hector Martin, Xaver Hugl, Joshua Ashton


On Fri, 10 Nov 2023 11:27:14 +0000
"Shankar, Uma" <uma.shankar@intel.com> wrote:

> > -----Original Message-----
> > From: Pekka Paalanen <ppaalanen@gmail.com>
> > Sent: Thursday, November 9, 2023 5:26 PM
> > To: Shankar, Uma <uma.shankar@intel.com>
> > Cc: Joshua Ashton <joshua@froggi.es>; Harry Wentland
> > <harry.wentland@amd.com>; dri-devel@lists.freedesktop.org; Sebastian Wick
> > <sebastian.wick@redhat.com>; Sasha McIntosh <sashamcintosh@google.com>;
> > Abhinav Kumar <quic_abhinavk@quicinc.com>; Shashank Sharma
> > <shashank.sharma@amd.com>; Xaver Hugl <xaver.hugl@gmail.com>; Hector
> > Martin <marcan@marcan.st>; Liviu Dudau <Liviu.Dudau@arm.com>; Alexander
> > Goins <agoins@nvidia.com>; Michel Dänzer <mdaenzer@redhat.com>; wayland-
> > devel@lists.freedesktop.org; Melissa Wen <mwen@igalia.com>; Jonas Ådahl
> > <jadahl@redhat.com>; Arthur Grillo <arthurgrillo@riseup.net>; Victoria
> > Brekenfeld <victoria@system76.com>; Sima <daniel@ffwll.ch>; Aleix Pol
> > <aleixpol@kde.org>; Naseer Ahmed <quic_naseer@quicinc.com>; Christopher
> > Braga <quic_cbraga@quicinc.com>; Ville Syrjala <ville.syrjala@linux.intel.com>
> > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color
> > pipeline is needed
> > 
> > On Thu, 9 Nov 2023 10:17:11 +0000
> > "Shankar, Uma" <uma.shankar@intel.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Joshua Ashton <joshua@froggi.es>
> > > > Sent: Wednesday, November 8, 2023 7:13 PM
> > > > To: Shankar, Uma <uma.shankar@intel.com>; Harry Wentland
> > > > <harry.wentland@amd.com>; dri-devel@lists.freedesktop.org  
> > 
> > ...
> >   
> > > > Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > > prescriptive color pipeline is needed
> > > >
> > > >
> > > >
> > > > On 11/8/23 12:18, Shankar, Uma wrote:  
> > > > >
> > > > >  
> > > > >> -----Original Message-----
> > > > >> From: Harry Wentland <harry.wentland@amd.com>
> > > > >> Sent: Friday, October 20, 2023 2:51 AM
> > > > >> To: dri-devel@lists.freedesktop.org  
> > 
> > ...
> >   
> > > > >> Subject: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why
> > > > >> prescriptive color pipeline is needed  
> > 
> > ...
> >   
> > > > >> +An example of a drm_colorop object might look like one of these::
> > > > >> +
> > > > >> +    /* 1D enumerated curve */
> > > > >> +    Color operation 42
> > > > >> +    ├─ "TYPE": immutable enum {1D enumerated curve, 1D LUT, 3x3 matrix, 3x4 matrix, 3D LUT, etc.} = 1D enumerated curve
> > > > >> +    ├─ "BYPASS": bool {true, false}
> > > > >> +    ├─ "CURVE_1D_TYPE": enum {sRGB EOTF, sRGB inverse EOTF, PQ EOTF, PQ inverse EOTF, …}
> > > > >
> > > > > Having the fixed function enum for some targeted input/output may
> > > > > not be scalable for all usecases. There are multiple colorspaces
> > > > > and transfer functions possible, so it will not be possible to
> > > > > > cover all these by any enum definitions. Also, this will depend on
> > > > > > the capabilities of respective hardware from various vendors.
> > > >
> > > > The reason this exists is such that certain HW vendors such as AMD
> > > > have transfer functions implemented in HW. It is important to take
> > > > advantage of these for both precision and power reasons.  
> > >
> > > Issue we see here is that, it will be too usecase and vendor specific.
> > > There will be BT601, BT709, BT2020, SRGB, HDR EOTF and many more. Not
> > > > to forget we will need linearization and non-linearization enums for each
> > > > of these.
> > 
> > I don't see that as a problem at all. It's not a combinatorial explosion like
> > input/output combinations in a single enum would be.
> > It's always a curve and its inverse at most.
> > 
> > It's KMS properties, not every driver needs to implement every defined enum
> > value but only those values it can and wants to support.
> > Userspace also sees the supported list, it does not need trial and error.
> > 
> > This is the only way to actually use hard-wired curves. The alternative would be
> > for userspace to submit a LUT of some type, and the driver needs to start
> > guessing if it matches one of the hard-wired curves the hardware supports, which
> > is just not feasible.
> > 
> > Hard-wired curves are an addition, not a replacement, to custom curves defined
> > by parameters or various different LUT representations.
> > Many of these hard-wired curves will emerge as is from common use cases.  
> 
> Point taken, we can go with this fixed function curve types as long as it represents a
> single mathematical operation, thereby avoiding the combination nightmare.
> 
> However, just want to make sure that the same thing can be done with a programmable
> hardware. In the case above, lut tables for the same need to be hardcoded in driver for
> various platforms (depending on its capabilities, precision, number, and distribution of luts etc).

Hi Uma,

you can do that if you want to.

> This is manageable, but driver will get bloated with all kinds of hardcoded lut tables,
> which could have been easily computed by the compositor runtime. Driver cannot compute
> the tables runtime due to the complexity of the floating math involved, so hardcoded
> lut tables will be the only option. 

You do not have to do that if you don't want to.

> So we should just ensure that if these enums are not exposed by a driver, but a programmable
> lut block is exposed instead, userspace should fall back to the programmable lut. Having the
> fixed function enum should not become a mandatory norm to implement and expose even for a
> programmable hardware.

I agree.

> With this we will be able to cater to both kinds of hardware with a generic userspace.
> Hope this expectation is ok.
> 
> > > Also a CTM indication to convert colorspace.
> > 
> > Did someone propose to enumerate matrices? I would not do that, unless you
> > literally have hard-wired matrices in hardware and cannot do custom matrices.  
> 
> Not currently, but there can be fixed function matrix for certain color space or
> format conversion like BT709->BT2020 etc..
> However, we see this is not proposed currently and if not needed, it's fine and
> don't want to bring another non-problem for discussion.
> 
> > > Also, if the underlying hardware block is programmable, it's not
> > > limited to be used only for the colorspace management but can be used
> > > for other color enhancements as well by a capable client.  
> > 
> > Yes, that's why we have other types for curves, the programmable ones.  
> 
> Got that and agree, it's fine as mentioned above.
> 
> > > Hence, we feel that it is bordering on being descriptive with too many
> > > possible combinations (not easy to generalize). So, if hardware is
> > > programmable, let's expose its capability through a blob and be generic.
> > 
> > It's not descriptive though. It's a prescription of a mathematical function the
> > hardware implements as fixed-function hardware. The function is a curve. There
> > is no implication that the curve must be used with specific input or output color
> > spaces.  
> 
> As long as we don't mix combinations it should be fine. But not all hardware may
> represent these fixed functions at the granularity of a single mathematical operation.
> It would be tough to represent such color blocks with a single enum.

If a colorop does not fit for some hardware, then the driver should
not expose that colorop or pipeline.

> > > For any fixed function hardware where Lut etc is stored in ROM and
> > > just a control/enable bit is provided to driver, we can define a
> > > pipeline with a vendor specific color block. This can be identified with a flag
> > > (better ways can be discussed).
> > 
> > No, there is no need for that. A curve type will do well.  
> 
> Agree and aligned here.
> 
> > A vendor specific colorop needs vendor specific userspace code to program *at
> > all*. A generic curve colorop might list some curve types the userspace does not
> > understand, but also curve types userspace does understand. The understood
> > curve types can still be used by userspace.  
> 
> The issue is with combined operations in hardware. If it's a single mathematical operation,
> it would be easy.
> 
> > > For example, on some of the Intel platform, we had a fixed function to
> > > convert colorspaces directly with a bit setting. These kinds of things
> > > should be vendor specific and not be part of generic userspace implementation.  
> > 
> > Why would you forbid generic userspace from making use of them?  
> 
> Issue is that it was not one single mathematical operation but a combination
> as described below.
>  
> > > For reference:
> > > > 001b  YUV601 to RGB601:   YUV BT.601 to RGB BT.601 conversion.
> > > > 010b  YUV709 to RGB709:   YUV BT.709 to RGB BT.709 conversion.
> > > > 011b  YUV2020 to RGB2020: YUV BT.2020 to RGB BT.2020 conversion.
> > > > 100b  RGB709 to RGB2020:  RGB BT.709 to RGB BT.2020 conversion.
> > 
> > This is nothing like the curves we talked about above.
> > Anyway, you can expose these fixed-function operations with a colorop that has
> > an enum choosing the conversion. There is no need to make it vendor-specific at
> > all. It's possible that only specific chips from Intel support it, but nothing stops
> > anyone else from implementing or emulating the colorop if they can construct a
> > hardware configuration achieving the same result.
> > 
> > It seems there are already problems in exploding the number of pipelines to
> > expose, so it's best to try to avoid single-use colorops and use enums in more
> > generic colorops instead.  
> 
> Yeah, this is how the hardware implements it, and it involves multiple mathematical operations
> that are enabled together with one programmable bit. These will be tough to generalize.
> What the colorop type for these should be is an open question.
> 
> It would be great if we can address this generically.

We would need to know what those four things actually do. Your
description is very vague. Are there curves involved?


> > > > Additionally, not every vendor implements bucketed/segmented LUTs
> > > > the same way, so it's not feasible to expose that in a way that's
> > > > particularly useful or not vendor-specific.  
> > 
> > Joshua, I see no problem here really. They are just another type of LUT for a curve
> > colorop, with a different configuration blob that can be defined in the UAPI.  
> 
> Yeah, agree.
> And the programmable hardware can be easily exposed and generalized for all vendors,
> so it should not be a concern.
> 
> > > If the underlying hardware is programmable, the structure which we
> > > propose to advertise the capability of the block to userspace will be sufficient to
> > > compute the LUT coefficients.
> > > The caps can be :
> > > 1. Number of segments in Lut
> > > 2. Precision of lut
> > > 3. Starting and ending point of the segment
> > > 4. Number of samples in the segment.
> > > 5. Any other flag which could be useful in this computation.
> > >
> > > This way we can compute LUT's generically and send to driver. This
> > > will be scalable for all colorspaces, configurations and vendors.  
> > 
> > Drop the mention of colorspaces, and I hope so. :-)
> > 
> > Color spaces don't quite exist in a prescriptive pipeline definition.  
> 
> Yeah. For driver it's just a LUT for programmable hardware, OR mathematical
> operation for fixed function hardware defined via enum 😊
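
To make the capability list quoted above concrete, here is a minimal
sketch of how userspace might fill a segmented LUT from such caps.
The structs and the function are hypothetical and not a proposed UAPI
layout. A simple x^2 curve stands in for a real transfer function so
that the example needs no math library, and sampling both endpoints of
every segment is an assumption; real hardware may want something else
at segment boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one LUT segment (caps 3 and 4 above). */
struct lut_segment {
	double start;		/* first input value covered, in [0, 1] */
	double end;		/* last input value covered, in [0, 1] */
	size_t num_samples;	/* samples in this segment, at least 2 */
};

/* Hypothetical caps: number of segments (cap 1) and precision (cap 2). */
struct lut_caps {
	const struct lut_segment *segments;
	size_t num_segments;
	unsigned int precision_bits;	/* bits per LUT entry, 1..16 */
};

/* Stand-in example curve; a real client would use an actual transfer function. */
static double square_curve(double x)
{
	return x * x;
}

/*
 * Sample the curve at evenly spaced points in every segment and
 * quantize to the advertised precision. Returns the number of entries
 * written, or 0 if out[] is too small.
 */
static size_t fill_segmented_lut(const struct lut_caps *caps,
				 double (*curve)(double),
				 uint16_t *out, size_t out_len)
{
	size_t n = 0;

	assert(caps->precision_bits >= 1 && caps->precision_bits <= 16);
	const double max_code = (double)((1u << caps->precision_bits) - 1);

	for (size_t s = 0; s < caps->num_segments; s++) {
		const struct lut_segment *seg = &caps->segments[s];

		assert(seg->num_samples >= 2);
		for (size_t i = 0; i < seg->num_samples; i++) {
			if (n == out_len)
				return 0;

			double t = (double)i / (double)(seg->num_samples - 1);
			double y = curve(seg->start + (seg->end - seg->start) * t);

			/* Clamp to the representable range before quantizing. */
			if (y < 0.0)
				y = 0.0;
			if (y > 1.0)
				y = 1.0;
			out[n++] = (uint16_t)(y * max_code + 0.5);
		}
	}
	return n;
}
```

Nothing in the sketch refers to a color space: the caps describe only
the sampling layout, and the curve is whatever the client chooses to
program.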
> 
> > > > Thus we decided to have a regular 1D LUT modulated onto a known curve.
> > > > This is the only real cross-vendor solution here that allows HW
> > > > curve implementations to be taken advantage of and also works with
> > > > bucketing/segmented LUTs.
> > > > (Including vendors we are not aware of yet).
> > > >
> > > > This also means that vendors that only support HW curves at some
> > > > stages without an actual LUT are also serviced.  
> > >
> > > Any fixed function vendor implementation should be supported but with
> > > a vendor specific color block. Trying to come up with enums which
> > > aligns with some underlying hardware may not be scalable.  
> > 
> > I disagree with both of you.
> > 
> > Who said there could be only one "degamma" block on a plane's pipeline?
> > 
> > If hardware is best modelled as a fixed-function selectable curve followed by a
> > custom curve, then expose exactly those two generic colorops. Nothing stops a
> > pipeline from having two curve colorops in sequence with a disjoint set of
> > supported types or features. If some hardware does not have one of the curve
> > colorops, then just don't add the missing one in a pipeline.  
> 
> Agree, I think we are aligned now here.

Awesome!

Thanks,
pq


> > > > You are right that there *might* be some usecase not covered by this
> > > > right now, and that it would need kernel churn to implement new
> > > > curves, but unfortunately that's the compromise that we (so-far)
> > > > have decided on in order to ensure everyone can have good, precise,
> > > > power-efficient support.
> > >
> > > Yes, we are aligned on this. But we believe programmable hardware should
> > > be able to expose its caps. Fixed function hardware should be non-generic and
> > > vendor specific.
> > >  
> > > > It is always possible for us to extend the uAPI at a later date for
> > > > other curves, or other properties that might expose a generic
> > > > segmented LUT interface (such as what you have proposed for a while) for
> > > > vendors that can support it.
> > > > (With the whole color pipeline thing, we can essentially do 'versioning'
> > > > with that, if we wanted a new 1D LUT type.)  
> > >
> > > Most of the hardware vendors have programmable luts (including AMD),
> > > so it would be good to have this as a default generic compositor
> > > implementation. And yes, any new color block with a type can be added
> > > to the existing API's as the need arises without breaking compatibility.
> > >
> > > Regards,
> > > Uma Shankar
> > >  
> > > >
> > > > Thanks!
> > > > - Joshie 🐸✨  




Thread overview: 49+ messages
2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 01/17] drm/atomic: Allow get_value for immutable properties on atomic drivers Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 02/17] drm: Don't treat 0 as -1 in drm_fixp2int_ceil Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 03/17] drm/vkms: Create separate Kconfig file for VKMS Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling Harry Wentland
2023-10-23 22:34   ` Arthur Grillo
2023-10-19 21:21 ` [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array Harry Wentland
2023-10-30 13:29   ` Pekka Paalanen
2023-11-06 20:48     ` Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed Harry Wentland
2023-10-20 14:22   ` Sebastian Wick
2023-10-20 14:57     ` Pekka Paalanen
2023-10-20 15:23       ` Harry Wentland
2023-10-23  8:12         ` Pekka Paalanen
2023-10-25 20:16           ` Alex Goins
2023-10-26  8:57             ` Pekka Paalanen
2023-10-26 17:30               ` Sebastian Wick
2023-10-26 19:25                 ` Alex Goins
2023-10-27  8:59                   ` Michel Dänzer
2023-10-27 10:01                     ` Sebastian Wick
2023-10-27 12:01                       ` Pekka Paalanen
2023-11-04 23:01                   ` Christopher Braga
2023-11-07 16:52                     ` Harry Wentland
2023-11-07 16:52                   ` Harry Wentland
2023-11-07 16:52                 ` Harry Wentland
2023-11-07 21:17                   ` Sebastian Wick
2023-11-07 16:52               ` Harry Wentland
2023-11-07 16:52             ` Harry Wentland
2023-11-08 12:18   ` Shankar, Uma
2023-11-08 13:43     ` Joshua Ashton
2023-11-09 10:17       ` Shankar, Uma
2023-11-09 11:55         ` Pekka Paalanen
2023-11-10 11:27           ` Shankar, Uma
2023-11-10 13:27             ` Pekka Paalanen
2023-11-08 14:37     ` Harry Wentland
2023-11-09 10:24       ` Shankar, Uma
2023-10-19 21:21 ` [RFC PATCH v2 07/17] drm/colorop: Introduce new drm_colorop mode object Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 08/17] drm/colorop: Add TYPE property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 09/17] drm/color: Add 1D Curve subtype Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 10/17] drm/colorop: Add BYPASS property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 11/17] drm/colorop: Add NEXT property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 12/17] drm/colorop: Add atomic state print for drm_colorop Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 13/17] drm/colorop: Add new IOCTLs to retrieve drm_colorop objects Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 14/17] drm/plane: Add COLOR PIPELINE property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 15/17] drm/colorop: Add NEXT to colorop state print Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 16/17] drm/vkms: Add enumerated 1D curve colorop Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 17/17] drm/vkms: Add kunit tests for linear and sRGB LUTs Harry Wentland
2023-11-08 11:54 ` [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Shankar, Uma
2023-11-08 14:32   ` Harry Wentland
