dri-devel.lists.freedesktop.org archive mirror
* [RFC] Plane color pipeline KMS uAPI
@ 2023-05-04 15:22 Simon Ser
  2023-05-04 21:10 ` Harry Wentland
                   ` (5 more replies)
  0 siblings, 6 replies; 49+ messages in thread
From: Simon Ser @ 2023-05-04 15:22 UTC (permalink / raw)
  To: DRI Development
  Cc: Pekka Paalanen, Jonas Ådahl, xaver.hugl, Melissa Wen,
	wayland-devel, Michel Dänzer, Uma Shankar,
	Victoria Brekenfeld, Aleix Pol, Sebastian Wick, Joshua Ashton

Hi all,

The goal of this RFC is to expose a generic KMS uAPI to configure the color
pipeline before blending, i.e. after a pixel is tapped from a plane's
framebuffer and before it's blended with other planes. With this new uAPI we
aim to reduce the battery life impact of color management and HDR on mobile
devices, to improve performance and to decrease latency by skipping
composition on the 3D engine. This proposal is the result of discussions at
the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
familiar with the AMD, Intel and NVIDIA hardware have participated in the
discussion.

This proposal takes a prescriptive approach instead of a descriptive approach.
Drivers describe the available hardware blocks in terms of low-level
mathematical operations, then user-space configures each block. We decided
against a descriptive approach where user-space would provide a high-level
description of the colorspace and other parameters: we want to give more
control and flexibility to user-space, e.g. to be able to replicate exactly the
color pipeline with shaders and switch between shaders and KMS pipelines
seamlessly, and to avoid forcing user-space into a particular color management
policy.

We've decided against mirroring the existing CRTC properties
DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
pipeline can significantly differ between vendors and this approach cannot
accurately abstract all hardware. In particular, the availability, ordering and
capabilities of hardware blocks are different on each display engine. So, we've
decided to go for a highly detailed hardware capability discovery.

This new uAPI should not be in conflict with existing standard KMS properties,
since there are none which control the pre-blending color pipeline at the
moment. It does conflict with any vendor-specific properties like
NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
properties. Drivers will need to either reject atomic commits configuring both
uAPIs, or alternatively we could add a DRM client cap which hides the vendor
properties and shows the new generic properties when enabled.

To use this uAPI, first user-space needs to discover hardware capabilities via
KMS objects and properties, then user-space can configure the hardware via an
atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.

Our proposal introduces a new "color_pipeline" plane property, and a new KMS
object type, "COLOROP" (short for color operation). The "color_pipeline" plane
property is an enum, each enum entry represents a color pipeline supported by
the hardware. The special zero entry indicates that the pipeline is in
"bypass"/"no-op" mode. For instance, the following plane properties describe a
primary plane with 2 supported pipelines but currently configured in bypass
mode:

    Plane 10
    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
    ├─ …
    └─ "color_pipeline": enum {0, 42, 52} = 0

The non-zero entries describe color pipelines as a linked list of COLOROP KMS
objects. The entry value is an object ID pointing to the head of the linked
list (the first operation in the color pipeline).
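
The discovery walk user-space would perform can be sketched as follows. Since
the proposed COLOROP objects don't exist yet, plain Python dicts stand in for
KMS objects and properties here, and the object IDs simply mirror the example
above:

```python
# Hypothetical COLOROP objects keyed by KMS object ID; a "next" of 0
# terminates the linked list, matching the RFC's bypass/none convention.
COLOROPS = {
    42: {"type": "1D curve", "next": 43},
    43: {"type": "Matrix", "next": 0},
    52: {"type": "Matrix", "next": 53},
    53: {"type": "3D LUT", "next": 0},
}

def walk_pipeline(head_id):
    """Follow the "next" pointers starting from the head COLOROP."""
    ops = []
    op_id = head_id
    while op_id != 0:
        op = COLOROPS[op_id]
        ops.append((op_id, op["type"]))
        op_id = op["next"]
    return ops

def discover_pipelines(color_pipeline_enum):
    """Map each non-zero enum entry (a head object ID) to its op list."""
    return {head: walk_pipeline(head)
            for head in color_pipeline_enum if head != 0}

# The enum values come from the "color_pipeline" plane property example.
pipelines = discover_pipelines([0, 42, 52])
```

In a real implementation the dict lookups would be replaced by
drmModeObjectGetProperties() calls on each COLOROP object ID.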

The new COLOROP objects also expose a number of KMS properties. Each has a
type, a reference to the next COLOROP object in the linked list, and other
type-specific properties. Here is an example for a 1D LUT operation:

    Color operation 42
    ├─ "type": enum {Bypass, 1D curve} = 1D curve
    ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
    ├─ "lut_size": immutable range = 4096
    ├─ "lut_data": blob
    └─ "next": immutable color operation ID = 43

To configure this hardware block, user-space can fill a KMS blob with 4096 u32
entries, then set "lut_data" to the blob ID. Other color operation types might
have different properties.
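
Building such a blob could look like the sketch below. The RFC only says
"4096 u32 entries"; the exact entry layout (bit depth, per-channel packing) is
an assumption here — a single 16-bit code value widened to u32 per entry,
forming an identity curve:

```python
import struct

LUT_SIZE = 4096  # from the "lut_size" property in the example

def identity_lut_blob(size=LUT_SIZE, max_val=0xFFFF):
    # One u32 per entry; the entry format is not pinned down by the RFC,
    # so a 16-bit value widened to u32 is assumed for illustration.
    values = [round(i * max_val / (size - 1)) for i in range(size)]
    return struct.pack(f"<{size}I", *values)

blob = identity_lut_blob()
```

User-space would pass this buffer to drmModeCreatePropertyBlob() and set
"lut_data" to the returned blob ID in the atomic commit.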

Here is another example with a 3D LUT:

    Color operation 42
    ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
    ├─ "lut_size": immutable range = 33
    ├─ "lut_data": blob
    └─ "next": immutable color operation ID = 43
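
A "lut_size" of 33 here means a 33×33×33 lattice of RGB samples. The sizing
and indexing arithmetic can be sketched as below; note the memory ordering of
the lattice (red-major is assumed here) is not specified by the RFC:

```python
LUT_DIM = 33  # "lut_size" from the 3D LUT example

def lut3d_entry_count(dim=LUT_DIM):
    # A 3D LUT is a dim x dim x dim lattice, one sample per lattice point.
    return dim ** 3

def lut3d_index(r, g, b, dim=LUT_DIM):
    # Flat index of lattice point (r, g, b); red-major layout is an
    # illustrative assumption, not something the proposal defines.
    return (r * dim + g) * dim + b
```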

And one last example with a matrix:

    Color operation 42
    ├─ "type": enum {Bypass, Matrix} = Matrix
    ├─ "matrix_data": blob
    └─ "next": immutable color operation ID = 43

[Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
blocks which can be bypassed instead.]

[Jonas note: perhaps a single "data" property for both LUTs and matrices
would make more sense. And a "size" prop for both 1D and 3D LUTs.]

If some hardware supports re-ordering operations in the color pipeline, the
driver can expose multiple pipelines with different operation ordering, and
user-space can pick the ordering it prefers by selecting the right pipeline.
The same scheme can be used to expose hardware blocks supporting multiple
precision levels.
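
Selecting among such alternative pipelines is then a simple matching problem
on user-space's side; a minimal sketch, with hypothetical discovered data:

```python
def pick_pipeline(pipelines, wanted_order):
    """Return the head object ID of the first exposed pipeline whose
    operation types match the ordering user-space wants, or None."""
    for head, op_types in pipelines.items():
        if op_types == wanted_order:
            return head
    return None

# Hypothetical discovery result: two pipelines differing only in ordering.
exposed = {
    42: ["1D curve", "Matrix", "3D LUT"],
    52: ["Matrix", "1D curve", "3D LUT"],
}
```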

That's pretty much all there is to it, but as always the devil is in the
details.

First, we realized that we need a way to indicate where the scaling operation
is happening. The contents of the framebuffer attached to the plane might be
scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
the colorspace scaling is applied in, the result will be different, so we need
a way for the kernel to indicate which hardware blocks are pre-scaling, and
which ones are post-scaling. We introduce a special "scaling" operation type,
which is part of the pipeline like other operations but serves an informational
role only (effectively, the operation cannot be configured by user-space, all
of its properties are immutable). For example:

    Color operation 43
    ├─ "type": immutable enum {Scaling} = Scaling
    └─ "next": immutable color operation ID = 44

[Simon note: an alternative would be to split the color pipeline into two, by
having two plane properties ("color_pipeline_pre_scale" and
"color_pipeline_post_scale") instead of a single one. This would be similar to
the way we want to split pre-blending and post-blending. This could be less
expressive for drivers, there may be hardware where there are dependencies
between the pre- and post-scaling pipeline?]

Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
contains some fixed-function blocks which convert from LMS to ICtCp and cannot
be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
where user-space provides a high-level description of the colorspace
conversions it needs to perform, and this is at odds with our KMS uAPI
proposal. To address this issue, we suggest adding a special block type which
describes a fixed conversion from one colorspace to another and cannot be
configured by user-space. Then user-space will need to accommodate its pipeline
for these special blocks. Such fixed hardware blocks need to be well enough
documented so that they can be implemented via shaders.
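
As a concrete illustration of replicating a documented fixed block in
software: BT.2100 publicly specifies the fixed matrix taking nonlinear L'M'S'
to ICtCp (the coefficients below come from BT.2100, not from this proposal),
and a shader fallback would apply exactly the same matrix:

```python
# BT.2100 L'M'S' -> ICtCp matrix (coefficients divided by 4096, as the
# spec writes them). A well-documented fixed-function block would publish
# its matrix like this so shaders can match it bit-for-bit in intent.
LMS_TO_ICTCP = [
    [2048 / 4096, 2048 / 4096, 0 / 4096],
    [6610 / 4096, -13613 / 4096, 7003 / 4096],
    [17933 / 4096, -17390 / 4096, -543 / 4096],
]

def apply_matrix(m, vec):
    return [sum(m[r][c] * vec[c] for c in range(3)) for r in range(3)]

# On the achromatic axis (L' = M' = S') the chroma components vanish:
i, ct, cp = apply_matrix(LMS_TO_ICTCP, [1.0, 1.0, 1.0])
```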

We also noted that it should always be possible for user-space to completely
disable the color pipeline and switch back to bypass/identity without a
modeset. Some drivers will need to fail atomic commits for some color
pipelines, in particular for some specific LUT payloads. For instance, AMD
doesn't support curves which are too steep, and Intel doesn't support curves
which decrease. This isn't something which routinely happens, but there might
be more cases where the hardware needs to reject the pipeline. Thus, when
user-space has a running KMS color pipeline, then hits a case where the
pipeline cannot keep running (gets rejected by the driver), user-space needs to
be able to immediately fall back to shaders without any glitch. This doesn't
seem to be an issue for AMD, Intel and NVIDIA.
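
The fallback logic can be sketched as below. The driver-side constraint is a
stand-in modeled on the Intel example above; a real compositor would probe
with an atomic commit carrying the DRM_MODE_ATOMIC_TEST_ONLY flag rather than
a local predicate:

```python
def lut_is_non_decreasing(lut):
    # Stand-in for a driver-side check: the RFC notes e.g. that Intel
    # cannot program curves which decrease.
    return all(a <= b for a, b in zip(lut, lut[1:]))

def try_commit(lut, driver_check=lut_is_non_decreasing):
    """Simulate a TEST_ONLY atomic commit: ask whether the driver would
    accept this pipeline without touching the hardware."""
    return driver_check(lut)

def present(lut):
    # Fall back to a shader path the moment the KMS pipeline is
    # rejected, so the transition happens without any visible glitch.
    if try_commit(lut):
        return "kms"
    return "shader"
```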

This uAPI is extensible: we can add more color operations, and we can add more
properties for each color operation type. For instance, we might want to add
support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
to keep the scope of the proposal manageable.

Later on, we plan to re-use the same machinery for post-blending color
pipelines. There are some more details about post-blending which have been
separately debated at the hackfest, but we believe it's a viable plan. This
solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
we'd like to introduce a client cap to hide the old properties and show the new
post-blending color pipeline properties.

We envision a future user-space library to translate a high-level descriptive
color pipeline into a low-level prescriptive KMS color pipeline ("libliftoff but
for color pipelines"). The library could also offer a translation into shaders.
This should help share more infrastructure between compositors and ease KMS
offloading. This should also help dealing with the NVIDIA case.

To wrap things up, let's take a real-world example: how would gamescope [2]
configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].

AMD would expose the following objects and properties:

    Plane 10
    ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
    └─ "color_pipeline": enum {0, 42} = 0
    Color operation 42 (input CSC)
    ├─ "type": enum {Bypass, Matrix} = Matrix
    ├─ "matrix_data": blob
    └─ "next": immutable color operation ID = 43
    Color operation 43
    ├─ "type": enum {Scaling} = Scaling
    └─ "next": immutable color operation ID = 44
    Color operation 44 (DeGamma)
    ├─ "type": enum {Bypass, 1D curve} = 1D curve
    ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
    └─ "next": immutable color operation ID = 45
    Color operation 45 (gamut remap)
    ├─ "type": enum {Bypass, Matrix} = Matrix
    ├─ "matrix_data": blob
    └─ "next": immutable color operation ID = 46
    Color operation 46 (shaper LUT RAM)
    ├─ "type": enum {Bypass, 1D curve} = 1D curve
    ├─ "1d_curve_type": enum {LUT} = LUT
    ├─ "lut_size": immutable range = 4096
    ├─ "lut_data": blob
    └─ "next": immutable color operation ID = 47
    Color operation 47 (3D LUT RAM)
    ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
    ├─ "lut_size": immutable range = 17
    ├─ "lut_data": blob
    └─ "next": immutable color operation ID = 48
    Color operation 48 (blend gamma)
    ├─ "type": enum {Bypass, 1D curve} = 1D curve
    ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
    ├─ "lut_size": immutable range = 4096
    ├─ "lut_data": blob
    └─ "next": immutable color operation ID = 0

To configure the pipeline for an HDR10 PQ plane (path at the top) and an HDR
display, gamescope would perform an atomic commit with the following property
values:

    Plane 10
    └─ "color_pipeline" = 42
    Color operation 42 (input CSC)
    └─ "matrix_data" = PQ → scRGB (TF)
    Color operation 44 (DeGamma)
    └─ "type" = Bypass
    Color operation 45 (gamut remap)
    └─ "matrix_data" = scRGB (TF) → PQ
    Color operation 46 (shaper LUT RAM)
    └─ "lut_data" = PQ → Display native
    Color operation 47 (3D LUT RAM)
    └─ "lut_data" = Gamut mapping + tone mapping + night mode
    Color operation 48 (blend gamma)
    └─ "1d_curve_type" = PQ
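
Assembled as an atomic request, the property values above could look like the
following sketch, where a plain dict stands in for a drmModeAtomicReq and the
blob values are symbolic placeholders rather than real payloads:

```python
# Hypothetical atomic request: (object ID, property name) -> value.
req = {}

def add_property(req, obj_id, prop, value):
    req[(obj_id, prop)] = value

add_property(req, 10, "color_pipeline", 42)              # select pipeline
add_property(req, 42, "matrix_data", "PQ -> scRGB (TF)")  # input CSC
add_property(req, 44, "type", "Bypass")                   # skip DeGamma
add_property(req, 45, "matrix_data", "scRGB (TF) -> PQ")  # gamut remap
add_property(req, 46, "lut_data", "PQ -> display native") # shaper LUT
add_property(req, 47, "lut_data", "gamut + tone map + night mode")
add_property(req, 48, "1d_curve_type", "PQ")              # blend gamma
```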

I hope comparing these properties to the diagrams linked above can help
understand how the uAPI would be used and give an idea of its viability.

Please feel free to provide feedback! It would be especially useful to have
someone familiar with Arm SoCs look at this, to confirm that this proposal
would work there.

Unless there is a show-stopper, we plan to follow up this RFC with
implementations for AMD, Intel, NVIDIA, gamescope, and IGT.

Many thanks to everybody who contributed to the hackfest, on-site or remotely!
Let's work together to make this happen!

Simon, on behalf of the hackfest participants

[1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
[2]: https://github.com/ValveSoftware/gamescope
[3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
[4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-04 15:22 [RFC] Plane color pipeline KMS uAPI Simon Ser
@ 2023-05-04 21:10 ` Harry Wentland
  2023-05-05 11:41 ` Pekka Paalanen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-05-04 21:10 UTC (permalink / raw)
  To: Simon Ser, DRI Development
  Cc: Pekka Paalanen, Jonas Ådahl, xaver.hugl, Melissa Wen,
	wayland-devel, Michel Dänzer, Uma Shankar,
	Victoria Brekenfeld, Aleix Pol, Sebastian Wick, Joshua Ashton



On 5/4/23 11:22, Simon Ser wrote:
> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, i.e. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 

Thanks for typing this up. It does a great job describing the vision.

> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering and
> capabilities of hardware blocks are different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware capabilities via
> KMS objects and properties, then user-space can configure the hardware via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> property is an enum, each enum entry represents a color pipeline supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
>      Plane 10
>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>      ├─ …
>      └─ "color_pipeline": enum {0, 42, 52} = 0
> 
> The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 43
> 
> To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> entries, then set "lut_data" to the blob ID. Other color operation types might
> have different properties.
> 
> Here is another example with a 3D LUT:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>      ├─ "lut_size": immutable range = 33
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 43
> 
> And one last example with a matrix:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 43
> 
> [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> blocks which can be bypassed instead.]

I would favor a "bypass" boolean property.

> 
> [Jonas note: perhaps a single "data" property for both LUTs and matrices
> would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> 

I concur. We'll probably want to document for which types a property 
applies.

> If some hardware supports re-ordering operations in the color pipeline, the
> driver can expose multiple pipelines with different operation ordering, and
> user-space can pick the ordering it prefers by selecting the right pipeline.
> The same scheme can be used to expose hardware blocks supporting multiple
> precision levels.
> 
> That's pretty much all there is to it, but as always the devil is in the
> details.
> 

One such detail that might need some thought is whether the specific 
pipeline configuration exposed by a driver becomes uAPI. In theory I 
might be breaking use-cases userspace has if I change my color pipeline, 
but it would still be discoverable and usable if userspace uses the uAPI in 
a truly vendor-neutral way.

Thoughts?

> First, we realized that we need a way to indicate where the scaling operation
> is happening. The contents of the framebuffer attached to the plane might be
> scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> the colorspace scaling is applied in, the result will be different, so we need
> a way for the kernel to indicate which hardware blocks are pre-scaling, and
> which ones are post-scaling. We introduce a special "scaling" operation type,
> which is part of the pipeline like other operations but serves an informational
> role only (effectively, the operation cannot be configured by user-space, all
> of its properties are immutable). For example:
> 
>      Color operation 43
>      ├─ "type": immutable enum {Scaling} = Scaling
>      └─ "next": immutable color operation ID = 44
> 
> [Simon note: an alternative would be to split the color pipeline into two, by
> having two plane properties ("color_pipeline_pre_scale" and
> "color_pipeline_post_scale") instead of a single one. This would be similar to
> the way we want to split pre-blending and post-blending. This could be less
> expressive for drivers, there may be hardware where there are dependencies
> between the pre- and post-scaling pipeline?]
> 

I would prefer to avoid splitting the pipeline again. We can't easily 
avoid the pre-/post-blending split but for scaling it might be more 
straightforward to add a read-only scaling op. This isn't a strong 
preference since I could see either way working out well.

> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> where user-space provides a high-level description of the colorspace
> conversions it needs to perform, and this is at odds with our KMS uAPI
> proposal. To address this issue, we suggest adding a special block type which
> describes a fixed conversion from one colorspace to another and cannot be
> configured by user-space. Then user-space will need to accommodate its pipeline
> for these special blocks. Such fixed hardware blocks need to be well enough
> documented so that they can be implemented via shaders.
> 
> We also noted that it should always be possible for user-space to completely
> disable the color pipeline and switch back to bypass/identity without a
> modeset. Some drivers will need to fail atomic commits for some color
> pipelines, in particular for some specific LUT payloads. For instance, AMD
> doesn't support curves which are too steep, and Intel doesn't support curves
> which decrease. This isn't something which routinely happens, but there might
> be more cases where the hardware needs to reject the pipeline. Thus, when
> user-space has a running KMS color pipeline, then hits a case where the
> pipeline cannot keep running (gets rejected by the driver), user-space needs to
> be able to immediately fall back to shaders without any glitch. This doesn't
> seem to be an issue for AMD, Intel and NVIDIA.
> 
> This uAPI is extensible: we can add more color operations, and we can add more
> properties for each color operation type. For instance, we might want to add
> support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> to keep the scope of the proposal manageable.
> 
> Later on, we plan to re-use the same machinery for post-blending color
> pipelines. There are some more details about post-blending which have been
> separately debated at the hackfest, but we believe it's a viable plan. This
> solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> we'd like to introduce a client cap to hide the old properties and show the new
> post-blending color pipeline properties.
> 
> We envision a future user-space library to translate a high-level descriptive
> color pipeline into low-level prescriptive KMS color pipeline ("libliftoff but
> for color pipelines"). The library could also offer a translation into shaders.
> This should help share more infrastructure between compositors and ease KMS
> offloading. This should also help dealing with the NVIDIA case.
> 
> To wrap things up, let's take a real-world example: how would gamescope [2]
> configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> 
> AMD would expose the following objects and properties:
> 
>      Plane 10
>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>      └─ "color_pipeline": enum {0, 42} = 0
>      Color operation 42 (input CSC)
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 43
>      Color operation 43
>      ├─ "type": enum {Scaling} = Scaling
>      └─ "next": immutable color operation ID = 44
>      Color operation 44 (DeGamma)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
>      └─ "next": immutable color operation ID = 45
>      Color operation 45 (gamut remap)
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 46
>      Color operation 46 (shaper LUT RAM)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 47
>      Color operation 47 (3D LUT RAM)
>      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>      ├─ "lut_size": immutable range = 17
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 48
>      Color operation 48 (blend gamma)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 0
> 
> To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> display, gamescope would perform an atomic commit with the following property
> values:
> 
>      Plane 10
>      └─ "color_pipeline" = 42
>      Color operation 42 (input CSC)
>      └─ "matrix_data" = PQ → scRGB (TF)
>      Color operation 44 (DeGamma)
>      └─ "type" = Bypass
>      Color operation 45 (gamut remap)
>      └─ "matrix_data" = scRGB (TF) → PQ
>      Color operation 46 (shaper LUT RAM)
>      └─ "lut_data" = PQ → Display native
>      Color operation 47 (3D LUT RAM)
>      └─ "lut_data" = Gamut mapping + tone mapping + night mode
>      Color operation 48 (blend gamma)
>      └─ "1d_curve_type" = PQ
> 
> I hope comparing these properties to the diagrams linked above can help
> understand how the uAPI would be used and give an idea of its viability.
> 
> Please feel free to provide feedback! It would be especially useful to have
> someone familiar with Arm SoCs look at this, to confirm that this proposal
> would work there.
> 

This is the major gap we have with this proposal, so I hope someone 
working on the Arm SoC drivers sees this and can comment.

Again, thanks for typing this up, Simon.

Harry

> Unless there is a show-stopper, we plan to follow up this RFC with
> implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> 
> Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> Let's work together to make this happen!
> 
> Simon, on behalf of the hackfest participants
> 
> [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> [2]: https://github.com/ValveSoftware/gamescope
> [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-04 15:22 [RFC] Plane color pipeline KMS uAPI Simon Ser
  2023-05-04 21:10 ` Harry Wentland
@ 2023-05-05 11:41 ` Pekka Paalanen
  2023-05-05 13:30   ` Joshua Ashton
  2023-05-05 15:28 ` Daniel Vetter
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-05 11:41 UTC (permalink / raw)
  To: Simon Ser
  Cc: xaver.hugl, DRI Development, wayland-devel, Melissa Wen,
	Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Joshua Ashton, Sebastian Wick


On Thu, 04 May 2023 15:22:59 +0000
Simon Ser <contact@emersion.fr> wrote:

> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, i.e. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.

Hi Simon,

this is an excellent write-up, thank you!

Harry's question about what constitutes UAPI is a good one for danvet.

I don't really have much to add here, a couple inline comments. I think
this could work.

> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering and
> capabilities of hardware blocks are different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware capabilities via
> KMS objects and properties, then user-space can configure the hardware via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> property is an enum, each enum entry represents a color pipeline supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
>     Plane 10
>     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>     ├─ …
>     └─ "color_pipeline": enum {0, 42, 52} = 0
> 
> The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:
> 
>     Color operation 42
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>     ├─ "lut_size": immutable range = 4096
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 43
> 
> To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> entries, then set "lut_data" to the blob ID. Other color operation types might
> have different properties.
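To make the blob handling concrete, here is a sketch of how user-space could compute and pack the 4096 entries. The entry encoding (full-range normalized u32) is an assumption, since the RFC does not pin down the format; the resulting bytes would then go through drmModeCreatePropertyBlob() and the returned blob ID would be set on "lut_data":

```python
import struct

LUT_SIZE = 4096       # from the immutable "lut_size" property
U32_MAX = 0xFFFFFFFF  # assumed full-range normalization of each entry

def srgb_eotf(x):
    """sRGB electro-optical transfer function on [0, 1]."""
    return x / 12.92 if x <= 0.04045 else ((x + 0.055) / 1.055) ** 2.4

# One u32 per entry: an sRGB-decoding curve sampled at 4096 points.
entries = [round(srgb_eotf(i / (LUT_SIZE - 1)) * U32_MAX)
           for i in range(LUT_SIZE)]
blob = struct.pack(f"<{LUT_SIZE}I", *entries)

assert len(blob) == LUT_SIZE * 4  # what the kernel blob would receive
```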
> 
> Here is another example with a 3D LUT:
> 
>     Color operation 42
>     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>     ├─ "lut_size": immutable range = 33
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 43
> 
> And one last example with a matrix:
> 
>     Color operation 42
>     ├─ "type": enum {Bypass, Matrix} = Matrix
>     ├─ "matrix_data": blob
>     └─ "next": immutable color operation ID = 43
> 
> [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> blocks which can be bypassed instead.]
> 
> [Jonas note: perhaps a single "data" property for both LUTs and matrices
> would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> 
> If some hardware supports re-ordering operations in the color pipeline, the
> driver can expose multiple pipelines with different operation ordering, and
> user-space can pick the ordering it prefers by selecting the right pipeline.
> The same scheme can be used to expose hardware blocks supporting multiple
> precision levels.
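This selection step is simple enough to sketch: user-space enumerates the "color_pipeline" enum entries, walks each pipeline, and picks the one whose ordering matches what its shader fallback implements (the data layout here is a hypothetical stand-in for the discovered KMS state):

```python
def pick_pipeline(pipelines, wanted_order):
    """Given discovered pipelines as {enum value: [op types in order]},
    return the enum value of the first pipeline whose operation
    ordering matches, or 0 (the bypass entry) if none does."""
    for value, ops in pipelines.items():
        if ops == wanted_order:
            return value
    return 0

pipelines = {
    42: ["Matrix", "1D curve", "3D LUT"],
    52: ["1D curve", "Matrix", "3D LUT"],
}
print(pick_pipeline(pipelines, ["1D curve", "Matrix", "3D LUT"]))  # 52
```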
> 
> That's pretty much all there is to it, but as always the devil is in the
> details.
> 
> First, we realized that we need a way to indicate where the scaling operation
> is happening. The contents of the framebuffer attached to the plane might be
> scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> the colorspace scaling is applied in, the result will be different, so we need
> a way for the kernel to indicate which hardware blocks are pre-scaling, and
> which ones are post-scaling. We introduce a special "scaling" operation type,
> which is part of the pipeline like other operations but serves an informational
> role only (effectively, the operation cannot be configured by user-space, all
> of its properties are immutable). For example:
> 
>     Color operation 43
>     ├─ "type": immutable enum {Scaling} = Scaling
>     └─ "next": immutable color operation ID = 44

I like this.

> 
> [Simon note: an alternative would be to split the color pipeline into two, by
> having two plane properties ("color_pipeline_pre_scale" and
> "color_pipeline_post_scale") instead of a single one. This would be similar to
> the way we want to split pre-blending and post-blending. This could be less
> expressive for drivers, there may be hardware where there are dependencies
> between the pre- and post-scaling pipeline?]
> 
> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> where user-space provides a high-level description of the colorspace
> conversions it needs to perform, and this is at odds with our KMS uAPI
> proposal. To address this issue, we suggest adding a special block type which
> describes a fixed conversion from one colorspace to another and cannot be
> configured by user-space. Then user-space will need to accommodate its pipeline
> for these special blocks. Such fixed hardware blocks need to be well enough
> documented so that they can be implemented via shaders.
> 
> We also noted that it should always be possible for user-space to completely
> disable the color pipeline and switch back to bypass/identity without a
> modeset. Some drivers will need to fail atomic commits for some color
> pipelines, in particular for some specific LUT payloads. For instance, AMD
> doesn't support curves which are too steep, and Intel doesn't support curves
> which decrease. This isn't something which routinely happens, but there might
> be more cases where the hardware needs to reject the pipeline. Thus, when
> user-space has a running KMS color pipeline, then hits a case where the
> pipeline cannot keep running (gets rejected by the driver), user-space needs to
> be able to immediately fall back to shaders without any glitch. This doesn't
> seem to be an issue for AMD, Intel and NVIDIA.
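The glitch-free fallback can be built on the existing TEST_ONLY mechanism: probe the color pipeline with a DRM_MODE_ATOMIC_TEST_ONLY commit, and only present through KMS if the driver accepts it. A control-flow sketch (the three callables are hypothetical stand-ins for compositor code):

```python
def present_frame(test_commit, real_commit, render_with_shaders):
    """Probe with a TEST_ONLY atomic commit first; on rejection,
    fall back to shader composition without presenting a bad frame."""
    if test_commit():        # driver validates the color pipeline
        real_commit()        # safe: the same state already passed
        return "kms"
    render_with_shaders()    # glitch-free fallback path
    return "shaders"

# Simulate a driver rejecting the pipeline (e.g. a too-steep LUT):
log = []
result = present_frame(
    test_commit=lambda: False,
    real_commit=lambda: log.append("commit"),
    render_with_shaders=lambda: log.append("shaders"),
)
print(result, log)  # shaders ['shaders']
```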
> 
> This uAPI is extensible: we can add more color operations, and we can add more
> properties for each color operation type. For instance, we might want to add
> support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> to keep the scope of the proposal manageable.
> 
> Later on, we plan to re-use the same machinery for post-blending color
> pipelines. There are some more details about post-blending which have been
> separately debated at the hackfest, but we believe it's a viable plan. This
> solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> we'd like to introduce a client cap to hide the old properties and show the new
> post-blending color pipeline properties.
> 
> We envision a future user-space library to translate a high-level descriptive
> color pipeline into a low-level prescriptive KMS color pipeline ("libliftoff but
> for color pipelines"). The library could also offer a translation into shaders.
> This should help share more infrastructure between compositors and ease KMS
> offloading. This should also help in dealing with the NVIDIA case.
> 
> To wrap things up, let's take a real-world example: how would gamescope [2]
> configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> 
> AMD would expose the following objects and properties:
> 
>     Plane 10
>     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>     └─ "color_pipeline": enum {0, 42} = 0
>     Color operation 42 (input CSC)
>     ├─ "type": enum {Bypass, Matrix} = Matrix
>     ├─ "matrix_data": blob
>     └─ "next": immutable color operation ID = 43
>     Color operation 43
>     ├─ "type": enum {Scaling} = Scaling
>     └─ "next": immutable color operation ID = 44
>     Color operation 44 (DeGamma)
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
>     └─ "next": immutable color operation ID = 45
>     Color operation 45 (gamut remap)
>     ├─ "type": enum {Bypass, Matrix} = Matrix
>     ├─ "matrix_data": blob
>     └─ "next": immutable color operation ID = 46
>     Color operation 46 (shaper LUT RAM)
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {LUT} = LUT
>     ├─ "lut_size": immutable range = 4096
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 47
>     Color operation 47 (3D LUT RAM)
>     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>     ├─ "lut_size": immutable range = 17
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 48
>     Color operation 48 (blend gamma)
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
>     ├─ "lut_size": immutable range = 4096
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 0
> 
> To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> display, gamescope would perform an atomic commit with the following property
> values:
> 
>     Plane 10
>     └─ "color_pipeline" = 42
>     Color operation 42 (input CSC)
>     └─ "matrix_data" = PQ → scRGB (TF)
>     Color operation 44 (DeGamma)
>     └─ "type" = Bypass
>     Color operation 45 (gamut remap)
>     └─ "matrix_data" = scRGB (TF) → PQ
>     Color operation 46 (shaper LUT RAM)
>     └─ "lut_data" = PQ → Display native
>     Color operation 47 (3D LUT RAM)
>     └─ "lut_data" = Gamut mapping + tone mapping + night mode
>     Color operation 48 (blend gamma)
>     └─ "1d_curve_type" = PQ

You cannot do a TF with a matrix, and a gamut remap with a matrix on
electrical values is certainly surprising, so the example here is a
bit odd, but I don't think that hurts the intention of demonstration.

Btw. ISTR that if you want to do scaling properly with alpha channel,
you need optical values multiplied by alpha. Alpha vs. scaling is just
yet another thing to look into, and TF operations do not work with
pre-mult.


Thanks,
pq

> 
> I hope comparing these properties to the diagrams linked above can help
> understand how the uAPI would be used and give an idea of its viability.
> 
> Please feel free to provide feedback! It would be especially useful to have
> someone familiar with Arm SoCs look at this, to confirm that this proposal
> would work there.
> 
> Unless there is a show-stopper, we plan to follow up this RFC with
> implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> 
> Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> Let's work together to make this happen!
> 
> Simon, on behalf of the hackfest participants
> 
> [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> [2]: https://github.com/ValveSoftware/gamescope
> [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 11:41 ` Pekka Paalanen
@ 2023-05-05 13:30   ` Joshua Ashton
  2023-05-05 14:16     ` Pekka Paalanen
                       ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Joshua Ashton @ 2023-05-05 13:30 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: xaver.hugl, DRI Development, wayland-devel, Melissa Wen,
	Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Sebastian Wick

Some corrections and replies inline.

On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Thu, 04 May 2023 15:22:59 +0000
> Simon Ser <contact@emersion.fr> wrote:
>
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, ie. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
>
> Hi Simon,
>
> this is an excellent write-up, thank you!
>
> Harry's question about what constitutes UAPI is a good one for danvet.
>
> I don't really have much to add here, a couple inline comments. I think
> this could work.
>
> >
> > This proposal takes a prescriptive approach instead of a descriptive approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color management
> > policy.
> >
> > We've decided against mirroring the existing CRTC properties
> > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > pipeline can significantly differ between vendors and this approach cannot
> > accurately abstract all hardware. In particular, the availability, ordering and
> > capabilities of hardware blocks is different on each display engine. So, we've
> > decided to go for a highly detailed hardware capability discovery.
> >
> > This new uAPI should not be in conflict with existing standard KMS properties,
> > since there are none which control the pre-blending color pipeline at the
> > moment. It does conflict with any vendor-specific properties like
> > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > properties. Drivers will need to either reject atomic commits configuring both
> > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > properties and shows the new generic properties when enabled.
> >
> > To use this uAPI, first user-space needs to discover hardware capabilities via
> > KMS objects and properties, then user-space can configure the hardware via an
> > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> >
> > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > property is an enum, each enum entry represents a color pipeline supported by
> > the hardware. The special zero entry indicates that the pipeline is in
> > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > primary plane with 2 supported pipelines but currently configured in bypass
> > mode:
> >
> >     Plane 10
> >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> >     ├─ …
> >     └─ "color_pipeline": enum {0, 42, 52} = 0
> >
> > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > objects. The entry value is an object ID pointing to the head of the linked
> > list (the first operation in the color pipeline).
> >
> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
> >
> >     Color operation 42
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> >     ├─ "lut_size": immutable range = 4096
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 43
> >
> > To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> > entries, then set "lut_data" to the blob ID. Other color operation types might
> > have different properties.
> >
> > Here is another example with a 3D LUT:
> >
> >     Color operation 42
> >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >     ├─ "lut_size": immutable range = 33
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 43
> >
> > And one last example with a matrix:
> >
> >     Color operation 42
> >     ├─ "type": enum {Bypass, Matrix} = Matrix
> >     ├─ "matrix_data": blob
> >     └─ "next": immutable color operation ID = 43
> >
> > [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> > a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> > blocks which can be bypassed instead.]
> >
> > [Jonas note: perhaps a single "data" property for both LUTs and matrices
> > would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> >
> > If some hardware supports re-ordering operations in the color pipeline, the
> > driver can expose multiple pipelines with different operation ordering, and
> > user-space can pick the ordering it prefers by selecting the right pipeline.
> > The same scheme can be used to expose hardware blocks supporting multiple
> > precision levels.
> >
> > That's pretty much all there is to it, but as always the devil is in the
> > details.
> >
> > First, we realized that we need a way to indicate where the scaling operation
> > is happening. The contents of the framebuffer attached to the plane might be
> > scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> > the colorspace scaling is applied in, the result will be different, so we need
> > a way for the kernel to indicate which hardware blocks are pre-scaling, and
> > which ones are post-scaling. We introduce a special "scaling" operation type,
> > which is part of the pipeline like other operations but serves an informational
> > role only (effectively, the operation cannot be configured by user-space, all
> > of its properties are immutable). For example:
> >
> >     Color operation 43
> >     ├─ "type": immutable enum {Scaling} = Scaling
> >     └─ "next": immutable color operation ID = 44
>
> I like this.
>
> >
> > [Simon note: an alternative would be to split the color pipeline into two, by
> > having two plane properties ("color_pipeline_pre_scale" and
> > "color_pipeline_post_scale") instead of a single one. This would be similar to
> > the way we want to split pre-blending and post-blending. This could be less
> > expressive for drivers, there may be hardware where there are dependencies
> > between the pre- and post-scaling pipeline?]
> >
> > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> > be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> > where user-space provides a high-level description of the colorspace
> > conversions it needs to perform, and this is at odds with our KMS uAPI
> > proposal. To address this issue, we suggest adding a special block type which
> > describes a fixed conversion from one colorspace to another and cannot be
> > configured by user-space. Then user-space will need to accommodate its pipeline
> > for these special blocks. Such fixed hardware blocks need to be well enough
> > documented so that they can be implemented via shaders.
> >
> > We also noted that it should always be possible for user-space to completely
> > disable the color pipeline and switch back to bypass/identity without a
> > modeset. Some drivers will need to fail atomic commits for some color
> > pipelines, in particular for some specific LUT payloads. For instance, AMD
> > doesn't support curves which are too steep, and Intel doesn't support curves
> > which decrease. This isn't something which routinely happens, but there might
> > be more cases where the hardware needs to reject the pipeline. Thus, when
> > user-space has a running KMS color pipeline, then hits a case where the
> > pipeline cannot keep running (gets rejected by the driver), user-space needs to
> > be able to immediately fall back to shaders without any glitch. This doesn't
> > seem to be an issue for AMD, Intel and NVIDIA.
> >
> > This uAPI is extensible: we can add more color operations, and we can add more
> > properties for each color operation type. For instance, we might want to add
> > support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> > the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> > to keep the scope of the proposal manageable.
> >
> > Later on, we plan to re-use the same machinery for post-blending color
> > pipelines. There are some more details about post-blending which have been
> > separately debated at the hackfest, but we believe it's a viable plan. This
> > solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> > we'd like to introduce a client cap to hide the old properties and show the new
> > post-blending color pipeline properties.
> >
> > We envision a future user-space library to translate a high-level descriptive
> > color pipeline into a low-level prescriptive KMS color pipeline ("libliftoff but
> > for color pipelines"). The library could also offer a translation into shaders.
> > This should help share more infrastructure between compositors and ease KMS
> > offloading. This should also help in dealing with the NVIDIA case.
> >
> > To wrap things up, let's take a real-world example: how would gamescope [2]
> > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> >
> > AMD would expose the following objects and properties:
> >
> >     Plane 10
> >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> >     └─ "color_pipeline": enum {0, 42} = 0
> >     Color operation 42 (input CSC)
> >     ├─ "type": enum {Bypass, Matrix} = Matrix
> >     ├─ "matrix_data": blob
> >     └─ "next": immutable color operation ID = 43
> >     Color operation 43
> >     ├─ "type": enum {Scaling} = Scaling
> >     └─ "next": immutable color operation ID = 44
> >     Color operation 44 (DeGamma)
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> >     └─ "next": immutable color operation ID = 45

Some vendors have per-tap degamma and some have a degamma after the sample.
How do we distinguish that behaviour?
It is important to know.

> >     Color operation 45 (gamut remap)
> >     ├─ "type": enum {Bypass, Matrix} = Matrix
> >     ├─ "matrix_data": blob
> >     └─ "next": immutable color operation ID = 46
> >     Color operation 46 (shaper LUT RAM)
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {LUT} = LUT
> >     ├─ "lut_size": immutable range = 4096
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 47
> >     Color operation 47 (3D LUT RAM)
> >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >     ├─ "lut_size": immutable range = 17
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 48
> >     Color operation 48 (blend gamma)
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> >     ├─ "lut_size": immutable range = 4096
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 0
> >
> > To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> > display, gamescope would perform an atomic commit with the following property
> > values:
> >
> >     Plane 10
> >     └─ "color_pipeline" = 42
> >     Color operation 42 (input CSC)
> >     └─ "matrix_data" = PQ → scRGB (TF)

^
Not sure what this is.
We don't use an input CSC before degamma.

> >     Color operation 44 (DeGamma)
> >     └─ "type" = Bypass

^
If we did PQ, this would be PQ -> Linear / 80
If this was sRGB, it'd be sRGB -> Linear
If this was scRGB this would be just treating it as it is. So... Linear / 80.

> >     Color operation 45 (gamut remap)
> >     └─ "matrix_data" = scRGB (TF) → PQ

^
This is wrong, we just use this to do scRGB primaries (709) to 2020.

We then go from scRGB -> PQ to go into our shaper + 3D LUT.

> >     Color operation 46 (shaper LUT RAM)
> >     └─ "lut_data" = PQ → Display native

^
"Display native" is just the response curve of the display.
In HDR10, this would just be PQ -> PQ
If we were doing HDR10 on SDR, this would be PQ -> Gamma 2.2 (mapped
from 0 to display native luminance) [with a potential bit of headroom
for tonemapping in the 3D LUT]
For SDR on HDR10 this would be Gamma 2.2 -> PQ (Not intending to start
an sRGB vs G2.2 argument here! :P)

> >     Color operation 47 (3D LUT RAM)
> >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> >     Color operation 48 (blend gamma)
> >     └─ "1d_curve_type" = PQ

^
This is wrong, this should be Display Native -> Linearized Display Referred

>
> You cannot do a TF with a matrix, and a gamut remap with a matrix on
> electrical values is certainly surprising, so the example here is a
> bit odd, but I don't think that hurts the intention of demonstration.

I have done some corrections inline.

You can see our fully correct color pipeline here:
https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png

Please let me know if you have any more questions about our color pipeline.

>
> Btw. ISTR that if you want to do scaling properly with alpha channel,
> you need optical values multiplied by alpha. Alpha vs. scaling is just
> yet another thing to look into, and TF operations do not work with
> pre-mult.

What are your concerns here?

Having pre-multiplied alpha is fine with a TF: the alpha was
premultiplied in linear, then encoded with the TF by the client.
If you think of a TF as something something relative to a bunch of
reference state or whatever then you might think "oh you can't do
that!", but you really can.
It's really best to just think of it as a mathematical encoding of a
value in all instances that we touch.

The only issue is that you lose precision from having pre-multiplied
alpha as it's quantized to fit into the DRM format rather than using
the full range then getting divided by the alpha at blend time.
In my experience, however, it never ends up being a visible issue, even at 8bpc.

Thanks
 - Joshie 🐸✨

>
>
> Thanks,
> pq
>
> >
> > I hope comparing these properties to the diagrams linked above can help
> > understand how the uAPI would be used and give an idea of its viability.
> >
> > Please feel free to provide feedback! It would be especially useful to have
> > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > would work there.
> >
> > Unless there is a show-stopper, we plan to follow up this RFC with
> > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> >
> > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > Let's work together to make this happen!
> >
> > Simon, on behalf of the hackfest participants
> >
> > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > [2]: https://github.com/ValveSoftware/gamescope
> > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
>


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 13:30   ` Joshua Ashton
@ 2023-05-05 14:16     ` Pekka Paalanen
  2023-05-05 17:01       ` Joshua Ashton
  2023-05-09 11:23     ` Melissa Wen
  2023-05-11 21:21     ` Simon Ser
  2 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-05 14:16 UTC (permalink / raw)
  To: Joshua Ashton
  Cc: xaver.hugl, DRI Development, wayland-devel, Melissa Wen,
	Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Sebastian Wick


On Fri, 5 May 2023 14:30:11 +0100
Joshua Ashton <joshua@froggi.es> wrote:

> Some corrections and replies inline.
> 
> On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Thu, 04 May 2023 15:22:59 +0000
> > Simon Ser <contact@emersion.fr> wrote:

...

> > > To wrap things up, let's take a real-world example: how would gamescope [2]
> > > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> > >
> > > AMD would expose the following objects and properties:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     └─ "color_pipeline": enum {0, 42} = 0
> > >     Color operation 42 (input CSC)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >     Color operation 43
> > >     ├─ "type": enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> > >     Color operation 44 (DeGamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > >     └─ "next": immutable color operation ID = 45  
> 
> Some vendors have per-tap degamma and some have a degamma after the sample.
> How do we distinguish that behaviour?
> It is important to know.

...

> > Btw. ISTR that if you want to do scaling properly with alpha channel,
> > you need optical values multiplied by alpha. Alpha vs. scaling is just
> > yet another thing to look into, and TF operations do not work with
> > pre-mult.  
> 
> What are your concerns here?

I believe this is exactly the same question as yours about sampling, at
least for up-scaling where sampling the framebuffer interpolates in
some way.

Oh, interpolation mode would fit in the scaling COLOROP...

> Having pre-multiplied alpha is fine with a TF: the alpha was
> premultiplied in linear, then encoded with the TF by the client.

There are two different ways to pre-multiply: into optical values
(okay), and into electrical values (what everyone actually does, and
what Wayland assumes by default).

What you described is something almost no-one does in GUI graphics.
Even in the web.
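A quick numeric illustration of why the two conventions differ (using a plain 2.2 power law as a stand-in TF, purely for the arithmetic):

```python
GAMMA = 2.2  # simple power-law TF standing in for a real encoding

def encode(x):
    """Optical -> electrical."""
    return x ** (1 / GAMMA)

linear, alpha = 0.5, 0.5

# Pre-multiply in optical values, then encode (the "okay" variant):
optical_premult = encode(linear * alpha)

# Encode first, then pre-multiply the electrical value (what GUI
# stacks, including Wayland by default, actually assume):
electrical_premult = encode(linear) * alpha

# The two encodings of "the same" premultiplied pixel disagree badly.
print(round(optical_premult, 4), round(electrical_premult, 4))
# 0.5325 0.3649
```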

> If you think of a TF as something something relative to a bunch of
> reference state or whatever then you might think "oh you can't do
> that!", but you really can.
> It's really best to just think of it as a mathematical encoding of a
> value in all instances that we touch.

True, except when it's false. If you assume that decoding is the exact
mathematical inverse of encoding, then your conclusion follows.

Unfortunately, many video standards do not work that way. BT.601, BT.709,
and, if I remember correctly, BT.2020 (SDR) as well encode with one function
and decode with something that is not the inverse, and it is totally
intentional and necessary mangling of the values to get the expected
result on screen. Someone has called this "implicit color management".

So one needs to be very careful here what the actual characteristics
are.
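BT.709 makes this concrete: the camera-side OETF and the BT.1886 reference display EOTF are deliberately not inverses, so a value does not round-trip (a simplified sketch, ignoring the BT.1886 black-level terms):

```python
def bt709_oetf(l):
    """Scene-light encoding per ITU-R BT.709."""
    return 4.5 * l if l < 0.018 else 1.099 * l ** 0.45 - 0.099

def bt1886_eotf(v):
    """Reference display decoding per ITU-R BT.1886, simplified to a
    pure 2.4 power law (black-level lift ignored)."""
    return v ** 2.4

scene = 0.5
displayed = bt1886_eotf(bt709_oetf(scene))
print(round(displayed, 3))  # noticeably below 0.5: intentional "mangling"
```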

> The only issue is that you lose precision from having pre-multiplied
> alpha as it's quantized to fit into the DRM format rather than using
> the full range then getting divided by the alpha at blend time.
> In my experience, however, it never ends up being a visible issue, even at 8bpc.

That's true. Wait, why would you divide by alpha for blending?
Blending/interpolation is the only operation where pre-mult is useful.


Thanks,
pq

> 
> Thanks
>  - Joshie 🐸✨
> 
> >
> >
> > Thanks,
> > pq
> >  
> > >
> > > I hope comparing these properties to the diagrams linked above can help
> > > understand how the uAPI would be used and give an idea of its viability.
> > >
> > > Please feel free to provide feedback! It would be especially useful to have
> > > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > > would work there.
> > >
> > > Unless there is a show-stopper, we plan to follow up this RFC with
> > > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> > >
> > > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > > Let's work together to make this happen!
> > >
> > > Simon, on behalf of the hackfest participants
> > >
> > > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > > [2]: https://github.com/ValveSoftware/gamescope
> > > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg  
> >  




* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-04 15:22 [RFC] Plane color pipeline KMS uAPI Simon Ser
  2023-05-04 21:10 ` Harry Wentland
  2023-05-05 11:41 ` Pekka Paalanen
@ 2023-05-05 15:28 ` Daniel Vetter
  2023-05-05 15:57   ` Sebastian Wick
  2023-05-05 16:06   ` Simon Ser
  2023-05-05 20:40 ` Dave Airlie
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 49+ messages in thread
From: Daniel Vetter @ 2023-05-05 15:28 UTC (permalink / raw)
  To: Simon Ser
  Cc: Pekka Paalanen, wayland-devel, Michel Dänzer,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Aleix Pol, Sebastian Wick,
	Joshua Ashton

On Thu, May 04, 2023 at 03:22:59PM +0000, Simon Ser wrote:
> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.

Ack on the prescriptive approach, but it should be generic imo. Descriptive
pretty much means you need the shaders at the same API level for fallback
purposes, and we're never going to have that in kms. That would need
something like hwc in userspace to work.

And "not generic", taken to its ultimate consequence, would mean we just do a
blob for a crtc with all the vendor register stuff like adf (android display
framework) does, because I really don't see a point in trying a
generic-looking-but-not vendor uapi with each color op/stage split out.

So from very far and pure gut feeling, this seems like a good middle
ground in the uapi design space we have here.

> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering and
> capabilities of hardware blocks is different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware capabilities via
> KMS objects and properties, then user-space can configure the hardware via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> property is an enum, each enum entry represents a color pipeline supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
>     Plane 10
>     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>     ├─ …
>     └─ "color_pipeline": enum {0, 42, 52} = 0

A bit confused, why is this an enum, and not just an immutable prop that
points at the first element? You already can disable elements with the
bypass thing, also bypassing by changing the pointers to the next node in
the graph seems a bit confusing and redundant.

> The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:

Ok no comments from me on the actual color operations and semantics of all
that, because I have simply nothing to bring to that except confusion :-)

Some higher level thoughts instead:

- I really like that we just go with graph nodes here. I think that was
  bound to happen sooner or later with kms (we almost got there with
  writeback, and with hindsight maybe should have).

- Since there's other use-cases for graph nodes (maybe scaler modes, or
  histogram samplers for adaptive backlight, or blending that goes beyond
  the stacked alpha blending we have now) I think we should make this all
  fairly generic:
  * Add a new graph node kms object type.
  * Add a class type so that userspace knows which graph nodes it must
    understand for a feature (like "ColorOp" on planes here), and which it
    can ignore (like perhaps a scaler node to control the interpolation)
  * Probably need to adjust the object property type. Currently that
    accepts any object of a given type (crtc, fb, blob are the major ones).
    I think for these graph nodes we want an explicit enumeration of the
    possible next objects. In kms thus far we've done that with the
    separate possible_* mask properties, but they're cumbersome.
  * It sounds like for now we only have immutable next pointers, so that
    would simplify the first iteration, but should probably anticipate all
    this.

- I think the graph node should be built on top of the driver private
  atomic obj/state stuff, and could then be further subclassed for
  specific types. It's a bit much stacking, but avoids too much wheel
  reinventing, and the worst boilerplate can be avoided with some macros
  that combine the pointer chasing with the container_of upcast. With
  that you can easily build some helpers to walk the graph for a crtc or
  plane or whatever really.

- I guess core atomic code should at least do the graph link validation
  and basic things like that, probably not really more to do. And
  validating the standard properties on some graph nodes ofc.

- I have no idea how we should support the standardization of the state
  structures. Doing a separate subclass for each type sounds extremely
  painful, but unions otoh are ugly. Ideally type-indexed and type safe
  union but C isn't good enough for that. I do think that we should keep
  up the goal that standard properties are decoded into state structures
  in core atomic code, and not in each implementation individually.

- I think the only other precedent for something like this is the media
  control api in the media subsystem. I think it'd be really good to get
  someone like Laurent to ack the graph node infrastructure to make sure
  we're not missing any lesson they've learned already. If there's anything
  else we should pull these folks in too ofc.

For merge plan I dropped some ideas already on Harry's rfc for
vendor-private properties, the only thing to add is that we might want to
type up the consensus plan into a merged doc like
Documentation/gpu/rfc/hdr-plane.rst or whatever you feel like for a name.

Cheers, Daniel


> 
>     Color operation 42
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>     ├─ "lut_size": immutable range = 4096
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 43
> 
> To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> entries, then set "lut_data" to the blob ID. Other color operation types might
> have different properties.
> 
> Here is another example with a 3D LUT:
> 
>     Color operation 42
>     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>     ├─ "lut_size": immutable range = 33
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 43
> 
> And one last example with a matrix:
> 
>     Color operation 42
>     ├─ "type": enum {Bypass, Matrix} = Matrix
>     ├─ "matrix_data": blob
>     └─ "next": immutable color operation ID = 43
> 
> [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> blocks which can be bypassed instead.]
> 
> [Jonas note: perhaps a single "data" property for both LUTs and matrices
> would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> 
> If some hardware supports re-ordering operations in the color pipeline, the
> driver can expose multiple pipelines with different operation ordering, and
> user-space can pick the ordering it prefers by selecting the right pipeline.
> The same scheme can be used to expose hardware blocks supporting multiple
> precision levels.
> 
> That's pretty much all there is to it, but as always the devil is in the
> details.
> 
> First, we realized that we need a way to indicate where the scaling operation
> is happening. The contents of the framebuffer attached to the plane might be
> scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> the colorspace scaling is applied in, the result will be different, so we need
> a way for the kernel to indicate which hardware blocks are pre-scaling, and
> which ones are post-scaling. We introduce a special "scaling" operation type,
> which is part of the pipeline like other operations but serves an informational
> role only (effectively, the operation cannot be configured by user-space, all
> of its properties are immutable). For example:
> 
>     Color operation 43
>     ├─ "type": immutable enum {Scaling} = Scaling
>     └─ "next": immutable color operation ID = 44
> 
> [Simon note: an alternative would be to split the color pipeline into two, by
> having two plane properties ("color_pipeline_pre_scale" and
> "color_pipeline_post_scale") instead of a single one. This would be similar to
> the way we want to split pre-blending and post-blending. This could be less
> expressive for drivers, there may be hardware where there are dependencies
> between the pre- and post-scaling pipeline?]
> 
> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> where user-space provides a high-level description of the colorspace
> conversions it needs to perform, and this is at odds with our KMS uAPI
> proposal. To address this issue, we suggest adding a special block type which
> describes a fixed conversion from one colorspace to another and cannot be
> configured by user-space. Then user-space will need to accommodate its pipeline
> for these special blocks. Such fixed hardware blocks need to be well enough
> documented so that they can be implemented via shaders.
> 
> We also noted that it should always be possible for user-space to completely
> disable the color pipeline and switch back to bypass/identity without a
> modeset. Some drivers will need to fail atomic commits for some color
> pipelines, in particular for some specific LUT payloads. For instance, AMD
> doesn't support curves which are too steep, and Intel doesn't support curves
> which decrease. This isn't something which routinely happens, but there might
> be more cases where the hardware needs to reject the pipeline. Thus, when
> user-space has a running KMS color pipeline, then hits a case where the
> pipeline cannot keep running (gets rejected by the driver), user-space needs to
> be able to immediately fall back to shaders without any glitch. This doesn't
> seem to be an issue for AMD, Intel and NVIDIA.
> 
> This uAPI is extensible: we can add more color operations, and we can add more
> properties for each color operation type. For instance, we might want to add
> support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> to keep the scope of the proposal manageable.
> 
> Later on, we plan to re-use the same machinery for post-blending color
> pipelines. There are some more details about post-blending which have been
> separately debated at the hackfest, but we believe it's a viable plan. This
> solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> we'd like to introduce a client cap to hide the old properties and show the new
> post-blending color pipeline properties.
> 
> We envision a future user-space library to translate a high-level descriptive
> color pipeline into low-level prescriptive KMS color pipeline ("libliftoff but
> for color pipelines"). The library could also offer a translation into shaders.
> This should help share more infrastructure between compositors and ease KMS
> offloading. This should also help dealing with the NVIDIA case.
> 
> To wrap things up, let's take a real-world example: how would gamescope [2]
> configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> 
> AMD would expose the following objects and properties:
> 
>     Plane 10
>     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>     └─ "color_pipeline": enum {0, 42} = 0
>     Color operation 42 (input CSC)
>     ├─ "type": enum {Bypass, Matrix} = Matrix
>     ├─ "matrix_data": blob
>     └─ "next": immutable color operation ID = 43
>     Color operation 43
>     ├─ "type": enum {Scaling} = Scaling
>     └─ "next": immutable color operation ID = 44
>     Color operation 44 (DeGamma)
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
>     └─ "next": immutable color operation ID = 45
>     Color operation 45 (gamut remap)
>     ├─ "type": enum {Bypass, Matrix} = Matrix
>     ├─ "matrix_data": blob
>     └─ "next": immutable color operation ID = 46
>     Color operation 46 (shaper LUT RAM)
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {LUT} = LUT
>     ├─ "lut_size": immutable range = 4096
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 47
>     Color operation 47 (3D LUT RAM)
>     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>     ├─ "lut_size": immutable range = 17
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 48
>     Color operation 48 (blend gamma)
>     ├─ "type": enum {Bypass, 1D curve} = 1D curve
>     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
>     ├─ "lut_size": immutable range = 4096
>     ├─ "lut_data": blob
>     └─ "next": immutable color operation ID = 0
> 
> To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> display, gamescope would perform an atomic commit with the following property
> values:
> 
>     Plane 10
>     └─ "color_pipeline" = 42
>     Color operation 42 (input CSC)
>     └─ "matrix_data" = PQ → scRGB (TF)
>     Color operation 44 (DeGamma)
>     └─ "type" = Bypass
>     Color operation 45 (gamut remap)
>     └─ "matrix_data" = scRGB (TF) → PQ
>     Color operation 46 (shaper LUT RAM)
>     └─ "lut_data" = PQ → Display native
>     Color operation 47 (3D LUT RAM)
>     └─ "lut_data" = Gamut mapping + tone mapping + night mode
>     Color operation 48 (blend gamma)
>     └─ "1d_curve_type" = PQ
> 
> I hope comparing these properties to the diagrams linked above can help
> understand how the uAPI would be used and give an idea of its viability.
> 
> Please feel free to provide feedback! It would be especially useful to have
> someone familiar with Arm SoCs look at this, to confirm that this proposal
> would work there.
> 
> Unless there is a show-stopper, we plan to follow up this RFC with
> implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> 
> Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> Let's work together to make this happen!
> 
> Simon, on behalf of the hackfest participants
> 
> [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> [2]: https://github.com/ValveSoftware/gamescope
> [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 15:28 ` Daniel Vetter
@ 2023-05-05 15:57   ` Sebastian Wick
  2023-05-05 19:51     ` Daniel Vetter
  2023-05-05 16:06   ` Simon Ser
  1 sibling, 1 reply; 49+ messages in thread
From: Sebastian Wick @ 2023-05-05 15:57 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Pekka Paalanen, xaver.hugl, DRI Development, wayland-devel,
	Melissa Wen, Michel Dänzer, Jonas Ådahl,
	Victoria Brekenfeld, Aleix Pol, Joshua Ashton, Uma Shankar

On Fri, May 5, 2023 at 5:28 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, May 04, 2023 at 03:22:59PM +0000, Simon Ser wrote:
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, ie. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
> >
> > This proposal takes a prescriptive approach instead of a descriptive approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color management
> > policy.
>
> Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> means you need the shaders at the same api level for fallback purposes,
> and we're not going to have that ever in kms. That would need something
> like hwc in userspace to work.

Which would be nice to have, but that would force a specific color
pipeline on everyone, and we explicitly want to avoid that. There are
just too many trade-offs to consider.

> And "not generic", taken to its ultimate consequence, would mean we just do a
> blob for a crtc with all the vendor register stuff like adf (android display
> framework) does, because I really don't see a point in trying a
> generic-looking-but-not vendor uapi with each color op/stage split out.
>
> So from very far and pure gut feeling, this seems like a good middle
> ground in the uapi design space we have here.

Good to hear!

> > We've decided against mirroring the existing CRTC properties
> > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > pipeline can significantly differ between vendors and this approach cannot
> > accurately abstract all hardware. In particular, the availability, ordering and
> > capabilities of hardware blocks is different on each display engine. So, we've
> > decided to go for a highly detailed hardware capability discovery.
> >
> > This new uAPI should not be in conflict with existing standard KMS properties,
> > since there are none which control the pre-blending color pipeline at the
> > moment. It does conflict with any vendor-specific properties like
> > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > properties. Drivers will need to either reject atomic commits configuring both
> > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > properties and shows the new generic properties when enabled.
> >
> > To use this uAPI, first user-space needs to discover hardware capabilities via
> > KMS objects and properties, then user-space can configure the hardware via an
> > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> >
> > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > property is an enum, each enum entry represents a color pipeline supported by
> > the hardware. The special zero entry indicates that the pipeline is in
> > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > primary plane with 2 supported pipelines but currently configured in bypass
> > mode:
> >
> >     Plane 10
> >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> >     ├─ …
> >     └─ "color_pipeline": enum {0, 42, 52} = 0
>
> A bit confused, why is this an enum, and not just an immutable prop that
> points at the first element? You already can disable elements with the
> bypass thing, also bypassing by changing the pointers to the next node in
> the graph seems a bit confusing and redundant.

We want to allow multiple pipelines to exist, and a plane can choose
the pipeline by selecting its first element. The enum here lists all
the possible pipelines that can be attached to the plane.

> > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > objects. The entry value is an object ID pointing to the head of the linked
> > list (the first operation in the color pipeline).
> >
> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
>
> Ok no comments from me on the actual color operations and semantics of all
> that, because I have simply nothing to bring to that except confusion :-)
>
> Some higher level thoughts instead:
>
> - I really like that we just go with graph nodes here. I think that was
>   bound to happen sooner or later with kms (we almost got there with
>   writeback, and with hindsight maybe should have).
>
> - Since there's other use-cases for graph nodes (maybe scaler modes, or
>   histogram samplers for adaptive backlight, or blending that goes beyond
>   the stacked alpha blending we have now) I think we should make this all
>   fairly generic:
>   * Add a new graph node kms object type.
>   * Add a class type so that userspace knows which graph nodes it must
>     understand for a feature (like "ColorOp" on planes here), and which it
>     can ignore (like perhaps a scaler node to control the interpolation)
>   * Probably need to adjust the object property type. Currently that
>     accepts any object of a given type (crtc, fb, blob are the major ones).
>     I think for these graph nodes we want an explicit enumeration of the
>     possible next objects. In kms thus far we've done that with the
>     separate possible_* mask properties, but they're cumbersome.
>   * It sounds like for now we only have immutable next pointers, so that
>     would simplify the first iteration, but should probably anticipate all
>     this.

Just to be clear: right now we don't expect any pipeline to be a graph,
only a linked list. It probably doesn't hurt to generalize this to
graphs, but that's not what we want to do here (for now).

> - I think the graph node should be built on top of the driver private
>   atomic obj/state stuff, and could then be further subclassed for
>   specific types. It's a bit much stacking, but avoids too much wheel
>   reinventing, and the worst boilerplate can be avoided with some macros
>   that combine the pointer chasing with the container_of upcast. With
>   that you can easily build some helpers to walk the graph for a crtc or
>   plane or whatever really.
>
> - I guess core atomic code should at least do the graph link validation
>   and basic things like that, probably not really more to do. And
>   validating the standard properties on some graph nodes ofc.
>
> - I have no idea how we should support the standardization of the state
>   structures. Doing a separate subclass for each type sounds extremely
>   painful, but unions otoh are ugly. Ideally type-indexed and type safe
>   union but C isn't good enough for that. I do think that we should keep
>   up the goal that standard properties are decoded into state structures
>   in core atomic code, and not in each implementation individually.
>
> - I think the only other precedent for something like this is the media
>   control api in the media subsystem. I think it'd be really good to get
>   someone like Laurent to ack the graph node infrastructure to make sure
>   we're not missing any lesson they've learned already. If there's anything
>   else we should pull these folks in too ofc.
>
> For merge plan I dropped some ideas already on Harry's rfc for
> vendor-private properties, the only thing to add is that we might want to
> type up the consensus plan into a merged doc like
> Documentation/gpu/rfc/hdr-plane.rst or whatever you feel like for a name.
>
> Cheers, Daniel
>
>
> >
> >     Color operation 42
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> >     ├─ "lut_size": immutable range = 4096
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 43
> >
> > To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> > entries, then set "lut_data" to the blob ID. Other color operation types might
> > have different properties.
> >
> > Here is another example with a 3D LUT:
> >
> >     Color operation 42
> >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >     ├─ "lut_size": immutable range = 33
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 43
> >
> > And one last example with a matrix:
> >
> >     Color operation 42
> >     ├─ "type": enum {Bypass, Matrix} = Matrix
> >     ├─ "matrix_data": blob
> >     └─ "next": immutable color operation ID = 43
> >
> > [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> > a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> > blocks which can be bypassed instead.]
> >
> > [Jonas note: perhaps a single "data" property for both LUTs and matrices
> > would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> >
> > If some hardware supports re-ordering operations in the color pipeline, the
> > driver can expose multiple pipelines with different operation ordering, and
> > user-space can pick the ordering it prefers by selecting the right pipeline.
> > The same scheme can be used to expose hardware blocks supporting multiple
> > precision levels.
> >
> > That's pretty much all there is to it, but as always the devil is in the
> > details.
> >
> > First, we realized that we need a way to indicate where the scaling operation
> > is happening. The contents of the framebuffer attached to the plane might be
> > scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> > the colorspace scaling is applied in, the result will be different, so we need
> > a way for the kernel to indicate which hardware blocks are pre-scaling, and
> > which ones are post-scaling. We introduce a special "scaling" operation type,
> > which is part of the pipeline like other operations but serves an informational
> > role only (effectively, the operation cannot be configured by user-space, all
> > of its properties are immutable). For example:
> >
> >     Color operation 43
> >     ├─ "type": immutable enum {Scaling} = Scaling
> >     └─ "next": immutable color operation ID = 44
> >
> > [Simon note: an alternative would be to split the color pipeline into two, by
> > having two plane properties ("color_pipeline_pre_scale" and
> > "color_pipeline_post_scale") instead of a single one. This would be similar to
> > the way we want to split pre-blending and post-blending. This could be less
> > expressive for drivers, there may be hardware where there are dependencies
> > between the pre- and post-scaling pipeline?]
> >
> > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> > be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> > where user-space provides a high-level description of the colorspace
> > conversions it needs to perform, and this is at odds with our KMS uAPI
> > proposal. To address this issue, we suggest adding a special block type which
> > describes a fixed conversion from one colorspace to another and cannot be
> > configured by user-space. Then user-space will need to accommodate its pipeline
> > for these special blocks. Such fixed hardware blocks need to be well enough
> > documented so that they can be implemented via shaders.
> >
> > We also noted that it should always be possible for user-space to completely
> > disable the color pipeline and switch back to bypass/identity without a
> > modeset. Some drivers will need to fail atomic commits for some color
> > pipelines, in particular for some specific LUT payloads. For instance, AMD
> > doesn't support curves which are too steep, and Intel doesn't support curves
> > which decrease. This isn't something which routinely happens, but there might
> > be more cases where the hardware needs to reject the pipeline. Thus, when
> > user-space has a running KMS color pipeline, then hits a case where the
> > pipeline cannot keep running (gets rejected by the driver), user-space needs to
> > be able to immediately fall back to shaders without any glitch. This doesn't
> > seem to be an issue for AMD, Intel and NVIDIA.
> >
> > This uAPI is extensible: we can add more color operations, and we can add more
> > properties for each color operation type. For instance, we might want to add
> > support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> > the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> > to keep the scope of the proposal manageable.
> >
> > Later on, we plan to re-use the same machinery for post-blending color
> > pipelines. There are some more details about post-blending which have been
> > separately debated at the hackfest, but we believe it's a viable plan. This
> > solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> > we'd like to introduce a client cap to hide the old properties and show the new
> > post-blending color pipeline properties.
> >
> > We envision a future user-space library to translate a high-level descriptive
> > color pipeline into low-level prescriptive KMS color pipeline ("libliftoff but
> > for color pipelines"). The library could also offer a translation into shaders.
> > This should help share more infrastructure between compositors and ease KMS
> > offloading. This should also help dealing with the NVIDIA case.
> >
> > To wrap things up, let's take a real-world example: how would gamescope [2]
> > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> >
> > AMD would expose the following objects and properties:
> >
> >     Plane 10
> >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> >     └─ "color_pipeline": enum {0, 42} = 0
> >     Color operation 42 (input CSC)
> >     ├─ "type": enum {Bypass, Matrix} = Matrix
> >     ├─ "matrix_data": blob
> >     └─ "next": immutable color operation ID = 43
> >     Color operation 43
> >     ├─ "type": enum {Scaling} = Scaling
> >     └─ "next": immutable color operation ID = 44
> >     Color operation 44 (DeGamma)
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> >     └─ "next": immutable color operation ID = 45
> >     Color operation 45 (gamut remap)
> >     ├─ "type": enum {Bypass, Matrix} = Matrix
> >     ├─ "matrix_data": blob
> >     └─ "next": immutable color operation ID = 46
> >     Color operation 46 (shaper LUT RAM)
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {LUT} = LUT
> >     ├─ "lut_size": immutable range = 4096
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 47
> >     Color operation 47 (3D LUT RAM)
> >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >     ├─ "lut_size": immutable range = 17
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 48
> >     Color operation 48 (blend gamma)
> >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> >     ├─ "lut_size": immutable range = 4096
> >     ├─ "lut_data": blob
> >     └─ "next": immutable color operation ID = 0
> >
> > To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> > display, gamescope would perform an atomic commit with the following property
> > values:
> >
> >     Plane 10
> >     └─ "color_pipeline" = 42
> >     Color operation 42 (input CSC)
> >     └─ "matrix_data" = PQ → scRGB (TF)
> >     Color operation 44 (DeGamma)
> >     └─ "type" = Bypass
> >     Color operation 45 (gamut remap)
> >     └─ "matrix_data" = scRGB (TF) → PQ
> >     Color operation 46 (shaper LUT RAM)
> >     └─ "lut_data" = PQ → Display native
> >     Color operation 47 (3D LUT RAM)
> >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> >     Color operation 48 (blend gamma)
> >     └─ "1d_curve_type" = PQ
> >
> > I hope comparing these properties to the diagrams linked above can help
> > understand how the uAPI would be used and give an idea of its viability.
> >
> > Please feel free to provide feedback! It would be especially useful to have
> > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > would work there.
> >
> > Unless there is a show-stopper, we plan to follow up this RFC with
> > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> >
> > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > Let's work together to make this happen!
> >
> > Simon, on behalf of the hackfest participants
> >
> > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > [2]: https://github.com/ValveSoftware/gamescope
> > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 15:28 ` Daniel Vetter
  2023-05-05 15:57   ` Sebastian Wick
@ 2023-05-05 16:06   ` Simon Ser
  2023-05-05 19:53     ` Daniel Vetter
  1 sibling, 1 reply; 49+ messages in thread
From: Simon Ser @ 2023-05-05 16:06 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Pekka Paalanen, wayland-devel, Michel Dänzer,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Aleix Pol, Sebastian Wick,
	Joshua Ashton

On Friday, May 5th, 2023 at 17:28, Daniel Vetter <daniel@ffwll.ch> wrote:

> Ok no comments from me on the actual color operations and semantics of all
> that, because I have simply nothing to bring to that except confusion :-)
> 
> Some higher level thoughts instead:
> 
> - I really like that we just go with graph nodes here. I think that was
>   bound to happen sooner or later with kms (we almost got there with
>   writeback, and with hindsight maybe should have).

I'd really rather not do graphs here. We only need linked lists, as Sebastian
said. Graphs would add significantly more complexity to this proposal, and
I don't think that's a good idea unless there is a strong use-case.
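To make the linked-list shape concrete, here is a minimal sketch (plain Python, not real libdrm bindings) of how user-space could discover a pipeline by following each COLOROP's "next" property until it hits the terminating 0, using the IDs from the AMD example in the RFC. The dict layout is illustrative, not an actual kernel interface.

```python
def walk_pipeline(colorops, head):
    """Return the ordered list of color operation IDs starting at `head`."""
    ops = []
    cur = head
    while cur != 0:
        ops.append(cur)
        # Each COLOROP exposes an immutable "next" property; 0 ends the list.
        cur = colorops[cur]["next"]
    return ops

# COLOROP objects as in the RFC's AMD DCN 3.0 example (IDs 42..48).
colorops = {
    42: {"type": "Matrix",   "next": 43},  # input CSC
    43: {"type": "Scaling",  "next": 44},
    44: {"type": "1D curve", "next": 45},  # DeGamma
    45: {"type": "Matrix",   "next": 46},  # gamut remap
    46: {"type": "1D curve", "next": 47},  # shaper LUT RAM
    47: {"type": "3D LUT",   "next": 48},
    48: {"type": "1D curve", "next": 0},   # blend gamma, end of list
}

print(walk_pipeline(colorops, 42))  # [42, 43, 44, 45, 46, 47, 48]
```

The same walk works for any pipeline head advertised in the plane's "color_pipeline" enum, which is all a strict linked list requires.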

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 14:16     ` Pekka Paalanen
@ 2023-05-05 17:01       ` Joshua Ashton
  0 siblings, 0 replies; 49+ messages in thread
From: Joshua Ashton @ 2023-05-05 17:01 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: xaver.hugl, DRI Development, wayland-devel, Melissa Wen,
	Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Sebastian Wick



On 5/5/23 15:16, Pekka Paalanen wrote:
> On Fri, 5 May 2023 14:30:11 +0100
> Joshua Ashton <joshua@froggi.es> wrote:
> 
>> Some corrections and replies inline.
>>
>> On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>>
>>> On Thu, 04 May 2023 15:22:59 +0000
>>> Simon Ser <contact@emersion.fr> wrote:
> 
> ...
> 
>>>> To wrap things up, let's take a real-world example: how would gamescope [2]
>>>> configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
>>>> pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
>>>>
>>>> AMD would expose the following objects and properties:
>>>>
>>>>      Plane 10
>>>>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>>>>      └─ "color_pipeline": enum {0, 42} = 0
>>>>      Color operation 42 (input CSC)
>>>>      ├─ "type": enum {Bypass, Matrix} = Matrix
>>>>      ├─ "matrix_data": blob
>>>>      └─ "next": immutable color operation ID = 43
>>>>      Color operation 43
>>>>      ├─ "type": enum {Scaling} = Scaling
>>>>      └─ "next": immutable color operation ID = 44
>>>>      Color operation 44 (DeGamma)
>>>>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>>>>      ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
>>>>      └─ "next": immutable color operation ID = 45
>>
>> Some vendors have per-tap degamma and some have a degamma after the sample.
>> How do we distinguish that behaviour?
>> It is important to know.
> 
> ...
> 
>>> Btw. ISTR that if you want to do scaling properly with alpha channel,
>>> you need optical values multiplied by alpha. Alpha vs. scaling is just
>>> yet another thing to look into, and TF operations do not work with
>>> pre-mult.
>>
>> What are your concerns here?
> 
> I believe this is exactly the same question as yours about sampling, at
> least for up-scaling where sampling the framebuffer interpolates in
> some way.
> 
> Oh, interpolation mode would fit in the scaling COLOROP...
> 
>> Having pre-multiplied alpha is fine with a TF: the alpha was
>> premultiplied in linear, then encoded with the TF by the client.
> 
> There are two different ways to pre-multiply: into optical values
> (okay), and into electrical values (what everyone actually does, and
> what Wayland assumes by default).
> 
> What you described is the thing mostly no-one does in GUI graphics.
> Even in the web.

Yeah, I have seen this problem many times before in different fields.

There are not many transparent clients that I know of (most of them are 
Gamescope Overlays), but the ones I do know of do actually do the 
premultiply in linear space (mainly because they use sRGB image views 
for their color attachments so it gets handled for them).

From my perspective and experience, we definitely shouldn't do anything 
to try and 'fix' apps doing their premultiply in the wrong space.

I've had to deal with this before in game development on a transparent 
HUD, and my solution and thinking for that was:
It was authored (or "mastered") with this behaviour in mind. So that's 
what we should do.
It felt bad to 'break' the blending on the HUD of that game, but it 
looked better, and it was what was intended before it was 'fixed' in a 
later engine version.

It is still definitely interesting to think about, but I don't think it 
presents a problem at all.
In fact, doing anything would just 'break' the expected behaviour of apps.

> 
>> If you think of a TF as something something relative to a bunch of
>> reference state or whatever then you might think "oh you can't do
>> that!", but you really can.
>> It's really best to just think of it as a mathematical encoding of a
>> value in all instances that we touch.
> 
> True, except when it's false. If you assume that decoding is the exact
> mathematical inverse of encoding, then your conclusion follows.
> 
> Unfortunately many video standards do not have it so. BT.601, BT.709,
> and I forget if BT.2020 (SDR) as well encode with one function and
> decode with something that is not the inverse, and it is totally
> intentional and necessary mangling of the values to get the expected
> result on screen. Someone has called this "implicit color management".
> 
> So one needs to be very careful here what the actual characteristics
> are.
> 
>> The only issue is that you lose precision from having pre-multiplied
>> alpha as it's quantized to fit into the DRM format rather than using
>> the full range then getting divided by the alpha at blend time.
>> It doesn't end up being a visible issue ever however in my experience, at 8bpc.
> 
> That's true. Wait, why would you divide by alpha for blending?
> Blending/interpolation is the only operation where pre-mult is useful.

I mis-spoke, I meant multiply.

- Joshie 🐸✨

> 
> 
> Thanks,
> pq
> 
>>
>> Thanks
>>   - Joshie 🐸✨
>>
>>>
>>>
>>> Thanks,
>>> pq
>>>   
>>>>
>>>> I hope comparing these properties to the diagrams linked above can help
>>>> understand how the uAPI would be used and give an idea of its viability.
>>>>
>>>> Please feel free to provide feedback! It would be especially useful to have
>>>> someone familiar with Arm SoCs look at this, to confirm that this proposal
>>>> would work there.
>>>>
>>>> Unless there is a show-stopper, we plan to follow up this RFC with
>>>> implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
>>>>
>>>> Many thanks to everybody who contributed to the hackfest, on-site or remotely!
>>>> Let's work together to make this happen!
>>>>
>>>> Simon, on behalf of the hackfest participants
>>>>
>>>> [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
>>>> [2]: https://github.com/ValveSoftware/gamescope
>>>> [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
>>>> [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
>>>   
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 15:57   ` Sebastian Wick
@ 2023-05-05 19:51     ` Daniel Vetter
  2023-05-08  8:24       ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Vetter @ 2023-05-05 19:51 UTC (permalink / raw)
  To: Sebastian Wick
  Cc: Pekka Paalanen, xaver.hugl, DRI Development, wayland-devel,
	Melissa Wen, Michel Dänzer, Jonas Ådahl,
	Victoria Brekenfeld, Aleix Pol, Joshua Ashton, Uma Shankar

On Fri, May 05, 2023 at 05:57:37PM +0200, Sebastian Wick wrote:
> On Fri, May 5, 2023 at 5:28 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, May 04, 2023 at 03:22:59PM +0000, Simon Ser wrote:
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > aim to reduce the battery life impact of color management and HDR on mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color management
> > > policy.
> >
> > Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> > means you need the shaders at the same api level for fallback purposes,
> > and we're not going to have that ever in kms. That would need something
> > like hwc in userspace to work.
> 
> Which would be nice to have but that would be forcing a specific color
> pipeline on everyone and we explicitly want to avoid that. There are
> just too many trade-offs to consider.
> 
> > And not generic in its ultimate consequence would mean we just do a blob
> > for a crtc with all the vendor register stuff like adf (android display
> > framework) does, because I really don't see a point in trying a
> > generic-looking-but-not vendor uapi with each color op/stage split out.
> >
> > So from very far and pure gut feeling, this seems like a good middle
> > ground in the uapi design space we have here.
> 
> Good to hear!
> 
> > > We've decided against mirroring the existing CRTC properties
> > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > pipeline can significantly differ between vendors and this approach cannot
> > > accurately abstract all hardware. In particular, the availability, ordering and
> > > capabilities of hardware blocks are different on each display engine. So, we've
> > > decided to go for a highly detailed hardware capability discovery.
> > >
> > > This new uAPI should not be in conflict with existing standard KMS properties,
> > > since there are none which control the pre-blending color pipeline at the
> > > moment. It does conflict with any vendor-specific properties like
> > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > properties. Drivers will need to either reject atomic commits configuring both
> > > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > > properties and shows the new generic properties when enabled.
> > >
> > > To use this uAPI, first user-space needs to discover hardware capabilities via
> > > KMS objects and properties, then user-space can configure the hardware via an
> > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > >
> > > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > > property is an enum, each enum entry represents a color pipeline supported by
> > > the hardware. The special zero entry indicates that the pipeline is in
> > > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > > primary plane with 2 supported pipelines but currently configured in bypass
> > > mode:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     ├─ …
> > >     └─ "color_pipeline": enum {0, 42, 52} = 0
> >
> > A bit confused, why is this an enum, and not just an immutable prop that
> > points at the first element? You already can disable elements with the
> > bypass thing, also bypassing by changing the pointers to the next node in
> > the graph seems a bit confusing and redundant.
> 
> We want to allow multiple pipelines to exist and a plane can choose
> the pipeline by selecting the first element of the pipeline. The enum
> here lists all the possible pipelines that can be attached to the
> surface.

Ah, in that case I guess we do need the flexibility of an explicitly
enumerated object property right away. The example looked a bit like just
bypass would do the trick.

> > > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > > objects. The entry value is an object ID pointing to the head of the linked
> > > list (the first operation in the color pipeline).
> > >
> > > The new COLOROP objects also expose a number of KMS properties. Each has a
> > > type, a reference to the next COLOROP object in the linked list, and other
> > > type-specific properties. Here is an example for a 1D LUT operation:
> >
> > Ok no comments from me on the actual color operations and semantics of all
> > that, because I have simply nothing to bring to that except confusion :-)
> >
> > Some higher level thoughts instead:
> >
> > - I really like that we just go with graph nodes here. I think that was
> >   bound to happen sooner or later with kms (we almost got there with
> >   writeback, and with hindsight maybe should have).
> >
> > - Since there's other use-cases for graph nodes (maybe scaler modes, or
> >   histogram samplers for adaptive backglight, or blending that goes beyond
> >   the stacked alpha blending we have now) it think we should make this all
> >   fairly generic:
> >   * Add a new graph node kms object type.
> >   * Add a class type so that userspace knows which graph nodes it must
> >     understand for a feature (like "ColorOp" on planes here), and which it
> >     can ignore (like perhaps a scaler node to control the interpolation)
> >   * Probably need to adjust the object property type. Currently that
> >     accept any object of a given type (crtc, fb, blob are the major ones).
> >     I think for these graph nodes we want an explicit enumeration of the
> >     possible next objects. In kms thus far we've done that with the
> >     separate possible_* mask properties, but they're cumbersome.
> >   * It sounds like for now we only have immutable next pointers, so that
> >     would simplify the first iteration, but should probably anticipate all
> >     this.
> 
> Just to be clear: right now we don't expect any pipeline to be a graph
> but only linked lists. It probably doesn't hurt to generalize this to
> graphs but that's not what we want to do here (for now).

Oh a list is still a graph :-) Also my idea isn't to model a graph data
structure, but just the graph nodes, and a bit of scaffolding to handle
the links/pointers. Whether you build only a list or a full graph out of
that is kinda irrelevant.

Plus with the multiple pipelines you can already have a non-list at the
starting point.

Cheers, Daniel

> > - I think the graph node should be built on top of the driver private
> >   atomic obj/state stuff, and could then be further subclassed for
> >   specific types. It's a bit much stacking, but avoids too much wheel
> >   reinventing, and the worst boilerplate can be avoided with some macros
> >   that combine the pointer chasing with the container_of upcast. With
> >   that you can easily build some helpers to walk the graph for a crtc or
> >   plane or whatever really.
> >
> > - I guess core atomic code should at least do the graph link validation
> >   and basic things like that, probably not really more to do. And
> >   validating the standard properties on some graph nodes ofc.
> >
> > - I have no idea how we should support the standardization of the state
> >   structures. Doing a separate subclass for each type sounds extremely
> >   painful, but unions otoh are ugly. Ideally type-indexed and type safe
> >   union but C isn't good enough for that. I do think that we should keep
> >   up the goal that standard properties are decoded into state structures
> >   in core atomic code, and not in each implementation individually.
> >
> > - I think the only other precedent for something like this is the media
> >   control api in the media subsystem. I think it'd be really good to get
> >   someone like Laurent to ack the graph node infrastructure to make sure
> >   we're not missing any lessons they've learned already. If there's anything
> >   else we should pull these folks in too ofc.
> >
> > For merge plan I dropped some ideas already on Harry's rfc for
> > vendor-private properties, the only thing to add is that we might want to
> > type up the consensus plan into a merged doc like
> > Documentation/gpu/rfc/hdr-plane.rst or whatever you feel like for a name.
> >
> > Cheers, Daniel
> >
> >
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> > > entries, then set "lut_data" to the blob ID. Other color operation types might
> > > have different properties.
> > >
> > > Here is another example with a 3D LUT:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 33
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > And one last example with a matrix:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> > > a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> > > blocks which can be bypassed instead.]
> > >
> > > [Jonas note: perhaps a single "data" property for both LUTs and matrices
> > > would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> > >
> > > If some hardware supports re-ordering operations in the color pipeline, the
> > > driver can expose multiple pipelines with different operation ordering, and
> > > user-space can pick the ordering it prefers by selecting the right pipeline.
> > > The same scheme can be used to expose hardware blocks supporting multiple
> > > precision levels.
> > >
> > > That's pretty much all there is to it, but as always the devil is in the
> > > details.
> > >
> > > First, we realized that we need a way to indicate where the scaling operation
> > > is happening. The contents of the framebuffer attached to the plane might be
> > > scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> > > the colorspace scaling is applied in, the result will be different, so we need
> > > a way for the kernel to indicate which hardware blocks are pre-scaling, and
> > > which ones are post-scaling. We introduce a special "scaling" operation type,
> > > which is part of the pipeline like other operations but serves an informational
> > > role only (effectively, the operation cannot be configured by user-space, all
> > > of its properties are immutable). For example:
> > >
> > >     Color operation 43
> > >     ├─ "type": immutable enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> > >
> > > [Simon note: an alternative would be to split the color pipeline into two, by
> > > having two plane properties ("color_pipeline_pre_scale" and
> > > "color_pipeline_post_scale") instead of a single one. This would be similar to
> > > the way we want to split pre-blending and post-blending. This could be less
> > > expressive for drivers, there may be hardware where there are dependencies
> > > between the pre- and post-scaling pipeline?]
> > >
> > > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > > contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> > > be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> > > where user-space provides a high-level description of the colorspace
> > > conversions it needs to perform, and this is at odds with our KMS uAPI
> > > proposal. To address this issue, we suggest adding a special block type which
> > > describes a fixed conversion from one colorspace to another and cannot be
> > > configured by user-space. Then user-space will need to accommodate its pipeline
> > > for these special blocks. Such fixed hardware blocks need to be well enough
> > > documented so that they can be implemented via shaders.
> > >
> > > We also noted that it should always be possible for user-space to completely
> > > disable the color pipeline and switch back to bypass/identity without a
> > > modeset. Some drivers will need to fail atomic commits for some color
> > > pipelines, in particular for some specific LUT payloads. For instance, AMD
> > > doesn't support curves which are too steep, and Intel doesn't support curves
> > > which decrease. This isn't something which routinely happens, but there might
> > > be more cases where the hardware needs to reject the pipeline. Thus, when
> > > user-space has a running KMS color pipeline, then hits a case where the
> > > pipeline cannot keep running (gets rejected by the driver), user-space needs to
> > > be able to immediately fall back to shaders without any glitch. This doesn't
> > > seem to be an issue for AMD, Intel and NVIDIA.
> > >
> > > This uAPI is extensible: we can add more color operations, and we can add more
> > > properties for each color operation type. For instance, we might want to add
> > > support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> > > the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> > > to keep the scope of the proposal manageable.
> > >
> > > Later on, we plan to re-use the same machinery for post-blending color
> > > pipelines. There are some more details about post-blending which have been
> > > separately debated at the hackfest, but we believe it's a viable plan. This
> > > solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> > > we'd like to introduce a client cap to hide the old properties and show the new
> > > post-blending color pipeline properties.
> > >
> > > We envision a future user-space library to translate a high-level descriptive
> > > color pipeline into low-level prescriptive KMS color pipeline ("libliftoff but
> > > for color pipelines"). The library could also offer a translation into shaders.
> > > This should help share more infrastructure between compositors and ease KMS
> > > offloading. This should also help dealing with the NVIDIA case.
> > >
> > > To wrap things up, let's take a real-world example: how would gamescope [2]
> > > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> > >
> > > AMD would expose the following objects and properties:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     └─ "color_pipeline": enum {0, 42} = 0
> > >     Color operation 42 (input CSC)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >     Color operation 43
> > >     ├─ "type": enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> > >     Color operation 44 (DeGamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > >     └─ "next": immutable color operation ID = 45
> > >     Color operation 45 (gamut remap)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 46
> > >     Color operation 46 (shaper LUT RAM)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 47
> > >     Color operation 47 (3D LUT RAM)
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 17
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 48
> > >     Color operation 48 (blend gamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 0
> > >
> > > To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> > > display, gamescope would perform an atomic commit with the following property
> > > values:
> > >
> > >     Plane 10
> > >     └─ "color_pipeline" = 42
> > >     Color operation 42 (input CSC)
> > >     └─ "matrix_data" = PQ → scRGB (TF)
> > >     Color operation 44 (DeGamma)
> > >     └─ "type" = Bypass
> > >     Color operation 45 (gamut remap)
> > >     └─ "matrix_data" = scRGB (TF) → PQ
> > >     Color operation 46 (shaper LUT RAM)
> > >     └─ "lut_data" = PQ → Display native
> > >     Color operation 47 (3D LUT RAM)
> > >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > >     Color operation 48 (blend gamma)
> > >     └─ "1d_curve_type" = PQ
> > >
> > > I hope comparing these properties to the diagrams linked above can help
> > > understand how the uAPI would be used and give an idea of its viability.
> > >
> > > Please feel free to provide feedback! It would be especially useful to have
> > > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > > would work there.
> > >
> > > Unless there is a show-stopper, we plan to follow up this RFC with
> > > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> > >
> > > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > > Let's work together to make this happen!
> > >
> > > Simon, on behalf of the hackfest participants
> > >
> > > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > > [2]: https://github.com/ValveSoftware/gamescope
> > > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> >
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 16:06   ` Simon Ser
@ 2023-05-05 19:53     ` Daniel Vetter
  2023-05-08  8:58       ` Simon Ser
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Vetter @ 2023-05-05 19:53 UTC (permalink / raw)
  To: Simon Ser
  Cc: Sebastian Wick, Pekka Paalanen, Michel Dänzer,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Aleix Pol, wayland-devel,
	Joshua Ashton

On Fri, May 05, 2023 at 04:06:26PM +0000, Simon Ser wrote:
> On Friday, May 5th, 2023 at 17:28, Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > Ok no comments from me on the actual color operations and semantics of all
> > that, because I have simply nothing to bring to that except confusion :-)
> > 
> > Some higher level thoughts instead:
> > 
> > - I really like that we just go with graph nodes here. I think that was
> >   bound to happen sooner or later with kms (we almost got there with
> >   writeback, and with hindsight maybe should have).
> 
> I'd really rather not do graphs here. We only need linked lists as Sebastian
> said. Graphs would significantly add more complexity to this proposal, and
> I don't think that's a good idea unless there is a strong use-case.

You have a graph, because a graph is just nodes + links. I did _not_
propose a full generic graph structure, the link pointer would be in the
class/type specific structure only. Like how we have the plane->crtc or
connector->crtc links already like that (which already _is_ a full-blown
graph).

Maybe explain what exactly you're thinking under "do graphs here" so I
understand what you mean differently than me?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-04 15:22 [RFC] Plane color pipeline KMS uAPI Simon Ser
                   ` (2 preceding siblings ...)
  2023-05-05 15:28 ` Daniel Vetter
@ 2023-05-05 20:40 ` Dave Airlie
  2023-05-05 22:20   ` Sebastian Wick
       [not found]   ` <20230505160435.6e3ffa4a@n2pa>
  2023-05-09  8:04 ` Pekka Paalanen
       [not found] ` <4341dac6-ada1-2a75-1c22-086d96408a85@quicinc.com>
  5 siblings, 2 replies; 49+ messages in thread
From: Dave Airlie @ 2023-05-05 20:40 UTC (permalink / raw)
  To: Simon Ser
  Cc: Aleix Pol, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, Sebastian Wick,
	Joshua Ashton

On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:
>
> Hi all,
>
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
>
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.

I'm not 100% sold on the prescriptive here, let's see if someone can
get me over the line with some questions later.

My feeling is color pipeline hw is not a done deal, and that hw
vendors will be revising/evolving/churning the hw blocks for a while
longer, as there are no real standards in the area to aim for; all the
vendors are mostly just doing whatever gets Windows over the line and
keeps hw engineers happy. So I have some concerns here around forwards
compatibility and hence the API design.

I guess my main concern is if you expose a bunch of hw blocks and
someone comes up with a novel new thing, will all existing userspace
work, without falling back to shaders?
Do we have minimum guarantees on what hardware blocks have to be
exposed to build a useable pipeline?
If a hardware block goes away in a new silicon revision, do I have to
rewrite my compositor? or will it be expected that the kernel will
emulate the old pipelines on top of whatever new fancy thing exists.

We are not Android, or even Steam OS on a Steamdeck, we have to be
able to independently update the kernel for new hardware and not
require every compositor currently providing HDR to need to support
new hardware blocks and models at the same time.

Dave.


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 20:40 ` Dave Airlie
@ 2023-05-05 22:20   ` Sebastian Wick
  2023-05-07 23:14     ` Dave Airlie
       [not found]   ` <20230505160435.6e3ffa4a@n2pa>
  1 sibling, 1 reply; 49+ messages in thread
From: Sebastian Wick @ 2023-05-05 22:20 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Aleix Pol, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, Joshua Ashton

On Fri, May 5, 2023 at 10:40 PM Dave Airlie <airlied@gmail.com> wrote:
>
> On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:
> >
> > Hi all,
> >
> > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > pipeline before blending, ie. after a pixel is tapped from a plane's
> > framebuffer and before it's blended with other planes. With this new uAPI we
> > aim to reduce the battery life impact of color management and HDR on mobile
> > devices, to improve performance and to decrease latency by skipping
> > composition on the 3D engine. This proposal is the result of discussions at
> > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > discussion.
> >
> > This proposal takes a prescriptive approach instead of a descriptive approach.
> > Drivers describe the available hardware blocks in terms of low-level
> > mathematical operations, then user-space configures each block. We decided
> > against a descriptive approach where user-space would provide a high-level
> > description of the colorspace and other parameters: we want to give more
> > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > color pipeline with shaders and switch between shaders and KMS pipelines
> > seamlessly, and to avoid forcing user-space into a particular color management
> > policy.
>
> I'm not 100% sold on the prescriptive here, let's see if someone can
> get me over the line with some questions later.
>
> My feeling is color pipeline hw is not a done deal, and that hw
> vendors will be revising/evolving/churning the hw blocks for a while
> longer, as there are no real standards in the area to aim for; all the
> vendors are mostly just doing whatever gets Windows over the line and
> keeps hw engineers happy. So I have some concerns here around forwards
> compatibility and hence the API design.
>
> I guess my main concern is if you expose a bunch of hw blocks and
> someone comes up with a novel new thing, will all existing userspace
> work, without falling back to shaders?
> Do we have minimum guarantees on what hardware blocks have to be
> exposed to build a useable pipeline?
> If a hardware block goes away in a new silicon revision, do I have to
> rewrite my compositor? or will it be expected that the kernel will
> emulate the old pipelines on top of whatever new fancy thing exists.

I think there are two answers to those questions.

The first one is that right now KMS already doesn't guarantee that
every property is supported on all hardware. The guarantee we have is
that properties that are supported on a piece of hardware on a
specific kernel will be supported on the same hardware on later
kernels. The color pipeline is no different here. For a specific piece
of hardware a newer kernel might only change the pipelines in a
backwards compatible way and add new pipelines.

So to answer your question: if some hardware with a novel pipeline
shows up, it might not be supported, and that's fine. We already
have cases where some hardware does not support the gamma lut property
but only the CSC property and that breaks night light because we never
bothered to write a shader fallback. KMS provides ways to offload work
but a generic user space always has to provide a fallback and this
doesn't change. Hardware specific user space on the other hand will
keep working with the forward compatibility guarantees we want to
provide.

The second answer is that we want to provide a user space library
which takes a description of a color pipeline and tries to map that to
the available KMS color pipelines. If there is a novel color
operation, adding support in this library would then make it possible
to offload compatible color pipelines on this new hardware for all
consumers of the library. Obviously there is no guarantee that
whatever color pipeline compositors come up with can actually be
realized on specific hardware but that's just an inherent hardware
issue.
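To make the second answer concrete, here is a minimal, purely illustrative sketch (in Python; the function, the op names, and the pipeline descriptions are hypothetical, not a real library API) of how such a library could match a compositor's desired sequence of color operations against the fixed pipelines a driver enumerates:

```python
# Hypothetical sketch: map a desired sequence of color operations onto
# one of the pipelines enumerated by the driver. Each enumerated
# pipeline is a fixed, ordered list of colorop types; individual ops
# can be bypassed but never reordered.

def pick_pipeline(wanted_ops, pipelines):
    """Return the first enumerated pipeline that can realize wanted_ops
    (in order), bypassing unused ops, or None for a shader fallback."""
    for pipeline in pipelines:
        it = iter(pipeline)
        # Shared-iterator subsequence check: wanted_ops must appear in
        # pipeline in the same relative order.
        if all(op in it for op in wanted_ops):
            return pipeline
    return None

pipelines = [
    ["1d_lut", "3x3_matrix", "1d_lut"],  # e.g. degamma -> CTM -> regamma
    ["3x3_matrix", "3d_lut"],
]

# Degamma + CTM + regamma fits the first pipeline.
assert pick_pipeline(["1d_lut", "3x3_matrix", "1d_lut"], pipelines) == pipelines[0]
# A 3D LUT followed by a matrix fits neither: fall back to shaders.
assert pick_pipeline(["3d_lut", "3x3_matrix"], pipelines) is None
```

The point is only that the device-dependent matching logic lives in one shared user-space library, with a shader fallback when no enumerated pipeline fits.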

> We are not Android, or even Steam OS on a Steamdeck, we have to be
> able to independently update the kernel for new hardware and not
> require every compositor currently providing HDR to need to support
> new hardware blocks and models at the same time.
>
> Dave.
>



* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 22:20   ` Sebastian Wick
@ 2023-05-07 23:14     ` Dave Airlie
  2023-05-08  9:37       ` Pekka Paalanen
                         ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Dave Airlie @ 2023-05-07 23:14 UTC (permalink / raw)
  To: Sebastian Wick
  Cc: Aleix Pol, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, Joshua Ashton

On Sat, 6 May 2023 at 08:21, Sebastian Wick <sebastian.wick@redhat.com> wrote:
>
> On Fri, May 5, 2023 at 10:40 PM Dave Airlie <airlied@gmail.com> wrote:
> >
> > On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:
> > >
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > aim to reduce the battery life impact of color management and HDR on mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color management
> > > policy.
> >
> > I'm not 100% sold on the prescriptive here, let's see if someone can
> > get me over the line with some questions later.
> >
> > My feeling is color pipeline hw is not a done deal, and that hw
> > vendors will be revising/evolving/churning the hw blocks for a while
> > longer, as there are no real standards in the area to aim for; all the
> > vendors are mostly just doing whatever gets Windows over the line and
> > keeps hw engineers happy. So I have some concerns here around forwards
> > compatibility and hence the API design.
> >
> > I guess my main concern is if you expose a bunch of hw blocks and
> > someone comes up with a novel new thing, will all existing userspace
> > work, without falling back to shaders?
> > Do we have minimum guarantees on what hardware blocks have to be
> > exposed to build a useable pipeline?
> > If a hardware block goes away in a new silicon revision, do I have to
> > rewrite my compositor? or will it be expected that the kernel will
> > emulate the old pipelines on top of whatever new fancy thing exists.
>
> I think there are two answers to those questions.

These aren't selling me much better :-)
>
> The first one is that right now KMS already doesn't guarantee that
> every property is supported on all hardware. The guarantee we have is
> that properties that are supported on a piece of hardware on a
> specific kernel will be supported on the same hardware on later
> kernels. The color pipeline is no different here. For a specific piece
> of hardware a newer kernel might only change the pipelines in a
> backwards compatible way and add new pipelines.
>
> So to answer your question: if some hardware with a novel pipeline
> shows up, it might not be supported, and that's fine. We already
> have cases where some hardware does not support the gamma lut property
> but only the CSC property and that breaks night light because we never
> bothered to write a shader fallback. KMS provides ways to offload work
> but a generic user space always has to provide a fallback and this
> doesn't change. Hardware specific user space on the other hand will
> keep working with the forward compatibility guarantees we want to
> provide.

In my mind we've screwed up already; that isn't a case to be made for
continuing down the same path.

The kernel is meant to be a hardware abstraction layer, not just a
hardware exposure layer. The kernel shouldn't set policy and there are
cases where it can't act as an abstraction layer (like where you need
a compiler), but I'm not sold that this case is one of those yet. I'm
open to being educated here on why it would be.

>
> The second answer is that we want to provide a user space library
> which takes a description of a color pipeline and tries to map that to
> the available KMS color pipelines. If there is a novel color
> operation, adding support in this library would then make it possible
> to offload compatible color pipelines on this new hardware for all
> consumers of the library. Obviously there is no guarantee that
> whatever color pipeline compositors come up with can actually be
> realized on specific hardware but that's just an inherent hardware
> issue.
>

Why does this library need to be in userspace though? If there's a
library making device dependent decisions, why can't we just make
those device dependent decisions in the kernel?

This feels like we are trying to go down the Android HWC road, but we
aren't in that business.

My thoughts would be userspace has to have some way to describe what
it wants anyways, otherwise it does sound like I need to update
mutter, kwin, surfaceflinger, chromeos, gamescope, every time a new HW
device comes out that operates slightly different to previously
generations. This isn't the kernel doing hw abstraction at all, it's
the kernel just opting out of designing interfaces and it isn't
something I'm sold on.

Dave.


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 19:51     ` Daniel Vetter
@ 2023-05-08  8:24       ` Pekka Paalanen
  2023-05-08  9:00         ` Daniel Vetter
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-08  8:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Sebastian Wick, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Michel Dänzer, Jonas Ådahl,
	Victoria Brekenfeld, Aleix Pol, Uma Shankar, Joshua Ashton


On Fri, 5 May 2023 21:51:41 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, May 05, 2023 at 05:57:37PM +0200, Sebastian Wick wrote:
> > On Fri, May 5, 2023 at 5:28 PM Daniel Vetter <daniel@ffwll.ch> wrote:  
> > >
> > > On Thu, May 04, 2023 at 03:22:59PM +0000, Simon Ser wrote:  
> > > > Hi all,
> > > >
> > > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > > aim to reduce the battery life impact of color management and HDR on mobile
> > > > devices, to improve performance and to decrease latency by skipping
> > > > composition on the 3D engine. This proposal is the result of discussions at
> > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > > discussion.
> > > >
> > > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > > Drivers describe the available hardware blocks in terms of low-level
> > > > mathematical operations, then user-space configures each block. We decided
> > > > against a descriptive approach where user-space would provide a high-level
> > > > description of the colorspace and other parameters: we want to give more
> > > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > seamlessly, and to avoid forcing user-space into a particular color management
> > > > policy.  
> > >
> > > Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> > > means you need the shaders at the same api level for fallback purposes,
> > > and we're not going to have that ever in kms. That would need something
> > > like hwc in userspace to work.  
> > 
> > Which would be nice to have but that would be forcing a specific color
> > pipeline on everyone and we explicitly want to avoid that. There are
> > just too many trade-offs to consider.
> >   
> > > And not generic in its ultimate consequence would mean we just do a blob
> > > for a crtc with all the vendor register stuff like adf (android display
> > > framework) does, because I really don't see a point in trying a
> > > generic-looking-but-not vendor uapi with each color op/stage split out.
> > >
> > > So from very far and pure gut feeling, this seems like a good middle
> > > ground in the uapi design space we have here.  
> > 
> > Good to hear!
> >   
> > > > We've decided against mirroring the existing CRTC properties
> > > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > > pipeline can significantly differ between vendors and this approach cannot
> > > > accurately abstract all hardware. In particular, the availability, ordering and
> > > > capabilities of hardware blocks are different on each display engine. So, we've
> > > > decided to go for a highly detailed hardware capability discovery.
> > > >
> > > > This new uAPI should not be in conflict with existing standard KMS properties,
> > > > since there are none which control the pre-blending color pipeline at the
> > > > moment. It does conflict with any vendor-specific properties like
> > > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > > properties. Drivers will need to either reject atomic commits configuring both
> > > > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > > > properties and shows the new generic properties when enabled.
> > > >
> > > > To use this uAPI, first user-space needs to discover hardware capabilities via
> > > > KMS objects and properties, then user-space can configure the hardware via an
> > > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > > >
> > > > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > > > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > > > property is an enum, each enum entry represents a color pipeline supported by
> > > > the hardware. The special zero entry indicates that the pipeline is in
> > > > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > > > primary plane with 2 supported pipelines but currently configured in bypass
> > > > mode:
> > > >
> > > >     Plane 10
> > > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > >     ├─ …
> > > >     └─ "color_pipeline": enum {0, 42, 52} = 0  
> > >
> > > A bit confused, why is this an enum, and not just an immutable prop that
> > > points at the first element? You already can disable elements with the
> > > bypass thing, also bypassing by changing the pointers to the next node in
> > > the graph seems a bit confusing and redundant.  
> > 
> > We want to allow multiple pipelines to exist and a plane can choose
> > the pipeline by selecting the first element of the pipeline. The enum
> > here lists all the possible pipelines that can be attached to the
> > surface.  
> 
> Ah in that case I guess we do need the flexibility of explicitly
> enumerated object property right away I guess. The example looked a bit
> like just bypass would do the trick.

Setting individual pipeline elements to bypass is not flexible enough,
because it does not allow re-ordering the pipeline elements.

OTOH, hardware does not allow arbitrary re-ordering of elements,
therefore all the "next" links in the pipeline elements are immutable.

Presumably when some re-ordering in hardware is possible, the number of
order-variants is small (given we can also bypass individual elements),
so all variants are explicitly enumerated at the plane property.

This way there are no traps like "you cannot enable all elements at the
same time" that a single standard pipeline would end up with. All
elements for all enumerated pipelines are always usable simultaneously,
I believe.

"Re-ordering" may not even be an accurate description of hardware. Some
hardware elements could be mutually exclusive, whether they implement
the same or a different operation.
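As a tiny illustration (Python; the object IDs and op names are hypothetical, mirroring the example in the RFC), hardware that can apply a LUT either before or after a matrix would not expose mutable links — the plane would simply enumerate both fixed orderings plus bypass:

```python
# Hypothetical enumeration: each entry of the plane's "color_pipeline"
# enum maps to one immutable ordering of hardware blocks. Re-ordering
# is expressed by picking a different pipeline, never by relinking ops.

color_pipeline = {
    0:  [],                          # bypass / no-op
    42: ["1D LUT", "3x3 matrix"],    # variant A: LUT applied first
    52: ["3x3 matrix", "1D LUT"],    # variant B: matrix applied first
}

def variants_with(op):
    """List the pipeline enum entries that contain a given op type."""
    return sorted(pid for pid, ops in color_pipeline.items() if op in ops)

assert variants_with("1D LUT") == [42, 52]
```

Mutually exclusive blocks fit the same scheme: an op that cannot be enabled together with another simply never appears in the same enumerated pipeline.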

> > > > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > > > objects. The entry value is an object ID pointing to the head of the linked
> > > > list (the first operation in the color pipeline).
> > > >
> > > > The new COLOROP objects also expose a number of KMS properties. Each has a
> > > > type, a reference to the next COLOROP object in the linked list, and other
> > > > type-specific properties. Here is an example for a 1D LUT operation:  
> > >
> > > Ok no comments from me on the actual color operations and semantics of all
> > > that, because I have simply nothing to bring to that except confusion :-)
> > >
> > > Some higher level thoughts instead:
> > >
> > > - I really like that we just go with graph nodes here. I think that was
> > >   bound to happen sooner or later with kms (we almost got there with
> > >   writeback, and with hindsight maybe should have).
> > >
> > > - Since there's other use-cases for graph nodes (maybe scaler modes, or
> > >   histogram samplers for adaptive backglight, or blending that goes beyond
> > >   the stacked alpha blending we have now) it think we should make this all
> > >   fairly generic:
> > >   * Add a new graph node kms object type.
> > >   * Add a class type so that userspace knows which graph nodes it must
> > >     understand for a feature (like "ColorOp" on planes here), and which it
> > >     can ignore (like perhaps a scaler node to control the interpolation)
> > >   * Probably need to adjust the object property type. Currently that
> > >     accept any object of a given type (crtc, fb, blob are the major ones).
> > >     I think for these graph nodes we want an explicit enumeration of the
> > >     possible next objects. In kms thus far we've done that with the
> > >     separate possible_* mask properties, but they're cumbersome.
> > >   * It sounds like for now we only have immutable next pointers, so that
> > >     would simplify the first iteration, but should probably anticipate all
> > >     this.  
> > 
> > Just to be clear: right now we don't expect any pipeline to be a graph
> > but only linked lists. It probably doesn't hurt to generalize this to
> > graphs but that's not what we want to do here (for now).  
> 
> Oh a list is still a graph :-) Also my idea isn't to model a graph data
> structure, but just the graph nodes, and a bit of scaffolding to handle
> the links/pointers. Whether you only build a list of a graph out of that
> is kinda irrelevant.
> 
> Plus with the multiple pipelines you can already have a non-list as the
> starting point.

No, there is no need for a graph. It literally is just multiple
single-linked lists at the UAPI level, and userspace picks one for the
plane.

If you change the immutable "next element" properties to mutable, then
I think the UAPI collapses. It becomes far too hard to know what even
could work, and probing for success/failure like with KMS in general is
simply infeasible.
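A sketch of what discovery looks like from userspace under this model (Python, simulating the proposed objects rather than using real libdrm calls; the property names follow the RFC's examples but the IDs are made up):

```python
# Illustrative model of the proposed uAPI: the plane's "color_pipeline"
# enum lists head object IDs, and each COLOROP object exposes an
# immutable "NEXT" property pointing at the following op (0 = end of
# list). Userspace just walks one singly-linked list per pipeline.

colorops = {  # object ID -> properties, as a driver might expose them
    42: {"TYPE": "1D enumerated curve", "NEXT": 43},
    43: {"TYPE": "3x4 matrix",          "NEXT": 44},
    44: {"TYPE": "1D LUT",              "NEXT": 0},
}

def walk_pipeline(head_id):
    """Follow the immutable NEXT links from the head colorop."""
    ops = []
    obj_id = head_id
    while obj_id != 0:
        props = colorops[obj_id]
        ops.append((obj_id, props["TYPE"]))
        obj_id = props["NEXT"]
    return ops

assert walk_pipeline(42) == [
    (42, "1D enumerated curve"),
    (43, "3x4 matrix"),
    (44, "1D LUT"),
]
```

Because NEXT is immutable, userspace can enumerate every possible pipeline up front and then only decide which head to select and which ops to bypass.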

Let's leave the more wild scenarios for the second total rewrite of the
KMS color API (assuming this is the start of the first). Otherwise we
get nowhere. That's what Wayland protocol design work has taught us.


Thanks,
pq



* Re: [RFC] Plane color pipeline KMS uAPI
       [not found]   ` <20230505160435.6e3ffa4a@n2pa>
@ 2023-05-08  8:49     ` Pekka Paalanen
  0 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-08  8:49 UTC (permalink / raw)
  To: Steven Kucharzyk
  Cc: Aleix Pol, xaver.hugl, DRI Development, wayland-devel,
	Melissa Wen, Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Sebastian Wick, Joshua Ashton


On Fri, 5 May 2023 16:04:35 -0500
Steven Kucharzyk <stvr_8888@comcast.net> wrote:

> Hi,
> 
> I'm new to this list and probably can't contribute but interested, I
> passed your original posting to a friend and have enclosed his thoughts
> ... old hash or food for thought ??? I ask your forgiveness if you find
> this inappropriate. (am of the ilk: act first, ask for forgiveness
> afterward)  ;-)

Thanks, but please use reply-to-all, it's a bit painful to add back all
the other mailing lists and people.

> 
> Steven
> 
> --------------------- start 
> 
> Steven Kucharzyk wrote:
> 
> > Thought you might find this of interest.    
> 
> Hi,
> 	thanks for sending it to me.
> 
> Unfortunately I don't know enough about the context to say anything
> specific about it.
> 
> The best I can do is state the big picture aims I would
> look for, as someone with a background in display systems
> electronic design, rendering software development
> and Color Science. (I apologize in advance if any of this
> is preaching to the choir!)
> 
> 1) I would make sure that someone with a strong Color Science
> background was consulted in the development of the API.

Where can we find someone like that, who would also not start by saying
we cannot get anything right, or that we cannot change the old software
architecture, and would actually try to understand *our* goals and
limitations as well? Who could commit to long discussions over several
years in a *friendly* manner?

It would take extreme amounts of patience from that person.

> 2) I would be measuring the API against its ability to
> support a "profiling" color management workflow. This workflow
> allows using the full capability of a display, while also allowing
> simultaneous display of multiple sources encoded in any colorspace.
> So the basic architecture is to have a final frame buffer (real
> or virtual) in the native displays colorspace, and use any
> graphics hardware color transform and rendering capability to
> assist with the transformation of data in different source
> colorspaces into the displays native colorspace.
> 
> 3) The third thing I would be looking for, is enough
> standardization that user mode software can be written
> that will get key benefits of what's available in the hardware,
> without needing to be customized to lots of different hardware
> specifics. For instance, I'd make sure that there was a standard display
> frame buffer to display mode that applied per channel curves
> that are specified in a standard way. (i.e. make sure that there
> is an easy to use replacement for XRRCrtcGamma.)
> 
> Any API that is specific to a type or model of graphics card,
> will retard development of color management support to a very large
> degree - the financial and development costs of obtaining, configuring
> and testing against multiple graphic card makes and models puts this
> in the too hard basket for anyone other than a corporation.
> 
> Perhaps little of the above is relevant, if this is a low level API
> that is to be used by other operating system sub-systems such
> as display graphics API's like X11 or Wayland, which will choose
> specific display rendering models and implement them with the hardware
> capabilities that are available.

That is exactly what it is. It is a way to save power and gain
performance when things happen to fit in place just right: what one
needs to do matches what the dedicated color processing hardware blocks
implement.

> From a color management point of view,
> it is the operating system & UI graphics API's that are the ones that
> are desirable to work with, since they are meant to insulate
> applications from hardware details.

Indeed. Anything the display controller hardware cannot do will be
implemented by other means, e.g. on the GPU, by a display server.


Thanks,
pq



* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 19:53     ` Daniel Vetter
@ 2023-05-08  8:58       ` Simon Ser
  2023-05-08  9:18         ` Daniel Vetter
       [not found]         ` <20230508185409.07501f40@n2pa>
  0 siblings, 2 replies; 49+ messages in thread
From: Simon Ser @ 2023-05-08  8:58 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Pekka Paalanen, wayland-devel, Michel Dänzer,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Aleix Pol, Sebastian Wick,
	Joshua Ashton

On Friday, May 5th, 2023 at 21:53, Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, May 05, 2023 at 04:06:26PM +0000, Simon Ser wrote:
> > On Friday, May 5th, 2023 at 17:28, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > Ok no comments from me on the actual color operations and semantics of all
> > > that, because I have simply nothing to bring to that except confusion :-)
> > >
> > > Some higher level thoughts instead:
> > >
> > > - I really like that we just go with graph nodes here. I think that was
> > >   bound to happen sooner or later with kms (we almost got there with
> > >   writeback, and with hindsight maybe should have).
> >
> > I'd really rather not do graphs here. We only need linked lists as Sebastian
> > said. Graphs would significantly add more complexity to this proposal, and
> > I don't think that's a good idea unless there is a strong use-case.
> 
> You have a graph, because a graph is just nodes + links. I did _not_
> propose a full generic graph structure, the link pointer would be in the
> class/type specific structure only. Like how we have the plane->crtc or
> connector->crtc links already like that (which already _is_ a full-blown
> graph).

I really don't get why a pointer in a struct makes plane->crtc a full-blown
graph. There is only a single parent-child link. A plane has a reference to a
CRTC, and nothing more.

You could say that anything is a graph. Yes, even an isolated struct somewhere
is a graph: one with a single node and no link. But I don't see the point of
explaining everything with a graph when we only need a much simpler subset of
the concept of graphs.

Putting the graph thing aside, what are you suggesting exactly from a concrete
uAPI point-of-view? Introducing a new struct type? Would it be a colorop
specific struct, or a more generic one? What would be the fields? Why do you
think that's necessary and better than the current proposal?

My understanding so far is that you're suggesting introducing something like
this at the uAPI level:

    struct drm_mode_node {
        uint32_t id;

        uint32_t children_count;
        uint32_t *children; // list of child object IDs
    };

I don't think this is a good idea for multiple reasons. First, this is
overkill: we don't need this complexity, and this complexity will make it more
difficult to reason about the color pipeline. This is a premature abstraction,
one we don't need right now, and one I haven't heard a potential future
use-case for. Sure, one can kill an ant with a sledgehammer if they'd like, but
that's not the right tool for the job.

Second, this will make user-space miserable. User-space already has a tricky
task: translating its abstract descriptive color pipeline into our
proposed simple list of color operations. If we expose a full-blown graph, then
the user-space logic will need to handle arbitrary graphs. This will have a
significant cost (on implementation and testing), which we will be paying in
terms of time spent and in terms of bugs.

Last, this kind of generic "node" struct is at odds with existing KMS object
types. So far, KMS objects are concrete like CRTC, connector, plane, etc.
"Node" is abstract. This is inconsistent.

Please let me know whether the above is what you have in mind. If not, please
explain what exactly you mean by "graphs" in terms of uAPI, and please explain
why we need it and what real-world use-cases it would solve.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-08  8:24       ` Pekka Paalanen
@ 2023-05-08  9:00         ` Daniel Vetter
  0 siblings, 0 replies; 49+ messages in thread
From: Daniel Vetter @ 2023-05-08  9:00 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Sebastian Wick, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Michel Dänzer, Jonas Ådahl,
	Victoria Brekenfeld, Aleix Pol, Uma Shankar, Joshua Ashton

On Mon, 8 May 2023 at 10:24, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Fri, 5 May 2023 21:51:41 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Fri, May 05, 2023 at 05:57:37PM +0200, Sebastian Wick wrote:
> > > On Fri, May 5, 2023 at 5:28 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Thu, May 04, 2023 at 03:22:59PM +0000, Simon Ser wrote:
> > > > > Hi all,
> > > > >
> > > > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > > > aim to reduce the battery life impact of color management and HDR on mobile
> > > > > devices, to improve performance and to decrease latency by skipping
> > > > > composition on the 3D engine. This proposal is the result of discussions at
> > > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > > > discussion.
> > > > >
> > > > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > > > Drivers describe the available hardware blocks in terms of low-level
> > > > > mathematical operations, then user-space configures each block. We decided
> > > > > against a descriptive approach where user-space would provide a high-level
> > > > > description of the colorspace and other parameters: we want to give more
> > > > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > > seamlessly, and to avoid forcing user-space into a particular color management
> > > > > policy.
> > > >
> > > > Ack on the prescriptive approach, but generic imo. Descriptive pretty much
> > > > means you need the shaders at the same api level for fallback purposes,
> > > > and we're not going to have that ever in kms. That would need something
> > > > like hwc in userspace to work.
> > >
> > > Which would be nice to have but that would be forcing a specific color
> > > pipeline on everyone and we explicitly want to avoid that. There are
> > > just too many trade-offs to consider.
> > >
> > > > And not generic in its ultimate consequence would mean we just do a blob
> > > > for a crtc with all the vendor register stuff like adf (android display
> > > > framework) does, because I really don't see a point in trying a
> > > > generic-looking-but-not vendor uapi with each color op/stage split out.
> > > >
> > > > So from very far and pure gut feeling, this seems like a good middle
> > > > ground in the uapi design space we have here.
> > >
> > > Good to hear!
> > >
> > > > > We've decided against mirroring the existing CRTC properties
> > > > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > > > pipeline can significantly differ between vendors and this approach cannot
> > > > > accurately abstract all hardware. In particular, the availability, ordering and
> > > > > capabilities of hardware blocks is different on each display engine. So, we've
> > > > > decided to go for a highly detailed hardware capability discovery.
> > > > >
> > > > > This new uAPI should not be in conflict with existing standard KMS properties,
> > > > > since there are none which control the pre-blending color pipeline at the
> > > > > moment. It does conflict with any vendor-specific properties like
> > > > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > > > properties. Drivers will need to either reject atomic commits configuring both
> > > > > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > > > > properties and shows the new generic properties when enabled.
> > > > >
> > > > > To use this uAPI, first user-space needs to discover hardware capabilities via
> > > > > KMS objects and properties, then user-space can configure the hardware via an
> > > > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > > > >
> > > > > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > > > > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > > > > property is an enum, each enum entry represents a color pipeline supported by
> > > > > the hardware. The special zero entry indicates that the pipeline is in
> > > > > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > > > > primary plane with 2 supported pipelines but currently configured in bypass
> > > > > mode:
> > > > >
> > > > >     Plane 10
> > > > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > > > >     ├─ …
> > > > >     └─ "color_pipeline": enum {0, 42, 52} = 0
> > > >
> > > > A bit confused, why is this an enum, and not just an immutable prop that
> > > > points at the first element? You already can disable elements with the
> > > > bypass thing, also bypassing by changing the pointers to the next node in
> > > > the graph seems a bit confusing and redundant.
> > >
> > > We want to allow multiple pipelines to exist and a plane can choose
> > > the pipeline by selecting the first element of the pipeline. The enum
> > > here lists all the possible pipelines that can be attached to the
> > > surface.
> >
> > Ah in that case I guess we do need the flexibility of explicitly
> > enumerated object property right away I guess. The example looked a bit
> > like just bypass would do the trick.
>
> Setting individual pipeline elements to bypass is not flexible enough,
> because it does not allow re-ordering the pipeline elements.
>
> OTOH, hardware does not allow arbitrary re-ordering of elements,
> therefore all the "next" links in the pipeline elements are immutable.
>
> Presumably when some re-ordering in hardware is possible, the number of
> order-variants is small (given we can also bypass individual elements),
> so all variants are explicitly enumerated at the plane property.
>
> This way there are no traps like "you cannot enable all elements at the
> same time" that a single standard pipeline would end up with. All
> elements for all enumerated pipelines are always usable simultaneously,
> I believe.
>
> "Re-ordering" may not even be an accurate description of hardware. Some
> hardware elements could be mutually exclusive, whether they implement
> the same or a different operation.
>
> > > > > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > > > > objects. The entry value is an object ID pointing to the head of the linked
> > > > > list (the first operation in the color pipeline).
> > > > >
> > > > > The new COLOROP objects also expose a number of KMS properties. Each has a
> > > > > type, a reference to the next COLOROP object in the linked list, and other
> > > > > type-specific properties. Here is an example for a 1D LUT operation:
> > > >
> > > > Ok no comments from me on the actual color operations and semantics of all
> > > > that, because I have simply nothing to bring to that except confusion :-)
> > > >
> > > > Some higher level thoughts instead:
> > > >
> > > > - I really like that we just go with graph nodes here. I think that was
> > > >   bound to happen sooner or later with kms (we almost got there with
> > > >   writeback, and with hindsight maybe should have).
> > > >
> > > > - Since there's other use-cases for graph nodes (maybe scaler modes, or
> > > >   histogram samplers for adaptive backlight, or blending that goes beyond
> > > >   the stacked alpha blending we have now) I think we should make this all
> > > >   fairly generic:
> > > >   * Add a new graph node kms object type.
> > > >   * Add a class type so that userspace knows which graph nodes it must
> > > >     understand for a feature (like "ColorOp" on planes here), and which it
> > > >     can ignore (like perhaps a scaler node to control the interpolation)
> > > >   * Probably need to adjust the object property type. Currently that
> > > >     accept any object of a given type (crtc, fb, blob are the major ones).
> > > >     I think for these graph nodes we want an explicit enumeration of the
> > > >     possible next objects. In kms thus far we've done that with the
> > > >     separate possible_* mask properties, but they're cumbersome.
> > > >   * It sounds like for now we only have immutable next pointers, so that
> > > >     would simplify the first iteration, but should probably anticipate all
> > > >     this.
> > >
> > > Just to be clear: right now we don't expect any pipeline to be a graph
> > > but only linked lists. It probably doesn't hurt to generalize this to
> > > graphs but that's not what we want to do here (for now).
> >
> > Oh a list is still a graph :-) Also my idea isn't to model a graph data
> > structure, but just the graph nodes, and a bit of scaffolding to handle
> > the links/pointers. Whether you only build a list or a graph out of that
> > is kinda irrelevant.
> >
> > Plus with the multiple pipelines you can already have a non-list in the
> > starting point.
>
> No, there is no need for a graph. It literally is just multiple
> single-linked lists at the UAPI level, and userspace picks one for the
> plane.
>
> If you change the immutable "next element" properties to mutable, then
> I think the UAPI collapses. It becomes far too hard to know what even
> could work, and probing for success/failure like with KMS in general is
> simply infeasible.
>
> Let's leave the more wild scenarios for the second total rewrite of the
> KMS color API (assuming this is the start of the first). Otherwise we
> get nowhere. That's what Wayland protocol design work has taught us.

A list is still a graph.

What I'm asking for is _not_ that you make this into a full flexible
graph, because that doesn't make sense.

What I'm asking is that we don't add some special colorop nodes, like
we added planes, then cursor/primary planes, then writeback nodes in
the past, because there will be more. Instead I'm asking that you add
a graph node thing (call it an "it's only a list node but in future it
might be a more generic graph node" if that makes it easier to
understand, but maybe the uapi should have a shorter name like NODE,
like in "list node"), with a class value of "ColorOp" and an uapi
promise that userspace can ignore any graph nodes for classes it
doesn't understand when it walks a chain. Plus like the minimum amount
of scaffolding internally to make these graph nodes connect over edges
(which doesn't even have to be walkable really, just common support to
set/get links and check they point at something valid, for some value
of "valid" that makes sense).

I'm not asking you at all to make the graph you have any different, or
any more flexible, or anything like this at all. That can all be done
later on (probably with new classes and stuff since for color
conversion the linear pipeline is a good fit I think). I just don't
want a new GETFOORESOURCES and GETFOO/SETFOO ioctl DRM_MODE_OBJECT_FOO
for every new feature $FOO. Because there's going to be more than
ColorOp (historically about one new class every few years for the past
12 or so years of kms).

Does that make some sense? Or do we have to keep this all specific to
ColorOp, to make absolutely sure that no one who reads "node of the
color op pipeline with a single immutable next pointer" somehow gets
the funky idea that it's a full blown graph network? That seems a bit
excessive to me :-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-08  8:58       ` Simon Ser
@ 2023-05-08  9:18         ` Daniel Vetter
  2023-05-08 18:10           ` Harry Wentland
       [not found]         ` <20230508185409.07501f40@n2pa>
  1 sibling, 1 reply; 49+ messages in thread
From: Daniel Vetter @ 2023-05-08  9:18 UTC (permalink / raw)
  To: Simon Ser
  Cc: Pekka Paalanen, wayland-devel, Michel Dänzer,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Aleix Pol, Sebastian Wick,
	Joshua Ashton

On Mon, 8 May 2023 at 10:58, Simon Ser <contact@emersion.fr> wrote:
>
> On Friday, May 5th, 2023 at 21:53, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Fri, May 05, 2023 at 04:06:26PM +0000, Simon Ser wrote:
> > > On Friday, May 5th, 2023 at 17:28, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > Ok no comments from me on the actual color operations and semantics of all
> > > > that, because I have simply nothing to bring to that except confusion :-)
> > > >
> > > > Some higher level thoughts instead:
> > > >
> > > > - I really like that we just go with graph nodes here. I think that was
> > > >   bound to happen sooner or later with kms (we almost got there with
> > > >   writeback, and with hindsight maybe should have).
> > >
> > > I'd really rather not do graphs here. We only need linked lists as Sebastian
> > > said. Graphs would significantly add more complexity to this proposal, and
> > > I don't think that's a good idea unless there is a strong use-case.
> >
> > You have a graph, because a graph is just nodes + links. I did _not_
> > propose a full generic graph structure, the link pointer would be in the
> > class/type specific structure only. Like how we have the plane->crtc or
> > connector->crtc links already like that (which already _is_ a full blown
> > graph).
>
> I really don't get why a pointer in a struct makes plane->crtc a full-blown
> graph. There is only a single parent-child link. A plane has a reference to a
> CRTC, and nothing more.
>
> You could say that anything is a graph. Yes, even an isolated struct somewhere
> is a graph: one with a single node and no link. But I don't follow what's the
> point of explaining everything with a graph when we only need a much simpler
> subset of the concept of graphs?
>
> Putting the graph thing aside, what are you suggesting exactly from a concrete
> uAPI point-of-view? Introducing a new struct type? Would it be a colorop
> specific struct, or a more generic one? What would be the fields? Why do you
> think that's necessary and better than the current proposal?
>
> My understanding so far is that you're suggesting introducing something like
> this at the uAPI level:
>
>     struct drm_mode_node {
>         uint32_t id;
>
>         uint32_t children_count;
>         uint32_t *children; // list of child object IDs
>     };

Already too much, I think:

struct drm_mode_node {
    struct drm_mode_object base;
    struct drm_private_obj atomic_base;
    enum drm_mode_node_enum type;
};

The actual graph links would be in the specific type's state
structure, like they are for everything else. And the limits would be
on the property type: we probably need a new DRM_MODE_PROP_OBJECT_ENUM
to make the new limitations work correctly, since the current
DRM_MODE_PROP_OBJECT only limits to a specific type of object, not an
explicit list of drm_mode_object.id values.

You might not even need a node subclass for the state stuff, that
would directly be a drm_color_op_state that only embeds
drm_private_state.

Another uapi difference is that the new kms objects would be of type
DRM_MODE_OBJECT_NODE, and would always have a "class" property.

> I don't think this is a good idea for multiple reasons. First, this is
> overkill: we don't need this complexity, and this complexity will make it more
> difficult to reason about the color pipeline. This is a premature abstraction,
> one we don't need right now, and one I haven't heard a potential future
> use-case for. Sure, one can kill an ant with a sledgehammer if they'd like, but
> that's not the right tool for the job.
>
> Second, this will make user-space miserable. User-space already has a tricky
> task to achieve to translate its abstract descriptive color pipeline to our
> proposed simple list of color operations. If we expose a full-blown graph, then
> the user-space logic will need to handle arbitrary graphs. This will have a
> significant cost (on implementation and testing), which we will be paying in
> terms of time spent and in terms of bugs.

The color op pipeline would still be linear. I did not ask for a non-linear one.

> Last, this kind of generic "node" struct is at odds with existing KMS object
> types. So far, KMS objects are concrete like CRTC, connector, plane, etc.
> "Node" is abstract. This is inconsistent.

Yeah, I think we should change that. That's essentially the full
extent of my proposal. The classes + possible_foo mask approach has
always felt rather brittle to me (and there's plenty of userspace out
there to prove that's the case); going more explicit, with enumerated
link combos, feels better. Plus it should allow building a bit cleaner
interfaces for drivers to construct the correct graphs, because
drivers _also_ rather consistently got the entire possible_foo mask
business wrong.

> Please let me know whether the above is what you have in mind. If not, please
> explain what exactly you mean by "graphs" in terms of uAPI, and please explain
> why we need it and what real-world use-cases it would solve.

_Way_ too much graph compared to what I'm proposing :-)

Also I guess what's not clear: This is 100% a bikeshed with no impact
on the actual color handling pipeline in any semantic way. At all. If
you think it is, it's not what I mean.

I guess the misunderstanding started out with me asking for "graph
nodes" and you thinking "full blown graph structure with mandatory
flexibility". I really only wanted to bring up the slightly more
generic "node" thing, and you can totally think of them as "list
nodes" in the context of color op pipelines.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-07 23:14     ` Dave Airlie
@ 2023-05-08  9:37       ` Pekka Paalanen
  2023-05-08 10:03       ` Jonas Ådahl
  2023-05-09 14:31       ` Harry Wentland
  2 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-08  9:37 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Sebastian Wick, xaver.hugl, Aleix Pol, DRI Development,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, Joshua Ashton


On Mon, 8 May 2023 09:14:18 +1000
Dave Airlie <airlied@gmail.com> wrote:

> On Sat, 6 May 2023 at 08:21, Sebastian Wick <sebastian.wick@redhat.com> wrote:
> >
> > On Fri, May 5, 2023 at 10:40 PM Dave Airlie <airlied@gmail.com> wrote:  
> > >
> > > On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:  
> > > >
> > > > Hi all,
> > > >
> > > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > > aim to reduce the battery life impact of color management and HDR on mobile
> > > > devices, to improve performance and to decrease latency by skipping
> > > > composition on the 3D engine. This proposal is the result of discussions at
> > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > > discussion.
> > > >
> > > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > > Drivers describe the available hardware blocks in terms of low-level
> > > > mathematical operations, then user-space configures each block. We decided
> > > > against a descriptive approach where user-space would provide a high-level
> > > > description of the colorspace and other parameters: we want to give more
> > > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > seamlessly, and to avoid forcing user-space into a particular color management
> > > > policy.  
> > >
> > > I'm not 100% sold on the prescriptive here, let's see if someone can
> > > get me over the line with some questions later.

Hi Dave,

generic userspace must always be able to fall back to GPU shaders or
something else, when a window suddenly stops being eligible for a KMS
plane. That can happen due to a simple window re-stacking operation for
example, maybe a notification pops up temporarily. Hence, it is highly
desirable to be able to implement the exact same algorithm in shaders
as the display hardware does, in order to not cause visible glitches
on screen.

One way to do that is to have a prescriptive UAPI design. Userspace
decides what algorithms to use for color processing, and the UAPI simply
offers a way to implement those well-defined mathematical operations.
An alternative could be that the UAPI gives userspace back shader
programs that implement the same as what the hardware does, but... ugh.

Choosing the algorithm is policy. Userspace must be in control of
policy, right? Therefore a descriptive UAPI is simply not possible.
There is no single correct algorithm for these things, there are many
flavors, more and less correct, different quality/performance
trade-offs, and even just matters of taste. Sometimes even end user
taste, that might need to be configurable. Applications have built-in
assumptions too, and they vary.

To clarify, a descriptive UAPI is a design where userspace tells the
kernel "my source 1 is sRGB, my source 2 is BT.2100/PQ YCbCr 4:2:0 with
blahblahblah metadata, do whatever to display those on KMS planes
simultaneously". As I mentioned, there is not just one answer to that,
and we should also allow for innovation in the algorithms by everyone,
not just hardware designers.

A prescriptive UAPI is where we communicate mathematical operations
without any semantics. It is inherently free of policy in the kernel.

> > >
> > > My feeling is color pipeline hw is not a done deal, and that hw
> > > vendors will be revising/evolving/churning the hw blocks for a while
> > > longer, as there is no real standards in the area to aim for, all the
> > > vendors are mostly just doing whatever gets Windows over the line and
> > > keeps hw engineers happy. So I have some concerns here around forwards
> > > compatibility and hence the API design.
> > >
> > > I guess my main concern is if you expose a bunch of hw blocks and
> > > someone comes up with a novel new thing, will all existing userspace
> > > work, without falling back to shaders?
> > > Do we have minimum guarantees on what hardware blocks have to be
> > > exposed to build a useable pipeline?
> > > If a hardware block goes away in a new silicon revision, do I have to
> > > rewrite my compositor? or will it be expected that the kernel will
> > > emulate the old pipelines on top of whatever new fancy thing exists.  
> >
> > I think there are two answers to those questions.  
> 
> These aren't selling me much better :-)
> >
> > The first one is that right now KMS already doesn't guarantee that
> > every property is supported on all hardware. The guarantee we have is
> > that properties that are supported on a piece of hardware on a
> > specific kernel will be supported on the same hardware on later
> > kernels. The color pipeline is no different here. For a specific piece
> > of hardware a newer kernel might only change the pipelines in a
> > backwards compatible way and add new pipelines.
> >
> > So to answer your question: if some hardware with a novel pipeline
> > will show up it might not be supported and that's fine. We already
> > have cases where some hardware does not support the gamma lut property
> > but only the CSC property and that breaks night light because we never
> > bothered to write a shader fallback. KMS provides ways to offload work
> > but a generic user space always has to provide a fallback and this
> > doesn't change. Hardware specific user space on the other hand will
> > keep working with the forward compatibility guarantees we want to
> > provide.  
> 
> In my mind we've screwed up already; that isn't a case to be made for
> continuing down the same path.
> 
> The kernel is meant to be a hardware abstraction layer, not just a
> hardware exposure layer. The kernel shouldn't set policy and there are
> cases where it can't act as an abstraction layer (like where you need
> a compiler), but I'm not sold that this case is one of those yet. I'm
> open to being educated here on why it would be.

If the display hardware cannot do an operation that userspace needs,
would you have the kernel internally have a GPU shader to achieve
that operation? It could be kernel-build-time compiled...

How would you implement all of CRTC properties DEGAMMA, CTM and GAMMA
in a kernel driver when the hardware simply does not have those
operations?

Why would it be a screw-up if an API cannot deliver what hardware
cannot do?

> >
> > The second answer is that we want to provide a user space library
> > which takes a description of a color pipeline and tries to map that to
> > the available KMS color pipelines. If there is a novel color
> > operation, adding support in this library would then make it possible
> > to offload compatible color pipelines on this new hardware for all
> > consumers of the library. Obviously there is no guarantee that
> > whatever color pipeline compositors come up with can actually be
> > realized on specific hardware but that's just an inherent hardware
> > issue.
> >  
> 
> Why does this library need to be in userspace though? If there's a
> library making device dependent decisions, why can't we just make
> those device dependent decisions in the kernel?

What happened to the idea "put it in the kernel only if it has to
be in the kernel"? Userspace is much easier to work with, faster to
release, faster to fix, easier to innovate, and so on.

Kernel UAPI cannot be deprecated, which means the kernel implementation
can never get simpler. A userspace library OTOH can be left in
maintenance mode and new incompatible major version can be started,
maybe as another project, with no burden of having to keep the old
stuff working, because the old stuff will not need to be touched and it
just keeps working same as ever. There can even be several differently
designed userspace libraries for projects to choose from.

We have much less of an idea of what such a library API should look like
than the kernel UAPI proposed here. There is no Khronos committee here.
I mean, Khronos tried, right? OpenWF?

The aim is to be able to take advantage of hardware to the fullest,
which excludes the possibility of hidden copies in the kernel, which
excludes GPU fallbacks in the kernel, so it's natural the kernel UAPI
design aims to expose hardware the way it is.

> This feels like we are trying to go down the Android HWC road, but we
> aren't in that business.
> 
> My thoughts would be userspace has to have some way to describe what
> it wants anyways, otherwise it does sound like I need to update
> mutter, kwin, surfaceflinger, chromeos, gamescope, every time a new HW
> device comes out that operates slightly different to previously
> generations. This isn't the kernel doing hw abstraction at all, it's
> the kernel just opting out of designing interfaces and it isn't
> something I'm sold on.

Userspace, that does not want to be hardware-specific, always has a
fallback path, usually through Vulkan or OpenGL composition.

Even hardware-specific userspace will never regress due to a kernel
update. You have to swap out hardware in order to potentially "regress".

I never thought that swapping out hardware, and thereby losing a feature
that has never worked on the new hardware in the first place, could be seen
as a kernel regression. Have the rules changed?


Thanks,
pq


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-07 23:14     ` Dave Airlie
  2023-05-08  9:37       ` Pekka Paalanen
@ 2023-05-08 10:03       ` Jonas Ådahl
  2023-05-09 14:31       ` Harry Wentland
  2 siblings, 0 replies; 49+ messages in thread
From: Jonas Ådahl @ 2023-05-08 10:03 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Sebastian Wick, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Michel Dänzer, Uma Shankar,
	Victoria Brekenfeld, Aleix Pol, Joshua Ashton

On Mon, May 08, 2023 at 09:14:18AM +1000, Dave Airlie wrote:
> On Sat, 6 May 2023 at 08:21, Sebastian Wick <sebastian.wick@redhat.com> wrote:
> >
> > On Fri, May 5, 2023 at 10:40 PM Dave Airlie <airlied@gmail.com> wrote:
> > >
> > > On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > > aim to reduce the battery life impact of color management and HDR on mobile
> > > > devices, to improve performance and to decrease latency by skipping
> > > > composition on the 3D engine. This proposal is the result of discussions at
> > > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > > discussion.
> > > >
> > > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > > Drivers describe the available hardware blocks in terms of low-level
> > > > mathematical operations, then user-space configures each block. We decided
> > > > against a descriptive approach where user-space would provide a high-level
> > > > description of the colorspace and other parameters: we want to give more
> > > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > > seamlessly, and to avoid forcing user-space into a particular color management
> > > > policy.
> > >
> > > I'm not 100% sold on the prescriptive here, let's see if someone can
> > > get me over the line with some questions later.
> > >
> > > My feeling is color pipeline hw is not a done deal, and that hw
> > > vendors will be revising/evolving/churning the hw blocks for a while
> longer, as there are no real standards in the area to aim for, all the
> > > vendors are mostly just doing whatever gets Windows over the line and
> > > keeps hw engineers happy. So I have some concerns here around forwards
> > > compatibility and hence the API design.
> > >
> > > I guess my main concern is if you expose a bunch of hw blocks and
> > > someone comes up with a novel new thing, will all existing userspace
> > > work, without falling back to shaders?
> > > Do we have minimum guarantees on what hardware blocks have to be
> > > exposed to build a useable pipeline?
> > > If a hardware block goes away in a new silicon revision, do I have to
> > > rewrite my compositor? or will it be expected that the kernel will
> > > emulate the old pipelines on top of whatever new fancy thing exists.
> >
> > I think there are two answers to those questions.
> 
> These aren't selling me much better :-)
> >
> > The first one is that right now KMS already doesn't guarantee that
> > every property is supported on all hardware. The guarantee we have is
> > that properties that are supported on a piece of hardware on a
> > specific kernel will be supported on the same hardware on later
> > kernels. The color pipeline is no different here. For a specific piece
> > of hardware a newer kernel might only change the pipelines in a
> > backwards compatible way and add new pipelines.
> >
> > So to answer your question: if some hardware with a novel pipeline
> > will show up it might not be supported and that's fine. We already
> > have cases where some hardware does not support the gamma lut property
> > but only the CSC property and that breaks night light because we never
> > bothered to write a shader fallback. KMS provides ways to offload work
> > but a generic user space always has to provide a fallback and this
> > doesn't change. Hardware specific user space on the other hand will
> > keep working with the forward compatibility guarantees we want to
> > provide.
> 
> In my mind we've screwed up already; that isn't a case to be made for
> continuing down the same path.
> 
> The kernel is meant to be a hardware abstraction layer, not just a
> hardware exposure layer. The kernel shouldn't set policy and there are
> cases where it can't act as an abstraction layer (like where you need
> a compiler), but I'm not sold that this case is one of those yet. I'm
> open to being educated here on why it would be.

It would still be an abstraction of the hardware, just that the level
of abstraction is a bit "lower" than your intuition currently tells you
we should have. IMO it's not too different from the kernel providing low
level input events describing what the hardware can do and does,
with a rather massive user space library (libinput) turning all of that
low level nonsense into actual useful abstractions.

In this case it's the other way around, the kernel provides vendor
independent knobs that describe what the output hardware can do, and
exactly how it does it, and a userspace library turns that into a
different and perhaps more useful abstraction.

I realize input and output are dramatically different; I'm just making
a point that the ideal level of abstraction is not necessarily "the
more the better".

> 
> >
> > The second answer is that we want to provide a user space library
> > which takes a description of a color pipeline and tries to map that to
> > the available KMS color pipelines. If there is a novel color
> > operation, adding support in this library would then make it possible
> > to offload compatible color pipelines on this new hardware for all
> > consumers of the library. Obviously there is no guarantee that
> > whatever color pipeline compositors come up with can actually be
> > realized on specific hardware but that's just an inherent hardware
> > issue.
> >
> 
> Why does this library need to be in userspace though? If there's a
> library making device dependent decisions, why can't we just make
> those device dependent decisions in the kernel?

Compositors will want to switch between using the KMS color pipeline and
using shaders without visible differences at any point in time, or
predictable visible differences. Let's say for example you are playing a
video, and everything bypasses the GPU, and we're saving important power
and all that. Suddenly the user moves the mouse or touches the screen,
and we then have overlays that make it necessary to stop bypassing the
GPU and start compositing.

These transitions should not result in any visible difference, and that
is hard/impossible to do perfectly if the level of abstraction is too
high, as implementation details of the pipeline would be hidden. The
decisions the kernel had to make to turn the descriptive declaration
into actual hardware configuration wouldn't be predictable or known.

Userspace needs to know how the kernel implements a pipeline, so that it
can decide if it's "good enough" or perhaps even adapt its compositing
to match it so that it can implement non-glitchy offloading. The
compositor should decide whether pixel-perfect transitions are
mandatory; that shouldn't be a policy implemented inside the kernel.

It is also within the scope of a library that provides the descriptive
API to know how to handle the fallback, e.g. by providing shaders that
compositors can use when compositing. The kernel is the wrong place to
generate shaders.
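The transition policy described above can be sketched roughly as follows
(an illustrative sketch only; the names are hypothetical, not from any
real compositor or kernel header). The compositor probes the pipeline,
here modeled simply by the boolean result of a test-only atomic commit,
and falls back to its own shader implementation on rejection:

```c
#include <assert.h>
#include <stdbool.h>

enum composite_path {
	PATH_KMS_OFFLOAD,     /* let the display hardware do the color work */
	PATH_SHADER_FALLBACK, /* composite with GPU shaders instead */
};

/*
 * The compositor, not the kernel, owns this policy: it asks the driver
 * whether the pipeline configuration is acceptable (e.g. via a
 * TEST_ONLY atomic commit, summarized here as a boolean) and falls back
 * to a shader implementation of the same pipeline on rejection.
 */
static enum composite_path choose_path(bool test_commit_succeeded)
{
	return test_commit_succeeded ? PATH_KMS_OFFLOAD : PATH_SHADER_FALLBACK;
}
```

Because the shader path implements the same pipeline the compositor
asked the kernel for, switching paths at an arbitrary frame boundary
need not produce a visible glitch.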

> 
> This feels like we are trying to go down the Android HWC road, but we
> aren't in that business.
> 
> My thoughts would be userspace has to have some way to describe what
> it wants anyways, otherwise it does sound like I need to update
> mutter, kwin, surfaceflinger, chromeos, gamescope, every time a new HW
> device comes out that operates slightly different to previously
> generations. This isn't the kernel doing hw abstraction at all, it's
> the kernel just opting out of designing interfaces and it isn't
> something I'm sold on.

It is true that a new generation of hardware that changes the color
pipeline in a way that leaves existing userspace unable to offload
compositing needs an updated userspace, but the grand long-term idea is
that one wouldn't update all those compositors, only the shared library
that provides the descriptive API they all (ideally) make use of. The
compositors still handle interacting with KMS themselves, but would
share this helper library that helps configure a subset of the knobs
KMS provides to their individual needs.

So indeed the kernel isn't doing all the abstraction, it'd be the kernel
together with a userspace library.

Jonas

> 
> Dave.
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-08  9:18         ` Daniel Vetter
@ 2023-05-08 18:10           ` Harry Wentland
  0 siblings, 0 replies; 49+ messages in thread
From: Harry Wentland @ 2023-05-08 18:10 UTC (permalink / raw)
  To: Daniel Vetter, Simon Ser
  Cc: Pekka Paalanen, wayland-devel, Michel Dänzer,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Aleix Pol, Sebastian Wick,
	Joshua Ashton



On 5/8/23 05:18, Daniel Vetter wrote:
> On Mon, 8 May 2023 at 10:58, Simon Ser <contact@emersion.fr> wrote:
>>
>> On Friday, May 5th, 2023 at 21:53, Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>>> On Fri, May 05, 2023 at 04:06:26PM +0000, Simon Ser wrote:
>>>> On Friday, May 5th, 2023 at 17:28, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>
>>>>> Ok no comments from me on the actual color operations and semantics of all
>>>>> that, because I have simply nothing to bring to that except confusion :-)
>>>>>
>>>>> Some higher level thoughts instead:
>>>>>
>>>>> - I really like that we just go with graph nodes here. I think that was
>>>>>   bound to happen sooner or later with kms (we almost got there with
>>>>>   writeback, and with hindsight maybe should have).
>>>>
>>>> I'd really rather not do graphs here. We only need linked lists as Sebastian
>>>> said. Graphs would significantly add more complexity to this proposal, and
>>>> I don't think that's a good idea unless there is a strong use-case.
>>>
>>> You have a graph, because a graph is just nodes + links. I did _not_
>>> propose a full generic graph structure, the link pointer would be in the
>>> class/type specific structure only. Like how we have the plane->crtc or
>>> connector->crtc links already like that (which already _is_ a full blown
>>> graph).
>>
>> I really don't get why a pointer in a struct makes plane->crtc a full-blown
>> graph. There is only a single parent-child link. A plane has a reference to a
>> CRTC, and nothing more.
>>
>> You could say that anything is a graph. Yes, even an isolated struct somewhere
>> is a graph: one with a single node and no link. But I don't follow what's the
>> point of explaining everything with a graph when we only need a much simpler
>> subset of the concept of graphs?
>>
>> Putting the graph thing aside, what are you suggesting exactly from a concrete
>> uAPI point-of-view? Introducing a new struct type? Would it be a colorop
>> specific struct, or a more generic one? What would be the fields? Why do you
>> think that's necessary and better than the current proposal?
>>
>> My understanding so far is that you're suggesting introducing something like
>> this at the uAPI level:
>>
>>     struct drm_mode_node {
>>         uint32_t id;
>>
>>         uint32_t children_count;
>>         uint32_t *children; // list of child object IDs
>>     };
> 
> Already too much I think
> 
> struct drm_mode_node {
>     struct drm_mode_object base;
>     struct drm_private_obj atomic_base;
>     enum drm_mode_node_enum type;
> };
> 

This would be about as much as we would want for a 'node' struct, for
reasons that others already outlined. In short, a good API for a color
pipeline needs to do a good job of communicating the constraints. Hence
the "next" pointer needs to live in a colorop struct, whether it's a
drm_private_obj or its own thing.

I'm not quite seeing much benefit in a drm_mode_node other than being
able to have a GET_NODE IOCTL instead of a GET_COLOROP, the former
being reusable for future scenarios that might need a "node". I feel
this adds a layer of confusion to the API.
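A minimal sketch of the colorop-with-its-own-link shape could look like
this (purely illustrative; these are not actual kernel structs or field
names, and the real objects would embed drm_mode_object/drm_private_obj
as discussed above):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: the colorop carries its own "next" link, so the
 * fixed pipeline ordering (and thus the hardware constraint) is
 * explicit in the type rather than in a generic "node". */
struct colorop {
	unsigned int id;            /* KMS object ID */
	unsigned int type;          /* e.g. bypass, 1D curve, matrix */
	const struct colorop *next; /* next op in the pipeline, NULL at tail */
};

/* Walking the immutable "next" links enumerates a whole pipeline,
 * exactly the discovery step userspace would perform. */
static int pipeline_length(const struct colorop *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

Userspace discovery then reduces to a plain list walk, with no generic
graph traversal required.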

Harry

> The actual graph links would be in the specific type's state
> structure, like they are for everything else. And the limits would be
> on the property type, we probably need a new DRM_MODE_PROP_OBJECT_ENUM
> to make the new limitations work correctly, since the current
> DRM_MODE_PROP_OBJECT only limits to a specific type of object, not an
> explicit list of drm_mode_object.id.
> 
> You might not even need a node subclass for the state stuff, that
> would directly be a drm_color_op_state that only embeds
> drm_private_state.
> 
> Another uapi difference is that the new kms objects would be of type
> DRM_MODE_OBJECT_NODE, and would always have a "class" property.
> 
>> I don't think this is a good idea for multiple reasons. First, this is
>> overkill: we don't need this complexity, and this complexity will make it more
>> difficult to reason about the color pipeline. This is a premature abstraction,
>> one we don't need right now, and one I haven't heard of a potential future
>> use-case for. Sure, one can kill an ant with a sledgehammer if they'd like, but
>> that's not the right tool for the job.
>>
>> Second, this will make user-space miserable. User-space already has a tricky
>> task to achieve to translate its abstract descriptive color pipeline to our
>> proposed simple list of color operations. If we expose a full-blown graph, then
>> the user-space logic will need to handle arbitrary graphs. This will have a
>> significant cost (on implementation and testing), which we will be paying in
>> terms of time spent and in terms of bugs.
> 
> The color op pipeline would still be linear. I did not ask for a non-linear one.
> 
>> Last, this kind of generic "node" struct is at odds with existing KMS object
>> types. So far, KMS objects are concrete like CRTC, connector, plane, etc.
>> "Node" is abstract. This is inconsistent.
> 
> Yeah, I think we should change that. That's essentially the
> full extent of my proposal. The classes + possible_foo mask approach
> just always felt rather brittle to me (and there's plenty of userspace
> out there to prove that's the case), going more explicit with the
> links with enumerated combos feels better. Plus it should allow
> building a bit cleaner interfaces for drivers to construct the correct
> graphs, because drivers _also_ rather consistently got the entire
> possible_foo mask business wrong.
> 
>> Please let me know whether the above is what you have in mind. If not, please
>> explain what exactly you mean by "graphs" in terms of uAPI, and please explain
>> why we need it and what real-world use-cases it would solve.
> 
> _Way_ too much graph compared to what I'm proposing :-)
> 
> Also I guess what's not clear: This is 100% a bikeshed with no impact
> on the actual color handling pipeline in any semantic way. At all. If
> you think it is, it's not what I mean.
> 
> I guess the misunderstanding started out with me asking for "graph
> nodes" and you thinking "full blown graph structure with mandatory
> flexibility". I really only wanted to bring up the slightly more
> generic "node" think, and you can totally think of them as "list
> nodes" in the context of color op pipelines.
> -Daniel


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-04 15:22 [RFC] Plane color pipeline KMS uAPI Simon Ser
                   ` (3 preceding siblings ...)
  2023-05-05 20:40 ` Dave Airlie
@ 2023-05-09  8:04 ` Pekka Paalanen
       [not found] ` <4341dac6-ada1-2a75-1c22-086d96408a85@quicinc.com>
  5 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-09  8:04 UTC (permalink / raw)
  To: Simon Ser
  Cc: xaver.hugl, DRI Development, wayland-devel, Melissa Wen,
	Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Joshua Ashton, Sebastian Wick

[-- Attachment #1: Type: text/plain, Size: 6480 bytes --]

On Thu, 04 May 2023 15:22:59 +0000
Simon Ser <contact@emersion.fr> wrote:

> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, ordering and
> capabilities of hardware blocks is different on each display engine. So, we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> properties and shows the new generic properties when enabled.

Hi,

I have some further ideas which do conflict with some existing KMS
properties. This dives into the color encoding specific side of the
UAPI.

The main idea is to make the color pipeline not specific to RGB. We
might actually be having YCbCr, XYZ, ICtCp and whatnot instead, at
least in the middle of a pipeline. The aim is to avoid the confusion
from statements like "my red channel is actually luma and not red". So
it's purely syntactic. ISTR some people being against saying "R is just
a channel name, it's not necessarily a red component."

Therefore I propose to address the color channels with indices instead:

	ch 0, ch 1, ch 2, ch 3

Then we define the mapping between pixel and wire formats and the
indices:

	R = ch 0
	G = ch 1
	B = ch 2
	A = ch 3

	Y = ch 0
	U = ch 1
	V = ch 2

If necessary, the following can also be defined:

	Z = ch 1
	X = ch 2
	L = ch 0
	M = ch 1
	S = ch 2

The Y from YUV and Y from XYZ share the designation for the name's sake
although they are not the same quantity. If YUV is not a well-defined
designation wrt. YCbCr, ICtCp and everything else in the same category,
we can assign Cb, Cr, I, Ct, Cp etc. instead. That might be more clear
anyway even if there is a popular convention.

We can also choose differently, to e.g. match the H.273 mapping where
channels are assigned such that Y=G, Cr=R, Cb=B. H.273 gives mappings
between almost all of these, so if using those make more sense, then
let's use those. In the end it shouldn't matter too much, since one
does not arbitrarily mix channels from different formats. Special care
needs to be taken when defining COLOROP elements that do not handle all
channels interchangeably. (E.g. a curve set element is mostly
channel-agnostic when it applies the same curve to channels 0-2, but ch
3 is pass-through.)

Then, we define COLOROP elements in terms of channel indices. This
removes any implied connection to any specific color coding. Elements
that just do not make sense for arbitrary channel ordering, e.g.
special-case elements or enumerated matrix elements with a specific
purpose, will document the intended usage and the expected channel
mapping.
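As a hedged illustration (the names and the helper functions below are
made up for this sketch, not from any UAPI header), the index-based
addressing and its per-format meaning could look like:

```c
#include <assert.h>

/* Channels are addressed purely by index; the pixel/wire format decides
 * what each index means. Values are illustrative, not a proposed ABI. */
enum color_channel { CH0, CH1, CH2, CH3 };

/* RGBA mapping: R = ch 0, G = ch 1, B = ch 2, A = ch 3. */
static const char *rgba_name(enum color_channel ch)
{
	static const char *const names[] = { "R", "G", "B", "A" };

	return names[ch];
}

/* YUV mapping: Y = ch 0, U = ch 1, V = ch 2; no fourth channel. */
static const char *yuv_name(enum color_channel ch)
{
	static const char *const names[] = { "Y", "U", "V", "-" };

	return names[ch];
}
```

A curve-set COLOROP would then be documented as, say, "applies one curve
per channel 0-2, channel 3 pass-through", with no mention of R, G or B.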

The main reason to do all this is to ultimately allow e.g. limited
range YCbCr scanout with a fully pass-through pipeline with no implied
conversion to or from RGB.

This is where some existing KMS properties will conflict: those that
affect how current implicit YUV-RGB conversions are done. These
properties shall be replaced with COLOROP elements in pipelines, so
that they can be controlled explicitly and we can know where they
reside wrt. e.g. sampling operations. Chroma channel reconstruction
from sub-sampled chroma planes could potentially be explicitly
represented, and so it could also be controlled (chroma siting).

"Nominal" or "normalized" color value encoding at the input and output
of each COLOROP element needs to be defined as well. Some elements,
like matrices, can theoretically handle arbitrary values, but some
elements like LUTs are inherently limited in their input values.
Regardless of how a LUT is defined, it almost always assumes an input
range [0.0, 1.0] of nominal color value encoding. This is particularly
important for cases where out-of-unit-range values are expected:
- scRGB
- maybe some HDR encodings
- YCbCr and anything else with chroma which is normally [-0.5, 0.5]
- xvYCC

Most likely we need to find some way to map everything to the nominal
range [0.0, 1.0] if at all possible. We also need to know how elements
handle input values outside of that range.
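For the chroma case, for instance, the re-encoding into and out of the
nominal range is just an affine shift (an illustrative sketch only, not
a proposed UAPI; how a LUT treats inputs that remain outside [0.0, 1.0],
e.g. scRGB, would still need to be specified separately):

```c
#include <assert.h>

/* Illustrative: map a chroma sample, nominally in [-0.5, 0.5], into the
 * [0.0, 1.0] nominal range a LUT element expects, and back. Clamping or
 * extrapolation policy for out-of-range inputs is deliberately left
 * out, since that is exactly what the UAPI would have to define. */
static double chroma_to_nominal(double c) { return c + 0.5; }
static double nominal_to_chroma(double n) { return n - 0.5; }
```

Whether such a shift is an implicit part of a LUT element or an explicit
COLOROP of its own is one of the things the pipeline description would
make visible to userspace.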

The KMS property to control the scaling filter would be replaced
by a scaling COLOROP element with a changeable filter property.

All this applies to pre-blending pipelines already. For the
post-blending pipeline we have the added complication of drivers
automatically choosing the wire format, but maybe that can be
hand-waved with a special-purpose COLOROP element that does whatever
the driver chooses, until it can be fully controlled from userspace.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
       [not found]         ` <20230508185409.07501f40@n2pa>
@ 2023-05-09  8:17           ` Pekka Paalanen
  0 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-09  8:17 UTC (permalink / raw)
  To: Steven Kucharzyk; +Cc: DRI Development, wayland-devel

[-- Attachment #1: Type: text/plain, Size: 1428 bytes --]

On Mon, 8 May 2023 18:54:09 -0500
Steven Kucharzyk <stvr_8888@comcast.net> wrote:

> I'd like to ask if there is a block/flow chart/diagram that has been
> created that represent the elements that are being discussed for this
> RFC? If so, would you be so kind as to point me to it or send it to me?

Hi Steven,

the whole point of the design is that there is no predefined block
diagram or flow chart. It would not fit hardware well, as hardware
generations and vendors do not generally have a common design. Instead,
the idea is to model what the hardware can do, and for that each driver
will create a set of specific pipelines the hardware implements.
Userspace then chooses a pipeline that suits it and populates its
parameters.

As for the elements themselves, we can hopefully define some commonly
available types, but undoubtedly there will be a few hardware-specific
elements as well. Otherwise some piece of special hardware functionality
cannot be used at all.

The job of defining a generic pipeline model and mapping that to actual
hardware elements is left for a userspace library. I expect there will
be multiple pipeline models, more to be introduced over time. Hence
putting that in a userspace library instead of carving it in stone in
the kernel UAPI.


Next time, please do use reply-to-all, you have again dropped everyone
and other mailing lists from the CC.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 13:30   ` Joshua Ashton
  2023-05-05 14:16     ` Pekka Paalanen
@ 2023-05-09 11:23     ` Melissa Wen
  2023-05-09 11:47       ` Pekka Paalanen
  2023-05-11 21:21     ` Simon Ser
  2 siblings, 1 reply; 49+ messages in thread
From: Melissa Wen @ 2023-05-09 11:23 UTC (permalink / raw)
  To: Joshua Ashton
  Cc: Jonas Ådahl, DRI Development, xaver.hugl,
	Victoria Brekenfeld, Pekka Paalanen, Uma Shankar,
	Michel Dänzer, Aleix Pol, Sebastian Wick, wayland-devel

[-- Attachment #1: Type: text/plain, Size: 17724 bytes --]

On 05/05, Joshua Ashton wrote:
> Some corrections and replies inline.
> 
> On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Thu, 04 May 2023 15:22:59 +0000
> > Simon Ser <contact@emersion.fr> wrote:
> >
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > aim to reduce the battery life impact of color management and HDR on mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> >
> > Hi Simon,
> >
> > this is an excellent write-up, thank you!
> >
> > Harry's question about what constitutes UAPI is a good one for danvet.
> >
> > I don't really have much to add here, a couple inline comments. I think
> > this could work.
> >
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color management
> > > policy.
> > >
> > > We've decided against mirroring the existing CRTC properties
> > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> > > pipeline can significantly differ between vendors and this approach cannot
> > > accurately abstract all hardware. In particular, the availability, ordering and
> > > capabilities of hardware blocks is different on each display engine. So, we've
> > > decided to go for a highly detailed hardware capability discovery.
> > >
> > > This new uAPI should not be in conflict with existing standard KMS properties,
> > > since there are none which control the pre-blending color pipeline at the
> > > moment. It does conflict with any vendor-specific properties like
> > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > properties. Drivers will need to either reject atomic commits configuring both
> > > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > > properties and shows the new generic properties when enabled.
> > >
> > > To use this uAPI, first user-space needs to discover hardware capabilities via
> > > KMS objects and properties, then user-space can configure the hardware via an
> > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > >
> > > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > > property is an enum, each enum entry represents a color pipeline supported by
> > > the hardware. The special zero entry indicates that the pipeline is in
> > > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > > primary plane with 2 supported pipelines but currently configured in bypass
> > > mode:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     ├─ …
> > >     └─ "color_pipeline": enum {0, 42, 52} = 0
> > >
> > > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > > objects. The entry value is an object ID pointing to the head of the linked
> > > list (the first operation in the color pipeline).
> > >
> > > The new COLOROP objects also expose a number of KMS properties. Each has a
> > > type, a reference to the next COLOROP object in the linked list, and other
> > > type-specific properties. Here is an example for a 1D LUT operation:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> > > entries, then set "lut_data" to the blob ID. Other color operation types might
> > > have different properties.
> > >
> > > Here is another example with a 3D LUT:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 33
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > And one last example with a matrix:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> > > a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> > > blocks which can be bypassed instead.]
> > >
> > > [Jonas note: perhaps a single "data" property for both LUTs and matrices
> > > would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> > >
> > > If some hardware supports re-ordering operations in the color pipeline, the
> > > driver can expose multiple pipelines with different operation ordering, and
> > > user-space can pick the ordering it prefers by selecting the right pipeline.
> > > The same scheme can be used to expose hardware blocks supporting multiple
> > > precision levels.
> > >
> > > That's pretty much all there is to it, but as always the devil is in the
> > > details.
> > >
> > > First, we realized that we need a way to indicate where the scaling operation
> > > is happening. The contents of the framebuffer attached to the plane might be
> > > scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> > > the colorspace scaling is applied in, the result will be different, so we need
> > > a way for the kernel to indicate which hardware blocks are pre-scaling, and
> > > which ones are post-scaling. We introduce a special "scaling" operation type,
> > > which is part of the pipeline like other operations but serves an informational
> > > role only (effectively, the operation cannot be configured by user-space, all
> > > of its properties are immutable). For example:
> > >
> > >     Color operation 43
> > >     ├─ "type": immutable enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> >
> > I like this.
> >
> > >
> > > [Simon note: an alternative would be to split the color pipeline into two, by
> > > having two plane properties ("color_pipeline_pre_scale" and
> > > "color_pipeline_post_scale") instead of a single one. This would be similar to
> > > the way we want to split pre-blending and post-blending. This could be less
> > > expressive for drivers, there may be hardware where there are dependencies
> > > between the pre- and post-scaling pipeline?]
> > >
> > > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > > contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> > > be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> > > where user-space provides a high-level description of the colorspace
> > > conversions it needs to perform, and this is at odds with our KMS uAPI
> > > proposal. To address this issue, we suggest adding a special block type which
> > > describes a fixed conversion from one colorspace to another and cannot be
> > > configured by user-space. Then user-space will need to accommodate its pipeline
> > > for these special blocks. Such fixed hardware blocks need to be well enough
> > > documented so that they can be implemented via shaders.
> > >
> > > We also noted that it should always be possible for user-space to completely
> > > disable the color pipeline and switch back to bypass/identity without a
> > > modeset. Some drivers will need to fail atomic commits for some color
> > > pipelines, in particular for some specific LUT payloads. For instance, AMD
> > > doesn't support curves which are too steep, and Intel doesn't support curves
> > > which decrease. This isn't something which routinely happens, but there might
> > > be more cases where the hardware needs to reject the pipeline. Thus, when
> > > user-space has a running KMS color pipeline, then hits a case where the
> > > pipeline cannot keep running (gets rejected by the driver), user-space needs to
> > > be able to immediately fall back to shaders without any glitch. This doesn't
> > > seem to be an issue for AMD, Intel and NVIDIA.
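The fallback flow can be sketched with a toy driver-side check. The slope limit below is invented purely for illustration, and a real client would probe with a TEST_ONLY atomic commit rather than call a local validator; the point is the pattern of trying the KMS pipeline first and falling back to shaders on rejection:

```c
#include <assert.h>
#include <stddef.h>

/* Invented threshold: real limits are hardware-specific and enforced
 * by the driver's atomic check, not by user-space code like this. */
#define MAX_SLOPE 16.0

/* Hypothetical model of a driver rejecting a 1D LUT whose curve is
 * too steep (the AMD case above). The LUT is assumed uniform on [0,1].
 * Returns 0 on acceptance, -1 on rejection. */
static int check_lut_slope(const double *lut, size_t len)
{
    size_t i;
    for (i = 1; i < len; i++) {
        double slope = (lut[i] - lut[i - 1]) * (double)(len - 1);
        if (slope > MAX_SLOPE)
            return -1;
    }
    return 0;
}

/* The user-space pattern: test the pipeline first (think
 * DRM_MODE_ATOMIC_TEST_ONLY) and fall back to shader composition on
 * rejection, without tearing down the output. */
static const char *commit_or_fallback(const double *lut, size_t len)
{
    return check_lut_slope(lut, len) == 0 ? "kms" : "shader";
}
```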
> > >
> > > This uAPI is extensible: we can add more color operations, and we can add more
> > > properties for each color operation type. For instance, we might want to add
> > > support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> > > the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> > > to keep the scope of the proposal manageable.
> > >
> > > Later on, we plan to re-use the same machinery for post-blending color
> > > pipelines. There are some more details about post-blending which have been
> > > separately debated at the hackfest, but we believe it's a viable plan. This
> > > solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT properties, so
> > > we'd like to introduce a client cap to hide the old properties and show the new
> > > post-blending color pipeline properties.
> > >
> > > We envision a future user-space library to translate a high-level descriptive
> > > color pipeline into a low-level prescriptive KMS color pipeline ("libliftoff but
> > > for color pipelines"). The library could also offer a translation into shaders.
> > > This should help share more infrastructure between compositors and ease KMS
> > > offloading. This should also help dealing with the NVIDIA case.
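A minimal sketch of what the core matching step of such a library might look like. The operation names mirror the AMD example below; the function and its assumption that unneeded blocks can be set to Bypass are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical core of a descriptive-to-prescriptive translator:
 * match a wanted sequence of operation types against one advertised
 * pipeline. "Scaling" is informational and skipped; any other block
 * the caller does not need is assumed to support Bypass. */
static int pipeline_matches(const char *const *pipeline, size_t plen,
                            const char *const *wanted, size_t wlen)
{
    size_t w = 0;
    size_t i;
    for (i = 0; i < plen; i++) {
        if (strcmp(pipeline[i], "Scaling") == 0)
            continue;           /* informational only */
        if (w < wlen && strcmp(pipeline[i], wanted[w]) == 0)
            w++;                /* this block implements the next step */
        /* otherwise: leave the block in Bypass */
    }
    return w == wlen;           /* every wanted step found, in order */
}
```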
> > >
> > > To wrap things up, let's take a real-world example: how would gamescope [2]
> > > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> > >
> > > AMD would expose the following objects and properties:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     └─ "color_pipeline": enum {0, 42} = 0
> > >     Color operation 42 (input CSC)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >     Color operation 43
> > >     ├─ "type": enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> > >     Color operation 44 (DeGamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > >     └─ "next": immutable color operation ID = 45
> 
> Some vendors have per-tap degamma and some have a degamma after the sample.
> How do we distinguish that behaviour?
> It is important to know.
> 
> > >     Color operation 45 (gamut remap)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 46
> > >     Color operation 46 (shaper LUT RAM)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 47
> > >     Color operation 47 (3D LUT RAM)
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 17
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 48
> > >     Color operation 48 (blend gamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 0
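The "next" properties above form a singly linked list terminated by ID 0. A sketch of how user-space might walk it, with plain structs standing in for the drmModeObjectGetProperties() calls a real client would issue:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory stand-in for the color-operation objects a real client
 * would read back from the kernel. */
struct color_op {
    uint32_t id;
    const char *type;   /* "Matrix", "Scaling", "1D curve", "3D LUT" */
    uint32_t next;      /* ID of the next op; 0 terminates the list */
};

static const struct color_op *find_op(const struct color_op *ops,
                                      size_t n, uint32_t id)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (ops[i].id == id)
            return &ops[i];
    return NULL;
}

/* Walk the pipeline starting from the ID held in the plane's
 * "color_pipeline" property; returns the number of ops visited. */
static size_t walk_pipeline(const struct color_op *ops, size_t n,
                            uint32_t start)
{
    size_t count = 0;
    uint32_t id;
    for (id = start; id != 0; ) {
        const struct color_op *op = find_op(ops, n, id);
        if (!op)
            break;              /* malformed list: stop */
        count++;
        id = op->next;
    }
    return count;
}
```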
> > >
> > > To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> > > display, gamescope would perform an atomic commit with the following property
> > > values:
> > >
> > >     Plane 10
> > >     └─ "color_pipeline" = 42
> > >     Color operation 42 (input CSC)
> > >     └─ "matrix_data" = PQ → scRGB (TF)
> 
> ^
> Not sure what this is.
> We don't use an input CSC before degamma.
> 
> > >     Color operation 44 (DeGamma)
> > >     └─ "type" = Bypass
> 
> ^
> If we did PQ, this would be PQ -> Linear / 80
> If this was sRGB, it'd be sRGB -> Linear
> If this was scRGB this would be just treating it as it is. So... Linear / 80.
> 
> > >     Color operation 45 (gamut remap)
> > >     └─ "matrix_data" = scRGB (TF) → PQ
> 
> ^
> This is wrong, we just use this to do scRGB primaries (709) to 2020.
> 
> We then go from scRGB -> PQ to go into our shaper + 3D LUT.
> 
> > >     Color operation 46 (shaper LUT RAM)
> > >     └─ "lut_data" = PQ → Display native
> 
> ^
> "Display native" is just the response curve of the display.
> In HDR10, this would just be PQ -> PQ
> If we were doing HDR10 on SDR, this would be PQ -> Gamma 2.2 (mapped
> from 0 to display native luminance) [with a potential bit of headroom
> for tonemapping in the 3D LUT]
> For SDR on HDR10 this would be Gamma 2.2 -> PQ (Not intending to start
> an sRGB vs G2.2 argument here! :P)
> 
> > >     Color operation 47 (3D LUT RAM)
> > >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > >     Color operation 48 (blend gamma)
> > >     └─ "1d_curve_type" = PQ
> 
> ^
> This is wrong, this should be Display Native -> Linearized Display Referred

This is a good point to discuss. I understand for the HDR10 case that we
are just setting an enumerated TF (that is PQ for this case - correct me
if I got it wrong) but, unlike when we use a user-LUT, we don't know
from the API that this enumerated TF value with an empty LUT is used for
linearizing/degamma. Perhaps this could come as a pair? Any idea?

> 
> >
> > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > electrical values is certainly surprising, so the example here is a
> > bit odd, but I don't think that hurts the intention of demonstration.
> 
> I have done some corrections inline.
> 
> You can see our fully correct color pipeline here:
> https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> 
> Please let me know if you have any more questions about our color pipeline.
> 
> >
> > Btw. ISTR that if you want to do scaling properly with alpha channel,
> > you need optical values multiplied by alpha. Alpha vs. scaling is just
> > yet another thing to look into, and TF operations do not work with
> > pre-mult.
> 
> What are your concerns here?
> 
> Having pre-multiplied alpha is fine with a TF: the alpha was
> premultiplied in linear, then encoded with the TF by the client.
> If you think of a TF as something relative to a bunch of
> reference state or whatever then you might think "oh you can't do
> that!", but you really can.
> It's really best to just think of it as a mathematical encoding of a
> value in all instances that we touch.
> 
> The only issue is that you lose precision from having pre-multiplied
> alpha as it's quantized to fit into the DRM format rather than using
> the full range then getting divided by the alpha at blend time.
> In my experience, however, it never ends up being a visible issue at 8bpc.
> 
> Thanks
>  - Joshie 🐸✨
> 
> >
> >
> > Thanks,
> > pq
> >
> > >
> > > I hope comparing these properties to the diagrams linked above can help
> > > understand how the uAPI would be used and give an idea of its viability.
> > >
> > > Please feel free to provide feedback! It would be especially useful to have
> > > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > > would work there.
> > >
> > > Unless there is a show-stopper, we plan to follow up this RFC with
> > > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> > >
> > > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > > Let's work together to make this happen!
> > >
> > > Simon, on behalf of the hackfest participants
> > >
> > > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > > [2]: https://github.com/ValveSoftware/gamescope
> > > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
> >

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-09 11:23     ` Melissa Wen
@ 2023-05-09 11:47       ` Pekka Paalanen
  2023-05-09 17:01         ` Melissa Wen
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-09 11:47 UTC (permalink / raw)
  To: Melissa Wen
  Cc: xaver.hugl, DRI Development, wayland-devel, Victoria Brekenfeld,
	Jonas Ådahl, Uma Shankar, Joshua Ashton, Michel Dänzer,
	Aleix Pol, Sebastian Wick


On Tue, 9 May 2023 10:23:49 -0100
Melissa Wen <mwen@igalia.com> wrote:

> On 05/05, Joshua Ashton wrote:
> > Some corrections and replies inline.
> > 
> > On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > >
> > > On Thu, 04 May 2023 15:22:59 +0000
> > > Simon Ser <contact@emersion.fr> wrote:
> > >  

...

> > > >     Color operation 47 (3D LUT RAM)
> > > >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > > >     Color operation 48 (blend gamma)
> > > >     └─ "1d_curve_type" = PQ  
> > 
> > ^
> > This is wrong, this should be Display Native -> Linearized Display Referred  
> 
> This is a good point to discuss. I understand for the HDR10 case that we
> are just setting an enumerated TF (that is PQ for this case - correct me
> if I got it wrong) but, unlike when we use a user-LUT, we don't know
> from the API that this enumerated TF value with an empty LUT is used for
> linearizing/degamma. Perhaps this could come as a pair? Any idea?

PQ curve is an EOTF, so it's always from electrical to optical.

Are you asking for something like

"1d_curve_type" = "PQ EOTF"

vs.

"1d_curve_type" = "inverse PQ EOTF"?

I think that's how it should work. It's not a given that if a
hardware block can do a curve, it can also do its inverse. They need to
be advertised explicitly.


Thanks,
pq

ps. I picked my nick in the 90s. Any resemblance to Perceptual
Quantizer is unintended. ;-)


> > >
> > > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > > electrical values is certainly surprising, so the example here is a
> > > bit odd, but I don't think that hurts the intention of demonstration.  
> > 
> > I have done some corrections inline.
> > 
> > You can see our fully correct color pipeline here:
> > https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > 
> > Please let me know if you have any more questions about our color pipeline.



* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-07 23:14     ` Dave Airlie
  2023-05-08  9:37       ` Pekka Paalanen
  2023-05-08 10:03       ` Jonas Ådahl
@ 2023-05-09 14:31       ` Harry Wentland
  2023-05-09 19:53         ` Dave Airlie
  2 siblings, 1 reply; 49+ messages in thread
From: Harry Wentland @ 2023-05-09 14:31 UTC (permalink / raw)
  To: Dave Airlie, Sebastian Wick
  Cc: Aleix Pol, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, Joshua Ashton



On 5/7/23 19:14, Dave Airlie wrote:
> On Sat, 6 May 2023 at 08:21, Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>
>> On Fri, May 5, 2023 at 10:40 PM Dave Airlie <airlied@gmail.com> wrote:
>>>
>>> On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> The goal of this RFC is to expose a generic KMS uAPI to configure the color
>>>> pipeline before blending, ie. after a pixel is tapped from a plane's
>>>> framebuffer and before it's blended with other planes. With this new uAPI we
>>>> aim to reduce the battery life impact of color management and HDR on mobile
>>>> devices, to improve performance and to decrease latency by skipping
>>>> composition on the 3D engine. This proposal is the result of discussions at
>>>> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
>>>> familiar with the AMD, Intel and NVIDIA hardware have participated in the
>>>> discussion.
>>>>
>>>> This proposal takes a prescriptive approach instead of a descriptive approach.
>>>> Drivers describe the available hardware blocks in terms of low-level
>>>> mathematical operations, then user-space configures each block. We decided
>>>> against a descriptive approach where user-space would provide a high-level
>>>> description of the colorspace and other parameters: we want to give more
>>>> control and flexibility to user-space, e.g. to be able to replicate exactly the
>>>> color pipeline with shaders and switch between shaders and KMS pipelines
>>>> seamlessly, and to avoid forcing user-space into a particular color management
>>>> policy.
>>>
>>> I'm not 100% sold on the prescriptive here, let's see if someone can
>>> get me over the line with some questions later.
>>>
>>> My feeling is color pipeline hw is not a done deal, and that hw
>>> vendors will be revising/evolving/churning the hw blocks for a while
>>> longer, as there is no real standards in the area to aim for, all the
>>> vendors are mostly just doing whatever gets Windows over the line and
>>> keeps hw engineers happy. So I have some concerns here around forwards
>>> compatibility and hence the API design.
>>>
>>> I guess my main concern is if you expose a bunch of hw blocks and
>>> someone comes up with a novel new thing, will all existing userspace
>>> work, without falling back to shaders?
>>> Do we have minimum guarantees on what hardware blocks have to be
>>> exposed to build a useable pipeline?
>>> If a hardware block goes away in a new silicon revision, do I have to
>>> rewrite my compositor? or will it be expected that the kernel will
>>> emulate the old pipelines on top of whatever new fancy thing exists.
>>
>> I think there are two answers to those questions.
> 
> These aren't selling me much better :-)
>>
>> The first one is that right now KMS already doesn't guarantee that
>> every property is supported on all hardware. The guarantee we have is
>> that properties that are supported on a piece of hardware on a
>> specific kernel will be supported on the same hardware on later
>> kernels. The color pipeline is no different here. For a specific piece
>> of hardware a newer kernel might only change the pipelines in a
>> backwards compatible way and add new pipelines.
>>
>> So to answer your question: if some hardware with a novel pipeline
>> shows up, it might not be supported and that's fine. We already
>> have cases where some hardware does not support the gamma lut property
>> but only the CSC property and that breaks night light because we never
>> bothered to write a shader fallback. KMS provides ways to offload work
>> but a generic user space always has to provide a fallback and this
>> doesn't change. Hardware specific user space on the other hand will
>> keep working with the forward compatibility guarantees we want to
>> provide.
> 
> In my mind we've screwed up already; that isn't a case for continuing
> down the same path.
> 
> The kernel is meant to be a hardware abstraction layer, not just a
> hardware exposure layer. The kernel shouldn't set policy and there are
> cases where it can't act as an abstraction layer (like where you need
> a compiler), but I'm not sold that this case is one of those yet. I'm
> open to being educated here on why it would be.
> 

Thanks for raising these points. When I started out looking at color
management I favored the descriptive model. Most other HW vendors
I've talked to also tell me that they think about descriptive APIs
since that allows HW vendors to map that to whatever their HW supports.

Sebastian, Pekka, and others managed to change my mind about this
but I still keep having difficult questions within AMD.

Sebastian, Pekka, and Jonas have already done a good job of describing
our reasoning behind the prescriptive model. It might be helpful to
see how different the results of different tone-mapping operators
can look:

http://helgeseetzen.com/wp-content/uploads/2017/06/HS1.pdf

According to my understanding all other platforms that have HDR now
have a single compositor. At least that's true for Windows. This allows
driver developers to tune their tone-mapping algorithm to match the
algorithm used by the compositor when offloading plane composition.

This is not true on Linux, where we have a myriad of compositors for
good reasons, many of which have a different view of how they want color
management to look. Even if we came up with an API that lets
compositors define their input, output, scaling, and blending spaces in
detail, it would still not be feasible to describe the minutiae of
the tone-mapping algorithms, hence leading to differences in output
when KMS color management is used.

I am debating whether we need to be serious about a userspace library
(or maybe a user-mode driver) to provide an abstraction from the
descriptive to the prescriptive model. HW vendors need a way to provide
timely support for new HW generations without requiring updates to a
large number of compositors.

Harry

>>
>> The second answer is that we want to provide a user space library
>> which takes a description of a color pipeline and tries to map that to
>> the available KMS color pipelines. If there is a novel color
>> operation, adding support in this library would then make it possible
>> to offload compatible color pipelines on this new hardware for all
>> consumers of the library. Obviously there is no guarantee that
>> whatever color pipeline compositors come up with can actually be
>> realized on specific hardware but that's just an inherent hardware
>> issue.
>>
> 
> Why does this library need to be in userspace though? If there's a
> library making device dependent decisions, why can't we just make
> those device dependent decisions in the kernel?
> 
> This feels like we are trying to go down the Android HWC road, but we
> aren't in that business.
> 
> My thoughts would be userspace has to have some way to describe what
> it wants anyways, otherwise it does sound like I need to update
> mutter, kwin, surfaceflinger, chromeos, gamescope, every time a new HW
> device comes out that operates slightly differently from previous
> generations. This isn't the kernel doing hw abstraction at all, it's
> the kernel just opting out of designing interfaces and it isn't
> something I'm sold on.
> 
> Dave.



* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-09 11:47       ` Pekka Paalanen
@ 2023-05-09 17:01         ` Melissa Wen
  0 siblings, 0 replies; 49+ messages in thread
From: Melissa Wen @ 2023-05-09 17:01 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: xaver.hugl, DRI Development, wayland-devel, Victoria Brekenfeld,
	Jonas Ådahl, Uma Shankar, Joshua Ashton, Michel Dänzer,
	Aleix Pol, Sebastian Wick


On 05/09, Pekka Paalanen wrote:
> On Tue, 9 May 2023 10:23:49 -0100
> Melissa Wen <mwen@igalia.com> wrote:
> 
> > On 05/05, Joshua Ashton wrote:
> > > Some corrections and replies inline.
> > > 
> > > On Fri, 5 May 2023 at 12:42, Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > > >
> > > > On Thu, 04 May 2023 15:22:59 +0000
> > > > Simon Ser <contact@emersion.fr> wrote:
> > > >  
> 
> ...
> 
> > > > >     Color operation 47 (3D LUT RAM)
> > > > >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > > > >     Color operation 48 (blend gamma)
> > > > >     └─ "1d_curve_type" = PQ  
> > > 
> > > ^
> > > This is wrong, this should be Display Native -> Linearized Display Referred  
> > 
> > This is a good point to discuss. I understand for the HDR10 case that we
> > are just setting an enumerated TF (that is PQ for this case - correct me
> > if I got it wrong) but, unlike when we use a user-LUT, we don't know
> > from the API that this enumerated TF value with an empty LUT is used for
> > linearizing/degamma. Perhaps this could come as a pair? Any idea?
> 
> PQ curve is an EOTF, so it's always from electrical to optical.
> 
> Are you asking for something like
> 
> "1d_curve_type" = "PQ EOTF"
> 
> vs.
> 
> "1d_curve_type" = "inverse PQ EOTF"?
> 
> I think that's how it should work. It's not a given that if a
> hardware block can do a curve, it can also do its inverse. They need to
> be advertised explicitly.

Sounds good and clear to me.

Thanks!

Melissa

> 
> 
> Thanks,
> pq
> 
> ps. I picked my nick in the 90s. Any resemblance to Perceptual
> Quantizer is unintended. ;-)

:D

> 
> 
> > > >
> > > > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > > > electrical values is certainly surprising, so the example here is a
> > > > bit odd, but I don't think that hurts the intention of demonstration.  
> > > 
> > > I have done some corrections inline.
> > > 
> > > You can see our fully correct color pipeline here:
> > > https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > 
> > > Please let me know if you have any more questions about our color pipeline.





* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-09 14:31       ` Harry Wentland
@ 2023-05-09 19:53         ` Dave Airlie
  2023-05-09 20:22           ` Simon Ser
  0 siblings, 1 reply; 49+ messages in thread
From: Dave Airlie @ 2023-05-09 19:53 UTC (permalink / raw)
  To: Harry Wentland
  Cc: Sebastian Wick, Pekka Paalanen, xaver.hugl, Aleix Pol,
	DRI Development, wayland-devel, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Michel Dänzer,
	Joshua Ashton

On Wed, 10 May 2023 at 00:31, Harry Wentland <harry.wentland@amd.com> wrote:
>
>
>
> On 5/7/23 19:14, Dave Airlie wrote:
> > On Sat, 6 May 2023 at 08:21, Sebastian Wick <sebastian.wick@redhat.com> wrote:
> >>
> >> On Fri, May 5, 2023 at 10:40 PM Dave Airlie <airlied@gmail.com> wrote:
> >>>
> >>> On Fri, 5 May 2023 at 01:23, Simon Ser <contact@emersion.fr> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> >>>> pipeline before blending, ie. after a pixel is tapped from a plane's
> >>>> framebuffer and before it's blended with other planes. With this new uAPI we
> >>>> aim to reduce the battery life impact of color management and HDR on mobile
> >>>> devices, to improve performance and to decrease latency by skipping
> >>>> composition on the 3D engine. This proposal is the result of discussions at
> >>>> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> >>>> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> >>>> discussion.
> >>>>
> >>>> This proposal takes a prescriptive approach instead of a descriptive approach.
> >>>> Drivers describe the available hardware blocks in terms of low-level
> >>>> mathematical operations, then user-space configures each block. We decided
> >>>> against a descriptive approach where user-space would provide a high-level
> >>>> description of the colorspace and other parameters: we want to give more
> >>>> control and flexibility to user-space, e.g. to be able to replicate exactly the
> >>>> color pipeline with shaders and switch between shaders and KMS pipelines
> >>>> seamlessly, and to avoid forcing user-space into a particular color management
> >>>> policy.
> >>>
> >>> I'm not 100% sold on the prescriptive here, let's see if someone can
> >>> get me over the line with some questions later.
> >>>
> >>> My feeling is color pipeline hw is not a done deal, and that hw
> >>> vendors will be revising/evolving/churning the hw blocks for a while
> >>> longer, as there is no real standards in the area to aim for, all the
> >>> vendors are mostly just doing whatever gets Windows over the line and
> >>> keeps hw engineers happy. So I have some concerns here around forwards
> >>> compatibility and hence the API design.
> >>>
> >>> I guess my main concern is if you expose a bunch of hw blocks and
> >>> someone comes up with a novel new thing, will all existing userspace
> >>> work, without falling back to shaders?
> >>> Do we have minimum guarantees on what hardware blocks have to be
> >>> exposed to build a useable pipeline?
> >>> If a hardware block goes away in a new silicon revision, do I have to
> >>> rewrite my compositor? or will it be expected that the kernel will
> >>> emulate the old pipelines on top of whatever new fancy thing exists.
> >>
> >> I think there are two answers to those questions.
> >
> > These aren't selling me much better :-)
> >>
> >> The first one is that right now KMS already doesn't guarantee that
> >> every property is supported on all hardware. The guarantee we have is
> >> that properties that are supported on a piece of hardware on a
> >> specific kernel will be supported on the same hardware on later
> >> kernels. The color pipeline is no different here. For a specific piece
> >> of hardware a newer kernel might only change the pipelines in a
> >> backwards compatible way and add new pipelines.
> >>
> >> So to answer your question: if some hardware with a novel pipeline
> >> shows up, it might not be supported and that's fine. We already
> >> have cases where some hardware does not support the gamma lut property
> >> but only the CSC property and that breaks night light because we never
> >> bothered to write a shader fallback. KMS provides ways to offload work
> >> but a generic user space always has to provide a fallback and this
> >> doesn't change. Hardware specific user space on the other hand will
> >> keep working with the forward compatibility guarantees we want to
> >> provide.
> >
> > In my mind we've screwed up already; that isn't a case for continuing
> > down the same path.
> >
> > The kernel is meant to be a hardware abstraction layer, not just a
> > hardware exposure layer. The kernel shouldn't set policy and there are
> > cases where it can't act as an abstraction layer (like where you need
> > a compiler), but I'm not sold that this case is one of those yet. I'm
> > open to being educated here on why it would be.
> >
>
> Thanks for raising these points. When I started out looking at color
> management I favored the descriptive model. Most other HW vendors
> I've talked to also tell me that they think about descriptive APIs
> since that allows HW vendors to map that to whatever their HW supports.
>
> Sebastian, Pekka, and others managed to change my mind about this
> but I still keep having difficult questions within AMD.
>
> Sebastian, Pekka, and Jonas have already done a good job of describing
> our reasoning behind the prescriptive model. It might be helpful to
> see how different the results of different tone-mapping operators
> can look:
>
> http://helgeseetzen.com/wp-content/uploads/2017/06/HS1.pdf
>
> According to my understanding all other platforms that have HDR now
> have a single compositor. At least that's true for Windows. This allows
> driver developers to tune their tone-mapping algorithm to match the
> algorithm used by the compositor when offloading plane composition.
>
> This is not true on Linux, where we have a myriad of compositors for
> good reasons, many of which have a different view of how they want color
> management to look. Even if we came up with an API that lets
> compositors define their input, output, scaling, and blending spaces in
> detail, it would still not be feasible to describe the minutiae of
> the tone-mapping algorithms, hence leading to differences in output
> when KMS color management is used.
>
> I am debating whether we need to be serious about a userspace library
> (or maybe a user-mode driver) to provide an abstraction from the
> descriptive to the prescriptive model. HW vendors need a way to provide
> timely support for new HW generations without requiring updates to a
> large number of compositors.

There are also other vendor side effects to having this in userspace.

Will the library have a loader?
Will it allow proprietary plugins?
Will it allow proprietary reimplementations?
What will happen when a vendor wants distros to ship *their*
proprietary fork of said library?

How would NVIDIA integrate this with their proprietary stack?

Dave.


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-09 19:53         ` Dave Airlie
@ 2023-05-09 20:22           ` Simon Ser
  2023-05-10  7:59             ` Jonas Ådahl
  2023-05-10  8:48             ` Pekka Paalanen
  0 siblings, 2 replies; 49+ messages in thread
From: Simon Ser @ 2023-05-09 20:22 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Sebastian Wick, Pekka Paalanen, xaver.hugl, DRI Development,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, Aleix Pol,
	Joshua Ashton

On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:

> There are also other vendor side effects to having this in userspace.
> 
> Will the library have a loader?
> Will it allow proprietary plugins?
> Will it allow proprietary reimplementations?
> What will happen when a vendor wants distros to ship their
> proprietary fork of said library?
> 
> How would NVIDIA integrate this with their proprietary stack?

Since all color operations exposed by KMS are standard, the library
would just be a simple one: no loader, no plugin, no proprietary pieces,
etc.


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-09 20:22           ` Simon Ser
@ 2023-05-10  7:59             ` Jonas Ådahl
  2023-05-10  8:59               ` Pekka Paalanen
  2023-05-11  9:51               ` Karol Herbst
  2023-05-10  8:48             ` Pekka Paalanen
  1 sibling, 2 replies; 49+ messages in thread
From: Jonas Ådahl @ 2023-05-10  7:59 UTC (permalink / raw)
  To: Simon Ser
  Cc: Sebastian Wick, Pekka Paalanen, Aleix Pol, DRI Development,
	xaver.hugl, Melissa Wen, Michel Dänzer, Uma Shankar,
	Victoria Brekenfeld, wayland-devel, Joshua Ashton

On Tue, May 09, 2023 at 08:22:30PM +0000, Simon Ser wrote:
> On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:
> 
> > There are also other vendor side effects to having this in userspace.
> > 
> > Will the library have a loader?
> > Will it allow proprietary plugins?
> > Will it allow proprietary reimplementations?
> > What will happen when a vendor wants distros to ship their
> > proprietary fork of said library?
> > 
> > How would NVIDIA integrate this with their proprietary stack?
> 
> Since all color operations exposed by KMS are standard, the library
> would just be a simple one: no loader, no plugin, no proprietary pieces,
> etc.
> 

There might be pipelines/color-ops only exposed by proprietary out of
tree drivers; the operation types and semantics should ideally be
defined upstream, but the code paths would in practice be vendor
specific, potentially without any upstream driver using them. It should
be clear whether an implementation that makes such a pipeline work is in
scope for the upstream library.

The same applies to the kernel; it must be clear whether pipeline
elements that potentially will only be exposed by out of tree drivers
will be acceptable upstream, at least as documented operations.


Jonas


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-09 20:22           ` Simon Ser
  2023-05-10  7:59             ` Jonas Ådahl
@ 2023-05-10  8:48             ` Pekka Paalanen
  1 sibling, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-10  8:48 UTC (permalink / raw)
  To: Simon Ser
  Cc: Sebastian Wick, xaver.hugl, DRI Development, wayland-devel,
	Melissa Wen, Jonas Ådahl, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Joshua Ashton


On Tue, 09 May 2023 20:22:30 +0000
Simon Ser <contact@emersion.fr> wrote:

> On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:
> 
> > There are also other vendor side effects to having this in userspace.
> > 
> > Will the library have a loader?
> > Will it allow proprietary plugins?
> > Will it allow proprietary reimplementations?
> > What will happen when a vendor wants distros to ship their
> > proprietary fork of said library?
> > 
> > How would NVIDIA integrate this with their proprietary stack?  
> 
> Since all color operations exposed by KMS are standard, the library
> would just be a simple one: no loader, no plugin, no proprietary pieces,
> etc.

Hi,

that's certainly the long term goal, and *if* Linux software can in any
way guide hardware design, then I believe it is an achievable goal. I
understand "standard" as something that is widely implemented in
various hardware rather than only "well-defined and documented and
free to implement in any hardware if its vendor cared".

However, like I mentioned in my other reply to Steven, I expect there
will be a time period when each piece of hardware has custom processing
blocks that no other hardware (same or different vendor) has. I might not
call them outright proprietary though, because in order to have them
exposed via UAPI, the mathematical model of the processing block must be
documented with its UAPI. This means there can be no secrets about what
the hardware does, which means there can be no requirement for secret
sauce in userspace either.

I wonder if we can also require new COLOROP elements to be freely
implementable by anyone anywhere in any way one wants? Or do kernel
maintainers just need to NAK proposals for elements that might not be
that free?

Anything that is driver-chosen or automatic can also be proprietary,
because today's KMS UAPI rules do not require documenting how automatic
features work, e.g. the existing YUV-to-RGB conversion. Hardware could
have whatever wild skin tone improvement algorithms hidden in there for
example. In this new proposal, there cannot be undocumented behaviour.

Dave, if we went with a descriptive UAPI model, everything behind it
could be proprietary and secret. That's not open in the least.

On Wed, 10 May 2023 at 00:31, Harry Wentland <harry.wentland@amd.com> wrote:
>
> I am debating whether we need to be serious about a userspace library
> (or maybe a user-mode driver) to provide an abstraction from the
> descriptive to the prescriptive model. HW vendors need a way to provide
> timely support for new HW generations without requiring updates to a
> large number of compositors.  

Drivers can always map old COLOROP elements to new style hardware
blocks if they can achieve the same mathematical operation up to
whatever precision was promised before. I think that should be the main
form of supporting hardware evolution. Then also add new alternative
COLOROP elements that can better utilize the hardware block.

Naturally that means that COLOROP elements must be designed to be
somewhat generic to have a reasonable life time. They cannot be
extremely tightly married to the hardware implementation that might
cease to exist in the very next hardware revision.

Let's say some vendor has a hardware block that does a series of
operations in an optimized fashion, perhaps with hardwired constants.
This is exposed as a custom COLOROP element. The next hardware revision
no longer has this block, but it has a bunch of new blocks that can
produce the exact same result. The driver for this hardware can expose
two different pipelines: one using the old COLOROP element, and another
using a bunch of other COLOROP elements which exposes the new
flexibility of the hardware design better. If userspace chooses the
former pipeline, the driver just programs the bunch of blocks to behave
accordingly. Hopefully the other COLOROP elements will be more standard
than the old element.

Over time, I hope this causes an evolution where hardware implements
only the most standard COLOROP elements, and special-case compound
elements will eventually fall out of use over the decades.


Thanks,
pq


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-10  7:59             ` Jonas Ådahl
@ 2023-05-10  8:59               ` Pekka Paalanen
  2023-05-11  9:51               ` Karol Herbst
  1 sibling, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-10  8:59 UTC (permalink / raw)
  To: Jonas Ådahl
  Cc: Sebastian Wick, Aleix Pol, DRI Development, xaver.hugl,
	Melissa Wen, Michel Dänzer, Uma Shankar,
	Victoria Brekenfeld, wayland-devel, Joshua Ashton


On Wed, 10 May 2023 09:59:21 +0200
Jonas Ådahl <jadahl@redhat.com> wrote:

> On Tue, May 09, 2023 at 08:22:30PM +0000, Simon Ser wrote:
> > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:
> >   
> > > There are also other vendor side effects to having this in userspace.
> > > 
> > > Will the library have a loader?
> > > Will it allow proprietary plugins?
> > > Will it allow proprietary reimplementations?
> > > What will happen when a vendor wants distros to ship their
> > > proprietary fork of said library?
> > > 
> > > How would NVIDIA integrate this with their proprietary stack?  
> > 
> > Since all color operations exposed by KMS are standard, the library
> > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > etc.
> >   
> 
> There might be pipelines/color-ops only exposed by proprietary out of
> tree drivers; the operation types and semantics should ideally be
> defined upstream, but the code paths would in practice be vendor
> specific, potentially without any upstream driver using them. It should
> be clear whether an implementation that makes such a pipeline work is in
> scope for the upstream library.
> 
> The same applies to the kernel; it must be clear whether pipeline
> elements that potentially will only be exposed by out of tree drivers
> will be acceptable upstream, at least as documented operations.

In my opinion, a COLOROP element definition can be accepted in the
upstream kernel documentation only if there is also an upstream driver
implementing it. It does not need to be a "direct" hardware
implementation, it could also be the upstream driver mapping the
COLOROP to whatever hardware block or block chain it has.

For the userspace library I don't know. I am puzzled whether people
want to allow proprietary components or deny them.


Thanks,
pq


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-10  7:59             ` Jonas Ådahl
  2023-05-10  8:59               ` Pekka Paalanen
@ 2023-05-11  9:51               ` Karol Herbst
  2023-05-11 16:56                 ` Joshua Ashton
  1 sibling, 1 reply; 49+ messages in thread
From: Karol Herbst @ 2023-05-11  9:51 UTC (permalink / raw)
  To: Jonas Ådahl
  Cc: Sebastian Wick, Pekka Paalanen, Aleix Pol, DRI Development,
	xaver.hugl, Melissa Wen, Michel Dänzer, Uma Shankar,
	Victoria Brekenfeld, wayland-devel, Joshua Ashton

On Wed, May 10, 2023 at 9:59 AM Jonas Ådahl <jadahl@redhat.com> wrote:
>
> On Tue, May 09, 2023 at 08:22:30PM +0000, Simon Ser wrote:
> > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:
> >
> > > There are also other vendor side effects to having this in userspace.
> > >
> > > Will the library have a loader?
> > > Will it allow proprietary plugins?
> > > Will it allow proprietary reimplementations?
> > > What will happen when a vendor wants distros to ship their
> > > proprietary fork of said library?
> > >
> > > How would NVIDIA integrate this with their proprietary stack?
> >
> > Since all color operations exposed by KMS are standard, the library
> > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > etc.
> >
>
> There might be pipelines/color-ops only exposed by proprietary out of
> tree drivers; the operation types and semantics should ideally be
> defined upstream, but the code paths would in practice be vendor
> specific, potentially without any upstream driver using them. It should
> be clear whether an implementation that makes such a pipeline work is in
> scope for the upstream library.
>
> The same applies to the kernel; it must be clear whether pipeline
> elements that potentially will only be exposed by out of tree drivers
> will be acceptable upstream, at least as documented operations.
>

they aren't. All code in the kernel needs to be used by in-tree
drivers, otherwise it's fair to delete it. DRM requires any UAPI change
to have a real open source userspace user.

Nvidia knows this and they went to great lengths to fulfill this
requirement in the past. They'll manage.

>
> Jonas
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-11  9:51               ` Karol Herbst
@ 2023-05-11 16:56                 ` Joshua Ashton
  2023-05-11 18:56                   ` Jonas Ådahl
  2023-05-11 19:29                   ` Simon Ser
  0 siblings, 2 replies; 49+ messages in thread
From: Joshua Ashton @ 2023-05-11 16:56 UTC (permalink / raw)
  To: Karol Herbst
  Cc: Sebastian Wick, Pekka Paalanen, Aleix Pol, DRI Development,
	xaver.hugl, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Michel Dänzer, wayland-devel

When we are talking about being 'prescriptive' in the API, are we
outright saying we don't want to support arbitrary 3D LUTs, or are we
just offering certain algorithms to be 'executed' for a plane/crtc/etc
in the atomic API? I am confused...

There is so much stuff to do with color, that I don't think a
prescriptive API in the kernel could ever keep up with the things that
we want to be pushing from Gamescope/SteamOS. For example, we have so
many things going on, night mode, SDR gamut widening, HDR/SDR gain,
the ability to apply 'looks' for eg. invert luma or for retro looks,
enhanced contrast, tonemapping, inverse tonemapping... We also are
going to be doing a bunch of stuff with EETFs for handling out of
range HDR content for scanout.

Some of what we do is kinda standard, regular "there is a paper on
this" algorithms, and others are not.
While yes, it might be very possible to do simple things, once you
start wanting to do something 'different', that's kinda lock-in.

Whether this co-exists with arbitrary LUTs (that we definitely want
for SteamOS) or not:
I think putting a bunch of math-y stuff like this into the kernel is
probably the complete wrong approach. Everything would need to be
fixed point and it would be a huge pain in the butt to deal with on
that side.
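To make the fixed-point concern concrete: kernel-side color math can't use floats, so parameters such as LUT entries would be carried as quantized integers. A toy sketch with a hypothetical unsigned 0.16 format (not any specific DRM fixed-point format):

```python
# Hypothetical unsigned 0.16 fixed-point encoding of a [0, 1] value,
# showing the quantization a userspace/kernel boundary would impose.

def to_u0_16(x):
    """Encode a [0, 1] float as a 16-bit fixed-point integer."""
    return max(0, min(65535, round(x * 65535)))

def from_u0_16(q):
    """Decode back to a float."""
    return q / 65535

x = 0.123456789
q = to_u0_16(x)
# Round-trip error is bounded by one LSB of the chosen format.
assert abs(from_u0_16(q) - x) < 1 / 65535
```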

Maybe this is a "hot take", but IMO, DRM atomic is already waaay too
much being done in the kernel space. I think making it go even further
and having it be a prescriptive color API is a complete step in the
wrong direction.

There is also the problem of... if there is a bug in the math here or
we want to add a new feature, if it's kernel side, you are locked in
to having that bug until the next release on your distro and probably
years if it's a new feature!
Updating kernels is much harder for 'enterprise' distros if it is not
mission critical. Having all of this in userspace is completely fine
however...

If you want to make some userspace prescriptive -> descriptive color
library I am all for that for general case compositors, but I don't
think I would use something like that in Gamescope.
That's not to be rude, we are just picky and want freedom to do what
we want and iterate on it easily.

I guess this all comes back to my initial point... having some
userspace to handle stuff that is either kinda or entirely vendor
specific is the right way of solving this problem :-P

- Joshie 🐸✨

On Thu, 11 May 2023 at 09:51, Karol Herbst <kherbst@redhat.com> wrote:
>
> On Wed, May 10, 2023 at 9:59 AM Jonas Ådahl <jadahl@redhat.com> wrote:
> >
> > On Tue, May 09, 2023 at 08:22:30PM +0000, Simon Ser wrote:
> > > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:
> > >
> > > > There are also other vendor side effects to having this in userspace.
> > > >
> > > > Will the library have a loader?
> > > > Will it allow proprietary plugins?
> > > > Will it allow proprietary reimplementations?
> > > > What will happen when a vendor wants distros to ship their
> > > > proprietary fork of said library?
> > > >
> > > > How would NVIDIA integrate this with their proprietary stack?
> > >
> > > Since all color operations exposed by KMS are standard, the library
> > > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > > etc.
> > >
> >
> > There might be pipelines/color-ops only exposed by proprietary out of
> > tree drivers; the operation types and semantics should ideally be
> > defined upstream, but the code paths would in practice be vendor
> > specific, potentially without any upstream driver using them. It should
> > be clear whether an implementation that makes such a pipeline work is in
> > scope for the upstream library.
> >
> > The same applies to the kernel; it must be clear whether pipeline
> > elements that potentially will only be exposed by out of tree drivers
> > will be acceptable upstream, at least as documented operations.
> >
>
> they aren't. All code in the kernel needs to be used by in-tree
> drivers otherwise it's fair to delete it. DRM requires any UAPI change
> to have a real open source userspace user.
>
> Nvidia knows this and they went to great lengths to fulfill this
> requirement in the past. They'll manage.
>
> >
> > Jonas
> >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-11 16:56                 ` Joshua Ashton
@ 2023-05-11 18:56                   ` Jonas Ådahl
  2023-05-11 19:29                   ` Simon Ser
  1 sibling, 0 replies; 49+ messages in thread
From: Jonas Ådahl @ 2023-05-11 18:56 UTC (permalink / raw)
  To: Joshua Ashton
  Cc: Sebastian Wick, Pekka Paalanen, Karol Herbst, Aleix Pol,
	DRI Development, xaver.hugl, Melissa Wen, Michel Dänzer,
	Uma Shankar, Victoria Brekenfeld, wayland-devel

On Thu, May 11, 2023 at 04:56:47PM +0000, Joshua Ashton wrote:
> When we are talking about being 'prescriptive' in the API, are we
> outright saying we don't want to support arbitrary 3D LUTs, or are we
> just offering certain algorithms to be 'executed' for a plane/crtc/etc
> in the atomic API? I am confused...

The 'prescriptive' idea that the RFC of this thread proposes *is* a way
to support arbitrary 3D LUTs (and other mathematical operations),
arbitrarily, in a somewhat vendored way, only that it will not be
vendor-prefixed hard-coded properties with specific positions in the
pipeline, but instead a more or less introspectable pipeline, describing
what kinds of LUTs, matrix multiplications, etc. (and in what order) a
piece of hardware can do.

The theoretical userspace library would be the one turning descriptive
"please turn this into that" requests into the "prescriptive" color
pipeline operations. It would target general purpose compositors, but it
wouldn't be mandatory. Doing vendor specific implementations in gamescope
would be possible; it wouldn't look like the version that exists somewhere
now that uses a bunch of AMD_* properties, it'd look more like the
example Simon had in the initial RFC.


Jonas

> 
> There is so much stuff to do with color, that I don't think a
> prescriptive API in the kernel could ever keep up with the things that
> we want to be pushing from Gamescope/SteamOS. For example, we have so
> many things going on, night mode, SDR gamut widening, HDR/SDR gain,
> the ability to apply 'looks' for eg. invert luma or for retro looks,
> enhanced contrast, tonemapping, inverse tonemapping... We also are
> going to be doing a bunch of stuff with EETFs for handling out of
> range HDR content for scanout.
> 
> Some of what we do is kinda standard, regular "there is a paper on
> this" algorithms, and others are not.
> While yes, it might be very possible to do simple things, once you
> start wanting to do something 'different', that's kinda lock-in.
> 
> Whether this co-exists with arbitrary LUTs (that we definitely want
> for SteamOS) or not:
> I think putting a bunch of math-y stuff like this into the kernel is
> probably the complete wrong approach. Everything would need to be
> fixed point and it would be a huge pain in the butt to deal with on
> that side.
> 
> Maybe this is a "hot take", but IMO, DRM atomic is already waaay too
> much being done in the kernel space. I think making it go even further
> and having it be a prescriptive color API is a complete step in the
> wrong direction.
> 
> There is also the problem of... if there is a bug in the math here or
> we want to add a new feature, if it's kernel side, you are locked in
> to having that bug until the next release on your distro and probably
> years if it's a new feature!
> Updating kernels is much harder for 'enterprise' distros if it is not
> mission critical. Having all of this in userspace is completely fine
> however...
> 
> If you want to make some userspace prescriptive -> descriptive color
> library I am all for that for general case compositors, but I don't
> think I would use something like that in Gamescope.
> That's not to be rude, we are just picky and want freedom to do what
> we want and iterate on it easily.
> 
> I guess this all comes back to my initial point... having some
> userspace to handle stuff that is either kinda or entirely vendor
> specific is the right way of solving this problem :-P
> 
> - Joshie 🐸✨
> 
> On Thu, 11 May 2023 at 09:51, Karol Herbst <kherbst@redhat.com> wrote:
> >
> > On Wed, May 10, 2023 at 9:59 AM Jonas Ådahl <jadahl@redhat.com> wrote:
> > >
> > > On Tue, May 09, 2023 at 08:22:30PM +0000, Simon Ser wrote:
> > > > On Tuesday, May 9th, 2023 at 21:53, Dave Airlie <airlied@gmail.com> wrote:
> > > >
> > > > > There are also other vendor side effects to having this in userspace.
> > > > >
> > > > > Will the library have a loader?
> > > > > Will it allow proprietary plugins?
> > > > > Will it allow proprietary reimplementations?
> > > > > What will happen when a vendor wants distros to ship their
> > > > > proprietary fork of said library?
> > > > >
> > > > > How would NVIDIA integrate this with their proprietary stack?
> > > >
> > > > Since all color operations exposed by KMS are standard, the library
> > > > would just be a simple one: no loader, no plugin, no proprietary pieces,
> > > > etc.
> > > >
> > >
> > > There might be pipelines/color-ops only exposed by proprietary out of
> > > tree drivers; the operation types and semantics should ideally be
> > > defined upstream, but the code paths would in practice be vendor
> > > specific, potentially without any upstream driver using them. It should
> > > be clear whether an implementation that makes such a pipeline work is in
> > > scope for the upstream library.
> > >
> > > The same applies to the kernel; it must be clear whether pipeline
> > > elements that potentially will only be exposed by out of tree drivers
> > > will be acceptable upstream, at least as documented operations.
> > >
> >
> > they aren't. All code in the kernel needs to be used by in-tree
> > drivers otherwise it's fair to delete it. DRM requires any UAPI change
> > to have a real open source userspace user.
> >
> > Nvidia knows this and they went to great lengths to fulfill this
> > requirement in the past. They'll manage.
> >
> > >
> > > Jonas
> > >
> >
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-11 16:56                 ` Joshua Ashton
  2023-05-11 18:56                   ` Jonas Ådahl
@ 2023-05-11 19:29                   ` Simon Ser
  2023-05-12  7:24                     ` Pekka Paalanen
  1 sibling, 1 reply; 49+ messages in thread
From: Simon Ser @ 2023-05-11 19:29 UTC (permalink / raw)
  To: Joshua Ashton
  Cc: Sebastian Wick, Pekka Paalanen, Karol Herbst, Aleix Pol,
	DRI Development, xaver.hugl, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Michel Dänzer,
	wayland-devel

On Thursday, May 11th, 2023 at 18:56, Joshua Ashton <joshua@froggi.es> wrote:

> When we are talking about being 'prescriptive' in the API, are we
> outright saying we don't want to support arbitrary 3D LUTs, or are we
> just offering certain algorithms to be 'executed' for a plane/crtc/etc
> in the atomic API? I am confused...

From a kernel PoV:

- Prescriptive = here are the available hardware blocks, feel free to
  configure each as you like
- Descriptive = give me the source and destination color-spaces and I
  take care of everything

This proposal is a prescriptive API. We haven't explored _that_ much
what a descriptive API would look like; it could probably include some
way to do Night Light and similar features, but it's not clear how
high-level they'd be. A descriptive API is inherently more restrictive
than a prescriptive API.
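The prescriptive model is essentially a linked list of color operation objects that userspace discovers by following each op's "next" property. A minimal, hypothetical Python model of that discovery step (the IDs and property names follow the RFC's example, not any shipped uAPI):

```python
# Toy model of prescriptive pipeline discovery. The object IDs (42..44)
# and the "type"/"next" properties mirror the RFC's example; a real
# implementation would read them via KMS object properties.

COLOROPS = {
    42: {"type": ["Bypass", "Matrix"], "next": 43},
    43: {"type": ["Scaling"], "next": 44},
    44: {"type": ["Bypass", "1D curve"], "next": 0},  # 0 terminates
}

def walk_pipeline(first_op_id):
    """Follow the 'next' links until the terminating ID 0."""
    ops = []
    op_id = first_op_id
    while op_id != 0:
        ops.append(op_id)
        op_id = COLOROPS[op_id]["next"]
    return ops

print(walk_pipeline(42))  # → [42, 43, 44]
```

Userspace would inspect each op's "type" enum, decide whether the exposed pipeline matches the math it wants, and either program the ops or fall back to shader composition.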

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-05 13:30   ` Joshua Ashton
  2023-05-05 14:16     ` Pekka Paalanen
  2023-05-09 11:23     ` Melissa Wen
@ 2023-05-11 21:21     ` Simon Ser
  2 siblings, 0 replies; 49+ messages in thread
From: Simon Ser @ 2023-05-11 21:21 UTC (permalink / raw)
  To: Joshua Ashton
  Cc: Jonas Ådahl, DRI Development, xaver.hugl, Melissa Wen,
	Pekka Paalanen, Uma Shankar, Victoria Brekenfeld,
	Michel Dänzer, Aleix Pol, Sebastian Wick, wayland-devel

On Friday, May 5th, 2023 at 15:30, Joshua Ashton <joshua@froggi.es> wrote:

> > > AMD would expose the following objects and properties:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     └─ "color_pipeline": enum {0, 42} = 0
> > >     Color operation 42 (input CSC)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >     Color operation 43
> > >     ├─ "type": enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> > >     Color operation 44 (DeGamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > >     └─ "next": immutable color operation ID = 45
> 
> Some vendors have per-tap degamma and some have a degamma after the sample.
> How do we distinguish that behaviour?
> It is important to know.

Can you elaborate? What are "per-tap" and "sample"? Is the "Scaling" color
operation above not enough to indicate where in the pipeline the hw performs
scaling?

> > >     Color operation 45 (gamut remap)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 46
> > >     Color operation 46 (shaper LUT RAM)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 47
> > >     Color operation 47 (3D LUT RAM)
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 17
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 48
> > >     Color operation 48 (blend gamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 0
> > >
> > > To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> > > display, gamescope would perform an atomic commit with the following property
> > > values:
> > >
> > >     Plane 10
> > >     └─ "color_pipeline" = 42
> > >     Color operation 42 (input CSC)
> > >     └─ "matrix_data" = PQ → scRGB (TF)
> 
> ^
> Not sure what this is.
> We don't use an input CSC before degamma.
> 
> > >     Color operation 44 (DeGamma)
> > >     └─ "type" = Bypass
> 
> ^
> If we did PQ, this would be PQ -> Linear / 80
> If this was sRGB, it'd be sRGB -> Linear
> If this was scRGB this would be just treating it as it is. So... Linear / 80.
> 
> > >     Color operation 45 (gamut remap)
> > >     └─ "matrix_data" = scRGB (TF) → PQ
> 
> ^
> This is wrong, we just use this to do scRGB primaries (709) to 2020.
> 
> We then go from scRGB -> PQ to go into our shaper + 3D LUT.
> 
> > >     Color operation 46 (shaper LUT RAM)
> > >     └─ "lut_data" = PQ → Display native
> 
> ^
> "Display native" is just the response curve of the display.
> In HDR10, this would just be PQ -> PQ
> If we were doing HDR10 on SDR, this would be PQ -> Gamma 2.2 (mapped
> from 0 to display native luminance) [with a potential bit of headroom
> for tonemapping in the 3D LUT]
> For SDR on HDR10 this would be Gamma 2.2 -> PQ (Not intending to start
> an sRGB vs G2.2 argument here! :P)
> 
> > >     Color operation 47 (3D LUT RAM)
> > >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > >     Color operation 48 (blend gamma)
> > >     └─ "1d_curve_type" = PQ
> 
> ^
> This is wrong, this should be Display Native -> Linearized Display Referred

In the HDR case, isn't this the inverse of PQ?

> > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > electrical values is certainly surprising, so the example here is a
> > bit odd, but I don't think that hurts the intention of demonstration.
> 
> I have done some corrections inline.
> 
> You can see our fully correct color pipeline here:
> https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
> 
> Please let me know if you have any more questions about our color pipeline.

As expected, I got the gamescope part wrong. I'm pretty confident that the
proposed API would still work since the AMD vendor-specific props would just
be exposed as color operation objects. Can you confirm we can make the
gamescope pipeline work with the AMD color pipeline outlined above?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-05-11 19:29                   ` Simon Ser
@ 2023-05-12  7:24                     ` Pekka Paalanen
  0 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-05-12  7:24 UTC (permalink / raw)
  To: Simon Ser
  Cc: Sebastian Wick, Karol Herbst, xaver.hugl, DRI Development,
	Victoria Brekenfeld, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Michel Dänzer, Aleix Pol, wayland-devel, Joshua Ashton


On Thu, 11 May 2023 19:29:27 +0000
Simon Ser <contact@emersion.fr> wrote:

> On Thursday, May 11th, 2023 at 18:56, Joshua Ashton <joshua@froggi.es> wrote:
> 
> > When we are talking about being 'prescriptive' in the API, are we
> > outright saying we don't want to support arbitrary 3D LUTs, or are we
> > just offering certain algorithms to be 'executed' for a plane/crtc/etc
> > in the atomic API? I am confused...  
> 
> From a kernel PoV:
> 
> - Prescriptive = here are the available hardware blocks, feel free to
>   configure each as you like
> - Descriptive = give me the source and destination color-spaces and I
>   take care of everything
> 
> This proposal is a prescriptive API. We haven't explored _that_ much
> what a descriptive API would look like; it could probably include some
> way to do Night Light and similar features, but it's not clear how
> high-level they'd be. A descriptive API is inherently more restrictive
> than a prescriptive API.

Right. Just like Jonas said, an arbitrary 3D LUT is a well-defined
mathematical operation with no semantics at all, therefore it is a
prescriptive element. A 3D LUT does not fit well in a descriptive API
design, one would need to jump through lots of hoops to turn it into
something descriptive'ish (like ICC does).
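To make that concrete: a 3D LUT is nothing but a sampled function from input to output color, evaluated with trilinear interpolation, and it carries no colorimetric meaning by itself. A minimal sketch (real hardware LUTs differ in size, precision and edge handling):

```python
# Trilinear sampling of a 3D LUT: pure math, no color semantics.
# lut[i][j][k] is an (R, G, B) output triple; r, g, b inputs in [0, 1].

def sample_3d_lut(lut, size, r, g, b):
    def axis(v):
        # Map a [0, 1] input to lower lattice index and fraction.
        x = v * (size - 1)
        i0 = min(int(x), size - 2)
        return i0, i0 + 1, x - i0
    (r0, r1, fr), (g0, g1, fg), (b0, b1, fb) = axis(r), axis(g), axis(b)
    out = []
    for c in range(3):  # interpolate each output channel independently
        def at(i, j, k):
            return lut[i][j][k][c]
        c00 = at(r0, g0, b0) * (1 - fr) + at(r1, g0, b0) * fr
        c10 = at(r0, g1, b0) * (1 - fr) + at(r1, g1, b0) * fr
        c01 = at(r0, g0, b1) * (1 - fr) + at(r1, g0, b1) * fr
        c11 = at(r0, g1, b1) * (1 - fr) + at(r1, g1, b1) * fr
        c0 = c00 * (1 - fg) + c10 * fg
        c1 = c01 * (1 - fg) + c11 * fg
        out.append(c0 * (1 - fb) + c1 * fb)
    return tuple(out)

# An identity LUT of size 2 reproduces its input exactly.
size = 2
ident = [[[(i, j, k) for k in range(size)]
          for j in range(size)] for i in range(size)]
print(sample_3d_lut(ident, size, 0.25, 0.5, 0.75))  # → (0.25, 0.5, 0.75)
```

Nothing in the lattice says what the inputs or outputs *mean*; that is exactly why it fits a prescriptive API and resists a descriptive one.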

I think Joshua mixed up the definitions of "descriptive" and
"prescriptive".

If Gamescope was using a descriptive KMS UAPI, then it would have very
little or no say in what color operations are done and how.

If Gamescope is using prescriptive KMS UAPI, then Gamescope has to know
exactly what it wants to do, how it wants to achieve that, and map that
to the available mathematical processing blocks.

A descriptive UAPI would mean all color policy is in the kernel. A
prescriptive UAPI means all policy is in userspace.

Wayland uses the opposite design principle of KMS UAPI. Wayland is
descriptive, KMS is prescriptive. This puts the color policy into a
Wayland compositor. If we have a library converting descriptive to
prescriptive, then that library contains a policy.

Going from descriptive to prescriptive is easy, just add policy. Going
from prescriptive to descriptive is practically impossible, because
you'd have to "subtract" any policy that has already been applied, in
order to understand what the starting point was.

Coming back to KMS, the color transformations must be prescriptive, but
then we also need to be able to send descriptive information to video
sinks so that video sinks understand what our pixel values mean.


Thanks,
pq


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
       [not found] ` <4341dac6-ada1-2a75-1c22-086d96408a85@quicinc.com>
@ 2023-06-09 15:52   ` Christopher Braga
  2023-06-09 16:30     ` Simon Ser
  0 siblings, 1 reply; 49+ messages in thread
From: Christopher Braga @ 2023-06-09 15:52 UTC (permalink / raw)
  To: Simon Ser
  Cc: Aleix Pol, Pekka Paalanen, DRI Development, xaver.hugl,
	Michel Dänzer, wayland-devel, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Sebastian Wick, Joshua Ashton

> Hi all,
> 
> The goal of this RFC is to expose a generic KMS uAPI to configure the color
> pipeline before blending, ie. after a pixel is tapped from a plane's
> framebuffer and before it's blended with other planes. With this new uAPI we
> aim to reduce the battery life impact of color management and HDR on mobile
> devices, to improve performance and to decrease latency by skipping
> composition on the 3D engine. This proposal is the result of discussions at
> the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> familiar with the AMD, Intel and NVIDIA hardware have participated in the
> discussion.
> 
> This proposal takes a prescriptive approach instead of a descriptive approach.
> Drivers describe the available hardware blocks in terms of low-level
> mathematical operations, then user-space configures each block. We decided
> against a descriptive approach where user-space would provide a high-level
> description of the colorspace and other parameters: we want to give more
> control and flexibility to user-space, e.g. to be able to replicate exactly the
> color pipeline with shaders and switch between shaders and KMS pipelines
> seamlessly, and to avoid forcing user-space into a particular color management
> policy.
> 
Thanks for posting this, Simon! This overview does a great job of
breaking down the proposal. A few questions inline below.

> We've decided against mirroring the existing CRTC properties
> DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes. Indeed, the color management
> pipeline can significantly differ between vendors and this approach cannot
> accurately abstract all hardware. In particular, the availability, 
> ordering and
> capabilities of hardware blocks is different on each display engine. So, 
> we've
> decided to go for a highly detailed hardware capability discovery.
> 
> This new uAPI should not be in conflict with existing standard KMS 
> properties,
> since there are none which control the pre-blending color pipeline at the
> moment. It does conflict with any vendor-specific properties like
> NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> properties. Drivers will need to either reject atomic commits 
> configuring both
> uAPIs, or alternatively we could add a DRM client cap which hides the 
> vendor
> properties and shows the new generic properties when enabled.
> 
> To use this uAPI, first user-space needs to discover hardware 
> capabilities via
> KMS objects and properties, then user-space can configure the hardware 
> via an
> atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> 
> Our proposal introduces a new "color_pipeline" plane property, and a new 
> KMS
> object type, "COLOROP" (short for color operation). The "color_pipeline" 
> plane
> property is an enum, each enum entry represents a color pipeline 
> supported by
> the hardware. The special zero entry indicates that the pipeline is in
> "bypass"/"no-op" mode. For instance, the following plane properties 
> describe a
> primary plane with 2 supported pipelines but currently configured in bypass
> mode:
> 
>      Plane 10
>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>      ├─ …
>      └─ "color_pipeline": enum {0, 42, 52} = 0
> 
> The non-zero entries describe color pipelines as a linked list of 
> COLOROP KMS
> objects. The entry value is an object ID pointing to the head of the linked
> list (the first operation in the color pipeline).
> 
> The new COLOROP objects also expose a number of KMS properties. Each has a
> type, a reference to the next COLOROP object in the linked list, and other
> type-specific properties. Here is an example for a 1D LUT operation:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
curves? Will different hardware be allowed to expose a subset of these 
enum values?

>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 43
> 
Some hardware has per-channel 1D LUT values, while others use the same
LUT for all channels. We will definitely need to expose this in the
UAPI in some form.
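To make the distinction concrete, here is a small user-space-side sketch (the function names and the linear interpolation rule are assumptions for illustration, not part of the proposal): the per-channel variant takes three LUTs, and the shared-LUT hardware variant is just the special case where the same table is passed for all three channels.

```python
def lut_sample(lut, v):
    """Linearly interpolate a normalized value v in [0, 1] through a
    1D LUT given as a list of taps. The exact interpolation rule is
    one of the details the uAPI docs would have to pin down."""
    x = v * (len(lut) - 1)
    i = min(int(x), len(lut) - 2)
    frac = x - i
    return lut[i] * (1 - frac) + lut[i + 1] * frac

def apply_per_channel(pixel, luts):
    """Per-channel variant: one LUT per channel. The shared-LUT
    hardware variant corresponds to luts = (l, l, l)."""
    return tuple(lut_sample(l, c) for l, c in zip(luts, pixel))
```
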

> To configure this hardware block, user-space can fill a KMS blob with 
> 4096 u32
> entries, then set "lut_data" to the blob ID. Other color operation types 
> might
> have different properties.
> 
The bit-depth of the LUT is an important piece of information we should
include by default. Are we assuming that the DRM driver will always
reduce the input values to the resolution supported by the pipeline? 
This could result in differences between the hardware behavior
and the shader behavior.

Additionally, some pipelines are floating point while others are fixed. 
How would user space know if it needs to pack 32 bit integer values vs
32 bit float values?

> Here is another example with a 3D LUT:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>      ├─ "lut_size": immutable range = 33
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 43
> 
We are going to need to expose the packing order here to avoid any 
programming uncertainty. I don't think we can safely assume all hardware
is equivalent.
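As a minimal illustration of why the packing order matters (the red-fastest layout below is an assumption made for the sketch, not something the proposal has decided):

```python
def lut3d_index(r, g, b, size):
    """Flat index of lattice point (r, g, b) in a size**3 3D LUT blob,
    assuming red varies fastest. Hardware expecting a blue-fastest
    layout would read the same blob as a channel-swapped LUT, which is
    why the uAPI must document one canonical packing."""
    return (b * size + g) * size + r
```

A 17-point LUT ("lut_size" = 17) then spans flat indices 0 through 17**3 - 1 = 4912.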

> And one last example with a matrix:
> 
>      Color operation 42
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 43
> 
It is unclear to me what the default sizing of this matrix is. Any 
objections to exposing these details with an additional property?

> [Simon note: having "Bypass" in the "type" enum, and making "type" 
> mutable is
> a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> blocks which can be bypassed instead.]
> 
I favor a bypass boolean as well.

> [Jonas note: perhaps a single "data" property for both LUTs and matrices
> would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> 
> If some hardware supports re-ordering operations in the color pipeline, the
> driver can expose multiple pipelines with different operation ordering, and
> user-space can pick the ordering it prefers by selecting the right 
> pipeline.
> The same scheme can be used to expose hardware blocks supporting multiple
> precision levels.
> 
> That's pretty much all there is to it, but as always the devil is in the
> details.
> 
Dithering logic exists in some pipelines. I think we need a plan to 
expose that here as well.

> First, we realized that we need a way to indicate where the scaling 
> operation
> is happening. The contents of the framebuffer attached to the plane 
> might be
> scaled up or down depending on the CRTC_W and CRTC_H properties. 
> Depending on
> the colorspace scaling is applied in, the result will be different, so 
> we need
> a way for the kernel to indicate which hardware blocks are pre-scaling, and
> which ones are post-scaling. We introduce a special "scaling" operation 
> type,
> which is part of the pipeline like other operations but serves an 
> informational
> role only (effectively, the operation cannot be configured by 
> user-space, all
> of its properties are immutable). For example:
> 
>      Color operation 43
>      ├─ "type": immutable enum {Scaling} = Scaling
>      └─ "next": immutable color operation ID = 44
> 
> [Simon note: an alternative would be to split the color pipeline into 
> two, by
> having two plane properties ("color_pipeline_pre_scale" and
> "color_pipeline_post_scale") instead of a single one. This would be 
> similar to
> the way we want to split pre-blending and post-blending. This could be less
> expressive for drivers, there may be hardware where there are dependencies
> between the pre- and post-scaling pipeline?]
> 
As others have noted, breaking up the pipeline with immutable blocks
makes the most sense to me here. This way we don't have to predict ahead
of time every type of block that may be affected by pipeline ordering.
Splitting the pipeline into two properties now means future
logical splits would require introduction of further plane properties.
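To sketch what the immutable-marker approach looks like from user-space (colorops modeled here as plain dicts standing in for KMS objects and properties; illustrative pseudocode, not an existing API): walking the "next" links and splitting at the informational Scaling marker recovers the pre- and post-scaling halves without needing two plane properties.

```python
def split_at_scaling(colorops, head):
    """Walk a color pipeline's linked list from `head`, collecting
    colorop IDs into pre- and post-scaling lists. The informational
    "Scaling" operation is consumed as the split point."""
    pre, post = [], []
    bucket = pre
    cur = head
    while cur != 0:  # object ID 0 terminates the list
        op = colorops[cur]
        if op["type"] == "Scaling":
            bucket = post  # everything after the marker is post-scaling
        else:
            bucket.append(cur)
        cur = op["next"]
    return pre, post

# The AMD DCN 3.0 example pipeline from the RFC, reduced to type/next:
amd_pipeline = {
    42: {"type": "Matrix", "next": 43},    # input CSC
    43: {"type": "Scaling", "next": 44},
    44: {"type": "1D curve", "next": 45},  # DeGamma
    45: {"type": "Matrix", "next": 46},    # gamut remap
    46: {"type": "1D curve", "next": 47},  # shaper LUT
    47: {"type": "3D LUT", "next": 48},
    48: {"type": "1D curve", "next": 0},   # blend gamma
}
```
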

> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> contains some fixed-function blocks which convert from LMS to ICtCp and 
> cannot
> be disabled/bypassed. NVIDIA hardware has been designed for descriptive 
> APIs
> where user-space provides a high-level description of the colorspace
> conversions it needs to perform, and this is at odds with our KMS uAPI
> proposal. To address this issue, we suggest adding a special block type 
> which
> describes a fixed conversion from one colorspace to another and cannot be
> configured by user-space. Then user-space will need to accommodate its 
> pipeline
> for these special blocks. Such fixed hardware blocks need to be well enough
> documented so that they can be implemented via shaders.
> 
A few questions here. What is the current plan for documenting the 
mathematical model for each exposed block? Will each defined 'type' enum 
value be locked to a definition in the kernel documents? As an example, 
when we say '3D LUT' in this proposal, does this mean the block will 
expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a 
direct in-to-out LUT mapping?

Overall I am a fan of this proposal though. The prescriptive color 
pipeline UAPI is simple and easy to follow.

Regards,
Christopher

> We also noted that it should always be possible for user-space to 
> completely
> disable the color pipeline and switch back to bypass/identity without a
> modeset. Some drivers will need to fail atomic commits for some color
> pipelines, in particular for some specific LUT payloads. For instance, AMD
> doesn't support curves which are too steep, and Intel doesn't support 
> curves
> which decrease. This isn't something which routinely happens, but there 
> might
> be more cases where the hardware needs to reject the pipeline. Thus, when
> user-space has a running KMS color pipeline, then hits a case where the
> pipeline cannot keep running (gets rejected by the driver), user-space 
> needs to
> be able to immediately fall back to shaders without any glitch. This 
> doesn't
> seem to be an issue for AMD, Intel and NVIDIA.
> 
> This uAPI is extensible: we can add more color operations, and we can 
> add more
> properties for each color operation type. For instance, we might want to 
> add
> support for Intel piece-wise linear (PWL) 1D curves, or might want to 
> advertise
> the effective precision of the LUTs. The uAPI is deliberately somewhat 
> minimal
> to keep the scope of the proposal manageable.
> 
> Later on, we plan to re-use the same machinery for post-blending color
> pipelines. There are some more details about post-blending which have been
> separately debated at the hackfest, but we believe it's a viable plan. This
> solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT 
> properties, so
> we'd like to introduce a client cap to hide the old properties and show 
> the new
> post-blending color pipeline properties.
> 
> We envision a future user-space library to translate a high-level 
> descriptive
> color pipeline into low-level prescriptive KMS color pipeline 
> ("libliftoff but
> for color pipelines"). The library could also offer a translation into 
> shaders.
> This should help share more infrastructure between compositors and ease KMS
> offloading. This should also help dealing with the NVIDIA case.
> 
> To wrap things up, let's take a real-world example: how would gamescope [2]
> configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope 
> color
> pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> 
> AMD would expose the following objects and properties:
> 
>      Plane 10
>      ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
>      └─ "color_pipeline": enum {0, 42} = 0
>      Color operation 42 (input CSC)
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 43
>      Color operation 43
>      ├─ "type": enum {Scaling} = Scaling
>      └─ "next": immutable color operation ID = 44
>      Color operation 44 (DeGamma)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
>      └─ "next": immutable color operation ID = 45
>      Color operation 45 (gamut remap)
>      ├─ "type": enum {Bypass, Matrix} = Matrix
>      ├─ "matrix_data": blob
>      └─ "next": immutable color operation ID = 46
>      Color operation 46 (shaper LUT RAM)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 47
>      Color operation 47 (3D LUT RAM)
>      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>      ├─ "lut_size": immutable range = 17
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 48
>      Color operation 48 (blend gamma)
>      ├─ "type": enum {Bypass, 1D curve} = 1D curve
>      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
>      ├─ "lut_size": immutable range = 4096
>      ├─ "lut_data": blob
>      └─ "next": immutable color operation ID = 0
> 
> To configure the pipeline for an HDR10 PQ plane (path at the top) and a HDR
> display, gamescope would perform an atomic commit with the following 
> property
> values:
> 
>      Plane 10
>      └─ "color_pipeline" = 42
>      Color operation 42 (input CSC)
>      └─ "matrix_data" = PQ → scRGB (TF)
>      Color operation 44 (DeGamma)
>      └─ "type" = Bypass
>      Color operation 45 (gamut remap)
>      └─ "matrix_data" = scRGB (TF) → PQ
>      Color operation 46 (shaper LUT RAM)
>      └─ "lut_data" = PQ → Display native
>      Color operation 47 (3D LUT RAM)
>      └─ "lut_data" = Gamut mapping + tone mapping + night mode
>      Color operation 48 (blend gamma)
>      └─ "1d_curve_type" = PQ
> 
> I hope comparing these properties to the diagrams linked above can help
> understand how the uAPI would be used and give an idea of its viability.
> 
> Please feel free to provide feedback! It would be especially useful to have
> someone familiar with Arm SoCs look at this, to confirm that this proposal
> would work there.
> 
> Unless there is a show-stopper, we plan to follow up this RFC with
> implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> 
> Many thanks to everybody who contributed to the hackfest, on-site or 
> remotely!
> Let's work together to make this happen!
> 
> Simon, on behalf of the hackfest participants
> 
> [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> [2]: https://github.com/ValveSoftware/gamescope
> [3]: 
> https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-09 15:52   ` Christopher Braga
@ 2023-06-09 16:30     ` Simon Ser
  2023-06-09 23:11       ` Christopher Braga
  0 siblings, 1 reply; 49+ messages in thread
From: Simon Ser @ 2023-06-09 16:30 UTC (permalink / raw)
  To: Christopher Braga
  Cc: Aleix Pol, Pekka Paalanen, DRI Development, xaver.hugl,
	Michel Dänzer, wayland-devel, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Sebastian Wick, Joshua Ashton

Hi Christopher,

On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:

> > The new COLOROP objects also expose a number of KMS properties. Each has a
> > type, a reference to the next COLOROP object in the linked list, and other
> > type-specific properties. Here is an example for a 1D LUT operation:
> >
> >      Color operation 42
> >      ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >      ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> curves? Will different hardware be allowed to expose a subset of these
> enum values?

Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
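For reference, a hardcoded "sRGB" entry would presumably correspond to the standard sRGB EOTF (IEC 61966-2-1); a shader fallback for such a fixed curve could be as simple as the following sketch (assuming the enum means the plain piecewise EOTF, which the docs would need to confirm):

```python
def srgb_eotf(v):
    """sRGB electro-optical transfer function: maps a non-linearly
    encoded value in [0, 1] to linear light (IEC 61966-2-1)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4
```
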

> >      ├─ "lut_size": immutable range = 4096
> >      ├─ "lut_data": blob
> >      └─ "next": immutable color operation ID = 43
> >
> Some hardware has per channel 1D LUT values, while others use the same
> LUT for all channels.  We will definitely need to expose this in the
> UAPI in some form.

Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
to get exposed as another color operation block.

> > To configure this hardware block, user-space can fill a KMS blob with
> > 4096 u32
> > entries, then set "lut_data" to the blob ID. Other color operation types
> > might
> > have different properties.
> >
> The bit-depth of the LUT is an important piece of information we should
> include by default. Are we assuming that the DRM driver will always
> reduce the input values to the resolution supported by the pipeline?
> This could result in differences between the hardware behavior
> and the shader behavior.
> 
> Additionally, some pipelines are floating point while others are fixed.
> How would user space know if it needs to pack 32 bit integer values vs
> 32 bit float values?

Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
definition of LUT blob (u16 elements) and it's up to the driver to convert.

Using a very precise format for the uAPI has the nice property of making the
uAPI much simpler to use. User-space sends high precision data and it's up to
drivers to map that to whatever the hardware accepts.

Exposing the actual hardware precision is something we've talked about during
the hackfest. It'll probably be useful to some extent, but will require some
discussion to figure out how to design the uAPI. Maybe a simple property is
enough, maybe not (e.g. fully describing the precision of segmented LUTs would
probably be trickier).

I'd rather keep things simple for the first pass, we can always add more
properties for bit depth etc later on.
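A hypothetical illustration of why the effective precision matters for a bit-exact shader fallback: if the driver maps u16 blob entries onto, say, a 10-bit hardware LUT (round-to-nearest is assumed below; real drivers may truncate or dither, which is exactly the ambiguity Christopher points at), the round trip is off by up to roughly half a hardware step.

```python
def quantize_u16(value, hw_bits):
    """Round a u16 LUT entry to the nearest value representable by an
    hw_bits-deep hardware LUT, re-expanded to the u16 domain.
    Hypothetical driver behavior, for illustration only."""
    if hw_bits >= 16:
        return value
    full = (1 << 16) - 1          # u16 full scale
    hw_max = (1 << hw_bits) - 1
    code = round(value * hw_max / full)   # nearest hardware code
    return round(code * full / hw_max)    # back to the u16 domain

# Worst-case round-trip error for a 10-bit LUT, sampled coarsely:
err = max(abs(v - quantize_u16(v, 10)) for v in range(0, 65536, 257))
```
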

> > Here is another example with a 3D LUT:
> >
> >      Color operation 42
> >      ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >      ├─ "lut_size": immutable range = 33
> >      ├─ "lut_data": blob
> >      └─ "next": immutable color operation ID = 43
> >
> We are going to need to expose the packing order here to avoid any
> programming uncertainty. I don't think we can safely assume all hardware
> is equivalent.

The driver can easily change the layout of the LUT data and do any conversion
necessary when programming the hardware. We do need to document what layout is
used in the uAPI for sure.

> > And one last example with a matrix:
> >
> >      Color operation 42
> >      ├─ "type": enum {Bypass, Matrix} = Matrix
> >      ├─ "matrix_data": blob
> >      └─ "next": immutable color operation ID = 43
> >
> It is unclear to me what the default sizing of this matrix is. Any
> objections to exposing these details with an additional property?

The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
that wouldn't be enough?
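For context, the CTM fixed-point encoding can be sketched as follows (this mirrors my understanding of the existing `drm_color_ctm` format, i.e. sign-magnitude S31.32 with bit 63 as the sign, not two's complement; verify against the kernel documentation before relying on it):

```python
def ctm_encode(value):
    """Encode a float as an S31.32 sign-magnitude u64, as used by the
    existing CTM blob (9 such values, row-major)."""
    sign = 1 << 63 if value < 0 else 0
    magnitude = int(round(abs(value) * (1 << 32)))
    return sign | magnitude

def ctm_decode(word):
    """Inverse of ctm_encode, for a round-trip sanity check."""
    sign = -1.0 if word & (1 << 63) else 1.0
    return sign * (word & ((1 << 63) - 1)) / (1 << 32)
```
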

> Dithering logic exists in some pipelines. I think we need a plan to
> expose that here as well.

Hm, I'm not too familiar with dithering. Do you think it would make sense to
expose as an additional colorop block? Do you think it would have more
consequences on the design?

I want to reiterate that we don't need to ship all features from day 1. We
just need to come up with a uAPI design on which new features can be built.

> > [Simon note: an alternative would be to split the color pipeline into
> > two, by
> > having two plane properties ("color_pipeline_pre_scale" and
> > "color_pipeline_post_scale") instead of a single one. This would be
> > similar to
> > the way we want to split pre-blending and post-blending. This could be less
> > expressive for drivers, there may be hardware where there are dependencies
> > between the pre- and post-scaling pipeline?]
> >
> As others have noted, breaking up the pipeline with immutable blocks
> makes the most sense to me here. This way we don't have to predict ahead
> > of time every type of block that may be affected by pipeline ordering.
> Splitting the pipeline into two properties now means future
> logical splits would require introduction of further plane properties.

Right, if there are more "breaking points", then we'll need immutable blocks
anyways.

> > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > contains some fixed-function blocks which convert from LMS to ICtCp and
> > cannot
> > be disabled/bypassed. NVIDIA hardware has been designed for descriptive
> > APIs
> > where user-space provides a high-level description of the colorspace
> > conversions it needs to perform, and this is at odds with our KMS uAPI
> > proposal. To address this issue, we suggest adding a special block type
> > which
> > describes a fixed conversion from one colorspace to another and cannot be
> > configured by user-space. Then user-space will need to accommodate its
> > pipeline
> > for these special blocks. Such fixed hardware blocks need to be well enough
> > documented so that they can be implemented via shaders.
> >
> A few questions here. What is the current plan for documenting the
> mathematical model for each exposed block? Will each defined 'type' enum
> value be locked to a definition in the kernel documents? As an example,
> when we say '3D LUT' in this proposal does this mean the block will
> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
> direct in to out LUT mapping?

I think we'll want to document these things, yes. We do want to give _some_
slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
hardware segmented LUTs with a different number of elements per LUT segment.
But being mathematically precise (probably with formulae in the docs) is
definitely a goal, and absolutely necessary to implement a shader-based
fallback.

> Overall I am a fan of this proposal though. The prescriptive color
> pipeline UAPI is simple and easy to follow.

Thank you for the comments! Let me know if you disagree with some of the above,
or if my answers are unclear.


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-09 16:30     ` Simon Ser
@ 2023-06-09 23:11       ` Christopher Braga
  2023-06-12  9:21         ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Christopher Braga @ 2023-06-09 23:11 UTC (permalink / raw)
  To: Simon Ser
  Cc: Aleix Pol, Pekka Paalanen, DRI Development, xaver.hugl,
	Michel Dänzer, wayland-devel, Melissa Wen, Jonas Ådahl,
	Uma Shankar, Victoria Brekenfeld, Sebastian Wick, Joshua Ashton



On 6/9/2023 12:30 PM, Simon Ser wrote:
> Hi Christopher,
> 
> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
> 
>>> The new COLOROP objects also expose a number of KMS properties. Each has a
>>> type, a reference to the next COLOROP object in the linked list, and other
>>> type-specific properties. Here is an example for a 1D LUT operation:
>>>
>>>       Color operation 42
>>>       ├─ "type": enum {Bypass, 1D curve} = 1D curve
>>>       ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
>> curves? Will different hardware be allowed to expose a subset of these
>> enum values?
> 
> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> 
>>>       ├─ "lut_size": immutable range = 4096
>>>       ├─ "lut_data": blob
>>>       └─ "next": immutable color operation ID = 43
>>>
>> Some hardware has per channel 1D LUT values, while others use the same
>> LUT for all channels.  We will definitely need to expose this in the
>> UAPI in some form.
> 
> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
> to get exposed as another color operation block.
> 
>>> To configure this hardware block, user-space can fill a KMS blob with
>>> 4096 u32
>>> entries, then set "lut_data" to the blob ID. Other color operation types
>>> might
>>> have different properties.
>>>
>> The bit-depth of the LUT is an important piece of information we should
>> include by default. Are we assuming that the DRM driver will always
>> reduce the input values to the resolution supported by the pipeline?
>> This could result in differences between the hardware behavior
>> and the shader behavior.
>>
>> Additionally, some pipelines are floating point while others are fixed.
>> How would user space know if it needs to pack 32 bit integer values vs
>> 32 bit float values?
> 
> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
> definition of LUT blob (u16 elements) and it's up to the driver to convert.
> 
> Using a very precise format for the uAPI has the nice property of making the
> uAPI much simpler to use. User-space sends high precision data and it's up to
> drivers to map that to whatever the hardware accepts.
>
Conversion from a larger uint type to a smaller type sounds low-effort; 
however, if a block works in a floating-point space, things are going to 
get messy really quickly. If the block operates in FP16 space and the 
interface is 16 bits we are good, but going from 32 bits to FP16 (such 
as in the matrix case or 3D LUT) is less than ideal.
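To make the FP16 concern concrete, here is a small sketch using Python's half-precision struct code (illustrative only): two values that are distinct as 32-bit inputs collapse to the same FP16 code, so a shader fallback computing at higher precision would not match the hardware bit-for-bit.

```python
import struct

def to_fp16_bits(x):
    """Round a value to IEEE 754 half precision and return the raw
    bits, roughly what a driver must do to feed a 32-bit blob entry
    into an FP16 pipeline stage."""
    return struct.unpack('<H', struct.pack('<e', x))[0]

def from_fp16_bits(bits):
    return struct.unpack('<e', struct.pack('<H', bits))[0]

# FP16 has an 11-bit significand; near 0.1 one ulp is 2**-14 ~= 6.1e-5,
# so inputs closer together than half an ulp quantize to the same code.
collapsed = to_fp16_bits(0.1) == to_fp16_bits(0.100001)
```
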

> Exposing the actual hardware precision is something we've talked about during
> the hackfest. It'll probably be useful to some extent, but will require some
> discussion to figure out how to design the uAPI. Maybe a simple property is
> enough, maybe not (e.g. fully describing the precision of segmented LUTs would
> probably be trickier).
> 
> I'd rather keep things simple for the first pass, we can always add more
> properties for bit depth etc later on.
> 
Indicating whether a block operates on fixed-point vs. floating-point 
values is significant enough that I think we should account for it in 
the initial design. It affects both the user-space value packing and 
the expected value ranges in the hardware.

>>> Here is another example with a 3D LUT:
>>>
>>>       Color operation 42
>>>       ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>>>       ├─ "lut_size": immutable range = 33
>>>       ├─ "lut_data": blob
>>>       └─ "next": immutable color operation ID = 43
>>>
>> We are going to need to expose the packing order here to avoid any
>> programming uncertainty. I don't think we can safely assume all hardware
>> is equivalent.
> 
> The driver can easily change the layout of the LUT data and do any conversion
> necessary when programming the hardware. We do need to document what layout is
> used in the uAPI for sure.
> 
>>> And one last example with a matrix:
>>>
>>>       Color operation 42
>>>       ├─ "type": enum {Bypass, Matrix} = Matrix
>>>       ├─ "matrix_data": blob
>>>       └─ "next": immutable color operation ID = 43
>>>
>> It is unclear to me what the default sizing of this matrix is. Any
>> objections to exposing these details with an additional property?
> 
> The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
> that wouldn't be enough?

Larger cases do exist, but as you mention those can be resolved with a 
different type later. I don't have any issues with the default 'Matrix' 
type being 9 entries.

> 
>> Dithering logic exists in some pipelines. I think we need a plan to
>> expose that here as well.
> 
> Hm, I'm not too familiar with dithering. Do you think it would make sense to
> expose as an additional colorop block? Do you think it would have more
> consequences on the design?
> 
> I want to reiterate that we don't need to ship all features from day 1. We
> just need to come up with a uAPI design on which new features can be built.
> 

Agreed. I don't think this will affect the proposed design so this can 
be figured out once we have a DRM driver impl that declares this block.

>>> [Simon note: an alternative would be to split the color pipeline into
>>> two, by
>>> having two plane properties ("color_pipeline_pre_scale" and
>>> "color_pipeline_post_scale") instead of a single one. This would be
>>> similar to
>>> the way we want to split pre-blending and post-blending. This could be less
>>> expressive for drivers, there may be hardware where there are dependencies
>>> between the pre- and post-scaling pipeline?]
>>>
>> As others have noted, breaking up the pipeline with immutable blocks
>> makes the most sense to me here. This way we don't have to predict ahead
>> of time every type of block that may be affected by pipeline ordering.
>> Splitting the pipeline into two properties now means future
>> logical splits would require introduction of further plane properties.
> 
> Right, if there are more "breaking points", then we'll need immutable blocks
> anyways.
> 
>>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
>>> contains some fixed-function blocks which convert from LMS to ICtCp and
>>> cannot
>>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
>>> APIs
>>> where user-space provides a high-level description of the colorspace
>>> conversions it needs to perform, and this is at odds with our KMS uAPI
>>> proposal. To address this issue, we suggest adding a special block type
>>> which
>>> describes a fixed conversion from one colorspace to another and cannot be
>>> configured by user-space. Then user-space will need to accommodate its
>>> pipeline
>>> for these special blocks. Such fixed hardware blocks need to be well enough
>>> documented so that they can be implemented via shaders.
>>>
>> A few questions here. What is the current plan for documenting the
>> mathematical model for each exposed block? Will each defined 'type' enum
>> value be locked to a definition in the kernel documents? As an example,
>> when we say '3D LUT' in this proposal does this mean the block will
>> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
>> direct in to out LUT mapping?
> 
> I think we'll want to document these things, yes. We do want to give _some_
> slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
> hardware segmented LUTs with a different number of elements per LUT segment.
> But being mathematically precise (probably with formulae in the docs) is
> definitely a goal, and absolutely necessary to implement a shader-based
> fallback.

I agree some driver slack is necessary; however, ideally this will be 
locked down enough that from the compositor side they see "1D LUT" and 
know exactly what to expect, independent of the hardware. This way, 
regardless of whether I am running on an NVIDIA / AMD / QCOM / etc. 
chip, common color pipeline strategies can be used, assuming a perfect 
world where there is a workable overlap between chips, of course.

Anyway, this isn't something we need to hammer down right this moment.

Regards,
Christopher
> 
>> Overall I am a fan of this proposal though. The prescriptive color
>> pipeline UAPI is simple and easy to follow.
> 
> Thank you for the comments! Let me know if you disagree with some of the above,
> or if my answers are unclear.


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-09 23:11       ` Christopher Braga
@ 2023-06-12  9:21         ` Pekka Paalanen
  2023-06-12 16:56           ` Christopher Braga
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-06-12  9:21 UTC (permalink / raw)
  To: Christopher Braga
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton

[-- Attachment #1: Type: text/plain, Size: 11501 bytes --]

On Fri, 9 Jun 2023 19:11:25 -0400
Christopher Braga <quic_cbraga@quicinc.com> wrote:

> On 6/9/2023 12:30 PM, Simon Ser wrote:
> > Hi Christopher,
> > 
> > On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >   
> >>> The new COLOROP objects also expose a number of KMS properties. Each has a
> >>> type, a reference to the next COLOROP object in the linked list, and other
> >>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>
> >>>       Color operation 42
> >>>       ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>>       ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT  
> >> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >> curves? Will different hardware be allowed to expose a subset of these
> >> enum values?  
> > 
> > Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> >   
> >>>       ├─ "lut_size": immutable range = 4096
> >>>       ├─ "lut_data": blob
> >>>       └─ "next": immutable color operation ID = 43
> >>>  
> >> Some hardware has per channel 1D LUT values, while others use the same
> >> LUT for all channels.  We will definitely need to expose this in the
> >> UAPI in some form.  
> > 
> > Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
> > DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
> > to get exposed as another color operation block.
> >   
> >>> To configure this hardware block, user-space can fill a KMS blob with
> >>> 4096 u32
> >>> entries, then set "lut_data" to the blob ID. Other color operation types
> >>> might
> >>> have different properties.
> >>>  
> >> The bit-depth of the LUT is an important piece of information we should
> >> include by default. Are we assuming that the DRM driver will always
> >> reduce the input values to the resolution supported by the pipeline?
> >> This could result in differences between the hardware behavior
> >> and the shader behavior.
> >>
> >> Additionally, some pipelines are floating point while others are fixed.
> >> How would user space know if it needs to pack 32 bit integer values vs
> >> 32 bit float values?  
> > 
> > Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
> > definition of LUT blob (u16 elements) and it's up to the driver to convert.
> > 
> > Using a very precise format for the uAPI has the nice property of making the
> > uAPI much simpler to use. User-space sends high precision data and it's up to
> > drivers to map that to whatever the hardware accepts.
> >  
> Conversion from a larger uint type to a smaller type sounds low effort, 
> however if a block works in a floating point space things are going to 
> get messy really quickly. If the block operates in FP16 space and the 
> interface is 16 bits we are good, but going from 32 bits to FP16 (such 
> as in the matrix case or 3DLUT) is less than ideal.

Hi Christopher,

are you thinking of precision loss, or the overhead of conversion?

Conversion from N-bit fixed point to N-bit floating-point is generally
lossy, too, and the other direction as well.

What exactly would be messy?

> 
> > Exposing the actual hardware precision is something we've talked about during
> > the hackfest. It'll probably be useful to some extent, but will require some
> > discussion to figure out how to design the uAPI. Maybe a simple property is
> > enough, maybe not (e.g. fully describing the precision of segmented LUTs would
> > probably be trickier).
> > 
> > I'd rather keep things simple for the first pass, we can always add more
> > properties for bit depth etc later on.
> >   
> Indicating if a block operates on / with fixed vs float values is 
> significant enough that I think we should account for this in initial 
> design. It will have a affect on both the user space value packing + 
> expected value ranges in the hardware.

What do you mean by "value packing"? Memory layout of the bits forming
a value? Or possible exact values of a specific type?

I don't think fixed vs. float is the most important thing. Even fixed
point formats can have different numbers of bits for whole numbers,
which changes the usable value range and not only precision. Userspace
at the very least needs to know the usable value range for the block's
inputs, outputs, and parameters.

When defining the precision for inputs, outputs and parameters, then
fixed- vs. floating-point becomes meaningful in explaining what "N bits
of precision" means.

Then there is the question of variable precision that depends on the
actual block input and parameter values, how to represent that. Worst
case precision might be too pessimistic alone.

> >>> Here is another example with a 3D LUT:
> >>>
> >>>       Color operation 42
> >>>       ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >>>       ├─ "lut_size": immutable range = 33
> >>>       ├─ "lut_data": blob
> >>>       └─ "next": immutable color operation ID = 43
> >>>  
> >> We are going to need to expose the packing order here to avoid any
> >> programming uncertainty. I don't think we can safely assume all hardware
> >> is equivalent.  
> > 
> > The driver can easily change the layout of the matrix and do any conversion
> > necessary when programming the hardware. We do need to document what layout is
> > used in the uAPI for sure.
> >   
> >>> And one last example with a matrix:
> >>>
> >>>       Color operation 42
> >>>       ├─ "type": enum {Bypass, Matrix} = Matrix
> >>>       ├─ "matrix_data": blob
> >>>       └─ "next": immutable color operation ID = 43
> >>>  
> >> It is unclear to me what the default sizing of this matrix is. Any
> >> objections to exposing these details with an additional property?  
> > 
> > The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
> > that wouldn't be enough?  
> 
> Larger cases do exist, but as you mention this can be resolved with a 
> different type then. I don't have any issues with the default 'Matrix' 
> type being 9 entries.

Please, tell us more. How big, and what are they used for?

IIRC ICC has 3x3 matrix + offset vector. Do you have even more?
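
As a point of reference, the S31.32 sign-magnitude fixed-point encoding
used by the existing CTM property can be sketched like this (illustrative
userspace helpers, not taken from any driver):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helpers for the S31.32 sign-magnitude encoding used by
 * the existing CTM property: bit 63 is the sign, the low 63 bits hold
 * the magnitude with 32 fractional bits.
 */
static uint64_t s31_32_from_double(double v)
{
	uint64_t sign = 0;

	if (v < 0) {
		sign = 1ULL << 63;
		v = -v;
	}
	return sign | (uint64_t)(v * 4294967296.0); /* v * 2^32 */
}

static double s31_32_to_double(uint64_t v)
{
	double m = (double)(v & ~(1ULL << 63)) / 4294967296.0;

	return (v & (1ULL << 63)) ? -m : m;
}
```

An identity CTM would then be nine such values with
s31_32_from_double(1.0), i.e. 0x100000000, on the diagonal.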


> >   
> >> Dithering logic exists in some pipelines. I think we need a plan to
> >> expose that here as well.  
> > 
> > Hm, I'm not too familiar with dithering. Do you think it would make sense to
> > expose as an additional colorop block? Do you think it would have more
> > consequences on the design?

I think it would be an additional block with no other consequences, be
it temporal and/or spatial dithering, as long as it does not look at
neighbouring pixels to determine the output for the current pixel.
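
For illustration, a spatial dither of that kind could be as simple as
the following hypothetical block (not modeled on any particular
hardware): quantizing an 8-bit value to 6 bits with a 2x2 Bayer
threshold matrix, where the output depends only on the pixel's own
value and coordinates:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical spatial dithering block: quantize an 8-bit channel value
 * to 6 bits using a 2x2 Bayer matrix. The result depends only on the
 * pixel's own value and coordinates, never on neighbouring pixels.
 */
static uint8_t dither_8_to_6(uint8_t v, int x, int y)
{
	static const uint8_t bayer[2][2] = {
		{ 0, 2 },
		{ 3, 1 },
	};
	unsigned int t = v + bayer[y & 1][x & 1]; /* threshold spans the 2 dropped bits */

	if (t > 255)
		t = 255;
	return (uint8_t)(t >> 2);
}
```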

> > 
> > I want to re-iterate that we don't need to ship all features from day 1. We
> > just need to come up with a uAPI design on which new features can be built on.
> >   
> 
> Agreed. I don't think this will affect the proposed design so this can 
> be figured out once we have a DRM driver impl that declares this block.
> 
> >>> [Simon note: an alternative would be to split the color pipeline into
> >>> two, by
> >>> having two plane properties ("color_pipeline_pre_scale" and
> >>> "color_pipeline_post_scale") instead of a single one. This would be
> >>> similar to
> >>> the way we want to split pre-blending and post-blending. This could be less
> >>> expressive for drivers, there may be hardware where there are dependencies
> >>> between the pre- and post-scaling pipeline?]
> >>>  
> >> As others have noted, breaking up the pipeline with immutable blocks
> >> makes the most sense to me here. This way we don't have to predict ahead
> >> of time every type of block that maybe affected by pipeline ordering.
> >> Splitting the pipeline into two properties now means future
> >> logical splits would require introduction of further plane properties.  
> > 
> > Right, if there are more "breaking points", then we'll need immutable blocks
> > anyways.
> >   
> >>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> >>> contains some fixed-function blocks which convert from LMS to ICtCp and
> >>> cannot
> >>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
> >>> APIs
> >>> where user-space provides a high-level description of the colorspace
> >>> conversions it needs to perform, and this is at odds with our KMS uAPI
> >>> proposal. To address this issue, we suggest adding a special block type
> >>> which
> >>> describes a fixed conversion from one colorspace to another and cannot be
> >>> configured by user-space. Then user-space will need to accomodate its
> >>> pipeline
> >>> for these special blocks. Such fixed hardware blocks need to be well enough
> >>> documented so that they can be implemented via shaders.
> >>>  
> >> A few questions here. What is the current plan for documenting the
> >> mathematical model for each exposed block? Will each defined 'type' enum
> >> value be locked to a definition in the kernel documents? As an example,
> >> when we say '3D LUT' in this proposal does this mean the block will
> >> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
> >> direct in to out LUT mapping?  
> > 
> > I think we'll want to document these things, yes. We do want to give _some_
> > slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
> > hardware segmented LUTs with a different number of elements per LUT segment.
> > But being mathematically precise (probably with formulae in the docs) is
> > definitely a goal, and absolutely necessary to implement a shader-based
> > fallback.  
> 
> I agree some driver slack is necessary, however ideally this will be 
> locked down enough that from the compositor side they see "1D LUT" and 
> know exactly what to expect independent of the hardware. This way 
> regardless of if I am running on a NVIDIA / AMD / QCOM / etc... chip, 
> common color pipeline strategies can be used. Assuming a perfect world 
> where there is a workable overlap between chips of course.

Yes, of course, at least for a start.

However, the long term plan includes a shared userspace library with
driver- and hardware-specific knowledge to use hardware- and
driver-specific blocks. All blocks still need to be explicitly
specified in the kernel UAPI documentation; the idea is that it should
not be a problem for a vendor to have blocks that no-one else does. The
library would offer a much more generic API, and use snowflake blocks
to their fullest. The library would also spit out OpenGL shaders and
whatnot for the fallback.

The future in the long term could be either way: evolving towards
generic KMS UAPI blocks with no need for a userspace library
abstraction, or evolving towards hardware-specific KMS UAPI blocks with
a userspace library to abstract them like Mesa does for GPUs.


Thanks,
pq

> Anyways, this isn't something we need to hammer down right this moment.
> 
> Regards,
> Christopher
> >   
> >> Overall I am a fan of this proposal though. The prescriptive color
> >> pipeline UAPI is simple and easy to follow.  
> > 
> > Thank you for the comments! Let me know if you disagree with some of the above,
> > or if my answers are unclear.  




* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-12  9:21         ` Pekka Paalanen
@ 2023-06-12 16:56           ` Christopher Braga
  2023-06-13  8:23             ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Christopher Braga @ 2023-06-12 16:56 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton



On 6/12/2023 5:21 AM, Pekka Paalanen wrote:
> On Fri, 9 Jun 2023 19:11:25 -0400
> Christopher Braga <quic_cbraga@quicinc.com> wrote:
> 
>> On 6/9/2023 12:30 PM, Simon Ser wrote:
>>> Hi Christopher,
>>>
>>> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
>>>    
>>>>> The new COLOROP objects also expose a number of KMS properties. Each has a
>>>>> type, a reference to the next COLOROP object in the linked list, and other
>>>>> type-specific properties. Here is an example for a 1D LUT operation:
>>>>>
>>>>>        Color operation 42
>>>>>        ├─ "type": enum {Bypass, 1D curve} = 1D curve
>>>>>        ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>>>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
>>>> curves? Will different hardware be allowed to expose a subset of these
>>>> enum values?
>>>
>>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
>>>    
>>>>>        ├─ "lut_size": immutable range = 4096
>>>>>        ├─ "lut_data": blob
>>>>>        └─ "next": immutable color operation ID = 43
>>>>>   
>>>> Some hardware has per channel 1D LUT values, while others use the same
>>>> LUT for all channels.  We will definitely need to expose this in the
>>>> UAPI in some form.
>>>
>>> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
>>> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
>>> to get exposed as another color operation block.
>>>    
>>>>> To configure this hardware block, user-space can fill a KMS blob with
>>>>> 4096 u32
>>>>> entries, then set "lut_data" to the blob ID. Other color operation types
>>>>> might
>>>>> have different properties.
>>>>>   
>>>> The bit-depth of the LUT is an important piece of information we should
>>>> include by default. Are we assuming that the DRM driver will always
>>>> reduce the input values to the resolution supported by the pipeline?
>>>> This could result in differences between the hardware behavior
>>>> and the shader behavior.
>>>>
>>>> Additionally, some pipelines are floating point while others are fixed.
>>>> How would user space know if it needs to pack 32 bit integer values vs
>>>> 32 bit float values?
>>>
>>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
>>> definition of LUT blob (u16 elements) and it's up to the driver to convert.
>>>
>>> Using a very precise format for the uAPI has the nice property of making the
>>> uAPI much simpler to use. User-space sends high precision data and it's up to
>>> drivers to map that to whatever the hardware accepts.
>>>   
>> Conversion from a larger uint type to a smaller type sounds low effort,
>> however if a block works in a floating point space things are going to
>> get messy really quickly. If the block operates in FP16 space and the
>> interface is 16 bits we are good, but going from 32 bits to FP16 (such
>> as in the matrix case or 3DLUT) is less than ideal.
> 
> Hi Christopher,
> 
> are you thinking of precision loss, or the overhead of conversion?
> 
> Conversion from N-bit fixed point to N-bit floating-point is generally
> lossy, too, and the other direction as well.
> 
> What exactly would be messy?
> 
Overhead of conversion is the primary concern here. Having to extract 
and / or calculate the significand + exponent components in the kernel 
is burdensome and imo a task better suited for user space. This also has 
to be done on every blob set, meaning that if user space is re-using 
pre-calculated blobs we would be repeating the same conversion 
operations in kernel space unnecessarily.

I agree that the precision loss and rounding caused by normalizing the 
value can't be avoided.

We should also consider the fact that float pipelines have been known to 
use the scrgb definition for floating point values 
(https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt). 
In cases like this where there may be an expected value range in the 
pipeline, how to normalize a larger input becomes a little confusing. 
E.g. does U32 MAX become FP16 MAX or value MAX (i.e. 127)?

>>
>>> Exposing the actual hardware precision is something we've talked about during
>>> the hackfest. It'll probably be useful to some extent, but will require some
>>> discussion to figure out how to design the uAPI. Maybe a simple property is
>>> enough, maybe not (e.g. fully describing the precision of segmented LUTs would
>>> probably be trickier).
>>>
>>> I'd rather keep things simple for the first pass, we can always add more
>>> properties for bit depth etc later on.
>>>    
>> Indicating if a block operates on / with fixed vs float values is
>> significant enough that I think we should account for this in initial
>> design. It will have a affect on both the user space value packing +
>> expected value ranges in the hardware.
> 
> What do you mean by "value packing"? Memory layout of the bits forming
> a value? Or possible exact values of a specific type?
>
Both really. If the kernel is provided a U32 value, we need to know if 
this is a plain U32 value or a float packed into a U32 container. 
Likewise, as mentioned with scRGB above, float could even change the 
value range expectations.

> I don't think fixed vs. float is the most important thing. Even fixed
> point formats can have different numbers of bits for whole numbers,
> which changes the usable value range and not only precision. Userspace
> at the very least needs to know the usable value range for the block's
> inputs, outputs, and parameters.
> 
> When defining the precision for inputs, outputs and parameters, then
> fixed- vs. floating-point becomes meaningful in explaining what "N bits
> of precision" means.
> 
> Then there is the question of variable precision that depends on the
> actual block input and parameter values, how to represent that. Worst
> case precision might be too pessimistic alone.
> 
Agreed. More information is probably needed to fully define the 
interface expectations.

>>>>> Here is another example with a 3D LUT:
>>>>>
>>>>>        Color operation 42
>>>>>        ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>>>>>        ├─ "lut_size": immutable range = 33
>>>>>        ├─ "lut_data": blob
>>>>>        └─ "next": immutable color operation ID = 43
>>>>>   
>>>> We are going to need to expose the packing order here to avoid any
>>>> programming uncertainty. I don't think we can safely assume all hardware
>>>> is equivalent.
>>>
>>> The driver can easily change the layout of the matrix and do any conversion
>>> necessary when programming the hardware. We do need to document what layout is
>>> used in the uAPI for sure.
>>>    
>>>>> And one last example with a matrix:
>>>>>
>>>>>        Color operation 42
>>>>>        ├─ "type": enum {Bypass, Matrix} = Matrix
>>>>>        ├─ "matrix_data": blob
>>>>>        └─ "next": immutable color operation ID = 43
>>>>>   
>>>> It is unclear to me what the default sizing of this matrix is. Any
>>>> objections to exposing these details with an additional property?
>>>
>>> The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
>>> that wouldn't be enough?
>>
>> Larger cases do exist, but as you mention this can be resolved with a
>> different type then. I don't have any issues with the default 'Matrix'
>> type being 9 entries.
> 
> Please, tell us more. How big, and what are they used for?
> 
> IIRC ICC has 3x3 matrix + offset vector. Do you have even more?
> 
> 
Offset is one. Range adjustment 'vector' is another. But ultimately this 
proposal is flexible enough that this can probably just be another color 
block in the pipeline. No complaints from me here.

>>>    
>>>> Dithering logic exists in some pipelines. I think we need a plan to
>>>> expose that here as well.
>>>
>>> Hm, I'm not too familiar with dithering. Do you think it would make sense to
>>> expose as an additional colorop block? Do you think it would have more
>>> consequences on the design?
> 
> I think it would be an additional block, and no other consequences, be
> it temporal and/or spatial dithering, as long as it does not look at
> neighbouring pixels to determine the output for current pixel.
> 
>>>
>>> I want to re-iterate that we don't need to ship all features from day 1. We
>>> just need to come up with a uAPI design on which new features can be built on.
>>>    
>>
>> Agreed. I don't think this will affect the proposed design so this can
>> be figured out once we have a DRM driver impl that declares this block.
>>
>>>>> [Simon note: an alternative would be to split the color pipeline into
>>>>> two, by
>>>>> having two plane properties ("color_pipeline_pre_scale" and
>>>>> "color_pipeline_post_scale") instead of a single one. This would be
>>>>> similar to
>>>>> the way we want to split pre-blending and post-blending. This could be less
>>>>> expressive for drivers, there may be hardware where there are dependencies
>>>>> between the pre- and post-scaling pipeline?]
>>>>>   
>>>> As others have noted, breaking up the pipeline with immutable blocks
>>>> makes the most sense to me here. This way we don't have to predict ahead
>>>> of time every type of block that maybe affected by pipeline ordering.
>>>> Splitting the pipeline into two properties now means future
>>>> logical splits would require introduction of further plane properties.
>>>
>>> Right, if there are more "breaking points", then we'll need immutable blocks
>>> anyways.
>>>    
>>>>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
>>>>> contains some fixed-function blocks which convert from LMS to ICtCp and
>>>>> cannot
>>>>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
>>>>> APIs
>>>>> where user-space provides a high-level description of the colorspace
>>>>> conversions it needs to perform, and this is at odds with our KMS uAPI
>>>>> proposal. To address this issue, we suggest adding a special block type
>>>>> which
>>>>> describes a fixed conversion from one colorspace to another and cannot be
>>>>> configured by user-space. Then user-space will need to accomodate its
>>>>> pipeline
>>>>> for these special blocks. Such fixed hardware blocks need to be well enough
>>>>> documented so that they can be implemented via shaders.
>>>>>   
>>>> A few questions here. What is the current plan for documenting the
>>>> mathematical model for each exposed block? Will each defined 'type' enum
>>>> value be locked to a definition in the kernel documents? As an example,
>>>> when we say '3D LUT' in this proposal does this mean the block will
>>>> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
>>>> direct in to out LUT mapping?
>>>
>>> I think we'll want to document these things, yes. We do want to give _some_
>>> slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
>>> hardware segmented LUTs with a different number of elements per LUT segment.
>>> But being mathematically precise (probably with formulae in the docs) is
>>> definitely a goal, and absolutely necessary to implement a shader-based
>>> fallback.
>>
>> I agree some driver slack is necessary, however ideally this will be
>> locked down enough that from the compositor side they see "1D LUT" and
>> know exactly what to expect independent of the hardware. This way
>> regardless of if I am running on a NVIDIA / AMD / QCOM / etc... chip,
>> common color pipeline strategies can be used. Assuming a perfect world
>> where there is a workable overlap between chips of course.
> 
> Yes, of course, at least for a start.
> 
> However, the long term plan includes a shared userspace library with
> driver- and hardware-specific knowledge to use hardware- and
> driver-specific blocks. All blocks still need to be explicitly
> specified in the kernel UAPI documentation, the idea is that it should
> not be a problem for many vendors to have blocks no-one else does. The
> library would offer a much more generic API, and use snowflake blocks
> to their fullest. The library would also spit out OpenGL shaders and
> whatnot for the fallback.
> 
> The future in the long term could be either way: evolving towards
> generic KMS UAPI blocks with no need for a userspace library
> abstraction, or evolving towards hardware-specific KMS UAPI blocks with
> a userspace library to abstract them like Mesa does for GPUs.
> 
Sounds good to me!

Thanks,
Christopher

> 
> Thanks,
> pq
> 
>> Anyways, this isn't something we need to hammer down right this moment.
>>
>> Regards,
>> Christopher
>>>    
>>>> Overall I am a fan of this proposal though. The prescriptive color
>>>> pipeline UAPI is simple and easy to follow.
>>>
>>> Thank you for the comments! Let me know if you disagree with some of the above,
>>> or if my answers are unclear.
> 


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-12 16:56           ` Christopher Braga
@ 2023-06-13  8:23             ` Pekka Paalanen
  2023-06-13 16:29               ` Christopher Braga
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-06-13  8:23 UTC (permalink / raw)
  To: Christopher Braga
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton

[-- Attachment #1: Type: text/plain, Size: 16919 bytes --]

On Mon, 12 Jun 2023 12:56:57 -0400
Christopher Braga <quic_cbraga@quicinc.com> wrote:

> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:
> > On Fri, 9 Jun 2023 19:11:25 -0400
> > Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >   
> >> On 6/9/2023 12:30 PM, Simon Ser wrote:  
> >>> Hi Christopher,
> >>>
> >>> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >>>      
> >>>>> The new COLOROP objects also expose a number of KMS properties. Each has a
> >>>>> type, a reference to the next COLOROP object in the linked list, and other
> >>>>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>>>
> >>>>>        Color operation 42
> >>>>>        ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>>>>        ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT  
> >>>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >>>> curves? Will different hardware be allowed to expose a subset of these
> >>>> enum values?  
> >>>
> >>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> >>>      
> >>>>>        ├─ "lut_size": immutable range = 4096
> >>>>>        ├─ "lut_data": blob
> >>>>>        └─ "next": immutable color operation ID = 43
> >>>>>     
> >>>> Some hardware has per channel 1D LUT values, while others use the same
> >>>> LUT for all channels.  We will definitely need to expose this in the
> >>>> UAPI in some form.  
> >>>
> >>> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
> >>> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
> >>> to get exposed as another color operation block.
> >>>      
> >>>>> To configure this hardware block, user-space can fill a KMS blob with
> >>>>> 4096 u32
> >>>>> entries, then set "lut_data" to the blob ID. Other color operation types
> >>>>> might
> >>>>> have different properties.
> >>>>>     
> >>>> The bit-depth of the LUT is an important piece of information we should
> >>>> include by default. Are we assuming that the DRM driver will always
> >>>> reduce the input values to the resolution supported by the pipeline?
> >>>> This could result in differences between the hardware behavior
> >>>> and the shader behavior.
> >>>>
> >>>> Additionally, some pipelines are floating point while others are fixed.
> >>>> How would user space know if it needs to pack 32 bit integer values vs
> >>>> 32 bit float values?  
> >>>
> >>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
> >>> definition of LUT blob (u16 elements) and it's up to the driver to convert.
> >>>
> >>> Using a very precise format for the uAPI has the nice property of making the
> >>> uAPI much simpler to use. User-space sends high precision data and it's up to
> >>> drivers to map that to whatever the hardware accepts.
> >>>     
> >> Conversion from a larger uint type to a smaller type sounds low effort,
> >> however if a block works in a floating point space things are going to
> >> get messy really quickly. If the block operates in FP16 space and the
> >> interface is 16 bits we are good, but going from 32 bits to FP16 (such
> >> as in the matrix case or 3DLUT) is less than ideal.  
> > 
> > Hi Christopher,
> > 
> > are you thinking of precision loss, or the overhead of conversion?
> > 
> > Conversion from N-bit fixed point to N-bit floating-point is generally
> > lossy, too, and the other direction as well.
> > 
> > What exactly would be messy?
> >   
> Overheard of conversion is the primary concern here. Having to extract 
> and / or calculate the significand + exponent components in the kernel 
> is burdensome and imo a task better suited for user space. This also has 
> to be done every blob set, meaning that if user space is re-using 
> pre-calculated blobs we would be repeating the same conversion 
> operations in kernel space unnecessarily.

What is burdensome in that calculation? I don't think you would need to
use any actual floating-point instructions. Logarithm for finding the
exponent is about finding the highest bit set in an integer and
everything is conveniently expressed in base-2. Finding significand is
just masking the integer based on the exponent.
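
That integer-only approach can be sketched as follows: a hypothetical
conversion from a U0.16 fixed-point value (v = x / 65536) to an
IEEE-754 binary16 bit pattern, truncating excess mantissa bits. This is
illustrative only, not code from any driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a U0.16 fixed-point value to a binary16 bit pattern with
 * integer operations only: the highest set bit gives the exponent,
 * masking/shifting below it gives the significand. Truncates instead
 * of rounding, for brevity.
 */
static uint16_t fix16_to_half(uint16_t x)
{
	int p;   /* position of the highest set bit */
	int exp; /* unbiased binary exponent: v = 2^exp * 1.frac */
	uint16_t mant;

	if (x == 0)
		return 0;

	for (p = 15; !(x & (1u << p)); p--)
		;

	exp = p - 16;
	if (exp < -14) {
		/*
		 * Subnormal half: value = mant * 2^-24, so mant = x << 8.
		 * Only reachable for x < 4, so mant fits in 10 bits.
		 */
		return (uint16_t)(x << 8);
	}

	mant = (p >= 10) ? (uint16_t)(x >> (p - 10)) : (uint16_t)(x << (10 - p));
	mant &= 0x3FF; /* drop the implicit leading 1 */
	return (uint16_t)(((exp + 15) << 10) | mant);
}
```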

Can you not cache the converted data, keyed by the DRM blob unique
identity vs. the KMS property it is attached to?

You can assume that userspace will not be re-creating DRM blobs without
a reason to believe the contents have changed. If the same blob is set
on the same property repeatedly, I would definitely not expect a driver
to convert the data again. If a driver does that, it seems like it
should be easy to avoid, though I'm no kernel dev. Even if the
conversion was just a memcpy, I would still posit it needs to be
avoided when the data has obviously not changed. Blobs are immutable.
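
A cache along those lines could be sketched as below. All names are
invented and the "conversion" is a stand-in; a real driver would key
off the DRM blob object itself and handle allocation failure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: cache converted LUT data keyed by the blob ID.
 * Since DRM blobs are immutable, re-setting the same blob ID can reuse
 * the previously converted copy instead of converting again.
 */
struct lut_cache {
	uint32_t blob_id;  /* 0: cache empty */
	uint16_t *hw_data; /* data converted to the hardware format */
	size_t len;
};

static const uint16_t *lut_cache_get(struct lut_cache *c, uint32_t blob_id,
				     const uint32_t *data, size_t len)
{
	size_t i;

	if (c->blob_id == blob_id && c->len == len)
		return c->hw_data; /* same immutable blob: skip conversion */

	free(c->hw_data);
	c->hw_data = malloc(len * sizeof(*c->hw_data));
	for (i = 0; i < len; i++)
		c->hw_data[i] = (uint16_t)(data[i] >> 16); /* stand-in conversion */
	c->blob_id = blob_id;
	c->len = len;
	return c->hw_data;
}
```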

Userspace having to use hardware-specific number formats would probably
not be well received.

> I agree normalization of the value causing precision loss and rounding 
> we can't avoid.
> 
> We should also consider the fact that float pipelines have been known to 
> use the scrgb definition for floating point values 
> (https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt). 

scRGB is as good a definition of color encoding as "floating-point" is
for numbers. What I mean is that it carries very little usable meaning,
and without further information it is practically boundless
- infinite - in both color gamut and dynamic range. Just like any
floating-point quantity.

However, what we want from KMS color pipeline is zero implied or
defined meaning. That means scRGB carries too much meaning, because the
primaries are fixed and (1.0, 1.0, 1.0) should match sRGB/SDR white.

Btw. if one brings in nit units, you assume a specific viewing
environment which is rarely true in reality. I'll leave that rabbit
hole for another time. I just want to mention that nit (cd/m²) is a
unit that is relative to the chosen viewing environment when your goal
is a specific perception of brightness.

> In cases like this where there may be a expected value range in the 
> pipeline, how to normalize a larger input becomes a little confusing. Ex 
> - Does U32 MAX become FP16 MAX or value MAX (i.e 127).

UAPI simply needs to specify the number encoding used in the UAPI, how
bit patterns map to real numbers. Real numbers are then what the color
pipeline operates on.

However, intermediate value representation used between two KMS colorop
blocks is never observable to userspace. All userspace needs to know is
the usable value range and precision behaviour. I think that is best
defined for the input and output of each block rather than what flows
in between, because an optional (e.g. LUT) block when bypassed does not
impose its limitations.

What does 1.0 actually mean, that is left for userspace to use however
it wishes. There are only pipeline boundary conditions to that: the
input to a pipeline comes from a DRM FB, so it has a number encoding
specified mostly by pixel format, and an arbitrary colorimetric
encoding that only userspace knows. The output of the pipeline has to
be standardised so that drivers can number-encode the pipeline output
correctly to wire format on e.g. HDMI. Userspace alone is responsible
for making sure the colorimetry matches what the sink expects.

Individual KMS color pipeline colorop blocks need to define their own
acceptable input and output ranges. E.g. a look-up table may assume
that its input is in [0.0, 1.0] and anything outside is clamped to
that range. That poses restrictions on how userspace can use the block.

> >>  
> >>> Exposing the actual hardware precision is something we've talked about during
> >>> the hackfest. It'll probably be useful to some extent, but will require some
> >>> discussion to figure out how to design the uAPI. Maybe a simple property is
> >>> enough, maybe not (e.g. fully describing the precision of segmented LUTs would
> >>> probably be trickier).
> >>>
> >>> I'd rather keep things simple for the first pass, we can always add more
> >>> properties for bit depth etc later on.
> >>>      
> >> Indicating if a block operates on / with fixed vs float values is
> >> significant enough that I think we should account for this in initial
> >> design. It will have an effect on both the user space value packing +
> >> expected value ranges in the hardware.  
> > 
> > What do you mean by "value packing"? Memory layout of the bits forming
> > a value? Or possible exact values of a specific type?
> >
> Both really. If the kernel is provided a U32 value, we need to know if 
> this is a U32 value, or a float packed into a U32 container. Likewise as 
> mentioned with the scRGB above, float could even adjust the value range 
> expectations.

Right. The UAPI will simply define that.

> > I don't think fixed vs. float is the most important thing. Even fixed
> > point formats can have different numbers of bits for whole numbers,
> > which changes the usable value range and not only precision. Userspace
> > at the very least needs to know the usable value range for the block's
> > inputs, outputs, and parameters.
> > 
> > When defining the precision for inputs, outputs and parameters, then
> > fixed- vs. floating-point becomes meaningful in explaining what "N bits
> > of precision" means.
> > 
> > Then there is the question of variable precision that depends on the
> > actual block input and parameter values, how to represent that. Worst
> > case precision might be too pessimistic alone.
> >   
> Agreed. More information is probably needed to fully define the interface 
> expectations.
> 
> >>>>> Here is another example with a 3D LUT:
> >>>>>
> >>>>>        Color operation 42
> >>>>>        ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >>>>>        ├─ "lut_size": immutable range = 33
> >>>>>        ├─ "lut_data": blob
> >>>>>        └─ "next": immutable color operation ID = 43
> >>>>>     
> >>>> We are going to need to expose the packing order here to avoid any
> >>>> programming uncertainty. I don't think we can safely assume all hardware
> >>>> is equivalent.  
> >>>
> >>> The driver can easily change the layout of the matrix and do any conversion
> >>> necessary when programming the hardware. We do need to document what layout is
> >>> used in the uAPI for sure.
> >>>      
> >>>>> And one last example with a matrix:
> >>>>>
> >>>>>        Color operation 42
> >>>>>        ├─ "type": enum {Bypass, Matrix} = Matrix
> >>>>>        ├─ "matrix_data": blob
> >>>>>        └─ "next": immutable color operation ID = 43
> >>>>>     
> >>>> It is unclear to me what the default sizing of this matrix is. Any
> >>>> objections to exposing these details with an additional property?  
> >>>
> >>> The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
> >>> that wouldn't be enough?  
> >>
> >> Larger cases do exist, but as you mention this can be resolved with a
> >> different type then. I don't have any issues with the default 'Matrix'
> >> type being 9 entries.  
> > 
> > Please, tell us more. How big, and what are they used for?
> > 
> > IIRC ICC has 3x3 matrix + offset vector. Do you have even more?
> > 
> >   
> Offset is one. Range adjustment 'vector' is another. But ultimately this 
> proposal is flexible enough that this can probably just be another color 
> block in the pipeline. No complaints from me here.

What is a range adjustment vector? A vector of a multiplier per color
channel? Does it include offset?

Yes, sounds like just another block.
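For illustration, the math such a stage performs is just an affine transform per pixel. A sketch (Python for illustration only, with plain floats standing in for whatever encoding the UAPI settles on):

```python
def matrix_offset(rgb, matrix, offset):
    """Apply a 3x3 color matrix plus a per-channel offset vector:

        out[i] = sum_j matrix[i][j] * rgb[j] + offset[i]

    This mirrors the ICC-style 3x3-matrix-plus-offset model mentioned
    above. A range-adjustment stage would be the special case of a
    diagonal matrix (one multiplier per channel) plus an offset.
    """
    return tuple(
        sum(matrix[i][j] * rgb[j] for j in range(3)) + offset[i]
        for i in range(3)
    )
```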

> >>>      
> >>>> Dithering logic exists in some pipelines. I think we need a plan to
> >>>> expose that here as well.  
> >>>
> >>> Hm, I'm not too familiar with dithering. Do you think it would make sense to
> >>> expose as an additional colorop block? Do you think it would have more
> >>> consequences on the design?  
> > 
> > I think it would be an additional block, and no other consequences, be
> > it temporal and/or spatial dithering, as long as it does not look at
> > neighbouring pixels to determine the output for current pixel.
> >   
> >>>
> >>> I want to re-iterate that we don't need to ship all features from day 1. We
> >>> just need to come up with a uAPI design on which new features can be built on.
> >>>      
> >>
> >> Agreed. I don't think this will affect the proposed design so this can
> >> be figured out once we have a DRM driver impl that declares this block.
> >>  
> >>>>> [Simon note: an alternative would be to split the color pipeline into
> >>>>> two, by
> >>>>> having two plane properties ("color_pipeline_pre_scale" and
> >>>>> "color_pipeline_post_scale") instead of a single one. This would be
> >>>>> similar to
> >>>>> the way we want to split pre-blending and post-blending. This could be less
> >>>>> expressive for drivers, there may be hardware where there are dependencies
> >>>>> between the pre- and post-scaling pipeline?]
> >>>>>     
> >>>> As others have noted, breaking up the pipeline with immutable blocks
> >>>> makes the most sense to me here. This way we don't have to predict ahead
> >>>> of time every type of block that may be affected by pipeline ordering.
> >>>> Splitting the pipeline into two properties now means future
> >>>> logical splits would require introduction of further plane properties.  
> >>>
> >>> Right, if there are more "breaking points", then we'll need immutable blocks
> >>> anyways.
> >>>      
> >>>>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> >>>>> contains some fixed-function blocks which convert from LMS to ICtCp and
> >>>>> cannot
> >>>>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
> >>>>> APIs
> >>>>> where user-space provides a high-level description of the colorspace
> >>>>> conversions it needs to perform, and this is at odds with our KMS uAPI
> >>>>> proposal. To address this issue, we suggest adding a special block type
> >>>>> which
> >>>>> describes a fixed conversion from one colorspace to another and cannot be
> >>>>> configured by user-space. Then user-space will need to accommodate its
> >>>>> pipeline
> >>>>> for these special blocks. Such fixed hardware blocks need to be well enough
> >>>>> documented so that they can be implemented via shaders.
> >>>>>     
> >>>> A few questions here. What is the current plan for documenting the
> >>>> mathematical model for each exposed block? Will each defined 'type' enum
> >>>> value be locked to a definition in the kernel documents? As an example,
> >>>> when we say '3D LUT' in this proposal does this mean the block will
> >>>> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
> >>>> direct in to out LUT mapping?  
> >>>
> >>> I think we'll want to document these things, yes. We do want to give _some_
> >>> slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
> >>> hardware segmented LUTs with a different number of elements per LUT segment.
> >>> But being mathematically precise (probably with formulae in the docs) is
> >>> definitely a goal, and absolutely necessary to implement a shader-based
> >>> fallback.  
> >>
> >> I agree some driver slack is necessary, however ideally this will be
> >> locked down enough that from the compositor side they see "1D LUT" and
> >> know exactly what to expect independent of the hardware. This way
> >> regardless of if I am running on a NVIDIA / AMD / QCOM / etc... chip,
> >> common color pipeline strategies can be used. Assuming a perfect world
> >> where there is a workable overlap between chips of course.  
> > 
> > Yes, of course, at least for a start.
> > 
> > However, the long term plan includes a shared userspace library with
> > driver- and hardware-specific knowledge to use hardware- and
> > driver-specific blocks. All blocks still need to be explicitly
> > specified in the kernel UAPI documentation, the idea is that it should
> > not be a problem for many vendors to have blocks no-one else does. The
> > library would offer a much more generic API, and use snowflake blocks
> > to their fullest. The library would also spit out OpenGL shaders and
> > whatnot for the fallback.
> > 
> > The future in the long term could be either way: evolving towards
> > generic KMS UAPI blocks with no need for a userspace library
> > abstraction, or evolving towards hardware-specific KMS UAPI blocks with
> > a userspace library to abstract them like Mesa does for GPUs.
> >   
> Sounds good to me!

Awesome!


Thanks,
pq



* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-13  8:23             ` Pekka Paalanen
@ 2023-06-13 16:29               ` Christopher Braga
  2023-06-14  9:00                 ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Christopher Braga @ 2023-06-13 16:29 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton



On 6/13/2023 4:23 AM, Pekka Paalanen wrote:
> On Mon, 12 Jun 2023 12:56:57 -0400
> Christopher Braga <quic_cbraga@quicinc.com> wrote:
> 
>> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:
>>> On Fri, 9 Jun 2023 19:11:25 -0400
>>> Christopher Braga <quic_cbraga@quicinc.com> wrote:
>>>    
>>>> On 6/9/2023 12:30 PM, Simon Ser wrote:
>>>>> Hi Christopher,
>>>>>
>>>>> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
>>>>>       
>>>>>>> The new COLOROP objects also expose a number of KMS properties. Each has a
>>>>>>> type, a reference to the next COLOROP object in the linked list, and other
>>>>>>> type-specific properties. Here is an example for a 1D LUT operation:
>>>>>>>
>>>>>>>         Color operation 42
>>>>>>>         ├─ "type": enum {Bypass, 1D curve} = 1D curve
>>>>>>>         ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>>>>>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
>>>>>> curves? Will different hardware be allowed to expose a subset of these
>>>>>> enum values?
>>>>>
>>>>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
>>>>>       
>>>>>>>         ├─ "lut_size": immutable range = 4096
>>>>>>>         ├─ "lut_data": blob
>>>>>>>         └─ "next": immutable color operation ID = 43
>>>>>>>      
>>>>>> Some hardware has per channel 1D LUT values, while others use the same
>>>>>> LUT for all channels.  We will definitely need to expose this in the
>>>>>> UAPI in some form.
>>>>>
>>>>> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
>>>>> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
>>>>> to get exposed as another color operation block.
>>>>>       
>>>>>>> To configure this hardware block, user-space can fill a KMS blob with
>>>>>>> 4096 u32
>>>>>>> entries, then set "lut_data" to the blob ID. Other color operation types
>>>>>>> might
>>>>>>> have different properties.
>>>>>>>      
>>>>>> The bit-depth of the LUT is an important piece of information we should
>>>>>> include by default. Are we assuming that the DRM driver will always
>>>>>> reduce the input values to the resolution supported by the pipeline?
>>>>>> This could result in differences between the hardware behavior
>>>>>> and the shader behavior.
>>>>>>
>>>>>> Additionally, some pipelines are floating point while others are fixed.
>>>>>> How would user space know if it needs to pack 32 bit integer values vs
>>>>>> 32 bit float values?
>>>>>
>>>>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
>>>>> definition of LUT blob (u16 elements) and it's up to the driver to convert.
>>>>>
>>>>> Using a very precise format for the uAPI has the nice property of making the
>>>>> uAPI much simpler to use. User-space sends high precision data and it's up to
>>>>> drivers to map that to whatever the hardware accepts.
>>>>>      
>>>> Conversion from a larger uint type to a smaller type sounds low effort,
>>>> however if a block works in a floating point space things are going to
>>>> get messy really quickly. If the block operates in FP16 space and the
>>>> interface is 16 bits we are good, but going from 32 bits to FP16 (such
>>>> as in the matrix case or 3DLUT) is less than ideal.
>>>
>>> Hi Christopher,
>>>
>>> are you thinking of precision loss, or the overhead of conversion?
>>>
>>> Conversion from N-bit fixed point to N-bit floating-point is generally
>>> lossy, too, and the other direction as well.
>>>
>>> What exactly would be messy?
>>>    
>> Overhead of conversion is the primary concern here. Having to extract
>> and / or calculate the significand + exponent components in the kernel
>> is burdensome and imo a task better suited for user space. This also has
>> to be done on every blob set, meaning that if user space is re-using
>> pre-calculated blobs we would be repeating the same conversion
>> operations in kernel space unnecessarily.
> 
> What is burdensome in that calculation? I don't think you would need to
> use any actual floating-point instructions. Logarithm for finding the
> exponent is about finding the highest bit set in an integer and
> everything is conveniently expressed in base-2. Finding significand is
> just masking the integer based on the exponent.
> 
Oh it definitely can be done, but I think this is just a difference of 
opinion at this point. At the end of the day we will do it if we have 
to, but it is just more optimal if a more agreeable common type is used.
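For reference, the integer-only conversion being discussed can be sketched as follows (Python purely for illustration; a driver would use the same bit operations in C). It converts a U0.16 fixed-point value to an IEEE-754 binary16 bit pattern, truncating instead of rounding and flushing values below the FP16 normal range to zero, to keep the sketch short:

```python
def u16_fixed_to_fp16_bits(u):
    """Convert a U0.16 fixed-point value (real value u / 65536) into
    IEEE-754 binary16 bits using integer operations only.

    Finding the exponent is finding the highest set bit; finding the
    significand is shifting and masking relative to that bit.
    Truncates rather than rounds; subnormal results flush to zero.
    """
    if u == 0:
        return 0
    p = u.bit_length() - 1        # position of the highest set bit
    e = p - 1                     # biased exponent: (p - 16) + 15
    if e < 1:
        return 0                  # below FP16 normal range: flush to zero
    if p >= 10:
        mant = (u >> (p - 10)) & 0x3FF   # truncate extra low bits
    else:
        mant = (u << (10 - p)) & 0x3FF   # pad with zero low bits
    return (e << 10) | mant
```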

> Can you not cache the converted data, keyed by the DRM blob unique
> identity vs. the KMS property it is attached to?
If the userspace compositor has N common transforms (ex: standard P3 -> 
sRGB matrix), they would likely have N unique blobs. Obviously from the 
kernel end we wouldn't want to cache the transform of every blob passed 
down through the UAPI.

> 
> You can assume that userspace will not be re-creating DRM blobs without
> a reason to believe the contents have changed. If the same blob is set
> on the same property repeatedly, I would definitely not expect a driver
> to convert the data again.
If the blob ID is unchanged there is no issue since caching the last 
result is already common. As you say, blobs are immutable so no update 
is needed. I'd question why the compositor keeps trying to send down the
same blob ID though.
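The blob-ID-keyed cache suggested above could look roughly like this (Python for illustration; the class, names and eviction policy are made up, not an existing kernel mechanism). Bounding the entry count addresses the concern about caching a conversion for every blob ever passed down:

```python
from collections import OrderedDict

class BlobConversionCache:
    """Cache hardware-format conversions keyed by DRM blob ID.

    Blob IDs are unique and blob contents immutable, so the ID alone
    is a valid cache key. 'convert' stands in for the driver's
    format-conversion routine; a small LRU bound keeps memory use
    from growing with every blob userspace ever creates.
    """
    def __init__(self, convert, capacity=16):
        self.convert = convert
        self.capacity = capacity
        self.entries = OrderedDict()      # blob_id -> converted data

    def lookup(self, blob_id, blob_data):
        if blob_id in self.entries:
            self.entries.move_to_end(blob_id)   # refresh LRU position
            return self.entries[blob_id]
        converted = self.convert(blob_data)
        self.entries[blob_id] = converted
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return converted
```

Because blobs are immutable, a hit never needs re-validation; re-setting the same blob ID costs only a dictionary lookup.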

> If a driver does that, it seems like it
> should be easy to avoid, though I'm no kernel dev. Even if the
> conversion was just a memcpy, I would still posit it needs to be
> avoided when the data has obviously not changed. Blobs are immutable.
> 
> Userspace having to use hardware-specific number formats would probably
> not be well received.
> 
To be clear, I am not asking user space to use custom value packing made 
purely for the hardware's benefit (this sounds like a problem just 
waiting to happen). Just support in the color pipeline UAPI for common 
numerical data types such as 16-bit floats. That said...

>> I agree that normalization of the value causing precision loss and
>> rounding is something we can't avoid.
>>
>> We should also consider the fact that float pipelines have been known to
>> use the scrgb definition for floating point values
>> (https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt).
> 
> scRGB is as good a definition of color encoding as "floating-point" is
> for numbers. What I mean is that it carries very little usable meaning,
> and without further information it is practically boundless
> - infinite - in both color gamut and dynamic range. Just like any
> floating-point quantity.
> 
> However, what we want from the KMS color pipeline is zero implied or
> defined meaning. That means scRGB carries too much meaning, because the
> primaries are fixed and (1.0, 1.0, 1.0) should match sRGB/SDR white.
> 
> Btw. if one brings in nit units, you assume a specific viewing
> environment which is rarely true in reality. I'll leave that rabbit
> hole for another time. I just want to mention that nit (cd/m²) is a
> unit that is relative to the chosen viewing environment when your goal
> is a specific perception of brightness.
> 
>> In cases like this where there may be an expected value range in the
>> pipeline, how to normalize a larger input becomes a little confusing. Ex
>> - Does U32 MAX become FP16 MAX or value MAX (i.e. 127).
> 
> UAPI simply needs to specify the number encoding used in the UAPI, how
> bit patterns map to real numbers. Real numbers are then what the color
> pipeline operates on.
> 
If we plan to have the color pipeline UAPI expose these details then I 
am satisfied.

> However, intermediate value representation used between two KMS colorop
> blocks is never observable to userspace. All userspace needs to know is
> the usable value range and precision behaviour. I think that is best
> defined for the input and output of each block rather than what flows
> in between, because an optional (e.g. LUT) block when bypassed does not
> impose its limitations.
> 
Sure. Everything in between can be inferred from the pipeline.

> What does 1.0 actually mean, that is left for userspace to use however
> it wishes. There are only pipeline boundary conditions to that: the
> input to a pipeline comes from a DRM FB, so it has a number encoding
> specified mostly by pixel format, and an arbitrary colorimetric
> encoding that only userspace knows. The output of the pipeline has to
> be standardised so that drivers can number-encode the pipeline output
> correctly to wire format on e.g. HDMI. Userspace alone is responsible
> for making sure the colorimetry matches what the sink expects.
> 
> Individual KMS color pipeline colorop blocks need to define their own
> acceptable input and output ranges. E.g. a look-up table may assume
> that its input is in [0.0, 1.0] and anything outside is clamped to
> that range. That poses restrictions on how userspace can use the block.
> 
>>>>   
>>>>> Exposing the actual hardware precision is something we've talked about during
>>>>> the hackfest. It'll probably be useful to some extent, but will require some
>>>>> discussion to figure out how to design the uAPI. Maybe a simple property is
>>>>> enough, maybe not (e.g. fully describing the precision of segmented LUTs would
>>>>> probably be trickier).
>>>>>
>>>>> I'd rather keep things simple for the first pass, we can always add more
>>>>> properties for bit depth etc later on.
>>>>>       
>>>> Indicating if a block operates on / with fixed vs float values is
>>>> significant enough that I think we should account for this in initial
>>>> design. It will have an effect on both the user space value packing +
>>>> expected value ranges in the hardware.
>>>
>>> What do you mean by "value packing"? Memory layout of the bits forming
>>> a value? Or possible exact values of a specific type?
>>>
>> Both really. If the kernel is provided a U32 value, we need to know if
>> this is a U32 value, or a float packed into a U32 container. Likewise as
>> mentioned with the scRGB above, float could even adjust the value range
>> expectations.
> 
> Right. The UAPI will simply define that.
> 
Great!

Thanks,
Christopher

>>> I don't think fixed vs. float is the most important thing. Even fixed
>>> point formats can have different numbers of bits for whole numbers,
>>> which changes the usable value range and not only precision. Userspace
>>> at the very least needs to know the usable value range for the block's
>>> inputs, outputs, and parameters.
>>>
>>> When defining the precision for inputs, outputs and parameters, then
>>> fixed- vs. floating-point becomes meaningful in explaining what "N bits
>>> of precision" means.
>>>
>>> Then there is the question of variable precision that depends on the
>>> actual block input and parameter values, how to represent that. Worst
>>> case precision might be too pessimistic alone.
>>>    
>> Agreed. More information is probably needed to fully define the interface
>> expectations.
>>
>>>>>>> Here is another example with a 3D LUT:
>>>>>>>
>>>>>>>         Color operation 42
>>>>>>>         ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>>>>>>>         ├─ "lut_size": immutable range = 33
>>>>>>>         ├─ "lut_data": blob
>>>>>>>         └─ "next": immutable color operation ID = 43
>>>>>>>      
>>>>>> We are going to need to expose the packing order here to avoid any
>>>>>> programming uncertainty. I don't think we can safely assume all hardware
>>>>>> is equivalent.
>>>>>
>>>>> The driver can easily change the layout of the matrix and do any conversion
>>>>> necessary when programming the hardware. We do need to document what layout is
>>>>> used in the uAPI for sure.
>>>>>       
>>>>>>> And one last example with a matrix:
>>>>>>>
>>>>>>>         Color operation 42
>>>>>>>         ├─ "type": enum {Bypass, Matrix} = Matrix
>>>>>>>         ├─ "matrix_data": blob
>>>>>>>         └─ "next": immutable color operation ID = 43
>>>>>>>      
>>>>>> It is unclear to me what the default sizing of this matrix is. Any
>>>>>> objections to exposing these details with an additional property?
>>>>>
>>>>> The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
>>>>> that wouldn't be enough?
>>>>
>>>> Larger cases do exist, but as you mention this can be resolved with a
>>>> different type then. I don't have any issues with the default 'Matrix'
>>>> type being 9 entries.
>>>
>>> Please, tell us more. How big, and what are they used for?
>>>
>>> IIRC ICC has 3x3 matrix + offset vector. Do you have even more?
>>>
>>>    
>> Offset is one. Range adjustment 'vector' is another. But ultimately this
>> proposal is flexible enough that this can probably just be another color
>> block in the pipeline. No complaints from me here.
> 
> What is a range adjustment vector? A vector of a multiplier per color
> channel? Does it include offset?
> 
> Yes, sounds like just another block.
> 
>>>>>       
>>>>>> Dithering logic exists in some pipelines. I think we need a plan to
>>>>>> expose that here as well.
>>>>>
>>>>> Hm, I'm not too familiar with dithering. Do you think it would make sense to
>>>>> expose as an additional colorop block? Do you think it would have more
>>>>> consequences on the design?
>>>
>>> I think it would be an additional block, and no other consequences, be
>>> it temporal and/or spatial dithering, as long as it does not look at
>>> neighbouring pixels to determine the output for current pixel.
>>>    
>>>>>
>>>>> I want to re-iterate that we don't need to ship all features from day 1. We
>>>>> just need to come up with a uAPI design on which new features can be built on.
>>>>>       
>>>>
>>>> Agreed. I don't think this will affect the proposed design so this can
>>>> be figured out once we have a DRM driver impl that declares this block.
>>>>   
>>>>>>> [Simon note: an alternative would be to split the color pipeline into
>>>>>>> two, by
>>>>>>> having two plane properties ("color_pipeline_pre_scale" and
>>>>>>> "color_pipeline_post_scale") instead of a single one. This would be
>>>>>>> similar to
>>>>>>> the way we want to split pre-blending and post-blending. This could be less
>>>>>>> expressive for drivers, there may be hardware where there are dependencies
>>>>>>> between the pre- and post-scaling pipeline?]
>>>>>>>      
>>>>>> As others have noted, breaking up the pipeline with immutable blocks
>>>>>> makes the most sense to me here. This way we don't have to predict ahead
>>>>>> of time every type of block that may be affected by pipeline ordering.
>>>>>> Splitting the pipeline into two properties now means future
>>>>>> logical splits would require introduction of further plane properties.
>>>>>
>>>>> Right, if there are more "breaking points", then we'll need immutable blocks
>>>>> anyways.
>>>>>       
>>>>>>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
>>>>>>> contains some fixed-function blocks which convert from LMS to ICtCp and
>>>>>>> cannot
>>>>>>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
>>>>>>> APIs
>>>>>>> where user-space provides a high-level description of the colorspace
>>>>>>> conversions it needs to perform, and this is at odds with our KMS uAPI
>>>>>>> proposal. To address this issue, we suggest adding a special block type
>>>>>>> which
>>>>>>> describes a fixed conversion from one colorspace to another and cannot be
>>>>>>> configured by user-space. Then user-space will need to accommodate its
>>>>>>> pipeline
>>>>>>> for these special blocks. Such fixed hardware blocks need to be well enough
>>>>>>> documented so that they can be implemented via shaders.
>>>>>>>      
>>>>>> A few questions here. What is the current plan for documenting the
>>>>>> mathematical model for each exposed block? Will each defined 'type' enum
>>>>>> value be locked to a definition in the kernel documents? As an example,
>>>>>> when we say '3D LUT' in this proposal does this mean the block will
>>>>>> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
>>>>>> direct in to out LUT mapping?
>>>>>
>>>>> I think we'll want to document these things, yes. We do want to give _some_
>>>>> slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
>>>>> hardware segmented LUTs with a different number of elements per LUT segment.
>>>>> But being mathematically precise (probably with formulae in the docs) is
>>>>> definitely a goal, and absolutely necessary to implement a shader-based
>>>>> fallback.
>>>>
>>>> I agree some driver slack is necessary, however ideally this will be
>>>> locked down enough that from the compositor side they see "1D LUT" and
>>>> know exactly what to expect independent of the hardware. This way
>>>> regardless of if I am running on a NVIDIA / AMD / QCOM / etc... chip,
>>>> common color pipeline strategies can be used. Assuming a perfect world
>>>> where there is a workable overlap between chips of course.
>>>
>>> Yes, of course, at least for a start.
>>>
>>> However, the long term plan includes a shared userspace library with
>>> driver- and hardware-specific knowledge to use hardware- and
>>> driver-specific blocks. All blocks still need to be explicitly
>>> specified in the kernel UAPI documentation, the idea is that it should
>>> not be a problem for many vendors to have blocks no-one else does. The
>>> library would offer a much more generic API, and use snowflake blocks
>>> to their fullest. The library would also spit out OpenGL shaders and
>>> whatnot for the fallback.
>>>
>>> The future in the long term could be either way: evolving towards
>>> generic KMS UAPI blocks with no need for a userspace library
>>> abstraction, or evolving towards hardware-specific KMS UAPI blocks with
>>> a userspace library to abstract them like Mesa does for GPUs.
>>>    
>> Sounds good to me!
> 
> Awesome!
> 
> 
> Thanks,
> pq


* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-13 16:29               ` Christopher Braga
@ 2023-06-14  9:00                 ` Pekka Paalanen
  2023-06-15 21:44                   ` Christopher Braga
  0 siblings, 1 reply; 49+ messages in thread
From: Pekka Paalanen @ 2023-06-14  9:00 UTC (permalink / raw)
  To: Christopher Braga
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton


On Tue, 13 Jun 2023 12:29:55 -0400
Christopher Braga <quic_cbraga@quicinc.com> wrote:

> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:
> > On Mon, 12 Jun 2023 12:56:57 -0400
> > Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >   
> >> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:  
> >>> On Fri, 9 Jun 2023 19:11:25 -0400
> >>> Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >>>      
> >>>> On 6/9/2023 12:30 PM, Simon Ser wrote:  
> >>>>> Hi Christopher,
> >>>>>
> >>>>> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >>>>>         
> >>>>>>> The new COLOROP objects also expose a number of KMS properties. Each has a
> >>>>>>> type, a reference to the next COLOROP object in the linked list, and other
> >>>>>>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>>>>>
> >>>>>>>         Color operation 42
> >>>>>>>         ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>>>>>>         ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT  
> >>>>>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >>>>>> curves? Will different hardware be allowed to expose a subset of these
> >>>>>> enum values?  
> >>>>>
> >>>>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> >>>>>         
> >>>>>>>         ├─ "lut_size": immutable range = 4096
> >>>>>>>         ├─ "lut_data": blob
> >>>>>>>         └─ "next": immutable color operation ID = 43
> >>>>>>>        
> >>>>>> Some hardware has per channel 1D LUT values, while others use the same
> >>>>>> LUT for all channels.  We will definitely need to expose this in the
> >>>>>> UAPI in some form.  
> >>>>>
> >>>>> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
> >>>>> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
> >>>>> to get exposed as another color operation block.
> >>>>>         
> >>>>>>> To configure this hardware block, user-space can fill a KMS blob with
> >>>>>>> 4096 u32
> >>>>>>> entries, then set "lut_data" to the blob ID. Other color operation types
> >>>>>>> might
> >>>>>>> have different properties.
> >>>>>>>        
> >>>>>> The bit-depth of the LUT is an important piece of information we should
> >>>>>> include by default. Are we assuming that the DRM driver will always
> >>>>>> reduce the input values to the resolution supported by the pipeline?
> >>>>>> This could result in differences between the hardware behavior
> >>>>>> and the shader behavior.
> >>>>>>
> >>>>>> Additionally, some pipelines are floating point while others are fixed.
> >>>>>> How would user space know if it needs to pack 32 bit integer values vs
> >>>>>> 32 bit float values?  
> >>>>>
> >>>>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
> >>>>> definition of LUT blob (u16 elements) and it's up to the driver to convert.
> >>>>>
> >>>>> Using a very precise format for the uAPI has the nice property of making the
> >>>>> uAPI much simpler to use. User-space sends high precision data and it's up to
> >>>>> drivers to map that to whatever the hardware accepts.
> >>>>>        
> >>>> Conversion from a larger uint type to a smaller type sounds low effort,
> >>>> however if a block works in a floating point space things are going to
> >>>> get messy really quickly. If the block operates in FP16 space and the
> >>>> interface is 16 bits we are good, but going from 32 bits to FP16 (such
> >>>> as in the matrix case or 3DLUT) is less than ideal.  
> >>>
> >>> Hi Christopher,
> >>>
> >>> are you thinking of precision loss, or the overhead of conversion?
> >>>
> >>> Conversion from N-bit fixed point to N-bit floating-point is generally
> >>> lossy, too, and the other direction as well.
> >>>
> >>> What exactly would be messy?
> >>>      
> >> Overhead of conversion is the primary concern here. Having to extract
> >> and / or calculate the significand + exponent components in the kernel
> >> is burdensome and imo a task better suited for user space. This also has
> >> to be done every blob set, meaning that if user space is re-using
> >> pre-calculated blobs we would be repeating the same conversion
> >> operations in kernel space unnecessarily.  
> > 
> > What is burdensome in that calculation? I don't think you would need to
> > use any actual floating-point instructions. Logarithm for finding the
> > exponent is about finding the highest bit set in an integer and
> > everything is conveniently expressed in base-2. Finding significand is
> > just masking the integer based on the exponent.
> >   
> Oh it definitely can be done, but I think this is just a difference of 
> opinion at this point. At the end of the day we will do it if we have 
> to, but it is just more optimal if a more agreeable common type is used.
> 
> > Can you not cache the converted data, keyed by the DRM blob unique
> > identity vs. the KMS property it is attached to?  
> If the userspace compositor has N common transforms (ex: standard P3 -> 
> sRGB matrix), they would likely have N unique blobs. Obviously from the 
> kernel end we wouldn't want to cache the transform of every blob passed 
> down through the UAPI.

Hi Christopher,

as long as the blob exists, why not?

> > You can assume that userspace will not be re-creating DRM blobs without
> > a reason to believe the contents have changed. If the same blob is set
> > on the same property repeatedly, I would definitely not expect a driver
> > to convert the data again.  
> If the blob ID is unchanged there is no issue since caching the last 
> result is already common. As you say, blobs are immutable so no update 
> is needed. I'd question why the compositor keeps trying to send down the
> same blob ID though.

To avoid hard-to-debug situations with userspace vs. kernel view of KMS
state getting out of sync by a bug, for example. I did originally write
such KMS state caching in Weston to avoid emitting unchanged state, but
that was deemed unnecessary as the kernel side needs to do the same
comparisons "anyway".

> > If a driver does that, it seems like it
> > should be easy to avoid, though I'm no kernel dev. Even if the
> > conversion was just a memcpy, I would still posit it needs to be
> > avoided when the data has obviously not changed. Blobs are immutable.  
> >  > Userspace having to use hardware-specific number formats would probably  
> > not be well received.
> >   
> To be clear, I am not asking user space to use custom value packing made 
> purely for the hardware's benefit (this sounds like a problem just 
> waiting to happen). Just support in the color pipeline UAPI for common 
> numerical data types such as 16-bit floats. That said...

I wonder if there actually is a significant difference between
converting float<->float and int<->float if everything else is equally
fine.

It's possible that requirements on range and precision do
call for both types in UAPI, then we obviously need both.

> >> I agree normalization of the value causing precision loss and rounding
> >> we can't avoid.
> >>
> >> We should also consider the fact that float pipelines have been known to
> >> use the scrgb definition for floating point values
> >> (https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt).  
> > 
> > scRGB is as good a definition of color encoding as "floating-point" is
> > for numbers. What I mean is that it carries very little usable meaning,
> > and without further information it is practically boundless
> > - infinite - in both color gamut and dynamic range. Just like any
> > floating-point quantity.
> > 
> > However, what we want from KMS color pipeline is zero implied or
> > defined meaning. That means scRGB carries too much meaning, because the
> > primaries are fixed and (1.0, 1.0, 1.0) should match sRGB/SDR white.  
> >  > Btw. if one brings in nit units, you assume a specific viewing  
> > environment which is rarely true in reality. I'll leave that rabbit
> > hole for another time. I just want to mention that nit (cd/m²) is a
> > unit that is relative to the chosen viewing environment when your goal
> > is a specific perception of brightness.
> >   
> >> In cases like this where there may be an expected value range in the
> >> pipeline, how to normalize a larger input becomes a little confusing. Ex
> >> - Does U32 MAX become FP16 MAX or value MAX (i.e 127).  
> > 
> > UAPI simply needs to specify the number encoding used in the UAPI, how
> > bit patterns map to real numbers. Real numbers are then what the color
> > pipeline operates on.
> >   
> If we plan to have the color pipeline UAPI expose these details then I 
> am satisfied.

Very good. I do not see how else it could even work.


Thanks,
pq


> > However, intermediate value representation used between two KMS colorop
> > blocks is never observable to userspace. All userspace needs to know is
> > the usable value range and precision behaviour. I think that is best
> > defined for the input and output of each block rather than what flows
> > in between, because an optional (e.g. LUT) block when bypassed does not
> > impose its limitations.
> >   
> Sure. Everything in between can be inferred from the pipeline.
> 
> > What does 1.0 actually mean, that is left for userspace to use however
> > it wishes. There are only pipeline boundary conditions to that: the
> > input to a pipeline comes from a DRM FB, so it has a number encoding
> > specified mostly by pixel format, and an arbitrary colorimetric
> > encoding that only userspace knows. The output of the pipeline has to
> > be standardised so that drivers can number-encode the pipeline output
> > correctly to wire format on e.g. HDMI. Userspace alone is responsible
> > for making sure the colorimetry matches what the sink expects.
> > 
> > Individual KMS color pipeline colorop blocks need to define their own
> > acceptable input and output ranges. E.g. a look-up table may assume
> > that its input is in [0.0, 1.0] and anything outside is clamped to
> > that range. That poses restrictions on how userspace can use the block.
> >   
> >>>>     
> >>>>> Exposing the actual hardware precision is something we've talked about during
> >>>>> the hackfest. It'll probably be useful to some extent, but will require some
> >>>>> discussion to figure out how to design the uAPI. Maybe a simple property is
> >>>>> enough, maybe not (e.g. fully describing the precision of segmented LUTs would
> >>>>> probably be trickier).
> >>>>>
> >>>>> I'd rather keep things simple for the first pass, we can always add more
> >>>>> properties for bit depth etc later on.
> >>>>>         
> >>>> Indicating if a block operates on / with fixed vs float values is
> >>>> significant enough that I think we should account for this in initial
> >>>> design. It will have an effect on both the user space value packing +
> >>>> expected value ranges in the hardware.  
> >>>
> >>> What do you mean by "value packing"? Memory layout of the bits forming
> >>> a value? Or possible exact values of a specific type?
> >> Both really. If the kernel is provided a U32 value, we need to know if
> >> this is a U32 value, or a float packed into a U32 container. Likewise as
> >> mentioned with the scRGB above, float could even adjust the value range
> >> expectations.  
> > 
> > Right. The UAPI will simply define that.
> >   
> Great!
> 
> Thanks,
> Christopher
> 
> >>> I don't think fixed vs. float is the most important thing. Even fixed
> >>> point formats can have different numbers of bits for whole numbers,
> >>> which changes the usable value range and not only precision. Userspace
> >>> at the very least needs to know the usable value range for the block's
> >>> inputs, outputs, and parameters.
> >>>
> >>> When defining the precision for inputs, outputs and parameters, then
> >>> fixed- vs. floating-point becomes meaningful in explaining what "N bits
> >>> of precision" means.
> >>>
> >>> Then there is the question of variable precision that depends on the
> >>> actual block input and parameter values, how to represent that. Worst
> >>> case precision might be too pessimistic alone.
> >>>      
> >> Agreed. More information probably is needed to fully define the interface
> >> expectations.
> >>  
> >>>>>>> Here is another example with a 3D LUT:
> >>>>>>>
> >>>>>>>         Color operation 42
> >>>>>>>         ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> >>>>>>>         ├─ "lut_size": immutable range = 33
> >>>>>>>         ├─ "lut_data": blob
> >>>>>>>         └─ "next": immutable color operation ID = 43
> >>>>>>>        
> >>>>>> We are going to need to expose the packing order here to avoid any
> >>>>>> programming uncertainty. I don't think we can safely assume all hardware
> >>>>>> is equivalent.  
> >>>>>
> >>>>> The driver can easily change the layout of the matrix and do any conversion
> >>>>> necessary when programming the hardware. We do need to document what layout is
> >>>>> used in the uAPI for sure.
> >>>>>         
> >>>>>>> And one last example with a matrix:
> >>>>>>>
> >>>>>>>         Color operation 42
> >>>>>>>         ├─ "type": enum {Bypass, Matrix} = Matrix
> >>>>>>>         ├─ "matrix_data": blob
> >>>>>>>         └─ "next": immutable color operation ID = 43
> >>>>>>>        
> >>>>>> It is unclear to me what the default sizing of this matrix is. Any
> >>>>>> objections to exposing these details with an additional property?  
> >>>>>
> >>>>> The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
> >>>>> that wouldn't be enough?  
> >>>>
> >>>> Larger cases do exist, but as you mention this can be resolved with a
> >>>> different type then. I don't have any issues with the default 'Matrix'
> >>>> type being 9 entries.  
> >>>
> >>> Please, tell us more. How big, and what are they used for?
> >>>
> >>> IIRC ICC has 3x3 matrix + offset vector. Do you have even more?
> >>>
> >>>      
> >> Offset is one. Range adjustment 'vector' is another. But ultimately this
> >> proposal is flexible enough that this can probably just be another color
> >> block in the pipeline. No complaints from me here.  
> > 
> > What is a range adjustment vector? A vector of a multiplier per color
> > channel? Does it include offset?
> > 
> > Yes, sounds like just another block.
> >   
> >>>>>         
> >>>>>> Dithering logic exists in some pipelines. I think we need a plan to
> >>>>>> expose that here as well.  
> >>>>>
> >>>>> Hm, I'm not too familiar with dithering. Do you think it would make sense to
> >>>>> expose as an additional colorop block? Do you think it would have more
> >>>>> consequences on the design?  
> >>>
> >>> I think it would be an additional block, and no other consequences, be
> >>> it temporal and/or spatial dithering, as long as it does not look at
> >>> neighbouring pixels to determine the output for current pixel.
> >>>      
> >>>>>
> >>>>> I want to re-iterate that we don't need to ship all features from day 1. We
> >>>>> just need to come up with a uAPI design on which new features can be built on.
> >>>>>         
> >>>>
> >>>> Agreed. I don't think this will affect the proposed design so this can
> >>>> be figured out once we have a DRM driver impl that declares this block.
> >>>>     
> >>>>>>> [Simon note: an alternative would be to split the color pipeline into
> >>>>>>> two, by
> >>>>>>> having two plane properties ("color_pipeline_pre_scale" and
> >>>>>>> "color_pipeline_post_scale") instead of a single one. This would be
> >>>>>>> similar to
> >>>>>>> the way we want to split pre-blending and post-blending. This could be less
> >>>>>>> expressive for drivers, there may be hardware where there are dependencies
> >>>>>>> between the pre- and post-scaling pipeline?]
> >>>>>>>        
> >>>>>> As others have noted, breaking up the pipeline with immutable blocks
> >>>>>> makes the most sense to me here. This way we don't have to predict ahead
> >>>>>> of time every type of block that maybe affected by pipeline ordering.
> >>>>>> Splitting the pipeline into two properties now means future
> >>>>>> logical splits would require introduction of further plane properties.  
> >>>>>
> >>>>> Right, if there are more "breaking points", then we'll need immutable blocks
> >>>>> anyways.
> >>>>>         
> >>>>>>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> >>>>>>> contains some fixed-function blocks which convert from LMS to ICtCp and
> >>>>>>> cannot
> >>>>>>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
> >>>>>>> APIs
> >>>>>>> where user-space provides a high-level description of the colorspace
> >>>>>>> conversions it needs to perform, and this is at odds with our KMS uAPI
> >>>>>>> proposal. To address this issue, we suggest adding a special block type
> >>>>>>> which
> >>>>>>> describes a fixed conversion from one colorspace to another and cannot be
> >>>>>>> configured by user-space. Then user-space will need to accommodate its
> >>>>>>> pipeline
> >>>>>>> for these special blocks. Such fixed hardware blocks need to be well enough
> >>>>>>> documented so that they can be implemented via shaders.
> >>>>>>>        
> >>>>>> A few questions here. What is the current plan for documenting the
> >>>>>> mathematical model for each exposed block? Will each defined 'type' enum
> >>>>>> value be locked to a definition in the kernel documents? As an example,
> >>>>>> when we say '3D LUT' in this proposal does this mean the block will
> >>>>>> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
> >>>>>> direct in to out LUT mapping?  
> >>>>>
> >>>>> I think we'll want to document these things, yes. We do want to give _some_
> >>>>> slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
> >>>>> hardware segmented LUTs with a different number of elements per LUT segment.
> >>>>> But being mathematically precise (probably with formulae in the docs) is
> >>>>> definitely a goal, and absolutely necessary to implement a shader-based
> >>>>> fallback.  
> >>>>
> >>>> I agree some driver slack is necessary, however ideally this will be
> >>>> locked down enough that from the compositor side they see "1D LUT" and
> >>>> know exactly what to expect independent of the hardware. This way
> >>>> regardless of if I am running on a NVIDIA / AMD / QCOM / etc... chip,
> >>>> common color pipeline strategies can be used. Assuming a perfect world
> >>>> where there is a workable overlap between chips of course.  
> >>>
> >>> Yes, of course, at least for a start.
> >>>
> >>> However, the long term plan includes a shared userspace library with
> >>> driver- and hardware-specific knowledge to use hardware- and
> >>> driver-specific blocks. All blocks still need to be explicitly
> >>> specified in the kernel UAPI documentation, the idea is that it should
> >>> not be a problem for many vendors to have blocks no-one else does. The
> >>> library would offer a much more generic API, and use snowflake blocks
> >>> to their fullest. The library would also spit out OpenGL shaders and
> >>> whatnot for the fallback.
> >>>
> >>> The future in the long term could be either way: evolving towards
> >>> generic KMS UAPI blocks with no need for a userspace library
> >>> abstraction, or evolving towards hardware-specific KMS UAPI blocks with
> >>> a userspace library to abstract them like Mesa does for GPUs.
> >>>      
> >> Sounds good to me!  
> > 
> > Awesome!
> > 
> > 
> > Thanks,
> > pq  




* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-14  9:00                 ` Pekka Paalanen
@ 2023-06-15 21:44                   ` Christopher Braga
  2023-06-16  7:59                     ` Pekka Paalanen
  0 siblings, 1 reply; 49+ messages in thread
From: Christopher Braga @ 2023-06-15 21:44 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton



On 6/14/2023 5:00 AM, Pekka Paalanen wrote:
> On Tue, 13 Jun 2023 12:29:55 -0400
> Christopher Braga <quic_cbraga@quicinc.com> wrote:
> 
>> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:
>>> On Mon, 12 Jun 2023 12:56:57 -0400
>>> Christopher Braga <quic_cbraga@quicinc.com> wrote:
>>>    
>>>> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:
>>>>> On Fri, 9 Jun 2023 19:11:25 -0400
>>>>> Christopher Braga <quic_cbraga@quicinc.com> wrote:
>>>>>       
>>>>>> On 6/9/2023 12:30 PM, Simon Ser wrote:
>>>>>>> Hi Christopher,
>>>>>>>
>>>>>>> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
>>>>>>>          
>>>>>>>>> The new COLOROP objects also expose a number of KMS properties. Each has a
>>>>>>>>> type, a reference to the next COLOROP object in the linked list, and other
>>>>>>>>> type-specific properties. Here is an example for a 1D LUT operation:
>>>>>>>>>
>>>>>>>>>          Color operation 42
>>>>>>>>>          ├─ "type": enum {Bypass, 1D curve} = 1D curve
>>>>>>>>>          ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
>>>>>>>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
>>>>>>>> curves? Will different hardware be allowed to expose a subset of these
>>>>>>>> enum values?
>>>>>>>
>>>>>>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
>>>>>>>          
>>>>>>>>>          ├─ "lut_size": immutable range = 4096
>>>>>>>>>          ├─ "lut_data": blob
>>>>>>>>>          └─ "next": immutable color operation ID = 43
>>>>>>>>>         
>>>>>>>> Some hardware has per channel 1D LUT values, while others use the same
>>>>>>>> LUT for all channels.  We will definitely need to expose this in the
>>>>>>>> UAPI in some form.
>>>>>>>
>>>>>>> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
>>>>>>> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
>>>>>>> to get exposed as another color operation block.
>>>>>>>          
>>>>>>>>> To configure this hardware block, user-space can fill a KMS blob with
>>>>>>>>> 4096 u32
>>>>>>>>> entries, then set "lut_data" to the blob ID. Other color operation types
>>>>>>>>> might
>>>>>>>>> have different properties.
>>>>>>>>>         
>>>>>>>> The bit-depth of the LUT is an important piece of information we should
>>>>>>>> include by default. Are we assuming that the DRM driver will always
>>>>>>>> reduce the input values to the resolution supported by the pipeline?
>>>>>>>> This could result in differences between the hardware behavior
>>>>>>>> and the shader behavior.
>>>>>>>>
>>>>>>>> Additionally, some pipelines are floating point while others are fixed.
>>>>>>>> How would user space know if it needs to pack 32 bit integer values vs
>>>>>>>> 32 bit float values?
>>>>>>>
>>>>>>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
>>>>>>> definition of LUT blob (u16 elements) and it's up to the driver to convert.
>>>>>>>
>>>>>>> Using a very precise format for the uAPI has the nice property of making the
>>>>>>> uAPI much simpler to use. User-space sends high precision data and it's up to
>>>>>>> drivers to map that to whatever the hardware accepts.
>>>>>>>         
>>>>>> Conversion from a larger uint type to a smaller type sounds low effort,
>>>>>> however if a block works in a floating point space things are going to
>>>>>> get messy really quickly. If the block operates in FP16 space and the
>>>>>> interface is 16 bits we are good, but going from 32 bits to FP16 (such
>>>>>> as in the matrix case or 3DLUT) is less than ideal.
>>>>>
>>>>> Hi Christopher,
>>>>>
>>>>> are you thinking of precision loss, or the overhead of conversion?
>>>>>
>>>>> Conversion from N-bit fixed point to N-bit floating-point is generally
>>>>> lossy, too, and the other direction as well.
>>>>>
>>>>> What exactly would be messy?
>>>>>       
>>>> Overhead of conversion is the primary concern here. Having to extract
>>>> and / or calculate the significand + exponent components in the kernel
>>>> is burdensome and imo a task better suited for user space. This also has
>>>> to be done every blob set, meaning that if user space is re-using
>>>> pre-calculated blobs we would be repeating the same conversion
>>>> operations in kernel space unnecessarily.
>>>
>>> What is burdensome in that calculation? I don't think you would need to
>>> use any actual floating-point instructions. Logarithm for finding the
>>> exponent is about finding the highest bit set in an integer and
>>> everything is conveniently expressed in base-2. Finding significand is
>>> just masking the integer based on the exponent.
>>>    
>> Oh it definitely can be done, but I think this is just a difference of
>> opinion at this point. At the end of the day we will do it if we have
>> to, but it is just more optimal if a more agreeable common type is used.
>>
>>> Can you not cache the converted data, keyed by the DRM blob unique
>>> identity vs. the KMS property it is attached to?
>> If the userspace compositor has N common transforms (ex: standard P3 ->
>> sRGB matrix), they would likely have N unique blobs. Obviously from the
>> kernel end we wouldn't want to cache the transform of every blob passed
>> down through the UAPI.
> 
> Hi Christoper,
> 
> as long as the blob exists, why not?

Generally because this is an unbounded number of blobs. I'm not 100% 
sure what the typical behavior is upstream, but in our driver we have 
scenarios where we can have per-frame blob updates (unique per-frame blobs).

Speaking of per-frame blob updates, there is one concern I neglected to 
bring up. Internally we have seen scenarios where frequent blob 
allocation can lead to memory allocation delays of two frames or higher. 
This typically was seen when the system is under high memory usage and 
the blob allocation is > 1 page. The patch 
https://patchwork.freedesktop.org/patch/525857/ was uploaded a few 
months back to help mitigate these delays, but it didn't gain traction 
at the time.

This color pipeline UAPI is ultimately going to have the same problem. 
Frequent 3DLUT color block updates will result in large allocations, and 
if there is high system memory usage this could see blob allocation 
delays. So two things here:
- Let's reconsider https://patchwork.freedesktop.org/patch/525857/ so 
frequent blob allocation doesn't get unnecessarily delayed
- Do we have any alternative methods at our disposal for sending down 
the color configuration data? Generally blobs work fine for low-update 
or blob-cycling use cases, but frequent blob data updates result in a 
total per-frame IOCTL sequence of:
   (IOCTL_BLOB_DESTROY * #_of_blob_updates) +
     (IOCTL_BLOB_CREATE * #_of_blob_updates) + IOCTL_DRM_ATOMIC

Thanks,
Christopher

> 
>>> You can assume that userspace will not be re-creating DRM blobs without
>>> a reason to believe the contents have changed. If the same blob is set
>>> on the same property repeatedly, I would definitely not expect a driver
>>> to convert the data again.
>> If the blob ID is unchanged there is no issue since caching the last
>> result is already common. As you say, blobs are immutable so no update
>> is needed. I'd question why the compositor keeps trying to send down the
>> same blob ID though.
> 
> To avoid hard-to-debug situations with userspace vs. kernel view of KMS
> state getting out of sync by a bug, for example. I did originally write
> such KMS state caching in Weston to avoid emitting unchanged state, but
> that was deemed unnecessary as the kernel side needs to do the same
> comparisons "anyway".
> 
>>> If a driver does that, it seems like it
>>> should be easy to avoid, though I'm no kernel dev. Even if the
>>> conversion was just a memcpy, I would still posit it needs to be
>>> avoided when the data has obviously not changed. Blobs are immutable.
>>>   > Userspace having to use hardware-specific number formats would probably
>>> not be well received.
>>>    
>> To be clear, I am not asking user space to use custom value packing made
>> purely for the hardware's benefit (this sounds like a problem just
>> waiting to happen). Just support in the color pipeline UAPI for common
>> numerical data types such as 16-bit floats. That said...
> 
> I wonder if there actually is a significant difference between
> converting float<->float and int<->float if everything else is equally
> fine.
> 
> It's possible that requirements on range and precision do
> call for both types in UAPI, then we obviously need both.
> 
>>>> I agree normalization of the value causing precision loss and rounding
>>>> we can't avoid.
>>>>
>>>> We should also consider the fact that float pipelines have been known to
>>>> use the scrgb definition for floating point values
>>>> (https://registry.khronos.org/EGL/extensions/EXT/EGL_EXT_gl_colorspace_scrgb_linear.txt).
>>>
>>> scRGB is as good a definition of color encoding as "floating-point" is
>>> for numbers. What I mean is that it carries very little usable meaning,
>>> and without further information it is practically boundless
>>> - infinite - in both color gamut and dynamic range. Just like any
>>> floating-point quantity.
>>>
>>> However, what we want from KMS color pipeline is zero implied or
>>> defined meaning. That means scRGB carries too much meaning, because the
>>> primaries are fixed and (1.0, 1.0, 1.0) should match sRGB/SDR white.
>>>   > Btw. if one brings in nit units, you assume a specific viewing
>>> environment which is rarely true in reality. I'll leave that rabbit
>>> hole for another time. I just want to mention that nit (cd/m²) is a
>>> unit that is relative to the chosen viewing environment when your goal
>>> is a specific perception of brightness.
>>>    
>>>> In cases like this where there may be an expected value range in the
>>>> pipeline, how to normalize a larger input becomes a little confusing. Ex
>>>> - Does U32 MAX become FP16 MAX or value MAX (i.e 127).
>>>
>>> UAPI simply needs to specify the number encoding used in the UAPI, how
>>> bit patterns map to real numbers. Real numbers are then what the color
>>> pipeline operates on.
>>>    
>> If we plan to have the color pipeline UAPI expose these details then I
>> am satisfied.
> 
> Very good. I do not see how else it could even work.
> 
> 
> Thanks,
> pq
> 
> 
>>> However, intermediate value representation used between two KMS colorop
>>> blocks is never observable to userspace. All userspace needs to know is
>>> the usable value range and precision behaviour. I think that is best
>>> defined for the input and output of each block rather than what flows
>>> in between, because an optional (e.g. LUT) block when bypassed does not
>>> impose its limitations.
>>>    
>> Sure. Everything in between can be inferred from the pipeline.
>>
>>> What does 1.0 actually mean, that is left for userspace to use however
>>> it wishes. There are only pipeline boundary conditions to that: the
>>> input to a pipeline comes from a DRM FB, so it has a number encoding
>>> specified mostly by pixel format, and an arbitrary colorimetric
>>> encoding that only userspace knows. The output of the pipeline has to
>>> be standardised so that drivers can number-encode the pipeline output
>>> correctly to wire format on e.g. HDMI. Userspace alone is responsible
>>> for making sure the colorimetry matches what the sink expects.
>>>
>>> Individual KMS color pipeline colorop blocks need to define their own
>>> acceptable input and output ranges. E.g. a look-up table may assume
>>> that its input is in [0.0, 1.0] and anything outside is clamped to
>>> that range. That poses restrictions on how userspace can use the block.
>>>    
>>>>>>      
>>>>>>> Exposing the actual hardware precision is something we've talked about during
>>>>>>> the hackfest. It'll probably be useful to some extent, but will require some
>>>>>>> discussion to figure out how to design the uAPI. Maybe a simple property is
>>>>>>> enough, maybe not (e.g. fully describing the precision of segmented LUTs would
>>>>>>> probably be trickier).
>>>>>>>
>>>>>>> I'd rather keep things simple for the first pass, we can always add more
>>>>>>> properties for bit depth etc later on.
>>>>>>>          
>>>>>> Indicating if a block operates on / with fixed vs float values is
>>>>>> significant enough that I think we should account for this in initial
>>>>>> design. It will have an effect on both the user space value packing +
>>>>>> expected value ranges in the hardware.
>>>>>
>>>>> What do you mean by "value packing"? Memory layout of the bits forming
>>>>> a value? Or possible exact values of a specific type?
>>>> Both really. If the kernel is provided a U32 value, we need to know if
>>>> this is a U32 value, or a float packed into a U32 container. Likewise as
>>>> mentioned with the scRGB above, float could even adjust the value range
>>>> expectations.
>>>
>>> Right. The UAPI will simply define that.
>>>    
>> Great!
>>
>> Thanks,
>> Christopher
>>
>>>>> I don't think fixed vs. float is the most important thing. Even fixed
>>>>> point formats can have different numbers of bits for whole numbers,
>>>>> which changes the usable value range and not only precision. Userspace
>>>>> at the very least needs to know the usable value range for the block's
>>>>> inputs, outputs, and parameters.
>>>>>
>>>>> When defining the precision for inputs, outputs and parameters, then
>>>>> fixed- vs. floating-point becomes meaningful in explaining what "N bits
>>>>> of precision" means.
>>>>>
>>>>> Then there is the question of variable precision that depends on the
>>>>> actual block input and parameter values, how to represent that. Worst
>>>>> case precision might be too pessimistic alone.
>>>>>       
>>>> Agreed. More information probably is needed to fully define the interface
>>>> expectations.
>>>>   
>>>>>>>>> Here is another example with a 3D LUT:
>>>>>>>>>
>>>>>>>>>          Color operation 42
>>>>>>>>>          ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
>>>>>>>>>          ├─ "lut_size": immutable range = 33
>>>>>>>>>          ├─ "lut_data": blob
>>>>>>>>>          └─ "next": immutable color operation ID = 43
>>>>>>>>>         
>>>>>>>> We are going to need to expose the packing order here to avoid any
>>>>>>>> programming uncertainty. I don't think we can safely assume all hardware
>>>>>>>> is equivalent.
>>>>>>>
>>>>>>> The driver can easily change the layout of the matrix and do any conversion
>>>>>>> necessary when programming the hardware. We do need to document what layout is
>>>>>>> used in the uAPI for sure.
>>>>>>>          
>>>>>>>>> And one last example with a matrix:
>>>>>>>>>
>>>>>>>>>          Color operation 42
>>>>>>>>>          ├─ "type": enum {Bypass, Matrix} = Matrix
>>>>>>>>>          ├─ "matrix_data": blob
>>>>>>>>>          └─ "next": immutable color operation ID = 43
>>>>>>>>>         
>>>>>>>> It is unclear to me what the default sizing of this matrix is. Any
>>>>>>>> objections to exposing these details with an additional property?
>>>>>>>
>>>>>>> The existing CTM property uses 9 uint64 (S31.32) values. Is there a case where
>>>>>>> that wouldn't be enough?
>>>>>>
>>>>>> Larger cases do exist, but as you mention this can be resolved with a
>>>>>> different type then. I don't have any issues with the default 'Matrix'
>>>>>> type being 9 entries.
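For reference, the existing CTM property's S31.32 entries are sign-magnitude u64 values (sign in bit 63, 31.32 fixed-point magnitude below it). A minimal encoding sketch, not authoritative for any particular driver:

```python
# Sketch: encoding a matrix coefficient into the S31.32 sign-magnitude
# u64 format used by the existing CTM property (sign bit in bit 63,
# 31.32 fixed-point magnitude in the lower 63 bits).

def to_s31_32(value: float) -> int:
    sign = 1 << 63 if value < 0 else 0
    magnitude = int(round(abs(value) * (1 << 32)))
    assert magnitude < (1 << 63), "out of S31.32 range"
    return sign | magnitude

# Identity matrix as 9 S31.32 values, row-major; 1.0 encodes as 1 << 32.
identity = [to_s31_32(v) for v in (1, 0, 0, 0, 1, 0, 0, 0, 1)]
```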
>>>>>
>>>>> Please, tell us more. How big, and what are they used for?
>>>>>
>>>>> IIRC ICC has 3x3 matrix + offset vector. Do you have even more?
>>>>>
>>>>>       
>>>> Offset is one. Range adjustment 'vector' is another. But ultimately this
>>>> proposal is flexible enough that this can probably just be another color
>>>> block in the pipeline. No complaints from me here.
>>>
>>> What is a range adjustment vector? A vector of a multiplier per color
>>> channel? Does it include offset?
>>>
>>> Yes, sounds like just another block.
>>>    
>>>>>>>          
>>>>>>>> Dithering logic exists in some pipelines. I think we need a plan to
>>>>>>>> expose that here as well.
>>>>>>>
>>>>>>> Hm, I'm not too familiar with dithering. Do you think it would make sense to
>>>>>>> expose as an additional colorop block? Do you think it would have more
>>>>>>> consequences on the design?
>>>>>
>>>>> I think it would be an additional block, and no other consequences, be
>>>>> it temporal and/or spatial dithering, as long as it does not look at
>>>>> neighbouring pixels to determine the output for current pixel.
>>>>>       
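The "does not look at neighbouring pixels" property holds for classic ordered dithering: the output at (x, y) is a function of that pixel's value and its coordinates only. A toy 8-bit to 6-bit sketch (the 2x2 Bayer map is illustrative; real hardware would use larger maps and/or temporal variation):

```python
# Sketch of spatial ordered dithering: output at (x, y) depends only on
# that pixel's value and its coordinates, never on neighbouring pixels,
# which is what lets it fit the per-pixel colorop model.

BAYER_2X2 = [[0, 2],
             [3, 1]]  # classic 2x2 threshold map

def dither_8_to_6(value: int, x: int, y: int) -> int:
    """Reduce an 8-bit value to 6 bits with a 2x2 Bayer threshold."""
    # Threshold is 0..3, i.e. one quantization step's worth of sub-codes.
    threshold = BAYER_2X2[y % 2][x % 2]
    return min(255, value + threshold) >> 2

# A flat mid-grey input lands on neighbouring 6-bit codes across the tile,
# averaging out to the intended level.
```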
>>>>>>>
>>>>>>> I want to re-iterate that we don't need to ship all features from day 1. We
>>>>>>> just need to come up with a uAPI design on which new features can be built on.
>>>>>>>          
>>>>>>
>>>>>> Agreed. I don't think this will affect the proposed design so this can
>>>>>> be figured out once we have a DRM driver impl that declares this block.
>>>>>>      
>>>>>>>>> [Simon note: an alternative would be to split the color pipeline into
>>>>>>>>> two, by
>>>>>>>>> having two plane properties ("color_pipeline_pre_scale" and
>>>>>>>>> "color_pipeline_post_scale") instead of a single one. This would be
>>>>>>>>> similar to
>>>>>>>>> the way we want to split pre-blending and post-blending. This could be less
>>>>>>>>> expressive for drivers, there may be hardware where there are dependencies
>>>>>>>>> between the pre- and post-scaling pipeline?]
>>>>>>>>>         
>>>>>>>> As others have noted, breaking up the pipeline with immutable blocks
>>>>>>>> makes the most sense to me here. This way we don't have to predict ahead
>>>>>>>> of time every type of block that may be affected by pipeline ordering.
>>>>>>>> Splitting the pipeline into two properties now means future
>>>>>>>> logical splits would require introduction of further plane properties.
>>>>>>>
>>>>>>> Right, if there are more "breaking points", then we'll need immutable blocks
>>>>>>> anyways.
>>>>>>>          
>>>>>>>>> Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
>>>>>>>>> contains some fixed-function blocks which convert from LMS to ICtCp and
>>>>>>>>> cannot
>>>>>>>>> be disabled/bypassed. NVIDIA hardware has been designed for descriptive
>>>>>>>>> APIs
>>>>>>>>> where user-space provides a high-level description of the colorspace
>>>>>>>>> conversions it needs to perform, and this is at odds with our KMS uAPI
>>>>>>>>> proposal. To address this issue, we suggest adding a special block type
>>>>>>>>> which
>>>>>>>>> describes a fixed conversion from one colorspace to another and cannot be
>>>>>>>>> configured by user-space. Then user-space will need to accommodate its
>>>>>>>>> pipeline
>>>>>>>>> for these special blocks. Such fixed hardware blocks need to be well enough
>>>>>>>>> documented so that they can be implemented via shaders.
>>>>>>>>>         
>>>>>>>> A few questions here. What is the current plan for documenting the
>>>>>>>> mathematical model for each exposed block? Will each defined 'type' enum
>>>>>>>> value be locked to a definition in the kernel documents? As an example,
>>>>>>>> when we say '3D LUT' in this proposal does this mean the block will
>>>>>>>> expose a tri-linear interpolated 3D LUT interface? Is '1D Curve' a
>>>>>>>> direct input-to-output LUT mapping?
>>>>>>>
>>>>>>> I think we'll want to document these things, yes. We do want to give _some_
>>>>>>> slack to drivers, so that they can e.g. implement the "1D LUT" colorop via
>>>>>>> hardware segmented LUTs with a different number of elements per LUT segment.
>>>>>>> But being mathematically precise (probably with formulae in the docs) is
>>>>>>> definitely a goal, and absolutely necessary to implement a shader-based
>>>>>>> fallback.
>>>>>>
>>>>>> I agree some driver slack is necessary, however ideally this will be
>>>>>> locked down enough that from the compositor side they see "1D LUT" and
>>>>>> know exactly what to expect independent of the hardware. This way
>>>>>> regardless of if I am running on a NVIDIA / AMD / QCOM / etc... chip,
>>>>>> common color pipeline strategies can be used. Assuming a perfect world
>>>>>> where there is a workable overlap between chips of course.
>>>>>
>>>>> Yes, of course, at least for a start.
>>>>>
>>>>> However, the long term plan includes a shared userspace library with
>>>>> driver- and hardware-specific knowledge to use hardware- and
>>>>> driver-specific blocks. All blocks still need to be explicitly
>>>>> specified in the kernel UAPI documentation, the idea is that it should
>>>>> not be a problem for many vendors to have blocks no-one else does. The
>>>>> library would offer a much more generic API, and use snowflake blocks
>>>>> to their fullest. The library would also spit out OpenGL shaders and
>>>>> whatnot for the fallback.
>>>>>
>>>>> The future in the long term could be either way: evolving towards
>>>>> generic KMS UAPI blocks with no need for a userspace library
>>>>> abstraction, or evolving towards hardware-specific KMS UAPI blocks with
>>>>> a userspace library to abstract them like Mesa does for GPUs.
>>>>>       
>>>> Sounds good to me!
>>>
>>> Awesome!
>>>
>>>
>>> Thanks,
>>> pq
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC] Plane color pipeline KMS uAPI
  2023-06-15 21:44                   ` Christopher Braga
@ 2023-06-16  7:59                     ` Pekka Paalanen
  0 siblings, 0 replies; 49+ messages in thread
From: Pekka Paalanen @ 2023-06-16  7:59 UTC (permalink / raw)
  To: Christopher Braga
  Cc: Aleix Pol, DRI Development, xaver.hugl, Michel Dänzer,
	wayland-devel, Melissa Wen, Jonas Ådahl, Uma Shankar,
	Victoria Brekenfeld, Sebastian Wick, Joshua Ashton


On Thu, 15 Jun 2023 17:44:33 -0400
Christopher Braga <quic_cbraga@quicinc.com> wrote:

> On 6/14/2023 5:00 AM, Pekka Paalanen wrote:
> > On Tue, 13 Jun 2023 12:29:55 -0400
> > Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >   
> >> On 6/13/2023 4:23 AM, Pekka Paalanen wrote:  
> >>> On Mon, 12 Jun 2023 12:56:57 -0400
> >>> Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >>>      
> >>>> On 6/12/2023 5:21 AM, Pekka Paalanen wrote:  
> >>>>> On Fri, 9 Jun 2023 19:11:25 -0400
> >>>>> Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >>>>>         
> >>>>>> On 6/9/2023 12:30 PM, Simon Ser wrote:  
> >>>>>>> Hi Christopher,
> >>>>>>>
> >>>>>>> On Friday, June 9th, 2023 at 17:52, Christopher Braga <quic_cbraga@quicinc.com> wrote:
> >>>>>>>            
> >>>>>>>>> The new COLOROP objects also expose a number of KMS properties. Each has a
> >>>>>>>>> type, a reference to the next COLOROP object in the linked list, and other
> >>>>>>>>> type-specific properties. Here is an example for a 1D LUT operation:
> >>>>>>>>>
> >>>>>>>>>          Color operation 42
> >>>>>>>>>          ├─ "type": enum {Bypass, 1D curve} = 1D curve
> >>>>>>>>>          ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT  
> >>>>>>>> The options sRGB / PQ / BT.709 / HLG would select hard-coded 1D
> >>>>>>>> curves? Will different hardware be allowed to expose a subset of these
> >>>>>>>> enum values?  
> >>>>>>>
> >>>>>>> Yes. Only hardcoded LUTs supported by the HW are exposed as enum entries.
> >>>>>>>            
> >>>>>>>>>          ├─ "lut_size": immutable range = 4096
> >>>>>>>>>          ├─ "lut_data": blob
> >>>>>>>>>          └─ "next": immutable color operation ID = 43
> >>>>>>>>>           
> >>>>>>>> Some hardware has per channel 1D LUT values, while others use the same
> >>>>>>>> LUT for all channels.  We will definitely need to expose this in the
> >>>>>>>> UAPI in some form.  
> >>>>>>>
> >>>>>>> Hm, I was assuming per-channel 1D LUTs here, just like the existing GAMMA_LUT/
> >>>>>>> DEGAMMA_LUT properties work. If some hardware can't support that, it'll need
> >>>>>>> to get exposed as another color operation block.
> >>>>>>>            
> >>>>>>>>> To configure this hardware block, user-space can fill a KMS blob with
> >>>>>>>>> 4096 u32
> >>>>>>>>> entries, then set "lut_data" to the blob ID. Other color operation types
> >>>>>>>>> might
> >>>>>>>>> have different properties.
> >>>>>>>>>           
> >>>>>>>> The bit-depth of the LUT is an important piece of information we should
> >>>>>>>> include by default. Are we assuming that the DRM driver will always
> >>>>>>>> reduce the input values to the resolution supported by the pipeline?
> >>>>>>>> This could result in differences between the hardware behavior
> >>>>>>>> and the shader behavior.
> >>>>>>>>
> >>>>>>>> Additionally, some pipelines are floating point while others are fixed.
> >>>>>>>> How would user space know if it needs to pack 32 bit integer values vs
> >>>>>>>> 32 bit float values?  
> >>>>>>>
> >>>>>>> Again, I'm deferring to the existing GAMMA_LUT/DEGAMMA_LUT. These use a common
> >>>>>>> definition of LUT blob (u16 elements) and it's up to the driver to convert.
> >>>>>>>
> >>>>>>> Using a very precise format for the uAPI has the nice property of making the
> >>>>>>> uAPI much simpler to use. User-space sends high precision data and it's up to
> >>>>>>> drivers to map that to whatever the hardware accepts.
> >>>>>>>           
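The GAMMA_LUT/DEGAMMA_LUT precedent referred to above uses an array of struct drm_color_lut entries, each four u16 fields (red, green, blue, reserved). A sketch of building such a blob payload in userspace, here an identity ramp of the 4096 entries from the example:

```python
import struct

# Sketch: packing a per-channel 1D LUT blob the way the existing
# GAMMA_LUT/DEGAMMA_LUT properties expect it: an array of
# struct drm_color_lut { __u16 red, green, blue, reserved; } entries.

def pack_identity_lut(size: int) -> bytes:
    entries = []
    for i in range(size):
        v = i * 0xFFFF // (size - 1)  # identity ramp over the u16 range
        entries.append(struct.pack("<4H", v, v, v, 0))
    return b"".join(entries)

blob = pack_identity_lut(4096)
# 4096 entries x 4 u16 fields = 32768 bytes of blob data
```

The driver then quantizes or interpolates these u16 samples down to whatever width the hardware LUT actually has, which is the precision-loss concern raised below.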
> >>>>>> Conversion from a larger uint type to a smaller type sounds low effort,
> >>>>>> however if a block works in a floating point space things are going to
> >>>>>> get messy really quickly. If the block operates in FP16 space and the
> >>>>>> interface is 16 bits we are good, but going from 32 bits to FP16 (such
> >>>>>> as in the matrix case or 3DLUT) is less than ideal.  
> >>>>>
> >>>>> Hi Christopher,
> >>>>>
> >>>>> are you thinking of precision loss, or the overhead of conversion?
> >>>>>
> >>>>> Conversion from N-bit fixed point to N-bit floating-point is generally
> >>>>> lossy, too, and the other direction as well.
> >>>>>
> >>>>> What exactly would be messy?
> >>>>>         
> >>>> Overhead of conversion is the primary concern here. Having to extract
> >>>> and / or calculate the significand + exponent components in the kernel
> >>>> is burdensome and imo a task better suited for user space. This also has
> >>>> to be done every blob set, meaning that if user space is re-using
> >>>> pre-calculated blobs we would be repeating the same conversion
> >>>> operations in kernel space unnecessarily.  
> >>>
> >>> What is burdensome in that calculation? I don't think you would need to
> >>> use any actual floating-point instructions. Logarithm for finding the
> >>> exponent is about finding the highest bit set in an integer and
> >>> everything is conveniently expressed in base-2. Finding significand is
> >>> just masking the integer based on the exponent.
> >>>      
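The integer-only conversion described above can be sketched in a few lines: the exponent falls out of the highest set bit, and the significand is a shift and mask. This version converts an unsigned 32.32 fixed-point value to IEEE 754 binary32 bits and truncates extra precision (no rounding, no subnormal handling), purely to illustrate that no floating-point instructions are needed:

```python
# Sketch of integer-only fixed-to-float conversion: exponent from the
# highest set bit, significand by shifting/masking. Converts an unsigned
# 32.32 fixed-point value to IEEE 754 binary32 bits, truncating.

FRAC_BITS = 32

def fixed_to_f32_bits(fixed: int) -> int:
    if fixed == 0:
        return 0
    top = fixed.bit_length() - 1       # position of the highest set bit
    exponent = top - FRAC_BITS         # value is fixed * 2**-FRAC_BITS
    # Align the 23 significand bits just below the implicit leading 1.
    shift = top - 23
    mantissa = (fixed >> shift if shift >= 0 else fixed << -shift) & 0x7FFFFF
    return ((exponent + 127) << 23) | mantissa

# 1.0 in 32.32 is 1 << 32, which encodes to 0x3F800000.
```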
> >> Oh it definitely can be done, but I think this is just a difference of
> >> opinion at this point. At the end of the day we will do it if we have
> >> to, but it is just more optimal if a more agreeable common type is used.
> >>  
> >>> Can you not cache the converted data, keyed by the DRM blob unique
> >>> identity vs. the KMS property it is attached to?  
> >> If the userspace compositor has N common transforms (ex: standard P3 ->
> >> sRGB matrix), they would likely have N unique blobs. Obviously from the
> >> kernel end we wouldn't want to cache the transform of every blob passed
> >> down through the UAPI.  
> > 
> > Hi Christopher,
> > 
> > as long as the blob exists, why not?  
> 
> Generally because this is an unbounded number of blobs. I'm not 100% 
> sure what the typical behavior is upstream, but in our driver we have 
> scenarios where we can have per-frame blob updates (unique per-frame blobs).

All kernel allocated blob-related data should be accounted to the
userspace process. I don't think that happens today, but I think it
definitely should. Userspace can create a practically unlimited number
of arbitrary sized blobs to begin with, consuming arbitrary amounts of
kernel memory at will, even without drivers caching any derived data.

It does not seem to me like refusing to cache derived blob data would
really help.

> Speaking of per-frame blob updates, there is one concern I neglected to 
> bring up. Internally we have seen scenarios where frequent blob 
> allocation can lead to memory allocation delays of two frames or higher. 
> This typically was seen when the system is under high memory usage and 
> the blob allocation is > 1 page. The patch 
> https://patchwork.freedesktop.org/patch/525857/ was uploaded a few 
> months back to help mitigate these delays, but it didn't gain traction 
> at the time.

That is worrying.

As a userspace developer, I like the idea of limiting blob allocation
to DRM master only, but if the concern is the DRM master leaking, then
I'd imagine process accounting could at least point to the culprit.

Trying to defend against a malicious DRM master is in my opinion a
little moot. Untrusted processes should not be able to gain DRM master
to begin with.

Hmm, but DRM leasing...

> This color pipeline UAPI is ultimately going to have the same problem. 
> Frequent 3DLUT color block updates will result in large allocations, and 
> if there is high system memory usage this could see blob allocation 
> delays. So two things here:
> - Let's reconsider https://patchwork.freedesktop.org/patch/525857/ so 
> frequent blob allocation doesn't get unnecessarily delayed
> - Do we have any alternative methods at our disposal for sending down 
> the color configuration data? Generally blobs work fine for low update 
> or blob cycling use cases, but frequent blob data updates results in a 
> total per frame IOCTL sequence of:
>    (IOCTL_BLOB_DESTROY * #_of_blob_updates) +
>      (IOCTL_BLOB_CREATE * #_of_blob_updates) + IOCTL_DRM_ATOMIC

Good questions.

I have no ideas for that, but I got a random idea to mitigate the blob
conversion overhead:

What if we had a new kind of blob that is targeted to a specific
property of a specific KMS object at creation?

Then the driver could do the conversion work at create ioctl time, and
store only the derived data and not the original userspace data at all.
Then there are no unexpected delays due to allocation or conversion at
atomic commit time, and the memory cost is optimal for the specific
usage.

The disadvantage is that the blob is then tied to the specific property
of the specific KMS object, and cannot be used anywhere else. I'm not
sure how much of a problem that would be in practice for userspace
having to create maybe even more blobs per-plane or per-crtc, or a
problem for drivers that have a flexible mapping between KMS objects
and hardware blocks.


Thanks
pq



end of thread, other threads:[~2023-06-16  7:59 UTC | newest]

Thread overview: 49+ messages
-- links below jump to the message on this page --
2023-05-04 15:22 [RFC] Plane color pipeline KMS uAPI Simon Ser
2023-05-04 21:10 ` Harry Wentland
2023-05-05 11:41 ` Pekka Paalanen
2023-05-05 13:30   ` Joshua Ashton
2023-05-05 14:16     ` Pekka Paalanen
2023-05-05 17:01       ` Joshua Ashton
2023-05-09 11:23     ` Melissa Wen
2023-05-09 11:47       ` Pekka Paalanen
2023-05-09 17:01         ` Melissa Wen
2023-05-11 21:21     ` Simon Ser
2023-05-05 15:28 ` Daniel Vetter
2023-05-05 15:57   ` Sebastian Wick
2023-05-05 19:51     ` Daniel Vetter
2023-05-08  8:24       ` Pekka Paalanen
2023-05-08  9:00         ` Daniel Vetter
2023-05-05 16:06   ` Simon Ser
2023-05-05 19:53     ` Daniel Vetter
2023-05-08  8:58       ` Simon Ser
2023-05-08  9:18         ` Daniel Vetter
2023-05-08 18:10           ` Harry Wentland
     [not found]         ` <20230508185409.07501f40@n2pa>
2023-05-09  8:17           ` Pekka Paalanen
2023-05-05 20:40 ` Dave Airlie
2023-05-05 22:20   ` Sebastian Wick
2023-05-07 23:14     ` Dave Airlie
2023-05-08  9:37       ` Pekka Paalanen
2023-05-08 10:03       ` Jonas Ådahl
2023-05-09 14:31       ` Harry Wentland
2023-05-09 19:53         ` Dave Airlie
2023-05-09 20:22           ` Simon Ser
2023-05-10  7:59             ` Jonas Ådahl
2023-05-10  8:59               ` Pekka Paalanen
2023-05-11  9:51               ` Karol Herbst
2023-05-11 16:56                 ` Joshua Ashton
2023-05-11 18:56                   ` Jonas Ådahl
2023-05-11 19:29                   ` Simon Ser
2023-05-12  7:24                     ` Pekka Paalanen
2023-05-10  8:48             ` Pekka Paalanen
     [not found]   ` <20230505160435.6e3ffa4a@n2pa>
2023-05-08  8:49     ` Pekka Paalanen
2023-05-09  8:04 ` Pekka Paalanen
     [not found] ` <4341dac6-ada1-2a75-1c22-086d96408a85@quicinc.com>
2023-06-09 15:52   ` Christopher Braga
2023-06-09 16:30     ` Simon Ser
2023-06-09 23:11       ` Christopher Braga
2023-06-12  9:21         ` Pekka Paalanen
2023-06-12 16:56           ` Christopher Braga
2023-06-13  8:23             ` Pekka Paalanen
2023-06-13 16:29               ` Christopher Braga
2023-06-14  9:00                 ` Pekka Paalanen
2023-06-15 21:44                   ` Christopher Braga
2023-06-16  7:59                     ` Pekka Paalanen
