On 05/05, Joshua Ashton wrote:
> Some corrections and replies inline.
>
> On Fri, 5 May 2023 at 12:42, Pekka Paalanen wrote:
> >
> > On Thu, 04 May 2023 15:22:59 +0000
> > Simon Ser wrote:
> >
> > > Hi all,
> > >
> > > The goal of this RFC is to expose a generic KMS uAPI to configure the color
> > > pipeline before blending, ie. after a pixel is tapped from a plane's
> > > framebuffer and before it's blended with other planes. With this new uAPI we
> > > aim to reduce the battery life impact of color management and HDR on mobile
> > > devices, to improve performance and to decrease latency by skipping
> > > composition on the 3D engine. This proposal is the result of discussions at
> > > the Red Hat HDR hackfest [1] which took place a few days ago. Engineers
> > > familiar with the AMD, Intel and NVIDIA hardware have participated in the
> > > discussion.
> >
> > Hi Simon,
> >
> > this is an excellent write-up, thank you!
> >
> > Harry's question about what constitutes UAPI is a good one for danvet.
> >
> > I don't really have much to add here, a couple inline comments. I think
> > this could work.
> >
> > >
> > > This proposal takes a prescriptive approach instead of a descriptive approach.
> > > Drivers describe the available hardware blocks in terms of low-level
> > > mathematical operations, then user-space configures each block. We decided
> > > against a descriptive approach where user-space would provide a high-level
> > > description of the colorspace and other parameters: we want to give more
> > > control and flexibility to user-space, e.g. to be able to replicate exactly the
> > > color pipeline with shaders and switch between shaders and KMS pipelines
> > > seamlessly, and to avoid forcing user-space into a particular color management
> > > policy.
> > >
> > > We've decided against mirroring the existing CRTC properties
> > > DEGAMMA_LUT/CTM/GAMMA_LUT onto KMS planes.
> > > Indeed, the color management pipeline can significantly differ between
> > > vendors and this approach cannot accurately abstract all hardware. In
> > > particular, the availability, ordering and capabilities of hardware blocks
> > > is different on each display engine. So, we've decided to go for a highly
> > > detailed hardware capability discovery.
> > >
> > > This new uAPI should not be in conflict with existing standard KMS properties,
> > > since there are none which control the pre-blending color pipeline at the
> > > moment. It does conflict with any vendor-specific properties like
> > > NV_INPUT_COLORSPACE or the patches on the mailing list adding AMD-specific
> > > properties. Drivers will need to either reject atomic commits configuring both
> > > uAPIs, or alternatively we could add a DRM client cap which hides the vendor
> > > properties and shows the new generic properties when enabled.
> > >
> > > To use this uAPI, first user-space needs to discover hardware capabilities via
> > > KMS objects and properties, then user-space can configure the hardware via an
> > > atomic commit. This works similarly to the existing KMS uAPI, e.g. planes.
> > >
> > > Our proposal introduces a new "color_pipeline" plane property, and a new KMS
> > > object type, "COLOROP" (short for color operation). The "color_pipeline" plane
> > > property is an enum, each enum entry represents a color pipeline supported by
> > > the hardware. The special zero entry indicates that the pipeline is in
> > > "bypass"/"no-op" mode. For instance, the following plane properties describe a
> > > primary plane with 2 supported pipelines but currently configured in bypass
> > > mode:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     ├─ …
> > >     └─ "color_pipeline": enum {0, 42, 52} = 0
> > >
> > > The non-zero entries describe color pipelines as a linked list of COLOROP KMS
> > > objects.
> > > The entry value is an object ID pointing to the head of the linked
> > > list (the first operation in the color pipeline).
> > >
> > > The new COLOROP objects also expose a number of KMS properties. Each has a
> > > type, a reference to the next COLOROP object in the linked list, and other
> > > type-specific properties. Here is an example for a 1D LUT operation:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, BT.709, HLG, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > To configure this hardware block, user-space can fill a KMS blob with 4096 u32
> > > entries, then set "lut_data" to the blob ID. Other color operation types might
> > > have different properties.
> > >
> > > Here is another example with a 3D LUT:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 33
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > And one last example with a matrix:
> > >
> > >     Color operation 42
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >
> > > [Simon note: having "Bypass" in the "type" enum, and making "type" mutable is
> > > a bit weird. Maybe we can just add an "active"/"bypass" boolean property on
> > > blocks which can be bypassed instead.]
> > >
> > > [Jonas note: perhaps a single "data" property for both LUTs and matrices
> > > would make more sense. And a "size" prop for both 1D and 3D LUTs.]
> > >
> > > If some hardware supports re-ordering operations in the color pipeline, the
> > > driver can expose multiple pipelines with different operation ordering, and
> > > user-space can pick the ordering it prefers by selecting the right pipeline.
> > > The same scheme can be used to expose hardware blocks supporting multiple
> > > precision levels.
> > >
> > > That's pretty much all there is to it, but as always the devil is in the
> > > details.
> > >
> > > First, we realized that we need a way to indicate where the scaling operation
> > > is happening. The contents of the framebuffer attached to the plane might be
> > > scaled up or down depending on the CRTC_W and CRTC_H properties. Depending on
> > > the colorspace scaling is applied in, the result will be different, so we need
> > > a way for the kernel to indicate which hardware blocks are pre-scaling, and
> > > which ones are post-scaling. We introduce a special "scaling" operation type,
> > > which is part of the pipeline like other operations but serves an informational
> > > role only (effectively, the operation cannot be configured by user-space, all
> > > of its properties are immutable). For example:
> > >
> > >     Color operation 43
> > >     ├─ "type": immutable enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> >
> > I like this.
> >
> > >
> > > [Simon note: an alternative would be to split the color pipeline into two, by
> > > having two plane properties ("color_pipeline_pre_scale" and
> > > "color_pipeline_post_scale") instead of a single one. This would be similar to
> > > the way we want to split pre-blending and post-blending. This could be less
> > > expressive for drivers, there may be hardware where there are dependencies
> > > between the pre- and post-scaling pipeline?]
> > >
> > > Then, Alex from NVIDIA described how their hardware works. NVIDIA hardware
> > > contains some fixed-function blocks which convert from LMS to ICtCp and cannot
> > > be disabled/bypassed. NVIDIA hardware has been designed for descriptive APIs
> > > where user-space provides a high-level description of the colorspace
> > > conversions it needs to perform, and this is at odds with our KMS uAPI
> > > proposal.
> > > To address this issue, we suggest adding a special block type which
> > > describes a fixed conversion from one colorspace to another and cannot be
> > > configured by user-space. Then user-space will need to accommodate its pipeline
> > > for these special blocks. Such fixed hardware blocks need to be well enough
> > > documented so that they can be implemented via shaders.
> > >
> > > We also noted that it should always be possible for user-space to completely
> > > disable the color pipeline and switch back to bypass/identity without a
> > > modeset. Some drivers will need to fail atomic commits for some color
> > > pipelines, in particular for some specific LUT payloads. For instance, AMD
> > > doesn't support curves which are too steep, and Intel doesn't support curves
> > > which decrease. This isn't something which routinely happens, but there might
> > > be more cases where the hardware needs to reject the pipeline. Thus, when
> > > user-space has a running KMS color pipeline, then hits a case where the
> > > pipeline cannot keep running (gets rejected by the driver), user-space needs to
> > > be able to immediately fall back to shaders without any glitch. This doesn't
> > > seem to be an issue for AMD, Intel and NVIDIA.
> > >
> > > This uAPI is extensible: we can add more color operations, and we can add more
> > > properties for each color operation type. For instance, we might want to add
> > > support for Intel piece-wise linear (PWL) 1D curves, or might want to advertise
> > > the effective precision of the LUTs. The uAPI is deliberately somewhat minimal
> > > to keep the scope of the proposal manageable.
> > >
> > > Later on, we plan to re-use the same machinery for post-blending color
> > > pipelines. There are some more details about post-blending which have been
> > > separately debated at the hackfest, but we believe it's a viable plan.
> > > This solution would supersede the existing DEGAMMA_LUT/CTM/GAMMA_LUT
> > > properties, so we'd like to introduce a client cap to hide the old
> > > properties and show the new post-blending color pipeline properties.
> > >
> > > We envision a future user-space library to translate a high-level descriptive
> > > color pipeline into low-level prescriptive KMS color pipeline ("libliftoff but
> > > for color pipelines"). The library could also offer a translation into shaders.
> > > This should help share more infrastructure between compositors and ease KMS
> > > offloading. This should also help dealing with the NVIDIA case.
> > >
> > > To wrap things up, let's take a real-world example: how would gamescope [2]
> > > configure the AMD DCN 3.0 hardware for its color pipeline? The gamescope color
> > > pipeline is described in [3]. The AMD DCN 3.0 hardware is described in [4].
> > >
> > > AMD would expose the following objects and properties:
> > >
> > >     Plane 10
> > >     ├─ "type": immutable enum {Overlay, Primary, Cursor} = Primary
> > >     └─ "color_pipeline": enum {0, 42} = 0
> > >     Color operation 42 (input CSC)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 43
> > >     Color operation 43
> > >     ├─ "type": enum {Scaling} = Scaling
> > >     └─ "next": immutable color operation ID = 44
> > >     Color operation 44 (DeGamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {sRGB, PQ, …} = sRGB
> > >     └─ "next": immutable color operation ID = 45
>
> Some vendors have per-tap degamma and some have a degamma after the sample.
> How do we distinguish that behaviour?
> It is important to know.
>
> > >     Color operation 45 (gamut remap)
> > >     ├─ "type": enum {Bypass, Matrix} = Matrix
> > >     ├─ "matrix_data": blob
> > >     └─ "next": immutable color operation ID = 46
> > >     Color operation 46 (shaper LUT RAM)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 47
> > >     Color operation 47 (3D LUT RAM)
> > >     ├─ "type": enum {Bypass, 3D LUT} = 3D LUT
> > >     ├─ "lut_size": immutable range = 17
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 48
> > >     Color operation 48 (blend gamma)
> > >     ├─ "type": enum {Bypass, 1D curve} = 1D curve
> > >     ├─ "1d_curve_type": enum {LUT, sRGB, PQ, …} = LUT
> > >     ├─ "lut_size": immutable range = 4096
> > >     ├─ "lut_data": blob
> > >     └─ "next": immutable color operation ID = 0
> > >
> > > To configure the pipeline for an HDR10 PQ plane (path at the top) and an HDR
> > > display, gamescope would perform an atomic commit with the following property
> > > values:
> > >
> > >     Plane 10
> > >     └─ "color_pipeline" = 42
> > >     Color operation 42 (input CSC)
> > >     └─ "matrix_data" = PQ → scRGB (TF)
>
> ^
> Not sure what this is.
> We don't use an input CSC before degamma.
>
> > >     Color operation 44 (DeGamma)
> > >     └─ "type" = Bypass
>
> ^
> If we did PQ, this would be PQ -> Linear / 80
> If this was sRGB, it'd be sRGB -> Linear
> If this was scRGB this would be just treating it as it is. So... Linear / 80.
>
> > >     Color operation 45 (gamut remap)
> > >     └─ "matrix_data" = scRGB (TF) → PQ
>
> ^
> This is wrong, we just use this to do scRGB primaries (709) to 2020.
>
> We then go from scRGB -> PQ to go into our shaper + 3D LUT.
>
> > >     Color operation 46 (shaper LUT RAM)
> > >     └─ "lut_data" = PQ → Display native
>
> ^
> "Display native" is just the response curve of the display.
> In HDR10, this would just be PQ -> PQ
> If we were doing HDR10 on SDR, this would be PQ -> Gamma 2.2 (mapped
> from 0 to display native luminance) [with a potential bit of headroom
> for tonemapping in the 3D LUT]
> For SDR on HDR10 this would be Gamma 2.2 -> PQ (Not intending to start
> an sRGB vs G2.2 argument here! :P)
>
> > >     Color operation 47 (3D LUT RAM)
> > >     └─ "lut_data" = Gamut mapping + tone mapping + night mode
> > >     Color operation 48 (blend gamma)
> > >     └─ "1d_curve_type" = PQ
>
> ^
> This is wrong, this should be Display Native -> Linearized Display Referred

This is a good point to discuss. I understand for the HDR10 case that we are
just setting an enumerated TF (that is PQ for this case - correct me if I got
it wrong) but, unlike when we use a user-LUT, we don't know from the API that
this enumerated TF value with an empty LUT is used for linearizing/degamma.
Perhaps this could come as a pair? Any idea?

>
> >
> > You cannot do a TF with a matrix, and a gamut remap with a matrix on
> > electrical values is certainly surprising, so the example here is a
> > bit odd, but I don't think that hurts the intention of demonstration.
>
> I have done some corrections inline.
>
> You can see our fully correct color pipeline here:
> https://raw.githubusercontent.com/ValveSoftware/gamescope/master/src/docs/Steam%20Deck%20Display%20Pipeline.png
>
> Please let me know if you have any more questions about our color pipeline.
>
> >
> > Btw. ISTR that if you want to do scaling properly with alpha channel,
> > you need optical values multiplied by alpha. Alpha vs. scaling is just
> > yet another thing to look into, and TF operations do not work with
> > pre-mult.
>
> What are your concerns here?
>
> Having pre-multiplied alpha is fine with a TF: the alpha was
> premultiplied in linear, then encoded with the TF by the client.
> If you think of a TF as something something relative to a bunch of
> reference state or whatever then you might think "oh you can't do
> that!", but you really can.
> It's really best to just think of it as a mathematical encoding of a
> value in all instances that we touch.
>
> The only issue is that you lose precision from having pre-multiplied
> alpha as it's quantized to fit into the DRM format rather than using
> the full range then getting divided by the alpha at blend time.
> It doesn't end up being a visible issue ever however in my experience, at 8bpc.
>
> Thanks
> - Joshie 🐸✨
>
> >
> >
> > Thanks,
> > pq
> >
> > >
> > > I hope comparing these properties to the diagrams linked above can help
> > > understand how the uAPI would be used and give an idea of its viability.
> > >
> > > Please feel free to provide feedback! It would be especially useful to have
> > > someone familiar with Arm SoCs look at this, to confirm that this proposal
> > > would work there.
> > >
> > > Unless there is a show-stopper, we plan to follow up this RFC with
> > > implementations for AMD, Intel, NVIDIA, gamescope, and IGT.
> > >
> > > Many thanks to everybody who contributed to the hackfest, on-site or remotely!
> > > Let's work together to make this happen!
> > >
> > > Simon, on behalf of the hackfest participants
> > >
> > > [1]: https://wiki.gnome.org/Hackfests/ShellDisplayNext2023
> > > [2]: https://github.com/ValveSoftware/gamescope
> > > [3]: https://github.com/ValveSoftware/gamescope/blob/5af321724c8b8a29cef5ae9e31293fd5d560c4ec/src/docs/Steam%20Deck%20Display%20Pipeline.png
> > > [4]: https://kernel.org/doc/html/latest/_images/dcn3_cm_drm_current.svg
> >
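
As a side note on the discovery step in the RFC, here is a rough sketch of how
user-space might walk a "color_pipeline" entry as a linked list of COLOROP
objects and split it around the informational "Scaling" operation. This is
plain Python over an in-memory table, not real libdrm or KMS calls: the
COLOROPS dict, the "name" field and both helper functions are hypothetical and
only mirror the AMD DCN 3.0 example quoted above. It assumes "next" = 0
terminates the list, as in that example.

```python
# Hypothetical in-memory copy of the COLOROP objects from the AMD DCN 3.0
# example in the RFC. A real client would read these through KMS properties.
COLOROPS = {
    42: {"name": "input CSC", "type": "Matrix", "next": 43},
    43: {"name": "scaling", "type": "Scaling", "next": 44},
    44: {"name": "DeGamma", "type": "1D curve", "next": 45},
    45: {"name": "gamut remap", "type": "Matrix", "next": 46},
    46: {"name": "shaper LUT RAM", "type": "1D curve", "next": 47},
    47: {"name": "3D LUT RAM", "type": "3D LUT", "next": 48},
    48: {"name": "blend gamma", "type": "1D curve", "next": 0},
}


def walk_pipeline(colorops, head_id):
    """Follow "next" links from head_id until the 0 terminator.

    head_id is the value of the plane's "color_pipeline" enum; 0 means
    bypass, which yields an empty list.
    """
    ops = []
    seen = set()
    op_id = head_id
    while op_id != 0:
        if op_id in seen:
            raise ValueError(f"cycle at color operation {op_id}")
        seen.add(op_id)
        ops.append((op_id, colorops[op_id]))
        op_id = colorops[op_id]["next"]
    return ops


def split_at_scaling(ops):
    """Return (pre-scaling IDs, post-scaling IDs), dropping the Scaling op."""
    pre, post = [], []
    target = pre
    for op_id, op in ops:
        if op["type"] == "Scaling":
            target = post  # everything after this point is post-scaling
            continue
        target.append(op_id)
    return pre, post


pipeline = walk_pipeline(COLOROPS, 42)
pre_ids, post_ids = split_at_scaling(pipeline)
print("pre-scaling:", pre_ids)
print("post-scaling:", post_ids)
```

With the table above, only the input CSC (42) lands before scaling and
everything from DeGamma (44) onwards lands after it, which is where the
per-tap vs. post-sample degamma question comes in.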