dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Harry Wentland <harry.wentland@amd.com>
To: Christopher Braga <quic_cbraga@quicinc.com>,
	Alex Goins <agoins@nvidia.com>,
	Sebastian Wick <sebastian.wick@redhat.com>
Cc: "Aleix Pol" <aleixpol@kde.org>,
	"Sasha McIntosh" <sashamcintosh@google.com>,
	"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
	"Shashank Sharma" <shashank.sharma@amd.com>,
	"Xaver Hugl" <xaver.hugl@gmail.com>,
	"Hector Martin" <marcan@marcan.st>,
	"Liviu Dudau" <Liviu.Dudau@arm.com>,
	"Pekka Paalanen" <ppaalanen@gmail.com>,
	dri-devel@lists.freedesktop.org,
	"Victoria Brekenfeld" <victoria@system76.com>,
	"Melissa Wen" <mwen@igalia.com>,
	"Michel Dänzer" <mdaenzer@redhat.com>,
	"Jonas Ådahl" <jadahl@redhat.com>,
	"Joshua Ashton" <joshua@froggi.es>,
	"Uma Shankar" <uma.shankar@intel.com>,
	"Naseer Ahmed" <quic_naseer@quicinc.com>,
	wayland-devel@lists.freedesktop.org,
	"Arthur Grillo" <arthurgrillo@riseup.net>
Subject: Re: [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed
Date: Tue, 7 Nov 2023 11:52:19 -0500	[thread overview]
Message-ID: <0dc74587-e2f8-4f6d-95d8-c4a0679e78bf@amd.com> (raw)
In-Reply-To: <7aa2519b-4eac-4378-8d9c-9180428ef0b2@quicinc.com>



On 2023-11-04 19:01, Christopher Braga wrote:
> Just want to loop back to before we branched off deeper into the programming performance talk
> 
> On 10/26/2023 3:25 PM, Alex Goins wrote:
>> On Thu, 26 Oct 2023, Sebastian Wick wrote:
>>
>>> On Thu, Oct 26, 2023 at 11:57:47AM +0300, Pekka Paalanen wrote:
>>>> On Wed, 25 Oct 2023 15:16:08 -0500 (CDT)
>>>> Alex Goins <agoins@nvidia.com> wrote:
>>>>
>>>>> Thank you Harry and all other contributors for your work on this. Responses
>>>>> inline -
>>>>>
>>>>> On Mon, 23 Oct 2023, Pekka Paalanen wrote:
>>>>>
>>>>>> On Fri, 20 Oct 2023 11:23:28 -0400
>>>>>> Harry Wentland <harry.wentland@amd.com> wrote:
>>>>>>
>>>>>>> On 2023-10-20 10:57, Pekka Paalanen wrote:
>>>>>>>> On Fri, 20 Oct 2023 16:22:56 +0200
>>>>>>>> Sebastian Wick <sebastian.wick@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for continuing to work on this!
>>>>>>>>>
>>>>>>>>> On Thu, Oct 19, 2023 at 05:21:22PM -0400, Harry Wentland wrote:

snip

>>>>> Actually, the current examples in the proposal don't include a multiplier color
>>>>> op, which might be useful. For AMD as above, but also for NVIDIA as the
>>>>> following issue arises:
>>>>>
>>>>> As discussed further below, the NVIDIA "degamma" LUT performs an implicit fixed
> 
> If possible, let's declare this as two blocks. One that informatively declares the conversion is present, and another for the de-gamma. This will help with block-reuse between vendors.
> 
>>>>> point to FP16 conversion. In that conversion, what fixed point 0xFFFFFFFF maps
>>>>> to in floating point varies depending on the source content. If it's SDR
>>>>> content, we want the max value in FP16 to be 1.0 (80 nits), subject to a
>>>>> potential boost multiplier if we want SDR content to be brighter. If it's HDR PQ
>>>>> content, we want the max value in FP16 to be 125.0 (10,000 nits). My assumption
>>>>> is that this is also what AMD's "HDR Multiplier" stage is used for, is that
>>>>> correct?
>>>>
>>>> It would be against the UAPI design principles to tag content as HDR or
>>>> SDR. What you can do instead is to expose a colorop with a multiplier of
>>>> 1.0 or 125.0 to match your hardware behaviour, then tell your hardware
>>>> that the input is SDR or HDR to get the expected multiplier. You will
>>>> never know what the content actually is, anyway.
>>
>> Right, I didn't mean to suggest that we should tag content as HDR or SDR in the
>> UAPI, just relating to the end result in the pipe, ultimately it would be
>> determined by the multiplier color op.
>>
> 
> A multiplier could work but we would should give OEMs the option to either make it "informative" and fixed by the hardware, or fully configurable. With the Qualcomm pipeline how we absorb FP16 pixel buffers, as well as how we convert them to fixed point data actually has a dependency on the desired de-gamma and gamma processing. So for an example:
> 
> If a source pixel buffer is scRGB encoded FP16 content we would expect input pixel content to be up to 7.5, with the IGC output reaching 125 as in the NVIDIA case. Likewise gamma 2.2 encoded FP16 content would be 0-1 in and 0-1 out.
> 
> So in the Qualcomm case the expectations are fixed depending on the use case.
> 
> It is sounding to me like we would need to be able to declare three things here:
> 1. Value range expectations *into* the de-gamma block. A multiplier wouldn't work here because it would be more of a clipping operation. I guess we would have to add an explicit clamping block as well.
> 2. What the value range expectations  at the *output* of de-gamma processing block. Also covered by using another multiplier block.
> 3. Value range expectations *into* a gamma processing block. This should be covered by declaring a multiplier post-csc, but only assuming CSC output is normalized in the desired value range. A clamping block would be preferable because it describes what happens when it isn't.
> 

What about adding informational input and output range properties
to colorops? I think Intel's PWL definitions had something like
that, but I'd have to take a look at that again. While I'm not
in favor of defining segmented LUTs at the uAPI the input/output
ranges seem to be something of value.

> All this is do-able, but it seems like it would require the definition of multiple color pipelines to expose the different limitations for color block configuration combinations. Additionally, would it be easy for user space to find the right pipeline?
> 

I'm also a little concerned that some of these proposals mean we'd
have to expose an inordinate number of color pipelines and color
pipeline selection becomes difficult and error prone.

snip

>>>> Given that elements like various kinds of look-up tables inherently
>>>> assume that the domain is [0.0, 1.0] (because the it is a table that
>>>> has a beginning and an end, and the usual convention is that the
>>>> beginning is zero and the end is one), I think it is best to stick to
>>>> the [0.0, 1.0] range where possible. If we go out of that range, then
>>>> we have to define how a LUT would apply in a sensible way.
>>
>> In my last reply I mentioned a static (but actually programmable) LUT that is
>> typically used to convert FP16 linear pixels to fixed point PQ before handing
>> them to the scaler and tone mapping operator. You're actually right that it
>> indexes in the fixed point [0.0, 1.0] range for the reasons you describe, but
>> because the input pixels are expected to be FP16 in the [0.0, 125.0] range, it
>> applies a non-programmable 1/125.0 normalization factor first.
>>
>> In this case, you could think of the LUT as indexing on [0.0, 125.0], but as you
>> point out there would need to be some way to describe that. Maybe we actually
>> need a fractional multiplier / divider color op. NVIDIA pipes that include this
>> LUT would need to include a mandatory 1/125.0 factor immediately prior to the
>> LUT, then LUT can continue assuming a range of [0.0, 1.0].
>>
>> Assuming you are using the hardware in a conventional way, specifying a
>> multiplier of 1.0 after the "degamma" LUT would then map to the 80-nit PQ range
>> after the static (but actually programmable) PQ LUT, whereas specifying a
>> multiplier of 125.0 would map to the 10,000-nit PQ range, which is what we want.
>> I guess it's kind of messy, but the effect would be that color ops other than
>> multipliers/dividers would still be in the [0.0, 1.0] domain, and any multiplier
>> that exceeds that range would have to be normalized by a divider before any
>> other color op.
>>
> 
> Hmm. A multiplier would resolve issues when input linear FP16 data that has different ideas on what 1.0 means in regards to nits values (think of Apple's EDR as an example). For a client to go from their definition to hardware definition of 1.0 = x nits, we would need to expose what the pipeline sees as 1.0 though. So in this case the multiplier would be programmable, but the divisor is informational? It seems like the later would have an influence on how the former is programmed.
> 

A programmable multiplier would either need to be backed by a HW block
to perform the operation or require a driver to scale the LUT or matrix
values of an adjacent LUT or matrix block.

snip

>>>>>>
>>>>>> Yeah, this is why we need a definition. I understand "informational" to
>>>>>> not change pixel values in any way. Previously I had some weird idea
>>>>>> that scaling doesn't alter color, but of course it may.
>>>>>
>>>>> On recent hardware, the NVIDIA pre-blending pipeline includes LUTs that do
>>>>> implicit fixed-point to FP16 conversions, and vice versa.
>>>>
>>>> Above, I claimed that the UAPI should be defined in nominal
>>>> floating-point values, but I wonder, would that work? Would we need to
>>>> have explicit colorops for converting from raw pixel data values into
>>>> nominal floating-point in the UAPI?
>>
>> Yeah, I think something like that is needed, or another solution as discussed
>> below. Even if we define the UAPI in terms of floating point, the actual
>> underlying pixel format needs to match the expectations of each stage as it
>> flows through the pipe.
>>
> 
> Strongly agree on this. Pixel format and block relationships definitely exist.
> 

Interesting to see this isn't just an AMD thing. :)

snip

>>>>
>>>> Both blending and scaling are fundamentally the same operation: you
>>>> have two or more source colors (pixels), and you want to compute a
>>>> weighted average of them following what happens in nature, that is,
>>>> physics, as that is what humans are used to.
>>>>
>>>> Both blending and scaling will suffer from the same problems if the
>>>> operation is performed on not light-linear values. The result of the
>>>> weighted average does not correspond to physics.
>>>>
>>>> The problem may be hard to observe with natural imagery, but Josh's
>>>> example shows it very clearly. Maybe that effect is sometimes useful
>>>> for some imagery in some use cases, but it is still an accidental
>>>> side-effect. You might get even better results if you don't rely on
>>>> accidental side-effects but design a separate operation for the exact
>>>> goal you have.
>>>>
>>>> Mind, by scaling we mean changing image size. Not scaling color values.
>>>>
>>
>> Fair enough, but it might not always be a choice given the hardware.
>>
> 
> Agreeing with Alex here. I get there is some debate over the best way to do this, but I think it is best to leave it up to the driver to declare how that is done.

Same.

snip

>>>>
>>>> What I was left puzzled about after the XDC workshop is that is it
>>>> possible to pre-load configurations in the background (slow), and then
>>>> quickly switch between them? Hardware-wise I mean.
>>
>> This works fine for our "fast" LUTs, you just point them to a surface in video
>> memory and they flip to it. You could keep multiple surfaces around and flip
>> between them without having to reprogram them in software. We can easily do that
>> with enumerated curves, populating them when the driver initializes instead of
>> waiting for the client to request them. You can even point multiple hardware
>> LUTs to the same video memory surface, if they need the same curve.
>>
>>>
>>> We could define that pipelines with a lower ID are to be preferred over
>>> higher IDs.
>>
>> Sure, but this isn't just an issue with a pipeline as a whole, but the
>> individual elements within it and how to use them in a given context.
>>
>>>
>>> The issue is that if programming a pipeline becomes too slow to be
>>> useful it probably should just not be made available to user space.
>>
>> It's not that programming the pipeline is overall too slow. The LUTs we have
>> that are relatively slow to program are meant to be set infrequently, or even
>> just once, to allow the scaler and tone mapping operator to operate in fixed
>> point PQ space. You might still want the tone mapper, so you would choose a
>> pipeline that includes them, but when it comes to e.g. animating a night light,
>> you would want to choose a different LUT for that purpose.
>>
>>>
>>> The prepare-commit idea for blob properties would help to make the
>>> pipelines usable again, but until then it's probably a good idea to just
>>> not expose those pipelines.
>>
>> The prepare-commit idea actually wouldn't work for these LUTs, because they are
>> programmed using methods instead of pointing them to a surface. I'm actually not
>> sure how slow it actually is, would need to benchmark it. I think not exposing
>> them at all would be overkill, since it would mean you can't use the preblending
>> scaler or tonemapper, and animation isn't necessary for that.
>>
>> The AMD 3DLUT is another example of a LUT that is slow to update, and it would
>> obviously be a major loss if that wasn't exposed. There just needs to be some
>> way for clients to know if they are going to kill performance by trying to
>> change it every frame.
>>
>> Thanks,
>> Alex
>>
> 
> To clarify, what are we defining as slow to update here? Something we aren't able to update within a frame (let's say at a low frame rate such as 30 fps for discussion's sake)? A block that requires a programming sequence of disable + program + enable to update? Defining performance seems like it can get murky if we start to consider frame concurrent updates among multiple color blocks as well.
> 

I think any definition for slow would need to be imprecise on some level.
In the AMD 3DLUT case we can take around 8 ms. Some compositors need the
programming time to be well under 1 ms, even for low frame rates. Those
compositors might want to know if an operation might be undesirable if
they care about latency. I'm not sure we could reliably indicate more.

Harry

> Thanks,
> Christopher
> 
>>>
>>>>
>>>>
>>>> Thanks,
>>>> pq
>>>
>>>


  reply	other threads:[~2023-11-07 16:52 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-19 21:21 [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 01/17] drm/atomic: Allow get_value for immutable properties on atomic drivers Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 02/17] drm: Don't treat 0 as -1 in drm_fixp2int_ceil Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 03/17] drm/vkms: Create separate Kconfig file for VKMS Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 04/17] drm/vkms: Add kunit tests for VKMS LUT handling Harry Wentland
2023-10-23 22:34   ` Arthur Grillo
2023-10-19 21:21 ` [RFC PATCH v2 05/17] drm/vkms: Avoid reading beyond LUT array Harry Wentland
2023-10-30 13:29   ` Pekka Paalanen
2023-11-06 20:48     ` Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 06/17] drm/doc/rfc: Describe why prescriptive color pipeline is needed Harry Wentland
2023-10-20 14:22   ` Sebastian Wick
2023-10-20 14:57     ` Pekka Paalanen
2023-10-20 15:23       ` Harry Wentland
2023-10-23  8:12         ` Pekka Paalanen
2023-10-25 20:16           ` Alex Goins
2023-10-26  8:57             ` Pekka Paalanen
2023-10-26 17:30               ` Sebastian Wick
2023-10-26 19:25                 ` Alex Goins
2023-10-27  8:59                   ` Michel Dänzer
2023-10-27 10:01                     ` Sebastian Wick
2023-10-27 12:01                       ` Pekka Paalanen
2023-11-04 23:01                   ` Christopher Braga
2023-11-07 16:52                     ` Harry Wentland [this message]
2023-11-07 16:52                   ` Harry Wentland
2023-11-07 16:52                 ` Harry Wentland
2023-11-07 21:17                   ` Sebastian Wick
2023-11-07 16:52               ` Harry Wentland
2023-11-07 16:52             ` Harry Wentland
2023-11-08 12:18   ` Shankar, Uma
2023-11-08 13:43     ` Joshua Ashton
2023-11-09 10:17       ` Shankar, Uma
2023-11-09 11:55         ` Pekka Paalanen
2023-11-10 11:27           ` Shankar, Uma
2023-11-10 13:27             ` Pekka Paalanen
2023-11-08 14:37     ` Harry Wentland
2023-11-09 10:24       ` Shankar, Uma
2023-10-19 21:21 ` [RFC PATCH v2 07/17] drm/colorop: Introduce new drm_colorop mode object Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 08/17] drm/colorop: Add TYPE property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 09/17] drm/color: Add 1D Curve subtype Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 10/17] drm/colorop: Add BYPASS property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 11/17] drm/colorop: Add NEXT property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 12/17] drm/colorop: Add atomic state print for drm_colorop Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 13/17] drm/colorop: Add new IOCTLs to retrieve drm_colorop objects Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 14/17] drm/plane: Add COLOR PIPELINE property Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 15/17] drm/colorop: Add NEXT to colorop state print Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 16/17] drm/vkms: Add enumerated 1D curve colorop Harry Wentland
2023-10-19 21:21 ` [RFC PATCH v2 17/17] drm/vkms: Add kunit tests for linear and sRGB LUTs Harry Wentland
2023-11-08 11:54 ` [RFC PATCH v2 00/17] Color Pipeline API w/ VKMS Shankar, Uma
2023-11-08 14:32   ` Harry Wentland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0dc74587-e2f8-4f6d-95d8-c4a0679e78bf@amd.com \
    --to=harry.wentland@amd.com \
    --cc=Liviu.Dudau@arm.com \
    --cc=agoins@nvidia.com \
    --cc=aleixpol@kde.org \
    --cc=arthurgrillo@riseup.net \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jadahl@redhat.com \
    --cc=joshua@froggi.es \
    --cc=marcan@marcan.st \
    --cc=mdaenzer@redhat.com \
    --cc=mwen@igalia.com \
    --cc=ppaalanen@gmail.com \
    --cc=quic_abhinavk@quicinc.com \
    --cc=quic_cbraga@quicinc.com \
    --cc=quic_naseer@quicinc.com \
    --cc=sashamcintosh@google.com \
    --cc=sebastian.wick@redhat.com \
    --cc=shashank.sharma@amd.com \
    --cc=uma.shankar@intel.com \
    --cc=victoria@system76.com \
    --cc=wayland-devel@lists.freedesktop.org \
    --cc=xaver.hugl@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).