From: Nicolas Dufresne <nicolas@ndufresne.ca>
To: Paul Kocialkowski <paul.kocialkowski@bootlin.com>,
	Hans Verkuil <hverkuil@xs4all.nl>,
	Benjamin Gaignard <benjamin.gaignard@collabora.com>,
	Jernej Skrabec <jernej.skrabec@gmail.com>,
	Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	Michael Tretter <m.tretter@pengutronix.de>
Cc: linux-media@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: V4L2 Encoders Pre-Processing Support Questions
Date: Fri, 20 Oct 2023 13:56:23 -0400	[thread overview]
Message-ID: <437e4ea0c2c2681c1b333ad109f8f02fc229537f.camel@ndufresne.ca> (raw)
In-Reply-To: <ZTD5TXND4R7JqvCD@aptenodytes>

Hi Paul,

On Thursday 19 October 2023 at 11:39 +0200, Paul Kocialkowski wrote:
> Hello,
> 
> While working on the Allwinner Video Engine H.264 encoder, I found that it has
> some pre-processing capabilities. This includes things like chroma
> down-sampling, colorspace conversion and scaling.

Similar with Hantro H1.

> 
> For example this means that you can feed the encoder with YUV 4:2:2 data and
> it will downsample it to 4:2:0 since that's the only thing the hardware can do.
> It can also happen when e.g. providing RGB source pictures which will be
> converted to YUV 4:2:0 internally.
> 
> I was wondering how all of this is dealt with currently and whether this should
> be a topic of attention. As far as I can see there is currently no practical way
> for userspace to know that such downsampling will take place, although this is
> useful to know.

Userspace already knows that the driver will downsample, through the selected
profile. The only issue would be if a user wants to force a profile with 4:2:2
support but still have their 4:2:2 data downsampled anyway. This is legal in the
spec, but I'd question whether it's worth supporting.
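
To illustrate what I mean (a rough sketch on my side, not existing code in any
application): an application can read back the current profile control on the
coded queue and infer the internal chroma format from it. Error handling is
kept minimal for brevity.

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Sketch: infer the encoder's internal chroma format from the negotiated
 * H.264 profile. Anything below High 4:2:2 means 4:2:2 input will be
 * downsampled to 4:2:0 internally. */
static bool encoder_keeps_422(int fd)
{
        struct v4l2_ext_control ctrl = {
                .id = V4L2_CID_MPEG_VIDEO_H264_PROFILE,
        };
        struct v4l2_ext_controls ctrls = {
                .which = V4L2_CTRL_WHICH_CUR_VAL,
                .count = 1,
                .controls = &ctrl,
        };

        if (ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls))
                return false;

        return ctrl.value == V4L2_MPEG_VIDEO_H264_PROFILE_HIGH_422 ||
               ctrl.value == V4L2_MPEG_VIDEO_H264_PROFILE_HIGH_444_PREDICTIVE;
}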

> 
> Would it make sense to have an additional media entity between the source video
> node and the encoder proc and have the actual pixel format configured in that
> link (this would still be a video-centric device so userspace would not be
> expected to configure that link)? But then what if the hardware can either
> down-sample or keep the provided sub-sampling? How would userspace indicate
> which behavior to select? It is maybe not great to let userspace configure the
> pads when this is a video-node-centric driver.
> 
> Perhaps this could be a control or the driver could decide to pick the least
> destructive sub-sampling available based on the selected codec profile
> (but this is still a guess that may not match the use case). With a control
> we probably don't need an extra media entity.

Yes, for the cases not covered by the profile, I'd consider a control to force
downsampling. A menu control, so we can use the available menu items to
enumerate what is supported.
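
Something like the sketch below is what I have in mind. The control ID is
purely hypothetical (no such control exists today) and only stands in for
whatever menu control would be introduced; the enumeration itself is just the
standard query/querymenu pattern.

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hypothetical control ID, only for illustration; not a real V4L2 control. */
#define V4L2_CID_ENC_CHROMA_SUBSAMPLING (V4L2_CID_CODEC_BASE + 0x1000)

static void enumerate_chroma_modes(int fd)
{
        struct v4l2_query_ext_ctrl qctrl = {
                .id = V4L2_CID_ENC_CHROMA_SUBSAMPLING,
        };
        __u32 idx;

        if (ioctl(fd, VIDIOC_QUERY_EXT_CTRL, &qctrl))
                return;

        /* Each reported menu item is a chroma format the encoder can
         * downsample (or pass through) to. */
        for (idx = qctrl.minimum; idx <= qctrl.maximum; idx++) {
                struct v4l2_querymenu qmenu = { .id = qctrl.id, .index = idx };

                if (!ioctl(fd, VIDIOC_QUERYMENU, &qmenu))
                        printf("supported: %s\n", (const char *)qmenu.name);
        }
}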

> 
> Another topic is scaling. We can generally support scaling by allowing a
> different size for the coded queue after configuring the picture queue.
> However there would be some interaction with the selection rectangle, which is
> used to set the cropping rectangle from the *source*. So the driver will need
> to take this rectangle and scale it to match with the coded size.
> 
> The main inconsistency here is that the rectangle would no longer correspond to
> what will be set in the bitstream, nor would the destination size since it does
> not count the cropping rectangle by definition. It might be more sensible to
> have the selection rectangle operate on the coded/destination queue instead,
> but things are already specified to be the other way round.
> 
> Maybe a selection rectangle could be introduced for the coded queue too, which
> would generally be propagated from the picture-side one, except in the case of
> scaling where it would be used to clarify the actual final size (coded size
> taking the cropping into account). In this case the source selection rectangle
> would be understood as an actual source crop (may not be supported by hardware)
> instead of an indication for the codec metadata crop fields. And the coded
> queue dimensions would need to take this source cropping into account, which is
> kinda contradictory with the current semantics. Perhaps we could define that
> the source crop rectangle should be entirely ignored when scaling is used,
> which would simplify things (although we lose the ability to support source
> cropping if the hardware can do it).

Yes, we should use selection on both queues (fortunately there is a v4l2_buf_type
in that API). Otherwise we cannot model all the scaling and cropping options.
What the spec must do is define the configuration sequence, so that a
negotiation is possible. We need a convention regarding the order, so that there
is a way to converge with the driver, and also to conclude if the driver cannot
handle it.
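
For the record, a sketch of the kind of sequence I have in mind; the order and
the targets are an assumption on my side, not something the spec defines today.
It presumes S_FMT has already been done on both queues (raw 1280x720 on OUTPUT,
coded 1920x1088 on CAPTURE).

#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Assumed negotiation order for an encoder that can scale; the exact
 * sequence and targets would have to be nailed down in the spec. */
static void configure_scaling(int fd)
{
        struct v4l2_selection src_crop = {
                .type = V4L2_BUF_TYPE_VIDEO_OUTPUT,
                .target = V4L2_SEL_TGT_CROP,
                .r = { .left = 0, .top = 0, .width = 1280, .height = 720 },
        };
        struct v4l2_selection dst_compose = {
                .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
                .target = V4L2_SEL_TGT_COMPOSE,
                .r = { .left = 0, .top = 0, .width = 1920, .height = 1080 },
        };

        /* 1. Crop (or not) the raw source. */
        ioctl(fd, VIDIOC_S_SELECTION, &src_crop);

        /* 2. Express the visible rectangle inside the 1920x1088 coded
         *    frame; the driver adjusts .r if it cannot comply, and this
         *    is what would end up in the bitstream cropping fields. */
        ioctl(fd, VIDIOC_S_SELECTION, &dst_compose);

        /* 3. G_SELECTION on both queues to see what was actually taken. */
}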

> 
> If operating on the source selection rectangle only (no second rectangle on the
> coded queue) some cases would be impossible to reach, for instance going from
> some aligned dimensions to unaligned ones (e.g. 1280x720 source scaled to
> 1920x1088 and we want the codec cropping fields to indicate 1920x1080).
> 
> Anyway just wanted to check if people have already thought about these topics,
> but I'm mostly thinking out loud and I'm of course not saying we need to solve
> these problems now.

We might find extra corner cases by implementing the spec, but I think the API we
have makes most of this possible already. Remember that we have the fwht software
codec in the kernel for the purpose of developing this kind of feature. A simple
scaler can be added there for testing scaling.

> 
> Sorry again for the long email, I hope the points I'm making are somewhat
> understandable.
> 
> Cheers,
> 
> Paul
> 

regards,
Nicolas

