Re: [RFC] Stateful codecs and requirements for compressed formats

From: Nicolas Dufresne <nicolas@ndufresne.ca>
To: Dave Stevenson <dave.stevenson@raspberrypi.org>,
	Hans Verkuil <hverkuil@xs4all.nl>
Cc: Linux Media Mailing List <linux-media@vger.kernel.org>,
	Boris Brezillon <boris.brezillon@collabora.com>,
	Paul Kocialkowski <paul.kocialkowski@bootlin.com>,
	Stanimir Varbanov <stanimir.varbanov@linaro.org>,
	Philipp Zabel <p.zabel@pengutronix.de>,
	Ezequiel Garcia <ezequiel@collabora.com>,
	Michael Tretter <m.tretter@pengutronix.de>,
	Tomasz Figa <tfiga@chromium.org>,
	Sylwester Nawrocki <snawrocki@kernel.org>
Subject: Re: [RFC] Stateful codecs and requirements for compressed formats
Date: Fri, 28 Jun 2019 11:48:49 -0400
Message-ID: <9c3fe7a71aa4c2f9c3f92fa8d7a8fe0290f51da0.camel@ndufresne.ca>
In-Reply-To: <CAAoAYcOa7ngH5pPJze+H25rDQgjeNnpKY=HWQqsGFTTrO5iFgg@mail.gmail.com>

On Friday, 28 June 2019 at 16:21 +0100, Dave Stevenson wrote:
> Hi Hans
> 
> On Fri, 28 Jun 2019 at 15:34, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > Hi all,
> > 
> > I hope I Cc-ed everyone with a stake in this issue.
> > 
> > One recurring question is how a stateful encoder fills buffers and how a stateful
> > decoder consumes buffers.
> > 
> > The most generic case is that an encoder produces a bitstream and just fills each
> > CAPTURE buffer to the brim before continuing with the next buffer.
> > 
> > I don't think there are drivers that do this, I believe that all drivers just
> > output a single compressed frame. For interlaced formats I understand it is either
> > one compressed field per buffer, or two compressed fields per buffer (this is
> > what I heard, I don't know if this is true).
> 
> From the discussion that started this thread, with H264 and similar,
> does the V4L2 buffer contain just the frame data, or the SPS/PPS
> headers as well.

In the existing mainline encoder drivers, the SPS/PPS are included in
the first frame produced. Decoders expect them to be in the first frame
queued. For decoders, this requirement is being relaxed now that we have
a mechanism to notify userspace of the state change once the header has
been processed.
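
To illustrate what "included in the first frame" means in practice,
userspace could scan the first buffer for SPS and PPS NAL units. What
follows is only a minimal sketch, assuming an H.264 Annex B byte stream
(start-code delimited). The helper names h264_nal_types() and
h264_has_headers() are made up for this example and are not part of any
driver or of the V4L2 API.

```c
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type values used below (ITU-T H.264, Table 7-1). */
enum { NAL_IDR = 5, NAL_SPS = 7, NAL_PPS = 8 };

/*
 * Hypothetical helper: walk an Annex B byte stream and return a bitmask
 * with bit N set for every nal_unit_type N found in the buffer.
 */
static uint32_t h264_nal_types(const uint8_t *buf, size_t len)
{
	uint32_t mask = 0;
	size_t i;

	/* A start code is 00 00 01; the NAL header byte follows it. */
	for (i = 0; i + 3 < len; i++) {
		if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
			/* The low 5 bits of the header are nal_unit_type. */
			mask |= 1u << (buf[i + 3] & 0x1f);
			i += 2; /* resume scanning at the NAL header byte */
		}
	}

	return mask;
}

/* Hypothetical helper: 1 if the buffer has both SPS and PPS, else 0. */
static int h264_has_headers(const uint8_t *buf, size_t len)
{
	const uint32_t want = (1u << NAL_SPS) | (1u << NAL_PPS);

	return (h264_nal_types(buf, len) & want) == want;
}
```

Calling h264_has_headers() on the payload of the first CAPTURE buffer
dequeued from an encoder tells you whether the headers came bundled with
the first frame or whether the buffer holds slice data only.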

> 
> > In any case, I don't think this is specified anywhere. Please correct me if I am
> > wrong.
> > 
> > The latest stateful codec spec is here:
> > 
> > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > 
> > Assuming what I described above is indeed the case, then I think this should
> > be documented. I don't know enough if a flag is needed somewhere to describe
> > the behavior for interlaced formats, or can we leave this open and have userspace
> > detect this?
> > 
> > 
> > For decoders it is more complicated. The stateful decoder spec is written with
> > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > 
> > See section 4.5.1.7 in the spec.
> > 
> > But I understand that various HW decoders *do* have limitations. I would really
> > like to know about those, since that needs to be exposed to userspace somehow.
> > 
> > Specifically, the venus decoder needs to know the resolution of the coded video
> > beforehand and it expects a single frame per buffer (how does that work for
> > interlaced formats?).
> > 
> > Such requirements mean that some userspace parsing is still required, so these
> > decoders are not completely stateful.
> > 
> > Can every codec author give information about their decoder/encoder?
> > 
> > I'll start off with my virtual codec driver:
> > 
> > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > but when that is added it will encode one field per buffer.
> 
> On BCM283x:
> 
> The underlying decoder will accept anything, but giving it a single
> frame per buffer reduces latency as the bitstream parser gets kicked
> earlier. Based on previous discussions I am setting the flag so that
> it expects one compressed frame per buffer, but I don't believe it
> goes wrong should that not be the case (it'll just waste a bit of
> processing effort).
> It'll parse the headers and produce a V4L2_EVENT_SOURCE_CHANGE event
> should the capture queue format not match the stream parameters.
> Interlacing isn't supported yet (it's on the list), but I believe the
> hardware produces the equivalent to V4L2_FIELD_INTERLACED_[TB|BT].
> 
> The encoder currently spits out the H264 SPS/PPS headers as a separate
> V4L2 buffer, and then one compressed frame per V4L2 buffer (provided
> the buffer is big enough). Should
> V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER be set, then it will repeat the
> headers in an independent V4L2 buffer before each I frame.
> I'm quite happy to amend this should we have a decent spec of what is
> required. As I've never found a spec it's been trial and error until
> now.
> There is no interlaced support available.
> 
>   Dave
