[RFC] Stateful codecs and requirements for compressed formats

* [RFC] Stateful codecs and requirements for compressed formats
@ 2019-06-28 14:34 Hans Verkuil
  2019-06-28 15:21 ` Dave Stevenson
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Hans Verkuil @ 2019-06-28 14:34 UTC (permalink / raw)
  To: Linux Media Mailing List, Nicolas Dufresne, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

Hi all,

I hope I Cc-ed everyone with a stake in this issue.

One recurring question is how a stateful encoder fills buffers and how a stateful
decoder consumes buffers.

The most generic case is that an encoder produces a bitstream and just fills each
CAPTURE buffer to the brim before continuing with the next buffer.

I don't think any drivers do this; I believe that all drivers just output a
single compressed frame per buffer. For interlaced formats I understand it is
either one compressed field per buffer or two compressed fields per buffer (this
is what I heard; I don't know whether it is true).

In any case, I don't think this is specified anywhere. Please correct me if I am
wrong.

The latest stateful codec spec is here:

https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html

Assuming what I described above is indeed the case, then I think this should
be documented. I don't know enough to say whether a flag is needed somewhere to
describe the behavior for interlaced formats, or whether we can leave this open
and have userspace detect it.

For decoders it is more complicated. The stateful decoder spec is written with
the assumption that userspace can just fill each OUTPUT buffer to the brim with
the compressed bitstream. I.e., no need to split at frame or other boundaries.

See section 4.5.1.7 in the spec.

But I understand that various HW decoders *do* have limitations. I would really
like to know about those, since that needs to be exposed to userspace somehow.

Specifically, the venus decoder needs to know the resolution of the coded video
beforehand and it expects a single frame per buffer (how does that work for
interlaced formats?).

Such requirements mean that some userspace parsing is still required, so these
decoders are not completely stateful.
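
To illustrate the kind of userspace parsing such decoders imply, here is a
minimal Python sketch that splits an H.264 Annex B byte stream into NAL units
at its start codes. The names split_annexb and h264_nal_type are hypothetical
helpers, not part of any V4L2 or codec API. Finding real frame boundaries needs
more than this (slice headers have to be inspected), so treat it as a first
step only.

```python
START_CODE = b"\x00\x00\x01"

def split_annexb(data):
    """Return the NAL units of an Annex B byte stream, start codes removed."""
    nals = []
    pos = data.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = data.find(START_CODE, start)
        end = len(data) if nxt == -1 else nxt
        # Stripping trailing zeros also removes the extra leading zero of a
        # following 4-byte start code (00 00 00 01).
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        pos = nxt
    return nals

def h264_nal_type(nal):
    """For H.264, nal_unit_type is the low 5 bits of the first NAL byte."""
    return nal[0] & 0x1F

# Illustrative payload bytes only; not a decodable stream.
stream = (b"\x00\x00\x00\x01\x67\xaa"        # SPS (type 7)
          b"\x00\x00\x01\x68\xbb"            # PPS (type 8)
          b"\x00\x00\x00\x01\x65\xcc\xdd")   # IDR slice (type 5)
nal_types = [h264_nal_type(n) for n in split_annexb(stream)]
```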

Can every codec author give information about their decoder/encoder?

I'll start off with my virtual codec driver:

vicodec: the decoder fully parses the bitstream. The encoder produces a single
compressed frame per buffer. This driver doesn't yet support interlaced formats,
but when that is added it will encode one field per buffer.

Let's see what the results are.

Regards,

	Hans


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 14:34 [RFC] Stateful codecs and requirements for compressed formats Hans Verkuil
@ 2019-06-28 15:21 ` Dave Stevenson
  2019-06-28 15:48   ` Nicolas Dufresne
  2019-06-28 16:18 ` Nicolas Dufresne
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Dave Stevenson @ 2019-06-28 15:21 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Nicolas Dufresne, Boris Brezillon,
	Paul Kocialkowski, Stanimir Varbanov, Philipp Zabel,
	Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

Hi Hans

On Fri, 28 Jun 2019 at 15:34, Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi all,
>
> I hope I Cc-ed everyone with a stake in this issue.
>
> One recurring question is how a stateful encoder fills buffers and how a stateful
> decoder consumes buffers.
>
> The most generic case is that an encoder produces a bitstream and just fills each
> CAPTURE buffer to the brim before continuing with the next buffer.
>
> I don't think there are drivers that do this, I believe that all drivers just
> output a single compressed frame. For interlaced formats I understand it is either
> one compressed field per buffer, or two compressed fields per buffer (this is
> what I heard, I don't know if this is true).

Following on from the discussion that started this thread: with H264 and
similar, does the V4L2 buffer contain just the frame data, or the SPS/PPS
headers as well?

> In any case, I don't think this is specified anywhere. Please correct me if I am
> wrong.
>
> The latest stateful codec spec is here:
>
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
>
> Assuming what I described above is indeed the case, then I think this should
> be documented. I don't know enough if a flag is needed somewhere to describe
> the behavior for interlaced formats, or can we leave this open and have userspace
> detect this?
>
>
> For decoders it is more complicated. The stateful decoder spec is written with
> the assumption that userspace can just fill each OUTPUT buffer to the brim with
> the compressed bitstream. I.e., no need to split at frame or other boundaries.
>
> See section 4.5.1.7 in the spec.
>
> But I understand that various HW decoders *do* have limitations. I would really
> like to know about those, since that needs to be exposed to userspace somehow.
>
> Specifically, the venus decoder needs to know the resolution of the coded video
> beforehand and it expects a single frame per buffer (how does that work for
> interlaced formats?).
>
> Such requirements mean that some userspace parsing is still required, so these
> decoders are not completely stateful.
>
> Can every codec author give information about their decoder/encoder?
>
> I'll start off with my virtual codec driver:
>
> vicodec: the decoder fully parses the bitstream. The encoder produces a single
> compressed frame per buffer. This driver doesn't yet support interlaced formats,
> but when that is added it will encode one field per buffer.

On BCM283x:

The underlying decoder will accept anything, but giving it a single
frame per buffer reduces latency as the bitstream parser gets kicked
earlier. Based on previous discussions I am setting the flag so that
it expects one compressed frame per buffer, but I don't believe it
goes wrong should that not be the case (it'll just waste a bit of
processing effort).
It'll parse the headers and produce a V4L2_EVENT_SOURCE_CHANGE event
should the capture queue format not match the stream parameters.
Interlacing isn't supported yet (it's on the list), but I believe the
hardware produces the equivalent to V4L2_FIELD_INTERLACED_[TB|BT].

The encoder currently spits out the H264 SPS/PPS headers as a separate
V4L2 buffer, and then one compressed frame per V4L2 buffer (provided
the buffer is big enough). Should
V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER be set, then it will repeat the
headers in an independent V4L2 buffer before each I frame.
I'm quite happy to amend this should we have a decent spec of what is
required. As I've never found a spec it's been trial and error until
now.
There is no interlaced support available.
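
The difference between "headers in their own buffer" and "headers in the first
frame" can be checked from userspace. A minimal Python sketch, assuming H.264
Annex B data in each dequeued buffer; classify_buffer and the returned labels
are hypothetical names, and data-partitioned slices (NAL types 2 to 4) are
ignored for brevity:

```python
SPS, PPS = 7, 8                  # H.264 parameter set NAL types
NON_IDR_SLICE, IDR_SLICE = 1, 5  # H.264 coded slice NAL types

def h264_nal_types(buf):
    """List the nal_unit_type of every NAL unit found in an Annex B buffer."""
    parts = buf.split(b"\x00\x00\x01")[1:]   # drop anything before the first start code
    return [p[0] & 0x1F for p in parts if p.rstrip(b"\x00")]

def classify_buffer(buf):
    """Report whether a buffer holds parameter sets, slice data, or both."""
    types = set(h264_nal_types(buf))
    has_headers = bool(types & {SPS, PPS})
    has_slices = bool(types & {NON_IDR_SLICE, IDR_SLICE})
    if has_headers and has_slices:
        return "headers+frame"
    if has_headers:
        return "headers-only"
    if has_slices:
        return "frame-only"
    return "other"

# Illustrative payload bytes only; not a decodable stream.
headers = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x00\x01\x68\xce"
idr = b"\x00\x00\x00\x01\x65\x88"
```

A driver behaving as described above would yield "headers-only" for the first
buffer, while one that puts the SPS/PPS in the first frame would yield
"headers+frame".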

  Dave


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 15:21 ` Dave Stevenson
@ 2019-06-28 15:48   ` Nicolas Dufresne
  2019-06-29 10:02     ` Dave Stevenson
  0 siblings, 1 reply; 18+ messages in thread
From: Nicolas Dufresne @ 2019-06-28 15:48 UTC (permalink / raw)
  To: Dave Stevenson, Hans Verkuil
  Cc: Linux Media Mailing List, Boris Brezillon, Paul Kocialkowski,
	Stanimir Varbanov, Philipp Zabel, Ezequiel Garcia,
	Michael Tretter, Tomasz Figa, Sylwester Nawrocki

On Friday, 28 June 2019 at 16:21 +0100, Dave Stevenson wrote:
> Hi Hans
> 
> On Fri, 28 Jun 2019 at 15:34, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > Hi all,
> > 
> > I hope I Cc-ed everyone with a stake in this issue.
> > 
> > One recurring question is how a stateful encoder fills buffers and how a stateful
> > decoder consumes buffers.
> > 
> > The most generic case is that an encoder produces a bitstream and just fills each
> > CAPTURE buffer to the brim before continuing with the next buffer.
> > 
> > I don't think there are drivers that do this, I believe that all drivers just
> > output a single compressed frame. For interlaced formats I understand it is either
> > one compressed field per buffer, or two compressed fields per buffer (this is
> > what I heard, I don't know if this is true).
> 
> From the discussion that started this thread, with H264 and similar,
> does the V4L2 buffer contain just the frame data, or the SPS/PPS
> headers as well.

In existing mainline encoder drivers the SPS/PPS headers are included in the
first frame produced. Decoders expect them to be in the first frame
queued. For decoders, this is being relaxed now that we have a mechanism
to notify the state change after the header has been processed.

> 
> > In any case, I don't think this is specified anywhere. Please correct me if I am
> > wrong.
> > 
> > The latest stateful codec spec is here:
> > 
> > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > 
> > Assuming what I described above is indeed the case, then I think this should
> > be documented. I don't know enough if a flag is needed somewhere to describe
> > the behavior for interlaced formats, or can we leave this open and have userspace
> > detect this?
> > 
> > 
> > For decoders it is more complicated. The stateful decoder spec is written with
> > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > 
> > See section 4.5.1.7 in the spec.
> > 
> > But I understand that various HW decoders *do* have limitations. I would really
> > like to know about those, since that needs to be exposed to userspace somehow.
> > 
> > Specifically, the venus decoder needs to know the resolution of the coded video
> > beforehand and it expects a single frame per buffer (how does that work for
> > interlaced formats?).
> > 
> > Such requirements mean that some userspace parsing is still required, so these
> > decoders are not completely stateful.
> > 
> > Can every codec author give information about their decoder/encoder?
> > 
> > I'll start off with my virtual codec driver:
> > 
> > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > but when that is added it will encode one field per buffer.
> 
> On BCM283x:
> 
> The underlying decoder will accept anything, but giving it a single
> frame per buffer reduces latency as the bitstream parser gets kicked
> earlier. Based on previous discussions I am setting the flag so that
> it expects one compressed frame per buffer, but I don't believe it
> goes wrong should that not be the case (it'll just waste a bit of
> processing effort).
> It'll parse the headers and produce a V4L2_EVENT_SOURCE_CHANGE event
> should the capture queue format not match the stream parameters.
> Interlacing isn't supported yet (it's on the list), but I believe the
> hardware produces the equivalent to V4L2_FIELD_INTERLACED_[TB|BT].
> 
> The encoder currently spits out the H264 SPS/PPS headers as a separate
> V4L2 buffer, and then one compressed frame per V4L2 buffer (provided
> the buffer is big enough). Should
> V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER be set, then it will repeat the
> headers in an independent V4L2 buffer before each I frame.
> I'm quite happy to amend this should we have a decent spec of what is
> required. As I've never found a spec it's been trial and error until
> now.
> There is no interlaced support available.
> 
>   Dave


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 14:34 [RFC] Stateful codecs and requirements for compressed formats Hans Verkuil
  2019-06-28 15:21 ` Dave Stevenson
@ 2019-06-28 16:18 ` Nicolas Dufresne
  2019-06-28 18:09 ` Nicolas Dufresne
  2019-07-03  8:32 ` Tomasz Figa
  3 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2019-06-28 16:18 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

On Friday, 28 June 2019 at 16:34 +0200, Hans Verkuil wrote:
> Hi all,
> 
> I hope I Cc-ed everyone with a stake in this issue.
> 
> One recurring question is how a stateful encoder fills buffers and how a stateful
> decoder consumes buffers.
> 
> The most generic case is that an encoder produces a bitstream and just fills each
> CAPTURE buffer to the brim before continuing with the next buffer.
> 
> I don't think there are drivers that do this, I believe that all drivers just
> output a single compressed frame. For interlaced formats I understand it is either
> one compressed field per buffer, or two compressed fields per buffer (this is
> what I heard, I don't know if this is true).
> 
> In any case, I don't think this is specified anywhere. Please correct me if I am
> wrong.
> 
> The latest stateful codec spec is here:
> 
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> 
> Assuming what I described above is indeed the case, then I think this should
> be documented. I don't know enough if a flag is needed somewhere to describe
> the behavior for interlaced formats, or can we leave this open and have userspace
> detect this?
> 
> For decoders it is more complicated. The stateful decoder spec is written with
> the assumption that userspace can just fill each OUTPUT buffer to the brim with
> the compressed bitstream. I.e., no need to split at frame or other boundaries.
> 
> See section 4.5.1.7 in the spec.
> 
> But I understand that various HW decoders *do* have limitations. I would really
> like to know about those, since that needs to be exposed to userspace somehow.

So in "4.5.1.7. Decoding", there is a bit of confusion. The text speaks
about the order of frames in capture and output, but the bullet points
state that output buffers aren't frames. The following note about
timestamps creates more confusion: it says, not very affirmatively,
that there is potentially timestamp matching that lets you detect
reordering done by the driver, but it does not clarify how timestamps
are to be handled if the packing is random.

What seems entirely missing from what we discussed is a per-format
clarification of codec behaviour. I was assuming the NAL alignment
would be documented for the H264 and HEVC formats. It makes sense to
allow some more flexibility since these formats are bytestreams with
start codes, but to me, full-frame behaviour is what existing userspace
expects and we should make this the de facto default. And if the buffer
size ends up too small (badly predicted), I believe we should use the
source change event to allow handling that. That being said, we have
been able to survive this for a long time.

For VP8 and VP9, which don't really have a bytestream format, I assume
it's logical to always enforce full frames. If not, special care is
needed to ensure the driver can reconstruct the full frames, since a
firmware won't be able to parse the frame boundaries. Now, when I saw
you taking over, I thought it was clear that this was only the common
part of the spec and that a per-format specification would be
developed later.

> Specifically, the venus decoder needs to know the resolution of the coded video
> beforehand and it expects a single frame per buffer (how does that work for
> interlaced formats?).

If the firmware works in a 1:1 fashion, with H264 you may have two AUs
composing one frame in an interlaced stream (and that may change for
each frame). In HEVC you'd always have two AUs.

> 
> Such requirements mean that some userspace parsing is still required, so these
> decoders are not completely stateful.

There was a discussion about the meaning of stateful/stateless. It is
not strictly related to parsing; the amount of parsing required is a
side effect. A stateful decoder (HW or firmware) offers a stream-based
interface and hides the state of the decoded stream. As a side effect,
the HW can only be multiplexed if the firmware handles that. On the
other hand, a stateless decoder offers an API where you configure the
decoding of a frame (and sometimes a slice). Two consecutive frames do
not have to be part of the same stream, which has the side effect of
allowing applications to handle their own multiplexing.

> 
> Can every codec author give information about their decoder/encoder?
> 
> I'll start off with my virtual codec driver:
> 
> vicodec: the decoder fully parses the bitstream. The encoder produces a single
> compressed frame per buffer. This driver doesn't yet support interlaced formats,
> but when that is added it will encode one field per buffer.

I just wanted to highlight that a lot of the behaviour here is specific
to the formats. Especially this last one, since it implies that the
capture format will be field = ALTERNATE for interlaced decoding (this
is a relatively rare format). So the behaviour here can already be
inferred from the capture format (apart from the fact that the
interlace mode cannot be enumerated, so for encoding it's a bit of a
pain to guess). And the spec already contains the information needed to
match the pairs (or detect a lost field).
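
A rough Python sketch of that pairing, assuming that in field = ALTERNATE mode
both fields of a frame are dequeued with the same buffer sequence number (this
assumption should be checked against the V4L2 buffer documentation). The
function pair_fields and the "top"/"bottom" strings are hypothetical stand-ins
for the real buffer fields:

```python
def pair_fields(buffers):
    """Group dequeued fields by sequence number.

    buffers: iterable of (sequence, field), field being "top" or "bottom".
    Returns (complete, incomplete): sorted sequence numbers of frames with
    both fields present, and of frames where one field was lost.
    """
    by_seq = {}
    for seq, field in buffers:
        by_seq.setdefault(seq, set()).add(field)
    both = {"top", "bottom"}
    complete = sorted(s for s, fields in by_seq.items() if fields == both)
    incomplete = sorted(s for s, fields in by_seq.items() if fields != both)
    return complete, incomplete

# Frame 1 lost its bottom field in this made-up trace.
trace = [(0, "top"), (0, "bottom"), (1, "top"), (2, "top"), (2, "bottom")]
complete, incomplete = pair_fields(trace)
```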

> 
> Let's see what the results are.
> 
> Regards,
> 
> 	Hans


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 14:34 [RFC] Stateful codecs and requirements for compressed formats Hans Verkuil
  2019-06-28 15:21 ` Dave Stevenson
  2019-06-28 16:18 ` Nicolas Dufresne
@ 2019-06-28 18:09 ` Nicolas Dufresne
  2019-07-03  8:46   ` Tomasz Figa
  2019-07-10  8:43   ` Hans Verkuil
  2019-07-03  8:32 ` Tomasz Figa
  3 siblings, 2 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2019-06-28 18:09 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

On Friday, 28 June 2019 at 16:34 +0200, Hans Verkuil wrote:
> Hi all,
> 
> I hope I Cc-ed everyone with a stake in this issue.
> 
> One recurring question is how a stateful encoder fills buffers and how a stateful
> decoder consumes buffers.
> 
> The most generic case is that an encoder produces a bitstream and just fills each
> CAPTURE buffer to the brim before continuing with the next buffer.
> 
> I don't think there are drivers that do this, I believe that all drivers just
> output a single compressed frame. For interlaced formats I understand it is either
> one compressed field per buffer, or two compressed fields per buffer (this is
> what I heard, I don't know if this is true).
> 
> In any case, I don't think this is specified anywhere. Please correct me if I am
> wrong.
> 
> The latest stateful codec spec is here:
> 
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> 
> Assuming what I described above is indeed the case, then I think this should
> be documented. I don't know enough if a flag is needed somewhere to describe
> the behavior for interlaced formats, or can we leave this open and have userspace
> detect this?
> 
> 
> For decoders it is more complicated. The stateful decoder spec is written with
> the assumption that userspace can just fill each OUTPUT buffer to the brim with
> the compressed bitstream. I.e., no need to split at frame or other boundaries.
> 
> See section 4.5.1.7 in the spec.
> 
> But I understand that various HW decoders *do* have limitations. I would really
> like to know about those, since that needs to be exposed to userspace somehow.
> 
> Specifically, the venus decoder needs to know the resolution of the coded video
> beforehand and it expects a single frame per buffer (how does that work for
> interlaced formats?).
> 
> Such requirements mean that some userspace parsing is still required, so these
> decoders are not completely stateful.
> 
> Can every codec author give information about their decoder/encoder?
> 
> I'll start off with my virtual codec driver:
> 
> vicodec: the decoder fully parses the bitstream. The encoder produces a single
> compressed frame per buffer. This driver doesn't yet support interlaced formats,
> but when that is added it will encode one field per buffer.
> 
> Let's see what the results are.

Hans thought a summary of what existing userspace expects / assumes
would be nice.

GStreamer:
==========
Encodes:
  fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
Decodes:
  fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9

It assumes that each encoded v4l2_buffer contains exactly one frame
(any format, two fields for interlaced content). It may still work
otherwise, but some issues will appear: timestamp shifts and loss of
metadata (e.g. timecode, cc, etc.).
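
A toy Python model of why packing matters for timestamps. It assumes,
purely for illustration, that a decoder stamps every frame it decodes with the
timestamp of the OUTPUT buffer the frame came from; real drivers may behave
differently. The names propagate_timestamps and ambiguous_timestamps are
hypothetical:

```python
from collections import Counter

def propagate_timestamps(output_buffers):
    """Model a decoder that copies the buffer timestamp to each decoded frame.

    output_buffers: list of (timestamp, [frame ids packed in that buffer]).
    Returns a list of (frame id, timestamp) in decode order.
    """
    decoded = []
    for timestamp, frames in output_buffers:
        for frame in frames:
            decoded.append((frame, timestamp))
    return decoded

def ambiguous_timestamps(decoded):
    """Timestamps shared by more than one frame, i.e. no longer a unique key."""
    counts = Counter(ts for _, ts in decoded)
    return sorted(ts for ts, n in counts.items() if n > 1)

one_per_buffer = [(0, ["f0"]), (33, ["f1"]), (66, ["f2"])]
packed = [(0, ["f0", "f1"]), (66, ["f2"])]
```

With one frame per buffer every timestamp identifies exactly one frame, so
metadata keyed on it can be reattached. In the packed case f0 and f1 share a
timestamp and the metadata of f1 has nothing to match against.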

FFmpeg:
=======
Encodes:
  h263, h264, hevc, mpeg4, vp8
Decodes:
  h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9

Similarly to GStreamer, it assumes that one AVPacket will fit one
v4l2_buffer. On the encoding side, it seems less of a problem, but they
don't fully implement the FFmpeg codec API for frame matching, which I
suspect would create some ambiguity if they did.

Chromium:
=========
Decodes:
  H264, VP8, VP9
Encodes:
  H264

That is the code I know the least, but the encoder does not seem to be
affected by the NAL alignment. The keyframe flag and timestamps seem
to be used and are likely expected to correlate with the input, so I
suspect that some ambiguity is possible if the output is not full
frame. For the decoder, I'll have to ask someone else to comment; the
code is hard to follow and I could not get to the place where output
buffers are filled. I thought the GStreamer code was tough, but this
is similarly a mess.

Nicolas








* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 15:48   ` Nicolas Dufresne
@ 2019-06-29 10:02     ` Dave Stevenson
  2019-06-29 12:55       ` Nicolas Dufresne
  0 siblings, 1 reply; 18+ messages in thread
From: Dave Stevenson @ 2019-06-29 10:02 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Hans Verkuil, Linux Media Mailing List, Boris Brezillon,
	Paul Kocialkowski, Stanimir Varbanov, Philipp Zabel,
	Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

Hi Nicolas

On Fri, 28 Jun 2019 at 16:48, Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> On Friday, 28 June 2019 at 16:21 +0100, Dave Stevenson wrote:
> > Hi Hans
> >
> > On Fri, 28 Jun 2019 at 15:34, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > > Hi all,
> > >
> > > I hope I Cc-ed everyone with a stake in this issue.
> > >
> > > One recurring question is how a stateful encoder fills buffers and how a stateful
> > > decoder consumes buffers.
> > >
> > > The most generic case is that an encoder produces a bitstream and just fills each
> > > CAPTURE buffer to the brim before continuing with the next buffer.
> > >
> > > I don't think there are drivers that do this, I believe that all drivers just
> > > output a single compressed frame. For interlaced formats I understand it is either
> > > one compressed field per buffer, or two compressed fields per buffer (this is
> > > what I heard, I don't know if this is true).
> >
> > From the discussion that started this thread, with H264 and similar,
> > does the V4L2 buffer contain just the frame data, or the SPS/PPS
> > headers as well.
>
> In existing mainline encoder driver the SPS/PPS is included in the
> first frame produced. Decoders expect them to be in the first frame
> queued. For decoder, this is being relaxed now that we have a mechanism
> to notify the state change after the header has been processed.

So it sounds like the one bit missing is everyone's "friend" -
documentation. I'm eternally grateful to those who are making the
efforts in updating it. It's a thankless task, but absolutely
necessary.

For those outside the core linux-media circles there is a choice to be
made, and different APIs do adopt different approaches.
OpenMAX IL for one explicitly documents exactly the opposite approach
to V4L2, although admittedly through an optional flag. 1.1.2 spec [1],
section 3.1.2.7.1 (page 70)

The OMX_BUFFERFLAG_CODECCONFIG is an optional flag that is
set by an output port when all bytes in the buffer form part or all of a set of
codec specific configuration data. Examples include SPS/PPS nal units
for OMX_VIDEO_CodingAVC or AudioSpecificConfig data for
OMX_AUDIO_CodingAAC. Any component that for a given stream sets
OMX_BUFFERFLAG_CODECCONFIG shall not mix codec
configuration bytes with frame data in the same buffer, and shall send all
buffers containing codec configuration bytes before any buffers containing
frame data that those configurations bytes describe.

  Dave

[1] https://www.khronos.org/registry/OpenMAX-IL/specs/OpenMAX_IL_1_1_2_Specification.pdf

> > > In any case, I don't think this is specified anywhere. Please correct me if I am
> > > wrong.
> > >
> > > The latest stateful codec spec is here:
> > >
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > >
> > > Assuming what I described above is indeed the case, then I think this should
> > > be documented. I don't know enough if a flag is needed somewhere to describe
> > > the behavior for interlaced formats, or can we leave this open and have userspace
> > > detect this?
> > >
> > >
> > > For decoders it is more complicated. The stateful decoder spec is written with
> > > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > >
> > > See section 4.5.1.7 in the spec.
> > >
> > > But I understand that various HW decoders *do* have limitations. I would really
> > > like to know about those, since that needs to be exposed to userspace somehow.
> > >
> > > Specifically, the venus decoder needs to know the resolution of the coded video
> > > beforehand and it expects a single frame per buffer (how does that work for
> > > interlaced formats?).
> > >
> > > Such requirements mean that some userspace parsing is still required, so these
> > > decoders are not completely stateful.
> > >
> > > Can every codec author give information about their decoder/encoder?
> > >
> > > I'll start off with my virtual codec driver:
> > >
> > > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > > but when that is added it will encode one field per buffer.
> >
> > On BCM283x:
> >
> > The underlying decoder will accept anything, but giving it a single
> > frame per buffer reduces latency as the bitstream parser gets kicked
> > earlier. Based on previous discussions I am setting the flag so that
> > it expects one compressed frame per buffer, but I don't believe it
> > goes wrong should that not be the case (it'll just waste a bit of
> > processing effort).
> > It'll parse the headers and produce a V4L2_EVENT_SOURCE_CHANGE event
> > should the capture queue format not match the stream parameters.
> > Interlacing isn't supported yet (it's on the list), but I believe the
> > hardware produces the equivalent to V4L2_FIELD_INTERLACED_[TB|BT].
> >
> > The encoder currently spits out the H264 SPS/PPS headers as a separate
> > V4L2 buffer, and then one compressed frame per V4L2 buffer (provided
> > the buffer is big enough). Should
> > V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER be set, then it will repeat the
> > headers in an independent V4L2 buffer before each I frame.
> > I'm quite happy to amend this should we have a decent spec of what is
> > required. As I've never found a spec it's been trial and error until
> > now.
> > There is no interlaced support available.
> >
> >   Dave


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-29 10:02     ` Dave Stevenson
@ 2019-06-29 12:55       ` Nicolas Dufresne
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2019-06-29 12:55 UTC (permalink / raw)
  To: Dave Stevenson
  Cc: Hans Verkuil, Linux Media Mailing List, Boris Brezillon,
	Paul Kocialkowski, Stanimir Varbanov, Philipp Zabel,
	Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

On Saturday, 29 June 2019 at 11:02 +0100, Dave Stevenson wrote:
> Hi Nicolas
> 
> On Fri, 28 Jun 2019 at 16:48, Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > On Friday, 28 June 2019 at 16:21 +0100, Dave Stevenson wrote:
> > > Hi Hans
> > > 
> > > On Fri, 28 Jun 2019 at 15:34, Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > > > Hi all,
> > > > 
> > > > I hope I Cc-ed everyone with a stake in this issue.
> > > > 
> > > > One recurring question is how a stateful encoder fills buffers and how a stateful
> > > > decoder consumes buffers.
> > > > 
> > > > The most generic case is that an encoder produces a bitstream and just fills each
> > > > CAPTURE buffer to the brim before continuing with the next buffer.
> > > > 
> > > > I don't think there are drivers that do this, I believe that all drivers just
> > > > output a single compressed frame. For interlaced formats I understand it is either
> > > > one compressed field per buffer, or two compressed fields per buffer (this is
> > > > what I heard, I don't know if this is true).
> > > 
> > > From the discussion that started this thread, with H264 and similar,
> > > does the V4L2 buffer contain just the frame data, or the SPS/PPS
> > > headers as well.
> > 
> > In existing mainline encoder driver the SPS/PPS is included in the
> > first frame produced. Decoders expect them to be in the first frame
> > queued. For decoder, this is being relaxed now that we have a mechanism
> > to notify the state change after the header has been processed.
> 
> So it sounds like the one bit missing is everyone's "friend" -
> documentation. I'm eternally grateful to those who are making the
> efforts in updating it. It's a thankless task, but absolutely
> necessary.
> 
> For those outside the core linux-media circles there is a choice to be
> made, and different APIs do adopt different approaches.
> OpenMAX IL for one explicitly documents exactly the opposite approach
> to V4L2, although admittedly through an optional flag. 1.1.2 spec [1],
> section 3.1.2.7.1 (page 70)
> 
> The OMX_BUFFERFLAG_CODECCONFIG is an optional flag that is
> set by an output port when all bytes in the buffer form part or all of a set of
> codec specific configuration data. Examples include SPS/PPS nal units
> for OMX_VIDEO_CodingAVC or AudioSpecificConfig data for
> OMX_AUDIO_CodingAAC. Any component that for a given stream sets
> OMX_BUFFERFLAG_CODECCONFIG shall not mix codec
> configuration bytes with frame data in the same buffer, and shall send all
> buffers containing codec configuration bytes before any buffers containing
> frame data that those configurations bytes describe.

That only applies to OMX encoders. OMX decoders accept any alignment
but will have higher latency. On the decoder side they use an
END_OF_FRAME kind of flag for when the data is pre-parsed in order to
reduce that latency (I have only found NVidia respecting this).

A rationale for doing it like this is that it eases use cases where
you want to pass the codec config out-of-band. This is notably the case
for ISOMP4, but then you need to convert start-codes into AVC headers
(which requires parsing), or for some RTP based protocols where you'd
pass the headers through an SDP or other signalling. I have never met
two OMX stacks doing the same thing in this regard, so we ended up
merging these together and letting our generic parser handle the
conversion.
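
To make the "convert start-code into AVC header" step concrete, here is
a minimal sketch in Python. It is not what GStreamer or any OMX stack
actually does; the function names are made up for illustration. It only
rewrites Annex B start codes into big-endian length prefixes and does
not build the out-of-band avcC record that ISOMP4 also needs.

```python
def split_annexb(data):
    """Return the NAL units of an Annex B byte stream, start codes removed."""
    nals = []
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = data.find(b"\x00\x00\x01", start)
        end = len(data) if nxt == -1 else nxt
        # A NAL unit never ends in 0x00, so trailing zeros are either the
        # first byte of a 4-byte start code or trailing padding.
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
        i = nxt
    return nals


def annexb_to_length_prefixed(data, length_size=4):
    """Replace each start code with a big-endian NAL length field."""
    out = bytearray()
    for nal in split_annexb(data):
        out += len(nal).to_bytes(length_size, "big")
        out += nal
    return bytes(out)
```

Even this trivial conversion has to scan every byte of the stream,
which is the parsing cost mentioned above.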

For the Linux kernel, I don't think we have an equivalent of
OMX_BUFFERFLAG_CODECCONFIG. I believe we also have the problem that if
a firmware decides to merge things, we would have to do some bitstream
parsing in order to separate them (which has been clearly stated as a
no-go so far). Note that some firmwares don't even produce the headers;
the drivers need to produce them.

What I'm bringing here is what drivers have been doing since 2011. I
think most userspace will work regardless of the encoder buffer
alignment, but some glitches may also exist. For example, in GStreamer
the code that does input/output timestamp matching was commented out
because the values coming from the Samsung MFC driver were completely
random. No one has worked on re-enabling this since, as all drivers
started following this 1 buffer 1 frame rule. The side effect is that
the matching of metadata (timestamp, timecode, AFB, CC, Tags, etc.) may
be off by a small number of frames. In general it does not break
completely.
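
For readers unfamiliar with that matching, a rough sketch of the idea
in Python follows. This is not the GStreamer code; the class and method
names are hypothetical. It assumes the driver copies the timestamp of
each queued input buffer onto the output buffer produced from it, which
only gives a clean 1:1 mapping when one buffer holds one frame.

```python
class MetaMatcher:
    """Carry per-frame metadata across a codec, keyed by buffer timestamp."""

    def __init__(self):
        self._pending = {}

    def queue_input(self, timestamp_us, meta):
        # Remember the metadata of the frame we are about to queue.
        self._pending[timestamp_us] = meta

    def dequeue_output(self, timestamp_us):
        # Returns None when the driver reports a timestamp we never queued,
        # which is what happens if the driver's values are unreliable.
        return self._pending.pop(timestamp_us, None)
```

If a driver packs several frames into one buffer, or splits one frame
across buffers, there are fewer or more timestamps than frames and the
lookup above drifts or misses, hence the "off by a few frames" effect.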

> 
>   Dave
> 
> [1] https://www.khronos.org/registry/OpenMAX-IL/specs/OpenMAX_IL_1_1_2_Specification.pdf

Just a reminder that OMX is a dead specification. All members of the
board have left and no further updates will be made available. There
were very good advances in the 1.2 draft, but it still got abandoned. In
the end, each implementation represents its own fork of the
specification. You'll even find some DMABuf support on the ZynqMP OMX
stack. But supporting OMX has become an incredible mess and is not
portable at all (which defeats the purpose of the spec in the first
place).

> 
> > > > In any case, I don't think this is specified anywhere. Please correct me if I am
> > > > wrong.
> > > > 
> > > > The latest stateful codec spec is here:
> > > > 
> > > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > > > 
> > > > Assuming what I described above is indeed the case, then I think this should
> > > > be documented. I don't know enough if a flag is needed somewhere to describe
> > > > the behavior for interlaced formats, or can we leave this open and have userspace
> > > > detect this?
> > > > 
> > > > 
> > > > For decoders it is more complicated. The stateful decoder spec is written with
> > > > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > > > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > > > 
> > > > See section 4.5.1.7 in the spec.
> > > > 
> > > > But I understand that various HW decoders *do* have limitations. I would really
> > > > like to know about those, since that needs to be exposed to userspace somehow.
> > > > 
> > > > Specifically, the venus decoder needs to know the resolution of the coded video
> > > > beforehand and it expects a single frame per buffer (how does that work for
> > > > interlaced formats?).
> > > > 
> > > > Such requirements mean that some userspace parsing is still required, so these
> > > > decoders are not completely stateful.
> > > > 
> > > > Can every codec author give information about their decoder/encoder?
> > > > 
> > > > I'll start off with my virtual codec driver:
> > > > 
> > > > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > > > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > > > but when that is added it will encode one field per buffer.
> > > 
> > > On BCM283x:
> > > 
> > > The underlying decoder will accept anything, but giving it a single
> > > frame per buffer reduces latency as the bitstream parser gets kicked
> > > earlier. Based on previous discussions I am setting the flag so that
> > > it expects one compressed frame per buffer, but I don't believe it
> > > goes wrong should that not be the case (it'll just waste a bit of
> > > processing effort).
> > > It'll parse the headers and produce a V4L2_EVENT_SOURCE_CHANGE event
> > > should the capture queue format not match the stream parameters.
> > > Interlacing isn't supported yet (it's on the list), but I believe the
> > > hardware produces the equivalent to V4L2_FIELD_INTERLACED_[TB|BT].
> > > 
> > > The encoder currently spits out the H264 SPS/PPS headers as a separate
> > > V4L2 buffer, and then one compressed frame per V4L2 buffer (provided
> > > the buffer is big enough). Should
> > > V4L2_CID_MPEG_VIDEO_REPEAT_SEQ_HEADER be set, then it will repeat the
> > > headers in an independent V4L2 buffer before each I frame.
> > > I'm quite happy to amend this should we have a decent spec of what is
> > > required. As I've never found a spec it's been trial and error until
> > > now.
> > > There is no interlaced support available.
> > > 
> > >   Dave


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 14:34 [RFC] Stateful codecs and requirements for compressed formats Hans Verkuil
                   ` (2 preceding siblings ...)
  2019-06-28 18:09 ` Nicolas Dufresne
@ 2019-07-03  8:32 ` Tomasz Figa
  2019-07-03 14:46   ` Philipp Zabel
                     ` (3 more replies)
  3 siblings, 4 replies; 18+ messages in thread
From: Tomasz Figa @ 2019-07-03  8:32 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Nicolas Dufresne, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter,
	Sylwester Nawrocki

Hi Hans,

On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi all,
>
> I hope I Cc-ed everyone with a stake in this issue.
>
> One recurring question is how a stateful encoder fills buffers and how a stateful
> decoder consumes buffers.
>
> The most generic case is that an encoder produces a bitstream and just fills each
> CAPTURE buffer to the brim before continuing with the next buffer.
>
> I don't think there are drivers that do this, I believe that all drivers just
> output a single compressed frame. For interlaced formats I understand it is either
> one compressed field per buffer, or two compressed fields per buffer (this is
> what I heard, I don't know if this is true).
>
> In any case, I don't think this is specified anywhere. Please correct me if I am
> wrong.
>
> The latest stateful codec spec is here:
>
> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
>
> Assuming what I described above is indeed the case, then I think this should
> be documented. I don't know enough if a flag is needed somewhere to describe
> the behavior for interlaced formats, or can we leave this open and have userspace
> detect this?
>

From the Chromium perspective, we don't have any use case for encoding
interlaced content, so we'll be okay with whatever the interested
parties decide on. :)

>
> For decoders it is more complicated. The stateful decoder spec is written with
> the assumption that userspace can just fill each OUTPUT buffer to the brim with
> the compressed bitstream. I.e., no need to split at frame or other boundaries.
>
> See section 4.5.1.7 in the spec.
>
> But I understand that various HW decoders *do* have limitations. I would really
> like to know about those, since that needs to be exposed to userspace somehow.

AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
separate buffers. I believe it also needs 1 buffer to contain exactly
1 frame and 1 frame to be fully contained inside 1 buffer.
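
To illustrate what that means for userspace, here is a simplified
sketch in Python of the kind of splitting required. It is not the
Chromium code and the function names are hypothetical. It assumes an
H.264 Annex B input, treats a slice whose first_mb_in_slice is 0 as the
start of a new frame, and keeps other NAL units such as SEI with the
frame that follows them. Data-partitioned slices are ignored, and an
AUD or SEI that precedes the SPS ends up queued after it.

```python
START_CODE = b"\x00\x00\x00\x01"
NAL_SLICE, NAL_IDR, NAL_SPS, NAL_PPS = 1, 5, 7, 8


def iter_nals(data):
    """Yield NAL units (start codes removed) from an Annex B byte stream."""
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = data.find(b"\x00\x00\x01", start)
        end = len(data) if nxt == -1 else nxt
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            yield nal
        i = nxt


def split_for_decoder(data):
    """Return one byte string per buffer: SPS, PPS, then whole frames."""
    buffers = []
    frame = bytearray()
    has_slice = False
    for nal in iter_nals(data):
        nal_type = nal[0] & 0x1F
        is_slice = nal_type in (NAL_SLICE, NAL_IDR)
        # first_mb_in_slice is the first ue(v) field of the slice header;
        # the value 0 is coded as a single '1' bit.
        first_slice = is_slice and len(nal) > 1 and (nal[1] & 0x80) != 0
        if has_slice and (not is_slice or first_slice):
            # The current frame is complete, close its buffer.
            buffers.append(bytes(frame))
            frame = bytearray()
            has_slice = False
        if nal_type in (NAL_SPS, NAL_PPS):
            buffers.append(START_CODE + nal)
            continue
        frame += START_CODE + nal
        has_slice = has_slice or is_slice
    if frame:
        buffers.append(bytes(frame))
    return buffers
```

A real implementation has to compare more slice header fields to find
frame boundaries reliably, which is why this counts as parsing.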

Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
be fully contained inside 1 buffer. It used to have some specific
requirements regarding SPS and PPS too, but I think that was fixed in
the firmware.

>
> Specifically, the venus decoder needs to know the resolution of the coded video
> beforehand

I don't think that's true for venus. It does parsing and can detect
the resolution.

However that's probably the case for coda...

> and it expects a single frame per buffer (how does that work for
> interlaced formats?).
>
> Such requirements mean that some userspace parsing is still required, so these
> decoders are not completely stateful.
>
> Can every codec author give information about their decoder/encoder?
>
> I'll start off with my virtual codec driver:
>
> vicodec: the decoder fully parses the bitstream. The encoder produces a single
> compressed frame per buffer. This driver doesn't yet support interlaced formats,
> but when that is added it will encode one field per buffer.
>
> Let's see what the results are.

s5p-mfc:
 decoder: fully parses the bitstream,
 encoder: produces a single frame per buffer (haven't tested interlaced stuff)

mtk-vcodec:
 decoder: expects separate buffers for SPS, PPS and full frames
(including some random stuff like SEIMessage),
 encoder: produces a single frame per buffer (haven't tested interlaced stuff)

Best regards,
Tomasz

* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 18:09 ` Nicolas Dufresne
@ 2019-07-03  8:46   ` Tomasz Figa
  2019-07-03 17:43     ` Nicolas Dufresne
  2019-07-10  8:43   ` Hans Verkuil
  1 sibling, 1 reply; 18+ messages in thread
From: Tomasz Figa @ 2019-07-03  8:46 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Hans Verkuil, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter,
	Sylwester Nawrocki

On Sat, Jun 29, 2019 at 3:09 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> Le vendredi 28 juin 2019 à 16:34 +0200, Hans Verkuil a écrit :
> > Hi all,
> >
> > I hope I Cc-ed everyone with a stake in this issue.
> >
> > One recurring question is how a stateful encoder fills buffers and how a stateful
> > decoder consumes buffers.
> >
> > The most generic case is that an encoder produces a bitstream and just fills each
> > CAPTURE buffer to the brim before continuing with the next buffer.
> >
> > I don't think there are drivers that do this, I believe that all drivers just
> > output a single compressed frame. For interlaced formats I understand it is either
> > one compressed field per buffer, or two compressed fields per buffer (this is
> > what I heard, I don't know if this is true).
> >
> > In any case, I don't think this is specified anywhere. Please correct me if I am
> > wrong.
> >
> > The latest stateful codec spec is here:
> >
> > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> >
> > Assuming what I described above is indeed the case, then I think this should
> > be documented. I don't know enough if a flag is needed somewhere to describe
> > the behavior for interlaced formats, or can we leave this open and have userspace
> > detect this?
> >
> >
> > For decoders it is more complicated. The stateful decoder spec is written with
> > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> >
> > See section 4.5.1.7 in the spec.
> >
> > But I understand that various HW decoders *do* have limitations. I would really
> > like to know about those, since that needs to be exposed to userspace somehow.
> >
> > Specifically, the venus decoder needs to know the resolution of the coded video
> > beforehand and it expects a single frame per buffer (how does that work for
> > interlaced formats?).
> >
> > Such requirements mean that some userspace parsing is still required, so these
> > decoders are not completely stateful.
> >
> > Can every codec author give information about their decoder/encoder?
> >
> > I'll start off with my virtual codec driver:
> >
> > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > but when that is added it will encode one field per buffer.
> >
> > Let's see what the results are.
>
> Hans though a summary of what existing userspace expects / assumes
> would be nice.
>
> GStreamer:
> ==========
> Encodes:
>   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> Decodes:
>   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
>
> It assumes that each encoded v4l2_buffer contains exactly one frame
> (any format, two fields for interlaced content). It may still work
> otherwise, but some issues will appear, timestamp shift, lost of
> metadata (e.g. timecode, cc, etc.).
>
> FFMpeg:
> =======
> Encodes:
>   h263, h264, hevc, mpeg4, vp8
> Decodes:
>   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
>
> Similarly to GStreamer, it assumes that one AVPacket will fit one
> v4l2_buffer. On the encoding side, it seems less of a problem, but they
> don't fully implement the FFMPEG CODEC API for frame matching, which I
> suspect would create some ambiguity if it was.
>
> Chromium:
> =========
> Decodes:
>   H264, VP8, VP9
> Encodes:
>   H264

VP8 too.

It can in theory handle any format V4L2 could expose, but these 2 seem
to be the only codecs commonly used in practice and supported by
hardware.

>
> That is the code I know the less, but the encoder does not seem
> affected by the nal alignment. The keyframe flag and timestamps seems
> to be used and are likely expected to correlate with the input, so I
> suspect that there exist some possible ambiguity if the output is not
> full frame. For the decoder, I'll have to ask someone else to comment,
> the code is hard to follow and I could not get to the place where
> output buffers are filled. I thought the GStreamer code was tough, but
> this is quite similarly a mess.

Not sure what's so complicated there. There is a clearly isolated
function that does the parsing:
https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.cc?rcl=2880fe4f6b246809f1be72c5a5698dced4cd85d1&l=984

It puts special NALUs like SPS and PPS in separate buffers and for
frames it's 1 frame (all slices of the frame) : 1 buffer.

Best regards,
Tomasz

* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-03  8:32 ` Tomasz Figa
@ 2019-07-03 14:46   ` Philipp Zabel
  2019-07-03 17:46   ` Nicolas Dufresne
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Philipp Zabel @ 2019-07-03 14:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Linux Media Mailing List, Nicolas Dufresne, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Ezequiel Garcia, Michael Tretter, Sylwester Nawrocki

On Wed, 2019-07-03 at 17:32 +0900, Tomasz Figa wrote:
> Hi Hans,
> 
> On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > 
> > Hi all,
> > 
> > I hope I Cc-ed everyone with a stake in this issue.
> > 
> > One recurring question is how a stateful encoder fills buffers and how a stateful
> > decoder consumes buffers.
> > 
> > The most generic case is that an encoder produces a bitstream and just fills each
> > CAPTURE buffer to the brim before continuing with the next buffer.
> > 
> > I don't think there are drivers that do this, I believe that all drivers just
> > output a single compressed frame. For interlaced formats I understand it is either
> > one compressed field per buffer, or two compressed fields per buffer (this is
> > what I heard, I don't know if this is true).
> > 
> > In any case, I don't think this is specified anywhere. Please correct me if I am
> > wrong.
> > 
> > The latest stateful codec spec is here:
> > 
> > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > 
> > Assuming what I described above is indeed the case, then I think this should
> > be documented. I don't know enough if a flag is needed somewhere to describe
> > the behavior for interlaced formats, or can we leave this open and have userspace
> > detect this?
> > 
> 
> From Chromium perspective, we don't have any use case for encoding
> interlaced contents, so we'll be okay with whatever the interested
> parties decide on. :)
> 
> > 
> > For decoders it is more complicated. The stateful decoder spec is written with
> > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > 
> > See section 4.5.1.7 in the spec.
> > 
> > But I understand that various HW decoders *do* have limitations. I would really
> > like to know about those, since that needs to be exposed to userspace somehow.
> 
> AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> separate buffers. I believe it also needs 1 buffer to contain exactly
> 1 frame and 1 frame to be fully contained inside 1 buffer.
> 
> Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> be fully contained inside 1 buffer. It used to have some specific
> requirements regarding SPS and PPS too, but I think that was fixed in
> the firmware.
> 
> > 
> > Specifically, the venus decoder needs to know the resolution of the coded video
> > beforehand
> 
> I don't think that's true for venus. It does parsing and can detect
> the resolution.
> 
> However that's probably the case for coda...

Yes, it is currently true for the coda driver. But I believe it is not
actually necessary for coda hardware / firmware. I have already started
to split sequence initialization (where the firmware parses the
bitstream headers) from internal frame buffer allocation (which has to
match capture buffers in size), and I think it should be possible to
completely decouple the two and postpone buffer allocation far enough to
allow output stream start without prior knowledge of the resolution.

The decoder coda firmware fully parses the bitstream, but the driver has
to copy it from the external output buffers into an internal bitstream
ringbuffer anyway, and a few workarounds are necessary to make it always
succeed regardless of whether the first buffer presented to it only
contains headers, headers and a very small frame, or enough data to
completely fill the bitstream reader's prefetch buffer. For this the
driver has to parse the NAL start headers to a certain degree.
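
As an illustration of the degree of parsing involved, the following
Python sketch classifies a buffer by the NAL unit types it contains. It
is not the coda driver code (which is C, in the kernel); the function
names and return values are made up, and it assumes H.264 Annex B
input.

```python
def iter_nals(data):
    """Yield NAL units (start codes removed) from an Annex B byte stream."""
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = data.find(b"\x00\x00\x01", start)
        end = len(data) if nxt == -1 else nxt
        nal = data[start:end].rstrip(b"\x00")
        if nal:
            yield nal
        i = nxt


def classify_buffer(data):
    """Tell whether a buffer carries headers, frame data, or both."""
    types = {nal[0] & 0x1F for nal in iter_nals(data)}
    has_headers = bool(types & {7, 8})   # SPS or PPS
    has_slices = bool(types & {1, 5})    # non-IDR or IDR slice
    if has_headers and has_slices:
        return "headers+frame"
    if has_slices:
        return "frame"
    if has_headers:
        return "headers"
    return "other"
```

Only the start codes and the first byte of each NAL unit are inspected
here; no slice or parameter set contents are decoded.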

Due to this bitstream copy in the driver, in theory there are no limits
on how the input data is split into v4l2 buffers, but in practice only
single frame per v4l2 output buffer use cases are actually tested
regularly.

The encoder produces a single compressed frame per buffer. There is no
support for B frames in the firmware, as far as I can tell. There is no
driver support for interlaced formats currently, and I'm not sure
whether the firmware supports interlacing.

regards
Philipp

* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-03  8:46   ` Tomasz Figa
@ 2019-07-03 17:43     ` Nicolas Dufresne
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2019-07-03 17:43 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Hans Verkuil, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter,
	Sylwester Nawrocki

Le mercredi 03 juillet 2019 à 17:46 +0900, Tomasz Figa a écrit :
> On Sat, Jun 29, 2019 at 3:09 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
> > Le vendredi 28 juin 2019 à 16:34 +0200, Hans Verkuil a écrit :
> > > Hi all,
> > > 
> > > I hope I Cc-ed everyone with a stake in this issue.
> > > 
> > > One recurring question is how a stateful encoder fills buffers and how a stateful
> > > decoder consumes buffers.
> > > 
> > > The most generic case is that an encoder produces a bitstream and just fills each
> > > CAPTURE buffer to the brim before continuing with the next buffer.
> > > 
> > > I don't think there are drivers that do this, I believe that all drivers just
> > > output a single compressed frame. For interlaced formats I understand it is either
> > > one compressed field per buffer, or two compressed fields per buffer (this is
> > > what I heard, I don't know if this is true).
> > > 
> > > In any case, I don't think this is specified anywhere. Please correct me if I am
> > > wrong.
> > > 
> > > The latest stateful codec spec is here:
> > > 
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > > 
> > > Assuming what I described above is indeed the case, then I think this should
> > > be documented. I don't know enough if a flag is needed somewhere to describe
> > > the behavior for interlaced formats, or can we leave this open and have userspace
> > > detect this?
> > > 
> > > 
> > > For decoders it is more complicated. The stateful decoder spec is written with
> > > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > > 
> > > See section 4.5.1.7 in the spec.
> > > 
> > > But I understand that various HW decoders *do* have limitations. I would really
> > > like to know about those, since that needs to be exposed to userspace somehow.
> > > 
> > > Specifically, the venus decoder needs to know the resolution of the coded video
> > > beforehand and it expects a single frame per buffer (how does that work for
> > > interlaced formats?).
> > > 
> > > Such requirements mean that some userspace parsing is still required, so these
> > > decoders are not completely stateful.
> > > 
> > > Can every codec author give information about their decoder/encoder?
> > > 
> > > I'll start off with my virtual codec driver:
> > > 
> > > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > > but when that is added it will encode one field per buffer.
> > > 
> > > Let's see what the results are.
> > 
> > Hans though a summary of what existing userspace expects / assumes
> > would be nice.
> > 
> > GStreamer:
> > ==========
> > Encodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> > Decodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > It assumes that each encoded v4l2_buffer contains exactly one frame
> > (any format, two fields for interlaced content). It may still work
> > otherwise, but some issues will appear, timestamp shift, lost of
> > metadata (e.g. timecode, cc, etc.).
> > 
> > FFMpeg:
> > =======
> > Encodes:
> >   h263, h264, hevc, mpeg4, vp8
> > Decodes:
> >   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > Similarly to GStreamer, it assumes that one AVPacket will fit one
> > v4l2_buffer. On the encoding side, it seems less of a problem, but they
> > don't fully implement the FFMPEG CODEC API for frame matching, which I
> > suspect would create some ambiguity if it was.
> > 
> > Chromium:
> > =========
> > Decodes:
> >   H264, VP8, VP9
> > Encodes:
> >   H264
> 
> VP8 too.
> 
> It can in theory handle any format V4L2 could expose, but these 2 seem
> to be the only commonly used codecs used in practice and supported by
> hardware.
> 
> > That is the code I know the less, but the encoder does not seem
> > affected by the nal alignment. The keyframe flag and timestamps seems
> > to be used and are likely expected to correlate with the input, so I
> > suspect that there exist some possible ambiguity if the output is not
> > full frame. For the decoder, I'll have to ask someone else to comment,
> > the code is hard to follow and I could not get to the place where
> > output buffers are filled. I thought the GStreamer code was tough, but
> > this is quite similarly a mess.
> 
> Not sure what's so complicated there. There is a clearly isolated
> function that does the parsing:
> https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.cc?rcl=2880fe4f6b246809f1be72c5a5698dced4cd85d1&l=984
> 
> It puts special NALUs like SPS and PPS in separate buffers and for
> frames it's 1 frame (all slices of the frame) : 1 buffer.

Consider this feedback: what confused me was the mix of parsing with
decoding, along with the name of the method, "::AdvanceFrameFragment".

Thanks for pointing to this code. Was there any HW where this split was
strictly required ?

> 
> Best regards,
> Tomasz


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-03  8:32 ` Tomasz Figa
  2019-07-03 14:46   ` Philipp Zabel
@ 2019-07-03 17:46   ` Nicolas Dufresne
  2019-07-10  9:14   ` Hans Verkuil
  2019-07-11  1:42   ` Nicolas Dufresne
  3 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2019-07-03 17:46 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Linux Media Mailing List, Dave Stevenson, Boris Brezillon,
	Paul Kocialkowski, Stanimir Varbanov, Philipp Zabel,
	Ezequiel Garcia, Michael Tretter, Sylwester Nawrocki

Le mercredi 03 juillet 2019 à 17:32 +0900, Tomasz Figa a écrit :
> Hi Hans,
> 
> On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > Hi all,
> > 
> > I hope I Cc-ed everyone with a stake in this issue.
> > 
> > One recurring question is how a stateful encoder fills buffers and how a stateful
> > decoder consumes buffers.
> > 
> > The most generic case is that an encoder produces a bitstream and just fills each
> > CAPTURE buffer to the brim before continuing with the next buffer.
> > 
> > I don't think there are drivers that do this, I believe that all drivers just
> > output a single compressed frame. For interlaced formats I understand it is either
> > one compressed field per buffer, or two compressed fields per buffer (this is
> > what I heard, I don't know if this is true).
> > 
> > In any case, I don't think this is specified anywhere. Please correct me if I am
> > wrong.
> > 
> > The latest stateful codec spec is here:
> > 
> > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > 
> > Assuming what I described above is indeed the case, then I think this should
> > be documented. I don't know enough if a flag is needed somewhere to describe
> > the behavior for interlaced formats, or can we leave this open and have userspace
> > detect this?
> > 
> 
> From Chromium perspective, we don't have any use case for encoding
> interlaced contents, so we'll be okay with whatever the interested
> parties decide on. :)
> 
> > For decoders it is more complicated. The stateful decoder spec is written with
> > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > 
> > See section 4.5.1.7 in the spec.
> > 
> > But I understand that various HW decoders *do* have limitations. I would really
> > like to know about those, since that needs to be exposed to userspace somehow.
> 
> AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> separate buffers. I believe it also needs 1 buffer to contain exactly
> 1 frame and 1 frame to be fully contained inside 1 buffer.
> 
> Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> be fully contained inside 1 buffer. It used to have some specific
> requirements regarding SPS and PPS too, but I think that was fixed in
> the firmware.
> 
> > Specifically, the venus decoder needs to know the resolution of the coded video
> > beforehand
> 
> I don't think that's true for venus. It does parsing and can detect
> the resolution.
> 
> However that's probably the case for coda...

I'm probably the worst person to have access to the documentation, but
from the documentation I have read, it seems like this is a limitation
of the driver. Unless it's a limitation of the 960; the documentation I
have is for the 970.

> 
> > and it expects a single frame per buffer (how does that work for
> > interlaced formats?).
> > 
> > Such requirements mean that some userspace parsing is still required, so these
> > decoders are not completely stateful.
> > 
> > Can every codec author give information about their decoder/encoder?
> > 
> > I'll start off with my virtual codec driver:
> > 
> > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > but when that is added it will encode one field per buffer.
> > 
> > Let's see what the results are.
> 
> s5p-mfc:
>  decoder: fully parses the bitstream,
>  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> 
> mtk-vcodec:
>  decoder: expects separate buffers for SPS, PPS and full frames
> (including some random stuff like SEIMessage),
>  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> 
> Best regards,
> Tomasz


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-06-28 18:09 ` Nicolas Dufresne
  2019-07-03  8:46   ` Tomasz Figa
@ 2019-07-10  8:43   ` Hans Verkuil
  2019-07-11  1:40     ` Nicolas Dufresne
  1 sibling, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2019-07-10  8:43 UTC (permalink / raw)
  To: Nicolas Dufresne, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

On 6/28/19 8:09 PM, Nicolas Dufresne wrote:
> Le vendredi 28 juin 2019 à 16:34 +0200, Hans Verkuil a écrit :
>> Hi all,
>>
>> I hope I Cc-ed everyone with a stake in this issue.
>>
>> One recurring question is how a stateful encoder fills buffers and how a stateful
>> decoder consumes buffers.
>>
>> The most generic case is that an encoder produces a bitstream and just fills each
>> CAPTURE buffer to the brim before continuing with the next buffer.
>>
>> I don't think there are drivers that do this, I believe that all drivers just
>> output a single compressed frame. For interlaced formats I understand it is either
>> one compressed field per buffer, or two compressed fields per buffer (this is
>> what I heard, I don't know if this is true).
>>
>> In any case, I don't think this is specified anywhere. Please correct me if I am
>> wrong.
>>
>> The latest stateful codec spec is here:
>>
>> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
>>
>> Assuming what I described above is indeed the case, then I think this should
>> be documented. I don't know enough if a flag is needed somewhere to describe
>> the behavior for interlaced formats, or can we leave this open and have userspace
>> detect this?
>>
>>
>> For decoders it is more complicated. The stateful decoder spec is written with
>> the assumption that userspace can just fill each OUTPUT buffer to the brim with
>> the compressed bitstream. I.e., no need to split at frame or other boundaries.
>>
>> See section 4.5.1.7 in the spec.
>>
>> But I understand that various HW decoders *do* have limitations. I would really
>> like to know about those, since that needs to be exposed to userspace somehow.
>>
>> Specifically, the venus decoder needs to know the resolution of the coded video
>> beforehand and it expects a single frame per buffer (how does that work for
>> interlaced formats?).
>>
>> Such requirements mean that some userspace parsing is still required, so these
>> decoders are not completely stateful.
>>
>> Can every codec author give information about their decoder/encoder?
>>
>> I'll start off with my virtual codec driver:
>>
>> vicodec: the decoder fully parses the bitstream. The encoder produces a single
>> compressed frame per buffer. This driver doesn't yet support interlaced formats,
>> but when that is added it will encode one field per buffer.
>>
>> Let's see what the results are.
> 
> Hans thought a summary of what existing userspace expects / assumes
> would be nice.
> 
> GStreamer:
> ==========
> Encodes:
>   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> Decodes:
>   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
> 
> It assumes that each encoded v4l2_buffer contains exactly one frame
> (any format, two fields for interlaced content). It may still work
> otherwise, but some issues will appear: timestamp shift, loss of
> metadata (e.g. timecode, cc, etc.).

When you say 'each encoded v4l2_buffer contains exactly one frame',
does that include H.264 SPS/PPS headers? Or are those passed in
a separate v4l2_buffer? Ditto for FFMPEG.

Regards,

	Hans

> 
> FFMpeg:
> =======
> Encodes:
>   h263, h264, hevc, mpeg4, vp8
> Decodes:
>   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
> 
> Similarly to GStreamer, it assumes that one AVPacket will fit one
> v4l2_buffer. On the encoding side, it seems less of a problem, but they
> don't fully implement the FFMPEG CODEC API for frame matching, which I
> suspect would create some ambiguity if it were implemented.
> 
> Chromium:
> =========
> Decodes:
>   H264, VP8, VP9
> Encodes:
>   H264
> 
> That is the code I know the least, but the encoder does not seem
> affected by the NAL alignment. The keyframe flag and timestamps seem
> to be used and are likely expected to correlate with the input, so I
> suspect that some ambiguity is possible if the output is not a
> full frame. For the decoder, I'll have to ask someone else to comment;
> the code is hard to follow and I could not get to the place where
> output buffers are filled. I thought the GStreamer code was tough, but
> this is a similar mess.
> 
> Nicolas



* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-03  8:32 ` Tomasz Figa
  2019-07-03 14:46   ` Philipp Zabel
  2019-07-03 17:46   ` Nicolas Dufresne
@ 2019-07-10  9:14   ` Hans Verkuil
  2019-07-11 12:49     ` Tomasz Figa
  2019-07-11  1:42   ` Nicolas Dufresne
  3 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2019-07-10  9:14 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Nicolas Dufresne, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter,
	Sylwester Nawrocki

On 7/3/19 10:32 AM, Tomasz Figa wrote:
> Hi Hans,
> 
> On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> Hi all,
>>
>> I hope I Cc-ed everyone with a stake in this issue.
>>
>> One recurring question is how a stateful encoder fills buffers and how a stateful
>> decoder consumes buffers.
>>
>> The most generic case is that an encoder produces a bitstream and just fills each
>> CAPTURE buffer to the brim before continuing with the next buffer.
>>
>> I don't think there are drivers that do this, I believe that all drivers just
>> output a single compressed frame. For interlaced formats I understand it is either
>> one compressed field per buffer, or two compressed fields per buffer (this is
>> what I heard, I don't know if this is true).
>>
>> In any case, I don't think this is specified anywhere. Please correct me if I am
>> wrong.
>>
>> The latest stateful codec spec is here:
>>
>> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
>>
>> Assuming what I described above is indeed the case, then I think this should
>> be documented. I don't know enough if a flag is needed somewhere to describe
>> the behavior for interlaced formats, or can we leave this open and have userspace
>> detect this?
>>
> 
> From Chromium perspective, we don't have any use case for encoding
> interlaced contents, so we'll be okay with whatever the interested
> parties decide on. :)
> 
>>
>> For decoders it is more complicated. The stateful decoder spec is written with
>> the assumption that userspace can just fill each OUTPUT buffer to the brim with
>> the compressed bitstream. I.e., no need to split at frame or other boundaries.
>>
>> See section 4.5.1.7 in the spec.
>>
>> But I understand that various HW decoders *do* have limitations. I would really
>> like to know about those, since that needs to be exposed to userspace somehow.
> 
> AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> separate buffers. I believe it also needs 1 buffer to contain exactly
> 1 frame and 1 frame to be fully contained inside 1 buffer.
> 
> Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> be fully contained inside 1 buffer. It used to have some specific
> requirements regarding SPS and PPS too, but I think that was fixed in
> the firmware.
> 
>>
>> Specifically, the venus decoder needs to know the resolution of the coded video
>> beforehand
> 
> I don't think that's true for venus. It does parsing and can detect
> the resolution.
> 
> However that's probably the case for coda...
> 
>> and it expects a single frame per buffer (how does that work for
>> interlaced formats?).
>>
>> Such requirements mean that some userspace parsing is still required, so these
>> decoders are not completely stateful.
>>
>> Can every codec author give information about their decoder/encoder?
>>
>> I'll start off with my virtual codec driver:
>>
>> vicodec: the decoder fully parses the bitstream. The encoder produces a single
>> compressed frame per buffer. This driver doesn't yet support interlaced formats,
>> but when that is added it will encode one field per buffer.
>>
>> Let's see what the results are.
> 
> s5p-mfc:
>  decoder: fully parses the bitstream,
>  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> 
> mtk-vcodec:
>  decoder: expects separate buffers for SPS, PPS and full frames
> (including some random stuff like SEIMessage),

Do you mean that the SPS/PPS etc. should all be in separate buffers? I.e.
you can't combine SPS and PPS in a single buffer?

Regards,

	Hans

>  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> 
> Best regards,
> Tomasz
> 



* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-10  8:43   ` Hans Verkuil
@ 2019-07-11  1:40     ` Nicolas Dufresne
  0 siblings, 0 replies; 18+ messages in thread
From: Nicolas Dufresne @ 2019-07-11  1:40 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter, Tomasz Figa,
	Sylwester Nawrocki

On Wednesday, 10 July 2019 at 10:43 +0200, Hans Verkuil wrote:
> On 6/28/19 8:09 PM, Nicolas Dufresne wrote:
> > On Friday, 28 June 2019 at 16:34 +0200, Hans Verkuil wrote:
> > > Hi all,
> > > 
> > > I hope I Cc-ed everyone with a stake in this issue.
> > > 
> > > One recurring question is how a stateful encoder fills buffers and how a stateful
> > > decoder consumes buffers.
> > > 
> > > The most generic case is that an encoder produces a bitstream and just fills each
> > > CAPTURE buffer to the brim before continuing with the next buffer.
> > > 
> > > I don't think there are drivers that do this, I believe that all drivers just
> > > output a single compressed frame. For interlaced formats I understand it is either
> > > one compressed field per buffer, or two compressed fields per buffer (this is
> > > what I heard, I don't know if this is true).
> > > 
> > > In any case, I don't think this is specified anywhere. Please correct me if I am
> > > wrong.
> > > 
> > > The latest stateful codec spec is here:
> > > 
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > > 
> > > Assuming what I described above is indeed the case, then I think this should
> > > be documented. I don't know enough if a flag is needed somewhere to describe
> > > the behavior for interlaced formats, or can we leave this open and have userspace
> > > detect this?
> > > 
> > > 
> > > For decoders it is more complicated. The stateful decoder spec is written with
> > > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > > 
> > > See section 4.5.1.7 in the spec.
> > > 
> > > But I understand that various HW decoders *do* have limitations. I would really
> > > like to know about those, since that needs to be exposed to userspace somehow.
> > > 
> > > Specifically, the venus decoder needs to know the resolution of the coded video
> > > beforehand and it expects a single frame per buffer (how does that work for
> > > interlaced formats?).
> > > 
> > > Such requirements mean that some userspace parsing is still required, so these
> > > decoders are not completely stateful.
> > > 
> > > Can every codec author give information about their decoder/encoder?
> > > 
> > > I'll start off with my virtual codec driver:
> > > 
> > > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > > but when that is added it will encode one field per buffer.
> > > 
> > > Let's see what the results are.
> > 
> > Hans thought a summary of what existing userspace expects / assumes
> > would be nice.
> > 
> > GStreamer:
> > ==========
> > Encodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg4, vp8, vp9
> > Decodes:
> >   fwht, h263, h264, hevc, jpeg, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > It assumes that each encoded v4l2_buffer contains exactly one frame
> > (any format, two fields for interlaced content). It may still work
> > otherwise, but some issues will appear: timestamp shift, loss of
> > metadata (e.g. timecode, cc, etc.).
> 
> When you say 'each encoded v4l2_buffer contains exactly one frame',
> does that include H.264 SPS/PPS headers? Or are those passed in
> a separate v4l2_buffer? 

Yes, the SPS/PPS is assumed to be in the same buffer. In the case of
the decoder it's guaranteed to be; if the decoder does not do that, it
will still work, but with a timestamp shift.
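
To make the distinction concrete, here is a minimal sketch (an
illustration only, not code from GStreamer or any driver) that checks
whether a single H.264 byte-stream buffer carries the SPS and PPS
together with an IDR slice. It assumes Annex B start codes; the helper
names nal_types and headers_in_band are made up for this example.

```python
# H.264 nal_unit_type values (low 5 bits of the NAL header byte).
SPS, PPS, IDR = 7, 8, 5


def nal_types(buf: bytes) -> list:
    """Return the nal_unit_type of every NAL unit in an Annex B buffer."""
    types = []
    # Searching for 00 00 01 also finds 4-byte start codes (00 00 00 01).
    pos = buf.find(b"\x00\x00\x01")
    while pos != -1 and pos + 3 < len(buf):
        types.append(buf[pos + 3] & 0x1F)
        pos = buf.find(b"\x00\x00\x01", pos + 3)
    return types


def headers_in_band(buf: bytes) -> bool:
    """True if the buffer holds SPS, PPS and an IDR slice together."""
    found = nal_types(buf)
    return SPS in found and PPS in found and IDR in found
```

A buffer for which headers_in_band() is false on a keyframe would be the
"headers in a separate v4l2_buffer" case asked about above.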

> Ditto for FFMPEG.

I believe it's the same, but I'd need to re-read that code to confirm.
The thing about FFMPEG is that the internal format is always AVC
instead of bytestream. And the PPS/SPS travels out-of-band, which means
it's not inside an AVPacket internally.
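
For illustration, a minimal sketch of what going from that AVC
representation to a byte-stream involves. It assumes 4-byte big-endian
NAL length prefixes and that the SPS and PPS have already been extracted
from the out-of-band extradata; the function names are hypothetical, and
this is not FFmpeg code (FFmpeg ships the h264_mp4toannexb bitstream
filter for the real conversion).

```python
START_CODE = b"\x00\x00\x00\x01"


def avc_to_annexb(packet: bytes, length_size: int = 4) -> bytes:
    """Replace each length prefix in an AVC packet with a start code."""
    out = bytearray()
    pos = 0
    while pos + length_size <= len(packet):
        nal_len = int.from_bytes(packet[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(packet):
            raise ValueError("truncated NAL unit")
        out += START_CODE + packet[pos:pos + nal_len]
        pos += nal_len
    return bytes(out)


def with_headers(sps: bytes, pps: bytes, packet: bytes) -> bytes:
    """Prepend out-of-band SPS/PPS so that one buffer is self-contained."""
    return START_CODE + sps + START_CODE + pps + avc_to_annexb(packet)
```

Whether the headers get prepended to the first frame or sent in a
buffer of their own is exactly the choice discussed in this thread.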

> 
> Regards,
> 
> 	Hans
> 
> > FFMpeg:
> > =======
> > Encodes:
> >   h263, h264, hevc, mpeg4, vp8
> > Decodes:
> >   h263, h264, hevc, mpeg2, mpeg4, vc1, vp8, vp9
> > 
> > Similarly to GStreamer, it assumes that one AVPacket will fit one
> > v4l2_buffer. On the encoding side, it seems less of a problem, but they
> > don't fully implement the FFMPEG CODEC API for frame matching, which I
> > suspect would create some ambiguity if it were implemented.
> > 
> > Chromium:
> > =========
> > Decodes:
> >   H264, VP8, VP9
> > Encodes:
> >   H264
> > 
> > That is the code I know the least, but the encoder does not seem
> > affected by the NAL alignment. The keyframe flag and timestamps seem
> > to be used and are likely expected to correlate with the input, so I
> > suspect that some ambiguity is possible if the output is not a
> > full frame. For the decoder, I'll have to ask someone else to comment;
> > the code is hard to follow and I could not get to the place where
> > output buffers are filled. I thought the GStreamer code was tough, but
> > this is a similar mess.
> > 
> > Nicolas



* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-03  8:32 ` Tomasz Figa
                     ` (2 preceding siblings ...)
  2019-07-10  9:14   ` Hans Verkuil
@ 2019-07-11  1:42   ` Nicolas Dufresne
  2019-07-11 12:47     ` Tomasz Figa
  3 siblings, 1 reply; 18+ messages in thread
From: Nicolas Dufresne @ 2019-07-11  1:42 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Linux Media Mailing List, Dave Stevenson, Boris Brezillon,
	Paul Kocialkowski, Stanimir Varbanov, Philipp Zabel,
	Ezequiel Garcia, Michael Tretter, Sylwester Nawrocki

On Wednesday, 3 July 2019 at 17:32 +0900, Tomasz Figa wrote:
> Hi Hans,
> 
> On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > Hi all,
> > 
> > I hope I Cc-ed everyone with a stake in this issue.
> > 
> > One recurring question is how a stateful encoder fills buffers and how a stateful
> > decoder consumes buffers.
> > 
> > The most generic case is that an encoder produces a bitstream and just fills each
> > CAPTURE buffer to the brim before continuing with the next buffer.
> > 
> > I don't think there are drivers that do this, I believe that all drivers just
> > output a single compressed frame. For interlaced formats I understand it is either
> > one compressed field per buffer, or two compressed fields per buffer (this is
> > what I heard, I don't know if this is true).
> > 
> > In any case, I don't think this is specified anywhere. Please correct me if I am
> > wrong.
> > 
> > The latest stateful codec spec is here:
> > 
> > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > 
> > Assuming what I described above is indeed the case, then I think this should
> > be documented. I don't know enough if a flag is needed somewhere to describe
> > the behavior for interlaced formats, or can we leave this open and have userspace
> > detect this?
> > 
> 
> From Chromium perspective, we don't have any use case for encoding
> interlaced contents, so we'll be okay with whatever the interested
> parties decide on. :)
> 
> > For decoders it is more complicated. The stateful decoder spec is written with
> > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > 
> > See section 4.5.1.7 in the spec.
> > 
> > But I understand that various HW decoders *do* have limitations. I would really
> > like to know about those, since that needs to be exposed to userspace somehow.
> 
> AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> separate buffers. I believe it also needs 1 buffer to contain exactly
> 1 frame and 1 frame to be fully contained inside 1 buffer.
> 
> Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> be fully contained inside 1 buffer. It used to have some specific
> requirements regarding SPS and PPS too, but I think that was fixed in
> the firmware.
> 
> > Specifically, the venus decoder needs to know the resolution of the coded video
> > beforehand
> 
> I don't think that's true for venus. It does parsing and can detect
> the resolution.
> 
> However that's probably the case for coda...
> 
> > and it expects a single frame per buffer (how does that work for
> > interlaced formats?).
> > 
> > Such requirements mean that some userspace parsing is still required, so these
> > decoders are not completely stateful.
> > 
> > Can every codec author give information about their decoder/encoder?
> > 
> > I'll start off with my virtual codec driver:
> > 
> > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > but when that is added it will encode one field per buffer.
> > 
> > Let's see what the results are.
> 
> s5p-mfc:
>  decoder: fully parses the bitstream,
>  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> 
> mtk-vcodec:
>  decoder: expects separate buffers for SPS, PPS and full frames
> (including some random stuff like SEIMessage),
>  encoder: produces a single frame per buffer (haven't tested interlaced stuff)

Interesting, do I read correctly that the encoder does not produce
what the decoder needs?

> 
> Best regards,
> Tomasz



* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-11  1:42   ` Nicolas Dufresne
@ 2019-07-11 12:47     ` Tomasz Figa
  0 siblings, 0 replies; 18+ messages in thread
From: Tomasz Figa @ 2019-07-11 12:47 UTC (permalink / raw)
  To: Nicolas Dufresne
  Cc: Hans Verkuil, Linux Media Mailing List, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter,
	Sylwester Nawrocki

On Thu, Jul 11, 2019 at 10:42 AM Nicolas Dufresne <nicolas@ndufresne.ca> wrote:
>
> On Wednesday, 3 July 2019 at 17:32 +0900, Tomasz Figa wrote:
> > Hi Hans,
> >
> > On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > > Hi all,
> > >
> > > I hope I Cc-ed everyone with a stake in this issue.
> > >
> > > One recurring question is how a stateful encoder fills buffers and how a stateful
> > > decoder consumes buffers.
> > >
> > > The most generic case is that an encoder produces a bitstream and just fills each
> > > CAPTURE buffer to the brim before continuing with the next buffer.
> > >
> > > I don't think there are drivers that do this, I believe that all drivers just
> > > output a single compressed frame. For interlaced formats I understand it is either
> > > one compressed field per buffer, or two compressed fields per buffer (this is
> > > what I heard, I don't know if this is true).
> > >
> > > In any case, I don't think this is specified anywhere. Please correct me if I am
> > > wrong.
> > >
> > > The latest stateful codec spec is here:
> > >
> > > https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> > >
> > > Assuming what I described above is indeed the case, then I think this should
> > > be documented. I don't know enough if a flag is needed somewhere to describe
> > > the behavior for interlaced formats, or can we leave this open and have userspace
> > > detect this?
> > >
> >
> > From Chromium perspective, we don't have any use case for encoding
> > interlaced contents, so we'll be okay with whatever the interested
> > parties decide on. :)
> >
> > > For decoders it is more complicated. The stateful decoder spec is written with
> > > the assumption that userspace can just fill each OUTPUT buffer to the brim with
> > > the compressed bitstream. I.e., no need to split at frame or other boundaries.
> > >
> > > See section 4.5.1.7 in the spec.
> > >
> > > But I understand that various HW decoders *do* have limitations. I would really
> > > like to know about those, since that needs to be exposed to userspace somehow.
> >
> > AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> > separate buffers. I believe it also needs 1 buffer to contain exactly
> > 1 frame and 1 frame to be fully contained inside 1 buffer.
> >
> > Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> > be fully contained inside 1 buffer. It used to have some specific
> > requirements regarding SPS and PPS too, but I think that was fixed in
> > the firmware.
> >
> > > Specifically, the venus decoder needs to know the resolution of the coded video
> > > beforehand
> >
> > I don't think that's true for venus. It does parsing and can detect
> > the resolution.
> >
> > However that's probably the case for coda...
> >
> > > and it expects a single frame per buffer (how does that work for
> > > interlaced formats?).
> > >
> > > Such requirements mean that some userspace parsing is still required, so these
> > > decoders are not completely stateful.
> > >
> > > Can every codec author give information about their decoder/encoder?
> > >
> > > I'll start off with my virtual codec driver:
> > >
> > > vicodec: the decoder fully parses the bitstream. The encoder produces a single
> > > compressed frame per buffer. This driver doesn't yet support interlaced formats,
> > > but when that is added it will encode one field per buffer.
> > >
> > > Let's see what the results are.
> >
> > s5p-mfc:
> >  decoder: fully parses the bitstream,
> >  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> >
> > mtk-vcodec:
> >  decoder: expects separate buffers for SPS, PPS and full frames
> > (including some random stuff like SEIMessage),
> >  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
>
> Interesting, do I read correctly that the encoder does not produce
> what the decoder needs?

Apparently. :)

But given all the diversity that was mentioned in this thread, one
can't expect to be able to feed a decoder with the exact buffers from
an encoder, although first of all I'm not sure why one would even want
to do that.

Best regards,
Tomasz


* Re: [RFC] Stateful codecs and requirements for compressed formats
  2019-07-10  9:14   ` Hans Verkuil
@ 2019-07-11 12:49     ` Tomasz Figa
  0 siblings, 0 replies; 18+ messages in thread
From: Tomasz Figa @ 2019-07-11 12:49 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Nicolas Dufresne, Dave Stevenson,
	Boris Brezillon, Paul Kocialkowski, Stanimir Varbanov,
	Philipp Zabel, Ezequiel Garcia, Michael Tretter,
	Sylwester Nawrocki

On Wed, Jul 10, 2019 at 6:14 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 7/3/19 10:32 AM, Tomasz Figa wrote:
> > Hi Hans,
> >
> > On Fri, Jun 28, 2019 at 11:34 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >>
> >> Hi all,
> >>
> >> I hope I Cc-ed everyone with a stake in this issue.
> >>
> >> One recurring question is how a stateful encoder fills buffers and how a stateful
> >> decoder consumes buffers.
> >>
> >> The most generic case is that an encoder produces a bitstream and just fills each
> >> CAPTURE buffer to the brim before continuing with the next buffer.
> >>
> >> I don't think there are drivers that do this, I believe that all drivers just
> >> output a single compressed frame. For interlaced formats I understand it is either
> >> one compressed field per buffer, or two compressed fields per buffer (this is
> >> what I heard, I don't know if this is true).
> >>
> >> In any case, I don't think this is specified anywhere. Please correct me if I am
> >> wrong.
> >>
> >> The latest stateful codec spec is here:
> >>
> >> https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-mem2mem.html
> >>
> >> Assuming what I described above is indeed the case, then I think this should
> >> be documented. I don't know enough if a flag is needed somewhere to describe
> >> the behavior for interlaced formats, or can we leave this open and have userspace
> >> detect this?
> >>
> >
> > From Chromium perspective, we don't have any use case for encoding
> > interlaced contents, so we'll be okay with whatever the interested
> > parties decide on. :)
> >
> >>
> >> For decoders it is more complicated. The stateful decoder spec is written with
> >> the assumption that userspace can just fill each OUTPUT buffer to the brim with
> >> the compressed bitstream. I.e., no need to split at frame or other boundaries.
> >>
> >> See section 4.5.1.7 in the spec.
> >>
> >> But I understand that various HW decoders *do* have limitations. I would really
> >> like to know about those, since that needs to be exposed to userspace somehow.
> >
> > AFAIK mtk-vcodec needs H.264 SPS and PPS to be split into their own
> > separate buffers. I believe it also needs 1 buffer to contain exactly
> > 1 frame and 1 frame to be fully contained inside 1 buffer.
> >
> > Venus also needed 1 buffer to contain exactly 1 frame and 1 frame to
> > be fully contained inside 1 buffer. It used to have some specific
> > requirements regarding SPS and PPS too, but I think that was fixed in
> > the firmware.
> >
> >>
> >> Specifically, the venus decoder needs to know the resolution of the coded video
> >> beforehand
> >
> > I don't think that's true for venus. It does parsing and can detect
> > the resolution.
> >
> > However that's probably the case for coda...
> >
> >> and it expects a single frame per buffer (how does that work for
> >> interlaced formats?).
> >>
> >> Such requirements mean that some userspace parsing is still required, so these
> >> decoders are not completely stateful.
> >>
> >> Can every codec author give information about their decoder/encoder?
> >>
> >> I'll start off with my virtual codec driver:
> >>
> >> vicodec: the decoder fully parses the bitstream. The encoder produces a single
> >> compressed frame per buffer. This driver doesn't yet support interlaced formats,
> >> but when that is added it will encode one field per buffer.
> >>
> >> Let's see what the results are.
> >
> > s5p-mfc:
> >  decoder: fully parses the bitstream,
> >  encoder: produces a single frame per buffer (haven't tested interlaced stuff)
> >
> > mtk-vcodec:
> >  decoder: expects separate buffers for SPS, PPS and full frames
> > (including some random stuff like SEIMessage),
>
> Do you mean that the SPS/PPS etc. should all be in separate buffers? I.e.
> you can't combine SPS and PPS in a single buffer?

Exactly that. It's obviously a firmware bug, but we haven't been able
to get that fixed.
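
A rough sketch of what userspace would have to do for such a decoder,
assuming an H.264 Annex B buffer as input: put each SPS and each PPS
into a buffer of its own and keep the remaining NAL units (SEI, slices)
together as the frame. The helper names are hypothetical and this is not
taken from any existing application or from the mtk-vcodec driver.

```python
START_CODE = b"\x00\x00\x00\x01"
SPS, PPS = 7, 8  # H.264 nal_unit_type values


def split_nals(buf: bytes) -> list:
    """Split an Annex B buffer into NAL units without their start codes."""
    starts = []
    pos = buf.find(b"\x00\x00\x01")
    while pos != -1:
        starts.append(pos)
        pos = buf.find(b"\x00\x00\x01", pos + 3)
    nals = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(buf)
        # A NAL unit never ends in 0x00, so trailing zeros belong to the
        # next (4-byte) start code or are padding and can be dropped.
        nal = buf[start + 3:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def split_for_separate_headers(buf: bytes) -> list:
    """Return one buffer per SPS/PPS plus one buffer for the frame data."""
    buffers, frame = [], bytearray()
    for nal in split_nals(buf):
        if (nal[0] & 0x1F) in (SPS, PPS):
            if frame:  # keep stream order if headers follow frame data
                buffers.append(bytes(frame))
                frame = bytearray()
            buffers.append(START_CODE + nal)
        else:
            frame += START_CODE + nal
    if frame:
        buffers.append(bytes(frame))
    return buffers
```

Each returned element would then be queued as its own OUTPUT buffer.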

Best regards,
Tomasz


end of thread, other threads:[~2019-07-11 12:50 UTC | newest]

Thread overview: 18+ messages
2019-06-28 14:34 [RFC] Stateful codecs and requirements for compressed formats Hans Verkuil
2019-06-28 15:21 ` Dave Stevenson
2019-06-28 15:48   ` Nicolas Dufresne
2019-06-29 10:02     ` Dave Stevenson
2019-06-29 12:55       ` Nicolas Dufresne
2019-06-28 16:18 ` Nicolas Dufresne
2019-06-28 18:09 ` Nicolas Dufresne
2019-07-03  8:46   ` Tomasz Figa
2019-07-03 17:43     ` Nicolas Dufresne
2019-07-10  8:43   ` Hans Verkuil
2019-07-11  1:40     ` Nicolas Dufresne
2019-07-03  8:32 ` Tomasz Figa
2019-07-03 14:46   ` Philipp Zabel
2019-07-03 17:46   ` Nicolas Dufresne
2019-07-10  9:14   ` Hans Verkuil
2019-07-11 12:49     ` Tomasz Figa
2019-07-11  1:42   ` Nicolas Dufresne
2019-07-11 12:47     ` Tomasz Figa
