* Stateful Video Decoder interface vs M2M framework
@ 2021-02-01 12:48 Dave Stevenson
  2021-02-01 13:57 ` Alexandre Courbot
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Stevenson @ 2021-02-01 12:48 UTC (permalink / raw)
  To: Linux Media Mailing List

Hi All.

I'm currently doing battle with the stateful video decoder API for
video decode on the Raspberry Pi.

Reading the descriptive docs[1] there is no obligation to
STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
makes my colleague who is working on the userspace side happy as it
saves a config step of allocating buffers that are never needed.

I have been using the v4l2_mem2mem framework, same as some other
decoders. We use v4l2_m2m in the buffered mode as it's actually
remoted over to the VPU via the MMAL API, and so the src and dest are
asynchronous from V4L2's perspective.

Said colleague then complained that he couldn't follow the flow
described in the docs linked above as it never produced the
V4L2_EVENT_SOURCE_CHANGE event.

Digging into it, it's the v4l2_mem2mem framework stopping me.
__v4l2_m2m_try_queue[2] has
    if (!m2m_ctx->out_q_ctx.q.streaming
        || !m2m_ctx->cap_q_ctx.q.streaming) {
        dprintk("Streaming needs to be on for both queues\n");
        return;
    }
So I'm never going to get any of the OUTPUT buffers even queued to my
driver until STREAMON(CAPTURE). That contradicts the documentation :-(

Now I can see that on a non-buffered M2M device you have to have both
OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
tweaking that one clause to
    if (!m2m_ctx->out_q_ctx.buffered &&
        (!m2m_ctx->out_q_ctx.q.streaming ||
         !m2m_ctx->cap_q_ctx.q.streaming)) {
solves the problem, but is that a generic solution? I don't have any
other platforms to test against.

However it poses a larger question for my colleague as to what
behaviour he can rely on in userspace. Is there a way for userspace to
know whether it is permitted on a specific codec implementation to
follow the docs and not STREAMON(CAPTURE) until
V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
many devices.

On a very brief survey of a few existing drivers I see:
- the Coda driver uses v4l2_m2m in the same mode as my driver, so
presumably it's not currently going to be following the documentation
but the above change should make it do so.
- meson/vdec doesn't work in buffered mode, so it's never going to be
able to follow the docs. There is a TODO stating that (although
implying it's mainly on MPEG1 & 2).
- mtk-vcodec is also not in buffered mode, and appears to have
secondary checks that it has both a src and dst buffer before passing
anything to the hardware. Is there any way for that to be able to
follow the docs?

Guidance please.

Thanks
  Dave

[1] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-decoder.html#initialization
[2] https://git.linuxtv.org/media_tree.git/tree/drivers/media/v4l2-core/v4l2-mem2mem.c#n303


* Re: Stateful Video Decoder interface vs M2M framework
  2021-02-01 12:48 Stateful Video Decoder interface vs M2M framework Dave Stevenson
@ 2021-02-01 13:57 ` Alexandre Courbot
  2021-02-01 17:09   ` Dave Stevenson
  0 siblings, 1 reply; 6+ messages in thread
From: Alexandre Courbot @ 2021-02-01 13:57 UTC (permalink / raw)
  To: Dave Stevenson; +Cc: Linux Media Mailing List

Hi,

On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
<dave.stevenson@raspberrypi.com> wrote:
>
> Hi All.
>
> I'm currently doing battle with the stateful video decoder API for
> video decode on the Raspberry Pi.
>
> Reading the descriptive docs[1] there is no obligation to
> STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> makes my colleague who is working on the userspace side happy as it
> saves a config step of allocating buffers that are never needed.
>
> I have been using the v4l2_mem2mem framework, same as some other
> decoders. We use v4l2_m2m in the buffered mode as it's actually
> remoted over to the VPU via the MMAL API, and so the src and dest are
> asynchronous from V4L2's perspective.
>
> Said colleague then complained that he couldn't follow the flow
> described in the docs linked above as it never produced the
> V4L2_EVENT_SOURCE_CHANGE event.
>
> Digging into it, it's the v4l2_mem2mem framework stopping me.
> __v4l2_m2m_try_queue[2] has
>     if (!m2m_ctx->out_q_ctx.q.streaming
>         || !m2m_ctx->cap_q_ctx.q.streaming) {
>         dprintk("Streaming needs to be on for both queues\n");
>         return;
>     }
> So I'm never going to get any of the OUTPUT buffers even queued to my
> driver until STREAMON(CAPTURE). That contradicts the documentation :-(
>
> Now I can see that on a non-buffered M2M device you have to have both
> OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> tweaking that one clause to
>     if (!m2m_ctx->out_q_ctx.buffered &&
>         (!m2m_ctx->out_q_ctx.q.streaming ||
>          !m2m_ctx->cap_q_ctx.q.streaming)) {
> solves the problem, but is that a generic solution? I don't have any
> other platforms to test against.

As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
jobs until both queues are streaming. This is one point where stateful
decoders do not fit with the expectation from M2M that 1 input buffer
== 1 output buffer. How to work around this limitation depends on the
design of the underlying hardware, but for example the mtk-vcodec
driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
hook:

https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220

This allows the hardware to start processing the stream until it
reports the expected resolution, after which the CAPTURE buffers can
be allocated and the CAPTURE queue started.

Queueing OUTPUT buffers in the buf_queue hook can be done as a general
rule if the hardware expects input and output buffers to be queued
independently. You can also switch to a more traditional M2M scheme
once both queues are running if this fits the driver better.

>
> However it poses a larger question for my colleague as to what
> behaviour he can rely on in userspace. Is there a way for userspace to
> know whether it is permitted on a specific codec implementation to
> follow the docs and not STREAMON(CAPTURE) until
> V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> many devices.

The documentation should be correct for most if not all stateful
decoders in the tree. I know firsthand that at least mtk-vcodec and
venus are compliant.

Cheers,
Alex.


* Re: Stateful Video Decoder interface vs M2M framework
  2021-02-01 13:57 ` Alexandre Courbot
@ 2021-02-01 17:09   ` Dave Stevenson
  2021-02-02 13:16     ` Alexandre Courbot
                       ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Dave Stevenson @ 2021-02-01 17:09 UTC (permalink / raw)
  To: Alexandre Courbot; +Cc: Linux Media Mailing List

Hi Alex

Thanks for the response.

On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@chromium.org> wrote:
>
> Hi,
>
> On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> <dave.stevenson@raspberrypi.com> wrote:
> >
> > Hi All.
> >
> > I'm currently doing battle with the stateful video decoder API for
> > video decode on the Raspberry Pi.
> >
> > Reading the descriptive docs[1] there is no obligation to
> > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > makes my colleague who is working on the userspace side happy as it
> > saves a config step of allocating buffers that are never needed.
> >
> > I have been using the v4l2_mem2mem framework, same as some other
> > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > remoted over to the VPU via the MMAL API, and so the src and dest are
> > asynchronous from V4L2's perspective.
> >
> > Said colleague then complained that he couldn't follow the flow
> > described in the docs linked above as it never produced the
> > V4L2_EVENT_SOURCE_CHANGE event.
> >
> > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > __v4l2_m2m_try_queue[2] has
> >     if (!m2m_ctx->out_q_ctx.q.streaming
> >         || !m2m_ctx->cap_q_ctx.q.streaming) {
> >         dprintk("Streaming needs to be on for both queues\n");
> >         return;
> >     }
> > So I'm never going to get any of the OUTPUT buffers even queued to my
> > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> >
> > Now I can see that on a non-buffered M2M device you have to have both
> > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > tweaking that one clause to
> >     if (!m2m_ctx->out_q_ctx.buffered &&
> >         (!m2m_ctx->out_q_ctx.q.streaming ||
> >          !m2m_ctx->cap_q_ctx.q.streaming)) {
> > solves the problem, but is that a generic solution? I don't have any
> > other platforms to test against.
>
> As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> jobs until both queues are streaming. This is one point where stateful
> decoders do not fit with the expectation from M2M that 1 input buffer
> == 1 output buffer. How to work around this limitation depends on the
> design of the underlying hardware, but for example the mtk-vcodec
> driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> hook:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
>
> This allows the hardware to start processing the stream until it
> reports the expected resolution, after which the CAPTURE buffers can
> be allocated and the CAPTURE queue started.
>
> Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> rule if the hardware expects input and output buffers to be queued
> independently. You can also switch to a more traditional M2M scheme
> once both queues are running if this fits the driver better.

Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
active in buffered mode, or an oversight?

I'm happy to switch to a custom qbuf on the OUTPUT queue if we must,
but at least for drivers using the buffered mode of v4l2_m2m it seems
unnecessary except for this one conditional.

> >
> > However it poses a larger question for my colleague as to what
> > behaviour he can rely on in userspace. Is there a way for userspace to
> > know whether it is permitted on a specific codec implementation to
> > follow the docs and not STREAMON(CAPTURE) until
> > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > many devices.
>
> The documentation should be correct for most if not all stateful
> decoders in the tree. I know firsthand that at least mtk-vcodec and
> venus are compliant.

mtk-vcodec I can see as being compliant now - thanks for your explanation.

venus appears to ignore the v4l2_m2m job scheduling side (device_run
is empty), has a good rummage in the v4l2_m2m buffer queues from
venus_helper_process_initial_out_bufs, and then has a custom vb2_ops
buf_queue to queue the buffer with v4l2_m2m and then immediately kick
the processing thread.

Now knowing where to look, coda appears to have a custom code path in
coda_buf_queue to handle the startup phase, and then drops into a more
generic path once initialised.

v4l2_m2m seems to be so close to doing what is needed for the
stateless decoders that it seems odd that it requires what looks
like bodging in all of them. I guess it's just the way
things have evolved over time.

Thanks again.
  Dave


* Re: Stateful Video Decoder interface vs M2M framework
  2021-02-01 17:09   ` Dave Stevenson
@ 2021-02-02 13:16     ` Alexandre Courbot
  2021-02-02 16:48     ` Nicolas Dufresne
  2021-02-04 13:01     ` Ezequiel Garcia
  2 siblings, 0 replies; 6+ messages in thread
From: Alexandre Courbot @ 2021-02-02 13:16 UTC (permalink / raw)
  To: Dave Stevenson; +Cc: Linux Media Mailing List

Hi Dave,

On Tue, Feb 2, 2021 at 2:09 AM Dave Stevenson
<dave.stevenson@raspberrypi.com> wrote:
>
> Hi Alex
>
> Thanks for the response.
>
> On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> > <dave.stevenson@raspberrypi.com> wrote:
> > >
> > > Hi All.
> > >
> > > I'm currently doing battle with the stateful video decoder API for
> > > video decode on the Raspberry Pi.
> > >
> > > Reading the descriptive docs[1] there is no obligation to
> > > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > > makes my colleague who is working on the userspace side happy as it
> > > saves a config step of allocating buffers that are never needed.
> > >
> > > I have been using the v4l2_mem2mem framework, same as some other
> > > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > > remoted over to the VPU via the MMAL API, and so the src and dest are
> > > asynchronous from V4L2's perspective.
> > >
> > > Said colleague then complained that he couldn't follow the flow
> > > described in the docs linked above as it never produced the
> > > V4L2_EVENT_SOURCE_CHANGE event.
> > >
> > > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > > __v4l2_m2m_try_queue[2] has
> > >     if (!m2m_ctx->out_q_ctx.q.streaming
> > >         || !m2m_ctx->cap_q_ctx.q.streaming) {
> > >         dprintk("Streaming needs to be on for both queues\n");
> > >         return;
> > >     }
> > > So I'm never going to get any of the OUTPUT buffers even queued to my
> > > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> > >
> > > Now I can see that on a non-buffered M2M device you have to have both
> > > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > > tweaking that one clause to
> > >     if (!m2m_ctx->out_q_ctx.buffered &&
> > >         (!m2m_ctx->out_q_ctx.q.streaming ||
> > >          !m2m_ctx->cap_q_ctx.q.streaming)) {
> > > solves the problem, but is that a generic solution? I don't have any
> > > other platforms to test against.
> >
> > As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> > jobs until both queues are streaming. This is one point where stateful
> > decoders do not fit with the expectation from M2M that 1 input buffer
> > == 1 output buffer. How to work around this limitation depends on the
> > design of the underlying hardware, but for example the mtk-vcodec
> > driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> > hook:
> >
> > https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
> >
> > This allows the hardware to start processing the stream until it
> > reports the expected resolution, after which the CAPTURE buffers can
> > be allocated and the CAPTURE queue started.
> >
> > Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> > rule if the hardware expects input and output buffers to be queued
> > independently. You can also switch to a more traditional M2M scheme
> > once both queues are running if this fits the driver better.
>
> Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
> active in buffered mode, or an oversight?

IIUC that's what the M2M framework expects, historically. It needs
both ends of "memory to memory" in order to do its work, and this
matched well with the first devices using this framework. Stateful
decoders are also "memory to memory", but in the broader sense, both
queues being asynchronous. So while it saves some effort to reuse some
of the M2M helpers, the whole framework cannot be used as it was
originally intended.

>
> I'm happy to switch to a custom qbuf on the OUTPUT queue if we must,
> but at least for drivers using the buffered mode of v4l2_m2m it seems
> unnecessary except for this one conditional.
>
> > >
> > > However it poses a larger question for my colleague as to what
> > > behaviour he can rely on in userspace. Is there a way for userspace to
> > > know whether it is permitted on a specific codec implementation to
> > > follow the docs and not STREAMON(CAPTURE) until
> > > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > > many devices.
> >
> > The documentation should be correct for most if not all stateful
> > decoders in the tree. I know firsthand that at least mtk-vcodec and
> > venus are compliant.
>
> mtk-vcodec I can see as being compliant now - thanks for your explanation.
>
> venus appears to ignore the v4l2_m2m job scheduling side (device_run
> is empty), has a good rummage in the v4l2_m2m buffer queues from
> venus_helper_process_initial_out_bufs, and then has a custom vb2_ops
> buf_queue to queue the buffer with v4l2_m2m and then immediately kick
> the processing thread.
>
> Now knowing where to look, coda appears to have a custom code path in
> coda_buf_queue to handle the startup phase, and then drops into a more
> generic path once initialised.
>
> v4l2_m2m seems to be so close to doing what is needed for the
> stateless decoders that it seems odd that it requires what looks
> like bodging in all of them. I guess it's just the way
> things have evolved over time.

Yeah, what we really need here are encoder and stateful/stateless
decoder helpers (that may or may not rely on M2M) that manage the
queues as expected, while also enforcing the specification. Coming
up with a design that works for all drivers is not easy, though.

Cheers,
Alex.


* Re: Stateful Video Decoder interface vs M2M framework
  2021-02-01 17:09   ` Dave Stevenson
  2021-02-02 13:16     ` Alexandre Courbot
@ 2021-02-02 16:48     ` Nicolas Dufresne
  2021-02-04 13:01     ` Ezequiel Garcia
  2 siblings, 0 replies; 6+ messages in thread
From: Nicolas Dufresne @ 2021-02-02 16:48 UTC (permalink / raw)
  To: Dave Stevenson, Alexandre Courbot; +Cc: Linux Media Mailing List

Le lundi 01 février 2021 à 17:09 +0000, Dave Stevenson a écrit :
> Hi Alex
> 
> Thanks for the response.
> 
> On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@chromium.org> wrote:
> > 
> > Hi,
> > 
> > On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> > <dave.stevenson@raspberrypi.com> wrote:
> > > 
> > > Hi All.
> > > 
> > > I'm currently doing battle with the stateful video decoder API for
> > > video decode on the Raspberry Pi.
> > > 
> > > Reading the descriptive docs[1] there is no obligation to
> > > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > > makes my colleague who is working on the userspace side happy as it
> > > saves a config step of allocating buffers that are never needed.
> > > 
> > > I have been using the v4l2_mem2mem framework, same as some other
> > > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > > remoted over to the VPU via the MMAL API, and so the src and dest are
> > > asynchronous from V4L2's perspective.
> > > 
> > > Said colleague then complained that he couldn't follow the flow
> > > described in the docs linked above as it never produced the
> > > V4L2_EVENT_SOURCE_CHANGE event.
> > > 
> > > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > > __v4l2_m2m_try_queue[2] has
> > >     if (!m2m_ctx->out_q_ctx.q.streaming
> > >         || !m2m_ctx->cap_q_ctx.q.streaming) {
> > >         dprintk("Streaming needs to be on for both queues\n");
> > >         return;
> > >     }
> > > So I'm never going to get any of the OUTPUT buffers even queued to my
> > > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> > > 
> > > Now I can see that on a non-buffered M2M device you have to have both
> > > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > > tweaking that one clause to
> > >     if (!m2m_ctx->out_q_ctx.buffered &&
> > >         (!m2m_ctx->out_q_ctx.q.streaming ||
> > >          !m2m_ctx->cap_q_ctx.q.streaming)) {
> > > solves the problem, but is that a generic solution? I don't have any
> > > other platforms to test against.
> > 
> > As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> > jobs until both queues are streaming. This is one point where stateful
> > decoders do not fit with the expectation from M2M that 1 input buffer
> > == 1 output buffer. How to work around this limitation depends on the
> > design of the underlying hardware, but for example the mtk-vcodec
> > driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> > hook:
> > 
> > https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
> > 
> > This allows the hardware to start processing the stream until it
> > reports the expected resolution, after which the CAPTURE buffers can
> > be allocated and the CAPTURE queue started.
> > 
> > Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> > rule if the hardware expects input and output buffers to be queued
> > independently. You can also switch to a more traditional M2M scheme
> > once both queues are running if this fits the driver better.
> 
> Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
> active in buffered mode, or an oversight?
> 
> I'm happy to switch to a custom qbuf on the OUTPUT queue if we must,
> but at least for drivers using the buffered mode of v4l2_m2m it seems
> unnecessary except for this one conditional.
> 
> > > 
> > > However it poses a larger question for my colleague as to what
> > > behaviour he can rely on in userspace. Is there a way for userspace to
> > > know whether it is permitted on a specific codec implementation to
> > > follow the docs and not STREAMON(CAPTURE) until
> > > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > > many devices.
> > 
> > The documentation should be correct for most if not all stateful
> > decoders in the tree. I know firsthand that at least mtk-vcodec and
> > venus are compliant.
> 
> mtk-vcodec I can see as being compliant now - thanks for your explanation.
> 
> venus appears to ignore the v4l2_m2m job scheduling side (device_run
> is empty), has a good rummage in the v4l2_m2m buffer queues from
> venus_helper_process_initial_out_bufs, and then has a custom vb2_ops
> buf_queue to queue the buffer with v4l2_m2m and then immediately kick
> the processing thread.
> 
> Now knowing where to look, coda appears to have a custom code path in
> coda_buf_queue to handle the startup phase, and then drops into a more
> generic path once initialised.

Just a warning: CODA does not really implement the source-change event the way
it should. The author did not have the specification, hence didn't know how to
get the frame resolution from the HW. At the moment it only works with a
special userspace that passes the resolution at setup time (which, per the
spec, should be ignored in favour of the HW-reported sizes).

You might not be aware that this event-driven mechanism was added well after
most of the drivers were merged into staging. The legacy method, which I don't
recommend, was that userspace would queue the initial header data on OUTPUT and
then call G_FMT, which would block until the data was processed. If I remember
right, the MFC driver (the first driver of this type) started the processing
right inside the G_FMT call.

I'm currently working on general support for source-change on the GStreamer
side (not only for codecs), which should allow reviving:

https://gitlab.freedesktop.org/gstreamer/gst-plugins-good/-/merge_requests/240

That is a left-behind attempt to implement compliance with the recent spec in
GStreamer, which I know is software available on the RPi.

> 
> v4l2_m2m seems to be so close to doing what is needed for the
> stateless decoders that it seems odd that it requires what looks
> like bodging in all of them. I guess it's just the way
> things have evolved over time.
> 
> Thanks again.
>   Dave




* Re: Stateful Video Decoder interface vs M2M framework
  2021-02-01 17:09   ` Dave Stevenson
  2021-02-02 13:16     ` Alexandre Courbot
  2021-02-02 16:48     ` Nicolas Dufresne
@ 2021-02-04 13:01     ` Ezequiel Garcia
  2 siblings, 0 replies; 6+ messages in thread
From: Ezequiel Garcia @ 2021-02-04 13:01 UTC (permalink / raw)
  To: Dave Stevenson; +Cc: Alexandre Courbot, Linux Media Mailing List

Hi Dave,

Thanks for bringing up this topic. I've been wanting
to discuss working on making v4l2-mem2mem less simplistic
for quite some time now :-)

On Mon, 1 Feb 2021 at 14:13, Dave Stevenson
<dave.stevenson@raspberrypi.com> wrote:
>
> Hi Alex
>
> Thanks for the response.
>
> On Mon, 1 Feb 2021 at 13:58, Alexandre Courbot <acourbot@chromium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Feb 1, 2021 at 9:49 PM Dave Stevenson
> > <dave.stevenson@raspberrypi.com> wrote:
> > >
> > > Hi All.
> > >
> > > I'm currently doing battle with the stateful video decoder API for
> > > video decode on the Raspberry Pi.
> > >
> > > Reading the descriptive docs[1] there is no obligation to
> > > STREAMON(CAPTURE) before feeding in OUTPUT buffers and waiting for
> > > V4L2_EVENT_SOURCE_CHANGE to configure the CAPTURE queue. Great! It
> > > makes my colleague who is working on the userspace side happy as it
> > > saves a config step of allocating buffers that are never needed.
> > >
> > > I have been using the v4l2_mem2mem framework, same as some other
> > > decoders. We use v4l2_m2m in the buffered mode as it's actually
> > > remoted over to the VPU via the MMAL API, and so the src and dest are
> > > asynchronous from V4L2's perspective.
> > >
> > > Said colleague then complained that he couldn't follow the flow
> > > described in the docs linked above as it never produced the
> > > V4L2_EVENT_SOURCE_CHANGE event.
> > >
> > > Digging into it, it's the v4l2_mem2mem framework stopping me.
> > > __v4l2_m2m_try_queue[2] has
> > >     if (!m2m_ctx->out_q_ctx.q.streaming
> > >         || !m2m_ctx->cap_q_ctx.q.streaming) {
> > >         dprintk("Streaming needs to be on for both queues\n");
> > >         return;
> > >     }
> > > So I'm never going to get any of the OUTPUT buffers even queued to my
> > > driver until STREAMON(CAPTURE). That contradicts the documentation :-(
> > >

Although this will sound obvious, let's make a distinction
between the API (which applications can rely on, and drivers
must comply with) and an internal component (v4l2-m2m)
which is 100% subject to change, and just there to offer
some boilerplate to drivers.

IOW, nothing forces drivers to use v4l2-m2m,
and nothing prevents us from figuring out a better framework.

> > > Now I can see that on a non-buffered M2M device you have to have both
> > > OUTPUT and CAPTURE enabled because it wants to produce a CAPTURE
> > > buffer for every OUTPUT buffer on a 1:1 basis. On a buffered codec
> > > tweaking that one clause to
> > >     if (!m2m_ctx->out_q_ctx.buffered &&
> > >         (!m2m_ctx->out_q_ctx.q.streaming ||
> > >          !m2m_ctx->cap_q_ctx.q.streaming)) {
> > > solves the problem, but is that a generic solution? I don't have any
> > > other platforms to test against.
> >
> > As you said you cannot rely on the v4l2_m2m_try_schedule() to run the
> > jobs until both queues are streaming. This is one point where stateful
> > decoders do not fit with the expectation from M2M that 1 input buffer
> > == 1 output buffer. How to work around this limitation depends on the
> > design of the underlying hardware, but for example the mtk-vcodec
> > driver passes the OUTPUT buffers to its hardware in its vb2 buf_queue
> > hook:
> >
> > https://elixir.bootlin.com/linux/latest/source/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c#L1220
> >
> > This allows the hardware to start processing the stream until it
> > reports the expected resolution, after which the CAPTURE buffers can
> > be allocated and the CAPTURE queue started.
> >
> > Queueing OUTPUT buffers in the buf_queue hook can be done as a general
> > rule if the hardware expects input and output buffers to be queued
> > independently. You can also switch to a more traditional M2M scheme
> > once both queues are running if this fits the driver better.
>
> Is it deliberate that v4l2_m2m_try_schedule enforces both queues being
> active in buffered mode, or an oversight?
>
> I'm happy to switch to a custom qbuf on the OUTPUT queue if we must,
> but at least for drivers using the buffered mode of v4l2_m2m it seems
> unnecessary except for this one conditional.
>
> > >
> > > However it poses a larger question for my colleague as to what
> > > behaviour he can rely on in userspace. Is there a way for userspace to
> > > know whether it is permitted on a specific codec implementation to
> > > follow the docs and not STREAMON(CAPTURE) until
> > > V4L2_EVENT_SOURCE_CHANGE? If not then the documentation is invalid for
> > > many devices.
> >

Applications should have a way of retrieving the supported events,
unless I'm missing something in the API. But OTOH,
supporting V4L2_EVENT_SOURCE_CHANGE
is mandatory for decoders that comply with the stateful interface.

Yet another thing to keep in mind is that the stateful interface
was officially merged in 2019. I'm unsure how clear
the "de-facto standard" was to older driver authors.

> > The documentation should be correct for most if not all stateful
> > decoders in the tree. I know firsthand that at least mtk-vcodec and
> > venus are compliant.
>
> mtk-vcodec I can see as being compliant now - thanks for your explanation.
>
> venus appears to ignore the v4l2_m2m job scheduling side (device_run
> is empty), has a good rummage in the v4l2_m2m buffer queues from
> venus_helper_process_initial_out_bufs, and then has a custom vb2_ops
> buf_queue to queue the buffer with v4l2_m2m and then immediately kick
> the processing thread.
>

This is another good indication that v4l2_m2m needs a redesign
to fit more sophisticated use-cases.

> Now knowing where to look, coda appears to have a custom code path in
> coda_buf_queue to handle the startup phase, and then drops into a more
> generic path once initialised.
>
> v4l2_m2m seems to be so close to doing what is needed for the
> stateless decoders that it seems odd that it requires what looks
> like bodging in all of them. I guess it's just the way
> things have evolved over time.
>

I don't think the *stateless interface* involves
V4L2_EVENT_SOURCE_CHANGE. The application should be
able to deal with dynamic resolution changes without it.

In any case, revisiting v4l2-m2m is something we
should tackle sooner or later.

If not to support the API around V4L2 events,
then to support devices that need more than just one source
and one sink buffer for each job (venus and rpivid look like
good examples of this).

Thanks,
Ezequiel

