linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
@ 2019-02-13  5:53 Alexandre Courbot
  2019-02-13  5:58 ` Alexandre Courbot
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Courbot @ 2019-02-13  5:53 UTC (permalink / raw)
  To: Tomasz Figa, Paul Kocialkowski, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Pawel Osciak, linux-media
  Cc: linux-kernel, Alexandre Courbot

Documents the protocol that user-space should follow when
communicating with stateless video decoders.

The stateless video decoding API makes use of the new request and tags
APIs. While it has been implemented with the Cedrus driver so far, it
should probably still be considered staging for a short while.

Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
---
Changes since v2:

* Go back to one frame worth of encoded data as the quantity to be passed per
  request, while leaving the door open to other quantities.

* Remove requirement for the client to not queue CAPTURE buffers until all
  the frames referencing them have been decoded, and specify that
  reference frames can be queued back as soon as all the encoded buffers
  of frames depending on them have been queued.

Changes since v1:

* Use timestamps instead of tags to reference frames,
* Applied Paul's suggestions to not require one frame worth of data per OUTPUT
  buffer

 Documentation/media/uapi/v4l/dev-mem2mem.rst  |   5 +
 .../media/uapi/v4l/dev-stateless-decoder.rst  | 377 ++++++++++++++++++
 2 files changed, 382 insertions(+)
 create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst

diff --git a/Documentation/media/uapi/v4l/dev-mem2mem.rst b/Documentation/media/uapi/v4l/dev-mem2mem.rst
index 67a980818dc8..db6f4efc458d 100644
--- a/Documentation/media/uapi/v4l/dev-mem2mem.rst
+++ b/Documentation/media/uapi/v4l/dev-mem2mem.rst
@@ -13,6 +13,11 @@
 Video Memory-To-Memory Interface
 ********************************
 
+.. toctree::
+    :maxdepth: 1
+
+    dev-stateless-decoder
+
 A V4L2 memory-to-memory device can compress, decompress, transform, or
 otherwise convert video data from one format into another format, in memory.
 Such memory-to-memory devices set the ``V4L2_CAP_VIDEO_M2M`` or
diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
new file mode 100644
index 000000000000..403b0004f050
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
@@ -0,0 +1,377 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _stateless_decoder:
+
+**************************************************
+Memory-to-memory Stateless Video Decoder Interface
+**************************************************
+
+A stateless decoder is a decoder that works without retaining any kind of state
+between processed frames. This means that each frame is decoded independently
+of any previous and future frames, and that the client is responsible for
+maintaining the decoding state and providing it to the decoder with each
+decoding request. This is in contrast to the stateful video decoder interface,
+where the hardware and driver maintain the decoding state and all the client
+has to do is to provide the raw encoded stream and dequeue decoded frames in
+display order.
+
+This section describes how user-space ("the client") is expected to communicate
+with stateless decoders in order to successfully decode an encoded stream.
+Compared to stateful codecs, the decoder/client sequence is simpler, but the
+cost of this simplicity is extra complexity in the client which must maintain a
+consistent decoding state.
+
+Stateless decoders make use of the request API. A stateless decoder must expose
+the ``V4L2_BUF_CAP_SUPPORTS_REQUESTS`` capability on its ``OUTPUT`` queue when
+:c:func:`VIDIOC_REQBUFS` or :c:func:`VIDIOC_CREATE_BUFS` are invoked.
+
+Querying capabilities
+=====================
+
+1. To enumerate the set of coded formats supported by the decoder, the client
+   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
+
+   * The driver must always return the full set of supported ``OUTPUT`` formats,
+     irrespective of the format currently set on the ``CAPTURE`` queue.
+
+   * Simultaneously, the driver must restrain the set of values returned by
+     codec-specific capability controls (such as H.264 profiles) to the set
+     actually supported by the hardware.
+
+2. To enumerate the set of supported raw formats, the client calls
+   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
+
+   * The driver must return only the formats supported for the format currently
+     active on the ``OUTPUT`` queue.
+
+   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
+     formats may depend on the value of some controls (e.g. parsed format
+     headers) which are codec-dependent. The client is responsible for making
+     sure that these controls are set before querying the ``CAPTURE`` queue.
+     Failure to do so will result in the default values for these controls being
+     used, and a returned set of formats that may not be usable for the media
+     the client is trying to decode.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing desired pixel format in
+   :c:type:`v4l2_frmsizeenum`'s ``pixel_format``.
+
+4. Supported profiles and levels for the current ``OUTPUT`` format, if
+   applicable, may be queried using their respective controls via
+   :c:func:`VIDIOC_QUERYCTRL`.
+
+Initialization
+==============
+
+1. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.
+
+     ``pixelformat``
+         a coded pixel format.
+
+     ``width``, ``height``
+         coded width and height parsed from the stream.
+
+     other fields
+         follow standard semantics.
+
+   .. note::
+
+      Changing the ``OUTPUT`` format may change the currently set ``CAPTURE``
+      format. The driver will derive a new ``CAPTURE`` format from the
+      ``OUTPUT`` format being set, including resolution, colorimetry
+      parameters, etc. If the client needs a specific ``CAPTURE`` format,
+      it must adjust it afterwards.
+
+2. Call :c:func:`VIDIOC_S_EXT_CTRLS` to set all the controls (parsed headers,
+   etc.) required by the ``OUTPUT`` format to enumerate the ``CAPTURE`` formats.
+
+3. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get the format for the
+   destination buffers parsed/decoded from the bitstream.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
+
+   * **Returned fields:**
+
+     ``width``, ``height``
+         frame buffer resolution for the decoded frames.
+
+     ``pixelformat``
+         pixel format for decoded frames.
+
+     ``num_planes`` (for _MPLANE ``type`` only)
+         number of planes for pixelformat.
+
+     ``sizeimage``, ``bytesperline``
+         as per standard semantics; matching frame buffer format.
+
+   .. note::
+
+      The value of ``pixelformat`` may be any pixel format supported for the
+      ``OUTPUT`` format, based on the hardware capabilities. It is suggested
+      that driver chooses the preferred/optimal format for the current
+      configuration. For example, a YUV format may be preferred over an RGB
+      format, if an additional conversion step would be required for RGB.
+
+4. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
+   the ``CAPTURE`` queue. The client may use this ioctl to discover which
+   alternative raw formats are supported for the current ``OUTPUT`` format and
+   select one of them via :c:func:`VIDIOC_S_FMT`.
+
+   .. note::
+
+      The driver will return only formats supported for the currently selected
+      ``OUTPUT`` format, even if more formats may be supported by the decoder in
+      general.
+
+      For example, a decoder may support YUV and RGB formats for
+      resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
+      to hardware limitations). After setting a resolution of 1920x1088 or lower
+      as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
+      YUV and RGB pixel formats, but after setting a resolution higher than
+      1920x1088, the driver will not return RGB pixel formats, since they are
+      unsupported for this resolution.
+
+5. *[optional]* Choose a different ``CAPTURE`` format than suggested via
+   :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
+   choose a different format than selected/suggested by the driver in
+   :c:func:`VIDIOC_G_FMT`.
+
+    * **Required fields:**
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
+
+      ``pixelformat``
+          a raw pixel format.
+
+6. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
+   ``OUTPUT`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero.
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``.
+
+      ``memory``
+          follows standard semantics.
+
+    * **Return fields:**
+
+      ``count``
+          actual number of buffers allocated.
+
+    * If required, the driver will adjust ``count`` to be equal or bigger to the
+      minimum of required number of ``OUTPUT`` buffers for the given format and
+      requested count. The client must check this value after the ioctl returns
+      to get the actual number of buffers allocated.
+
+7. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
+   ``CAPTURE`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero. The client
+          is responsible for deducing the minimum number of buffers required
+          for the stream to be properly decoded (taking e.g. reference frames
+          into account) and pass an equal or bigger number.
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``.
+
+      ``memory``
+          follows standard semantics. ``V4L2_MEMORY_USERPTR`` is not supported
+          for ``CAPTURE`` buffers.
+
+    * **Return fields:**
+
+      ``count``
+          adjusted to allocated number of buffers, in case the codec requires
+          more buffers than requested.
+
+    * The driver must adjust count to the minimum of required number of
+      ``CAPTURE`` buffers for the current format, stream configuration and
+      requested count. The client must check this value after the ioctl
+      returns to get the number of buffers allocated.
+
+8. Allocate requests (likely one per ``OUTPUT`` buffer) via
+    :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
+
+9. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
+    :c:func:`VIDIOC_STREAMON`.
+
+Decoding
+========
+
+For each frame, the client is responsible for submitting at least one request to
+which the following is attached:
+
+* The amount of encoded data expected by the codec for its current
+  configuration, as a buffer submitted to the ``OUTPUT`` queue. Typically, this
+  corresponds to one frame worth of encoded data, but some formats may allow (or
+  require) different amounts per unit.
+* All the metadata needed to decode the submitted encoded data, in the form of
+  controls relevant to the format being decoded.
+
+The amount and contents of the source ``OUTPUT`` buffer, as well as the controls
+that must be set on the request, depend on the active coded pixel format and
+might be affected by codec-specific extended controls, as stated in
+documentation of each format.
+
+A typical frame would thus be decoded using the following sequence:
+
+1. Queue an ``OUTPUT`` buffer containing one unit of encoded bitstream data for
+   the decoding request, using :c:func:`VIDIOC_QBUF`.
+
+    * **Required fields:**
+
+      ``index``
+          index of the buffer being queued.
+
+      ``type``
+          type of the buffer.
+
+      ``bytesused``
+          number of bytes taken by the encoded data frame in the buffer.
+
+      ``flags``
+          the ``V4L2_BUF_FLAG_REQUEST_FD`` flag must be set.
+
+      ``request_fd``
+          must be set to the file descriptor of the decoding request.
+
+      ``timestamp``
+          must be set to a unique value per frame. This value will be propagated
+          into the decoded frame's buffer and can also be used to use this frame
+          as the reference of another.
+
+2. Set the codec-specific controls for the decoding request, using
+   :c:func:`VIDIOC_S_EXT_CTRLS`.
+
+    * **Required fields:**
+
+      ``which``
+          must be ``V4L2_CTRL_WHICH_REQUEST_VAL``.
+
+      ``request_fd``
+          must be set to the file descriptor of the decoding request.
+
+      other fields
+          other fields are set as usual when setting controls. The ``controls``
+          array must contain all the codec-specific controls required to decode
+          a frame.
+
+   .. note::
+
+      It is possible to specify the controls in different invocations of
+      :c:func:`VIDIOC_S_EXT_CTRLS`, or to overwrite a previously set control, as
+      long as ``request_fd`` and ``which`` are properly set. The controls state
+      at the moment of request submission is the one that will be considered.
+
+   .. note::
+
+      The order in which steps 1 and 2 take place is interchangeable.
+
+3. Submit the request by invoking :c:func:`MEDIA_REQUEST_IOC_QUEUE` on the
+   request FD.
+
+    If the request is submitted without an ``OUTPUT`` buffer, or if some of the
+    required controls are missing from the request, then
+    :c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
+    ``OUTPUT`` buffer is queued, then it will return ``-EINVAL``.
+    :c:func:`MEDIA_REQUEST_IOC_QUEUE` returning non-zero means that no
+    ``CAPTURE`` buffer will be produced for this request.
+
+``CAPTURE`` buffers must not be part of the request, and are queued
+independently. They are returned in decode order (i.e. the same order as coded
+frames were submitted to the ``OUTPUT`` queue).
+
+Runtime decoding errors are signaled by the dequeued ``CAPTURE`` buffers
+carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If a decoded reference frame has an
+error, then all following decoded frames that refer to it also have the
+``V4L2_BUF_FLAG_ERROR`` flag set, although the decoder will still try to
+produce a (likely corrupted) frame.
+
+Buffer management while decoding
+================================
+Contrary to stateful decoders, a stateless decoder does not perform any kind of
+buffer management: it only guarantees that dequeued ``CAPTURE`` buffers can be
+used by the client for as long as they are not queued again. "Used" here
+encompasses using the buffer for compositing or display.
+
+A dequeued capture buffer can also be used as the reference frame of another
+buffer.
+
+A frame is specified as reference by converting its timestamp into nanoseconds,
+and storing it into the relevant member of a codec-dependent control structure.
+The :c:func:`v4l2_timeval_to_ns` function must be used to perform that
+conversion. The timestamp of a frame can be used to reference it as soon as all
+its units of encoded data are successfully submitted to the ``OUTPUT`` queue.
+
+Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
+soon as all the frames they are affecting have been queued to the ``OUTPUT``
+queue. The driver will refrain from using the reference buffer as a decoding
+target until all the frames depending on it are decoded.
+
+Also, when queuing a decoding request, the driver will increase the reference
+count of all the resources associated with reference frames. This means that the
+client can e.g. close the DMABUF file descriptors of reference frame buffers if
+it won't need it afterwards.
+
+Seeking
+=======
+In order to seek, the client just needs to submit requests using input buffers
+corresponding to the new stream position. It must however be aware that
+resolution may have changed and follow the dynamic resolution change sequence in
+that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
+for H.264) may have changed and the client is responsible for making sure that a
+valid state is sent to the decoder.
+
+The client is then free to ignore any returned ``CAPTURE`` buffer that comes
+from the pre-seek position.
+
+Pause
+=====
+
+In order to pause, the client can just cease queuing buffers onto the ``OUTPUT``
+queue. Without source bitstream data, there is no data to process and the codec
+will remain idle.
+
+Dynamic resolution change
+=========================
+
+If the client detects a resolution change in the stream, it will need to perform
+the initialization sequence again with the new resolution:
+
+1. Wait until all submitted requests have completed and dequeue the
+   corresponding output buffers.
+
+2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
+   queues.
+
+3. Free all ``CAPTURE`` buffers by calling :c:func:`VIDIOC_REQBUFS` on the
+   ``CAPTURE`` queue with a buffer count of zero.
+
+4. Perform the initialization sequence again (minus the allocation of
+   ``OUTPUT`` buffers), with the new resolution set on the ``OUTPUT`` queue.
+   Note that due to resolution constraints, a different format may need to be
+   picked on the ``CAPTURE`` queue.
+
+Drain
+=====
+
+In order to drain the stream on a stateless decoder, the client just needs to
+wait until all the submitted requests are completed. There is no need to send a
+``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
+decoder.
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-13  5:53 [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
@ 2019-02-13  5:58 ` Alexandre Courbot
  2019-02-13  8:59   ` Hans Verkuil
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Courbot @ 2019-02-13  5:58 UTC (permalink / raw)
  To: Tomasz Figa, Paul Kocialkowski, Maxime Ripard, Hans Verkuil,
	Mauro Carvalho Chehab, Pawel Osciak, Linux Media Mailing List
  Cc: LKML

On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
> [snip]
> +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
> +soon as all the frames they are affecting have been queued to the ``OUTPUT``
> +queue. The driver will refrain from using the reference buffer as a decoding
> +target until all the frames depending on it are decoded.

Just want to highlight this part in order to make sure that this is
indeed what we agreed on. The recent changes to vb2_find_timestamp()
suggest this, but maybe I misunderstood the intent. It makes the
kernel responsible for tracking referenced buffers and not using them
until all the dependent frames are decoded, something the client could
also do.

Thanks!
Alex.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-13  5:58 ` Alexandre Courbot
@ 2019-02-13  8:59   ` Hans Verkuil
  2019-02-13  9:20     ` Paul Kocialkowski
  0 siblings, 1 reply; 8+ messages in thread
From: Hans Verkuil @ 2019-02-13  8:59 UTC (permalink / raw)
  To: Alexandre Courbot, Tomasz Figa, Paul Kocialkowski, Maxime Ripard,
	Mauro Carvalho Chehab, Pawel Osciak, Linux Media Mailing List
  Cc: LKML

On 2/13/19 6:58 AM, Alexandre Courbot wrote:
> On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
>> [snip]
>> +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
>> +soon as all the frames they are affecting have been queued to the ``OUTPUT``
>> +queue. The driver will refrain from using the reference buffer as a decoding
>> +target until all the frames depending on it are decoded.
> 
> Just want to highlight this part in order to make sure that this is
> indeed what we agreed on. The recent changes to vb2_find_timestamp()
> suggest this, but maybe I misunderstood the intent. It makes the
> kernel responsible for tracking referenced buffers and not using them
> until all the dependent frames are decoded, something the client could
> also do.

I don't think this is quite right. Once this patch https://patchwork.linuxtv.org/patch/54275/
is in the vb2 core will track when a buffer can no longer be used as a
reference buffer because the underlying memory might have disappeared.

The core does not check if it makes sense to use a buffer as a reference
frame, just that it is valid memory.

So the driver has to check that the timestamp refers to an existing
buffer, but userspace has to check that it queues everything in the
right order and that the reference buffer won't be overwritten
before the last output buffer using that reference buffer has been
decoded.

So I would say that the second sentence in your paragraph is wrong.

The first sentence isn't quite right either, but I am not really sure how
to phrase it. It is possible to queue a reference buffer even if
not all output buffers referring to it have been decoded, provided
that by the time the driver starts to use this buffer this actually
has happened.

But this is an optimization and theoretically it can depend on the
driver behavior. It is always safe to only queue a reference frame
when all frames depending on it have been decoded. So I am leaning
towards not complicating matters and keeping your first sentence
as-is.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-13  8:59   ` Hans Verkuil
@ 2019-02-13  9:20     ` Paul Kocialkowski
  2019-02-13  9:57       ` Hans Verkuil
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Kocialkowski @ 2019-02-13  9:20 UTC (permalink / raw)
  To: Hans Verkuil, Alexandre Courbot, Tomasz Figa, Maxime Ripard,
	Mauro Carvalho Chehab, Pawel Osciak, Linux Media Mailing List
  Cc: LKML

Hi,

On Wed, 2019-02-13 at 09:59 +0100, Hans Verkuil wrote:
> On 2/13/19 6:58 AM, Alexandre Courbot wrote:
> > On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
> > > [snip]
> > > +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
> > > +soon as all the frames they are affecting have been queued to the ``OUTPUT``
> > > +queue. The driver will refrain from using the reference buffer as a decoding
> > > +target until all the frames depending on it are decoded.
> > 
> > Just want to highlight this part in order to make sure that this is
> > indeed what we agreed on. The recent changes to vb2_find_timestamp()
> > suggest this, but maybe I misunderstood the intent. It makes the
> > kernel responsible for tracking referenced buffers and not using them
> > until all the dependent frames are decoded, something the client could
> > also do.
> 
> I don't think this is quite right. Once this patch https://patchwork.linuxtv.org/patch/54275/
> is in the vb2 core will track when a buffer can no longer be used as a
> reference buffer because the underlying memory might have disappeared.
> 
> The core does not check if it makes sense to use a buffer as a reference
> frame, just that it is valid memory.
> 
> So the driver has to check that the timestamp refers to an existing
> buffer, but userspace has to check that it queues everything in the
> right order and that the reference buffer won't be overwritten
> before the last output buffer using that reference buffer has been
> decoded.
> 
> So I would say that the second sentence in your paragraph is wrong.
> 
> The first sentence isn't quite right either, but I am not really sure how
> to phrase it. It is possible to queue a reference buffer even if
> not all output buffers referring to it have been decoded, provided
> that by the time the driver starts to use this buffer this actually
> has happened.

Is there a way we can guarantee this? Looking at the rest of the spec,
it says that capture buffers "are returned in decode order" but that
doesn't imply that they are picked up in the order they are queued.

It seems quite troublesome for the core to check whether each queued
capture buffer is used as a reference for one of the queued requests to
decide whether to pick it up or not.

> But this is an optimization and theoretically it can depend on the
> driver behavior. It is always safe to only queue a reference frame
> when all frames depending on it have been decoded. So I am leaning
> towards not complicating matters and keeping your first sentence
> as-is.

Yes, I believe it would be much simpler to require userspace to only
queue capture buffers once they are no longer needed as references.

This also means that the dmabuf fd can't be changed so we are
guaranteed that memory will still be there.

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-13  9:20     ` Paul Kocialkowski
@ 2019-02-13  9:57       ` Hans Verkuil
  2019-02-13 11:04         ` Paul Kocialkowski
  0 siblings, 1 reply; 8+ messages in thread
From: Hans Verkuil @ 2019-02-13  9:57 UTC (permalink / raw)
  To: Paul Kocialkowski, Alexandre Courbot, Tomasz Figa, Maxime Ripard,
	Mauro Carvalho Chehab, Pawel Osciak, Linux Media Mailing List
  Cc: LKML

On 2/13/19 10:20 AM, Paul Kocialkowski wrote:
> Hi,
> 
> On Wed, 2019-02-13 at 09:59 +0100, Hans Verkuil wrote:
>> On 2/13/19 6:58 AM, Alexandre Courbot wrote:
>>> On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
>>>> [snip]
>>>> +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
>>>> +soon as all the frames they are affecting have been queued to the ``OUTPUT``
>>>> +queue. The driver will refrain from using the reference buffer as a decoding
>>>> +target until all the frames depending on it are decoded.
>>>
>>> Just want to highlight this part in order to make sure that this is
>>> indeed what we agreed on. The recent changes to vb2_find_timestamp()
>>> suggest this, but maybe I misunderstood the intent. It makes the
>>> kernel responsible for tracking referenced buffers and not using them
>>> until all the dependent frames are decoded, something the client could
>>> also do.
>>
>> I don't think this is quite right. Once this patch https://patchwork.linuxtv.org/patch/54275/
>> is in the vb2 core will track when a buffer can no longer be used as a
>> reference buffer because the underlying memory might have disappeared.
>>
>> The core does not check if it makes sense to use a buffer as a reference
>> frame, just that it is valid memory.
>>
>> So the driver has to check that the timestamp refers to an existing
>> buffer, but userspace has to check that it queues everything in the
>> right order and that the reference buffer won't be overwritten
>> before the last output buffer using that reference buffer has been
>> decoded.
>>
>> So I would say that the second sentence in your paragraph is wrong.
>>
>> The first sentence isn't quite right either, but I am not really sure how
>> to phrase it. It is possible to queue a reference buffer even if
>> not all output buffers referring to it have been decoded, provided
>> that by the time the driver starts to use this buffer this actually
>> has happened.
> 
> Is there a way we can guarantee this? Looking at the rest of the spec,
> it says that capture buffers "are returned in decode order" but that
> doesn't imply that they are picked up in the order they are queued.
> 
> It seems quite troublesome for the core to check whether each queued
> capture buffer is used as a reference for one of the queued requests to
> decide whether to pick it up or not.

The core only checks that the timestamp points to a valid buffer.

It is not up to the core or the driver to do anything else. If userspace
gives a reference to a wrong buffer or one that is already overwritten,
then you just get bad decoded video, but nothing crashes.

> 
>> But this is an optimization and theoretically it can depend on the
>> driver behavior. It is always safe to only queue a reference frame
>> when all frames depending on it have been decoded. So I am leaning
>> towards not complicating matters and keeping your first sentence
>> as-is.
> 
> Yes, I believe it would be much simpler to require userspace to only
> queue capture buffers once they are no longer needed as references.

I think that's what we should document, but in cases where you know
the hardware (i.e. an embedded system) it should be allowed to optimize
and have the application queue a capture buffer containing a reference
frame even if it is still in use by already queued output buffers.

That way you can achieve optimal speed and memory usage.

I think this is a desirable feature.

> This also means that the dmabuf fd can't be changed so we are
> guaranteed that memory will still be there.

This is easy enough to check for, so I rather have some checks in
the core for this than prohibiting optimizing the decoder memory
usage.

Regards,

	Hans

> 
> Cheers,
> 
> Paul
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-13  9:57       ` Hans Verkuil
@ 2019-02-13 11:04         ` Paul Kocialkowski
  2019-02-26  3:33           ` Alexandre Courbot
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Kocialkowski @ 2019-02-13 11:04 UTC (permalink / raw)
  To: Hans Verkuil, Alexandre Courbot, Tomasz Figa, Maxime Ripard,
	Mauro Carvalho Chehab, Pawel Osciak, Linux Media Mailing List
  Cc: LKML

Hi,

On Wed, 2019-02-13 at 10:57 +0100, Hans Verkuil wrote:
> On 2/13/19 10:20 AM, Paul Kocialkowski wrote:
> > Hi,
> > 
> > On Wed, 2019-02-13 at 09:59 +0100, Hans Verkuil wrote:
> > > On 2/13/19 6:58 AM, Alexandre Courbot wrote:
> > > > On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
> > > > > [snip]
> > > > > +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
> > > > > +soon as all the frames they are affecting have been queued to the ``OUTPUT``
> > > > > +queue. The driver will refrain from using the reference buffer as a decoding
> > > > > +target until all the frames depending on it are decoded.
> > > > 
> > > > Just want to highlight this part in order to make sure that this is
> > > > indeed what we agreed on. The recent changes to vb2_find_timestamp()
> > > > suggest this, but maybe I misunderstood the intent. It makes the
> > > > kernel responsible for tracking referenced buffers and not using them
> > > > until all the dependent frames are decoded, something the client could
> > > > also do.
> > > 
> > > I don't think this is quite right. Once this patch https://patchwork.linuxtv.org/patch/54275/
> > > is in the vb2 core will track when a buffer can no longer be used as a
> > > reference buffer because the underlying memory might have disappeared.
> > > 
> > > The core does not check if it makes sense to use a buffer as a reference
> > > frame, just that it is valid memory.
> > > 
> > > So the driver has to check that the timestamp refers to an existing
> > > buffer, but userspace has to check that it queues everything in the
> > > right order and that the reference buffer won't be overwritten
> > > before the last output buffer using that reference buffer has been
> > > decoded.
> > > 
> > > So I would say that the second sentence in your paragraph is wrong.
> > > 
> > > The first sentence isn't quite right either, but I am not really sure how
> > > to phrase it. It is possible to queue a reference buffer even if
> > > not all output buffers referring to it have been decoded, provided
> > > that by the time the driver starts to use this buffer this actually
> > > has happened.
> > 
> > Is there a way we can guarantee this? Looking at the rest of the spec,
> > it says that capture buffers "are returned in decode order" but that
> > doesn't imply that they are picked up in the order they are queued.
> > 
> > It seems quite troublesome for the core to check whether each queued
> > capture buffer is used as a reference for one of the queued requests to
> > decide whether to pick it up or not.
> 
> The core only checks that the timestamp points to a valid buffer.
> 
> It is not up to the core or the driver to do anything else. If userspace
> gives a reference to a wrong buffer or one that is already overwritten,
> then you just get bad decoded video, but nothing crashes.

Yes, that makes sense. My concern was mainly about cases where the
capture buffers could be consumed by the driver in a different order
than they are queued by userspace (which could lead to the reference
buffer being reused too early). But thinking about it twice, I don't
see a reason why this could happen.

> > > But this is an optimization and theoretically it can depend on the
> > > driver behavior. It is always safe to only queue a reference frame
> > > when all frames depending on it have been decoded. So I am leaning
> > > towards not complicating matters and keeping your first sentence
> > > as-is.
> > 
> > Yes, I believe it would be much simpler to require userspace to only
> > queue capture buffers once they are no longer needed as references.
> 
> I think that's what we should document, but in cases where you know
> the hardware (i.e. an embedded system) it should be allowed to optimize
> and have the application queue a capture buffer containing a reference
> frame even if it is still in use by already queued output buffers.
> 
> That way you can achieve optimal speed and memory usage.
> 
> I think this is a desirable feature.

Yes, definitely.

> > This also means that the dmabuf fd can't be changed so we are
> > guaranteed that memory will still be there.
> 
> This is easy enough to check for, so I rather have some checks in
> the core for this than prohibiting optimizing the decoder memory
> usage.

Cheers,

Paul

-- 
Paul Kocialkowski, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-13 11:04         ` Paul Kocialkowski
@ 2019-02-26  3:33           ` Alexandre Courbot
  2019-02-28 10:14             ` Hans Verkuil
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandre Courbot @ 2019-02-26  3:33 UTC (permalink / raw)
  To: Paul Kocialkowski
  Cc: Hans Verkuil, Tomasz Figa, Maxime Ripard, Mauro Carvalho Chehab,
	Pawel Osciak, Linux Media Mailing List, LKML

Hi, sorry for the delayed reply!

On Wed, Feb 13, 2019 at 8:04 PM Paul Kocialkowski
<paul.kocialkowski@bootlin.com> wrote:
>
> Hi,
>
> On Wed, 2019-02-13 at 10:57 +0100, Hans Verkuil wrote:
> > On 2/13/19 10:20 AM, Paul Kocialkowski wrote:
> > > Hi,
> > >
> > > On Wed, 2019-02-13 at 09:59 +0100, Hans Verkuil wrote:
> > > > On 2/13/19 6:58 AM, Alexandre Courbot wrote:
> > > > > On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
> > > > > > [snip]
> > > > > > +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
> > > > > > +soon as all the frames they are affecting have been queued to the ``OUTPUT``
> > > > > > +queue. The driver will refrain from using the reference buffer as a decoding
> > > > > > +target until all the frames depending on it are decoded.
> > > > >
> > > > > Just want to highlight this part in order to make sure that this is
> > > > > indeed what we agreed on. The recent changes to vb2_find_timestamp()
> > > > > suggest this, but maybe I misunderstood the intent. It makes the
> > > > > kernel responsible for tracking referenced buffers and not using them
> > > > > until all the dependent frames are decoded, something the client could
> > > > > also do.
> > > >
> > > > I don't think this is quite right. Once this patch https://patchwork.linuxtv.org/patch/54275/
> > > > is in the vb2 core will track when a buffer can no longer be used as a
> > > > reference buffer because the underlying memory might have disappeared.
> > > >
> > > > The core does not check if it makes sense to use a buffer as a reference
> > > > frame, just that it is valid memory.
> > > >
> > > > So the driver has to check that the timestamp refers to an existing
> > > > buffer, but userspace has to check that it queues everything in the
> > > > right order and that the reference buffer won't be overwritten
> > > > before the last output buffer using that reference buffer has been
> > > > decoded.
> > > >
> > > > So I would say that the second sentence in your paragraph is wrong.
> > > >
> > > > The first sentence isn't quite right either, but I am not really sure how
> > > > to phrase it. It is possible to queue a reference buffer even if
> > > > not all output buffers referring to it have been decoded, provided
> > > > that by the time the driver starts to use this buffer this actually
> > > > has happened.
> > >
> > > Is there a way we can guarantee this? Looking at the rest of the spec,
> > > it says that capture buffers "are returned in decode order" but that
> > > doesn't imply that they are picked up in the order they are queued.
> > >
> > > It seems quite troublesome for the core to check whether each queued
> > > capture buffer is used as a reference for one of the queued requests to
> > > decide whether to pick it up or not.
> >
> > The core only checks that the timestamp points to a valid buffer.
> >
> > It is not up to the core or the driver to do anything else. If userspace
> > gives a reference to a wrong buffer or one that is already overwritten,
> > then you just get bad decoded video, but nothing crashes.
>
> Yes, that makes sense. My concern was mainly about cases where the
> capture buffers could be consumed by the driver in a different order
> than they are queued by userspace (which could lead to the reference
> buffer being reused too early). But thinking about it twice, I don't
> see a reason why this could happen.

Do we have a guarantee that it won't happen though? AFAICT the
behavior that CAPTURE buffers must be processed in queue order is not
documented anywhere, and not guaranteed by VB2 (even though
implementation-wise it may currently be the case). So with the current
state of the specification, the only safe wording I can use is "do not
queue a reference buffer back until all the frames depending on it
have been dequeued".

However, as Hans mentioned it would be nice to be able to assume that
the capture queue is FIFO and let user-space rely in that fact to
queue buffers containing reference frames earlier.

>
> > > > But this is an optimization and theoretically it can depend on the
> > > > driver behavior. It is always safe to only queue a reference frame
> > > > when all frames depending on it have been decoded. So I am leaning
> > > > towards not complicating matters and keeping your first sentence
> > > > as-is.
> > >
> > > Yes, I believe it would be much simpler to require userspace to only
> > > queue capture buffers once they are no longer needed as references.
> >
> > I think that's what we should document, but in cases where you know
> > the hardware (i.e. an embedded system) it should be allowed to optimize
> > and have the application queue a capture buffer containing a reference
> > frame even if it is still in use by already queued output buffers.
> >
> > That way you can achieve optimal speed and memory usage.
> >
> > I think this is a desirable feature.
>
> Yes, definitely.

I guess the question comes down to "how can user-space know that the
hardware supports this"? Do we have a flag that we can return to
signal this behavior? Or can we just define the CAPTURE queue as being
FIFO for stateless codecs? The latter would make sense IMHO.

Cheers,
Alex.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface
  2019-02-26  3:33           ` Alexandre Courbot
@ 2019-02-28 10:14             ` Hans Verkuil
  0 siblings, 0 replies; 8+ messages in thread
From: Hans Verkuil @ 2019-02-28 10:14 UTC (permalink / raw)
  To: Alexandre Courbot, Paul Kocialkowski
  Cc: Tomasz Figa, Maxime Ripard, Mauro Carvalho Chehab, Pawel Osciak,
	Linux Media Mailing List, LKML

On 2/26/19 4:33 AM, Alexandre Courbot wrote:
> Hi, sorry for the delayed reply!
> 
> On Wed, Feb 13, 2019 at 8:04 PM Paul Kocialkowski
> <paul.kocialkowski@bootlin.com> wrote:
>>
>> Hi,
>>
>> On Wed, 2019-02-13 at 10:57 +0100, Hans Verkuil wrote:
>>> On 2/13/19 10:20 AM, Paul Kocialkowski wrote:
>>>> Hi,
>>>>
>>>> On Wed, 2019-02-13 at 09:59 +0100, Hans Verkuil wrote:
>>>>> On 2/13/19 6:58 AM, Alexandre Courbot wrote:
>>>>>> On Wed, Feb 13, 2019 at 2:53 PM Alexandre Courbot <acourbot@chromium.org> wrote:
>>>>>>> [snip]
>>>>>>> +Buffers used as reference frames can be queued back to the ``CAPTURE`` queue as
>>>>>>> +soon as all the frames they are affecting have been queued to the ``OUTPUT``
>>>>>>> +queue. The driver will refrain from using the reference buffer as a decoding
>>>>>>> +target until all the frames depending on it are decoded.
>>>>>>
>>>>>> Just want to highlight this part in order to make sure that this is
>>>>>> indeed what we agreed on. The recent changes to vb2_find_timestamp()
>>>>>> suggest this, but maybe I misunderstood the intent. It makes the
>>>>>> kernel responsible for tracking referenced buffers and not using them
>>>>>> until all the dependent frames are decoded, something the client could
>>>>>> also do.
>>>>>
>>>>> I don't think this is quite right. Once this patch https://patchwork.linuxtv.org/patch/54275/
>>>>> is in the vb2 core will track when a buffer can no longer be used as a
>>>>> reference buffer because the underlying memory might have disappeared.
>>>>>
>>>>> The core does not check if it makes sense to use a buffer as a reference
>>>>> frame, just that it is valid memory.
>>>>>
>>>>> So the driver has to check that the timestamp refers to an existing
>>>>> buffer, but userspace has to check that it queues everything in the
>>>>> right order and that the reference buffer won't be overwritten
>>>>> before the last output buffer using that reference buffer has been
>>>>> decoded.
>>>>>
>>>>> So I would say that the second sentence in your paragraph is wrong.
>>>>>
>>>>> The first sentence isn't quite right either, but I am not really sure how
>>>>> to phrase it. It is possible to queue a reference buffer even if
>>>>> not all output buffers referring to it have been decoded, provided
>>>>> that by the time the driver starts to use this buffer this actually
>>>>> has happened.
>>>>
>>>> Is there a way we can guarantee this? Looking at the rest of the spec,
>>>> it says that capture buffers "are returned in decode order" but that
>>>> doesn't imply that they are picked up in the order they are queued.
>>>>
>>>> It seems quite troublesome for the core to check whether each queued
>>>> capture buffer is used as a reference for one of the queued requests to
>>>> decide whether to pick it up or not.
>>>
>>> The core only checks that the timestamp points to a valid buffer.
>>>
>>> It is not up to the core or the driver to do anything else. If userspace
>>> gives a reference to a wrong buffer or one that is already overwritten,
>>> then you just get bad decoded video, but nothing crashes.
>>
>> Yes, that makes sense. My concern was mainly about cases where the
>> capture buffers could be consumed by the driver in a different order
>> than they are queued by userspace (which could lead to the reference
>> buffer being reused too early). But thinking about it twice, I don't
>> see a reason why this could happen.
> 
> Do we have a guarantee that it won't happen though? AFAICT the
> behavior that CAPTURE buffers must be processed in queue order is not
> documented anywhere, and not guaranteed by VB2 (even though
> implementation-wise it may currently be the case). So with the current
> state of the specification, the only safe wording I can use is "do not
> queue a reference buffer back until all the frames depending on it
> have been dequeued".
> 
> However, as Hans mentioned it would be nice to be able to assume that
> the capture queue is FIFO and let user-space rely in that fact to
> queue buffers containing reference frames earlier.

I would not be opposed to adding a capability that explicitly states that
the given vb2 queue is always ordered. It would always be true for drivers
using the v4l2-mem2mem framework (and can be set there).

Unordered queues make no sense for m2m devices, at least I cannot think
of any use-case for it.

> 
>>
>>>>> But this is an optimization and theoretically it can depend on the
>>>>> driver behavior. It is always safe to only queue a reference frame
>>>>> when all frames depending on it have been decoded. So I am leaning
>>>>> towards not complicating matters and keeping your first sentence
>>>>> as-is.
>>>>
>>>> Yes, I believe it would be much simpler to require userspace to only
>>>> queue capture buffers once they are no longer needed as references.
>>>
>>> I think that's what we should document, but in cases where you know
>>> the hardware (i.e. an embedded system) it should be allowed to optimize
>>> and have the application queue a capture buffer containing a reference
>>> frame even if it is still in use by already queued output buffers.
>>>
>>> That way you can achieve optimal speed and memory usage.
>>>
>>> I think this is a desirable feature.
>>
>> Yes, definitely.
> 
> I guess the question comes down to "how can user-space know that the
> hardware supports this"? Do we have a flag that we can return to
> signal this behavior? Or can we just define the CAPTURE queue as being
> FIFO for stateless codecs? The latter would make sense IMHO.

As far as I know all m2m devices are ordered today. As are all video
output devices. For video capture devices I know of one driver that is unordered:
cobalt. As far as I know all other video capture devices that use vb2 or the
old videobuf are all ordered. A few very old drivers that do not use these frameworks
would have to be reviewed to see if they do anything weird.

I think the cobalt driver can be modified so that it is ordered as well.

There are three options: add a V4L2_BUF_CAP_IS_ORDERED flag, add a
V4L2_BUF_CAP_IS_UNORDERED flag, or just document and require that all v4l2 devices
are ordered (nobody cares about the cobalt driver, it's Cisco internal hardware).

I'm honestly not certain which of these options is the best, keeping in mind that
we have no idea if reordering might be needed in the future.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-02-28 10:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-13  5:53 [PATCH v3] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
2019-02-13  5:58 ` Alexandre Courbot
2019-02-13  8:59   ` Hans Verkuil
2019-02-13  9:20     ` Paul Kocialkowski
2019-02-13  9:57       ` Hans Verkuil
2019-02-13 11:04         ` Paul Kocialkowski
2019-02-26  3:33           ` Alexandre Courbot
2019-02-28 10:14             ` Hans Verkuil

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).