All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
@ 2018-10-04  8:11 Alexandre Courbot
  2018-10-04 12:41 ` Paul Kocialkowski
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Alexandre Courbot @ 2018-10-04  8:11 UTC (permalink / raw)
  To: Tomasz Figa, Paul Kocialkowski, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, linux-media
  Cc: linux-kernel, Alexandre Courbot

This patch documents the protocol that user-space should follow when
communicating with stateless video decoders. It is based on the
following references:

* The current protocol used by Chromium (converted from config store to
  request API)

* The submitted Cedrus VPU driver

As such, some things may not be entirely consistent with the current
state of drivers, so it would be great if all stakeholders could point
out these inconsistencies. :)

This patch is supposed to be applied on top of the Request API V18 as
well as the memory-to-memory video decoder interface series by Tomasz
Figa.

Changes since V1:

* Applied fixes received as feedback,
* Moved controls descriptions to the extended controls file,
* Document reference frame management and referencing (need Hans' feedback on
  that).

Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
---
 .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
 Documentation/media/uapi/v4l/devices.rst      |   1 +
 .../media/uapi/v4l/extended-controls.rst      |  25 ++
 .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
 4 files changed, 424 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst

diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
new file mode 100644
index 000000000000..e54246df18d0
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
@@ -0,0 +1,348 @@
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _stateless_decoder:
+
+**************************************************
+Memory-to-memory Stateless Video Decoder Interface
+**************************************************
+
+A stateless decoder is a decoder that works without retaining any kind of state
+between processing frames. This means that each frame is decoded independently
+of any previous and future frames, and that the client is responsible for
+maintaining the decoding state and providing it to the driver. This is in
+contrast to the stateful video decoder interface, where the hardware maintains
+the decoding state and all the client has to do is to provide the raw encoded
+stream.
+
+This section describes how user-space ("the client") is expected to communicate
+with such decoders in order to successfully decode an encoded stream. Compared
+to stateful codecs, the driver/client sequence is simpler, but the cost of this
+simplicity is extra complexity in the client which must maintain a consistent
+decoding state.
+
+Querying capabilities
+=====================
+
+1. To enumerate the set of coded formats supported by the driver, the client
+   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
+
+   * The driver must always return the full set of supported ``OUTPUT`` formats,
+     irrespective of the format currently set on the ``CAPTURE`` queue.
+
+2. To enumerate the set of supported raw formats, the client calls
+   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
+
+   * The driver must return only the formats supported for the format currently
+     active on the ``OUTPUT`` queue.
+
+   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
+     formats may depend on the value of some controls (e.g. H264 or VP9
+     profile). The client is responsible for making sure that these controls
+     are set to the desired value before querying the ``CAPTURE`` queue.
+
+   * In order to enumerate raw formats supported by a given coded format, the
+     client must thus set that coded format on the ``OUTPUT`` queue first, then
+     set any control listed on the format's description, and finally enumerate
+     the ``CAPTURE`` queue.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing desired pixel format in
+   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
+     must include all possible coded resolutions supported by the decoder
+     for given coded pixel format.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
+     must include all possible frame buffer resolutions supported by the
+     decoder for given raw pixel format and coded format currently set on
+     ``OUTPUT`` queue.
+
+    .. note::
+
+       The client may derive the supported resolution range for a
+       combination of coded and raw format by setting width and height of
+       ``OUTPUT`` format to 0 and calculating the intersection of
+       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
+       for the given coded and raw formats.
+
+4. Supported profiles and levels for given format, if applicable, may be
+   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
+
+Initialization
+==============
+
+1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
+   capability enumeration.
+
+2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``pixelformat``
+         a coded pixel format
+
+     ``width``, ``height``
+         coded width and height parsed from the stream
+
+     other fields
+         follow standard semantics
+
+   .. note::
+
+      Changing ``OUTPUT`` format may change currently set ``CAPTURE``
+      format. The driver will derive a new ``CAPTURE`` format from
+      ``OUTPUT`` format being set, including resolution, colorimetry
+      parameters, etc. If the client needs a specific ``CAPTURE`` format,
+      it must adjust it afterwards.
+
+3. Call :c:func:`VIDIOC_S_EXT_CTRLS` to set all the controls (profile, etc)
+   required by the ``OUTPUT`` format to enumerate the ``CAPTURE`` formats.
+
+4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
+   destination buffers parsed/decoded from the bitstream.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+   * **Return fields:**
+
+     ``width``, ``height``
+         frame buffer resolution for the decoded frames
+
+     ``pixelformat``
+         pixel format for decoded frames
+
+     ``num_planes`` (for _MPLANE ``type`` only)
+         number of planes for pixelformat
+
+     ``sizeimage``, ``bytesperline``
+         as per standard semantics; matching frame buffer format
+
+   .. note::
+
+      The value of ``pixelformat`` may be any pixel format supported for the
+      ``OUTPUT`` format, based on the hardware capabilities. It is suggested
+      that driver chooses the preferred/optimal format for given configuration.
+      For example, a YUV format may be preferred over an RGB format, if
+      additional conversion step would be required.
+
+5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
+   ``CAPTURE`` queue. The client may use this ioctl to discover which
+   alternative raw formats are supported for the current ``OUTPUT`` format and
+   select one of them via :c:func:`VIDIOC_S_FMT`.
+
+   .. note::
+
+      The driver will return only formats supported for the currently selected
+      ``OUTPUT`` format, even if more formats may be supported by the driver in
+      general.
+
+      For example, a driver/hardware may support YUV and RGB formats for
+      resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
+      to hardware limitations). After setting a resolution of 1920x1088 or lower
+      as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
+      YUV and RGB pixel formats, but after setting a resolution higher than
+      1920x1088, the driver will not return RGB, unsupported for this
+      resolution.
+
+6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
+   :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
+   choose a different format than selected/suggested by the driver in
+   :c:func:`VIDIOC_G_FMT`.
+
+    * **Required fields:**
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+      ``pixelformat``
+          a raw pixel format
+
+7. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
+   ``OUTPUT`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+      ``memory``
+          follows standard semantics
+
+      ``sizeimage``
+          follows standard semantics; the client is free to choose any
+          suitable size, however, it may be subject to change by the
+          driver
+
+    * **Return fields:**
+
+      ``count``
+          actual number of buffers allocated
+
+    * The driver must adjust count to minimum of required number of ``OUTPUT``
+      buffers for given format and count passed. The client must check this
+      value after the ioctl returns to get the number of buffers allocated.
+
+    .. note::
+
+       To allocate more than minimum number of buffers (for pipeline depth), use
+       G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
+       buffers required by the driver/format, and pass the obtained value plus
+       the number of additional buffers needed in count to
+       :c:func:`VIDIOC_REQBUFS`.
+
+8. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
+   ``CAPTURE`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+      ``memory``
+          follows standard semantics
+
+    * **Return fields:**
+
+      ``count``
+          adjusted to allocated number of buffers
+
+    * The driver must adjust count to minimum of required number of
+      destination buffers for given format and stream configuration and the
+      count passed. The client must check this value after the ioctl
+      returns to get the number of buffers allocated.
+
+    .. note::
+
+       To allocate more than minimum number of buffers (for pipeline
+       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
+       get minimum number of buffers required, and pass the obtained value
+       plus the number of additional buffers needed in count to
+       :c:func:`VIDIOC_REQBUFS`.
+
+9. Allocate requests (likely one per ``OUTPUT`` buffer) via
+    :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
+
+10. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
+    :c:func:`VIDIOC_STREAMON`.
+
+Decoding
+========
+
+For each frame, the client is responsible for submitting a request to which the
+following is attached:
+
+* Exactly one frame worth of encoded data in a buffer submitted to the
+  ``OUTPUT`` queue,
+* All the controls relevant to the format being decoded (see below for details).
+
+``CAPTURE`` buffers must not be part of the request, but must be queued
+independently. The driver will pick one of the queued ``CAPTURE`` buffers and
+decode the frame into it. Although the client has no control over which
+``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
+that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
+as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
+``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
+
+If the request is submitted without an ``OUTPUT`` buffer, then
+:c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
+buffer is queued, or if some of the required controls are missing, then it will
+return ``-EINVAL``. Decoding errors are signaled by the ``CAPTURE`` buffers
+being dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If the reference frame
+has an error, then all other frames that refer to it will also set the
+``V4L2_BUF_FLAG_ERROR`` flag.
+
+The contents of source ``OUTPUT`` buffers, as well as the controls that must be
+set on the request, depend on active coded pixel format and might be affected by
+codec-specific extended controls, as stated in documentation of each format.
+Currently supported formats are:
+
+Buffer management during decoding
+=================================
+Contrary to stateful decoder drivers, a stateless decoder driver does not
+perform any kind of buffer management. In particular, it guarantees that
+``CAPTURE`` buffers will be dequeued in the same order as they are queued. This
+allows user-space to know in advance which ``CAPTURE`` buffer will contain a
+given frame, and thus to use that buffer ID as the key to indicate a reference
+frame.
+
+This also means that user-space is fully responsible for not queuing a given
+``CAPTURE`` buffer for as long as it is used as a reference frame. Failure to do
+so will overwrite the reference frame's data while it is still in use, and
+result in visual corruption of future frames.
+
+Note that this applies to all types of buffers, and not only to
+``V4L2_MEMORY_MMAP`` ones, as drivers supporting ``V4L2_MEMORY_DMABUF`` will
+typically maintain a map of buffer IDs to DMABUF handles for reference frame
+management. Queueing a buffer will result in the map entry to be overwritten
+with the new DMABUF handle submitted in the :c:func:`VIDIOC_QBUF` ioctl.
+
+Seek
+====
+In order to seek, the client just needs to submit requests using input buffers
+corresponding to the new stream position. It must however be aware that
+resolution may have changed and follow the dynamic resolution change sequence in
+that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
+for H.264) may have changed and the client is responsible for making sure that
+a valid state is sent to the kernel.
+
+The client is then free to ignore any returned ``CAPTURE`` buffer that comes
+from the pre-seek position.
+
+Pause
+=====
+
+In order to pause, the client should just cease queuing buffers onto the
+``OUTPUT`` queue. This is different from the general V4L2 API definition of
+pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
+Without source bitstream data, there is no data to process and the hardware
+remains idle.
+
+Dynamic resolution change
+=========================
+
+If the client detects a resolution change in the stream, it will need to perform
+the initialization sequence again with the new resolution:
+
+1. Wait until all submitted requests have completed and dequeue the
+   corresponding output buffers.
+
+2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
+   queues.
+
+3. Free all buffers by calling :c:func:`VIDIOC_REQBUFS` on the
+   ``OUTPUT`` and ``CAPTURE`` queues with a buffer count of zero.
+
+Then perform the initialization sequence again, with the new resolution set
+on the ``OUTPUT`` queue. Note that due to resolution constraints, a different
+format may need to be picked on the ``CAPTURE`` queue.
+
+Drain
+=====
+
+In order to drain the stream on a stateless decoder, the client just needs to
+wait until all the submitted requests are completed. There is no need to send a
+``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
+driver.
+
+End of stream
+=============
+
+If the decoder encounters an end of stream marking in the stream, the
+driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
+are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
+:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
+behavior is identical to the drain sequence triggered by the client via
+``V4L2_DEC_CMD_STOP``.
diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
index 1822c66c2154..a8e568eda7d8 100644
--- a/Documentation/media/uapi/v4l/devices.rst
+++ b/Documentation/media/uapi/v4l/devices.rst
@@ -16,6 +16,7 @@ Interfaces
     dev-osd
     dev-codec
     dev-decoder
+    dev-stateless-decoder
     dev-encoder
     dev-effect
     dev-raw-vbi
diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
index a9252225b63e..9d06d853d4ff 100644
--- a/Documentation/media/uapi/v4l/extended-controls.rst
+++ b/Documentation/media/uapi/v4l/extended-controls.rst
@@ -810,6 +810,31 @@ enum v4l2_mpeg_video_bitrate_mode -
     otherwise the decoder expects a single frame in per buffer.
     Applicable to the decoder, all codecs.
 
+.. _v4l2-mpeg-h264:
+
+``V4L2_CID_MPEG_VIDEO_H264_SPS``
+    Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
+    the next queued frame. Applicable to the H.264 stateless decoder.
+
+``V4L2_CID_MPEG_VIDEO_H264_PPS``
+    Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
+    the next queued frame. Applicable to the H.264 stateless decoder.
+
+``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
+    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
+    matrix to use when decoding the next queued frame. Applicable to the H.264
+    stateless decoder.
+
+``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
+    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
+    entries as there are slices in the corresponding ``OUTPUT`` buffer.
+    Applicable to the H.264 stateless decoder.
+
+``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
+    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
+    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
+    decoder.
+
 ``V4L2_CID_MPEG_VIDEO_H264_VUI_SAR_ENABLE (boolean)``
     Enable writing sample aspect ratio in the Video Usability
     Information. Applicable to the H264 encoder.
diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
index a86b59f770dd..a03637fda8f9 100644
--- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
+++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
@@ -35,6 +35,42 @@ Compressed Formats
       - ``V4L2_PIX_FMT_H264``
       - 'H264'
       - H264 video elementary stream with start codes.
+    * .. _V4L2-PIX-FMT-H264-SLICE:
+
+      - ``V4L2_PIX_FMT_H264_SLICE``
+      - 'H264'
+      - H264 parsed slice data, as extracted from the H264 bitstream.
+        This format is adapted for stateless video decoders using the M2M and
+        Request APIs.
+
+        ``OUTPUT`` buffers must contain all the macroblock slices of a given
+        frame, i.e. if a frame requires several macroblock slices to be entirely
+        decoded, then all these slices must be provided. In addition, the
+        following metadata controls must be set on the request for each frame:
+
+        V4L2_CID_MPEG_VIDEO_H264_SPS
+           Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use
+           with the frame.
+
+        V4L2_CID_MPEG_VIDEO_H264_PPS
+           Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use
+           with the frame.
+
+        V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX
+           Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the
+           scaling matrix to use when decoding the frame.
+
+        V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
+           Array of struct v4l2_ctrl_h264_slice_param, containing at least as
+           many entries as there are slices in the corresponding ``OUTPUT``
+           buffer.
+
+        V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM
+           Instance of struct v4l2_ctrl_h264_decode_param, containing the
+           high-level decoding parameters for a H.264 frame.
+
+        See the :ref:`associated Codec Control IDs <v4l2-mpeg-h264>` for the
+        format of these controls.
     * .. _V4L2-PIX-FMT-H264-NO-SC:
 
       - ``V4L2_PIX_FMT_H264_NO_SC``
@@ -67,10 +103,20 @@ Compressed Formats
       - MPEG-2 parsed slice data, as extracted from the MPEG-2 bitstream.
 	This format is adapted for stateless video decoders that implement a
 	MPEG-2 pipeline (using the Memory to Memory and Media Request APIs).
-	Metadata associated with the frame to decode is required to be passed
-	through the ``V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS`` control and
-	quantization matrices can optionally be specified through the
-	``V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION`` control.
+
+        ``OUTPUT`` buffers must contain all the macroblock slices of a given
+        frame, i.e. if a frame requires several macroblock slices to be entirely
+        decoded, then all these slices must be provided. In addition, the
+        following metadata controls must be set on the request for each frame:
+
+        V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
+          Slice parameters (one per slice) for the current frame.
+
+        Optional controls:
+
+        V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION
+          Quantization matrices for the current frame.
+
 	See the :ref:`associated Codec Control IDs <v4l2-mpeg-mpeg2>`.
 	Buffers associated with this pixel format must contain the appropriate
 	number of macroblocks to decode a full corresponding frame.
-- 
2.19.0.605.g01d371f741-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04  8:11 [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
@ 2018-10-04 12:41 ` Paul Kocialkowski
  2018-10-09  5:58   ` Tomasz Figa
  2018-10-04 12:47 ` Paul Kocialkowski
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Paul Kocialkowski @ 2018-10-04 12:41 UTC (permalink / raw)
  To: Alexandre Courbot, Tomasz Figa, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, linux-media
  Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6369 bytes --]

Hi Alexandre,

Thanks for submitting this second version of the RFC, it is very
appreciated! I will try to provide useful feedback here and hopefully
be more reactive than during v1 review!

Most of it looks good to me, but there is a specific point I'd like to
keep discussing.

Le jeudi 04 octobre 2018 à 17:11 +0900, Alexandre Courbot a écrit :
> This patch documents the protocol that user-space should follow when
> communicating with stateless video decoders. It is based on the
> following references:
> 
> * The current protocol used by Chromium (converted from config store to
>   request API)
> 
> * The submitted Cedrus VPU driver
> 
> As such, some things may not be entirely consistent with the current
> state of drivers, so it would be great if all stakeholders could point
> out these inconsistencies. :)
> 
> This patch is supposed to be applied on top of the Request API V18 as
> well as the memory-to-memory video decoder interface series by Tomasz
> Figa.
> 
> Changes since V1:
> 
> * Applied fixes received as feedback,
> * Moved controls descriptions to the extended controls file,
> * Document reference frame management and referencing (need Hans' feedback on
>   that).
> 
> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> ---
>  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst      |   1 +
>  .../media/uapi/v4l/extended-controls.rst      |  25 ++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
>  4 files changed, 424 insertions(+), 4 deletions(-)
>  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> new file mode 100644
> index 000000000000..e54246df18d0
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> @@ -0,0 +1,348 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _stateless_decoder:
> +
> +**************************************************
> +Memory-to-memory Stateless Video Decoder Interface
> +**************************************************
> +
> +A stateless decoder is a decoder that works without retaining any kind of state
> +between processing frames. This means that each frame is decoded independently
> +of any previous and future frames, and that the client is responsible for
> +maintaining the decoding state and providing it to the driver. This is in
> +contrast to the stateful video decoder interface, where the hardware maintains
> +the decoding state and all the client has to do is to provide the raw encoded
> +stream.
> +
> +This section describes how user-space ("the client") is expected to communicate
> +with such decoders in order to successfully decode an encoded stream. Compared
> +to stateful codecs, the driver/client sequence is simpler, but the cost of this
> +simplicity is extra complexity in the client which must maintain a consistent
> +decoding state.
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the client
> +   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> +
> +   * The driver must always return the full set of supported ``OUTPUT`` formats,
> +     irrespective of the format currently set on the ``CAPTURE`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client calls
> +   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> +
> +   * The driver must return only the formats supported for the format currently
> +     active on the ``OUTPUT`` queue.
> +
> +   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
> +     formats may depend on the value of some controls (e.g. H264 or VP9
> +     profile). The client is responsible for making sure that these controls
> +     are set to the desired value before querying the ``CAPTURE`` queue.

I still think we have a problem when enumerating CAPTURE formats, that
providing the profile/level information does not help solving.

From previous emails on v1 (to which I failed to react to), it seems
that the consensus was to set the profile/level indication beforehand
to reduce the subset of possible formats and return that as enumerated
possible formats.

However, it does not really solve the issue here, given the following
distinct cases:

1. The VPU can only output the format for the decoded frame and that
format is not known until the first buffer metadata is passed.
Everything that is reported as supported at this point should be
understood as supported formats for the decoded bitstreams, but
userspace would have to pick the one matching the decoded format of the
bitstream to decode. I don't really see the point of trying to reduce
that list by providing the profile/level.

2. The VPU has some format conversion block in its pipeline and can
actually provide a range of different formats for CAPTURE buffers,
independently from the format of the decoded bitstream.

Either way, I think (correct me if I'm wrong) that players do know the
format from the decoded bitstream here, so enumeration only makes sense
for case 2.

Something we could do is to not enumerate any format for case 1., which
we would specify as an indication that only the decoded bitstream
format must be set. Then in case 2., we would enumerate the possible
formats.

For case 1., having the driver expose the supported profiles ensures
that any format in a supported profile is valid although not
enumerated.

Alternatively, we could go with a control that indicates whether the
driver supports a format decorrelated from the decoded bitstream format
and still enumerate all formats in case 1., with the implication that
only the right one must be picked by userspace. Here again, I don't see
the point of reducing the list by setting the profile/level.

So my goal here is to clearly enable userspace to distinguish between
the two situations.

What do you think?

Paul

-- 
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04  8:11 [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
  2018-10-04 12:41 ` Paul Kocialkowski
@ 2018-10-04 12:47 ` Paul Kocialkowski
  2018-10-04 18:10   ` Nicolas Dufresne
  2018-10-09  7:30   ` Tomasz Figa
  2018-10-08 11:01 ` Hans Verkuil
  2018-10-09  5:48 ` Tomasz Figa
  3 siblings, 2 replies; 16+ messages in thread
From: Paul Kocialkowski @ 2018-10-04 12:47 UTC (permalink / raw)
  To: Alexandre Courbot, Tomasz Figa, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, linux-media
  Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6213 bytes --]

Hi,

Here are a few minor suggestion about H.264 controls.

Le jeudi 04 octobre 2018 à 17:11 +0900, Alexandre Courbot a écrit :
> diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> index a9252225b63e..9d06d853d4ff 100644
> --- a/Documentation/media/uapi/v4l/extended-controls.rst
> +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> @@ -810,6 +810,31 @@ enum v4l2_mpeg_video_bitrate_mode -
>      otherwise the decoder expects a single frame in per buffer.
>      Applicable to the decoder, all codecs.
>  
> +.. _v4l2-mpeg-h264:
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> +    Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> +    the next queued frame. Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> +    Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> +    the next queued frame. Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``

For consistency with MPEG-2 and upcoming JPEG, I think we should call
this "H264_QUANTIZATION".

> +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> +    matrix to use when decoding the next queued frame. Applicable to the H.264
> +    stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``

Ditto with "H264_SLICE_PARAMS".

> +    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> +    entries as there are slices in the corresponding ``OUTPUT`` buffer.
> +    Applicable to the H.264 stateless decoder.
> +
> +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> +    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> +    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> +    decoder.

Since we require all the macroblocks to decode one frame to be held in
the same OUTPUT buffer, it probably doesn't make sense to keep
DECODE_PARAM and SLICE_PARAM distinct.

I would suggest merging both in "SLICE_PARAMS", similarly to what I
have proposed for H.265: https://patchwork.kernel.org/patch/10578023/

What do you think?

Cheers,

Paul

>  ``V4L2_CID_MPEG_VIDEO_H264_VUI_SAR_ENABLE (boolean)``
>      Enable writing sample aspect ratio in the Video Usability
>      Information. Applicable to the H264 encoder.
> diff --git a/Documentation/media/uapi/v4l/pixfmt-compressed.rst b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> index a86b59f770dd..a03637fda8f9 100644
> --- a/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> +++ b/Documentation/media/uapi/v4l/pixfmt-compressed.rst
> @@ -35,6 +35,42 @@ Compressed Formats
>        - ``V4L2_PIX_FMT_H264``
>        - 'H264'
>        - H264 video elementary stream with start codes.
> +    * .. _V4L2-PIX-FMT-H264-SLICE:
> +
> +      - ``V4L2_PIX_FMT_H264_SLICE``
> +      - 'H264'
> +      - H264 parsed slice data, as extracted from the H264 bitstream.
> +        This format is adapted for stateless video decoders using the M2M and
> +        Request APIs.
> +
> +        ``OUTPUT`` buffers must contain all the macroblock slices of a given
> +        frame, i.e. if a frame requires several macroblock slices to be entirely
> +        decoded, then all these slices must be provided. In addition, the
> +        following metadata controls must be set on the request for each frame:
> +
> +        V4L2_CID_MPEG_VIDEO_H264_SPS
> +           Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use
> +           with the frame.
> +
> +        V4L2_CID_MPEG_VIDEO_H264_PPS
> +           Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use
> +           with the frame.
> +
> +        V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX
> +           Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the
> +           scaling matrix to use when decoding the frame.
> +
> +        V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> +           Array of struct v4l2_ctrl_h264_slice_param, containing at least as
> +           many entries as there are slices in the corresponding ``OUTPUT``
> +           buffer.
> +
> +        V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM
> +           Instance of struct v4l2_ctrl_h264_decode_param, containing the
> +           high-level decoding parameters for a H.264 frame.
> +
> +        See the :ref:`associated Codec Control IDs <v4l2-mpeg-h264>` for the
> +        format of these controls.
>      * .. _V4L2-PIX-FMT-H264-NO-SC:
>  
>        - ``V4L2_PIX_FMT_H264_NO_SC``
> @@ -67,10 +103,20 @@ Compressed Formats
>        - MPEG-2 parsed slice data, as extracted from the MPEG-2 bitstream.
>  	This format is adapted for stateless video decoders that implement a
>  	MPEG-2 pipeline (using the Memory to Memory and Media Request APIs).
> -	Metadata associated with the frame to decode is required to be passed
> -	through the ``V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS`` control and
> -	quantization matrices can optionally be specified through the
> -	``V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION`` control.
> +
> +        ``OUTPUT`` buffers must contain all the macroblock slices of a given
> +        frame, i.e. if a frame requires several macroblock slices to be entirely
> +        decoded, then all these slices must be provided. In addition, the
> +        following metadata controls must be set on the request for each frame:
> +
> +        V4L2_CID_MPEG_VIDEO_MPEG2_SLICE_PARAMS
> +          Slice parameters (one per slice) for the current frame.
> +
> +        Optional controls:
> +
> +        V4L2_CID_MPEG_VIDEO_MPEG2_QUANTIZATION
> +          Quantization matrices for the current frame.
> +
>  	See the :ref:`associated Codec Control IDs <v4l2-mpeg-mpeg2>`.
>  	Buffers associated with this pixel format must contain the appropriate
>  	number of macroblocks to decode a full corresponding frame.

-- 
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04 12:47 ` Paul Kocialkowski
@ 2018-10-04 18:10   ` Nicolas Dufresne
  2018-10-05 17:10     ` Paul Kocialkowski
  2018-10-09  7:30   ` Tomasz Figa
  1 sibling, 1 reply; 16+ messages in thread
From: Nicolas Dufresne @ 2018-10-04 18:10 UTC (permalink / raw)
  To: Paul Kocialkowski, Alexandre Courbot, Tomasz Figa,
	Mauro Carvalho Chehab, Hans Verkuil, Pawel Osciak, linux-media
  Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1626 bytes --]

Le jeudi 04 octobre 2018 à 14:47 +0200, Paul Kocialkowski a écrit :
> > +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > +    matrix to use when decoding the next queued frame. Applicable to the H.264
> > +    stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> 
> Ditto with "H264_SLICE_PARAMS".
> 
> > +    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > +    entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > +    Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > +    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > +    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > +    decoder.
> 
> Since we require all the macroblocks to decode one frame to be held in
> the same OUTPUT buffer, it probably doesn't make sense to keep
> DECODE_PARAM and SLICE_PARAM distinct.
> 
> I would suggest merging both in "SLICE_PARAMS", similarly to what I
> have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
> 
> What do you think?

I don't understand why we add this arbitrary restriction of "all the
macroblocks to decode one frame". The bitstream may contain multiple
NALs per frame (e.g. slices), and stateless API shall pass each NAL
separately imho. The driver can then decide to combine them if needed,
or to keep them seperate. I would expect most decoder to decode each
slice independently from each other, even though they write into the
same frame.

Nicolas

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04 18:10   ` Nicolas Dufresne
@ 2018-10-05 17:10     ` Paul Kocialkowski
  2018-10-08 10:21       ` Hans Verkuil
  2018-10-09  7:36       ` Tomasz Figa
  0 siblings, 2 replies; 16+ messages in thread
From: Paul Kocialkowski @ 2018-10-05 17:10 UTC (permalink / raw)
  To: Nicolas Dufresne, Alexandre Courbot, Tomasz Figa,
	Mauro Carvalho Chehab, Hans Verkuil, Pawel Osciak, Maxime Ripard,
	linux-media
  Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3507 bytes --]

Hi,

Le jeudi 04 octobre 2018 à 14:10 -0400, Nicolas Dufresne a écrit :
> Le jeudi 04 octobre 2018 à 14:47 +0200, Paul Kocialkowski a écrit :
> > > +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > +    matrix to use when decoding the next queued frame. Applicable to the H.264
> > > +    stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > 
> > Ditto with "H264_SLICE_PARAMS".
> > 
> > > +    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > > +    entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > > +    Applicable to the H.264 stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > > +    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > > +    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > > +    decoder.
> > 
> > Since we require all the macroblocks to decode one frame to be held in
> > the same OUTPUT buffer, it probably doesn't make sense to keep
> > DECODE_PARAM and SLICE_PARAM distinct.
> > 
> > I would suggest merging both in "SLICE_PARAMS", similarly to what I
> > have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
> > 
> > What do you think?
> 
> I don't understand why we add this arbitrary restriction of "all the
> macroblocks to decode one frame". The bitstream may contain multiple
> NALs per frame (e.g. slices), and stateless API shall pass each NAL
> separately imho. The driver can then decide to combine them if needed,
> or to keep them seperate. I would expect most decoder to decode each
> slice independently from each other, even though they write into the
> same frame.

Well, we sort of always assumed that there is a 1:1 correspondency
between request and output frame when implemeting the software for
cedrus, which simplified both userspace and the driver. The approach we
have taken is to use one of the slice parameters for the whole series
of slices and just append the slice data.

Now that you bring it up, I realize this is an unfortunate decision.
This may have been the cause of bugs and limitations with our driver
because the slice parameters may very well be distinct for each slice.
Moreover, I suppose that just appending the slices data implies that
they are coded in the same order as the picture, which is probably
often the case but certainly not anything guaranteed. 

So I think we should change our software to associate one request per
slice, not per frame and drop this limitation that all the macroblocks
for the frame must be included.

This will require a number of changes to our driver and userspace, but
also to the MPEG-2 controls where I don't think we have the macroblock
position specified.

So it certainly makes sense to keep SLICE_PARAMS separate from
DECODE_PARAMS for H.264. I should probably also rework the H.265
controls to reflect this. Still, all controls must be passed per slice
(and the hardware decoding pipeline is fully reconfigured then), so I
guess it doesn't make such a big difference in practice.

Thanks for pointing this out, it should help bring the API closer to
what is represented in the bitstream.

Cheers,

Paul

-- 
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-05 17:10     ` Paul Kocialkowski
@ 2018-10-08 10:21       ` Hans Verkuil
  2018-10-09  7:36       ` Tomasz Figa
  1 sibling, 0 replies; 16+ messages in thread
From: Hans Verkuil @ 2018-10-08 10:21 UTC (permalink / raw)
  To: Paul Kocialkowski, Nicolas Dufresne, Alexandre Courbot,
	Tomasz Figa, Mauro Carvalho Chehab, Pawel Osciak, Maxime Ripard,
	linux-media
  Cc: linux-kernel

On 10/05/2018 07:10 PM, Paul Kocialkowski wrote:
> Hi,
> 
> Le jeudi 04 octobre 2018 à 14:10 -0400, Nicolas Dufresne a écrit :
>> Le jeudi 04 octobre 2018 à 14:47 +0200, Paul Kocialkowski a écrit :
>>>> +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
>>>> +    matrix to use when decoding the next queued frame. Applicable to the H.264
>>>> +    stateless decoder.
>>>> +
>>>> +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
>>>
>>> Ditto with "H264_SLICE_PARAMS".
>>>
>>>> +    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
>>>> +    entries as there are slices in the corresponding ``OUTPUT`` buffer.
>>>> +    Applicable to the H.264 stateless decoder.
>>>> +
>>>> +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
>>>> +    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
>>>> +    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
>>>> +    decoder.
>>>
>>> Since we require all the macroblocks to decode one frame to be held in
>>> the same OUTPUT buffer, it probably doesn't make sense to keep
>>> DECODE_PARAM and SLICE_PARAM distinct.
>>>
>>> I would suggest merging both in "SLICE_PARAMS", similarly to what I
>>> have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
>>>
>>> What do you think?
>>
>> I don't understand why we add this arbitrary restriction of "all the
>> macroblocks to decode one frame". The bitstream may contain multiple
>> NALs per frame (e.g. slices), and stateless API shall pass each NAL
>> separately imho. The driver can then decide to combine them if needed,
>> or to keep them seperate. I would expect most decoder to decode each
>> slice independently from each other, even though they write into the
>> same frame.
> 
> Well, we sort of always assumed that there is a 1:1 correspondency
> between request and output frame when implemeting the software for
> cedrus, which simplified both userspace and the driver. The approach we
> have taken is to use one of the slice parameters for the whole series
> of slices and just append the slice data.
> 
> Now that you bring it up, I realize this is an unfortunate decision.
> This may have been the cause of bugs and limitations with our driver
> because the slice parameters may very well be distinct for each slice.
> Moreover, I suppose that just appending the slices data implies that
> they are coded in the same order as the picture, which is probably
> often the case but certainly not anything guaranteed. 
> 
> So I think we should change our software to associate one request per
> slice, not per frame and drop this limitation that all the macroblocks
> for the frame must be included.
> 
> This will require a number of changes to our driver and userspace, but
> also to the MPEG-2 controls where I don't think we have the macroblock
> position specified.
> 
> So it certainly makes sense to keep SLICE_PARAMS separate from
> DECODE_PARAMS for H.264. I should probably also rework the H.265
> controls to reflect this. Still, all controls must be passed per slice
> (and the hardware decoding pipeline is fully reconfigured then), so I
> guess it doesn't make such a big difference in practice.
> 
> Thanks for pointing this out, it should help bring the API closer to
> what is represented in the bitstream.

One concern I have with this:

If we support slices with one slice per buffer, then I think our
current max of 32 buffers will be insufficient, right? So that will
have to be fixed. That's a fair amount of work since we want to do this
right.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04  8:11 [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
  2018-10-04 12:41 ` Paul Kocialkowski
  2018-10-04 12:47 ` Paul Kocialkowski
@ 2018-10-08 11:01 ` Hans Verkuil
  2018-10-09  7:41   ` Tomasz Figa
  2018-10-09  5:48 ` Tomasz Figa
  3 siblings, 1 reply; 16+ messages in thread
From: Hans Verkuil @ 2018-10-08 11:01 UTC (permalink / raw)
  To: Alexandre Courbot, Tomasz Figa, Paul Kocialkowski,
	Mauro Carvalho Chehab, Pawel Osciak, linux-media
  Cc: linux-kernel

On 10/04/2018 10:11 AM, Alexandre Courbot wrote:
> This patch documents the protocol that user-space should follow when
> communicating with stateless video decoders. It is based on the
> following references:
> 
> * The current protocol used by Chromium (converted from config store to
>   request API)
> 
> * The submitted Cedrus VPU driver
> 
> As such, some things may not be entirely consistent with the current
> state of drivers, so it would be great if all stakeholders could point
> out these inconsistencies. :)
> 
> This patch is supposed to be applied on top of the Request API V18 as
> well as the memory-to-memory video decoder interface series by Tomasz
> Figa.
> 
> Changes since V1:
> 
> * Applied fixes received as feedback,
> * Moved controls descriptions to the extended controls file,
> * Document reference frame management and referencing (need Hans' feedback on
>   that).
> 
> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> ---
>  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst      |   1 +
>  .../media/uapi/v4l/extended-controls.rst      |  25 ++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
>  4 files changed, 424 insertions(+), 4 deletions(-)
>  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst

<snip>

> +Buffer management during decoding
> +=================================
> +Contrary to stateful decoder drivers, a stateless decoder driver does not
> +perform any kind of buffer management. In particular, it guarantees that
> +``CAPTURE`` buffers will be dequeued in the same order as they are queued. This
> +allows user-space to know in advance which ``CAPTURE`` buffer will contain a
> +given frame, and thus to use that buffer ID as the key to indicate a reference
> +frame.
> +
> +This also means that user-space is fully responsible for not queuing a given
> +``CAPTURE`` buffer for as long as it is used as a reference frame. Failure to do
> +so will overwrite the reference frame's data while it is still in use, and
> +result in visual corruption of future frames.
> +
> +Note that this applies to all types of buffers, and not only to
> +``V4L2_MEMORY_MMAP`` ones, as drivers supporting ``V4L2_MEMORY_DMABUF`` will
> +typically maintain a map of buffer IDs to DMABUF handles for reference frame
> +management. Queueing a buffer will result in the map entry to be overwritten
> +with the new DMABUF handle submitted in the :c:func:`VIDIOC_QBUF` ioctl.

The more I think about this, the more I believe that relying on capture buffer
indices is wrong. It's easy enough if there is a straightforward 1-1 relationship,
but what if you have H264 slices as Nicolas mentioned and it becomes a N-1 relationship?

Yes, you can still do this in userspace, but it becomes a lot more complicated.

And what if in the future instead of having one capture buffer per decoded frame
there will be multiple capture buffers per decoded frame, each with a single
slice (for example)?

I would feel much happier if we used a 'cookie' to refer to buffers.

The next problem would be where to put it. I dislike abusing the timestamp field
for this. Part of the reason is that there will be changes there to fix the
year 2038 issue, and I am not entirely sure what that will do to how this field
is handled since there may be conversions from a pre-2038 timeval to a 2038-ready
timeval.

So a union with the timestamp field and a cookie field (+ BUF_FLAG_COOKIE) would
work best IMHO.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04  8:11 [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
                   ` (2 preceding siblings ...)
  2018-10-08 11:01 ` Hans Verkuil
@ 2018-10-09  5:48 ` Tomasz Figa
  3 siblings, 0 replies; 16+ messages in thread
From: Tomasz Figa @ 2018-10-09  5:48 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Paul Kocialkowski, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

Hi Alex,

On Thu, Oct 4, 2018 at 5:11 PM Alexandre Courbot <acourbot@chromium.org> wrote:
>
> This patch documents the protocol that user-space should follow when
> communicating with stateless video decoders. It is based on the
> following references:
>
> * The current protocol used by Chromium (converted from config store to
>   request API)
>
> * The submitted Cedrus VPU driver
>
> As such, some things may not be entirely consistent with the current
> state of drivers, so it would be great if all stakeholders could point
> out these inconsistencies. :)
>
> This patch is supposed to be applied on top of the Request API V18 as
> well as the memory-to-memory video decoder interface series by Tomasz
> Figa.
>
> Changes since V1:
>
> * Applied fixes received as feedback,
> * Moved controls descriptions to the extended controls file,
> * Document reference frame management and referencing (need Hans' feedback on
>   that).
>

Please take a look at my comments inline.

> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> ---
>  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst      |   1 +
>  .../media/uapi/v4l/extended-controls.rst      |  25 ++
>  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
>  4 files changed, 424 insertions(+), 4 deletions(-)
>  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
>
> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> new file mode 100644
> index 000000000000..e54246df18d0
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> @@ -0,0 +1,348 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _stateless_decoder:
> +
> +**************************************************
> +Memory-to-memory Stateless Video Decoder Interface
> +**************************************************
> +
> +A stateless decoder is a decoder that works without retaining any kind of state
> +between processing frames. This means that each frame is decoded independently
> +of any previous and future frames,

nit: It sounds weird, because there are still dependencies between
frames, next frame may use previous frame as a reference. How about
dropping the whole

"each frame is decoded independently of any previous and future frames,"

and just making it

"This means that the client is responsible for maintaining all the
decoding state and providing it to the driver"?

> and that the client is responsible for
> +maintaining the decoding state and providing it to the driver. This is in
> +contrast to the stateful video decoder interface, where the hardware maintains
> +the decoding state and all the client has to do is to provide the raw encoded
> +stream.
> +
> +This section describes how user-space ("the client") is expected to communicate
> +with such decoders in order to successfully decode an encoded stream. Compared
> +to stateful codecs, the driver/client sequence is simpler, but the cost of this
> +simplicity is extra complexity in the client which must maintain a consistent
> +decoding state.
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the client
> +   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> +
> +   * The driver must always return the full set of supported ``OUTPUT`` formats,
> +     irrespective of the format currently set on the ``CAPTURE`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client calls
> +   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> +
> +   * The driver must return only the formats supported for the format currently
> +     active on the ``OUTPUT`` queue.
> +
> +   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
> +     formats may depend on the value of some controls (e.g. H264 or VP9
> +     profile). The client is responsible for making sure that these controls

I wouldn't explicitly mention profile here, since we already agreed
that profile actually is not very helpful. Perhaps more general "e.g.
parsed bitstream headers" would be better?

> +     are set to the desired value before querying the ``CAPTURE`` queue.
> +
> +   * In order to enumerate raw formats supported by a given coded format, the
> +     client must thus set that coded format on the ``OUTPUT`` queue first, then
> +     set any control listed on the format's description, and finally enumerate
> +     the ``CAPTURE`` queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing desired pixel format in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT`` queue
> +     must include all possible coded resolutions supported by the decoder
> +     for given coded pixel format.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE`` queue
> +     must include all possible frame buffer resolutions supported by the
> +     decoder for given raw pixel format and coded format currently set on
> +     ``OUTPUT`` queue.

As Hans pointed out in his review of my series, VIDIOC_ENUM_FRAMESIZES
doesn't have a queue type argument. It's "key" argument is the
pixelformat.

> +
> +    .. note::
> +
> +       The client may derive the supported resolution range for a
> +       combination of coded and raw format by setting width and height of
> +       ``OUTPUT`` format to 0 and calculating the intersection of
> +       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> +       for the given coded and raw formats.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set the coded format on the ``OUTPUT`` queue via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         a coded pixel format
> +
> +     ``width``, ``height``
> +         coded width and height parsed from the stream
> +
> +     other fields
> +         follow standard semantics
> +
> +   .. note::
> +
> +      Changing ``OUTPUT`` format may change currently set ``CAPTURE``
> +      format. The driver will derive a new ``CAPTURE`` format from
> +      ``OUTPUT`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``CAPTURE`` format,
> +      it must adjust it afterwards.
> +
> +3. Call :c:func:`VIDIOC_S_EXT_CTRLS` to set all the controls (profile, etc)
> +   required by the ``OUTPUT`` format to enumerate the ``CAPTURE`` formats.
> +
> +4. Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> +   destination buffers parsed/decoded from the bitstream.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         frame buffer resolution for the decoded frames
> +
> +     ``pixelformat``
> +         pixel format for decoded frames
> +
> +     ``num_planes`` (for _MPLANE ``type`` only)
> +         number of planes for pixelformat
> +
> +     ``sizeimage``, ``bytesperline``
> +         as per standard semantics; matching frame buffer format
> +
> +   .. note::
> +
> +      The value of ``pixelformat`` may be any pixel format supported for the
> +      ``OUTPUT`` format, based on the hardware capabilities. It is suggested
> +      that driver chooses the preferred/optimal format for given configuration.
> +      For example, a YUV format may be preferred over an RGB format, if
> +      additional conversion step would be required.

Not related to this patch, but maybe we should consider some format
flags returned by VIDIOC_ENUM_FMT, which would tell the userspace that
a format might be slower due to extra processing? Or maybe the order
of returned formats should be sorted by efficiency descending?

> +
> +5. *[optional]* Enumerate ``CAPTURE`` formats via :c:func:`VIDIOC_ENUM_FMT` on
> +   ``CAPTURE`` queue. The client may use this ioctl to discover which
> +   alternative raw formats are supported for the current ``OUTPUT`` format and
> +   select one of them via :c:func:`VIDIOC_S_FMT`.
> +
> +   .. note::
> +
> +      The driver will return only formats supported for the currently selected
> +      ``OUTPUT`` format, even if more formats may be supported by the driver in
> +      general.
> +
> +      For example, a driver/hardware may support YUV and RGB formats for
> +      resolutions 1920x1088 and lower, but only YUV for higher resolutions (due
> +      to hardware limitations). After setting a resolution of 1920x1088 or lower
> +      as the ``OUTPUT`` format, :c:func:`VIDIOC_ENUM_FMT` may return a set of
> +      YUV and RGB pixel formats, but after setting a resolution higher than
> +      1920x1088, the driver will not return RGB, unsupported for this
> +      resolution.
> +
> +6. *[optional]* Choose a different ``CAPTURE`` format than suggested via
> +   :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the client to
> +   choose a different format than selected/suggested by the driver in
> +   :c:func:`VIDIOC_G_FMT`.
> +
> +    * **Required fields:**
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``pixelformat``
> +          a raw pixel format
> +
> +7. Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> +   ``OUTPUT`` queue.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          requested number of buffers to allocate; greater than zero
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +      ``sizeimage``
> +          follows standard semantics; the client is free to choose any
> +          suitable size, however, it may be subject to change by the
> +          driver
> +
> +    * **Return fields:**
> +
> +      ``count``
> +          actual number of buffers allocated
> +
> +    * The driver must adjust count to minimum of required number of ``OUTPUT``
> +      buffers for given format and count passed. The client must check this
> +      value after the ioctl returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline depth), use
> +       G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to get minimum number of
> +       buffers required by the driver/format, and pass the obtained value plus
> +       the number of additional buffers needed in count to
> +       :c:func:`VIDIOC_REQBUFS`.

This control probably doesn't make sense for stateless decoders,
because the userspace already knows everything about the stream needed
to know how many buffers would be optimal (e.g. DPB count).

> +
> +8. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS` on the
> +   ``CAPTURE`` queue.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          requested number of buffers to allocate; greater than zero
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +    * **Return fields:**
> +
> +      ``count``
> +          adjusted to allocated number of buffers
> +
> +    * The driver must adjust count to minimum of required number of
> +      destination buffers for given format and stream configuration and the
> +      count passed. The client must check this value after the ioctl
> +      returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained value
> +       plus the number of additional buffers needed in count to
> +       :c:func:`VIDIOC_REQBUFS`.

Same here. I wouldn't mention this control.

> +
> +9. Allocate requests (likely one per ``OUTPUT`` buffer) via
> +    :c:func:`MEDIA_IOC_REQUEST_ALLOC` on the media device.
> +
> +10. Start streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> +    :c:func:`VIDIOC_STREAMON`.
> +
> +Decoding
> +========
> +
> +For each frame, the client is responsible for submitting a request to which the
> +following is attached:
> +
> +* Exactly one frame worth of encoded data in a buffer submitted to the
> +  ``OUTPUT`` queue,
> +* All the controls relevant to the format being decoded (see below for details).
> +
> +``CAPTURE`` buffers must not be part of the request, but must be queued
> +independently. The driver will pick one of the queued ``CAPTURE`` buffers and
> +decode the frame into it. Although the client has no control over which
> +``CAPTURE`` buffer will be used with a given ``OUTPUT`` buffer, it is guaranteed
> +that ``CAPTURE`` buffers will be returned in decode order (i.e. the same order
> +as ``OUTPUT`` buffers were submitted), so it is trivial to associate a dequeued
> +``CAPTURE`` buffer to its originating request and ``OUTPUT`` buffer.
> +
> +If the request is submitted without an ``OUTPUT`` buffer, then
> +:c:func:`MEDIA_REQUEST_IOC_QUEUE` will return ``-ENOENT``. If more than one
> +buffer is queued, or if some of the required controls are missing, then it will
> +return ``-EINVAL``. Decoding errors are signaled by the ``CAPTURE`` buffers
> +being dequeued carrying the ``V4L2_BUF_FLAG_ERROR`` flag. If the reference frame
> +has an error, then all other frames that refer to it will also set the
> +``V4L2_BUF_FLAG_ERROR`` flag.

Should we also return respective OUTPUT buffer with ERROR flag too?

> +
> +The contents of source ``OUTPUT`` buffers, as well as the controls that must be
> +set on the request, depend on active coded pixel format and might be affected by
> +codec-specific extended controls, as stated in documentation of each format.
> +Currently supported formats are:
> +

Should I take this as: currently whatever format you take, it's unsupported? ;)

> +Buffer management during decoding
> +=================================
> +Contrary to stateful decoder drivers, a stateless decoder driver does not
> +perform any kind of buffer management. In particular, it guarantees that
> +``CAPTURE`` buffers will be dequeued in the same order as they are queued. This
> +allows user-space to know in advance which ``CAPTURE`` buffer will contain a
> +given frame, and thus to use that buffer ID as the key to indicate a reference
> +frame.
> +
> +This also means that user-space is fully responsible for not queuing a given
> +``CAPTURE`` buffer for as long as it is used as a reference frame. Failure to do
> +so will overwrite the reference frame's data while it is still in use, and
> +result in visual corruption of future frames.
> +
> +Note that this applies to all types of buffers, and not only to
> +``V4L2_MEMORY_MMAP`` ones, as drivers supporting ``V4L2_MEMORY_DMABUF`` will
> +typically maintain a map of buffer IDs to DMABUF handles for reference frame
> +management. Queueing a buffer will result in the map entry to be overwritten
> +with the new DMABUF handle submitted in the :c:func:`VIDIOC_QBUF` ioctl.
> +
> +Seek
> +====
> +In order to seek, the client just needs to submit requests using input buffers
> +corresponding to the new stream position. It must however be aware that
> +resolution may have changed and follow the dynamic resolution change sequence in
> +that case. Also depending on the codec used, picture parameters (e.g. SPS/PPS
> +for H.264) may have changed and the client is responsible for making sure that
> +a valid state is sent to the kernel.
> +
> +The client is then free to ignore any returned ``CAPTURE`` buffer that comes
> +from the pre-seek position.

I think it could actually make sense to just stop and reinitialize
everything for a seek, because typically you would want to just skip
any frames pending in vb2 queues and decode the next frame after seek
as soon as possible. As a side effect, there wouldn't be divergence
left between same-resolution and different-resolution seeks.

> +
> +Pause
> +=====
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> +Without source bitstream data, there is no data to process and the hardware
> +remains idle.
> +
> +Dynamic resolution change
> +=========================
> +
> +If the client detects a resolution change in the stream, it will need to perform
> +the initialization sequence again with the new resolution:
> +
> +1. Wait until all submitted requests have completed and dequeue the
> +   corresponding output buffers.
> +
> +2. Call :c:func:`VIDIOC_STREAMOFF` on both the ``OUTPUT`` and ``CAPTURE``
> +   queues.
> +
> +3. Free all buffers by calling :c:func:`VIDIOC_REQBUFS` on the
> +   ``OUTPUT`` and ``CAPTURE`` queues with a buffer count of zero.
> +
> +Then perform the initialization sequence again, with the new resolution set
> +on the ``OUTPUT`` queue. Note that due to resolution constraints, a different
> +format may need to be picked on the ``CAPTURE`` queue.

With my comment to the seek sequence, one could just merge "Seek" and
"Dynamic resolution change" into one section.

> +
> +Drain
> +=====
> +
> +In order to drain the stream on a stateless decoder, the client just needs to
> +wait until all the submitted requests are completed. There is no need to send a
> +``V4L2_DEC_CMD_STOP`` command since requests are processed sequentially by the
> +driver.
> +
> +End of stream
> +=============
> +
> +If the decoder encounters an end of stream marking in the stream, the
> +driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
> +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> +behavior is identical to the drain sequence triggered by the client via
> +``V4L2_DEC_CMD_STOP``.

I think we agreed to remove this section, because a stateless decoder
doesn't detect EOS on its own - the client does.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04 12:41 ` Paul Kocialkowski
@ 2018-10-09  5:58   ` Tomasz Figa
  2018-10-12 12:28     ` Paul Kocialkowski
  0 siblings, 1 reply; 16+ messages in thread
From: Tomasz Figa @ 2018-10-09  5:58 UTC (permalink / raw)
  To: contact
  Cc: Alexandre Courbot, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

Hi Paul,

On Thu, Oct 4, 2018 at 9:40 PM Paul Kocialkowski <contact@paulk.fr> wrote:
>
> Hi Alexandre,
>
> Thanks for submitting this second version of the RFC, it is very
> appreciated! I will try to provide useful feedback here and hopefully
> be more reactive than during v1 review!
>
> Most of it looks good to me, but there is a specific point I'd like to
> keep discussing.
>
> Le jeudi 04 octobre 2018 à 17:11 +0900, Alexandre Courbot a écrit :
> > This patch documents the protocol that user-space should follow when
> > communicating with stateless video decoders. It is based on the
> > following references:
> >
> > * The current protocol used by Chromium (converted from config store to
> >   request API)
> >
> > * The submitted Cedrus VPU driver
> >
> > As such, some things may not be entirely consistent with the current
> > state of drivers, so it would be great if all stakeholders could point
> > out these inconsistencies. :)
> >
> > This patch is supposed to be applied on top of the Request API V18 as
> > well as the memory-to-memory video decoder interface series by Tomasz
> > Figa.
> >
> > Changes since V1:
> >
> > * Applied fixes received as feedback,
> > * Moved controls descriptions to the extended controls file,
> > * Document reference frame management and referencing (need Hans' feedback on
> >   that).
> >
> > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > ---
> >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst      |   1 +
> >  .../media/uapi/v4l/extended-controls.rst      |  25 ++
> >  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
> >  4 files changed, 424 insertions(+), 4 deletions(-)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > new file mode 100644
> > index 000000000000..e54246df18d0
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > @@ -0,0 +1,348 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _stateless_decoder:
> > +
> > +**************************************************
> > +Memory-to-memory Stateless Video Decoder Interface
> > +**************************************************
> > +
> > +A stateless decoder is a decoder that works without retaining any kind of state
> > +between processing frames. This means that each frame is decoded independently
> > +of any previous and future frames, and that the client is responsible for
> > +maintaining the decoding state and providing it to the driver. This is in
> > +contrast to the stateful video decoder interface, where the hardware maintains
> > +the decoding state and all the client has to do is to provide the raw encoded
> > +stream.
> > +
> > +This section describes how user-space ("the client") is expected to communicate
> > +with such decoders in order to successfully decode an encoded stream. Compared
> > +to stateful codecs, the driver/client sequence is simpler, but the cost of this
> > +simplicity is extra complexity in the client which must maintain a consistent
> > +decoding state.
> > +
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the client
> > +   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > +
> > +   * The driver must always return the full set of supported ``OUTPUT`` formats,
> > +     irrespective of the format currently set on the ``CAPTURE`` queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client calls
> > +   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > +
> > +   * The driver must return only the formats supported for the format currently
> > +     active on the ``OUTPUT`` queue.
> > +
> > +   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
> > +     formats may depend on the value of some controls (e.g. H264 or VP9
> > +     profile). The client is responsible for making sure that these controls
> > +     are set to the desired value before querying the ``CAPTURE`` queue.
>
> I still think we have a problem when enumerating CAPTURE formats, that
> providing the profile/level information does not help solving.
>
> From previous emails on v1 (to which I failed to react to), it seems
> that the consensus was to set the profile/level indication beforehand
> to reduce the subset of possible formats and return that as enumerated
> possible formats.

I think the consensus was to set all the the parsed header controls
and actually Alex seems to have mentioned it slightly further in his
patch:

+   * In order to enumerate raw formats supported by a given coded format, the
+     client must thus set that coded format on the ``OUTPUT`` queue first, then
+     set any control listed on the format's description, and finally enumerate
+     the ``CAPTURE`` queue.

>
> However, it does not really solve the issue here, given the following
> distinct cases:
>
> 1. The VPU can only output the format for the decoded frame and that
> format is not known until the first buffer metadata is passed.

That's why I later suggested metadata (parsed header controls) and not
just some selective controls, such as profiles.

> Everything that is reported as supported at this point should be
> understood as supported formats for the decoded bitstreams, but
> userspace would have to pick the one matching the decoded format of the
> bitstream to decode. I don't really see the point of trying to reduce
> that list by providing the profile/level.
>
> 2. The VPU has some format conversion block in its pipeline and can
> actually provide a range of different formats for CAPTURE buffers,
> independently from the format of the decoded bitstream.
>
> Either way, I think (correct me if I'm wrong) that players do know the
> format from the decoded bitstream here, so enumeration only makes sense
> for case 2.

Players don't know the format for the decoded bitstream, as I already
explained before. From stream metadata they would only know whether
the stream is YUV 4:2:0 vs 4:2:2, but wouldn't know the exact hardware
constraints, e.g. whether NV12 or YUV420 is supported for given YUV
4:2:0 stream.

>
> Something we could do is to not enumerate any format for case 1., which
> we would specify as an indication that only the decoded bitstream
> format must be set. Then in case 2., we would enumerate the possible
> formats.
>
> For case 1., having the driver expose the supported profiles ensures
> that any format in a supported profile is valid although not
> enumerated.

Profile doesn't fully determine a specific pixel format, only the
abstract format (see above).

>
> Alternatively, we could go with a control that indicates whether the
> driver supports a format decorrelated from the decoded bitstream format
> and still enumerate all formats in case 1., with the implication that
> only the right one must be picked by userspace. Here again, I don't see
> the point of reducing the list by setting the profile/level.
>
> So my goal here is to clearly enable userspace to distinguish between
> the two situations.
>
> What do you think?

Why would we need to create a control, if we already have the ENUM_FMT
API existing exactly to achieve this? We already have this problem
solved for stateful decoders and if we request the userspace to
actually set all the necessary metadata beforehand, the resulting
behavior (initialization sequence) would be much more consistent
between these 2 APIs.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-04 12:47 ` Paul Kocialkowski
  2018-10-04 18:10   ` Nicolas Dufresne
@ 2018-10-09  7:30   ` Tomasz Figa
  2018-10-09 17:48     ` Paul Kocialkowski
  1 sibling, 1 reply; 16+ messages in thread
From: Tomasz Figa @ 2018-10-09  7:30 UTC (permalink / raw)
  To: contact
  Cc: Alexandre Courbot, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

On Thu, Oct 4, 2018 at 9:46 PM Paul Kocialkowski <contact@paulk.fr> wrote:
>
> Hi,
>
> Here are a few minor suggestion about H.264 controls.
>
> Le jeudi 04 octobre 2018 à 17:11 +0900, Alexandre Courbot a écrit :
> > diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> > index a9252225b63e..9d06d853d4ff 100644
> > --- a/Documentation/media/uapi/v4l/extended-controls.rst
> > +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> > @@ -810,6 +810,31 @@ enum v4l2_mpeg_video_bitrate_mode -
> >      otherwise the decoder expects a single frame in per buffer.
> >      Applicable to the decoder, all codecs.
> >
> > +.. _v4l2-mpeg-h264:
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> > +    Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> > +    the next queued frame. Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> > +    Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> > +    the next queued frame. Applicable to the H.264 stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
>
> For consistency with MPEG-2 and upcoming JPEG, I think we should call
> this "H264_QUANTIZATION".

I'd rather stay consistent with H.264 specification, which uses the
wording as defined in Alex's patch. Otherwise it would be difficult to
correlate between the controls and the specification, which is
something that the userspace developer would definitely need to
understand how to properly parse the stream and obtain the decoding
parameters.

>
> > +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > +    matrix to use when decoding the next queued frame. Applicable to the H.264
> > +    stateless decoder.
> > +
> > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
>
> Ditto with "H264_SLICE_PARAMS".
>

It doesn't seem to be related to the spec in this case and "params"
sounds better indeed.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-05 17:10     ` Paul Kocialkowski
  2018-10-08 10:21       ` Hans Verkuil
@ 2018-10-09  7:36       ` Tomasz Figa
  2018-10-12 11:52         ` Paul Kocialkowski
  1 sibling, 1 reply; 16+ messages in thread
From: Tomasz Figa @ 2018-10-09  7:36 UTC (permalink / raw)
  To: contact
  Cc: nicolas, Alexandre Courbot, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Maxime Ripard, Linux Media Mailing List,
	Linux Kernel Mailing List

On Sat, Oct 6, 2018 at 2:09 AM Paul Kocialkowski <contact@paulk.fr> wrote:
>
> Hi,
>
> Le jeudi 04 octobre 2018 à 14:10 -0400, Nicolas Dufresne a écrit :
> > Le jeudi 04 octobre 2018 à 14:47 +0200, Paul Kocialkowski a écrit :
> > > > +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > > +    matrix to use when decoding the next queued frame. Applicable to the H.264
> > > > +    stateless decoder.
> > > > +
> > > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > >
> > > Ditto with "H264_SLICE_PARAMS".
> > >
> > > > +    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > > > +    entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > > > +    Applicable to the H.264 stateless decoder.
> > > > +
> > > > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > > > +    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > > > +    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > > > +    decoder.
> > >
> > > Since we require all the macroblocks to decode one frame to be held in
> > > the same OUTPUT buffer, it probably doesn't make sense to keep
> > > DECODE_PARAM and SLICE_PARAM distinct.
> > >
> > > I would suggest merging both in "SLICE_PARAMS", similarly to what I
> > > have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
> > >
> > > What do you think?
> >
> > I don't understand why we add this arbitrary restriction of "all the
> > macroblocks to decode one frame". The bitstream may contain multiple
> > NALs per frame (e.g. slices), and stateless API shall pass each NAL
> > separately imho. The driver can then decide to combine them if needed,
> > or to keep them seperate. I would expect most decoder to decode each
> > slice independently from each other, even though they write into the
> > same frame.
>
> Well, we sort of always assumed that there is a 1:1 correspondency
> between request and output frame when implemeting the software for
> cedrus, which simplified both userspace and the driver. The approach we
> have taken is to use one of the slice parameters for the whole series
> of slices and just append the slice data.
>
> Now that you bring it up, I realize this is an unfortunate decision.
> This may have been the cause of bugs and limitations with our driver
> because the slice parameters may very well be distinct for each slice.

I might be misunderstanding something, but, at least for the H.264
API, there is no relation between the number of buffers/requests and
number of slice parameters. The V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
is an array, with each element describing each slice in the OUTPUT
buffer. So actually, it could be up to the userspace if it want to
have 1 OUTPUT buffer per slice or all slices in 1 OUTPUT buffer - the
former would have v4l2_ctrl_h264_decode_param::num_slices = 1 and only
one valid element in V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS.

> Moreover, I suppose that just appending the slices data implies that
> they are coded in the same order as the picture, which is probably
> often the case but certainly not anything guaranteed.

Again, at least in the H.264 API being proposed here, the order of
slices is not specified by the order of slice data in the buffer. Each
entry of the V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS array points to the
specific offset within the buffer.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-08 11:01 ` Hans Verkuil
@ 2018-10-09  7:41   ` Tomasz Figa
  2018-10-09  8:07     ` Hans Verkuil
  0 siblings, 1 reply; 16+ messages in thread
From: Tomasz Figa @ 2018-10-09  7:41 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Alexandre Courbot, Paul Kocialkowski, Mauro Carvalho Chehab,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

On Mon, Oct 8, 2018 at 8:02 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 10/04/2018 10:11 AM, Alexandre Courbot wrote:
> > This patch documents the protocol that user-space should follow when
> > communicating with stateless video decoders. It is based on the
> > following references:
> >
> > * The current protocol used by Chromium (converted from config store to
> >   request API)
> >
> > * The submitted Cedrus VPU driver
> >
> > As such, some things may not be entirely consistent with the current
> > state of drivers, so it would be great if all stakeholders could point
> > out these inconsistencies. :)
> >
> > This patch is supposed to be applied on top of the Request API V18 as
> > well as the memory-to-memory video decoder interface series by Tomasz
> > Figa.
> >
> > Changes since V1:
> >
> > * Applied fixes received as feedback,
> > * Moved controls descriptions to the extended controls file,
> > * Document reference frame management and referencing (need Hans' feedback on
> >   that).
> >
> > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > ---
> >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst      |   1 +
> >  .../media/uapi/v4l/extended-controls.rst      |  25 ++
> >  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
> >  4 files changed, 424 insertions(+), 4 deletions(-)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
>
> <snip>
>
> > +Buffer management during decoding
> > +=================================
> > +Contrary to stateful decoder drivers, a stateless decoder driver does not
> > +perform any kind of buffer management. In particular, it guarantees that
> > +``CAPTURE`` buffers will be dequeued in the same order as they are queued. This
> > +allows user-space to know in advance which ``CAPTURE`` buffer will contain a
> > +given frame, and thus to use that buffer ID as the key to indicate a reference
> > +frame.
> > +
> > +This also means that user-space is fully responsible for not queuing a given
> > +``CAPTURE`` buffer for as long as it is used as a reference frame. Failure to do
> > +so will overwrite the reference frame's data while it is still in use, and
> > +result in visual corruption of future frames.
> > +
> > +Note that this applies to all types of buffers, and not only to
> > +``V4L2_MEMORY_MMAP`` ones, as drivers supporting ``V4L2_MEMORY_DMABUF`` will
> > +typically maintain a map of buffer IDs to DMABUF handles for reference frame
> > +management. Queueing a buffer will result in the map entry to be overwritten
> > +with the new DMABUF handle submitted in the :c:func:`VIDIOC_QBUF` ioctl.
>
> The more I think about this, the more I believe that relying on capture buffer
> indices is wrong. It's easy enough if there is a straightforward 1-1 relationship,
> but what if you have H264 slices as Nicolas mentioned and it becomes a N-1 relationship?
>
> Yes, you can still do this in userspace, but it becomes a lot more complicated.
>
> And what if in the future instead of having one capture buffer per decoded frame
> there will be multiple capture buffers per decoded frame, each with a single
> slice (for example)?

Is there any particular scenario you have in mind, where such case would happen?

>
> I would feel much happier if we used a 'cookie' to refer to buffers.

Hmm, how would this cookie work in a case of N OUTPUT -> 1 CAPTURE case?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-09  7:41   ` Tomasz Figa
@ 2018-10-09  8:07     ` Hans Verkuil
  0 siblings, 0 replies; 16+ messages in thread
From: Hans Verkuil @ 2018-10-09  8:07 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Alexandre Courbot, Paul Kocialkowski, Mauro Carvalho Chehab,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

On 10/09/18 09:41, Tomasz Figa wrote:
> On Mon, Oct 8, 2018 at 8:02 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> On 10/04/2018 10:11 AM, Alexandre Courbot wrote:
>>> This patch documents the protocol that user-space should follow when
>>> communicating with stateless video decoders. It is based on the
>>> following references:
>>>
>>> * The current protocol used by Chromium (converted from config store to
>>>   request API)
>>>
>>> * The submitted Cedrus VPU driver
>>>
>>> As such, some things may not be entirely consistent with the current
>>> state of drivers, so it would be great if all stakeholders could point
>>> out these inconsistencies. :)
>>>
>>> This patch is supposed to be applied on top of the Request API V18 as
>>> well as the memory-to-memory video decoder interface series by Tomasz
>>> Figa.
>>>
>>> Changes since V1:
>>>
>>> * Applied fixes received as feedback,
>>> * Moved controls descriptions to the extended controls file,
>>> * Document reference frame management and referencing (need Hans' feedback on
>>>   that).
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
>>> ---
>>>  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
>>>  Documentation/media/uapi/v4l/devices.rst      |   1 +
>>>  .../media/uapi/v4l/extended-controls.rst      |  25 ++
>>>  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
>>>  4 files changed, 424 insertions(+), 4 deletions(-)
>>>  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
>>>
>>> diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
>>
>> <snip>
>>
>>> +Buffer management during decoding
>>> +=================================
>>> +Contrary to stateful decoder drivers, a stateless decoder driver does not
>>> +perform any kind of buffer management. In particular, it guarantees that
>>> +``CAPTURE`` buffers will be dequeued in the same order as they are queued. This
>>> +allows user-space to know in advance which ``CAPTURE`` buffer will contain a
>>> +given frame, and thus to use that buffer ID as the key to indicate a reference
>>> +frame.
>>> +
>>> +This also means that user-space is fully responsible for not queuing a given
>>> +``CAPTURE`` buffer for as long as it is used as a reference frame. Failure to do
>>> +so will overwrite the reference frame's data while it is still in use, and
>>> +result in visual corruption of future frames.
>>> +
>>> +Note that this applies to all types of buffers, and not only to
>>> +``V4L2_MEMORY_MMAP`` ones, as drivers supporting ``V4L2_MEMORY_DMABUF`` will
>>> +typically maintain a map of buffer IDs to DMABUF handles for reference frame
>>> +management. Queueing a buffer will result in the map entry to be overwritten
>>> +with the new DMABUF handle submitted in the :c:func:`VIDIOC_QBUF` ioctl.
>>
>> The more I think about this, the more I believe that relying on capture buffer
>> indices is wrong. It's easy enough if there is a straightforward 1-1 relationship,
>> but what if you have H264 slices as Nicolas mentioned and it becomes a N-1 relationship?
>>
>> Yes, you can still do this in userspace, but it becomes a lot more complicated.
>>
>> And what if in the future instead of having one capture buffer per decoded frame
>> there will be multiple capture buffers per decoded frame, each with a single
>> slice (for example)?
> 
> Is there any particular scenario you have in mind, where such case would happen?

Video conferencing to reduce the latency, i.e. no need to wait for the full frame
to be available, just start processing as soon as a decoded slice arrives.

> 
>>
>> I would feel much happier if we used a 'cookie' to refer to buffers.
> 
> Hmm, how would this cookie work in a case of N OUTPUT -> 1 CAPTURE case?

The output buffers would use the same cookie.

Regards,

	Hans

> 
> Best regards,
> Tomasz
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-09  7:30   ` Tomasz Figa
@ 2018-10-09 17:48     ` Paul Kocialkowski
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Kocialkowski @ 2018-10-09 17:48 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Alexandre Courbot, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2333 bytes --]

Hi,

Le mardi 09 octobre 2018 à 16:30 +0900, Tomasz Figa a écrit :
> On Thu, Oct 4, 2018 at 9:46 PM Paul Kocialkowski <contact@paulk.fr> wrote:
> > 
> > Hi,
> > 
> > Here are a few minor suggestion about H.264 controls.
> > 
> > Le jeudi 04 octobre 2018 à 17:11 +0900, Alexandre Courbot a écrit :
> > > diff --git a/Documentation/media/uapi/v4l/extended-controls.rst b/Documentation/media/uapi/v4l/extended-controls.rst
> > > index a9252225b63e..9d06d853d4ff 100644
> > > --- a/Documentation/media/uapi/v4l/extended-controls.rst
> > > +++ b/Documentation/media/uapi/v4l/extended-controls.rst
> > > @@ -810,6 +810,31 @@ enum v4l2_mpeg_video_bitrate_mode -
> > >      otherwise the decoder expects a single frame in per buffer.
> > >      Applicable to the decoder, all codecs.
> > > 
> > > +.. _v4l2-mpeg-h264:
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SPS``
> > > +    Instance of struct v4l2_ctrl_h264_sps, containing the SPS of to use with
> > > +    the next queued frame. Applicable to the H.264 stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_PPS``
> > > +    Instance of struct v4l2_ctrl_h264_pps, containing the PPS of to use with
> > > +    the next queued frame. Applicable to the H.264 stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SCALING_MATRIX``
> > 
> > For consistency with MPEG-2 and upcoming JPEG, I think we should call
> > this "H264_QUANTIZATION".
> 
> I'd rather stay consistent with H.264 specification, which uses the
> wording as defined in Alex's patch. Otherwise it would be difficult to
> correlate between the controls and the specification, which is
> something that the userspace developer would definitely need to
> understand how to properly parse the stream and obtain the decoding
> parameters.

Okay, I agree this makes more sense than trying to keep the names
consistent across codecs.

> > 
> > > +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > +    matrix to use when decoding the next queued frame. Applicable to the H.264
> > > +    stateless decoder.
> > > +
> > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > 
> > Ditto with "H264_SLICE_PARAMS".
> > 
> 
> It doesn't seem to be related to the spec in this case and "params"
> sounds better indeed.

Cheers,

Paul

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-09  7:36       ` Tomasz Figa
@ 2018-10-12 11:52         ` Paul Kocialkowski
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Kocialkowski @ 2018-10-12 11:52 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: nicolas, Alexandre Courbot, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Maxime Ripard, Linux Media Mailing List,
	Linux Kernel Mailing List, Ezequiel Garcia

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

Hi,

Le mardi 09 octobre 2018 à 16:36 +0900, Tomasz Figa a écrit :
> On Sat, Oct 6, 2018 at 2:09 AM Paul Kocialkowski <contact@paulk.fr> wrote:
> > Hi,
> > 
> > Le jeudi 04 octobre 2018 à 14:10 -0400, Nicolas Dufresne a écrit :
> > > Le jeudi 04 octobre 2018 à 14:47 +0200, Paul Kocialkowski a écrit :
> > > > > +    Instance of struct v4l2_ctrl_h264_scaling_matrix, containing the scaling
> > > > > +    matrix to use when decoding the next queued frame. Applicable to the H.264
> > > > > +    stateless decoder.
> > > > > +
> > > > > +``V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM``
> > > > 
> > > > Ditto with "H264_SLICE_PARAMS".
> > > > 
> > > > > +    Array of struct v4l2_ctrl_h264_slice_param, containing at least as many
> > > > > +    entries as there are slices in the corresponding ``OUTPUT`` buffer.
> > > > > +    Applicable to the H.264 stateless decoder.
> > > > > +
> > > > > +``V4L2_CID_MPEG_VIDEO_H264_DECODE_PARAM``
> > > > > +    Instance of struct v4l2_ctrl_h264_decode_param, containing the high-level
> > > > > +    decoding parameters for a H.264 frame. Applicable to the H.264 stateless
> > > > > +    decoder.
> > > > 
> > > > Since we require all the macroblocks to decode one frame to be held in
> > > > the same OUTPUT buffer, it probably doesn't make sense to keep
> > > > DECODE_PARAM and SLICE_PARAM distinct.
> > > > 
> > > > I would suggest merging both in "SLICE_PARAMS", similarly to what I
> > > > have proposed for H.265: https://patchwork.kernel.org/patch/10578023/
> > > > 
> > > > What do you think?
> > > 
> > > I don't understand why we add this arbitrary restriction of "all the
> > > macroblocks to decode one frame". The bitstream may contain multiple
> > > NALs per frame (e.g. slices), and stateless API shall pass each NAL
> > > separately imho. The driver can then decide to combine them if needed,
> > > or to keep them seperate. I would expect most decoder to decode each
> > > slice independently from each other, even though they write into the
> > > same frame.
> > 
> > Well, we sort of always assumed that there is a 1:1 correspondency
> > between request and output frame when implemeting the software for
> > cedrus, which simplified both userspace and the driver. The approach we
> > have taken is to use one of the slice parameters for the whole series
> > of slices and just append the slice data.
> > 
> > Now that you bring it up, I realize this is an unfortunate decision.
> > This may have been the cause of bugs and limitations with our driver
> > because the slice parameters may very well be distinct for each slice.
> 
> I might be misunderstanding something, but, at least for the H.264
> API, there is no relation between the number of buffers/requests and
> number of slice parameters. The V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAM
> is an array, with each element describing each slice in the OUTPUT
> buffer. So actually, it could be up to the userspace if it want to
> have 1 OUTPUT buffer per slice or all slices in 1 OUTPUT buffer - the
> former would have v4l2_ctrl_h264_decode_param::num_slices = 1 and only
> one valid element in V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS.

It seems that we have totally missed that when implementing H.264
support for Cedrus and did not do anything similar for other codecs.

I think grouping slices in the same OUTPUT buffer and providing offsets
for each slice from each element of an array of SLICE_PARAMS controls
makes a lot of sense.

This way we don't have to have one OUTPUT buffer per slice, which
simplifies things a lot. Also, not having one request per slice will
probably speed things up.

However, I'm a bit confused when looking at the chromeos-4.4 code. It
seems that RK3288 only takes the parameters from the first slice (it's
not using DECODE_PARAM's num_slices). Also, RK3399 doesn't seem to use
the slice params at all. I'm really curious to understand how it works
for Rockchip. Perhaps someone has some insight about this?

Also, I'm not sure we have converged on a solution for the Rockchip VPU
requiring the start code before the coded data. Given that each slice
should be handled separately, does it mean the start code has to be
repeated for each?

> > Moreover, I suppose that just appending the slices data implies that
> > they are coded in the same order as the picture, which is probably
> > often the case but certainly not anything guaranteed.
> 
> Again, at least in the H.264 API being proposed here, the order of
> slices is not specified by the order of slice data in the buffer. Each
> entry of the V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS array points to the
> specific offset within the buffer.

Sounds good to me.

Cheers,

Paul

-- 
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface
  2018-10-09  5:58   ` Tomasz Figa
@ 2018-10-12 12:28     ` Paul Kocialkowski
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Kocialkowski @ 2018-10-12 12:28 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Alexandre Courbot, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Linux Media Mailing List,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 9623 bytes --]

Hi,

Le mardi 09 octobre 2018 à 14:58 +0900, Tomasz Figa a écrit :
> Hi Paul,
> 
> On Thu, Oct 4, 2018 at 9:40 PM Paul Kocialkowski <contact@paulk.fr> wrote:
> > Hi Alexandre,
> > 
> > Thanks for submitting this second version of the RFC, it is very
> > appreciated! I will try to provide useful feedback here and hopefully
> > be more reactive than during v1 review!
> > 
> > Most of it looks good to me, but there is a specific point I'd like to
> > keep discussing.
> > 
> > Le jeudi 04 octobre 2018 à 17:11 +0900, Alexandre Courbot a écrit :
> > > This patch documents the protocol that user-space should follow when
> > > communicating with stateless video decoders. It is based on the
> > > following references:
> > > 
> > > * The current protocol used by Chromium (converted from config store to
> > >   request API)
> > > 
> > > * The submitted Cedrus VPU driver
> > > 
> > > As such, some things may not be entirely consistent with the current
> > > state of drivers, so it would be great if all stakeholders could point
> > > out these inconsistencies. :)
> > > 
> > > This patch is supposed to be applied on top of the Request API V18 as
> > > well as the memory-to-memory video decoder interface series by Tomasz
> > > Figa.
> > > 
> > > Changes since V1:
> > > 
> > > * Applied fixes received as feedback,
> > > * Moved controls descriptions to the extended controls file,
> > > * Document reference frame management and referencing (need Hans' feedback on
> > >   that).
> > > 
> > > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > > ---
> > >  .../media/uapi/v4l/dev-stateless-decoder.rst  | 348 ++++++++++++++++++
> > >  Documentation/media/uapi/v4l/devices.rst      |   1 +
> > >  .../media/uapi/v4l/extended-controls.rst      |  25 ++
> > >  .../media/uapi/v4l/pixfmt-compressed.rst      |  54 ++-
> > >  4 files changed, 424 insertions(+), 4 deletions(-)
> > >  create mode 100644 Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > 
> > > diff --git a/Documentation/media/uapi/v4l/dev-stateless-decoder.rst b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > new file mode 100644
> > > index 000000000000..e54246df18d0
> > > --- /dev/null
> > > +++ b/Documentation/media/uapi/v4l/dev-stateless-decoder.rst
> > > @@ -0,0 +1,348 @@
> > > +.. -*- coding: utf-8; mode: rst -*-
> > > +
> > > +.. _stateless_decoder:
> > > +
> > > +**************************************************
> > > +Memory-to-memory Stateless Video Decoder Interface
> > > +**************************************************
> > > +
> > > +A stateless decoder is a decoder that works without retaining any kind of state
> > > +between processing frames. This means that each frame is decoded independently
> > > +of any previous and future frames, and that the client is responsible for
> > > +maintaining the decoding state and providing it to the driver. This is in
> > > +contrast to the stateful video decoder interface, where the hardware maintains
> > > +the decoding state and all the client has to do is to provide the raw encoded
> > > +stream.
> > > +
> > > +This section describes how user-space ("the client") is expected to communicate
> > > +with such decoders in order to successfully decode an encoded stream. Compared
> > > +to stateful codecs, the driver/client sequence is simpler, but the cost of this
> > > +simplicity is extra complexity in the client which must maintain a consistent
> > > +decoding state.
> > > +
> > > +Querying capabilities
> > > +=====================
> > > +
> > > +1. To enumerate the set of coded formats supported by the driver, the client
> > > +   calls :c:func:`VIDIOC_ENUM_FMT` on the ``OUTPUT`` queue.
> > > +
> > > +   * The driver must always return the full set of supported ``OUTPUT`` formats,
> > > +     irrespective of the format currently set on the ``CAPTURE`` queue.
> > > +
> > > +2. To enumerate the set of supported raw formats, the client calls
> > > +   :c:func:`VIDIOC_ENUM_FMT` on the ``CAPTURE`` queue.
> > > +
> > > +   * The driver must return only the formats supported for the format currently
> > > +     active on the ``OUTPUT`` queue.
> > > +
> > > +   * Depending on the currently set ``OUTPUT`` format, the set of supported raw
> > > +     formats may depend on the value of some controls (e.g. H264 or VP9
> > > +     profile). The client is responsible for making sure that these controls
> > > +     are set to the desired value before querying the ``CAPTURE`` queue.
> > 
> > I still think we have a problem when enumerating CAPTURE formats, that
> > providing the profile/level information does not help solving.
> > 
> > From previous emails on v1 (to which I failed to react to), it seems
> > that the consensus was to set the profile/level indication beforehand
> > to reduce the subset of possible formats and return that as enumerated
> > possible formats.
> 
> I think the consensus was to set all the the parsed header controls
> and actually Alex seems to have mentioned it slightly further in his
> patch:
> 
> +   * In order to enumerate raw formats supported by a given coded format, the
> +     client must thus set that coded format on the ``OUTPUT`` queue first, then
> +     set any control listed on the format's description, and finally enumerate
> +     the ``CAPTURE`` queue.
> 
> > However, it does not really solve the issue here, given the following
> > distinct cases:
> > 
> > 1. The VPU can only output the format for the decoded frame and that
> > format is not known until the first buffer metadata is passed.
> 
> That's why I later suggested metadata (parsed header controls) and not
> just some selective controls, such as profiles.

Oh sorry, it seems that I misunderstood this part. I totally agree with
the approach of setting the required codec controls then.

I will look into how that can be managed with VAAPI. I recall that the
format negotiation part happens very early (no metadata passed yet) and
I don't think there's a way to change the format afterwards. But if
there's a problem there, it's definitely on VAAPI's side and that
should not contaminate our API.

> > Everything that is reported as supported at this point should be
> > understood as supported formats for the decoded bitstreams, but
> > userspace would have to pick the one matching the decoded format of the
> > bitstream to decode. I don't really see the point of trying to reduce
> > that list by providing the profile/level.
> > 
> > 2. The VPU has some format conversion block in its pipeline and can
> > actually provide a range of different formats for CAPTURE buffers,
> > independently from the format of the decoded bitstream.
> > 
> > Either way, I think (correct me if I'm wrong) that players do know the
> > format from the decoded bitstream here, so enumeration only makes sense
> > for case 2.
> 
> Players don't know the format for the decoded bitstream, as I already
> explained before. From stream metadata they would only know whether
> the stream is YUV 4:2:0 vs 4:2:2, but wouldn't know the exact hardware
> constraints, e.g. whether NV12 or YUV420 is supported for given YUV
> 4:2:0 stream.

That's a good point, thanks!

> > Something we could do is to not enumerate any format for case 1., which
> > we would specify as an indication that only the decoded bitstream
> > format must be set. Then in case 2., we would enumerate the possible
> > formats.
> > 
> > For case 1., having the driver expose the supported profiles ensures
> > that any format in a supported profile is valid although not
> > enumerated.
> 
> Profile doesn't fully determine a specific pixel format, only the
> abstract format (see above).

From what I've understood, profile indicates which YUV sub-sampling
configuration is allowed, but there may be multiple ones. For instance,
the High 4:4:4 profile allows both 4:2:2 and 4:4:4 sub-sampling.

So I maintain that it doesn't make much sense to set the profile while
decoding. Instead, I think it should be a read-only control that
indicates what the hardware can do. Reading this control would allow
userspace to find out whether the current video can be decoded in
hardware or not (so this could be added to the Querying capabilities
part of the doc).

> > Alternatively, we could go with a control that indicates whether the
> > driver supports a format decorrelated from the decoded bitstream format
> > and still enumerate all formats in case 1., with the implication that
> > only the right one must be picked by userspace. Here again, I don't see
> > the point of reducing the list by setting the profile/level.
> > 
> > So my goal here is to clearly enable userspace to distinguish between
> > the two situations.
> > 
> > What do you think?
> 
> Why would we need to create a control, if we already have the ENUM_FMT
> API existing exactly to achieve this? We already have this problem
> solved for stateful decoders and if we request the userspace to
> actually set all the necessary metadata beforehand, the resulting
> behavior (initialization sequence) would be much more consistent
> between these 2 APIs.

Let's drop this idea altogether, setting required controls feels like a
much more generic and cleaner approach.

Cheers,

Paul

-- 
Developer of free digital technology and hardware support.

Website: https://www.paulk.fr/
Coding blog: https://code.paulk.fr/
Git repositories: https://git.paulk.fr/ https://git.code.paulk.fr/

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-10-12 12:28 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-04  8:11 [RFC PATCH v2] media: docs-rst: Document m2m stateless video decoder interface Alexandre Courbot
2018-10-04 12:41 ` Paul Kocialkowski
2018-10-09  5:58   ` Tomasz Figa
2018-10-12 12:28     ` Paul Kocialkowski
2018-10-04 12:47 ` Paul Kocialkowski
2018-10-04 18:10   ` Nicolas Dufresne
2018-10-05 17:10     ` Paul Kocialkowski
2018-10-08 10:21       ` Hans Verkuil
2018-10-09  7:36       ` Tomasz Figa
2018-10-12 11:52         ` Paul Kocialkowski
2018-10-09  7:30   ` Tomasz Figa
2018-10-09 17:48     ` Paul Kocialkowski
2018-10-08 11:01 ` Hans Verkuil
2018-10-09  7:41   ` Tomasz Figa
2018-10-09  8:07     ` Hans Verkuil
2018-10-09  5:48 ` Tomasz Figa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.