linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Document memory-to-memory video codec interfaces
@ 2018-07-24 14:06 Tomasz Figa
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
                   ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-07-24 14:06 UTC (permalink / raw)
  To: linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel, Tomasz Figa

This series attempts to add the documentation of what was discussed
during Media Workshops at LinuxCon Europe 2012 in Barcelona and then
later Embedded Linux Conference Europe 2014 in Düsseldorf and then
eventually written down by Pawel Osciak and tweaked a bit by Chrome OS
video team (but mostly in a cosmetic way or making the document more
precise), during the several years of Chrome OS using the APIs in
production.

Note that most, if not all, of the API is already implemented in
existing mainline drivers, such as s5p-mfc or mtk-vcodec. Intention of
this series is just to formalize what we already have.

It is an initial conversion from Google Docs to RST, so formatting is
likely to need some further polishing. It is also the first time for me
to create such long RST documention. I could not find any other instance
of similar userspace sequence specifications among our Media documents,
so I mostly followed what was there in the source. Feel free to suggest
a better format.

Much of credits should go to Pawel Osciak, for writing most of the
original text of the initial RFC.

Changes since RFC:
(https://lore.kernel.org/patchwork/project/lkml/list/?series=348588)
 - The number of changes is too big to list them all here. Thanks to
   a huge number of very useful comments from everyone (Philipp, Hans,
   Nicolas, Dave, Stanimir, Alexandre) we should have the interfaces much
   more specified now. The issues collected since previous revisions and
   answers leading to this revision are listed below.

General issues

  Should TRY_/S_FMT really return an error if invalid format is set,
  rather than falling back to some valid format? That would be
  contradicting to the general spec.

  Answer: Keep non-error behavior for existing spec compatibility, but
  consider returning error for Request API.

  The number of possible opens of M2M video node should not be
  artificially limited. Drivers should defer allocating limited resources
  (e.g. hardware instances) until initialization is attempted to allow
  probing and pre-opening of video nodes. (Hans suggested vb2 queue setup
  or REQBUFS.)

  Answer: Allocate hardware resources in REQBUFS (or later).

  How about colorimetry settings (colorspace, xfer function, etc.)?
  Normally it is not needed for decoding itself, but some codecs can parse
  it from the stream.  If user space can parse by itself, should it set it
  on OUTPUT queue?  What should happen on CAPTURE queue if colorimetry can
  be parsed, colorimetry can’t be parsed.

  Answer: Mention copying colorimetry from OUTPUT to CAPTURE queue only.
  Potentially extend for hardware that can do colorspace conversion later.

Decoder issues

  Is VIDIOC_ENUM_FRAMESIZES mandatory? Coda doesn’t implement it, s5p-mfc
  either.

  Answer: Make it mandatory. Otherwise nobody would implement it.

  Should we support all the three specification modes of
  VIDIOC_ENUM_FRAMESIZES (continuous, discrete and stepwise)? On both
  queues?

  Answer: Support all 3 size specification modes, not to diverge from
  general specification.

  Should ENUM_FRAMESIZES return coded or visible size?

  Answer: That should be the value that characterizes the stream, so
  coded size. Visible size is just a crop.

  How should ENUM_FRAMESIZES be affected by profiles and levels?

  Answer: Not in current specification - the logic is too complicated and
  it might make more sense to actually handle this in user space.  (In
  theory, level implies supported frame sizes + other factors.)

  Is VIDIOC_ENUM_FRAMEINTERVALS mandatory?  Coda doesn’t implement it,
  s5p-mfc either.  What is the meaning of frame interval for m2m in
  general?

  Answer: Do not include in this specification, because there is no way
  to return meaningful values for memory-to-memory devices.

  What to do for coded formats for which coded resolution can’t be parsed
  (due to format or hardware limitation)? Current draft mentions setting
  them on OUTPUT queue. What would be the effect on CAPTURE queue?
  Should OUTPUT queue format include width/height? Would that mean coded
  or visible size? If so, should they always be configured? Gstreamer
  seems to pass visible size from the container.

  Answer: If OUTPUT format has non-zero width and height, the driver must
  behave as it instantly parsed the coded size from the stream, including
  updating CAPTURE format and queuing source change event. If another
  parameters are parsed later by hardware, a dynamic resolution change
  sequence would be triggered. However, for hardware not parsing such
  parameters from the stream, stateless API should be seriously
  considered.

  How about the legacy behavior of G_FMT(CAPTURE) blocking until queued
  OUTPUT buffers are processed?

  Answer: Do not include in the specification, keep in existing drivers for
  compatibility.

  Should we allow preallocating CAPTURE queue before parsing as an
  optimization?  If user space allocated buffers bigger than required, it
  may be desirable to use them if hardware allows.  Similarly, if a
  decreasing resolution change happened, it may be desirable to avoid
  buffer reallocation.  Gstreamer seems to rely on this behavior to be
  allowed and works luckily because it allocates resolutions matching what
  is parsed later.

  Answer: Yes. The client can setup CAPTURE queue beforehand. The driver
  would still issue a source change event, but if existing buffers are
  compatible with driver requirements (size and count), there is no need to
  reallocate. Similarly for dynamic resolution change. 

  What is the meaning of CAPTURE format?  Should it be coded format,
  visible format or something else?

  Answer: It should be a hardware-specific frame buffer size (>= coded
  size), minimum needed for decoding to proceed.

  Which selection target should be used for visible rectangle? Should we
  also report CROP/COMPOSE_DEFAULT and COMPOSE_PADDED (the area that
  hardware actually overwrites)? How about CROP_BOUNDS?

  Answer: COMPOSE. Also require most of the other meaningful targets.
  Make them default to visible rectangle and, on hardware without
  crop/compose/scale ability, read-only.

  What if the hardware only supports handling complete frames?  Current
  draft says that Source OUTPUT buffers must contain: - H.264/AVC: one or
  more complete NALUs of an Annex B elementary stream; one buffer does not
  have to contain enough data to decode a frame;

  Answer: Defer to specification of particular V4L2_PIX_FMT (FourCC), to be
  further specified later. Current drivers seem to implement support for
  various formats in various ways (especially H264). Moreover, various
  userspace applications have their own way of splitting the bitstream. We
  need to keep all existing users working, so sorting this out will require
  quite a bit of effort and should not be blocking the already de facto
  defined part of the specification.

  Does the driver need to flush its CAPTURE queue internally when a seek is
  issued? Or the client needs to explicitly restart streaming on CAPTURE
  queue?

  Answer: No guarantees for CAPTURE queue from codec. User space needs to
  handle.

  Must all drivers support dynamic resolution change?  Gstreamer parses the
  stream itself and it can handle the change itself by resetting the
  decode.

  Answer: Yes, if it's a feature of the coded format. There is already
  userspace relying on this. A hardware that cannot support this, should
  likely use the stateless codec interface.

  What happens with OUTPUT queue format (resolution, colorimetry) after
  resolution change? Currently always 0 on s5p-mfc. mtk-vcodec reports
  coded resolution.

  Answer: Coded size on OUTPUT queue.

  Can we allow G_FMT(CAPTURE) after resolution change before
  REQBUFS(CAPTURE, 0)?  This would allow keeping current buffer set if the
  resolution decreased.

  Answer: Yes, even before STREAMOFF(CAPTURE).

  Should the client also read visible resolution after resolution change?
  Current draft doesn’t mention it.

  Answer: Yes.

  Is there a requirement or expectation for the encoded data to be framed
  as a single encoded frame per buffer, or is feeding in full buffer sized
  chunks from a ES valid?  It's not stated for the description of
  V4L2_PIX_FMT_H264 etc either.  Should we tie such requirements to
  particular format (FourCC)?

  Answer: Defer to specification of particular V4L2_PIX_FMT (FourCC), to be
  further specified later. Similarly to the earlier issue with H264.

  How about first frame in case of VP8, VP9 or H264_NO_SC? Should that
  include only headers?

  Answer: There is no separate header in case of VP8 and VP9. There are
  only full frames. V4L2_PIX_FMT_H264_NO_SC implies user space splitting
  headers (SPS, PPS) and frame data (slice) into separate buffers, due
  to the nature of the format.

  Should we have a separate format for headers and data in separate
  buffers?

  Answer: As with the other format-specific issues - defer to format
  specification.`

  How about timestamp copying between OUTPUT and CAPTURE buffers?
  The draft says - buffers may become available on the CAPTURE queue
  without additional buffers queued to OUTPUT (e.g. during flush or EOS)
  What timestamps would those buffers have?

  Answer: Those CAPTURE buffers would originate from an earelier OUTPUT
  buffer, just being delayed. Timestamp would match those OUTPUT buffers.

  Supposedly there are existing decoders that can’t deal with seek to a
  non-resume point and end up returning corrupt frames.

  Answer: There is userspace relying on this behavior not crashing the
  system or causing a fatal decode error. Corrupt frames are okay. We can
  extend the specification later with a control that gives a hint to the
  client.

  Maybe we should state what happens to reference buffers, things like DPB.
  Can we CMD_STOP, then V4L2_DEC_CMD_START and continue with the ref kept?

  Answer: Refs lost - same as STREAMOFF(CAPTURE), STREAMON(CAPTURE), except
  that buffers are successfully returned to user space.

  After I streamoff, do I need to send PPS/SPS again after STREAMON, or
  will the codec remember, and the following IDR is fine? (ndufresne: For
  sure the DPB will be gone)

  Answer: Decoder needs to keep PPS/SPS across STREAMOFF(OUTPUT),
  STREAMON(OUTPUT). If we seek to another place in the stream that
  references the same PPS/SPS, no need to queue the same PPS/SPS again
  (since decoder needs to hold it). If we seek somewhere far, skipping
  PPS/SPS on the way, we can’t guarantee anything. In practice most client
  implementations already include PPS/SPS at seek before IDR.

Encoder issues

  Is S_FMT() really mandatory during initialization sequence?  In theory,
  the client could just G_FMT() and use what’s already there. (tfiga: In
  practice unlikely.)

  Answer: Not mandatory, but it's the only thing that makes sense.

  When does the actual encoding start? Once both queues are streaming?

  Answer: When both queues start streaming.

  When does the encoding stop/resets? As soon as one queue receives
  STREAMOFF?

  Answer: STREAMOFF on CAPTURE. After restarting streaming on CAPTURE,
  encoder will generate a stream independent of the stream generated
  before. E.g. no references frames from before the restart (no H.264 long
  term reference), any headers that must be included in a standalone stream
  must be produced again. OUTPUT queue might be restarted on demand to
  let the client change the buffer set or extended later to support
  encoding streams with dynamic resolution changes.

  How should we handle hardware that cannot control encoding parameters
  dynamically?  Should the driver internally stop, reconfigure and restart?
  Or should we defer this to user space?

  Answer: Disallow setting respective controls when streaming.

  Which queue should be master, i.e. be capable of overriding settings on
  the other queue?

  Answer: CAPTURE, since coded format is likely to determine the list of
  supported raw formats.

  How should we describe the behavior of two queues?

  Answer: Say that standard M2M principles apply. Also mention no direct
  relation between order of raw frames being queued and encoded frames
  dequeued, other than timestamp.

  How should encoder controls be handled?  
  
  Answer: Keep up to the driver. Use Request API to set controls for exact
  frames.

  What should VIDIOC_STREAMON do on an already streaming queue, but after
  V4L2_ENC_CMD_STOP?
  https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/vidioc-encoder-cmd.html
  says A read() or VIDIOC_STREAMON call sends an implicit START command to
  the encoder if it has not been started yet.
  https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/vidioc-streamon.html
  says If VIDIOC_STREAMON is called when streaming is already in progress,
  or if VIDIOC_STREAMOFF is called when streaming is already stopped, then
  0 is returned. Nothing happens in the case of VIDIOC_STREAMON[...].

  Answer: Nothing, as per the specification. Use V4L2_ENC_CMD_START for
  resuming from pause.

Tomasz Figa (2):
  media: docs-rst: Document memory-to-memory video decoder interface
  media: docs-rst: Document memory-to-memory video encoder interface

 Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
 Documentation/media/uapi/v4l/dev-encoder.rst | 550 ++++++++++++
 Documentation/media/uapi/v4l/devices.rst     |   2 +
 Documentation/media/uapi/v4l/v4l2.rst        |  12 +-
 4 files changed, 1435 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
 create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst

-- 
2.18.0.233.g985f88cf7e-goog


^ permalink raw reply	[flat|nested] 62+ messages in thread

* [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 [PATCH 0/2] Document memory-to-memory video codec interfaces Tomasz Figa
@ 2018-07-24 14:06 ` Tomasz Figa
  2018-07-25 11:58   ` Hans Verkuil
                     ` (5 more replies)
  2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
                   ` (2 subsequent siblings)
  3 siblings, 6 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-07-24 14:06 UTC (permalink / raw)
  To: linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel, Tomasz Figa

Due to complexity of the video decoding process, the V4L2 drivers of
stateful decoder hardware require specific sequences of V4L2 API calls
to be followed. These include capability enumeration, initialization,
decoding, seek, pause, dynamic resolution change, drain and end of
stream.

Specifics of the above have been discussed during Media Workshops at
LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
Conference Europe 2014 in Düsseldorf. The de facto Codec API that
originated at those events was later implemented by the drivers we already
have merged in mainline, such as s5p-mfc or coda.

The only thing missing was the real specification included as a part of
Linux Media documentation. Fix it now and document the decoder part of
the Codec API.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
---
 Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
 Documentation/media/uapi/v4l/devices.rst     |   1 +
 Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
 3 files changed, 882 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst

diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst b/Documentation/media/uapi/v4l/dev-decoder.rst
new file mode 100644
index 000000000000..f55d34d2f860
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-decoder.rst
@@ -0,0 +1,872 @@
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _decoder:
+
+****************************************
+Memory-to-memory Video Decoder Interface
+****************************************
+
+Input data to a video decoder are buffers containing unprocessed video
+stream (e.g. Annex-B H.264/HEVC stream, raw VP8/9 stream). The driver is
+expected not to require any additional information from the client to
+process these buffers. Output data are raw video frames returned in display
+order.
+
+Performing software parsing, processing etc. of the stream in the driver
+in order to support this interface is strongly discouraged. In case such
+operations are needed, use of Stateless Video Decoder Interface (in
+development) is strongly advised.
+
+Conventions and notation used in this document
+==============================================
+
+1. The general V4L2 API rules apply if not specified in this document
+   otherwise.
+
+2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
+   2119.
+
+3. All steps not marked “optional” are required.
+
+4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
+   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
+   unless specified otherwise.
+
+5. Single-plane API (see spec) and applicable structures may be used
+   interchangeably with Multi-plane API, unless specified otherwise,
+   depending on driver capabilities and following the general V4L2
+   guidelines.
+
+6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
+   [0..2]: i = 0, 1, 2.
+
+7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
+   containing data (decoded frame/stream) that resulted from processing
+   buffer A.
+
+Glossary
+========
+
+CAPTURE
+   the destination buffer queue; the queue of buffers containing decoded
+   frames; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
+   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
+   hardware into ``CAPTURE`` buffers
+
+client
+   application client communicating with the driver implementing this API
+
+coded format
+   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.); see
+   also: raw format
+
+coded height
+   height for given coded resolution
+
+coded resolution
+   stream resolution in pixels aligned to codec and hardware requirements;
+   typically visible resolution rounded up to full macroblocks;
+   see also: visible resolution
+
+coded width
+   width for given coded resolution
+
+decode order
+   the order in which frames are decoded; may differ from display order if
+   coded format includes a feature of frame reordering; ``OUTPUT`` buffers
+   must be queued by the client in decode order
+
+destination
+   data resulting from the decode process; ``CAPTURE``
+
+display order
+   the order in which frames must be displayed; ``CAPTURE`` buffers must be
+   returned by the driver in display order
+
+DPB
+   Decoded Picture Buffer; a H.264 term for a buffer that stores a picture
+   that is encoded or decoded and available for reference in further
+   decode/encode steps.
+
+EOS
+   end of stream
+
+IDR
+   a type of a keyframe in H.264-encoded stream, which clears the list of
+   earlier reference frames (DPBs)
+
+keyframe
+   an encoded frame that does not reference frames decoded earlier, i.e.
+   can be decoded fully on its own.
+
+OUTPUT
+   the source buffer queue; the queue of buffers containing encoded
+   bitstream; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
+   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
+   from ``OUTPUT`` buffers
+
+PPS
+   Picture Parameter Set; a type of metadata entity in H.264 bitstream
+
+raw format
+   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
+
+resume point
+   a point in the bitstream from which decoding may start/continue, without
+   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
+   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
+   of a new stream, or to resume decoding after a seek
+
+source
+   data fed to the decoder; ``OUTPUT``
+
+SPS
+   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
+
+visible height
+   height for given visible resolution; display height
+
+visible resolution
+   stream resolution of the visible picture, in pixels, to be used for
+   display purposes; must be smaller or equal to coded resolution;
+   display resolution
+
+visible width
+   width for given visible resolution; display width
+
+Querying capabilities
+=====================
+
+1. To enumerate the set of coded formats supported by the driver, the
+   client may call :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
+
+   * The driver must always return the full set of supported formats,
+     irrespective of the format set on the ``CAPTURE``.
+
+2. To enumerate the set of supported raw formats, the client may call
+   :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
+
+   * The driver must return only the formats supported for the format
+     currently active on ``OUTPUT``.
+
+   * In order to enumerate raw formats supported by a given coded format,
+     the client must first set that coded format on ``OUTPUT`` and then
+     enumerate the ``CAPTURE`` queue.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing desired pixel format in
+   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
+     must include all possible coded resolutions supported by the decoder
+     for given coded pixel format.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
+     must include all possible frame buffer resolutions supported by the
+     decoder for given raw pixel format and coded format currently set on
+     ``OUTPUT``.
+
+    .. note::
+
+       The client may derive the supported resolution range for a
+       combination of coded and raw format by setting width and height of
+       ``OUTPUT`` format to 0 and calculating the intersection of
+       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
+       for the given coded and raw formats.
+
+4. Supported profiles and levels for given format, if applicable, may be
+   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
+
+Initialization
+==============
+
+1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
+   capability enumeration.
+
+2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT`
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``pixelformat``
+         a coded pixel format
+
+     ``width``, ``height``
+         required only if cannot be parsed from the stream for the given
+         coded format; optional otherwise - set to zero to ignore
+
+     other fields
+         follow standard semantics
+
+   * For coded formats including stream resolution information, if width
+     and height are set to non-zero values, the driver will propagate the
+     resolution to ``CAPTURE`` and signal a source change event
+     instantly. However, after the decoder is done parsing the
+     information embedded in the stream, it will update ``CAPTURE``
+     format with new values and signal a source change event again, if
+     the values do not match.
+
+   .. note::
+
+      Changing ``OUTPUT`` format may change currently set ``CAPTURE``
+      format. The driver will derive a new ``CAPTURE`` format from
+      ``OUTPUT`` format being set, including resolution, colorimetry
+      parameters, etc. If the client needs a specific ``CAPTURE`` format,
+      it must adjust it afterwards.
+
+3.  *[optional]* Get minimum number of buffers required for ``OUTPUT``
+    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
+    use more buffers than minimum required by hardware/format.
+
+    * **Required fields:**
+
+      ``id``
+          set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
+
+    * **Return fields:**
+
+      ``value``
+          required number of ``OUTPUT`` buffers for the currently set
+          format
+
+4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
+    ``OUTPUT``.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+      ``memory``
+          follows standard semantics
+
+      ``sizeimage``
+          follows standard semantics; the client is free to choose any
+          suitable size, however, it may be subject to change by the
+          driver
+
+    * **Return fields:**
+
+      ``count``
+          actual number of buffers allocated
+
+    * The driver must adjust count to minimum of required number of
+      ``OUTPUT`` buffers for given format and count passed. The client must
+      check this value after the ioctl returns to get the number of
+      buffers allocated.
+
+    .. note::
+
+       To allocate more than minimum number of buffers (for pipeline
+       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
+       get minimum number of buffers required by the driver/format,
+       and pass the obtained value plus the number of additional
+       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
+
+5.  Start streaming on ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
+
+6.  This step only applies to coded formats that contain resolution
+    information in the stream. Continue queuing/dequeuing bitstream
+    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
+    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and returning
+    each buffer to the client until required metadata to configure the
+    ``CAPTURE`` queue are found. This is indicated by the driver sending
+    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
+    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
+    requirement to pass enough data for this to occur in the first buffer
+    and the driver must be able to process any number.
+
+    * If data in a buffer that triggers the event is required to decode
+      the first frame, the driver must not return it to the client,
+      but must retain it for further decoding.
+
+    * If the client set width and height of ``OUTPUT`` format to 0, calling
+      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return -EPERM,
+      until the driver configures ``CAPTURE`` format according to stream
+      metadata.
+
+    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
+      the event is signaled, the decoding process will not continue until
+      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
+      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
+      command.
+
+    .. note::
+
+       No decoded frames are produced during this phase.
+
+7.  This step only applies to coded formats that contain resolution
+    information in the stream.
+    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
+    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
+    enough data is obtained from the stream to allocate ``CAPTURE``
+    buffers and to begin producing decoded frames.
+
+    * **Required fields:**
+
+      ``type``
+          set to ``V4L2_EVENT_SOURCE_CHANGE``
+
+    * **Return fields:**
+
+      ``u.src_change.changes``
+          set to ``V4L2_EVENT_SRC_CH_RESOLUTION``
+
+    * Any client query issued after the driver queues the event must return
+      values applying to the just parsed stream, including queue formats,
+      selection rectangles and controls.
+
+8.  Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
+    destination buffers parsed/decoded from the bitstream.
+
+    * **Required fields:**
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+    * **Return fields:**
+
+      ``width``, ``height``
+          frame buffer resolution for the decoded frames
+
+      ``pixelformat``
+          pixel format for decoded frames
+
+      ``num_planes`` (for _MPLANE ``type`` only)
+          number of planes for pixelformat
+
+      ``sizeimage``, ``bytesperline``
+          as per standard semantics; matching frame buffer format
+
+    .. note::
+
+       The value of ``pixelformat`` may be any pixel format supported and
+       must be supported for current stream, based on the information
+       parsed from the stream and hardware capabilities. It is suggested
+       that driver chooses the preferred/optimal format for given
+       configuration. For example, a YUV format may be preferred over an
+       RGB format, if additional conversion step would be required.
+
+9.  *[optional]* Enumerate ``CAPTURE`` formats via
+    :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE`` queue. Once the stream
+    information is parsed and known, the client may use this ioctl to
+    discover which raw formats are supported for given stream and select on
+    of them via :c:func:`VIDIOC_S_FMT`.
+
+    .. note::
+
+       The driver will return only formats supported for the current stream
+       parsed in this initialization sequence, even if more formats may be
+       supported by the driver in general.
+
+       For example, a driver/hardware may support YUV and RGB formats for
+       resolutions 1920x1088 and lower, but only YUV for higher
+       resolutions (due to hardware limitations). After parsing
+       a resolution of 1920x1088 or lower, :c:func:`VIDIOC_ENUM_FMT` may
+       return a set of YUV and RGB pixel formats, but after parsing
+       resolution higher than 1920x1088, the driver will not return RGB,
+       unsupported for this resolution.
+
+       However, subsequent resolution change event triggered after
+       discovering a resolution change within the same stream may switch
+       the stream into a lower resolution and :c:func:`VIDIOC_ENUM_FMT`
+       would return RGB formats again in that case.
+
+10.  *[optional]* Choose a different ``CAPTURE`` format than suggested via
+     :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the
+     client to choose a different format than selected/suggested by the
+     driver in :c:func:`VIDIOC_G_FMT`.
+
+     * **Required fields:**
+
+       ``type``
+           a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+       ``pixelformat``
+           a raw pixel format
+
+     .. note::
+
+        Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
+        formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to
+        find out a set of allowed formats for given configuration, but not
+        required, if the client can accept the defaults.
+
+11. *[optional]* Acquire visible resolution via
+    :c:func:`VIDIOC_G_SELECTION`.
+
+    * **Required fields:**
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+      ``target``
+          set to ``V4L2_SEL_TGT_COMPOSE``
+
+    * **Return fields:**
+
+      ``r.left``, ``r.top``, ``r.width``, ``r.height``
+          visible rectangle; this must fit within frame buffer resolution
+          returned by :c:func:`VIDIOC_G_FMT`.
+
+    * The driver must expose following selection targets on ``CAPTURE``:
+
+      ``V4L2_SEL_TGT_CROP_BOUNDS``
+          corresponds to coded resolution of the stream
+
+      ``V4L2_SEL_TGT_CROP_DEFAULT``
+          a rectangle covering the part of the frame buffer that contains
+          meaningful picture data (visible area); width and height will be
+          equal to visible resolution of the stream
+
+      ``V4L2_SEL_TGT_CROP``
+          rectangle within coded resolution to be output to ``CAPTURE``;
+          defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``; read-only on hardware
+          without additional compose/scaling capabilities
+
+      ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
+          maximum rectangle within ``CAPTURE`` buffer, which the cropped
+          frame can be output into; equal to ``V4L2_SEL_TGT_CROP``, if the
+          hardware does not support compose/scaling
+
+      ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
+          equal to ``V4L2_SEL_TGT_CROP``
+
+      ``V4L2_SEL_TGT_COMPOSE``
+          rectangle inside ``OUTPUT`` buffer into which the cropped frame
+          is output; defaults to ``V4L2_SEL_TGT_COMPOSE_DEFAULT``;
+          read-only on hardware without additional compose/scaling
+          capabilities
+
+      ``V4L2_SEL_TGT_COMPOSE_PADDED``
+          rectangle inside ``OUTPUT`` buffer which is overwritten by the
+          hardware; equal to ``V4L2_SEL_TGT_COMPOSE``, if the hardware
+          does not write padding pixels
+
+12. *[optional]* Get minimum number of buffers required for ``CAPTURE``
+    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
+    use more buffers than minimum required by hardware/format.
+
+    * **Required fields:**
+
+      ``id``
+          set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
+
+    * **Return fields:**
+
+      ``value``
+          minimum number of buffers required to decode the stream parsed in
+          this initialization sequence.
+
+    .. note::
+
+       Note that the minimum number of buffers must be at least the number
+       required to successfully decode the current stream. This may for
+       example be the required DPB size for an H.264 stream given the
+       parsed stream configuration (resolution, level).
+
+13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
+    on the ``CAPTURE`` queue.
+
+    * **Required fields:**
+
+      ``count``
+          requested number of buffers to allocate; greater than zero
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+      ``memory``
+          follows standard semantics
+
+    * **Return fields:**
+
+      ``count``
+          adjusted to allocated number of buffers
+
+    * The driver must adjust count to minimum of required number of
+      destination buffers for given format and stream configuration and the
+      count passed. The client must check this value after the ioctl
+      returns to get the number of buffers allocated.
+
+    .. note::
+
+       To allocate more than minimum number of buffers (for pipeline
+       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
+       get minimum number of buffers required, and pass the obtained value
+       plus the number of additional buffers needed in count to
+       :c:func:`VIDIOC_REQBUFS`.
+
+14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
+
+Decoding
+========
+
+This state is reached after a successful initialization sequence. In this
+state, client queues and dequeues buffers to both queues via
+:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
+semantics.
+
+Both queues operate independently, following standard behavior of V4L2
+buffer queues and memory-to-memory devices. In addition, the order of
+decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
+queuing coded frames to ``OUTPUT`` queue, due to properties of selected
+coded format, e.g. frame reordering. The client must not assume any direct
+relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
+reported by :c:type:`v4l2_buffer` ``timestamp`` field.
+
+The contents of source ``OUTPUT`` buffers depend on active coded pixel
+format and might be affected by codec-specific extended controls, as stated
+in documentation of each format individually.
+
+The client must not assume any direct relationship between ``CAPTURE``
+and ``OUTPUT`` buffers and any specific timing of buffers becoming
+available to dequeue. Specifically:
+
+* a buffer queued to ``OUTPUT`` may result in no buffers being produced
+  on ``CAPTURE`` (e.g. if it does not contain encoded data, or if only
+  metadata syntax structures are present in it),
+
+* a buffer queued to ``OUTPUT`` may result in more than 1 buffer produced
+  on ``CAPTURE`` (if the encoded data contained more than one frame, or if
+  returning a decoded frame allowed the driver to return a frame that
+  preceded it in decode, but succeeded it in display order),
+
+* a buffer queued to ``OUTPUT`` may result in a buffer being produced on
+  ``CAPTURE`` later into decode process, and/or after processing further
+  ``OUTPUT`` buffers, or be returned out of order, e.g. if display
+  reordering is used,
+
+* buffers may become available on the ``CAPTURE`` queue without additional
+  buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of
+  ``OUTPUT`` buffers being queued in the past and decoding result of which
+  being available only at later time, due to specifics of the decoding
+  process.
+
+Seek
+====
+
+Seek is controlled by the ``OUTPUT`` queue, as it is the source of
+bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
+
+1. Stop the ``OUTPUT`` queue to begin the seek sequence via
+   :c:func:`VIDIOC_STREAMOFF`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+   * The driver must drop all the pending ``OUTPUT`` buffers and they are
+     treated as returned to the client (following standard semantics).
+
+2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+   * The driver must be put in a state after seek and be ready to
+     accept new source bitstream buffers.
+
+3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
+   the seek until a suitable resume point is found.
+
+   .. note::
+
+      There is no requirement to begin queuing stream starting exactly from
+      a resume point (e.g. SPS or a keyframe). The driver must handle any
+      data queued and must keep processing the queued buffers until it
+      finds a suitable resume point. While looking for a resume point, the
+      driver processes ``OUTPUT`` buffers and returns them to the client
+      without producing any decoded frames.
+
+      For hardware known to be mishandling seeks to a non-resume point,
+      e.g. by returning corrupted decoded frames, the driver must be able
+      to handle such seeks without a crash or any fatal decode error.
+
+4. After a resume point is found, the driver will start returning
+   ``CAPTURE`` buffers with decoded frames.
+
+   * There is no precise specification for ``CAPTURE`` queue of when it
+     will start producing buffers containing decoded data from buffers
+     queued after the seek, as it operates independently
+     from ``OUTPUT`` queue.
+
+     * The driver is allowed to and may return a number of remaining
+       ``CAPTURE`` buffers containing decoded frames from before the seek
+       after the seek sequence (STREAMOFF-STREAMON) is performed.
+
+     * The driver is also allowed to and may not return all decoded frames
+       queued but not decode before the seek sequence was initiated. For
+       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
+       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
+       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
+       H’}, {A’, G’, H’}, {G’, H’}.
+
+   .. note::
+
+      To achieve instantaneous seek, the client may restart streaming on
+      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
+
+Pause
+=====
+
+In order to pause, the client should just cease queuing buffers onto the
+``OUTPUT`` queue. This is different from the general V4L2 API definition of
+pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
+Without source bitstream data, there is no data to process and the hardware
+remains idle.
+
+Conversely, using :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue indicates
+a seek, which
+
+1. drops all ``OUTPUT`` buffers in flight and
+2. after a subsequent :c:func:`VIDIOC_STREAMON`, will look for and only
+   continue from a resume point.
+
+This is usually undesirable for pause. The STREAMOFF-STREAMON sequence is
+intended for seeking.
+
+Similarly, ``CAPTURE`` queue should remain streaming as well, as the
+STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
+sets.
+
+Dynamic resolution change
+=========================
+
+A video decoder implementing this interface must support dynamic resolution
+change, for streams, which include resolution metadata in the bitstream.
+When the decoder encounters a resolution change in the stream, the dynamic
+resolution change sequence is started.
+
+1.  After encountering a resolution change in the stream, the driver must
+    first process and decode all remaining buffers from before the
+    resolution change point.
+
+2.  After all buffers containing decoded frames from before the resolution
+    change point are ready to be dequeued on the ``CAPTURE`` queue, the
+    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
+    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
+
+    * The last buffer from before the change must be marked with
+      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the
+      drain sequence. The last buffer might be empty (with
+      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by the
+      client, since it does not contain any decoded frame.
+
+    * Any client query issued after the driver queues the event must return
+      values applying to the stream after the resolution change, including
+      queue formats, selection rectangles and controls.
+
+    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
+      the event is signaled, the decoding process will not continue until
+      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
+      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
+      command.
+
+    .. note::
+
+       Any attempts to dequeue more buffers beyond the buffer marked
+       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
+       :c:func:`VIDIOC_DQBUF`.
+
+3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the new
+    format information. This is identical to calling :c:func:`VIDIOC_G_FMT`
+    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization sequence
+    and should be handled similarly.
+
+    .. note::
+
+       It is allowed for the driver not to support the same pixel format as
+       previously used (before the resolution change) for the new
+       resolution. The driver must select a default supported pixel format,
+       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the client
+       must take note of it.
+
+4.  The client acquires visible resolution as in initialization sequence.
+
+5.  *[optional]* The client is allowed to enumerate available formats and
+    select a different one than currently chosen (returned via
+    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step in
+    the initialization sequence.
+
+6.  *[optional]* The client acquires minimum number of buffers as in
+    initialization sequence.
+
+7.  If all the following conditions are met, the client may resume the
+    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
+    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
+    sequence:
+
+    * ``sizeimage`` of new format is less than or equal to the size of
+      currently allocated buffers,
+
+    * the number of buffers currently allocated is greater than or equal to
+      the minimum number of buffers acquired in step 6.
+
+    In such case, the remaining steps do not apply.
+
+    However, if the client intends to change the buffer set, to lower
+    memory usage or for any other reasons, it may be achieved by following
+    the steps below.
+
+8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue, the
+    client must call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
+    The ``OUTPUT`` queue must remain streaming (calling STREAMOFF on it
+    would trigger a seek).
+
+9.  The client frees the buffers on the ``CAPTURE`` queue using
+    :c:func:`VIDIOC_REQBUFS`.
+
+    * **Required fields:**
+
+      ``count``
+          set to 0
+
+      ``type``
+          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+      ``memory``
+          follows standard semantics
+
+10. The client allocates a new set of buffers for the ``CAPTURE`` queue via
+    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
+    the initialization sequence.
+
+11. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
+    ``CAPTURE`` queue.
+
+During the resolution change sequence, the ``OUTPUT`` queue must remain
+streaming. Calling :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue would
+initiate a seek.
+
+The ``OUTPUT`` queue operates separately from the ``CAPTURE`` queue for the
+duration of the entire resolution change sequence. It is allowed (and
+recommended for best performance and simplicity) for the client to keep
+queuing/dequeuing buffers from/to ``OUTPUT`` queue even while processing
+this sequence.
+
+.. note::
+
+   It is also possible for this sequence to be triggered without a change
+   in coded resolution, if a different number of ``CAPTURE`` buffers is
+   required in order to continue decoding the stream or the visible
+   resolution changes.
+
+Drain
+=====
+
+To ensure that all queued ``OUTPUT`` buffers have been processed and
+related ``CAPTURE`` buffers output to the client, the following drain
+sequence may be followed. After the drain sequence is complete, the client
+has received all decoded frames for all ``OUTPUT`` buffers queued before
+the sequence was started.
+
+1. Begin drain by issuing :c:func:`VIDIOC_DECODER_CMD`.
+
+   * **Required fields:**
+
+     ``cmd``
+         set to ``V4L2_DEC_CMD_STOP``
+
+     ``flags``
+         set to 0
+
+     ``pts``
+         set to 0
+
+2. The driver must process and decode as normal all ``OUTPUT`` buffers
+   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was issued.
+   Any operations triggered as a result of processing these buffers
+   (including the initialization and resolution change sequences) must be
+   processed as normal by both the driver and the client before proceeding
+   with the drain sequence.
+
+3. Once all ``OUTPUT`` buffers queued before ``V4L2_DEC_CMD_STOP`` are
+   processed:
+
+   * If the ``CAPTURE`` queue is streaming, once all decoded frames (if
+     any) are ready to be dequeued on the ``CAPTURE`` queue, the driver
+     must send a ``V4L2_EVENT_EOS``. The driver must also set
+     ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
+     buffer on the ``CAPTURE`` queue containing the last frame (if any)
+     produced as a result of processing the ``OUTPUT`` buffers queued
+     before ``V4L2_DEC_CMD_STOP``. If no more frames are left to be
+     returned at the point of handling ``V4L2_DEC_CMD_STOP``, the driver
+     must return an empty buffer (with :c:type:`v4l2_buffer`
+     ``bytesused`` = 0) as the last buffer with ``V4L2_BUF_FLAG_LAST`` set
+     instead. Any attempts to dequeue more buffers beyond the buffer marked
+     with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
+     :c:func:`VIDIOC_DQBUF`.
+
+   * If the ``CAPTURE`` queue is NOT streaming, no action is necessary for
+     ``CAPTURE`` queue and the driver must send a ``V4L2_EVENT_EOS``
+     immediately after all ``OUTPUT`` buffers in question have been
+     processed.
+
+4. At this point, decoding is paused and the driver will accept, but not
+   process any newly queued ``OUTPUT`` buffers until the client issues
+   ``V4L2_DEC_CMD_START`` or restarts streaming on any queue.
+
+* Once the drain sequence is initiated, the client needs to drive it to
+  completion, as described by the above steps, unless it aborts the process
+  by issuing :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue.  The client
+  is not allowed to issue ``V4L2_DEC_CMD_START`` or ``V4L2_DEC_CMD_STOP``
+  again while the drain sequence is in progress and they will fail with
+  -EBUSY error code if attempted.
+
+* Restarting streaming on ``OUTPUT`` queue will implicitly end the paused
+  state and reinitialize the decoder (similarly to the seek sequence).
+  Restarting ``CAPTURE`` queue will not affect an in-progress drain
+  sequence.
+
+* The drivers must also implement :c:func:`VIDIOC_TRY_DECODER_CMD`, as a
+  way to let the client query the availability of decoder commands.
+
+End of stream
+=============
+
+If the decoder encounters an end of stream marking in the stream, the
+driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
+are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
+:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
+behavior is identical to the drain sequence triggered by the client via
+``V4L2_DEC_CMD_STOP``.
+
+Commit points
+=============
+
+Setting formats and allocating buffers triggers changes in the behavior
+of the driver.
+
+1. Setting format on ``OUTPUT`` queue may change the set of formats
+   supported/advertised on the ``CAPTURE`` queue. In particular, it also
+   means that ``CAPTURE`` format may be reset and the client must not
+   rely on the previously set format being preserved.
+
+2. Enumerating formats on ``CAPTURE`` queue must only return formats
+   supported for the ``OUTPUT`` format currently set.
+
+3. Setting/changing format on ``CAPTURE`` queue does not change formats
+   available on ``OUTPUT`` queue. An attempt to set ``CAPTURE`` format that
+   is not supported for the currently selected ``OUTPUT`` format must
+   result in the driver adjusting the requested format to an acceptable
+   one.
+
+4. Enumerating formats on ``OUTPUT`` queue always returns the full set of
+   supported coded formats, irrespective of the current ``CAPTURE``
+   format.
+
+5. After allocating buffers on the ``OUTPUT`` queue, it is not possible to
+   change format on it.
+
+To summarize, setting formats and allocation must always start with the
+``OUTPUT`` queue and the ``OUTPUT`` queue is the master that governs the
+set of supported formats for the ``CAPTURE`` queue.
diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
index fb7f8c26cf09..12d43fe711cf 100644
--- a/Documentation/media/uapi/v4l/devices.rst
+++ b/Documentation/media/uapi/v4l/devices.rst
@@ -15,6 +15,7 @@ Interfaces
     dev-output
     dev-osd
     dev-codec
+    dev-decoder
     dev-effect
     dev-raw-vbi
     dev-sliced-vbi
diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
index b89e5621ae69..65dc096199ad 100644
--- a/Documentation/media/uapi/v4l/v4l2.rst
+++ b/Documentation/media/uapi/v4l/v4l2.rst
@@ -53,6 +53,10 @@ Authors, in alphabetical order:
 
   - Original author of the V4L2 API and documentation.
 
+- Figa, Tomasz <tfiga@chromium.org>
+
+  - Documented the memory-to-memory decoder interface.
+
 - H Schimek, Michael <mschimek@gmx.at>
 
   - Original author of the V4L2 API and documentation.
@@ -61,6 +65,10 @@ Authors, in alphabetical order:
 
   - Documented the Digital Video timings API.
 
+- Osciak, Pawel <posciak@chromium.org>
+
+  - Documented the memory-to-memory decoder interface.
+
 - Osciak, Pawel <pawel@osciak.com>
 
   - Designed and documented the multi-planar API.
@@ -85,7 +93,7 @@ Authors, in alphabetical order:
 
   - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
 
-**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
+**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari, Tomasz Figa
 
 Except when explicitly stated as GPL, programming examples within this
 part can be used and distributed without restrictions.
-- 
2.18.0.233.g985f88cf7e-goog


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-24 14:06 [PATCH 0/2] Document memory-to-memory video codec interfaces Tomasz Figa
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
@ 2018-07-24 14:06 ` Tomasz Figa
  2018-07-25 13:41   ` Philipp Zabel
                     ` (3 more replies)
  2018-07-25 13:28 ` [PATCH 0/2] Document memory-to-memory video codec interfaces Philipp Zabel
  2018-09-10  9:13 ` Hans Verkuil
  3 siblings, 4 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-07-24 14:06 UTC (permalink / raw)
  To: linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel, Tomasz Figa

Due to complexity of the video encoding process, the V4L2 drivers of
stateful encoder hardware require specific sequences of V4L2 API calls
to be followed. These include capability enumeration, initialization,
encoding, encode parameters change, drain and reset.

Specifics of the above have been discussed during Media Workshops at
LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
Conference Europe 2014 in Düsseldorf. The de facto Codec API that
originated at those events was later implemented by the drivers we already
have merged in mainline, such as s5p-mfc or coda.

The only thing missing was the real specification included as a part of
Linux Media documentation. Fix it now and document the encoder part of
the Codec API.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
---
 Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
 Documentation/media/uapi/v4l/devices.rst     |   1 +
 Documentation/media/uapi/v4l/v4l2.rst        |   2 +
 3 files changed, 553 insertions(+)
 create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst

diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
new file mode 100644
index 000000000000..28be1698e99c
--- /dev/null
+++ b/Documentation/media/uapi/v4l/dev-encoder.rst
@@ -0,0 +1,550 @@
+.. -*- coding: utf-8; mode: rst -*-
+
+.. _encoder:
+
+****************************************
+Memory-to-memory Video Encoder Interface
+****************************************
+
+Input data to a video encoder are raw video frames in display order
+to be encoded into the output bitstream. Output data are complete chunks of
+valid bitstream, including all metadata, headers, etc. The resulting stream
+must not need any further post-processing by the client.
+
+Performing software stream processing, header generation etc. in the driver
+in order to support this interface is strongly discouraged. In case such
+operations are needed, use of Stateless Video Encoder Interface (in
+development) is strongly advised.
+
+Conventions and notation used in this document
+==============================================
+
+1. The general V4L2 API rules apply if not specified in this document
+   otherwise.
+
+2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
+   2119.
+
+3. All steps not marked “optional” are required.
+
+4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
+   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
+   unless specified otherwise.
+
+5. Single-plane API (see spec) and applicable structures may be used
+   interchangeably with Multi-plane API, unless specified otherwise,
+   depending on driver capabilities and following the general V4L2
+   guidelines.
+
+6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
+   [0..2]: i = 0, 1, 2.
+
+7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
+   containing data (encoded frame/stream) that resulted from processing
+   buffer A.
+
+Glossary
+========
+
+CAPTURE
+   the destination buffer queue; the queue of buffers containing encoded
+   bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
+   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
+   hardware into ``CAPTURE`` buffers
+
+client
+   application client communicating with the driver implementing this API
+
+coded format
+   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.);
+   see also: raw format
+
+coded height
+   height for given coded resolution
+
+coded resolution
+   stream resolution in pixels aligned to codec and hardware requirements;
+   typically visible resolution rounded up to full macroblocks; see also:
+   visible resolution
+
+coded width
+   width for given coded resolution
+
+decode order
+   the order in which frames are decoded; may differ from display order if
+   coded format includes a feature of frame reordering; ``CAPTURE`` buffers
+   must be returned by the driver in decode order
+
+display order
+   the order in which frames must be displayed; ``OUTPUT`` buffers must be
+   queued by the client in display order
+
+IDR
+   a type of a keyframe in H.264-encoded stream, which clears the list of
+   earlier reference frames (DPBs)
+
+keyframe
+   an encoded frame that does not reference frames decoded earlier, i.e.
+   can be decoded fully on its own.
+
+macroblock
+   a processing unit in image and video compression formats based on linear
+   block transforms (e.g. H264, VP8, VP9); codec-specific, but for most of
+   popular codecs the size is 16x16 samples (pixels)
+
+OUTPUT
+   the source buffer queue; the queue of buffers containing raw frames;
+   ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
+   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
+   from ``OUTPUT`` buffers
+
+PPS
+   Picture Parameter Set; a type of metadata entity in H.264 bitstream
+
+raw format
+   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
+
+resume point
+   a point in the bitstream from which decoding may start/continue, without
+   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
+   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
+   of a new stream, or to resume decoding after a seek
+
+source
+   data fed to the encoder; ``OUTPUT``
+
+source height
+   height in pixels for given source resolution
+
+source resolution
+   resolution in pixels of source frames being source to the encoder and
+   subject to further cropping to the bounds of visible resolution
+
+source width
+   width in pixels for given source resolution
+
+SPS
+   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
+
+stream metadata
+   additional (non-visual) information contained inside encoded bitstream;
+   for example: coded resolution, visible resolution, codec profile
+
+visible height
+   height for given visible resolution; display height
+
+visible resolution
+   stream resolution of the visible picture, in pixels, to be used for
+   display purposes; must be smaller or equal to coded resolution;
+   display resolution
+
+visible width
+   width for given visible resolution; display width
+
+Querying capabilities
+=====================
+
+1. To enumerate the set of coded formats supported by the driver, the
+   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
+
+   * The driver must always return the full set of supported formats,
+     irrespective of the format set on the ``OUTPUT`` queue.
+
+2. To enumerate the set of supported raw formats, the client may call
+   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
+
+   * The driver must return only the formats supported for the format
+     currently active on ``CAPTURE``.
+
+   * In order to enumerate raw formats supported by a given coded format,
+     the client must first set that coded format on ``CAPTURE`` and then
+     enumerate the ``OUTPUT`` queue.
+
+3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
+   resolutions for a given format, passing desired pixel format in
+   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
+     must include all possible coded resolutions supported by the encoder
+     for given coded pixel format.
+
+   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
+     queue must include all possible frame buffer resolutions supported
+     by the encoder for given raw pixel format and coded format currently
+     set on ``CAPTURE``.
+
+4. Supported profiles and levels for given format, if applicable, may be
+   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
+
+5. Any additional encoder capabilities may be discovered by querying
+   their respective controls.
+
+Initialization
+==============
+
+1. *[optional]* Enumerate supported formats and resolutions. See
+   capability enumeration.
+
+2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
+
+     ``pixelformat``
+         set to a coded format to be produced
+
+   * **Return fields:**
+
+     ``width``, ``height``
+         coded resolution (based on currently active ``OUTPUT`` format)
+
+   .. note::
+
+      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
+      format. The driver will derive a new ``OUTPUT`` format from
+      ``CAPTURE`` format being set, including resolution, colorimetry
+      parameters, etc. If the client needs a specific ``OUTPUT`` format,
+      it must adjust it afterwards.
+
+3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
+   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``index``
+         follows standard semantics
+
+   * **Return fields:**
+
+     ``pixelformat``
+         raw format supported for the coded format currently selected on
+         the ``OUTPUT`` queue.
+
+4. The client may set the raw source format on the ``OUTPUT`` queue via
+   :c:func:`VIDIOC_S_FMT`.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``pixelformat``
+         raw format of the source
+
+     ``width``, ``height``
+         source resolution
+
+     ``num_planes`` (for _MPLANE)
+         set to number of planes for pixelformat
+
+     ``sizeimage``, ``bytesperline``
+         follow standard semantics
+
+   * **Return fields:**
+
+     ``width``, ``height``
+         may be adjusted by driver to match alignment requirements, as
+         required by the currently selected formats
+
+     ``sizeimage``, ``bytesperline``
+         follow standard semantics
+
+   * Setting the source resolution will reset visible resolution to the
+     adjusted source resolution rounded up to the closest visible
+     resolution supported by the driver. Similarly, coded resolution will
+     be reset to source resolution rounded up to the closest coded
+     resolution supported by the driver (typically a multiple of
+     macroblock size).
+
+   .. note::
+
+      This step is not strictly required, since ``OUTPUT`` is expected to
+      have a valid default format. However, the client needs to ensure that
+      ``OUTPUT`` format matches its expectations via either
+      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
+      being the typical scenario, since the default format is unlikely to
+      be what the client needs.
+
+5. *[optional]* Set visible resolution for the stream metadata via
+   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
+
+   * **Required fields:**
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
+
+     ``target``
+         set to ``V4L2_SEL_TGT_CROP``
+
+     ``r.left``, ``r.top``, ``r.width``, ``r.height``
+         visible rectangle; this must fit within the framebuffer resolution
+         and might be subject to adjustment to match codec and hardware
+         constraints
+
+   * **Return fields:**
+
+     ``r.left``, ``r.top``, ``r.width``, ``r.height``
+         visible rectangle adjusted by the driver
+
+   * The driver must expose following selection targets on ``OUTPUT``:
+
+     ``V4L2_SEL_TGT_CROP_BOUNDS``
+         maximum crop bounds within the source buffer supported by the
+         encoder
+
+     ``V4L2_SEL_TGT_CROP_DEFAULT``
+         suggested cropping rectangle that covers the whole source picture
+
+     ``V4L2_SEL_TGT_CROP``
+         rectangle within the source buffer to be encoded into the
+         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
+
+     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
+         maximum rectangle within the coded resolution, which the cropped
+         source frame can be output into; always equal to (0, 0)x(width of
+         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
+         hardware does not support compose/scaling
+
+     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
+         equal to ``V4L2_SEL_TGT_CROP``
+
+     ``V4L2_SEL_TGT_COMPOSE``
+         rectangle within the coded frame, which the cropped source frame
+         is to be output into; defaults to
+         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
+         additional compose/scaling capabilities; resulting stream will
+         have this rectangle encoded as the visible rectangle in its
+         metadata
+
+     ``V4L2_SEL_TGT_COMPOSE_PADDED``
+         always equal to coded resolution of the stream, as selected by the
+         encoder based on source resolution and crop/compose rectangles
+
+   .. note::
+
+      The driver may adjust the crop/compose rectangles to the nearest
+      supported ones to meet codec and hardware requirements.
+
+6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
+   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
+
+   * **Required fields:**
+
+     ``count``
+         requested number of buffers to allocate; greater than zero
+
+     ``type``
+         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
+         ``CAPTURE``
+
+     ``memory``
+         follows standard semantics
+
+   * **Return fields:**
+
+     ``count``
+         adjusted to allocated number of buffers
+
+   * The driver must adjust count to minimum of required number of
+     buffers for given format and count passed. The client must
+     check this value after the ioctl returns to get the number of
+     buffers actually allocated.
+
+   .. note::
+
+      To allocate more than minimum number of buffers (for pipeline
+      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
+      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
+      to get the minimum number of buffers required by the
+      driver/format, and pass the obtained value plus the number of
+      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.
+
+7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
+   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual
+   encoding process starts when both queues start streaming.
+
+.. note::
+
+   If the client stops ``CAPTURE`` during the encode process and then
+   restarts it again, the encoder will be expected to generate a stream
+   independent from the stream generated before the stop. Depending on the
+   coded format, that may imply that:
+
+   * encoded frames produced after the restart must not reference any
+     frames produced before the stop, e.g. no long term references for
+     H264,
+
+   * any headers that must be included in a standalone stream must be
+     produced again, e.g. SPS and PPS for H264.
+
+Encoding
+========
+
+This state is reached after a successful initialization sequence. In
+this state, client queues and dequeues buffers to both queues via
+:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
+semantics.
+
+Both queues operate independently, following standard behavior of V4L2
+buffer queues and memory-to-memory devices. In addition, the order of
+encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
+queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
+format, e.g. frame reordering. The client must not assume any direct
+relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
+reported by :c:type:`v4l2_buffer` ``timestamp``.
+
+Encoding parameter changes
+==========================
+
+The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
+parameters at any time. The availability of parameters is driver-specific
+and the client must query the driver to find the set of available controls.
+
+The ability to change each parameter during encoding of is driver-specific,
+as per standard semantics of the V4L2 control interface. The client may
+attempt setting a control of its interest during encoding and if it the
+operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be
+stopped for the configuration change to be allowed (following the drain
+sequence will be  needed to avoid losing already queued/encoded frames).
+
+The timing of parameter update is driver-specific, as per standard
+semantics of the V4L2 control interface. If the client needs to apply the
+parameters exactly at specific frame and the encoder supports it, using
+Request API should be considered.
+
+Drain
+=====
+
+To ensure that all queued ``OUTPUT`` buffers have been processed and
+related ``CAPTURE`` buffers output to the client, the following drain
+sequence may be followed. After the drain sequence is complete, the client
+has received all encoded frames for all ``OUTPUT`` buffers queued before
+the sequence was started.
+
+1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
+
+   * **Required fields:**
+
+     ``cmd``
+         set to ``V4L2_ENC_CMD_STOP``
+
+     ``flags``
+         set to 0
+
+     ``pts``
+         set to 0
+
+2. The driver must process and encode as normal all ``OUTPUT`` buffers
+   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
+
+3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
+   processed:
+
+   * Once all decoded frames (if any) are ready to be dequeued on the
+     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
+     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
+     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
+     last frame (if any) produced as a result of processing the ``OUTPUT``
+     buffers queued before
+     ``V4L2_ENC_CMD_STOP``.
+
+   * If no more frames are left to be returned at the point of handling
+     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
+     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
+     ``V4L2_BUF_FLAG_LAST`` set.
+
+   * Any attempts to dequeue more buffers beyond the buffer marked with
+     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
+     :c:func:`VIDIOC_DQBUF`.
+
+4. At this point, encoding is paused and the driver will accept, but not
+   process any newly queued ``OUTPUT`` buffers until the client issues
+   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
+
+* Once the drain sequence is initiated, the client needs to drive it to
+  completion, as described by the above steps, unless it aborts the process
+  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
+  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
+  again while the drain sequence is in progress and they will fail with
+  -EBUSY error code if attempted.
+
+* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
+  state and make the encoder continue encoding, as long as other encoding
+  conditions are met. Restarting ``OUTPUT`` queue will not affect an
+  in-progress drain sequence.
+
+* The drivers must also implement :c:func:`VIDIOC_TRY_ENCODER_CMD`, as a
+  way to let the client query the availability of encoder commands.
+
+Reset
+=====
+
+The client may want to request the encoder to reinitialize the encoding,
+so that the stream produced becomes independent from the stream generated
+before. Depending on the coded format, that may imply that:
+
+* encoded frames produced after the restart must not reference any frames
+  produced before the stop, e.g. no long term references for H264,
+
+* any headers that must be included in a standalone stream must be produced
+  again, e.g. SPS and PPS for H264.
+
+This can be achieved by performing the reset sequence.
+
+1. *[optional]* If the client is interested in encoded frames resulting
+   from already queued source frames, it needs to perform the Drain
+   sequence. Otherwise, the reset sequence would cause the already
+   encoded and not dequeued encoded frames to be lost.
+
+2. Stop streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMOFF`. This
+   will return all currently queued ``CAPTURE`` buffers to the client,
+   without valid frame data.
+
+3. *[optional]* Restart streaming on ``OUTPUT`` queue via
+   :c:func:`VIDIOC_STREAMOFF` followed by :c:func:`VIDIOC_STREAMON` to
+   drop any source frames enqueued to the encoder before the reset
+   sequence. This is useful if the client requires the new stream to begin
+   at specific source frame. Otherwise, the new stream might include
+   frames encoded from source frames queued before the reset sequence.
+
+4. Restart streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMON` and
+   continue with regular encoding sequence. The encoded frames produced
+   into ``CAPTURE`` buffers from now on will contain a standalone stream
+   that can be decoded without the need for frames encoded before the reset
+   sequence.
+
+Commit points
+=============
+
+Setting formats and allocating buffers triggers changes in the behavior
+of the driver.
+
+1. Setting format on ``CAPTURE`` queue may change the set of formats
+   supported/advertised on the ``OUTPUT`` queue. In particular, it also
+   means that ``OUTPUT`` format may be reset and the client must not
+   rely on the previously set format being preserved.
+
+2. Enumerating formats on ``OUTPUT`` queue must only return formats
+   supported for the ``CAPTURE`` format currently set.
+
+3. Setting/changing format on ``OUTPUT`` queue does not change formats
+   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that
+   is not supported for the currently selected ``CAPTURE`` format must
+   result in the driver adjusting the requested format to an acceptable
+   one.
+
+4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
+   supported coded formats, irrespective of the current ``OUTPUT``
+   format.
+
+5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
+   change format on it.
+
+To summarize, setting formats and allocation must always start with the
+``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
+set of supported formats for the ``OUTPUT`` queue.
diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
index 12d43fe711cf..1822c66c2154 100644
--- a/Documentation/media/uapi/v4l/devices.rst
+++ b/Documentation/media/uapi/v4l/devices.rst
@@ -16,6 +16,7 @@ Interfaces
     dev-osd
     dev-codec
     dev-decoder
+    dev-encoder
     dev-effect
     dev-raw-vbi
     dev-sliced-vbi
diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
index 65dc096199ad..2ef6693b9499 100644
--- a/Documentation/media/uapi/v4l/v4l2.rst
+++ b/Documentation/media/uapi/v4l/v4l2.rst
@@ -56,6 +56,7 @@ Authors, in alphabetical order:
 - Figa, Tomasz <tfiga@chromium.org>
 
   - Documented the memory-to-memory decoder interface.
+  - Documented the memory-to-memory encoder interface.
 
 - H Schimek, Michael <mschimek@gmx.at>
 
@@ -68,6 +69,7 @@ Authors, in alphabetical order:
 - Osciak, Pawel <posciak@chromium.org>
 
   - Documented the memory-to-memory decoder interface.
+  - Documented the memory-to-memory encoder interface.
 
 - Osciak, Pawel <pawel@osciak.com>
 
-- 
2.18.0.233.g985f88cf7e-goog


^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
@ 2018-07-25 11:58   ` Hans Verkuil
  2018-07-26 10:20     ` Tomasz Figa
  2018-07-30 12:52   ` Hans Verkuil
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-07-25 11:58 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

Hi Tomasz,

Many, many thanks for working on this! It's a great document and when done
it will be very useful indeed.

Review comments follow...

On 24/07/18 16:06, Tomasz Figa wrote:
> Due to complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, drain and end of
> stream.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
>  3 files changed, 882 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst b/Documentation/media/uapi/v4l/dev-decoder.rst
> new file mode 100644
> index 000000000000..f55d34d2f860
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> @@ -0,0 +1,872 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _decoder:
> +
> +****************************************
> +Memory-to-memory Video Decoder Interface
> +****************************************
> +
> +Input data to a video decoder are buffers containing unprocessed video
> +stream (e.g. Annex-B H.264/HEVC stream, raw VP8/9 stream). The driver is
> +expected not to require any additional information from the client to
> +process these buffers. Output data are raw video frames returned in display
> +order.
> +
> +Performing software parsing, processing etc. of the stream in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Decoder Interface (in
> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> +   unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise,
> +   depending on driver capabilities and following the general V4L2
> +   guidelines.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> +   containing data (decoded frame/stream) that resulted from processing
> +   buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue; the queue of buffers containing decoded
> +   frames; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> +   hardware into ``CAPTURE`` buffers
> +
> +client
> +   application client communicating with the driver implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.); see
> +   also: raw format
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec and hardware requirements;
> +   typically visible resolution rounded up to full macroblocks;
> +   see also: visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ from display order if
> +   coded format includes a feature of frame reordering; ``OUTPUT`` buffers
> +   must be queued by the client in decode order
> +
> +destination
> +   data resulting from the decode process; ``CAPTURE``
> +
> +display order
> +   the order in which frames must be displayed; ``CAPTURE`` buffers must be
> +   returned by the driver in display order
> +
> +DPB
> +   Decoded Picture Buffer; a H.264 term for a buffer that stores a picture

a H.264 -> an H.264

> +   that is encoded or decoded and available for reference in further
> +   decode/encode steps.
> +
> +EOS
> +   end of stream
> +
> +IDR
> +   a type of a keyframe in H.264-encoded stream, which clears the list of
> +   earlier reference frames (DPBs)

You do not actually say what IDR stands for. Can you add that?

> +
> +keyframe
> +   an encoded frame that does not reference frames decoded earlier, i.e.
> +   can be decoded fully on its own.
> +
> +OUTPUT
> +   the source buffer queue; the queue of buffers containing encoded
> +   bitstream; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> +   from ``OUTPUT`` buffers
> +
> +PPS
> +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may start/continue, without
> +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
> +   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
> +   of a new stream, or to resume decoding after a seek
> +
> +source
> +   data fed to the decoder; ``OUTPUT``
> +
> +SPS
> +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +visible height
> +   height for given visible resolution; display height
> +
> +visible resolution
> +   stream resolution of the visible picture, in pixels, to be used for
> +   display purposes; must be smaller or equal to coded resolution;
> +   display resolution
> +
> +visible width
> +   width for given visible resolution; display width
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> +
> +   * The driver must always return the full set of supported formats,
> +     irrespective of the format set on the ``CAPTURE``.
> +
> +2. To enumerate the set of supported raw formats, the client may call
> +   :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> +
> +   * The driver must return only the formats supported for the format
> +     currently active on ``OUTPUT``.
> +
> +   * In order to enumerate raw formats supported by a given coded format,
> +     the client must first set that coded format on ``OUTPUT`` and then
> +     enumerate the ``CAPTURE`` queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing desired pixel format in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
> +     must include all possible coded resolutions supported by the decoder
> +     for given coded pixel format.

This is confusing. Since VIDIOC_ENUM_FRAMESIZES does not have a buffer type
argument you cannot say 'on OUTPUT'. I would remove 'on OUTPUT' entirely.

> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``

Ditto for 'on CAPTURE'

> +     must include all possible frame buffer resolutions supported by the
> +     decoder for given raw pixel format and coded format currently set on
> +     ``OUTPUT``.
> +
> +    .. note::
> +
> +       The client may derive the supported resolution range for a
> +       combination of coded and raw format by setting width and height of
> +       ``OUTPUT`` format to 0 and calculating the intersection of
> +       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> +       for the given coded and raw formats.

So if the output format is set to 1280x720, then ENUM_FRAMESIZES would just
return 1280x720 as the resolution. If the output format is set to 0x0, then
it returns the full range it is capable of.

Correct?

If so, then I think this needs to be a bit more explicit. I had to think about
it a bit.

Note that the v4l2_pix_format/S_FMT documentation needs to be updated as well
since we never allowed 0x0 before.

What if you set the format to 0x0 but the stream does not have meta data with
the resolution? How does userspace know if 0x0 is allowed or not? If this is
specific to the chosen coded pixel format, should be add a new flag for those
formats indicating that the coded data contains resolution information?

That way userspace knows if 0x0 can be used, and the driver can reject 0x0
for formats that do not support it.

> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> +   capability enumeration.

capability enumeration. -> 'Querying capabilities' above.

> +
> +2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         a coded pixel format
> +
> +     ``width``, ``height``
> +         required only if cannot be parsed from the stream for the given
> +         coded format; optional otherwise - set to zero to ignore
> +
> +     other fields
> +         follow standard semantics
> +
> +   * For coded formats including stream resolution information, if width
> +     and height are set to non-zero values, the driver will propagate the
> +     resolution to ``CAPTURE`` and signal a source change event
> +     instantly. However, after the decoder is done parsing the
> +     information embedded in the stream, it will update ``CAPTURE``
> +     format with new values and signal a source change event again, if
> +     the values do not match.
> +
> +   .. note::
> +
> +      Changing ``OUTPUT`` format may change currently set ``CAPTURE``

change -> change the

> +      format. The driver will derive a new ``CAPTURE`` format from

from -> from the

> +      ``OUTPUT`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``CAPTURE`` format,
> +      it must adjust it afterwards.
> +
> +3.  *[optional]* Get minimum number of buffers required for ``OUTPUT``
> +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to

client -> the client

> +    use more buffers than minimum required by hardware/format.

than -> than the

> +
> +    * **Required fields:**
> +
> +      ``id``
> +          set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    * **Return fields:**
> +
> +      ``value``
> +          required number of ``OUTPUT`` buffers for the currently set
> +          format
> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> +    ``OUTPUT``.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          requested number of buffers to allocate; greater than zero
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +      ``sizeimage``
> +          follows standard semantics; the client is free to choose any
> +          suitable size, however, it may be subject to change by the
> +          driver
> +
> +    * **Return fields:**
> +
> +      ``count``
> +          actual number of buffers allocated
> +
> +    * The driver must adjust count to minimum of required number of
> +      ``OUTPUT`` buffers for given format and count passed. The client must
> +      check this value after the ioctl returns to get the number of
> +      buffers allocated.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Start streaming on ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
> +
> +6.  This step only applies to coded formats that contain resolution
> +    information in the stream. Continue queuing/dequeuing bitstream
> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and returning
> +    each buffer to the client until required metadata to configure the
> +    ``CAPTURE`` queue are found. This is indicated by the driver sending
> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> +    requirement to pass enough data for this to occur in the first buffer
> +    and the driver must be able to process any number.
> +
> +    * If data in a buffer that triggers the event is required to decode
> +      the first frame, the driver must not return it to the client,
> +      but must retain it for further decoding.
> +
> +    * If the client set width and height of ``OUTPUT`` format to 0, calling
> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return -EPERM,
> +      until the driver configures ``CAPTURE`` format according to stream
> +      metadata.
> +
> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> +      the event is signaled, the decoding process will not continue until
> +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> +      command.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies to coded formats that contain resolution
> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate ``CAPTURE``
> +    buffers and to begin producing decoded frames.
> +
> +    * **Required fields:**
> +
> +      ``type``
> +          set to ``V4L2_EVENT_SOURCE_CHANGE``
> +
> +    * **Return fields:**
> +
> +      ``u.src_change.changes``
> +          set to ``V4L2_EVENT_SRC_CH_RESOLUTION``
> +
> +    * Any client query issued after the driver queues the event must return
> +      values applying to the just parsed stream, including queue formats,
> +      selection rectangles and controls.
> +
> +8.  Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    * **Required fields:**
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +    * **Return fields:**
> +
> +      ``width``, ``height``
> +          frame buffer resolution for the decoded frames
> +
> +      ``pixelformat``
> +          pixel format for decoded frames
> +
> +      ``num_planes`` (for _MPLANE ``type`` only)
> +          number of planes for pixelformat
> +
> +      ``sizeimage``, ``bytesperline``
> +          as per standard semantics; matching frame buffer format
> +
> +    .. note::
> +
> +       The value of ``pixelformat`` may be any pixel format supported and
> +       must be supported for current stream, based on the information
> +       parsed from the stream and hardware capabilities. It is suggested
> +       that driver chooses the preferred/optimal format for given
> +       configuration. For example, a YUV format may be preferred over an
> +       RGB format, if additional conversion step would be required.
> +
> +9.  *[optional]* Enumerate ``CAPTURE`` formats via
> +    :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE`` queue. Once the stream
> +    information is parsed and known, the client may use this ioctl to
> +    discover which raw formats are supported for given stream and select on
> +    of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    .. note::
> +
> +       The driver will return only formats supported for the current stream
> +       parsed in this initialization sequence, even if more formats may be
> +       supported by the driver in general.
> +
> +       For example, a driver/hardware may support YUV and RGB formats for
> +       resolutions 1920x1088 and lower, but only YUV for higher
> +       resolutions (due to hardware limitations). After parsing
> +       a resolution of 1920x1088 or lower, :c:func:`VIDIOC_ENUM_FMT` may
> +       return a set of YUV and RGB pixel formats, but after parsing
> +       resolution higher than 1920x1088, the driver will not return RGB,
> +       unsupported for this resolution.
> +
> +       However, subsequent resolution change event triggered after
> +       discovering a resolution change within the same stream may switch
> +       the stream into a lower resolution and :c:func:`VIDIOC_ENUM_FMT`
> +       would return RGB formats again in that case.
> +
> +10.  *[optional]* Choose a different ``CAPTURE`` format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the
> +     client to choose a different format than selected/suggested by the
> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     * **Required fields:**
> +
> +       ``type``
> +           a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +       ``pixelformat``
> +           a raw pixel format
> +
> +     .. note::
> +
> +        Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> +        formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to
> +        find out a set of allowed formats for given configuration, but not
> +        required, if the client can accept the defaults.
> +
> +11. *[optional]* Acquire visible resolution via
> +    :c:func:`VIDIOC_G_SELECTION`.
> +
> +    * **Required fields:**
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``target``
> +          set to ``V4L2_SEL_TGT_COMPOSE``
> +
> +    * **Return fields:**
> +
> +      ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +          visible rectangle; this must fit within frame buffer resolution
> +          returned by :c:func:`VIDIOC_G_FMT`.
> +
> +    * The driver must expose following selection targets on ``CAPTURE``:
> +
> +      ``V4L2_SEL_TGT_CROP_BOUNDS``
> +          corresponds to coded resolution of the stream
> +
> +      ``V4L2_SEL_TGT_CROP_DEFAULT``
> +          a rectangle covering the part of the frame buffer that contains
> +          meaningful picture data (visible area); width and height will be
> +          equal to visible resolution of the stream
> +
> +      ``V4L2_SEL_TGT_CROP``
> +          rectangle within coded resolution to be output to ``CAPTURE``;
> +          defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``; read-only on hardware
> +          without additional compose/scaling capabilities
> +
> +      ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> +          maximum rectangle within ``CAPTURE`` buffer, which the cropped
> +          frame can be output into; equal to ``V4L2_SEL_TGT_CROP``, if the
> +          hardware does not support compose/scaling
> +
> +      ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> +          equal to ``V4L2_SEL_TGT_CROP``
> +
> +      ``V4L2_SEL_TGT_COMPOSE``
> +          rectangle inside ``OUTPUT`` buffer into which the cropped frame
> +          is output; defaults to ``V4L2_SEL_TGT_COMPOSE_DEFAULT``;
> +          read-only on hardware without additional compose/scaling
> +          capabilities
> +
> +      ``V4L2_SEL_TGT_COMPOSE_PADDED``
> +          rectangle inside ``OUTPUT`` buffer which is overwritten by the
> +          hardware; equal to ``V4L2_SEL_TGT_COMPOSE``, if the hardware
> +          does not write padding pixels
> +
> +12. *[optional]* Get minimum number of buffers required for ``CAPTURE``
> +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> +    use more buffers than minimum required by hardware/format.
> +
> +    * **Required fields:**
> +
> +      ``id``
> +          set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    * **Return fields:**
> +
> +      ``value``
> +          minimum number of buffers required to decode the stream parsed in
> +          this initialization sequence.
> +
> +    .. note::
> +
> +       Note that the minimum number of buffers must be at least the number
> +       required to successfully decode the current stream. This may for
> +       example be the required DPB size for an H.264 stream given the
> +       parsed stream configuration (resolution, level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
> +    on the ``CAPTURE`` queue.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          requested number of buffers to allocate; greater than zero
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +    * **Return fields:**
> +
> +      ``count``
> +          adjusted to allocated number of buffers
> +
> +    * The driver must adjust count to minimum of required number of
> +      destination buffers for given format and stream configuration and the
> +      count passed. The client must check this value after the ioctl
> +      returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained value
> +       plus the number of additional buffers needed in count to
> +       :c:func:`VIDIOC_REQBUFS`.


I think we should mention here the option of using VIDIOC_CREATE_BUFS in order
to allocate buffers larger than the current CAPTURE format in order to accommodate
future resolution changes.

> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +Decoding
> +========
> +
> +This state is reached after a successful initialization sequence. In this
> +state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> +semantics.
> +
> +Both queues operate independently, following standard behavior of V4L2
> +buffer queues and memory-to-memory devices. In addition, the order of
> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> +coded format, e.g. frame reordering. The client must not assume any direct
> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.

Is there a relationship between capture and output buffers w.r.t. the timestamp
field? I am not aware that there is one.

> +
> +The contents of source ``OUTPUT`` buffers depend on active coded pixel
> +format and might be affected by codec-specific extended controls, as stated
> +in documentation of each format individually.

in -> in the
each format individually -> each format

> +
> +The client must not assume any direct relationship between ``CAPTURE``
> +and ``OUTPUT`` buffers and any specific timing of buffers becoming
> +available to dequeue. Specifically:
> +
> +* a buffer queued to ``OUTPUT`` may result in no buffers being produced
> +  on ``CAPTURE`` (e.g. if it does not contain encoded data, or if only
> +  metadata syntax structures are present in it),
> +
> +* a buffer queued to ``OUTPUT`` may result in more than 1 buffer produced
> +  on ``CAPTURE`` (if the encoded data contained more than one frame, or if
> +  returning a decoded frame allowed the driver to return a frame that
> +  preceded it in decode, but succeeded it in display order),
> +
> +* a buffer queued to ``OUTPUT`` may result in a buffer being produced on
> +  ``CAPTURE`` later into decode process, and/or after processing further
> +  ``OUTPUT`` buffers, or be returned out of order, e.g. if display
> +  reordering is used,
> +
> +* buffers may become available on the ``CAPTURE`` queue without additional
> +  buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of
> +  ``OUTPUT`` buffers being queued in the past and decoding result of which
> +  being available only at later time, due to specifics of the decoding
> +  process.
> +
> +Seek
> +====
> +
> +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
> +
> +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> +     treated as returned to the client (following standard semantics).
> +
> +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +   * The driver must be put in a state after seek and be ready to

"put in a state"???

> +     accept new source bitstream buffers.
> +
> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream starting exactly from
> +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> +      data queued and must keep processing the queued buffers until it
> +      finds a suitable resume point. While looking for a resume point, the
> +      driver processes ``OUTPUT`` buffers and returns them to the client
> +      without producing any decoded frames.
> +
> +      For hardware known to be mishandling seeks to a non-resume point,
> +      e.g. by returning corrupted decoded frames, the driver must be able
> +      to handle such seeks without a crash or any fatal decode error.
> +
> +4. After a resume point is found, the driver will start returning
> +   ``CAPTURE`` buffers with decoded frames.
> +
> +   * There is no precise specification for ``CAPTURE`` queue of when it
> +     will start producing buffers containing decoded data from buffers
> +     queued after the seek, as it operates independently
> +     from ``OUTPUT`` queue.
> +
> +     * The driver is allowed to and may return a number of remaining

I'd drop 'is allowed to and'.

> +       ``CAPTURE`` buffers containing decoded frames from before the seek
> +       after the seek sequence (STREAMOFF-STREAMON) is performed.
> +
> +     * The driver is also allowed to and may not return all decoded frames

Ditto.

> +       queued but not decode before the seek sequence was initiated. For

Very confusing sentence. I think you mean this:

	  The driver may not return all decoded frames that where ready for
	  dequeueing from before the seek sequence was initiated.

Is this really true? Once decoded frames are marked as buffer_done by the
driver there is no reason for them to be removed. Or you mean something else
here, e.g. the frames are decoded, but the buffers not yet given back to vb2.

> +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
> +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
> +       H’}, {A’, G’, H’}, {G’, H’}.
> +
> +   .. note::
> +
> +      To achieve instantaneous seek, the client may restart streaming on
> +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
> +
> +Pause
> +=====
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> +Without source bitstream data, there is no data to process and the hardware
> +remains idle.
> +
> +Conversely, using :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue indicates
> +a seek, which
> +
> +1. drops all ``OUTPUT`` buffers in flight and
> +2. after a subsequent :c:func:`VIDIOC_STREAMON`, will look for and only
> +   continue from a resume point.
> +
> +This is usually undesirable for pause. The STREAMOFF-STREAMON sequence is
> +intended for seeking.
> +
> +Similarly, ``CAPTURE`` queue should remain streaming as well, as the

the ``CAPTURE`` queue

(add 'the')

> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.

'changing buffer sets': not clear what is meant by this. It's certainly not
'solely' since it can also be used to achieve an instantaneous seek.

> +
> +Dynamic resolution change
> +=========================
> +
> +A video decoder implementing this interface must support dynamic resolution
> +change, for streams, which include resolution metadata in the bitstream.

I think the commas can be removed from this sentence. I would also replace
'which' by 'that'.

> +When the decoder encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.
> +
> +1.  After encountering a resolution change in the stream, the driver must
> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the resolution
> +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +    * The last buffer from before the change must be marked with
> +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the

spurious 'as'?

> +      drain sequence. The last buffer might be empty (with
> +      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by the
> +      client, since it does not contain any decoded frame.

any -> a

> +
> +    * Any client query issued after the driver queues the event must return
> +      values applying to the stream after the resolution change, including
> +      queue formats, selection rectangles and controls.
> +
> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> +      the event is signaled, the decoding process will not continue until
> +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> +      command.

With (re-)starting streaming you mean a STREAMOFF/ON pair on the CAPTURE queue,
right?

> +
> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the new
> +    format information. This is identical to calling :c:func:`VIDIOC_G_FMT`
> +    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization sequence
> +    and should be handled similarly.
> +
> +    .. note::
> +
> +       It is allowed for the driver not to support the same pixel format as
> +       previously used (before the resolution change) for the new
> +       resolution. The driver must select a default supported pixel format,
> +       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the client
> +       must take note of it.
> +
> +4.  The client acquires visible resolution as in initialization sequence.
> +
> +5.  *[optional]* The client is allowed to enumerate available formats and
> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +6.  *[optional]* The client acquires minimum number of buffers as in
> +    initialization sequence.

It's an optional step, but what might happen if you ignore it or if the control
does not exist?

You also should mention that this is the min number of CAPTURE buffers.

I wonder if we should make these min buffer controls required. It might be easier
that way.

> +7.  If all the following conditions are met, the client may resume the
> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> +    sequence:
> +
> +    * ``sizeimage`` of new format is less than or equal to the size of
> +      currently allocated buffers,
> +
> +    * the number of buffers currently allocated is greater than or equal to
> +      the minimum number of buffers acquired in step 6.

You might want to mention that if there are insufficient buffers, then
VIDIOC_CREATE_BUFS can be used to add more buffers.

> +
> +    In such case, the remaining steps do not apply.
> +
> +    However, if the client intends to change the buffer set, to lower
> +    memory usage or for any other reasons, it may be achieved by following
> +    the steps below.
> +
> +8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue, the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> +    The ``OUTPUT`` queue must remain streaming (calling STREAMOFF on it
> +    would trigger a seek).
> +
> +9.  The client frees the buffers on the ``CAPTURE`` queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          set to 0
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +10. The client allocates a new set of buffers for the ``CAPTURE`` queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +11. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    ``CAPTURE`` queue.
> +
> +During the resolution change sequence, the ``OUTPUT`` queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue would
> +initiate a seek.
> +
> +The ``OUTPUT`` queue operates separately from the ``CAPTURE`` queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplicity) for the client to keep
> +queuing/dequeuing buffers from/to ``OUTPUT`` queue even while processing
> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without a change
> +   in coded resolution, if a different number of ``CAPTURE`` buffers is
> +   required in order to continue decoding the stream or the visible
> +   resolution changes.
> +
> +Drain
> +=====
> +
> +To ensure that all queued ``OUTPUT`` buffers have been processed and
> +related ``CAPTURE`` buffers output to the client, the following drain
> +sequence may be followed. After the drain sequence is complete, the client
> +has received all decoded frames for all ``OUTPUT`` buffers queued before
> +the sequence was started.
> +
> +1. Begin drain by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   * **Required fields:**
> +
> +     ``cmd``
> +         set to ``V4L2_DEC_CMD_STOP``
> +
> +     ``flags``
> +         set to 0
> +
> +     ``pts``
> +         set to 0
> +
> +2. The driver must process and decode as normal all ``OUTPUT`` buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was issued.
> +   Any operations triggered as a result of processing these buffers
> +   (including the initialization and resolution change sequences) must be
> +   processed as normal by both the driver and the client before proceeding
> +   with the drain sequence.
> +
> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   * If the ``CAPTURE`` queue is streaming, once all decoded frames (if
> +     any) are ready to be dequeued on the ``CAPTURE`` queue, the driver
> +     must send a ``V4L2_EVENT_EOS``. The driver must also set
> +     ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> +     buffer on the ``CAPTURE`` queue containing the last frame (if any)
> +     produced as a result of processing the ``OUTPUT`` buffers queued
> +     before ``V4L2_DEC_CMD_STOP``. If no more frames are left to be
> +     returned at the point of handling ``V4L2_DEC_CMD_STOP``, the driver
> +     must return an empty buffer (with :c:type:`v4l2_buffer`
> +     ``bytesused`` = 0) as the last buffer with ``V4L2_BUF_FLAG_LAST`` set
> +     instead. Any attempts to dequeue more buffers beyond the buffer marked
> +     with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +     :c:func:`VIDIOC_DQBUF`.
> +
> +   * If the ``CAPTURE`` queue is NOT streaming, no action is necessary for
> +     ``CAPTURE`` queue and the driver must send a ``V4L2_EVENT_EOS``
> +     immediately after all ``OUTPUT`` buffers in question have been
> +     processed.
> +
> +4. At this point, decoding is paused and the driver will accept, but not
> +   process any newly queued ``OUTPUT`` buffers until the client issues
> +   ``V4L2_DEC_CMD_START`` or restarts streaming on any queue.
> +
> +* Once the drain sequence is initiated, the client needs to drive it to
> +  completion, as described by the above steps, unless it aborts the process
> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue.  The client
> +  is not allowed to issue ``V4L2_DEC_CMD_START`` or ``V4L2_DEC_CMD_STOP``
> +  again while the drain sequence is in progress and they will fail with
> +  -EBUSY error code if attempted.
> +
> +* Restarting streaming on ``OUTPUT`` queue will implicitly end the paused
> +  state and reinitialize the decoder (similarly to the seek sequence).
> +  Restarting ``CAPTURE`` queue will not affect an in-progress drain
> +  sequence.
> +
> +* The drivers must also implement :c:func:`VIDIOC_TRY_DECODER_CMD`, as a
> +  way to let the client query the availability of decoder commands.
> +
> +End of stream
> +=============
> +
> +If the decoder encounters an end of stream marking in the stream, the
> +driver must send a ``V4L2_EVENT_EOS`` event to the client after all frames
> +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> +behavior is identical to the drain sequence triggered by the client via
> +``V4L2_DEC_CMD_STOP``.
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``OUTPUT`` queue may change the set of formats

Setting -> Setting the

> +   supported/advertised on the ``CAPTURE`` queue. In particular, it also
> +   means that ``CAPTURE`` format may be reset and the client must not

that -> that the

> +   rely on the previously set format being preserved.
> +
> +2. Enumerating formats on ``CAPTURE`` queue must only return formats
> +   supported for the ``OUTPUT`` format currently set.
> +
> +3. Setting/changing format on ``CAPTURE`` queue does not change formats

format -> the format

> +   available on ``OUTPUT`` queue. An attempt to set ``CAPTURE`` format that

set -> set a

> +   is not supported for the currently selected ``OUTPUT`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.
> +
> +4. Enumerating formats on ``OUTPUT`` queue always returns the full set of
> +   supported coded formats, irrespective of the current ``CAPTURE``
> +   format.
> +
> +5. After allocating buffers on the ``OUTPUT`` queue, it is not possible to
> +   change format on it.

format -> the format

> +
> +To summarize, setting formats and allocation must always start with the
> +``OUTPUT`` queue and the ``OUTPUT`` queue is the master that governs the
> +set of supported formats for the ``CAPTURE`` queue.
> diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> index fb7f8c26cf09..12d43fe711cf 100644
> --- a/Documentation/media/uapi/v4l/devices.rst
> +++ b/Documentation/media/uapi/v4l/devices.rst
> @@ -15,6 +15,7 @@ Interfaces
>      dev-output
>      dev-osd
>      dev-codec
> +    dev-decoder
>      dev-effect
>      dev-raw-vbi
>      dev-sliced-vbi
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index b89e5621ae69..65dc096199ad 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -53,6 +53,10 @@ Authors, in alphabetical order:
>  
>    - Original author of the V4L2 API and documentation.
>  
> +- Figa, Tomasz <tfiga@chromium.org>
> +
> +  - Documented the memory-to-memory decoder interface.
> +
>  - H Schimek, Michael <mschimek@gmx.at>
>  
>    - Original author of the V4L2 API and documentation.
> @@ -61,6 +65,10 @@ Authors, in alphabetical order:
>  
>    - Documented the Digital Video timings API.
>  
> +- Osciak, Pawel <posciak@chromium.org>
> +
> +  - Documented the memory-to-memory decoder interface.
> +
>  - Osciak, Pawel <pawel@osciak.com>
>  
>    - Designed and documented the multi-planar API.
> @@ -85,7 +93,7 @@ Authors, in alphabetical order:
>  
>    - Designed and documented the VIDIOC_LOG_STATUS ioctl, the extended control ioctls, major parts of the sliced VBI API, the MPEG encoder and decoder APIs and the DV Timings API.
>  
> -**Copyright** |copy| 1999-2016: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari.
> +**Copyright** |copy| 1999-2018: Bill Dirks, Michael H. Schimek, Hans Verkuil, Martin Rubli, Andy Walls, Muralidharan Karicheri, Mauro Carvalho Chehab, Pawel Osciak, Sakari Ailus & Antti Palosaari, Tomasz Figa
>  
>  Except when explicitly stated as GPL, programming examples within this
>  part can be used and distributed without restrictions.
> 

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/2] Document memory-to-memory video codec interfaces
  2018-07-24 14:06 [PATCH 0/2] Document memory-to-memory video codec interfaces Tomasz Figa
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
  2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
@ 2018-07-25 13:28 ` Philipp Zabel
  2018-07-25 13:35   ` Tomasz Figa
  2018-09-10  9:13 ` Hans Verkuil
  3 siblings, 1 reply; 62+ messages in thread
From: Philipp Zabel @ 2018-07-25 13:28 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

Hi Tomasz,

On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> This series attempts to add the documentation of what was discussed
> during Media Workshops at LinuxCon Europe 2012 in Barcelona and then
> later Embedded Linux Conference Europe 2014 in Düsseldorf and then
> eventually written down by Pawel Osciak and tweaked a bit by Chrome OS
> video team (but mostly in a cosmetic way or making the document more
> precise), during the several years of Chrome OS using the APIs in
> production.
> 
> Note that most, if not all, of the API is already implemented in
> existing mainline drivers, such as s5p-mfc or mtk-vcodec. Intention of
> this series is just to formalize what we already have.
>
> It is an initial conversion from Google Docs to RST, so formatting is
> likely to need some further polishing. It is also the first time for me
> to create such long RST documention. I could not find any other instance
> of similar userspace sequence specifications among our Media documents,
> so I mostly followed what was there in the source. Feel free to suggest
> a better format.
> 
> Much of credits should go to Pawel Osciak, for writing most of the
> original text of the initial RFC.
> 
> Changes since RFC:
> (https://lore.kernel.org/patchwork/project/lkml/list/?series=348588)
>  - The number of changes is too big to list them all here. Thanks to
>    a huge number of very useful comments from everyone (Philipp, Hans,
>    Nicolas, Dave, Stanimir, Alexandre) we should have the interfaces much
>    more specified now. The issues collected since previous revisions and
>    answers leading to this revision are listed below.

Thanks a lot for the update, and especially for the nice Q&A summary of
the discussions so far.

[...]
> Decoder issues
> 
[...]
>   How should ENUM_FRAMESIZES be affected by profiles and levels?
> 
>   Answer: Not in current specification - the logic is too complicated and
>   it might make more sense to actually handle this in user space.  (In
>   theory, level implies supported frame sizes + other factors.)

For decoding I think it makes more sense to let the hardware decode them
from the stream and present them as read-only controls, such as:

42a68012e67c ("media: coda: add read-only h.264 decoder profile/level
controls")

if possible. For encoding, the coda firmware determines level from
bitrate and coded resolution, itself, so I agree not making this part of
the spec is a good idea for now.

regards
Philipp

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/2] Document memory-to-memory video codec interfaces
  2018-07-25 13:28 ` [PATCH 0/2] Document memory-to-memory video codec interfaces Philipp Zabel
@ 2018-07-25 13:35   ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-07-25 13:35 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Philipp,

On Wed, Jul 25, 2018 at 10:28 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> Hi Tomasz,
>
> On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> > This series attempts to add the documentation of what was discussed
> > during Media Workshops at LinuxCon Europe 2012 in Barcelona and then
> > later Embedded Linux Conference Europe 2014 in Düsseldorf and then
> > eventually written down by Pawel Osciak and tweaked a bit by Chrome OS
> > video team (but mostly in a cosmetic way or making the document more
> > precise), during the several years of Chrome OS using the APIs in
> > production.
> >
> > Note that most, if not all, of the API is already implemented in
> > existing mainline drivers, such as s5p-mfc or mtk-vcodec. Intention of
> > this series is just to formalize what we already have.
> >
> > It is an initial conversion from Google Docs to RST, so formatting is
> > likely to need some further polishing. It is also the first time for me
> > to create such long RST documention. I could not find any other instance
> > of similar userspace sequence specifications among our Media documents,
> > so I mostly followed what was there in the source. Feel free to suggest
> > a better format.
> >
> > Much of credits should go to Pawel Osciak, for writing most of the
> > original text of the initial RFC.
> >
> > Changes since RFC:
> > (https://lore.kernel.org/patchwork/project/lkml/list/?series=348588)
> >  - The number of changes is too big to list them all here. Thanks to
> >    a huge number of very useful comments from everyone (Philipp, Hans,
> >    Nicolas, Dave, Stanimir, Alexandre) we should have the interfaces much
> >    more specified now. The issues collected since previous revisions and
> >    answers leading to this revision are listed below.
>
> Thanks a lot for the update, and especially for the nice Q&A summary of
> the discussions so far.
>
> [...]
> > Decoder issues
> >
> [...]
> >   How should ENUM_FRAMESIZES be affected by profiles and levels?
> >
> >   Answer: Not in current specification - the logic is too complicated and
> >   it might make more sense to actually handle this in user space.  (In
> >   theory, level implies supported frame sizes + other factors.)
>
> For decoding I think it makes more sense to let the hardware decode them
> from the stream and present them as read-only controls, such as:
>
> 42a68012e67c ("media: coda: add read-only h.264 decoder profile/level
> controls")

To clarify, this point is only about the effect on ENUM_FRAMESIZES.
Profile and level controls are mentioned in capabilities enumeration,
but it may make sense to add optional steps of querying them in
Initialization sequence.

>
> if possible. For encoding, the coda firmware determines level from
> bitrate and coded resolution, itself, so I agree not making this part of
> the spec is a good idea for now.

Encoder controls are driver-specific in general, since the encoding
capabilities vary a lot, so I decided to just briefly mention the
general idea of encoding parameters in "Encoding parameter changes"
section. It could be a good idea to add a reference to the MPEG
control documentation there, though.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
@ 2018-07-25 13:41   ` Philipp Zabel
  2018-08-07  6:07     ` Tomasz Figa
  2018-07-25 13:57   ` Hans Verkuil
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Philipp Zabel @ 2018-07-25 13:41 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>

Thanks a lot for the update, 
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting stream
> +must not need any further post-processing by the client.
> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in
> +development) is strongly advised.
> +
[...]
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``CAPTURE`` queue may change the set of formats
> +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> +   means that ``OUTPUT`` format may be reset and the client must not
> +   rely on the previously set format being preserved.

Since the only property of the CAPTURE format that can be set directly
via S_FMT is the pixelformat, should this be made explicit?

1. Setting pixelformat on ``CAPTURE`` queue may change the set of
   formats supported/advertised on the ``OUTPUT`` queue. In particular,
   it also means that ``OUTPUT`` format may be reset and the client
   must not rely on the previously set format being preserved.

?

> +2. Enumerating formats on ``OUTPUT`` queue must only return formats
> +   supported for the ``CAPTURE`` format currently set.

Same here, as it usually is the codec selected by CAPTURE pixelformat
that determines the supported OUTPUT pixelformats and resolutions.

2. Enumerating formats on ``OUTPUT`` queue must only return formats
   supported for the ``CAPTURE`` pixelformat currently set.

This could prevent the possible misconception that the CAPTURE
width/height might in any form limit the OUTPUT format, when in fact it
is determined by it.

> +3. Setting/changing format on ``OUTPUT`` queue does not change formats
> +   available on ``CAPTURE`` queue.

3. Setting/changing format on the ``OUTPUT`` queue does not change
   pixelformats available on the ``CAPTURE`` queue.

?

Because setting OUTPUT width/height or CROP SELECTION very much limits
the possible values of CAPTURE width/height.

Maybe 'available' in this context should be specified somewhere to mean
'returned by ENUM_FMT and allowed by S_FMT/TRY_FMT'.

> +   An attempt to set ``OUTPUT`` format that
> +   is not supported for the currently selected ``CAPTURE`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.

   An attempt to set ``OUTPUT`` format that is not supported for the
  
currently selected ``CAPTURE`` pixelformat must result in the driver
  
adjusting the requested format to an acceptable one.

> +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
> +   supported coded formats, irrespective of the current ``OUTPUT``
> +   format.
> +
> +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> +set of supported formats for the ``OUTPUT`` queue.

To summarize, setting formats and allocation must always start with
setting the encoded pixelformat on the ``CAPTURE`` queue. The
``CAPTURE`` queue is the master that governs the set of supported
formats for the ``OUTPUT`` queue.

Or is this too verbose?

regards
Philipp

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
  2018-07-25 13:41   ` Philipp Zabel
@ 2018-07-25 13:57   ` Hans Verkuil
  2018-08-07  6:54     ` Tomasz Figa
  2018-09-07 20:17   ` Ezequiel Garcia
  2018-10-17 15:19   ` Laurent Pinchart
  3 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-07-25 13:57 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

On 24/07/18 16:06, Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting stream
> +must not need any further post-processing by the client.

Due to the confusing use capture and output I wonder if it would be better to
rephrase this as follows:

"A video encoder takes raw video frames in display order and encodes them into
a bitstream. It generates complete chunks of the bitstream, including
all metadata, headers, etc. The resulting bitstream does not require any further
post-processing by the client."

Something similar should be done for the decoder documentation.

> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in
> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> +   unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise,
> +   depending on driver capabilities and following the general V4L2
> +   guidelines.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> +   containing data (encoded frame/stream) that resulted from processing
> +   buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue; the queue of buffers containing encoded
> +   bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> +   hardware into ``CAPTURE`` buffers
> +
> +client
> +   application client communicating with the driver implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.);
> +   see also: raw format
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec and hardware requirements;
> +   typically visible resolution rounded up to full macroblocks; see also:
> +   visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ from display order if
> +   coded format includes a feature of frame reordering; ``CAPTURE`` buffers
> +   must be returned by the driver in decode order
> +
> +display order
> +   the order in which frames must be displayed; ``OUTPUT`` buffers must be
> +   queued by the client in display order
> +
> +IDR
> +   a type of a keyframe in H.264-encoded stream, which clears the list of
> +   earlier reference frames (DPBs)

Same problem as with the previous patch: it doesn't say what IDR stands for.
It also refers to DPBs, but DPB is not part of this glossary.

Perhaps the glossary of the encoder/decoder should be combined.

> +
> +keyframe
> +   an encoded frame that does not reference frames decoded earlier, i.e.
> +   can be decoded fully on its own.
> +
> +macroblock
> +   a processing unit in image and video compression formats based on linear
> +   block transforms (e.g. H264, VP8, VP9); codec-specific, but for most of
> +   popular codecs the size is 16x16 samples (pixels)
> +
> +OUTPUT
> +   the source buffer queue; the queue of buffers containing raw frames;
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> +   from ``OUTPUT`` buffers
> +
> +PPS
> +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may start/continue, without
> +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
> +   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
> +   of a new stream, or to resume decoding after a seek
> +
> +source
> +   data fed to the encoder; ``OUTPUT``
> +
> +source height
> +   height in pixels for given source resolution
> +
> +source resolution
> +   resolution in pixels of source frames being source to the encoder and
> +   subject to further cropping to the bounds of visible resolution
> +
> +source width
> +   width in pixels for given source resolution
> +
> +SPS
> +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +stream metadata
> +   additional (non-visual) information contained inside encoded bitstream;
> +   for example: coded resolution, visible resolution, codec profile
> +
> +visible height
> +   height for given visible resolution; display height
> +
> +visible resolution
> +   stream resolution of the visible picture, in pixels, to be used for
> +   display purposes; must be smaller or equal to coded resolution;
> +   display resolution
> +
> +visible width
> +   width for given visible resolution; display width
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> +
> +   * The driver must always return the full set of supported formats,
> +     irrespective of the format set on the ``OUTPUT`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client may call
> +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> +
> +   * The driver must return only the formats supported for the format
> +     currently active on ``CAPTURE``.
> +
> +   * In order to enumerate raw formats supported by a given coded format,
> +     the client must first set that coded format on ``CAPTURE`` and then
> +     enumerate the ``OUTPUT`` queue.
> +
> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing desired pixel format in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
> +     must include all possible coded resolutions supported by the encoder
> +     for given coded pixel format.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
> +     queue must include all possible frame buffer resolutions supported
> +     by the encoder for given raw pixel format and coded format currently
> +     set on ``CAPTURE``.
> +
> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +5. Any additional encoder capabilities may be discovered by querying
> +   their respective controls.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported formats and resolutions. See
> +   capability enumeration.

capability enumeration. -> 'Querying capabilities' above.

> +
> +2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +     ``pixelformat``
> +         set to a coded format to be produced
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         coded resolution (based on currently active ``OUTPUT`` format)
> +
> +   .. note::
> +
> +      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
> +      format. The driver will derive a new ``OUTPUT`` format from
> +      ``CAPTURE`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``OUTPUT`` format,
> +      it must adjust it afterwards.
> +
> +3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
> +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``index``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``pixelformat``
> +         raw format supported for the coded format currently selected on
> +         the ``OUTPUT`` queue.
> +
> +4. The client may set the raw source format on the ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_S_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         raw format of the source
> +
> +     ``width``, ``height``
> +         source resolution
> +
> +     ``num_planes`` (for _MPLANE)
> +         set to number of planes for pixelformat
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         may be adjusted by driver to match alignment requirements, as
> +         required by the currently selected formats
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * Setting the source resolution will reset visible resolution to the
> +     adjusted source resolution rounded up to the closest visible
> +     resolution supported by the driver. Similarly, coded resolution will

coded -> the coded

> +     be reset to source resolution rounded up to the closest coded

reset -> set
source -> the source

> +     resolution supported by the driver (typically a multiple of
> +     macroblock size).

The first sentence of this paragraph is very confusing. It needs a bit more work,
I think.

> +
> +   .. note::
> +
> +      This step is not strictly required, since ``OUTPUT`` is expected to
> +      have a valid default format. However, the client needs to ensure that
> +      ``OUTPUT`` format matches its expectations via either
> +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> +      being the typical scenario, since the default format is unlikely to
> +      be what the client needs.

Hmm. I'm not sure if this note should be included. It's good practice to always
set the output format. I think the note confuses more than that it helps. IMHO.

> +
> +5. *[optional]* Set visible resolution for the stream metadata via

Set -> Set the

> +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``target``
> +         set to ``V4L2_SEL_TGT_CROP``
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle; this must fit within the framebuffer resolution

Should that be "source resolution"? Or the resolution returned by "CROP_BOUNDS"?

> +         and might be subject to adjustment to match codec and hardware
> +         constraints
> +
> +   * **Return fields:**
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle adjusted by the driver
> +
> +   * The driver must expose following selection targets on ``OUTPUT``:
> +
> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> +         maximum crop bounds within the source buffer supported by the
> +         encoder
> +
> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> +         suggested cropping rectangle that covers the whole source picture
> +
> +     ``V4L2_SEL_TGT_CROP``
> +         rectangle within the source buffer to be encoded into the
> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> +
> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> +         maximum rectangle within the coded resolution, which the cropped
> +         source frame can be output into; always equal to (0, 0)x(width of
> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> +         hardware does not support compose/scaling
> +
> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> +         equal to ``V4L2_SEL_TGT_CROP``
> +
> +     ``V4L2_SEL_TGT_COMPOSE``
> +         rectangle within the coded frame, which the cropped source frame
> +         is to be output into; defaults to
> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> +         additional compose/scaling capabilities; resulting stream will
> +         have this rectangle encoded as the visible rectangle in its
> +         metadata
> +
> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> +         always equal to coded resolution of the stream, as selected by the
> +         encoder based on source resolution and crop/compose rectangles

Are there codec drivers that support composition? I can't remember seeing any.

> +
> +   .. note::
> +
> +      The driver may adjust the crop/compose rectangles to the nearest
> +      supported ones to meet codec and hardware requirements.
> +
> +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> +
> +   * **Required fields:**
> +
> +     ``count``
> +         requested number of buffers to allocate; greater than zero
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> +         ``CAPTURE``
> +
> +     ``memory``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``count``
> +         adjusted to allocated number of buffers
> +
> +   * The driver must adjust count to minimum of required number of
> +     buffers for given format and count passed.

I'd rephrase this:

	The driver must adjust ``count`` to the maximum of ``count`` and
	the required number of buffers for the given format.

Note that this is set to the maximum, not minimum.

> The client must
> +     check this value after the ioctl returns to get the number of
> +     buffers actually allocated.
> +
> +   .. note::
> +
> +      To allocate more than minimum number of buffers (for pipeline

than -> than the

> +      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
> +      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
> +      to get the minimum number of buffers required by the
> +      driver/format, and pass the obtained value plus the number of
> +      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.

count -> the ``count``

> +
> +7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> +   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual

Actual -> The actual

> +   encoding process starts when both queues start streaming.
> +
> +.. note::
> +
> +   If the client stops ``CAPTURE`` during the encode process and then
> +   restarts it again, the encoder will be expected to generate a stream
> +   independent from the stream generated before the stop. Depending on the
> +   coded format, that may imply that:
> +
> +   * encoded frames produced after the restart must not reference any
> +     frames produced before the stop, e.g. no long term references for
> +     H264,
> +
> +   * any headers that must be included in a standalone stream must be
> +     produced again, e.g. SPS and PPS for H264.
> +
> +Encoding
> +========
> +
> +This state is reached after a successful initialization sequence. In
> +this state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> +semantics.
> +
> +Both queues operate independently, following standard behavior of V4L2
> +buffer queues and memory-to-memory devices. In addition, the order of
> +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> +queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
> +format, e.g. frame reordering. The client must not assume any direct
> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> +reported by :c:type:`v4l2_buffer` ``timestamp``.

Same question as for the decoder: are you sure about that?

> +
> +Encoding parameter changes
> +==========================
> +
> +The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
> +parameters at any time. The availability of parameters is driver-specific
> +and the client must query the driver to find the set of available controls.
> +
> +The ability to change each parameter during encoding of is driver-specific,

Remove spurious 'of'

> +as per standard semantics of the V4L2 control interface. The client may

per -> per the

> +attempt setting a control of its interest during encoding and if it the

Remove spurious 'it'

> +operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be

``CAPTURE`` -> the ``CAPTURE``

> +stopped for the configuration change to be allowed (following the drain
> +sequence will be  needed to avoid losing already queued/encoded frames).

losing -> losing the

> +
> +The timing of parameter update is driver-specific, as per standard

update -> updates
per -> per the

> +semantics of the V4L2 control interface. If the client needs to apply the
> +parameters exactly at specific frame and the encoder supports it, using

using -> using the

> +Request API should be considered.

This makes the assumption that the Request API will be merged at about the
same time as this document. Which is at the moment a reasonable assumption,
to be fair.

> +
> +Drain
> +=====
> +
> +To ensure that all queued ``OUTPUT`` buffers have been processed and
> +related ``CAPTURE`` buffers output to the client, the following drain

related -> the related

> +sequence may be followed. After the drain sequence is complete, the client
> +has received all encoded frames for all ``OUTPUT`` buffers queued before
> +the sequence was started.
> +
> +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> +
> +   * **Required fields:**
> +
> +     ``cmd``
> +         set to ``V4L2_ENC_CMD_STOP``
> +
> +     ``flags``
> +         set to 0
> +
> +     ``pts``
> +         set to 0
> +
> +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> +
> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> +   processed:
> +
> +   * Once all decoded frames (if any) are ready to be dequeued on the
> +     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
> +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> +     last frame (if any) produced as a result of processing the ``OUTPUT``
> +     buffers queued before
> +     ``V4L2_ENC_CMD_STOP``.

Hmm, this is somewhat awkward phrasing. Can you take another look at this?

> +
> +   * If no more frames are left to be returned at the point of handling
> +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +     ``V4L2_BUF_FLAG_LAST`` set.
> +
> +   * Any attempts to dequeue more buffers beyond the buffer marked with
> +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> +     :c:func:`VIDIOC_DQBUF`.
> +
> +4. At this point, encoding is paused and the driver will accept, but not
> +   process any newly queued ``OUTPUT`` buffers until the client issues

issues -> issues a

> +   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
> +
> +* Once the drain sequence is initiated, the client needs to drive it to
> +  completion, as described by the above steps, unless it aborts the process
> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
> +  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
> +  again while the drain sequence is in progress and they will fail with
> +  -EBUSY error code if attempted.
> +
> +* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
> +  state and make the encoder continue encoding, as long as other encoding
> +  conditions are met. Restarting ``OUTPUT`` queue will not affect an
> +  in-progress drain sequence.
> +
> +* The drivers must also implement :c:func:`VIDIOC_TRY_ENCODER_CMD`, as a
> +  way to let the client query the availability of encoder commands.
> +
> +Reset
> +=====
> +
> +The client may want to request the encoder to reinitialize the encoding,
> +so that the stream produced becomes independent from the stream generated
> +before. Depending on the coded format, that may imply that:
> +
> +* encoded frames produced after the restart must not reference any frames
> +  produced before the stop, e.g. no long term references for H264,
> +
> +* any headers that must be included in a standalone stream must be produced
> +  again, e.g. SPS and PPS for H264.
> +
> +This can be achieved by performing the reset sequence.
> +
> +1. *[optional]* If the client is interested in encoded frames resulting
> +   from already queued source frames, it needs to perform the Drain
> +   sequence. Otherwise, the reset sequence would cause the already
> +   encoded and not dequeued encoded frames to be lost.
> +
> +2. Stop streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMOFF`. This
> +   will return all currently queued ``CAPTURE`` buffers to the client,
> +   without valid frame data.
> +
> +3. *[optional]* Restart streaming on ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_STREAMOFF` followed by :c:func:`VIDIOC_STREAMON` to
> +   drop any source frames enqueued to the encoder before the reset
> +   sequence. This is useful if the client requires the new stream to begin
> +   at specific source frame. Otherwise, the new stream might include
> +   frames encoded from source frames queued before the reset sequence.
> +
> +4. Restart streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMON` and
> +   continue with regular encoding sequence. The encoded frames produced
> +   into ``CAPTURE`` buffers from now on will contain a standalone stream
> +   that can be decoded without the need for frames encoded before the reset
> +   sequence.
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``CAPTURE`` queue may change the set of formats

format -> the format

> +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> +   means that ``OUTPUT`` format may be reset and the client must not

that -> that the

> +   rely on the previously set format being preserved.
> +
> +2. Enumerating formats on ``OUTPUT`` queue must only return formats

on -> on the

> +   supported for the ``CAPTURE`` format currently set.

'for the current ``CAPTURE`` format.'

> +
> +3. Setting/changing format on ``OUTPUT`` queue does not change formats

format -> the format
on -> on the

> +   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that

on -> on the
set -> set the

> +   is not supported for the currently selected ``CAPTURE`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.
> +
> +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of

on -> on the

> +   supported coded formats, irrespective of the current ``OUTPUT``
> +   format.
> +
> +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> +   change format on it.

format -> the format

> +
> +To summarize, setting formats and allocation must always start with the
> +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> +set of supported formats for the ``OUTPUT`` queue.
> diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> index 12d43fe711cf..1822c66c2154 100644
> --- a/Documentation/media/uapi/v4l/devices.rst
> +++ b/Documentation/media/uapi/v4l/devices.rst
> @@ -16,6 +16,7 @@ Interfaces
>      dev-osd
>      dev-codec
>      dev-decoder
> +    dev-encoder
>      dev-effect
>      dev-raw-vbi
>      dev-sliced-vbi
> diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> index 65dc096199ad..2ef6693b9499 100644
> --- a/Documentation/media/uapi/v4l/v4l2.rst
> +++ b/Documentation/media/uapi/v4l/v4l2.rst
> @@ -56,6 +56,7 @@ Authors, in alphabetical order:
>  - Figa, Tomasz <tfiga@chromium.org>
>  
>    - Documented the memory-to-memory decoder interface.
> +  - Documented the memory-to-memory encoder interface.
>  
>  - H Schimek, Michael <mschimek@gmx.at>
>  
> @@ -68,6 +69,7 @@ Authors, in alphabetical order:
>  - Osciak, Pawel <posciak@chromium.org>
>  
>    - Documented the memory-to-memory decoder interface.
> +  - Documented the memory-to-memory encoder interface.
>  
>  - Osciak, Pawel <pawel@osciak.com>
>  
> 

One general comment:

you often talk about 'the driver must', e.g.:

"The driver must process and encode as normal all ``OUTPUT`` buffers
queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."

But this is not a driver specification, it is an API specification.

I think it would be better to phrase it like this:

"All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
was issued will be processed and encoded as normal."

(or perhaps even 'shall' if you want to be really formal)

End-users do not really care what drivers do, they want to know what the API does,
and that implies rules for drivers.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-25 11:58   ` Hans Verkuil
@ 2018-07-26 10:20     ` Tomasz Figa
  2018-07-26 10:36       ` Philipp Zabel
                         ` (3 more replies)
  0 siblings, 4 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-07-26 10:20 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Hans,

On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi Tomasz,
>
> Many, many thanks for working on this! It's a great document and when done
> it will be very useful indeed.
>
> Review comments follow...

Thanks for review!

>
> On 24/07/18 16:06, Tomasz Figa wrote:
[snip]
> > +DPB
> > +   Decoded Picture Buffer; a H.264 term for a buffer that stores a picture
>
> a H.264 -> an H.264
>

Ack.

> > +   that is encoded or decoded and available for reference in further
> > +   decode/encode steps.
> > +
> > +EOS
> > +   end of stream
> > +
> > +IDR
> > +   a type of a keyframe in H.264-encoded stream, which clears the list of
> > +   earlier reference frames (DPBs)
>
> You do not actually say what IDR stands for. Can you add that?
>

Ack.

[snip]
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > +   resolutions for a given format, passing desired pixel format in
> > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
> > +     must include all possible coded resolutions supported by the decoder
> > +     for given coded pixel format.
>
> This is confusing. Since VIDIOC_ENUM_FRAMESIZES does not have a buffer type
> argument you cannot say 'on OUTPUT'. I would remove 'on OUTPUT' entirely.
>
> > +
> > +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
>
> Ditto for 'on CAPTURE'
>

You're right. I didn't notice that the "type" field in
v4l2_frmsizeenum was not buffer type, but type of the range. Thanks
for spotting this.

> > +     must include all possible frame buffer resolutions supported by the
> > +     decoder for given raw pixel format and coded format currently set on
> > +     ``OUTPUT``.
> > +
> > +    .. note::
> > +
> > +       The client may derive the supported resolution range for a
> > +       combination of coded and raw format by setting width and height of
> > +       ``OUTPUT`` format to 0 and calculating the intersection of
> > +       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> > +       for the given coded and raw formats.
>
> So if the output format is set to 1280x720, then ENUM_FRAMESIZES would just
> return 1280x720 as the resolution. If the output format is set to 0x0, then
> it returns the full range it is capable of.
>
> Correct?
>
> If so, then I think this needs to be a bit more explicit. I had to think about
> it a bit.
>
> Note that the v4l2_pix_format/S_FMT documentation needs to be updated as well
> since we never allowed 0x0 before.

Is there any text that disallows this? I couldn't spot any. Generally
there are already drivers which return 0x0 for coded formats (s5p-mfc)
and it's not even strange, because in such case, the buffer contains
just a sequence of bytes, not a 2D picture.

> What if you set the format to 0x0 but the stream does not have meta data with
> the resolution? How does userspace know if 0x0 is allowed or not? If this is
> specific to the chosen coded pixel format, should be add a new flag for those
> formats indicating that the coded data contains resolution information?

Yes, this would definitely be on a per-format basis. Not sure what you
mean by a flag, though? E.g. if the format is set to H264, then it's
bound to include resolution information. If the format doesn't include
it, then userspace is already aware of this fact, because it needs to
get this from some other source (e.g. container).

>
> That way userspace knows if 0x0 can be used, and the driver can reject 0x0
> for formats that do not support it.

As above, but I might be misunderstanding your suggestion.

>
> > +
> > +4. Supported profiles and levels for given format, if applicable, may be
> > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> > +   capability enumeration.
>
> capability enumeration. -> 'Querying capabilities' above.
>

Ack.

> > +
> > +2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``pixelformat``
> > +         a coded pixel format
> > +
> > +     ``width``, ``height``
> > +         required only if cannot be parsed from the stream for the given
> > +         coded format; optional otherwise - set to zero to ignore
> > +
> > +     other fields
> > +         follow standard semantics
> > +
> > +   * For coded formats including stream resolution information, if width
> > +     and height are set to non-zero values, the driver will propagate the
> > +     resolution to ``CAPTURE`` and signal a source change event
> > +     instantly. However, after the decoder is done parsing the
> > +     information embedded in the stream, it will update ``CAPTURE``
> > +     format with new values and signal a source change event again, if
> > +     the values do not match.
> > +
> > +   .. note::
> > +
> > +      Changing ``OUTPUT`` format may change currently set ``CAPTURE``
>
> change -> change the

Ack.

>
> > +      format. The driver will derive a new ``CAPTURE`` format from
>
> from -> from the

Ack.

>
> > +      ``OUTPUT`` format being set, including resolution, colorimetry
> > +      parameters, etc. If the client needs a specific ``CAPTURE`` format,
> > +      it must adjust it afterwards.
> > +
> > +3.  *[optional]* Get minimum number of buffers required for ``OUTPUT``
> > +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
>
> client -> the client

Ack.

>
> > +    use more buffers than minimum required by hardware/format.
>
> than -> than the

Ack.

[snip]
> > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
> > +    on the ``CAPTURE`` queue.
> > +
> > +    * **Required fields:**
> > +
> > +      ``count``
> > +          requested number of buffers to allocate; greater than zero
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +      ``memory``
> > +          follows standard semantics
> > +
> > +    * **Return fields:**
> > +
> > +      ``count``
> > +          adjusted to allocated number of buffers
> > +
> > +    * The driver must adjust count to minimum of required number of
> > +      destination buffers for given format and stream configuration and the
> > +      count passed. The client must check this value after the ioctl
> > +      returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > +       get minimum number of buffers required, and pass the obtained value
> > +       plus the number of additional buffers needed in count to
> > +       :c:func:`VIDIOC_REQBUFS`.
>
>
> I think we should mention here the option of using VIDIOC_CREATE_BUFS in order
> to allocate buffers larger than the current CAPTURE format in order to accommodate
> future resolution changes.

Ack.

>
> > +
> > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > +
> > +Decoding
> > +========
> > +
> > +This state is reached after a successful initialization sequence. In this
> > +state, client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> > +semantics.
> > +
> > +Both queues operate independently, following standard behavior of V4L2
> > +buffer queues and memory-to-memory devices. In addition, the order of
> > +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> > +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> > +coded format, e.g. frame reordering. The client must not assume any direct
> > +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> > +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
>
> Is there a relationship between capture and output buffers w.r.t. the timestamp
> field? I am not aware that there is one.

I believe the decoder was expected to copy the timestamp of matching
OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
to be implementing it this way. I guess it might be a good idea to
specify this more explicitly.

>
> > +
> > +The contents of source ``OUTPUT`` buffers depend on active coded pixel
> > +format and might be affected by codec-specific extended controls, as stated
> > +in documentation of each format individually.
>
> in -> in the
> each format individually -> each format
>

Ack.

[snip]
> > +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must be put in a state after seek and be ready to
>
> "put in a state"???
>

I'm not sure what this was supposed to be. I guess just "The driver
must start accepting new source bitstream buffers after the call
returns." would be enough.

> > +     accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream starting exactly from
> > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > +      data queued and must keep processing the queued buffers until it
> > +      finds a suitable resume point. While looking for a resume point, the
> > +      driver processes ``OUTPUT`` buffers and returns them to the client
> > +      without producing any decoded frames.
> > +
> > +      For hardware known to be mishandling seeks to a non-resume point,
> > +      e.g. by returning corrupted decoded frames, the driver must be able
> > +      to handle such seeks without a crash or any fatal decode error.
> > +
> > +4. After a resume point is found, the driver will start returning
> > +   ``CAPTURE`` buffers with decoded frames.
> > +
> > +   * There is no precise specification for ``CAPTURE`` queue of when it
> > +     will start producing buffers containing decoded data from buffers
> > +     queued after the seek, as it operates independently
> > +     from ``OUTPUT`` queue.
> > +
> > +     * The driver is allowed to and may return a number of remaining
>
> I'd drop 'is allowed to and'.
>

Ack.

> > +       ``CAPTURE`` buffers containing decoded frames from before the seek
> > +       after the seek sequence (STREAMOFF-STREAMON) is performed.
> > +
> > +     * The driver is also allowed to and may not return all decoded frames
>
> Ditto.

Ack.

>
> > +       queued but not decode before the seek sequence was initiated. For
>
> Very confusing sentence. I think you mean this:
>
>           The driver may not return all decoded frames that where ready for
>           dequeueing from before the seek sequence was initiated.
>
> Is this really true? Once decoded frames are marked as buffer_done by the
> driver there is no reason for them to be removed. Or you mean something else
> here, e.g. the frames are decoded, but the buffers not yet given back to vb2.
>

Exactly "the frames are decoded, but the buffers not yet given back to
vb2", for example, if reordering takes place. However, if one stops
streaming before dequeuing all buffers, they are implicitly returned
(reset to the state after REQBUFS) and can't be dequeued anymore, so
the frames are lost, even if the driver returned them. I guess the
sentence was really unfortunate indeed.

> > +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
> > +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
> > +       H’}, {A’, G’, H’}, {G’, H’}.
> > +
> > +   .. note::
> > +
> > +      To achieve instantaneous seek, the client may restart streaming on
> > +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
> > +
> > +Pause
> > +=====
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> > +Without source bitstream data, there is no data to process and the hardware
> > +remains idle.
> > +
> > +Conversely, using :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue indicates
> > +a seek, which
> > +
> > +1. drops all ``OUTPUT`` buffers in flight and
> > +2. after a subsequent :c:func:`VIDIOC_STREAMON`, will look for and only
> > +   continue from a resume point.
> > +
> > +This is usually undesirable for pause. The STREAMOFF-STREAMON sequence is
> > +intended for seeking.
> > +
> > +Similarly, ``CAPTURE`` queue should remain streaming as well, as the
>
> the ``CAPTURE`` queue
>
> (add 'the')
>

Ack.

> > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > +sets.
>
> 'changing buffer sets': not clear what is meant by this. It's certainly not
> 'solely' since it can also be used to achieve an instantaneous seek.
>

To be honest, I'm not sure whether there is even a need to include
this whole section. It's obvious that if you stop feeding a mem2mem
device, it will pause. Moreover, other sections imply various
behaviors triggered by STREAMOFF/STREAMON/DECODER_CMD/etc., so it
should be quite clear that they are different from a simple pause.
What do you think?

> > +
> > +Dynamic resolution change
> > +=========================
> > +
> > +A video decoder implementing this interface must support dynamic resolution
> > +change, for streams, which include resolution metadata in the bitstream.
>
> I think the commas can be removed from this sentence. I would also replace
> 'which' by 'that'.
>

Ack.

> > +When the decoder encounters a resolution change in the stream, the dynamic
> > +resolution change sequence is started.
> > +
> > +1.  After encountering a resolution change in the stream, the driver must
> > +    first process and decode all remaining buffers from before the
> > +    resolution change point.
> > +
> > +2.  After all buffers containing decoded frames from before the resolution
> > +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> > +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> > +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +    * The last buffer from before the change must be marked with
> > +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the
>
> spurious 'as'?
>

It should be:

    * The last buffer from before the change must be marked with
      the ``V4L2_BUF_FLAG_LAST`` flag in :c:type:`v4l2_buffer` ``flags`` field,
      similarly to the

> > +      drain sequence. The last buffer might be empty (with
> > +      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by the
> > +      client, since it does not contain any decoded frame.
>
> any -> a
>

Ack.

> > +
> > +    * Any client query issued after the driver queues the event must return
> > +      values applying to the stream after the resolution change, including
> > +      queue formats, selection rectangles and controls.
> > +
> > +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> > +      the event is signaled, the decoding process will not continue until
> > +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> > +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> > +      command.
>
> With (re-)starting streaming you mean a STREAMOFF/ON pair on the CAPTURE queue,
> right?
>

Right. I guess it might be better to just state that explicitly.

> > +
> > +    .. note::
> > +
> > +       Any attempts to dequeue more buffers beyond the buffer marked
> > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +       :c:func:`VIDIOC_DQBUF`.
> > +
> > +3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the new
> > +    format information. This is identical to calling :c:func:`VIDIOC_G_FMT`
> > +    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization sequence
> > +    and should be handled similarly.
> > +
> > +    .. note::
> > +
> > +       It is allowed for the driver not to support the same pixel format as
> > +       previously used (before the resolution change) for the new
> > +       resolution. The driver must select a default supported pixel format,
> > +       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the client
> > +       must take note of it.
> > +
> > +4.  The client acquires visible resolution as in initialization sequence.
> > +
> > +5.  *[optional]* The client is allowed to enumerate available formats and
> > +    select a different one than currently chosen (returned via
> > +    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +6.  *[optional]* The client acquires minimum number of buffers as in
> > +    initialization sequence.
>
> It's an optional step, but what might happen if you ignore it or if the control
> does not exist?

REQBUFS is supposed clamp the requested number of buffers to the [min,
max] range anyway.

>
> You also should mention that this is the min number of CAPTURE buffers.
>
> I wonder if we should make these min buffer controls required. It might be easier
> that way.

Agreed. Although userspace is still free to ignore it, because REQBUFS
would do the right thing anyway.

>
> > +7.  If all the following conditions are met, the client may resume the
> > +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> > +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> > +    sequence:
> > +
> > +    * ``sizeimage`` of new format is less than or equal to the size of
> > +      currently allocated buffers,
> > +
> > +    * the number of buffers currently allocated is greater than or equal to
> > +      the minimum number of buffers acquired in step 6.
>
> You might want to mention that if there are insufficient buffers, then
> VIDIOC_CREATE_BUFS can be used to add more buffers.
>

This might be a bit tricky, since at least s5p-mfc and coda can only
work on a fixed buffer set and one would need to fully reinitialize
the decoding to add one more buffer, which would effectively be the
full resolution change sequence, as below, just with REQBUFS(0),
REQBUFS(N) replaced with CREATE_BUFS.

We should mention CREATE_BUFS as an alternative to steps 9 and 10, though.

> > +
> > +    In such case, the remaining steps do not apply.
> > +
> > +    However, if the client intends to change the buffer set, to lower
> > +    memory usage or for any other reasons, it may be achieved by following
> > +    the steps below.
> > +
> > +8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue, the
> > +    client must call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> > +    The ``OUTPUT`` queue must remain streaming (calling STREAMOFF on it
> > +    would trigger a seek).
> > +
> > +9.  The client frees the buffers on the ``CAPTURE`` queue using
> > +    :c:func:`VIDIOC_REQBUFS`.
> > +
> > +    * **Required fields:**
> > +
> > +      ``count``
> > +          set to 0
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +      ``memory``
> > +          follows standard semantics
> > +
> > +10. The client allocates a new set of buffers for the ``CAPTURE`` queue via
> > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > +    the initialization sequence.
[snip]
> > +
> > +Commit points
> > +=============
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on ``OUTPUT`` queue may change the set of formats
>
> Setting -> Setting the
>

Ack.

> > +   supported/advertised on the ``CAPTURE`` queue. In particular, it also
> > +   means that ``CAPTURE`` format may be reset and the client must not
>
> that -> that the
>

Ack.

> > +   rely on the previously set format being preserved.
> > +
> > +2. Enumerating formats on ``CAPTURE`` queue must only return formats
> > +   supported for the ``OUTPUT`` format currently set.
> > +
> > +3. Setting/changing format on ``CAPTURE`` queue does not change formats
>
> format -> the format
>

Ack.

> > +   available on ``OUTPUT`` queue. An attempt to set ``CAPTURE`` format that
>
> set -> set a
>

Ack.

> > +   is not supported for the currently selected ``OUTPUT`` format must
> > +   result in the driver adjusting the requested format to an acceptable
> > +   one.
> > +
> > +4. Enumerating formats on ``OUTPUT`` queue always returns the full set of
> > +   supported coded formats, irrespective of the current ``CAPTURE``
> > +   format.
> > +
> > +5. After allocating buffers on the ``OUTPUT`` queue, it is not possible to
> > +   change format on it.
>
> format -> the format
>

Ack.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-26 10:20     ` Tomasz Figa
@ 2018-07-26 10:36       ` Philipp Zabel
  2018-08-07  6:55         ` Tomasz Figa
  2018-07-26 10:57       ` Hans Verkuil
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: Philipp Zabel @ 2018-07-26 10:36 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Thu, 2018-07-26 at 19:20 +0900, Tomasz Figa wrote:
[...]
> > You might want to mention that if there are insufficient buffers, then
> > VIDIOC_CREATE_BUFS can be used to add more buffers.
> > 
> 
> This might be a bit tricky, since at least s5p-mfc and coda can only
> work on a fixed buffer set and one would need to fully reinitialize
> the decoding to add one more buffer, which would effectively be the
> full resolution change sequence, as below, just with REQBUFS(0),
> REQBUFS(N) replaced with CREATE_BUFS.

The coda driver supports CREATE_BUFS on the decoder CAPTURE queue.

The firmware indeed needs a fixed frame buffer set, but these buffers
are internal only and in a coda specific tiling format. The content of
finished internal buffers is copied / detiled into the external CAPTURE
buffers, so those can be added at will.

regards
Philipp

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-26 10:20     ` Tomasz Figa
  2018-07-26 10:36       ` Philipp Zabel
@ 2018-07-26 10:57       ` Hans Verkuil
  2018-08-07  7:05         ` Tomasz Figa
  2018-08-07  7:13       ` Hans Verkuil
  2018-09-19 10:17       ` Tomasz Figa
  3 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-07-26 10:57 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On 26/07/18 12:20, Tomasz Figa wrote:
> Hi Hans,
> 
> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> Hi Tomasz,
>>
>> Many, many thanks for working on this! It's a great document and when done
>> it will be very useful indeed.
>>
>> Review comments follow...
> 
> Thanks for review!
> 
>>
>> On 24/07/18 16:06, Tomasz Figa wrote:

> [snip]

>> Note that the v4l2_pix_format/S_FMT documentation needs to be updated as well
>> since we never allowed 0x0 before.
> 
> Is there any text that disallows this? I couldn't spot any. Generally
> there are already drivers which return 0x0 for coded formats (s5p-mfc)
> and it's not even strange, because in such case, the buffer contains
> just a sequence of bytes, not a 2D picture.

All non-m2m devices will always have non-zero width/height values. Only with
m2m devices do we see this.

This was probably never documented since before m2m appeared it was 'obvious'.

This definitely needs to be documented, though.

> 
>> What if you set the format to 0x0 but the stream does not have meta data with
>> the resolution? How does userspace know if 0x0 is allowed or not? If this is
>> specific to the chosen coded pixel format, should be add a new flag for those
>> formats indicating that the coded data contains resolution information?
> 
> Yes, this would definitely be on a per-format basis. Not sure what you
> mean by a flag, though? E.g. if the format is set to H264, then it's
> bound to include resolution information. If the format doesn't include
> it, then userspace is already aware of this fact, because it needs to
> get this from some other source (e.g. container).
> 
>>
>> That way userspace knows if 0x0 can be used, and the driver can reject 0x0
>> for formats that do not support it.
> 
> As above, but I might be misunderstanding your suggestion.

So my question is: is this tied to the pixel format, or should we make it
explicit with a flag like V4L2_FMT_FLAG_CAN_DECODE_WXH.

The advantage of a flag is that you don't need a switch on the format to
know whether or not 0x0 is allowed. And the flag can just be set in
v4l2-ioctls.c.

>>> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
>>> +sets.
>>
>> 'changing buffer sets': not clear what is meant by this. It's certainly not
>> 'solely' since it can also be used to achieve an instantaneous seek.
>>
> 
> To be honest, I'm not sure whether there is even a need to include
> this whole section. It's obvious that if you stop feeding a mem2mem
> device, it will pause. Moreover, other sections imply various
> behaviors triggered by STREAMOFF/STREAMON/DECODER_CMD/etc., so it
> should be quite clear that they are different from a simple pause.
> What do you think?

Yes, I'd drop this last sentence ('Similarly...sets').

>>> +2.  After all buffers containing decoded frames from before the resolution
>>> +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
>>> +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
>>> +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
>>> +
>>> +    * The last buffer from before the change must be marked with
>>> +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the
>>
>> spurious 'as'?
>>
> 
> It should be:
> 
>     * The last buffer from before the change must be marked with
>       the ``V4L2_BUF_FLAG_LAST`` flag in :c:type:`v4l2_buffer` ``flags`` field,
>       similarly to the

Ah, OK. Now I get it.

>> I wonder if we should make these min buffer controls required. It might be easier
>> that way.
> 
> Agreed. Although userspace is still free to ignore it, because REQBUFS
> would do the right thing anyway.

It's never been entirely clear to me what the purpose of those min buffers controls
is. REQBUFS ensures that the number of buffers is at least the minimum needed to
make the HW work. So why would you need these controls? It only makes sense if they
return something different from REQBUFS.

> 
>>
>>> +7.  If all the following conditions are met, the client may resume the
>>> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
>>> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
>>> +    sequence:
>>> +
>>> +    * ``sizeimage`` of new format is less than or equal to the size of
>>> +      currently allocated buffers,
>>> +
>>> +    * the number of buffers currently allocated is greater than or equal to
>>> +      the minimum number of buffers acquired in step 6.
>>
>> You might want to mention that if there are insufficient buffers, then
>> VIDIOC_CREATE_BUFS can be used to add more buffers.
>>
> 
> This might be a bit tricky, since at least s5p-mfc and coda can only
> work on a fixed buffer set and one would need to fully reinitialize
> the decoding to add one more buffer, which would effectively be the
> full resolution change sequence, as below, just with REQBUFS(0),
> REQBUFS(N) replaced with CREATE_BUFS.

What happens today in those drivers if you try to call CREATE_BUFS?

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
  2018-07-25 11:58   ` Hans Verkuil
@ 2018-07-30 12:52   ` Hans Verkuil
  2018-08-07  7:08     ` Tomasz Figa
  2018-08-08  2:46   ` Tomasz Figa
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-07-30 12:52 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

On 07/24/2018 04:06 PM, Tomasz Figa wrote:
> Due to complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, drain and end of
> stream.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
>  3 files changed, 882 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst b/Documentation/media/uapi/v4l/dev-decoder.rst
> new file mode 100644
> index 000000000000..f55d34d2f860
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> @@ -0,0 +1,872 @@

<snip>

> +6.  This step only applies to coded formats that contain resolution
> +    information in the stream. Continue queuing/dequeuing bitstream
> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and returning
> +    each buffer to the client until required metadata to configure the
> +    ``CAPTURE`` queue are found. This is indicated by the driver sending
> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> +    requirement to pass enough data for this to occur in the first buffer
> +    and the driver must be able to process any number.
> +
> +    * If data in a buffer that triggers the event is required to decode
> +      the first frame, the driver must not return it to the client,
> +      but must retain it for further decoding.
> +
> +    * If the client set width and height of ``OUTPUT`` format to 0, calling
> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return -EPERM,
> +      until the driver configures ``CAPTURE`` format according to stream
> +      metadata.

What about calling TRY/S_FMT on the capture queue: will this also return -EPERM?
I assume so.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-25 13:41   ` Philipp Zabel
@ 2018-08-07  6:07     ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-07  6:07 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Philipp,

On Wed, Jul 25, 2018 at 10:41 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> > Due to complexity of the video encoding process, the V4L2 drivers of
> > stateful encoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > encoding, encode parameters change, drain and reset.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the encoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>
> Thanks a lot for the update,

Thanks for review!

> > ---
> >  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
> >  3 files changed, 553 insertions(+)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> > new file mode 100644
> > index 000000000000..28be1698e99c
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> > @@ -0,0 +1,550 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _encoder:
> > +
> > +****************************************
> > +Memory-to-memory Video Encoder Interface
> > +****************************************
> > +
> > +Input data to a video encoder are raw video frames in display order
> > +to be encoded into the output bitstream. Output data are complete chunks of
> > +valid bitstream, including all metadata, headers, etc. The resulting stream
> > +must not need any further post-processing by the client.
> > +
> > +Performing software stream processing, header generation etc. in the driver
> > +in order to support this interface is strongly discouraged. In case such
> > +operations are needed, use of Stateless Video Encoder Interface (in
> > +development) is strongly advised.
> > +
> [...]
> > +
> > +Commit points
> > +=============
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on ``CAPTURE`` queue may change the set of formats
> > +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> > +   means that ``OUTPUT`` format may be reset and the client must not
> > +   rely on the previously set format being preserved.
>
> Since the only property of the CAPTURE format that can be set directly
> via S_FMT is the pixelformat, should this be made explicit?
>
> 1. Setting pixelformat on ``CAPTURE`` queue may change the set of
>    formats supported/advertised on the ``OUTPUT`` queue. In particular,
>    it also means that ``OUTPUT`` format may be reset and the client
>    must not rely on the previously set format being preserved.
>
> ?
>
> > +2. Enumerating formats on ``OUTPUT`` queue must only return formats
> > +   supported for the ``CAPTURE`` format currently set.
>
> Same here, as it usually is the codec selected by CAPTURE pixelformat
> that determines the supported OUTPUT pixelformats and resolutions.
>
> 2. Enumerating formats on ``OUTPUT`` queue must only return formats
>    supported for the ``CAPTURE`` pixelformat currently set.
>
> This could prevent the possible misconception that the CAPTURE
> width/height might in any form limit the OUTPUT format, when in fact it
> is determined by it.
>
> > +3. Setting/changing format on ``OUTPUT`` queue does not change formats
> > +   available on ``CAPTURE`` queue.
>
> 3. Setting/changing format on the ``OUTPUT`` queue does not change
>    pixelformats available on the ``CAPTURE`` queue.
>
> ?
>
> Because setting OUTPUT width/height or CROP SELECTION very much limits
> the possible values of CAPTURE width/height.
>
> Maybe 'available' in this context should be specified somewhere to mean
> 'returned by ENUM_FMT and allowed by S_FMT/TRY_FMT'.
>
> > +   An attempt to set ``OUTPUT`` format that
> > +   is not supported for the currently selected ``CAPTURE`` format must
> > +   result in the driver adjusting the requested format to an acceptable
> > +   one.
>
>    An attempt to set ``OUTPUT`` format that is not supported for the
>
> currently selected ``CAPTURE`` pixelformat must result in the driver
>
> adjusting the requested format to an acceptable one.
>
> > +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
> > +   supported coded formats, irrespective of the current ``OUTPUT``
> > +   format.
> > +
> > +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> > +   change format on it.
> > +
> > +To summarize, setting formats and allocation must always start with the
> > +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> > +set of supported formats for the ``OUTPUT`` queue.
>
> To summarize, setting formats and allocation must always start with
> setting the encoded pixelformat on the ``CAPTURE`` queue. The
> ``CAPTURE`` queue is the master that governs the set of supported
> formats for the ``OUTPUT`` queue.
>
> Or is this too verbose?

I'm personally okay with making this "pixel format" specifically, but
I thought we wanted to extend this later to other things, such as
colorimetry. Would it introduce any problems if we keep it more
general?

To avoid any ambiguities, we could add a table that lists all the
state accessible by user space, which would clearly mark CAPTURE
width/height as fixed by the driver.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-25 13:57   ` Hans Verkuil
@ 2018-08-07  6:54     ` Tomasz Figa
  2018-08-07  7:25       ` Hans Verkuil
  2018-10-16  7:36       ` Tomasz Figa
  0 siblings, 2 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-07  6:54 UTC (permalink / raw)
  To: Hans Verkuil, Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Hans,

On Wed, Jul 25, 2018 at 10:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 24/07/18 16:06, Tomasz Figa wrote:
> > Due to complexity of the video encoding process, the V4L2 drivers of
> > stateful encoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > encoding, encode parameters change, drain and reset.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the encoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
> >  3 files changed, 553 insertions(+)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> > new file mode 100644
> > index 000000000000..28be1698e99c
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> > @@ -0,0 +1,550 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _encoder:
> > +
> > +****************************************
> > +Memory-to-memory Video Encoder Interface
> > +****************************************
> > +
> > +Input data to a video encoder are raw video frames in display order
> > +to be encoded into the output bitstream. Output data are complete chunks of
> > +valid bitstream, including all metadata, headers, etc. The resulting stream
> > +must not need any further post-processing by the client.
>
> Due to the confusing use capture and output I wonder if it would be better to
> rephrase this as follows:
>
> "A video encoder takes raw video frames in display order and encodes them into
> a bitstream. It generates complete chunks of the bitstream, including
> all metadata, headers, etc. The resulting bitstream does not require any further
> post-processing by the client."
>
> Something similar should be done for the decoder documentation.
>

First, thanks a lot for review!

Sounds good to me, it indeed feels much easier to read, thanks.

[snip]
> > +
> > +IDR
> > +   a type of a keyframe in H.264-encoded stream, which clears the list of
> > +   earlier reference frames (DPBs)
>
> Same problem as with the previous patch: it doesn't say what IDR stands for.
> It also refers to DPBs, but DPB is not part of this glossary.

Ack.

>
> Perhaps the glossary of the encoder/decoder should be combined.
>

There are some terms that have slightly different nuance between
encoder and decoder, so while it would be possible to just include
both meanings (as it was in RFC), I wonder if it wouldn't make it more
difficult to read, also given that it would move it to a separate
page. No strong opinion, though.

[snip]
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported formats and resolutions. See
> > +   capability enumeration.
>
> capability enumeration. -> 'Querying capabilities' above.
>

Ack.

[snip]
> > +4. The client may set the raw source format on the ``OUTPUT`` queue via
> > +   :c:func:`VIDIOC_S_FMT`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``pixelformat``
> > +         raw format of the source
> > +
> > +     ``width``, ``height``
> > +         source resolution
> > +
> > +     ``num_planes`` (for _MPLANE)
> > +         set to number of planes for pixelformat
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``width``, ``height``
> > +         may be adjusted by driver to match alignment requirements, as
> > +         required by the currently selected formats
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * Setting the source resolution will reset visible resolution to the
> > +     adjusted source resolution rounded up to the closest visible
> > +     resolution supported by the driver. Similarly, coded resolution will
>
> coded -> the coded

Ack.

>
> > +     be reset to source resolution rounded up to the closest coded
>
> reset -> set
> source -> the source

Ack.

>
> > +     resolution supported by the driver (typically a multiple of
> > +     macroblock size).
>
> The first sentence of this paragraph is very confusing. It needs a bit more work,
> I think.

Actually, this applies to all crop rectangles, not just visible
resolution. How about the following?

    Setting the source resolution will reset the crop rectangles to
default values
    corresponding to the new resolution, as described further in this document.
    Similarly, the coded resolution will be reset to match source
resolution rounded up
    to the closest coded resolution supported by the driver (typically
a multiple of
    macroblock size).

>
> > +
> > +   .. note::
> > +
> > +      This step is not strictly required, since ``OUTPUT`` is expected to
> > +      have a valid default format. However, the client needs to ensure that
> > +      ``OUTPUT`` format matches its expectations via either
> > +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> > +      being the typical scenario, since the default format is unlikely to
> > +      be what the client needs.
>
> Hmm. I'm not sure if this note should be included. It's good practice to always
> set the output format. I think the note confuses more than that it helps. IMHO.
>

I agree with you on that. In RFC, Philipp noticed that technically
S_FMT is not mandatory and that there might be some use case where the
already set format matches client's expectation. I've added this note
to cover that. Philipp, do you still think we need it?

> > +
> > +5. *[optional]* Set visible resolution for the stream metadata via
>
> Set -> Set the
>

Ack.

> > +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``target``
> > +         set to ``V4L2_SEL_TGT_CROP``
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle; this must fit within the framebuffer resolution
>
> Should that be "source resolution"? Or the resolution returned by "CROP_BOUNDS"?
>

Referring to V4L2_SEL_TGT_CROP_BOUNDS rather than some arbitrary
resolution is better indeed, will change.

> > +         and might be subject to adjustment to match codec and hardware
> > +         constraints
> > +
> > +   * **Return fields:**
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle adjusted by the driver
> > +
> > +   * The driver must expose following selection targets on ``OUTPUT``:
> > +
> > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > +         maximum crop bounds within the source buffer supported by the
> > +         encoder
> > +
> > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +         suggested cropping rectangle that covers the whole source picture
> > +
> > +     ``V4L2_SEL_TGT_CROP``
> > +         rectangle within the source buffer to be encoded into the
> > +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > +         maximum rectangle within the coded resolution, which the cropped
> > +         source frame can be output into; always equal to (0, 0)x(width of
> > +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> > +         hardware does not support compose/scaling
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > +         equal to ``V4L2_SEL_TGT_CROP``
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE``
> > +         rectangle within the coded frame, which the cropped source frame
> > +         is to be output into; defaults to
> > +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> > +         additional compose/scaling capabilities; resulting stream will
> > +         have this rectangle encoded as the visible rectangle in its
> > +         metadata
> > +
> > +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> > +         always equal to coded resolution of the stream, as selected by the
> > +         encoder based on source resolution and crop/compose rectangles
>
> Are there codec drivers that support composition? I can't remember seeing any.
>

Hmm, I was convinced that MFC could scale and we just lacked support
in the driver, but I checked the documentation and it doesn't seem to
be able to do so. I guess we could drop the COMPOSE rectangles for
now, until we find any hardware that can do scaling or composing on
the fly.

> > +
> > +   .. note::
> > +
> > +      The driver may adjust the crop/compose rectangles to the nearest
> > +      supported ones to meet codec and hardware requirements.
> > +
> > +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> > +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> > +
> > +   * **Required fields:**
> > +
> > +     ``count``
> > +         requested number of buffers to allocate; greater than zero
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> > +         ``CAPTURE``
> > +
> > +     ``memory``
> > +         follows standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``count``
> > +         adjusted to allocated number of buffers
> > +
> > +   * The driver must adjust count to minimum of required number of
> > +     buffers for given format and count passed.
>
> I'd rephrase this:
>
>         The driver must adjust ``count`` to the maximum of ``count`` and
>         the required number of buffers for the given format.
>
> Note that this is set to the maximum, not minimum.
>

Good catch. Will fix it.

> > The client must
> > +     check this value after the ioctl returns to get the number of
> > +     buffers actually allocated.
> > +
> > +   .. note::
> > +
> > +      To allocate more than minimum number of buffers (for pipeline
>
> than -> than the
>

Ack.

> > +      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
> > +      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
> > +      to get the minimum number of buffers required by the
> > +      driver/format, and pass the obtained value plus the number of
> > +      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.
>
> count -> the ``count``
>

Ack.

> > +
> > +7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> > +   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual
>
> Actual -> The actual
>

Ack.

> > +   encoding process starts when both queues start streaming.
> > +
> > +.. note::
> > +
> > +   If the client stops ``CAPTURE`` during the encode process and then
> > +   restarts it again, the encoder will be expected to generate a stream
> > +   independent from the stream generated before the stop. Depending on the
> > +   coded format, that may imply that:
> > +
> > +   * encoded frames produced after the restart must not reference any
> > +     frames produced before the stop, e.g. no long term references for
> > +     H264,
> > +
> > +   * any headers that must be included in a standalone stream must be
> > +     produced again, e.g. SPS and PPS for H264.
> > +
> > +Encoding
> > +========
> > +
> > +This state is reached after a successful initialization sequence. In
> > +this state, client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> > +semantics.
> > +
> > +Both queues operate independently, following standard behavior of V4L2
> > +buffer queues and memory-to-memory devices. In addition, the order of
> > +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> > +queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
> > +format, e.g. frame reordering. The client must not assume any direct
> > +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> > +reported by :c:type:`v4l2_buffer` ``timestamp``.
>
> Same question as for the decoder: are you sure about that?
>

I think it's the same answer here. That's why we have the timestamp
copy mechanism, right?

> > +
> > +Encoding parameter changes
> > +==========================
> > +
> > +The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
> > +parameters at any time. The availability of parameters is driver-specific
> > +and the client must query the driver to find the set of available controls.
> > +
> > +The ability to change each parameter during encoding of is driver-specific,
>
> Remove spurious 'of'
>
> > +as per standard semantics of the V4L2 control interface. The client may
>
> per -> per the
>
> > +attempt setting a control of its interest during encoding and if it the
>
> Remove spurious 'it'
>
> > +operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be
>
> ``CAPTURE`` -> the ``CAPTURE``
>
> > +stopped for the configuration change to be allowed (following the drain
> > +sequence will be  needed to avoid losing already queued/encoded frames).
>
> losing -> losing the
>
> > +
> > +The timing of parameter update is driver-specific, as per standard
>
> update -> updates
> per -> per the
>
> > +semantics of the V4L2 control interface. If the client needs to apply the
> > +parameters exactly at specific frame and the encoder supports it, using
>
> using -> using the

Ack +6

>
> > +Request API should be considered.
>
> This makes the assumption that the Request API will be merged at about the
> same time as this document. Which is at the moment a reasonable assumption,
> to be fair.
>

We can easily remove this, if it doesn't happen, but I'd prefer not to
need to. ;)

> > +
> > +Drain
> > +=====
> > +
> > +To ensure that all queued ``OUTPUT`` buffers have been processed and
> > +related ``CAPTURE`` buffers output to the client, the following drain
>
> related -> the related
>

Ack.

> > +sequence may be followed. After the drain sequence is complete, the client
> > +has received all encoded frames for all ``OUTPUT`` buffers queued before
> > +the sequence was started.
> > +
> > +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``cmd``
> > +         set to ``V4L2_ENC_CMD_STOP``
> > +
> > +     ``flags``
> > +         set to 0
> > +
> > +     ``pts``
> > +         set to 0
> > +
> > +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> > +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> > +
> > +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> > +   processed:
> > +
> > +   * Once all decoded frames (if any) are ready to be dequeued on the
> > +     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
> > +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> > +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> > +     last frame (if any) produced as a result of processing the ``OUTPUT``
> > +     buffers queued before
> > +     ``V4L2_ENC_CMD_STOP``.
>
> Hmm, this is somewhat awkward phrasing. Can you take another look at this?
>

How about this?

3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
   processed:

   * The driver returns all ``CAPTURE`` buffers corresponding to processed
     ``OUTPUT`` buffers, if any. The last buffer must have
``V4L2_BUF_FLAG_LAST``
     set in its :c:type:`v4l2_buffer` ``flags`` field.

   * The driver sends a ``V4L2_EVENT_EOS`` event.

> > +
> > +   * If no more frames are left to be returned at the point of handling
> > +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> > +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +     ``V4L2_BUF_FLAG_LAST`` set.
> > +
> > +   * Any attempts to dequeue more buffers beyond the buffer marked with
> > +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> > +     :c:func:`VIDIOC_DQBUF`.
> > +
> > +4. At this point, encoding is paused and the driver will accept, but not
> > +   process any newly queued ``OUTPUT`` buffers until the client issues
>
> issues -> issues a
>

Ack.

[snip]
> > +Commit points
> > +=============
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
> > +of the driver.
> > +
> > +1. Setting format on ``CAPTURE`` queue may change the set of formats
>
> format -> the format
>
> > +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> > +   means that ``OUTPUT`` format may be reset and the client must not
>
> that -> that the
>
> > +   rely on the previously set format being preserved.
> > +
> > +2. Enumerating formats on ``OUTPUT`` queue must only return formats
>
> on -> on the
>
> > +   supported for the ``CAPTURE`` format currently set.
>
> 'for the current ``CAPTURE`` format.'
>
> > +
> > +3. Setting/changing format on ``OUTPUT`` queue does not change formats
>
> format -> the format
> on -> on the
>
> > +   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that
>
> on -> on the
> set -> set the
>
> > +   is not supported for the currently selected ``CAPTURE`` format must
> > +   result in the driver adjusting the requested format to an acceptable
> > +   one.
> > +
> > +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
>
> on -> on the
>
> > +   supported coded formats, irrespective of the current ``OUTPUT``
> > +   format.
> > +
> > +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> > +   change format on it.
>
> format -> the format
>

Ack +7

> > +
> > +To summarize, setting formats and allocation must always start with the
> > +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> > +set of supported formats for the ``OUTPUT`` queue.
> > diff --git a/Documentation/media/uapi/v4l/devices.rst b/Documentation/media/uapi/v4l/devices.rst
> > index 12d43fe711cf..1822c66c2154 100644
> > --- a/Documentation/media/uapi/v4l/devices.rst
> > +++ b/Documentation/media/uapi/v4l/devices.rst
> > @@ -16,6 +16,7 @@ Interfaces
> >      dev-osd
> >      dev-codec
> >      dev-decoder
> > +    dev-encoder
> >      dev-effect
> >      dev-raw-vbi
> >      dev-sliced-vbi
> > diff --git a/Documentation/media/uapi/v4l/v4l2.rst b/Documentation/media/uapi/v4l/v4l2.rst
> > index 65dc096199ad..2ef6693b9499 100644
> > --- a/Documentation/media/uapi/v4l/v4l2.rst
> > +++ b/Documentation/media/uapi/v4l/v4l2.rst
> > @@ -56,6 +56,7 @@ Authors, in alphabetical order:
> >  - Figa, Tomasz <tfiga@chromium.org>
> >
> >    - Documented the memory-to-memory decoder interface.
> > +  - Documented the memory-to-memory encoder interface.
> >
> >  - H Schimek, Michael <mschimek@gmx.at>
> >
> > @@ -68,6 +69,7 @@ Authors, in alphabetical order:
> >  - Osciak, Pawel <posciak@chromium.org>
> >
> >    - Documented the memory-to-memory decoder interface.
> > +  - Documented the memory-to-memory encoder interface.
> >
> >  - Osciak, Pawel <pawel@osciak.com>
> >
> >
>
> One general comment:
>
> you often talk about 'the driver must', e.g.:
>
> "The driver must process and encode as normal all ``OUTPUT`` buffers
> queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."
>
> But this is not a driver specification, it is an API specification.
>
> I think it would be better to phrase it like this:
>
> "All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
> was issued will be processed and encoded as normal."
>
> (or perhaps even 'shall' if you want to be really formal)
>
> End-users do not really care what drivers do, they want to know what the API does,
> and that implies rules for drivers.

While I see the point, I'm not fully convinced that it makes the
documentation easier to read. We defined "client" for the purpose of
not using the passive form too much, so possibly we could also define
"driver" in the glossary. Maybe it's just me, but I find that
referring directly to both sides of the API and using the active form
is much easier to read.

Possibly just replacing "driver" with "encoder" would ease your concern?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-26 10:36       ` Philipp Zabel
@ 2018-08-07  6:55         ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-07  6:55 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Hans Verkuil, Linux Media Mailing List,
	Linux Kernel Mailing List, Stanimir Varbanov,
	Mauro Carvalho Chehab, Pawel Osciak, Alexandre Courbot, kamil,
	a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Thu, Jul 26, 2018 at 7:36 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> On Thu, 2018-07-26 at 19:20 +0900, Tomasz Figa wrote:
> [...]
> > > You might want to mention that if there are insufficient buffers, then
> > > VIDIOC_CREATE_BUFS can be used to add more buffers.
> > >
> >
> > This might be a bit tricky, since at least s5p-mfc and coda can only
> > work on a fixed buffer set and one would need to fully reinitialize
> > the decoding to add one more buffer, which would effectively be the
> > full resolution change sequence, as below, just with REQBUFS(0),
> > REQBUFS(N) replaced with CREATE_BUFS.
>
> The coda driver supports CREATE_BUFS on the decoder CAPTURE queue.
>
> The firmware indeed needs a fixed frame buffer set, but these buffers
> are internal only and in a coda specific tiling format. The content of
> finished internal buffers is copied / detiled into the external CAPTURE
> buffers, so those can be added at will.

Thanks for clarifying. I forgot about that internal copy indeed.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-26 10:57       ` Hans Verkuil
@ 2018-08-07  7:05         ` Tomasz Figa
  2018-08-07  7:37           ` Hans Verkuil
  0 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-08-07  7:05 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Thu, Jul 26, 2018 at 7:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 26/07/18 12:20, Tomasz Figa wrote:
> > Hi Hans,
> >
> > On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >>
> >> Hi Tomasz,
> >>
> >> Many, many thanks for working on this! It's a great document and when done
> >> it will be very useful indeed.
> >>
> >> Review comments follow...
> >
> > Thanks for review!
> >
> >>
> >> On 24/07/18 16:06, Tomasz Figa wrote:
>
> > [snip]
>
> >> Note that the v4l2_pix_format/S_FMT documentation needs to be updated as well
> >> since we never allowed 0x0 before.
> >
> > Is there any text that disallows this? I couldn't spot any. Generally
> > there are already drivers which return 0x0 for coded formats (s5p-mfc)
> > and it's not even strange, because in such case, the buffer contains
> > just a sequence of bytes, not a 2D picture.
>
> All non-m2m devices will always have non-zero width/height values. Only with
> m2m devices do we see this.
>
> This was probably never documented since before m2m appeared it was 'obvious'.
>
> This definitely needs to be documented, though.
>

Fair enough. Let me try to add a note there.

> >
> >> What if you set the format to 0x0 but the stream does not have meta data with
> >> the resolution? How does userspace know if 0x0 is allowed or not? If this is
> >> specific to the chosen coded pixel format, should be add a new flag for those
> >> formats indicating that the coded data contains resolution information?
> >
> > Yes, this would definitely be on a per-format basis. Not sure what you
> > mean by a flag, though? E.g. if the format is set to H264, then it's
> > bound to include resolution information. If the format doesn't include
> > it, then userspace is already aware of this fact, because it needs to
> > get this from some other source (e.g. container).
> >
> >>
> >> That way userspace knows if 0x0 can be used, and the driver can reject 0x0
> >> for formats that do not support it.
> >
> > As above, but I might be misunderstanding your suggestion.
>
> So my question is: is this tied to the pixel format, or should we make it
> explicit with a flag like V4L2_FMT_FLAG_CAN_DECODE_WXH.
>
> The advantage of a flag is that you don't need a switch on the format to
> know whether or not 0x0 is allowed. And the flag can just be set in
> v4l2-ioctls.c.

As far as my understanding goes, what data is included in the stream
is definitely specified by format. For example, a H264 elementary
stream will always include those data as a part of SPS.

However, having such flag internally, not exposed to userspace, could
indeed be useful to avoid all drivers have such switch. That wouldn't
belong to this documentation, though, since it would be just kernel
API.

>
> >>> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> >>> +sets.
> >>
> >> 'changing buffer sets': not clear what is meant by this. It's certainly not
> >> 'solely' since it can also be used to achieve an instantaneous seek.
> >>
> >
> > To be honest, I'm not sure whether there is even a need to include
> > this whole section. It's obvious that if you stop feeding a mem2mem
> > device, it will pause. Moreover, other sections imply various
> > behaviors triggered by STREAMOFF/STREAMON/DECODER_CMD/etc., so it
> > should be quite clear that they are different from a simple pause.
> > What do you think?
>
> Yes, I'd drop this last sentence ('Similarly...sets').
>

Ack.

> >>> +2.  After all buffers containing decoded frames from before the resolution
> >>> +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> >>> +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> >>> +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> >>> +
> >>> +    * The last buffer from before the change must be marked with
> >>> +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the
> >>
> >> spurious 'as'?
> >>
> >
> > It should be:
> >
> >     * The last buffer from before the change must be marked with
> >       the ``V4L2_BUF_FLAG_LAST`` flag in :c:type:`v4l2_buffer` ``flags`` field,
> >       similarly to the
>
> Ah, OK. Now I get it.
>
> >> I wonder if we should make these min buffer controls required. It might be easier
> >> that way.
> >
> > Agreed. Although userspace is still free to ignore it, because REQBUFS
> > would do the right thing anyway.
>
> It's never been entirely clear to me what the purpose of those min buffers controls
> is. REQBUFS ensures that the number of buffers is at least the minimum needed to
> make the HW work. So why would you need these controls? It only makes sense if they
> return something different from REQBUFS.
>

The purpose of those controls is to let the client allocate a number
of buffers bigger than minimum, without the need to allocate the
minimum number of buffers first (to just learn the number), free them
and then allocate a bigger number again.

> >
> >>
> >>> +7.  If all the following conditions are met, the client may resume the
> >>> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> >>> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> >>> +    sequence:
> >>> +
> >>> +    * ``sizeimage`` of new format is less than or equal to the size of
> >>> +      currently allocated buffers,
> >>> +
> >>> +    * the number of buffers currently allocated is greater than or equal to
> >>> +      the minimum number of buffers acquired in step 6.
> >>
> >> You might want to mention that if there are insufficient buffers, then
> >> VIDIOC_CREATE_BUFS can be used to add more buffers.
> >>
> >
> > This might be a bit tricky, since at least s5p-mfc and coda can only
> > work on a fixed buffer set and one would need to fully reinitialize
> > the decoding to add one more buffer, which would effectively be the
> > full resolution change sequence, as below, just with REQBUFS(0),
> > REQBUFS(N) replaced with CREATE_BUFS.
>
> What happens today in those drivers if you try to call CREATE_BUFS?

s5p-mfc doesn't set the .vidioc_create_bufs pointer in its
v4l2_ioctl_ops, so I suppose that would be -ENOTTY?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-30 12:52   ` Hans Verkuil
@ 2018-08-07  7:08     ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-07  7:08 UTC (permalink / raw)
  To: Hans Verkuil, nicolas
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, Paul Kocialkowski, Laurent Pinchart, dave.stevenson,
	Ezequiel Garcia

On Mon, Jul 30, 2018 at 9:52 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 07/24/2018 04:06 PM, Tomasz Figa wrote:
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, drain and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
> >  3 files changed, 882 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst b/Documentation/media/uapi/v4l/dev-decoder.rst
> > new file mode 100644
> > index 000000000000..f55d34d2f860
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> > @@ -0,0 +1,872 @@
>
> <snip>
>
> > +6.  This step only applies to coded formats that contain resolution
> > +    information in the stream. Continue queuing/dequeuing bitstream
> > +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> > +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and returning
> > +    each buffer to the client until required metadata to configure the
> > +    ``CAPTURE`` queue are found. This is indicated by the driver sending
> > +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> > +    requirement to pass enough data for this to occur in the first buffer
> > +    and the driver must be able to process any number.
> > +
> > +    * If data in a buffer that triggers the event is required to decode
> > +      the first frame, the driver must not return it to the client,
> > +      but must retain it for further decoding.
> > +
> > +    * If the client set width and height of ``OUTPUT`` format to 0, calling
> > +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return -EPERM,
> > +      until the driver configures ``CAPTURE`` format according to stream
> > +      metadata.
>
> What about calling TRY/S_FMT on the capture queue: will this also return -EPERM?
> I assume so.

We should make it so indeed, to make things consistent.

On another note, I don't really like this -EPERM here, as one could
just see that the format is 0x0 and know that it's not valid. This is
only needed for legacy userspace that doesn't handle the source change
event in initial stream parsing and just checks whether G_FMT returns
an error instead.

Nicolas, for more insight here.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-26 10:20     ` Tomasz Figa
  2018-07-26 10:36       ` Philipp Zabel
  2018-07-26 10:57       ` Hans Verkuil
@ 2018-08-07  7:13       ` Hans Verkuil
  2018-08-07 19:11         ` Maxime Jourdan
  2018-08-08  3:11         ` Tomasz Figa
  2018-09-19 10:17       ` Tomasz Figa
  3 siblings, 2 replies; 62+ messages in thread
From: Hans Verkuil @ 2018-08-07  7:13 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On 07/26/2018 12:20 PM, Tomasz Figa wrote:
> Hi Hans,
> 
> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>> +
>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
>>> +
>>> +Decoding
>>> +========
>>> +
>>> +This state is reached after a successful initialization sequence. In this
>>> +state, client queues and dequeues buffers to both queues via
>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
>>> +semantics.
>>> +
>>> +Both queues operate independently, following standard behavior of V4L2
>>> +buffer queues and memory-to-memory devices. In addition, the order of
>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
>>> +coded format, e.g. frame reordering. The client must not assume any direct
>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
>>
>> Is there a relationship between capture and output buffers w.r.t. the timestamp
>> field? I am not aware that there is one.
> 
> I believe the decoder was expected to copy the timestamp of matching
> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
> to be implementing it this way. I guess it might be a good idea to
> specify this more explicitly.

What about an output buffer producing multiple capture buffers? Or the case
where the encoded bitstream of a frame starts at one output buffer and ends
at another? What happens if you have B frames and the order of the capture
buffers is different from the output buffers?

In other words, for codecs there is no clear 1-to-1 relationship between an
output buffer and a capture buffer. And we never defined what the 'copy timestamp'
behavior should be in that case or if it even makes sense.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-08-07  6:54     ` Tomasz Figa
@ 2018-08-07  7:25       ` Hans Verkuil
  2018-10-16  7:36       ` Tomasz Figa
  1 sibling, 0 replies; 62+ messages in thread
From: Hans Verkuil @ 2018-08-07  7:25 UTC (permalink / raw)
  To: Tomasz Figa, Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On 08/07/2018 08:54 AM, Tomasz Figa wrote:
> Hi Hans,
> 
> On Wed, Jul 25, 2018 at 10:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> On 24/07/18 16:06, Tomasz Figa wrote:
>>> Due to complexity of the video encoding process, the V4L2 drivers of
>>> stateful encoder hardware require specific sequences of V4L2 API calls
>>> to be followed. These include capability enumeration, initialization,
>>> encoding, encode parameters change, drain and reset.
>>>
>>> Specifics of the above have been discussed during Media Workshops at
>>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
>>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
>>> originated at those events was later implemented by the drivers we already
>>> have merged in mainline, such as s5p-mfc or coda.
>>>
>>> The only thing missing was the real specification included as a part of
>>> Linux Media documentation. Fix it now and document the encoder part of
>>> the Codec API.
>>>
>>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
>>> ---
>>>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>>>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>>>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>>>  3 files changed, 553 insertions(+)
>>>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
>>>
>>> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
>>> new file mode 100644
>>> index 000000000000..28be1698e99c
>>> --- /dev/null
>>> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
>>> @@ -0,0 +1,550 @@
>>> +.. -*- coding: utf-8; mode: rst -*-
>>> +
>>> +.. _encoder:
>>> +
>>> +****************************************
>>> +Memory-to-memory Video Encoder Interface
>>> +****************************************
>>> +
>>> +Input data to a video encoder are raw video frames in display order
>>> +to be encoded into the output bitstream. Output data are complete chunks of
>>> +valid bitstream, including all metadata, headers, etc. The resulting stream
>>> +must not need any further post-processing by the client.
>>
>> Due to the confusing use capture and output I wonder if it would be better to
>> rephrase this as follows:
>>
>> "A video encoder takes raw video frames in display order and encodes them into
>> a bitstream. It generates complete chunks of the bitstream, including
>> all metadata, headers, etc. The resulting bitstream does not require any further
>> post-processing by the client."
>>
>> Something similar should be done for the decoder documentation.
>>
> 
> First, thanks a lot for review!
> 
> Sounds good to me, it indeed feels much easier to read, thanks.
> 
> [snip]
>>> +
>>> +IDR
>>> +   a type of a keyframe in H.264-encoded stream, which clears the list of
>>> +   earlier reference frames (DPBs)
>>
>> Same problem as with the previous patch: it doesn't say what IDR stands for.
>> It also refers to DPBs, but DPB is not part of this glossary.
> 
> Ack.
> 
>>
>> Perhaps the glossary of the encoder/decoder should be combined.
>>
> 
> There are some terms that have slightly different nuance between
> encoder and decoder, so while it would be possible to just include
> both meanings (as it was in RFC), I wonder if it wouldn't make it more
> difficult to read, also given that it would move it to a separate
> page. No strong opinion, though.

I don't have a strong opinion either. Let's keep it as is, we can always
change it later.

>>> +   * Setting the source resolution will reset visible resolution to the
>>> +     adjusted source resolution rounded up to the closest visible
>>> +     resolution supported by the driver. Similarly, coded resolution will
>>
>> coded -> the coded
> 
> Ack.
> 
>>
>>> +     be reset to source resolution rounded up to the closest coded
>>
>> reset -> set
>> source -> the source
> 
> Ack.
> 
>>
>>> +     resolution supported by the driver (typically a multiple of
>>> +     macroblock size).
>>
>> The first sentence of this paragraph is very confusing. It needs a bit more work,
>> I think.
> 
> Actually, this applies to all crop rectangles, not just visible
> resolution. How about the following?
> 
>     Setting the source resolution will reset the crop rectangles to
> default values

default -> their default

>     corresponding to the new resolution, as described further in this document.

Does 'this document' refer to this encoder chapter, or the whole v4l2 spec? It
might be better to provide an explicit link here.

>     Similarly, the coded resolution will be reset to match source

source -> the source

> resolution rounded up
>     to the closest coded resolution supported by the driver (typically
> a multiple of

of -> of the

>     macroblock size).

Anyway, this is much better.

>>> +Both queues operate independently, following standard behavior of V4L2
>>> +buffer queues and memory-to-memory devices. In addition, the order of
>>> +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>>> +queuing raw frames to ``OUTPUT`` queue, due to properties of selected coded
>>> +format, e.g. frame reordering. The client must not assume any direct
>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>>> +reported by :c:type:`v4l2_buffer` ``timestamp``.
>>
>> Same question as for the decoder: are you sure about that?
>>
> 
> I think it's the same answer here. That's why we have the timestamp
> copy mechanism, right?

See my reply from a few minutes ago. I'm not convinced copying timestamps
makes sense for codecs.

>>> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
>>> +   processed:
>>> +
>>> +   * Once all decoded frames (if any) are ready to be dequeued on the
>>> +     ``CAPTURE`` queue the driver must send a ``V4L2_EVENT_EOS``. The
>>> +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
>>> +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
>>> +     last frame (if any) produced as a result of processing the ``OUTPUT``
>>> +     buffers queued before
>>> +     ``V4L2_ENC_CMD_STOP``.
>>
>> Hmm, this is somewhat awkward phrasing. Can you take another look at this?
>>
> 
> How about this?
> 
> 3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
>    processed:
> 
>    * The driver returns all ``CAPTURE`` buffers corresponding to processed
>      ``OUTPUT`` buffers, if any. The last buffer must have
> ``V4L2_BUF_FLAG_LAST``
>      set in its :c:type:`v4l2_buffer` ``flags`` field.
> 
>    * The driver sends a ``V4L2_EVENT_EOS`` event.

I'd rephrase that last sentence to:

* Once the last buffer is returned the driver sends a ``V4L2_EVENT_EOS`` event.

>> One general comment:
>>
>> you often talk about 'the driver must', e.g.:
>>
>> "The driver must process and encode as normal all ``OUTPUT`` buffers
>> queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."
>>
>> But this is not a driver specification, it is an API specification.
>>
>> I think it would be better to phrase it like this:
>>
>> "All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
>> was issued will be processed and encoded as normal."
>>
>> (or perhaps even 'shall' if you want to be really formal)
>>
>> End-users do not really care what drivers do, they want to know what the API does,
>> and that implies rules for drivers.
> 
> While I see the point, I'm not fully convinced that it makes the
> documentation easier to read. We defined "client" for the purpose of
> not using the passive form too much, so possibly we could also define
> "driver" in the glossary. Maybe it's just me, but I find that
> referring directly to both sides of the API and using the active form
> is much easier to read.
> 
> Possibly just replacing "driver" with "encoder" would ease your concern?

Actually, yes. I think that would work quite well.

Also, the phrase "the driver must" can be replaced by "the encoder will"
which describes the behavior of the encoder, which in turn defines what
the underlying driver must do.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-07  7:05         ` Tomasz Figa
@ 2018-08-07  7:37           ` Hans Verkuil
  2018-08-08  2:55             ` Tomasz Figa
  0 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-08-07  7:37 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On 08/07/2018 09:05 AM, Tomasz Figa wrote:
> On Thu, Jul 26, 2018 at 7:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>> What if you set the format to 0x0 but the stream does not have meta data with
>>>> the resolution? How does userspace know if 0x0 is allowed or not? If this is
>>>> specific to the chosen coded pixel format, should be add a new flag for those
>>>> formats indicating that the coded data contains resolution information?
>>>
>>> Yes, this would definitely be on a per-format basis. Not sure what you
>>> mean by a flag, though? E.g. if the format is set to H264, then it's
>>> bound to include resolution information. If the format doesn't include
>>> it, then userspace is already aware of this fact, because it needs to
>>> get this from some other source (e.g. container).
>>>
>>>>
>>>> That way userspace knows if 0x0 can be used, and the driver can reject 0x0
>>>> for formats that do not support it.
>>>
>>> As above, but I might be misunderstanding your suggestion.
>>
>> So my question is: is this tied to the pixel format, or should we make it
>> explicit with a flag like V4L2_FMT_FLAG_CAN_DECODE_WXH.
>>
>> The advantage of a flag is that you don't need a switch on the format to
>> know whether or not 0x0 is allowed. And the flag can just be set in
>> v4l2-ioctls.c.
> 
> As far as my understanding goes, what data is included in the stream
> is definitely specified by format. For example, a H264 elementary
> stream will always include those data as a part of SPS.
> 
> However, having such flag internally, not exposed to userspace, could
> indeed be useful to avoid all drivers have such switch. That wouldn't
> belong to this documentation, though, since it would be just kernel
> API.

Why would you keep this internally only?

>>>> I wonder if we should make these min buffer controls required. It might be easier
>>>> that way.
>>>
>>> Agreed. Although userspace is still free to ignore it, because REQBUFS
>>> would do the right thing anyway.
>>
>> It's never been entirely clear to me what the purpose of those min buffers controls
>> is. REQBUFS ensures that the number of buffers is at least the minimum needed to
>> make the HW work. So why would you need these controls? It only makes sense if they
>> return something different from REQBUFS.
>>
> 
> The purpose of those controls is to let the client allocate a number
> of buffers bigger than minimum, without the need to allocate the
> minimum number of buffers first (to just learn the number), free them
> and then allocate a bigger number again.

I don't feel this is particularly useful. One problem with the minimum number
of buffers as used in the kernel is that it is often the minimum number of
buffers required to make the hardware work, but it may not be optimal. E.g.
quite a few capture drivers set the minimum to 2, which is enough for the
hardware, but it will likely lead to dropped frames. You really need 3
(one is being DMAed, one is queued and linked into the DMA engine and one is
being processed by userspace).

I would actually prefer this to be the recommended minimum number of buffers,
which is >= the minimum REQBUFS uses.

I.e., if you use this number and you have no special requirements, then you'll
get good performance.

> 
>>>
>>>>
>>>>> +7.  If all the following conditions are met, the client may resume the
>>>>> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
>>>>> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
>>>>> +    sequence:
>>>>> +
>>>>> +    * ``sizeimage`` of new format is less than or equal to the size of
>>>>> +      currently allocated buffers,
>>>>> +
>>>>> +    * the number of buffers currently allocated is greater than or equal to
>>>>> +      the minimum number of buffers acquired in step 6.
>>>>
>>>> You might want to mention that if there are insufficient buffers, then
>>>> VIDIOC_CREATE_BUFS can be used to add more buffers.
>>>>
>>>
>>> This might be a bit tricky, since at least s5p-mfc and coda can only
>>> work on a fixed buffer set and one would need to fully reinitialize
>>> the decoding to add one more buffer, which would effectively be the
>>> full resolution change sequence, as below, just with REQBUFS(0),
>>> REQBUFS(N) replaced with CREATE_BUFS.
>>
>> What happens today in those drivers if you try to call CREATE_BUFS?
> 
> s5p-mfc doesn't set the .vidioc_create_bufs pointer in its
> v4l2_ioctl_ops, so I suppose that would be -ENOTTY?

Correct for s5p-mfc.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-07  7:13       ` Hans Verkuil
@ 2018-08-07 19:11         ` Maxime Jourdan
  2018-08-08  3:07           ` Tomasz Figa
  2018-08-08  3:11         ` Tomasz Figa
  1 sibling, 1 reply; 62+ messages in thread
From: Maxime Jourdan @ 2018-08-07 19:11 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Tomasz Figa, Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

2018-08-07 9:13 GMT+02:00 Hans Verkuil <hverkuil@xs4all.nl>:
> On 07/26/2018 12:20 PM, Tomasz Figa wrote:
>> Hi Hans,
>>
>> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>> +
>>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
>>>> +
>>>> +Decoding
>>>> +========
>>>> +
>>>> +This state is reached after a successful initialization sequence. In this
>>>> +state, client queues and dequeues buffers to both queues via
>>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
>>>> +semantics.
>>>> +
>>>> +Both queues operate independently, following standard behavior of V4L2
>>>> +buffer queues and memory-to-memory devices. In addition, the order of
>>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
>>>> +coded format, e.g. frame reordering. The client must not assume any direct
>>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
>>>
>>> Is there a relationship between capture and output buffers w.r.t. the timestamp
>>> field? I am not aware that there is one.
>>
>> I believe the decoder was expected to copy the timestamp of matching
>> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
>> to be implementing it this way. I guess it might be a good idea to
>> specify this more explicitly.
>
> What about an output buffer producing multiple capture buffers? Or the case
> where the encoded bitstream of a frame starts at one output buffer and ends
> at another? What happens if you have B frames and the order of the capture
> buffers is different from the output buffers?
>
> In other words, for codecs there is no clear 1-to-1 relationship between an
> output buffer and a capture buffer. And we never defined what the 'copy timestamp'
> behavior should be in that case or if it even makes sense.
>
> Regards,
>
>         Hans

As it is done right now in userspace (FFmpeg, GStreamer) and most (if
not all?) drivers, it's a 1:1 between OUTPUT and CAPTURE. The only
thing that changes is the ordering since OUTPUT buffers are in
decoding order while CAPTURE buffers are in presentation order.

This almost always implies some timestamping kung-fu to match the
OUTPUT timestamps with the corresponding CAPTURE timestamps. It's
often done indirectly by the firmware on some platforms (rpi comes to
mind iirc).

The current constructions also imply one video packet per OUTPUT
buffer. If a video packet is too big to fit in a buffer, FFmpeg will
crop that packet to the maximum buffer size and will discard the
remaining packet data. GStreamer will abort the decoding. This is
unfortunately one of the shortcomings of having fixed-size buffers.
And if they were to split the packet in multiple buffers, then some
drivers in their current state wouldn't be able to handle the
timestamping issues and/or x:1 OUTPUT:CAPTURE buffer numbers.

Maxime

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
  2018-07-25 11:58   ` Hans Verkuil
  2018-07-30 12:52   ` Hans Verkuil
@ 2018-08-08  2:46   ` Tomasz Figa
  2018-08-20 13:04   ` Philipp Zabel
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-08  2:46 UTC (permalink / raw)
  To: Maxime Jourdan
  Cc: Linux Kernel Mailing List, Linux Media Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Maxime,

On Tue, Aug 7, 2018 at 5:32 AM Maxime Jourdan <maxi.jourdan@wanadoo.fr> wrote:
>
> Hi Tomasz,
>
> Sorry for sending this email only to you, I subscribed to linux-media
> after you posted this and I'm not sure how to respond to everybody.
>

No worries. Let me reply with other recipients added back. Thanks for
your comments.

> I'm currently developing a V4L2 M2M decoder driver for Amlogic SoCs so
> my comments are somewhat biased towards it
> (https://github.com/Elyotna/linux)
>
> > +Seek
> > +====
> > +
> > +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> > +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
> > +
> > +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> > +     treated as returned to the client (following standard semantics).
> > +
> > +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must be put in a state after seek and be ready to
> > +     accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream starting exactly from
> > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > +      data queued and must keep processing the queued buffers until it
> > +      finds a suitable resume point. While looking for a resume point, the
> > +      driver processes ``OUTPUT`` buffers and returns them to the client
> > +      without producing any decoded frames.
> > +
> > +      For hardware known to be mishandling seeks to a non-resume point,
> > +      e.g. by returning corrupted decoded frames, the driver must be able
> > +      to handle such seeks without a crash or any fatal decode error.
>
> This is unfortunately my case, apart from parsing the bitstream
> manually - which is a no-no -, there is no way to know when I'll be
> writing in an IDR frame to the HW bitstream parser. I think it would
> be much preferable that the client starts sending in an IDR frame for
> sure.

Most of the hardware, which have upstream drivers, deal with this
correctly and there is existing user space that relies on this, so we
cannot simply add such requirement. However, when sending your driver
upstream, feel free to include a patch that adds a read-only control
that tells the user space that it needs to do seeks to resume points.
Obviously this will work only with user space aware of this
requirement, but I don't think we can do anything better here.

>
> > +4. After a resume point is found, the driver will start returning
> > +   ``CAPTURE`` buffers with decoded frames.
> > +
> > +   * There is no precise specification for ``CAPTURE`` queue of when it
> > +     will start producing buffers containing decoded data from buffers
> > +     queued after the seek, as it operates independently
> > +     from ``OUTPUT`` queue.
> > +
> > +     * The driver is allowed to and may return a number of remaining
> > +       ``CAPTURE`` buffers containing decoded frames from before the seek
> > +       after the seek sequence (STREAMOFF-STREAMON) is performed.
> > +
> > +     * The driver is also allowed to and may not return all decoded frames
> > +       queued but not decode before the seek sequence was initiated. For
> > +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
> > +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
> > +       H’}, {A’, G’, H’}, {G’, H’}.
> > +
> > +   .. note::
> > +
> > +      To achieve instantaneous seek, the client may restart streaming on
> > +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
>
> Overall, I think Drain followed by V4L2_DEC_CMD_START is a more
> applicable scenario for seeking.
> Heck, simply starting to queue buffers at the seek - starting with an
> IDR - without doing any kind of streamon/off or cmd_start(stop) will
> do the trick.

Why do you think so?

For a seek, as expected by a typical device user, the result should be
discarding anything already queued and just start decoding new frames
as soon as possible.

Actually, this section doesn't describe any specific sequence, just
possible ways to do a seek using existing primitives.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-07  7:37           ` Hans Verkuil
@ 2018-08-08  2:55             ` Tomasz Figa
  2018-08-21 11:29               ` Stanimir Varbanov
  2018-10-15 10:13               ` Tomasz Figa
  0 siblings, 2 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-08  2:55 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Tue, Aug 7, 2018 at 4:37 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 08/07/2018 09:05 AM, Tomasz Figa wrote:
> > On Thu, Jul 26, 2018 at 7:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >>>> What if you set the format to 0x0 but the stream does not have meta data with
> >>>> the resolution? How does userspace know if 0x0 is allowed or not? If this is
> >>>> specific to the chosen coded pixel format, should be add a new flag for those
> >>>> formats indicating that the coded data contains resolution information?
> >>>
> >>> Yes, this would definitely be on a per-format basis. Not sure what you
> >>> mean by a flag, though? E.g. if the format is set to H264, then it's
> >>> bound to include resolution information. If the format doesn't include
> >>> it, then userspace is already aware of this fact, because it needs to
> >>> get this from some other source (e.g. container).
> >>>
> >>>>
> >>>> That way userspace knows if 0x0 can be used, and the driver can reject 0x0
> >>>> for formats that do not support it.
> >>>
> >>> As above, but I might be misunderstanding your suggestion.
> >>
> >> So my question is: is this tied to the pixel format, or should we make it
> >> explicit with a flag like V4L2_FMT_FLAG_CAN_DECODE_WXH.
> >>
> >> The advantage of a flag is that you don't need a switch on the format to
> >> know whether or not 0x0 is allowed. And the flag can just be set in
> >> v4l2-ioctls.c.
> >
> > As far as my understanding goes, what data is included in the stream
> > is definitely specified by format. For example, a H264 elementary
> > stream will always include those data as a part of SPS.
> >
> > However, having such flag internally, not exposed to userspace, could
> > indeed be useful to avoid all drivers have such switch. That wouldn't
> > belong to this documentation, though, since it would be just kernel
> > API.
>
> Why would you keep this internally only?
>

Well, either keep it internal or make it read-only for the user space,
since the behavior is already defined by selected pixel format.

> >>>> I wonder if we should make these min buffer controls required. It might be easier
> >>>> that way.
> >>>
> >>> Agreed. Although userspace is still free to ignore it, because REQBUFS
> >>> would do the right thing anyway.
> >>
> >> It's never been entirely clear to me what the purpose of those min buffers controls
> >> is. REQBUFS ensures that the number of buffers is at least the minimum needed to
> >> make the HW work. So why would you need these controls? It only makes sense if they
> >> return something different from REQBUFS.
> >>
> >
> > The purpose of those controls is to let the client allocate a number
> > of buffers bigger than minimum, without the need to allocate the
> > minimum number of buffers first (to just learn the number), free them
> > and then allocate a bigger number again.
>
> I don't feel this is particularly useful. One problem with the minimum number
> of buffers as used in the kernel is that it is often the minimum number of
> buffers required to make the hardware work, but it may not be optimal. E.g.
> quite a few capture drivers set the minimum to 2, which is enough for the
> hardware, but it will likely lead to dropped frames. You really need 3
> (one is being DMAed, one is queued and linked into the DMA engine and one is
> being processed by userspace).
>
> I would actually prefer this to be the recommended minimum number of buffers,
> which is >= the minimum REQBUFS uses.
>
> I.e., if you use this number and you have no special requirements, then you'll
> get good performance.

I guess we could make it so. It would make existing user space request
more buffers than it used to with the original meaning, but I guess it
shouldn't be a big problem.

>
> >
> >>>
> >>>>
> >>>>> +7.  If all the following conditions are met, the client may resume the
> >>>>> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> >>>>> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> >>>>> +    sequence:
> >>>>> +
> >>>>> +    * ``sizeimage`` of new format is less than or equal to the size of
> >>>>> +      currently allocated buffers,
> >>>>> +
> >>>>> +    * the number of buffers currently allocated is greater than or equal to
> >>>>> +      the minimum number of buffers acquired in step 6.
> >>>>
> >>>> You might want to mention that if there are insufficient buffers, then
> >>>> VIDIOC_CREATE_BUFS can be used to add more buffers.
> >>>>
> >>>
> >>> This might be a bit tricky, since at least s5p-mfc and coda can only
> >>> work on a fixed buffer set and one would need to fully reinitialize
> >>> the decoding to add one more buffer, which would effectively be the
> >>> full resolution change sequence, as below, just with REQBUFS(0),
> >>> REQBUFS(N) replaced with CREATE_BUFS.
> >>
> >> What happens today in those drivers if you try to call CREATE_BUFS?
> >
> > s5p-mfc doesn't set the .vidioc_create_bufs pointer in its
> > v4l2_ioctl_ops, so I suppose that would be -ENOTTY?
>
> Correct for s5p-mfc.

As Philipp clarified, coda supports adding buffers on the fly. I
briefly looked at venus and mtk-vcodec and they seem to use m2m
implementation of CREATE_BUFS. Not sure if anyone tested that, though.
So the only hardware I know for sure cannot support this is s5p-mfc.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-07 19:11         ` Maxime Jourdan
@ 2018-08-08  3:07           ` Tomasz Figa
  2018-08-08  7:19             ` Maxime Jourdan
  0 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-08-08  3:07 UTC (permalink / raw)
  To: Maxime Jourdan
  Cc: Hans Verkuil, Linux Media Mailing List,
	Linux Kernel Mailing List, Stanimir Varbanov,
	Mauro Carvalho Chehab, Pawel Osciak, Alexandre Courbot, kamil,
	a.hajda, Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Wed, Aug 8, 2018 at 4:11 AM Maxime Jourdan <maxi.jourdan@wanadoo.fr> wrote:
>
> 2018-08-07 9:13 GMT+02:00 Hans Verkuil <hverkuil@xs4all.nl>:
> > On 07/26/2018 12:20 PM, Tomasz Figa wrote:
> >> Hi Hans,
> >>
> >> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >>>> +
> >>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> >>>> +
> >>>> +Decoding
> >>>> +========
> >>>> +
> >>>> +This state is reached after a successful initialization sequence. In this
> >>>> +state, client queues and dequeues buffers to both queues via
> >>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> >>>> +semantics.
> >>>> +
> >>>> +Both queues operate independently, following standard behavior of V4L2
> >>>> +buffer queues and memory-to-memory devices. In addition, the order of
> >>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> >>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> >>>> +coded format, e.g. frame reordering. The client must not assume any direct
> >>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> >>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
> >>>
> >>> Is there a relationship between capture and output buffers w.r.t. the timestamp
> >>> field? I am not aware that there is one.
> >>
> >> I believe the decoder was expected to copy the timestamp of matching
> >> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
> >> to be implementing it this way. I guess it might be a good idea to
> >> specify this more explicitly.
> >
> > What about an output buffer producing multiple capture buffers? Or the case
> > where the encoded bitstream of a frame starts at one output buffer and ends
> > at another? What happens if you have B frames and the order of the capture
> > buffers is different from the output buffers?
> >
> > In other words, for codecs there is no clear 1-to-1 relationship between an
> > output buffer and a capture buffer. And we never defined what the 'copy timestamp'
> > behavior should be in that case or if it even makes sense.
> >
> > Regards,
> >
> >         Hans
>
> As it is done right now in userspace (FFmpeg, GStreamer) and most (if
> not all?) drivers, it's a 1:1 between OUTPUT and CAPTURE. The only
> thing that changes is the ordering since OUTPUT buffers are in
> decoding order while CAPTURE buffers are in presentation order.

If I understood it correctly, there is a feature in VP9 that lets one
frame repeat several times, which would make one OUTPUT buffer produce
multiple CAPTURE buffers.

Moreover, V4L2_PIX_FMT_H264 is actually defined to be a byte stream,
without any need for framing, and yes, there are drivers that follow
this definition correctly (s5p-mfc and, AFAIR, coda). In that case,
one OUTPUT buffer can have arbitrary amount of bitstream and lead to
multiple CAPTURE frames being produced.

>
> This almost always implies some timestamping kung-fu to match the
> OUTPUT timestamps with the corresponding CAPTURE timestamps. It's
> often done indirectly by the firmware on some platforms (rpi comes to
> mind iirc).

I don't think there is an upstream driver for it, is there? (If not,
are you aware of any work towards it?)

>
> The current constructions also imply one video packet per OUTPUT
> buffer. If a video packet is too big to fit in a buffer, FFmpeg will
> crop that packet to the maximum buffer size and will discard the
> remaining packet data. GStreamer will abort the decoding. This is
> unfortunately one of the shortcomings of having fixed-size buffers.
> And if they were to split the packet in multiple buffers, then some
> drivers in their current state wouldn't be able to handle the
> timestamping issues and/or x:1 OUTPUT:CAPTURE buffer numbers.

In Chromium, we just allocate OUTPUT buffers big enough to be really
unlikely for a single frame not to fit inside [1]. Obviously it's a
waste of memory, for formats which normally have just single frames
inside buffers, but it seems to work in practice.

[1] https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.h?rcl=3468d5a59e00bcb2c2e946a30694e6057fd9ab21&l=118

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-07  7:13       ` Hans Verkuil
  2018-08-07 19:11         ` Maxime Jourdan
@ 2018-08-08  3:11         ` Tomasz Figa
  2018-08-08  6:43           ` Hans Verkuil
  1 sibling, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-08-08  3:11 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Tue, Aug 7, 2018 at 4:13 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 07/26/2018 12:20 PM, Tomasz Figa wrote:
> > Hi Hans,
> >
> > On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >>> +
> >>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> >>> +
> >>> +Decoding
> >>> +========
> >>> +
> >>> +This state is reached after a successful initialization sequence. In this
> >>> +state, client queues and dequeues buffers to both queues via
> >>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> >>> +semantics.
> >>> +
> >>> +Both queues operate independently, following standard behavior of V4L2
> >>> +buffer queues and memory-to-memory devices. In addition, the order of
> >>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> >>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> >>> +coded format, e.g. frame reordering. The client must not assume any direct
> >>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> >>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
> >>
> >> Is there a relationship between capture and output buffers w.r.t. the timestamp
> >> field? I am not aware that there is one.
> >
> > I believe the decoder was expected to copy the timestamp of matching
> > OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
> > to be implementing it this way. I guess it might be a good idea to
> > specify this more explicitly.
>
> What about an output buffer producing multiple capture buffers? Or the case
> where the encoded bitstream of a frame starts at one output buffer and ends
> at another? What happens if you have B frames and the order of the capture
> buffers is different from the output buffers?
>
> In other words, for codecs there is no clear 1-to-1 relationship between an
> output buffer and a capture buffer. And we never defined what the 'copy timestamp'
> behavior should be in that case or if it even makes sense.

You're perfectly right. There is no 1:1 relationship, but it doesn't
prevent copying timestamps. It just makes it possible for multiple
CAPTURE buffers to have the same timestamp or some OUTPUT timestamps
not to be found in any CAPTURE buffer.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-08  3:11         ` Tomasz Figa
@ 2018-08-08  6:43           ` Hans Verkuil
  2018-08-08  6:54             ` Ian Arkver
  0 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-08-08  6:43 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On 08/08/2018 05:11 AM, Tomasz Figa wrote:
> On Tue, Aug 7, 2018 at 4:13 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> On 07/26/2018 12:20 PM, Tomasz Figa wrote:
>>> Hi Hans,
>>>
>>> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>>> +
>>>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
>>>>> +
>>>>> +Decoding
>>>>> +========
>>>>> +
>>>>> +This state is reached after a successful initialization sequence. In this
>>>>> +state, client queues and dequeues buffers to both queues via
>>>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
>>>>> +semantics.
>>>>> +
>>>>> +Both queues operate independently, following standard behavior of V4L2
>>>>> +buffer queues and memory-to-memory devices. In addition, the order of
>>>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>>>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
>>>>> +coded format, e.g. frame reordering. The client must not assume any direct
>>>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>>>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
>>>>
>>>> Is there a relationship between capture and output buffers w.r.t. the timestamp
>>>> field? I am not aware that there is one.
>>>
>>> I believe the decoder was expected to copy the timestamp of matching
>>> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
>>> to be implementing it this way. I guess it might be a good idea to
>>> specify this more explicitly.
>>
>> What about an output buffer producing multiple capture buffers? Or the case
>> where the encoded bitstream of a frame starts at one output buffer and ends
>> at another? What happens if you have B frames and the order of the capture
>> buffers is different from the output buffers?
>>
>> In other words, for codecs there is no clear 1-to-1 relationship between an
>> output buffer and a capture buffer. And we never defined what the 'copy timestamp'
>> behavior should be in that case or if it even makes sense.
> 
> You're perfectly right. There is no 1:1 relationship, but it doesn't
> prevent copying timestamps. It just makes it possible for multiple
> CAPTURE buffers to have the same timestamp or some OUTPUT timestamps
> not to be found in any CAPTURE buffer.

We need to document the behavior. Basically there are three different
corner cases that need documenting:

1) one OUTPUT buffer generates multiple CAPTURE buffers
2) multiple OUTPUT buffers generate one CAPTURE buffer
3) the decoding order differs from the presentation order (i.e. the
   CAPTURE buffers are out-of-order compared to the OUTPUT buffers).

For 1) I assume that we just copy the same OUTPUT timestamp to multiple
CAPTURE buffers.

For 2) we need to specify if the CAPTURE timestamp is copied from the first
or last OUTPUT buffer used in creating the capture buffer. Using the last
OUTPUT buffer makes more sense to me.

And 3) implies that timestamps can be out-of-order. This needs to be
very carefully documented since it is very unexpected.

This should probably be a separate patch, adding text to the v4l2_buffer
documentation (esp. the V4L2_BUF_FLAG_TIMESTAMP_COPY documentation).

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-08  6:43           ` Hans Verkuil
@ 2018-08-08  6:54             ` Ian Arkver
  0 siblings, 0 replies; 62+ messages in thread
From: Ian Arkver @ 2018-08-08  6:54 UTC (permalink / raw)
  To: Hans Verkuil, Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Hans,

On 08/08/18 07:43, Hans Verkuil wrote:
> On 08/08/2018 05:11 AM, Tomasz Figa wrote:
>> On Tue, Aug 7, 2018 at 4:13 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>
>>> On 07/26/2018 12:20 PM, Tomasz Figa wrote:
>>>> Hi Hans,
>>>>
>>>> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>>>> +
>>>>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
>>>>>> +
>>>>>> +Decoding
>>>>>> +========
>>>>>> +
>>>>>> +This state is reached after a successful initialization sequence. In this
>>>>>> +state, client queues and dequeues buffers to both queues via
>>>>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
>>>>>> +semantics.
>>>>>> +
>>>>>> +Both queues operate independently, following standard behavior of V4L2
>>>>>> +buffer queues and memory-to-memory devices. In addition, the order of
>>>>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>>>>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
>>>>>> +coded format, e.g. frame reordering. The client must not assume any direct
>>>>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>>>>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
>>>>>
>>>>> Is there a relationship between capture and output buffers w.r.t. the timestamp
>>>>> field? I am not aware that there is one.
>>>>
>>>> I believe the decoder was expected to copy the timestamp of matching
>>>> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
>>>> to be implementing it this way. I guess it might be a good idea to
>>>> specify this more explicitly.
>>>
>>> What about an output buffer producing multiple capture buffers? Or the case
>>> where the encoded bitstream of a frame starts at one output buffer and ends
>>> at another? What happens if you have B frames and the order of the capture
>>> buffers is different from the output buffers?
>>>
>>> In other words, for codecs there is no clear 1-to-1 relationship between an
>>> output buffer and a capture buffer. And we never defined what the 'copy timestamp'
>>> behavior should be in that case or if it even makes sense.
>>
>> You're perfectly right. There is no 1:1 relationship, but it doesn't
>> prevent copying timestamps. It just makes it possible for multiple
>> CAPTURE buffers to have the same timestamp or some OUTPUT timestamps
>> not to be found in any CAPTURE buffer.
> 
> We need to document the behavior. Basically there are three different
> corner cases that need documenting:
> 
> 1) one OUTPUT buffer generates multiple CAPTURE buffers
> 2) multiple OUTPUT buffers generate one CAPTURE buffer
> 3) the decoding order differs from the presentation order (i.e. the
>     CAPTURE buffers are out-of-order compared to the OUTPUT buffers).
> 
> For 1) I assume that we just copy the same OUTPUT timestamp to multiple
> CAPTURE buffers.

I'm not sure how this interface would handle something like a temporal
scalability layer, but conceivably this assumption might be invalid in
that case.

Regards,
Ian.

> 
> For 2) we need to specify if the CAPTURE timestamp is copied from the first
> or last OUTPUT buffer used in creating the capture buffer. Using the last
> OUTPUT buffer makes more sense to me.
> 
> And 3) implies that timestamps can be out-of-order. This needs to be
> very carefully documented since it is very unexpected.
> 
> This should probably be a separate patch, adding text to the v4l2_buffer
> documentation (esp. the V4L2_BUF_FLAG_TIMESTAMP_COPY documentation).
> 
> Regards,
> 
> 	Hans
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-08  3:07           ` Tomasz Figa
@ 2018-08-08  7:19             ` Maxime Jourdan
  0 siblings, 0 replies; 62+ messages in thread
From: Maxime Jourdan @ 2018-08-08  7:19 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Maxime Jourdan, Hans Verkuil, Linux Media Mailing List,
	Linux Kernel Mailing List, Stanimir Varbanov,
	Mauro Carvalho Chehab, Pawel Osciak, Alexandre Courbot, kamil,
	a.hajda, Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

2018-08-08 5:07 GMT+02:00 Tomasz Figa <tfiga@chromium.org>:
> On Wed, Aug 8, 2018 at 4:11 AM Maxime Jourdan <maxi.jourdan@wanadoo.fr> wrote:
>>
>> 2018-08-07 9:13 GMT+02:00 Hans Verkuil <hverkuil@xs4all.nl>:
>> > On 07/26/2018 12:20 PM, Tomasz Figa wrote:
>> >> Hi Hans,
>> >>
>> >> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>> >>>> +
>> >>>> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
>> >>>> +
>> >>>> +Decoding
>> >>>> +========
>> >>>> +
>> >>>> +This state is reached after a successful initialization sequence. In this
>> >>>> +state, client queues and dequeues buffers to both queues via
>> >>>> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
>> >>>> +semantics.
>> >>>> +
>> >>>> +Both queues operate independently, following standard behavior of V4L2
>> >>>> +buffer queues and memory-to-memory devices. In addition, the order of
>> >>>> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
>> >>>> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
>> >>>> +coded format, e.g. frame reordering. The client must not assume any direct
>> >>>> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
>> >>>> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
>> >>>
>> >>> Is there a relationship between capture and output buffers w.r.t. the timestamp
>> >>> field? I am not aware that there is one.
>> >>
>> >> I believe the decoder was expected to copy the timestamp of matching
>> >> OUTPUT buffer to respective CAPTURE buffer. Both s5p-mfc and coda seem
>> >> to be implementing it this way. I guess it might be a good idea to
>> >> specify this more explicitly.
>> >
>> > What about an output buffer producing multiple capture buffers? Or the case
>> > where the encoded bitstream of a frame starts at one output buffer and ends
>> > at another? What happens if you have B frames and the order of the capture
>> > buffers is different from the output buffers?
>> >
>> > In other words, for codecs there is no clear 1-to-1 relationship between an
>> > output buffer and a capture buffer. And we never defined what the 'copy timestamp'
>> > behavior should be in that case or if it even makes sense.
>> >
>> > Regards,
>> >
>> >         Hans
>>
>> As it is done right now in userspace (FFmpeg, GStreamer) and most (if
>> not all?) drivers, it's a 1:1 between OUTPUT and CAPTURE. The only
>> thing that changes is the ordering since OUTPUT buffers are in
>> decoding order while CAPTURE buffers are in presentation order.
>
> If I understood it correctly, there is a feature in VP9 that lets one
> frame repeat several times, which would make one OUTPUT buffer produce
> multiple CAPTURE buffers.
>
> Moreover, V4L2_PIX_FMT_H264 is actually defined to be a byte stream,
> without any need for framing, and yes, there are drivers that follow
> this definition correctly (s5p-mfc and, AFAIR, coda). In that case,
> one OUTPUT buffer can have arbitrary amount of bitstream and lead to
> multiple CAPTURE frames being produced.

I can see from the code and your answer to Hans that in such case, all
CAPTURE buffers will share the single OUTPUT timestamp.

Does this mean that at the end of the day, userspace disregards the
CAPTURE timestamps since you have the display order guarantee ?
If so, how do you reconstruct the proper PTS on such buffers ? Do you
have them saved from prior demuxing ?

>>
>> This almost always implies some timestamping kung-fu to match the
>> OUTPUT timestamps with the corresponding CAPTURE timestamps. It's
>> often done indirectly by the firmware on some platforms (rpi comes to
>> mind iirc).
>
> I don't think there is an upstream driver for it, is there? (If not,
> are you aware of any work towards it?)

You're right, it's not upstream but it is in a relatively good shape
at https://github.com/6by9/linux/commits/rpi-4.14.y-v4l2-codec

>>
>> The current constructions also imply one video packet per OUTPUT
>> buffer. If a video packet is too big to fit in a buffer, FFmpeg will
>> crop that packet to the maximum buffer size and will discard the
>> remaining packet data. GStreamer will abort the decoding. This is
>> unfortunately one of the shortcomings of having fixed-size buffers.
>> And if they were to split the packet in multiple buffers, then some
>> drivers in their current state wouldn't be able to handle the
>> timestamping issues and/or x:1 OUTPUT:CAPTURE buffer numbers.
>
> In Chromium, we just allocate OUTPUT buffers big enough to be really
> unlikely for a single frame not to fit inside [1]. Obviously it's a
> waste of memory, for formats which normally have just single frames
> inside buffers, but it seems to work in practice.
>
> [1] https://cs.chromium.org/chromium/src/media/gpu/v4l2/v4l2_video_decode_accelerator.h?rcl=3468d5a59e00bcb2c2e946a30694e6057fd9ab21&l=118

Right. As long as you don't need many OUTPUT buffers it's not that big a deal.

[snip]

>> > +      For hardware known to be mishandling seeks to a non-resume point,
>> > +      e.g. by returning corrupted decoded frames, the driver must be able
>> > +      to handle such seeks without a crash or any fatal decode error.
>>
>> This is unfortunately my case, apart from parsing the bitstream
>> manually - which is a no-no -, there is no way to know when I'll be
>> writing in an IDR frame to the HW bitstream parser. I think it would
>> be much preferable that the client starts sending in an IDR frame for
>> sure.
>
> Most of the hardware, which have upstream drivers, deal with this
> correctly and there is existing user space that relies on this, so we
> cannot simply add such requirement. However, when sending your driver
> upstream, feel free to include a patch that adds a read-only control
> that tells the user space that it needs to do seeks to resume points.
> Obviously this will work only with user space aware of this
> requirement, but I don't think we can do anything better here.
>

Makes sense

>> > +      To achieve instantaneous seek, the client may restart streaming on
>> > +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
>>
>> Overall, I think Drain followed by V4L2_DEC_CMD_START is a more
>> applicable scenario for seeking.
>> Heck, simply starting to queue buffers at the seek - starting with an
>> IDR - without doing any kind of streamon/off or cmd_start(stop) will
>> do the trick.
>
> Why do you think so?
>
> For a seek, as expected by a typical device user, the result should be
> discarding anything already queued and just start decoding new frames
> as soon as possible.
>
> Actually, this section doesn't describe any specific sequence, just
> possible ways to do a seek using existing primitives.

Fair enough

Regards,
Maxime

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
                     ` (2 preceding siblings ...)
  2018-08-08  2:46   ` Tomasz Figa
@ 2018-08-20 13:04   ` Philipp Zabel
  2018-08-20 13:12     ` Tomasz Figa
  2018-08-31  8:26   ` Alexandre Courbot
  2018-10-17 13:34   ` Laurent Pinchart
  5 siblings, 1 reply; 62+ messages in thread
From: Philipp Zabel @ 2018-08-20 13:04 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
[...]
> +Seek
> +====
> +
> +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
> +
> +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> +     treated as returned to the client (following standard semantics).
> +
> +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +   * The driver must be put in a state after seek and be ready to
> +     accept new source bitstream buffers.
> +
> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream starting exactly from
> +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> +      data queued and must keep processing the queued buffers until it
> +      finds a suitable resume point. While looking for a resume point, the

I think the definition of a resume point is too vague in this place.
Can the driver decide whether or not a keyframe without SPS is a
suitable resume point? Or do drivers have to parse and store SPS/PPS if
the hardware does not support resuming from a keyframe without sending
SPS/PPS again?

regards
Philipp

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-20 13:04   ` Philipp Zabel
@ 2018-08-20 13:12     ` Tomasz Figa
  2018-08-20 14:13       ` Philipp Zabel
  0 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-08-20 13:12 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Mon, Aug 20, 2018 at 10:04 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> [...]
> > +Seek
> > +====
> > +
> > +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> > +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
> > +
> > +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> > +     treated as returned to the client (following standard semantics).
> > +
> > +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must be put in a state after seek and be ready to
> > +     accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream starting exactly from
> > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > +      data queued and must keep processing the queued buffers until it
> > +      finds a suitable resume point. While looking for a resume point, the
>
> I think the definition of a resume point is too vague in this place.
> Can the driver decide whether or not a keyframe without SPS is a
> suitable resume point? Or do drivers have to parse and store SPS/PPS if
> the hardware does not support resuming from a keyframe without sending
> SPS/PPS again?

The thing is that existing drivers implement and user space clients
rely on the behavior described above, so we cannot really change it
anymore.

Do we have hardware for which this wouldn't work to the point that the
driver couldn't even continue with a bunch of frames corrupted? If
only frame corruption is a problem, we can add a control to tell the
user space to seek to resume points and it can happen in an
incremental patch.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-20 13:12     ` Tomasz Figa
@ 2018-08-20 14:13       ` Philipp Zabel
  2018-08-20 14:27         ` Tomasz Figa
  0 siblings, 1 reply; 62+ messages in thread
From: Philipp Zabel @ 2018-08-20 14:13 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Tomasz,

On Mon, 2018-08-20 at 22:12 +0900, Tomasz Figa wrote:
> On Mon, Aug 20, 2018 at 10:04 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
> > 
> > On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> > [...]
> > > +Seek
> > > +====
> > > +
> > > +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> > > +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
> > > +
> > > +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> > > +   :c:func:`VIDIOC_STREAMOFF`.
> > > +
> > > +   * **Required fields:**
> > > +
> > > +     ``type``
> > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > +
> > > +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> > > +     treated as returned to the client (following standard semantics).
> > > +
> > > +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> > > +
> > > +   * **Required fields:**
> > > +
> > > +     ``type``
> > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > +
> > > +   * The driver must be put in a state after seek and be ready to
> > > +     accept new source bitstream buffers.
> > > +
> > > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > > +   the seek until a suitable resume point is found.
> > > +
> > > +   .. note::
> > > +
> > > +      There is no requirement to begin queuing stream starting exactly from
> > > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > > +      data queued and must keep processing the queued buffers until it
> > > +      finds a suitable resume point. While looking for a resume point, the
> > 
> > I think the definition of a resume point is too vague in this place.
> > Can the driver decide whether or not a keyframe without SPS is a
> > suitable resume point? Or do drivers have to parse and store SPS/PPS if
> > the hardware does not support resuming from a keyframe without sending
> > SPS/PPS again?
> 
> The thing is that existing drivers implement and user space clients
> rely on the behavior described above, so we cannot really change it
> anymore.

My point is that I'm not exactly sure what that behaviour is, given the
description.

Must a driver be able to resume from a keyframe even if userspace never
pushes SPS/PPS again?
If so, I think it should be mentioned more explicitly than just via an
example in parentheses, to make it clear to all driver developers that
this is a requirement that userspace is going to rely on.

Or, if that is not the case, is a driver free to define "SPS only" as
its "suitable resume point" and to discard all input including keyframes
until the next SPS/PPS is pushed?

It would be better to clearly define what a "suitable resume point" has
to be per codec, and not let the drivers decide for themselves, if at
all possible. Otherwise we'd need a away to inform userspace about the
per-driver definition.

> Do we have hardware for which this wouldn't work to the point that the
> driver couldn't even continue with a bunch of frames corrupted? If
> only frame corruption is a problem, we can add a control to tell the
> user space to seek to resume points and it can happen in an
> incremental patch.

The coda driver currently can't seek at all, it always stops and
restarts the sequence. So depending on the above I might have to either
find and store SPS/PPS in software, or figure out how to make the
firmware flush the bitstream buffer and restart without actually
stopping the sequence.
I'm sure the hardware is capable of this, it's more a question of what
behaviour is actually intended, and whether I have enough information
about the firmware interface to implement it.

regards
Philipp

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-20 14:13       ` Philipp Zabel
@ 2018-08-20 14:27         ` Tomasz Figa
  2018-08-20 15:33           ` Philipp Zabel
  0 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-08-20 14:27 UTC (permalink / raw)
  To: Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Mon, Aug 20, 2018 at 11:13 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> Hi Tomasz,
>
> On Mon, 2018-08-20 at 22:12 +0900, Tomasz Figa wrote:
> > On Mon, Aug 20, 2018 at 10:04 PM Philipp Zabel <p.zabel@pengutronix.de> wrote:
> > >
> > > On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> > > [...]
> > > > +Seek
> > > > +====
> > > > +
> > > > +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> > > > +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
> > > > +
> > > > +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> > > > +   :c:func:`VIDIOC_STREAMOFF`.
> > > > +
> > > > +   * **Required fields:**
> > > > +
> > > > +     ``type``
> > > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > +
> > > > +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> > > > +     treated as returned to the client (following standard semantics).
> > > > +
> > > > +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> > > > +
> > > > +   * **Required fields:**
> > > > +
> > > > +     ``type``
> > > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > > +
> > > > +   * The driver must be put in a state after seek and be ready to
> > > > +     accept new source bitstream buffers.
> > > > +
> > > > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > > > +   the seek until a suitable resume point is found.
> > > > +
> > > > +   .. note::
> > > > +
> > > > +      There is no requirement to begin queuing stream starting exactly from
> > > > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > > > +      data queued and must keep processing the queued buffers until it
> > > > +      finds a suitable resume point. While looking for a resume point, the
> > >
> > > I think the definition of a resume point is too vague in this place.
> > > Can the driver decide whether or not a keyframe without SPS is a
> > > suitable resume point? Or do drivers have to parse and store SPS/PPS if
> > > the hardware does not support resuming from a keyframe without sending
> > > SPS/PPS again?
> >
> > The thing is that existing drivers implement and user space clients
> > rely on the behavior described above, so we cannot really change it
> > anymore.
>
> My point is that I'm not exactly sure what that behaviour is, given the
> description.
>
> Must a driver be able to resume from a keyframe even if userspace never
> pushes SPS/PPS again?
> If so, I think it should be mentioned more explicitly than just via an
> example in parentheses, to make it clear to all driver developers that
> this is a requirement that userspace is going to rely on.
>
> Or, if that is not the case, is a driver free to define "SPS only" as
> its "suitable resume point" and to discard all input including keyframes
> until the next SPS/PPS is pushed?
>
> It would be better to clearly define what a "suitable resume point" has
> to be per codec, and not let the drivers decide for themselves, if at
> all possible. Otherwise we'd need a away to inform userspace about the
> per-driver definition.

The intention here is that there is exactly no requirement for the
user space to seek to any kind of resume point and so there is no
point in defining such. The only requirement here is that the
hardware/driver keeps processing the source stream until it finds a
resume point suitable for it - if the hardware keeps SPS/PPS in its
state then just a keyframe; if it doesn't then SPS/PPS. Note that this
is a documentation of the user space API, not a driver implementation
guide. We may want to create the latter separately, though.

H264 is a bit special here, because one may still seek to a key frame,
but past the relevant SPS/PPS headers. In this case, there is no way
for the hardware to know that the SPS/PPS it has in its local state is
not the one that applies to the frame. It may be worth adding that
such case leads to undefined results, but must not cause crash nor a
fatal decode error.

What do you think?

>
> > Do we have hardware for which this wouldn't work to the point that the
> > driver couldn't even continue with a bunch of frames corrupted? If
> > only frame corruption is a problem, we can add a control to tell the
> > user space to seek to resume points and it can happen in an
> > incremental patch.
>
> The coda driver currently can't seek at all, it always stops and
> restarts the sequence. So depending on the above I might have to either
> find and store SPS/PPS in software, or figure out how to make the
> firmware flush the bitstream buffer and restart without actually
> stopping the sequence.
> I'm sure the hardware is capable of this, it's more a question of what
> behaviour is actually intended, and whether I have enough information
> about the firmware interface to implement it.

What happens if you just keep feeding it with next frames? If that
would result only in corrupted frames, I suppose the control (say
V4L2_CID_MPEG_VIDEO_NEEDS_SEEK_TO_RESUME_POINT) would solve the
problem?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-20 14:27         ` Tomasz Figa
@ 2018-08-20 15:33           ` Philipp Zabel
  2018-08-27  4:03             ` Tomasz Figa
  0 siblings, 1 reply; 62+ messages in thread
From: Philipp Zabel @ 2018-08-20 15:33 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Mon, 2018-08-20 at 23:27 +0900, Tomasz Figa wrote:
[...]
> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > > > > +   the seek until a suitable resume point is found.
> > > > > +
> > > > > +   .. note::
> > > > > +
> > > > > +      There is no requirement to begin queuing stream starting exactly from
> > > > > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > > > > +      data queued and must keep processing the queued buffers until it
> > > > > +      finds a suitable resume point. While looking for a resume point, the
> > > > 
> > > > I think the definition of a resume point is too vague in this place.
> > > > Can the driver decide whether or not a keyframe without SPS is a
> > > > suitable resume point? Or do drivers have to parse and store SPS/PPS if
> > > > the hardware does not support resuming from a keyframe without sending
> > > > SPS/PPS again?
> > > 
> > > The thing is that existing drivers implement and user space clients
> > > rely on the behavior described above, so we cannot really change it
> > > anymore.
> > 
> > My point is that I'm not exactly sure what that behaviour is, given the
> > description.
> > 
> > Must a driver be able to resume from a keyframe even if userspace never
> > pushes SPS/PPS again?
> > If so, I think it should be mentioned more explicitly than just via an
> > example in parentheses, to make it clear to all driver developers that
> > this is a requirement that userspace is going to rely on.
> > 
> > Or, if that is not the case, is a driver free to define "SPS only" as
> > its "suitable resume point" and to discard all input including keyframes
> > until the next SPS/PPS is pushed?
> > 
> > It would be better to clearly define what a "suitable resume point" has
> > to be per codec, and not let the drivers decide for themselves, if at
> > all possible. Otherwise we'd need a away to inform userspace about the
> > per-driver definition.
> 
> The intention here is that there is exactly no requirement for the
> user space to seek to any kind of resume point

No question about this.

> and so there is no point in defining such.

I don't agree. Let me give an example:

Assume userspace wants to play back a simple h.264 stream that has
SPS/PPS exactly once, in the beginning.

If drivers are allowed to resume from SPS/PPS only, and have no way to
communicate this to userspace, userspace always has to assume that
resuming from keyframes alone is not possible. So it has to store
SPS/PPS and resubmit them with every seek, even if a specific driver
wouldn't require it: Otherwise those drivers that don't store SPS/PPS
themselves (or in hardware) would be allowed to just drop everything
after the first seek.
This effectively would make resending SPS/PPS mandatory, which doesn't
fit well with the intention of letting userspace just seek anywhere and
start feeding data (or: NAL units) into the driver blindly.

> The only requirement here is that the
> hardware/driver keeps processing the source stream until it finds a
> resume point suitable for it - if the hardware keeps SPS/PPS in its
> state then just a keyframe; if it doesn't then SPS/PPS.

Yes, but the difference between those two might be very relevant to
userspace behaviour.

> Note that this is a documentation of the user space API, not a driver
> implementation guide. We may want to create the latter separately,
> though.

This is a good point, I keep switching the perspective from which I look
at this document.
Even for userspace it would make sense to be as specific as possible,
though. Otherwise, doesn't userspace always have to assume the worst?

> H264 is a bit special here, because one may still seek to a key frame,
> but past the relevant SPS/PPS headers. In this case, there is no way
> for the hardware to know that the SPS/PPS it has in its local state is
> not the one that applies to the frame. It may be worth adding that
> such case leads to undefined results, but must not cause crash nor a
> fatal decode error.
> 
> What do you think?

That sounds like a good idea. I haven't thought about seeking over a
SPS/PPS change. Of course userspace must not expect correct results in
this case without providing the new SPS/PPS.

> > > Do we have hardware for which this wouldn't work to the point that the
> > > driver couldn't even continue with a bunch of frames corrupted? If
> > > only frame corruption is a problem, we can add a control to tell the
> > > user space to seek to resume points and it can happen in an
> > > incremental patch.
> > 
> > The coda driver currently can't seek at all, it always stops and
> > restarts the sequence. So depending on the above I might have to either
> > find and store SPS/PPS in software, or figure out how to make the
> > firmware flush the bitstream buffer and restart without actually
> > stopping the sequence.
> > I'm sure the hardware is capable of this, it's more a question of what
> > behaviour is actually intended, and whether I have enough information
> > about the firmware interface to implement it.
> 
> What happens if you just keep feeding it with next frames?

As long as they are well formed, it should just decode them, possibly
with artifacts due to mismatched reference buffers. There is an I-Frame
search mode that should be usable to skip to the next resume point, as
well, so I'm sure coda will end up not needing the
NEEDS_SEEK_TO_RESUME_POINT flag below. I'm just not certain at this
point whether I'll be able to (or: whether I'll have to) keep the
SPS/PPS state across seeks. I have seen so many decoder hangs with
malformed input on i.MX53 that I couldn't recover from, that I'm wary
to make any guarantees without flushing the bitstream buffer first.

> If that would result only in corrupted frames, I suppose the control (say
> V4L2_CID_MPEG_VIDEO_NEEDS_SEEK_TO_RESUME_POINT) would solve the
> problem?

For this to be useful, userspace needs to know what a resume point is in
the first place, though.

regards
Philipp

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-08  2:55             ` Tomasz Figa
@ 2018-08-21 11:29               ` Stanimir Varbanov
  2018-08-27  4:09                 ` Tomasz Figa
  2018-10-15 10:13               ` Tomasz Figa
  1 sibling, 1 reply; 62+ messages in thread
From: Stanimir Varbanov @ 2018-08-21 11:29 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Tomasz,

On 08/08/2018 05:55 AM, Tomasz Figa wrote:
> On Tue, Aug 7, 2018 at 4:37 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:

>>>>>>> +7.  If all the following conditions are met, the client may resume the
>>>>>>> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
>>>>>>> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
>>>>>>> +    sequence:
>>>>>>> +
>>>>>>> +    * ``sizeimage`` of new format is less than or equal to the size of
>>>>>>> +      currently allocated buffers,
>>>>>>> +
>>>>>>> +    * the number of buffers currently allocated is greater than or equal to
>>>>>>> +      the minimum number of buffers acquired in step 6.
>>>>>>
>>>>>> You might want to mention that if there are insufficient buffers, then
>>>>>> VIDIOC_CREATE_BUFS can be used to add more buffers.
>>>>>>
>>>>>
>>>>> This might be a bit tricky, since at least s5p-mfc and coda can only
>>>>> work on a fixed buffer set and one would need to fully reinitialize
>>>>> the decoding to add one more buffer, which would effectively be the
>>>>> full resolution change sequence, as below, just with REQBUFS(0),
>>>>> REQBUFS(N) replaced with CREATE_BUFS.
>>>>
>>>> What happens today in those drivers if you try to call CREATE_BUFS?
>>>
>>> s5p-mfc doesn't set the .vidioc_create_bufs pointer in its
>>> v4l2_ioctl_ops, so I suppose that would be -ENOTTY?
>>
>> Correct for s5p-mfc.
> 
> As Philipp clarified, coda supports adding buffers on the fly. I
> briefly looked at venus and mtk-vcodec and they seem to use m2m
> implementation of CREATE_BUFS. Not sure if anyone tested that, though.
> So the only hardware I know for sure cannot support this is s5p-mfc.

In Venus case CREATE_BUFS is tested with Gstreamer.

-- 
regards,
Stan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-20 15:33           ` Philipp Zabel
@ 2018-08-27  4:03             ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-27  4:03 UTC (permalink / raw)
  To: Philipp Zabel, Pawel Osciak
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

Hi Philipp,

On Tue, Aug 21, 2018 at 12:34 AM Philipp Zabel <p.zabel@pengutronix.de> wrote:
>
> On Mon, 2018-08-20 at 23:27 +0900, Tomasz Figa wrote:
> [...]
> > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > > > > > +   the seek until a suitable resume point is found.
> > > > > > +
> > > > > > +   .. note::
> > > > > > +
> > > > > > +      There is no requirement to begin queuing stream starting exactly from
> > > > > > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > > > > > +      data queued and must keep processing the queued buffers until it
> > > > > > +      finds a suitable resume point. While looking for a resume point, the
> > > > >
> > > > > I think the definition of a resume point is too vague in this place.
> > > > > Can the driver decide whether or not a keyframe without SPS is a
> > > > > suitable resume point? Or do drivers have to parse and store SPS/PPS if
> > > > > the hardware does not support resuming from a keyframe without sending
> > > > > SPS/PPS again?
> > > >
> > > > The thing is that existing drivers implement and user space clients
> > > > rely on the behavior described above, so we cannot really change it
> > > > anymore.
> > >
> > > My point is that I'm not exactly sure what that behaviour is, given the
> > > description.
> > >
> > > Must a driver be able to resume from a keyframe even if userspace never
> > > pushes SPS/PPS again?
> > > If so, I think it should be mentioned more explicitly than just via an
> > > example in parentheses, to make it clear to all driver developers that
> > > this is a requirement that userspace is going to rely on.
> > >
> > > Or, if that is not the case, is a driver free to define "SPS only" as
> > > its "suitable resume point" and to discard all input including keyframes
> > > until the next SPS/PPS is pushed?
> > >
> > > It would be better to clearly define what a "suitable resume point" has
> > > to be per codec, and not let the drivers decide for themselves, if at
> > > all possible. Otherwise we'd need a away to inform userspace about the
> > > per-driver definition.
> >
> > The intention here is that there is exactly no requirement for the
> > user space to seek to any kind of resume point
>
> No question about this.
>
> > and so there is no point in defining such.
>
> I don't agree. Let me give an example:
>
> Assume userspace wants to play back a simple h.264 stream that has
> SPS/PPS exactly once, in the beginning.
>
> If drivers are allowed to resume from SPS/PPS only, and have no way to
> communicate this to userspace, userspace always has to assume that
> resuming from keyframes alone is not possible. So it has to store
> SPS/PPS and resubmit them with every seek, even if a specific driver
> wouldn't require it: Otherwise those drivers that don't store SPS/PPS
> themselves (or in hardware) would be allowed to just drop everything
> after the first seek.
> This effectively would make resending SPS/PPS mandatory, which doesn't
> fit well with the intention of letting userspace just seek anywhere and
> start feeding data (or: NAL units) into the driver blindly.
>

I'd say that such video is broken by design, because you cannot play
back any arbitrary later part of it without decoding it from the
beginning.

However, if the hardware keeps SPS/PPS across seeks (and that should
normally be the case), the case could be handled by the user space
letting the decoder initialize with the first frames and only then
seeking, which would probably be the typical case of a user opening a
video file and then moving the seek bar to desired position (or
clicking a bookmark).

If the hardware doesn't keep SPS/PPS across seeks, stateless API could
arguably be a better candidate for it, since it mandates the user
space to keep SPS/PPS around.

> > The only requirement here is that the
> > hardware/driver keeps processing the source stream until it finds a
> > resume point suitable for it - if the hardware keeps SPS/PPS in its
> > state then just a keyframe; if it doesn't then SPS/PPS.
>
> Yes, but the difference between those two might be very relevant to
> userspace behaviour.
>
> > Note that this is a documentation of the user space API, not a driver
> > implementation guide. We may want to create the latter separately,
> > though.
>
> This is a good point, I keep switching the perspective from which I look
> at this document.
> Even for userspace it would make sense to be as specific as possible,
> though. Otherwise, doesn't userspace always have to assume the worst?
>

That's right, a generic user space is expected to handle all the
possible cases possible with the interface it's using. This is
precisely why I'd like to avoid introducing the case where user space
needs to carry state around. The API is for stateful hardware, which
is expected to carry all the needed state around itself.

> > H264 is a bit special here, because one may still seek to a key frame,
> > but past the relevant SPS/PPS headers. In this case, there is no way
> > for the hardware to know that the SPS/PPS it has in its local state is
> > not the one that applies to the frame. It may be worth adding that
> > such case leads to undefined results, but must not cause crash nor a
> > fatal decode error.
> >
> > What do you think?
>
> That sounds like a good idea. I haven't thought about seeking over a
> SPS/PPS change. Of course userspace must not expect correct results in
> this case without providing the new SPS/PPS.
>

From what I talked with Pawel, our hardware (s5p-mfc, mtk-vcodec) will
just notice that the frames refer to a different SPS/PPS (based on
seq_parameter_set_id, I assume) and keep dropping frames until next
corresponding header is encountered.

> > > > Do we have hardware for which this wouldn't work to the point that the
> > > > driver couldn't even continue with a bunch of frames corrupted? If
> > > > only frame corruption is a problem, we can add a control to tell the
> > > > user space to seek to resume points and it can happen in an
> > > > incremental patch.
> > >
> > > The coda driver currently can't seek at all, it always stops and
> > > restarts the sequence. So depending on the above I might have to either
> > > find and store SPS/PPS in software, or figure out how to make the
> > > firmware flush the bitstream buffer and restart without actually
> > > stopping the sequence.
> > > I'm sure the hardware is capable of this, it's more a question of what
> > > behaviour is actually intended, and whether I have enough information
> > > about the firmware interface to implement it.
> >
> > What happens if you just keep feeding it with next frames?
>
> As long as they are well formed, it should just decode them, possibly
> with artifacts due to mismatched reference buffers. There is an I-Frame
> search mode that should be usable to skip to the next resume point, as
> well, so I'm sure coda will end up not needing the
> NEEDS_SEEK_TO_RESUME_POINT flag below. I'm just not certain at this
> point whether I'll be able to (or: whether I'll have to) keep the
> SPS/PPS state across seeks. I have seen so many decoder hangs with
> malformed input on i.MX53 that I couldn't recover from, that I'm wary
> to make any guarantees without flushing the bitstream buffer first.

Based on the above, I believe the answer is that your hardware/driver
needs to keep SPS/PPS around. Is there a good way to do it with Coda?
We definitely don't want to do any parsing inside the driver.

>
> > If that would result only in corrupted frames, I suppose the control (say
> > V4L2_CID_MPEG_VIDEO_NEEDS_SEEK_TO_RESUME_POINT) would solve the
> > problem?
>
> For this to be useful, userspace needs to know what a resume point is in
> the first place, though.

That would be defined in the context of that control and particular
pixel format, since there is no general, yet precise enough definition
that could apply to all codecs. Right now, I would like to defer
adding such constraints until there is really a hardware which needs
it and it can't be handled using stateless API.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-21 11:29               ` Stanimir Varbanov
@ 2018-08-27  4:09                 ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-08-27  4:09 UTC (permalink / raw)
  To: Stanimir Varbanov, Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Mauro Carvalho Chehab, Pawel Osciak, Alexandre Courbot, kamil,
	a.hajda, Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Tue, Aug 21, 2018 at 8:29 PM Stanimir Varbanov
<stanimir.varbanov@linaro.org> wrote:
>
> Hi Tomasz,
>
> On 08/08/2018 05:55 AM, Tomasz Figa wrote:
> > On Tue, Aug 7, 2018 at 4:37 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> >>>>>>> +7.  If all the following conditions are met, the client may resume the
> >>>>>>> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> >>>>>>> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> >>>>>>> +    sequence:
> >>>>>>> +
> >>>>>>> +    * ``sizeimage`` of new format is less than or equal to the size of
> >>>>>>> +      currently allocated buffers,
> >>>>>>> +
> >>>>>>> +    * the number of buffers currently allocated is greater than or equal to
> >>>>>>> +      the minimum number of buffers acquired in step 6.
> >>>>>>
> >>>>>> You might want to mention that if there are insufficient buffers, then
> >>>>>> VIDIOC_CREATE_BUFS can be used to add more buffers.
> >>>>>>
> >>>>>
> >>>>> This might be a bit tricky, since at least s5p-mfc and coda can only
> >>>>> work on a fixed buffer set and one would need to fully reinitialize
> >>>>> the decoding to add one more buffer, which would effectively be the
> >>>>> full resolution change sequence, as below, just with REQBUFS(0),
> >>>>> REQBUFS(N) replaced with CREATE_BUFS.
> >>>>
> >>>> What happens today in those drivers if you try to call CREATE_BUFS?
> >>>
> >>> s5p-mfc doesn't set the .vidioc_create_bufs pointer in its
> >>> v4l2_ioctl_ops, so I suppose that would be -ENOTTY?
> >>
> >> Correct for s5p-mfc.
> >
> > As Philipp clarified, coda supports adding buffers on the fly. I
> > briefly looked at venus and mtk-vcodec and they seem to use m2m
> > implementation of CREATE_BUFS. Not sure if anyone tested that, though.
> > So the only hardware I know for sure cannot support this is s5p-mfc.
>
> In Venus case CREATE_BUFS is tested with Gstreamer.

Stanimir: Alright. Thanks for confirmation.

Hans: Technically, we could still implement CREATE_BUFS for s5p-mfc,
but it would need to be restricted to situations where it's possible
to reinitialize the whole hardware buffer queue, i.e.
- before initial STREAMON(CAPTURE) after header parsing,
- after a resolution change and before following STREAMON(CAPTURE) or
DECODER_CMD_START (to ack resolution change without buffer
reallocation).

Would that work for your original suggestion?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
                     ` (3 preceding siblings ...)
  2018-08-20 13:04   ` Philipp Zabel
@ 2018-08-31  8:26   ` Alexandre Courbot
  2018-09-05  5:45     ` Tomasz Figa
  2018-10-17 13:34   ` Laurent Pinchart
  5 siblings, 1 reply; 62+ messages in thread
From: Alexandre Courbot @ 2018-08-31  8:26 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, LKML, Stanimir Varbanov,
	Mauro Carvalho Chehab, Hans Verkuil, Pawel Osciak, kamil,
	a.hajda, Kyungmin Park, jtp.park, p.zabel, Tiffany Lin,
	andrew-ct.chen, todor.tomov, Nicolas Dufresne, Paul Kocialkowski,
	Laurent Pinchart, dave.stevenson, ezequiel

Hi Tomasz, just a few thoughts I came across while writing the
stateless codec document:

On Tue, Jul 24, 2018 at 11:06 PM Tomasz Figa <tfiga@chromium.org> wrote:
[snip]
> +****************************************
> +Memory-to-memory Video Decoder Interface
> +****************************************

Since we have a m2m stateless decoder interface, can we call this the
m2m video *stateful* decoder interface? :)

> +Conventions and notation used in this document
> +==============================================
[snip]
> +Glossary
> +========

I think these sections apply to both stateless and stateful. How about
moving then into dev-codec.rst and mentioning that they apply to the
two following sections?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-31  8:26   ` Alexandre Courbot
@ 2018-09-05  5:45     ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-09-05  5:45 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Fri, Aug 31, 2018 at 5:27 PM Alexandre Courbot <acourbot@chromium.org> wrote:
>
> Hi Tomasz, just a few thoughts I came across while writing the
> stateless codec document:
>
> On Tue, Jul 24, 2018 at 11:06 PM Tomasz Figa <tfiga@chromium.org> wrote:
> [snip]
> > +****************************************
> > +Memory-to-memory Video Decoder Interface
> > +****************************************
>
> Since we have a m2m stateless decoder interface, can we call this the
> m2m video *stateful* decoder interface? :)

I guess it could make sense indeed. Let's wait for some other opinions, if any.

>
> > +Conventions and notation used in this document
> > +==============================================
> [snip]
> > +Glossary
> > +========
>
> I think these sections apply to both stateless and stateful. How about
> moving then into dev-codec.rst and mentioning that they apply to the
> two following sections?

Or maybe we could put them into separate rst files and source them at
the top of each interface documentation? Personally, I'm okay with
either. On a related note, I'd love to see some kind of glossary
lookup on mouse hoover, so that I don't have to scroll back and forth.
:)

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
  2018-07-25 13:41   ` Philipp Zabel
  2018-07-25 13:57   ` Hans Verkuil
@ 2018-09-07 20:17   ` Ezequiel Garcia
  2018-09-10  3:34     ` Tomasz Figa
  2018-10-17 15:19   ` Laurent Pinchart
  3 siblings, 1 reply; 62+ messages in thread
From: Ezequiel Garcia @ 2018-09-07 20:17 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Hans Verkuil, Pawel Osciak, Alexandre Courbot, kamil, a.hajda,
	Kyungmin Park, jtp.park, Philipp Zabel,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson

On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst b/Documentation/media/uapi/v4l/dev-encoder.rst
> new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting stream
> +must not need any further post-processing by the client.
> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in
> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> +   unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise,
> +   depending on driver capabilities and following the general V4L2
> +   guidelines.
> +
> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> +   containing data (encoded frame/stream) that resulted from processing
> +   buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue; the queue of buffers containing encoded
> +   bitstream; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> +   hardware into ``CAPTURE`` buffers
> +
> +client
> +   application client communicating with the driver implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.);
> +   see also: raw format
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec and hardware requirements;
> +   typically visible resolution rounded up to full macroblocks; see also:
> +   visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ from display order if
> +   coded format includes a feature of frame reordering; ``CAPTURE`` buffers
> +   must be returned by the driver in decode order
> +
> +display order
> +   the order in which frames must be displayed; ``OUTPUT`` buffers must be
> +   queued by the client in display order
> +
> +IDR
> +   a type of a keyframe in H.264-encoded stream, which clears the list of
> +   earlier reference frames (DPBs)
> +
> +keyframe
> +   an encoded frame that does not reference frames decoded earlier, i.e.
> +   can be decoded fully on its own.
> +
> +macroblock
> +   a processing unit in image and video compression formats based on linear
> +   block transforms (e.g. H264, VP8, VP9); codec-specific, but for most of
> +   popular codecs the size is 16x16 samples (pixels)
> +
> +OUTPUT
> +   the source buffer queue; the queue of buffers containing raw frames;
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> +   from ``OUTPUT`` buffers
> +
> +PPS
> +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may start/continue, without
> +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or
> +   SPS/PPS/IDR sequence (H.264); a resume point is required to start decode
> +   of a new stream, or to resume decoding after a seek
> +
> +source
> +   data fed to the encoder; ``OUTPUT``
> +
> +source height
> +   height in pixels for given source resolution
> +
> +source resolution
> +   resolution in pixels of source frames being source to the encoder and
> +   subject to further cropping to the bounds of visible resolution
> +
> +source width
> +   width in pixels for given source resolution
> +
> +SPS
> +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +stream metadata
> +   additional (non-visual) information contained inside encoded bitstream;
> +   for example: coded resolution, visible resolution, codec profile
> +
> +visible height
> +   height for given visible resolution; display height
> +
> +visible resolution
> +   stream resolution of the visible picture, in pixels, to be used for
> +   display purposes; must be smaller or equal to coded resolution;
> +   display resolution
> +
> +visible width
> +   width for given visible resolution; display width
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> +
> +   * The driver must always return the full set of supported formats,
> +     irrespective of the format set on the ``OUTPUT`` queue.
> +
> +2. To enumerate the set of supported raw formats, the client may call
> +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> +
> +   * The driver must return only the formats supported for the format
> +     currently active on ``CAPTURE``.
> +

Paul and I where discussing about the default active format on CAPTURE
and OUTPUT queues. That is, the format that is active (if any) right
after driver probes.

Currently, the v4l2-compliance tool tests the default active format,
by requiring drivers to support:

    fmt = g_fmt()
    s_fmt(fmt)

Is this actually required? Should we also require this for stateful
and stateless codecs? If yes, should it be documented?

Regards,
Ezequiel

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-09-07 20:17   ` Ezequiel Garcia
@ 2018-09-10  3:34     ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-09-10  3:34 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson

On Sat, Sep 8, 2018 at 5:17 AM Ezequiel Garcia <ezequiel@collabora.com> wrote:
>
> On Tue, 2018-07-24 at 23:06 +0900, Tomasz Figa wrote:
[snip]
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> > +
> > +   * The driver must always return the full set of supported formats,
> > +     irrespective of the format set on the ``OUTPUT`` queue.
> > +
> > +2. To enumerate the set of supported raw formats, the client may call
> > +   :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> > +
> > +   * The driver must return only the formats supported for the format
> > +     currently active on ``CAPTURE``.
> > +
>
> Paul and I where discussing about the default active format on CAPTURE
> and OUTPUT queues. That is, the format that is active (if any) right
> after driver probes.
>
> Currently, the v4l2-compliance tool tests the default active format,
> by requiring drivers to support:
>
>     fmt = g_fmt()
>     s_fmt(fmt)
>
> Is this actually required? Should we also require this for stateful
> and stateless codecs? If yes, should it be documented?

The general V4L2 principle is that drivers must maintain some sane
default state right from when they are exposed to the userspace. I'd
try to stick to the common V4L2 semantics, unless there is a very good
reason not to do so.

Note that we actually diverged from it on CAPTURE state for stateful
decoders, because we return an error, if any format-related ioctl is
called on CAPTURE queue before OUTPUT queue is initialized with a
valid coded format, either explicitly by the client or implicitly via
bitstream parsing. The reason was backwards compatibility with clients
which don't handle source change events. If that wasn't the case, we
could have made the CAPTURE queue completely independent and have the
format there reset with source change event, whenever it becomes
invalid due to things like resolution change or speculative
initialization miss, which would make things much more symmetrical.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/2] Document memory-to-memory video codec interfaces
  2018-07-24 14:06 [PATCH 0/2] Document memory-to-memory video codec interfaces Tomasz Figa
                   ` (2 preceding siblings ...)
  2018-07-25 13:28 ` [PATCH 0/2] Document memory-to-memory video codec interfaces Philipp Zabel
@ 2018-09-10  9:13 ` Hans Verkuil
  2018-09-11  3:52   ` Tomasz Figa
  3 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-09-10  9:13 UTC (permalink / raw)
  To: Tomasz Figa, linux-media
  Cc: linux-kernel, Stanimir Varbanov, Mauro Carvalho Chehab,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	Dave Stevenson, ezequiel

Hi Tomasz,

On 07/24/2018 04:06 PM, Tomasz Figa wrote:
> This series attempts to add the documentation of what was discussed
> during Media Workshops at LinuxCon Europe 2012 in Barcelona and then
> later Embedded Linux Conference Europe 2014 in Düsseldorf and then
> eventually written down by Pawel Osciak and tweaked a bit by Chrome OS
> video team (but mostly in a cosmetic way or making the document more
> precise), during the several years of Chrome OS using the APIs in
> production.
> 
> Note that most, if not all, of the API is already implemented in
> existing mainline drivers, such as s5p-mfc or mtk-vcodec. Intention of
> this series is just to formalize what we already have.
> 
> It is an initial conversion from Google Docs to RST, so formatting is
> likely to need some further polishing. It is also the first time for me
> to create such long RST documention. I could not find any other instance
> of similar userspace sequence specifications among our Media documents,
> so I mostly followed what was there in the source. Feel free to suggest
> a better format.
> 
> Much of credits should go to Pawel Osciak, for writing most of the
> original text of the initial RFC.

I'm adding this here as a result of an irc discussion, since it applies
to both encoders and decoders:

How to handle non-square pixel aspect ratios?

Decoders would have to report it through VIDIOC_CROPCAP, so this needs
to be documented when the application should call this, but I don't
think we can provide this information today for encoders.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 0/2] Document memory-to-memory video codec interfaces
  2018-09-10  9:13 ` Hans Verkuil
@ 2018-09-11  3:52   ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-09-11  3:52 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Mon, Sep 10, 2018 at 6:14 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi Tomasz,
>
> On 07/24/2018 04:06 PM, Tomasz Figa wrote:
> > This series attempts to add the documentation of what was discussed
> > during Media Workshops at LinuxCon Europe 2012 in Barcelona and then
> > later Embedded Linux Conference Europe 2014 in Düsseldorf and then
> > eventually written down by Pawel Osciak and tweaked a bit by Chrome OS
> > video team (but mostly in a cosmetic way or making the document more
> > precise), during the several years of Chrome OS using the APIs in
> > production.
> >
> > Note that most, if not all, of the API is already implemented in
> > existing mainline drivers, such as s5p-mfc or mtk-vcodec. Intention of
> > this series is just to formalize what we already have.
> >
> > It is an initial conversion from Google Docs to RST, so formatting is
> > likely to need some further polishing. It is also the first time for me
> > to create such long RST documention. I could not find any other instance
> > of similar userspace sequence specifications among our Media documents,
> > so I mostly followed what was there in the source. Feel free to suggest
> > a better format.
> >
> > Much of credits should go to Pawel Osciak, for writing most of the
> > original text of the initial RFC.
>
> I'm adding this here as a result of an irc discussion, since it applies
> to both encoders and decoders:
>
> How to handle non-square pixel aspect ratios?
>
> Decoders would have to report it through VIDIOC_CROPCAP, so this needs
> to be documented when the application should call this, but I don't
> think we can provide this information today for encoders.

Thanks for heads up. Will document that VIDIOC_CROPCAP needs to return it.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-26 10:20     ` Tomasz Figa
                         ` (2 preceding siblings ...)
  2018-08-07  7:13       ` Hans Verkuil
@ 2018-09-19 10:17       ` Tomasz Figa
  2018-10-08 12:22         ` Hans Verkuil
  3 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-09-19 10:17 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia, Maxime Jourdan

Hi Hans,

On Thu, Jul 26, 2018 at 7:20 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Hans,
>
> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >
> > Hi Tomasz,
> >
> > Many, many thanks for working on this! It's a great document and when done
> > it will be very useful indeed.
> >
> > Review comments follow...
>
> Thanks for review!
>
> >
> > On 24/07/18 16:06, Tomasz Figa wrote:
[snip]
> > > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
> > > +    on the ``CAPTURE`` queue.
> > > +
> > > +    * **Required fields:**
> > > +
> > > +      ``count``
> > > +          requested number of buffers to allocate; greater than zero
> > > +
> > > +      ``type``
> > > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > > +
> > > +      ``memory``
> > > +          follows standard semantics
> > > +
> > > +    * **Return fields:**
> > > +
> > > +      ``count``
> > > +          adjusted to allocated number of buffers
> > > +
> > > +    * The driver must adjust count to minimum of required number of
> > > +      destination buffers for given format and stream configuration and the
> > > +      count passed. The client must check this value after the ioctl
> > > +      returns to get the number of buffers allocated.
> > > +
> > > +    .. note::
> > > +
> > > +       To allocate more than minimum number of buffers (for pipeline
> > > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > > +       get minimum number of buffers required, and pass the obtained value
> > > +       plus the number of additional buffers needed in count to
> > > +       :c:func:`VIDIOC_REQBUFS`.
> >
> >
> > I think we should mention here the option of using VIDIOC_CREATE_BUFS in order
> > to allocate buffers larger than the current CAPTURE format in order to accommodate
> > future resolution changes.
>
> Ack.
>

I'm about to add a paragraph to describe this, but there is one detail
to iron out.

The VIDIOC_CREATE_BUFS ioctl accepts a v4l2_format struct. Userspace
needs to fill in this struct and the specs says that

  "Usually this will be done using the VIDIOC_TRY_FMT or VIDIOC_G_FMT
ioctls to ensure that the requested format is supported by the
driver."

However, in case of a decoder, those calls would fixup the format to
match the currently parsed stream, which would likely resolve to the
current coded resolution (~hardware alignment). How do we get a format
for the desired maximum resolution?

[snip].
> > > +
> > > +     * The driver is also allowed to and may not return all decoded frames
[snip]
> > > +       queued but not decode before the seek sequence was initiated. For
> >
> > Very confusing sentence. I think you mean this:
> >
> >           The driver may not return all decoded frames that where ready for
> >           dequeueing from before the seek sequence was initiated.
> >
> > Is this really true? Once decoded frames are marked as buffer_done by the
> > driver there is no reason for them to be removed. Or you mean something else
> > here, e.g. the frames are decoded, but the buffers not yet given back to vb2.
> >
>
> Exactly "the frames are decoded, but the buffers not yet given back to
> vb2", for example, if reordering takes place. However, if one stops
> streaming before dequeuing all buffers, they are implicitly returned
> (reset to the state after REQBUFS) and can't be dequeued anymore, so
> the frames are lost, even if the driver returned them. I guess the
> sentence was really unfortunate indeed.
>

Actually, that's not the only case.

The documentation is written from userspace point of view. Queuing an
OUTPUT buffer is not equivalent to having it decoded (and a CAPTURE
buffer given back to vb2). If the userspace queues a buffer and then
stops streaming, the buffer might have been still waiting in the
queue, for decoding of previous buffers to finish.

So basically by "queued frames" I meant "OUTPUT buffers queued by
userspace and not sent to the hardware yet" and by "decoded frames" I
meant "CAPTURE buffers containing matching frames given back to vb2".

How about rewording like this:

     * The ``VIDIOC_STREAMOFF`` operation discards any remaining queued
       ``OUTPUT`` buffers, which means that not all of the ``OUTPUT`` buffers
       queued before the seek may have matching ``CAPTURE`` buffers produced.
       For example, [...]

> > > +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
> > > +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > > +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
> > > +       H’}, {A’, G’, H’}, {G’, H’}.
> > > +
> > > +   .. note::
> > > +
> > > +      To achieve instantaneous seek, the client may restart streaming on
> > > +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-09-19 10:17       ` Tomasz Figa
@ 2018-10-08 12:22         ` Hans Verkuil
  2018-10-09  4:23           ` Tomasz Figa
  0 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-10-08 12:22 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia, Maxime Jourdan

On 09/19/2018 12:17 PM, Tomasz Figa wrote:
> Hi Hans,
> 
> On Thu, Jul 26, 2018 at 7:20 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>
>> Hi Hans,
>>
>> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>
>>> Hi Tomasz,
>>>
>>> Many, many thanks for working on this! It's a great document and when done
>>> it will be very useful indeed.
>>>
>>> Review comments follow...
>>
>> Thanks for review!
>>
>>>
>>> On 24/07/18 16:06, Tomasz Figa wrote:
> [snip]
>>>> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
>>>> +    on the ``CAPTURE`` queue.
>>>> +
>>>> +    * **Required fields:**
>>>> +
>>>> +      ``count``
>>>> +          requested number of buffers to allocate; greater than zero
>>>> +
>>>> +      ``type``
>>>> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
>>>> +
>>>> +      ``memory``
>>>> +          follows standard semantics
>>>> +
>>>> +    * **Return fields:**
>>>> +
>>>> +      ``count``
>>>> +          adjusted to allocated number of buffers
>>>> +
>>>> +    * The driver must adjust count to minimum of required number of
>>>> +      destination buffers for given format and stream configuration and the
>>>> +      count passed. The client must check this value after the ioctl
>>>> +      returns to get the number of buffers allocated.
>>>> +
>>>> +    .. note::
>>>> +
>>>> +       To allocate more than minimum number of buffers (for pipeline
>>>> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
>>>> +       get minimum number of buffers required, and pass the obtained value
>>>> +       plus the number of additional buffers needed in count to
>>>> +       :c:func:`VIDIOC_REQBUFS`.
>>>
>>>
>>> I think we should mention here the option of using VIDIOC_CREATE_BUFS in order
>>> to allocate buffers larger than the current CAPTURE format in order to accommodate
>>> future resolution changes.
>>
>> Ack.
>>
> 
> I'm about to add a paragraph to describe this, but there is one detail
> to iron out.
> 
> The VIDIOC_CREATE_BUFS ioctl accepts a v4l2_format struct. Userspace
> needs to fill in this struct and the specs says that
> 
>   "Usually this will be done using the VIDIOC_TRY_FMT or VIDIOC_G_FMT
> ioctls to ensure that the requested format is supported by the
> driver."
> 
> However, in case of a decoder, those calls would fixup the format to
> match the currently parsed stream, which would likely resolve to the
> current coded resolution (~hardware alignment). How do we get a format
> for the desired maximum resolution?

You would call G_FMT to get the current format/resolution, then update
width and height and call TRY_FMT.

Although to be honest you can also just set pixelformat and width/height
and zero everything else and call TRY_FMT directly, skipping the G_FMT
ioctl.

> 
> [snip].
>>>> +
>>>> +     * The driver is also allowed to and may not return all decoded frames
> [snip]
>>>> +       queued but not decode before the seek sequence was initiated. For
>>>
>>> Very confusing sentence. I think you mean this:
>>>
>>>           The driver may not return all decoded frames that where ready for
>>>           dequeueing from before the seek sequence was initiated.
>>>
>>> Is this really true? Once decoded frames are marked as buffer_done by the
>>> driver there is no reason for them to be removed. Or you mean something else
>>> here, e.g. the frames are decoded, but the buffers not yet given back to vb2.
>>>
>>
>> Exactly "the frames are decoded, but the buffers not yet given back to
>> vb2", for example, if reordering takes place. However, if one stops
>> streaming before dequeuing all buffers, they are implicitly returned
>> (reset to the state after REQBUFS) and can't be dequeued anymore, so
>> the frames are lost, even if the driver returned them. I guess the
>> sentence was really unfortunate indeed.
>>
> 
> Actually, that's not the only case.
> 
> The documentation is written from userspace point of view. Queuing an
> OUTPUT buffer is not equivalent to having it decoded (and a CAPTURE
> buffer given back to vb2). If the userspace queues a buffer and then
> stops streaming, the buffer might have been still waiting in the
> queue, for decoding of previous buffers to finish.
> 
> So basically by "queued frames" I meant "OUTPUT buffers queued by
> userspace and not sent to the hardware yet" and by "decoded frames" I
> meant "CAPTURE buffers containing matching frames given back to vb2".
> 
> How about rewording like this:
> 
>      * The ``VIDIOC_STREAMOFF`` operation discards any remaining queued
>        ``OUTPUT`` buffers, which means that not all of the ``OUTPUT`` buffers
>        queued before the seek may have matching ``CAPTURE`` buffers produced.
>        For example, [...]

That looks correct.

Regards,

	Hans

> 
>>>> +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
>>>> +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
>>>> +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
>>>> +       H’}, {A’, G’, H’}, {G’, H’}.
>>>> +
>>>> +   .. note::
>>>> +
>>>> +      To achieve instantaneous seek, the client may restart streaming on
>>>> +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
> 
> Best regards,
> Tomasz
> 


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-08 12:22         ` Hans Verkuil
@ 2018-10-09  4:23           ` Tomasz Figa
  2018-10-09  6:39             ` Hans Verkuil
  0 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-10-09  4:23 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia, Maxime Jourdan

On Mon, Oct 8, 2018 at 9:22 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 09/19/2018 12:17 PM, Tomasz Figa wrote:
> > Hi Hans,
> >
> > On Thu, Jul 26, 2018 at 7:20 PM Tomasz Figa <tfiga@chromium.org> wrote:
> >>
> >> Hi Hans,
> >>
> >> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >>>
> >>> Hi Tomasz,
> >>>
> >>> Many, many thanks for working on this! It's a great document and when done
> >>> it will be very useful indeed.
> >>>
> >>> Review comments follow...
> >>
> >> Thanks for review!
> >>
> >>>
> >>> On 24/07/18 16:06, Tomasz Figa wrote:
> > [snip]
> >>>> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
> >>>> +    on the ``CAPTURE`` queue.
> >>>> +
> >>>> +    * **Required fields:**
> >>>> +
> >>>> +      ``count``
> >>>> +          requested number of buffers to allocate; greater than zero
> >>>> +
> >>>> +      ``type``
> >>>> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> >>>> +
> >>>> +      ``memory``
> >>>> +          follows standard semantics
> >>>> +
> >>>> +    * **Return fields:**
> >>>> +
> >>>> +      ``count``
> >>>> +          adjusted to allocated number of buffers
> >>>> +
> >>>> +    * The driver must adjust count to minimum of required number of
> >>>> +      destination buffers for given format and stream configuration and the
> >>>> +      count passed. The client must check this value after the ioctl
> >>>> +      returns to get the number of buffers allocated.
> >>>> +
> >>>> +    .. note::
> >>>> +
> >>>> +       To allocate more than minimum number of buffers (for pipeline
> >>>> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> >>>> +       get minimum number of buffers required, and pass the obtained value
> >>>> +       plus the number of additional buffers needed in count to
> >>>> +       :c:func:`VIDIOC_REQBUFS`.
> >>>
> >>>
> >>> I think we should mention here the option of using VIDIOC_CREATE_BUFS in order
> >>> to allocate buffers larger than the current CAPTURE format in order to accommodate
> >>> future resolution changes.
> >>
> >> Ack.
> >>
> >
> > I'm about to add a paragraph to describe this, but there is one detail
> > to iron out.
> >
> > The VIDIOC_CREATE_BUFS ioctl accepts a v4l2_format struct. Userspace
> > needs to fill in this struct and the specs says that
> >
> >   "Usually this will be done using the VIDIOC_TRY_FMT or VIDIOC_G_FMT
> > ioctls to ensure that the requested format is supported by the
> > driver."
> >
> > However, in case of a decoder, those calls would fixup the format to
> > match the currently parsed stream, which would likely resolve to the
> > current coded resolution (~hardware alignment). How do we get a format
> > for the desired maximum resolution?
>
> You would call G_FMT to get the current format/resolution, then update
> width and height and call TRY_FMT.
>
> Although to be honest you can also just set pixelformat and width/height
> and zero everything else and call TRY_FMT directly, skipping the G_FMT
> ioctl.
>

Wouldn't TRY_FMT adjust the width and height back to match current stream?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-09  4:23           ` Tomasz Figa
@ 2018-10-09  6:39             ` Hans Verkuil
  0 siblings, 0 replies; 62+ messages in thread
From: Hans Verkuil @ 2018-10-09  6:39 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia, Maxime Jourdan

On 10/09/2018 06:23 AM, Tomasz Figa wrote:
> On Mon, Oct 8, 2018 at 9:22 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>
>> On 09/19/2018 12:17 PM, Tomasz Figa wrote:
>>> Hi Hans,
>>>
>>> On Thu, Jul 26, 2018 at 7:20 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>>>
>>>> Hi Hans,
>>>>
>>>> On Wed, Jul 25, 2018 at 8:59 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>>>>>
>>>>> Hi Tomasz,
>>>>>
>>>>> Many, many thanks for working on this! It's a great document and when done
>>>>> it will be very useful indeed.
>>>>>
>>>>> Review comments follow...
>>>>
>>>> Thanks for review!
>>>>
>>>>>
>>>>> On 24/07/18 16:06, Tomasz Figa wrote:
>>> [snip]
>>>>>> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
>>>>>> +    on the ``CAPTURE`` queue.
>>>>>> +
>>>>>> +    * **Required fields:**
>>>>>> +
>>>>>> +      ``count``
>>>>>> +          requested number of buffers to allocate; greater than zero
>>>>>> +
>>>>>> +      ``type``
>>>>>> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
>>>>>> +
>>>>>> +      ``memory``
>>>>>> +          follows standard semantics
>>>>>> +
>>>>>> +    * **Return fields:**
>>>>>> +
>>>>>> +      ``count``
>>>>>> +          adjusted to allocated number of buffers
>>>>>> +
>>>>>> +    * The driver must adjust count to minimum of required number of
>>>>>> +      destination buffers for given format and stream configuration and the
>>>>>> +      count passed. The client must check this value after the ioctl
>>>>>> +      returns to get the number of buffers allocated.
>>>>>> +
>>>>>> +    .. note::
>>>>>> +
>>>>>> +       To allocate more than minimum number of buffers (for pipeline
>>>>>> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
>>>>>> +       get minimum number of buffers required, and pass the obtained value
>>>>>> +       plus the number of additional buffers needed in count to
>>>>>> +       :c:func:`VIDIOC_REQBUFS`.
>>>>>
>>>>>
>>>>> I think we should mention here the option of using VIDIOC_CREATE_BUFS in order
>>>>> to allocate buffers larger than the current CAPTURE format in order to accommodate
>>>>> future resolution changes.
>>>>
>>>> Ack.
>>>>
>>>
>>> I'm about to add a paragraph to describe this, but there is one detail
>>> to iron out.
>>>
>>> The VIDIOC_CREATE_BUFS ioctl accepts a v4l2_format struct. Userspace
>>> needs to fill in this struct and the specs says that
>>>
>>>   "Usually this will be done using the VIDIOC_TRY_FMT or VIDIOC_G_FMT
>>> ioctls to ensure that the requested format is supported by the
>>> driver."
>>>
>>> However, in case of a decoder, those calls would fixup the format to
>>> match the currently parsed stream, which would likely resolve to the
>>> current coded resolution (~hardware alignment). How do we get a format
>>> for the desired maximum resolution?
>>
>> You would call G_FMT to get the current format/resolution, then update
>> width and height and call TRY_FMT.
>>
>> Although to be honest you can also just set pixelformat and width/height
>> and zero everything else and call TRY_FMT directly, skipping the G_FMT
>> ioctl.
>>
> 
> Wouldn't TRY_FMT adjust the width and height back to match current stream?

Huh. Hmm. Grrr.

Good point and I didn't read your original comment carefully enough.

Suggestions on a postcard...

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-08-08  2:55             ` Tomasz Figa
  2018-08-21 11:29               ` Stanimir Varbanov
@ 2018-10-15 10:13               ` Tomasz Figa
  2018-10-16  1:09                 ` Nicolas Dufresne
  1 sibling, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-10-15 10:13 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Wed, Aug 8, 2018 at 11:55 AM Tomasz Figa <tfiga@chromium.org> wrote:
>
> On Tue, Aug 7, 2018 at 4:37 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >
> > On 08/07/2018 09:05 AM, Tomasz Figa wrote:
> > > On Thu, Jul 26, 2018 at 7:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > >>>> I wonder if we should make these min buffer controls required. It might be easier
> > >>>> that way.
> > >>>
> > >>> Agreed. Although userspace is still free to ignore it, because REQBUFS
> > >>> would do the right thing anyway.
> > >>
> > >> It's never been entirely clear to me what the purpose of those min buffers controls
> > >> is. REQBUFS ensures that the number of buffers is at least the minimum needed to
> > >> make the HW work. So why would you need these controls? It only makes sense if they
> > >> return something different from REQBUFS.
> > >>
> > >
> > > The purpose of those controls is to let the client allocate a number
> > > of buffers bigger than minimum, without the need to allocate the
> > > minimum number of buffers first (to just learn the number), free them
> > > and then allocate a bigger number again.
> >
> > I don't feel this is particularly useful. One problem with the minimum number
> > of buffers as used in the kernel is that it is often the minimum number of
> > buffers required to make the hardware work, but it may not be optimal. E.g.
> > quite a few capture drivers set the minimum to 2, which is enough for the
> > hardware, but it will likely lead to dropped frames. You really need 3
> > (one is being DMAed, one is queued and linked into the DMA engine and one is
> > being processed by userspace).
> >
> > I would actually prefer this to be the recommended minimum number of buffers,
> > which is >= the minimum REQBUFS uses.
> >
> > I.e., if you use this number and you have no special requirements, then you'll
> > get good performance.
>
> I guess we could make it so. It would make existing user space request
> more buffers than it used to with the original meaning, but I guess it
> shouldn't be a big problem.

I gave it a bit more thought and I feel like kernel is not the right
place to put any assumptions on what the userspace expects "good
performance" to be. Actually, having these controls return the minimum
number of buffers as REQBUFS would allocate makes it very well
specified - with this number you can only process frame by frame and
the number of buffers added by userspace defines exactly the queue
depth. It leaves no space for driver-specific quirks, because the
driver doesn't decide what's "good performance" anymore.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-15 10:13               ` Tomasz Figa
@ 2018-10-16  1:09                 ` Nicolas Dufresne
  0 siblings, 0 replies; 62+ messages in thread
From: Nicolas Dufresne @ 2018-10-16  1:09 UTC (permalink / raw)
  To: Tomasz Figa, Hans Verkuil
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, Paul Kocialkowski, Laurent Pinchart, dave.stevenson,
	Ezequiel Garcia

Le lundi 15 octobre 2018 à 19:13 +0900, Tomasz Figa a écrit :
> On Wed, Aug 8, 2018 at 11:55 AM Tomasz Figa <tfiga@chromium.org> wrote:
> > 
> > On Tue, Aug 7, 2018 at 4:37 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > > 
> > > On 08/07/2018 09:05 AM, Tomasz Figa wrote:
> > > > On Thu, Jul 26, 2018 at 7:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> > > > > > > I wonder if we should make these min buffer controls required. It might be easier
> > > > > > > that way.
> > > > > > 
> > > > > > Agreed. Although userspace is still free to ignore it, because REQBUFS
> > > > > > would do the right thing anyway.
> > > > > 
> > > > > It's never been entirely clear to me what the purpose of those min buffers controls
> > > > > is. REQBUFS ensures that the number of buffers is at least the minimum needed to
> > > > > make the HW work. So why would you need these controls? It only makes sense if they
> > > > > return something different from REQBUFS.
> > > > > 
> > > > 
> > > > The purpose of those controls is to let the client allocate a number
> > > > of buffers bigger than minimum, without the need to allocate the
> > > > minimum number of buffers first (to just learn the number), free them
> > > > and then allocate a bigger number again.
> > > 
> > > I don't feel this is particularly useful. One problem with the minimum number
> > > of buffers as used in the kernel is that it is often the minimum number of
> > > buffers required to make the hardware work, but it may not be optimal. E.g.
> > > quite a few capture drivers set the minimum to 2, which is enough for the
> > > hardware, but it will likely lead to dropped frames. You really need 3
> > > (one is being DMAed, one is queued and linked into the DMA engine and one is
> > > being processed by userspace).
> > > 
> > > I would actually prefer this to be the recommended minimum number of buffers,
> > > which is >= the minimum REQBUFS uses.
> > > 
> > > I.e., if you use this number and you have no special requirements, then you'll
> > > get good performance.
> > 
> > I guess we could make it so. It would make existing user space request
> > more buffers than it used to with the original meaning, but I guess it
> > shouldn't be a big problem.
> 
> I gave it a bit more thought and I feel like kernel is not the right
> place to put any assumptions on what the userspace expects "good
> performance" to be. Actually, having these controls return the minimum
> number of buffers as REQBUFS would allocate makes it very well
> specified - with this number you can only process frame by frame and
> the number of buffers added by userspace defines exactly the queue
> depth. It leaves no space for driver-specific quirks, because the
> driver doesn't decide what's "good performance" anymore.

I agree on that and I would add that the driver making any assumption
would lead to memory waste in context where less buffer will still work
(think of fence based operation as an example).

> 
> Best regards,
> Tomasz


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-08-07  6:54     ` Tomasz Figa
  2018-08-07  7:25       ` Hans Verkuil
@ 2018-10-16  7:36       ` Tomasz Figa
  2018-10-16 13:49         ` Hans Verkuil
  1 sibling, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-10-16  7:36 UTC (permalink / raw)
  To: Hans Verkuil, Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Tue, Aug 7, 2018 at 3:54 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Hans,
>
> On Wed, Jul 25, 2018 at 10:57 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >
> > On 24/07/18 16:06, Tomasz Figa wrote:
[snip]
> > > +4. The client may set the raw source format on the ``OUTPUT`` queue via
> > > +   :c:func:`VIDIOC_S_FMT`.
> > > +
> > > +   * **Required fields:**
> > > +
> > > +     ``type``
> > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > > +
> > > +     ``pixelformat``
> > > +         raw format of the source
> > > +
> > > +     ``width``, ``height``
> > > +         source resolution
> > > +
> > > +     ``num_planes`` (for _MPLANE)
> > > +         set to number of planes for pixelformat
> > > +
> > > +     ``sizeimage``, ``bytesperline``
> > > +         follow standard semantics
> > > +
> > > +   * **Return fields:**
> > > +
> > > +     ``width``, ``height``
> > > +         may be adjusted by driver to match alignment requirements, as
> > > +         required by the currently selected formats
> > > +
> > > +     ``sizeimage``, ``bytesperline``
> > > +         follow standard semantics
> > > +
> > > +   * Setting the source resolution will reset visible resolution to the
> > > +     adjusted source resolution rounded up to the closest visible
> > > +     resolution supported by the driver. Similarly, coded resolution will
> >
> > coded -> the coded
>
> Ack.
>
> >
> > > +     be reset to source resolution rounded up to the closest coded
> >
> > reset -> set
> > source -> the source
>
> Ack.
>

Actually, I'd prefer to keep it at "reset", so that it signifies the
fact that the driver will actually override whatever was set by the
application before.

[snip]
> > > +   * The driver must expose following selection targets on ``OUTPUT``:
> > > +
> > > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > > +         maximum crop bounds within the source buffer supported by the
> > > +         encoder
> > > +
> > > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > +         suggested cropping rectangle that covers the whole source picture
> > > +
> > > +     ``V4L2_SEL_TGT_CROP``
> > > +         rectangle within the source buffer to be encoded into the
> > > +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > > +         maximum rectangle within the coded resolution, which the cropped
> > > +         source frame can be output into; always equal to (0, 0)x(width of
> > > +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> > > +         hardware does not support compose/scaling
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > > +         equal to ``V4L2_SEL_TGT_CROP``
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE``
> > > +         rectangle within the coded frame, which the cropped source frame
> > > +         is to be output into; defaults to
> > > +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> > > +         additional compose/scaling capabilities; resulting stream will
> > > +         have this rectangle encoded as the visible rectangle in its
> > > +         metadata
> > > +
> > > +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> > > +         always equal to coded resolution of the stream, as selected by the
> > > +         encoder based on source resolution and crop/compose rectangles
> >
> > Are there codec drivers that support composition? I can't remember seeing any.
> >
>
> Hmm, I was convinced that MFC could scale and we just lacked support
> in the driver, but I checked the documentation and it doesn't seem to
> be able to do so. I guess we could drop the COMPOSE rectangles for
> now, until we find any hardware that can do scaling or composing on
> the fly.
>

On the other hand, having them defined already wouldn't complicate
existing drivers too much either, because they would just handle all
of them in the same switch case, i.e.

case V4L2_SEL_TGT_COMPOSE_BOUNDS:
case V4L2_SEL_TGT_COMPOSE_DEFAULT:
case V4L2_SEL_TGT_COMPOSE:
case V4L2_SEL_TGT_COMPOSE_PADDED:
     return visible_rectangle;

That would need one change, though. We would define
V4L2_SEL_TGT_COMPOSE_DEFAULT to be equal to (0, 0)x(width of
V4L2_SEL_TGT_CROP - 1, height of ``V4L2_SEL_TGT_CROP - 1), which
makes more sense than current definition, since it would bypass any
compose/scaling by default.

> > > +
> > > +   .. note::
> > > +
> > > +      The driver may adjust the crop/compose rectangles to the nearest
> > > +      supported ones to meet codec and hardware requirements.
> > > +
> > > +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> > > +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> > > +
> > > +   * **Required fields:**
> > > +
> > > +     ``count``
> > > +         requested number of buffers to allocate; greater than zero
> > > +
> > > +     ``type``
> > > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> > > +         ``CAPTURE``
> > > +
> > > +     ``memory``
> > > +         follows standard semantics
> > > +
> > > +   * **Return fields:**
> > > +
> > > +     ``count``
> > > +         adjusted to allocated number of buffers
> > > +
> > > +   * The driver must adjust count to minimum of required number of
> > > +     buffers for given format and count passed.
> >
> > I'd rephrase this:
> >
> >         The driver must adjust ``count`` to the maximum of ``count`` and
> >         the required number of buffers for the given format.
> >
> > Note that this is set to the maximum, not minimum.
> >
>
> Good catch. Will fix it.
>

Actually this is not always the maximum. Encoders may also have
constraints on the maximum number of buffers, so how about just making
it a bit less specific:

The count will be adjusted by the driver to match the hardware
requirements. The client must check the final value after the ioctl
returns to get the number of buffers allocated.

[snip]
> > One general comment:
> >
> > you often talk about 'the driver must', e.g.:
> >
> > "The driver must process and encode as normal all ``OUTPUT`` buffers
> > queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued."
> >
> > But this is not a driver specification, it is an API specification.
> >
> > I think it would be better to phrase it like this:
> >
> > "All ``OUTPUT`` buffers queued by the client before the :c:func:`VIDIOC_ENCODER_CMD`
> > was issued will be processed and encoded as normal."
> >
> > (or perhaps even 'shall' if you want to be really formal)
> >
> > End-users do not really care what drivers do, they want to know what the API does,
> > and that implies rules for drivers.
>
> While I see the point, I'm not fully convinced that it makes the
> documentation easier to read. We defined "client" for the purpose of
> not using the passive form too much, so possibly we could also define
> "driver" in the glossary. Maybe it's just me, but I find that
> referring directly to both sides of the API and using the active form
> is much easier to read.
>
> Possibly just replacing "driver" with "encoder" would ease your concern?

I actually went ahead and rephrased the text of both encoder and
decoder to be more userspace-centric. There are still mentions of a
driver, but only limited to the places
where it is necessary to signify the driver-specific bits, such as
alignments, capabilities, etc.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-10-16  7:36       ` Tomasz Figa
@ 2018-10-16 13:49         ` Hans Verkuil
  2018-10-22  4:50           ` Tomasz Figa
  0 siblings, 1 reply; 62+ messages in thread
From: Hans Verkuil @ 2018-10-16 13:49 UTC (permalink / raw)
  To: Tomasz Figa, Philipp Zabel
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On 10/16/18 09:36, Tomasz Figa wrote:
> On Tue, Aug 7, 2018 at 3:54 PM Tomasz Figa <tfiga@chromium.org> wrote:
>>>> +   * The driver must expose following selection targets on ``OUTPUT``:
>>>> +
>>>> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
>>>> +         maximum crop bounds within the source buffer supported by the
>>>> +         encoder
>>>> +
>>>> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
>>>> +         suggested cropping rectangle that covers the whole source picture
>>>> +
>>>> +     ``V4L2_SEL_TGT_CROP``
>>>> +         rectangle within the source buffer to be encoded into the
>>>> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
>>>> +         maximum rectangle within the coded resolution, which the cropped
>>>> +         source frame can be output into; always equal to (0, 0)x(width of
>>>> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
>>>> +         hardware does not support compose/scaling

Re-reading this I would rewrite this a bit:

if the hardware does not support composition or scaling, then this is always
equal to (0, 0)x(width of ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``).

>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
>>>> +         equal to ``V4L2_SEL_TGT_CROP``
>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE``
>>>> +         rectangle within the coded frame, which the cropped source frame
>>>> +         is to be output into; defaults to
>>>> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
>>>> +         additional compose/scaling capabilities; resulting stream will
>>>> +         have this rectangle encoded as the visible rectangle in its
>>>> +         metadata
>>>> +
>>>> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
>>>> +         always equal to coded resolution of the stream, as selected by the
>>>> +         encoder based on source resolution and crop/compose rectangles
>>>
>>> Are there codec drivers that support composition? I can't remember seeing any.
>>>
>>
>> Hmm, I was convinced that MFC could scale and we just lacked support
>> in the driver, but I checked the documentation and it doesn't seem to
>> be able to do so. I guess we could drop the COMPOSE rectangles for
>> now, until we find any hardware that can do scaling or composing on
>> the fly.
>>
> 
> On the other hand, having them defined already wouldn't complicate
> existing drivers too much either, because they would just handle all
> of them in the same switch case, i.e.
> 
> case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> case V4L2_SEL_TGT_COMPOSE:
> case V4L2_SEL_TGT_COMPOSE_PADDED:
>      return visible_rectangle;
> 
> That would need one change, though. We would define
> V4L2_SEL_TGT_COMPOSE_DEFAULT to be equal to (0, 0)x(width of
> V4L2_SEL_TGT_CROP - 1, height of ``V4L2_SEL_TGT_CROP - 1), which

" - 1"? Where does that come from?

Usually rectangles are specified as widthxheight@left,top.

> makes more sense than current definition, since it would bypass any
> compose/scaling by default.

I have no problem with drivers optionally implementing these rectangles,
even if they don't do scaling or composition. The question is, should it
be required for decoders? If there is a good reason, then I'm OK with it.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
                     ` (4 preceding siblings ...)
  2018-08-31  8:26   ` Alexandre Courbot
@ 2018-10-17 13:34   ` Laurent Pinchart
  2018-10-18 10:03     ` Tomasz Figa
  5 siblings, 1 reply; 62+ messages in thread
From: Laurent Pinchart @ 2018-10-17 13:34 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: linux-media, linux-kernel, Stanimir Varbanov,
	Mauro Carvalho Chehab, Hans Verkuil, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Dave Stevenson,
	ezequiel

Hi Tomasz,

Thank you for the patch.

On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> Due to complexity of the video decoding process, the V4L2 drivers of
> stateful decoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> decoding, seek, pause, dynamic resolution change, drain and end of
> stream.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the decoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
>  3 files changed, 882 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst
> b/Documentation/media/uapi/v4l/dev-decoder.rst new file mode 100644
> index 000000000000..f55d34d2f860
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> @@ -0,0 +1,872 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _decoder:
> +
> +****************************************
> +Memory-to-memory Video Decoder Interface
> +****************************************
> +
> +Input data to a video decoder are buffers containing unprocessed video
> +stream (e.g. Annex-B H.264/HEVC stream, raw VP8/9 stream). The driver is
> +expected not to require any additional information from the client to
> +process these buffers. Output data are raw video frames returned in display
> +order.
> +
> +Performing software parsing, processing etc. of the stream in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Decoder Interface (in
> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================
> +
> +1. The general V4L2 API rules apply if not specified in this document
> +   otherwise.
> +
> +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> +   2119.
> +
> +3. All steps not marked “optional” are required.
> +
> +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> +   unless specified otherwise.
> +
> +5. Single-plane API (see spec) and applicable structures may be used
> +   interchangeably with Multi-plane API, unless specified otherwise,
> +   depending on driver capabilities and following the general V4L2
> +   guidelines.

How about also allowing VIDIOC_CREATE_BUFS where VIDIOC_REQBUFS is mentioned ?

> +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> +   [0..2]: i = 0, 1, 2.
> +
> +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> +   containing data (decoded frame/stream) that resulted from processing + 
>  buffer A.
> +
> +Glossary
> +========
> +
> +CAPTURE
> +   the destination buffer queue; the queue of buffers containing decoded
> +   frames; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> +   hardware into ``CAPTURE`` buffers
> +
> +client
> +   application client communicating with the driver implementing this API
> +
> +coded format
> +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.); see
> +   also: raw format
> +
> +coded height
> +   height for given coded resolution
> +
> +coded resolution
> +   stream resolution in pixels aligned to codec and hardware requirements;
> +   typically visible resolution rounded up to full macroblocks;
> +   see also: visible resolution
> +
> +coded width
> +   width for given coded resolution
> +
> +decode order
> +   the order in which frames are decoded; may differ from display order if
> +   coded format includes a feature of frame reordering; ``OUTPUT`` buffers
> +   must be queued by the client in decode order
> +
> +destination
> +   data resulting from the decode process; ``CAPTURE``
> +
> +display order
> +   the order in which frames must be displayed; ``CAPTURE`` buffers must be
> +   returned by the driver in display order
> +
> +DPB
> +   Decoded Picture Buffer; a H.264 term for a buffer that stores a picture
> +   that is encoded or decoded and available for reference in further
> +   decode/encode steps.

By "encoded or decoded", do you mean "raw frames to be encoded (in the encoder 
use case) or decoded raw frames (in the decoder use case)" ? I think this 
should be clarified.

> +EOS
> +   end of stream
> +
> +IDR
> +   a type of a keyframe in H.264-encoded stream, which clears the list of
> +   earlier reference frames (DPBs)
> +
> +keyframe
> +   an encoded frame that does not reference frames decoded earlier, i.e.
> +   can be decoded fully on its own.
> +
> +OUTPUT
> +   the source buffer queue; the queue of buffers containing encoded
> +   bitstream; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> +   from ``OUTPUT`` buffers
> +
> +PPS
> +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +raw format
> +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> +
> +resume point
> +   a point in the bitstream from which decoding may start/continue, without
> +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or +  
> SPS/PPS/IDR sequence (H.264); a resume point is required to start decode + 
>  of a new stream, or to resume decoding after a seek
> +
> +source
> +   data fed to the decoder; ``OUTPUT``
> +
> +SPS
> +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> +
> +visible height
> +   height for given visible resolution; display height
> +
> +visible resolution
> +   stream resolution of the visible picture, in pixels, to be used for
> +   display purposes; must be smaller or equal to coded resolution;
> +   display resolution
> +
> +visible width
> +   width for given visible resolution; display width
> +
> +Querying capabilities
> +=====================
> +
> +1. To enumerate the set of coded formats supported by the driver, the
> +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> +
> +   * The driver must always return the full set of supported formats,
> +     irrespective of the format set on the ``CAPTURE``.
> +
> +2. To enumerate the set of supported raw formats, the client may call
> +   :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> +
> +   * The driver must return only the formats supported for the format
> +     currently active on ``OUTPUT``.
> +
> +   * In order to enumerate raw formats supported by a given coded format,
> +     the client must first set that coded format on ``OUTPUT`` and then
> +     enumerate the ``CAPTURE`` queue.

Maybe s/enumerate the/enumerate formats on the/ ?

> +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> +   resolutions for a given format, passing desired pixel format in
> +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
> +     must include all possible coded resolutions supported by the decoder
> +     for given coded pixel format.
> +
> +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
> +     must include all possible frame buffer resolutions supported by the
> +     decoder for given raw pixel format and coded format currently set on
> +     ``OUTPUT``.
> +
> +    .. note::
> +
> +       The client may derive the supported resolution range for a
> +       combination of coded and raw format by setting width and height of
> +       ``OUTPUT`` format to 0 and calculating the intersection of
> +       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> +       for the given coded and raw formats.

I'm confused by the note, I'm not sure to understand what you mean.

> +4. Supported profiles and levels for given format, if applicable, may be
> +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> +
> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         a coded pixel format
> +
> +     ``width``, ``height``
> +         required only if cannot be parsed from the stream for the given
> +         coded format; optional otherwise - set to zero to ignore
> +
> +     other fields
> +         follow standard semantics
> +
> +   * For coded formats including stream resolution information, if width
> +     and height are set to non-zero values, the driver will propagate the
> +     resolution to ``CAPTURE`` and signal a source change event
> +     instantly.

Maybe s/instantly/immediately before returning from :c:func:`VIDIOC_S_FMT`/ ?

> However, after the decoder is done parsing the
> +     information embedded in the stream, it will update ``CAPTURE``

s/update/update the/

> +     format with new values and signal a source change event again, if

s/, if/ if/

> +     the values do not match.
> +
> +   .. note::
> +
> +      Changing ``OUTPUT`` format may change currently set ``CAPTURE``

Do you have a particular dislike for definite articles ? :-) I would have 
written "Changing the ``OUTPUT`` format may change the currently set 
``CAPTURE`` ...". I won't repeat the comment through the whole review, but 
many places seem to be missing a definite article.

> +      format. The driver will derive a new ``CAPTURE`` format from
> +      ``OUTPUT`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``CAPTURE`` format,
> +      it must adjust it afterwards.
> +
> +3.  *[optional]* Get minimum number of buffers required for ``OUTPUT``
> +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> +    use more buffers than minimum required by hardware/format.
> +
> +    * **Required fields:**
> +
> +      ``id``
> +          set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> +
> +    * **Return fields:**
> +
> +      ``value``
> +          required number of ``OUTPUT`` buffers for the currently set
> +          format

s/required/required minimum/

> +
> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> +    ``OUTPUT``.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          requested number of buffers to allocate; greater than zero
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +      ``sizeimage``
> +          follows standard semantics; the client is free to choose any
> +          suitable size, however, it may be subject to change by the
> +          driver
> +
> +    * **Return fields:**
> +
> +      ``count``
> +          actual number of buffers allocated
> +
> +    * The driver must adjust count to minimum of required number of
> +      ``OUTPUT`` buffers for given format and count passed.

Isn't it the maximum, not the minimum ?

> The client must
> +      check this value after the ioctl returns to get the number of
> +      buffers allocated.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> +       get minimum number of buffers required by the driver/format,
> +       and pass the obtained value plus the number of additional
> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> +
> +5.  Start streaming on ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
> +
> +6.  This step only applies to coded formats that contain resolution
> +    information in the stream. Continue queuing/dequeuing bitstream
> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and returning
> +    each buffer to the client until required metadata to configure the
> +    ``CAPTURE`` queue are found. This is indicated by the driver sending
> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> +    requirement to pass enough data for this to occur in the first buffer
> +    and the driver must be able to process any number.
> +
> +    * If data in a buffer that triggers the event is required to decode
> +      the first frame, the driver must not return it to the client,
> +      but must retain it for further decoding.
> +
> +    * If the client set width and height of ``OUTPUT`` format to 0, calling
> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return -EPERM,
> +      until the driver configures ``CAPTURE`` format according to stream
> +      metadata.

That's a pretty harsh handling for this condition. What's the rationale for 
returning -EPERM instead of for instance succeeding with width and height set 
to 0 ?

> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> +      the event is signaled, the decoding process will not continue until
> +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> +      command.
> +
> +    .. note::
> +
> +       No decoded frames are produced during this phase.
> +
> +7.  This step only applies to coded formats that contain resolution
> +    information in the stream.
> +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> +    enough data is obtained from the stream to allocate ``CAPTURE``
> +    buffers and to begin producing decoded frames.

Doesn't the last sentence belong to step 6 (where it's already explained to 
some extent) ?

> +
> +    * **Required fields:**
> +
> +      ``type``
> +          set to ``V4L2_EVENT_SOURCE_CHANGE``

Isn't the type field set by the driver ?

> +    * **Return fields:**
> +
> +      ``u.src_change.changes``
> +          set to ``V4L2_EVENT_SRC_CH_RESOLUTION``
> +
> +    * Any client query issued after the driver queues the event must return
> +      values applying to the just parsed stream, including queue formats,
> +      selection rectangles and controls.

To align with the wording used so far, I would say that "the driver must" 
return values applying to the just parsed stream.

I think I would also move this to step 6, as it's related to queuing the 
event, not dequeuing it.

> +8.  Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> +    destination buffers parsed/decoded from the bitstream.
> +
> +    * **Required fields:**
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +    * **Return fields:**
> +
> +      ``width``, ``height``
> +          frame buffer resolution for the decoded frames
> +
> +      ``pixelformat``
> +          pixel format for decoded frames
> +
> +      ``num_planes`` (for _MPLANE ``type`` only)
> +          number of planes for pixelformat
> +
> +      ``sizeimage``, ``bytesperline``
> +          as per standard semantics; matching frame buffer format
> +
> +    .. note::
> +
> +       The value of ``pixelformat`` may be any pixel format supported and
> +       must be supported for current stream, based on the information
> +       parsed from the stream and hardware capabilities. It is suggested
> +       that driver chooses the preferred/optimal format for given

In compliance with RFC 2119, how about using "Drivers should choose" instead 
of "It is suggested that driver chooses" ?

> +       configuration. For example, a YUV format may be preferred over an
> +       RGB format, if additional conversion step would be required.
> +
> +9.  *[optional]* Enumerate ``CAPTURE`` formats via
> +    :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE`` queue. Once the stream
> +    information is parsed and known, the client may use this ioctl to
> +    discover which raw formats are supported for given stream and select on

s/select on/select one/

> +    of them via :c:func:`VIDIOC_S_FMT`.
> +
> +    .. note::
> +
> +       The driver will return only formats supported for the current stream
> +       parsed in this initialization sequence, even if more formats may be
> +       supported by the driver in general.
> +
> +       For example, a driver/hardware may support YUV and RGB formats for
> +       resolutions 1920x1088 and lower, but only YUV for higher
> +       resolutions (due to hardware limitations). After parsing
> +       a resolution of 1920x1088 or lower, :c:func:`VIDIOC_ENUM_FMT` may
> +       return a set of YUV and RGB pixel formats, but after parsing
> +       resolution higher than 1920x1088, the driver will not return RGB,
> +       unsupported for this resolution.
> +
> +       However, subsequent resolution change event triggered after
> +       discovering a resolution change within the same stream may switch
> +       the stream into a lower resolution and :c:func:`VIDIOC_ENUM_FMT`
> +       would return RGB formats again in that case.
> +
> +10.  *[optional]* Choose a different ``CAPTURE`` format than suggested via
> +     :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the
> +     client to choose a different format than selected/suggested by the

And here, "A client may choose" ?

> +     driver in :c:func:`VIDIOC_G_FMT`.
> +
> +     * **Required fields:**
> +
> +       ``type``
> +           a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +       ``pixelformat``
> +           a raw pixel format
> +
> +     .. note::
> +
> +        Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> +        formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to
> +        find out a set of allowed formats for given configuration, but not
> +        required, if the client can accept the defaults.

s/required/required,/

> +
> +11. *[optional]* Acquire visible resolution via
> +    :c:func:`VIDIOC_G_SELECTION`.
> +
> +    * **Required fields:**
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``target``
> +          set to ``V4L2_SEL_TGT_COMPOSE``
> +
> +    * **Return fields:**
> +
> +      ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +          visible rectangle; this must fit within frame buffer resolution
> +          returned by :c:func:`VIDIOC_G_FMT`.
> +
> +    * The driver must expose following selection targets on ``CAPTURE``:
> +
> +      ``V4L2_SEL_TGT_CROP_BOUNDS``
> +          corresponds to coded resolution of the stream
> +
> +      ``V4L2_SEL_TGT_CROP_DEFAULT``
> +          a rectangle covering the part of the frame buffer that contains
> +          meaningful picture data (visible area); width and height will be
> +          equal to visible resolution of the stream
> +
> +      ``V4L2_SEL_TGT_CROP``
> +          rectangle within coded resolution to be output to ``CAPTURE``;
> +          defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``; read-only on hardware
> +          without additional compose/scaling capabilities
> +
> +      ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> +          maximum rectangle within ``CAPTURE`` buffer, which the cropped
> +          frame can be output into; equal to ``V4L2_SEL_TGT_CROP``, if the
> +          hardware does not support compose/scaling
> +
> +      ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> +          equal to ``V4L2_SEL_TGT_CROP``
> +
> +      ``V4L2_SEL_TGT_COMPOSE``
> +          rectangle inside ``OUTPUT`` buffer into which the cropped frame

s/OUTPUT/CAPTURE/ ?

> +          is output; defaults to ``V4L2_SEL_TGT_COMPOSE_DEFAULT``;

and "is captured" or "is written" ?

> +          read-only on hardware without additional compose/scaling
> +          capabilities
> +
> +      ``V4L2_SEL_TGT_COMPOSE_PADDED``
> +          rectangle inside ``OUTPUT`` buffer which is overwritten by the

Here too ?

> +          hardware; equal to ``V4L2_SEL_TGT_COMPOSE``, if the hardware

s/, if/ if/

> +          does not write padding pixels
> +
> +12. *[optional]* Get minimum number of buffers required for ``CAPTURE``
> +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> +    use more buffers than minimum required by hardware/format.
> +
> +    * **Required fields:**
> +
> +      ``id``
> +          set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> +
> +    * **Return fields:**
> +
> +      ``value``
> +          minimum number of buffers required to decode the stream parsed in
> +          this initialization sequence.
> +
> +    .. note::
> +
> +       Note that the minimum number of buffers must be at least the number
> +       required to successfully decode the current stream. This may for
> +       example be the required DPB size for an H.264 stream given the
> +       parsed stream configuration (resolution, level).
> +
> +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
> +    on the ``CAPTURE`` queue.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          requested number of buffers to allocate; greater than zero
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +    * **Return fields:**
> +
> +      ``count``
> +          adjusted to allocated number of buffers
> +
> +    * The driver must adjust count to minimum of required number of

s/minimum/maximum/ ?

Should we also mentioned that if count > minimum, the driver may additionally 
limit the number of buffers based on internal limits (such as maximum memory 
consumption) ?

> +      destination buffers for given format and stream configuration and the
> +      count passed. The client must check this value after the ioctl
> +      returns to get the number of buffers allocated.
> +
> +    .. note::
> +
> +       To allocate more than minimum number of buffers (for pipeline
> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> +       get minimum number of buffers required, and pass the obtained value
> +       plus the number of additional buffers needed in count to
> +       :c:func:`VIDIOC_REQBUFS`.
> +
> +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> +
> +Decoding
> +========
> +
> +This state is reached after a successful initialization sequence. In this
> +state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> +semantics.
> +
> +Both queues operate independently, following standard behavior of V4L2
> +buffer queues and memory-to-memory devices. In addition, the order of
> +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> +coded format, e.g. frame reordering. The client must not assume any direct
> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
> +
> +The contents of source ``OUTPUT`` buffers depend on active coded pixel
> +format and might be affected by codec-specific extended controls, as stated

s/might/may/

> +in documentation of each format individually.
> +
> +The client must not assume any direct relationship between ``CAPTURE``
> +and ``OUTPUT`` buffers and any specific timing of buffers becoming
> +available to dequeue. Specifically:
> +
> +* a buffer queued to ``OUTPUT`` may result in no buffers being produced
> +  on ``CAPTURE`` (e.g. if it does not contain encoded data, or if only
> +  metadata syntax structures are present in it),
> +
> +* a buffer queued to ``OUTPUT`` may result in more than 1 buffer produced
> +  on ``CAPTURE`` (if the encoded data contained more than one frame, or if
> +  returning a decoded frame allowed the driver to return a frame that
> +  preceded it in decode, but succeeded it in display order),
> +
> +* a buffer queued to ``OUTPUT`` may result in a buffer being produced on
> +  ``CAPTURE`` later into decode process, and/or after processing further
> +  ``OUTPUT`` buffers, or be returned out of order, e.g. if display
> +  reordering is used,
> +
> +* buffers may become available on the ``CAPTURE`` queue without additional

s/buffers/Buffers/

> +  buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of
> +  ``OUTPUT`` buffers being queued in the past and decoding result of which
> +  being available only at later time, due to specifics of the decoding
> +  process.

I understand what you mean, but the wording is weird to my eyes. How about

* Buffers may become available on the ``CAPTURE`` queue without additional 
buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of 
``OUTPUT`` buffers queued in the past whose decoding results are only 
available at later time, due to specifics of the decoding process.

> +Seek
> +====
> +
> +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.

I assume that a seek may result in a source resolution change event, in which 
case the capture queue will be affected. How about stating here that 
controlling seek doesn't require any specific operation on the capture queue, 
but that the capture queue may be affected as per normal decoder operation ? 
We may also want to mention the event as an example.

> +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> +   :c:func:`VIDIOC_STREAMOFF`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> +     treated as returned to the client (following standard semantics).
> +
> +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +   * The driver must be put in a state after seek and be ready to

What do you mean by "a state after seek" ?

> +     accept new source bitstream buffers.
> +
> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> +   the seek until a suitable resume point is found.
> +
> +   .. note::
> +
> +      There is no requirement to begin queuing stream starting exactly from

s/stream/buffers/ ?

> +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> +      data queued and must keep processing the queued buffers until it
> +      finds a suitable resume point. While looking for a resume point, the
> +      driver processes ``OUTPUT`` buffers and returns them to the client
> +      without producing any decoded frames.
> +
> +      For hardware known to be mishandling seeks to a non-resume point,
> +      e.g. by returning corrupted decoded frames, the driver must be able
> +      to handle such seeks without a crash or any fatal decode error.

This should be true for any hardware, there should never be any crash or fatal 
decode error. I'd write it as

Some hardware is known to mishandle seeks to a non-resume point. Such an 
operation may result in an unspecified number of corrupted decoded frames 
being made available on ``CAPTURE``. Drivers must ensure that no fatal 
decoding errors or crashes occur, and implement any necessary handling and 
work-arounds for hardware issues related to seek operations.

> +4. After a resume point is found, the driver will start returning
> +   ``CAPTURE`` buffers with decoded frames.
> +
> +   * There is no precise specification for ``CAPTURE`` queue of when it
> +     will start producing buffers containing decoded data from buffers
> +     queued after the seek, as it operates independently
> +     from ``OUTPUT`` queue.
> +
> +     * The driver is allowed to and may return a number of remaining

s/is allowed to and may/may/

> +       ``CAPTURE`` buffers containing decoded frames from before the seek
> +       after the seek sequence (STREAMOFF-STREAMON) is performed.

Shouldn't all these buffers be returned when STREAMOFF is called on the OUTPUT 
side ?

> +     * The driver is also allowed to and may not return all decoded frames

s/is also allowed to and may not return/may also not return/

> +       queued but not decode before the seek sequence was initiated. For

s/not decode/not decoded/

> +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
> +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
> +       H’}, {A’, G’, H’}, {G’, H’}.

Related to the previous point, shouldn't this be moved to step 1 ?

> +   .. note::
> +
> +      To achieve instantaneous seek, the client may restart streaming on
> +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
> +
> +Pause
> +=====
> +
> +In order to pause, the client should just cease queuing buffers onto the
> +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> +Without source bitstream data, there is no data to process and the
> hardware +remains idle.
> +
> +Conversely, using :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue indicates
> +a seek, which
> +
> +1. drops all ``OUTPUT`` buffers in flight and
> +2. after a subsequent :c:func:`VIDIOC_STREAMON`, will look for and only
> +   continue from a resume point.
> +
> +This is usually undesirable for pause. The STREAMOFF-STREAMON sequence is
> +intended for seeking.
> +
> +Similarly, ``CAPTURE`` queue should remain streaming as well, as the
> +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> +sets.

And also to drop decoded buffers for instant seek ?

> +Dynamic resolution change
> +=========================
> +
> +A video decoder implementing this interface must support dynamic resolution
> +change, for streams, which include resolution metadata in the bitstream.

s/for streams, which/for streams that/

> +When the decoder encounters a resolution change in the stream, the dynamic
> +resolution change sequence is started.
> +
> +1.  After encountering a resolution change in the stream, the driver must
> +    first process and decode all remaining buffers from before the
> +    resolution change point.
> +
> +2.  After all buffers containing decoded frames from before the resolution
> +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> +
> +    * The last buffer from before the change must be marked with
> +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the
> +      drain sequence. The last buffer might be empty (with
> +      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by the
> +      client, since it does not contain any decoded frame.
> +
> +    * Any client query issued after the driver queues the event must return
> +      values applying to the stream after the resolution change, including
> +      queue formats, selection rectangles and controls.
> +
> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> +      the event is signaled, the decoding process will not continue until
> +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> +      command.

This usage of V4L2_DEC_CMD_START isn't aligned with the documentation of the 
command. I'm not opposed to this, but I think the use cases of decoder 
commands for codecs should be explained in the VIDIOC_DECODER_CMD 
documentation. What bothers me in particular is usage of V4L2_DEC_CMD_START to 
restart the decoder, while no V4L2_DEC_CMD_STOP has been issued. Should we add 
a section that details the decoder state machine with the implicit and 
explicit ways in which it is started and stopped ?

I would also reference step 7 here.

> +    .. note::
> +
> +       Any attempts to dequeue more buffers beyond the buffer marked
> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +       :c:func:`VIDIOC_DQBUF`.
> +
> +3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the new
> +    format information. This is identical to calling :c:func:`VIDIOC_G_FMT`
> +    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization sequence
> +    and should be handled similarly.

As the source resolution change event is mentioned in multiple places, how 
about extracting the related ioctls sequence to a specific section, and 
referencing it where needed (at least from the initialization sequence and 
here) ?

> +    .. note::
> +
> +       It is allowed for the driver not to support the same pixel format as

"Drivers may not support ..."

> +       previously used (before the resolution change) for the new
> +       resolution. The driver must select a default supported pixel format,
> +       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the client
> +       must take note of it.
> +
> +4.  The client acquires visible resolution as in initialization sequence.
> +
> +5.  *[optional]* The client is allowed to enumerate available formats and

s/is allowed to/may/

> +    select a different one than currently chosen (returned via
> +    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +6.  *[optional]* The client acquires minimum number of buffers as in
> +    initialization sequence.
> +
> +7.  If all the following conditions are met, the client may resume the
> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> +    sequence:
> +
> +    * ``sizeimage`` of new format is less than or equal to the size of
> +      currently allocated buffers,
> +
> +    * the number of buffers currently allocated is greater than or equal to
> +      the minimum number of buffers acquired in step 6.
> +
> +    In such case, the remaining steps do not apply.
> +
> +    However, if the client intends to change the buffer set, to lower
> +    memory usage or for any other reasons, it may be achieved by following
> +    the steps below.
> +
> +8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue,

This is optional, isn't it ?

> the
> +    client must call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> +    The ``OUTPUT`` queue must remain streaming (calling STREAMOFF on it

:c:func:`VIDIOC_STREAMOFF`

> +    would trigger a seek).
> +
> +9.  The client frees the buffers on the ``CAPTURE`` queue using
> +    :c:func:`VIDIOC_REQBUFS`.
> +
> +    * **Required fields:**
> +
> +      ``count``
> +          set to 0
> +
> +      ``type``
> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +      ``memory``
> +          follows standard semantics
> +
> +10. The client allocates a new set of buffers for the ``CAPTURE`` queue via
> +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> +    the initialization sequence.
> +
> +11. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> +    ``CAPTURE`` queue.
> +
> +During the resolution change sequence, the ``OUTPUT`` queue must remain
> +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue would
> +initiate a seek.
> +
> +The ``OUTPUT`` queue operates separately from the ``CAPTURE`` queue for the
> +duration of the entire resolution change sequence. It is allowed (and
> +recommended for best performance and simplicity) for the client to keep

"The client should (for best performance and simplicity) keep ..."

> +queuing/dequeuing buffers from/to ``OUTPUT`` queue even while processing

s/from\to/to\/from/

> +this sequence.
> +
> +.. note::
> +
> +   It is also possible for this sequence to be triggered without a change

"This sequence may be triggered ..."

> +   in coded resolution, if a different number of ``CAPTURE`` buffers is
> +   required in order to continue decoding the stream or the visible
> +   resolution changes.
> +
> +Drain
> +=====
> +
> +To ensure that all queued ``OUTPUT`` buffers have been processed and
> +related ``CAPTURE`` buffers output to the client, the following drain
> +sequence may be followed. After the drain sequence is complete, the client
> +has received all decoded frames for all ``OUTPUT`` buffers queued before
> +the sequence was started.
> +
> +1. Begin drain by issuing :c:func:`VIDIOC_DECODER_CMD`.
> +
> +   * **Required fields:**
> +
> +     ``cmd``
> +         set to ``V4L2_DEC_CMD_STOP``
> +
> +     ``flags``
> +         set to 0
> +
> +     ``pts``
> +         set to 0
> +
> +2. The driver must process and decode as normal all ``OUTPUT`` buffers
> +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was issued.
> +   Any operations triggered as a result of processing these buffers
> +   (including the initialization and resolution change sequences) must be
> +   processed as normal by both the driver and the client before proceeding
> +   with the drain sequence.
> +
> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_DEC_CMD_STOP`` are
> +   processed:
> +
> +   * If the ``CAPTURE`` queue is streaming, once all decoded frames (if
> +     any) are ready to be dequeued on the ``CAPTURE`` queue, the driver
> +     must send a ``V4L2_EVENT_EOS``.

s/\./event./

Is the event sent on the OUTPUT or CAPTURE queue ? I assume the latter, should 
it be explicitly documented ?

> The driver must also set
> +     ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> +     buffer on the ``CAPTURE`` queue containing the last frame (if any)
> +     produced as a result of processing the ``OUTPUT`` buffers queued
> +     before ``V4L2_DEC_CMD_STOP``. If no more frames are left to be
> +     returned at the point of handling ``V4L2_DEC_CMD_STOP``, the driver
> +     must return an empty buffer (with :c:type:`v4l2_buffer`
> +     ``bytesused`` = 0) as the last buffer with ``V4L2_BUF_FLAG_LAST`` set
> +     instead. Any attempts to dequeue more buffers beyond the buffer marked
> +     with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> +     :c:func:`VIDIOC_DQBUF`.
> +
> +   * If the ``CAPTURE`` queue is NOT streaming, no action is necessary for
> +     ``CAPTURE`` queue and the driver must send a ``V4L2_EVENT_EOS``
> +     immediately after all ``OUTPUT`` buffers in question have been
> +     processed.

What is the use case for this ? Can't we just return an error if decoder isn't 
streaming ?

> +4. At this point, decoding is paused and the driver will accept, but not
> +   process any newly queued ``OUTPUT`` buffers until the client issues
> +   ``V4L2_DEC_CMD_START`` or restarts streaming on any queue.
> +
> +* Once the drain sequence is initiated, the client needs to drive it to
> +  completion, as described by the above steps, unless it aborts the process
> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue.  The client
> +  is not allowed to issue ``V4L2_DEC_CMD_START`` or ``V4L2_DEC_CMD_STOP``
> +  again while the drain sequence is in progress and they will fail with
> +  -EBUSY error code if attempted.

While this seems OK to me, I think drivers will need help to implement all the 
corner cases correctly without race conditions.

> +* Restarting streaming on ``OUTPUT`` queue will implicitly end the paused
> +  state and reinitialize the decoder (similarly to the seek sequence).
> +  Restarting ``CAPTURE`` queue will not affect an in-progress drain
> +  sequence.
> +
> +* The drivers must also implement :c:func:`VIDIOC_TRY_DECODER_CMD`, as a
> +  way to let the client query the availability of decoder commands.
> +
> +End of stream
> +=============
> +
> +If the decoder encounters an end of stream marking in the stream, the
> +driver must send a ``V4L2_EVENT_EOS`` event

On which queue ?

> to the client after all frames
> +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> +behavior is identical to the drain sequence triggered by the client via
> +``V4L2_DEC_CMD_STOP``.
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior

s/triggers/trigger/

> +of the driver.
> +
> +1. Setting format on ``OUTPUT`` queue may change the set of formats
> +   supported/advertised on the ``CAPTURE`` queue. In particular, it also
> +   means that ``CAPTURE`` format may be reset and the client must not
> +   rely on the previously set format being preserved.
> +
> +2. Enumerating formats on ``CAPTURE`` queue must only return formats
> +   supported for the ``OUTPUT`` format currently set.
> +
> +3. Setting/changing format on ``CAPTURE`` queue does not change formats

Why not just "Setting format" ?

> +   available on ``OUTPUT`` queue. An attempt to set ``CAPTURE`` format that
> +   is not supported for the currently selected ``OUTPUT`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.
> +
> +4. Enumerating formats on ``OUTPUT`` queue always returns the full set of
> +   supported coded formats, irrespective of the current ``CAPTURE``
> +   format.
> +
> +5. After allocating buffers on the ``OUTPUT`` queue, it is not possible to
> +   change format on it.

I'd phrase this as

"While buffers are allocated on the ``OUTPUT`` queue, clients must not change 
the format on the queue. Drivers must return <error code> for any such format 
change attempt."

> +
> +To summarize, setting formats and allocation must always start with the
> +``OUTPUT`` queue and the ``OUTPUT`` queue is the master that governs the
> +set of supported formats for the ``CAPTURE`` queue.

[snip]

-- 
Regards,

Laurent Pinchart




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
                     ` (2 preceding siblings ...)
  2018-09-07 20:17   ` Ezequiel Garcia
@ 2018-10-17 15:19   ` Laurent Pinchart
  2018-10-22  6:12     ` Tomasz Figa
  3 siblings, 1 reply; 62+ messages in thread
From: Laurent Pinchart @ 2018-10-17 15:19 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: linux-media, linux-kernel, Stanimir Varbanov,
	Mauro Carvalho Chehab, Hans Verkuil, Pawel Osciak,
	Alexandre Courbot, kamil, a.hajda, Kyungmin Park, jtp.park,
	Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Dave Stevenson,
	ezequiel

Hi Tomasz,

Thank you for the patch.

On Tuesday, 24 July 2018 17:06:21 EEST Tomasz Figa wrote:
> Due to complexity of the video encoding process, the V4L2 drivers of
> stateful encoder hardware require specific sequences of V4L2 API calls
> to be followed. These include capability enumeration, initialization,
> encoding, encode parameters change, drain and reset.
> 
> Specifics of the above have been discussed during Media Workshops at
> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> originated at those events was later implemented by the drivers we already
> have merged in mainline, such as s5p-mfc or coda.
> 
> The only thing missing was the real specification included as a part of
> Linux Media documentation. Fix it now and document the encoder part of
> the Codec API.
> 
> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> ---
>  Documentation/media/uapi/v4l/dev-encoder.rst | 550 +++++++++++++++++++
>  Documentation/media/uapi/v4l/devices.rst     |   1 +
>  Documentation/media/uapi/v4l/v4l2.rst        |   2 +
>  3 files changed, 553 insertions(+)
>  create mode 100644 Documentation/media/uapi/v4l/dev-encoder.rst
> 
> diff --git a/Documentation/media/uapi/v4l/dev-encoder.rst
> b/Documentation/media/uapi/v4l/dev-encoder.rst new file mode 100644
> index 000000000000..28be1698e99c
> --- /dev/null
> +++ b/Documentation/media/uapi/v4l/dev-encoder.rst
> @@ -0,0 +1,550 @@
> +.. -*- coding: utf-8; mode: rst -*-
> +
> +.. _encoder:
> +
> +****************************************
> +Memory-to-memory Video Encoder Interface
> +****************************************
> +
> +Input data to a video encoder are raw video frames in display order
> +to be encoded into the output bitstream. Output data are complete chunks of
> +valid bitstream, including all metadata, headers, etc. The resulting
> stream
> +must not need any further post-processing by the client.
> +
> +Performing software stream processing, header generation etc. in the driver
> +in order to support this interface is strongly discouraged. In case such
> +operations are needed, use of Stateless Video Encoder Interface (in

s/use of/use of the/

(and in various places below, as pointed out in the review of patch 1/2)

> +development) is strongly advised.
> +
> +Conventions and notation used in this document
> +==============================================

[snip]

> +Glossary
> +========

[snip]

Let's try to share these two sections between the two documents.

[snip]

> +Initialization
> +==============
> +
> +1. *[optional]* Enumerate supported formats and resolutions. See
> +   capability enumeration.
> +
> +2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> +
> +     ``pixelformat``
> +         set to a coded format to be produced
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         coded resolution (based on currently active ``OUTPUT`` format)

Shouldn't userspace then set the resolution on the CAPTURE queue first ?

> +   .. note::
> +
> +      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
> +      format. The driver will derive a new ``OUTPUT`` format from
> +      ``CAPTURE`` format being set, including resolution, colorimetry
> +      parameters, etc. If the client needs a specific ``OUTPUT`` format,
> +      it must adjust it afterwards.

Doesn't this contradict the "based on currently active ``OUTPUT`` format" 
above ?

> +3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
> +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``index``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``pixelformat``
> +         raw format supported for the coded format currently selected on
> +         the ``OUTPUT`` queue.
> +
> +4. The client may set the raw source format on the ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_S_FMT`.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``pixelformat``
> +         raw format of the source
> +
> +     ``width``, ``height``
> +         source resolution
> +
> +     ``num_planes`` (for _MPLANE)
> +         set to number of planes for pixelformat
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * **Return fields:**
> +
> +     ``width``, ``height``
> +         may be adjusted by driver to match alignment requirements, as
> +         required by the currently selected formats
> +
> +     ``sizeimage``, ``bytesperline``
> +         follow standard semantics
> +
> +   * Setting the source resolution will reset visible resolution to the
> +     adjusted source resolution rounded up to the closest visible
> +     resolution supported by the driver. Similarly, coded resolution will
> +     be reset to source resolution rounded up to the closest coded
> +     resolution supported by the driver (typically a multiple of
> +     macroblock size).
> +
> +   .. note::
> +
> +      This step is not strictly required, since ``OUTPUT`` is expected to
> +      have a valid default format. However, the client needs to ensure that

s/needs to/must/

> +      ``OUTPUT`` format matches its expectations via either
> +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> +      being the typical scenario, since the default format is unlikely to
> +      be what the client needs.
> +
> +5. *[optional]* Set visible resolution for the stream metadata via
> +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> +
> +   * **Required fields:**
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> +
> +     ``target``
> +         set to ``V4L2_SEL_TGT_CROP``
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle; this must fit within the framebuffer resolution
> +         and might be subject to adjustment to match codec and hardware
> +         constraints

Just for my information, are there use cases for r.left != 0 or r.top != 0 ?

> +   * **Return fields:**
> +
> +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> +         visible rectangle adjusted by the driver
> +
> +   * The driver must expose following selection targets on ``OUTPUT``:
> +
> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> +         maximum crop bounds within the source buffer supported by the
> +         encoder

Will this always match the format on the OUTPUT queue, or can it differ ?

> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> +         suggested cropping rectangle that covers the whole source picture

How can the driver know what to report here, apart from the same value as 
V4L2_SET_TGT_CROP_BOUNDS ?

> +     ``V4L2_SEL_TGT_CROP``
> +         rectangle within the source buffer to be encoded into the
> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> +
> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> +         maximum rectangle within the coded resolution, which the cropped
> +         source frame can be output into; always equal to (0, 0)x(width of
> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> +         hardware does not support compose/scaling
> +
> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> +         equal to ``V4L2_SEL_TGT_CROP``
> +
> +     ``V4L2_SEL_TGT_COMPOSE``
> +         rectangle within the coded frame, which the cropped source frame
> +         is to be output into; defaults to
> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> +         additional compose/scaling capabilities; resulting stream will
> +         have this rectangle encoded as the visible rectangle in its
> +         metadata
> +
> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> +         always equal to coded resolution of the stream, as selected by the
> +         encoder based on source resolution and crop/compose rectangles
> +
> +   .. note::
> +
> +      The driver may adjust the crop/compose rectangles to the nearest
> +      supported ones to meet codec and hardware requirements.
> +
> +6. Allocate buffers for both ``OUTPUT`` and ``CAPTURE`` via
> +   :c:func:`VIDIOC_REQBUFS`. This may be performed in any order.
> +
> +   * **Required fields:**
> +
> +     ``count``
> +         requested number of buffers to allocate; greater than zero
> +
> +     ``type``
> +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT`` or
> +         ``CAPTURE``
> +
> +     ``memory``
> +         follows standard semantics
> +
> +   * **Return fields:**
> +
> +     ``count``
> +         adjusted to allocated number of buffers
> +
> +   * The driver must adjust count to minimum of required number of
> +     buffers for given format and count passed.

s/minimum/maximum/ ?

> The client must
> +     check this value after the ioctl returns to get the number of
> +     buffers actually allocated.
> +
> +   .. note::
> +
> +      To allocate more than minimum number of buffers (for pipeline
> +      depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) or
> +      G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``), respectively,
> +      to get the minimum number of buffers required by the
> +      driver/format, and pass the obtained value plus the number of
> +      additional buffers needed in count field to :c:func:`VIDIOC_REQBUFS`.
> +
> +7. Begin streaming on both ``OUTPUT`` and ``CAPTURE`` queues via
> +   :c:func:`VIDIOC_STREAMON`. This may be performed in any order. Actual
> +   encoding process starts when both queues start streaming.
> +
> +.. note::
> +
> +   If the client stops ``CAPTURE`` during the encode process and then
> +   restarts it again, the encoder will be expected to generate a stream
> +   independent from the stream generated before the stop. Depending on the
> +   coded format, that may imply that:
> +
> +   * encoded frames produced after the restart must not reference any
> +     frames produced before the stop, e.g. no long term references for
> +     H264,
> +
> +   * any headers that must be included in a standalone stream must be
> +     produced again, e.g. SPS and PPS for H264.

s/H264/H.264/

(and in other places too)

> +Encoding
> +========
> +
> +This state is reached after a successful initialization sequence. In
> +this state, client queues and dequeues buffers to both queues via
> +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> +semantics.
> +
> +Both queues operate independently, following standard behavior of V4L2
> +buffer queues and memory-to-memory devices. In addition, the order of
> +encoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> +queuing raw frames to ``OUTPUT`` queue, due to properties of selected
> coded
> +format, e.g. frame reordering. The client must not assume any direct
> +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> +reported by :c:type:`v4l2_buffer` ``timestamp``.
> +
> +Encoding parameter changes
> +==========================
> +
> +The client is allowed to use :c:func:`VIDIOC_S_CTRL` to change encoder
> +parameters at any time. The availability of parameters is driver-specific
> +and the client must query the driver to find the set of available controls.
> +
> +The ability to change each parameter during encoding of is driver-specific,
> +as per standard semantics of the V4L2 control interface. The client may
> +attempt setting a control of its interest during encoding and if it the
> +operation fails with the -EBUSY error code, ``CAPTURE`` queue needs to be
> +stopped for the configuration change to be allowed (following the drain
> +sequence will be  needed to avoid losing already queued/encoded frames).
> +
> +The timing of parameter update is driver-specific, as per standard
> +semantics of the V4L2 control interface. If the client needs to apply the
> +parameters exactly at specific frame and the encoder supports it, using
> +Request API should be considered.
> +
> +Drain
> +=====
> +
> +To ensure that all queued ``OUTPUT`` buffers have been processed and
> +related ``CAPTURE`` buffers output to the client, the following drain
> +sequence may be followed. After the drain sequence is complete, the client
> +has received all encoded frames for all ``OUTPUT`` buffers queued before
> +the sequence was started.
> +
> +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> +
> +   * **Required fields:**
> +
> +     ``cmd``
> +         set to ``V4L2_ENC_CMD_STOP``
> +
> +     ``flags``
> +         set to 0
> +
> +     ``pts``
> +         set to 0
> +
> +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> +
> +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> +   processed:
> +
> +   * Once all decoded frames (if any) are ready to be dequeued on the
> +     ``CAPTURE`` queue

I understand this condition to be equivalent to the main step 3 condition. I 
would thus write it as

"At this point all decoded frames (if any) are ready to be dequeued on the 
``CAPTURE`` queue. The driver must send a ``V4L2_EVENT_EOS``."

> the driver must send a ``V4L2_EVENT_EOS``. The
> +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> +     last frame (if any) produced as a result of processing the ``OUTPUT``
> +     buffers queued before

Unneeded line break ?

> +     ``V4L2_ENC_CMD_STOP``.
> +
> +   * If no more frames are left to be returned at the point of handling
> +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> +     ``V4L2_BUF_FLAG_LAST`` set.
> +
> +   * Any attempts to dequeue more buffers beyond the buffer marked with
> +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> +     :c:func:`VIDIOC_DQBUF`.
> +
> +4. At this point, encoding is paused and the driver will accept, but not
> +   process any newly queued ``OUTPUT`` buffers until the client issues
> +   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
> +
> +* Once the drain sequence is initiated, the client needs to drive it to
> +  completion, as described by the above steps, unless it aborts the process
> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
> +  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
> +  again while the drain sequence is in progress and they will fail with
> +  -EBUSY error code if attempted.
> +
> +* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
> +  state and make the encoder continue encoding, as long as other encoding
> +  conditions are met. Restarting ``OUTPUT`` queue will not affect an
> +  in-progress drain sequence.

The last sentence seems to contradict the "on any queue" part of step 4. What 
happens if the client restarts streaming on the OUTPUT queue while a drain 
sequence is in progress ?

> +* The drivers must also implement :c:func:`VIDIOC_TRY_ENCODER_CMD`, as a
> +  way to let the client query the availability of encoder commands.
> +
> +Reset
> +=====
> +
> +The client may want to request the encoder to reinitialize the encoding,
> +so that the stream produced becomes independent from the stream generated
> +before. Depending on the coded format, that may imply that:
> +
> +* encoded frames produced after the restart must not reference any frames
> +  produced before the stop, e.g. no long term references for H264,
> +
> +* any headers that must be included in a standalone stream must be produced
> +  again, e.g. SPS and PPS for H264.
> +
> +This can be achieved by performing the reset sequence.
> +
> +1. *[optional]* If the client is interested in encoded frames resulting
> +   from already queued source frames, it needs to perform the Drain
> +   sequence. Otherwise, the reset sequence would cause the already
> +   encoded and not dequeued encoded frames to be lost.
> +
> +2. Stop streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMOFF`. This
> +   will return all currently queued ``CAPTURE`` buffers to the client,
> +   without valid frame data.
> +
> +3. *[optional]* Restart streaming on ``OUTPUT`` queue via
> +   :c:func:`VIDIOC_STREAMOFF` followed by :c:func:`VIDIOC_STREAMON` to
> +   drop any source frames enqueued to the encoder before the reset
> +   sequence. This is useful if the client requires the new stream to begin
> +   at specific source frame. Otherwise, the new stream might include
> +   frames encoded from source frames queued before the reset sequence.
> +
> +4. Restart streaming on ``CAPTURE`` queue via :c:func:`VIDIOC_STREAMON` and
> +   continue with regular encoding sequence. The encoded frames produced
> +   into ``CAPTURE`` buffers from now on will contain a standalone stream
> +   that can be decoded without the need for frames encoded before the reset
> +   sequence.
> +
> +Commit points
> +=============
> +
> +Setting formats and allocating buffers triggers changes in the behavior
> +of the driver.
> +
> +1. Setting format on ``CAPTURE`` queue may change the set of formats
> +   supported/advertised on the ``OUTPUT`` queue. In particular, it also
> +   means that ``OUTPUT`` format may be reset and the client must not
> +   rely on the previously set format being preserved.
> +
> +2. Enumerating formats on ``OUTPUT`` queue must only return formats
> +   supported for the ``CAPTURE`` format currently set.
> +
> +3. Setting/changing format on ``OUTPUT`` queue does not change formats

Just "Setting" ?

> +   available on ``CAPTURE`` queue. An attempt to set ``OUTPUT`` format that
> +   is not supported for the currently selected ``CAPTURE`` format must
> +   result in the driver adjusting the requested format to an acceptable
> +   one.
> +
> +4. Enumerating formats on ``CAPTURE`` queue always returns the full set of
> +   supported coded formats, irrespective of the current ``OUTPUT``
> +   format.
> +
> +5. After allocating buffers on the ``CAPTURE`` queue, it is not possible to
> +   change format on it.
> +
> +To summarize, setting formats and allocation must always start with the
> +``CAPTURE`` queue and the ``CAPTURE`` queue is the master that governs the
> +set of supported formats for the ``OUTPUT`` queue.

[snip]

-- 
Regards,

Laurent Pinchart




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-17 13:34   ` Laurent Pinchart
@ 2018-10-18 10:03     ` Tomasz Figa
  2018-10-18 11:22       ` Laurent Pinchart
                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-10-18 10:03 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

Hi Laurent,

On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> Thank you for the patch.

Thanks for your comments! Please see my replies inline.

>
> On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> > Due to complexity of the video decoding process, the V4L2 drivers of
> > stateful decoder hardware require specific sequences of V4L2 API calls
> > to be followed. These include capability enumeration, initialization,
> > decoding, seek, pause, dynamic resolution change, drain and end of
> > stream.
> >
> > Specifics of the above have been discussed during Media Workshops at
> > LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > originated at those events was later implemented by the drivers we already
> > have merged in mainline, such as s5p-mfc or coda.
> >
> > The only thing missing was the real specification included as a part of
> > Linux Media documentation. Fix it now and document the decoder part of
> > the Codec API.
> >
> > Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > ---
> >  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
> >  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
> >  3 files changed, 882 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> >
> > diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst
> > b/Documentation/media/uapi/v4l/dev-decoder.rst new file mode 100644
> > index 000000000000..f55d34d2f860
> > --- /dev/null
> > +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> > @@ -0,0 +1,872 @@
> > +.. -*- coding: utf-8; mode: rst -*-
> > +
> > +.. _decoder:
> > +
> > +****************************************
> > +Memory-to-memory Video Decoder Interface
> > +****************************************
> > +
> > +Input data to a video decoder are buffers containing unprocessed video
> > +stream (e.g. Annex-B H.264/HEVC stream, raw VP8/9 stream). The driver is
> > +expected not to require any additional information from the client to
> > +process these buffers. Output data are raw video frames returned in display
> > +order.
> > +
> > +Performing software parsing, processing etc. of the stream in the driver
> > +in order to support this interface is strongly discouraged. In case such
> > +operations are needed, use of Stateless Video Decoder Interface (in
> > +development) is strongly advised.
> > +
> > +Conventions and notation used in this document
> > +==============================================
> > +
> > +1. The general V4L2 API rules apply if not specified in this document
> > +   otherwise.
> > +
> > +2. The meaning of words “must”, “may”, “should”, etc. is as per RFC
> > +   2119.
> > +
> > +3. All steps not marked “optional” are required.
> > +
> > +4. :c:func:`VIDIOC_G_EXT_CTRLS`, :c:func:`VIDIOC_S_EXT_CTRLS` may be used
> > +   interchangeably with :c:func:`VIDIOC_G_CTRL`, :c:func:`VIDIOC_S_CTRL`,
> > +   unless specified otherwise.
> > +
> > +5. Single-plane API (see spec) and applicable structures may be used
> > +   interchangeably with Multi-plane API, unless specified otherwise,
> > +   depending on driver capabilities and following the general V4L2
> > +   guidelines.
>
> How about also allowing VIDIOC_CREATE_BUFS where VIDIOC_REQBUFS is mentioned ?
>

In my draft of v2, I explicitly described VIDIOC_CREATE_BUFS in any
step mentioning VIDIOC_REQBUFS. Do you think that's fine too?

> > +6. i = [a..b]: sequence of integers from a to b, inclusive, i.e. i =
> > +   [0..2]: i = 0, 1, 2.
> > +
> > +7. For ``OUTPUT`` buffer A, A’ represents a buffer on the ``CAPTURE`` queue
> > +   containing data (decoded frame/stream) that resulted from processing +
> >  buffer A.
> > +
> > +Glossary
> > +========
> > +
> > +CAPTURE
> > +   the destination buffer queue; the queue of buffers containing decoded
> > +   frames; ``V4L2_BUF_TYPE_VIDEO_CAPTURE```` or
> > +   ``V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE``; data are captured from the
> > +   hardware into ``CAPTURE`` buffers
> > +
> > +client
> > +   application client communicating with the driver implementing this API
> > +
> > +coded format
> > +   encoded/compressed video bitstream format (e.g. H.264, VP8, etc.); see
> > +   also: raw format
> > +
> > +coded height
> > +   height for given coded resolution
> > +
> > +coded resolution
> > +   stream resolution in pixels aligned to codec and hardware requirements;
> > +   typically visible resolution rounded up to full macroblocks;
> > +   see also: visible resolution
> > +
> > +coded width
> > +   width for given coded resolution
> > +
> > +decode order
> > +   the order in which frames are decoded; may differ from display order if
> > +   coded format includes a feature of frame reordering; ``OUTPUT`` buffers
> > +   must be queued by the client in decode order
> > +
> > +destination
> > +   data resulting from the decode process; ``CAPTURE``
> > +
> > +display order
> > +   the order in which frames must be displayed; ``CAPTURE`` buffers must be
> > +   returned by the driver in display order
> > +
> > +DPB
> > +   Decoded Picture Buffer; a H.264 term for a buffer that stores a picture
> > +   that is encoded or decoded and available for reference in further
> > +   decode/encode steps.
>
> By "encoded or decoded", do you mean "raw frames to be encoded (in the encoder
> use case) or decoded raw frames (in the decoder use case)" ? I think this
> should be clarified.
>

Actually it's a decoder-specific term, so changed both decoder and
encoder documents to:

DPB
   Decoded Picture Buffer; an H.264 term for a buffer that stores a decoded
   raw frame available for reference in further decoding steps.

Does it sound better now?

> > +EOS
> > +   end of stream
> > +
> > +IDR
> > +   a type of a keyframe in H.264-encoded stream, which clears the list of
> > +   earlier reference frames (DPBs)
> > +
> > +keyframe
> > +   an encoded frame that does not reference frames decoded earlier, i.e.
> > +   can be decoded fully on its own.
> > +
> > +OUTPUT
> > +   the source buffer queue; the queue of buffers containing encoded
> > +   bitstream; ``V4L2_BUF_TYPE_VIDEO_OUTPUT`` or
> > +   ``V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE``; the hardware is fed with data
> > +   from ``OUTPUT`` buffers
> > +
> > +PPS
> > +   Picture Parameter Set; a type of metadata entity in H.264 bitstream
> > +
> > +raw format
> > +   uncompressed format containing raw pixel data (e.g. YUV, RGB formats)
> > +
> > +resume point
> > +   a point in the bitstream from which decoding may start/continue, without
> > +   any previous state/data present, e.g.: a keyframe (VP8/VP9) or +
> > SPS/PPS/IDR sequence (H.264); a resume point is required to start decode +
> >  of a new stream, or to resume decoding after a seek
> > +
> > +source
> > +   data fed to the decoder; ``OUTPUT``
> > +
> > +SPS
> > +   Sequence Parameter Set; a type of metadata entity in H.264 bitstream
> > +
> > +visible height
> > +   height for given visible resolution; display height
> > +
> > +visible resolution
> > +   stream resolution of the visible picture, in pixels, to be used for
> > +   display purposes; must be smaller or equal to coded resolution;
> > +   display resolution
> > +
> > +visible width
> > +   width for given visible resolution; display width
> > +
> > +Querying capabilities
> > +=====================
> > +
> > +1. To enumerate the set of coded formats supported by the driver, the
> > +   client may call :c:func:`VIDIOC_ENUM_FMT` on ``OUTPUT``.
> > +
> > +   * The driver must always return the full set of supported formats,
> > +     irrespective of the format set on the ``CAPTURE``.
> > +
> > +2. To enumerate the set of supported raw formats, the client may call
> > +   :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE``.
> > +
> > +   * The driver must return only the formats supported for the format
> > +     currently active on ``OUTPUT``.
> > +
> > +   * In order to enumerate raw formats supported by a given coded format,
> > +     the client must first set that coded format on ``OUTPUT`` and then
> > +     enumerate the ``CAPTURE`` queue.
>
> Maybe s/enumerate the/enumerate formats on the/ ?
>
> > +3. The client may use :c:func:`VIDIOC_ENUM_FRAMESIZES` to detect supported
> > +   resolutions for a given format, passing desired pixel format in
> > +   :c:type:`v4l2_frmsizeenum` ``pixel_format``.
> > +
> > +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``OUTPUT``
> > +     must include all possible coded resolutions supported by the decoder
> > +     for given coded pixel format.
> > +
> > +   * Values returned by :c:func:`VIDIOC_ENUM_FRAMESIZES` on ``CAPTURE``
> > +     must include all possible frame buffer resolutions supported by the
> > +     decoder for given raw pixel format and coded format currently set on
> > +     ``OUTPUT``.
> > +
> > +    .. note::
> > +
> > +       The client may derive the supported resolution range for a
> > +       combination of coded and raw format by setting width and height of
> > +       ``OUTPUT`` format to 0 and calculating the intersection of
> > +       resolutions returned from calls to :c:func:`VIDIOC_ENUM_FRAMESIZES`
> > +       for the given coded and raw formats.
>
> I'm confused by the note, I'm not sure to understand what you mean.
>

I'm actually going to remove this. This special case of 0 width and
height is not only ugly, but also wouldn't work with decoders that
actually can do scaling, because the scaling ratio range is often
constant, so the supported scaled frame sizes depend on the exact
coded format.

> > +4. Supported profiles and levels for given format, if applicable, may be
> > +   queried using their respective controls via :c:func:`VIDIOC_QUERYCTRL`.
> > +
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported ``OUTPUT`` formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set the coded format on ``OUTPUT`` via :c:func:`VIDIOC_S_FMT`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``pixelformat``
> > +         a coded pixel format
> > +
> > +     ``width``, ``height``
> > +         required only if cannot be parsed from the stream for the given
> > +         coded format; optional otherwise - set to zero to ignore
> > +
> > +     other fields
> > +         follow standard semantics
> > +
> > +   * For coded formats including stream resolution information, if width
> > +     and height are set to non-zero values, the driver will propagate the
> > +     resolution to ``CAPTURE`` and signal a source change event
> > +     instantly.
>
> Maybe s/instantly/immediately before returning from :c:func:`VIDIOC_S_FMT`/ ?
>
> > However, after the decoder is done parsing the
> > +     information embedded in the stream, it will update ``CAPTURE``
>
> s/update/update the/
>
> > +     format with new values and signal a source change event again, if
>
> s/, if/ if/
>
> > +     the values do not match.
> > +
> > +   .. note::
> > +
> > +      Changing ``OUTPUT`` format may change currently set ``CAPTURE``
>
> Do you have a particular dislike for definite articles ? :-) I would have
> written "Changing the ``OUTPUT`` format may change the currently set
> ``CAPTURE`` ...". I won't repeat the comment through the whole review, but
> many places seem to be missing a definite article.

Saving the^Wworld bandwidth one "the " at a time. ;)

Hans also pointed some of those and I should have most of the missing
ones added in my draft of v2. Thanks.

>
> > +      format. The driver will derive a new ``CAPTURE`` format from
> > +      ``OUTPUT`` format being set, including resolution, colorimetry
> > +      parameters, etc. If the client needs a specific ``CAPTURE`` format,
> > +      it must adjust it afterwards.
> > +
> > +3.  *[optional]* Get minimum number of buffers required for ``OUTPUT``
> > +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> > +    use more buffers than minimum required by hardware/format.
> > +
> > +    * **Required fields:**
> > +
> > +      ``id``
> > +          set to ``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``
> > +
> > +    * **Return fields:**
> > +
> > +      ``value``
> > +          required number of ``OUTPUT`` buffers for the currently set
> > +          format
>
> s/required/required minimum/

I made it "the minimum number of [...] buffers required".

>
> > +
> > +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> > +    ``OUTPUT``.
> > +
> > +    * **Required fields:**
> > +
> > +      ``count``
> > +          requested number of buffers to allocate; greater than zero
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +      ``memory``
> > +          follows standard semantics
> > +
> > +      ``sizeimage``
> > +          follows standard semantics; the client is free to choose any
> > +          suitable size, however, it may be subject to change by the
> > +          driver
> > +
> > +    * **Return fields:**
> > +
> > +      ``count``
> > +          actual number of buffers allocated
> > +
> > +    * The driver must adjust count to minimum of required number of
> > +      ``OUTPUT`` buffers for given format and count passed.
>
> Isn't it the maximum, not the minimum ?
>

It's actually neither. All we can generally say here is that the
number will be adjusted and the client must note it.

> > The client must
> > +      check this value after the ioctl returns to get the number of
> > +      buffers allocated.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > +       get minimum number of buffers required by the driver/format,
> > +       and pass the obtained value plus the number of additional
> > +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > +
> > +5.  Start streaming on ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
> > +
> > +6.  This step only applies to coded formats that contain resolution
> > +    information in the stream. Continue queuing/dequeuing bitstream
> > +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> > +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and returning
> > +    each buffer to the client until required metadata to configure the
> > +    ``CAPTURE`` queue are found. This is indicated by the driver sending
> > +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> > +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> > +    requirement to pass enough data for this to occur in the first buffer
> > +    and the driver must be able to process any number.
> > +
> > +    * If data in a buffer that triggers the event is required to decode
> > +      the first frame, the driver must not return it to the client,
> > +      but must retain it for further decoding.
> > +
> > +    * If the client set width and height of ``OUTPUT`` format to 0, calling
> > +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return -EPERM,
> > +      until the driver configures ``CAPTURE`` format according to stream
> > +      metadata.
>
> That's a pretty harsh handling for this condition. What's the rationale for
> returning -EPERM instead of for instance succeeding with width and height set
> to 0 ?

I don't like it, but the error condition must stay for compatibility
reasons as that's what current drivers implement and applications
expect. (Technically current drivers would return -EINVAL, but we
concluded that existing applications don't care about the exact value,
so we can change it to make more sense.)

>
> > +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> > +      the event is signaled, the decoding process will not continue until
> > +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> > +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> > +      command.
> > +
> > +    .. note::
> > +
> > +       No decoded frames are produced during this phase.
> > +
> > +7.  This step only applies to coded formats that contain resolution
> > +    information in the stream.
> > +    Receive and handle ``V4L2_EVENT_SOURCE_CHANGE`` from the driver
> > +    via :c:func:`VIDIOC_DQEVENT`. The driver must send this event once
> > +    enough data is obtained from the stream to allocate ``CAPTURE``
> > +    buffers and to begin producing decoded frames.
>
> Doesn't the last sentence belong to step 6 (where it's already explained to
> some extent) ?
>
> > +
> > +    * **Required fields:**
> > +
> > +      ``type``
> > +          set to ``V4L2_EVENT_SOURCE_CHANGE``
>
> Isn't the type field set by the driver ?
>
> > +    * **Return fields:**
> > +
> > +      ``u.src_change.changes``
> > +          set to ``V4L2_EVENT_SRC_CH_RESOLUTION``
> > +
> > +    * Any client query issued after the driver queues the event must return
> > +      values applying to the just parsed stream, including queue formats,
> > +      selection rectangles and controls.
>
> To align with the wording used so far, I would say that "the driver must"
> return values applying to the just parsed stream.
>
> I think I would also move this to step 6, as it's related to queuing the
> event, not dequeuing it.

As I've rephrased the whole document to be more userspace-oriented,
this step is actually going away. Step 6 will have a note about driver
behavior.

>
> > +8.  Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for the
> > +    destination buffers parsed/decoded from the bitstream.
> > +
> > +    * **Required fields:**
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +    * **Return fields:**
> > +
> > +      ``width``, ``height``
> > +          frame buffer resolution for the decoded frames
> > +
> > +      ``pixelformat``
> > +          pixel format for decoded frames
> > +
> > +      ``num_planes`` (for _MPLANE ``type`` only)
> > +          number of planes for pixelformat
> > +
> > +      ``sizeimage``, ``bytesperline``
> > +          as per standard semantics; matching frame buffer format
> > +
> > +    .. note::
> > +
> > +       The value of ``pixelformat`` may be any pixel format supported and
> > +       must be supported for current stream, based on the information
> > +       parsed from the stream and hardware capabilities. It is suggested
> > +       that driver chooses the preferred/optimal format for given
>
> In compliance with RFC 2119, how about using "Drivers should choose" instead
> of "It is suggested that driver chooses" ?

The whole paragraph became:

       The value of ``pixelformat`` may be any pixel format supported by the
       decoder for the current stream. It is expected that the decoder chooses
       a preferred/optimal format for the default configuration. For example, a
       YUV format may be preferred over an RGB format, if additional conversion
       step would be required.

>
> > +       configuration. For example, a YUV format may be preferred over an
> > +       RGB format, if additional conversion step would be required.
> > +
> > +9.  *[optional]* Enumerate ``CAPTURE`` formats via
> > +    :c:func:`VIDIOC_ENUM_FMT` on ``CAPTURE`` queue. Once the stream
> > +    information is parsed and known, the client may use this ioctl to
> > +    discover which raw formats are supported for given stream and select on
>
> s/select on/select one/

Done.

>
> > +    of them via :c:func:`VIDIOC_S_FMT`.
> > +
> > +    .. note::
> > +
> > +       The driver will return only formats supported for the current stream
> > +       parsed in this initialization sequence, even if more formats may be
> > +       supported by the driver in general.
> > +
> > +       For example, a driver/hardware may support YUV and RGB formats for
> > +       resolutions 1920x1088 and lower, but only YUV for higher
> > +       resolutions (due to hardware limitations). After parsing
> > +       a resolution of 1920x1088 or lower, :c:func:`VIDIOC_ENUM_FMT` may
> > +       return a set of YUV and RGB pixel formats, but after parsing
> > +       resolution higher than 1920x1088, the driver will not return RGB,
> > +       unsupported for this resolution.
> > +
> > +       However, subsequent resolution change event triggered after
> > +       discovering a resolution change within the same stream may switch
> > +       the stream into a lower resolution and :c:func:`VIDIOC_ENUM_FMT`
> > +       would return RGB formats again in that case.
> > +
> > +10.  *[optional]* Choose a different ``CAPTURE`` format than suggested via
> > +     :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for the
> > +     client to choose a different format than selected/suggested by the
>
> And here, "A client may choose" ?
>
> > +     driver in :c:func:`VIDIOC_G_FMT`.
> > +
> > +     * **Required fields:**
> > +
> > +       ``type``
> > +           a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +       ``pixelformat``
> > +           a raw pixel format
> > +
> > +     .. note::
> > +
> > +        Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently available
> > +        formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful to
> > +        find out a set of allowed formats for given configuration, but not
> > +        required, if the client can accept the defaults.
>
> s/required/required,/

That would become "[...]but not required,, if the client[...]". Is
that your suggestion? ;)

>
> > +
> > +11. *[optional]* Acquire visible resolution via
> > +    :c:func:`VIDIOC_G_SELECTION`.
> > +
> > +    * **Required fields:**
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +      ``target``
> > +          set to ``V4L2_SEL_TGT_COMPOSE``
> > +
> > +    * **Return fields:**
> > +
> > +      ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +          visible rectangle; this must fit within frame buffer resolution
> > +          returned by :c:func:`VIDIOC_G_FMT`.
> > +
> > +    * The driver must expose following selection targets on ``CAPTURE``:
> > +
> > +      ``V4L2_SEL_TGT_CROP_BOUNDS``
> > +          corresponds to coded resolution of the stream
> > +
> > +      ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +          a rectangle covering the part of the frame buffer that contains
> > +          meaningful picture data (visible area); width and height will be
> > +          equal to visible resolution of the stream
> > +
> > +      ``V4L2_SEL_TGT_CROP``
> > +          rectangle within coded resolution to be output to ``CAPTURE``;
> > +          defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``; read-only on hardware
> > +          without additional compose/scaling capabilities
> > +
> > +      ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> > +          maximum rectangle within ``CAPTURE`` buffer, which the cropped
> > +          frame can be output into; equal to ``V4L2_SEL_TGT_CROP``, if the
> > +          hardware does not support compose/scaling
> > +
> > +      ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> > +          equal to ``V4L2_SEL_TGT_CROP``
> > +
> > +      ``V4L2_SEL_TGT_COMPOSE``
> > +          rectangle inside ``OUTPUT`` buffer into which the cropped frame
>
> s/OUTPUT/CAPTURE/ ?
>
> > +          is output; defaults to ``V4L2_SEL_TGT_COMPOSE_DEFAULT``;
>
> and "is captured" or "is written" ?
>
> > +          read-only on hardware without additional compose/scaling
> > +          capabilities
> > +
> > +      ``V4L2_SEL_TGT_COMPOSE_PADDED``
> > +          rectangle inside ``OUTPUT`` buffer which is overwritten by the
>
> Here too ?
>
> > +          hardware; equal to ``V4L2_SEL_TGT_COMPOSE``, if the hardware
>
> s/, if/ if/

Ack +3

>
> > +          does not write padding pixels
> > +
> > +12. *[optional]* Get minimum number of buffers required for ``CAPTURE``
> > +    queue via :c:func:`VIDIOC_G_CTRL`. This is useful if client intends to
> > +    use more buffers than minimum required by hardware/format.
> > +
> > +    * **Required fields:**
> > +
> > +      ``id``
> > +          set to ``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``
> > +
> > +    * **Return fields:**
> > +
> > +      ``value``
> > +          minimum number of buffers required to decode the stream parsed in
> > +          this initialization sequence.
> > +
> > +    .. note::
> > +
> > +       Note that the minimum number of buffers must be at least the number
> > +       required to successfully decode the current stream. This may for
> > +       example be the required DPB size for an H.264 stream given the
> > +       parsed stream configuration (resolution, level).
> > +
> > +13. Allocate destination (raw format) buffers via :c:func:`VIDIOC_REQBUFS`
> > +    on the ``CAPTURE`` queue.
> > +
> > +    * **Required fields:**
> > +
> > +      ``count``
> > +          requested number of buffers to allocate; greater than zero
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +      ``memory``
> > +          follows standard semantics
> > +
> > +    * **Return fields:**
> > +
> > +      ``count``
> > +          adjusted to allocated number of buffers
> > +
> > +    * The driver must adjust count to minimum of required number of
>
> s/minimum/maximum/ ?
>
> Should we also mentioned that if count > minimum, the driver may additionally
> limit the number of buffers based on internal limits (such as maximum memory
> consumption) ?

I made it less specific:

    * The count will be adjusted by the decoder to match the stream and hardware
      requirements. The client must check the final value after the ioctl
      returns to get the number of buffers allocated.

>
> > +      destination buffers for given format and stream configuration and the
> > +      count passed. The client must check this value after the ioctl
> > +      returns to get the number of buffers allocated.
> > +
> > +    .. note::
> > +
> > +       To allocate more than minimum number of buffers (for pipeline
> > +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_CAPTURE``) to
> > +       get minimum number of buffers required, and pass the obtained value
> > +       plus the number of additional buffers needed in count to
> > +       :c:func:`VIDIOC_REQBUFS`.
> > +
> > +14. Call :c:func:`VIDIOC_STREAMON` to initiate decoding frames.
> > +
> > +Decoding
> > +========
> > +
> > +This state is reached after a successful initialization sequence. In this
> > +state, client queues and dequeues buffers to both queues via
> > +:c:func:`VIDIOC_QBUF` and :c:func:`VIDIOC_DQBUF`, following standard
> > +semantics.
> > +
> > +Both queues operate independently, following standard behavior of V4L2
> > +buffer queues and memory-to-memory devices. In addition, the order of
> > +decoded frames dequeued from ``CAPTURE`` queue may differ from the order of
> > +queuing coded frames to ``OUTPUT`` queue, due to properties of selected
> > +coded format, e.g. frame reordering. The client must not assume any direct
> > +relationship between ``CAPTURE`` and ``OUTPUT`` buffers, other than
> > +reported by :c:type:`v4l2_buffer` ``timestamp`` field.
> > +
> > +The contents of source ``OUTPUT`` buffers depend on active coded pixel
> > +format and might be affected by codec-specific extended controls, as stated
>
> s/might/may/
>
> > +in documentation of each format individually.
> > +
> > +The client must not assume any direct relationship between ``CAPTURE``
> > +and ``OUTPUT`` buffers and any specific timing of buffers becoming
> > +available to dequeue. Specifically:
> > +
> > +* a buffer queued to ``OUTPUT`` may result in no buffers being produced
> > +  on ``CAPTURE`` (e.g. if it does not contain encoded data, or if only
> > +  metadata syntax structures are present in it),
> > +
> > +* a buffer queued to ``OUTPUT`` may result in more than 1 buffer produced
> > +  on ``CAPTURE`` (if the encoded data contained more than one frame, or if
> > +  returning a decoded frame allowed the driver to return a frame that
> > +  preceded it in decode, but succeeded it in display order),
> > +
> > +* a buffer queued to ``OUTPUT`` may result in a buffer being produced on
> > +  ``CAPTURE`` later into decode process, and/or after processing further
> > +  ``OUTPUT`` buffers, or be returned out of order, e.g. if display
> > +  reordering is used,
> > +
> > +* buffers may become available on the ``CAPTURE`` queue without additional
>
> s/buffers/Buffers/
>

I don't think the items should be capitalized here.

> > +  buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of
> > +  ``OUTPUT`` buffers being queued in the past and decoding result of which
> > +  being available only at later time, due to specifics of the decoding
> > +  process.
>
> I understand what you mean, but the wording is weird to my eyes. How about
>
> * Buffers may become available on the ``CAPTURE`` queue without additional
> buffers queued to ``OUTPUT`` (e.g. during drain or ``EOS``), because of
> ``OUTPUT`` buffers queued in the past whose decoding results are only
> available at later time, due to specifics of the decoding process.

Done, thanks.

>
> > +Seek
> > +====
> > +
> > +Seek is controlled by the ``OUTPUT`` queue, as it is the source of
> > +bitstream data. ``CAPTURE`` queue remains unchanged/unaffected.
>
> I assume that a seek may result in a source resolution change event, in which
> case the capture queue will be affected. How about stating here that
> controlling seek doesn't require any specific operation on the capture queue,
> but that the capture queue may be affected as per normal decoder operation ?
> We may also want to mention the event as an example.

Done. I've also added a general section about decoder-initialized
sequences in the Decoding section.

>
> > +1. Stop the ``OUTPUT`` queue to begin the seek sequence via
> > +   :c:func:`VIDIOC_STREAMOFF`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must drop all the pending ``OUTPUT`` buffers and they are
> > +     treated as returned to the client (following standard semantics).
> > +
> > +2. Restart the ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +   * The driver must be put in a state after seek and be ready to
>
> What do you mean by "a state after seek" ?
>

   * The decoder will start accepting new source bitstream buffers after the
     call returns.

> > +     accept new source bitstream buffers.
> > +
> > +3. Start queuing buffers to ``OUTPUT`` queue containing stream data after
> > +   the seek until a suitable resume point is found.
> > +
> > +   .. note::
> > +
> > +      There is no requirement to begin queuing stream starting exactly from
>
> s/stream/buffers/ ?

Perhaps "stream data"? The buffers don't have a resume point, the stream does.

>
> > +      a resume point (e.g. SPS or a keyframe). The driver must handle any
> > +      data queued and must keep processing the queued buffers until it
> > +      finds a suitable resume point. While looking for a resume point, the
> > +      driver processes ``OUTPUT`` buffers and returns them to the client
> > +      without producing any decoded frames.
> > +
> > +      For hardware known to be mishandling seeks to a non-resume point,
> > +      e.g. by returning corrupted decoded frames, the driver must be able
> > +      to handle such seeks without a crash or any fatal decode error.
>
> This should be true for any hardware, there should never be any crash or fatal
> decode error. I'd write it as
>
> Some hardware is known to mishandle seeks to a non-resume point. Such an
> operation may result in an unspecified number of corrupted decoded frames
> being made available on ``CAPTURE``. Drivers must ensure that no fatal
> decoding errors or crashes occur, and implement any necessary handling and
> work-arounds for hardware issues related to seek operations.
>

Done.

> > +4. After a resume point is found, the driver will start returning
> > +   ``CAPTURE`` buffers with decoded frames.
> > +
> > +   * There is no precise specification for ``CAPTURE`` queue of when it
> > +     will start producing buffers containing decoded data from buffers
> > +     queued after the seek, as it operates independently
> > +     from ``OUTPUT`` queue.
> > +
> > +     * The driver is allowed to and may return a number of remaining
>
> s/is allowed to and may/may/
>
> > +       ``CAPTURE`` buffers containing decoded frames from before the seek
> > +       after the seek sequence (STREAMOFF-STREAMON) is performed.
>
> Shouldn't all these buffers be returned when STREAMOFF is called on the OUTPUT
> side ?

The queues are independent, so STREAMOFF on OUTPUT would only return
the OUTPUT buffers.

That's why there is the note suggesting that the application may also
stop streaming on CAPTURE to avoid stale frames being returned.

>
> > +     * The driver is also allowed to and may not return all decoded frames
>
> s/is also allowed to and may not return/may also not return/
>
> > +       queued but not decode before the seek sequence was initiated. For
>
> s/not decode/not decoded/
>
> > +       example, given an ``OUTPUT`` queue sequence: QBUF(A), QBUF(B),
> > +       STREAMOFF(OUT), STREAMON(OUT), QBUF(G), QBUF(H), any of the
> > +       following results on the ``CAPTURE`` queue is allowed: {A’, B’, G’,
> > +       H’}, {A’, G’, H’}, {G’, H’}.
>
> Related to the previous point, shouldn't this be moved to step 1 ?

I've made it a general warning after the whole sequence.

>
> > +   .. note::
> > +
> > +      To achieve instantaneous seek, the client may restart streaming on
> > +      ``CAPTURE`` queue to discard decoded, but not yet dequeued buffers.
> > +
> > +Pause
> > +=====
> > +
> > +In order to pause, the client should just cease queuing buffers onto the
> > +``OUTPUT`` queue. This is different from the general V4L2 API definition of
> > +pause, which involves calling :c:func:`VIDIOC_STREAMOFF` on the queue.
> > +Without source bitstream data, there is no data to process and the
> > hardware +remains idle.
> > +
> > +Conversely, using :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue indicates
> > +a seek, which
> > +
> > +1. drops all ``OUTPUT`` buffers in flight and
> > +2. after a subsequent :c:func:`VIDIOC_STREAMON`, will look for and only
> > +   continue from a resume point.
> > +
> > +This is usually undesirable for pause. The STREAMOFF-STREAMON sequence is
> > +intended for seeking.
> > +
> > +Similarly, ``CAPTURE`` queue should remain streaming as well, as the
> > +STREAMOFF-STREAMON sequence on it is intended solely for changing buffer
> > +sets.
>
> And also to drop decoded buffers for instant seek ?
>

I've dropped the Pause section completely. It doesn't provide any
useful information IMHO and only doubles with the general semantics of
mem2mem devices.

> > +Dynamic resolution change
> > +=========================
> > +
> > +A video decoder implementing this interface must support dynamic resolution
> > +change, for streams, which include resolution metadata in the bitstream.
>
> s/for streams, which/for streams that/
>
> > +When the decoder encounters a resolution change in the stream, the dynamic
> > +resolution change sequence is started.
> > +
> > +1.  After encountering a resolution change in the stream, the driver must
> > +    first process and decode all remaining buffers from before the
> > +    resolution change point.
> > +
> > +2.  After all buffers containing decoded frames from before the resolution
> > +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> > +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> > +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > +
> > +    * The last buffer from before the change must be marked with
> > +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in the
> > +      drain sequence. The last buffer might be empty (with
> > +      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by the
> > +      client, since it does not contain any decoded frame.
> > +
> > +    * Any client query issued after the driver queues the event must return
> > +      values applying to the stream after the resolution change, including
> > +      queue formats, selection rectangles and controls.
> > +
> > +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events and
> > +      the event is signaled, the decoding process will not continue until
> > +      it is acknowledged by either (re-)starting streaming on ``CAPTURE``,
> > +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> > +      command.
>
> This usage of V4L2_DEC_CMD_START isn't aligned with the documentation of the
> command. I'm not opposed to this, but I think the use cases of decoder
> commands for codecs should be explained in the VIDIOC_DECODER_CMD
> documentation. What bothers me in particular is usage of V4L2_DEC_CMD_START to
> restart the decoder, while no V4L2_DEC_CMD_STOP has been issued. Should we add
> a section that details the decoder state machine with the implicit and
> explicit ways in which it is started and stopped ?

Yes, we should probably extend the VIDIOC_DECODER_CMD documentation.

As for diagrams, they would indeed be nice to have, but maybe we could
add them in a follow up patch?

>
> I would also reference step 7 here.
>
> > +    .. note::
> > +
> > +       Any attempts to dequeue more buffers beyond the buffer marked
> > +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +       :c:func:`VIDIOC_DQBUF`.
> > +
> > +3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the new
> > +    format information. This is identical to calling :c:func:`VIDIOC_G_FMT`
> > +    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in the initialization sequence
> > +    and should be handled similarly.
>
> As the source resolution change event is mentioned in multiple places, how
> about extracting the related ioctls sequence to a specific section, and
> referencing it where needed (at least from the initialization sequence and
> here) ?

I made the text here refer to the Initialization sequence.

>
> > +    .. note::
> > +
> > +       It is allowed for the driver not to support the same pixel format as
>
> "Drivers may not support ..."
>
> > +       previously used (before the resolution change) for the new
> > +       resolution. The driver must select a default supported pixel format,
> > +       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the client
> > +       must take note of it.
> > +
> > +4.  The client acquires visible resolution as in initialization sequence.
> > +
> > +5.  *[optional]* The client is allowed to enumerate available formats and
>
> s/is allowed to/may/
>
> > +    select a different one than currently chosen (returned via
> > +    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +6.  *[optional]* The client acquires minimum number of buffers as in
> > +    initialization sequence.
> > +
> > +7.  If all the following conditions are met, the client may resume the
> > +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> > +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the drain
> > +    sequence:
> > +
> > +    * ``sizeimage`` of new format is less than or equal to the size of
> > +      currently allocated buffers,
> > +
> > +    * the number of buffers currently allocated is greater than or equal to
> > +      the minimum number of buffers acquired in step 6.
> > +
> > +    In such case, the remaining steps do not apply.
> > +
> > +    However, if the client intends to change the buffer set, to lower
> > +    memory usage or for any other reasons, it may be achieved by following
> > +    the steps below.
> > +
> > +8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue,
>
> This is optional, isn't it ?
>

I wouldn't call it optional, since it depends on what the client does
and what the decoder supports. That's why the point above just states
that the remaining steps do not apply.

Also added a note:

       To fulfill those requirements, the client may attempt to use
       :c:func:`VIDIOC_CREATE_BUFS` to add more buffers. However, due to
       hardware limitations, the decoder may not support adding buffers at this
       point and the client must be able to handle a failure using the steps
       below.

> > the
> > +    client must call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE`` queue.
> > +    The ``OUTPUT`` queue must remain streaming (calling STREAMOFF on it
>
> :c:func:`VIDIOC_STREAMOFF`
>
> > +    would trigger a seek).
> > +
> > +9.  The client frees the buffers on the ``CAPTURE`` queue using
> > +    :c:func:`VIDIOC_REQBUFS`.
> > +
> > +    * **Required fields:**
> > +
> > +      ``count``
> > +          set to 0
> > +
> > +      ``type``
> > +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +      ``memory``
> > +          follows standard semantics
> > +
> > +10. The client allocates a new set of buffers for the ``CAPTURE`` queue via
> > +    :c:func:`VIDIOC_REQBUFS`. This is identical to a corresponding step in
> > +    the initialization sequence.
> > +
> > +11. The client resumes decoding by issuing :c:func:`VIDIOC_STREAMON` on the
> > +    ``CAPTURE`` queue.
> > +
> > +During the resolution change sequence, the ``OUTPUT`` queue must remain
> > +streaming. Calling :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue would
> > +initiate a seek.
> > +
> > +The ``OUTPUT`` queue operates separately from the ``CAPTURE`` queue for the
> > +duration of the entire resolution change sequence. It is allowed (and
> > +recommended for best performance and simplicity) for the client to keep
>
> "The client should (for best performance and simplicity) keep ..."
>
> > +queuing/dequeuing buffers from/to ``OUTPUT`` queue even while processing
>
> s/from\to/to\/from/
>
> > +this sequence.
> > +
> > +.. note::
> > +
> > +   It is also possible for this sequence to be triggered without a change
>
> "This sequence may be triggered ..."
>
> > +   in coded resolution, if a different number of ``CAPTURE`` buffers is
> > +   required in order to continue decoding the stream or the visible
> > +   resolution changes.
> > +
> > +Drain
> > +=====
> > +
> > +To ensure that all queued ``OUTPUT`` buffers have been processed and
> > +related ``CAPTURE`` buffers output to the client, the following drain
> > +sequence may be followed. After the drain sequence is complete, the client
> > +has received all decoded frames for all ``OUTPUT`` buffers queued before
> > +the sequence was started.
> > +
> > +1. Begin drain by issuing :c:func:`VIDIOC_DECODER_CMD`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``cmd``
> > +         set to ``V4L2_DEC_CMD_STOP``
> > +
> > +     ``flags``
> > +         set to 0
> > +
> > +     ``pts``
> > +         set to 0
> > +
> > +2. The driver must process and decode as normal all ``OUTPUT`` buffers
> > +   queued by the client before the :c:func:`VIDIOC_DECODER_CMD` was issued.
> > +   Any operations triggered as a result of processing these buffers
> > +   (including the initialization and resolution change sequences) must be
> > +   processed as normal by both the driver and the client before proceeding
> > +   with the drain sequence.
> > +
> > +3. Once all ``OUTPUT`` buffers queued before ``V4L2_DEC_CMD_STOP`` are
> > +   processed:
> > +
> > +   * If the ``CAPTURE`` queue is streaming, once all decoded frames (if
> > +     any) are ready to be dequeued on the ``CAPTURE`` queue, the driver
> > +     must send a ``V4L2_EVENT_EOS``.
>
> s/\./event./
>
> Is the event sent on the OUTPUT or CAPTURE queue ? I assume the latter, should
> it be explicitly documented ?
>

AFAICS, there is no queue type indication in the v4l2_event struct.

In any case, I've removed this event, because existing drivers don't
implement it for the drain sequence and it also makes it more
consistent, since events would be only signaled for decoder-initiated
sequences. It would also allow distinguishing between an EOS mark in
the stream (event signaled) or end of a drain sequence (no event).

> > The driver must also set
> > +     ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > +     buffer on the ``CAPTURE`` queue containing the last frame (if any)
> > +     produced as a result of processing the ``OUTPUT`` buffers queued
> > +     before ``V4L2_DEC_CMD_STOP``. If no more frames are left to be
> > +     returned at the point of handling ``V4L2_DEC_CMD_STOP``, the driver
> > +     must return an empty buffer (with :c:type:`v4l2_buffer`
> > +     ``bytesused`` = 0) as the last buffer with ``V4L2_BUF_FLAG_LAST`` set
> > +     instead. Any attempts to dequeue more buffers beyond the buffer marked
> > +     with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > +     :c:func:`VIDIOC_DQBUF`.
> > +
> > +   * If the ``CAPTURE`` queue is NOT streaming, no action is necessary for
> > +     ``CAPTURE`` queue and the driver must send a ``V4L2_EVENT_EOS``
> > +     immediately after all ``OUTPUT`` buffers in question have been
> > +     processed.
>
> What is the use case for this ? Can't we just return an error if decoder isn't
> streaming ?
>

Actually this is wrong. We want the queued OUTPUT buffers to be
processed and decoded, so if the CAPTURE queue is not yet set up
(initialization sequence not completed yet), handling the
initialization sequence first will be needed as a part of the drain
sequence. I've updated the document with that.

> > +4. At this point, decoding is paused and the driver will accept, but not
> > +   process any newly queued ``OUTPUT`` buffers until the client issues
> > +   ``V4L2_DEC_CMD_START`` or restarts streaming on any queue.
> > +
> > +* Once the drain sequence is initiated, the client needs to drive it to
> > +  completion, as described by the above steps, unless it aborts the process
> > +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue.  The client
> > +  is not allowed to issue ``V4L2_DEC_CMD_START`` or ``V4L2_DEC_CMD_STOP``
> > +  again while the drain sequence is in progress and they will fail with
> > +  -EBUSY error code if attempted.
>
> While this seems OK to me, I think drivers will need help to implement all the
> corner cases correctly without race conditions.

We went through the possible list of corner cases and concluded that
there is no use in handling them, especially considering how much they
would complicate both the userspace and the drivers. Not even
mentioning some hardware, like s5p-mfc, which actually has a dedicated
flush operation, that needs to complete before the decoder can switch
back to normal mode.

>
> > +* Restarting streaming on ``OUTPUT`` queue will implicitly end the paused
> > +  state and reinitialize the decoder (similarly to the seek sequence).
> > +  Restarting ``CAPTURE`` queue will not affect an in-progress drain
> > +  sequence.
> > +
> > +* The drivers must also implement :c:func:`VIDIOC_TRY_DECODER_CMD`, as a
> > +  way to let the client query the availability of decoder commands.
> > +
> > +End of stream
> > +=============
> > +
> > +If the decoder encounters an end of stream marking in the stream, the
> > +driver must send a ``V4L2_EVENT_EOS`` event
>
> On which queue ?
>

Hmm?

> > to the client after all frames
> > +are decoded and ready to be dequeued on the ``CAPTURE`` queue, with the
> > +:c:type:`v4l2_buffer` ``flags`` set to ``V4L2_BUF_FLAG_LAST``. This
> > +behavior is identical to the drain sequence triggered by the client via
> > +``V4L2_DEC_CMD_STOP``.
> > +
> > +Commit points
> > +=============
> > +
> > +Setting formats and allocating buffers triggers changes in the behavior
>
> s/triggers/trigger/
>
> > +of the driver.
> > +
> > +1. Setting format on ``OUTPUT`` queue may change the set of formats
> > +   supported/advertised on the ``CAPTURE`` queue. In particular, it also
> > +   means that ``CAPTURE`` format may be reset and the client must not
> > +   rely on the previously set format being preserved.
> > +
> > +2. Enumerating formats on ``CAPTURE`` queue must only return formats
> > +   supported for the ``OUTPUT`` format currently set.
> > +
> > +3. Setting/changing format on ``CAPTURE`` queue does not change formats
>
> Why not just "Setting format" ?
>
> > +   available on ``OUTPUT`` queue. An attempt to set ``CAPTURE`` format that
> > +   is not supported for the currently selected ``OUTPUT`` format must
> > +   result in the driver adjusting the requested format to an acceptable
> > +   one.
> > +
> > +4. Enumerating formats on ``OUTPUT`` queue always returns the full set of
> > +   supported coded formats, irrespective of the current ``CAPTURE``
> > +   format.
> > +
> > +5. After allocating buffers on the ``OUTPUT`` queue, it is not possible to
> > +   change format on it.
>
> I'd phrase this as
>
> "While buffers are allocated on the ``OUTPUT`` queue, clients must not change
> the format on the queue. Drivers must return <error code> for any such format
> change attempt."

Done, thanks.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-18 10:03     ` Tomasz Figa
@ 2018-10-18 11:22       ` Laurent Pinchart
  2018-10-20  8:52         ` Tomasz Figa
  2018-10-20 10:24       ` Tomasz Figa
  2018-10-20 15:39       ` Tomasz Figa
  2 siblings, 1 reply; 62+ messages in thread
From: Laurent Pinchart @ 2018-10-18 11:22 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

Hi Tomasz,

I've stripped out all the parts on which I have no specific comment or just 
agree with your proposal. Please see below for a few additional remarks.

On Thursday, 18 October 2018 13:03:33 EEST Tomasz Figa wrote:
> On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart wrote:
> > On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> >> Due to complexity of the video decoding process, the V4L2 drivers of
> >> stateful decoder hardware require specific sequences of V4L2 API calls
> >> to be followed. These include capability enumeration, initialization,
> >> decoding, seek, pause, dynamic resolution change, drain and end of
> >> stream.
> >> 
> >> Specifics of the above have been discussed during Media Workshops at
> >> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> >> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> >> originated at those events was later implemented by the drivers we
> >> already have merged in mainline, such as s5p-mfc or coda.
> >> 
> >> The only thing missing was the real specification included as a part of
> >> Linux Media documentation. Fix it now and document the decoder part of
> >> the Codec API.
> >> 
> >> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> >> ---
> >> 
> >>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
> >>  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
> >>  3 files changed, 882 insertions(+), 1 deletion(-)
> >>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> >> 
> >> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst
> >> b/Documentation/media/uapi/v4l/dev-decoder.rst new file mode 100644
> >> index 000000000000..f55d34d2f860
> >> --- /dev/null
> >> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> >> @@ -0,0 +1,872 @@

[snip]

> >> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> >> +    ``OUTPUT``.
> >> +
> >> +    * **Required fields:**
> >> +
> >> +      ``count``
> >> +          requested number of buffers to allocate; greater than zero
> >> +
> >> +      ``type``
> >> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> >> +
> >> +      ``memory``
> >> +          follows standard semantics
> >> +
> >> +      ``sizeimage``
> >> +          follows standard semantics; the client is free to choose any
> >> +          suitable size, however, it may be subject to change by the
> >> +          driver
> >> +
> >> +    * **Return fields:**
> >> +
> >> +      ``count``
> >> +          actual number of buffers allocated
> >> +
> >> +    * The driver must adjust count to minimum of required number of
> >> +      ``OUTPUT`` buffers for given format and count passed.
> > 
> > Isn't it the maximum, not the minimum ?
> 
> It's actually neither. All we can generally say here is that the
> number will be adjusted and the client must note it.

I expect it to be clamp(requested count, driver minimum, driver maximum). I'm 
not sure it's worth capturing this in the document though, but we could say

"The driver must clam count to the minimum and maximum number of required 
``OUTPUT`` buffers for the given format ."

> >> The client must
> >> +      check this value after the ioctl returns to get the number of
> >> +      buffers allocated.
> >> +
> >> +    .. note::
> >> +
> >> +       To allocate more than minimum number of buffers (for pipeline
> >> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> >> +       get minimum number of buffers required by the driver/format,
> >> +       and pass the obtained value plus the number of additional
> >> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> >> +
> >> +5.  Start streaming on ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
> >> +
> >> +6.  This step only applies to coded formats that contain resolution
> >> +    information in the stream. Continue queuing/dequeuing bitstream
> >> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> >> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and
> >> returning
> >> +    each buffer to the client until required metadata to configure the
> >> +    ``CAPTURE`` queue are found. This is indicated by the driver
> >> sending
> >> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> >> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> >> +    requirement to pass enough data for this to occur in the first
> >> buffer
> >> +    and the driver must be able to process any number.
> >> +
> >> +    * If data in a buffer that triggers the event is required to decode
> >> +      the first frame, the driver must not return it to the client,
> >> +      but must retain it for further decoding.
> >> +
> >> +    * If the client set width and height of ``OUTPUT`` format to 0,
> >> calling
> >> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return
> >> -EPERM,
> >> +      until the driver configures ``CAPTURE`` format according to stream
> >> +      metadata.
> > 
> > That's a pretty harsh handling for this condition. What's the rationale
> > for returning -EPERM instead of for instance succeeding with width and
> > height set to 0 ?
> 
> I don't like it, but the error condition must stay for compatibility
> reasons as that's what current drivers implement and applications
> expect. (Technically current drivers would return -EINVAL, but we
> concluded that existing applications don't care about the exact value,
> so we can change it to make more sense.)

Fair enough :-/ A bit of a shame though. Should we try to use an error code 
that would have less chance of being confused with an actual permission 
problem ? -EILSEQ could be an option for "illegal sequence" of operations, but 
better options could exist.

> >> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events
> >> and
> >> +      the event is signaled, the decoding process will not continue
> >> until
> >> +      it is acknowledged by either (re-)starting streaming on
> >> ``CAPTURE``,
> >> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> >> +      command.
> >> +
> >> +    .. note::
> >> +
> >> +       No decoded frames are produced during this phase.
> >> +

[snip]

> >> +8.  Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for
> >> the +    destination buffers parsed/decoded from the bitstream.
> >> +
> >> +    * **Required fields:**
> >> +
> >> +      ``type``
> >> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> >> +
> >> +    * **Return fields:**
> >> +
> >> +      ``width``, ``height``
> >> +          frame buffer resolution for the decoded frames
> >> +
> >> +      ``pixelformat``
> >> +          pixel format for decoded frames
> >> +
> >> +      ``num_planes`` (for _MPLANE ``type`` only)
> >> +          number of planes for pixelformat
> >> +
> >> +      ``sizeimage``, ``bytesperline``
> >> +          as per standard semantics; matching frame buffer format
> >> +
> >> +    .. note::
> >> +
> >> +       The value of ``pixelformat`` may be any pixel format supported
> >> and
> >> +       must be supported for current stream, based on the information
> >> +       parsed from the stream and hardware capabilities. It is
> >> suggested
> >> +       that driver chooses the preferred/optimal format for given
> > 
> > In compliance with RFC 2119, how about using "Drivers should choose"
> > instead of "It is suggested that driver chooses" ?
> 
> The whole paragraph became:
> 
>        The value of ``pixelformat`` may be any pixel format supported by the
> decoder for the current stream. It is expected that the decoder chooses a
> preferred/optimal format for the default configuration. For example, a YUV
> format may be preferred over an RGB format, if additional conversion step
> would be required.

How about using "should" instead of "it is expected that" ?

[snip]

> >> +10.  *[optional]* Choose a different ``CAPTURE`` format than suggested
> >> via
> >> +     :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for
> >> the
> >> +     client to choose a different format than selected/suggested by the
> > 
> > And here, "A client may choose" ?
> > 
> >> +     driver in :c:func:`VIDIOC_G_FMT`.
> >> +
> >> +     * **Required fields:**
> >> +
> >> +       ``type``
> >> +           a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> >> +
> >> +       ``pixelformat``
> >> +           a raw pixel format
> >> +
> >> +     .. note::
> >> +
> >> +        Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently
> >> available
> >> +        formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful
> >> to
> >> +        find out a set of allowed formats for given configuration, but
> >> not
> >> +        required, if the client can accept the defaults.
> > 
> > s/required/required,/
> 
> That would become "[...]but not required,, if the client[...]". Is
> that your suggestion? ;)

Oops, the other way around of course :-)

[snip]

> >> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data
> >> after
> >> +   the seek until a suitable resume point is found.
> >> +
> >> +   .. note::
> >> +
> >> +      There is no requirement to begin queuing stream starting exactly
> >> from
> > 
> > s/stream/buffers/ ?
> 
> Perhaps "stream data"? The buffers don't have a resume point, the stream
> does.

Maybe "coded data" ?

> >> +      a resume point (e.g. SPS or a keyframe). The driver must handle
> >> any
> >> +      data queued and must keep processing the queued buffers until it
> >> +      finds a suitable resume point. While looking for a resume point,
> >> the
> >> +      driver processes ``OUTPUT`` buffers and returns them to the
> >> client
> >> +      without producing any decoded frames.
> >> +
> >> +      For hardware known to be mishandling seeks to a non-resume point,
> >> +      e.g. by returning corrupted decoded frames, the driver must be
> >> able
> >> +      to handle such seeks without a crash or any fatal decode error.
> > 
> > This should be true for any hardware, there should never be any crash or
> > fatal decode error. I'd write it as
> > 
> > Some hardware is known to mishandle seeks to a non-resume point. Such an
> > operation may result in an unspecified number of corrupted decoded frames
> > being made available on ``CAPTURE``. Drivers must ensure that no fatal
> > decoding errors or crashes occur, and implement any necessary handling and
> > work-arounds for hardware issues related to seek operations.
> 
> Done.

[snip]

> >> +2.  After all buffers containing decoded frames from before the
> >> resolution
> >> +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> >> +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> >> +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> >> +
> >> +    * The last buffer from before the change must be marked with
> >> +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in
> >> the +      drain sequence. The last buffer might be empty (with
> >> +      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by
> >> the
> >> +      client, since it does not contain any decoded frame.
> >> +
> >> +    * Any client query issued after the driver queues the event must
> >> return
> >> +      values applying to the stream after the resolution change,
> >> including
> >> +      queue formats, selection rectangles and controls.
> >> +
> >> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events
> >> and
> >> +      the event is signaled, the decoding process will not continue
> >> until
> >> +      it is acknowledged by either (re-)starting streaming on
> >> ``CAPTURE``,
> >> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> >> +      command.
> > 
> > This usage of V4L2_DEC_CMD_START isn't aligned with the documentation of
> > the command. I'm not opposed to this, but I think the use cases of
> > decoder commands for codecs should be explained in the VIDIOC_DECODER_CMD
> > documentation. What bothers me in particular is usage of
> > V4L2_DEC_CMD_START to restart the decoder, while no V4L2_DEC_CMD_STOP has
> > been issued. Should we add a section that details the decoder state
> > machine with the implicit and explicit ways in which it is started and
> > stopped ?
> 
> Yes, we should probably extend the VIDIOC_DECODER_CMD documentation.
> 
> As for diagrams, they would indeed be nice to have, but maybe we could
> add them in a follow up patch?

That's another way to say it won't happen, right ? ;-) I'm OK with that, but I 
think we should still clarify that the source change generates an implicit 
V4L2_DEC_CMD_STOP.

> > I would also reference step 7 here.
> > 
> >> +    .. note::
> >> +
> >> +       Any attempts to dequeue more buffers beyond the buffer marked
> >> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> >> +       :c:func:`VIDIOC_DQBUF`.
> >> +
> >> +3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the
> >> new
> >> +    format information. This is identical to calling
> >> :c:func:`VIDIOC_G_FMT` +    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in
> >> the initialization sequence +    and should be handled similarly.
> > 
> > As the source resolution change event is mentioned in multiple places, how
> > about extracting the related ioctls sequence to a specific section, and
> > referencing it where needed (at least from the initialization sequence and
> > here) ?
> 
> I made the text here refer to the Initialization sequence.

Wouldn't it be clearer if those steps were extracted to a standalone sequence 
referenced from both locations ?

> >> +    .. note::
> >> +
> >> +       It is allowed for the driver not to support the same pixel
> >> format as
> > 
> > "Drivers may not support ..."
> > 
> >> +       previously used (before the resolution change) for the new
> >> +       resolution. The driver must select a default supported pixel
> >> format,
> >> +       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the
> >> client
> >> +       must take note of it.
> >> +
> >> +4.  The client acquires visible resolution as in initialization
> >> sequence.
> >> +
> >> +5.  *[optional]* The client is allowed to enumerate available formats
> >> and
> > 
> > s/is allowed to/may/
> > 
> >> +    select a different one than currently chosen (returned via
> >> +    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step
> >> in
> >> +    the initialization sequence.
> >> +
> >> +6.  *[optional]* The client acquires minimum number of buffers as in
> >> +    initialization sequence.
> >> +
> >> +7.  If all the following conditions are met, the client may resume the
> >> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> >> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the
> >> drain
> >> +    sequence:
> >> +
> >> +    * ``sizeimage`` of new format is less than or equal to the size of
> >> +      currently allocated buffers,
> >> +
> >> +    * the number of buffers currently allocated is greater than or
> >> equal to
> >> +      the minimum number of buffers acquired in step 6.
> >> +
> >> +    In such case, the remaining steps do not apply.
> >> +
> >> +    However, if the client intends to change the buffer set, to lower
> >> +    memory usage or for any other reasons, it may be achieved by
> >> following
> >> +    the steps below.
> >> +
> >> +8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue,
> > 
> > This is optional, isn't it ?
> 
> I wouldn't call it optional, since it depends on what the client does
> and what the decoder supports. That's why the point above just states
> that the remaining steps do not apply.

I meant isn't the "After dequeuing all remaining buffers from the CAPTURE 
queue" part optional ? As far as I understand, the client may decide not to 
dequeue them.

> Also added a note:
> 
>        To fulfill those requirements, the client may attempt to use
>        :c:func:`VIDIOC_CREATE_BUFS` to add more buffers. However, due to
>        hardware limitations, the decoder may not support adding buffers at
>        this point and the client must be able to handle a failure using the
>        steps below.

I wonder if there could be a way to work around those limitations on the 
driver side. At the beginning of step 7, the decoder is effectively stopped. 
If the hardware doesn't support adding new buffers on the fly, can't the 
driver support the VIDIOC_CREATE_BUFS + V4L2_DEC_CMD_START sequence the same 
way it would support the VIDIOC_STREAMOFF + VIDIOC_REQBUFS(0) + 
VIDIOC_REQBUFS(n) + VIDIOC_STREAMON ?

> >> the
> >> +    client must call :c:func:`VIDIOC_STREAMOFF` on the ``CAPTURE``
> >> queue.
> >> +    The ``OUTPUT`` queue must remain streaming (calling STREAMOFF on it
> >
> > :c:func:`VIDIOC_STREAMOFF`
> >
> >> +    would trigger a seek).

[snip]

-- 
Regards,

Laurent Pinchart




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-18 11:22       ` Laurent Pinchart
@ 2018-10-20  8:52         ` Tomasz Figa
  2018-10-21  9:23           ` Laurent Pinchart
  0 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-10-20  8:52 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

On Thu, Oct 18, 2018 at 8:22 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> I've stripped out all the parts on which I have no specific comment or just
> agree with your proposal. Please see below for a few additional remarks.
>
> On Thursday, 18 October 2018 13:03:33 EEST Tomasz Figa wrote:
> > On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart wrote:
> > > On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> > >> Due to complexity of the video decoding process, the V4L2 drivers of
> > >> stateful decoder hardware require specific sequences of V4L2 API calls
> > >> to be followed. These include capability enumeration, initialization,
> > >> decoding, seek, pause, dynamic resolution change, drain and end of
> > >> stream.
> > >>
> > >> Specifics of the above have been discussed during Media Workshops at
> > >> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > >> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > >> originated at those events was later implemented by the drivers we
> > >> already have merged in mainline, such as s5p-mfc or coda.
> > >>
> > >> The only thing missing was the real specification included as a part of
> > >> Linux Media documentation. Fix it now and document the decoder part of
> > >> the Codec API.
> > >>
> > >> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > >> ---
> > >>
> > >>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
> > >>  Documentation/media/uapi/v4l/devices.rst     |   1 +
> > >>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
> > >>  3 files changed, 882 insertions(+), 1 deletion(-)
> > >>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> > >>
> > >> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst
> > >> b/Documentation/media/uapi/v4l/dev-decoder.rst new file mode 100644
> > >> index 000000000000..f55d34d2f860
> > >> --- /dev/null
> > >> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> > >> @@ -0,0 +1,872 @@
>
> [snip]
>
> > >> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS` on
> > >> +    ``OUTPUT``.
> > >> +
> > >> +    * **Required fields:**
> > >> +
> > >> +      ``count``
> > >> +          requested number of buffers to allocate; greater than zero
> > >> +
> > >> +      ``type``
> > >> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > >> +
> > >> +      ``memory``
> > >> +          follows standard semantics
> > >> +
> > >> +      ``sizeimage``
> > >> +          follows standard semantics; the client is free to choose any
> > >> +          suitable size, however, it may be subject to change by the
> > >> +          driver
> > >> +
> > >> +    * **Return fields:**
> > >> +
> > >> +      ``count``
> > >> +          actual number of buffers allocated
> > >> +
> > >> +    * The driver must adjust count to minimum of required number of
> > >> +      ``OUTPUT`` buffers for given format and count passed.
> > >
> > > Isn't it the maximum, not the minimum ?
> >
> > It's actually neither. All we can generally say here is that the
> > number will be adjusted and the client must note it.
>
> I expect it to be clamp(requested count, driver minimum, driver maximum). I'm
> not sure it's worth capturing this in the document though, but we could say
>
> "The driver must clam count to the minimum and maximum number of required
> ``OUTPUT`` buffers for the given format ."
>

I'd leave the details to the documentation of VIDIOC_REQBUFS, if
needed. This document focuses on the decoder UAPI and with this note I
want to ensure that the applications don't assume that exactly the
requested number of buffers is always allocated.

How about making it even simpler:

The actual number of allocated buffers may differ from the ``count``
given. The client must check the updated value of ``count`` after the
call returns.

> > >> The client must
> > >> +      check this value after the ioctl returns to get the number of
> > >> +      buffers allocated.
> > >> +
> > >> +    .. note::
> > >> +
> > >> +       To allocate more than minimum number of buffers (for pipeline
> > >> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > >> +       get minimum number of buffers required by the driver/format,
> > >> +       and pass the obtained value plus the number of additional
> > >> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > >> +
> > >> +5.  Start streaming on ``OUTPUT`` queue via :c:func:`VIDIOC_STREAMON`.
> > >> +
> > >> +6.  This step only applies to coded formats that contain resolution
> > >> +    information in the stream. Continue queuing/dequeuing bitstream
> > >> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF` and
> > >> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and
> > >> returning
> > >> +    each buffer to the client until required metadata to configure the
> > >> +    ``CAPTURE`` queue are found. This is indicated by the driver
> > >> sending
> > >> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> > >> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> > >> +    requirement to pass enough data for this to occur in the first
> > >> buffer
> > >> +    and the driver must be able to process any number.
> > >> +
> > >> +    * If data in a buffer that triggers the event is required to decode
> > >> +      the first frame, the driver must not return it to the client,
> > >> +      but must retain it for further decoding.
> > >> +
> > >> +    * If the client set width and height of ``OUTPUT`` format to 0,
> > >> calling
> > >> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return
> > >> -EPERM,
> > >> +      until the driver configures ``CAPTURE`` format according to stream
> > >> +      metadata.
> > >
> > > That's a pretty harsh handling for this condition. What's the rationale
> > > for returning -EPERM instead of for instance succeeding with width and
> > > height set to 0 ?
> >
> > I don't like it, but the error condition must stay for compatibility
> > reasons as that's what current drivers implement and applications
> > expect. (Technically current drivers would return -EINVAL, but we
> > concluded that existing applications don't care about the exact value,
> > so we can change it to make more sense.)
>
> Fair enough :-/ A bit of a shame though. Should we try to use an error code
> that would have less chance of being confused with an actual permission
> problem ? -EILSEQ could be an option for "illegal sequence" of operations, but
> better options could exist.
>

In Request API we concluded that -EACCES is the right code to return
for G_EXT_CTRLS on a request that has not finished yet. The case here
is similar - the capture queue is not yet set up. What do you think?

> > >> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events
> > >> and
> > >> +      the event is signaled, the decoding process will not continue
> > >> until
> > >> +      it is acknowledged by either (re-)starting streaming on
> > >> ``CAPTURE``,
> > >> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> > >> +      command.
> > >> +
> > >> +    .. note::
> > >> +
> > >> +       No decoded frames are produced during this phase.
> > >> +
>
> [snip]
>
> > >> +8.  Call :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` queue to get format for
> > >> the +    destination buffers parsed/decoded from the bitstream.
> > >> +
> > >> +    * **Required fields:**
> > >> +
> > >> +      ``type``
> > >> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > >> +
> > >> +    * **Return fields:**
> > >> +
> > >> +      ``width``, ``height``
> > >> +          frame buffer resolution for the decoded frames
> > >> +
> > >> +      ``pixelformat``
> > >> +          pixel format for decoded frames
> > >> +
> > >> +      ``num_planes`` (for _MPLANE ``type`` only)
> > >> +          number of planes for pixelformat
> > >> +
> > >> +      ``sizeimage``, ``bytesperline``
> > >> +          as per standard semantics; matching frame buffer format
> > >> +
> > >> +    .. note::
> > >> +
> > >> +       The value of ``pixelformat`` may be any pixel format supported
> > >> and
> > >> +       must be supported for current stream, based on the information
> > >> +       parsed from the stream and hardware capabilities. It is
> > >> suggested
> > >> +       that driver chooses the preferred/optimal format for given
> > >
> > > In compliance with RFC 2119, how about using "Drivers should choose"
> > > instead of "It is suggested that driver chooses" ?
> >
> > The whole paragraph became:
> >
> >        The value of ``pixelformat`` may be any pixel format supported by the
> > decoder for the current stream. It is expected that the decoder chooses a
> > preferred/optimal format for the default configuration. For example, a YUV
> > format may be preferred over an RGB format, if additional conversion step
> > would be required.
>
> How about using "should" instead of "it is expected that" ?
>

Done.

> [snip]
>
> > >> +10.  *[optional]* Choose a different ``CAPTURE`` format than suggested
> > >> via
> > >> +     :c:func:`VIDIOC_S_FMT` on ``CAPTURE`` queue. It is possible for
> > >> the
> > >> +     client to choose a different format than selected/suggested by the
> > >
> > > And here, "A client may choose" ?
> > >
> > >> +     driver in :c:func:`VIDIOC_G_FMT`.
> > >> +
> > >> +     * **Required fields:**
> > >> +
> > >> +       ``type``
> > >> +           a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > >> +
> > >> +       ``pixelformat``
> > >> +           a raw pixel format
> > >> +
> > >> +     .. note::
> > >> +
> > >> +        Calling :c:func:`VIDIOC_ENUM_FMT` to discover currently
> > >> available
> > >> +        formats after receiving ``V4L2_EVENT_SOURCE_CHANGE`` is useful
> > >> to
> > >> +        find out a set of allowed formats for given configuration, but
> > >> not
> > >> +        required, if the client can accept the defaults.
> > >
> > > s/required/required,/
> >
> > That would become "[...]but not required,, if the client[...]". Is
> > that your suggestion? ;)
>
> Oops, the other way around of course :-)

Done.

>
> [snip]
>
> > >> +3. Start queuing buffers to ``OUTPUT`` queue containing stream data
> > >> after
> > >> +   the seek until a suitable resume point is found.
> > >> +
> > >> +   .. note::
> > >> +
> > >> +      There is no requirement to begin queuing stream starting exactly
> > >> from
> > >
> > > s/stream/buffers/ ?
> >
> > Perhaps "stream data"? The buffers don't have a resume point, the stream
> > does.
>
> Maybe "coded data" ?
>

Done.

> > >> +      a resume point (e.g. SPS or a keyframe). The driver must handle
> > >> any
> > >> +      data queued and must keep processing the queued buffers until it
> > >> +      finds a suitable resume point. While looking for a resume point,
> > >> the
> > >> +      driver processes ``OUTPUT`` buffers and returns them to the
> > >> client
> > >> +      without producing any decoded frames.
> > >> +
> > >> +      For hardware known to be mishandling seeks to a non-resume point,
> > >> +      e.g. by returning corrupted decoded frames, the driver must be
> > >> able
> > >> +      to handle such seeks without a crash or any fatal decode error.
> > >
> > > This should be true for any hardware, there should never be any crash or
> > > fatal decode error. I'd write it as
> > >
> > > Some hardware is known to mishandle seeks to a non-resume point. Such an
> > > operation may result in an unspecified number of corrupted decoded frames
> > > being made available on ``CAPTURE``. Drivers must ensure that no fatal
> > > decoding errors or crashes occur, and implement any necessary handling and
> > > work-arounds for hardware issues related to seek operations.
> >
> > Done.
>
> [snip]
>
> > >> +2.  After all buffers containing decoded frames from before the
> > >> resolution
> > >> +    change point are ready to be dequeued on the ``CAPTURE`` queue, the
> > >> +    driver sends a ``V4L2_EVENT_SOURCE_CHANGE`` event for source change
> > >> +    type ``V4L2_EVENT_SRC_CH_RESOLUTION``.
> > >> +
> > >> +    * The last buffer from before the change must be marked with
> > >> +      :c:type:`v4l2_buffer` ``flags`` flag ``V4L2_BUF_FLAG_LAST`` as in
> > >> the +      drain sequence. The last buffer might be empty (with
> > >> +      :c:type:`v4l2_buffer` ``bytesused`` = 0) and must be ignored by
> > >> the
> > >> +      client, since it does not contain any decoded frame.
> > >> +
> > >> +    * Any client query issued after the driver queues the event must
> > >> return
> > >> +      values applying to the stream after the resolution change,
> > >> including
> > >> +      queue formats, selection rectangles and controls.
> > >> +
> > >> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE`` events
> > >> and
> > >> +      the event is signaled, the decoding process will not continue
> > >> until
> > >> +      it is acknowledged by either (re-)starting streaming on
> > >> ``CAPTURE``,
> > >> +      or via :c:func:`VIDIOC_DECODER_CMD` with ``V4L2_DEC_CMD_START``
> > >> +      command.
> > >
> > > This usage of V4L2_DEC_CMD_START isn't aligned with the documentation of
> > > the command. I'm not opposed to this, but I think the use cases of
> > > decoder commands for codecs should be explained in the VIDIOC_DECODER_CMD
> > > documentation. What bothers me in particular is usage of
> > > V4L2_DEC_CMD_START to restart the decoder, while no V4L2_DEC_CMD_STOP has
> > > been issued. Should we add a section that details the decoder state
> > > machine with the implicit and explicit ways in which it is started and
> > > stopped ?
> >
> > Yes, we should probably extend the VIDIOC_DECODER_CMD documentation.
> >
> > As for diagrams, they would indeed be nice to have, but maybe we could
> > add them in a follow up patch?
>
> That's another way to say it won't happen, right ? ;-)

I'd prefer to focus on the basic description first, since for the last
6 years we haven't had any documentation at all. I hope we can later
have more contributors follow up with patches to make it easier to
read, e.g. add nice diagrams.

Anyway, I'll try to add a simple state machine diagram in dot, but
would appreciate if we could postpone any not critical improvements.

> I'm OK with that, but I
> think we should still clarify that the source change generates an implicit
> V4L2_DEC_CMD_STOP.
>

Good idea, thanks.

> > > I would also reference step 7 here.
> > >
> > >> +    .. note::
> > >> +
> > >> +       Any attempts to dequeue more buffers beyond the buffer marked
> > >> +       with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > >> +       :c:func:`VIDIOC_DQBUF`.
> > >> +
> > >> +3.  The client calls :c:func:`VIDIOC_G_FMT` for ``CAPTURE`` to get the
> > >> new
> > >> +    format information. This is identical to calling
> > >> :c:func:`VIDIOC_G_FMT` +    after ``V4L2_EVENT_SRC_CH_RESOLUTION`` in
> > >> the initialization sequence +    and should be handled similarly.
> > >
> > > As the source resolution change event is mentioned in multiple places, how
> > > about extracting the related ioctls sequence to a specific section, and
> > > referencing it where needed (at least from the initialization sequence and
> > > here) ?
> >
> > I made the text here refer to the Initialization sequence.
>
> Wouldn't it be clearer if those steps were extracted to a standalone sequence
> referenced from both locations ?
>

It might be possible to extract the operations on the CAPTURE queue
into a "Capture setup" sequence. Let me check that.

> > >> +    .. note::
> > >> +
> > >> +       It is allowed for the driver not to support the same pixel
> > >> format as
> > >
> > > "Drivers may not support ..."
> > >
> > >> +       previously used (before the resolution change) for the new
> > >> +       resolution. The driver must select a default supported pixel
> > >> format,
> > >> +       return it, if queried using :c:func:`VIDIOC_G_FMT`, and the
> > >> client
> > >> +       must take note of it.
> > >> +
> > >> +4.  The client acquires visible resolution as in initialization
> > >> sequence.
> > >> +
> > >> +5.  *[optional]* The client is allowed to enumerate available formats
> > >> and
> > >
> > > s/is allowed to/may/
> > >
> > >> +    select a different one than currently chosen (returned via
> > >> +    :c:func:`VIDIOC_G_FMT)`. This is identical to a corresponding step
> > >> in
> > >> +    the initialization sequence.
> > >> +
> > >> +6.  *[optional]* The client acquires minimum number of buffers as in
> > >> +    initialization sequence.
> > >> +
> > >> +7.  If all the following conditions are met, the client may resume the
> > >> +    decoding instantly, by using :c:func:`VIDIOC_DECODER_CMD` with
> > >> +    ``V4L2_DEC_CMD_START`` command, as in case of resuming after the
> > >> drain
> > >> +    sequence:
> > >> +
> > >> +    * ``sizeimage`` of new format is less than or equal to the size of
> > >> +      currently allocated buffers,
> > >> +
> > >> +    * the number of buffers currently allocated is greater than or
> > >> equal to
> > >> +      the minimum number of buffers acquired in step 6.
> > >> +
> > >> +    In such case, the remaining steps do not apply.
> > >> +
> > >> +    However, if the client intends to change the buffer set, to lower
> > >> +    memory usage or for any other reasons, it may be achieved by
> > >> following
> > >> +    the steps below.
> > >> +
> > >> +8.  After dequeuing all remaining buffers from the ``CAPTURE`` queue,
> > >
> > > This is optional, isn't it ?
> >
> > I wouldn't call it optional, since it depends on what the client does
> > and what the decoder supports. That's why the point above just states
> > that the remaining steps do not apply.
>
> I meant isn't the "After dequeuing all remaining buffers from the CAPTURE
> queue" part optional ? As far as I understand, the client may decide not to
> dequeue them.
>

A STREAMOFF would discard the already decoded but not yet dequeued
frames. While it's technically fine, it doesn't make sense, because it
would lead to a frame drop. Therefore, I'd rather keep it required,
for simplicity.

> > Also added a note:
> >
> >        To fulfill those requirements, the client may attempt to use
> >        :c:func:`VIDIOC_CREATE_BUFS` to add more buffers. However, due to
> >        hardware limitations, the decoder may not support adding buffers at
> >        this point and the client must be able to handle a failure using the
> >        steps below.
>
> I wonder if there could be a way to work around those limitations on the
> driver side. At the beginning of step 7, the decoder is effectively stopped.
> If the hardware doesn't support adding new buffers on the fly, can't the
> driver support the VIDIOC_CREATE_BUFS + V4L2_DEC_CMD_START sequence the same
> way it would support the VIDIOC_STREAMOFF + VIDIOC_REQBUFS(0) +
> VIDIOC_REQBUFS(n) + VIDIOC_STREAMON ?
>

I guess that would work. I would only allow it for the case where
existing buffers are already big enough and just more buffers are
needed. Otherwise it would lead to some weird cases, such as some old
buffers already in the CAPTURE queue, blocking the decode of further
frames. (While it could be handled by the driver returning them with
an error state, it would only complicate the interface.)

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-18 10:03     ` Tomasz Figa
  2018-10-18 11:22       ` Laurent Pinchart
@ 2018-10-20 10:24       ` Tomasz Figa
  2018-10-21  9:26         ` Laurent Pinchart
  2018-10-20 15:39       ` Tomasz Figa
  2 siblings, 1 reply; 62+ messages in thread
From: Tomasz Figa @ 2018-10-20 10:24 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

On Thu, Oct 18, 2018 at 7:03 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Laurent,
>
> On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >
> > Hi Tomasz,
> >
> > Thank you for the patch.
>
> Thanks for your comments! Please see my replies inline.
>
> >
> > On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
[snip]
> > > +4. At this point, decoding is paused and the driver will accept, but not
> > > +   process any newly queued ``OUTPUT`` buffers until the client issues
> > > +   ``V4L2_DEC_CMD_START`` or restarts streaming on any queue.
> > > +
> > > +* Once the drain sequence is initiated, the client needs to drive it to
> > > +  completion, as described by the above steps, unless it aborts the process
> > > +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue.  The client
> > > +  is not allowed to issue ``V4L2_DEC_CMD_START`` or ``V4L2_DEC_CMD_STOP``
> > > +  again while the drain sequence is in progress and they will fail with
> > > +  -EBUSY error code if attempted.
> >
> > While this seems OK to me, I think drivers will need help to implement all the
> > corner cases correctly without race conditions.
>
> We went through the possible list of corner cases and concluded that
> there is no use in handling them, especially considering how much they
> would complicate both the userspace and the drivers. Not even
> mentioning some hardware, like s5p-mfc, which actually has a dedicated
> flush operation, that needs to complete before the decoder can switch
> back to normal mode.

Actually I misread your comment.

Agreed that the decoder commands are a bit tricky to implement
properly. That's one of the reasons I decided to make the return
-EBUSY while an existing drain is in progress.

Do you have any particular simplification in mind that could avoid
some corner cases?

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-18 10:03     ` Tomasz Figa
  2018-10-18 11:22       ` Laurent Pinchart
  2018-10-20 10:24       ` Tomasz Figa
@ 2018-10-20 15:39       ` Tomasz Figa
  2 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-10-20 15:39 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

On Thu, Oct 18, 2018 at 7:03 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
> Hi Laurent,
>
> On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> >
> > Hi Tomasz,
> >
> > Thank you for the patch.
>
> Thanks for your comments! Please see my replies inline.
>
> >
> > On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
[snip]
> > > The driver must also set
> > > +     ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer` ``flags`` field on the
> > > +     buffer on the ``CAPTURE`` queue containing the last frame (if any)
> > > +     produced as a result of processing the ``OUTPUT`` buffers queued
> > > +     before ``V4L2_DEC_CMD_STOP``. If no more frames are left to be
> > > +     returned at the point of handling ``V4L2_DEC_CMD_STOP``, the driver
> > > +     must return an empty buffer (with :c:type:`v4l2_buffer`
> > > +     ``bytesused`` = 0) as the last buffer with ``V4L2_BUF_FLAG_LAST`` set
> > > +     instead. Any attempts to dequeue more buffers beyond the buffer marked
> > > +     with ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error from
> > > +     :c:func:`VIDIOC_DQBUF`.
> > > +
> > > +   * If the ``CAPTURE`` queue is NOT streaming, no action is necessary for
> > > +     ``CAPTURE`` queue and the driver must send a ``V4L2_EVENT_EOS``
> > > +     immediately after all ``OUTPUT`` buffers in question have been
> > > +     processed.
> >
> > What is the use case for this ? Can't we just return an error if decoder isn't
> > streaming ?
> >
>
> Actually this is wrong. We want the queued OUTPUT buffers to be
> processed and decoded, so if the CAPTURE queue is not yet set up
> (initialization sequence not completed yet), handling the
> initialization sequence first will be needed as a part of the drain
> sequence. I've updated the document with that.

I might want to take this back. The client could just drive the
initialization to completion on its own and start the drain sequence
after that. Let me think if it makes anything easier. For reference, I
don't see any compatibility constraint here, since the existing user
space already works like that.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-20  8:52         ` Tomasz Figa
@ 2018-10-21  9:23           ` Laurent Pinchart
  2018-10-22  6:19             ` Tomasz Figa
  0 siblings, 1 reply; 62+ messages in thread
From: Laurent Pinchart @ 2018-10-21  9:23 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

Hi Tomasz,

On Saturday, 20 October 2018 11:52:57 EEST Tomasz Figa wrote:
> On Thu, Oct 18, 2018 at 8:22 PM Laurent Pinchart wrote:
> > On Thursday, 18 October 2018 13:03:33 EEST Tomasz Figa wrote:
> >> On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart wrote:
> >>> On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> >>>> Due to complexity of the video decoding process, the V4L2 drivers of
> >>>> stateful decoder hardware require specific sequences of V4L2 API
> >>>> calls to be followed. These include capability enumeration,
> >>>> initialization, decoding, seek, pause, dynamic resolution change, drain
> >>>> and end of stream.
> >>>> 
> >>>> Specifics of the above have been discussed during Media Workshops at
> >>>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> >>>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> >>>> originated at those events was later implemented by the drivers we
> >>>> already have merged in mainline, such as s5p-mfc or coda.
> >>>> 
> >>>> The only thing missing was the real specification included as a part
> >>>> of Linux Media documentation. Fix it now and document the decoder part
> >>>> of the Codec API.
> >>>> 
> >>>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> >>>> ---
> >>>> 
> >>>>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
> >>>>  Documentation/media/uapi/v4l/devices.rst     |   1 +
> >>>>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
> >>>>  3 files changed, 882 insertions(+), 1 deletion(-)
> >>>>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> >>>> 
> >>>> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst
> >>>> b/Documentation/media/uapi/v4l/dev-decoder.rst new file mode 100644
> >>>> index 000000000000..f55d34d2f860
> >>>> --- /dev/null
> >>>> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> >>>> @@ -0,0 +1,872 @@
> > 
> > [snip]
> > 
> >>>> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS`
> >>>> on
> >>>> +    ``OUTPUT``.
> >>>> +
> >>>> +    * **Required fields:**
> >>>> +
> >>>> +      ``count``
> >>>> +          requested number of buffers to allocate; greater than zero
> >>>> +
> >>>> +      ``type``
> >>>> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> >>>> +
> >>>> +      ``memory``
> >>>> +          follows standard semantics
> >>>> +
> >>>> +      ``sizeimage``
> >>>> +          follows standard semantics; the client is free to choose
> >>>> any
> >>>> +          suitable size, however, it may be subject to change by the
> >>>> +          driver
> >>>> +
> >>>> +    * **Return fields:**
> >>>> +
> >>>> +      ``count``
> >>>> +          actual number of buffers allocated
> >>>> +
> >>>> +    * The driver must adjust count to minimum of required number of
> >>>> +      ``OUTPUT`` buffers for given format and count passed.
> >>> 
> >>> Isn't it the maximum, not the minimum ?
> >> 
> >> It's actually neither. All we can generally say here is that the
> >> number will be adjusted and the client must note it.
> > 
> > I expect it to be clamp(requested count, driver minimum, driver maximum).
> > I'm not sure it's worth capturing this in the document though, but we
> > could say
> > 
> > "The driver must clam count to the minimum and maximum number of required
> > ``OUTPUT`` buffers for the given format ."
> 
> I'd leave the details to the documentation of VIDIOC_REQBUFS, if
> needed. This document focuses on the decoder UAPI and with this note I
> want to ensure that the applications don't assume that exactly the
> requested number of buffers is always allocated.
> 
> How about making it even simpler:
> 
> The actual number of allocated buffers may differ from the ``count``
> given. The client must check the updated value of ``count`` after the
> call returns.

That works for me. You may want to see "... given, as specified in the 
VIDIOC_REQBUFS documentation.".

> >>>> The client must
> >>>> +      check this value after the ioctl returns to get the number of
> >>>> +      buffers allocated.
> >>>> +
> >>>> +    .. note::
> >>>> +
> >>>> +       To allocate more than minimum number of buffers (for pipeline
> >>>> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> >>>> +       get minimum number of buffers required by the driver/format,
> >>>> +       and pass the obtained value plus the number of additional
> >>>> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> >>>> +
> >>>> +5.  Start streaming on ``OUTPUT`` queue via
> >>>> :c:func:`VIDIOC_STREAMON`.
> >>>> +
> >>>> +6.  This step only applies to coded formats that contain resolution
> >>>> +    information in the stream. Continue queuing/dequeuing bitstream
> >>>> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF`
> >>>> and
> >>>> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and
> >>>> returning
> >>>> +    each buffer to the client until required metadata to configure
> >>>> the
> >>>> +    ``CAPTURE`` queue are found. This is indicated by the driver
> >>>> sending
> >>>> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> >>>> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> >>>> +    requirement to pass enough data for this to occur in the first
> >>>> buffer
> >>>> +    and the driver must be able to process any number.
> >>>> +
> >>>> +    * If data in a buffer that triggers the event is required to
> >>>> decode
> >>>> +      the first frame, the driver must not return it to the client,
> >>>> +      but must retain it for further decoding.
> >>>> +
> >>>> +    * If the client set width and height of ``OUTPUT`` format to 0,
> >>>> calling
> >>>> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return
> >>>> -EPERM,
> >>>> +      until the driver configures ``CAPTURE`` format according to
> >>>> stream
> >>>> +      metadata.
> >>> 
> >>> That's a pretty harsh handling for this condition. What's the
> >>> rationale for returning -EPERM instead of for instance succeeding with
> >>> width and height set to 0 ?
> >> 
> >> I don't like it, but the error condition must stay for compatibility
> >> reasons as that's what current drivers implement and applications
> >> expect. (Technically current drivers would return -EINVAL, but we
> >> concluded that existing applications don't care about the exact value,
> >> so we can change it to make more sense.)
> > 
> > Fair enough :-/ A bit of a shame though. Should we try to use an error
> > code that would have less chance of being confused with an actual
> > permission problem ? -EILSEQ could be an option for "illegal sequence" of
> > operations, but better options could exist.
> 
> In Request API we concluded that -EACCES is the right code to return
> for G_EXT_CTRLS on a request that has not finished yet. The case here
> is similar - the capture queue is not yet set up. What do you think?

Good question. -EPERM is documented as "Operation not permitted", while -
EACCES is documented as "Permission denied". The former appears to be 
understood as "This isn't a good idea, I can't let you do that", and the 
latter as "You don't have sufficient privileges, if you retry with the correct 
privileges this will succeed". Neither are a perfect match, but -EACCES might 
be better if you replace getting privileges by performing the required setup.

> >>>> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE``
> >>>> events and
> >>>> +      the event is signaled, the decoding process will not continue
> >>>> until
> >>>> +      it is acknowledged by either (re-)starting streaming on
> >>>> ``CAPTURE``,
> >>>> +      or via :c:func:`VIDIOC_DECODER_CMD` with
> >>>> ``V4L2_DEC_CMD_START``
> >>>> +      command.
> >>>> +
> >>>> +    .. note::
> >>>> +
> >>>> +       No decoded frames are produced during this phase.
> >>>> +

[snip]

> >> Also added a note:
> >>        To fulfill those requirements, the client may attempt to use
> >>        :c:func:`VIDIOC_CREATE_BUFS` to add more buffers. However, due to
> >>        hardware limitations, the decoder may not support adding buffers
> >>        at this point and the client must be able to handle a failure
> >>        using the steps below.
> > 
> > I wonder if there could be a way to work around those limitations on the
> > driver side. At the beginning of step 7, the decoder is effectively
> > stopped. If the hardware doesn't support adding new buffers on the fly,
> > can't the driver support the VIDIOC_CREATE_BUFS + V4L2_DEC_CMD_START
> > sequence the same way it would support the VIDIOC_STREAMOFF +
> > VIDIOC_REQBUFS(0) +
> > VIDIOC_REQBUFS(n) + VIDIOC_STREAMON ?
> 
> I guess that would work. I would only allow it for the case where
> existing buffers are already big enough and just more buffers are
> needed. Otherwise it would lead to some weird cases, such as some old
> buffers already in the CAPTURE queue, blocking the decode of further
> frames. (While it could be handled by the driver returning them with
> an error state, it would only complicate the interface.)

Good point. I wonder if this could be handled in the framework. If it can't, 
or with non trivial support code on the driver side, then I would agree with 
you. Otherwise, handling the workaround in the framework would ensure 
consistent behaviour across drivers with minimal cost, and simplify the 
userspace API, so I think it would be a good thing.

-- 
Regards,

Laurent Pinchart




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-20 10:24       ` Tomasz Figa
@ 2018-10-21  9:26         ` Laurent Pinchart
  0 siblings, 0 replies; 62+ messages in thread
From: Laurent Pinchart @ 2018-10-21  9:26 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

Hi Tomasz,

On Saturday, 20 October 2018 13:24:20 EEST Tomasz Figa wrote:
> On Thu, Oct 18, 2018 at 7:03 PM Tomasz Figa wrote:
> > On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart wrote:
> >> Hi Tomasz,
> >> 
> >> Thank you for the patch.
> > 
> > Thanks for your comments! Please see my replies inline.
> > 
> >> On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> 
> [snip]
> 
> >>> +4. At this point, decoding is paused and the driver will accept, but
> >>> not
> >>> +   process any newly queued ``OUTPUT`` buffers until the client
> >>> issues
> >>> +   ``V4L2_DEC_CMD_START`` or restarts streaming on any queue.
> >>> +
> >>> +* Once the drain sequence is initiated, the client needs to drive it
> >>> to
> >>> +  completion, as described by the above steps, unless it aborts the
> >>> process
> >>> +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``OUTPUT`` queue. The client
> >>> +  is not allowed to issue ``V4L2_DEC_CMD_START`` or
> >>> ``V4L2_DEC_CMD_STOP``
> >>> +  again while the drain sequence is in progress and they will fail with
> >>> +  -EBUSY error code if attempted.
> >> 
> >> While this seems OK to me, I think drivers will need help to implement
> >> all the corner cases correctly without race conditions.
> > 
> > We went through the possible list of corner cases and concluded that
> > there is no use in handling them, especially considering how much they
> > would complicate both the userspace and the drivers. Not even
> > mentioning some hardware, like s5p-mfc, which actually has a dedicated
> > flush operation, that needs to complete before the decoder can switch
> > back to normal mode.
> 
> Actually I misread your comment.
> 
> Agreed that the decoder commands are a bit tricky to implement
> properly. That's one of the reasons I decided to make the return
> -EBUSY while an existing drain is in progress.
> 
> Do you have any particular simplification in mind that could avoid
> some corner cases?

Not really on the spec side. I think we'll have to implement helper functions 
for drivers to use if we want to ensure a consistent and bug-free behaviour.

-- 
Regards,

Laurent Pinchart




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-10-16 13:49         ` Hans Verkuil
@ 2018-10-22  4:50           ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-10-22  4:50 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Philipp Zabel, Linux Media Mailing List,
	Linux Kernel Mailing List, Stanimir Varbanov,
	Mauro Carvalho Chehab, Pawel Osciak, Alexandre Courbot, kamil,
	a.hajda, Kyungmin Park, jtp.park,
	Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, Laurent Pinchart,
	dave.stevenson, Ezequiel Garcia

On Tue, Oct 16, 2018 at 10:50 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> On 10/16/18 09:36, Tomasz Figa wrote:
> > On Tue, Aug 7, 2018 at 3:54 PM Tomasz Figa <tfiga@chromium.org> wrote:
> >>>> +   * The driver must expose following selection targets on ``OUTPUT``:
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> >>>> +         maximum crop bounds within the source buffer supported by the
> >>>> +         encoder
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> >>>> +         suggested cropping rectangle that covers the whole source picture
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_CROP``
> >>>> +         rectangle within the source buffer to be encoded into the
> >>>> +         ``CAPTURE`` stream; defaults to ``V4L2_SEL_TGT_CROP_DEFAULT``
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE_BOUNDS``
> >>>> +         maximum rectangle within the coded resolution, which the cropped
> >>>> +         source frame can be output into; always equal to (0, 0)x(width of
> >>>> +         ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``), if the
> >>>> +         hardware does not support compose/scaling
>
> Re-reading this I would rewrite this a bit:
>
> if the hardware does not support composition or scaling, then this is always
> equal to (0, 0)x(width of ``V4L2_SEL_TGT_CROP``, height of ``V4L2_SEL_TGT_CROP``).
>

Ack.

> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE_DEFAULT``
> >>>> +         equal to ``V4L2_SEL_TGT_CROP``
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE``
> >>>> +         rectangle within the coded frame, which the cropped source frame
> >>>> +         is to be output into; defaults to
> >>>> +         ``V4L2_SEL_TGT_COMPOSE_DEFAULT``; read-only on hardware without
> >>>> +         additional compose/scaling capabilities; resulting stream will
> >>>> +         have this rectangle encoded as the visible rectangle in its
> >>>> +         metadata
> >>>> +
> >>>> +     ``V4L2_SEL_TGT_COMPOSE_PADDED``
> >>>> +         always equal to coded resolution of the stream, as selected by the
> >>>> +         encoder based on source resolution and crop/compose rectangles
> >>>
> >>> Are there codec drivers that support composition? I can't remember seeing any.
> >>>
> >>
> >> Hmm, I was convinced that MFC could scale and we just lacked support
> >> in the driver, but I checked the documentation and it doesn't seem to
> >> be able to do so. I guess we could drop the COMPOSE rectangles for
> >> now, until we find any hardware that can do scaling or composing on
> >> the fly.
> >>
> >
> > On the other hand, having them defined already wouldn't complicate
> > existing drivers too much either, because they would just handle all
> > of them in the same switch case, i.e.
> >
> > case V4L2_SEL_TGT_COMPOSE_BOUNDS:
> > case V4L2_SEL_TGT_COMPOSE_DEFAULT:
> > case V4L2_SEL_TGT_COMPOSE:
> > case V4L2_SEL_TGT_COMPOSE_PADDED:
> >      return visible_rectangle;
> >
> > That would need one change, though. We would define
> > V4L2_SEL_TGT_COMPOSE_DEFAULT to be equal to (0, 0)x(width of
> > V4L2_SEL_TGT_CROP - 1, height of ``V4L2_SEL_TGT_CROP - 1), which
>
> " - 1"? Where does that come from?
>
> Usually rectangles are specified as widthxheight@left,top.
>

Yeah, the notation I used was quite unfortunate. How about just making
it fully verbose?

         if the hardware does not support
         composition or scaling, then this is always equal to the rectangle of
         width and height matching ``V4L2_SEL_TGT_CROP`` and located at (0, 0)

> > makes more sense than current definition, since it would bypass any
> > compose/scaling by default.
>
> I have no problem with drivers optionally implementing these rectangles,
> even if they don't do scaling or composition. The question is, should it
> be required for decoders? If there is a good reason, then I'm OK with it.

There is no particular reason to do it for existing drivers. I'm
personally not a big fan of making things optional, since you never
know when something becomes required and then you run into problems
with user space compatibility. In this case the cost of having those
rectangles defined is really low and they will be useful to handle
encoders and decoders with scaling/compose ability in the future.

I have no strong opinion though.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface
  2018-10-17 15:19   ` Laurent Pinchart
@ 2018-10-22  6:12     ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-10-22  6:12 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

On Thu, Oct 18, 2018 at 12:19 AM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> Thank you for the patch.
>

Thanks for the review. I'll snip out the comments that I've already addressed.

> On Tuesday, 24 July 2018 17:06:21 EEST Tomasz Figa wrote:
[snip]
> > +Glossary
> > +========
>
> [snip]
>
> Let's try to share these two sections between the two documents.
>

Do you have any idea on how to include common contents in multiple
documentation pages (here encoder and decoder)?

> [snip]
>
> > +Initialization
> > +==============
> > +
> > +1. *[optional]* Enumerate supported formats and resolutions. See
> > +   capability enumeration.
> > +
> > +2. Set a coded format on the ``CAPTURE`` queue via :c:func:`VIDIOC_S_FMT`
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``CAPTURE``
> > +
> > +     ``pixelformat``
> > +         set to a coded format to be produced
> > +
> > +   * **Return fields:**
> > +
> > +     ``width``, ``height``
> > +         coded resolution (based on currently active ``OUTPUT`` format)
>
> Shouldn't userspace then set the resolution on the CAPTURE queue first ?
>

I don't think so. The resolution on the CAPTURE queue is the internal
coded size of the stream. It depends on the resolution of the OUTPUT
queue, selection rectangles and codec and hardware details.

Actually, I'm thinking whether we actually need to report it to the
userspace. I kept it this way to be consistent with decoders, but I
can't find any use case for it and the CAPTURE format could just
always have width and height set to 0, since it's just a compressed
bitstream.

> > +   .. note::
> > +
> > +      Changing ``CAPTURE`` format may change currently set ``OUTPUT``
> > +      format. The driver will derive a new ``OUTPUT`` format from
> > +      ``CAPTURE`` format being set, including resolution, colorimetry
> > +      parameters, etc. If the client needs a specific ``OUTPUT`` format,
> > +      it must adjust it afterwards.
>
> Doesn't this contradict the "based on currently active ``OUTPUT`` format"
> above ?
>

It might be worded a bit unfortunately indeed, but generally the
userspace doesn't set the width and height, so these values don't
affect the OUTPUT format.

> > +3. *[optional]* Enumerate supported ``OUTPUT`` formats (raw formats for
> > +   source) for the selected coded format via :c:func:`VIDIOC_ENUM_FMT`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``index``
> > +         follows standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``pixelformat``
> > +         raw format supported for the coded format currently selected on
> > +         the ``OUTPUT`` queue.
> > +
> > +4. The client may set the raw source format on the ``OUTPUT`` queue via
> > +   :c:func:`VIDIOC_S_FMT`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``pixelformat``
> > +         raw format of the source
> > +
> > +     ``width``, ``height``
> > +         source resolution
> > +
> > +     ``num_planes`` (for _MPLANE)
> > +         set to number of planes for pixelformat
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * **Return fields:**
> > +
> > +     ``width``, ``height``
> > +         may be adjusted by driver to match alignment requirements, as
> > +         required by the currently selected formats
> > +
> > +     ``sizeimage``, ``bytesperline``
> > +         follow standard semantics
> > +
> > +   * Setting the source resolution will reset visible resolution to the
> > +     adjusted source resolution rounded up to the closest visible
> > +     resolution supported by the driver. Similarly, coded resolution will
> > +     be reset to source resolution rounded up to the closest coded
> > +     resolution supported by the driver (typically a multiple of
> > +     macroblock size).
> > +
> > +   .. note::
> > +
> > +      This step is not strictly required, since ``OUTPUT`` is expected to
> > +      have a valid default format. However, the client needs to ensure that
>
> s/needs to/must/

I've removed this note.

>
> > +      ``OUTPUT`` format matches its expectations via either
> > +      :c:func:`VIDIOC_S_FMT` or :c:func:`VIDIOC_G_FMT`, with the former
> > +      being the typical scenario, since the default format is unlikely to
> > +      be what the client needs.
> > +
> > +5. *[optional]* Set visible resolution for the stream metadata via
> > +   :c:func:`VIDIOC_S_SELECTION` on the ``OUTPUT`` queue.
> > +
> > +   * **Required fields:**
> > +
> > +     ``type``
> > +         a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > +
> > +     ``target``
> > +         set to ``V4L2_SEL_TGT_CROP``
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle; this must fit within the framebuffer resolution
> > +         and might be subject to adjustment to match codec and hardware
> > +         constraints
>
> Just for my information, are there use cases for r.left != 0 or r.top != 0 ?
>

How about screen capture, where you select the part of the screen to
encode, or arbitrary cropping of camera frames?

> > +   * **Return fields:**
> > +
> > +     ``r.left``, ``r.top``, ``r.width``, ``r.height``
> > +         visible rectangle adjusted by the driver
> > +
> > +   * The driver must expose following selection targets on ``OUTPUT``:
> > +
> > +     ``V4L2_SEL_TGT_CROP_BOUNDS``
> > +         maximum crop bounds within the source buffer supported by the
> > +         encoder
>
> Will this always match the format on the OUTPUT queue, or can it differ ?

Yes. I've reworded it as follows:

     ``V4L2_SEL_TGT_CROP_BOUNDS``
         equal to the full source frame, matching the active ``OUTPUT``
         format

>
> > +     ``V4L2_SEL_TGT_CROP_DEFAULT``
> > +         suggested cropping rectangle that covers the whole source picture
>
> How can the driver know what to report here, apart from the same value as
> V4L2_SET_TGT_CROP_BOUNDS ?
>

I've made them equal in v2 indeed.

[snip]
> > +Drain
> > +=====
> > +
> > +To ensure that all queued ``OUTPUT`` buffers have been processed and
> > +related ``CAPTURE`` buffers output to the client, the following drain
> > +sequence may be followed. After the drain sequence is complete, the client
> > +has received all encoded frames for all ``OUTPUT`` buffers queued before
> > +the sequence was started.
> > +
> > +1. Begin drain by issuing :c:func:`VIDIOC_ENCODER_CMD`.
> > +
> > +   * **Required fields:**
> > +
> > +     ``cmd``
> > +         set to ``V4L2_ENC_CMD_STOP``
> > +
> > +     ``flags``
> > +         set to 0
> > +
> > +     ``pts``
> > +         set to 0
> > +
> > +2. The driver must process and encode as normal all ``OUTPUT`` buffers
> > +   queued by the client before the :c:func:`VIDIOC_ENCODER_CMD` was issued.
> > +
> > +3. Once all ``OUTPUT`` buffers queued before ``V4L2_ENC_CMD_STOP`` are
> > +   processed:
> > +
> > +   * Once all decoded frames (if any) are ready to be dequeued on the
> > +     ``CAPTURE`` queue
>
> I understand this condition to be equivalent to the main step 3 condition. I
> would thus write it as
>
> "At this point all decoded frames (if any) are ready to be dequeued on the
> ``CAPTURE`` queue. The driver must send a ``V4L2_EVENT_EOS``."
>
> > the driver must send a ``V4L2_EVENT_EOS``. The
> > +     driver must also set ``V4L2_BUF_FLAG_LAST`` in :c:type:`v4l2_buffer`
> > +     ``flags`` field on the buffer on the ``CAPTURE`` queue containing the
> > +     last frame (if any) produced as a result of processing the ``OUTPUT``
> > +     buffers queued before
>
> Unneeded line break ?
>

The whole sequence has been completely rewritten in v2, hopefully
addressing your comments. Please take a look when I post the new
revision.

> > +     ``V4L2_ENC_CMD_STOP``.
> > +
> > +   * If no more frames are left to be returned at the point of handling
> > +     ``V4L2_ENC_CMD_STOP``, the driver must return an empty buffer (with
> > +     :c:type:`v4l2_buffer` ``bytesused`` = 0) as the last buffer with
> > +     ``V4L2_BUF_FLAG_LAST`` set.
> > +
> > +   * Any attempts to dequeue more buffers beyond the buffer marked with
> > +     ``V4L2_BUF_FLAG_LAST`` will result in a -EPIPE error code returned by
> > +     :c:func:`VIDIOC_DQBUF`.
> > +
> > +4. At this point, encoding is paused and the driver will accept, but not
> > +   process any newly queued ``OUTPUT`` buffers until the client issues
> > +   ``V4L2_ENC_CMD_START`` or restarts streaming on any queue.
> > +
> > +* Once the drain sequence is initiated, the client needs to drive it to
> > +  completion, as described by the above steps, unless it aborts the process
> > +  by issuing :c:func:`VIDIOC_STREAMOFF` on ``CAPTURE`` queue.  The client
> > +  is not allowed to issue ``V4L2_ENC_CMD_START`` or ``V4L2_ENC_CMD_STOP``
> > +  again while the drain sequence is in progress and they will fail with
> > +  -EBUSY error code if attempted.
> > +
> > +* Restarting streaming on ``CAPTURE`` queue will implicitly end the paused
> > +  state and make the encoder continue encoding, as long as other encoding
> > +  conditions are met. Restarting ``OUTPUT`` queue will not affect an
> > +  in-progress drain sequence.
>
> The last sentence seems to contradict the "on any queue" part of step 4. What
> happens if the client restarts streaming on the OUTPUT queue while a drain
> sequence is in progress ?
>

v2 states that the sequence is aborted in case of any of the queues is stopped.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface
  2018-10-21  9:23           ` Laurent Pinchart
@ 2018-10-22  6:19             ` Tomasz Figa
  0 siblings, 0 replies; 62+ messages in thread
From: Tomasz Figa @ 2018-10-22  6:19 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Linux Media Mailing List, Linux Kernel Mailing List,
	Stanimir Varbanov, Mauro Carvalho Chehab, Hans Verkuil,
	Pawel Osciak, Alexandre Courbot, kamil, a.hajda, Kyungmin Park,
	jtp.park, Philipp Zabel, Tiffany Lin (林慧珊),
	Andrew-CT Chen (陳智迪),
	todor.tomov, nicolas, Paul Kocialkowski, dave.stevenson,
	Ezequiel Garcia

On Sun, Oct 21, 2018 at 6:23 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Tomasz,
>
> On Saturday, 20 October 2018 11:52:57 EEST Tomasz Figa wrote:
> > On Thu, Oct 18, 2018 at 8:22 PM Laurent Pinchart wrote:
> > > On Thursday, 18 October 2018 13:03:33 EEST Tomasz Figa wrote:
> > >> On Wed, Oct 17, 2018 at 10:34 PM Laurent Pinchart wrote:
> > >>> On Tuesday, 24 July 2018 17:06:20 EEST Tomasz Figa wrote:
> > >>>> Due to complexity of the video decoding process, the V4L2 drivers of
> > >>>> stateful decoder hardware require specific sequences of V4L2 API
> > >>>> calls to be followed. These include capability enumeration,
> > >>>> initialization, decoding, seek, pause, dynamic resolution change, drain
> > >>>> and end of stream.
> > >>>>
> > >>>> Specifics of the above have been discussed during Media Workshops at
> > >>>> LinuxCon Europe 2012 in Barcelona and then later Embedded Linux
> > >>>> Conference Europe 2014 in Düsseldorf. The de facto Codec API that
> > >>>> originated at those events was later implemented by the drivers we
> > >>>> already have merged in mainline, such as s5p-mfc or coda.
> > >>>>
> > >>>> The only thing missing was the real specification included as a part
> > >>>> of Linux Media documentation. Fix it now and document the decoder part
> > >>>> of the Codec API.
> > >>>>
> > >>>> Signed-off-by: Tomasz Figa <tfiga@chromium.org>
> > >>>> ---
> > >>>>
> > >>>>  Documentation/media/uapi/v4l/dev-decoder.rst | 872 +++++++++++++++++++
> > >>>>  Documentation/media/uapi/v4l/devices.rst     |   1 +
> > >>>>  Documentation/media/uapi/v4l/v4l2.rst        |  10 +-
> > >>>>  3 files changed, 882 insertions(+), 1 deletion(-)
> > >>>>  create mode 100644 Documentation/media/uapi/v4l/dev-decoder.rst
> > >>>>
> > >>>> diff --git a/Documentation/media/uapi/v4l/dev-decoder.rst
> > >>>> b/Documentation/media/uapi/v4l/dev-decoder.rst new file mode 100644
> > >>>> index 000000000000..f55d34d2f860
> > >>>> --- /dev/null
> > >>>> +++ b/Documentation/media/uapi/v4l/dev-decoder.rst
> > >>>> @@ -0,0 +1,872 @@
> > >
> > > [snip]
> > >
> > >>>> +4.  Allocate source (bitstream) buffers via :c:func:`VIDIOC_REQBUFS`
> > >>>> on
> > >>>> +    ``OUTPUT``.
> > >>>> +
> > >>>> +    * **Required fields:**
> > >>>> +
> > >>>> +      ``count``
> > >>>> +          requested number of buffers to allocate; greater than zero
> > >>>> +
> > >>>> +      ``type``
> > >>>> +          a ``V4L2_BUF_TYPE_*`` enum appropriate for ``OUTPUT``
> > >>>> +
> > >>>> +      ``memory``
> > >>>> +          follows standard semantics
> > >>>> +
> > >>>> +      ``sizeimage``
> > >>>> +          follows standard semantics; the client is free to choose
> > >>>> any
> > >>>> +          suitable size, however, it may be subject to change by the
> > >>>> +          driver
> > >>>> +
> > >>>> +    * **Return fields:**
> > >>>> +
> > >>>> +      ``count``
> > >>>> +          actual number of buffers allocated
> > >>>> +
> > >>>> +    * The driver must adjust count to minimum of required number of
> > >>>> +      ``OUTPUT`` buffers for given format and count passed.
> > >>>
> > >>> Isn't it the maximum, not the minimum ?
> > >>
> > >> It's actually neither. All we can generally say here is that the
> > >> number will be adjusted and the client must note it.
> > >
> > > I expect it to be clamp(requested count, driver minimum, driver maximum).
> > > I'm not sure it's worth capturing this in the document though, but we
> > > could say
> > >
> > > "The driver must clam count to the minimum and maximum number of required
> > > ``OUTPUT`` buffers for the given format ."
> >
> > I'd leave the details to the documentation of VIDIOC_REQBUFS, if
> > needed. This document focuses on the decoder UAPI and with this note I
> > want to ensure that the applications don't assume that exactly the
> > requested number of buffers is always allocated.
> >
> > How about making it even simpler:
> >
> > The actual number of allocated buffers may differ from the ``count``
> > given. The client must check the updated value of ``count`` after the
> > call returns.
>
> That works for me. You may want to see "... given, as specified in the
> VIDIOC_REQBUFS documentation.".
>

The "Conventions[...]" section mentions that

1. The general V4L2 API rules apply if not specified in this document
   otherwise.

so I think I'll skip this additional explanation.

> > >>>> The client must
> > >>>> +      check this value after the ioctl returns to get the number of
> > >>>> +      buffers allocated.
> > >>>> +
> > >>>> +    .. note::
> > >>>> +
> > >>>> +       To allocate more than minimum number of buffers (for pipeline
> > >>>> +       depth), use G_CTRL(``V4L2_CID_MIN_BUFFERS_FOR_OUTPUT``) to
> > >>>> +       get minimum number of buffers required by the driver/format,
> > >>>> +       and pass the obtained value plus the number of additional
> > >>>> +       buffers needed in count to :c:func:`VIDIOC_REQBUFS`.
> > >>>> +
> > >>>> +5.  Start streaming on ``OUTPUT`` queue via
> > >>>> :c:func:`VIDIOC_STREAMON`.
> > >>>> +
> > >>>> +6.  This step only applies to coded formats that contain resolution
> > >>>> +    information in the stream. Continue queuing/dequeuing bitstream
> > >>>> +    buffers to/from the ``OUTPUT`` queue via :c:func:`VIDIOC_QBUF`
> > >>>> and
> > >>>> +    :c:func:`VIDIOC_DQBUF`. The driver must keep processing and
> > >>>> returning
> > >>>> +    each buffer to the client until required metadata to configure
> > >>>> the
> > >>>> +    ``CAPTURE`` queue are found. This is indicated by the driver
> > >>>> sending
> > >>>> +    a ``V4L2_EVENT_SOURCE_CHANGE`` event with
> > >>>> +    ``V4L2_EVENT_SRC_CH_RESOLUTION`` source change type. There is no
> > >>>> +    requirement to pass enough data for this to occur in the first
> > >>>> buffer
> > >>>> +    and the driver must be able to process any number.
> > >>>> +
> > >>>> +    * If data in a buffer that triggers the event is required to
> > >>>> decode
> > >>>> +      the first frame, the driver must not return it to the client,
> > >>>> +      but must retain it for further decoding.
> > >>>> +
> > >>>> +    * If the client set width and height of ``OUTPUT`` format to 0,
> > >>>> calling
> > >>>> +      :c:func:`VIDIOC_G_FMT` on the ``CAPTURE`` queue will return
> > >>>> -EPERM,
> > >>>> +      until the driver configures ``CAPTURE`` format according to
> > >>>> stream
> > >>>> +      metadata.
> > >>>
> > >>> That's a pretty harsh handling for this condition. What's the
> > >>> rationale for returning -EPERM instead of for instance succeeding with
> > >>> width and height set to 0 ?
> > >>
> > >> I don't like it, but the error condition must stay for compatibility
> > >> reasons as that's what current drivers implement and applications
> > >> expect. (Technically current drivers would return -EINVAL, but we
> > >> concluded that existing applications don't care about the exact value,
> > >> so we can change it to make more sense.)
> > >
> > > Fair enough :-/ A bit of a shame though. Should we try to use an error
> > > code that would have less chance of being confused with an actual
> > > permission problem ? -EILSEQ could be an option for "illegal sequence" of
> > > operations, but better options could exist.
> >
> > In Request API we concluded that -EACCES is the right code to return
> > for G_EXT_CTRLS on a request that has not finished yet. The case here
> > is similar - the capture queue is not yet set up. What do you think?
>
> Good question. -EPERM is documented as "Operation not permitted", while -
> EACCES is documented as "Permission denied". The former appears to be
> understood as "This isn't a good idea, I can't let you do that", and the
> latter as "You don't have sufficient privileges, if you retry with the correct
> privileges this will succeed". Neither are a perfect match, but -EACCES might
> be better if you replace getting privileges by performing the required setup.
>

AFAIR that was also the rationale behind it for the Request API.

> > >>>> +    * If the client subscribes to ``V4L2_EVENT_SOURCE_CHANGE``
> > >>>> events and
> > >>>> +      the event is signaled, the decoding process will not continue
> > >>>> until
> > >>>> +      it is acknowledged by either (re-)starting streaming on
> > >>>> ``CAPTURE``,
> > >>>> +      or via :c:func:`VIDIOC_DECODER_CMD` with
> > >>>> ``V4L2_DEC_CMD_START``
> > >>>> +      command.
> > >>>> +
> > >>>> +    .. note::
> > >>>> +
> > >>>> +       No decoded frames are produced during this phase.
> > >>>> +
>
> [snip]
>
> > >> Also added a note:
> > >>        To fulfill those requirements, the client may attempt to use
> > >>        :c:func:`VIDIOC_CREATE_BUFS` to add more buffers. However, due to
> > >>        hardware limitations, the decoder may not support adding buffers
> > >>        at this point and the client must be able to handle a failure
> > >>        using the steps below.
> > >
> > > I wonder if there could be a way to work around those limitations on the
> > > driver side. At the beginning of step 7, the decoder is effectively
> > > stopped. If the hardware doesn't support adding new buffers on the fly,
> > > can't the driver support the VIDIOC_CREATE_BUFS + V4L2_DEC_CMD_START
> > > sequence the same way it would support the VIDIOC_STREAMOFF +
> > > VIDIOC_REQBUFS(0) +
> > > VIDIOC_REQBUFS(n) + VIDIOC_STREAMON ?
> >
> > I guess that would work. I would only allow it for the case where
> > existing buffers are already big enough and just more buffers are
> > needed. Otherwise it would lead to some weird cases, such as some old
> > buffers already in the CAPTURE queue, blocking the decode of further
> > frames. (While it could be handled by the driver returning them with
> > an error state, it would only complicate the interface.)
>
> Good point. I wonder if this could be handled in the framework. If it can't,
> or with non trivial support code on the driver side, then I would agree with
> you. Otherwise, handling the workaround in the framework would ensure
> consistent behaviour across drivers with minimal cost, and simplify the
> userspace API, so I think it would be a good thing.

I think it should be possible to handle in the framework, but right
now we don't have a framework for codecs and it would definitely be a
non-trivial piece of code.

I'd stick to the restricted behavior for now, since it's easy to lift
the restrictions in the future, but if we make it mandatory, the
userspace could start relying on it.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2018-10-22  6:26 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-24 14:06 [PATCH 0/2] Document memory-to-memory video codec interfaces Tomasz Figa
2018-07-24 14:06 ` [PATCH 1/2] media: docs-rst: Document memory-to-memory video decoder interface Tomasz Figa
2018-07-25 11:58   ` Hans Verkuil
2018-07-26 10:20     ` Tomasz Figa
2018-07-26 10:36       ` Philipp Zabel
2018-08-07  6:55         ` Tomasz Figa
2018-07-26 10:57       ` Hans Verkuil
2018-08-07  7:05         ` Tomasz Figa
2018-08-07  7:37           ` Hans Verkuil
2018-08-08  2:55             ` Tomasz Figa
2018-08-21 11:29               ` Stanimir Varbanov
2018-08-27  4:09                 ` Tomasz Figa
2018-10-15 10:13               ` Tomasz Figa
2018-10-16  1:09                 ` Nicolas Dufresne
2018-08-07  7:13       ` Hans Verkuil
2018-08-07 19:11         ` Maxime Jourdan
2018-08-08  3:07           ` Tomasz Figa
2018-08-08  7:19             ` Maxime Jourdan
2018-08-08  3:11         ` Tomasz Figa
2018-08-08  6:43           ` Hans Verkuil
2018-08-08  6:54             ` Ian Arkver
2018-09-19 10:17       ` Tomasz Figa
2018-10-08 12:22         ` Hans Verkuil
2018-10-09  4:23           ` Tomasz Figa
2018-10-09  6:39             ` Hans Verkuil
2018-07-30 12:52   ` Hans Verkuil
2018-08-07  7:08     ` Tomasz Figa
2018-08-08  2:46   ` Tomasz Figa
2018-08-20 13:04   ` Philipp Zabel
2018-08-20 13:12     ` Tomasz Figa
2018-08-20 14:13       ` Philipp Zabel
2018-08-20 14:27         ` Tomasz Figa
2018-08-20 15:33           ` Philipp Zabel
2018-08-27  4:03             ` Tomasz Figa
2018-08-31  8:26   ` Alexandre Courbot
2018-09-05  5:45     ` Tomasz Figa
2018-10-17 13:34   ` Laurent Pinchart
2018-10-18 10:03     ` Tomasz Figa
2018-10-18 11:22       ` Laurent Pinchart
2018-10-20  8:52         ` Tomasz Figa
2018-10-21  9:23           ` Laurent Pinchart
2018-10-22  6:19             ` Tomasz Figa
2018-10-20 10:24       ` Tomasz Figa
2018-10-21  9:26         ` Laurent Pinchart
2018-10-20 15:39       ` Tomasz Figa
2018-07-24 14:06 ` [PATCH 2/2] media: docs-rst: Document memory-to-memory video encoder interface Tomasz Figa
2018-07-25 13:41   ` Philipp Zabel
2018-08-07  6:07     ` Tomasz Figa
2018-07-25 13:57   ` Hans Verkuil
2018-08-07  6:54     ` Tomasz Figa
2018-08-07  7:25       ` Hans Verkuil
2018-10-16  7:36       ` Tomasz Figa
2018-10-16 13:49         ` Hans Verkuil
2018-10-22  4:50           ` Tomasz Figa
2018-09-07 20:17   ` Ezequiel Garcia
2018-09-10  3:34     ` Tomasz Figa
2018-10-17 15:19   ` Laurent Pinchart
2018-10-22  6:12     ` Tomasz Figa
2018-07-25 13:28 ` [PATCH 0/2] Document memory-to-memory video codec interfaces Philipp Zabel
2018-07-25 13:35   ` Tomasz Figa
2018-09-10  9:13 ` Hans Verkuil
2018-09-11  3:52   ` Tomasz Figa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).