* [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification
@ 2022-12-08  7:23 Alexandre Courbot
  2022-12-08 15:00 ` Cornelia Huck
                   ` (3 more replies)
  0 siblings, 4 replies; 97+ messages in thread
From: Alexandre Courbot @ 2022-12-08  7:23 UTC (permalink / raw)
  To: virtio-dev, Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Alexandre Courbot

Add the specification of the video decoder and encoder devices, which
can be used to provide host-accelerated video operations to the guest.

Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
--
Here is the long-overdue new revision of the virtio-video RFC. This
version reorganizes the specification quite a bit and tries to simplify
the protocol further. Nonetheless, it still results in a rather long (17
pages) specification for just these devices, even though the spec is not
fully complete (I want to rethink the format descriptions, and some
parameters need to be added for the encoder device).

I would like to get some high-level feedback on this version and maybe
propose to do things a bit differently before people invest too much
time reviewing this in depth. While rewriting this document, it became
more and more obvious that this is just a different, and maybe a bit
simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
am now wondering whether it would not make more sense to rewrite this
specification as just a way to transport V4L2 requests over virtio,
similarly to how virtio-fs does with the FUSE protocol [2].

At the time we started writing this implementation, the V4L2 protocols
for decoders and encoders were not set in stone yet, but now that they
are it might make sense to reconsider. Switching to this solution would
greatly shorten the virtio-video device spec, and also provide a way to
support other kinds of V4L2 devices, like cameras or image processors, at
no extra cost.

Note that doing so would not require that either the host or guest uses
V4L2 - the virtio video device would just emulate a V4L2 device over
virtio. A few adaptations would need to be done regarding how memory
types work, but otherwise I believe most of V4L2 could be used as-is.

Please share your thoughts about this, and I will either explore this
idea further with a prototype, or keep moving the present spec forward,
hopefully at a faster pace.

Due to the RFC state of this patch I have refrained from referencing the
normative statements in conformance.tex - I will do that as a final step
once the spec is mostly agreed on.

[1] https://docs.kernel.org/userspace-api/media/v4l/dev-stateless-decoder.html
[2] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-fs.tex

Full PDF:
https://drive.google.com/file/d/1HRVDiDdo50-S9X5tWgzmT90FJRHoB1dN/view?usp=sharing

PDF of video section only:
https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing
---
 content.tex      |    1 +
 virtio-video.tex | 1420 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1421 insertions(+)
 create mode 100644 virtio-video.tex

diff --git a/content.tex b/content.tex
index 9d1de53..8a1557a 100644
--- a/content.tex
+++ b/content.tex
@@ -7543,6 +7543,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
 \input{virtio-scmi.tex}
 \input{virtio-gpio.tex}
 \input{virtio-pmem.tex}
+\input{virtio-video.tex}
 
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
diff --git a/virtio-video.tex b/virtio-video.tex
new file mode 100644
index 0000000..5d44eb1
--- /dev/null
+++ b/virtio-video.tex
@@ -0,0 +1,1420 @@
+\section{Video Device}\label{sec:Device Types / Video Device}
+
+The virtio video encoder and decoder devices provide support for
+host-accelerated video encoding and decoding. Despite being different
+device types, they use the same protocol and general flow.
+
+\subsection{Device ID}\label{sec:Device Types / Video Device / Device ID}
+
+\begin{description}
+\item[30]
+encoder device
+\item[31]
+decoder device
+\end{description}
+
+\subsection{Virtqueues}\label{sec:Device Types / Video Device / Virtqueues}
+
+\begin{description}
+\item[0]
+commandq - queue for driver commands and device responses to these
+commands.
+\item[1]
+eventq - queue for events sent by the device to the driver.
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / Video Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES (0)]
+Guest pages can be used as the backing memory of resources.
+\item[VIRTIO\_VIDEO\_F\_RESOURCE\_NON\_CONTIG (1)]
+The device can use non-contiguous guest memory as the backing memory of
+resources. Only meaningful if VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES
+is also set.
+\item[VIRTIO\_VIDEO\_F\_RESOURCE\_DYNAMIC (2)]
+The device supports re-attaching memory to resources while streaming.
+\item[VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT (3)]
+Objects exported by another virtio device can be used as the backing
+memory of resources.
+\end{description}
+
+\devicenormative{\subsubsection}{Feature bits}{Device Types / Video Device / Feature bits}
+
+The device MUST present at least one of
+VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES or
+VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT, since the absence of both
+bits would mean that no memory can be used at all for resources.
+
+\drivernormative{\subsubsection}{Feature bits}{Device Types / Video Device / Feature bits}
+
+The driver MUST negotiate at least one of the
+VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES and
+VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT features.
+
+If the VIRTIO\_VIDEO\_F\_RESOURCE\_NON\_CONTIG feature has not been
+negotiated, the driver MUST use physically contiguous memory for all the
+buffers it allocates.
+
+\subsection{Device configuration layout}\label{sec:Device Types / Video Device / Device configuration layout}
+
+Video device configuration uses the following layout:
+
+\begin{lstlisting}
+struct virtio_video_config {
+        le32 version;
+        le32 caps_length;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{version}]
+is the protocol version that the device understands.
+\item[\field{caps_length}]
+is the minimum length in bytes that a device-writable buffer must have
+in order to receive the response to
+VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
+\end{description}
+
+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Video Device / Device configuration layout}
+
+As there is currently only one version of the protocol, the device MUST
+set the \field{version} field to 0.
+
+The device MUST set the \field{caps_length} field to a value equal to
+the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
+
+\subsection{Device Initialization}\label{sec:Device Types / Video Device / Device Initialization}
+
+\begin{enumerate}
+\def\labelenumi{\arabic{enumi}.}
+\item
+  The driver reads the feature bits and negotiates the features it
+  needs.
+\item
+  The driver sets up the commandq.
+\item
+  The driver confirms that it supports the version of the protocol
+  advertised in the \field{version} field of the configuration space.
+\item
+  The driver reads the \field{caps_length} field of the configuration
+  space and prepares a buffer of at least that size.
+\item
+  The driver sends that buffer on the commandq with the
+  VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command.
+\item
+  The driver receives the response from the device, and parses its
+  capabilities.
+\end{enumerate}
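+
+The following non-normative example sketches steps 3 to 6 from the
+driver's point of view. \field{virtio_video_read_config()} and
+\field{commandq_exec()} are hypothetical helpers standing in for the
+guest's virtio transport, the command and response layouts are the ones
+defined later in this section, and endianness conversion is omitted for
+brevity.
+
+\begin{lstlisting}
+#include <stdint.h>
+#include <stdlib.h>
+
+struct virtio_video_config { uint32_t version; uint32_t caps_length; };
+
+/* Hypothetical transport helpers provided by the guest OS. */
+extern void virtio_video_read_config(struct virtio_video_config *cfg);
+extern int commandq_exec(const void *cmd, size_t cmd_len,
+                         void *resp, size_t resp_len);
+
+/* Returns a buffer holding the QUERY_CAPS response, or NULL on error. */
+static void *query_capabilities(size_t *resp_len)
+{
+        struct virtio_video_config cfg;
+        uint32_t cmd = 0x100; /* VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS */
+        void *resp;
+
+        virtio_video_read_config(&cfg);
+        if (cfg.version != 0) /* only protocol version 0 is defined */
+                return NULL;
+
+        /* caps_length is the minimum size of the device-writable buffer. */
+        resp = malloc(cfg.caps_length);
+        if (!resp)
+                return NULL;
+        if (commandq_exec(&cmd, sizeof(cmd), resp, cfg.caps_length)) {
+                free(resp);
+                return NULL;
+        }
+        *resp_len = cfg.caps_length;
+        return resp;
+}
+\end{lstlisting}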
+
+\subsection{Device Operation}\label{sec:Device Types / Video Device / Device Operation}
+
+The commandq is used by the driver to send commands to the device and to
+receive the device's response via used buffers.
+
+The driver can create new streams using the
+VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE command. Each stream has two resource
+queues (not to be confused with the virtio queues) called INPUT and
+OUTPUT. The INPUT queue accepts driver-filled input data for the device
+(bitstream data for a decoder; input frames for an encoder), while the
+OUTPUT queue receives resources to be filled by the device as a result
+of processing the INPUT queue (decoded frames for a decoder; encoded
+bitstream data for an encoder).
+
+A resource is a set of memory buffers that contain a unit of data that
+the device can process or produce. Most resources will only have one
+buffer (like bitstreams and single-planar images), but frames using a
+multi-planar format will have several.
+
+Before resources can be submitted to a queue, backing memory must be
+attached to them using VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING.
+The exact form of that memory is negotiated using the feature flags.
+
+The INPUT and OUTPUT queues are effectively independent, and the driver
+can fill them without caring about the other queue. In particular there
+is no need to queue input and output resources in pairs: one input
+resource can result in zero to many output resources being produced.
+
+Resources are queued to the INPUT or OUTPUT queue using the
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command. The device replies to this
+command when an input resource has been fully processed and can be
+reused by the driver, or when an output resource has been filled by the
+device as a result of processing.
+
+Parameters of the stream can be obtained and configured using
+VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM and
+VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM. Available parameters depend on
+the device type and are detailed in section
+\ref{sec:Device Types / Video Device / Parameters}.
+
+The device may detect stream-related events that require intervention
+from the driver and signal them on the eventq. One example is a dynamic
+resolution change while decoding a stream, which may require the driver
+to reallocate the backing memory of its output resources to fit the new
+resolution.
+
+\drivernormative{\subsubsection}{Device Operation}{Device Types / Video Device / Device Operation}
+
+Descriptor chains sent to the commandq by the driver MUST include at
+least one device-writable descriptor of a size sufficient to receive the
+response to the queued command.
+
+\devicenormative{\subsubsection}{Device Operation}{Device Types / Video Device / Device Operation}
+
+Responses to a command MUST be written by the device in the first
+device-writable descriptor of the descriptor chain from which the
+command came.
+
+\subsubsection{Device Operation: Command Virtqueue}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
+
+This section details the commands that can be sent on the commandq by
+the driver, as well as the responses that the device will write.
+
+Different structures are used for each command and response. A command
+structure starts with the requested command code, defined as follows:
+
+\begin{lstlisting}
+/* Device */
+#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
+
+/* Stream */
+#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
+#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201
+#define VIRTIO_VIDEO_CMD_STREAM_DRAIN            0x203
+#define VIRTIO_VIDEO_CMD_STREAM_STOP             0x204
+#define VIRTIO_VIDEO_CMD_STREAM_GET_PARAM        0x205
+#define VIRTIO_VIDEO_CMD_STREAM_SET_PARAM        0x206
+
+/* Resource */
+#define VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING 0x400
+#define VIRTIO_VIDEO_CMD_RESOURCE_QUEUE          0x401
+\end{lstlisting}
+
+A response structure starts with the result of the requested command,
+defined as follows:
+
+\begin{lstlisting}
+/* Success */
+#define VIRTIO_VIDEO_RESULT_OK                          0x000
+
+/* Error */
+#define VIRTIO_VIDEO_RESULT_ERR_INVALID_COMMAND         0x100
+#define VIRTIO_VIDEO_RESULT_ERR_INVALID_OPERATION       0x101
+#define VIRTIO_VIDEO_RESULT_ERR_INVALID_STREAM_ID       0x102
+#define VIRTIO_VIDEO_RESULT_ERR_INVALID_RESOURCE_ID     0x103
+#define VIRTIO_VIDEO_RESULT_ERR_INVALID_ARGUMENT        0x104
+#define VIRTIO_VIDEO_RESULT_ERR_CANCELED                0x105
+#define VIRTIO_VIDEO_RESULT_ERR_OUT_OF_MEMORY           0x106
+\end{lstlisting}
+
+For response structures carrying an error code, the rest of the
+structure is considered invalid. Only response structures carrying
+VIRTIO\_VIDEO\_RESULT\_OK shall be examined further by the driver.
+
+\devicenormative{\paragraph}{Device Operation: Command Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
+
+The device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND in
+response to any command carrying an unknown command code.
+
+\drivernormative{\paragraph}{Device Operation: Command Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
+
+The driver MUST NOT interpret the rest of a response whose result is
+not VIRTIO\_VIDEO\_RESULT\_OK.
+
+\subsubsection{Device Operation: Device Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands}
+
+Device capabilities are retrieved using the
+VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command, which returns arrays of
+formats supported by the input and output queues.
+
+\paragraph{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
+
+Retrieve device capabilities.
+
+The driver sends this command with
+\field{struct virtio_video_device_query_caps}:
+
+\begin{lstlisting}
+struct virtio_video_device_query_caps {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS */
+};
+\end{lstlisting}
+
+The device responds with
+\field{struct virtio_video_device_query_caps_resp}:
+
+\begin{lstlisting}
+struct virtio_video_device_query_caps_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+        le32 num_bitstream_formats;
+        le32 num_image_formats;
+        /**
+         * Followed by
+         * struct virtio_video_bitstream_format_desc bitstream_formats[num_bitstream_formats];
+         */
+        /**
+         * Followed by
+         * struct virtio_video_image_format_desc image_formats[num_image_formats]
+         */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
+if the device-writable buffer was smaller than the \field{caps_length}
+value defined in the video device configuration.
+\end{description}
+\item[\field{num_bitstream_formats}]
+is the number of supported bitstream formats.
+\item[\field{num_image_formats}]
+is the number of supported image formats.
+\item[\field{bitstream_formats}]
+is an array of size \field{num_bitstream_formats} containing the
+supported encoded formats. These correspond to the formats that can be
+set on the INPUT queue for a decoder, and on the OUTPUT queue for an
+encoder. For a description of bitstream formats, see
+\ref{sec:Device Types / Video Device / Supported formats / Bitstream formats}.
+\item[\field{image_formats}]
+is an array of size \field{num_image_formats} containing the supported
+image formats. These correspond to the formats that can be set on the
+OUTPUT queue for a decoder, and on the INPUT queue for an encoder. For a
+description of image formats, see
+\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
+\end{description}
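+
+As a non-normative illustration, the fixed header of the response can be
+parsed as follows. Only the counters are interpreted here; the format
+descriptor arrays that follow are left to further parsing, and
+endianness conversion is omitted.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_RESULT_OK 0x000
+
+/* Fixed part of struct virtio_video_device_query_caps_resp. */
+struct virtio_video_device_query_caps_resp_hdr {
+        uint32_t result;
+        uint32_t num_bitstream_formats;
+        uint32_t num_image_formats;
+        /* followed by the bitstream format descriptors,
+           then the image format descriptors */
+};
+
+static int parse_caps_header(const void *buf,
+                             uint32_t *n_bitstream, uint32_t *n_image)
+{
+        const struct virtio_video_device_query_caps_resp_hdr *resp = buf;
+
+        /* On error, the rest of the response is invalid and is ignored. */
+        if (resp->result != VIRTIO_VIDEO_RESULT_OK)
+                return -1;
+
+        *n_bitstream = resp->num_bitstream_formats;
+        *n_image = resp->num_image_formats;
+        return 0;
+}
+\end{lstlisting}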
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}{Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS
+by the driver.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}{Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
+
+The device MUST write the two \field{bitstream_formats} and
+\field{image_formats} arrays, of length \field{num_bitstream_formats}
+and \field{num_image_formats}, respectively.
+
+The total size of the response MUST be equal to \field{caps_length}
+bytes, as reported by the device configuration.
+
+\subsubsection{Device Operation: Stream commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands}
+
+Stream commands allow the creation, destruction, and flow control of a
+stream.
+
+\paragraph{VIRTIO_VIDEO_CMD_STREAM_CREATE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
+
+Create a new stream using the device.
+
+The driver sends this command with
+\field{struct virtio_video_stream_create}:
+
+\begin{lstlisting}
+struct virtio_video_stream_create {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_CREATE */
+};
+\end{lstlisting}
+
+The device responds with \field{struct virtio_video_stream_create_resp}:
+
+\begin{lstlisting}
+struct virtio_video_stream_create_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+        le32 stream_id;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
+if the limit of simultaneous streams has been reached by the device and
+no more can be created.
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND]
+if the stream cannot be created due to an unexpected device issue.
+\end{description}
+\item[\field{stream_id}]
+is the ID of the created stream allocated by the device.
+\end{description}
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_CREATE}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE by
+the driver.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_CREATE}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
+
+\field{stream_id} MUST be set to an identifier that is unique to that
+stream for as long as it lives.
+
+\paragraph{VIRTIO_VIDEO_CMD_STREAM_DESTROY}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
+
+Destroy a video stream and all its resources. Any activity on the stream
+is halted and all resources released by the time the response is
+received by the driver.
+
+The driver sends this command with
+\field{struct virtio_video_stream_destroy}:
+
+\begin{lstlisting}
+struct virtio_video_stream_destroy {
+         le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_DESTROY */
+         le32 stream_id;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of the stream to be destroyed, as previously returned by
+VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+\end{description}
+
+The device responds with
+\field{struct virtio_video_stream_destroy_resp}:
+
+\begin{lstlisting}
+struct virtio_video_stream_destroy_resp {
+         le32 result; /* VIRTIO_VIDEO_RESULT_* */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the requested stream ID does not exist.
+\end{description}
+\end{description}
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DESTROY}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY by
+the driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+The driver MUST stop using \field{stream_id} as a valid stream ID after
+it has received the response to this command.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DESTROY}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
+
+Before the device sends the response to this command, it MUST respond
+with VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED to all commands pending on
+this stream.
+
+After responding to this command, the device MUST reply with
+VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID to any command related
+to this stream.
+
+\paragraph{VIRTIO_VIDEO_CMD_STREAM_DRAIN}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
+
+Complete processing of all currently queued input resources.
+
+VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN ensures that all sent
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue are
+processed by the device and that the resulting output resources are
+available to the driver.
+
+The driver sends this command with
+\field{struct virtio_video_stream_drain}:
+
+\begin{lstlisting}
+struct virtio_video_stream_drain {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_DRAIN */
+        le32 stream_id;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of the stream to drain, as previously returned by
+VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+\end{description}
+
+The device responds with \field{struct virtio_video_stream_drain_resp}
+once the drain operation is completed:
+
+\begin{lstlisting}
+struct virtio_video_stream_drain_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the requested stream does not exist,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
+if a drain operation is already in progress for this stream,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED]
+if the operation has been canceled by a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
+or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation.
+\end{description}
+\end{description}
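+
+The following non-normative sketch shows a drain sequence.
+\field{commandq_submit()}, \field{requeue_free_output_resources()} and
+\field{process_resource_queue_responses()} are hypothetical
+driver-internal helpers, and the busy-wait stands in for whatever
+completion mechanism the driver actually uses.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_CMD_STREAM_DRAIN 0x203
+#define RESULT_PENDING 0xffffffffu /* driver-internal sentinel */
+
+struct virtio_video_stream_drain { uint32_t cmd_type; uint32_t stream_id; };
+
+/* Hypothetical helpers: post a command without waiting, keep the OUTPUT
+   queue supplied, and handle completed RESOURCE_QUEUE responses. */
+extern void commandq_submit(const void *cmd, uint32_t len,
+                            volatile uint32_t *result);
+extern void requeue_free_output_resources(uint32_t stream_id);
+extern void process_resource_queue_responses(uint32_t stream_id);
+
+static uint32_t drain_stream(uint32_t stream_id)
+{
+        struct virtio_video_stream_drain cmd = {
+                .cmd_type = VIRTIO_VIDEO_CMD_STREAM_DRAIN,
+                .stream_id = stream_id,
+        };
+        volatile uint32_t result = RESULT_PENDING;
+
+        commandq_submit(&cmd, sizeof(cmd), &result);
+
+        /* The response arrives only once all queued input has been
+           processed, so keep feeding output resources in the meantime to
+           avoid stalling the device. */
+        while (result == RESULT_PENDING) {
+                requeue_free_output_resources(stream_id);
+                process_resource_queue_responses(stream_id);
+        }
+        /* May be ERR_CANCELED if a STOP or DESTROY interrupted the drain. */
+        return result;
+}
+\end{lstlisting}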
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN by the
+driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+The driver MUST keep queueing output resources until it gets the
+response to this command. Failure to do so may result in the device
+stalling as it waits for output resources to write into.
+
+The driver MUST account for the fact that the response to this command
+might come out-of-order (i.e.~after other commands sent to the device),
+and that it can be interrupted.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
+
+Before the device sends the response, it MUST process and respond to all
+the VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue that
+were sent before the drain command, and make all the corresponding
+output resources available to the driver by responding to their
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command.
+
+While a drain operation is in progress, the device MUST return
+VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to any new
+VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN command for the same stream.
+
+If the command is interrupted due to a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
+or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation, the device MUST
+respond with VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
+
+\paragraph{VIRTIO_VIDEO_CMD_STREAM_STOP}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
+
+Immediately return all queued input resources without processing them
+and stop operation until new input resources are queued.
+
+This command is mostly useful for decoders that need to quickly jump
+from one point of the stream to another (i.e.~seeking), or in order to
+stop processing as quickly as possible.
+
+The driver sends this command with
+\field{struct virtio_video_stream_stop}:
+
+\begin{lstlisting}
+struct virtio_video_stream_stop {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_STOP */
+        le32 stream_id;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of the stream to stop, as previously returned by
+VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+\end{description}
+
+The device responds with \field{struct virtio_video_stream_stop_resp}
+after the responses for all previously queued input resources have been
+sent:
+
+\begin{lstlisting}
+struct virtio_video_stream_stop_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the requested stream does not exist,
+\end{description}
+\end{description}
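+
+As a non-normative illustration, a decoder driver could implement a seek
+on top of this command as sketched below; the helper functions are
+hypothetical.
+
+\begin{lstlisting}
+#include <stddef.h>
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_CMD_STREAM_STOP 0x204
+#define VIRTIO_VIDEO_RESULT_OK       0x000
+
+struct virtio_video_stream_stop { uint32_t cmd_type; uint32_t stream_id; };
+
+/* Hypothetical helpers for command submission and stream bookkeeping. */
+extern int commandq_exec(const void *cmd, size_t cmd_len,
+                         void *resp, size_t resp_len);
+extern void drop_pending_output(uint32_t stream_id);
+extern void queue_input_from(uint32_t stream_id, uint64_t bitstream_offset);
+
+static int seek_to(uint32_t stream_id, uint64_t bitstream_offset)
+{
+        struct virtio_video_stream_stop cmd = {
+                .cmd_type = VIRTIO_VIDEO_CMD_STREAM_STOP,
+                .stream_id = stream_id,
+        };
+        uint32_t result;
+
+        /* All queued input resources come back as ERR_CANCELED before the
+           response to this command is written. */
+        if (commandq_exec(&cmd, sizeof(cmd), &result, sizeof(result)) ||
+            result != VIRTIO_VIDEO_RESULT_OK)
+                return -1;
+
+        /* Discard output produced before the stop point, then resume from
+           the new position; the device may skip data until it finds a
+           suitable resumption point such as a keyframe. */
+        drop_pending_output(stream_id);
+        queue_input_from(stream_id, bitstream_offset);
+        return 0;
+}
+\end{lstlisting}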
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_STOP}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_STOP by the
+driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+Upon receiving the response to this command, the driver SHOULD process
+(or drop) any output resource before resuming operation by queueing new
+input resources.
+
+Upon receiving the response to this command, the driver MAY modify the
+\field{struct virtio_video_params_resources} parameter corresponding to
+the INPUT queue, and subsequently attach new backing memory to the input
+resources using the VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING
+command.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_STOP}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
+
+The device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED to any
+pending VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN and
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command on the INPUT queue before
+responding to this command. Pending commands on the output queue are not
+affected.
+
+The device MUST interrupt operation as quickly as possible, and not be
+dependent on output resources being queued by the driver.
+
+Upon resuming processing, the device MAY skip input data until it finds
+a point that allows it to resume operation properly (e.g.~until a
+keyframe is found in the input stream of a decoder).
+
+\paragraph{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
+
+Read the value of a parameter of the given stream. Available parameters
+depend on the device type and are listed in
+\ref{sec:Device Types / Video Device / Parameters}.
+
+The driver sends this command with
+\field{struct virtio_video_stream_get_param}:
+
+\begin{lstlisting}
+struct virtio_video_stream_get_param {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_GET_PARAM */
+        le32 stream_id;
+        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
+        u8 padding[4];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of the stream to read a parameter from.
+\item[\field{param_type}]
+is one of the VIRTIO\_VIDEO\_PARAMS\_* values, indicating the parameter
+to read.
+\end{description}
+
+The device responds with \field{struct virtio_video_stream_param_resp}:
+
+\begin{lstlisting}
+struct virtio_video_stream_param_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+        union virtio_video_stream_params param;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the requested stream does not exist,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
+if the \field{param_type} argument is invalid for the device,
+\end{description}
+\item[\field{param}]
+is the value of the requested parameter, if \field{result} is
+VIRTIO\_VIDEO\_RESULT\_OK.
+\end{description}
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM
+by the driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+\field{param_type} MUST be set to a parameter type that is valid for the
+device.
+
+\paragraph{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
+
+Write the value of a parameter of the given stream, and return the value
+actually set by the device. Available parameters depend on the device
+type and are listed in
+\ref{sec:Device Types / Video Device / Parameters}.
+
+The driver sends this command with
+\field{struct virtio_video_stream_set_param}:
+
+\begin{lstlisting}
+struct virtio_video_stream_set_param {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_SET_PARAM */
+        le32 stream_id;
+        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
+        u8 padding[4];
+        union virtio_video_stream_params param;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of the stream to set a parameter for.
+\item[\field{param_type}]
+is one of the VIRTIO\_VIDEO\_PARAMS\_* values, indicating the parameter
+to set.
+\end{description}
+
+The device responds with \field{struct virtio_video_stream_param_resp}:
+
+\begin{lstlisting}
+struct virtio_video_stream_param_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+        union virtio_video_stream_params param;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the requested stream does not exist,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
+if the \field{param_type} argument is invalid for the device,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
+if the requested parameter cannot be modified at this moment.
+\end{description}
+\item[\field{param}]
+is the actual value of the parameter set by the device, if
+\field{result} is VIRTIO\_VIDEO\_RESULT\_OK. The value set by the device
+may differ from the requested value depending on the device's
+capabilities.
+\end{description}
+
+Outside of the error cases described above, setting a parameter does not
+fail. If the device cannot apply the parameter as requested, it will
+adjust it to the closest setting it supports, and return that value to
+the driver. It is then up to the driver to decide whether it can work
+within the range of parameters supported by the device.
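+
+The following non-normative sketch shows the resulting
+get/modify/set/check pattern, using the encoder bitrate parameter
+(defined later in this section) as an example.
+\field{stream_get_param()} and \field{stream_set_param()} are
+hypothetical wrappers around the two commands that copy the parameter
+value carried by the response back into their argument.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE 0x203
+
+struct virtio_video_params_bitrate {
+        uint32_t min_bitrate;
+        uint32_t max_bitrate;
+        uint32_t bitrate;
+        uint8_t padding[4];
+};
+
+/* Hypothetical wrappers around STREAM_GET_PARAM / STREAM_SET_PARAM that
+   copy the parameter returned in the response back into *param. */
+extern int stream_get_param(uint32_t stream_id, uint32_t param_type, void *param);
+extern int stream_set_param(uint32_t stream_id, uint32_t param_type, void *param);
+
+static int request_bitrate(uint32_t stream_id, uint32_t wanted, uint32_t *actual)
+{
+        struct virtio_video_params_bitrate br;
+
+        if (stream_get_param(stream_id, VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE, &br))
+                return -1;
+
+        br.bitrate = wanted;
+        /* The device does not fail on an unsupported value: it adjusts the
+           parameter to the closest supported setting and returns it. */
+        if (stream_set_param(stream_id, VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE, &br))
+                return -1;
+
+        *actual = br.bitrate; /* may differ from the requested value */
+        return 0;
+}
+\end{lstlisting}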
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM
+by the driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+\field{param_type} MUST be set to a parameter type that is valid for the
+device, and \field{param} MUST be filled as the union member
+corresponding to \field{param_type}.
+
+The driver MUST check the actual value of the parameter as set by the
+device and work with this value, or fail properly if it cannot.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
+
+The device MUST NOT return an error if the value requested by the driver
+cannot be applied as-is. Instead, the device MUST set the parameter to
+the closest supported value to the one requested by the driver.
+
+\subsubsection{Device Operation: Resource Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands}
+
+Resource commands manage the memory backing of individual resources and
+allow the driver to queue them for processing by the device.
+
+\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
+
+Assign backing memory to a resource.
+
+The driver sends this command with
+\field{struct virtio_video_resource_attach_backing}:
+
+\begin{lstlisting}
+#define VIRTIO_VIDEO_QUEUE_TYPE_INPUT       0
+#define VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT      1
+
+struct virtio_video_resource_attach_backing {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING */
+        le32 stream_id;
+        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
+        le32 resource_id;
+        union virtio_video_resource resources[];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of a valid stream.
+\item[\field{queue_type}]
+is the direction of the queue.
+\item[\field{resource_id}]
+is the ID of the resource to attach the backing memory to.
+\item[\field{resources}]
+specifies memory regions to attach.
+\end{description}
+
+The union \field{virtio_video_resource} is defined as follows:
+
+\begin{lstlisting}
+union virtio_video_resource {
+        struct virtio_video_resource_sg_list sg_list;
+        struct virtio_video_resource_object object;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{sg_list}]
+represents a scatter-gather list. This variant can be used when the
+\field{mem_type} member of the \field{virtio_video_params_resources}
+corresponding to the queue is set to
+VIRTIO\_VIDEO\_MEM\_TYPE\_GUEST\_PAGES (see
+\ref{sec:Device Types / Video Device / Parameters / Common parameters}).
+\item[\field{object}]
+represents an object exported from another virtio device. This variant
+can be used when the \field{mem_type} member of the
+\field{virtio_video_params_resources} corresponding to the queue is set
+to VIRTIO\_VIDEO\_MEM\_TYPE\_VIRTIO\_OBJECT (see
+\ref{sec:Device Types / Video Device / Parameters / Common parameters}).
+\end{description}
+
+The struct \field{virtio_video_resource_sg_list} is defined as follows:
+
+\begin{lstlisting}
+struct virtio_video_resource_sg_entry {
+        le64 addr;
+        le32 length;
+        u8 padding[4];
+};
+
+struct virtio_video_resource_sg_list {
+        le32 num_entries;
+        u8 padding[4];
+        /* Followed by num_entries instances of
+           virtio_video_resource_sg_entry */
+};
+\end{lstlisting}
+
+Within \field{struct virtio_video_resource_sg_entry}:
+
+\begin{description}
+\item[\field{addr}]
+is the guest physical address of the start of the SG entry.
+\item[\field{length}]
+is the length of the SG entry.
+\end{description}
+
+Finally, for \field{struct virtio_video_resource_sg_list}:
+
+\begin{description}
+\item[\field{num_entries}]
+is the number of \field{struct virtio_video_resource_sg_entry} instances
+that follow.
+\end{description}
+
+\field{struct virtio_video_resource_object} is defined as follows:
+
+\begin{lstlisting}
+struct virtio_video_resource_object {
+        u8 uuid[16];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{uuid}]
+is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
+\end{description}
+
+The device responds with
+\field{struct virtio_video_resource_attach_backing_resp}:
+
+\begin{lstlisting}
+struct virtio_video_resource_attach_backing_resp {
+        le32 result; /* VIRTIO_VIDEO_RESULT_* */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the mentioned stream does not exist,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
+if \field{queue_type}, \field{resource_id}, or \field{resources} have an
+invalid value,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
+if the command is issued at a time when it is not allowed.
+\end{description}
+\end{description}
+
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
+the following times:
+
+\begin{itemize}
+\item
+  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
+  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
+  resource,
+\item
+  AFTER successfully changing the \field{virtio_video_params_resources}
+  parameter corresponding to the queue and BEFORE
+  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
+\end{itemize}
+
+This restriction ensures that the device can rely on a given resource
+always pointing to the same memory for as long as the device may use it.
+For instance, a decoder may use returned decoded frames as references
+for future frames, and will not overwrite the backing memory of a frame
+that is still being referenced. Only before a stream has started, or
+after a Dynamic Resolution Change event has occurred, can all resources
+be guaranteed to be unused in that way.
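+
+The following non-normative sketch builds an attach command for a
+single-buffer resource backed by guest pages. It assumes that the
+variable-length \field{resources} array is serialized as a plain
+concatenation of the sg list header and its entries;
+\field{commandq_exec()} is a hypothetical transport helper and
+endianness conversion is omitted.
+
+\begin{lstlisting}
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING 0x400
+#define VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT           1
+
+struct virtio_video_resource_sg_entry {
+        uint64_t addr;
+        uint32_t length;
+        uint8_t padding[4];
+};
+struct virtio_video_resource_sg_list {
+        uint32_t num_entries;
+        uint8_t padding[4];
+};
+struct attach_backing_hdr {
+        uint32_t cmd_type, stream_id, queue_type, resource_id;
+};
+
+extern int commandq_exec(const void *cmd, size_t cmd_len,
+                         void *resp, size_t resp_len);
+
+static int attach_guest_pages(uint32_t stream_id, uint32_t resource_id,
+                              const struct virtio_video_resource_sg_entry *sg,
+                              uint32_t num_entries)
+{
+        struct attach_backing_hdr hdr = {
+                .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING,
+                .stream_id = stream_id,
+                .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT,
+                .resource_id = resource_id,
+        };
+        struct virtio_video_resource_sg_list list = { .num_entries = num_entries };
+        size_t len = sizeof(hdr) + sizeof(list) + num_entries * sizeof(*sg);
+        uint8_t *cmd = malloc(len), *p = cmd;
+        uint32_t result;
+        int err;
+
+        if (!cmd)
+                return -1;
+        memcpy(p, &hdr, sizeof(hdr));   p += sizeof(hdr);
+        memcpy(p, &list, sizeof(list)); p += sizeof(list);
+        memcpy(p, sg, num_entries * sizeof(*sg)); /* entries follow the list */
+
+        err = commandq_exec(cmd, len, &result, sizeof(result));
+        free(cmd);
+        return err ? err : (int)result; /* 0 is VIRTIO_VIDEO_RESULT_OK */
+}
+\end{lstlisting}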
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
+
+\field{cmd_type} MUST be set to
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING by the driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+\field{queue_type} MUST be set to a valid queue type.
+
+\field{resource_id} MUST be an integer less than the number of
+resources currently allocated for the queue.
+
+The length of the \field{resources} array of
+\field{struct virtio_video_resource_attach_backing} MUST be equal to the
+number of resources required by the format currently set on the queue,
+as described in
+\ref{sec:Device Types / Video Device / Supported formats}.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
+
+If this command is issued at any time other than those listed above,
+the device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION.
+
+\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
+
+Add a resource to the device's queue.
+
+\begin{lstlisting}
+#define VIRTIO_VIDEO_MAX_PLANES                    8
+
+#define VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME  (1 << 0)
+
+struct virtio_video_resource_queue {
+        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_QUEUE */
+        le32 stream_id;
+        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
+        le32 resource_id;
+        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
+        u8 padding[4];
+        le64 timestamp;
+        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{stream_id}]
+is the ID of a valid stream.
+\item[\field{queue_type}]
+is the direction of the queue.
+\item[\field{resource_id}]
+is the ID of the resource to be queued.
+\item[\field{flags}]
+is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
+
+\begin{description}
+\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
+The submitted frame is to be encoded as a key frame. Only valid for the
+encoder's INPUT queue.
+\end{description}
+\item[\field{timestamp}]
+is an abstract sequence counter that can be used on the INPUT queue for
+synchronization. Resources produced on the output queue will carry the
+\field{timestamp} of the input resource they have been produced from.
+\item[\field{data_sizes}]
+is the number of data bytes used for each plane. Set by the driver for
+input resources.
+\end{description}
+
+The device responds with
+\field{struct virtio_video_resource_queue_resp}:
+
+\begin{lstlisting}
+#define VIRTIO_VIDEO_DEQUEUE_FLAG_ERR           (1 << 0)
+/* Encoder only */
+#define VIRTIO_VIDEO_DEQUEUE_FLAG_KEY_FRAME     (1 << 1)
+#define VIRTIO_VIDEO_DEQUEUE_FLAG_P_FRAME       (1 << 2)
+#define VIRTIO_VIDEO_DEQUEUE_FLAG_B_FRAME       (1 << 3)
+
+struct virtio_video_resource_queue_resp {
+        le32 result;
+        le32 flags;
+        le64 timestamp;
+        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{result}]
+is
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_RESULT\_OK]
+if the operation succeeded,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
+if the requested stream does not exist,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
+if the \field{queue_type}, \field{resource_id} or \field{flags}
+parameters have an invalid value,
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
+if VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING has not been
+successfully called on the resource prior to queueing it.
+\item[VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED]
+if the resource has not been processed, not because of an error but
+because of a change in the state of the codec. The driver is expected to
+take action and address the condition before submitting the resource
+again.
+\end{description}
+\item[\field{flags}]
+is a bitmask of VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_* flags.
+
+\begin{description}
+\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR]
+is set on output resources when a non-fatal processing error has
+happened and the data contained by the resource is likely to be
+corrupted,
+\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_KEY\_FRAME]
+is set on output resources when the resource contains an encoded key
+frame (only for encoders).
+\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_P\_FRAME]
+is set on output resources when the resource contains only differences
+to a previous key frame (only for encoders).
+\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_B\_FRAME]
+is set on output resources when the resource contains only the
+differences between the current frame and both the preceding and
+following key frames (only for encoders).
+\end{description}
+\item[\field{timestamp}]
+is set on output resources to the \field{timestamp} value of the input
+resource that produced the resource.
+\item[\field{data_sizes}]
+is set on output resources to the amount of data written by the device,
+for each plane.
+\end{description}
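+
+A non-normative sketch of queueing a decoder input resource follows.
+\field{commandq_submit()} is a hypothetical asynchronous helper: the
+device writes the response once it is done with the resource, and output
+resources produced from this input will carry the same
+\field{timestamp} value.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_CMD_RESOURCE_QUEUE 0x401
+#define VIRTIO_VIDEO_QUEUE_TYPE_INPUT   0
+#define VIRTIO_VIDEO_MAX_PLANES         8
+
+struct virtio_video_resource_queue {
+        uint32_t cmd_type, stream_id, queue_type, resource_id, flags;
+        uint8_t padding[4];
+        uint64_t timestamp;
+        uint32_t data_sizes[VIRTIO_VIDEO_MAX_PLANES];
+};
+struct virtio_video_resource_queue_resp {
+        uint32_t result, flags;
+        uint64_t timestamp;
+        uint32_t data_sizes[VIRTIO_VIDEO_MAX_PLANES];
+};
+
+/* Hypothetical helper: posts the command and returns immediately; the
+   device fills *resp when the resource has been fully processed. */
+extern void commandq_submit(const void *cmd, uint32_t cmd_len,
+                            void *resp, uint32_t resp_len);
+
+static void queue_bitstream_chunk(uint32_t stream_id, uint32_t resource_id,
+                                  uint32_t bytes_used, uint64_t sequence,
+                                  struct virtio_video_resource_queue *cmd,
+                                  struct virtio_video_resource_queue_resp *resp)
+{
+        *cmd = (struct virtio_video_resource_queue){
+                .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
+                .stream_id = stream_id,
+                .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
+                .resource_id = resource_id,
+                /* Opaque cookie: output resources decoded from this input
+                   carry the same timestamp back to the driver. */
+                .timestamp = sequence,
+                .data_sizes = { bytes_used }, /* single-plane bitstream */
+        };
+        /* Do not reuse the resource's backing memory before *resp arrives. */
+        commandq_submit(cmd, sizeof(*cmd), resp, sizeof(*resp));
+}
+\end{lstlisting}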
+
+\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
+
+\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE by
+the driver.
+
+\field{stream_id} MUST be set to a valid stream ID previously returned
+by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
+
+\field{queue_type} MUST be set to a valid queue type.
+
+\field{resource_id} MUST be an integer less than the number of
+resources currently allocated for the queue.
+
+The driver MUST account for the fact that the response to this command
+might come out-of-order (i.e.~after other commands sent to the device),
+and that it can be interrupted.
+
+\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
+
+The device MUST mark output resources that might contain corrupted
+content due to an error with the VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR flag.
+
+For output resources, the device MUST copy the \field{timestamp}
+parameter of the input resource that produced it into its response.
+
+For an encoder, the device MUST mark each output resource with one
+of VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_KEY\_FRAME,
+VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_P\_FRAME, or
+VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_B\_FRAME.
+
+If the processing of a resource was stopped due to a stream event, a
+VIRTIO\_VIDEO\_CMD\_STREAM\_STOP, or a
+VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY, the device MUST set \field{result}
+to VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
+
+\subsubsection{Device Operation: Event Virtqueue}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue}
+
+The eventq is used by the device to signal stream events that are not a
+direct result of a command queued by the driver on the commandq. Since
+these events affect the device operation, the driver is expected to
+react to them and resume streaming afterwards.
+
+There are currently two supported events: device error, and Dynamic
+Resolution Change.
+
+\begin{lstlisting}
+#define VIRTIO_VIDEO_EVENT_ERROR    1
+#define VIRTIO_VIDEO_EVENT_DRC      2
+
+union virtio_video_event {
+        le32 event_type; /* One of VIRTIO_VIDEO_EVENT_* */
+        struct virtio_video_event_err err;
+        struct virtio_video_event_drc drc;
+};
+\end{lstlisting}
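+
+The following non-normative sketch shows one way for a driver to keep
+the eventq populated and dispatch incoming events. The helpers are
+hypothetical, and the event structure is reduced to its type field since
+the per-event payloads are not described here.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_EVENT_ERROR 1
+#define VIRTIO_VIDEO_EVENT_DRC   2
+
+union virtio_video_event {
+        uint32_t event_type;
+        /* the virtio_video_event_err / _drc variants are omitted here */
+};
+
+/* Hypothetical helpers: post a device-writable buffer on the eventq and
+   wait for the device to return one. */
+extern void eventq_add_buffer(union virtio_video_event *ev);
+extern union virtio_video_event *eventq_wait(void);
+extern void handle_error_event(const union virtio_video_event *ev);
+extern void handle_drc_event(const union virtio_video_event *ev);
+
+static void event_loop(union virtio_video_event bufs[], int num_bufs)
+{
+        /* Keep buffers available to the device at all times. */
+        for (int i = 0; i < num_bufs; i++)
+                eventq_add_buffer(&bufs[i]);
+
+        for (;;) {
+                union virtio_video_event *ev = eventq_wait();
+
+                switch (ev->event_type) {
+                case VIRTIO_VIDEO_EVENT_ERROR:
+                        handle_error_event(ev);
+                        break;
+                case VIRTIO_VIDEO_EVENT_DRC:
+                        handle_drc_event(ev);
+                        break;
+                }
+                eventq_add_buffer(ev); /* recycle the buffer */
+        }
+}
+\end{lstlisting}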
+
+\drivernormative{\paragraph}{Device Operation: Event Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue}
+
+The driver MUST at all times keep at least one device-writable buffer,
+large enough to contain a \field{struct virtio_video_event}, queued on
+the eventq.
+
+\paragraph{Error Event}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Error Event}
+
+The error event is queued by the device when an unrecoverable error
+occurs during processing. The stream is considered invalid from that
+point on and is automatically closed. Pending
+VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN and
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands are canceled, and further
+commands will fail with VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID.
+
+Note that this is different from dequeued resources carrying the
+VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR flag. This flag indicates that the
+output might be corrupted, but the stream itself can continue and
+might recover.
+
+This event should only be used for catastrophic errors, e.g.~a host
+driver failure that cannot be recovered.
+
+\paragraph{Dynamic Resolution Change Event}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
+
+A Dynamic Resolution Change (or DRC) event happens when a decoder device
+detects that the resolution of the stream being decoded has changed.
+This event is emitted after processing all the input resources preceding
+the resolution change, and as a result all the output resources
+corresponding to these pre-DRC input resources are available to the
+driver by the time it receives the DRC event.
+
+A DRC event automatically detaches the backing memory of all output
+resources. Upon receiving the DRC event and processing all pending
+output resources, the driver is responsible for querying the new output
+resolution and re-attaching suitable backing memory to the output
+resources before queueing them again. Streaming resumes when the first
+output resource is queued with memory properly attached.
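+
+The following non-normative sketch summarizes the expected driver
+reaction to a DRC event; the helper functions are hypothetical wrappers
+around the commands described in this section.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+/* Hypothetical driver-internal helpers. */
+extern int  stream_get_output_format(uint32_t stream_id,
+                                     uint32_t *width, uint32_t *height);
+extern int  reallocate_output_memory(uint32_t stream_id,
+                                     uint32_t width, uint32_t height);
+extern int  attach_output_backing(uint32_t stream_id, uint32_t resource_id);
+extern void queue_output_resource(uint32_t stream_id, uint32_t resource_id);
+
+static int handle_drc(uint32_t stream_id, uint32_t num_output_resources)
+{
+        uint32_t width, height;
+
+        /* All pre-DRC output resources have already been returned by the
+           device, so it is safe to reallocate at this point. */
+        if (stream_get_output_format(stream_id, &width, &height) ||
+            reallocate_output_memory(stream_id, width, height))
+                return -1;
+
+        for (uint32_t i = 0; i < num_output_resources; i++) {
+                /* Backing was detached by the DRC event; re-attach before
+                   queueing, otherwise the device rejects the resource. */
+                if (attach_output_backing(stream_id, i))
+                        return -1;
+                queue_output_resource(stream_id, i);
+        }
+        /* Streaming resumes once the first properly backed output resource
+           is queued. */
+        return 0;
+}
+\end{lstlisting}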
+
+\devicenormative{\subparagraph}{Dynamic Resolution Change Event}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
+
+The device MUST make all the output resources that correspond to frames
+before the resolution change point available to the driver BEFORE it
+sends the resolution change event to the driver.
+
+After the event is emitted, the device MUST reject, with
+VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION, any output resource for
+which VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING has not been
+successfully called again.
+
+\drivernormative{\subparagraph}{Dynamic Resolution Change Event}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
+
+The driver MUST query the new output resolution parameter and call
+VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING with suitable memory for
+each output resource before queueing them again.
+
+\subsection{Parameters}\label{sec:Device Types / Video Device / Parameters}
+
+Parameters allow the driver to configure the device for the decoding or
+encoding operation.
+
+The \field{union virtio_video_stream_params} is defined as follows:
+
+\begin{lstlisting}
+/* Common parameters */
+#define VIRTIO_VIDEO_PARAMS_INPUT_RESOURCES             0x001
+#define VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES            0x002
+
+/* Decoder-only parameters */
+#define VIRTIO_VIDEO_PARAMS_DECODER_INPUT_FORMAT        0x101
+#define VIRTIO_VIDEO_PARAMS_DECODER_OUTPUT_FORMAT       0x102
+
+/* Encoder-only parameters */
+#define VIRTIO_VIDEO_PARAMS_ENCODER_INPUT_FORMAT        0x201
+#define VIRTIO_VIDEO_PARAMS_ENCODER_OUTPUT_FORMAT       0x202
+#define VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE             0x203
+
+union virtio_video_stream_params {
+        /* Common parameters */
+        struct virtio_video_params_resources input_resources;
+        struct virtio_video_params_resources output_resources;
+
+        /* Decoder-only parameters */
+        struct virtio_video_params_bitstream_format decoder_input_format;
+        struct virtio_video_params_image_format decoder_output_format;
+
+        /* Encoder-only parameters */
+        struct virtio_video_params_image_format encoder_input_format;
+        struct virtio_video_params_bitstream_format encoder_output_format;
+        struct virtio_video_params_bitrate encoder_bitrate;
+};
+\end{lstlisting}
+
+Not all parameters are valid for all devices. For instance, a decoder
+does not support any of the encoder-only parameters and will return
+VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT if an unsupported
+parameter is queried or set.
+
+Each parameter is described in the remainder of this section.
+
+\drivernormative{\subsubsection}{Parameters}{Device Types / Video Device / Parameters}
+
+After creating a new stream, the initial value of all parameters is
+unknown to the driver. Thus, the driver MUST NOT assume the default
+value of any parameter and MUST use
+VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM in order to get the values of the
+parameters it needs.
+
+The driver SHOULD modify a parameter by first calling
+VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM to obtain its current value,
+altering it, submitting the desired value using
+VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM, and then checking the value
+actually set by the device in the response to
+VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM.
+
+\devicenormative{\subsubsection}{Parameters}{Device Types / Video Device / Parameters}
+
+The device MUST initialize each parameter to a valid default value and
+allow each parameter to be read even without the driver explicitly
+setting a value for it.
+
+\subsubsection{Common parameters}\label{sec:Device Types / Video Device / Parameters / Common parameters}
+
+\field{struct virtio_video_params_resources} is used to control the
+number of resources and their backing memory type for the INPUT and
+OUTPUT queues:
+
+\begin{lstlisting}
+#define VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES       0x1
+#define VIRTIO_VIDEO_MEM_TYPE_VIRTIO_OBJECT     0x2
+
+struct virtio_video_params_resources {
+        le32 min_resources;
+        le32 max_resources;
+        le32 num_resources;
+        u8 mem_type; /* VIRTIO_VIDEO_MEM_TYPE_* */
+        u8 padding[3];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{min_resources}]
+is the minimum number of resources that the queue supports for the
+current settings. Cannot be modified by the driver.
+\item[\field{max_resources}]
+is the maximum number of resources that the queue supports for the
+current settings. Cannot be modified by the driver.
+\item[\field{num_resources}]
+is the number of resources that can be addressed for the queue,
+numbered from \(0\) to \(num\_resources - 1\). It can be zero if no
+resources are allocated; otherwise it lies between
+\field{min_resources} and \field{max_resources}.
+\item[\field{mem_type}]
+is the memory type that will be used to back these resources.
+\end{description}
+
+Successfully setting this parameter results in all currently attached
+resources of the corresponding queue becoming detached, i.e.~the driver
+cannot queue a resource to the queue without attaching some backing
+memory first. All currently queued resources for the queue are returned
+with the VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED error before the response
+to the VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM is returned.
+
+This parameter can only be changed during the following times:
+
+\begin{itemize}
+\item
+  After creating a stream and before queuing any resource on a given
+  queue,
+\item
+  For the INPUT queue, after receiving the response to a
+  VIRTIO\_VIDEO\_CMD\_STREAM\_STOP and before queueing any input
+  resource,
+\item
+  For the OUTPUT queue, after receiving a DRC event and before queueing
+  any output resource.
+\end{itemize}
+
+Attempts to change this parameter outside of these times will result in
+VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION being returned.
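+
+The following non-normative sketch changes the OUTPUT resource count
+after such a point, clamping the request to the device-reported bounds.
+\field{stream_get_param()} and \field{stream_set_param()} are the same
+hypothetical wrappers used in the earlier parameter example.
+
+\begin{lstlisting}
+#include <stdint.h>
+
+#define VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES  0x002
+#define VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES     0x1
+
+struct virtio_video_params_resources {
+        uint32_t min_resources;
+        uint32_t max_resources;
+        uint32_t num_resources;
+        uint8_t mem_type;
+        uint8_t padding[3];
+};
+
+/* Hypothetical wrappers around STREAM_GET_PARAM / STREAM_SET_PARAM. */
+extern int stream_get_param(uint32_t stream_id, uint32_t param_type, void *param);
+extern int stream_set_param(uint32_t stream_id, uint32_t param_type, void *param);
+
+static int set_output_resources(uint32_t stream_id, uint32_t wanted)
+{
+        struct virtio_video_params_resources res;
+
+        if (stream_get_param(stream_id, VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES, &res))
+                return -1;
+
+        /* Clamp the request to what the device supports for the current
+           settings; the device would adjust it anyway. */
+        if (wanted < res.min_resources)
+                wanted = res.min_resources;
+        if (wanted > res.max_resources)
+                wanted = res.max_resources;
+
+        res.num_resources = wanted;
+        res.mem_type = VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES;
+
+        /* Succeeding here detaches all backing memory of the OUTPUT queue;
+           every resource needs ATTACH_BACKING again before being queued. */
+        return stream_set_param(stream_id, VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES, &res);
+}
+\end{lstlisting}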
+
+\subsubsection{Format parameters}\label{sec:Device Types / Video Device / Parameters / Format parameters}
+
+The formats of the INPUT and OUTPUT queues are defined using the
+\field{virtio_video_params_bitstream_format} and
+\field{virtio_video_params_image_format} structures. Which one applies
+to which queue depends on whether the device is a decoder or an
+encoder.
+
+Bitstream formats are set using the
+\field{virtio_video_params_bitstream_format} struct:
+
+\begin{lstlisting}
+struct virtio_video_params_bitstream_format {
+        u8 fourcc[4];
+        le32 buffer_size;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{fourcc}]
+is the fourcc of the bitstream format. For a list of supported formats,
+see
+\ref{sec:Device Types / Video Device / Supported formats / Bitstream formats}.
+\item[\field{buffer_size}]
+is the minimum size of the buffers that will back resources to be
+queued.
+\end{description}
+
+Image formats are set using the \field{virtio_video_params_image_format}
+struct:
+
+\begin{lstlisting}
+struct virtio_video_rect {
+        le32 left;
+        le32 top;
+        le32 width;
+        le32 height;
+};
+
+struct virtio_video_plane_format {
+        le32 buffer_size;
+        le32 stride;
+        le32 offset;
+        u8 padding[4];
+};
+
+struct virtio_video_params_image_format {
+        u8 fourcc[4];
+        le32 width;
+        le32 height;
+        u8 padding[4];
+        struct virtio_video_rect crop;
+        struct virtio_video_plane_format planes[VIRTIO_VIDEO_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{fourcc}]
+is the fourcc of the image format. For a list of supported formats, see
+\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
+\item[\field{width}]
+is the width in pixels of the coded image.
+\item[\field{height}]
+is the height in pixels of the coded image.
+\item[\field{crop}]
+is the rectangle covering the visible size of the frame, i.e.~the part
+of the frame that should be displayed.
+\item[\field{planes}]
+is the format description of each individual plane making up this
+format. The number of planes depends on the \field{fourcc} and is
+detailed in
+\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
+
+\begin{description}
+\item[\field{buffer_size}]
+is the minimum size of the buffers that will back resources to be
+queued.
+\item[\field{stride}]
+is the distance in bytes between two lines of data.
+\item[\field{offset}]
+is the starting offset for the data in the buffer.
+\end{description}
+\end{description}
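+
+As a non-normative illustration, the sketch below fills
+\field{struct virtio_video_params_image_format} for NV12, assuming the
+format is described as two planes (Y, then interleaved UV) sharing a
+single tightly packed buffer with no device-specific alignment; the
+values actually adjusted and returned by the device remain
+authoritative.
+
+\begin{lstlisting}
+#include <stdint.h>
+#include <string.h>
+
+#define VIRTIO_VIDEO_MAX_PLANES 8
+
+struct virtio_video_rect { uint32_t left, top, width, height; };
+struct virtio_video_plane_format {
+        uint32_t buffer_size, stride, offset;
+        uint8_t padding[4];
+};
+struct virtio_video_params_image_format {
+        uint8_t fourcc[4];
+        uint32_t width, height;
+        uint8_t padding[4];
+        struct virtio_video_rect crop;
+        struct virtio_video_plane_format planes[VIRTIO_VIDEO_MAX_PLANES];
+};
+
+static void fill_nv12(struct virtio_video_params_image_format *fmt,
+                      uint32_t width, uint32_t height)
+{
+        memset(fmt, 0, sizeof(*fmt));
+        memcpy(fmt->fourcc, "NV12", 4);
+        fmt->width = width;
+        fmt->height = height;
+        fmt->crop = (struct virtio_video_rect){ 0, 0, width, height };
+
+        /* Plane 0: Y, one byte per pixel. */
+        fmt->planes[0].stride = width;
+        fmt->planes[0].offset = 0;
+        fmt->planes[0].buffer_size = width * height;
+
+        /* Plane 1: interleaved UV, 4:2:0 subsampled, in the same buffer. */
+        fmt->planes[1].stride = width;
+        fmt->planes[1].offset = width * height;
+        fmt->planes[1].buffer_size = width * height / 2;
+}
+\end{lstlisting}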
+
+\devicenormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
+
+The device MAY adjust any requested parameter to the nearest supported
+value if the requested one is not suitable. For instance, an encoder
+device may decide that it needs larger output buffers in order to
+encode at the requested format and resolution.
+
+\drivernormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
+
+When setting a format parameter, the driver MUST check the adjusted
+value returned by the device and comply with it, or try to set a
+different one if it cannot.
+
+\subsubsection{Encoder parameters}\label{sec:Device Types / Video Device / Parameters / Encoder parameters}
+
+\begin{lstlisting}
+struct virtio_video_params_bitrate {
+        le32 min_bitrate;
+        le32 max_bitrate;
+        le32 bitrate;
+        u8 padding[4];
+}
+\end{lstlisting}
+
+\begin{description}
+\item[\field{min_bitrate}]
+is the minimum bitrate supported by the encoder for the current
+settings. Ignored when setting the parameter.
+\item[\field{max_bitrate}]
+is the maximum bitrate supported by the encoder for the current
+settings. Ignored when setting the parameter.
+\item[\field{bitrate}]
+is the current desired bitrate for the encoder.
+\end{description}
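+
+As a non-normative example, a driver requesting a target bitrate of
+4000000 only fills \field{bitrate} and then checks the value echoed
+back in the VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM response, since the
+device may adjust it to the closest value it supports:
+
+\begin{lstlisting}
+struct virtio_video_params_bitrate br = {};
+
+br.bitrate = cpu_to_le32(4000000); /* requested value */
+/* min_bitrate and max_bitrate are ignored when setting */
+\end{lstlisting}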
+
+\subsection{Supported formats}\label{sec:Device Types / Video Device / Supported formats}
+
+Bitstream and image formats are identified by their fourcc code, which
+is a four-byte ASCII sequence uniquely identifying the format and its
+properties.
+
+\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
+
+The fourcc code of each supported bitstream format is given below,
+along with the unit of data expected in each input resource for the
+decoder, or produced in each output resource for the encoder.
+
+\begin{description}
+\item[\field{MPG2}]
+MPEG2 encoded stream. One Access Unit per resource.
+\item[\field{H264}]
+H.264 encoded stream. One NAL unit per resource.
+\item[\field{HEVC}]
+HEVC encoded stream. One NAL unit per resource.
+\item[\field{VP80}]
+VP8 encoded stream. One frame per resource.
+\item[\field{VP90}]
+VP9 encoded stream. One frame per resource.
+\end{description}
+
+\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
+
+The fourcc code of each supported image format is given below, along
+with its number of planes, the number of physical buffers it uses, and
+its subsampling, if any.
+
+\begin{description}
+\item[\field{RGB3}]
+one RGB plane where each component takes one byte, i.e.~3 bytes per
+pixel.
+\item[\field{NV12}]
+one Y plane followed by interleaved U and V data, in a single buffer.
+4:2:0 subsampling.
+\item[\field{NM12}]
+same as \field{NV12} but using two separate buffers for the Y and UV
+planes.
+\item[\field{YU12}]
+one Y plane followed by one Cb plane, followed by one Cr plane, in a
+single buffer. 4:2:0 subsampling.
+\item[\field{YM12}]
+same as \field{YU12} but using three separate buffers for the Y, U and V
+planes.
+\end{description}
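+
+To make the subsampling factors concrete, the following non-normative
+helper computes per-plane data sizes for the 4:2:0 formats above,
+assuming a tightly packed layout where the stride equals the width
+(real strides may include padding):
+
+\begin{lstlisting}
+/* Bytes per plane of a tightly packed WxH 4:2:0 frame. */
+static void yuv420_plane_sizes(uint32_t w, uint32_t h, int two_planes,
+                               uint32_t sizes[3])
+{
+        sizes[0] = w * h;                 /* Y */
+        if (two_planes) {                 /* NV12 / NM12 */
+                sizes[1] = w * h / 2;     /* interleaved UV */
+                sizes[2] = 0;
+        } else {                          /* YU12 / YM12 */
+                sizes[1] = w * h / 4;     /* U (Cb) */
+                sizes[2] = w * h / 4;     /* V (Cr) */
+        }
+}
+\end{lstlisting}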
-- 
2.39.0.rc0.267.gcb52ba06e7-goog



* Re: [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-08  7:23 [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification Alexandre Courbot
@ 2022-12-08 15:00 ` Cornelia Huck
  2022-12-27  5:38   ` Alexandre Courbot
  2022-12-19 16:59 ` [virtio-dev] " Alexander Gordeev
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2022-12-08 15:00 UTC (permalink / raw)
  To: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Alexandre Courbot

On Thu, Dec 08 2022, Alexandre Courbot <acourbot@chromium.org> wrote:

> Add the specification of the video decoder and encoder devices, which
> can be used to provide host-accelerated video operations to the guest.
>
> Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> --
> Here is the long-overdue new revision of the virtio-video RFC. This
> version reorganizes the specification quite a bit and tries to simplify
> the protocol further. Nonetheless, it still results in a rather long (17
> pages) specification for just these devices, even though the spec is not
> fully complete (I want to rethink the formats descriptions, and some
> parameters need to be added for the encoder device).
>
> I would like to get some high-level feedback on this version and maybe
> propose to do things a bit differently before people invest too much
> time reviewing this in depth. While rewriting this document, it became
> more and more obvious that this is just a different, and maybe a bit
> simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
> am now wondering whether it would not make more sense to rewrite this
> specification as just a way to transport V4L2 requests over virtio,
> similarly to how virtio-fs does with the FUSE protocol [2].
>
> At the time we started writing this implementation, the V4L2 protocols
> for decoders and encoders were not set in stone yet, but now that they
> are it might make sense to reconsider. Switching to this solution would
> greatly shorten the virtio-video device spec, and also provide a way to
> support other kind of V4L2 devices like cameras or image processors at
> no extra cost.
>
> Note that doing so would not require that either the host or guest uses
> V4L2 - the virtio video device would just emulate a V4L2 device over
> virtio. A few adaptations would need to be done regarding how memory
> types work, but otherwise I believe most of V4L2 could be used as-is.
>
> Please share your thoughts about this, and I will either explore this
> idea further with a prototype, or keep moving the present spec forward,
> hopefully at a faster pace.

In principle, reusing an existing interface that does the job might be a
good idea. I see that the Linux headers are dual-licensed as 3-clause
BSD, and if the interface has indeed stabilized, relying on it could
make sense. The main question is: is the interface sufficiently
independent of Linux specifics (i.e. can others implement it without
issue)?

>
> Due to the RFC state of this patch I have refrained from referencing the
> normative statements in conformance.tex - I will do that as a final step
> once the spec is mostly agreed on.
>
> [1] https://docs.kernel.org/userspace-api/media/v4l/dev-stateless-decoder.html
> [2] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-fs.tex
>
> Full PDF:
> https://drive.google.com/file/d/1HRVDiDdo50-S9X5tWgzmT90FJRHoB1dN/view?usp=sharing
>
> PDF of video section only:
> https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing
> ---
>  content.tex      |    1 +
>  virtio-video.tex | 1420 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 1421 insertions(+)
>  create mode 100644 virtio-video.tex
>
> diff --git a/content.tex b/content.tex
> index 9d1de53..8a1557a 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -7543,6 +7543,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
>  \input{virtio-scmi.tex}
>  \input{virtio-gpio.tex}
>  \input{virtio-pmem.tex}
> +\input{virtio-video.tex}
>  
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
> diff --git a/virtio-video.tex b/virtio-video.tex
> new file mode 100644
> index 0000000..5d44eb1
> --- /dev/null
> +++ b/virtio-video.tex
> @@ -0,0 +1,1420 @@
> +\section{Video Device}\label{sec:Device Types / Video Device}
> +
> +The virtio video encoder and decoder devices provide support for
> +host-accelerated video encoding and decoding. Despite being different
> +devices types, they use the same protocol and general flow.
> +
> +\subsection{Device ID}\label{sec:Device Types / Video Device / Device ID}
> +
> +\begin{description}
> +\item[30]
> +encoder device
> +\item[31]
> +decoder device
> +\end{description}
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Video Device / Virtqueues}
> +
> +\begin{description}
> +\item[0]
> +commandq - queue for driver commands and device responses to these
> +commands.
> +\item[1]
> +eventq - queue for events sent by the device to the driver.

nit: drop trailing '.'

> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / Video Device / Feature bits}
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES (0)]

Side note: you should get the correct output even without escaping the
underscores (although your editor might still be confused...)

> +Guest pages can be used as the backing memory of resources.
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_NON\_CONTIG (1)]
> +The device can use non-contiguous guest memory as the backing memory of
> +resources. Only meaningful if VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES
> +is also set.
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_DYNAMIC (2)]
> +The device supports re-attaching memory to resources while streaming.
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT (3)]
> +Objects exported by another virtio device can be used as the backing
> +memory of resources.
> +\end{description}
> +
> +\devicenormative{\subsubsection}{Feature bits}{Device Types / Video Device / Feature bits}
> +
> +The device MUST present at least one of

s/present/offer/ ?

> +VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES or
> +VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT, since the absence of both
> +bits would mean that no memory can be used at all for resources.

Maybe also

"The device MUST NOT present VIRTIO_VIDEO_F_RESOURCE_NON_CONTIG unless
it also offers VIRTIO_VIDEO_F_RESOURCE_GUEST_PAGES." ?

> +
> +\drivernormative{\subsubsection}{Feature bits}{Device Types / Video Device / Feature bits}
> +
> +The driver MUST negotiate at least one of the
> +VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES and
> +VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT features.
> +
> +If the VIRTIO\_VIDEO\_F\_RESOURCE\_NON\_CONTIG is not present, the

"If VIRTIO_VIDEO_F_RESOURCE_GUEST_PAGES has been negotiated, but not
VIRTIO_VIDEO_F_RESOURCE_NON_CONTIG, the..." ?

> +driver MUST use physically contiguous memory for all the buffers it
> +allocates.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Video Device / Device configuration layout}
> +
> +Video device configuration uses the following layout:

"The video device configuration space uses the following layout:"

> +
> +\begin{lstlisting}
> +struct virtio_video_config {
> +        le32 version;
> +        le32 caps_length;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{version}]
> +is the protocol version that the device understands.
> +\item[\field{caps_length}]
> +is the minimum length in bytes that a device-writable buffer must have
> +in order to receive the response to
> +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
> +\end{description}
> +
> +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Video Device / Device configuration layout}
> +
> +As there is currently only one version of the protocol, the device MUST
> +set the \field{version} field to 0.

In what way would you want to change the protocol so that it becomes
incompatible? Extensions should be easy to handle via extra
capabilities, and if we don't expect the protocol to change often, a
feature bit for a new format might be sufficient.

If we stick with the version field, maybe start at 1 and make 0 invalid?
Probably easier to spot errors that way.

> +
> +The device MUST set the \field{caps_length} field to a value equal to
> +the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.

Could the device also support a minimum response size that only supports
a subset of the caps to be returned? Otherwise, I think caps_length is
the maximum (or fixed?) length of the query caps response?

> +
> +\subsection{Device Initialization}\label{sec:Device Types / Video Device / Device Initialization}
> +
> +\begin{enumerate}
> +\def\labelenumi{\arabic{enumi}.}
> +\item
> +  The driver reads the feature bits and negotiates the features it
> +  needs.
> +\item
> +  The driver sets up the commandq.
> +\item
> +  The driver confirms that it supports the version of the protocol
> +  advertized in the \field{version} field of the configuration space.

Do we expect the version field to change depending on the negotiated
features? If not, the driver could already fail during the feature
negotiation step?

> +\item
> +  The driver reads the \field{caps_length} field of the configuration
> +  space and prepares a buffer of at least that size.
> +\item
> +  The driver sends that buffer on the commandq with the
> +  VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command.
> +\item
> +  The driver receives the response from the device, and parses its
> +  capabilities.
> +\end{enumerate}
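
Not part of the patch, but to make steps 4-6 concrete: a driver-side
sketch of the capability query, where vv_config_read_le32() and
vv_cmdq_roundtrip() are hypothetical helpers standing in for whatever
config-space and virtqueue plumbing the real driver uses:

    struct virtio_video_device_query_caps cmd = {
            .cmd_type = VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS, /* le32 on the wire */
    };
    uint32_t caps_length = vv_config_read_le32(dev,
            offsetof(struct virtio_video_config, caps_length));
    void *resp = calloc(1, caps_length);    /* device-writable buffer */

    /* cmd goes into a device-readable descriptor, resp into a
     * device-writable descriptor of at least caps_length bytes. */
    vv_cmdq_roundtrip(dev, &cmd, sizeof(cmd), resp, caps_length);
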
> +
> +\subsection{Device Operation}\label{sec:Device Types / Video Device / Device Operation}
> +
> +The commandq is used by the driver to send commands to the device and to
> +receive the device's response via used buffers.
> +
> +The driver can create new streams using the
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE command. Each stream has two resource
> +queues (not to be confused with the virtio queues) called INPUT and
> +OUTPUT. The INPUT queue accepts driver-filled input data for the device
> +(bitstream data for a decoder ; input frames for an encoder), while the
> +OUTPUT queue receives resources to be filled by the device as a result
> +of processing the INPUT queue (decoded frames for a decoder ; encoded
> +bitstream data for an encoder).
> +
> +A resource is a set of memory buffers that contain a unit of data that
> +the device can process or produce. Most resources will only have one
> +buffer (like bitstreams and single-planar images), but frames using a
> +multi-planar format will have several.
> +
> +Before resources can be submitted to a queue, backing memory must be
> +attached to them using VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING.
> +The exact form of that memory is negotiated using the feature flags.
> +
> +The INPUT and OUTPUT queues are effectively independent, and the driver
> +can fill them without caring about the other queue. In particular there
> +is no need to queue input and output resources in pairs: one input
> +resource can result in zero to many output resources being produced.
> +
> +Resources are queued to the INPUT or OUTPUT queue using the
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command. The device replies to this
> +command when an input resource has been fully processed and can be
> +reused by the driver, or when an output resource has been filled by the
> +device as a result of processing.
> +
> +Parameters of the stream can be obtained and configured using
> +VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM and
> +VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM. Available parameters depend on
> +on the device type and are detailed in section
> +\ref{sec:Device Types / Video Device / Parameters}.
> +
> +The device may detect stream-related events that require intervention

s/may/can/ ? Just to avoid confusion with "MAY".

> +from the driver and signals them on the eventq. One example is a dynamic
> +resolution change while decoding a stream, which may require the driver
> +to reallocate the backing memory of its output resources to fit the new
> +resolution.
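
As an aside (an illustrative sketch only, not from the patch; the vv_*
helpers and variables are made up), the flow described above boils down
to something like this on the driver side of a decoder:

    uint32_t stream = vv_stream_create(dev);    /* CMD_STREAM_CREATE */

    /* Configure formats and resource counts with CMD_STREAM_GET_PARAM /
     * CMD_STREAM_SET_PARAM, then give each resource its backing memory. */
    vv_attach_backing(dev, stream, VIRTIO_VIDEO_QUEUE_TYPE_INPUT, 0, in_sg);
    vv_attach_backing(dev, stream, VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT, 0, out_sg);

    /* The two queues are independent: keep both supplied.  One queued
     * input unit may complete with zero, one or several output frames,
     * signalled by the responses to the RESOURCE_QUEUE commands. */
    vv_queue(dev, stream, VIRTIO_VIDEO_QUEUE_TYPE_INPUT, 0, bitstream_len);
    vv_queue(dev, stream, VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT, 0, 0);
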
> +
> +\drivernormative{\subsubsection}{Device Operation}{Device Types / Video Device / Device Operation}
> +
> +Descriptor chains sent to the commandq by the driver MUST include at
> +least one device-writable descriptor of a size sufficient to receive the
> +response to the queued command.
> +
> +\devicenormative{\subsubsection}{Device Operation}{Device Types / Video Device / Device Operation}
> +
> +Responses to a command MUST be written by the device in the first
> +device-writable descriptor of the descriptor chain from which the
> +command came.
> +
> +\subsubsection{Device Operation: Command Virtqueue}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
> +
> +This section details the commands that can be sent on the commandq by
> +the driver, as well as the responses that the device will write.
> +
> +Different structures are used for each command and response. A command
> +structure starts with the requested command code, defined as follows:
> +
> +\begin{lstlisting}
> +/* Device */
> +#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
> +
> +/* Stream */
> +#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
> +#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201
> +#define VIRTIO_VIDEO_CMD_STREAM_DRAIN            0x203
> +#define VIRTIO_VIDEO_CMD_STREAM_STOP             0x204
> +#define VIRTIO_VIDEO_CMD_STREAM_GET_PARAM        0x205
> +#define VIRTIO_VIDEO_CMD_STREAM_SET_PARAM        0x206
> +
> +/* Resource*/
> +#define VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING 0x400
> +#define VIRTIO_VIDEO_CMD_RESOURCE_QUEUE          0x401
> +};
> +\end{lstlisting}
> +
> +A response structure starts with the result of the requested command,
> +defined as follows:
> +
> +\begin{lstlisting}
> +/* Success */
> +#define VIRTIO_VIDEO_RESULT_OK                          0x000
> +
> +/* Error */
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_COMMAND         0x100
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_OPERATION       0x101
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_STREAM_ID       0x102
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_RESOURCE_ID     0x103
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_ARGUMENT        0x104
> +#define VIRTIO_VIDEO_RESULT_ERR_CANCELED                0x105
> +#define VIRTIO_VIDEO_RESULT_ERR_OUT_OF_MEMORY           0x106
> +\end{lstlisting}
> +
> +For response structures carrying an error code, the rest of the
> +structure is considered invalid. Only response structures carrying
> +VIRTIO\_VIDEO\_RESULT\_OK shall be examined further by the driver.
> +
> +\devicenormative{\paragraph}{Device Operation: Command Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
> +
> +The device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND to
> +any non-existing command code.

Maybe "any command code it does not recognize"?

> +
> +\drivernormative{\paragraph}{Device Operation: Command Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
> +
> +The driver MUST NOT interpret the rest of a response which result is not

s/which/whose/ ?

> +VIRTIO\_VIDEO\_RESULT\_OK.
> +
> +\subsubsection{Device Operation: Device Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands}
> +
> +Device capabilities are retrieved using the
> +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command, which returns arrays of
> +formats supported by the input and output queues.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> +
> +Retrieve device capabilities.
> +
> +The driver sends this command with
> +\field{struct virtio_video_device_query_caps}:
> +
> +\begin{lstlisting}
> +struct virtio_video_device_query_caps {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS */
> +};
> +\end{lstlisting}
> +
> +The device responds with
> +\field{struct virtio_video_device_query_caps_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_device_query_caps_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        le32 num_bitstream_formats;
> +        le32 num_image_formats;
> +        /**
> +         * Followed by
> +         * struct virtio_video_bitstream_format_desc bitstream_formats[num_bitstream_formats];
> +         */
> +        /**
> +         * Followed by
> +         * struct virtio_video_image_format_desc image_formats[num_image_formats]
> +         */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
> +if the descriptor was smaller than the defined \field{caps_length} in
> +the video device configuration.
> +\end{description}
> +\item[\field{num_bitstream_formats}]
> +is the number of supported bitstream formats.
> +\item[\field{num_image_formats}]
> +is the number of supported image formats.
> +\item[\field{bitstream_formats}]
> +is an array of size \field{num_bitstream_formats} containing the
> +supported encoded formats. These correspond to the formats that can be
> +set on the INPUT queue for a decoder, and on the OUTPUT queue for an
> +encoder. For a description of bitstream formats, see
> +\ref{sec:Device Types / Video Device / Supported formats / Bitstream formats}.
> +\item[\field{image_formats}]
> +is an array of size \field{num_image_formats} containing the supported
> +image formats. These correspond to the formats that can be set on the
> +OUTPUT queue for a decoder, and on the INPUT queue for an encoder. For a
> +description of image formats, see
> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> +\end{description}
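
For what it's worth, a sketch of how a driver could walk this
variable-length response, assuming fixed-size format descriptor entries
and with le32_to_cpu() as the usual endianness helper (resp points at
the buffer the device wrote):

    struct virtio_video_device_query_caps_resp *hdr = resp;
    uint32_t n_bs  = le32_to_cpu(hdr->num_bitstream_formats);
    uint32_t n_img = le32_to_cpu(hdr->num_image_formats);

    struct virtio_video_bitstream_format_desc *bs = (void *)(hdr + 1);
    struct virtio_video_image_format_desc *img = (void *)(bs + n_bs);
    /* bs[0..n_bs-1] and img[0..n_img-1] can now be inspected. */
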
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}{Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS
> +by the driver.

I don't think that is needed; we can reasonably assume that the driver
actually sends the command it wants to execute.

> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}{Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> +
> +The device MUST write the two \field{bitstream_formats} and
> +\field{image_formats} arrays, of length \field{num_bitstream_formats}
> +and \field{num_image_formats}, respectively.
> +
> +The total size of the response MUST be equal to \field{caps_length}
> +bytes, as reported by the device configuration.

Ah, that answers my question from above... so caps_length is the actual
length of the response, not a minimum or maximum value in any direction.

> +
> +\subsubsection{Device Operation: Stream commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands}
> +
> +Stream commands allow the creation, destruction, and flow control of a
> +stream.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_CREATE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> +
> +Create a new stream using the device.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_create}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_create {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_CREATE */
> +};
> +\end{lstlisting}
> +
> +The device responds with \field{struct virtio_video_stream_create_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_create_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
> +if the limit of simultaneous streams has been reached by the device and
> +no more can be created.
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND]
> +if the stream cannot be created due to an unexpected device issue.

Is it an "unexpected device issue" or "the driver sent something it
should not have"? It might be a good idea to distinguish the two?

> +\end{description}
> +\item[\field{stream_id}]
> +is the ID of the created stream allocated by the device.
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_CREATE}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE by
> +the driver.

Same as above: we don't need to specify that the driver needs to use the
correct command.

> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_CREATE}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> +
> +\field{stream_id} MUST be set to an identifier that is unique to that
> +stream for as long as it lives.

Unique on a per-device basis?

> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_DESTROY}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
> +
> +Destroy a video stream and all its resources. Any activity on the stream
> +is halted and all resources released by the time the response is
> +received by the driver.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_destroy}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_destroy {
> +         le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_DESTROY */
> +         le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream to be destroyed, as previously returned by
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +\end{description}
> +
> +The device responds with
> +\field{struct virtio_video_stream_destroy_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_destroy_resp {
> +         le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream ID does not exist.
> +\end{description}
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DESTROY}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY by
> +the driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +The driver MUST stop using \field{stream_id} as a valid stream after it
> +received the response to this command.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DESTROY}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
> +
> +Before the device sends a response, it MUST respond with
> +VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED to all pending commands.
> +
> +After responding to this command, the device MUST reply with
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID to any command related
> +to this stream.

"to this stream id"?

> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_DRAIN}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> +
> +Complete processing of all currently queued input resources.
> +
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN ensures that all sent
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue are
> +processed by the device and that the resulting output resources are
> +available to the driver.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_drain}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_drain {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_DRAIN */
> +        le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream to drain, as previously returned by
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_drain_resp}
> +once the drain operation is completed:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_drain_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if a drain operation is already in progress for this stream,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED]
> +if the operation has been canceled by a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
> +or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation.
> +\end{description}
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN by the
> +driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +The driver MUST keep queueing output resources until it gets the
> +response to this command. Failure to do so may result in the device
> +stalling as it waits for output resources to write into.

The exception is if the driver decides to stop/destroy the stream,
right?

> +
> +The driver MUST account for the fact that the response to this command
> +might come out-of-order (i.e.~after other commands sent to the device),
> +and that it can be interrupted.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> +
> +Before the device sends the response, it MUST process and respond to all
> +the VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue that
> +were sent before the drain command, and make all the corresponding
> +output resources available to the driver by responding to their
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command.
> +
> +While the device is processing the command, it MUST return
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to the
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN command.
> +
> +If the command is interrupted due to a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
> +or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation, the device MUST
> +respond with VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_STOP}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> +
> +Immediately return all queued input resources without processing them
> +and stop operation until new input resources are queued.
> +
> +This command is mostly useful for decoders that need to quickly jump
> +from one point of the stream to another (i.e.~seeking), or in order to
> +stop processing as quickly as possible.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_stop}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_stop {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_STOP */
> +        le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream to stop, as previously returned by
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_stop_resp}
> +after the response for all previously queued input resources has been
> +sent:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_stop_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\end{description}
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_STOP}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_STOP by the
> +driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +Upon receiving the response to this command, the driver SHOULD process
> +(or drop) any output resource before resuming operation by queueing new
> +input resources.
> +
> +Upon receiving the response to this command, the driver CAN modify the

s/CAN/MAY/

> +\field{struct virtio_video_params_resources} parameter corresponding to
> +the INPUT queue, and subsequently attach new backing memory to the input
> +resources using the VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING
> +command.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_STOP}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> +
> +The device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED to any
> +pending VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN and
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command on the INPUT queue before
> +responding to this command. Pending commands on the output queue are not
> +affected.
> +
> +The device MUST interrupt operation as quickly as possible, and not be
> +dependent on output resources being queued by the driver.
> +
> +Upon resuming processing, the device CAN skip input data until it finds

s/CAN/MAY/

> +a point that allows it to resume operation properly (e.g.~until a
> +keyframe is found in the input stream of a decoder).
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
> +
> +Read the value of a parameter of the given stream. Available parameters
> +depend on the device type and are listed in
> +\ref{sec:Device Types / Video Device / Parameters}.
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_get_param {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_GET_PARAM */
> +        le32 stream_id;
> +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
> +        u8 padding[4];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream we want to get a parameter from.
> +\item[\field{param_type}]
> +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
> +we want to get.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_param_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_param_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        union virtio_video_stream_params param;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if the \field{param_type} argument is invalid for the device,
> +\end{description}
> +\item[\field{param}]
> +is the value of the requested parameter, if \field{result} is
> +VIRTIO\_VIDEO\_RESULT\_OK.
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM
> +by the driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{param_type} MUST be set to a parameter type that is valid for the
> +device.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> +
> +Write the value of a parameter of the given stream, and return the value
> +actually set by the device. Available parameters depend on the device
> +type and are listed in
> +\ref{sec:Device Types / Video Device / Parameters}.
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_set_param {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_SET_PARAM */
> +        le32 stream_id;
> +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
> +        u8 padding[4];
> +        union virtio_video_stream_params param;
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream we want to set a parameter for.
> +\item[\field{param_type}]
> +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
> +we want to set.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_param_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_param_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        union virtio_video_stream_params param;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if the \field{param_type} argument is invalid for the device,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if the requested parameter cannot be modified at this moment.
> +\end{description}
> +\item[\field{param}]
> +is the actual value of the parameter set by the device, if
> +\field{result} is VIRTIO\_VIDEO\_RESULT\_OK. The value set by the device
> +may differ from the requested value depending on the device's
> +capabilities.
> +\end{description}
> +
> +Outside of the error cases described above, setting a parameter does not
> +fail. If the device cannot apply the parameter as requested, it will
> +adjust it to the closest setting it supports, and return that value to
> +the driver. It is then up to the driver to decide whether it can work
> +within the range of parameters supported by the device.

Does the driver need a way to discover which parameters are supported?
Or does that depend on the context?

> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM
> +by the driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{param_type} MUST be set to a parameter type that is valid for the
> +device, and \field{param} MUST be filled as the union member
> +corresponding to \field{param_type}.
> +
> +The driver MUST check the actual value of the parameter as set by the
> +device and work with this value, or fail properly if it cannot.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> +
> +The device MUST NOT return an error if the value requested by the driver
> +cannot be applied as-is. Instead, the device MUST set the parameter to
> +the closest supported value to the one requested by the driver.
> +
> +\subsubsection{Device Operation: Resource Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands}
> +
> +Resource commands manage the memory backing of individual resources and
> +allow queueing them so the device can process them.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> +
> +Assign backing memory to a resource.
> +
> +The driver sends this command with
> +\field{struct virtio_video_resource_attach_backing}:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_QUEUE_TYPE_INPUT       0
> +#define VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT      1
> +
> +struct virtio_video_resource_attach_backing {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING */
> +        le32 stream_id;
> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
> +        le32 resource_id;
> +        union virtio_video_resource resources[];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of a valid stream.
> +\item[\field{queue_type}]
> +is the direction of the queue.
> +\item[\field{resource_id}]
> +is the ID of the resource to be attached to.
> +\item[\field{resources}]
> +specifies memory regions to attach.
> +\end{description}
> +
> +The union \field{virtio_video_resource} is defined as follows:
> +
> +\begin{lstlisting}
> +union virtio_video_resource {
> +        struct virtio_video_resource_sg_list sg_list;
> +        struct virtio_video_resource_object object;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{sg_list}]
> +represents a scatter-gather list. This variant can be used when the
> +\field{mem_type} member of the \field{virtio_video_params_resources}
> +corresponding to the queue is set to
> +VIRTIO\_VIDEO\_MEM\_TYPE\_GUEST\_PAGES (see
> +\ref{sec:Device Types / Video Device / Parameters / Common parameters}).
> +\item[\field{object}]
> +represents an object exported from another virtio device. This variant
> +can be used when the \field{mem_type} member of the
> +\field{virtio_video_params_resources} corresponding to the queue is set
> +to VIRTIO\_VIDEO\_MEM\_TYPE\_VIRTIO\_OBJECT (see
> +\ref{sec:Device Types / Video Device / Parameters / Common parameters}).
> +\end{description}
> +
> +The struct \field{virtio_video_resource_sg_list} is defined as follows:
> +
> +\begin{lstlisting}
> +struct virtio_video_resource_sg_entry {
> +        le64 addr;
> +        le32 length;
> +        u8 padding[4];
> +};
> +
> +struct virtio_video_resource_sg_list {
> +        le32 num_entries;
> +        u8 padding[4];
> +        /* Followed by num_entries instances of
> +           video_video_resource_sg_entry */
> +};
> +\end{lstlisting}
> +
> +Within \field{struct virtio_video_resource_sg_entry}:
> +
> +\begin{description}
> +\item[\field{addr}]
> +is a guest physical address to the start of the SG entry.
> +\item[\field{length}]
> +is the length of the SG entry.
> +\end{description}
> +
> +Finally, for \field{struct virtio_video_resource_sg_list}:
> +
> +\begin{description}
> +\item[\field{num_entries}]
> +is the number of \field{struct virtio_video_resource_sg_entry} instances
> +that follow.
> +\end{description}
> +
> +\field{struct virtio_video_resource_object} is defined as follows:
> +
> +\begin{lstlisting}
> +struct virtio_video_resource_object {
> +        u8 uuid[16];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[uuid]
> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> +\end{description}
> +
> +The device responds with
> +\field{struct virtio_video_resource_attach_backing_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_resource_attach_backing_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the mentioned stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
> +invalid value,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if the operation is performed at a time when it is non-valid.
> +\end{description}
> +\end{description}
> +
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
> +the following times:
> +
> +\begin{itemize}
> +\item
> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
> +  resource,
> +\item
> +  AFTER successfully changing the \field{virtio_video_params_resources}
> +  parameter corresponding to the queue and BEFORE
> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
> +\end{itemize}
> +
> +This is to ensure that the device can rely on the fact that a given
> +resource will always point to the same memory for as long as it may be
> +used by the video device. For instance, a decoder may use returned
> +decoded frames as reference for future frames and won't overwrite the
> +backing resource of a frame that is being referenced. It is only before
> +a stream is started and after a Dynamic Resolution Change event has
> +occurred that we can be sure that all resources won't be used in that
> +way.
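
To make the guest-pages case concrete (sketch only, not from the patch;
buf_gpa is a placeholder for the buffer's guest physical address):
backing a resource with a single physically-contiguous 1 MiB buffer
takes one SG entry, laid out right after the list header as the comment
in the struct describes:

    struct {
            struct virtio_video_resource_sg_list list;
            struct virtio_video_resource_sg_entry entry[1];
    } backing = {
            /* fields are le32/le64 on the wire; conversions omitted */
            .list.num_entries = 1,
            .entry[0].addr    = buf_gpa,
            .entry[0].length  = 1024 * 1024,
    };
    /* 'backing' then takes the place of resources[0] in
     * struct virtio_video_resource_attach_backing. */
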
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> +
> +\field{cmd_type} MUST be set to
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING by the driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{queue_type} MUST be set to a valid queue type.
> +
> +\field{resource_id} MUST be an integer inferior to the number of
> +resources currently allocated for the queue.
> +
> +The length of the \field{resources} array of
> +\field{struct virtio_video_resource_attach_backing} MUST be equal to the
> +number of resources required by the format currently set on the queue,
> +as described in
> +\ref{sec:Device Types / Video Device / Supported formats}.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> +
> +At any time other than the times valid for calling this command, the
> +device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> +
> +Add a resource to the device's queue.
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_MAX_PLANES                    8
> +
> +#define VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME  (1 << 0)
> +
> +struct virtio_video_resource_queue {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_QUEUE */
> +        le32 stream_id;
> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
> +        le32 resource_id;
> +        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
> +        u8 padding[4];
> +        le64 timestamp;
> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of a valid stream.
> +\item[\field{queue_type}]
> +is the direction of the queue.
> +\item[\field{resource_id}]
> +is the ID of the resource to be queued.
> +\item[\field{flags}]
> +is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
> +
> +\begin{description}
> +\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
> +The submitted frame is to be encoded as a key frame. Only valid for the
> +encoder's INPUT queue.
> +\end{description}
> +\item[\field{timestamp}]
> +is an abstract sequence counter that can be used on the INPUT queue for
> +synchronization. Resources produced on the output queue will carry the
> +\field{timestamp} of the input resource they have been produced from.
> +\item[\field{data_sizes}]
> +number of data bytes used for each plane. Set by the driver for input
> +resources.
> +\end{description}
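
For illustration only (not from the patch; stream, frame_counter and
bytes_written are placeholders), queueing one filled input bitstream
resource on a decoder could look like this, with endianness conversions
omitted for brevity:

    struct virtio_video_resource_queue cmd = {
            .cmd_type      = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
            .stream_id     = stream,
            .queue_type    = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
            .resource_id   = 0,
            .timestamp     = frame_counter++,   /* echoed back on outputs */
            .data_sizes[0] = bytes_written,     /* bitstream bytes in plane 0 */
    };
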
> +
> +The device responds with
> +\field{struct virtio_video_resource_queue_resp}:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_ERR           (1 << 0)
> +/* Encoder only */
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_KEY_FRAME     (1 << 1)
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_P_FRAME       (1 << 2)
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_B_FRAME       (1 << 3)
> +
> +struct virtio_video_resource_queue_resp {
> +        le32 result;
> +        le32 flags;
> +        le64 timestamp;
> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if the \field{queue_type}, \field{resource_id} or \field{flags}
> +parameters have an invalid value,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING has not been
> +successfully called on the resource prior to queueing it.
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED]
> +if the resource has not been processed, not because of an error but
> +because of a change in the state of the codec. The driver is expected to
> +take action and address the condition before submitting the resource
> +again.
> +\end{description}
> +\item[\field{flags}]
> +is a bitmask of VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_* flags.
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR]
> +is set on output resources when a non-fatal processing error has
> +happened and the data contained by the resource is likely to be
> +corrupted,
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_KEY\_FRAME]
> +is set on output resources when the resource contains an encoded key
> +frame (only for encoders).
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_P\_FRAME]
> +is set on output resources when the resource contains only differences
> +to a previous key frame (only for encoders).
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_B\_FRAME]
> +is set on output resources when the resource contains only the
> +differences between the current frame and both the preceding and
> +following key frames (only for encoders).
> +\end{description}
> +\item[\field{timestamp}]
> +is set on output resources to the \field{timestamp} value of the input
> +resource that produced the resource.
> +\item[\field{data_sizes}]
> +is set on output resources to the amount of data written by the device,
> +for each plane.
> +\end{description}
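
As a non-normative illustration of the driver side, handling a completed
OUTPUT resource of a decoder could look roughly like this (le32_to_cpu()
and le64_to_cpu() are the usual endianness helpers; resp points at the
device-written response):

    struct virtio_video_resource_queue_resp *r = resp;

    if (le32_to_cpu(r->result) != VIRTIO_VIDEO_RESULT_OK)
            return;  /* e.g. ERR_CANCELED after a stop or resolution change */

    uint64_t ts    = le64_to_cpu(r->timestamp);     /* from the input resource */
    uint32_t bytes = le32_to_cpu(r->data_sizes[0]); /* bytes written in plane 0 */

    if (le32_to_cpu(r->flags) & VIRTIO_VIDEO_DEQUEUE_FLAG_ERR) {
            /* frame data may be corrupted, but the stream itself continues */
    }
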
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE by
> +the driver.

Not needed, as above.

> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{queue_type} MUST be set to a valid queue type.
> +
> +\field{resource_id} MUST be an integer inferior to the number of
> +resources currently allocated for the queue.
> +
> +The driver MUST account for the fact that the response to this command
> +might come out-of-order (i.e.~after other commands sent to the device),
> +and that it can be interrupted.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> +
> +The device MUST mark output resources that might contain corrupted
> +content due to and error with the VIRTIO\_VIDEO\_BUFFER\_FLAG\_ERR flag.

"due to" what?

> +
> +For output resources, the device MUST copy the \field{timestamp}
> +parameter of the input resource that produced it into its response.
> +
> +In case of encoder, the device MUST mark each output resource with one
> +of VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_KEY\_FRAME,
> +VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_P\_FRAME, or
> +VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_B\_FRAME.
> +
> +If the processing of a resource was stopped due to a stream event, a
> +VIRTIO\_VIDEO\_CMD\_STREAM\_STOP, or a
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY, the device MUST set \field{result}
> +to VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
> +
> +\subsubsection{Device Operation: Event Virtqueue}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue}
> +
> +The eventq is used by the device to signal stream events that are not a
> +direct result of a command queued by the driver on the commandq. Since
> +these events affect the device operation, the driver is expected to
> +react to them and resume streaming afterwards.
> +
> +There are currently two supported events: device error, and Dynamic
> +Resolution Change.
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_EVENT_ERROR    1
> +#define VIRTIO_VIDEO_EVENT_DRC      2
> +
> +union virtio_video_event {
> +        le32 event_type /* One of VIRTIO_VIDEO_EVENT_* */
> +        struct virtio_video_event_err err;
> +        struct virtio_video_event_drc drc;
> +}
> +\end{lstlisting}
> +
> +\drivernormative{\paragraph}{Device Operation: Event Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue}
> +
> +The driver MUST at any time have at least one descriptor with a used
> +buffer large enough to contain a \field{struct virtio_video_event}
> +queued on the eventq.
> +
> +\paragraph{Error Event}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Error Event}
> +
> +The error event is queued by the device when an unrecoverable error
> +occurred during processing. The stream is considered invalid from that
> +point and is automatically closed. Pending
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN and
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands are canceled, and further
> +commands will fail with VIRTIO\_VIDEO\_RESULT\_INVALID\_STREAM\_ID.
> +
> +Note that this is different from dequeued resources carrying the
> +VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR flag. This flag indicates that the
> +output might be corrupted, but the stream in itself can continue and
> +might recover.
> +
> +This event should only be used for catastrophic errors, e.g.~a host
> +driver failure that cannot be recovered.
> +
> +\paragraph{Dynamic Resolution Change Event}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
> +
> +A Dynamic Resolution Change (or DRC) event happens when a decoder device
> +detects that the resolution of the stream being decoded has changed.
> +This event is emitted after processing all the input resources preceding
> +the resolution change, and as a result all the output resources
> +corresponding to these pre-DRC input resources are available to the
> +driver by the time it receives the DRC event.
> +
> +A DRC event automatically detaches the backing memory of all output
> +resources. Upon receiving the DRC event and processing all pending
> +output resources, the driver is responsible for querying the new output
> +resolution and re-attaching suitable backing memory to the output
> +resources before queueing them again. Streaming resumes when the first
> +output resource is queued with memory properly attached.
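
Again just as a sketch (the vv_* helpers are hypothetical), the driver's
reaction to a DRC event described above is essentially:

    /* 1. Consume or drop all pre-DRC output resources, which the device
     *    has already made available. */
    /* 2. Query the new output format. */
    vv_get_param(dev, stream, VIRTIO_VIDEO_PARAMS_DECODER_OUTPUT_FORMAT, &out_fmt);
    /* 3. Allocate memory matching out_fmt and re-attach it to each
     *    output resource. */
    vv_attach_backing(dev, stream, VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT, 0, new_sg);
    /* 4. Queue the re-backed output resources; streaming resumes with
     *    the first one. */
    vv_queue(dev, stream, VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT, 0, 0);
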
> +
> +\devicenormative{\subparagraph}{Dynamic Resolution Change Event}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
> +
> +The device MUST make all the output resources that correspond to frames
> +before the resolution change point available to the driver BEFORE it
> +sends the resolution change event to the driver.
> +
> +After the event is emitted, the device MUST reject all output resources
> +for which VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING has not been
> +successfully called again with
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION.
> +
> +\drivernormative{\subparagraph}{Dynamic Resolution Change Event}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
> +
> +The driver MUST query the new output resolution parameter and call
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING with suitable memory for
> +each output resource before queueing them again.
> +
> +\subsection{Parameters}\label{sec:Device Types / Video Device / Parameters}
> +
> +Parameters allow the driver to configure the device for the decoding or
> +encoding operation.
> +
> +The \field{union virtio_video_stream_params} is defined as follows:
> +
> +\begin{lstlisting}
> +/* Common parameters */
> +#define VIRTIO_VIDEO_PARAMS_INPUT_RESOURCES             0x001
> +#define VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES            0x002
> +
> +/* Decoder-only parameters */
> +#define VIRTIO_VIDEO_PARAMS_DECODER_INPUT_FORMAT        0x101
> +#define VIRTIO_VIDEO_PARAMS_DECODER_OUTPUT_FORMAT       0x102
> +
> +/* Encoder-only parameters */
> +#define VIRTIO_VIDEO_PARAMS_ENCODER_INPUT_FORMAT        0x201
> +#define VIRTIO_VIDEO_PARAMS_ENCODER_OUTPUT_FORMAT       0x202
> +#define VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE             0x203
> +
> +union virtio_video_stream_params {
> +        /* Common parameters */
> +        struct virtio_video_params_resources input_resources;
> +        struct virtio_video_params_resources output_resources;
> +
> +        /* Decoder-only parameters */
> +        struct virtio_video_params_bitstream_format decoder_input_format;
> +        struct virtio_video_params_image_format decoder_output_format;
> +
> +        /* Encoder-only parameters */
> +        struct virtio_video_params_image_format encoder_input_format;
> +        struct virtio_video_params_bitstream_format encoder_output_format;
> +        struct virtio_video_params_bitrate encoder_bitrate;
> +};
> +\end{lstlisting}
> +
> +Not all parameters are valid for all devices. For instance, a decoder
> +does not support any of the encoder-only parameters and will return
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION if an unsupported
> +parameter is queried or set.
> +
> +Each parameter is described in the remainder of this section.
> +
> +\drivernormative{\subsubsection}{Parameters}{Device Types / Video Device / Parameters}
> +
> +After creating a new stream, the initial value of all parameters is
> +undefined to the driver. Thus, the driver MUST NOT assume the default
> +value of any parameter and MUST use
> +VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM in order to get the values of the
> +parameters it needs.
> +
> +The driver SHOULD modify parameters by first calling
> +VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM to get the current value of the
> +parameter it wants to modify, then altering it and submitting the
> +desired value using VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM, and finally
> +checking the actual value applied to the parameter in the response to
> +VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM.
> +
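
As an illustration of this get/modify/set/check pattern, here is a small
self-contained C sketch using the encoder bitrate parameter defined later
in this section. device_set_bitrate() only mimics the device-side
adjustment rule of VIRTIO_VIDEO_CMD_STREAM_SET_PARAM, and all the numbers
are invented:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t le32; /* little-endian guest assumed; byte swapping elided */

/* Encoder bitrate parameter, as defined later in this section. */
struct virtio_video_params_bitrate {
        le32 min_bitrate;
        le32 max_bitrate;
        le32 bitrate;
        uint8_t padding[4];
};

/* Stand-in for the device side of SET_PARAM: an out-of-range request is
 * not rejected but adjusted to the closest supported value. */
static struct virtio_video_params_bitrate
device_set_bitrate(struct virtio_video_params_bitrate req)
{
        if (req.bitrate < req.min_bitrate)
                req.bitrate = req.min_bitrate;
        if (req.bitrate > req.max_bitrate)
                req.bitrate = req.max_bitrate;
        return req;
}

int main(void)
{
        /* 1. GET_PARAM: current value as reported by the device. */
        struct virtio_video_params_bitrate p = {
                .min_bitrate = 100000, .max_bitrate = 10000000, .bitrate = 2000000,
        };

        /* 2. Alter only the field of interest and SET_PARAM it. */
        p.bitrate = 50000000; /* deliberately above max_bitrate */
        struct virtio_video_params_bitrate resp = device_set_bitrate(p);

        /* 3. Check the value actually applied and work with it (or give up). */
        printf("requested %u, device applied %u\n",
               (unsigned)p.bitrate, (unsigned)resp.bitrate);
        return 0;
}
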
> +\devicenormative{\subsubsection}{Parameters}{Device Types / Video Device / Parameters}
> +
> +The device MUST initialize each parameter to a valid default value and

Maybe add one more "MUST" for the second part of the sentence?

> +allow each parameter to be read even without the driver explicitly
> +setting a value for it.
> +
> +\subsubsection{Common parameters}\label{sec:Device Types / Video Device / Parameters / Common parameters}
> +
> +\field{struct virtio_video_params_resources} is used to control the
> +number of resources and their backing memory type for the INPUT and
> +OUTPUT queues:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES       0x1
> +#define VIRTIO_VIDEO_MEM_TYPE_VIRTIO_OBJECT     0x2
> +
> +struct virtio_video_params_resources {
> +        le32 min_resources;
> +        le32 max_resources;
> +        le32 num_resources;
> +        u8 mem_type; /* VIRTIO_VIDEO_MEM_TYPE_* */
> +        u8 padding[3];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{min_resources}]
> +is the minimum number of resources that the queue supports for the
> +current settings. Cannot be modified by the driver.
> +\item[\field{max_resources}]
> +is the maximum number of resources that the queue supports for the
> +current settings. Cannot be modified by the driver.
> +\item[\field{num_resources}]
> +is the number of resources that can be addressed for the queue, numbered
> +from \(0\) to \(num\_resources - 1\). Can be equal to zero if no resources
> +are allocated, otherwise it will lie between \field{min_resources}
> +and \field{max_resources}.
> +\item[\field{mem_type}]
> +is the memory type that will be used to back these resources.
> +\end{description}
> +
> +Successfully setting this parameter results in all currently attached
> +resources of the corresponding queue becoming detached, i.e.~the driver
> +cannot queue a resource to the queue without attaching some backing
> +memory first. All currently queued resources for the queue are returned
> +with the VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED error before the response
> +to the VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM is returned.
> +
> +This parameter can only be changed during the following times:
> +
> +\begin{itemize}
> +\item
> +  After creating a stream and before queuing any resource on a given
> +  queue,
> +\item
> +  For the INPUT queue, after receiving the response to a
> +  VIRTIO\_VIDEO\_CMD\_STREAM\_STOP and before queueing any input
> +  resource,
> +\item
> +  For the OUTPUT queue, after receiving a DRC event and before queueing
> +  any output resource.
> +\end{itemize}
> +
> +Attempts to change this parameter outside of these times will result in
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION being returned.
> +
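
For illustration, a driver requesting eight guest-page-backed output
resources within the advertised limits might fill this parameter as in
the sketch below. The min/max values are invented and would really come
from VIRTIO_VIDEO_CMD_STREAM_GET_PARAM, and endianness handling is
elided:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t le32; /* little-endian guest assumed */
typedef uint8_t  u8;

#define VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES   0x1
#define VIRTIO_VIDEO_MEM_TYPE_VIRTIO_OBJECT 0x2

struct virtio_video_params_resources {
        le32 min_resources;
        le32 max_resources;
        le32 num_resources;
        u8 mem_type; /* VIRTIO_VIDEO_MEM_TYPE_* */
        u8 padding[3];
};

int main(void)
{
        /* Values as GET_PARAM on the OUTPUT queue might return them
         * (min/max are device-chosen and read-only for the driver). */
        struct virtio_video_params_resources out = {
                .min_resources = 4,
                .max_resources = 32,
                .num_resources = 0, /* nothing allocated yet */
                .mem_type = VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES,
        };

        /* Request 8 resources, staying within [min_resources, max_resources],
         * then submit the whole structure with SET_PARAM. */
        out.num_resources = 8;

        printf("requesting %u resources, mem_type %u\n",
               (unsigned)out.num_resources, (unsigned)out.mem_type);
        return 0;
}
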
> +\subsubsection{Format parameters}\label{sec:Device Types / Video Device / Parameters / Format parameters}
> +
> +The formats of the input and output queues are defined using the
> +\field{virtio_video_params_bitstream_format} and
> +\field{virtio_video_params_image_format} parameters. Which one applies
> +to the input or output queue depends on whether the device is a decoder
> +or an encoder.
> +
> +Bitstream formats are set using the
> +\field{virtio_video_params_bitstream_format} struct:
> +
> +\begin{lstlisting}
> +struct virtio_video_params_bitstream_format {
> +        u8 fourcc[4];
> +        le32 buffer_size;
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{fourcc}]
> +is the fourcc of the bitstream format. For a list of supported formats,
> +see
> +\ref{sec:Device Types / Video Device / Supported formats / Bitstream formats}.
> +\item[\field{buffer_size}]
> +is the minimum size of the buffers that will back resources to be
> +queued.
> +\end{description}
> +
> +Image formats are set using the \field{virtio_video_params_image_format}
> +struct:
> +
> +\begin{lstlisting}
> +struct virtio_video_rect {
> +        le32 left;
> +        le32 top;
> +        le32 width;
> +        le32 height;
> +}
> +
> +struct virtio_video_plane_format {
> +        le32 buffer_size;
> +        le32 stride;
> +        le32 offset;
> +        u8 padding[4];
> +}
> +
> +struct virtio_video_params_image_format {
> +        u8 fourcc[4];
> +        le32 width;
> +        le32 height;
> +        u8 padding[4];
> +        struct virtio_video_rect crop;
> +        struct virtio_video_plane_format planes[VIRTIO_VIDEO_MAX_PLANES];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{fourcc}]
> +is the fourcc of the image format. For a list of supported formats, see
> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> +\item[\field{width}]
> +is the width in pixels of the coded image.
> +\item[\field{height}]
> +is the height in pixels of the coded image.
> +\item[\field{crop}]
> +is the rectangle covering the visible size of the frame, i.e.~the part of
> +the frame that should be displayed.
> +\item[\field{planes}]
> +is the format description of each individual plane making this format.
> +The number of planes is dependent on the \field{fourcc} and detailed in
> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> +
> +\begin{description}
> +\item[\field{buffer_size}]
> +is the minimum size of the buffers that will back resources to be
> +queued.
> +\item[\field{stride}]
> +is the distance in bytes between two lines of data.
> +\item[\field{offset}]
> +is the starting offset for the data in the buffer.
> +\end{description}
> +\end{description}
> +
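
To make the plane fields concrete, here is a self-contained sketch that
fills the image format for NV12 at 1920x1080. It assumes NV12 is
described as two planes (Y, then interleaved CbCr) sharing a single
buffer, as in the common V4L2 layout, and uses unaligned strides; the
device may adjust strides, offsets and buffer sizes to its own
requirements, so these are only the values a driver might request:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t le32; /* little-endian guest assumed */
typedef uint8_t  u8;

#define VIRTIO_VIDEO_MAX_PLANES 8

struct virtio_video_rect {
        le32 left, top, width, height;
};

struct virtio_video_plane_format {
        le32 buffer_size;
        le32 stride;
        le32 offset;
        u8 padding[4];
};

struct virtio_video_params_image_format {
        u8 fourcc[4];
        le32 width, height;
        u8 padding[4];
        struct virtio_video_rect crop;
        struct virtio_video_plane_format planes[VIRTIO_VIDEO_MAX_PLANES];
};

int main(void)
{
        const le32 w = 1920, h = 1080;
        struct virtio_video_params_image_format fmt = {0};

        memcpy(fmt.fourcc, "NV12", 4);
        fmt.width = w;
        fmt.height = h;
        fmt.crop = (struct virtio_video_rect){ .width = w, .height = h };

        /* Plane 0: Y, one byte per pixel. */
        fmt.planes[0].stride = w;
        fmt.planes[0].offset = 0;
        fmt.planes[0].buffer_size = w * h;

        /* Plane 1: interleaved CbCr at 4:2:0 (half the lines of Y),
         * placed right after the Y plane in the same buffer. */
        fmt.planes[1].stride = w;
        fmt.planes[1].offset = w * h;
        fmt.planes[1].buffer_size = w * h / 2;

        printf("total frame bytes: %u\n",
               (unsigned)(fmt.planes[0].buffer_size + fmt.planes[1].buffer_size));
        return 0;
}
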
> +\devicenormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
> +
> +The device MAY adjust any requested parameter to the nearest-supported
> +value if the requested one is not suitable. For instance, an encoder
> +device may decide that it needs larger output buffers in order to
> +encode at the requested format and resolution.

What does "not suitable" mean here? Is it "not supported in the current
case"?

> +
> +\drivernormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
> +
> +When setting a format parameter, the driver MUST check the adjusted
> +returned value and comply with it, or try to set a different one if it
> +cannot.
> +
> +\subsubsection{Encoder parameters}\label{sec:Device Types / Video Device / Parameters / Encoder parameters}
> +
> +\begin{lstlisting}
> +struct virtio_video_params_bitrate {
> +    le32 min_bitrate;
> +    le32 max_bitrate;
> +    le32 bitrate;
> +    u8 padding[4];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{min_bitrate}]
> +is the minimum bitrate supported by the encoder for the current
> +settings. Ignored when setting the parameter.
> +\item[\field{max_bitrate}]
> +is the maximum bitrate supported by the encoder for the current
> +settings. Ignored when setting the parameter.
> +\item[\field{bitrate}]
> +is the current desired bitrate for the encoder.
> +\end{description}
> +
> +\subsection{Supported formats}\label{sec:Device Types / Video Device / Supported formats}
> +
> +Bitstream and image formats are identified by their fourcc code, which
> +is a four-byte ASCII sequence uniquely identifying the format and its
> +properties.
> +
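
Since a fourcc is just four ASCII bytes, it can be built and compared
with ordinary byte operations; a trivial sketch:

#include <stdio.h>
#include <string.h>

int main(void)
{
        /* The format identifier as it would appear in a fourcc[4] field. */
        const unsigned char fourcc[4] = { 'N', 'V', '1', '2' };

        printf("format: %.4s\n", (const char *)fourcc);
        printf("is NV12: %s\n",
               memcmp(fourcc, "NV12", 4) == 0 ? "yes" : "no");
        return 0;
}
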
> +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
> +
> +The fourcc code of each supported bitstream format is given, as well as
> +the unit of data requested in each input resource for the decoder, or
> +produced in each output resource for the encoder.
> +
> +\begin{description}
> +\item[\field{MPG2}]
> +MPEG2 encoded stream. One Access Unit per resource.
> +\item[\field{H264}]
> +H.264 encoded stream. One NAL unit per resource.
> +\item[\field{HEVC}]
> +HEVC encoded stream. One NAL unit per resource.
> +\item[\field{VP80}]
> +VP8 encoded stream. One frame per resource.
> +\item[\field{VP90}]
> +VP9 encoded stream. One frame per resource.
> +\end{description}
> +
> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
> +
> +The fourcc code of each supported image format is given, as well as its
> +number of planes and physical buffers, and its chroma subsampling, if any.
> +
> +\begin{description}
> +\item[\field{RGB3}]
> +one RGB plane where each component takes one byte, i.e.~3 bytes per
> +pixel.
> +\item[\field{NV12}]
> +one Y plane followed by interleaved U and V data, in a single buffer.
> +4:2:0 subsampling.
> +\item[\field{NM12}]
> +same as \field{NV12} but using two separate buffers for the Y and UV
> +planes.
> +\item[\field{YU12}]
> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
> +single buffer. 4:2:0 subsampling.
> +\item[\field{YM12}]
> +same as \field{YU12} but using three separate buffers for the Y, U and V
> +planes.
> +\end{description}

Can we assume that implementers know what all of those fourcc codes
mean? (I don't really know anything about this.) Is there some kind of
normative reference we should add?

Generally, I don't see anything fundamentally wrong with this approach
(mostly some smaller nits.) Feedback from someone familiar with this
subject would be great, though.





* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-08  7:23 [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification Alexandre Courbot
  2022-12-08 15:00 ` Cornelia Huck
@ 2022-12-19 16:59 ` Alexander Gordeev
  2022-12-20  9:51   ` Cornelia Huck
  2022-12-27  7:31   ` Alexandre Courbot
  2023-01-11 17:04 ` Alexander Gordeev
  2023-01-11 18:45 ` Alexander Gordeev
  3 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2022-12-19 16:59 UTC (permalink / raw)
  To: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

Hello Alexandre,

Thanks for the update. Please check my comments below.
I'm new to the virtio video spec development, so I may lack some
historic perspective. I would gladly appreciate being pointed to some
older emails explaining decisions that I might not understand. I hope
to read through all of them later. Overall I have a lot of experience in
the video domain and in virtio video device development in Opsy, so I
hope that my comments are relevant and useful.

On 08.12.22 08:23, Alexandre Courbot wrote:
> Add the specification of the video decoder and encoder devices, which
> can be used to provide host-accelerated video operations to the guest.
>
> Signed-off-by: Keiichi Watanabe<keiichiw@chromium.org>
> Signed-off-by: Alexandre Courbot<acourbot@chromium.org>
> --
> Here is the long-overdue new revision of the virtio-video RFC. This
> version reorganizes the specification quite a bit and tries to simplify
> the protocol further. Nonetheless, it still results in a rather long (17
> pages) specification for just these devices, even though the spec is not
> fully complete (I want to rethink the formats descriptions, and some
> parameters need to be added for the encoder device).
>
> I would like to get some high-level feedback on this version and maybe
> propose to do things a bit differently before people invest too much
> time reviewing this in depth. While rewriting this document, it became
> more and more obvious that this is just a different, and maybe a bit
> simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
> am now wondering whether it would not make more sense to rewrite this
> specification as just a way to transport V4L2 requests over virtio,
> similarly to how virtio-fs does with the FUSE protocol [2].
>
> At the time we started writing this implementation, the V4L2 protocols
> for decoders and encoders were not set in stone yet, but now that they
> are it might make sense to reconsider. Switching to this solution would
> greatly shorten the virtio-video device spec, and also provide a way to
> support other kind of V4L2 devices like cameras or image processors at
> no extra cost.
>
> Note that doing so would not require that either the host or guest uses
> V4L2 - the virtio video device would just emulate a V4L2 device over
> virtio. A few adaptations would need to be done regarding how memory
> types work, but otherwise I believe most of V4L2 could be used as-is.
>
> Please share your thoughts about this, and I will either explore this
> idea further with a prototype, or keep moving the present spec forward,
> hopefully at a faster pace.
>
> Due to the RFC state of this patch I have refrained from referencing the
> normative statements in conformance.tex - I will do that as a final step
> once the spec is mostly agreed on.
>
> [1] https://docs.kernel.org/userspace-api/media/v4l/dev-stateless-decoder.html
> [2]https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-fs.tex
>
> Full PDF:
> https://drive.google.com/file/d/1HRVDiDdo50-S9X5tWgzmT90FJRHoB1dN/view?usp=sharing
>
> PDF of video section only:
> https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing
> ---
>   content.tex      |    1 +
>   virtio-video.tex | 1420 ++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 1421 insertions(+)
>   create mode 100644 virtio-video.tex
>
> diff --git a/content.tex b/content.tex
> index 9d1de53..8a1557a 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -7543,6 +7543,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
>   \input{virtio-scmi.tex}
>   \input{virtio-gpio.tex}
>   \input{virtio-pmem.tex}
> +\input{virtio-video.tex}
>
>   \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>
> diff --git a/virtio-video.tex b/virtio-video.tex
> new file mode 100644
> index 0000000..5d44eb1
> --- /dev/null
> +++ b/virtio-video.tex
> @@ -0,0 +1,1420 @@
> +\section{Video Device}\label{sec:Device Types / Video Device}
> +
> +The virtio video encoder and decoder devices provide support for
> +host-accelerated video encoding and decoding. Despite being different
> +devices types, they use the same protocol and general flow.
> +
> +\subsection{Device ID}\label{sec:Device Types / Video Device / Device ID}
> +
> +\begin{description}
> +\item[30]
> +encoder device
> +\item[31]
> +decoder device
> +\end{description}
> +
> +\subsection{Virtqueues}\label{sec:Device Types / Video Device / Virtqueues}
> +
> +\begin{description}
> +\item[0]
> +commandq - queue for driver commands and device responses to these
> +commands.
> +\item[1]
> +eventq - queue for events sent by the device to the driver.
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / Video Device / Feature bits}
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES (0)]
> +Guest pages can be used as the backing memory of resources.
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_NON\_CONTIG (1)]
> +The device can use non-contiguous guest memory as the backing memory of
> +resources. Only meaningful if VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES
> +is also set.
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_DYNAMIC (2)]
> +The device supports re-attaching memory to resources while streaming.
> +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT (3)]
> +Objects exported by another virtio device can be used as the backing
> +memory of resources.
> +\end{description}
> +
> +\devicenormative{\subsubsection}{Feature bits}{Device Types / Video Device / Feature bits}
> +
> +The device MUST present at least one of
> +VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES or
> +VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT, since the absence of both
> +bits would mean that no memory can be used at all for resources.
> +
> +\drivernormative{\subsubsection}{Feature bits}{Device Types / Video Device / Feature bits}
> +
> +The driver MUST negotiate at least one of the
> +VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES and
> +VIRTIO\_VIDEO\_F\_RESOURCE\_VIRTIO\_OBJECT features.
> +
> +If the VIRTIO\_VIDEO\_F\_RESOURCE\_NON\_CONTIG feature is not present, the
> +driver MUST use physically contiguous memory for all the buffers it
> +allocates.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / Video Device / Device configuration layout}
> +
> +Video device configuration uses the following layout:
> +
> +\begin{lstlisting}
> +struct virtio_video_config {
> +        le32 version;
> +        le32 caps_length;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{version}]
> +is the protocol version that the device understands.
> +\item[\field{caps_length}]
> +is the minimum length in bytes that a device-writable buffer must have
> +in order to receive the response to
> +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
> +\end{description}
> +
> +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Video Device / Device configuration layout}
> +
> +As there is currently only one version of the protocol, the device MUST
> +set the \field{version} field to 0.
> +
> +The device MUST set the \field{caps_length} field to a value equal to
> +the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
> +
> +\subsection{Device Initialization}\label{sec:Device Types / Video Device / Device Initialization}
> +
> +\begin{enumerate}
> +\def\labelenumi{\arabic{enumi}.}
> +\item
> +  The driver reads the feature bits and negotiates the features it
> +  needs.
> +\item
> +  The driver sets up the commandq.
> +\item
> +  The driver confirms that it supports the version of the protocol
> +  advertised in the \field{version} field of the configuration space.
> +\item
> +  The driver reads the \field{caps_length} field of the configuration
> +  space and prepares a buffer of at least that size.
> +\item
> +  The driver sends that buffer on the commandq with the
> +  VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command.
> +\item
> +  The driver receives the response from the device, and parses its
> +  capabilities.
> +\end{enumerate}
> +
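
A minimal sketch of steps 3 to 5 above on the driver side; the
caps_length value is made up and the actual virtio config-space and
virtqueue plumbing is elided:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t le32; /* little-endian guest assumed */

struct virtio_video_config {
        le32 version;
        le32 caps_length;
};

int main(void)
{
        /* Configuration space as it might be read from the device. */
        struct virtio_video_config cfg = { .version = 0, .caps_length = 4096 };

        /* Step 3: only protocol version 0 exists at this point. */
        if (cfg.version != 0) {
                fprintf(stderr, "unsupported virtio-video version %u\n",
                        (unsigned)cfg.version);
                return 1;
        }

        /* Steps 4 and 5: the device-writable buffer handed to
         * VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS must be at least
         * caps_length bytes long. */
        void *caps_buf = malloc(cfg.caps_length);
        if (!caps_buf)
                return 1;

        printf("allocated a %u-byte buffer for QUERY_CAPS\n",
               (unsigned)cfg.caps_length);
        free(caps_buf);
        return 0;
}
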
> +\subsection{Device Operation}\label{sec:Device Types / Video Device / Device Operation}
> +
> +The commandq is used by the driver to send commands to the device and to
> +receive the device's response via used buffers.
> +
> +The driver can create new streams using the
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE command. Each stream has two resource
> +queues (not to be confused with the virtio queues) called INPUT and
> +OUTPUT. The INPUT queue accepts driver-filled input data for the device
> +(bitstream data for a decoder; input frames for an encoder), while the
> +OUTPUT queue receives resources to be filled by the device as a result
> +of processing the INPUT queue (decoded frames for a decoder; encoded
> +bitstream data for an encoder).
> +
> +A resource is a set of memory buffers that contain a unit of data that
> +the device can process or produce. Most resources will only have one
> +buffer (like bitstreams and single-planar images), but frames using a
> +multi-planar format will have several.
> +
> +Before resources can be submitted to a queue, backing memory must be
> +attached to them using VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING.
> +The exact form of that memory is negotiated using the feature flags.
> +
> +The INPUT and OUTPUT queues are effectively independent, and the driver
> +can fill them without caring about the other queue. In particular there
> +is no need to queue input and output resources in pairs: one input
> +resource can result in zero to many output resources being produced.
> +
> +Resources are queued to the INPUT or OUTPUT queue using the
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command. The device replies to this
> +command when an input resource has been fully processed and can be
> +reused by the driver, or when an output resource has been filled by the
> +device as a result of processing.
> +
> +Parameters of the stream can be obtained and configured using
> +VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM and
> +VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM. Available parameters depend on
> +on the device type and are detailed in section
  s/on on/on/


> +\ref{sec:Device Types / Video Device / Parameters}.
> +
> +The device may detect stream-related events that require intervention
> +from the driver and signal them on the eventq. One example is a dynamic
> +resolution change while decoding a stream, which may require the driver
> +to reallocate the backing memory of its output resources to fit the new
> +resolution.
> +
> +\drivernormative{\subsubsection}{Device Operation}{Device Types / Video Device / Device Operation}
> +
> +Descriptor chains sent to the commandq by the driver MUST include at
> +least one device-writable descriptor of a size sufficient to receive the
> +response to the queued command.
> +
> +\devicenormative{\subsubsection}{Device Operation}{Device Types / Video Device / Device Operation}
> +
> +Responses to a command MUST be written by the device in the first
> +device-writable descriptor of the descriptor chain from which the
> +command came.
> +
> +\subsubsection{Device Operation: Command Virtqueue}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
> +
> +This section details the commands that can be sent on the commandq by
> +the driver, as well as the responses that the device will write.
> +
> +Different structures are used for each command and response. A command
> +structure starts with the requested command code, defined as follows:
> +
> +\begin{lstlisting}
> +/* Device */
> +#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
> +
> +/* Stream */
> +#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
> +#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201

Is this gap in numbers intentional? It would be great to remove it to
simplify boundary checks.


> +#define VIRTIO_VIDEO_CMD_STREAM_DRAIN            0x203
> +#define VIRTIO_VIDEO_CMD_STREAM_STOP             0x204
> +#define VIRTIO_VIDEO_CMD_STREAM_GET_PARAM        0x205
> +#define VIRTIO_VIDEO_CMD_STREAM_SET_PARAM        0x206
> +
> +/* Resource */
> +#define VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING 0x400
> +#define VIRTIO_VIDEO_CMD_RESOURCE_QUEUE          0x401
> +\end{lstlisting}
> +
> +A response structure starts with the result of the requested command,
> +defined as follows:
> +
> +\begin{lstlisting}
> +/* Success */
> +#define VIRTIO_VIDEO_RESULT_OK                          0x000
> +
> +/* Error */
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_COMMAND         0x100
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_OPERATION       0x101
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_STREAM_ID       0x102
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_RESOURCE_ID     0x103
> +#define VIRTIO_VIDEO_RESULT_ERR_INVALID_ARGUMENT        0x104
> +#define VIRTIO_VIDEO_RESULT_ERR_CANCELED                0x105
> +#define VIRTIO_VIDEO_RESULT_ERR_OUT_OF_MEMORY           0x106
> +\end{lstlisting}
> +
> +For response structures carrying an error code, the rest of the
> +structure is considered invalid. Only response structures carrying
> +VIRTIO\_VIDEO\_RESULT\_OK shall be examined further by the driver.
> +
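
For illustration, the command-code and result-code rules above might
translate into something like the sketch below; device_dispatch() only
shows the validation step and elides real command handling:

#include <stdint.h>
#include <stdio.h>

typedef uint32_t le32; /* little-endian guest assumed */

#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201
#define VIRTIO_VIDEO_CMD_STREAM_DRAIN            0x203
#define VIRTIO_VIDEO_CMD_STREAM_STOP             0x204
#define VIRTIO_VIDEO_CMD_STREAM_GET_PARAM        0x205
#define VIRTIO_VIDEO_CMD_STREAM_SET_PARAM        0x206
#define VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING 0x400
#define VIRTIO_VIDEO_CMD_RESOURCE_QUEUE          0x401

#define VIRTIO_VIDEO_RESULT_OK                   0x000
#define VIRTIO_VIDEO_RESULT_ERR_INVALID_COMMAND  0x100

/* Device side: any command code not defined above gets
 * VIRTIO_VIDEO_RESULT_ERR_INVALID_COMMAND. */
static le32 device_dispatch(le32 cmd_type)
{
        switch (cmd_type) {
        case VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS:
        case VIRTIO_VIDEO_CMD_STREAM_CREATE:
        case VIRTIO_VIDEO_CMD_STREAM_DESTROY:
        case VIRTIO_VIDEO_CMD_STREAM_DRAIN:
        case VIRTIO_VIDEO_CMD_STREAM_STOP:
        case VIRTIO_VIDEO_CMD_STREAM_GET_PARAM:
        case VIRTIO_VIDEO_CMD_STREAM_SET_PARAM:
        case VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING:
        case VIRTIO_VIDEO_CMD_RESOURCE_QUEUE:
                return VIRTIO_VIDEO_RESULT_OK; /* real handling elided */
        default:
                return VIRTIO_VIDEO_RESULT_ERR_INVALID_COMMAND;
        }
}

int main(void)
{
        le32 result = device_dispatch(0x999);

        /* Driver side: the rest of a response is only meaningful when
         * the result is VIRTIO_VIDEO_RESULT_OK. */
        if (result != VIRTIO_VIDEO_RESULT_OK)
                printf("command failed with 0x%x, ignoring response body\n",
                       (unsigned)result);
        return 0;
}
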
> +\devicenormative{\paragraph}{Device Operation: Command Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
> +
> +The device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND to
> +any non-existing command code.
> +
> +\drivernormative{\paragraph}{Device Operation: Command Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Command Virtqueue}
> +
> +The driver MUST NOT interpret the rest of a response which result is not

s/which/whose/ ?


> +VIRTIO\_VIDEO\_RESULT\_OK.
> +
> +\subsubsection{Device Operation: Device Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands}
> +
> +Device capabilities are retrieved using the
> +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command, which returns arrays of
> +formats supported by the input and output queues.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> +
> +Retrieve device capabilities.
> +
> +The driver sends this command with
> +\field{struct virtio_video_device_query_caps}:
> +
> +\begin{lstlisting}
> +struct virtio_video_device_query_caps {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS */
> +};
> +\end{lstlisting}
> +
> +The device responds with
> +\field{struct virtio_video_device_query_caps_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_device_query_caps_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        le32 num_bitstream_formats;
> +        le32 num_image_formats;
> +        /**
> +         * Followed by
> +         * struct virtio_video_bitstream_format_desc bitstream_formats[num_bitstream_formats];
> +         */
> +        /**
> +         * Followed by
> +         * struct virtio_video_image_format_desc image_formats[num_image_formats]
> +         */
> +};

struct virtio_video_bitstream_format_desc and struct
virtio_video_image_format_desc are not declared anywhere.


> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
> +if the descriptor was smaller than the defined \field{caps_length} in
> +the video device configuration.
> +\end{description}
> +\item[\field{num_bitstream_formats}]
> +is the number of supported bitstream formats.
> +\item[\field{num_image_formats}]
> +is the number of supported image formats.
> +\item[\field{bitstream_formats}]
> +is an array of size \field{num_bitstream_formats} containing the
> +supported encoded formats. These correspond to the formats that can be
> +set on the INPUT queue for a decoder, and on the OUTPUT queue for an
> +encoder. For a description of bitstream formats, see
> +\ref{sec:Device Types / Video Device / Supported formats / Bitstream formats}.
> +\item[\field{image_formats}]
> +is an array of size \field{num_image_formats} containing the supported
> +image formats. These correspond to the formats that can be set on the
> +OUTPUT queue for a decoder, and on the INPUT queue for an encoder. For a
> +description of image formats, see
> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> +\end{description}
> +
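
A small sketch of how a driver might decode the fixed part of this
response. Since the *_format_desc layouts are not spelled out in this
revision (as noted above), the arrays following the header are treated
as opaque here, and the buffer contents are invented:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint32_t le32; /* little-endian guest assumed */

#define VIRTIO_VIDEO_RESULT_OK 0x000

/* Fixed-size header of virtio_video_device_query_caps_resp. */
struct caps_resp_hdr {
        le32 result;
        le32 num_bitstream_formats;
        le32 num_image_formats;
};

int main(void)
{
        /* A made-up response buffer, as it would come back in the
         * device-writable descriptor of the QUERY_CAPS command. */
        unsigned char buf[64] = {0};
        const struct caps_resp_hdr example = {
                .result = VIRTIO_VIDEO_RESULT_OK,
                .num_bitstream_formats = 2,
                .num_image_formats = 3,
        };
        memcpy(buf, &example, sizeof(example));

        /* Driver-side parse: read the header, then the two format
         * descriptor arrays that follow it. */
        struct caps_resp_hdr hdr;
        memcpy(&hdr, buf, sizeof(hdr));
        if (hdr.result != VIRTIO_VIDEO_RESULT_OK)
                return 1;

        printf("%u bitstream formats and %u image formats follow the header\n",
               (unsigned)hdr.num_bitstream_formats,
               (unsigned)hdr.num_image_formats);
        return 0;
}
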
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}{Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS
> +by the driver.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}{Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> +
> +The device MUST write the two \field{bitstream_formats} and
> +\field{image_formats} arrays, of length \field{num_bitstream_formats}
> +and \field{num_image_formats}, respectively.
> +
> +The total size of the response MUST be equal to \field{caps_length}
> +bytes, as reported by the device configuration.
> +
> +\subsubsection{Device Operation: Stream commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands}
> +
> +Stream commands allow the creation, destruction, and flow control of a
> +stream.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_CREATE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> +
> +Create a new stream using the device.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_create}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_create {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_CREATE */
> +};
> +\end{lstlisting}
> +
> +The device responds with \field{struct virtio_video_stream_create_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_create_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
> +if the limit of simultaneous streams has been reached by the device and
> +no more can be created.
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND]
> +if the stream cannot be created due to an unexpected device issue.
> +\end{description}
> +\item[\field{stream_id}]
> +is the ID of the created stream allocated by the device.
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_CREATE}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE by
> +the driver.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_CREATE}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> +
> +\field{stream_id} MUST be set to an identifier that is unique to that
> +stream for as long as it lives.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_DESTROY}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
> +
> +Destroy a video stream and all its resources. Any activity on the stream
> +is halted and all resources released by the time the response is
> +received by the driver.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_destroy}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_destroy {
> +         le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_DESTROY */
> +         le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream to be destroyed, as previously returned by
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +\end{description}
> +
> +The device responds with
> +\field{struct virtio_video_stream_destroy_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_destroy_resp {
> +         le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream ID does not exist.
> +\end{description}
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DESTROY}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY by
> +the driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +The driver MUST stop using \field{stream_id} as a valid stream after it
> +received the response to this command.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DESTROY}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DESTROY}
> +
> +Before the device sends a response, it MUST respond with
> +VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED to all pending commands.
> +
> +After responding to this command, the device MUST reply with
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID to any command related
> +to this stream.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_DRAIN}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> +
> +Complete processing of all currently queued input resources.
> +
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN ensures that all sent
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue are
> +processed by the device and that the resulting output resources are
> +available to the driver.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_drain}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_drain {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_DRAIN */
> +        le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream to drain, as previously returned by
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_drain_resp}
> +once the drain operation is completed:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_drain_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if a drain operation is already in progress for this stream,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED]
> +if the operation has been canceled by a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
> +or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation.
> +\end{description}
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN by the
> +driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +The driver MUST keep queueing output resources until it gets the
> +response to this command. Failure to do so may result in the device
> +stalling as it waits for output resources to write into.
> +
> +The driver MUST account for the fact that the response to this command
> +might come out-of-order (i.e.~after other commands sent to the device),
> +and that it can be interrupted.

The driver MUST send a DRAIN command when it doesn't have any input,
right? Otherwise, there is no way to receive all the decoded buffers
back, if the codec just keeps some of them waiting for more input.


> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> +
> +Before the device sends the response, it MUST process and respond to all
> +the VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue that
> +were sent before the drain command, and make all the corresponding
> +output resources available to the driver by responding to their
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command.

Unfortunately I don't see much details about the OUTPUT queue. What if
the driver queues new output buffers, as it must do, fast enough? Looks
like a valid implementation of the DRAIN command might never send a
response in this case, because the only thing it does is replying to
VIRTIO_VIDEO_CMD_RESOURCE_QUEUE commands on the OUTPUT queue. I guess,
it is better to specify what happens. I think the device should respond
to a certain amount of OUTPUT QUEUE commands until there is an end of
stream condition. Then it should respond to DRAIN command. What happens
with the remaining queued output buffers is a question to me: should
they be cancelled or not?


> +
> +While the device is processing the command, it MUST return
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to the
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN command.

Should the device stop accepting input too?


> +
> +If the command is interrupted due to a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
> +or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation, the device MUST
> +respond with VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_STOP}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> +

I don't like this command to be called "stop". When I see a "stop"
command, I expect to see a "start" command as well. My personal
preference would be "flush" or "reset".


> +Immediately return all queued input resources without processing them
> +and stop operation until new input resources are queued.
> +
> +This command is mostly useful for decoders that need to quickly jump
> +from one point of the stream to another (i.e.~seeking), or in order to
> +stop processing as quickly as possible.
> +
> +The driver sends this command with
> +\field{struct virtio_video_stream_stop}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_stop {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_STOP */
> +        le32 stream_id;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream to stop, as previously returned by
> +VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_stop_resp}
> +after the response for all previously queued input resources has been
> +sent:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_stop_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\end{description}
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_STOP}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_STOP by the
> +driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +Upon receiving the response to this command, the driver SHOULD process
> +(or drop) any output resource before resuming operation by queueing new
> +input resources.
> +
> +Upon receiving the response to this command, the driver CAN modify the
> +\field{struct virtio_video_params_resources} parameter corresponding to
> +the INPUT queue, and subsequently attach new backing memory to the input
> +resources using the VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING
> +command.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_STOP}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> +
> +The device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED to any
> +pending VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN and
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command on the INPUT queue before
> +responding to this command. Pending commands on the output queue are not
> +affected.
> +
> +The device MUST interrupt operation as quickly as possible, and not be
> +dependent on output resources being queued by the driver.
> +
> +Upon resuming processing, the device CAN skip input data until it finds
> +a point that allows it to resume operation properly (e.g.~until a
> +keyframe it found in the input stream of a decoder).

s/it found/is found/ ?


> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
> +
> +Read the value of a parameter of the given stream. Available parameters
> +depend on the device type and are listed in
> +\ref{sec:Device Types / Video Device / Parameters}.
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_get_param {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_GET_PARAM */
> +        le32 stream_id;
> +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
> +        u8 padding[4];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream we want to get a parameter from.
> +\item[\field{param_type}]
> +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
> +we want to get.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_param_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_param_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        union virtio_video_stream_params param;

I'd prefer to have param_type in the response too for safety.


> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if the \field{param_type} argument is invalid for the device,
> +\end{description}
> +\item[\field{param}]
> +is the value of the requested parameter, if \field{result} is
> +VIRTIO\_VIDEO\_RESULT\_OK.
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM
> +by the driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{param_type} MUST be set to a parameter type that is valid for the
> +device.

The device requirements are missing for GET_PARAMS.


> +
> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> +
> +Write the value of a parameter of the given stream, and return the value
> +actually set by the device. Available parameters depend on the device
> +type and are listed in
> +\ref{sec:Device Types / Video Device / Parameters}.
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_set_param {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_SET_PARAM */
> +        le32 stream_id;
> +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
> +        u8 padding[4];
> +        union virtio_video_stream_params param;
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of the stream we want to set a parameter for.
> +\item[\field{param_type}]
> +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
> +we want to set.
> +\end{description}
> +
> +The device responds with \field{struct virtio_video_stream_param_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_stream_param_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +        union virtio_video_stream_params param;

I'd prefer to have param_type in the response too for safety.


> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if the \field{param_type} argument is invalid for the device,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if the requested parameter cannot be modified at this moment.
> +\end{description}
> +\item[\field{param}]
> +is the actual value of the parameter set by the device, if
> +\field{result} is VIRTIO\_VIDEO\_RESULT\_OK. The value set by the device
> +may differ from the requested value depending on the device's
> +capabilities.
> +\end{description}
> +
> +Outside of the error cases described above, setting a parameter does not
> +fail. If the device cannot apply the parameter as requested, it will
> +adjust it to the closest setting it supports, and return that value to
> +the driver. It is then up to the driver to decide whether it can work
> +within the range of parameters supported by the device.
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM
> +by the driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{param_type} MUST be set to a parameter type that is valid for the
> +device, and \field{param} MUST be filled as the union member
> +corresponding to \field{param_type}.
> +
> +The driver MUST check the actual value of the parameter as set by the
> +device and work with this value, or fail properly if it cannot.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> +
> +The device MUST NOT return an error if the value requested by the driver
> +cannot be applied as-is. Instead, the device MUST set the parameter to
> +the closest supported value to the one requested by the driver.
> +
> +\subsubsection{Device Operation: Resource Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands}
> +
> +Resource commands manage the memory backing of individual resources and
> +allow to queue them so the device can process them.

s/allow to queue them so the device can process them/allow them to be
queued for the device to process/

or maybe

s/allow to queue them so the device can process them/queue them for the
device to process/


> +
> +\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> +
> +Assign backing memory to a resource.

IMO it is better to stick to "attach" everywhere.


> +
> +The driver sends this command with
> +\field{struct virtio_video_resource_attach_backing}:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_QUEUE_TYPE_INPUT       0
> +#define VIRTIO_VIDEO_QUEUE_TYPE_OUTPUT      1
> +
> +struct virtio_video_resource_attach_backing {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING */
> +        le32 stream_id;
> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
> +        le32 resource_id;
> +        union virtio_video_resource resources[];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of a valid stream.
> +\item[\field{queue_type}]
> +is the direction of the queue.
> +\item[\field{resource_id}]
> +is the ID of the resource to be attached to.
> +\item[\field{resources}]
> +specifies memory regions to attach.
> +\end{description}
> +
> +The union \field{virtio_video_resource} is defined as follows:
> +
> +\begin{lstlisting}
> +union virtio_video_resource {
> +        struct virtio_video_resource_sg_list sg_list;
> +        struct virtio_video_resource_object object;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{sg_list}]
> +represents a scatter-gather list. This variant can be used when the
> +\field{mem_type} member of the \field{virtio_video_params_resources}
> +corresponding to the queue is set to
> +VIRTIO\_VIDEO\_MEM\_TYPE\_GUEST\_PAGES (see
> +\ref{sec:Device Types / Video Device / Parameters / Common parameters}).
> +\item[\field{object}]
> +represents an object exported from another virtio device. This variant
> +can be used when the \field{mem_type} member of the
> +\field{virtio_video_params_resources} corresponding to the queue is set
> +to VIRTIO\_VIDEO\_MEM\_TYPE\_VIRTIO\_OBJECT (see
> +\ref{sec:Device Types / Video Device / Parameters / Common parameters}).
> +\end{description}
> +
> +The struct \field{virtio_video_resource_sg_list} is defined as follows:
> +
> +\begin{lstlisting}
> +struct virtio_video_resource_sg_entry {
> +        le64 addr;
> +        le32 length;
> +        u8 padding[4];
> +};
> +
> +struct virtio_video_resource_sg_list {
> +        le32 num_entries;
> +        u8 padding[4];
> +        /* Followed by num_entries instances of
> +           video_video_resource_sg_entry */

s/video_video/virtio_video/


> +};
> +\end{lstlisting}
> +
> +Within \field{struct virtio_video_resource_sg_entry}:
> +
> +\begin{description}
> +\item[\field{addr}]
> +is a guest physical address to the start of the SG entry.
> +\item[\field{length}]
> +is the length of the SG entry.
> +\end{description}

I think having explicit page alignment requirements here would be great.


> +
> +Finally, for \field{struct virtio_video_resource_sg_list}:
> +
> +\begin{description}
> +\item[\field{num_entries}]
> +is the number of \field{struct virtio_video_resource_sg_entry} instances
> +that follow.
> +\end{description}
> +
> +\field{struct virtio_video_resource_object} is defined as follows:
> +
> +\begin{lstlisting}
> +struct virtio_video_resource_object {
> +        u8 uuid[16];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[uuid]
> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> +\end{description}
> +
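
To illustrate the guest-pages case, here is a sketch that serializes one
sg list header followed by its entries, the way it would be laid out in
the resources[] area of the attach_backing command; the guest physical
addresses and lengths are made up:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t le64; /* little-endian guest assumed */
typedef uint32_t le32;
typedef uint8_t  u8;

struct virtio_video_resource_sg_entry {
        le64 addr;
        le32 length;
        u8 padding[4];
};

struct virtio_video_resource_sg_list {
        le32 num_entries;
        u8 padding[4];
        /* followed by num_entries struct virtio_video_resource_sg_entry */
};

int main(void)
{
        /* Three made-up guest-physical ranges backing one resource; real
         * addresses would come from the guest's page allocator. */
        const struct virtio_video_resource_sg_entry entries[] = {
                { .addr = 0x10000000, .length = 4096 },
                { .addr = 0x10008000, .length = 8192 },
                { .addr = 0x10020000, .length = 4096 },
        };
        const le32 n = sizeof(entries) / sizeof(entries[0]);

        struct virtio_video_resource_sg_list hdr = { .num_entries = n };
        size_t size = sizeof(hdr) + sizeof(entries);
        u8 *payload = calloc(1, size);
        if (!payload)
                return 1;

        /* Header first, then its entries, back to back. */
        memcpy(payload, &hdr, sizeof(hdr));
        memcpy(payload + sizeof(hdr), entries, sizeof(entries));

        printf("sg list: %u entries, %zu bytes\n", (unsigned)n, size);
        free(payload);
        return 0;
}
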
> +The device responds with
> +\field{struct virtio_video_resource_attach_backing_resp}:
> +
> +\begin{lstlisting}
> +struct virtio_video_resource_attach_backing_resp {
> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the mentioned stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
> +invalid value,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if the operation is performed at a time when it is not valid.
> +\end{description}
> +\end{description}
> +
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
> +the following times:
> +
> +\begin{itemize}
> +\item
> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
> +  resource,
> +\item
> +  AFTER successfully changing the \field{virtio_video_params_resources}
> +  parameter corresponding to the queue and BEFORE
> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
> +\end{itemize}
> +
> +This is to ensure that the device can rely on the fact that a given
> +resource will always point to the same memory for as long as it may be
> +used by the video device. For instance, a decoder may use returned
> +decoded frames as reference for future frames and won't overwrite the
> +backing resource of a frame that is being referenced. It is only before
> +a stream is started and after a Dynamic Resolution Change event has
> +occurred that we can be sure that all resources won't be used in that
> +way.

The mentioned scenario about the referenced frames looks somewhat
reasonable, but I wonder how exactly that would work in practice.


> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> +
> +\field{cmd_type} MUST be set to
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING by the driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{queue_type} MUST be set to a valid queue type.
> +
> +\field{resource_id} MUST be an integer less than the number of
> +resources currently allocated for the queue.
> +
> +The length of the \field{resources} array of
> +\field{struct virtio_video_resource_attach_backing} MUST be equal to the
> +number of resources required by the format currently set on the queue,
> +as described in
> +\ref{sec:Device Types / Video Device / Supported formats}.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> +
> +At any time other than the times valid for calling this command, the
> +device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION.
> +
> +\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> +
> +Add a resource to the device's queue.
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_MAX_PLANES                    8
> +
> +#define VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME  (1 << 0)
> +
> +struct virtio_video_resource_queue {
> +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING */

s/VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING/VIRTIO_VIDEO_CMD_RESOURCE_QUEUE/


> +        le32 stream_id;
> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
> +        le32 resource_id;
> +        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
> +        u8 padding[4];
> +        le64 timestamp;
> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{stream_id}]
> +is the ID of a valid stream.
> +\item[\field{queue_type}]
> +is the direction of the queue.
> +\item[\field{resource_id}]
> +is the ID of the resource to be queued.
> +\item[\field{flags}]
> +is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
> +
> +\begin{description}
> +\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
> +The submitted frame is to be encoded as a key frame. Only valid for the
> +encoder's INPUT queue.
> +\end{description}
> +\item[\field{timestamp}]
> +is an abstract sequence counter that can be used on the INPUT queue for
> +synchronization. Resources produced on the output queue will carry the
> +\field{timestamp} of the input resource they have been produced from.

I think this is quite misleading. Implementers may take this as
permission to assume a 1-to-1 mapping between input and output buffers
and no reordering, right? But this is usually not the case:

1. At the end of the spec, H.264 and HEVC are defined to always have a
single NAL unit per resource. Well, there are many types of NAL units
that do not represent any video data, like SEI NAL units or delimiters.

2. We may assume that the SEI and delimiter units are filtered out
before queuing, but there is still other codec-specific data that can't
be filtered, like SPS and PPS NAL units. There has to be some special
handling.

3. All of this means more codec-specific code in the driver or client
applications.

4. This spec says that the device may skip to the next key frame after
a seek. So the driver has to account for this too.

5. For example, in H.264 a single key frame may be coded by several NAL
units. In fact, all VCL NAL units are called slices because of this.
What happens when the decoder sees several NAL units with different
timestamps coding the same output frame? Which timestamp will it choose?
I'm not sure it is defined anywhere. Probably it will just take the
first timestamp. The driver/client applications have to be ready for
this too.

6. I saw almost the same scenario with CSD units too. Imagine an SPS
with timestamp 0, then a PPS with 1, and then an IDR with 2. These three
might be combined together into a single input buffer by the
vendor-provided decoding software. Then the timestamp of the resulting
frame is naturally 0. But the driver/client application doesn't expect
to get any response with timestamps 0 and 1, because those are known to
belong to CSD. And it expects an output buffer with timestamp 2. So
there will be a problem. (This is actually a real-world example.)

7. Then there is H.264 High profile, for example. It has different
decoding and presentation orders because frames may depend on future
frames. I think all modern codecs have a mode like this. The input
frames are usually provided in decoding order. Should the output frames'
timestamps just be copied from the input frames they have been produced
from, as the paragraph above says? Then they follow decoding order.
Well, this can work if the container has correct DTS and PTS, and the
client software creates a mapping between these timestamps and the
virtio video timestamp. But this is not always the case. For example, an
elementary H.264 bitstream doesn't carry any timestamps, and still it
can easily be played by ffmpeg/gstreamer/VLC/etc. There is no way to
make this work with a decoder following this spec, I think.

My suggestion is not to think about the timestamp as an abstract
counter, but to give some freedom to the device by providing the
available information from the container, be it DTS, PTS or only FPS
(through PARAMS). Also, the input and output queues should indeed be
completely separated. There should be no assumption of a 1-to-1 mapping
of buffers.
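
Just to illustrate how much bookkeeping the current wording pushes onto
the driver or client, here is a rough sketch (all names are invented) of
the mapping a client would have to keep; it already falls apart for the
CSD and multi-slice cases above:

#include <stdint.h>

#define MAX_IN_FLIGHT 64 /* arbitrary; entries are never reclaimed here
                            to keep the sketch short */

/* Client-side mapping from the abstract virtio counter to the
 * container's presentation timestamp. */
struct ts_entry {
        uint64_t counter; /* written to virtio_video_resource_queue.timestamp */
        uint64_t pts;     /* presentation timestamp from the container */
};

static struct ts_entry in_flight[MAX_IN_FLIGHT];
static unsigned int n_in_flight;
static uint64_t next_counter;

/* Input side: remember which counter value stands for which PTS. */
static uint64_t on_queue_input(uint64_t pts)
{
        uint64_t c = ++next_counter;

        in_flight[n_in_flight++] = (struct ts_entry){ .counter = c, .pts = pts };
        return c;
}

/* Output side: recover the PTS from the returned counter. This fails as
 * soon as the device merges several input units into one frame or drops
 * CSD-only units (points 5 and 6 above). */
static int on_dequeue_output(uint64_t counter, uint64_t *pts)
{
        for (unsigned int i = 0; i < n_in_flight; i++) {
                if (in_flight[i].counter == counter) {
                        *pts = in_flight[i].pts;
                        return 0;
                }
        }
        return -1;
}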


> +\item[\field{data_sizes}]
> +number of data bytes used for each plane. Set by the driver for input
> +resources.
> +\end{description}
> +
> +The device responds with
> +\field{struct virtio_video_resource_queue_resp}:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_ERR           (1 << 0)
> +/* Encoder only */
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_KEY_FRAME     (1 << 1)
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_P_FRAME       (1 << 2)
> +#define VIRTIO_VIDEO_DEQUEUE_FLAG_B_FRAME       (1 << 3)
> +
> +struct virtio_video_resource_queue_resp {
> +        le32 result;
> +        le32 flags;
> +        le64 timestamp;
> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{result}]
> +is
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> +if the operation succeeded,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> +if the requested stream does not exist,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> +if the \field{queue_type}, \field{resource_id} or \field{flags}
> +parameters have an invalid value,
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> +if VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING has not been
> +successfully called on the resource prior to queueing it.
> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED]
> +if the resource has not been processed, not because of an error but
> +because of a change in the state of the codec. The driver is expected to
> +take action and address the condition before submitting the resource
> +again.
> +\end{description}
> +\item[\field{flags}]
> +is a bitmask of VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_* flags.
> +
> +\begin{description}
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR]
> +is set on output resources when a non-fatal processing error has
> +happened and the data contained by the resource is likely to be
> +corrupted,
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_KEY\_FRAME]
> +is set on output resources when the resource contains an encoded key
> +frame (only for encoders).
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_P\_FRAME]
> +is set on output resources when the resource contains only differences
> +to a previous key frame (only for encoders).
> +\item[VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_B\_FRAME]
> +is set on output resources when the resource contains only the
> +differences between the current frame and both the preceding and
> +following key frames (only for encoders).
> +\end{description}
> +\item[\field{timestamp}]
> +is set on output resources to the \field{timestamp} value of the input
> +resource that produced the resource.
> +\item[\field{data_sizes}]
> +is set on output resources to the amount of data written by the device,
> +for each plane.
> +\end{description}
> +
> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> +
> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE by
> +the driver.
> +
> +\field{stream_id} MUST be set to a valid stream ID previously returned
> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> +
> +\field{queue_type} MUST be set to a valid queue type.
> +
> +\field{resource_id} MUST be an integer inferior to the number of
> +resources currently allocated for the queue.
> +
> +The driver MUST account for the fact that the response to this command
> +might come out-of-order (i.e.~after other commands sent to the device),
> +and that it can be interrupted.
> +
> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> +
> +The device MUST mark output resources that might contain corrupted
> +content due to and error with the VIRTIO\_VIDEO\_BUFFER\_FLAG\_ERR flag.

s/and error/an error/


> +
> +For output resources, the device MUST copy the \field{timestamp}
> +parameter of the input resource that produced it into its response.
> +

Please see the comment on timestamps above.


> +In case of encoder, the device MUST mark each output resource with one
> +of VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_KEY\_FRAME,
> +VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_P\_FRAME, or
> +VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_B\_FRAME.
> +
> +If the processing of a resource was stopped due to a stream event, a
> +VIRTIO\_VIDEO\_CMD\_STREAM\_STOP, or a
> +VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY, the device MUST set \field{result}
> +to VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
> +
> +\subsubsection{Device Operation: Event Virtqueue}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue}
> +
> +The eventq is used by the device to signal stream events that are not a
> +direct result of a command queued by the driver on the commandq. Since
> +these events affect the device operation, the driver is expected to
> +react to them and resume streaming afterwards.
> +
> +There are currently two supported events: device error, and Dynamic
> +Resolution Change.
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_EVENT_ERROR    1
> +#define VIRTIO_VIDEO_EVENT_DRC      2
> +
> +union virtio_video_event {
> +        le32 event_type /* One of VIRTIO_VIDEO_EVENT_* */
> +        struct virtio_video_event_err err;
> +        struct virtio_video_event_drc drc;
> +}

This doesn't look like a correct declaration. It should be a union of
structs within a struct.

Also I couldn't find declarations for struct virtio_video_event_err and
struct virtio_video_event_drc.
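
Something like the following is probably what was meant; this is just a
sketch of the shape, the payload structs still have to be declared
somewhere, and the version above is also missing a couple of semicolons:

struct virtio_video_event {
        le32 event_type; /* One of VIRTIO_VIDEO_EVENT_* */
        union {
                struct virtio_video_event_err err;
                struct virtio_video_event_drc drc;
        };
};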


> +\end{lstlisting}
> +
> +\drivernormative{\paragraph}{Device Operation: Event Virtqueue}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue}
> +
> +The driver MUST at any time have at least one descriptor with a used
> +buffer large enough to contain a \field{struct virtio_video_event}
> +queued on the eventq.
> +
> +\paragraph{Error Event}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Error Event}
> +
> +The error event is queued by the device when an unrecoverable error
> +occurred during processing. The stream is considered invalid from that
> +point and is automatically closed. Pending

Does this mean an implicit STREAM_DESTROY command?


> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN and
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands are canceled, and further
> +commands will fail with VIRTIO\_VIDEO\_RESULT\_INVALID\_STREAM\_ID.
> +
> +Note that this is different from dequeued resources carrying the
> +VIRTIO\_VIDEO\_DEQUEUE\_FLAG\_ERR flag. This flag indicates that the
> +output might be corrupted, but the stream in itself can continue and
> +might recover.
> +
> +This event should only be used for catastrophic errors, e.g.~a host
> +driver failure that cannot be recovered.
> +
> +\paragraph{Dynamic Resolution Change Event}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
> +
> +A Dynamic Resolution Change (or DRC) event happens when a decoder device
> +detects that the resolution of the stream being decoded has changed.
> +This event is emitted after processing all the input resources preceding
> +the resolution change, and as a result all the output resources
> +corresponding to these pre-DRC input resources are available to the
> +driver by the time it receives the DRC event.
> +
> +A DRC event automatically detaches the backing memory of all output
> +resources. Upon receiving the DRC event and processing all pending
> +output resources, the driver is responsible for querying the new output
> +resolution and re-attaching suitable backing memory to the output
> +resources before queueing them again. Streaming resumes when the first
> +output resource is queued with memory properly attached.
> +
> +\devicenormative{\subparagraph}{Dynamic Resolution Change Event}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
> +
> +The device MUST make all the output resources that correspond to frames
> +before the resolution change point available to the driver BEFORE it
> +sends the resolution change event to the driver.
> +
> +After the event is emitted, the device MUST reject all output resources
> +for which VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING has not been
> +successfully called again with
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION.
> +
> +\drivernormative{\subparagraph}{Dynamic Resolution Change Event}{Device Types / Video Device / Device Operation / Device Operation: Event Virtqueue / Dynamic Resolution Change Event}
> +
> +The driver MUST query the new output resolution parameter and call
> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING with suitable memory for
> +each output resource before queueing them again.
> +
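
For my own understanding, the driver side of a DRC event would then look
roughly like the sketch below, as I read the two paragraphs above.
struct stream and all the helpers are hypothetical stand-ins for the
corresponding virtio commands, not anything defined in the spec:

/* Sketch of the driver-side DRC handling. */
static void handle_drc_event(struct stream *s)
{
        /* 1. Dequeue the remaining pre-DRC output resources; the device
         *    guarantees they are available before it emits the event. */
        drain_output_queue(s);

        /* 2. Query the new output format and resource requirements. */
        stream_get_param(s, VIRTIO_VIDEO_PARAMS_DECODER_OUTPUT_FORMAT);
        stream_get_param(s, VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES);

        /* 3. The event detached all output backings, so attach suitably
         *    sized memory to each output resource and queue it again;
         *    streaming resumes on the first successfully queued one. */
        for (unsigned int i = 0; i < s->num_output_resources; i++) {
                output_resource_attach_backing(s, i);
                output_resource_queue(s, i);
        }
}
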
> +\subsection{Parameters}\label{sec:Device Types / Video Device / Parameters}
> +
> +Parameters allow the driver to configure the device for the decoding or
> +encoding operation.
> +
> +The \field{union virtio_video_stream_params} is defined as follows:
> +
> +\begin{lstlisting}
> +/* Common parameters */
> +#define VIRTIO_VIDEO_PARAMS_INPUT_RESOURCES             0x001
> +#define VIRTIO_VIDEO_PARAMS_OUTPUT_RESOURCES            0x002
> +
> +/* Decoder-only parameters */
> +#define VIRTIO_VIDEO_PARAMS_DECODER_INPUT_FORMAT        0x101
> +#define VIRTIO_VIDEO_PARAMS_DECODER_OUTPUT_FORMAT       0x102
> +
> +/* Encoder-only parameters */
> +#define VIRTIO_VIDEO_PARAMS_ENCODER_INPUT_FORMAT        0x201
> +#define VIRTIO_VIDEO_PARAMS_ENCODER_OUTPUT_FORMAT       0x202
> +#define VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE             0x203
> +
> +union virtio_video_stream_params {
> +        /* Common parameters */
> +        struct virtio_video_params_resources input_resources;
> +        struct virtio_video_params_resources output_resources;
> +
> +        /* Decoder-only parameters */
> +        struct virtio_video_params_bitstream_format decoder_input_format;
> +        struct virtio_video_params_image_format decoder_output_format;
> +
> +        /* Encoder-only parameters */
> +        struct virtio_video_params_image_format encoder_input_format;
> +        struct virtio_video_params_bitstream_format encoder_output_format;
> +        struct virtio_video_params_bitrate encoder_bitrate;
> +};
> +\end{lstlisting}
> +
> +Not all parameters are valid for all devices. For instance, a decoder
> +does not support any of the encoder-only parameters and will return
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION if an unsupported
> +parameter is queried or set.
> +
> +Each parameter is described in the remainder of this section.
> +
> +\drivernormative{\subsubsection}{Parameters}{Device Types / Video Device / Parameters}
> +
> +After creating a new stream, the initial value of all parameters is
> +undefined to the driver. Thus, the driver MUST NOT assume the default
> +value of any parameter and MUST use
> +VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM in order to get the values of the
> +parameters it needs.
> +
> +The driver SHOULD modify parameters by first calling
> +VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM to get the current value of the
> +parameter it wants to modify, alter it and submit the desired value
> +using VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM, then checking the actual
> +value set to the parameter in the response of
> +VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM.
> +
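
Spelled out as code, the read-modify-write sequence described above
would be something like the following sketch; stream_get_param() and
stream_set_param() are hypothetical wrappers around the GET_PARAM and
SET_PARAM commands, not real API:

#include <stdint.h>

/* Read-modify-write of the encoder bitrate, then check what the device
 * actually applied. */
static int set_bitrate(struct stream *s, uint32_t wanted)
{
        union virtio_video_stream_params p;

        stream_get_param(s, VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE, &p);
        p.encoder_bitrate.bitrate = wanted;
        stream_set_param(s, VIRTIO_VIDEO_PARAMS_ENCODER_BITRATE, &p);

        /* The device may have adjusted the value; the driver has to
         * decide whether the adjusted value is acceptable. */
        return p.encoder_bitrate.bitrate == wanted ? 0 : -1;
}
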
> +\devicenormative{\subsubsection}{Parameters}{Device Types / Video Device / Parameters}
> +
> +The device MUST initialize each parameter to a valid default value and
> +allow each parameter to be read even without the driver explicitly
> +setting a value for it.
> +
> +\subsubsection{Common parameters}\label{sec:Device Types / Video Device / Parameters / Common parameters}
> +
> +\field{struct virtio_video_params_resources} is used to control the
> +number of resources and their backing memory type for the INPUT and
> +OUTPUT queues:
> +
> +\begin{lstlisting}
> +#define VIRTIO_VIDEO_MEM_TYPE_GUEST_PAGES       0x1
> +#define VIRTIO_VIDEO_MEM_TYPE_VIRTIO_OBJECT     0x2
> +
> +struct virtio_video_params_resources {
> +        le32 min_resources;
> +        le32 max_resources;
> +        le32 num_resources;
> +        u8 mem_type; /* VIRTIO_VIDEO_MEM_TYPE_* */
> +        u8 padding[3];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{min_resources}]
> +is the minimum number of resources that the queue supports for the
> +current settings. Cannot be modified by the driver.
> +\item[\field{max_resources}]
> +is the maximum number of resources that the queue supports for the
> +current settings. Cannot be modified by the driver.
> +\item[\field{num_resources}]
> +is the number of resources that can be addressed for the queue, numbered
> +from \(0\) to \(num\_queue - 1\). Can be equal to zero if no resources

s/num_queue/num_resources/ ?


> +are allocated, otherwise will be comprised between \field{min_resources}
> +and \field{max_resources}.
> +\item[\field{mem_type}]
> +is the memory type that will be used to back these resources.
> +\end{description}
> +
> +Successfully setting this parameter results in all currently attached
> +resources of the corresponding queue to become detached, i.e.~the driver
> +cannot queue a resource to the queue without attaching some backing
> +memory first. All currently queued resources for the queue are returned
> +with the VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED error before the response
> +to the VIRTIO\_VIDEO\_CMD\_STREAM\_SET\_PARAM is returned.
> +
> +This parameter can only be changed during the following times:
> +
> +\begin{itemize}
> +\item
> +  After creating a stream and before queuing any resource on a given
> +  queue,
> +\item
> +  For the INPUT queue, after receiving the response to a
> +  VIRTIO\_VIDEO\_CMD\_STREAM\_STOP and before queueing any input
> +  resource,
> +\item
> +  For the OUTPUT queue, after receiving a DRC event and before queueing
> +  any output resource.
> +\end{itemize}
> +
> +Attempts to change this parameter outside of these times will result in
> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to be returned.
> +
> +\subsubsection{Format parameters}\label{sec:Device Types / Video Device / Parameters / Format parameters}
> +
> +The format of the input and output queues are defined using the
> +\field{virtio_video_params_bitstream_format} and
> +\field{virtio_video_params_image_format}. Which one applies to the input
> +or output queue depends on whether the device is a decoder or an
> +encoder.
> +
> +Bitstream formats are set using the
> +\field{virtio_video_params_bitstream_format} struct:
> +
> +\begin{lstlisting}
> +struct virtio_video_params_bitstream_format {
> +        u8 fourcc[4];
> +        le32 buffer_size;
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{fourcc}]
> +is the fourcc of the bitstream format. For a list of supported formats,
> +see
> +\ref{sec:Device Types / Video Device / Supported formats / Bitstream formats}.
> +\item[\field{buffer_size}]
> +is the minimum size of the buffers that will back resources to be
> +queued.
> +\end{description}
> +
> +Image formats are set using the \field{virtio_video_params_image_format}
> +struct:
> +
> +\begin{lstlisting}
> +struct virtio_video_rect {
> +        le32 left;
> +        le32 top;
> +        le32 width;
> +        le32 height;
> +}
> +
> +struct virtio_video_plane_format {
> +        le32 buffer_size;
> +        le32 stride;
> +        le32 offset;
> +        u8 padding[4];
> +}
> +
> +struct virtio_video_params_image_format {
> +        u8 fourcc[4];
> +        le32 width;
> +        le32 height;
> +        u8 padding[4];
> +        struct virtio_video_rect crop;
> +        struct virtio_video_plane_format planes[VIRTIO_VIDEO_MAX_PLANES];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{fourcc}]
> +is the fourcc of the image format. For a list of supported formats, see
> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> +\item[\field{width}]
> +is the width in pixels of the coded image.
> +\item[\field{height}]
> +is the height in pixels of the coded image.
> +\item[\field{crop}]
> +is the rectangle covering the visible size of the frame, i.e the part of
> +the frame that should be displayed.

I think it is better to elaborate here whether width and height inside
crop are relative to (0, 0) or (left, top).


> +\item[\field{planes}]
> +is the format description of each individual plane making this format.
> +The number of planes is dependent on the \field{fourcc} and detailed in
> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> +
> +\begin{description}
> +\item[\field{buffer_size}]
> +is the minimum size of the buffers that will back resources to be
> +queued.
> +\item[\field{stride}]
> +is the distance in bytes between two lines of data.
> +\item[\field{offset}]
> +is the starting offset for the data in the buffer.

It is not quite clear to me how to use the offset during SET_PARAMS. I
think it is much more reasonable to have per-plane offsets in struct
virtio_video_resource_queue and struct virtio_video_resource_queue_resp.
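
To make the suggestion concrete, this is roughly what I have in mind;
just a sketch, the exact layout and padding would need more thought:

struct virtio_video_resource_queue {
        le32 cmd_type;
        le32 stream_id;
        le32 queue_type;
        le32 resource_id;
        le32 flags;
        u8 padding[4];
        le64 timestamp;
        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
        le32 data_offsets[VIRTIO_VIDEO_MAX_PLANES]; /* new: per-plane offsets */
};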


> +\end{description}
> +\end{description}
> +
> +\devicenormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
> +
> +The device MAY adjust any requested parameter to the nearest-supported
> +value if the requested one is not suitable. For instance, an encoder
> +device may decide that it needs more larger output buffers in order to

s/more larger/more or larger/ ?


> +encode at the requested format and resolution.

It is not defined when changing these parameters is allowed. Also there
is an issue: changing width, height, format, or buffer_size should probably
detach all the currently attached buffers. But changing crop shouldn't
affect the output buffers in any way, right? So maybe it is better to
split them?


> +
> +\drivernormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
> +
> +When setting a format parameter, the driver MUST check the adjusted
> +returned value and comply with it, or try to set a different one if it
> +cannot.
> +
> +\subsubsection{Encoder parameters}\label{sec:Device Types / Video Device / Parameters / Encoder parameters}
> +
> +\begin{lstlisting}
> +struct virtio_video_params_bitrate {
> +    le32 min_bitrate;
> +    le32 max_bitrate;
> +    le32 bitrate;
> +    u8 padding[4];
> +}
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{min_bitrate}]
> +is the minimum bitrate supported by the encoder for the current
> +settings. Ignored when setting the parameter.
> +\item[\field{max_bitrate}]
> +is the maximum bitrate supported by the encoder for the current
> +settings. Ignored when setting the parameter.
> +\item[\field{bitrate_}]

s/bitrate_/bitrate/


> +is the current desired bitrate for the encoder.
> +\end{description}

The bitrate should be changeable on the fly IMO. It would be nice to
have this mentioned explicitly.


> +
> +\subsection{Supported formats}\label{sec:Device Types / Video Device / Supported formats}
> +
> +Bitstream and image formats are identified by their fourcc code, which
> +is a four-bytes ASCII sequence uniquely identifying the format and its
> +properties.
> +
> +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
> +
> +The fourcc code of each supported bitstream format is given, as well as
> +the unit of data requested in each input resource for the decoder, or
> +produced in each output resource for the encoder.
> +
> +\begin{description}
> +\item[\field{MPG2}]
> +MPEG2 encoded stream. One Access Unit per resource.
> +\item[\field{H264}]
> +H.264 encoded stream. One NAL unit per resource.
> +\item[\field{HEVC}]
> +HEVC encoded stream. One NAL unit per resource.
> +\item[\field{VP80}]
> +VP8 encoded stream. One frame per resource.
> +\item[\field{VP90}]
> +VP9 encoded stream. One frame per resource.
> +\end{description}
> +
> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
> +
> +The fourcc code of each supported image format is given, as well as its
> +number of planes, physical buffers, and eventual subsampling.
> +
> +\begin{description}
> +\item[\field{RGB3}]
> +one RGB plane where each component takes one byte, i.e.~3 bytes per
> +pixel.
> +\item[\field{NV12}]
> +one Y plane followed by interleaved U and V data, in a single buffer.
> +4:2:0 subsampling.
> +\item[\field{NV12}]
> +same as \field{NV12} but using two separate buffers for the Y and UV
> +planes.

s/NV12/NM12/ ?


> +\item[\field{YU12}]
> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
> +single buffer. 4:2:0 subsampling.
> +\item[\field{YM12}]
> +same as \field{YU12} but using three separate buffers for the Y, U and V
> +planes.
> +\end{description}

This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
V4L2 documentation has a nice description of exact plane layouts.
Otherwise it would be nice to have these layouts in the spec IMO.
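
As a quick reminder of what such a layout description boils down to, the
single-buffer NV12 case from the V4L2 documentation is essentially:

#include <stddef.h>

/* NV12, 4:2:0 subsampling: a full-resolution Y plane followed by a
 * half-height plane of interleaved Cb/Cr samples, both using the same
 * line stride. */
static size_t nv12_buffer_size(size_t stride, size_t height)
{
        size_t y_size  = stride * height;        /* Y plane */
        size_t uv_size = stride * (height / 2);  /* interleaved CbCr plane */

        return y_size + uv_size;
}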

--
Alexander Gordeev
Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail:alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-19 16:59 ` [virtio-dev] " Alexander Gordeev
@ 2022-12-20  9:51   ` Cornelia Huck
  2022-12-20 10:35     ` Alexander Gordeev
  2022-12-27  7:31   ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2022-12-20  9:51 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot, virtio-dev,
	Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

On Mon, Dec 19 2022, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

> Hello Alexandre,
>
> Thanks for the update. Please check my comments below.
> I'm new to the virtio video spec development, so I may lack some
> historic perspective. I would gladly appreciate pointing me to some
> older emails explaining decisions, that I might not understand. I hope
> to read through all of them later. Overall I have a lot of experience in
> the video domain and in virtio video device development in Opsy, so I
> hope, that my comments are relevant and useful.

Thank you for commenting! I know the virtio spec, but I'm not familiar
with the video part, so your feedback is highly appreciated :)

Prior discussions should be in the virtio-dev archives; at least v3
(https://lists.oasis-open.org/archives/virtio-dev/202002/msg00002.html)
and v4
(https://lists.oasis-open.org/archives/virtio-dev/202006/msg00072.html)
seem to have gotten some feedback.

[A note to Alexandre: You'll want to send your patches to virtio-comment
as well as virtio-dev; virtio-comment is for commenting on the spec
including proposing updates, while virtio-dev is for discussing virtio
development. Yes, I know that we haven't always been strict about that
in the past.]

>
> On 08.12.22 08:23, Alexandre Courbot wrote:

(...)

>> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
>> +
>> +The fourcc code of each supported image format is given, as well as its
>> +number of planes, physical buffers, and eventual subsampling.
>> +
>> +\begin{description}
>> +\item[\field{RGB3}]
>> +one RGB plane where each component takes one byte, i.e.~3 bytes per
>> +pixel.
>> +\item[\field{NV12}]
>> +one Y plane followed by interleaved U and V data, in a single buffer.
>> +4:2:0 subsampling.
>> +\item[\field{NV12}]
>> +same as \field{NV12} but using two separate buffers for the Y and UV
>> +planes.
>
> s/NV12/NM12/ ?
>
>
>> +\item[\field{YU12}]
>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>> +single buffer. 4:2:0 subsampling.
>> +\item[\field{YM12}]
>> +same as \field{YU12} but using three separate buffers for the Y, U and V
>> +planes.
>> +\end{description}
>
> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
> V4L2 documentation has a nice description of exact plane layouts.
> Otherwise it would be nice to have these layouts in the spec IMO.

Ah, so this is all V4L2? It looks like we really want to refer to the
existing V4L2 headers (like we already do for FUSE in the virtiofs
case). Are those headers sufficient to specify the formats, or do we
need anything else?


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-20  9:51   ` Cornelia Huck
@ 2022-12-20 10:35     ` Alexander Gordeev
  2022-12-20 17:39       ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2022-12-20 10:35 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot, virtio-dev, Keiichi Watanabe,
	Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

Hello Cornelia,

On 20.12.22 10:51, Cornelia Huck wrote:
> On Mon, Dec 19 2022, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>
>> Hello Alexandre,
>>
>> Thanks for the update. Please check my comments below.
>> I'm new to the virtio video spec development, so I may lack some
>> historic perspective. I would gladly appreciate pointing me to some
>> older emails explaining decisions, that I might not understand. I hope
>> to read through all of them later. Overall I have a lot of experience in
>> the video domain and in virtio video device development in Opsy, so I
>> hope, that my comments are relevant and useful.
> Thank you for commenting! I know the virtio spec, but I'm not familiar
> with the video part, so your feedback is highly appreciated :)
>
> Prior discussions should be in the virtio-dev archives; at least v3
> (https://lists.oasis-open.org/archives/virtio-dev/202002/msg00002.html)
> and v4
> (https://lists.oasis-open.org/archives/virtio-dev/202006/msg00072.html)
> seem to have gotten some feedback.
>
> [A note to Alexandre: You'll want to send your patches to virtio-comment
> as well as virtio-dev; virtio-comment is for commenting on the spec
> including proposing updates, while virtio-dev is for discussing virtio
> development. Yes, I know that we haven't always been strict about that
> in the past.]

Thank you for the references! I'll take some time to read these
discussions through.


>> On 08.12.22 08:23, Alexandre Courbot wrote:
> (...)
>
>>> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
>>> +
>>> +The fourcc code of each supported image format is given, as well as its
>>> +number of planes, physical buffers, and eventual subsampling.
>>> +
>>> +\begin{description}
>>> +\item[\field{RGB3}]
>>> +one RGB plane where each component takes one byte, i.e.~3 bytes per
>>> +pixel.
>>> +\item[\field{NV12}]
>>> +one Y plane followed by interleaved U and V data, in a single buffer.
>>> +4:2:0 subsampling.
>>> +\item[\field{NV12}]
>>> +same as \field{NV12} but using two separate buffers for the Y and UV
>>> +planes.
>> s/NV12/NM12/ ?
>>
>>
>>> +\item[\field{YU12}]
>>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>>> +single buffer. 4:2:0 subsampling.
>>> +\item[\field{YM12}]
>>> +same as \field{YU12} but using three separate buffers for the Y, U and V
>>> +planes.
>>> +\end{description}
>> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
>> V4L2 documentation has a nice description of exact plane layouts.
>> Otherwise it would be nice to have these layouts in the spec IMO.
> Ah, so this is all V4L2? It looks like we really want to refer to the
> existing V4L2 headers (like we already do for FUSE in the virtiofs
> case). Are those headers sufficient to specify the formats, or do we
> need anything else?
Yes, these fourcc codes seem to come from V4L2:
https://docs.kernel.org/userspace-api/media/v4l/pixfmt-yuv-planar.html
I also think we want to have a reference to V4L2 in the spec here, because:
1. V4L2 docs have the nice component/plane layouts that help convert or
match these format definitions with other APIs.
2. These fourcc codes are not used anywhere else except in V4L2. I think
this is the only place that distinguishes contiguous and non-contiguous
buffers with a fourcc code. So without a reference this may
look quite weird. With a reference this looks logical, because I guess
any virtio video driver for a Linux guest would naturally implement V4L2.
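
For reference, as far as I remember the mapping to the V4L2 pixel format
macros in <linux/videodev2.h> is the following; quoted from memory, so
please double-check against the header:

#include <linux/videodev2.h>
#include <stdint.h>

/*
 * RGB3 -> V4L2_PIX_FMT_RGB24
 * NV12 -> V4L2_PIX_FMT_NV12     (single buffer)
 * NM12 -> V4L2_PIX_FMT_NV12M    (two buffers)
 * YU12 -> V4L2_PIX_FMT_YUV420   (single buffer)
 * YM12 -> V4L2_PIX_FMT_YUV420M  (three buffers)
 *
 * e.g. the code of the two-buffer NV12 variant:
 */
static const uint32_t nm12 = v4l2_fourcc('N', 'M', '1', '2'); /* V4L2_PIX_FMT_NV12M */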

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-20 10:35     ` Alexander Gordeev
@ 2022-12-20 17:39       ` Cornelia Huck
  2022-12-21 14:56         ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2022-12-20 17:39 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot, virtio-dev,
	Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

On Tue, Dec 20 2022, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

> Hello Cornelia,
>
> On 20.12.22 10:51, Cornelia Huck wrote:
>> On Mon, Dec 19 2022, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>
>>> Hello Alexandre,
>>>
>>> Thanks for the update. Please check my comments below.
>>> I'm new to the virtio video spec development, so I may lack some
>>> historic perspective. I would gladly appreciate pointing me to some
>>> older emails explaining decisions, that I might not understand. I hope
>>> to read through all of them later. Overall I have a lot of experience in
>>> the video domain and in virtio video device development in Opsy, so I
>>> hope, that my comments are relevant and useful.
>> Thank you for commenting! I know the virtio spec, but I'm not familiar
>> with the video part, so your feedback is highly appreciated :)
>>
>> Prior discussions should be in the virtio-dev archives; at least v3
>> (https://lists.oasis-open.org/archives/virtio-dev/202002/msg00002.html)
>> and v4
>> (https://lists.oasis-open.org/archives/virtio-dev/202006/msg00072.html)
>> seem to have gotten some feedback.
>>
>> [A note to Alexandre: You'll want to send your patches to virtio-comment
>> as well as virtio-dev; virtio-comment is for commenting on the spec
>> including proposing updates, while virtio-dev is for discussing virtio
>> development. Yes, I know that we haven't always been strict about that
>> in the past.]
>
> Thank you for the references! I'll take some time to read these
> discussions through.

You're welcome; I'm not sure how much of this is still relevant.

>
>
>>> On 08.12.22 08:23, Alexandre Courbot wrote:
>> (...)
>>
>>>> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
>>>> +
>>>> +The fourcc code of each supported image format is given, as well as its
>>>> +number of planes, physical buffers, and eventual subsampling.
>>>> +
>>>> +\begin{description}
>>>> +\item[\field{RGB3}]
>>>> +one RGB plane where each component takes one byte, i.e.~3 bytes per
>>>> +pixel.
>>>> +\item[\field{NV12}]
>>>> +one Y plane followed by interleaved U and V data, in a single buffer.
>>>> +4:2:0 subsampling.
>>>> +\item[\field{NV12}]
>>>> +same as \field{NV12} but using two separate buffers for the Y and UV
>>>> +planes.
>>> s/NV12/NM12/ ?
>>>
>>>
>>>> +\item[\field{YU12}]
>>>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>>>> +single buffer. 4:2:0 subsampling.
>>>> +\item[\field{YM12}]
>>>> +same as \field{YU12} but using three separate buffers for the Y, U and V
>>>> +planes.
>>>> +\end{description}
>>> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
>>> V4L2 documentation has a nice description of exact plane layouts.
>>> Otherwise it would be nice to have these layouts in the spec IMO.
>> Ah, so this is all V4L2? It looks like we really want to refer to the
>> existing V4L2 headers (like we already do for FUSE in the virtiofs
>> case). Are those headers sufficient to specify the formats, or do we
>> need anything else?
> Yes, these fourcc codes seem to come from V4L2:
> https://docs.kernel.org/userspace-api/media/v4l/pixfmt-yuv-planar.html
> I also think we want to have a reference to V4L2 in the spec here, because:
> 1. V4L2 docs have the nice component/plane layouts that help convert or
> match these format definitions with other APIs.
> 2. These fourcc codes are not used anywhere else except in V4L2. I think
> this is the only place that distinguishes contiguous and non-contiguous
> buffers with a fourcc code. So without a reference this may
> look quite weird. With a reference this looks logical, because I guess
> any virtio video driver for a Linux guest would naturally implement V4L2.

Maybe the best way to handle this would be to link to both the UAPI
header file (via Linus' tree on kernel.org, as we do for FUSE), and to
the .rst file(s) that describe what this all means (probably also via
kernel.org). The header file would definitely be a normative reference;
not sure whether the .rst files would be normative or non-normative, but
we can figure that out later. That's certainly better than trying to
replicate existing definitions and explanations.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-20 17:39       ` Cornelia Huck
@ 2022-12-21 14:56         ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2022-12-21 14:56 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot, virtio-dev, Keiichi Watanabe,
	Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

On 20.12.22 18:39, Cornelia Huck wrote:
>>>>> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
>>>>> +
>>>>> +The fourcc code of each supported image format is given, as well as its
>>>>> +number of planes, physical buffers, and eventual subsampling.
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[\field{RGB3}]
>>>>> +one RGB plane where each component takes one byte, i.e.~3 bytes per
>>>>> +pixel.
>>>>> +\item[\field{NV12}]
>>>>> +one Y plane followed by interleaved U and V data, in a single buffer.
>>>>> +4:2:0 subsampling.
>>>>> +\item[\field{NV12}]
>>>>> +same as \field{NV12} but using two separate buffers for the Y and UV
>>>>> +planes.
>>>> s/NV12/NM12/ ?
>>>>
>>>>
>>>>> +\item[\field{YU12}]
>>>>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>>>>> +single buffer. 4:2:0 subsampling.
>>>>> +\item[\field{YM12}]
>>>>> +same as \field{YU12} but using three separate buffers for the Y, U and V
>>>>> +planes.
>>>>> +\end{description}
>>>> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
>>>> V4L2 documentation has a nice description of exact plane layouts.
>>>> Otherwise it would be nice to have these layouts in the spec IMO.
>>> Ah, so this is all V4L2? It looks like we really want to refer to the
>>> existing V4L2 headers (like we already do for FUSE in the virtiofs
>>> case). Are those headers sufficient to specify the formats, or do we
>>> need anything else?
>> Yes, these fourcc codes seem to come from V4L2:
>> https://docs.kernel.org/userspace-api/media/v4l/pixfmt-yuv-planar.html
>> I also think we want to have a reference to V4L2 in the spec here, because:
>> 1. V4L2 docs have the nice component/plane layouts that help convert or
>> match these format definitions with other APIs.
>> 2. These fourcc codes are not used anywhere else except in V4L2. I think
>> this is the only place that distinguishes contiguous and non-contiguous
>> buffers with a fourcc code. So without a reference this may
>> look quite weird. With a reference this looks logical, because I guess
>> any virtio video driver for a Linux guest would naturally implement V4L2.
> Maybe the best way to handle this would be to link to both the UAPI
> header file (via Linus' tree on kernel.org, as we do for FUSE), and to
> the .rst file(s) that describe what this all means (probably also via
> kernel.org). The header file would definitely be a normative reference;
> not sure whether the .rst files would be normative or non-normative, but
> we can figure that out later. That's certainly better than trying to
> replicate existing definitions and explanations.

I like the plan. I also think it is better to reference the definitions
and explanations, not replicate them. Thanks!

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-08 15:00 ` Cornelia Huck
@ 2022-12-27  5:38   ` Alexandre Courbot
  2023-01-11  8:45     ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2022-12-27  5:38 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa

Hi Cornelia, thanks for the feedback! I have directly applied the
comments snipped from this reply to the source document.

On Fri, Dec 9, 2022 at 12:01 AM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Thu, Dec 08 2022, Alexandre Courbot <acourbot@chromium.org> wrote:
>
> > Add the specification of the video decoder and encoder devices, which
> > can be used to provide host-accelerated video operations to the guest.
> >
> > Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> > --
> > Here is the long-overdue new revision of the virtio-video RFC. This
> > version reorganizes the specification quite a bit and tries to simplify
> > the protocol further. Nonetheless, it still results in a rather long (17
> > pages) specification for just these devices, even though the spec is not
> > fully complete (I want to rethink the formats descriptions, and some
> > parameters need to be added for the encoder device).
> >
> > I would like to get some high-level feedback on this version and maybe
> > propose to do things a bit differently before people invest too much
> > time reviewing this in depth. While rewriting this document, it became
> > more and more obvious that this is just a different, and maybe a bit
> > simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
> > am now wondering whether it would not make more sense to rewrite this
> > specification as just a way to transport V4L2 requests over virtio,
> > similarly to how virtio-fs does with the FUSE protocol [2].
> >
> > At the time we started writing this implementation, the V4L2 protocols
> > for decoders and encoders were not set in stone yet, but now that they
> > are it might make sense to reconsider. Switching to this solution would
> > greatly shorten the virtio-video device spec, and also provide a way to
> > support other kind of V4L2 devices like cameras or image processors at
> > no extra cost.
> >
> > Note that doing so would not require that either the host or guest uses
> > V4L2 - the virtio video device would just emulate a V4L2 device over
> > virtio. A few adaptations would need to be done regarding how memory
> > types work, but otherwise I believe most of V4L2 could be used as-is.
> >
> > Please share your thoughts about this, and I will either explore this
> > idea further with a prototype, or keep moving the present spec forward,
> > hopefully at a faster pace.
>
> In principle, reusing an existing interface that does the job might be a
> good idea. I see that the Linux headers are dual-licenced as 3-clause
> BSD, and if the interface has indeed stabilized, it might be a good idea
> to rely on it. The main question is: Is the interface sufficiently
> independent from Linux specialities (i.e. can others implement it
> without issue?)

From what I can infer after looking at the sequence of ioctls that
would be necessary to decode a stream, I believe this would work with
only minor adjustments.

> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / Video Device / Feature bits}
> > +
> > +\begin{description}
> > +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES (0)]
>
> Side note: you should get the correct output even without escaping the
> underscores (although your editor might still be confused...)

Actually this LaTeX document has been generated from a Markdown file
passed through a Pandoc filter (this makes it simpler to write for me
vs. writing the LaTeX directly). I'll see if I can remove these escape
sequences using the filter or sed.

> > +
> > +\begin{lstlisting}
> > +struct virtio_video_config {
> > +        le32 version;
> > +        le32 caps_length;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{version}]
> > +is the protocol version that the device understands.
> > +\item[\field{caps_length}]
> > +is the minimum length in bytes that a device-writable buffer must have
> > +in order to receive the response to
> > +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
> > +\end{description}
> > +
> > +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Video Device / Device configuration layout}
> > +
> > +As there is currently only one version of the protocol, the device MUST
> > +set the \field{version} field to 0.
>
> In what way would you want to change the protocol so that it becomes
> incompatible? Extensions should be easy to handle via extra
> capabilities, and if we don't expect the protocol to change often, a
> feature bit for a new format might be sufficient.
>
> If we stick with the version field, maybe start at 1 and make 0 invalid?
> Probably easier to spot errors that way.

You are right, this is probably not needed. I guess in the early days
we wanted to handle the case where the protocol would evolve in
incompatible ways, but we'd better not consider that route at all if
only for the complexity that would be added to the spec. I'll remove
the version field.

>
> > +
> > +The device MUST set the \field{caps_length} field to a value equal to
> > +the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
>
> Could the device also support a minimum response size that only supports
> a subset of the caps to be returned? Otherwise, I think caps_length is
> the maximum (or fixed?) length of the query caps response?

I think this can be replaced by a fixed-size call for getting only one
format at a time. The guest would have to make several of these in
order to obtain the whole set of supported formats, but it would be
easier to parse compared to the large result returned by QUERY_CAP and
simpler overall.
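
The shape I have in mind is something like this; the names below are
placeholders for illustration and not part of the current spec:

/* Hypothetical fixed-size capability query, one format per call. */
struct virtio_video_query_format {
        le32 cmd_type;   /* a new VIRTIO_VIDEO_CMD_QUERY_FORMAT */
        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
        le32 index;      /* incremented by the driver until the device
                            reports that no more formats are available */
        u8 padding[4];
};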

> > +
> > +\subsubsection{Device Operation: Stream commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands}
> > +
> > +Stream commands allow the creation, destruction, and flow control of a
> > +stream.
> > +
> > +\paragraph{VIRTIO_VIDEO_CMD_STREAM_CREATE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
> > +
> > +Create a new stream using the device.
> > +
> > +The driver sends this command with
> > +\field{struct virtio_video_stream_create}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_stream_create {
> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_CREATE */
> > +};
> > +\end{lstlisting}
> > +
> > +The device responds with \field{struct virtio_video_stream_create_resp}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_stream_create_resp {
> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> > +        le32 stream_id;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{result}]
> > +is
> > +
> > +\begin{description}
> > +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> > +if the operation succeeded,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
> > +if the limit of simultaneous streams has been reached by the device and
> > +no more can be created.
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND]
> > +if the stream cannot be created due to an unexpected device issue.
>
> Is it an "unexpected device issue" or "the driver send something it
> should not have"? It might be a good idea to distinguish the two?

This error code should not be ambiguous, as the only input of the
command is its type. Therefore this error code can only be returned in
case of a library or hardware error on the host side.

> > +
> > +\field{stream_id} MUST be set to a valid stream ID previously returned
> > +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> > +
> > +\field{param_type} MUST be set to a parameter type that is valid for the
> > +device.
> > +
> > +\paragraph{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> > +
> > +Write the value of a parameter of the given stream, and return the value
> > +actually set by the device. Available parameters depend on the device
> > +type and are listed in
> > +\ref{sec:Device Types / Video Device / Parameters}.
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_stream_set_param {
> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_SET_PARAM */
> > +        le32 stream_id;
> > +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
> > +        u8 padding[4];
> > +        union virtio_video_stream_params param;
> > +}
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{stream_id}]
> > +is the ID of the stream we want to set a parameter for.
> > +\item[\field{param_type}]
> > +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
> > +we want to set.
> > +\end{description}
> > +
> > +The device responds with \field{struct virtio_video_stream_param_resp}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_stream_param_resp {
> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> > +        union virtio_video_stream_params param;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{result}]
> > +is
> > +
> > +\begin{description}
> > +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> > +if the operation succeeded,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> > +if the requested stream does not exist,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> > +if the \field{param_type} argument is invalid for the device,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> > +if the requested parameter cannot be modified at this moment.
> > +\end{description}
> > +\item[\field{param}]
> > +is the actual value of the parameter set by the device, if
> > +\field{result} is VIRTIO\_VIDEO\_RESULT\_OK. The value set by the device
> > +may differ from the requested value depending on the device's
> > +capabilities.
> > +\end{description}
> > +
> > +Outside of the error cases described above, setting a parameter does not
> > +fail. If the device cannot apply the parameter as requested, it will
> > +adjust it to the closest setting it supports, and return that value to
> > +the driver. It is then up to the driver to decide whether it can work
> > +within the range of parameters supported by the device.
>
> Does the driver need a way to discover which parameters are supported?
> Or is that depending on the context?

The set of valid parameters should be evident from the current codec,
but there may be cases (notably with the encoder) where some
parameters are optional. I guess that's another case where leveraging
V4L2 would help as it features ways to list valid parameters (or
"controls" in V4L2-speak).

> > +
> > +\drivernormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
> > +
> > +When setting a format parameter, the driver MUST check the adjusted
> > +returned value and comply with it, or try to set a different one if it
> > +cannot.
> > +
> > +\subsubsection{Encoder parameters}\label{sec:Device Types / Video Device / Parameters / Encoder parameters}
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_params_bitrate {
> > +    le32 min_bitrate;
> > +    le32 max_bitrate;
> > +    le32 bitrate;
> > +    u8 padding[4];
> > +}
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{min_bitrate}]
> > +is the minimum bitrate supported by the encoder for the current
> > +settings. Ignored when setting the parameter.
> > +\item[\field{max_bitrate}]
> > +is the maximum bitrate supported by the encoder for the current
> > +settings. Ignored when setting the parameter.
> > +\item[\field{bitrate_}]
> > +is the current desired bitrate for the encoder.
> > +\end{description}
> > +
> > +\subsection{Supported formats}\label{sec:Device Types / Video Device / Supported formats}
> > +
> > +Bitstream and image formats are identified by their fourcc code, which
> > +is a four-bytes ASCII sequence uniquely identifying the format and its
> > +properties.
> > +
> > +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
> > +
> > +The fourcc code of each supported bitstream format is given, as well as
> > +the unit of data requested in each input resource for the decoder, or
> > +produced in each output resource for the encoder.
> > +
> > +\begin{description}
> > +\item[\field{MPG2}]
> > +MPEG2 encoded stream. One Access Unit per resource.
> > +\item[\field{H264}]
> > +H.264 encoded stream. One NAL unit per resource.
> > +\item[\field{HEVC}]
> > +HEVC encoded stream. One NAL unit per resource.
> > +\item[\field{VP80}]
> > +VP8 encoded stream. One frame per resource.
> > +\item[\field{VP90}]
> > +VP9 encoded stream. One frame per resource.
> > +\end{description}
> > +
> > +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
> > +
> > +The fourcc code of each supported image format is given, as well as its
> > +number of planes, number of physical buffers, and subsampling, if any.
> > +
> > +\begin{description}
> > +\item[\field{RGB3}]
> > +one RGB plane where each component takes one byte, i.e.~3 bytes per
> > +pixel.
> > +\item[\field{NV12}]
> > +one Y plane followed by interleaved U and V data, in a single buffer.
> > +4:2:0 subsampling.
> > +\item[\field{NM12}]
> > +same as \field{NV12} but using two separate buffers for the Y and UV
> > +planes.
> > +\item[\field{YU12}]
> > +one Y plane followed by one Cb plane, followed by one Cr plane, in a
> > +single buffer. 4:2:0 subsampling.
> > +\item[\field{YM12}]
> > +same as \field{YU12} but using three separate buffers for the Y, U and V
> > +planes.
> > +\end{description}
>
> Can we assume that implementers know what all of those fourcc codes
> mean? (I don't really know anything about this.) Is there some kind of
> normative reference we should add?

As Alexander pointed out, these are taken directly from V4L2, so I
will add a reference to the source.
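
For readers who are not familiar with the convention: a fourcc is just
the four ASCII characters packed into a 32-bit little-endian value, the
same way V4L2 builds its pixel format codes. A small self-contained
example (the macro name is made up):

#include <stdint.h>
#include <stdio.h>

/* Pack four ASCII characters into a 32-bit code, least significant byte
 * first, following the V4L2 convention. */
#define VIDEO_FOURCC(a, b, c, d) \
        ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
         ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

int main(void)
{
        printf("NV12 = 0x%08x\n", VIDEO_FOURCC('N', 'V', '1', '2')); /* 0x3231564e */
        printf("H264 = 0x%08x\n", VIDEO_FOURCC('H', '2', '6', '4')); /* 0x34363248 */
        return 0;
}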

> Generally, I don't see anything fundamentally wrong with this approach
> (mostly some smaller nits.) Feedback from someone familiar with this
> subject would be great, though.

Thanks, that's encouraging! There are still a few bits missing, and we
may switch to something different if we decide to piggyback V4L2, but
the core mechanisms will remain similar so it is great to see that
there isn't any hard blocker.


* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-19 16:59 ` [virtio-dev] " Alexander Gordeev
  2022-12-20  9:51   ` Cornelia Huck
@ 2022-12-27  7:31   ` Alexandre Courbot
  2023-01-11 18:42     ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2022-12-27  7:31 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa

Hi Alexander,


On Tue, Dec 20, 2022 at 1:59 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> Hello Alexandre,
>
> Thanks for the update. Please check my comments below.
> I'm new to the virtio video spec development, so I may lack some
> historic perspective. I would appreciate being pointed to some
> older emails explaining decisions that I might not understand. I hope
> to read through all of them later. Overall I have a lot of experience
> in the video domain and in virtio video device development in Opsy, so
> I hope that my comments are relevant and useful.

Cornelia provided links to the previous versions (thanks!). Through
these revisions we tried different approaches, and the more we
progress the closer we are getting to the V4L2 stateful
decoder/encoder interface.

This is actually the point where I would particularly be interested in
having your feedback, since you probably have noticed the similarity.
What would you think about just using virtio as a transport for V4L2
ioctls (virtio-fs does something similar with FUSE), and having the
host emulate a V4L2 decoder or encoder device in place of this (long)
specification? I am personally starting to think this could be a
better and faster way to get us to a point where both spec and guest
drivers are merged. Moreover this would also open the way to support
other kinds of V4L2 devices like simple cameras - we would just need
to allocate new device IDs for these and would be good to go.

This probably means a bit more work on the device side, since this
spec is tailored to the specific video codec use-case and V4L2 is
more generic, but it also means less spec to maintain and more
confidence that things will work as we want in the real world. On the
other hand, the device would also become simpler because responses to
commands could no longer come out-of-order as they currently do. So at
the end of the day I'm not even sure this would result in a more
complex device.
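
To make this a little more concrete, here is a very rough sketch of what
the transport could look like (not a proposal, just an illustration; the
struct and field names below are made up):

/* Illustration only: one V4L2 request per descriptor chain, in the spirit
 * of how virtio-fs carries FUSE requests. The driver places the ioctl code
 * and its argument payload in device-readable buffers; the device writes
 * back a result code and the (possibly updated) argument payload. */
struct virtio_v4l2_req {
        le32 code;        /* VIDIOC_* ioctl code */
        le32 payload_len; /* size of the argument structure that follows */
        /* followed by payload_len bytes of device-readable argument data */
};

struct virtio_v4l2_resp {
        le32 result;      /* 0 on success, negative errno-style value otherwise */
        le32 payload_len;
        /* followed by payload_len bytes of device-written argument data */
};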

> > +\begin{lstlisting}
> > +/* Device */
> > +#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
> > +
> > +/* Stream */
> > +#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
> > +#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201
>
> Is this gap in numbers intentional? It would be great to remove it to
> simplify boundary checks.

This is to allow commands of the same family to stay close to one
another. I'm not opposed to removing the gap, it just means that
commands may end up being a bit all over the place if we extend the
protocol.
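
For what it's worth, here is what the boundary check concern looks like
in code (sketch only; the _LAST marker is hypothetical, the other
constants are from the draft):

/* With contiguous codes, one range check validates a stream command: */
static bool stream_cmd_valid_contiguous(le32 cmd)
{
        return cmd >= VIRTIO_VIDEO_CMD_STREAM_CREATE &&
               cmd <= VIRTIO_VIDEO_CMD_STREAM_LAST; /* hypothetical marker */
}

/* With gaps, validation needs an explicit list or a lookup table: */
static bool stream_cmd_valid_with_gaps(le32 cmd)
{
        switch (cmd) {
        case VIRTIO_VIDEO_CMD_STREAM_CREATE:
        case VIRTIO_VIDEO_CMD_STREAM_DESTROY:
        case VIRTIO_VIDEO_CMD_STREAM_DRAIN:
                /* ... remaining stream commands ... */
                return true;
        default:
                return false;
        }
}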

> > +VIRTIO\_VIDEO\_RESULT\_OK.
> > +
> > +\subsubsection{Device Operation: Device Commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands}
> > +
> > +Device capabilities are retrieved using the
> > +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS command, which returns arrays of
> > +formats supported by the input and output queues.
> > +
> > +\paragraph{VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Device Commands / VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS}
> > +
> > +Retrieve device capabilities.
> > +
> > +The driver sends this command with
> > +\field{struct virtio_video_device_query_caps}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_device_query_caps {
> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS */
> > +};
> > +\end{lstlisting}
> > +
> > +The device responds with
> > +\field{struct virtio_video_device_query_caps_resp}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_device_query_caps_resp {
> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> > +        le32 num_bitstream_formats;
> > +        le32 num_image_formats;
> > +        /**
> > +         * Followed by
> > +         * struct virtio_video_bitstream_format_desc bitstream_formats[num_bitstream_formats];
> > +         */
> > +        /**
> > +         * Followed by
> > +         * struct virtio_video_image_format_desc image_formats[num_image_formats]
> > +         */
> > +};
>
> struct virtio_video_bitstream_format_desc and struct
> virtio_video_image_format_desc are not declared anywhere.

Oops, nice catch.

> > +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> > +
> > +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN by the
> > +driver.
> > +
> > +\field{stream_id} MUST be set to a valid stream ID previously returned
> > +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> > +
> > +The driver MUST keep queueing output resources until it gets the
> > +response to this command. Failure to do so may result in the device
> > +stalling as it waits for output resources to write into.
> > +
> > +The driver MUST account for the fact that the response to this command
> > +might come out-of-order (i.e.~after other commands sent to the device),
> > +and that it can be interrupted.
>
> The driver MUST send a DRAIN command when it doesn't have any input,
> right? Otherwise, there is no way to receive all the decoded buffers
> back, if the codec just keeps some of them waiting for more input.

Good point, a DRAIN should be mandatory to ensure all the submitted
work has been processed and the output made available by the device.
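
So the end-of-stream sequence on the driver side would roughly be the
following (sketch only, the helper names are invented):

/* Sketch of the mandatory drain at end of stream; only the ordering matters. */
static int drain_stream(struct stream *s)
{
        /* 1. Queue the last input resources (bitstream) on the INPUT queue. */
        queue_remaining_input(s);

        /* 2. Send VIRTIO_VIDEO_CMD_STREAM_DRAIN; its response may arrive
         *    out-of-order, after other command responses. */
        send_stream_drain(s);

        /* 3. Keep queueing OUTPUT resources until the DRAIN response
         *    arrives, otherwise the device may stall waiting for buffers
         *    to write decoded frames into. */
        while (!drain_response_received(s))
                queue_output_resource(s);

        /* All input queued before the DRAIN has now been processed and the
         * corresponding output resources have been returned. */
        return 0;
}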

> > +
> > +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
> > +
> > +Before the device sends the response, it MUST process and respond to all
> > +the VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue that
> > +were sent before the drain command, and make all the corresponding
> > +output resources available to the driver by responding to their
> > +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command.
>
> Unfortunately I don't see much details about the OUTPUT queue. What if
> the driver queues new output buffers, as it must do, fast enough? Looks
> like a valid implementation of the DRAIN command might never send a
> response in this case, because the only thing it does is replying to
> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE commands on the OUTPUT queue. I guess,
> it is better to specify what happens. I think the device should respond
> to a certain amount of OUTPUT QUEUE commands until there is an end of
> stream condition. Then it should respond to DRAIN command. What happens
> with the remaining queued output buffers is a question to me: should
> they be cancelled or not?

If I understand correctly this should not be a problem. Replies to
commands can come out-of-order, so the reply to DRAIN can come as soon
as the command is completed, regardless of how many output buffers we
have queued at that moment. The queued output buffers can also remain
queued in anticipation of the next sequence, if any: if it has the
same resolution as the previous one, then the queued output buffers
can be used. If it doesn't then a resolution change event will be
produced and the driver will process it.

> > +
> > +While the device is processing the command, it MUST return
> > +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to the
> > +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN command.
>
> Should the device stop accepting input too?

There should be no problem with the device accepting (and even
processing) input for the next sequence, as long as it doesn't make
its result available before the response to the DRAIN command.

> > +
> > +If the command is interrupted due to a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
> > +or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation, the device MUST
> > +respond with VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
> > +
> > +\paragraph{VIRTIO_VIDEO_CMD_STREAM_STOP}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
> > +
>
> I don't like this command to be called "stop". When I see a "stop"
> command, I expect to see a "start" command as well. My personal
> preference would be "flush" or "reset".

Fair enough, let me rename this to RESET (which was the name used in a
previous revision for a somewhat similar command).

> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{result}]
> > +is
> > +
> > +\begin{description}
> > +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> > +if the operation succeeded,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> > +if the requested stream does not exist,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> > +if the \field{param_type} argument is invalid for the device,
> > +\end{description}
> > +\item[\field{param}]
> > +is the value of the requested parameter, if \field{result} is
> > +VIRTIO\_VIDEO\_RESULT\_OK.
> > +\end{description}
> > +
> > +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
> > +
> > +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM
> > +by the driver.
> > +
> > +\field{stream_id} MUST be set to a valid stream ID previously returned
> > +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> > +
> > +\field{param_type} MUST be set to a parameter type that is valid for the
> > +device.
>
> The device requirements are missing for GET_PARAMS.

There aren't any beyond returning the requested parameter or an error code.

> > +
> > +\paragraph{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
> > +
> > +Write the value of a parameter of the given stream, and return the value
> > +actually set by the device. Available parameters depend on the device
> > +type and are listed in
> > +\ref{sec:Device Types / Video Device / Parameters}.
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_stream_set_param {
> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_SET_PARAM */
> > +        le32 stream_id;
> > +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
> > +        u8 padding[4];
> > +        union virtio_video_stream_params param;
> > +}
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{stream_id}]
> > +is the ID of the stream we want to set a parameter for.
> > +\item[\field{param_type}]
> > +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
> > +we want to set.
> > +\end{description}
> > +
> > +The device responds with \field{struct virtio_video_stream_param_resp}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_stream_param_resp {
> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> > +        union virtio_video_stream_params param;
>
> I'd prefer to have param_type in the response too for safety.

Done.

> > +};
> > +\end{lstlisting}
> > +
> > +Within \field{struct virtio_video_resource_sg_entry}:
> > +
> > +\begin{description}
> > +\item[\field{addr}]
> > +is a guest physical address to the start of the SG entry.
> > +\item[\field{length}]
> > +is the length of the SG entry.
> > +\end{description}
>
> I think having explicit page alignment requirements here would be great.

This may be host-dependent; maybe we should have a capability field so
the device can provide this information?
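
Something like this, for example (sketch only: the field name is made up,
and the layout below assumes the version field is dropped as discussed
elsewhere in this thread):

struct virtio_video_config {
        le32 caps_length;
        le32 sg_entry_alignment; /* hypothetical: required alignment of SG
                                    entries in bytes, 0 = no constraint */
};

/* Driver-side check before submitting an SG list: */
if (cfg.sg_entry_alignment &&
    (entry.addr % cfg.sg_entry_alignment ||
     entry.length % cfg.sg_entry_alignment))
        return -EINVAL; /* does not meet the host's alignment requirement */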

> > +
> > +Finally, for \field{struct virtio_video_resource_sg_list}:
> > +
> > +\begin{description}
> > +\item[\field{num_entries}]
> > +is the number of \field{struct virtio_video_resource_sg_entry} instances
> > +that follow.
> > +\end{description}
> > +
> > +\field{struct virtio_video_resource_object} is defined as follows:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_resource_object {
> > +        u8 uuid[16];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[uuid]
> > +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> > +\end{description}
> > +
> > +The device responds with
> > +\field{struct virtio_video_resource_attach_backing_resp}:
> > +
> > +\begin{lstlisting}
> > +struct virtio_video_resource_attach_backing_resp {
> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{result}]
> > +is
> > +
> > +\begin{description}
> > +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> > +if the operation succeeded,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> > +if the mentioned stream does not exist,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> > +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
> > +invalid value,
> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> > +if the operation is performed at a time when it is non-valid.
> > +\end{description}
> > +\end{description}
> > +
> > +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
> > +the following times:
> > +
> > +\begin{itemize}
> > +\item
> > +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
> > +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
> > +  resource,
> > +\item
> > +  AFTER successfully changing the \field{virtio_video_params_resources}
> > +  parameter corresponding to the queue and BEFORE
> > +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
> > +\end{itemize}
> > +
> > +This is to ensure that the device can rely on the fact that a given
> > +resource will always point to the same memory for as long as it may be
> > +used by the video device. For instance, a decoder may use returned
> > +decoded frames as reference for future frames and won't overwrite the
> > +backing resource of a frame that is being referenced. It is only before
> > +a stream is started and after a Dynamic Resolution Change event has
> > +occurred that we can be sure that all resources won't be used in that
> > +way.
>
> The mentioned scenario about the referenced frames looks
> somewhat reasonable, but I wonder how exactly that would work in practice.

Basically the guest needs to make sure the backing memory remains
available and unwritten until the conditions mentioned above are met.
Or is there anything unclear in this description?

> > +
> > +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> > +
> > +\field{cmd_type} MUST be set to
> > +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING by the driver.
> > +
> > +\field{stream_id} MUST be set to a valid stream ID previously returned
> > +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
> > +
> > +\field{queue_type} MUST be set to a valid queue type.
> > +
> > +\field{resource_id} MUST be an integer less than the number of
> > +resources currently allocated for the queue.
> > +
> > +The length of the \field{resources} array of
> > +\field{struct virtio_video_resource_attach_backing} MUST be equal to the
> > +number of resources required by the format currently set on the queue,
> > +as described in
> > +\ref{sec:Device Types / Video Device / Supported formats}.
> > +
> > +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}{Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING}
> > +
> > +At any time other than the times valid for calling this command, the
> > +device MUST return VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION.
> > +
> > +\paragraph{VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Resource Commands / VIRTIO_VIDEO_CMD_RESOURCE_QUEUE}
> > +
> > +Add a resource to the device's queue.
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_VIDEO_MAX_PLANES                    8
> > +
> > +#define VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME  (1 << 0)
> > +
> > +struct virtio_video_resource_queue {
> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING */
>
> s/VIRTIO_VIDEO_CMD_RESOURCE_ATTACH_BACKING/VIRTIO_VIDEO_CMD_RESOURCE_QUEUE/
>
>
> > +        le32 stream_id;
> > +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
> > +        le32 resource_id;
> > +        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
> > +        u8 padding[4];
> > +        le64 timestamp;
> > +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{stream_id}]
> > +is the ID of a valid stream.
> > +\item[\field{queue_type}]
> > +is the direction of the queue.
> > +\item[\field{resource_id}]
> > +is the ID of the resource to be queued.
> > +\item[\field{flags}]
> > +is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
> > +
> > +\begin{description}
> > +\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
> > +The submitted frame is to be encoded as a key frame. Only valid for the
> > +encoder's INPUT queue.
> > +\end{description}
> > +\item[\field{timestamp}]
> > +is an abstract sequence counter that can be used on the INPUT queue for
> > +synchronization. Resources produced on the output queue will carry the
> > +\field{timestamp} of the input resource they have been produced from.
>
> I think this is quite misleading. Implementers may assume that there
> is a 1-to-1 mapping between input and output buffers and no
> reordering, right? But this is usually not the case:
>
> 1. At the end of the spec, H.264 and HEVC are defined to always have a
> single NAL unit per resource. Well, there are many types of NAL units
> that do not represent any video data, like SEI NAL units or delimiters.
>
> 2. We may assume that the SEI and delimiter units are filtered before
> queuing, but there is still other codec-specific data that can't be
> filtered, like SPS and PPS NAL units, which needs special handling.
>
> 3. All of this means more codec-specific code in the driver or client
> applications.
>
> 4. This spec says that the device may skip to the next key frame after a
> seek. So the driver has to account for this too.
>
> 5. For example, in H.264 a single key frame may be coded by several NAL
> units. In fact all VCL NAL units are called slices because of this. What
> happens when the decoder sees several NAL units with different
> timestamps coding the same output frame? Which timestamp will it choose?
> I'm not sure it is defined anywhere. Probably it will just take the
> first timestamp. The driver/client applications have to be ready for
> this too.
>
> 6. I saw almost the same scenario with CSD units too. Imagine SPS with
> timestamp 0, then PPS with 1, and then an IDR with 2. These three might
> be combined in a single input buffer together by the vendor-provided
> decoding software. Then the timestamp of the resulting frame is
> naturally 0. But the driver/client application already doesn't expect to
> get any response with timestamps 0 and 1, because they are known to
> belong to CSD. And it expects an output buffer with ts 2. So there
> will be a problem. (This is a real world example actually.)
>
> 7. Then there is H.264 High profile, for example. It has different
> decoding and presentation order because frames may depend on future
> frames. I think all the modern codecs have a mode like this. The input
> frames are usually provided in the decoding order. Should the output
> frames timestamps just be copied from input frames, they have been
> produced from as this paragraph above says? This resembles decoder order
> then. Well, this can work, if the container has correct DTS and PTS, and
> the client software creates a mapping between these timestamps and the
> virtio video timestamp. But this is not always the case. For example,
> simple H.264 bitstream doesn't have any timestamps. And still it can be
> easily played by ffmpeg/gstreamer/VLC/etc. There is no way to make this
> work with a decoder following this spec, I think.
>
> My suggestion is to not think about the timestamp as an abstract
> counter, but give some freedom to the device by providing the available
> information from the container, be it DTS, PTS or only FPS (through
> PARAMS). Also the input and output queues should indeed be completely
> separated. There should be no assumption of a 1-to-1 mapping of buffers.

The beginning of the "Device Operation" section tries to make it clear
that the input and output queues are operating independently and that
no mapping or ordering should be expected by the driver, but maybe
this is worth repeating here.

Regarding the use of timestamp, a sensible use would indeed be for the
driver to set it to some meaningful information retrieved from the
container (which the driver would itself obtain from user-space),
probably the PTS if that is available. In the case of H.264 non-VCL
NAL units would not produce any output, so their timestamp would
effectively be ignored. For frames that are made of several slices,
the first timestamp should be the one propagated to the output frame.
(and this here is why I prefer VP8/VP9 ^_^;)

In fact most users probably won't care about this field. In the worst
case, even if no timestamp is available, operation can still be done
reliably since decoded frames are made available in presentation
order. This fact was not obvious in the spec, so I have added a
sentence in the "Device Operation" section to clarify.

I hope this answers your concerns, but please let me know if I didn't
address something in particular.
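
To put the expected usage in code (rough sketch; the PTS-extraction
helper and the exact spelling of the INPUT queue constant are
assumptions on my side):

/* The driver copies whatever presentation timestamp the container provides
 * into the input resource; the device tags the decoded frame produced from
 * that input (first slice wins for multi-slice frames) with the same value. */
struct virtio_video_resource_queue cmd = {
        .cmd_type    = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
        .stream_id   = stream_id,
        .queue_type  = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
        .resource_id = resource_id,
        .timestamp   = container_pts(pkt), /* opaque to the device, 0 if unknown */
};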

> > +\item[\field{planes}]
> > +is the format description of each individual plane making this format.
> > +The number of planes is dependent on the \field{fourcc} and detailed in
> > +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> > +
> > +\begin{description}
> > +\item[\field{buffer_size}]
> > +is the minimum size of the buffers that will back resources to be
> > +queued.
> > +\item[\field{stride}]
> > +is the distance in bytes between two lines of data.
> > +\item[\field{offset}]
> > +is the starting offset for the data in the buffer.
>
> It is not quite clear to me how to use the offset during SET_PARAMS. I
> think it is much more reasonable to have per plane offsets in struct
> virtio_video_resource_queue and struct virtio_video_resource_queue_resp.

This is supposed to describe where in a given buffer the host can find
the beginning of a given plane (mostly useful for multi-planar/single
buffer formats). This typically does not change between frames, so
having it as a parameter seems appropriate to me?
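
As a concrete example of how buffer_size, stride and offset describe a
layout, take a hypothetical 640x480 NV12 frame in a single buffer:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Hypothetical 640x480 NV12 frame in one buffer, no line padding. */
        uint32_t width = 640, height = 480;
        uint32_t stride = width;              /* bytes between two Y lines    */
        uint32_t y_offset = 0;                /* Y plane starts the buffer    */
        uint32_t uv_offset = stride * height; /* UV plane follows the Y plane */
        uint32_t buffer_size = stride * height * 3 / 2;

        /* Prints: Y offset 0, UV offset 307200, buffer size 460800 */
        printf("Y offset %u, UV offset %u, buffer size %u\n",
               y_offset, uv_offset, buffer_size);
        return 0;
}

With line padding, the stride grows and the UV offset moves past the
padded Y plane accordingly, which is exactly what these per-plane fields
are meant to convey.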

> > +encode at the requested format and resolution.
>
> It is not defined when changing these parameters is allowed. Also there
> is an issue: changing width, height, format, buffer_size should probably
> detach all the currently attached buffers. But changing crop shouldn't
> affect the output buffers in any way, right? So maybe it is better to
> split them?

If the currently attached buffers are large enough to support the new
format, there should not be any need to detach them (if they are not,
the SET_PARAM command should fail). So even if we only change the
crop, the device can perform the full validation on the format and
keep going with the current buffers if possible.

Indeed the timing for setting this parameter should be better defined.
In particular the input format for a decoder (or output format for an
encoder) will probably remain static through the session.

> > +\item[\field{YU12}]
> > +one Y plane followed by one Cb plane, followed by one Cr plane, in a
> > +single buffer. 4:2:0 subsampling.
> > +\item[\field{YM12}]
> > +same as \field{YU12} but using three separate buffers for the Y, U and V
> > +planes.
> > +\end{description}
>
> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
> V4L2 documentation has a nice description of exact plane layouts.
> Otherwise it would be nice to have these layouts in the spec IMO.

I've linked to the relevant V4L2 pages, indeed they describe the
formats and layouts much better.

Thanks for all the feedback. We can continue on this basis, or I can
try to build a small prototype of that V4L2-over-virtio idea if you
agree it looks promising. The guest driver would mostly be
forwarding the V4L2 ioctls as-is to the host; it would be interesting
to see how small we can make it with this design.
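
For reference, the sequence such a driver would mostly be forwarding is
the standard V4L2 stateful decoder flow, roughly the following (outline
only, error handling and struct setup omitted):

/* Outline of the V4L2 stateful decoder sequence the guest would forward. */
ioctl(fd, VIDIOC_QUERYCAP, &cap);
ioctl(fd, VIDIOC_S_FMT, &coded_fmt);        /* coded format, OUTPUT queue   */
ioctl(fd, VIDIOC_SUBSCRIBE_EVENT, &sub);    /* V4L2_EVENT_SOURCE_CHANGE     */
ioctl(fd, VIDIOC_REQBUFS, &out_reqbufs);
ioctl(fd, VIDIOC_STREAMON, &out_type);
/* ...queue coded data with VIDIOC_QBUF, wait for the source change event,
 * then set up the CAPTURE side: */
ioctl(fd, VIDIOC_G_FMT, &decoded_fmt);      /* decoded format, CAPTURE queue */
ioctl(fd, VIDIOC_REQBUFS, &cap_reqbufs);
ioctl(fd, VIDIOC_STREAMON, &cap_type);
/* ...VIDIOC_QBUF/VIDIOC_DQBUF loop on both queues; at end of stream issue
 * VIDIOC_DECODER_CMD with V4L2_DEC_CMD_STOP and dequeue until a CAPTURE
 * buffer with V4L2_BUF_FLAG_LAST is seen. */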


* Re: [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-27  5:38   ` Alexandre Courbot
@ 2023-01-11  8:45     ` Cornelia Huck
  2023-01-12  6:32       ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-01-11  8:45 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa

On Tue, Dec 27 2022, Alexandre Courbot <acourbot@chromium.org> wrote:

> Hi Cornelia, thanks for the feedback! I have directly reported the
> comments snipped from this answer to the source document.
>
> On Fri, Dec 9, 2022 at 12:01 AM Cornelia Huck <cohuck@redhat.com> wrote:
>>
>> On Thu, Dec 08 2022, Alexandre Courbot <acourbot@chromium.org> wrote:
>>
>> > Add the specification of the video decoder and encoder devices, which
>> > can be used to provide host-accelerated video operations to the guest.
>> >
>> > Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
>> > Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
>> > --
>> > Here is the long-overdue new revision of the virtio-video RFC. This
>> > version reorganizes the specification quite a bit and tries to simplify
>> > the protocol further. Nonetheless, it still results in a rather long (17
>> > pages) specification for just these devices, even though the spec is not
>> > fully complete (I want to rethink the formats descriptions, and some
>> > parameters need to be added for the encoder device).
>> >
>> > I would like to get some high-level feedback on this version and maybe
>> > propose to do things a bit differently before people invest too much
>> > time reviewing this in depth. While rewriting this document, it became
>> > more and more obvious that this is just a different, and maybe a bit
>> > simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
>> > am now wondering whether it would not make more sense to rewrite this
>> > specification as just a way to transport V4L2 requests over virtio,
>> > similarly to how virtio-fs does with the FUSE protocol [2].
>> >
>> > At the time we started writing this implementation, the V4L2 protocols
>> > for decoders and encoders were not set in stone yet, but now that they
>> > are it might make sense to reconsider. Switching to this solution would
>> > greatly shorten the virtio-video device spec, and also provide a way to
>> > support other kind of V4L2 devices like cameras or image processors at
>> > no extra cost.
>> >
>> > Note that doing so would not require that either the host or guest uses
>> > V4L2 - the virtio video device would just emulate a V4L2 device over
>> > virtio. A few adaptations would need to be done regarding how memory
>> > types work, but otherwise I believe most of V4L2 could be used as-is.
>> >
>> > Please share your thoughts about this, and I will either explore this
>> > idea further with a prototype, or keep moving the present spec forward,
>> > hopefully at a faster pace.
>>
>> In principle, reusing an existing interface that does the job might be a
>> good idea. I see that the Linux headers are dual-licenced as 3-clause
>> BSD, and if the interface has indeed stabilized, it might be a good idea
>> to rely on it. The main question is: Is the interface sufficiently
>> independent from Linux specialities (i.e. can others implement it
>> without issue?)
>
> From what I can infer after looking at the sequence of ioctls that
> would be necessary to decode a stream, I believe this would work with
> only minor arrangements.

Great.

>
>> > +\end{description}
>> > +
>> > +\subsection{Feature bits}\label{sec:Device Types / Video Device / Feature bits}
>> > +
>> > +\begin{description}
>> > +\item[VIRTIO\_VIDEO\_F\_RESOURCE\_GUEST\_PAGES (0)]
>>
>> Side note: you should get the correct output even without escaping the
>> underscores (although your editor might still be confused...)
>
> Actually this LaTeX document has been generated from a Markdown file
> passed through a Pandoc filter (this makes it simpler to write for me
> vs. writing the LaTeX directly). I'll see if I can remove these escape
> sequences using the filter or sed.

If you could make it work, that would be good for consistency reasons.

>
>> > +
>> > +\begin{lstlisting}
>> > +struct virtio_video_config {
>> > +        le32 version;
>> > +        le32 caps_length;
>> > +};
>> > +\end{lstlisting}
>> > +
>> > +\begin{description}
>> > +\item[\field{version}]
>> > +is the protocol version that the device understands.
>> > +\item[\field{caps_length}]
>> > +is the minimum length in bytes that a device-writable buffer must have
>> > +in order to receive the response to
>> > +VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
>> > +\end{description}
>> > +
>> > +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / Video Device / Device configuration layout}
>> > +
>> > +As there is currently only one version of the protocol, the device MUST
>> > +set the \field{version} field to 0.
>>
>> In what way would you want to change the protocol so that it becomes
>> incompatible? Extensions should be easy to handle via extra
>> capabilities, and if we don't expect the protocol to change often, a
>> feature bit for a new format might be sufficient.
>>
>> If we stick with the version field, maybe start at 1 and make 0 invalid?
>> Probably easier to spot errors that way.
>
> You are right, this is probably not needed. I guess in the early days
> we wanted to handle the case where the protocol would evolve in
> incompatible ways, but we'd better not consider that route at all if
> only for the complexity that would be added to the spec. I'll remove
> the version field.

Sounds good.

>
>>
>> > +
>> > +The device MUST set the \field{caps_length} field to a value equal to
>> > +the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
>>
>> Could the device also support a minimum response size that only supports
>> a subset of the caps to be returned? Otherwise, I think caps_length is
>> the maximum (or fixed?) length of the query caps response?
>
> I think this can be replaced by a fixed-size call for getting only one
> format at a time. The guest would have to make several of these in
> order to obtain the whole set of supported formats, but it would be
> easier to parse compared to the large result returned by QUERY_CAP and
> simpler overall.

How would you implement this? Would the driver do the call repeatedly
until no more formats remain (requires the device to track state, and
needs a specification on what happens if the driver continues doing the
call?) Or would the driver pass in an index, and the device only needs
to check for out-of-range?
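
For the index-based variant, I am picturing something like this (purely
a sketch, all names made up):

/* Hypothetical per-format query: the driver iterates index 0, 1, 2, ...
 * until the device answers ERR_INVALID_ARGUMENT, so the device does not
 * need to keep any per-driver iteration state. */
struct virtio_video_query_format {
        le32 cmd_type;   /* e.g. VIRTIO_VIDEO_CMD_DEVICE_QUERY_FORMAT */
        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
        le32 index;      /* 0-based index into the device's format list */
        u8 padding[4];
};

struct virtio_video_query_format_resp {
        le32 result;     /* OK, or ERR_INVALID_ARGUMENT when index is out of range */
        /* followed by a single fixed-size format descriptor */
};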

>
>> > +
>> > +\subsubsection{Device Operation: Stream commands}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands}
>> > +
>> > +Stream commands allow the creation, destruction, and flow control of a
>> > +stream.
>> > +
>> > +\paragraph{VIRTIO_VIDEO_CMD_STREAM_CREATE}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_CREATE}
>> > +
>> > +Create a new stream using the device.
>> > +
>> > +The driver sends this command with
>> > +\field{struct virtio_video_stream_create}:
>> > +
>> > +\begin{lstlisting}
>> > +struct virtio_video_stream_create {
>> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_CREATE */
>> > +};
>> > +\end{lstlisting}
>> > +
>> > +The device responds with \field{struct virtio_video_stream_create_resp}:
>> > +
>> > +\begin{lstlisting}
>> > +struct virtio_video_stream_create_resp {
>> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
>> > +        le32 stream_id;
>> > +};
>> > +\end{lstlisting}
>> > +
>> > +\begin{description}
>> > +\item[\field{result}]
>> > +is
>> > +
>> > +\begin{description}
>> > +\item[VIRTIO\_VIDEO\_RESULT\_OK]
>> > +if the operation succeeded,
>> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_OUT\_OF\_MEMORY]
>> > +if the limit of simultaneous streams has been reached by the device and
>> > +no more can be created.
>> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_COMMAND]
>> > +if the stream cannot be created due to an unexpected device issue.
>>
>> Is it an "unexpected device issue" or "the driver send something it
>> should not have"? It might be a good idea to distinguish the two?
>
> This error code should not be ambiguous, as the only input of the command is
> its type. Therefore this error code can only be returned in case of a
> library or hardware error on the host side.

Ok.

>
>> > +
>> > +\field{stream_id} MUST be set to a valid stream ID previously returned
>> > +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
>> > +
>> > +\field{param_type} MUST be set to a parameter type that is valid for the
>> > +device.
>> > +
>> > +\paragraph{VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_SET_PARAM}
>> > +
>> > +Write the value of a parameter of the given stream, and return the value
>> > +actually set by the device. Available parameters depend on the device
>> > +type and are listed in
>> > +\ref{sec:Device Types / Video Device / Parameters}.
>> > +
>> > +\begin{lstlisting}
>> > +struct virtio_video_stream_set_param {
>> > +        le32 cmd_type; /* VIRTIO_VIDEO_CMD_STREAM_SET_PARAM */
>> > +        le32 stream_id;
>> > +        le32 param_type; /* VIRTIO_VIDEO_PARAMS_* */
>> > +        u8 padding[4];
>> > +        union virtio_video_stream_params param;
>> > +}
>> > +\end{lstlisting}
>> > +
>> > +\begin{description}
>> > +\item[\field{stream_id}]
>> > +is the ID of the stream we want to set a parameter for.
>> > +\item[\field{param_type}]
>> > +is one of the VIRTIO\_VIDEO\_PARAMS\_* values indicating the parameter
>> > +we want to set.
>> > +\end{description}
>> > +
>> > +The device responds with \field{struct virtio_video_stream_param_resp}:
>> > +
>> > +\begin{lstlisting}
>> > +struct virtio_video_stream_param_resp {
>> > +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
>> > +        union virtio_video_stream_params param;
>> > +};
>> > +\end{lstlisting}
>> > +
>> > +\begin{description}
>> > +\item[\field{result}]
>> > +is
>> > +
>> > +\begin{description}
>> > +\item[VIRTIO\_VIDEO\_RESULT\_OK]
>> > +if the operation succeeded,
>> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
>> > +if the requested stream does not exist,
>> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
>> > +if the \field{param_type} argument is invalid for the device,
>> > +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
>> > +if the requested parameter cannot be modified at this moment.
>> > +\end{description}
>> > +\item[\field{param}]
>> > +is the actual value of the parameter set by the device, if
>> > +\field{result} is VIRTIO\_VIDEO\_RESULT\_OK. The value set by the device
>> > +may differ from the requested value depending on the device's
>> > +capabilities.
>> > +\end{description}
>> > +
>> > +Outside of the error cases described above, setting a parameter does not
>> > +fail. If the device cannot apply the parameter as requested, it will
>> > +adjust it to the closest setting it supports, and return that value to
>> > +the driver. It is then up to the driver to decide whether it can work
>> > +within the range of parameters supported by the device.
>>
>> Does the driver need a way to discover which parameters are supported?
>> Or is that depending on the context?
>
> The set of valid parameters should be evident from the current codec,
> but there may be cases (notably with the encoder) where some
> parameters are optional. I guess that's another case where leveraging
> V4L2 would help as it features ways to list valid parameters (or
> "controls" in V4L2-speak).

Yes, that would be a good way to handle that.

>
>> > +
>> > +\drivernormative{\paragraph}{Format parameters}{Device Types / Video Device / Parameters / Format parameters}
>> > +
>> > +When setting a format parameter, the driver MUST check the adjusted
>> > +returned value and comply with it, or try to set a different one if it
>> > +cannot.
>> > +
>> > +\subsubsection{Encoder parameters}\label{sec:Device Types / Video Device / Parameters / Encoder parameters}
>> > +
>> > +\begin{lstlisting}
>> > +struct virtio_video_params_bitrate {
>> > +    le32 min_bitrate;
>> > +    le32 max_bitrate;
>> > +    le32 bitrate;
>> > +    u8 padding[4];
>> > +}
>> > +\end{lstlisting}
>> > +
>> > +\begin{description}
>> > +\item[\field{min_bitrate}]
>> > +is the minimum bitrate supported by the encoder for the current
>> > +settings. Ignored when setting the parameter.
>> > +\item[\field{max_bitrate}]
>> > +is the maximum bitrate supported by the encoder for the current
>> > +settings. Ignored when setting the parameter.
>> > +\item[\field{bitrate}]
>> > +is the current desired bitrate for the encoder.
>> > +\end{description}
>> > +
>> > +\subsection{Supported formats}\label{sec:Device Types / Video Device / Supported formats}
>> > +
>> > +Bitstream and image formats are identified by their fourcc code, which
>> > +is a four-byte ASCII sequence uniquely identifying the format and its
>> > +properties.
>> > +
>> > +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
>> > +
>> > +The fourcc code of each supported bitstream format is given, as well as
>> > +the unit of data requested in each input resource for the decoder, or
>> > +produced in each output resource for the encoder.
>> > +
>> > +\begin{description}
>> > +\item[\field{MPG2}]
>> > +MPEG2 encoded stream. One Access Unit per resource.
>> > +\item[\field{H264}]
>> > +H.264 encoded stream. One NAL unit per resource.
>> > +\item[\field{HEVC}]
>> > +HEVC encoded stream. One NAL unit per resource.
>> > +\item[\field{VP80}]
>> > +VP8 encoded stream. One frame per resource.
>> > +\item[\field{VP90}]
>> > +VP9 encoded stream. One frame per resource.
>> > +\end{description}
>> > +
>> > +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
>> > +
>> > +The fourcc code of each supported image format is given, as well as its
>> > +number of planes, number of physical buffers, and subsampling, if any.
>> > +
>> > +\begin{description}
>> > +\item[\field{RGB3}]
>> > +one RGB plane where each component takes one byte, i.e.~3 bytes per
>> > +pixel.
>> > +\item[\field{NV12}]
>> > +one Y plane followed by interleaved U and V data, in a single buffer.
>> > +4:2:0 subsampling.
>> > +\item[\field{NM12}]
>> > +same as \field{NV12} but using two separate buffers for the Y and UV
>> > +planes.
>> > +\item[\field{YU12}]
>> > +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>> > +single buffer. 4:2:0 subsampling.
>> > +\item[\field{YM12}]
>> > +same as \field{YU12} but using three separate buffers for the Y, U and V
>> > +planes.
>> > +\end{description}
>>
>> Can we assume that implementers know what all of those fourcc codes
>> mean? (I don't really know anything about this.) Is there some kind of
>> normative reference we should add?
>
> As Alexander pointed out, these are taken directly from V4L2, so I
> will add a reference to the source.

Sounds good.

>
>> Generally, I don't see anything fundamentally wrong with this approach
>> (mostly some smaller nits.) Feedback from someone familiar with this
>> subject would be great, though.
>
> Thanks, that's encouraging! There are still a few bits missing, and we
> may switch to something different if we decide to piggyback V4L2, but
> the core mechanisms will remain similar so it is great to see that
> there isn't any hard blocker.

Let's see how we can move this forward to something that can be
included in the spec, always good to see a new, useful device!



* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-08  7:23 [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification Alexandre Courbot
  2022-12-08 15:00 ` Cornelia Huck
  2022-12-19 16:59 ` [virtio-dev] " Alexander Gordeev
@ 2023-01-11 17:04 ` Alexander Gordeev
  2023-01-12  6:32   ` Alexandre Courbot
  2023-01-11 18:45 ` Alexander Gordeev
  3 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-01-11 17:04 UTC (permalink / raw)
  To: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

Hello,

Sorry for the delay. I'm still gathering data in the old emails...

On 08.12.22 08:23, Alexandre Courbot wrote:
> Add the specification of the video decoder and encoder devices, which
> can be used to provide host-accelerated video operations to the guest.
>
> Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> --
> Here is the long-overdue new revision of the virtio-video RFC. This
> version reorganizes the specification quite a bit and tries to simplify
> the protocol further. Nonetheless, it still results in a rather long (17
> pages) specification for just these devices, even though the spec is not
> fully complete (I want to rethink the formats descriptions, and some
> parameters need to be added for the encoder device).
>
> I would like to get some high-level feedback on this version and maybe
> propose to do things a bit differently before people invest too much
> time reviewing this in depth. While rewriting this document, it became
> more and more obvious that this is just a different, and maybe a bit
> simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
> am now wondering whether it would not make more sense to rewrite this
> specification as just a way to transport V4L2 requests over virtio,
> similarly to how virtio-fs does with the FUSE protocol [2].
>
> At the time we started writing this implementation, the V4L2 protocols
> for decoders and encoders were not set in stone yet, but now that they
> are it might make sense to reconsider. Switching to this solution would
> greatly shorten the virtio-video device spec, and also provide a way to
> support other kind of V4L2 devices like cameras or image processors at
> no extra cost.
>
> Note that doing so would not require that either the host or guest uses
> V4L2 - the virtio video device would just emulate a V4L2 device over
> virtio. A few adaptations would need to be done regarding how memory
> types work, but otherwise I believe most of V4L2 could be used as-is.
>
> Please share your thoughts about this, and I will either explore this
> idea further with a prototype, or keep moving the present spec forward,
> hopefully at a faster pace.
>
> Due to the RFC state of this patch I have refrained from referencing the
> normative statements in conformance.tex - I will do that as a final step
> once the spec is mostly agreed on.
>
> [1] https://docs.kernel.org/userspace-api/media/v4l/dev-stateless-decoder.html
> [2] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-fs.tex
>
> Full PDF:
> https://drive.google.com/file/d/1HRVDiDdo50-S9X5tWgzmT90FJRHoB1dN/view?usp=sharing
>
> PDF of video section only:
> https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing
> ---
>   content.tex      |    1 +
>   virtio-video.tex | 1420 ++++++++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 1421 insertions(+)
>   create mode 100644 virtio-video.tex
>
> ...snip...
>
> +\subsection{Supported formats}\label{sec:Device Types / Video Device / Supported formats}
> +
> +Bitstream and image formats are identified by their fourcc code, which
> +is a four-byte ASCII sequence uniquely identifying the format and its
> +properties.
> +
> +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
> +
> +The fourcc code of each supported bitstream format is given, as well as
> +the unit of data requested in each input resource for the decoder, or
> +produced in each output resource for the encoder.
> +
> +\begin{description}
> +\item[\field{MPG2}]
> +MPEG2 encoded stream. One Access Unit per resource.
> +\item[\field{H264}]
> +H.264 encoded stream. One NAL unit per resource.
> +\item[\field{HEVC}]
> +HEVC encoded stream. One NAL unit per resource.
> +\item[\field{VP80}]
> +VP8 encoded stream. One frame per resource.
> +\item[\field{VP90}]
> +VP9 encoded stream. One frame per resource.
> +\end{description}
> +

Actually I'm not sure where these fourcc codes for bitstream formats
come from.

In one of the old comments I found a reference to
https://www.rfc-editor.org/rfc/rfc2361

But it doesn't define MPG2, HEVC, VP80, VP90.

Also there is this comment to virtio-video v1 from Tomasz Figa and the
related discussion:
https://markmail.org/message/gc6h25acct22niut#query:+page:1+mid:et4l3ni7qjqhiygo+state:results

He wrote that it is not worth it because there are so many conflicting
sets of fourcc codes.

I can see that fourcc codes were not used in virtio-video spec draft
versions 1 to 5. So now it looks quite weird to see them here. Probably
this is because you'd like to mimic the V4L2 interface more closely?

> +\subsubsection{Image formats}\label{sec:Device Types / Video Device / Supported formats / Image formats}
> +
> +The fourcc code of each supported image format is given, as well as its
> +number of planes, number of physical buffers, and subsampling, if any.
> +
> +\begin{description}
> +\item[\field{RGB3}]
> +one RGB plane where each component takes one byte, i.e.~3 bytes per
> +pixel.
> +\item[\field{NV12}]
> +one Y plane followed by interleaved U and V data, in a single buffer.
> +4:2:0 subsampling.
> +\item[\field{NM12}]
> +same as \field{NV12} but using two separate buffers for the Y and UV
> +planes.
> +\item[\field{YU12}]
> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
> +single buffer. 4:2:0 subsampling.
> +\item[\field{YM12}]
> +same as \field{YU12} but using three separate buffers for the Y, U and V
> +planes.
> +\end{description}


* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-27  7:31   ` Alexandre Courbot
@ 2023-01-11 18:42     ` Alexander Gordeev
  2023-01-11 20:13       ` Alex Bennée
  2023-01-12  6:39       ` Alexandre Courbot
  0 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-01-11 18:42 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa

Hi Alexandre,

On 27.12.22 08:31, Alexandre Courbot wrote:
> Hi Alexander,
>
>
> On Tue, Dec 20, 2022 at 1:59 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>> Hello Alexandre,
>>
>> Thanks for the update. Please check my comments below.
>> I'm new to the virtio video spec development, so I may lack some
>> historic perspective. I would appreciate being pointed to some
>> older emails explaining decisions that I might not understand. I hope
>> to read through all of them later. Overall I have a lot of experience
>> in the video domain and in virtio video device development in Opsy, so
>> I hope that my comments are relevant and useful.
> Cornelia provided links to the previous versions (thanks!). Through
> these revisions we tried different approaches, and the more we
> progress the closer we are getting to the V4L2 stateful
> decoder/encoder interface.
>
> This is actually the point where I would particularly be interested in
> having your feedback, since you probably have noticed the similarity.
> What would you think about just using virtio as a transport for V4L2
> ioctls (virtio-fs does something similar with FUSE), and having the
> host emulate a V4L2 decoder or encoder device in place of this (long)
> specification? I am personally starting to think this could be a
> better and faster way to get us to a point where both spec and guest
> drivers are merged. Moreover this would also open the way to support
> other kinds of V4L2 devices like simple cameras - we would just need
> to allocate new device IDs for these and would be good to go.
>
> This probably means a bit more work on the device side, since this
> spec is tailored to the specific video codec use-case and V4L2 is
> more generic, but it also means less spec to maintain and more
> confidence that things will work as we want in the real world. On the
> other hand, the device would also become simpler because responses to
> commands could no longer come out-of-order as they currently do. So at
> the end of the day I'm not even sure this would result in a more
> complex device.

Sorry for the delay. I tried to gather data about how the spec has
evolved in the old emails.

Well, on the one hand mimicking v4l2 looks like an easy solution from
the virtio-video spec-writing perspective. (But the implementers will have
to read the V4L2 API instead AFAIU, which is probably longer...)

On the other hand v4l2 has a lot of history. It started as a camera API
and gained codec support later, right? So it definitely has just too
much stuff irrelevant for codecs. Here we have an option to design from
scratch taking the best ideas from v4l2.

Also I have concerns about the virtio-video spec development. This seems
like a big change. It seems to me that after so many discussions and
versions of the spec, the process should be converging on something by now.
But this is still a moving target...

There were arguments against adding camera support for security and
complexity reasons during discussions about virtio-video spec v1. Were
these concerns addressed somehow? Maybe I missed a followup discussion?


>>> +\begin{lstlisting}
>>> +/* Device */
>>> +#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
>>> +
>>> +/* Stream */
>>> +#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
>>> +#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201
>> Is this gap in numbers intentional? It would be great to remove it to
>> simplify boundary checks.
> This is to allow commands of the same family to stay close to one
> another. I'm not opposed to removing the gap, it just means that
> commands may end up being a bit all over the place if we extend the
> protocol.

Actually there is a gap between 0x201 and 0x203. Sorry for not being
clear here.


>>> +
>>> +\devicenormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_DRAIN}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_DRAIN}
>>> +
>>> +Before the device sends the response, it MUST process and respond to all
>>> +the VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE commands on the INPUT queue that
>>> +were sent before the drain command, and make all the corresponding
>>> +output resources available to the driver by responding to their
>>> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE command.
>> Unfortunately I don't see much details about the OUTPUT queue. What if
>> the driver queues new output buffers, as it must do, fast enough? Looks
>> like a valid implementation of the DRAIN command might never send a
>> response in this case, because the only thing it does is replying to
>> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE commands on the OUTPUT queue. I guess,
>> it is better to specify what happens. I think the device should respond
>> to a certain amount of OUTPUT QUEUE commands until there is an end of
>> stream condition. Then it should respond to DRAIN command. What happens
>> with the remaining queued output buffers is a question to me: should
>> they be cancelled or not?
> If I understand correctly this should not be a problem. Replies to
> commands can come out-of-order, so the reply to DRAIN can come as soon
> as the command is completed, regardless of how many output buffers we
> have queued at that moment. The queued output buffers can also remain
> queued in anticipation of the next sequence, if any - if it has the
> same resolution as the previous one, then the queued output buffers
> can be used. If it doesn't then a resolution change event will be
> produced and the driver will process it.

Ok, thanks, this makes sense to me.


>>> +
>>> +While the device is processing the command, it MUST return
>>> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to the
>>> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN command.
>> Should the device stop accepting input too?
> There should be no problem with the device accepting (and even
> processing) input for the next sequence, as long as it doesn't make
> its result available before the response to the DRAIN command.

Hmm, maybe it is worth adding this requirement to the spec. WDYT?


>>> +
>>> +If the command is interrupted due to a VIRTIO\_VIDEO\_CMD\_STREAM\_STOP
>>> +or VIRTIO\_VIDEO\_CMD\_STREAM\_DESTROY operation, the device MUST
>>> +respond with VIRTIO\_VIDEO\_RESULT\_ERR\_CANCELED.
>>> +
>>> +\paragraph{VIRTIO_VIDEO_CMD_STREAM_STOP}\label{sec:Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_STOP}
>>> +
>> I don't like this command to be called "stop". When I see a "stop"
>> command, I expect to see a "start" command as well. My personal
>> preference would be "flush" or "reset".
> Fair enough, let me rename this to RESET (which was the name used in a
> previous revision for a somehow-similar command).

Great.


>>> +};
>>> +\end{lstlisting}
>>> +
>>> +\begin{description}
>>> +\item[\field{result}]
>>> +is
>>> +
>>> +\begin{description}
>>> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
>>> +if the operation succeeded,
>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
>>> +if the requested stream does not exist,
>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
>>> +if the \field{param_type} argument is invalid for the device,
>>> +\end{description}
>>> +\item[\field{param}]
>>> +is the value of the requested parameter, if \field{result} is
>>> +VIRTIO\_VIDEO\_RESULT\_OK.
>>> +\end{description}
>>> +
>>> +\drivernormative{\subparagraph}{VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}{Device Types / Video Device / Device Operation / Device Operation: Stream commands / VIRTIO_VIDEO_CMD_STREAM_GET_PARAM}
>>> +
>>> +\field{cmd_type} MUST be set to VIRTIO\_VIDEO\_CMD\_STREAM\_GET\_PARAM
>>> +by the driver.
>>> +
>>> +\field{stream_id} MUST be set to a valid stream ID previously returned
>>> +by VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE.
>>> +
>>> +\field{param_type} MUST be set to a parameter type that is valid for the
>>> +device.
>> The device requirements are missing for GET_PARAMS.
> There aren't any beyond returning the requested parameter or an error code.

Ok.


>>> +};
>>> +\end{lstlisting}
>>> +
>>> +Within \field{struct virtio_video_resource_sg_entry}:
>>> +
>>> +\begin{description}
>>> +\item[\field{addr}]
>>> +is a guest physical address to the start of the SG entry.
>>> +\item[\field{length}]
>>> +is the length of the SG entry.
>>> +\end{description}
>> I think having explicit page alignment requirements here would be great.
> This may be host-dependent, maybe we should have a capability field so
> it can provide this information?

I mean there is already a VIRTIO_VIDEO_F_RESOURCE_GUEST_PAGES feature
bit. This suggests that these addresses always point to pages, right?
If not, there is some inconsistency here IMO.

In our setup I think it is just always the case that they are page
aligned. Non-page-aligned addresses would probably require copying on
the CPU on all our platforms. So I think, yes, there should be a way to
indicate (if not require) this.
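
(For illustration, an explicit alignment requirement would let the
device do a trivial check like the following; 4 KiB guest pages and the
helper name are just assumptions for this sketch:)

#include <stdbool.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u  /* assumption for this sketch */

/* Hypothetical device-side helper: reject SG entries whose start
 * address is not page aligned. */
static inline bool sg_addr_is_page_aligned(uint64_t addr)
{
        return (addr & (GUEST_PAGE_SIZE - 1)) == 0;
}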


>>> +
>>> +Finally, for \field{struct virtio_video_resource_sg_list}:
>>> +
>>> +\begin{description}
>>> +\item[\field{num_entries}]
>>> +is the number of \field{struct virtio_video_resource_sg_entry} instances
>>> +that follow.
>>> +\end{description}
>>> +
>>> +\field{struct virtio_video_resource_object} is defined as follows:
>>> +
>>> +\begin{lstlisting}
>>> +struct virtio_video_resource_object {
>>> +        u8 uuid[16];
>>> +};
>>> +\end{lstlisting}
>>> +
>>> +\begin{description}
>>> +\item[uuid]
>>> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>> +\end{description}
>>> +
>>> +The device responds with
>>> +\field{struct virtio_video_resource_attach_backing_resp}:
>>> +
>>> +\begin{lstlisting}
>>> +struct virtio_video_resource_attach_backing_resp {
>>> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
>>> +};
>>> +\end{lstlisting}
>>> +
>>> +\begin{description}
>>> +\item[\field{result}]
>>> +is
>>> +
>>> +\begin{description}
>>> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
>>> +if the operation succeeded,
>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
>>> +if the mentioned stream does not exist,
>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
>>> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
>>> +invalid value,
>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
>>> +if the operation is performed at a time when it is non-valid.
>>> +\end{description}
>>> +\end{description}
>>> +
>>> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
>>> +the following times:
>>> +
>>> +\begin{itemize}
>>> +\item
>>> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
>>> +  resource,
>>> +\item
>>> +  AFTER successfully changing the \field{virtio_video_params_resources}
>>> +  parameter corresponding to the queue and BEFORE
>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
>>> +\end{itemize}
>>> +
>>> +This is to ensure that the device can rely on the fact that a given
>>> +resource will always point to the same memory for as long as it may be
>>> +used by the video device. For instance, a decoder may use returned
>>> +decoded frames as reference for future frames and won't overwrite the
>>> +backing resource of a frame that is being referenced. It is only before
>>> +a stream is started and after a Dynamic Resolution Change event has
>>> +occurred that we can be sure that all resources won't be used in that
>>> +way.
>> The mentioned scenario about the referenced frames looks
>> somewhat reasonable, but I wonder how exactly that would work in practice.
> Basically the guest need to make sure the backing memory remains
> available and unwritten until the conditions mentioned above are met.
> Or is there anything unclear in this description?

Ok, I read the discussions about whether to allow the device to have
read access after the response to QUEUE or not. Since this comes from
v4l2, this should not be a problem, I think. I didn't know that v4l2
expects user-space to never write to CAPTURE buffers after they are
dequeued. I wonder if it is enforced in drivers.


>>> +        le32 stream_id;
>>> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
>>> +        le32 resource_id;
>>> +        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
>>> +        u8 padding[4];
>>> +        le64 timestamp;
>>> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
>>> +};
>>> +\end{lstlisting}
>>> +
>>> +\begin{description}
>>> +\item[\field{stream_id}]
>>> +is the ID of a valid stream.
>>> +\item[\field{queue_type}]
>>> +is the direction of the queue.
>>> +\item[\field{resource_id}]
>>> +is the ID of the resource to be queued.
>>> +\item[\field{flags}]
>>> +is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
>>> +
>>> +\begin{description}
>>> +\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
>>> +The submitted frame is to be encoded as a key frame. Only valid for the
>>> +encoder's INPUT queue.
>>> +\end{description}
>>> +\item[\field{timestamp}]
>>> +is an abstract sequence counter that can be used on the INPUT queue for
>>> +synchronization. Resources produced on the output queue will carry the
>>> +\field{timestamp} of the input resource they have been produced from.
>> I think this is quite misleading. Implementers may assume that it is ok
>> to expect a 1-to-1 mapping between input and output buffers and no
>> reordering, right? But this is usually not the case:
>>
>> 1. In the end of the spec H.264 and HEVC are defined to always have a
>> single NAL unit per resource. Well, there are many types of NAL units,
>> that do not represent any video data. Like SEI NAL units or delimiters.
>>
>> 2. We may assume that the SEI and delimiter units are filtered before
>> queuing, but there still is also other codec-specific data that can't be
>> filtered, like SPS and PPS NAL units. There has to be some special handling.
>>
>> 3. All of this means more codec-specific code in the driver or client
>> applications.
>>
>> 4. This spec says that the device may skip to a next key frame after a
>> seek. So the driver has to account for this too.
>>
>> 5. For example, in H.264 a single key frame may be coded by several NAL
>> units. In fact all VCL NAL units are called slices because of this. What
>> happens when the decoder sees several NAL units with different
>> timestamps coding the same output frame? Which timestamp will it choose?
>> I'm not sure it is defined anywhere. Probably it will just take the
>> first timestamp. The driver/client applications have to be ready for
>> this too.
>>
>> 6. I saw almost the same scenario with CSD units too. Imagine SPS with
>> timestamp 0, then PPS with 1, and then an IDR with 2. These three might
>> be combined in a single input buffer together by the vendor-provided
>> decoding software. Then the timestamp of the resulting frame is
>> naturally 0. But the driver/client application already doesn't expect to
>> get any response with timestamps 0 and 1, because they are known to be
>> belonging to CSD. And it expects an output buffer with ts 2. So there
>> will be a problem. (This is a real world example actually.)
>>
>> 7. Then there is H.264 High profile, for example. It has different
>> decoding and presentation order because frames may depend on future
>> frames. I think all the modern codecs have a mode like this. The input
>> frames are usually provided in the decoding order. Should the output
>> frames' timestamps just be copied from the input frames they have been
>> produced from, as this paragraph above says? This resembles decoding order
>> then. Well, this can work, if the container has correct DTS and PTS, and
>> the client software creates a mapping between these timestamps and the
>> virtio video timestamp. But this is not always the case. For example,
>> simple H.264 bitstream doesn't have any timestamps. And still it can be
>> easily played by ffmpeg/gstreamer/VLC/etc. There is no way to make this
>> work with a decoder following this spec, I think.
>>
>> My suggestion is to not think about the timestamp as an abstract
>> counter, but give some freedom to the device by providing the available
>> information from the container, be it DTS, PTS or only FPS (through
>> PARAMS). Also the input and output queues should indeed be completely
>> separated. There should be no assumption of a 1-to-1 mapping of buffers.
> The beginning of the "Device Operation" section tries to make it clear
> that the input and output queues are operating independently and that
> no mapping or ordering should be expected by the driver, but maybe
> this is worth repeating here.
>
> Regarding the use of timestamp, a sensible use would indeed be for the
> driver to set it to some meaningful information retrieved from the
> container (which the driver would itself obtain from user-space),
> probably the PTS if that is available. In the case of H.264 non-VCL
> NAL units would not produce any output, so their timestamp would
> effectively be ignored. For frames that are made of several slices,
> the first timestamp should be the one propagated to the output frame.
> (and this here is why I prefer VP8/VP9 ^_^;)

Did they manage to avoid the same thing with VP9 SVC? :)

The phrase "Resources produced on the output queue will carry the
\field{timestamp} of the input resource they have been produced from."
still sounds misleading to me. It doesn't cover all these cases where
there is no 1-to-1 mapping. Also, what if there are timestamps for some
of the frames, but not for all?


> In fact most users probably won't care about this field. In the worst
> case, even if no timestamp is available, operation can still be done
> reliably since decoded frames are made available in presentation
> order. This fact was not obvious in the spec, so I have added a
> sentence in the "Device Operation" section to clarify.
>
> I hope this answers your concerns, but please let me know if I didn't
> address something in particular.

Indeed the order of output frames was not obvious from the spec. I think
there might be use-cases where you want the decoded frames as early as
possible, like when you have to transmit the frames over some (slow)
medium. If the decoder outputs in presentation order, the frames might
come out in batches, which is not good for latency. WDYT?


>>> +\item[\field{planes}]
>>> +is the format description of each individual plane making this format.
>>> +The number of planes is dependent on the \field{fourcc} and detailed in
>>> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
>>> +
>>> +\begin{description}
>>> +\item[\field{buffer_size}]
>>> +is the minimum size of the buffers that will back resources to be
>>> +queued.
>>> +\item[\field{stride}]
>>> +is the distance in bytes between two lines of data.
>>> +\item[\field{offset}]
>>> +is the starting offset for the data in the buffer.
>> It is not quite clear to me how to use the offset during SET_PARAMS. I
>> think it is much more reasonable to have per plane offsets in struct
>> virtio_video_resource_queue and struct virtio_video_resource_queue_resp.
> This is supposed to describe where in a given buffer the host can find
> the beginning of a given plane (mostly useful for multi-planar/single
> buffer formats). This typically does not change between frames, so
> having it as a parameter seems appropriate to me?

The plane sizes don't change either, right? I think it is just the usual
way to put the plane offsets and sizes together. I saw this pattern in
gstreamer, and I think in DRM and V4L2 as well. For me it is quite reasonable.
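
(Roughly, the pattern I mean is to carry the per-plane offset next to
the per-plane size in the queue request, something like this simplified
sketch - not a struct from the spec:)

#include <stdint.h>

#define MAX_PLANES 3  /* illustration only */

/* Per-plane layout, carried together with the queued resource. */
struct plane_layout {
        uint32_t offset;  /* start of the plane's data within its buffer */
        uint32_t stride;  /* bytes between two consecutive lines */
        uint32_t size;    /* bytes used by this plane */
};

struct queued_frame_layout {
        struct plane_layout planes[MAX_PLANES];
};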


>>> +encode at the requested format and resolution.
>> It is not defined when changing these parameters is allowed. Also there
>> is an issue: changing width, height, format, buffer_size should probably
>> detach all the currently attached buffers. But changing crop shouldn't
>> affect the output buffers in any way, right? So maybe it is better to
>> split them?
> If the currently attached buffers are large enough to support the new
> format, there should not be any need to detach them (if they are not,
> the SET_PARAM command should fail). So even if we only change the
> crop, the device can perform the full validation on the format and
> keep going with the current buffers if possible.
>
> Indeed the timing for setting this parameter should be better defined.
> In particular the input format for a decoder (or output format for an
> encoder) will probably remain static through the session.

Ok.


>>> +\item[\field{YU12}]
>>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>>> +single buffer. 4:2:0 subsampling.
>>> +\item[\field{YM12}]
>>> +same as \field{YU12} but using three separate buffers for the Y, U and V
>>> +planes.
>>> +\end{description}
>> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
>> V4L2 documentation has a nice description of exact plane layouts.
>> Otherwise it would be nice to have these layouts in the spec IMO.
> I've linked to the relevant V4L2 pages, indeed they describe the
> formats and layouts much better.
>
> Thanks for all the feedback. We can continue on this basis, or I can
> try to build a small prototype of that V4L2-over-virtio idea if you
> agree this looks like a good idea. The guest driver would mostly be
> forwarding the V4L2 ioctls as-is to the host, it would be interesting
> to see how small we can make it with this design.

Let's discuss the idea.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah




* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2022-12-08  7:23 [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification Alexandre Courbot
                   ` (2 preceding siblings ...)
  2023-01-11 17:04 ` Alexander Gordeev
@ 2023-01-11 18:45 ` Alexander Gordeev
  3 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-01-11 18:45 UTC (permalink / raw)
  To: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Alex Bennée
  Cc: Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa

Hi Alexandre,

On 08.12.22 08:23, Alexandre Courbot wrote:
> Add the specification of the video decoder and encoder devices, which
> can be used to provide host-accelerated video operations to the guest.
>
> Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> Signed-off-by: Alexandre Courbot <acourbot@chromium.org>
> --
> Here is the long-overdue new revision of the virtio-video RFC. This
> version reorganizes the specification quite a bit and tries to simplify
> the protocol further. Nonetheless, it still results in a rather long (17
> pages) specification for just these devices, even though the spec is not
> fully complete (I want to rethink the formats descriptions, and some
> parameters need to be added for the encoder device).
>
> I would like to get some high-level feedback on this version and maybe
> propose to do things a bit differently before people invest too much
> time reviewing this in depth. While rewriting this document, it became
> more and more obvious that this is just a different, and maybe a bit
> simpler, reimplementation of the V4L2 stateless decoder protocol [1]. I
> am now wondering whether it would not make more sense to rewrite this
> specification as just a way to transport V4L2 requests over virtio,
> similarly to how virtio-fs does with the FUSE protocol [2].
>
> At the time we started writing this implementation, the V4L2 protocols
> for decoders and encoders were not set in stone yet, but now that they
> are it might make sense to reconsider. Switching to this solution would
> greatly shorten the virtio-video device spec, and also provide a way to
> support other kind of V4L2 devices like cameras or image processors at
> no extra cost.
>
> Note that doing so would not require that either the host or guest uses
> V4L2 - the virtio video device would just emulate a V4L2 device over
> virtio. A few adaptations would need to be done regarding how memory
> types work, but otherwise I believe most of V4L2 could be used as-is.
>
> Please share your thoughts about this, and I will either explore this
> idea further with a prototype, or keep moving the present spec forward,
> hopefully at a faster pace.
>
> Due to the RFC state of this patch I have refrained from referencing the
> normative statements in conformance.tex - I will do that as a final step
> once the spec is mostly agreed on.
>
> [1] https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fdocs.kernel.org%2fuserspace%2dapi%2fmedia%2fv4l%2fdev%2dstateless%2ddecoder.html&umid=8ec8d8c9-b83c-40de-9337-a377056fe2af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-e98508782bc1c9aa6b2e4a9df9d4dd170f9a5ffa
> [2] https://github.com/oasis-tcs/virtio-spec/blob/master/virtio-fs.tex
>
> Full PDF:
> https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fdrive.google.com%2ffile%2fd%2f1HRVDiDdo50%2dS9X5tWgzmT90FJRHoB1dN%2fview%3fusp%3dsharing&umid=8ec8d8c9-b83c-40de-9337-a377056fe2af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-e315af79a067165e908bf1d803441eb181e2f375
>
> PDF of video section only:
> https://ddec1-0-en-ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fdrive.google.com%2ffile%2fd%2f1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD%2fview%3fusp%3dsharing&umid=8ec8d8c9-b83c-40de-9337-a377056fe2af&auth=53c7c7de28b92dfd96e93d9dd61a23e634d2fbec-b5c45bb4b6ccc5b73ea2a54e26f151d61722d0df

One more thing. I haven't found profiles and levels for the encoder
anywhere in the spec. They were there in v5.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah




* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-11 18:42     ` Alexander Gordeev
@ 2023-01-11 20:13       ` Alex Bennée
  2023-01-12  6:40         ` Alexandre Courbot
  2023-01-12  6:39       ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Alex Bennée @ 2023-01-11 20:13 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa


Alexander Gordeev <alexander.gordeev@opensynergy.com> writes:

> Hi Alexandre,
>
> On 27.12.22 08:31, Alexandre Courbot wrote:
>> Hi Alexander,
>>
>>
>> On Tue, Dec 20, 2022 at 1:59 AM Alexander Gordeev
>> <alexander.gordeev@opensynergy.com> wrote:
>>> Hello Alexandre,
>>>
>>> Thanks for the update. Please check my comments below.
>>> I'm new to the virtio video spec development, so I may lack some
>>> historic perspective. I would gladly appreciate pointing me to some
>>> older emails explaining decisions, that I might not understand. I hope
>>> to read through all of them later. Overall I have a lot of experience in
>>> the video domain and in virtio video device development in Opsy, so I
>>> hope, that my comments are relevant and useful.
>> Cornelia provided links to the previous versions (thanks!). Through
>> these revisions we tried different approaches, and the more we
>> progress the closer we are getting to the V4L2 stateful
>> decoder/encoder interface.
<snip>
>> This probably means a bit more work on the device side, since this
>> spec is tailored for the specific video codec use-case and V4L2 is
>> more generic, but also less spec to maintain and more confidence that
>> things will work as we want in the real world. On the other hand, the
>> device would also become simpler by the fact that responses to
>> commands could not come out-of-order as they currently do. So at the
>> end of the day I'm not even sure this would result in a more complex
>> device.
>
> Sorry for the delay. I tried to gather data about how the spec has
> evolved in the old emails.
>
> Well, on the one hand mimicking v4l2 looks like an easy solution from
> virtio-video spec writing perspective. (But the implementers will have
> to read the V4L2 API instead AFAIU, which is probably longer...)
>
> On the other hand v4l2 has a lot of history. It started as a camera API
> and gained codec support later, right? So it definitely has just too
> much stuff irrelevant for codecs. Here we have an option to design from
> scratch taking the best ideas from v4l2.

We definitely don't want to bake Linux APIs into the VirtIO spec, which
is meant to be host/guest agnostic, if we can help it.

>
> Also I have concerns about the virtio-video spec development. This seems
> like a big change. It seems to me that after so many discussions and
> versions of the spec, the process should be coming to something by now.
> But this is still a moving target...
>
> There were arguments against adding camera support for security and
> complexity reasons during discussions about virtio-video spec v1. Were
> these concerns addressed somehow? Maybe I missed a followup
> discussion?

I think adding camera support would introduce a lot of potential
complexity to the spec - c.f. projects like libcamera and the fact that
most modern cameras do more than provide a stream of images. It probably
warrants having its own device specification that is designed from the
outset with cameras in mind.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


* Re: [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-11  8:45     ` Cornelia Huck
@ 2023-01-12  6:32       ` Alexandre Courbot
  2023-01-12 15:23         ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-01-12  6:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

Hi Cornelia,

On Wed, Jan 11, 2023 at 5:45 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >> > +
> >> > +The device MUST set the \field{caps_length} field to a value equal to
> >> > +the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
> >>
> >> Could the device also support a minimum response size that only supports
> >> a subset of the caps to be returned? Otherwise, I think caps_length is
> >> the maximum (or fixed?) length of the query caps response?
> >
> > I think this can be replaced by a fixed-size call for getting only one
> > format at a time. The guest would have to make several of these in
> > order to obtain the whole set of supported formats, but it would be
> > easier to parse compared to the large result returned by QUERY_CAP and
> > simpler overall.
>
> How would you implement this? Would the driver do the call repeatedly
> until no more formats remain (requires the device to track state, and
> needs a specification on what happens if the driver continues doing the
> call?) Or would the driver pass in an index, and the device only needs
> to check for out-of-range?

One peculiarity of codecs is that the pixel formats available may
depend on the coded format (and sometimes even its resolution). For
instance, NV12 may be available if you do VP8 at 1080p but not 4K
H.264.

This means that a single call returning all the coded and pixel
formats won't be able to convey all the possible subtleties. In
previous versions of this spec we used bitmasks to associate pixel
formats to their supported coded formats, but it's a bit of a pain to
manage while still not bringing the necessary precision.

So here again the safe way is to follow the V4L2 lead: get a list of
supported bitstream formats (one at a time, using an index with a
range defined by the configuration space), apply the one we are
interested in, and query the supported pixel formats for that specific
bitstream format using the same mechanism. That's more back-and-forth
between the guest and the host, but it happens before streaming starts
and better reflects the actual decoding workflow where the client is
typically only interested in the supported formats for the codec they
want to decode.
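
(To make the back-and-forth concrete, here is a rough sketch of what
such an index-based query could look like on the wire; all names and
fields below are made up for illustration, none of them come from the
current spec:)

#include <stdint.h>

typedef uint32_t le32;  /* little-endian on the wire; plain typedef here */

/* Hypothetical index-based format query, in the spirit of V4L2's
 * VIDIOC_ENUM_FMT. */
struct virtio_video_query_format {
        le32 queue_type;  /* which queue's formats to enumerate */
        le32 index;       /* 0..N-1; an out-of-range index ends the loop */
};

struct virtio_video_query_format_resp {
        le32 result;      /* OK, or an error once the index is exhausted */
        le32 fourcc;      /* the format at this index */
};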

> >> Generally, I don't see anything fundamentally wrong with this approach
> >> (mostly some smaller nits.) Feedback from someone familiar with this
> >> subject would be great, though.
> >
> > Thanks, that's encouraging! There are still a few bits missing, and we
> > may switch to something different if we decide to piggyback V4L2, but
> > the core mechanisms will remain similar so it is great to see that
> > there isn't any hard blocker.
>
> Let's see how we can move this forward to something that can be
> included in the spec, always good to see a new, useful device!

Thanks! I guess the main decision will be whether to follow this spec
or switch to encapsulating V4L2, but we will be able to progress
either way. I'll try to present the benefits of the V4L2-over-virtio
solution later in this thread.

Cheers,
Alex.



* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-11 17:04 ` Alexander Gordeev
@ 2023-01-12  6:32   ` Alexandre Courbot
  2023-01-12 22:24     ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-01-12  6:32 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

Hi Alexander,

On Thu, Jan 12, 2023 at 2:04 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
> > +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
> > +
> > +The fourcc code of each supported bitstream format is given, as well as
> > +the unit of data requested in each input resource for the decoder, or
> > +produced in each output resource for the encoder.
> > +
> > +\begin{description}
> > +\item[\field{MPG2}]
> > +MPEG2 encoded stream. One Access Unit per resource.
> > +\item[\field{H264}]
> > +H.264 encoded stream. One NAL unit per resource.
> > +\item[\field{HEVC}]
> > +HEVC encoded stream. One NAL unit per resource.
> > +\item[\field{VP80}]
> > +VP8 encoded stream. One frame per resource.
> > +\item[\field{VP90}]
> > +VP9 encoded stream. One frame per resource.
> > +\end{description}
> > +
>
> Actually I'm not sure where these fourcc codes for bitstream formats
> come from.
>
> In one of the old comments I found a reference to
> https://www.rfc-editor.org/rfc/rfc2361
>
> But it doesn't define MPG2, HEVC, VP80, VP90.
>
> Also there is this comment to virtio-video v1 from Tomasz Figa and the
> related discussion:
> https://markmail.org/message/gc6h25acct22niut#query:+page:1+mid:et4l3ni7qjqhiygo+state:results
>
> He wrote that it is not worth it because there are so many conflicting
> sets of fourcc codes.
>
> I can see that fourcc codes were not used in virtio-video spec draft
> versions 1 to 5. So now it looks quite weird to see them here. Probably
> this is because you'd like to mimic the v4l2 interface more closely?

These come from here:
https://docs.kernel.org/userspace-api/media/v4l/pixfmt-compressed.html

V4L2 has a nice list of supported fourccs for coded formats, so I'd
just suggest we refer to them and use them instead of defining our
own.
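
(For reference, these fourccs are just four ASCII characters packed into
a 32-bit little-endian value; the Linux headers build them with a macro
equivalent to the following, written here with uint32_t instead of the
kernel's __u32:)

#include <stdint.h>

/* Equivalent of the kernel's v4l2_fourcc() helper. */
#define v4l2_fourcc(a, b, c, d) \
        ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
         ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* For example, V4L2_PIX_FMT_VP9 is v4l2_fourcc('V', 'P', '9', '0') and
 * V4L2_PIX_FMT_HEVC is v4l2_fourcc('H', 'E', 'V', 'C'). */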


* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-11 18:42     ` Alexander Gordeev
  2023-01-11 20:13       ` Alex Bennée
@ 2023-01-12  6:39       ` Alexandre Courbot
  2023-01-18 23:06         ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-01-12  6:39 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> Hi Alexandre,
>
> On 27.12.22 08:31, Alexandre Courbot wrote:
> > Hi Alexander,
> >
> >
> > On Tue, Dec 20, 2022 at 1:59 AM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >> Hello Alexandre,
> >>
> >> Thanks for the update. Please check my comments below.
> >> I'm new to the virtio video spec development, so I may lack some
> >> historic perspective. I would gladly appreciate pointing me to some
> >> older emails explaining decisions, that I might not understand. I hope
> >> to read through all of them later. Overall I have a lot of experience in
> >> the video domain and in virtio video device development in Opsy, so I
> >> hope, that my comments are relevant and useful.
> > Cornelia provided links to the previous versions (thanks!). Through
> > these revisions we tried different approaches, and the more we
> > progress the closer we are getting to the V4L2 stateful
> > decoder/encoder interface.
> >
> > This is actually the point where I would particularly be interested in
> > having your feedback, since you probably have noticed the similarity.
> > What would you think about just using virtio as a transport for V4L2
> > ioctls (virtio-fs does something similar with FUSE), and having the
> > host emulate a V4L2 decoder or encoder device in place of this (long)
> > specification? I am personally starting to think this could be a
> > better and faster way to get us to a point where both spec and guest
> > drivers are merged. Moreover this would also open the way to support
> > other kinds of V4L2 devices like simple cameras - we would just need
> > to allocate new device IDs for these and would be good to go.
> >
> > This probably means a bit more work on the device side, since this
> > spec is tailored for the specific video codec use-case and V4L2 is
> > more generic, but also less spec to maintain and more confidence that
> > things will work as we want in the real world. On the other hand, the
> > device would also become simpler by the fact that responses to
> > commands could not come out-of-order as they currently do. So at the
> > end of the day I'm not even sure this would result in a more complex
> > device.
>
> Sorry for the delay. I tried to gather data about how the spec has
> evolved in the old emails.

It has been a bit all over the place as we tried different approaches,
sorry about that. >_<

>
> Well, on the one hand mimicking v4l2 looks like an easy solution from
> virtio-video spec writing perspective. (But the implementers will have
> to read the V4L2 API instead AFAIU, which is probably longer...)

It should not necessarily be much longer as the parts we are
interested in have their own dedicated pages:

https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html

Besides, the decoding and encoding processes are described there with
more precision; not that we couldn't do that here, but it would make the
spec grow longer than I am comfortable with...

>
> On the other hand v4l2 has a lot of history. It started as a camera API
> and gained codec support later, right? So it definitely has just too
> much stuff irrelevant for codecs. Here we have an option to design from
> scratch taking the best ideas from v4l2.

That's also what we were thinking initially, but as we try to
implement our new and optimized designs, we end up hitting a wall and
redoing things like V4L2 did. There are small exceptions, like how
STOP/RESET is implemented here which is slightly simpler than the V4L2
equivalent, but I don't think these justify reinventing the remaining
95% quasi-identically.

V4L2 supports much more than video codecs, but if you want to
implement a decoder you don't need to support anything more than what
the decoder spec says you should. And that subset happens to map very
well to the decoder use-case - it's not like the V4L2 folks tried to
shoehorn codecs into something that is inadequate for them.

>
> Also I have concerns about the virtio-video spec development. This seems
> like a big change. It seems to me that after so many discussions and
> versions of the spec, the process should be coming to something by now.
> But this is still a moving target...

I agree and apologize for the slow progress of this project, but let's
not fall for the sunk cost fallacy if it turns out the
V4L2-over-virtio solution fits the bill better and for less effort.

>
> There were arguments against adding camera support for security and
> complexity reasons during discussions about virtio-video spec v1. Were
> these concerns addressed somehow? Maybe I missed a followup discussion?

The conclusion was that cameras should be their own specification as
the virtio-video spec is too specialized for the codec use-case. There
is actually an ongoing project for this:

https://gitlab.collabora.com/collabora/virtio-camera

... which states in its README: "For now it is almost directly based
on V4L2 Linux driver UAPI."

That makes me think: if virtio-video is going to resemble V4L2
closely, and virtio-camera ends up heading in the same direction, why
don't we just embrace the underlying reality that we are reinventing
V4L2?

>
>
> >>> +\begin{lstlisting}
> >>> +/* Device */
> >>> +#define VIRTIO_VIDEO_CMD_DEVICE_QUERY_CAPS       0x100
> >>> +
> >>> +/* Stream */
> >>> +#define VIRTIO_VIDEO_CMD_STREAM_CREATE           0x200
> >>> +#define VIRTIO_VIDEO_CMD_STREAM_DESTROY          0x201
> >> Is this gap in numbers intentional? It would be great to remove it to
> >> simplify boundary checks.
> > This is to allow commands of the same family to stay close to one
> > another. I'm not opposed to removing the gap, it just means that
> > commands may end up being a bit all over the place if we extend the
> > protocol.
>
> Actually there is a gap between 0x201 and 0x203. Sorry for not being
> clear here.

Ah, right. Fixed that, thanks.

> >>> +
> >>> +While the device is processing the command, it MUST return
> >>> +VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION to the
> >>> +VIRTIO\_VIDEO\_CMD\_STREAM\_DRAIN command.
> >> Should the device stop accepting input too?
> > There should be no problem with the device accepting (and even
> > processing) input for the next sequence, as long as it doesn't make
> > its result available before the response to the DRAIN command.
>
> Hmm, maybe it is worth adding this requirement to the spec. WDYT?

Agreed and added a sentence to clarify this.

> >>> +};
> >>> +\end{lstlisting}
> >>> +
> >>> +Within \field{struct virtio_video_resource_sg_entry}:
> >>> +
> >>> +\begin{description}
> >>> +\item[\field{addr}]
> >>> +is a guest physical address to the start of the SG entry.
> >>> +\item[\field{length}]
> >>> +is the length of the SG entry.
> >>> +\end{description}
> >> I think having explicit page alignment requirements here would be great.
> > This may be host-dependent, maybe we should have a capability field so
> > it can provide this information?
>
> I mean there is already a VIRTIO_VIDEO_F_RESOURCE_GUEST_PAGES feature
> bit. This suggests that these addresses always point to pages, right?
> If not, there is some inconsistency here IMO.
>
> In our setup I think it is just always the case that they are page
> aligned. Non-page-aligned addresses would probably require copying on
> the CPU on all our platforms. So I think, yes, there should be a way to
> indicate (if not require) this.

Ah, I see what you mean now. I agree it makes sense to be page-aligned
for this, added that in the description of the `addr` field.

> >>> +
> >>> +Finally, for \field{struct virtio_video_resource_sg_list}:
> >>> +
> >>> +\begin{description}
> >>> +\item[\field{num_entries}]
> >>> +is the number of \field{struct virtio_video_resource_sg_entry} instances
> >>> +that follow.
> >>> +\end{description}
> >>> +
> >>> +\field{struct virtio_video_resource_object} is defined as follows:
> >>> +
> >>> +\begin{lstlisting}
> >>> +struct virtio_video_resource_object {
> >>> +        u8 uuid[16];
> >>> +};
> >>> +\end{lstlisting}
> >>> +
> >>> +\begin{description}
> >>> +\item[uuid]
> >>> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> >>> +\end{description}
> >>> +
> >>> +The device responds with
> >>> +\field{struct virtio_video_resource_attach_backing_resp}:
> >>> +
> >>> +\begin{lstlisting}
> >>> +struct virtio_video_resource_attach_backing_resp {
> >>> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> >>> +};
> >>> +\end{lstlisting}
> >>> +
> >>> +\begin{description}
> >>> +\item[\field{result}]
> >>> +is
> >>> +
> >>> +\begin{description}
> >>> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> >>> +if the operation succeeded,
> >>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> >>> +if the mentioned stream does not exist,
> >>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> >>> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
> >>> +invalid value,
> >>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> >>> +if the operation is performed at a time when it is non-valid.
> >>> +\end{description}
> >>> +\end{description}
> >>> +
> >>> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
> >>> +the following times:
> >>> +
> >>> +\begin{itemize}
> >>> +\item
> >>> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
> >>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
> >>> +  resource,
> >>> +\item
> >>> +  AFTER successfully changing the \field{virtio_video_params_resources}
> >>> +  parameter corresponding to the queue and BEFORE
> >>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
> >>> +\end{itemize}
> >>> +
> >>> +This is to ensure that the device can rely on the fact that a given
> >>> +resource will always point to the same memory for as long as it may be
> >>> +used by the video device. For instance, a decoder may use returned
> >>> +decoded frames as reference for future frames and won't overwrite the
> >>> +backing resource of a frame that is being referenced. It is only before
> >>> +a stream is started and after a Dynamic Resolution Change event has
> >>> +occurred that we can be sure that all resources won't be used in that
> >>> +way.
> >> The mentioned scenario about the referenced frames looks
> >> somewhat reasonable, but I wonder how exactly that would work in practice.
> > Basically the guest need to make sure the backing memory remains
> > available and unwritten until the conditions mentioned above are met.
> > Or is there anything unclear in this description?
>
> Ok, I read the discussions about whether to allow the device to have
> read access after the response to QUEUE or not. Since this comes from
> v4l2, this should not be a problem, I think. I didn't know that v4l2
> expects user-space to never write to CAPTURE buffers after they are
> dequeued. I wonder if it is enforced in drivers.
>
>
> >>> +        le32 stream_id;
> >>> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
> >>> +        le32 resource_id;
> >>> +        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
> >>> +        u8 padding[4];
> >>> +        le64 timestamp;
> >>> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
> >>> +};
> >>> +\end{lstlisting}
> >>> +
> >>> +\begin{description}
> >>> +\item[\field{stream_id}]
> >>> +is the ID of a valid stream.
> >>> +\item[\field{queue_type}]
> >>> +is the direction of the queue.
> >>> +\item[\field{resource_id}]
> >>> +is the ID of the resource to be queued.
> >>> +\item[\field{flags}]
> >>> +is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
> >>> +
> >>> +\begin{description}
> >>> +\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
> >>> +The submitted frame is to be encoded as a key frame. Only valid for the
> >>> +encoder's INPUT queue.
> >>> +\end{description}
> >>> +\item[\field{timestamp}]
> >>> +is an abstract sequence counter that can be used on the INPUT queue for
> >>> +synchronization. Resources produced on the output queue will carry the
> >>> +\field{timestamp} of the input resource they have been produced from.
> >> I think this is quite misleading. Implementers may assume that it is ok
> >> to expect a 1-to-1 mapping between input and output buffers and no
> >> reordering, right? But this is usually not the case:
> >>
> >> 1. In the end of the spec H.264 and HEVC are defined to always have a
> >> single NAL unit per resource. Well, there are many types of NAL units,
> >> that do not represent any video data. Like SEI NAL units or delimiters.
> >>
> >> 2. We may assume that the SEI and delimiter units are filtered before
> >> queuing, but there still is also other codec-specific data that can't be
> >> filtered, like SPS and PPS NAL units. There has to be some special handling.
> >>
> >> 3. All of this means more codec-specific code in the driver or client
> >> applications.
> >>
> >> 4. This spec says that the device may skip to a next key frame after a
> >> seek. So the driver has to account for this too.
> >>
> >> 5. For example, in H.264 a single key frame may be coded by several NAL
> >> units. In fact all VCL NAL units are called slices because of this. What
> >> happens when the decoder sees several NAL units with different
> >> timestamps coding the same output frame? Which timestamp will it choose?
> >> I'm not sure it is defined anywhere. Probably it will just take the
> >> first timestamp. The driver/client applications have to be ready for
> >> this too.
> >>
> >> 6. I saw almost the same scenario with CSD units too. Imagine SPS with
> >> timestamp 0, then PPS with 1, and then an IDR with 2. These three might
> >> be combined in a single input buffer together by the vendor-provided
> >> decoding software. Then the timestamp of the resulting frame is
> >> naturally 0. But the driver/client application already doesn't expect to
> >> get any response with timestamps 0 and 1, because they are known to be
> >> belonging to CSD. And it expects an output buffer with ts 2. So there
> >> will be a problem. (This is a real world example actually.)
> >>
> >> 7. Then there is H.264 High profile, for example. It has different
> >> decoding and presentation order because frames may depend on future
> >> frames. I think all the modern codecs have a mode like this. The input
> >> frames are usually provided in the decoding order. Should the output
> >> frames timestamps just be copied from input frames, they have been
> >> produced from as this paragraph above says? This resembles decoder order
> >> then. Well, this can work, if the container has correct DTS and PTS, and
> >> the client software creates a mapping between these timestamps and the
> >> virtio video timestamp. But this is not always the case. For example,
> >> simple H.264 bitstream doesn't have any timestamps. And still it can be
> >> easily played by ffmpeg/gstreamer/VLC/etc. There is no way to make this
> >> work with a decoder following this spec, I think.
> >>
> >> My suggestion is to not think about the timestamp as an abstract
> >> counter, but give some freedom to the device by providing the available
> >> information from the container, be it DTS, PTS or only FPS (through
> >> PARAMS). Also the input and output queues should indeed be completely
> >> separated. There should be no assumption of a 1-to-1 mapping of buffers.
> > The beginning of the "Device Operation" section tries to make it clear
> > that the input and output queues are operating independently and that
> > no mapping or ordering should be expected by the driver, but maybe
> > this is worth repeating here.
> >
> > Regarding the use of timestamp, a sensible use would indeed be for the
> > driver to set it to some meaningful information retrieved from the
> > container (which the driver would itself obtain from user-space),
> > probably the PTS if that is available. In the case of H.264 non-VCL
> > NAL units would not produce any output, so their timestamp would
> > effectively be ignored. For frames that are made of several slices,
> > the first timestamp should be the one propagated to the output frame.
> > (and this here is why I prefer VP8/VP9 ^_^;)
>
> Did they manage to avoid the same thing with VP9 SVC? :)
>
> The phrase "Resources produced on the output queue will carry the
> \field{timestamp} of the input resource they have been produced from."
> still sounds misleading to me. It doesn't cover all these cases where
> there is no 1-to-1 mapping. Also, what if there are timestamps for some
> of the frames, but not for all?

This shouldn't matter - a timestamp of 0 is still a timestamp and will
be carried over to the corresponding frames.

> > In fact most users probably won't care about this field. In the worst
> > case, even if no timestamp is available, operation can still be done
> > reliably since decoded frames are made available in presentation
> > order. This fact was not obvious in the spec, so I have added a
> > sentence in the "Device Operation" section to clarify.
> >
> > I hope this answers your concerns, but please let me know if I didn't
> > address something in particular.
>
> Indeed the order of output frames was not obvious from the spec. I think
> there might be use-cases where you want the decoded frames as early as
> possible, like when you have to transmit the frames over some (slow)
> medium. If the decoder outputs in presentation order, the frames might
> come out in batches, which is not good for latency. WDYT?

Who would be in charge of reordering then? If that burden falls to the
guest user-space, then it probably wants to use a stateless API.
That's not something covered by this spec (and covering it would
require adding many more per-codec structures for SPS/PPS, VP8
headers, etc.), but can be supported with V4L2 FWIW. Supporting this
API however would add dozens more pages just to document the
codec-specific structures necessary to decode a frame. See for
instance what would be needed for H.264:
https://www.kernel.org/doc/html/v5.5/media/uapi/v4l/ext-ctrls-codec.html#c.v4l2_ctrl_h264_sps.

Or a client that *really* wants decoding order for latency reasons
could hack the stream a bit to change the presentation order and
perform QUEUE/DRAIN sequences for each frame. That would not be more
complex than supporting a stateless API anyway.
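
(A rough sketch of that per-frame workaround on the driver side; the
helpers here are hypothetical wrappers around the commands of this spec,
declared only to make the flow explicit:)

#include <stdint.h>

/* Hypothetical driver-side wrappers around VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
 * VIRTIO_VIDEO_CMD_STREAM_DRAIN and OUTPUT-queue dequeuing. */
void queue_input(uint32_t stream_id, uint32_t resource_id);
void drain(uint32_t stream_id);              /* waits for the DRAIN response */
uint32_t dequeue_output(uint32_t stream_id); /* returns the decoded resource */

/* Decode one coded frame and fetch its output right away, i.e. in decode order. */
uint32_t decode_in_decode_order(uint32_t stream_id, uint32_t input_resource)
{
        queue_input(stream_id, input_resource);
        drain(stream_id);  /* forces the decoded frame out before queuing the next */
        return dequeue_output(stream_id);
}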

> >>> +\item[\field{planes}]
> >>> +is the format description of each individual plane making this format.
> >>> +The number of planes is dependent on the \field{fourcc} and detailed in
> >>> +\ref{sec:Device Types / Video Device / Supported formats / Image formats}.
> >>> +
> >>> +\begin{description}
> >>> +\item[\field{buffer_size}]
> >>> +is the minimum size of the buffers that will back resources to be
> >>> +queued.
> >>> +\item[\field{stride}]
> >>> +is the distance in bytes between two lines of data.
> >>> +\item[\field{offset}]
> >>> +is the starting offset for the data in the buffer.
> >> It is not quite clear to me how to use the offset during SET_PARAMS. I
> >> think it is much more reasonable to have per plane offsets in struct
> >> virtio_video_resource_queue and struct virtio_video_resource_queue_resp.
> > This is supposed to describe where in a given buffer the host can find
> > the beginning of a given plane (mostly useful for multi-planar/single
> > buffer formats). This typically does not change between frames, so
> > having it as a parameter seems appropriate to me?
>
> The plane sizes don't change either, right? I think it is just the usual
> way to put the plane offsets and sizes together. I saw this pattern in
> gstreamer, and I think in DRM and V4L2 as well. For me it is quite reasonable.

Ack!

> >>> +\item[\field{YU12}]
> >>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
> >>> +single buffer. 4:2:0 subsampling.
> >>> +\item[\field{YM12}]
> >>> +same as \field{YU12} but using three separate buffers for the Y, U and V
> >>> +planes.
> >>> +\end{description}
> >> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
> >> V4L2 documentation has a nice description of exact plane layouts.
> >> Otherwise it would be nice to have these layouts in the spec IMO.
> > I've linked to the relevant V4L2 pages, indeed they describe the
> > formats and layouts much better.
> >
> > Thanks for all the feedback. We can continue on this basis, or I can
> > try to build a small prototype of that V4L2-over-virtio idea if you
> > agree this looks like a good idea. The guest driver would mostly be
> > forwarding the V4L2 ioctls as-is to the host, it would be interesting
> > to see how small we can make it with this design.
>
> Let's discuss the idea.

Let me try to summarize the case for using V4L2 over Virtio (I'll call
it virtio-v4l2 to differentiate it from the current spec).

There is the argument that virtio-video turns out to be a recreation
of the stateful V4L2 decoder API, which itself works similarly to
other high-level decoder APIs. So it's not like we could or should
come up with something very different. In parallel, virtio-camera is also
currently using V4L2 as its model. While this is subject to change, I
am starting to see a pattern here. :)

Transporting V4L2 over virtio would considerably shorten the length of
this spec, as we would just need to care about the transport aspect
and minor amendments to the meaning of some V4L2 structure members,
and leave the rest to V4L2 which is properly documented and for which
there is a large collection of working examples.

This would work very well for codec devices, but as a side-effect
would also enable other kinds of devices that may be useful to
virtualize, like image processors, DVB cards, and cameras. This
doesn't mean virtio-v4l2 should be the *only* way to support cameras
over virtio. It is a nice bonus of encapsulating V4L2, it may be
sufficient for simple (most?) use-cases, but also doesn't forbid more
specialized virtual devices for complex camera pipelines to be added
later. virtio-v4l2 would just be the generic virtual video device that
happens to be sufficient for our accelerated video needs - and if your
host camera is a USB UVC one, well feel free to use that too.

In other words, I see an opportunity to enable a whole class of
devices instead of a single type for the same effort and think we
should seriously consider this.

I have started to put down what a virtio-v4l2 transport might look
like, and am also planning on putting together a small
proof-of-concept. If I can get folks here to warm up to the idea, I
believe we should be able to share a spec and prototype in a month or
so.
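
(Purely as an illustration of the general shape, in the spirit of how
virtio-fs wraps FUSE requests; every name and field below is made up and
not from any existing spec:)

#include <stdint.h>

typedef uint32_t le32;
typedef uint64_t le64;

/* Hypothetical per-request header for carrying one V4L2 ioctl over a
 * virtqueue. */
struct virtio_v4l2_ioctl_req {
        le32 session_id;   /* one session per opened virtual device */
        le32 ioctl_code;   /* e.g. VIDIOC_QBUF, VIDIOC_STREAMON, ... */
        le64 payload_len;  /* size of the ioctl argument that follows */
        /* followed by payload_len bytes of the (endian-normalized) argument */
};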

Cheers,
Alex.


* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-11 20:13       ` Alex Bennée
@ 2023-01-12  6:40         ` Alexandre Courbot
  0 siblings, 0 replies; 97+ messages in thread
From: Alexandre Courbot @ 2023-01-12  6:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa

Hi Alex (are we having too many Alexes in this discussion? ^_^;)

On Thu, Jan 12, 2023 at 5:20 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Alexander Gordeev <alexander.gordeev@opensynergy.com> writes:
>
> > Hi Alexandre,
> >
> > On 27.12.22 08:31, Alexandre Courbot wrote:
> >> Hi Alexander,
> >>
> >>
> >> On Tue, Dec 20, 2022 at 1:59 AM Alexander Gordeev
> >> <alexander.gordeev@opensynergy.com> wrote:
> >>> Hello Alexandre,
> >>>
> >>> Thanks for the update. Please check my comments below.
> >>> I'm new to the virtio video spec development, so I may lack some
>>> historic perspective. I would appreciate being pointed to some
>>> older emails explaining decisions that I might not understand. I hope
>>> to read through all of them later. Overall I have a lot of experience in
>>> the video domain and in virtio video device development in Opsy, so I
>>> hope that my comments are relevant and useful.
> >> Cornelia provided links to the previous versions (thanks!). Through
> >> these revisions we tried different approaches, and the more we
> >> progress the closer we are getting to the V4L2 stateful
> >> decoder/encoder interface.
> <snip>
> >> This probably means a bit more work on the device side, since this
> >> spec is tailored for the specific video codec use-case and V4L2 is
> >> more generic, but also less spec to maintain and more confidence that
> >> things will work as we want in the real world. On the other hand, the
> >> device would also become simpler by the fact that responses to
> >> commands could not come out-of-order as they currently do. So at the
> >> end of the day I'm not even sure this would result in a more complex
> >> device.
> >
> > Sorry for the delay. I tried to gather data about how the spec has
> > evolved in the old emails.
> >
> > Well, on the one hand mimicking v4l2 looks like an easy solution from
> > virtio-video spec writing perspective. (But the implementers will have
> > to read the V4L2 API instead AFAIU, which is probably longer...)
> >
> > On the other hand v4l2 has a lot of history. It started as a camera API
> > and gained the codec support later, right? So it definitely has just too
> > much stuff irrelevant for codecs. Here we have an option to design from
> > scratch taking the best ideas from v4l2.
>
> We definitely don't want to bake Linux APIs into the VirtIO spec,
> which is meant to be host/guest agnostic, if we can help it.

Note that using V4L2 here wouldn't bind us to Linux on either the host
or guest, it would just provide definitions of structures and
processes to perform video tasks. You can think of it as the way FUSE
is used in virtio-fs.
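
To illustrate the transport idea only (none of these names come from an
existing spec; this is just a sketch of the FUSE-like wrapping):

#include <stdint.h>

/* Hypothetical virtio-v4l2 command: the driver would place one of
 * these in the command virtqueue, followed by the fixed-size V4L2
 * UAPI structure that the wrapped ioctl takes, all fields
 * little-endian. */
struct virtio_v4l2_cmd {
        uint32_t ioctl_code;  /* the VIDIOC_* request number */
        uint32_t session_id;  /* which open of the device this targets */
        uint32_t payload_len; /* size of the UAPI struct that follows */
        uint32_t padding;
};

/* Hypothetical response: a result code followed by the (possibly
 * updated) ioctl payload. */
struct virtio_v4l2_resp {
        int32_t  result;      /* 0 on success, negative error otherwise */
        uint32_t payload_len;
};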


* Re: [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-12  6:32       ` Alexandre Courbot
@ 2023-01-12 15:23         ` Cornelia Huck
  0 siblings, 0 replies; 97+ messages in thread
From: Cornelia Huck @ 2023-01-12 15:23 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

On Thu, Jan 12 2023, Alexandre Courbot <acourbot@chromium.org> wrote:

> Hi Cornelia,
>
> On Wed, Jan 11, 2023 at 5:45 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> >> > +
>> >> > +The device MUST set the \field{caps_length} field to a value equal to
>> >> > +the response size of VIRTIO\_VIDEO\_CMD\_DEVICE\_QUERY\_CAPS.
>> >>
>> >> Could the device also support a minimum response size that only supports
>> >> a subset of the caps to be returned? Otherwise, I think caps_length is
>> >> the maximum (or fixed?) length of the query caps response?
>> >
>> > I think this can be replaced by a fixed-size call for getting only one
>> > format at a time. The guest would have to make several of these in
>> > order to obtain the whole set of supported formats, but it would be
>> > easier to parse compared to the large result returned by QUERY_CAP and
>> > simpler overall.
>>
>> How would you implement this? Would the driver do the call repeatedly
>> until no more formats remain (requires the device to track state, and
>> needs a specification on what happens if the driver continues doing the
>> call?) Or would the driver pass in an index, and the device only needs
>> to check for out-of-range?
>
> One peculiarity of codecs is that the pixel formats available may
> depend on the coded format (and sometimes even its resolution). For
> instance, NV12 may be available if you do VP8 at 1080p but not 4K
> H.264.
>
> This means that a single call returning all the coded and pixel
> formats won't be able to convey all the possible subtleties. In
> previous versions of this spec we used bitmasks to associate pixel
> formats to their supported coded formats, but it's a bit of a pain to
> manage while still not bringing the necessary precision.
>
> So here again the safe way is to follow the V4L2 lead: get a list of
> supported bitstream formats (one at a time, using an index with a
> range defined by the configuration space), apply the one we are
> interested in, and query the supported pixel formats for that specific
> bitstream format using the same mechanism. That's more back-and-forth
> between the guest and the host, but it happens before streaming starts
> and better reflects the actual decoding workflow where the client is
> typically only interested in the supported formats for the codec they
> want to decode.

Thanks for the explanation; I agree that this is probably a good way to
handle a complex situation.
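
For concreteness, a minimal sketch of that flow as guest user-space
would drive it through the V4L2 UAPI (assuming a Linux guest, the
multi-planar stateful decoder interface, and omitting error handling):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void enumerate_decoder_formats(int fd)
{
        struct v4l2_fmtdesc desc;
        struct v4l2_format fmt;

        /* 1. Coded (bitstream) formats, one per index, on the OUTPUT queue. */
        for (unsigned int i = 0; ; i++) {
                memset(&desc, 0, sizeof(desc));
                desc.index = i;
                desc.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
                if (ioctl(fd, VIDIOC_ENUM_FMT, &desc) < 0)
                        break; /* index out of range: no more formats */
                /* desc.pixelformat is a fourcc, e.g. V4L2_PIX_FMT_H264 */
        }

        /* 2. Apply the coded format we are interested in. */
        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        fmt.fmt.pix_mp.pixelformat = V4L2_PIX_FMT_H264;
        ioctl(fd, VIDIOC_S_FMT, &fmt);

        /* 3. Pixel formats supported for that coded format, on CAPTURE. */
        for (unsigned int i = 0; ; i++) {
                memset(&desc, 0, sizeof(desc));
                desc.index = i;
                desc.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
                if (ioctl(fd, VIDIOC_ENUM_FMT, &desc) < 0)
                        break;
                /* e.g. V4L2_PIX_FMT_NV12 if available for H.264 here */
        }
}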



* [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-12  6:32   ` Alexandre Courbot
@ 2023-01-12 22:24     ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-01-12 22:24 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

Hi Alexandre,

On 12.01.23 07:32, Alexandre Courbot wrote:
> Hi Alexander,
>
> On Thu, Jan 12, 2023 at 2:04 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>> +\subsubsection{Bitstream formats}\label{sec:Device Types / Video Device / Supported formats / Bitstream formats}
>>> +
>>> +The fourcc code of each supported bitstream format is given, as well as
>>> +the unit of data requested in each input resource for the decoder, or
>>> +produced in each output resource for the encoder.
>>> +
>>> +\begin{description}
>>> +\item[\field{MPG2}]
>>> +MPEG2 encoded stream. One Access Unit per resource.
>>> +\item[\field{H264}]
>>> +H.264 encoded stream. One NAL unit per resource.
>>> +\item[\field{HEVC}]
>>> +HEVC encoded stream. One NAL unit per resource.
>>> +\item[\field{VP80}]
>>> +VP8 encoded stream. One frame per resource.
>>> +\item[\field{VP90}]
>>> +VP9 encoded stream. One frame per resource.
>>> +\end{description}
>>> +
>> Actually I'm not sure where these fourcc codes for bitstream formats
>> come from.
>>
>> In one of the old comments I found a reference to
>> https://www.rfc-editor.org/rfc/rfc2361
>>
>> But it doesn't define MPG2, HEVC, VP80, VP90.
>>
>> Also there is this comment to virtio-video v1 from Tomasz Figa and the
>> related discussion:
>> https://markmail.org/message/gc6h25acct22niut#query:+page:1+mid:et4l3ni7qjqhiygo+state:results
>>
>> He wrote that it is not worth it because there are so many conflicting
>> sets of fourcc codes.
>>
>> I can see that fourcc codes were not used in virtio-video spec draft
>> versions 1 to 5. So now it looks quite weird to see them here. Probably
>> this is because you'd like to mimic v4l2 interface more closely?
> These come from here:
> https://docs.kernel.org/userspace-api/media/v4l/pixfmt-compressed.html
>
> V4L2 has a nice list of supported fourccs for coded formats, so I'd
> just suggest we refer to them and use them instead of defining our
> own.

Oops, somehow I overlooked that. Thanks!
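
For reference, V4L2 builds these codes with the v4l2_fourcc() macro; a
minimal sketch of the mapping, assuming linux/videodev2.h:

#include <stdio.h>
#include <linux/videodev2.h>

int main(void)
{
        /* v4l2_fourcc() packs four characters into a little-endian
         * 32-bit value; the compressed-format defines are such codes. */
        printf("H264: %#010x == %#010x\n",
               v4l2_fourcc('H', '2', '6', '4'), V4L2_PIX_FMT_H264);
        printf("VP90: %#010x == %#010x\n",
               v4l2_fourcc('V', 'P', '9', '0'), V4L2_PIX_FMT_VP9);
        printf("HEVC: %#010x == %#010x\n",
               v4l2_fourcc('H', 'E', 'V', 'C'), V4L2_PIX_FMT_HEVC);
        return 0;
}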



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-12  6:39       ` Alexandre Courbot
@ 2023-01-18 23:06         ` Alexander Gordeev
  2023-02-06 14:12           ` Cornelia Huck
  2023-02-07  6:51           ` Alexandre Courbot
  0 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-01-18 23:06 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

Hi Alexandre,

On 12.01.23 07:39, Alexandre Courbot wrote:
> On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>> Hi Alexandre,
>>
>> On 27.12.22 08:31, Alexandre Courbot wrote:
>>> Hi Alexander,
>>>
>>>
>>> Cornelia provided links to the previous versions (thanks!). Through
>>> these revisions we tried different approaches, and the more we
>>> progress the closer we are getting to the V4L2 stateful
>>> decoder/encoder interface.
>>>
>>> This is actually the point where I would particularly be interested in
>>> having your feedback, since you probably have noticed the similarity.
>>> What would you think about just using virtio as a transport for V4L2
>>> ioctls (virtio-fs does something similar with FUSE), and having the
>>> host emulate a V4L2 decoder or encoder device in place of this (long)
>>> specification? I am personally starting to think this could be a
>>> better and faster way to get us to a point where both spec and guest
>>> drivers are merged. Moreover this would also open the way to support
>>> other kinds of V4L2 devices like simple cameras - we would just need
>>> to allocate new device IDs for these and would be good to go.
>>>
>>> This probably means a bit more work on the device side, since this
>>> spec is tailored for the specific video codec use-case and V4L2 is
>>> more generic, but also less spec to maintain and more confidence that
>>> things will work as we want in the real world. On the other hand, the
>>> device would also become simpler by the fact that responses to
>>> commands could not come out-of-order as they currently do. So at the
>>> end of the day I'm not even sure this would result in a more complex
>>> device.
>> Sorry for the delay. I tried to gather data about how the spec has
>> evolved in the old emails.
> It has been a bit all over the place as we tried different approaches,
> sorry about that. >_<

No worries, this is totally understandable.


>> Well, on the one hand mimicking v4l2 looks like an easy solution from
>> virtio-video spec writing perspective. (But the implementers will have
>> to read the V4L2 API instead AFAIU, which is probably longer...)
> It should not necessarily be much longer as the parts we are
> interested in have their own dedicated pages:
>
> https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
> https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
>
> Besides, the decoding and encoding processes are described with more
> precision, not that we couldn't do that here but it would make the
> spec grow longer than I am comfortable with...

I read the references carefully, thanks. I am somewhat familiar with the
stateful decoder API, but the stateless one still needs exploring.

There is one serious issue with these references IMO: they represent
guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
virtio-video driver API. Just to make sure we're on the same page:

guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
virtio-video protocol -> virtio-video device.

I believe this is how it is supposed to work, right? So I thought that
your intention was to simplify the virtio-video driver and virtio-video
protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But given
these references, I assume that you want to use the user-space <-> v4l2
subsystem API, right? Well, I think this cannot happen, and therefore
these references cannot be used directly, unless:

1. You suggest that virtio-video driver should not use v4l2 subsystem,
but should mimic its user-space API in every detail. Probably not a good
idea.

2. There is already a way to bypass the subsystem completely. I'm not
aware of that.

3. The user-space <-> v4l2 subsystem API is already the same as, or very
close to, the v4l2 subsystem <-> v4l2 driver API. I believe this is not
the case even with the stateful decoder/encoder, and even more so with
stateless decoders, because I can see that the v4l2 subsystem actually
stores some state in this case as well, which is quite reasonable I think.

So I think what we need to reference here is v4l2 subsystem <-> v4l2
driver API. Do you have this reference? Well, I know there is some
documentation, but still I doubt that. AFAIR kernel internal APIs are
never fixed. Right?

Besides that, I can see that this version of the spec indeed comes
closer to the v4l2 user-space API, but it doesn't borrow all the legacy. I
think this is fine. It is definitely more lightweight. I like that.

>> On the other hand v4l2 has a lot of history. It started as a camera API
>> and gained the codec support later, right? So it definitely has just too
>> much stuff irrelevant for codecs. Here we have an option to design from
>> scratch taking the best ideas from v4l2.
> That's also what we were thinking initially, but as we try to
> implement our new and optimized designs, we end up hitting a wall and
> redoing things like V4L2 did. There are small exceptions, like how
> STOP/RESET is implemented here which is slightly simpler than the V4L2
> equivalent, but I don't think these justify reinventing the remaining
> 95% quasi-identically.

I like this simplicity. If the user-space API can be reused, and this
makes things easier somehow, then maybe you're right. So I am looking
forward to your response to my comment above.


> V4L2 supports much more than video codecs, but if you want to
> implement a decoder you don't need to support anything more than what
> the decoder spec says you should. And that subset happens to map very
> well to the decoder use-case - it's not like the V4L2 folks tried to
> shoehorn codecs into something that is inadequate for them.

Actually I think that having only a single timestamp might be the result
of the evolution from an API for cameras, where you don't have this
tricky ordering stuff.


>> Also I have concerns about the virtio-video spec development. This seems
>> like a big change. It seems to me that after so many discussions and
>> versions of the spec, the process should be coming to something by now.
>> But this is still a moving target...
> I agree and apologize for the slow progress of this project, but let's
> not fall for the sunk cost fallacy if it turns out the
> V4L2-over-virtio solution fits the bill better and for less effort.

Ok. I just think that at this point this has to be very well justified.


>> There were arguments against adding camera support for security and
>> complexity reasons during discussions about virtio-video spec v1. Were
>> these concerns addressed somehow? Maybe I missed a followup discussion?
> The conclusion was that cameras should be their own specification as
> the virtio-video spec is too specialized for the codec use-case. There
> is actually an ongoing project for this:
>
> https://gitlab.collabora.com/collabora/virtio-camera
>
> ... which states in its README: "For now it is almost directly based
> on V4L2 Linux driver UAPI."
>
> That makes me think, if virtio-video is going to resemble V4L2
> closely, and virtio-camera ends up heading in the same direction, why
> don't we just embrace the underlying reality that we are reinventing
> V4L2?

As I wrote earlier I only welcome taking the best ideas from V4L2. And
as I wrote above AFAIU we can't reinvent and we're not reinventing V4L2
UAPI.


>>>>> +
>>>>> +Finally, for \field{struct virtio_video_resource_sg_list}:
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[\field{num_entries}]
>>>>> +is the number of \field{struct virtio_video_resource_sg_entry} instances
>>>>> +that follow.
>>>>> +\end{description}
>>>>> +
>>>>> +\field{struct virtio_video_resource_object} is defined as follows:
>>>>> +
>>>>> +\begin{lstlisting}
>>>>> +struct virtio_video_resource_object {
>>>>> +        u8 uuid[16];
>>>>> +};
>>>>> +\end{lstlisting}
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[uuid]
>>>>> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>>>> +\end{description}
>>>>> +
>>>>> +The device responds with
>>>>> +\field{struct virtio_video_resource_attach_backing_resp}:
>>>>> +
>>>>> +\begin{lstlisting}
>>>>> +struct virtio_video_resource_attach_backing_resp {
>>>>> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
>>>>> +};
>>>>> +\end{lstlisting}
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[\field{result}]
>>>>> +is
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
>>>>> +if the operation succeeded,
>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
>>>>> +if the mentioned stream does not exist,
>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
>>>>> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
>>>>> +invalid value,
>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
>>>>> +if the operation is performed at a time when it is non-valid.
>>>>> +\end{description}
>>>>> +\end{description}
>>>>> +
>>>>> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
>>>>> +the following times:
>>>>> +
>>>>> +\begin{itemize}
>>>>> +\item
>>>>> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
>>>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
>>>>> +  resource,
>>>>> +\item
>>>>> +  AFTER successfully changing the \field{virtio_video_params_resources}
>>>>> +  parameter corresponding to the queue and BEFORE
>>>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
>>>>> +\end{itemize}
>>>>> +
>>>>> +This is to ensure that the device can rely on the fact that a given
>>>>> +resource will always point to the same memory for as long as it may be
>>>>> +used by the video device. For instance, a decoder may use returned
>>>>> +decoded frames as reference for future frames and won't overwrite the
>>>>> +backing resource of a frame that is being referenced. It is only before
>>>>> +a stream is started and after a Dynamic Resolution Change event has
>>>>> +occurred that we can be sure that all resources won't be used in that
>>>>> +way.
>>>> The mentioned scenario about the referenced frames looks
>>>> somewhat reasonable, but I wonder how exactly that would work in practice.
>>> Basically the guest need to make sure the backing memory remains
>>> available and unwritten until the conditions mentioned above are met.
>>> Or is there anything unclear in this description?
>> Ok, I read the discussions about whether to allow the device to have
>> read access after responding to QUEUE or not. Since this comes from v4l2,
>> this should not be a problem, I think. I didn't know that v4l2
>> expects user-space to never write to CAPTURE buffers after they
>> are dequeued. I wonder if it is enforced in drivers.

Actually I have more on this. I'd prefer if there were a requirement for
the driver or the device to handle the case where a still-referenced frame
is queued again, so that it is less likely to be forgotten.


>>>>> +        le32 stream_id;
>>>>> +        le32 queue_type; /* VIRTIO_VIDEO_QUEUE_TYPE_* */
>>>>> +        le32 resource_id;
>>>>> +        le32 flags; /* Bitmask of VIRTIO_VIDEO_ENQUEUE_FLAG_* */
>>>>> +        u8 padding[4];
>>>>> +        le64 timestamp;
>>>>> +        le32 data_sizes[VIRTIO_VIDEO_MAX_PLANES];
>>>>> +};
>>>>> +\end{lstlisting}
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[\field{stream_id}]
>>>>> +is the ID of a valid stream.
>>>>> +\item[\field{queue_type}]
>>>>> +is the direction of the queue.
>>>>> +\item[\field{resource_id}]
>>>>> +is the ID of the resource to be queued.
>>>>> +\item[\field{flags}]
>>>>> +is a bitmask of VIRTIO\_VIDEO\_ENQUEUE\_FLAG\_* values.
>>>>> +
>>>>> +\begin{description}
>>>>> +\item[\field{VIRTIO_VIDEO_ENQUEUE_FLAG_FORCE_KEY_FRAME}]
>>>>> +The submitted frame is to be encoded as a key frame. Only valid for the
>>>>> +encoder's INPUT queue.
>>>>> +\end{description}
>>>>> +\item[\field{timestamp}]
>>>>> +is an abstract sequence counter that can be used on the INPUT queue for
>>>>> +synchronization. Resources produced on the output queue will carry the
>>>>> +\field{timestamp} of the input resource they have been produced from.
>>>> I think this is quite misleading. Implementers may assume that it is ok
>>>> to assume a 1-to-1 mapping between input and output buffers and no
>>>> reordering, right? But this is usually not the case:
>>>>
>>>> 1. In the end of the spec H.264 and HEVC are defined to always have a
>>>> single NAL unit per resource. Well, there are many types of NAL units,
>>>> that do not represent any video data. Like SEI NAL units or delimiters.
>>>>
>>>> 2. We may assume that the SEI and delimiter units are filtered before
>>>> queuing, but there is still other codec-specific data that can't be
>>>> filtered, like SPS and PPS NAL units. There has to be some special handling.
>>>>
>>>> 3. All of this means more codec-specific code in the driver or client
>>>> applications.
>>>>
>>>> 4. This spec says that the device may skip to a next key frame after a
>>>> seek. So the driver has to account for this too.
>>>>
>>>> 5. For example, in H.264 a single key frame may be coded by several NAL
>>>> units. In fact all VCL NAL units are called slices because of this. What
>>>> happens when the decoder sees several NAL units with different
>>>> timestamps coding the same output frame? Which timestamp will it choose?
>>>> I'm not sure it is defined anywhere. Probably it will just take the
>>>> first timestamp. The driver/client applications have to be ready for
>>>> this too.
>>>>
>>>> 6. I saw almost the same scenario with CSD units too. Imagine SPS with
>>>> timestamp 0, then PPS with 1, and then an IDR with 2. These three might
>>>> be combined in a single input buffer together by the vendor-provided
>>>> decoding software. Then the timestamp of the resulting frame is
>>>> naturally 0. But the driver/client application already doesn't expect to
>>>> get any response with timestamps 0 and 1, because they are known to
>>>> belong to CSD. And it expects an output buffer with ts 2. So there
>>>> will be a problem. (This is a real world example actually.)
>>>>
>>>> 7. Then there is H.264 High profile, for example. It has different
>>>> decoding and presentation order because frames may depend on future
>>>> frames. I think all the modern codecs have a mode like this. The input
>>>> frames are usually provided in the decoding order. Should the output
>>>> frames timestamps just be copied from input frames, they have been
>>>> produced from as this paragraph above says? This resembles decoder order
>>>> then. Well, this can work, if the container has correct DTS and PTS, and
>>>> the client software creates a mapping between these timestamps and the
>>>> virtio video timestamp. But this is not always the case. For example,
>>>> simple H.264 bitstream doesn't have any timestamps. And still it can be
>>>> easily played by ffmpeg/gstreamer/VLC/etc. There is no way to make this
>>>> work with a decoder following this spec, I think.
>>>>
>>>> My suggestion is to not think about the timestamp as an abstract
>>>> counter, but give some freedom to the device by providing the available
>>>> information from the container, be it DTS, PTS or only FPS (through
>>>> PARAMS). Also the input and output queues should indeed be completely
>>>> separated. There should be no assumption of a 1-to-1 mapping of buffers.
>>> The beginning of the "Device Operation" section tries to make it clear
>>> that the input and output queues are operating independently and that
>>> no mapping or ordering should be expected by the driver, but maybe
>>> this is worth repeating here.
>>>
>>> Regarding the use of timestamp, a sensible use would indeed be for the
>>> driver to set it to some meaningful information retrieved from the
>>> container (which the driver would itself obtain from user-space),
>>> probably the PTS if that is available. In the case of H.264 non-VCL
>>> NAL units would not produce any output, so their timestamp would
>>> effectively be ignored. For frames that are made of several slices,
>>> the first timestamp should be the one propagated to the output frame.
>>> (and this here is why I prefer VP8/VP9 ^_^;)
>> Did they manage to avoid the same thing with VP9 SVC? :)
>>
>> The phrase "Resources produced on the output queue will carry the
>> \field{timestamp} of the input resource they have been produced from."
>> still sounds misleading to me. It doesn't cover all these cases where
>> there is no 1-to-1 mapping. Also, what if there are timestamps for some of the
>> frames, but not for all?
> This shouldn't matter - a timestamp of 0 is still a timestamp and will
> be carried over to the corresponding frames.

I really like the table of all possible cases from the reference that
you provided above:
https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html#decoding

I'd prefer to have it here as well. Maybe it could be shorter.


>>> In fact most users probably won't care about this field. In the worst
>>> case, even if no timestamp is available, operation can still be done
>>> reliably since decoded frames are made available in presentation
>>> order. This fact was not obvious in the spec, so I have added a
>>> sentence in the "Device Operation" section to clarify.
>>>
>>> I hope this answers your concerns, but please let me know if I didn't
>>> address something in particular.
>> Indeed the order of output frames was not obvious from the spec. I think
>> there might be use cases where you want the decoded frames as early as
>> possible. Like when you have to transmit the frames over some (slow)
>> medium. If the decoder outputs in presentation order, the frames might
>> come out in batches. This is not good for latency then. WDYT?
> Who would be in charge of reordering then? If that burden falls to the
> guest user-space, then it probably wants to use a stateless API.
> That's not something covered by this spec (and covering it would
> require adding many more per-codec structures for SPS/PPS, VP8
> headers, etc.), but can be supported with V4L2 FWIW. Supporting this
> API however would add dozens more pages just to document the
> codec-specific structures necessary to decode a frame. See for
> instance what would be needed for H.264:
> https://www.kernel.org/doc/html/v5.5/media/uapi/v4l/ext-ctrls-codec.html#c.v4l2_ctrl_h264_sps.
>
> Or a client that *really* wants decoding order for latency reasons
> could hack the stream a bit to change the presentation order and
> perform QUEUE/DRAIN sequences for each frame. That would not be more
> complex than supporting a stateless API anyway.

I agree to some extent. It is just that I see this as a limitation of
stateful V4L2 UAPI. If we consider virtio-video V4L2 driver as the only
virtio-video driver implementation out there, then yes, we can safely
bake this limitation in. If V4L2 APIs change, the spec can be updated
too. If other implementations are expected and they need the decoding
order, then this might be a problem. Reordering could be done in the
virtio video V4L2 driver. Anyway for me this is a low priority issue for
now.


>>>>> +\item[\field{YU12}]
>>>>> +one Y plane followed by one Cb plane, followed by one Cr plane, in a
>>>>> +single buffer. 4:2:0 subsampling.
>>>>> +\item[\field{YM12}]
>>>>> +same as \field{YU12} but using three separate buffers for the Y, U and V
>>>>> +planes.
>>>>> +\end{description}
>>>> This looks like V4L2 formats. Maybe add a V4L2 reference? At least the
>>>> V4L2 documentation has a nice description of exact plane layouts.
>>>> Otherwise it would be nice to have these layouts in the spec IMO.
>>> I've linked to the relevant V4L2 pages, indeed they describe the
>>> formats and layouts much better.
>>>
>>> Thanks for all the feedback. We can continue on this basis, or I can
>>> try to build a small prototype of that V4L2-over-virtio idea if you
>>> agree this looks like a good idea. The guest driver would mostly be
>>> forwarding the V4L2 ioctls as-is to the host, it would be interesting
>>> to see how small we can make it with this design.
>> Let's discuss the idea.
> Let me try to summarize the case for using V4L2 over Virtio (I'll call
> it virtio-v4l2 to differentiate it from the current spec).
>
> There is the argument that virtio-video turns out to be a recreation
> of the stateful V4L2 decoder API, which itself works similarly to
> other high-level decoder APIs. So it's not like we could or should
> come with something very different. In parallel, virtio-camera is also
> currently using V4L2 as its model. While this is subject to change, I
> am starting to see a pattern here. :)
>
> Transporting V4L2 over virtio would considerably shorten the length of
> this spec, as we would just need to care about the transport aspect
> and minor amendments to the meaning of some V4L2 structure members,
> and leave the rest to V4L2 which is properly documented and for which
> there is a large collection of working examples.
>
> This would work very well for codec devices, but as a side-effect
> would also enable other kinds of devices that may be useful to
> virtualize, like image processors, DVB cards, and cameras. This
> doesn't mean virtio-v4l2 should be the *only* way to support cameras
> over virtio. It is a nice bonus of encapsulating V4L2, it may be
> sufficient for simple (most?) use-cases, but also doesn't forbid more
> specialized virtual devices for complex camera pipelines to be added
> later. virtio-v4l2 would just be the generic virtual video device that
> happens to be sufficient for our accelerated video needs - and if your
> host camera is a USB UVC one, well feel free to use that too.
>
> In other words, I see an opportunity to enable a whole class of
> devices instead of a single type for the same effort and think we
> should seriously consider this.
>
> I have started to put down what a virtio-v4l2 transport might look
> like, and am also planning on putting together a small
> proof-of-concept. If I can get folks here to warm up to the idea, I
> believe we should be able to share a spec and prototype in a month or
> so.

Thanks for the detailed explanation. Please check my comments above. I'd
like to resolve the mentioned issue first.



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-18 23:06         ` Alexander Gordeev
@ 2023-02-06 14:12           ` Cornelia Huck
  2023-02-07  6:16             ` Alexandre Courbot
  2023-02-07 11:11             ` Alexander Gordeev
  2023-02-07  6:51           ` Alexandre Courbot
  1 sibling, 2 replies; 97+ messages in thread
From: Cornelia Huck @ 2023-02-06 14:12 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

On Thu, Jan 19 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

> Hi Alexandre,
>
> On 12.01.23 07:39, Alexandre Courbot wrote:
>> On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
>> <alexander.gordeev@opensynergy.com> wrote:

>>> Well, on the one hand mimicking v4l2 looks like an easy solution from
>>> virtio-video spec writing perspective. (But the implementers will have
>>> to read the V4L2 API instead AFAIU, which is probably longer...)
>> It should not necessarily be much longer as the parts we are
>> interested in have their own dedicated pages:
>>
>> https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
>> https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
>>
>> Besides, the decoding and encoding processes are described with more
>> precision, not that we couldn't do that here but it would make the
>> spec grow longer than I am comfortable with...
>
> I read the references carefully, thanks. I am somewhat familiar with the
> stateful decoder API, but the stateless one still needs exploring.
>
> There is one serious issue with these references IMO: they represent
> guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
> virtio-video driver API. Just to make sure we're on the same page:
>
> guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
> virtio-video protocol -> virtio-video device.
>
> I believe this is how it is supposed to work, right? So I thought, that
> your intention is to simplify virtio-video driver and virtio-video
> protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But having
> these references I can assume, that you want to use user-space <-> v4l2
> subsystem API, right? Well, I think this cannot happen and therefore
> these references cannot be used directly unless:
>
> 1. You suggest that virtio-video driver should not use v4l2 subsystem,
> but should mimic its user-space API in every detail. Probably not a good
> idea.
>
> 2. There is already a way to bypass the subsystem completely. I'm not
> aware of that.
>
> 3. user-space <-> v4l2 subsystem API is already the same or very close
> to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
> even with stateful decoder/encoder. Even more with stateless decoders
> because I can see, that v4l2 subsystem actually stores some state in
> this case as well. Which is quite reasonable I think.
>
> So I think what we need to reference here is v4l2 subsystem <-> v4l2
> driver API. Do you have this reference? Well, I know there is some
> documentation, but still I doubt that. AFAIR kernel internal APIs are
> never fixed. Right?

So, I'm not that familiar with v4l2, but if that's indeed the case,
depending on some kernel internal APIs is a no-go. First, because
in-kernel APIs are not stable, and second, because we want something
that's BSD-licenced (as opposed to GPLv2-licenced) to point to. The
kernel<->userspace API would work (BSD-licenced header and stable); I
had the impression that we wanted to reuse the various #defines in
there -- did I misunderstand?

(...)

>> Let me try to summarize the case for using V4L2 over Virtio (I'll call
>> it virtio-v4l2 to differentiate it from the current spec).
>>
>> There is the argument that virtio-video turns out to be a recreation
>> of the stateful V4L2 decoder API, which itself works similarly to
>> other high-level decoder APIs. So it's not like we could or should
>> come with something very different. In parallel, virtio-camera is also
>> currently using V4L2 as its model. While this is subject to change, I
>> am starting to see a pattern here. :)
>>
>> Transporting V4L2 over virtio would considerably shorten the length of
>> this spec, as we would just need to care about the transport aspect
>> and minor amendments to the meaning of some V4L2 structure members,
>> and leave the rest to V4L2 which is properly documented and for which
>> there is a large collection of working examples.
>>
>> This would work very well for codec devices, but as a side-effect
>> would also enable other kinds of devices that may be useful to
>> virtualize, like image processors, DVB cards, and cameras. This
>> doesn't mean virtio-v4l2 should be the *only* way to support cameras
>> over virtio. It is a nice bonus of encapsulating V4L2, it may be
>> sufficient for simple (most?) use-cases, but also doesn't forbid more
>> specialized virtual devices for complex camera pipelines to be added
>> later. virtio-v4l2 would just be the generic virtual video device that
>> happens to be sufficient for our accelerated video needs - and if your
>> host camera is a USB UVC one, well feel free to use that too.
>>
>> In other words, I see an opportunity to enable a whole class of
>> devices instead of a single type for the same effort and think we
>> should seriously consider this.
>>
>> I have started to put down what a virtio-v4l2 transport might look
>> like, and am also planning on putting together a small
>> proof-of-concept. If I can get folks here to warm up to the idea, I
>> believe we should be able to share a spec and prototype in a month or
>> so.
>
> Thanks for the detailed explanation. Please check my comments above. I'd
> like to resolve the mentioned issue first.

I hope we can sort this out soon -- I guess I'm not the only one who is
anxious about this spec moving forward :) Please let me know if I can
help in any way.



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-02-06 14:12           ` Cornelia Huck
@ 2023-02-07  6:16             ` Alexandre Courbot
  2023-02-07 13:59               ` Cornelia Huck
  2023-02-07 11:11             ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-02-07  6:16 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida

On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Thu, Jan 19 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>
> > Hi Alexandre,
> >
> > On 12.01.23 07:39, Alexandre Courbot wrote:
> >> On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
> >> <alexander.gordeev@opensynergy.com> wrote:
>
> >>> Well, on the one hand mimicking v4l2 looks like an easy solution from
> >>> virtio-video spec writing perspective. (But the implementers will have
> >>> to read the V4L2 API instead AFAIU, which is probably longer...)
> >> It should not necessarily be much longer as the parts we are
> >> interested in have their own dedicated pages:
> >>
> >> https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
> >> https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
> >>
> >> Besides, the decoding and encoding processes are described with more
> >> precision, not that we couldn't do that here but it would make the
> >> spec grow longer than I am comfortable with...
> >
> > I read the references carefully, thanks. I am somewhat familiar with the
> > stateful decoder API, but the stateless one still needs exploring.
> >
> > There is one serious issue with these references IMO: they represent
> > guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
> > virtio-video driver API. Just to make sure we're on the same page:
> >
> > guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
> > virtio-video protocol -> virtio-video device.
> >
> > I believe this is how it is supposed to work, right? So I thought, that
> > your intention is to simplify virtio-video driver and virtio-video
> > protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But having
> > these references I can assume, that you want to use user-space <-> v4l2
> > subsystem API, right? Well, I think this cannot happen and therefore
> > these references cannot be used directly unless:
> >
> > 1. You suggest that virtio-video driver should not use v4l2 subsystem,
> > but should mimic its user-space API in every detail. Probably not a good
> > idea.
> >
> > 2. There is already a way to bypass the subsystem completely. I'm not
> > aware of that.
> >
> > 3. user-space <-> v4l2 subsystem API is already the same or very close
> > to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
> > even with stateful decoder/encoder. Even more with stateless decoders
> > because I can see, that v4l2 subsystem actually stores some state in
> > this case as well. Which is quite reasonable I think.
> >
> > So I think what we need to reference here is v4l2 subsystem <-> v4l2
> > driver API. Do you have this reference? Well, I know there is some
> > documentation, but still I doubt that. AFAIR kernel internal APIs are
> > never fixed. Right?
>
> So, I'm not that familiar with v4l2, but if that's indeed the case,
> depending on some kernel internal APIs is a no-go. First, because
> in-kernel APIs are not stable, and second, because we want something
> that's BSD-licenced (as opposed to GPLv2-licenced) to point to. The
> kernel<->userspace API would work (BSD-licenced header and stable); I
> had the impression that we wanted to reuse the various #defines in
> there -- did I misunderstand?

Sorry, I should have replied earlier to lift any misunderstanding. I
am not suggesting to use any kernel internal API as reference. My
suggestion is to stick strictly to the UAPI which is stable (as in,
guaranteed to be backward-compatible) and well documented. Here is for
instance the part documenting buffer queuing/dequeuing:
https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/vidioc-qbuf.html

Changing anything in this documentation in a backward-incompatible way
would break user-space, and everyone familiar with the kernel
community knows what happens when someone breaks user-space. ;) So
this can be a reliable source for virtio-video (and if you look
closely, you will also notice many similarities between the two).
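
As a quick illustration of what that page pins down, a minimal sketch of
the queue/dequeue cycle from guest user-space (multi-planar API, MMAP
memory, error handling omitted):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void queue_and_dequeue(int fd, unsigned int index)
{
        struct v4l2_plane planes[VIDEO_MAX_PLANES];
        struct v4l2_buffer buf;

        memset(&buf, 0, sizeof(buf));
        memset(planes, 0, sizeof(planes));
        buf.index = index;
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.m.planes = planes;
        buf.length = VIDEO_MAX_PLANES;

        ioctl(fd, VIDIOC_QBUF, &buf);  /* hand the buffer to the device */
        ioctl(fd, VIDIOC_DQBUF, &buf); /* returns once a buffer is ready */
        /* For a stateful decoder, buf.timestamp now carries the value
         * copied from the input buffer this frame was produced from. */
}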

>
> (...)
>
> >> Let me try to summarize the case for using V4L2 over Virtio (I'll call
> >> it virtio-v4l2 to differentiate it from the current spec).
> >>
> >> There is the argument that virtio-video turns out to be a recreation
> >> of the stateful V4L2 decoder API, which itself works similarly to
> >> other high-level decoder APIs. So it's not like we could or should
> >> come with something very different. In parallel, virtio-camera is also
> >> currently using V4L2 as its model. While this is subject to change, I
> >> am starting to see a pattern here. :)
> >>
> >> Transporting V4L2 over virtio would considerably shorten the length of
> >> this spec, as we would just need to care about the transport aspect
> >> and minor amendments to the meaning of some V4L2 structure members,
> >> and leave the rest to V4L2 which is properly documented and for which
> >> there is a large collection of working examples.
> >>
> >> This would work very well for codec devices, but as a side-effect
> >> would also enable other kinds of devices that may be useful to
> >> virtualize, like image processors, DVB cards, and cameras. This
> >> doesn't mean virtio-v4l2 should be the *only* way to support cameras
> >> over virtio. It is a nice bonus of encapsulating V4L2, it may be
> >> sufficient for simple (most?) use-cases, but also doesn't forbid more
> >> specialized virtual devices for complex camera pipelines to be added
> >> later. virtio-v4l2 would just be the generic virtual video device that
> >> happens to be sufficient for our accelerated video needs - and if your
> >> host camera is a USB UVC one, well feel free to use that too.
> >>
> >> In other words, I see an opportunity to enable a whole class of
> >> devices instead of a single type for the same effort and think we
> >> should seriously consider this.
> >>
> >> I have started to put down what a virtio-v4l2 transport might look
> >> like, and am also planning on putting together a small
> >> proof-of-concept. If I can get folks here to warm up to the idea, I
> >> believe we should be able to share a spec and prototype in a month or
> >> so.
> >
> > Thanks for the detailed explanation. Please check my comments above. I'd
> > like to resolve the mentioned issue first.
>
> I hope we can sort this out soon -- I guess I'm not the only one who is
> anxious about this spec moving forward :) Please let me know if I can
> help in any way.

I'll try to address Alexander's points in more detail, but I am not
seeing any blocking issue with using the V4L2 UAPI as the basis for
virtio-video (we are working on a small proof-of-concept and things
are going smoothly so far).

Cheers,
Alex.



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-01-18 23:06         ` Alexander Gordeev
  2023-02-06 14:12           ` Cornelia Huck
@ 2023-02-07  6:51           ` Alexandre Courbot
  2023-02-07 10:57             ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-02-07  6:51 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

On Thu, Jan 19, 2023 at 8:06 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> Hi Alexandre,
>
> On 12.01.23 07:39, Alexandre Courbot wrote:
> > On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >> Hi Alexandre,
> >>
> >> On 27.12.22 08:31, Alexandre Courbot wrote:
> >>> Hi Alexander,
> >>>
> >>>
> >>> Cornelia provided links to the previous versions (thanks!). Through
> >>> these revisions we tried different approaches, and the more we
> >>> progress the closer we are getting to the V4L2 stateful
> >>> decoder/encoder interface.
> >>>
> >>> This is actually the point where I would particularly be interested in
> >>> having your feedback, since you probably have noticed the similarity.
> >>> What would you think about just using virtio as a transport for V4L2
> >>> ioctls (virtio-fs does something similar with FUSE), and having the
> >>> host emulate a V4L2 decoder or encoder device in place of this (long)
> >>> specification? I am personally starting to think this could be a
> >>> better and faster way to get us to a point where both spec and guest
> >>> drivers are merged. Moreover this would also open the way to support
> >>> other kinds of V4L2 devices like simple cameras - we would just need
> >>> to allocate new device IDs for these and would be good to go.
> >>>
> >>> This probably means a bit more work on the device side, since this
> >>> spec is tailored for the specific video codec use-case and V4L2 is
> >>> more generic, but also less spec to maintain and more confidence that
> >>> things will work as we want in the real world. On the other hand, the
> >>> device would also become simpler by the fact that responses to
> >>> commands could not come out-of-order as they currently do. So at the
> >>> end of the day I'm not even sure this would result in a more complex
> >>> device.
> >> Sorry for the delay. I tried to gather data about how the spec has
> >> evolved in the old emails.
> > It has been a bit all over the place as we tried different approaches,
> > sorry about that. >_<
>
> No worries, this is totally understandable.
>
>
> >> Well, on the one hand mimicking v4l2 looks like an easy solution from
> >> virtio-video spec writing perspective. (But the implementers will have
> >> to read the V4L2 API instead AFAIU, which is probably longer...)
> > It should not necessarily be much longer as the parts we are
> > interested in have their own dedicated pages:
> >
> > https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
> > https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
> >
> > Besides, the decoding and encoding processes are described with more
> > precision, not that we couldn't do that here but it would make the
> > spec grow longer than I am comfortable with...
>
> I read the references carefully, thanks. I am somewhat familiar with the
> stateful decoder API, but the stateless one still needs exploring.

Note that the stateful API is the one covering the current
virtio-video spec, and the one I would recommend to implement a
virtual video device. A stateless device could be implemented by
people who really need it, but I personally don't see a huge benefit
in doing so except maybe in some corner cases.

>
> There is one serious issue with these references IMO: they represent
> guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
> virtio-video driver API. Just to make sure we're on the same page:
>
> guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
> virtio-video protocol -> virtio-video device.
>
> I believe this is how it is supposed to work, right? So I thought, that
> your intention is to simplify virtio-video driver and virtio-video
> protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But having
> these references I can assume, that you want to use user-space <-> v4l2
> subsystem API, right? Well, I think this cannot happen and therefore
> these references cannot be used directly unless:
>
> 1. You suggest that virtio-video driver should not use v4l2 subsystem,
> but should mimic its user-space API in every detail. Probably not a good
> idea.

Can you elaborate on why you think this would be a bad idea? V4L2 has
some legacy stuff that we definitely will not want to add to the spec
(hence I don't think it should mimic it "in every detail"), but for
the most part the UAPI works very similarly to the current
virtio-video specification.

> 2. There is already a way to bypass the subsystem completely. I'm not
> aware of that.

A V4L2 driver needs to implement v4l2_ioctl_ops, which provides it
with the UAPI structures as submitted by user-space. Most drivers then
call V4L2 subsystem functions from these hooks, but it is perfectly ok
to not do so and thus bypass the subsystem completely.
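
A rough sketch of that proxying idea (virtio_v4l2_forward() is a
hypothetical helper standing in for "send the structure to the device
over a virtqueue and copy the answer back"; it is not an existing API):

#include <linux/videodev2.h>
#include <media/v4l2-ioctl.h>

static int virtio_v4l2_forward(struct file *file, unsigned int cmd,
                               void *arg, size_t size);

static int virtio_v4l2_enum_fmt_vid_cap(struct file *file, void *fh,
                                        struct v4l2_fmtdesc *f)
{
        return virtio_v4l2_forward(file, VIDIOC_ENUM_FMT, f, sizeof(*f));
}

static int virtio_v4l2_qbuf(struct file *file, void *fh,
                            struct v4l2_buffer *b)
{
        return virtio_v4l2_forward(file, VIDIOC_QBUF, b, sizeof(*b));
}

static const struct v4l2_ioctl_ops virtio_v4l2_ioctl_ops = {
        .vidioc_enum_fmt_vid_cap = virtio_v4l2_enum_fmt_vid_cap,
        .vidioc_qbuf             = virtio_v4l2_qbuf,
        /* ...every other supported ioctl forwards the same way... */
};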

If we find out that there is a benefit in going through the V4L2
subsystem (which I cannot see for now), rebuilding the UAPI structures
to communicate with the device is not different from building
virtio-video specific structures like what we are currently doing.

> 3. user-space <-> v4l2 subsystem API is already the same or very close
> to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
> even with stateful decoder/encoder. Even more with stateless decoders
> because I can see, that v4l2 subsystem actually stores some state in
> this case as well. Which is quite reasonable I think.

Actually I don't think this is even something we need to think about -
in its simplest form the V4L2 guest driver just needs to act as a
proxy for the device. So which decoder API is used by the host is
completely irrelevant to the guest driver - it can support a decoder,
an encoder, or a camera - it doesn't even need to be aware of what
kind of device it is exposing and that simplicity is another thing
that I like about this design.

This simplicity goes away if the guest device does not use V4L2 as its
user-space interface (e.g. Windows?). In this case we would be in the
exact same scenario as the current virtio-video spec, where we need to
build device-specific structures from the guest driver's internal
state.

> So I think what we need to reference here is v4l2 subsystem <-> v4l2
> driver API. Do you have this reference? Well, I know there is some
> documentation, but still I doubt that. AFAIR kernel internal APIs are
> never fixed. Right?

Correct, but since they are not being considered for the spec this
should not matter, thankfully.

> Besides that I can see, that this version of the spec indeed comes
> closer to v4l2 user-space API, but it doesn't borrow all the legacy. I
> think, this is fine. It is definitely more lightweight. I like that.

We definitely don't want to borrow all the legacy. Actually most
modern V4L2 drivers also do not support the oldest V4L2 features, so
it is perfectly fine if the virtio-video guest driver does the same.
We will just need to mention explicitly in the spec which parts of
V4L2 are out-of-scope.

> >> On the other hand v4l2 has a lot of history. It started as a camera API
> >> and gained the codec support later, right? So it definitely has just too
> >> much stuff irrelevant for codecs. Here we have an option to design from
> >> scratch taking the best ideas from v4l2.
> > That's also what we were thinking initially, but as we try to
> > implement our new and optimized designs, we end up hitting a wall and
> > redoing things like V4L2 did. There are small exceptions, like how
> > STOP/RESET is implemented here which is slightly simpler than the V4L2
> > equivalent, but I don't think these justify reinventing the remaining
> > 95% quasi-identically.
>
> I like this simplicity. If the user-space API can be reused, and this
> makes things easier somehow, then maybe you're right. So looking forward
> to your response to my comment above.

I believe the user-space API can be reused and hope to demonstrate it
with a small PoC soon. But if you see something that would make this
unworkable, now is a good time to elaborate. :)

> > V4L2 supports much more than video codecs, but if you want to
> > implement a decoder you don't need to support anything more than what
> > the decoder spec says you should. And that subset happens to map very
> > well to the decoder use-case - it's not like the V4L2 folks tried to
> > shoehorn codecs into something that is inadequate for them.
>
> Actually I think that having only a single timestamp might be the result
> of the evolution from an API for cameras, where you don't have this
> tricky ordering stuff.

IIUC frames from camera devices do get a real-time timestamp affixed
to them though. What is new with codec devices is that the timestamp
is copied from an input buffer to the corresponding output buffers.
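
To illustrate from the user-space point of view (just a sketch, not
taken from the spec - format setup, REQBUFS/STREAMON, payload filling
and error handling are all omitted, and fd is assumed to be an already
configured stateful decoder node, using the plain <linux/videodev2.h>
UAPI):

        struct v4l2_buffer out = {
                .type = V4L2_BUF_TYPE_VIDEO_OUTPUT,
                .memory = V4L2_MEMORY_MMAP,
                .index = 0,
                /* Opaque tag for this bitstream chunk; the decoder does
                 * not interpret it, it only copies it. */
                .timestamp = { .tv_usec = 42 },
                /* .bytesused and the rest of the setup omitted here. */
        };
        ioctl(fd, VIDIOC_QBUF, &out);

        struct v4l2_buffer cap = {
                .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
                .memory = V4L2_MEMORY_MMAP,
        };
        ioctl(fd, VIDIOC_DQBUF, &cap);
        /* cap.timestamp now matches the timestamp of the OUTPUT buffer
         * this frame was decoded from, even if frames come back in
         * display order rather than in submission order. */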

> >> Also I have concerns about the virtio-video spec development. This seems
> >> like a big change. It seems to me that after so many discussions and
> >> versions of the spec, the process should be coming to something by now.
> >> But this is still a moving target...
> > I agree and apologize for the slow progress of this project, but let's
> > not fall for the sunk cost fallacy if it turns out the
> > V4L2-over-virtio solution fits the bill better and for less effort.
>
> Ok. I just think that at this point this has to be very well justified.
>
>
> >> There were arguments against adding camera support for security and
> >> complexity reasons during discussions about virtio-video spec v1. Were
> >> these concerns addressed somehow? Maybe I missed a followup discussion?
> > The conclusion was that cameras should be their own specification as
> > the virtio-video spec is too specialized for the codec use-case. There
> > is actually an ongoing project for this:
> >
> > https://gitlab.collabora.com/collabora/virtio-camera
> >
> > ... which states in its README: "For now it is almost directly based
> > on V4L2 Linux driver UAPI."
> >
> > That makes me think, if virtio-video is going to resemble V4L2
> > closely, and virtio-camera ends up heading in the same direction, why
> > don't we just embrace the underlying reality that we are reinventing
> > V4L2?
>
> As I wrote earlier I only welcome taking the best ideas from V4L2. And
> as I wrote above AFAIU we can't reinvent and we're not reinventing V4L2
> UAPI.
>
>
> >>>>> +
> >>>>> +Finally, for \field{struct virtio_video_resource_sg_list}:
> >>>>> +
> >>>>> +\begin{description}
> >>>>> +\item[\field{num_entries}]
> >>>>> +is the number of \field{struct virtio_video_resource_sg_entry} instances
> >>>>> +that follow.
> >>>>> +\end{description}
> >>>>> +
> >>>>> +\field{struct virtio_video_resource_object} is defined as follows:
> >>>>> +
> >>>>> +\begin{lstlisting}
> >>>>> +struct virtio_video_resource_object {
> >>>>> +        u8 uuid[16];
> >>>>> +};
> >>>>> +\end{lstlisting}
> >>>>> +
> >>>>> +\begin{description}
> >>>>> +\item[uuid]
> >>>>> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> >>>>> +\end{description}
> >>>>> +
> >>>>> +The device responds with
> >>>>> +\field{struct virtio_video_resource_attach_backing_resp}:
> >>>>> +
> >>>>> +\begin{lstlisting}
> >>>>> +struct virtio_video_resource_attach_backing_resp {
> >>>>> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
> >>>>> +};
> >>>>> +\end{lstlisting}
> >>>>> +
> >>>>> +\begin{description}
> >>>>> +\item[\field{result}]
> >>>>> +is
> >>>>> +
> >>>>> +\begin{description}
> >>>>> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
> >>>>> +if the operation succeeded,
> >>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
> >>>>> +if the mentioned stream does not exist,
> >>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
> >>>>> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
> >>>>> +invalid value,
> >>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
> >>>>> +if the operation is performed at a time when it is non-valid.
> >>>>> +\end{description}
> >>>>> +\end{description}
> >>>>> +
> >>>>> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
> >>>>> +the following times:
> >>>>> +
> >>>>> +\begin{itemize}
> >>>>> +\item
> >>>>> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
> >>>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
> >>>>> +  resource,
> >>>>> +\item
> >>>>> +  AFTER successfully changing the \field{virtio_video_params_resources}
> >>>>> +  parameter corresponding to the queue and BEFORE
> >>>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
> >>>>> +\end{itemize}
> >>>>> +
> >>>>> +This is to ensure that the device can rely on the fact that a given
> >>>>> +resource will always point to the same memory for as long as it may be
> >>>>> +used by the video device. For instance, a decoder may use returned
> >>>>> +decoded frames as reference for future frames and won't overwrite the
> >>>>> +backing resource of a frame that is being referenced. It is only before
> >>>>> +a stream is started and after a Dynamic Resolution Change event has
> >>>>> +occurred that we can be sure that all resources won't be used in that
> >>>>> +way.
> >>>> The mentioned scenario about the referenced frames looks
> >>>> somewhat reasonable, but I wonder how exactly that would work in practice.
> >>> Basically the guest needs to make sure the backing memory remains
> >>> available and unwritten until the conditions mentioned above are met.
> >>> Or is there anything unclear in this description?
> >> Ok, I read the discussions about whether to allow the device to have
> >> read access after response to QUEUE or not. Since this comes from v4l2,
> >> then this should not be a problem, I think. I didn't know that v4l2
> >> expects the user-space to never write to CAPTURE buffers after they
> >> dequeued. I wonder if it is enforced in drivers.
>
> Actually I have more on this. I'd prefer if there is a requirement for
> the driver or the device to handle a case when a still referenced frame
> is queued again. So that it is less likely to be forgotten.

With the stateful API, the driver/device is in charge of making sure
submitted reference frames are not overwritten as long as there is a
reference to them. I.e. the driver or device firmware will reorder the
frames to avoid that, so there is indeed the requirement you describe.

With the stateless API, user-space is in charge of reference frames
reordering and the driver is only an executor - if user-space does
something damaging to itself, it will get corrupted frames.

I hope this clarifies things a bit - let's go to the bottom of any
remaining concern to make sure we make the right call here.

Cheers,
Alex.




* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-02-07  6:51           ` Alexandre Courbot
@ 2023-02-07 10:57             ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-02-07 10:57 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

On 07.02.23 07:51, Alexandre Courbot wrote:
> On Thu, Jan 19, 2023 at 8:06 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>>> Well, on the one hand mimicking v4l2 looks like an easy solution from
>>>> virtio-video spec writing perspective. (But the implementers will have
>>>> to read the V4L2 API instead AFAIU, which is probably longer...)
>>> It should not necessarily be much longer as the parts we are
>>> interested in have their own dedicated pages:
>>>
>>> https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
>>> https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
>>>
>>> Besides, the decoding and encoding processes are described with more
>>> precision, not that we couldn't do that here but it would make the
>>> spec grow longer than I am comfortable with...
>> I read the references carefully, thanks. I am somewhat familiar with the
>> stateful decoder API, but the stateless one still needs exploring.
> Note that the stateful API is the one covering the current
> virtio-video spec, and the one I would recommend to implement a
> virtual video device. A stateless device could be implemented by
> people who really need it, but I personally don't see a huge benefit
> in doing so except maybe in some corner cases.
>
>> There is one serious issue with these references IMO: they represent
>> guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
>> virtio-video driver API. Just to make sure we're on the same page:
>>
>> guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
>> virtio-video protocol -> virtio-video device.
>>
>> I believe this is how it is supposed to work, right? So I thought, that
>> your intention is to simplify virtio-video driver and virtio-video
>> protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But having
>> these references I can assume, that you want to use user-space <-> v4l2
>> subsystem API, right? Well, I think this cannot happen and therefore
>> these references cannot be used directly unless:
>>
>> 1. You suggest that virtio-video driver should not use v4l2 subsystem,
>> but should mimic its user-space API in every detail. Probably not a good
>> idea.
> Can you elaborate on why you think this would be a bad idea? V4L2 has
> some legacy stuff that we definitely will not want to add to the spec
> (hence I don't think it should mimic it "in every detail"), but for
> the most part the UAPI works very similarly to the current
> virtio-video specification.

This was an attempt to imagine a situation where the v4l2 subsystem is not
used at all - not even added to the kernel build - just for the sake of
listing all the available options. Sorry for the confusion.


>> 2. There is already a way to bypass the subsystem completely. I'm not
>> aware of that.
> A V4L2 driver needs to implement v4l2_ioctl_ops, which provides it
> with the UAPI structures as submitted by user-space. Most drivers then
> call V4L2 subsystem functions from these hooks, but it is perfectly ok
> to not do so and thus bypass the subsystem completely.

True. Thanks for pointing this out. Indeed, this is technically possible
and this approach seems upstreamable.


> If we find out that there is a benefit in going through the V4L2
> subsystem (which I cannot see for now), rebuilding the UAPI structures
> to communicate with the device is not different from building
> virtio-video specific structures like what we are currently doing.

Well, the V4L2 subsystem is there for a reason, right? It does some
important things too. I'm going to check all the v4l2_ioctl_ops
callbacks in the current virtio-video driver to make the list. Also if
you have some PoC spec/implementations, that would be nice to review. It
is always better to see the actual implementation, of course.

I have these points so far:

1. Overall the V4L2 stateful decoder API looks significantly more
complex to me. It looks like you're a V4L2 expert, so this might not be
as visible to you.

   a. So the V4L2 subsystem and the current virtio-video driver are already
reducing the complexity. And this seems like the right place to do it,
because the complexity is caused by the number of V4L2 use cases and its
legacy. If somebody wants to use virtio-video in a Windows guest, they
would prefer a simpler API, right? I think this use-case is not purely
abstract at all.

   b. A less complex API is better from a security point of view too. When
V4L2 was developed, not many people were concerned with malicious USB
devices probably. At least exploiting a malicious USB device usually
requires physical access. With virtual devices and multiple VMs the
stakes are higher, I believe.

2. We have a working virtio-video driver. So we need very good reasons
to start from scratch. You name two reasons AFAIR: simplicity and
possible use of cameras. Did I miss something else?

   a. The simplicity is there only if all the interfaces are V4L2,
both in the backend and in the guest. Otherwise the complexity is just
moved to backends. I haven't seen V4L2 in our setups so far, only some
proprietary OMX libraries. So from my point of view, this is not
simplicity in general, but an optimization for a specific narrow use case.

   b. For modern cameras the V4L2 interface is not enough anyway. This
was already discussed AFAIR. There is a separate virtio-camera
specification, that indeed is based on V4L2 UAPI as you said. But
combining these two specs is certainly not future proof, right? So I
think it is best to let the virtio-camera spec to be developed
independently.

3. More specifically, I can see that around 95% of V4L2 drivers use
videobuf2. This includes the current virtio-video driver. Bypassing the
V4L2 subsystem means that vb2 can't be used, right? In various
discussions vb2 popped up as something that would be hard to avoid. What
do you think about this? How are you going to deal with the various V4L2
memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc.), for example?
I'll try to dive deeper myself too...


>> 3. user-space <-> v4l2 subsystem API is already the same or very close
>> to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
>> even with stateful decoder/encoder. Even more with stateless decoders
>> because I can see, that v4l2 subsystem actually stores some state in
>> this case as well. Which is quite reasonable I think.
> Actually I don't think this is even something we need to think about -
> in its simplest form the V4L2 guest driver just needs to act as a
> proxy for the device. So which decoder API is used by the host is
> completely irrelevant to the guest driver - it can support a decoder,
> an encoder, or a camera - it doesn't even need to be aware of what
> kind of device it is exposing and that simplicity is another thing
> that I like about this design.

As I wrote above, the design would indeed be simple only if the
actual hardware is exposed to the backend through V4L2 too. Otherwise the
complexity is just moved to backends.


> This simplicity goes away if the guest device does not use V4L2 as its
> user-space interface (e.g. Windows?). In this case we would be in the
> exact same scenario as the current virtio-video spec, where we need to
> build device-specific structures from the guest driver's internal
> state.

IMO this is not quite correct. The scenario would not be the same,
because the V4L2 stateful decoder API is more complex than
any virtio-video spec draft version. Probably it would be great to have
a list of differences. I hope to find some time for this later...


>> So I think what we need to reference here is v4l2 subsystem <-> v4l2
>> driver API. Do you have this reference? Well, I know there is some
>> documentation, but still I doubt that. AFAIR kernel internal APIs are
>> never fixed. Right?
> Correct, but since they are not being considered for the spec this
> should not matter, thankfully.

Ack.


>> Besides that I can see, that this version of the spec indeed comes
>> closer to v4l2 user-space API, but it doesn't borrow all the legacy. I
>> think, this is fine. It is definitely more lightweight. I like that.
> We definitely don't want to borrow all the legacy. Actually most
> modern V4L2 drivers also do not support the oldest V4L2 features, so
> it is perfectly fine if the virtio-video guest driver does the same.
> We will just need to mention explicitly in the spec which parts of
> V4L2 are out-of-scope.

Hmm, good. It would be nice to see this in practice. Looks like it is
best to wait for the PoC then.


>>>> On the other hand v4l2 has a lot of history. It started as a camera API
>>>> and gained the codec support later, right? So it definitely has just too
>>>> much stuff irrelevant for codecs. Here we have an option to design from
>>>> scratch taking the best ideas from v4l2.
>>> That's also what we were thinking initially, but as we try to
>>> implement our new and optimized designs, we end up hitting a wall and
>>> redoing things like V4L2 did. There are small exceptions, like how
>>> STOP/RESET is implemented here which is slightly simpler than the V4L2
>>> equivalent, but I don't think these justify reinventing the remaining
>>> 95% quasi-identically.
>> I like this simplicity. If the user-space API can be reused, and this
>> makes things easier somehow, then maybe you're right. So looking forward
>> to your response to my comment above.
> I believe the user-space API can be reused and hope to demonstrate it
> with a small PoC soon. But if you see something that would make this
> unworkable, now is a good time to elaborate. :)

Unfortunately I don't have enough knowledge of V4L2 internals at the
moment to be certain about that. But I'll definitely spend some time on
this as the approach now looks technically possible. So for now I only
have the higher-level concerns mentioned above.


>>> V4L2 supports much more than video codecs, but if you want to
>>> implement a decoder you don't need to support anything more than what
>>> the decoder spec says you should. And that subset happens to map very
>>> well to the decoder use-case - it's not like the V4L2 folks tried to
>>> shoehorn codecs into something that is inadequate for them.
>> Actually I think that having only a single timestamp might be the result
>> of the evolution from an API for cameras, where you don't have this
>> tricky ordering stuff.
> IIUC frames from camera devices do get a real-time timestamp affixed
> to them though. What is new with codec devices is that the timestamp
> is copied from an input buffer to the corresponding output buffers.

Yes, that's what I mean. The frames from cameras have to be timestamped,
but nobody really expects reordering, right? Then codecs were added, and
now reordering of output frames is a normal use case. So I imagine the
V4L2 folks had a choice:

1. replace a single timestamp with PTS and DTS in all the APIs and
potentially break some user-space apps or

2. simply require presentation ordering for the new class of devices.


>>>>>>> +
>>>>>>> +Finally, for \field{struct virtio_video_resource_sg_list}:
>>>>>>> +
>>>>>>> +\begin{description}
>>>>>>> +\item[\field{num_entries}]
>>>>>>> +is the number of \field{struct virtio_video_resource_sg_entry} instances
>>>>>>> +that follow.
>>>>>>> +\end{description}
>>>>>>> +
>>>>>>> +\field{struct virtio_video_resource_object} is defined as follows:
>>>>>>> +
>>>>>>> +\begin{lstlisting}
>>>>>>> +struct virtio_video_resource_object {
>>>>>>> +        u8 uuid[16];
>>>>>>> +};
>>>>>>> +\end{lstlisting}
>>>>>>> +
>>>>>>> +\begin{description}
>>>>>>> +\item[uuid]
>>>>>>> +is a version 4 UUID specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>>>>>> +\end{description}
>>>>>>> +
>>>>>>> +The device responds with
>>>>>>> +\field{struct virtio_video_resource_attach_backing_resp}:
>>>>>>> +
>>>>>>> +\begin{lstlisting}
>>>>>>> +struct virtio_video_resource_attach_backing_resp {
>>>>>>> +        le32 result; /* VIRTIO_VIDEO_RESULT_* */
>>>>>>> +};
>>>>>>> +\end{lstlisting}
>>>>>>> +
>>>>>>> +\begin{description}
>>>>>>> +\item[\field{result}]
>>>>>>> +is
>>>>>>> +
>>>>>>> +\begin{description}
>>>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_OK]
>>>>>>> +if the operation succeeded,
>>>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_STREAM\_ID]
>>>>>>> +if the mentioned stream does not exist,
>>>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_ARGUMENT]
>>>>>>> +if \field{queue_type}, \field{resource_id}, or \field{resources} have an
>>>>>>> +invalid value,
>>>>>>> +\item[VIRTIO\_VIDEO\_RESULT\_ERR\_INVALID\_OPERATION]
>>>>>>> +if the operation is performed at a time when it is non-valid.
>>>>>>> +\end{description}
>>>>>>> +\end{description}
>>>>>>> +
>>>>>>> +VIRTIO\_VIDEO\_CMD\_RESOURCE\_ATTACH\_BACKING can only be called during
>>>>>>> +the following times:
>>>>>>> +
>>>>>>> +\begin{itemize}
>>>>>>> +\item
>>>>>>> +  AFTER a VIRTIO\_VIDEO\_CMD\_STREAM\_CREATE and BEFORE invoking
>>>>>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE for the first time on the
>>>>>>> +  resource,
>>>>>>> +\item
>>>>>>> +  AFTER successfully changing the \field{virtio_video_params_resources}
>>>>>>> +  parameter corresponding to the queue and BEFORE
>>>>>>> +  VIRTIO\_VIDEO\_CMD\_RESOURCE\_QUEUE is called again on the resource.
>>>>>>> +\end{itemize}
>>>>>>> +
>>>>>>> +This is to ensure that the device can rely on the fact that a given
>>>>>>> +resource will always point to the same memory for as long as it may be
>>>>>>> +used by the video device. For instance, a decoder may use returned
>>>>>>> +decoded frames as reference for future frames and won't overwrite the
>>>>>>> +backing resource of a frame that is being referenced. It is only before
>>>>>>> +a stream is started and after a Dynamic Resolution Change event has
>>>>>>> +occurred that we can be sure that all resources won't be used in that
>>>>>>> +way.
>>>>>> The mentioned scenario about the referenced frames looks
>>>>>> somewhat reasonable, but I wonder how exactly that would work in practice.
>>>>> Basically the guest needs to make sure the backing memory remains
>>>>> available and unwritten until the conditions mentioned above are met.
>>>>> Or is there anything unclear in this description?
>>>> Ok, I read the discussions about whether to allow the device to have
>>>> read access after response to QUEUE or not. Since this comes from v4l2,
>>>> then this should not be a problem, I think. I didn't know that v4l2
>>>> expects the user-space to never write to CAPTURE buffers after they
>>>> dequeued. I wonder if it is enforced in drivers.
>> Actually I have more on this. I'd prefer if there is a requirement for
>> the driver or the device to handle a case when a still referenced frame
>> is queued again. So that it is less likely to be forgotten.
> With the stateful API, the driver/device is in charge of making sure
> submitted reference frames are not overwritten as long as there is a
> reference to them. I.e. the driver or device firmware will reorder the
> frames to avoid that, so there is indeed the requirement you describe.
>
> With the stateless API, user-space is in charge of reference frames
> reordering and the driver is only an executor - if user-space does
> something damaging to itself, it will get corrupted frames.
>
> I hope this clarifies things a bit - let's go to the bottom of any
> remaining concern to make sure we make the right call here.

This was a note about the virtio-video draft v6. I hope it is still on
the table to discuss. :) Of course, discussions about moving to the V4L2
UAPI are of higher priority.


Regards,
Alexander

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah






* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-02-06 14:12           ` Cornelia Huck
  2023-02-07  6:16             ` Alexandre Courbot
@ 2023-02-07 11:11             ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-02-07 11:11 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida

On 06.02.23 15:12, Cornelia Huck wrote:
> On Thu, Jan 19 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>
>> On 12.01.23 07:39, Alexandre Courbot wrote:
>>> On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>> Well, on the one hand mimicking v4l2 looks like an easy solution from
>>>> virtio-video spec writing perspective. (But the implementers will have
>>>> to read the V4L2 API instead AFAIU, which is probably longer...)
>>> It should not necessarily be much longer as the parts we are
>>> interested in have their own dedicated pages:
>>>
>>> https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
>>> https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
>>>
>>> Besides, the decoding and encoding processes are described with more
>>> precision, not that we couldn't do that here but it would make the
>>> spec grow longer than I am comfortable with...
>> I read the references carefully, thanks. I am somewhat familiar with the
>> stateful decoder API, but the stateless one still needs exploring.
>>
>> There is one serious issue with these references IMO: they represent
>> guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
>> virtio-video driver API. Just to make sure we're on the same page:
>>
>> guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
>> virtio-video protocol -> virtio-video device.
>>
>> I believe this is how it is supposed to work, right? So I thought, that
>> your intention is to simplify virtio-video driver and virtio-video
>> protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But having
>> these references I can assume, that you want to use user-space <-> v4l2
>> subsystem API, right? Well, I think this cannot happen and therefore
>> these references cannot be used directly unless:
>>
>> 1. You suggest that virtio-video driver should not use v4l2 subsystem,
>> but should mimic its user-space API in every detail. Probably not a good
>> idea.
>>
>> 2. There is already a way to bypass the subsystem completely. I'm not
>> aware of that.
>>
>> 3. user-space <-> v4l2 subsystem API is already the same or very close
>> to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
>> even with stateful decoder/encoder. Even more with stateless decoders
>> because I can see, that v4l2 subsystem actually stores some state in
>> this case as well. Which is quite reasonable I think.
>>
>> So I think what we need to reference here is v4l2 subsystem <-> v4l2
>> driver API. Do you have this reference? Well, I know there is some
>> documentation, but still I doubt that. AFAIR kernel internal APIs are
>> never fixed. Right?
> So, I'm not that familiar with v4l2, but if that's indeed the case,
> depending on some kernel internal APIs is a no-go. First, because
> in-kernel APIs are not stable, and second, because we want something
> that's BSD-licenced (as opposed to GPLv2-licenced) to point to. The
> kernel<->userspace API would work (BSD-licenced header and stable); I
> had the impression that we wanted to reuse the various #defines in
> there -- did I misunderstand?

Sorry, this was a misunderstanding. Now I see that it is indeed
possible to implement the V4L2 UAPI in a V4L2 driver using v4l2_ioctl_ops,
as pointed out by Alexandre.


>>> Let me try to summarize the case for using V4L2 over Virtio (I'll call
>>> it virtio-v4l2 to differentiate it from the current spec).
>>>
>>> There is the argument that virtio-video turns out to be a recreation
>>> of the stateful V4L2 decoder API, which itself works similarly to
>>> other high-level decoder APIs. So it's not like we could or should
>>> come with something very different. In parallel, virtio-camera is also
>>> currently using V4L2 as its model. While this is subject to change, I
>>> am starting to see a pattern here. :)
>>>
>>> Transporting V4L2 over virtio would considerably shorten the length of
>>> this spec, as we would just need to care about the transport aspect
>>> and minor amendments to the meaning of some V4L2 structure members,
>>> and leave the rest to V4L2 which is properly documented and for which
>>> there is a large collection of working examples.
>>>
>>> This would work very well for codec devices, but as a side-effect
>>> would also enable other kinds of devices that may be useful to
>>> virtualize, like image processors, DVB cards, and cameras. This
>>> doesn't mean virtio-v4l2 should be the *only* way to support cameras
>>> over virtio. It is a nice bonus of encapsulating V4L2, it may be
>>> sufficient for simple (most?) use-cases, but also doesn't forbid more
>>> specialized virtual devices for complex camera pipelines to be added
>>> later. virtio-v4l2 would just be the generic virtual video device that
>>> happens to be sufficient for our accelerated video needs - and if your
>>> host camera is a USB UVC one, well feel free to use that too.
>>>
>>> In other words, I see an opportunity to enable a whole class of
>>> devices instead of a single type for the same effort and think we
>>> should seriously consider this.
>>>
>>> I have started to put down what a virtio-v4l2 transport might look
>>> like, and am also planning on putting together a small
>>> proof-of-concept. If I can get folks here to warm up to the idea, I
>>> believe we should be able to share a spec and prototype in a month or
>>> so.
>> Thanks for the detailed explanation. Please check my comments above. I'd
>> like to resolve the mentioned issue first.
> I hope we can sort this out soon -- I guess I'm not the only one who is
> anxious about this spec moving forward :) Please let me know if I can
> help in any way.

Thank you very much for pushing this discussion forward!


Kind regards,
Alexander Gordeev

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah





* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-02-07  6:16             ` Alexandre Courbot
@ 2023-02-07 13:59               ` Cornelia Huck
  2023-03-10 10:50                 ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-02-07 13:59 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida

On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org> wrote:

> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
>>
>> On Thu, Jan 19 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>
>> > Hi Alexandre,
>> >
>> > On 12.01.23 07:39, Alexandre Courbot wrote:
>> >> On Thu, Jan 12, 2023 at 3:42 AM Alexander Gordeev
>> >> <alexander.gordeev@opensynergy.com> wrote:
>>
>> >>> Well, on the one hand mimicking v4l2 looks like an easy solution from
>> >>> virtio-video spec writing perspective. (But the implementers will have
>> >>> to read the V4L2 API instead AFAIU, which is probably longer...)
>> >> It should not necessarily be much longer as the parts we are
>> >> interested in have their own dedicated pages:
>> >>
>> >> https://docs.kernel.org/userspace-api/media/v4l/dev-decoder.html
>> >> https://docs.kernel.org/userspace-api/media/v4l/dev-encoder.html
>> >>
>> >> Besides, the decoding and encoding processes are described with more
>> >> precision, not that we couldn't do that here but it would make the
>> >> spec grow longer than I am comfortable with...
>> >
>> > I read the references carefully, thanks. I am somewhat familiar with the
>> > stateful decoder API, but the stateless one still needs exploring.
>> >
>> > There is one serious issue with these references IMO: they represent
>> > guest user-space <-> v4l2 subsystem API, not v4l2 subsystem <->
>> > virtio-video driver API. Just to make sure we're on the same page:
>> >
>> > guest user-space <-> v4l2 kernel subsystem <-> virtio-video driver <-
>> > virtio-video protocol -> virtio-video device.
>> >
>> > I believe this is how it is supposed to work, right? So I thought, that
>> > your intention is to simplify virtio-video driver and virtio-video
>> > protocol by reusing the v4l2 subsystem <-> v4l2 driver API. But having
>> > these references I can assume, that you want to use user-space <-> v4l2
>> > subsystem API, right? Well, I think this cannot happen and therefore
>> > these references cannot be used directly unless:
>> >
>> > 1. You suggest that virtio-video driver should not use v4l2 subsystem,
>> > but should mimic its user-space API in every detail. Probably not a good
>> > idea.
>> >
>> > 2. There is already a way to bypass the subsystem completely. I'm not
>> > aware of that.
>> >
>> > 3. user-space <-> v4l2 subsystem API is already the same or very close
>> > to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
>> > even with stateful decoder/encoder. Even more with stateless decoders
>> > because I can see, that v4l2 subsystem actually stores some state in
>> > this case as well. Which is quite reasonable I think.
>> >
>> > So I think what we need to reference here is v4l2 subsystem <-> v4l2
>> > driver API. Do you have this reference? Well, I know there is some
>> > documentation, but still I doubt that. AFAIR kernel internal APIs are
>> > never fixed. Right?
>>
>> So, I'm not that familiar with v4l2, but if that's indeed the case,
>> depending on some kernel internal APIs is a no-go. First, because
>> in-kernel APIs are not stable, and second, because we want something
>> that's BSD-licenced (as opposed to GPLv2-licenced) to point to. The
>> kernel<->userspace API would work (BSD-licenced header and stable); I
>> had the impression that we wanted to reuse the various #defines in
>> there -- did I misunderstand?
>
> Sorry, I should have replied earlier to lift any misunderstanding. I
> am not suggesting to use any kernel internal API as reference. My
> suggestion is to stick strictly to the UAPI which is stable (as in,
> guaranteed to be backward-compatible) and well documented. Here is for
> instance the part documenting buffer queuing/dequeuing:
> https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/vidioc-qbuf.html
>
> Changing anything in this documentation in a backward-incompatible way
> would break user-space, and everyone familiar with the kernel
> community knows what happens when someone breaks user-space. ;) So
> this can be a reliable source for virtio-video (and if you look
> closely, you will also notice many similarities between the two).

Yes, that sounds good to me :)

>
>>
>> (...)
>>
>> >> Let me try to summarize the case for using V4L2 over Virtio (I'll call
>> >> it virtio-v4l2 to differentiate it from the current spec).
>> >>
>> >> There is the argument that virtio-video turns out to be a recreation
>> >> of the stateful V4L2 decoder API, which itself works similarly to
>> >> other high-level decoder APIs. So it's not like we could or should
>> >> come with something very different. In parallel, virtio-camera is also
>> >> currently using V4L2 as its model. While this is subject to change, I
>> >> am starting to see a pattern here. :)
>> >>
>> >> Transporting V4L2 over virtio would considerably shorten the length of
>> >> this spec, as we would just need to care about the transport aspect
>> >> and minor amendments to the meaning of some V4L2 structure members,
>> >> and leave the rest to V4L2 which is properly documented and for which
>> >> there is a large collection of working examples.
>> >>
>> >> This would work very well for codec devices, but as a side-effect
>> >> would also enable other kinds of devices that may be useful to
>> >> virtualize, like image processors, DVB cards, and cameras. This
>> >> doesn't mean virtio-v4l2 should be the *only* way to support cameras
>> >> over virtio. It is a nice bonus of encapsulating V4L2, it may be
>> >> sufficient for simple (most?) use-cases, but also doesn't forbid more
>> >> specialized virtual devices for complex camera pipelines to be added
>> >> later. virtio-v4l2 would just be the generic virtual video device that
>> >> happens to be sufficient for our accelerated video needs - and if your
>> >> host camera is a USB UVC one, well feel free to use that too.
>> >>
>> >> In other words, I see an opportunity to enable a whole class of
>> >> devices instead of a single type for the same effort and think we
>> >> should seriously consider this.
>> >>
>> >> I have started to put down what a virtio-v4l2 transport might look
>> >> like, and am also planning on putting together a small
>> >> proof-of-concept. If I can get folks here to warm up to the idea, I
>> >> believe we should be able to share a spec and prototype in a month or
>> >> so.
>> >
>> > Thanks for the detailed explanation. Please check my comments above. I'd
>> > like to resolve the mentioned issue first.
>>
>> I hope we can sort this out soon -- I guess I'm not the only one who is
>> anxious about this spec moving forward :) Please let me know if I can
>> help in any way.
>
> I'll try to address Alexander's points in more detail, but I am not
> seeing any blocking issue with using the V4L2 UAPI as the basis for
> virtio-video (we are working on a small proof-of-concept and things
> are going smoothly so far).

Great to hear, looking forward to it!





* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-02-07 13:59               ` Cornelia Huck
@ 2023-03-10 10:50                 ` Cornelia Huck
  2023-03-10 13:19                   ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-03-10 10:50 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida

On Tue, Feb 07 2023, Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>
>> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
>>> I hope we can sort this out soon -- I guess I'm not the only one who is
>>> anxious about this spec moving forward :) Please let me know if I can
>>> help in any way.
>>
>> I'll try to address Alexander's points in more detail, but I am not
>> seeing any blocking issue with using the V4L2 UAPI as the basis for
>> virtio-video (we are working on a small proof-of-concept and things
>> are going smoothly so far).
>
> Great to hear, looking forward to it!

Quick question: Is there any git repo or similar where interested
parties can follow along? It would be great to have virtio-video in 1.3;
if you have some idea on when it might be ready, we could come up with a
schedule to accommodate that.





* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-10 10:50                 ` Cornelia Huck
@ 2023-03-10 13:19                   ` Alexandre Courbot
  2023-03-10 14:20                     ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-03-10 13:19 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve

Hi Cornelia,

On Fri, Mar 10, 2023 at 7:51 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Tue, Feb 07 2023, Cornelia Huck <cohuck@redhat.com> wrote:
>
> > On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
> >
> >> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >>> I hope we can sort this out soon -- I guess I'm not the only one who is
> >>> anxious about this spec moving forward :) Please let me know if I can
> >>> help in any way.
> >>
> >> I'll try to address Alexander's points in more detail, but I am not
> >> seeing any blocking issue with using the V4L2 UAPI as the basis for
> >> virtio-video (we are working on a small proof-of-concept and things
> >> are going smoothly so far).
> >
> > Great to hear, looking forward to it!
>
> Quick question: Is there any git repo or similar where interested
> parties can follow along? It would be great to have virtio-video in 1.3;
> if you have some idea on when it might be ready, we could come up with a
> schedule to accommodate that.

I'm glad you asked, as a matter of fact I have just finished the
virtio-v4l2 proof of concept today! It is capable of exposing a camera
or encoder V4L2 device from the host to the guest, by encapsulating
V4L2 commands into virtio.

The guest driver code (single file for simplicity):
https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c

Bulk of the host-side crosvm device code:
https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/protocol.rs
https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs

Neither are works of art, so please forgive the few inefficiencies
here and there - the goal was to make them easy to understand. Still,
the guest driver is probably closer to what a final driver would look
like. It fits in around 1,000 LoCs (comments excluded), which is
enough to support stateful video encoders as well as USB camera
devices. Decoders cannot be run yet because they require support for
V4L2 events and polling - I will try to enable these features next.
But even in its current state this driver shows one interesting aspect
of virtio-v4l2, at least for Linux guests: a single and relatively
simple driver is able to drive a wide range of devices.

The crosvm device code proxies a V4L2 device on the host, again using
roughly 1,200 lines of non-comment code. This design does not intend
to reflect what an actual host device will look like - in effect they
should be much more specialized since they are unlikely to also call
into V4L2 on the host side. However, if the host is Linux and we just
want to expose a USB camera or other V4L2 device almost as-is, then
this could actually be a good fit.

The protocol should be easy to understand by looking at the code - we
only have 5 virtio commands to open/close a session, map/unmap a
host-allocated buffer into the guest PAS, and the IOCTL command which
sends V4L2 ioctl structures to the host and waits for its reply. All
ioctls are synchronous per-session, meaning that a session only sends
one ioctl at a time and waits for its response before it can send the
next (as this is what user-space does too). Ioctls, however, never
block on the host side, and the ones that would block (DQBUF and DQEVENT)
are replaced by host-initiated events. On top of being familiar to
people who have worked with V4L2 (i.e. a large portion of the media
folks), this simple design seems to be efficient as I have observed
identical performance on both host and guest with the vicodec virtual
encoder. Since this device generates frames using the CPU and keeps
one core 100% busy, any overhead introduced by virtualization should
be noticeable - yet I got nearly identical framerates on both host and
guest.
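
To give a rough idea of the shape this takes on the wire, here is an
illustrative sketch - the names and layouts below are invented for this
email and do not necessarily match the proof-of-concept code linked
above:

enum virtio_v4l2_cmd {
        VIRTIO_V4L2_CMD_OPEN = 1,  /* open a new session on the device */
        VIRTIO_V4L2_CMD_CLOSE,     /* close a session */
        VIRTIO_V4L2_CMD_IOCTL,     /* perform one V4L2 ioctl synchronously */
        VIRTIO_V4L2_CMD_MMAP,      /* map a host buffer into the guest PAS */
        VIRTIO_V4L2_CMD_MUNMAP,    /* unmap a previously mapped buffer */
};

struct virtio_v4l2_ioctl_req {
        le32 session_id;  /* session this ioctl applies to */
        le32 code;        /* VIDIOC_* code as defined by the V4L2 UAPI */
        /* The ioctl payload (e.g. struct v4l2_format) follows in the
         * driver-readable part of the descriptor chain; the device
         * writes the updated structure back in the device-writable
         * part, exactly as the ioctl would have returned it. */
};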

Things that still need to be implemented before this can be considered
more complete:

* Controls. This should not be particularly difficult but I left it
for now as they are not necessary to demonstrate the viability of this
project.
* Guest-allocated memory buffers and virtio objects. Based on our
previous experience with virtio-video these should not be difficult to
implement. Currently all video buffers are allocated by the host, and
mapped into the guest if needed.
* Events and polling, required to use a decoder. Again these were not
strictly necessary for the proof of concept, but since we've gone this
far I will try to get them to work as the next step.
* Requests and multi-part media devices. This will be necessary in
order to support more modern camera pipelines. I haven't made up my
mind yet about whether we should support this, but if we want to it
should not be too hard (describe several devices in the configuration
space and enable the request-related commands). I need to talk to
camera folks to know whether there is an actual interest in this.
* Support for more ioctls, in case we want to support tuners and radios.

If you want to try this code, you need to build the guest kernel with
CONFIG_VIRTIO_V4L2 and enable the `v4l2` feature when building crosvm
(check out the Book of Crosvm if you need instructions on how to build
and use it). Then pass --virtio-v4l2=/dev/videoX to crosvm in order to
expose the /dev/videoX host V4L2 device to the guest.

I have successfully captured frames (and verified their validity)
using the following devices:

* A simple USB camera using the `uvcvideo` driver. Both host and guest
could capture a MJPEG stream with the following command:
v4l2-ctl -d0 -v pixelformat=MJPG --stream-mmap --stream-to=test.mjpg

* The vivid virtual camera driver. I could capture a valid YUV stream
using the following command:
v4l2-ctl -d0 -v pixelformat=NV12 --stream-mmap --stream-to test.yuv

* The encoder device of the vicodec virtual codec driver. On both host
and guest, the following command produces a valid FWHT stream in
`test.fwht`:
v4l2-ctl -x pixelformat=NV12 --stream-mmap --stream-out-mmap
--stream-to-hdr test.fwht

By this work I hope to demonstrate to people interested in video
virtualization that encapsulating V4L2 in virtio is not only a viable
solution, it is a huge shortcut in terms of specification crafting,
driver writing, and overall headaches involved in specifying something
as complex as a video device. Not only could we support video decoders
and encoders, which was the goal of virtio-video, we would also get
image processors, video overlays and simple cameras for free, and
potentially more complex cameras if we decide to.

After writing this prototype (and a couple attempts at the
virtio-video specification) I don't see any reason not to rely on a
battle-tested protocol instead of designing our own that does
basically the same thing. The genericity of V4L2 may mean that
sometimes we will need 2 commands where virtio-video would require
only one, but we are talking about a low frequency of virtio commands
(60 fps for video playback typically) and that genericity comes with
the benefit of a single Linux guest driver.

If there is an agreement to move forward with this, I guess the next
step for me will be to write a proper spec so the protocol can be
understood and discussed in detail. Then why not try and upstream the
kernel driver and make ChromeOS use this too in place of our
heavily-patched virtio-video. :) We might even make it for virtio 1.3.

Looking forward to your feedback. Please don't hesitate to ask
questions, especially if you are not familiar with V4L2. I can also
help folks interested in running this with the setup if needed.

Cheers,
Alex.




* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-10 13:19                   ` Alexandre Courbot
@ 2023-03-10 14:20                     ` Cornelia Huck
  2023-03-14  5:06                       ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-03-10 14:20 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve

On Fri, Mar 10 2023, Alexandre Courbot <acourbot@chromium.org> wrote:

> Hi Cornelia,
>
> On Fri, Mar 10, 2023 at 7:51 PM Cornelia Huck <cohuck@redhat.com> wrote:
>>
>> On Tue, Feb 07 2023, Cornelia Huck <cohuck@redhat.com> wrote:
>>
>> > On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>> >
>> >> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> >>> I hope we can sort this out soon -- I guess I'm not the only one who is
>> >>> anxious about this spec moving forward :) Please let me know if I can
>> >>> help in any way.
>> >>
>> >> I'll try to address Alexander's points in more detail, but I am not
>> >> seeing any blocking issue with using the V4L2 UAPI as the basis for
>> >> virtio-video (we are working on a small proof-of-concept and things
>> >> are going smoothly so far).
>> >
>> > Great to hear, looking forward to it!
>>
>> Quick question: Is there any git repo or similar where interested
>> parties can follow along? It would be great to have virtio-video in 1.3;
>> if you have some idea on when it might be ready, we could come up with a
>> schedule to accommodate that.
>
> I'm glad you asked, as a matter of fact I have just finished the
> virtio-v4l2 proof of concept today! It is capable of exposing a camera
> or encoder V4L2 device from the host to the guest, by encapsulating
> V4L2 commands into virtio.

\o/ Excellent news!

>
> The guest driver code (single file for simplicity):
> https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c
>
> Bulk of the host-side crosvm device code:
> https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/protocol.rs
> https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs
>
> Neither are works of art, so please forgive the few inefficiencies
> here and there - the goal was to make them easy to understand. Still,
> the guest driver is probably closer to what a final driver would look
> like. It fits in around 1,000 LoCs (comments excluded), which is
> enough to support stateful video encoders as well as USB camera
> devices. Decoders cannot be run yet because they require support for
> V4L2 events and polling - I will try to enable these features next.
> But even in its current state this driver shows one interesting aspect
> of virtio-v4l2, at least for Linux guests: a single and relatively
> simple driver is able to drive a wide range of devices.

I had a quick look at the driver; it indeed looks like a big win on
Linux systems. (The one thing I'm missing is how easy it would be to
replicate the used v4l2 parts on non-Linux systems.)

>
> The crosvm device code proxies a V4L2 device on the host, again using
> roughly 1,200 lines of non-comment code. This design does not intend
> to reflect what an actual host device will look like - in effect they
> should be much more specialized since they are unlikely to also call
> into V4L2 on the host side. However, if the host is Linux and we just
> want to expose a USB camera or other V4L2 device almost as-is, then
> this could actually be a good fit.
>
> The protocol should be easy to understand by looking at the code - we
> only have 5 virtio commands to open/close a session, map/unmap a
> host-allocated buffer into the guest PAS, and the IOCTL command which
> sends V4L2 ioctl structures to the host and waits for its reply. All
> ioctls are synchronous per-session, meaning that a session only sends
> one ioctl at a time and waits for its response before it can send the
> next (as this is what user-space does too). Ioctls, however, never
> block on the host side, and the ones that would block (DQBUF and DQEVENT)
> are replaced by host-initiated events. On top of being familiar to
> people who have worked with V4L2 (i.e. a large portion of the media
> folks), this simple design seems to be efficient as I have observed
> identical performance on both host and guest with the vicodec virtual
> encoder. Since this device generates frames using the CPU and keeps
> one core 100% busy, any overhead introduced by virtualization should
> be noticeable - yet I got nearly identical framerates on both host and
> guest.

I haven't worked with v4l2, but this approach sounds reasonable to me.

>
> Things that still need to be implemented before this can be considered
> more complete:
>
> * Controls. This should not be particularly difficult but I left it
> for now as they are not necessary to demonstrate the viability of this
> project.
> * Guest-allocated memory buffers and virtio objects. Based on our
> previous experience with virtio-video these should not be difficult to
> implement. Currently all video buffers are allocated by the host, and
> mapped into the guest if needed.
> * Events and polling, required to use a decoder. Again these were not
> strictly necessary for the proof of concept, but since we've gone this
> far I will try to get them to work as the next step.
> * Requests and multi-part media devices. This will be necessary in
> order to support more modern camera pipelines. I haven't made up my
> mind yet about whether we should support this, but if we want to it
> should not be too hard (describe several devices in the configuration
> space and enable the request-related commands). I need to talk to
> camera folks to know whether there is an actual interest in this.
> * Support for more ioctls, in case we want to support tuners and radios.
>
> If you want to try this code, you need to build the guest kernel with
> CONFIG_VIRTIO_V4L2 and enable the `v4l2` feature when building crosvm
> (check out the Book of Crosvm if you need instructions on how to build
> and use it). Then pass --virtio-v4l2=/dev/videoX to crosvm in order to
> expose the /dev/videoX host V4L2 device to the guest.
>
> I have successfully captured frames (and verified their validity)
> using the following devices:
>
> * A simple USB camera using the `uvcvideo` driver. Both host and guest
> could capture a MJPEG stream with the following command:
> v4l2-ctl -d0 -v pixelformat=MJPG --stream-mmap --stream-to=test.mjpg
>
> * The vivid virtual camera driver. I could capture a valid YUV stream
> using the following command:
> v4l2-ctl -d0 -v pixelformat=NV12 --stream-mmap --stream-to test.yuv
>
> * The encoder device of the vicodec virtual codec driver. On both host
> and guest, the following command produces a valid FWHT stream in
> `test.fwht`:
> v4l2-ctl -x pixelformat=NV12 --stream-mmap --stream-out-mmap
> --stream-to-hdr test.fwht

This looks very good already.

>
> By this work I hope to demonstrate to people interested in video
> virtualization that encapsulating V4L2 in virtio is not only a viable
> solution, it is a huge shortcut in terms of specification crafting,
> driver writing, and overall headaches involved in specifying something
> as complex as a video device. Not only could we support video decoders
> and encoders, which was the goal of virtio-video, we would also get
> image processors, video overlays and simple cameras for free, and
> potentially more complex cameras if we decide to.
>
> After writing this prototype (and a couple attempts at the
> virtio-video specification) I don't see any reason not to rely on a
> battle-tested protocol instead of designing our own that does
> basically the same thing. The genericity of V4L2 may mean that
> sometimes we will need 2 commands where virtio-video would require
> only one, but we are talking about a low frequency of virtio commands
> (60 fps for video playback typically) and that genericity comes with
> the benefit of a single Linux guest driver.
>
> If there is an agreement to move forward with this, I guess the next
> step for me will be to write a proper spec so the protocol can be
> understood and discussed in detail. Then why not try and upstream the
> kernel driver and make ChromeOS use this too in place of our
> heavily-patched virtio-video. :) We might even make it for virtio 1.3.
>
> Looking forward to your feedback. Please don't hesitate to ask
> questions, especially if you are not familiar with V4L2. I can also
> help folks interested in running this with the setup if needed.

Thank you for sharing your work! I think this looks very promising, and
I'd like to hear feedback from others as well. I assume that would make
the spec change more digestible than earlier versions.



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-10 14:20                     ` Cornelia Huck
@ 2023-03-14  5:06                       ` Alexandre Courbot
  2023-03-16 10:12                         ` Alexander Gordeev
  2023-03-27 13:00                         ` Albert Esteve
  0 siblings, 2 replies; 97+ messages in thread
From: Alexandre Courbot @ 2023-03-14  5:06 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve

Hi Cornelia,

On Fri, Mar 10, 2023 at 11:20 PM Cornelia Huck <cohuck@redhat.com> wrote:
>
> On Fri, Mar 10 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>
> > Hi Cornelia,
> >
> > On Fri, Mar 10, 2023 at 7:51 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >>
> >> On Tue, Feb 07 2023, Cornelia Huck <cohuck@redhat.com> wrote:
> >>
> >> > On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
> >> >
> >> >> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >> >>> I hope we can sort this out soon -- I guess I'm not the only one who is
> >> >>> anxious about this spec moving forward :) Please let me know if I can
> >> >>> help in any way.
> >> >>
> >> >> I'll try to address Alexander's points in more detail, but I am not
> >> >> seeing any blocking issue with using the V4L2 UAPI as the basis for
> >> >> virtio-video (we are working on a small proof-of-concept and things
> >> >> are going smoothly so far).
> >> >
> >> > Great to hear, looking forward to it!
> >>
> >> Quick question: Is there any git repo or similar where interested
> >> parties can follow along? It would be great to have virtio-video in 1.3;
> >> if you have some idea on when it might be ready, we could come up with a
> >> schedule to accommodate that.
> >
> > I'm glad you asked, as a matter of fact I have just finished the
> > virtio-v4l2 proof of concept today! It is capable of exposing a camera
> > or encoder V4L2 device from the host to the guest, by encapsulating
> > V4L2 commands into virtio.
>
> \o/ Excellent news!

I am delighted that you seem to like it!

>
> >
> > The guest driver code (single file for simplicity):
> > https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c
> >
> > Bulk of the host-side crosvm device code:
> > https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/protocol.rs
> > https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs
> >
> > Neither are works of art, so please forgive the few inefficiencies
> > here and there - the goal was to make them easy to understand. Still,
> > the guest driver is probably closer to what a final driver would look
> > like. It fits in around 1,000 LoCs (comments excluded), which is
> > enough to support stateful video encoders as well as USB camera
> > devices. Decoders cannot be run yet because they require support for
> > V4L2 events and polling - I will try to enable these features next.
> > But even in its current state this driver shows one interesting aspect
> > of virtio-v4l2, at least for Linux guests: a single and relatively
> > simple driver is able to drive a wide range of devices.
>
> I had a quick look at the driver; it indeed looks like a big win on
> Linux systems. (The one thing I'm missing is how easy it would be to
> replicate the used v4l2 parts on non-Linux systems.)

For non-Linux systems (host or guest), we may need to copy/paste or
reproduce the UAPI structures. Thankfully they are unambiguously
described in the UAPI documentation; see for example
https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/vidioc-enum-fmt.html
for the format enumeration structure.
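
To give an idea, a reproduced structure can be as simple as this (a
sketch following the layout documented at the link above; treat it as
illustrative, not as a normative copy):

/* Illustrative stand-in for the v4l2_fmtdesc UAPI structure on a
 * non-Linux guest; layout follows the VIDIOC_ENUM_FMT documentation. */
#include <stdint.h>

struct v4l2_fmtdesc {
	uint32_t index;            /* format number, set by the application */
	uint32_t type;             /* enum v4l2_buf_type */
	uint32_t flags;            /* V4L2_FMT_FLAG_* */
	uint8_t  description[32];  /* human-readable description */
	uint32_t pixelformat;      /* FOURCC identifying the format */
	uint32_t reserved[4];      /* zeroed by the driver */
};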

>
> >
> > The crosvm device code proxies a V4L2 device on the host, again using
> > roughly 1,200 lines of non-comment code. This design does not intend
> > to reflect what an actual host device will look like - in effect they
> > should be much more specialized since they are unlikely to also call
> > into V4L2 on the host side. However, if the host is Linux and we just
> > want to expose a USB camera or other V4L2 device almost as-is, then
> > this could actually be a good fit.
> >
> > The protocol should be easy to understand by looking at the code - we
> > only have 5 virtio commands to open/close a session, map/unmap a
> > host-allocated buffer into the guest PAS, and the IOCTL command which
> > sends V4L2 ioctl structures to the host and waits for its reply. All
> > ioctls are synchronous per-session, meaning that a session only sends
> > one ioctl at a time and waits for its response before it can send the
> > next (as this is what user-space does too). Ioctls, however, never
> > block on the host side and the ones that would do (DQBUF and DQEVENT)
> > are replaced by host-initiated events. On top of being familiar to
> > people who have worked with V4L2 (i.e. a large portion of the media
> > folks), this simple design seems to be efficient as I have observed
> > identical performance on both host and guest with the vicodec virtual
> > encoder. Since this device generates frames using the CPU and keeps
> > one core 100% busy, any overhead introduced by virtualization should
> > be noticeable - yet I got nearly identical framerates on both host and
> > guest.
>
> I haven't worked with v4l2, but this approach sounds reasonable to me.
>
> >
> > Things that still need to be implemented before this can be considered
> > more complete:
> >
> > * Controls. This should not be particularly difficult but I left it
> > for now as they are not necessary to demonstrate the viability of this
> > project.
> > * Guest-allocated memory buffers and virtio objects. Based on our
> > previous experience with virtio-video these should not be difficult to
> > implement. Currently all video buffers are allocated by the host, and
> > mapped into the guest if needed.
> > * Events and polling, required to use a decoder. Again these were not
> > strictly necessary for the proof of concept, but since we've gone this
> > far I will try to get them to work as the next step.
> > * Requests and multi-part media devices. This will be necessary in
> > order to support more modern camera pipelines. I haven't made up my
> > mind yet about whether we should support this, but if we want to it
> > should not be too hard (describe several devices in the configuration
> > space and enable the request-related commands). I need to talk to
> > camera folks to know whether there is an actual interest in this.
> > * Support for more ioctls, in case we want to support tuners and radios.
> >
> > If you want to try this code, you need to build the guest kernel with
> > CONFIG_VIRTIO_V4L2 and enable the `v4l2` feature when building crosvm
> > (check out the Book of Crosvm if you need instructions on how to build
> > and use it). Then pass --virtio-v4l2=/dev/videoX to crosvm in order to
> > expose the /dev/videoX host V4L2 device to the guest.
> >
> > I have successfully captured frames (and verified their validity)
> > using the following devices:
> >
> > * A simple USB camera using the `uvcvideo` driver. Both host and guest
> > could capture a MJPEG stream with the following command:
> > v4l2-ctl -d0 -v pixelformat=MJPG --stream-mmap --stream-to=test.mjpg
> >
> > * The vivid virtual camera driver. I could capture a valid YUV stream
> > using the following command:
> > v4l2-ctl -d0 -v pixelformat=NV12 --stream-mmap --stream-to test.yuv
> >
> > * The encoder device of the vicodec virtual codec driver. On both host
> > and guest, the following command produces a valid FWHT stream in
> > `test.fwht`:
> > v4l2-ctl -x pixelformat=NV12 --stream-mmap --stream-out-mmap
> > --stream-to-hdr test.fwht
>
> This looks very good already.

It turns out that ffmpeg can also be used to capture video and encode
it into a more convenient format, e.g.:

ffmpeg -f v4l2 -i /dev/video0 output.mkv

This is a very good sign with respect to compliance with V4L2
protocols. We had a very hard time getting the original virtio-video
to be used with common tools and passing the V4L2 compliance test in
general, but with this approach it seems like it will be much simpler
to achieve.

>
> >
> > By this work I hope to demonstrate to people interested in video
> > virtualization that encapsulating V4L2 in virtio is not only a viable
> > solution, it is a huge shortcut in terms of specification crafting,
> > driver writing, and overall headaches involved in specifying something
> > as complex as a video device. Not only could we support video decoders
> > and encoders, which was the goal of virtio-video, we would also get
> > image processors, video overlays and simple cameras for free, and
> > potentially more complex cameras if we decide to.
> >
> > After writing this prototype (and a couple attempts at the
> > virtio-video specification) I don't see any reason not to rely on a
> > battle-tested protocol instead of designing our own that does
> > basically the same thing. The genericity of V4L2 may mean that
> > sometimes we will need 2 commands where virtio-video would require
> > only one, but we are talking about a low frequency of virtio commands
> > (60 fps for video playback typically) and that genericity comes with
> > the benefit of a single Linux guest driver.
> >
> > If there is an agreement to move forward with this, I guess the next
> > step for me will be to write a proper spec so the protocol can be
> > understood and discussed in detail. Then why not try and upstream the
> > kernel driver and make ChromeOS use this too in place of our
> > heavily-patched virtio-video. :) We might even make it for virtio 1.3.
> >
> > Looking forward to your feedback. Please don't hesitate to ask
> > questions, especially if you are not familiar with V4L2. I can also
> > help folks interested in running this with the setup if needed.
>
> Thank you for sharing your work! I think this looks very promising, and
> I'd like to hear feedback from others as well. I assume that would make
> the spec change more digestible than earlier versions.

The spec should indeed be considerably lighter. I'll wait for more
feedback, but if the concept appeals to other people as well, I may
give the spec a try soon.

Meanwhile I'll also try to add support for stateful decoders and
guest-allocated buffers to the prototype so it can be considered more
complete.

Cheers,
Alex.


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-14  5:06                       ` Alexandre Courbot
@ 2023-03-16 10:12                         ` Alexander Gordeev
  2023-03-17  7:24                           ` Alexandre Courbot
  2023-03-27 13:00                         ` Albert Esteve
  1 sibling, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-03-16 10:12 UTC (permalink / raw)
  To: Alexandre Courbot, Cornelia Huck
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

Hi Alexandre,

On 14.03.23 06:06, Alexandre Courbot wrote:
> The spec should indeed be considerably lighter. I'll wait for more
> feedback, but if the concept appeals to other people as well, I may
> give the spec a try soon.

Did you receive the email I sent on February 7? There was some feedback
there. It has already been established that V4L2 UAPI pass-through is
technically possible, but I had a couple of points about why it is not
desirable. Unfortunately I haven't received a reply, and I don't see
most of these points addressed in any of your subsequent emails either.

I have more to say now, but I'd like to make sure that you're interested
in the discussion first.


> Meanwhile I'll also try to add support for stateful decoders and
> guest-allocated buffers to the prototype so it can be considered more
> complete.
>
> Cheers,
> Alex.
>
--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-16 10:12                         ` Alexander Gordeev
@ 2023-03-17  7:24                           ` Alexandre Courbot
  2023-04-17 12:51                             ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-03-17  7:24 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

Hi Alexander,

On Thu, Mar 16, 2023 at 7:13 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> Hi Alexandre,
>
> On 14.03.23 06:06, Alexandre Courbot wrote:
> > The spec should indeed be considerably lighter. I'll wait for more
> > feedback, but if the concept appeals to other people as well, I may
> > give the spec a try soon.
>
> Did you receive an email I sent on February 7? There was some feedback
> there. It has been already established, that V4L2 UAPI pass-through is
> technically possible. But I had a couple of points why it is not
> desirable. Unfortunately I haven't received a reply. I also don't see
> most of these points addressed in any subsequent emails from you.
>
> I have more to say now, but I'd like to make sure that you're interested
> in the discussion first.

Sorry about that, I dived head first into the code to see how viable
the idea would be and forgot to come back to you. Let me try to answer
your points now that I have a better idea of how this would work.

> > If we find out that there is a benefit in going through the V4L2
> > subsystem (which I cannot see for now), rebuilding the UAPI structures
> > to communicate with the device is not different from building
> > virtio-video specific structures like what we are currently doing.
>
> Well, the V4L2 subsystem is there for a reason, right? It does some
> important things too. I'm going to check all the v4l2_ioctl_ops
> callbacks in the current virtio-video driver to make the list. Also if
> you have some PoC spec/implementations, that would be nice to review. It
> is always better to see the actual implementation, of course.
>
> I have these points so far:
>
> 1. Overall the V4L2 stateful decoder API looks significantly more
> complex to me. Looks like you're a V4L2 expert, so this might not be
> visible to you that much.

V4L2 is more generic than virtio-video, so specific uses tend to
require a few more operations. I would argue the mental overhead of
working with it is minor, and most of it consists of not forgetting to
call STREAMON on a queue after certain operations. Things like format,
resolution and buffer management do not get more complex (and V4L2 is
actually more complete than our previous proposal on these points).
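
For reference, here is a minimal sketch of the usual single-queue
capture start-up sequence (error handling omitted, format values are
just examples); a decoder simply repeats the same dance on its OUTPUT
queue:

/* Minimal sketch of starting streaming on a V4L2 capture queue. */
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static void start_capture(int fd)
{
	struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE };

	fmt.fmt.pix.width = 640;
	fmt.fmt.pix.height = 480;
	fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;
	ioctl(fd, VIDIOC_S_FMT, &fmt);            /* negotiate the format */

	struct v4l2_requestbuffers reqbufs = {
		.count = 4,
		.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
		.memory = V4L2_MEMORY_MMAP,
	};
	ioctl(fd, VIDIOC_REQBUFS, &reqbufs);      /* allocate buffers */

	for (unsigned int i = 0; i < reqbufs.count; i++) {
		struct v4l2_buffer buf = {
			.index = i,
			.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
			.memory = V4L2_MEMORY_MMAP,
		};
		ioctl(fd, VIDIOC_QBUF, &buf);     /* queue empty buffers */
	}

	enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	ioctl(fd, VIDIOC_STREAMON, &type);        /* the easy one to forget */
}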

The counterpart of this marginal extra complexity is that you can
virtualize more kinds of devices, and even within virtio-video support
more formats than what has been specified so far. If your guest is
Linux, the same kernel driver can be used to expose any kind of device
supported by V4L2, and the driver is also much simpler than
virtio-video, so you are actually reducing complexity significantly
here. Even if you are not Linux, you can share the V4L2 structures
definitions and low-layer code that sends V4L2 commands to the host
between drivers. So while it is true that some specifics become
slightly more complex, there is a lot of potential simplification when
you look at the whole picture.

It's an opinionated proposal, and it comes with a few compromises if
you are mostly interested in codecs alone. But looking at the guest
driver convinces me that this is the better approach when you look at
the whole picture.

>    a. So V4L2 subsystem and the current virtio-video driver are already
> reducing the complexity. And this seems as the right place to do this,
> because the complexity is caused by the amount of V4L2 use cases and its
> legacy. If somebody wants to use virtio-video in a Windows guest, they
> would prefer a simpler API, right? I think this use-case is not purely
> abstract at all.

The V4L2 subsystem is there to factor out code that can be shared
between drivers and to manage their internal state. Our target is the
V4L2 UAPI, so a Windows driver need not be concerned about these
details - it does what it would have done with virtio-video, and just
uses the V4L2 structures to communicate with the host instead of the
virtio-video ones.

>    b. Less complex API is better from a security point of view too. When
> V4L2 was developed, not many people were concerned with malicious USB
> devices probably. At least exploiting a malicious USB device usually
> requires physical access. With virtual devices and multiple VMs the
> stakes are higher, I believe.

That's probably true, but I fail to see how the fact that we are using
struct v4l2_buffer instead of struct virtio_video_buffer can have an
impact on that.

V4L2 has a larger UAPI surface because it manages more kinds of
devices, but drivers only need to implement the ioctls they need. For
the rest, they just return -ENOTTY, and evil actors are hopefully kept
at bay.
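
As a rough sketch (hypothetical code, not the actual prototype), the
device side can simply whitelist the handful of ioctls it implements
and reject everything else, mirroring what the kernel does:

/* Hypothetical host-side dispatch: only whitelisted ioctls are
 * handled, anything else is rejected with -ENOTTY. */
#include <errno.h>
#include <stdint.h>
#include <linux/videodev2.h>

/* Hypothetical per-ioctl handler; a real device would decode the
 * payload and act on it. */
static int do_supported_ioctl(uint32_t code, void *payload)
{
	(void)code;
	(void)payload;
	return 0;
}

static int handle_ioctl(uint32_t code, void *payload)
{
	switch (code) {
	case VIDIOC_QUERYCAP:
	case VIDIOC_ENUM_FMT:
	case VIDIOC_G_FMT:
	case VIDIOC_S_FMT:
	case VIDIOC_REQBUFS:
	case VIDIOC_QBUF:
	case VIDIOC_STREAMON:
	case VIDIOC_STREAMOFF:
		return do_supported_ioctl(code, payload);
	default:
		return -ENOTTY;  /* unimplemented ioctl, nothing to attack */
	}
}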

> 2. We have a working virtio-video driver. So we need very good reasons
> to start from scratch. You name two reasons AFAIR: simplicity and
> possible use of cameras. Did I miss something else?
>
>    a. The simplicity is there only in case all the interfaces are V4L2,
> both in the backend and in the guest. Otherwise the complexity is just
> moved to backends. I haven't seen V4L2 in our setups so far, only some
> proprietary OMX libraries. So from my point of view, this is not
> simplicity in general, but an optimization for a specific narrow use case.

V4L2 is not a narrow use-case when it comes to video devices on Linux
- basically every user space application involving cameras or codecs
can use it. Even the virtio-video driver exposes a V4L2 device, so
unless you are using a different driver and proprietary userspace apps
specifically written to interact with that driver, V4L2 is involved in
your setup at some point.

The guest driver that I wrote is, I think, a good example of the
complexity you can expect in terms of guest driver size (as it is
already pretty functional with its roughly 1,000 LoCs). As for the UAPI
complexity, the host device basically unpacks the information it needs
and rebuilds the V4L2 structures before calling into the actual host
device, and I don't see this process as more complex than the unpacking
of virtio-video structs which we also did in crosvm.
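
As an illustration (hypothetical code, not the actual crosvm
implementation), proxying a single ioctl essentially boils down to
copying the structure out of the descriptor, replaying the ioctl on the
host file descriptor, and copying the result back:

/* Hypothetical proxy for VIDIOC_S_FMT on the host side. */
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int proxy_s_fmt(int host_fd, void *payload, size_t len)
{
	struct v4l2_format fmt;

	if (len < sizeof(fmt))
		return -EINVAL;
	memcpy(&fmt, payload, sizeof(fmt));      /* unpack the guest structure */

	if (ioctl(host_fd, VIDIOC_S_FMT, &fmt))  /* replay it on /dev/videoX */
		return -errno;

	memcpy(payload, &fmt, sizeof(fmt));      /* return the updated copy */
	return 0;
}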

>    b. For modern cameras the V4L2 interface is not enough anyway. This
> was already discussed AFAIR. There is a separate virtio-camera
> specification, that indeed is based on V4L2 UAPI as you said. But
> combining these two specs is certainly not future proof, right? So I
> think it is best to let the virtio-camera spec to be developed
> independently.

I don't know if virtio-camera has made progress that they have not
published yet, but from what I have seen virtio-v4l2 can cover
everything that the currently published driver does (I could not find
a specification, but please point me to it if it exists), so there
would be no conflict to resolve.

V4L2 with requests support should be capable of handling complex
camera configurations, but the effort indeed seems to have switched to
KCAM when it comes to supporting complex cameras natively. That
being said:

* KCAM is not merged yet, is probably not going to be for some time
(https://lwn.net/Articles/904776/), and we don't know how we can
handle virtualization with it,
* The fact that the camera is complex on the host does not mean that
all that complexity needs to be exposed to the guest. I don't know how
the camera folks want to manage this, but one can imagine that the
host could expose a simpler model for the virtual camera, with only
the required knobs, while the host takes care of doing all the complex
configuration.
* The counter argument can be made that simple camera devices do not
need a complex virtualization solution, so one can also invoke
simplicity here to advocate for virtio-v4l2.

My point is not to say that all other camera virtualization efforts
should be abandoned - if indeed there is a need for something more
specific, then nothing prevents us from having a virtio-camera
specification added. However, we are nowhere close to this at the
moment, and right now there is no official solution for camera
virtualization, so I see no reason to deny the opportunity to support
simple camera devices since its cost would just be to add "and cameras
device" in the paragraph of the spec that explains what devices are
supported.

> 3. More specifically I can see, that around 95% V4L2 drivers use
> videobuf2. This includes the current virtio-video driver. Bypassing the
> V4L2 subsystem means that vb2 can't be used, right? In various
> discussions vb2 popped up as a thing, that would be hard to avoid. What
> do you think about this? How are you going to deal with various V4L2
> memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc), for example?
> I'll try to dive deeper myself too...

VB2 is entirely avoided in the current driver, but my understanding is
that its helpers could be used if needed.

In virtio-v4l2, MMAP means that the host is responsible for managing
the buffers, so vb2 is entirely avoided. USERPTR means the guest
passes an SG list of guest physical addresses as the buffer's backing
memory. VB2 may or may not be involved in managing this memory, but
most likely not if that memory comes from guest userspace. DMABUF
means the guest passes a virtio object as the backing memory of the
buffer. Here again, there is no particular management to be done on
the guest side.

I bypassed VB2 for the current driver, and the cost of doing this is
that I had to write my own mmap() function.
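
For illustration only, the backing memory of a queued buffer could be
described along these lines (a hypothetical layout, not the prototype's
actual wire format):

/* Hypothetical per-buffer memory description for virtio-v4l2. */
#include <stdint.h>

struct virtio_v4l2_sg_entry {
	uint64_t guest_addr;   /* guest physical address of the segment */
	uint32_t len;          /* length of the segment in bytes */
	uint32_t padding;
};

struct virtio_v4l2_buffer_memory {
	uint32_t memory;       /* V4L2_MEMORY_MMAP / _USERPTR / _DMABUF */
	uint32_t num_entries;  /* USERPTR: number of SG entries that follow */
	union {
		/* MMAP: nothing to describe, the host owns the backing
		 * memory and the guest maps it on demand. */
		uint64_t mmap_offset;
		/* DMABUF: UUID of a virtio object from another device. */
		uint8_t object_uuid[16];
	} u;
	/* USERPTR: followed by num_entries struct virtio_v4l2_sg_entry. */
};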

> >> 3. user-space <-> v4l2 subsystem API is already the same or very close
> >> to v4l2 subsystem <-> v4l2 driver API. I believe this is not the case
> >> even with stateful decoder/encoder. Even more with stateless decoders
> >> because I can see, that v4l2 subsystem actually stores some state in
> >> this case as well. Which is quite reasonable I think.
> > Actually I don't think this is even something we need to think about -
> > in its simplest form the V4L2 guest driver just needs to act as a
> > proxy for the device. So which decoder API is used by the host is
> > completely irrelevant to the guest driver - it can support a decoder,
> > an encoder, or a camera - it doesn't even need to be aware of what
> > kind of device it is exposing and that simplicity is another thing
> > that I like with this design.
>
> As I wrote above the design would be indeed simple only in case the
> actual hardware is exposed to a backend through V4L2 too. Otherwise the
> complexity is just moved to backends.

Yes, and while I acknowledge that, this is not really more complex
than what you would have to do with a virtio-video device, which also
needs to manage its own state and drive the hardware through backends.
I say that based on my experience working on the virtio-video device
in crosvm, which follows that design too.

> > This simplicity goes away if the guest device does not use V4L2 as its
> > user-space interface (e.g. Windows?). In this case we would be in the
> > exact same scenario as the current virtio-video spec, where we need to
> > build device-specific structures from the guest driver's internal
> > state.
>
> IMO this is not quite correct. The scenario would not be the same,
> because the V4L2 stateful decoder API is more complex in comparison to
> any virtio-video spec draft version. Probably it would be great to have
> a list of differences. I hope to find some time for this later...

There is not much difference between the V4L2 stateful decoder spec
and the virtio-video spec. Actually that's the very reason why I am
proposing to just virtualize V4L2: we were redoing the same thing.

I have quickly parsed the V4L2 decoder spec and here are the
differences I have found:

* VIDIOC_STREAMON needs to be called on both queues to start decoding.
* Frame crop is obtained using VIDIOC_G_SELECTION instead of being
available alongside the format parameter.
* Ending a drain requires sending the V4L2_DEC_CMD_START command and
calling VIDIOC_STREAMON again.
* Seeking is done by calling VIDIOC_STREAMOFF followed by
VIDIOC_STREAMON on the OUTPUT queue instead of having a dedicated
command.

... and that's basically it! Do we really need a new spec just to
smoothen these differences?
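
For instance, the seek and drain-restart sequences above boil down to
a couple of extra ioctl calls on the guest side (a sketch following the
stateful decoder documentation, error handling omitted):

#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Seek: flush the OUTPUT (bitstream) queue, then resume feeding it
 * from the new position. */
static void seek(int fd)
{
	enum v4l2_buf_type out = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;

	ioctl(fd, VIDIOC_STREAMOFF, &out);  /* drops queued bitstream buffers */
	ioctl(fd, VIDIOC_STREAMON, &out);   /* start feeding the seek target */
}

/* Resume decoding after a drain has completed; as noted above, the
 * CAPTURE queue may also need VIDIOC_STREAMON depending on how it was
 * stopped. */
static void restart_after_drain(int fd)
{
	struct v4l2_decoder_cmd cmd = { .cmd = V4L2_DEC_CMD_START };

	ioctl(fd, VIDIOC_DECODER_CMD, &cmd);
}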

I hope I have somehow addressed your points. The main point here is to
discuss whether the V4L2 UAPI is a suitable transport for guest/host
accelerated codec work, regardless of what the guest or host
ultimately uses as UAPI. The goal of the PoC is to demonstrate that
this is a viable solution. This PoC is largely simplified by the fact
that V4L2 is used all along the way, but this is irrelevant - yes,
actual devices will likely talk to other APIs and maintain more state,
like a virtio-video device would do. What I want to demonstrate is
that we can send encoding work and receive a valid stream, and that it
is not costly, and only marginally more complex than our virtio-video
spec attempts.

... and we can support cameras too, but that's just a convenient
side-effect, not the ultimate solution to the camera virtualization
problem (that's for the camera folks to decide).

Cheers,
Alex.


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-14  5:06                       ` Alexandre Courbot
  2023-03-16 10:12                         ` Alexander Gordeev
@ 2023-03-27 13:00                         ` Albert Esteve
  2023-04-15  5:58                           ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Albert Esteve @ 2023-03-27 13:00 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra


On Tue, Mar 14, 2023 at 6:06 AM Alexandre Courbot <acourbot@chromium.org>
wrote:

> Hi Cornelia,
>
> On Fri, Mar 10, 2023 at 11:20 PM Cornelia Huck <cohuck@redhat.com> wrote:
> >
> > On Fri, Mar 10 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
> >
> > > Hi Cornelia,
> > >
> > > On Fri, Mar 10, 2023 at 7:51 PM Cornelia Huck <cohuck@redhat.com>
> wrote:
> > >>
> > >> On Tue, Feb 07 2023, Cornelia Huck <cohuck@redhat.com> wrote:
> > >>
> > >> > On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org>
> wrote:
> > >> >
> > >> >> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com>
> wrote:
> > >> >>> I hope we can sort this out soon -- I guess I'm not the only one
> who is
> > >> >>> anxious about this spec moving forward :) Please let me know if I
> can
> > >> >>> help in any way.
> > >> >>
> > >> >> I'll try to address Alexander's points in more detail, but I am not
> > >> >> seeing any blocking issue with using the V4L2 UAPI as the basis for
> > >> >> virtio-video (we are working on a small proof-of-concept and things
> > >> >> are going smoothly so far).
> > >> >
> > >> > Great to hear, looking forward to it!
> > >>
> > >> Quick question: Is there any git repo or similar where interested
> > >> parties can follow along? It would be great to have virtio-video in
> 1.3;
> > >> if you have some idea on when it might be ready, we could come up
> with a
> > >> schedule to accommodate that.
> > >
> > > I'm glad you asked, as a matter of fact I have just finished the
> > > virtio-v4l2 proof of concept today! It is capable of exposing a camera
> > > or encoder V4L2 device from the host to the guest, by encapsulating
> > > V4L2 commands into virtio.
> >
> > \o/ Excellent news!
>
> I am delighted that you seem to like it!
>
> >
> > >
> > > The guest driver code (single file for simplicity):
> > >
> https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c
> > >
> > > Bulk of the host-side crosvm device code:
> > >
> https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/protocol.rs
> > >
> https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs
> > >
> > > Neither are works of art, so please forgive the few inefficiencies
> > > here and there - the goal was to make them easy to understand. Still,
> > > the guest driver is probably closer to what a final driver would look
> > > like. It fits in around 1,000 LoCs (comments excluded), which is
> > > enough to support stateful video encoders as well as USB camera
> > > devices. Decoders cannot be run yet because they require support for
> > > V4L2 events and polling - I will try to enable these features next.
> > > But even in its current state this driver shows one interesting aspect
> > > of virtio-v4l2, at least for Linux guests: a single and relatively
> > > simple driver is able to drive a wide range of devices.
> >
> > I had a quick look at the driver; it indeed looks like a big win on
> > Linux systems. (The one thing I'm missing is how easy it would be to
> > replicate the used v4l2 parts on non-Linux systems.)
>
> For non-Linux systems (host or guest, we may need to copy/paste or
> reproduce the UAPI structures. Thankfully they are unambiguously
> described in the UAPI documentation, see for example
> https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/vidioc-enum-fmt.html
> for the formats structure.
>
> >
> > >
> > > The crosvm device code proxies a V4L2 device on the host, again using
> > > roughly 1,200 lines of non-comment code. This design does not intend
> > > to reflect what an actual host device will look like - in effect they
> > > should be much more specialized since they are unlikely to also call
> > > into V4L2 on the host side. However, if the host is Linux and we just
> > > want to expose a USB camera or other V4L2 device almost as-is, then
> > > this could actually be a good fit.
> > >
> > > The protocol should be easy to understand by looking at the code - we
> > > only have 5 virtio commands to open/close a session, map/unmap a
> > > host-allocated buffer into the guest PAS, and the IOCTL command which
> > > sends V4L2 ioctl structures to the host and waits for its reply. All
> > > ioctls are synchronous per-session, meaning that a session only sends
> > > one ioctl at a time and waits for its response before it can send the
> > > next (as this is what user-space does too). Ioctls, however, never
> > > block on the host side and the ones that would do (DQBUF and DQEVENT)
> > > are replaced by host-initiated events. On top of being familiar to
> > > people who have worked with V4L2 (i.e. a large portion of the media
> > > folks), this simple design seems to be efficient as I have observed
> > > identical performance on both host and guest with the vicodec virtual
> > > encoder. Since this device generates frames using the CPU and keeps
> > > one core 100% busy, any overhead introduced by virtualization should
> > > be noticeable - yet I got nearly identical framerates on both host and
> > > guest.
> >
> > I haven't worked with v4l2, but this approach sounds reasonable to me.
> >
> > >
> > > Things that still need to be implemented before this can be considered
> > > more complete:
> > >
> > > * Controls. This should not be particularly difficult but I left it
> > > for now as they are not necessary to demonstrate the viability of this
> > > project.
> > > * Guest-allocated memory buffers and virtio objects. Based on our
> > > previous experience with virtio-video these should not be difficult to
> > > implement. Currently all video buffers are allocated by the host, and
> > > mapped into the guest if needed.
> > > * Events and polling, required to use a decoder. Again these were not
> > > strictly necessary for the proof of concept, but since we've gone this
> > > far I will try to get them to work as the next step.
> > > * Requests and multi-part media devices. This will be necessary in
> > > order to support more modern camera pipelines. I haven't made up my
> > > mind yet about whether we should support this, but if we want to it
> > > should not be too hard (describe several devices in the configuration
> > > space and enable the request-related commands). I need to talk to
> > > camera folks to know whether there is an actual interest in this.
> > > * Support for more ioctls, in case we want to support tuners and
> radios.
> > >
> > > If you want to try this code, you need to build the guest kernel with
> > > CONFIG_VIRTIO_V4L2 and enable the `v4l2` feature when building crosvm
> > > (check out the Book of Crosvm if you need instructions on how to build
> > > and use it). Then pass --virtio-v4l2=/dev/videoX to crosvm in order to
> > > expose the /dev/videoX host V4L2 device to the guest.
> > >
> > > I have successfully captured frames (and verified their validity)
> > > using the following devices:
> > >
> > > * A simple USB camera using the `uvcvideo` driver. Both host and guest
> > > could capture a MJPEG stream with the following command:
> > > v4l2-ctl -d0 -v pixelformat=MJPG --stream-mmap --stream-to=test.mjpg
> > >
> > > * The vivid virtual camera driver. I could capture a valid YUV stream
> > > using the following command:
> > > v4l2-ctl -d0 -v pixelformat=NV12 --stream-mmap --stream-to test.yuv
> > >
> > > * The encoder device of the vicodec virtual codec driver. On both host
> > > and guest, the following command produces a valid FWHT stream in
> > > `test.fwht`:
> > > v4l2-ctl -x pixelformat=NV12 --stream-mmap --stream-out-mmap
> > > --stream-to-hdr test.fwht
> >
> > This looks very good already.
>
> Turns out that ffmpeg can also be used to capture video and encode it
> in a more convenient format, e.g:
>
> ffmpeg -f v4l2 -i /dev/video0 output.mkv
>
> This is a very good sign with respect to compliance with V4L2
> protocols. We had a very hard time getting the original virtio-video
> to be used with common tools and passing the V4L2 compliance test in
> general, but with this approach it seems like it will be much simpler
> to achieve.
>

I was curious, and I have tried the prototype. I was pleasantly
surprised at how nicely it behaves in this state. Really nice work.

I have managed to expose different host devices to the guest and run
compliance tests, capture frames from a camera, and encode by doing:

# v4l2-ctl -d0 -x width=640,height=480 -v width=640,height=480,pixelformat=FWHT \
                --stream-mmap --stream-out-mmap \
                --stream-from in-640-480.YU12 --stream-to output.fwht

Some results for the compliance tests using vicodec:
*  Encoder device
    * Host
Total for vicodec device /dev/video4: 46, Succeeded: 45, Failed: 1, Warnings: 0
    * Guest
Total for virtio_v4l2 device /dev/video0: 45, Succeeded: 34, Failed: 11, Warnings: 0

*  Decoder device
    * Host
Total for vicodec device /dev/video5: 46, Succeeded: 42, Failed: 4, Warnings: 0
    * Guest
Total for virtio_v4l2 device /dev/video0: 45, Succeeded: 36, Failed: 9, Warnings: 0


>
> >
> > >
> > > By this work I hope to demonstrate to people interested in video
> > > virtualization that encapsulating V4L2 in virtio is not only a viable
> > > solution, it is a huge shortcut in terms of specification crafting,
> > > driver writing, and overall headaches involved in specifying something
> > > as complex as a video device. Not only could we support video decoders
> > > and encoders, which was the goal of virtio-video, we would also get
> > > image processors, video overlays and simple cameras for free, and
> > > potentially more complex cameras if we decide to.
> > >
> > > After writing this prototype (and a couple attempts at the
> > > virtio-video specification) I don't see any reason not to rely on a
> > > battle-tested protocol instead of designing our own that does
> > > basically the same thing. The genericity of V4L2 may mean that
> > > sometimes we will need 2 commands where virtio-video would require
> > > only one, but we are talking about a low frequency of virtio commands
> > > (60 fps for video playback typically) and that genericity comes with
> > > the benefit of a single Linux guest driver.
> > >
> > > If there is an agreement to move forward with this, I guess the next
> > > step for me will be to write a proper spec so the protocol can be
> > > understood and discussed in detail. Then why not try and upstream the
> > > kernel driver and make ChromeOS use this too in place of our
> > > heavily-patched virtio-video. :) We might even make it for virtio 1.3.
> > >
> > > Looking forward to your feedback. Please don't hesitate to ask
> > > questions, especially if you are not familiar with V4L2. I can also
> > > help folks interested in running this with the setup if needed.
> >
> > Thank you for sharing your work! I think this looks very promising, and
> > I'd like to hear feedback from others as well. I assume that would make
> > the spec change more digestible than earlier versions.
>
> The spec should indeed be considerably lighter. I'll wait for more
> feedback, but if the concept appeals to other people as well, I may
> give the spec a try soon.
>
>
From my perspective, I think we gain a simpler backend, a simpler
driver, and potential support for more devices for free. It may
not cover all use cases, but that has already been discussed in this
thread. For those cases, virtio-camera will still be a possibility.

FWIW, I think this is a great step in the right direction, as it will
make both drivers and devices easier to implement and debug.

BR,
Albert


> Meanwhile I'll also try to add support for stateful decoders and
> guest-allocated buffers to the prototype so it can be considered more
> complete.
>
> Cheers,
> Alex.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
>


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-27 13:00                         ` Albert Esteve
@ 2023-04-15  5:58                           ` Alexandre Courbot
  2023-04-17 12:56                             ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-15  5:58 UTC (permalink / raw)
  To: Albert Esteve
  Cc: Cornelia Huck, Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra

Thanks for the feedback Albert!

On Mon, Mar 27, 2023 at 10:00 PM Albert Esteve <aesteve@redhat.com> wrote:
>
>
>
>
> On Tue, Mar 14, 2023 at 6:06 AM Alexandre Courbot <acourbot@chromium.org> wrote:
>>
>> Hi Cornelia,
>>
>> On Fri, Mar 10, 2023 at 11:20 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> >
>> > On Fri, Mar 10 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>> >
>> > > Hi Cornelia,
>> > >
>> > > On Fri, Mar 10, 2023 at 7:51 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> > >>
>> > >> On Tue, Feb 07 2023, Cornelia Huck <cohuck@redhat.com> wrote:
>> > >>
>> > >> > On Tue, Feb 07 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>> > >> >
>> > >> >> On Mon, Feb 6, 2023 at 11:13 PM Cornelia Huck <cohuck@redhat.com> wrote:
>> > >> >>> I hope we can sort this out soon -- I guess I'm not the only one who is
>> > >> >>> anxious about this spec moving forward :) Please let me know if I can
>> > >> >>> help in any way.
>> > >> >>
>> > >> >> I'll try to address Alexander's points in more detail, but I am not
>> > >> >> seeing any blocking issue with using the V4L2 UAPI as the basis for
>> > >> >> virtio-video (we are working on a small proof-of-concept and things
>> > >> >> are going smoothly so far).
>> > >> >
>> > >> > Great to hear, looking forward to it!
>> > >>
>> > >> Quick question: Is there any git repo or similar where interested
>> > >> parties can follow along? It would be great to have virtio-video in 1.3;
>> > >> if you have some idea on when it might be ready, we could come up with a
>> > >> schedule to accommodate that.
>> > >
>> > > I'm glad you asked, as a matter of fact I have just finished the
>> > > virtio-v4l2 proof of concept today! It is capable of exposing a camera
>> > > or encoder V4L2 device from the host to the guest, by encapsulating
>> > > V4L2 commands into virtio.
>> >
>> > \o/ Excellent news!
>>
>> I am delighted that you seem to like it!
>>
>> >
>> > >
>> > > The guest driver code (single file for simplicity):
>> > > https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c
>> > >
>> > > Bulk of the host-side crosvm device code:
>> > > https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/protocol.rs
>> > > https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs
>> > >
>> > > Neither are works of art, so please forgive the few inefficiencies
>> > > here and there - the goal was to make them easy to understand. Still,
>> > > the guest driver is probably closer to what a final driver would look
>> > > like. It fits in around 1,000 LoCs (comments excluded), which is
>> > > enough to support stateful video encoders as well as USB camera
>> > > devices. Decoders cannot be run yet because they require support for
>> > > V4L2 events and polling - I will try to enable these features next.
>> > > But even in its current state this driver shows one interesting aspect
>> > > of virtio-v4l2, at least for Linux guests: a single and relatively
>> > > simple driver is able to drive a wide range of devices.
>> >
>> > I had a quick look at the driver; it indeed looks like a big win on
>> > Linux systems. (The one thing I'm missing is how easy it would be to
>> > replicate the used v4l2 parts on non-Linux systems.)
>>
>> For non-Linux systems (host or guest, we may need to copy/paste or
>> reproduce the UAPI structures. Thankfully they are unambiguously
>> described in the UAPI documentation, see for example
>> https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/vidioc-enum-fmt.html
>> for the formats structure.
>>
>> >
>> > >
>> > > The crosvm device code proxies a V4L2 device on the host, again using
>> > > roughly 1,200 lines of non-comment code. This design does not intend
>> > > to reflect what an actual host device will look like - in effect they
>> > > should be much more specialized since they are unlikely to also call
>> > > into V4L2 on the host side. However, if the host is Linux and we just
>> > > want to expose a USB camera or other V4L2 device almost as-is, then
>> > > this could actually be a good fit.
>> > >
>> > > The protocol should be easy to understand by looking at the code - we
>> > > only have 5 virtio commands to open/close a session, map/unmap a
>> > > host-allocated buffer into the guest PAS, and the IOCTL command which
>> > > sends V4L2 ioctl structures to the host and waits for its reply. All
>> > > ioctls are synchronous per-session, meaning that a session only sends
>> > > one ioctl at a time and waits for its response before it can send the
>> > > next (as this is what user-space does too). Ioctls, however, never
>> > > block on the host side and the ones that would do (DQBUF and DQEVENT)
>> > > are replaced by host-initiated events. On top of being familiar to
>> > > people who have worked with V4L2 (i.e. a large portion of the media
>> > > folks), this simple design seems to be efficient as I have observed
>> > > identical performance on both host and guest with the vicodec virtual
>> > > encoder. Since this device generates frames using the CPU and keeps
>> > > one core 100% busy, any overhead introduced by virtualization should
>> > > be noticeable - yet I got nearly identical framerates on both host and
>> > > guest.
>> >
>> > I haven't worked with v4l2, but this approach sounds reasonable to me.
>> >
>> > >
>> > > Things that still need to be implemented before this can be considered
>> > > more complete:
>> > >
>> > > * Controls. This should not be particularly difficult but I left it
>> > > for now as they are not necessary to demonstrate the viability of this
>> > > project.
>> > > * Guest-allocated memory buffers and virtio objects. Based on our
>> > > previous experience with virtio-video these should not be difficult to
>> > > implement. Currently all video buffers are allocated by the host, and
>> > > mapped into the guest if needed.
>> > > * Events and polling, required to use a decoder. Again these were not
>> > > strictly necessary for the proof of concept, but since we've gone this
>> > > far I will try to get them to work as the next step.
>> > > * Requests and multi-part media devices. This will be necessary in
>> > > order to support more modern camera pipelines. I haven't made up my
>> > > mind yet about whether we should support this, but if we want to it
>> > > should not be too hard (describe several devices in the configuration
>> > > space and enable the request-related commands). I need to talk to
>> > > camera folks to know whether there is an actual interest in this.
>> > > * Support for more ioctls, in case we want to support tuners and radios.
>> > >
>> > > If you want to try this code, you need to build the guest kernel with
>> > > CONFIG_VIRTIO_V4L2 and enable the `v4l2` feature when building crosvm
>> > > (check out the Book of Crosvm if you need instructions on how to build
>> > > and use it). Then pass --virtio-v4l2=/dev/videoX to crosvm in order to
>> > > expose the /dev/videoX host V4L2 device to the guest.
>> > >
>> > > I have successfully captured frames (and verified their validity)
>> > > using the following devices:
>> > >
>> > > * A simple USB camera using the `uvcvideo` driver. Both host and guest
>> > > could capture a MJPEG stream with the following command:
>> > > v4l2-ctl -d0 -v pixelformat=MJPG --stream-mmap --stream-to=test.mjpg
>> > >
>> > > * The vivid virtual camera driver. I could capture a valid YUV stream
>> > > using the following command:
>> > > v4l2-ctl -d0 -v pixelformat=NV12 --stream-mmap --stream-to test.yuv
>> > >
>> > > * The encoder device of the vicodec virtual codec driver. On both host
>> > > and guest, the following command produces a valid FWHT stream in
>> > > `test.fwht`:
>> > > v4l2-ctl -x pixelformat=NV12 --stream-mmap --stream-out-mmap
>> > > --stream-to-hdr test.fwht
>> >
>> > This looks very good already.
>>
>> Turns out that ffmpeg can also be used to capture video and encode it
>> in a more convenient format, e.g:
>>
>> ffmpeg -f v4l2 -i /dev/video0 output.mkv
>>
>> This is a very good sign with respect to compliance with V4L2
>> protocols. We had a very hard time getting the original virtio-video
>> to be used with common tools and passing the V4L2 compliance test in
>> general, but with this approach it seems like it will be much simpler
>> to achieve.
>
>
> I was curious, and I have tried the prototype. I was gladly surprised at how nice
> it behaves in this state. Really nice work.
>
> I have managed to expose different devices to the host and run compliance tests,
> capture frames from a camera, and encode by doing:
>
> # v4l2-ctl -d0 -x width=640,height=480 -v width=640,height=480,pixelformat=FWHT \
>                 --stream-mmap --stream-out-mmap \
>                 --stream-from in-640-480.YU12 --stream-to output.fwht

Just a small update: I have updated the device and driver to handle
V4L2 events and polling, and the vicodec decoder is now also working.

Instructions are mostly unchanged:

* Insert the vicodec module on your host (sudo modprobe vicodec multiplanar=1)
* Check that it is working properly:
** Encode 30 frames in FWHT format:
v4l2-ctl -d/dev/video0 -x pixelformat=NV12 --stream-mmap
--stream-out-mmap --stream-to-hdr out.fwht --stream-count 30
** Decode them back to NV12:
v4l2-ctl -d/dev/video1 -v pixelformat=NV12 --stream-mmap
--stream-out-mmap --stream-from-hdr out.fwht --stream-to out.nv12
(You can check the result with YUView, frame size is 1280x720).
* Get the kernel at https://github.com/Gnurou/linux/tree/virtio-v4l2
and build with CONFIG_VIRTIO_V4L2,
* Get crosvm at https://github.com/Gnurou/crosvm/tree/virtio-v4l2 and
run it with the "--virtio-v4l2=/dev/video0 --virtio-v4l2=/dev/video1"
options to virtualize the encoder and decoder devices,
* Run the same commands as in the second step, but in the guest; you
should get the exact same result as when they were run on the host.

This demonstrates that a decoder device can be virtualized quite
simply with this solution. Next, let me enable controls (so
v4l2-compliance passes in the guest) and the remaining buffer memory
types, and we should have something rather complete, at least on the
guest side.

I have also chatted a bit more with a few camera folks at Google, and
they agreed that V4L2 with requests should be enough to support even
modern camera pipelines. That is, the whole complexity of the host
pipeline would not, and should not, be exposed to the guest in its
entirety - only a simpler model that makes sense for what the camera
is used for, and for that V4L2 is more than enough. So I now have
stronger confidence that this will be suitable for the camera use case
as well.

If nobody strongly objects, I think this can be pushed a bit more
officially. Cornelia, would you consider it for inclusion if I
switched the next version of the specification to use V4L2 as the
host/guest protocol? This may take some more time as I want to confirm
the last details with code, but it should definitely be faster to
merge and to test with a real implementation than our previous
virtio-video attempts.

Cheers,
Alex.




>
> Some results for the compliance tests using vicodec:
> *  Enconder device
>     * Host
> Total for vicodec device /dev/video4: 46, Succeeded: 45, Failed: 1, Warnings: 0
>     * Guest
> Total for virtio_v4l2 device /dev/video0: 45, Succeeded: 34, Failed: 11, Warnings: 0
>
> *  Decoder device
>     * Host
> Total for vicodec device /dev/video5: 46, Succeeded: 42, Failed: 4, Warnings: 0
>     * Guest
> Total for virtio_v4l2 device /dev/video0: 45, Succeeded: 36, Failed: 9, Warnings: 0
>
>>
>>
>> >
>> > >
>> > > By this work I hope to demonstrate to people interested in video
>> > > virtualization that encapsulating V4L2 in virtio is not only a viable
>> > > solution, it is a huge shortcut in terms of specification crafting,
>> > > driver writing, and overall headaches involved in specifying something
>> > > as complex as a video device. Not only could we support video decoders
>> > > and encoders, which was the goal of virtio-video, we would also get
>> > > image processors, video overlays and simple cameras for free, and
>> > > potentially more complex cameras if we decide to.
>> > >
>> > > After writing this prototype (and a couple attempts at the
>> > > virtio-video specification) I don't see any reason not to rely on a
>> > > battle-tested protocol instead of designing our own that does
>> > > basically the same thing. The genericity of V4L2 may mean that
>> > > sometimes we will need 2 commands where virtio-video would require
>> > > only one, but we are talking about a low frequency of virtio commands
>> > > (60 fps for video playback typically) and that genericity comes with
>> > > the benefit of a single Linux guest driver.
>> > >
>> > > If there is an agreement to move forward with this, I guess the next
>> > > step for me will be to write a proper spec so the protocol can be
>> > > understood and discussed in detail. Then why not try and upstream the
>> > > kernel driver and make ChromeOS use this too in place of our
>> > > heavily-patched virtio-video. :) We might even make it for virtio 1.3.
>> > >
>> > > Looking forward to your feedback. Please don't hesitate to ask
>> > > questions, especially if you are not familiar with V4L2. I can also
>> > > help folks interested in running this with the setup if needed.
>> >
>> > Thank you for sharing your work! I think this looks very promising, and
>> > I'd like to hear feedback from others as well. I assume that would make
>> > the spec change more digestible than earlier versions.
>>
>> The spec should indeed be considerably lighter. I'll wait for more
>> feedback, but if the concept appeals to other people as well, I may
>> give the spec a try soon.
>>
>
> From my perspective, I think we gain a simpler backend, simpler
> driver, and potential support for more devices for free. It may
> not cover all usecases, but that has already been discussed in this
> thread. For that purpose, virtio-camera will still be a possibility.
>
> FWIW, I think this is a great step in the right direction, as it will make both
> drivers and devices easier to implement and debug.
>
> BR,
> Albert
>
>>
>> Meanwhile I'll also try to add support for stateful decoders and
>> guest-allocated buffers to the prototype so it can be considered more
>> complete.
>>
>> Cheers,
>> Alex.
>>


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-03-17  7:24                           ` Alexandre Courbot
@ 2023-04-17 12:51                             ` Alexander Gordeev
  2023-04-17 14:43                               ` Cornelia Huck
  2023-04-21  4:02                               ` Alexandre Courbot
  0 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-17 12:51 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

Hi Alexandre,

Thanks for your letter! Sorry, it took me some time to write an answer.

First of all I'd like to describe my perspective a little bit, because
it seems that in many cases we (and other people providing feedback)
simply have very different priorities and backgrounds.

OpenSynergy, the company that I work for, develops a proprietary
hypervisor called COQOS, mainly for the automotive and aerospace
domains. We have our own proprietary device implementations, but overall
our goal is to bring open standards into these quite closed domains, and
we're betting big on virtio. The idea is to run safety-critical
functions like a cockpit controller alongside multimedia stuff in
different VMs on the same physical board. Right now these run on
separate physical devices, so they already have maximum isolation, and
we're trying to make a single board equally safe. The benefits are
reduced cost and some additional features. Of course, we also need
features here, but at the same time security and ease of certification
are among the top of our priorities. Nobody wants cars or planes to have
security problems, right? Also, nobody really needs DVB and even more
exotic devices in cars and planes AFAIK.

For the above-mentioned reasons our COQOS hypervisor runs on bare metal.
Also, memory management for the guests is mostly static. It is possible
to set up, in advance, a shared memory region between a device and a
driver that is managed by the device, but mapping arbitrary host pages
on the fly is definitely not supported.

AFAIU crosvm is about making Chrome OS more secure by putting every app
in its own virtualized environment, right? Both the host and the guest
are Linux. In this case I totally understand why V4L2 UAPI pass-through
feels like the right move. I guess you'd like to make the switch to
virtualized apps as seamless as possible for your users: if they can't
use their DVBs anymore, they complain, and adding the virtualization
makes the whole thing more secure anyway. So I understand the desire to
make the range of supported devices as broad as possible. It is also
understandable that priorities are different with desktop
virtualization. Also, I'm not trying to diminish the great work that you
have done. It is just that, from my perspective and because of the
concerns mentioned above, this looks like a step in the wrong direction.
So I'm going to continue being a skeptic here, sorry.

Of course, I don't expect you to continue working on the old approach
now that you have put so much effort into the V4L2 UAPI pass-through.
So I think it is best to do the evolutionary changes within the scope of
the virtio-video device specification, and to create a new device
specification (virtio-v4l2?) for the revolutionary changes. Then I'd be
glad to continue the virtio-video development. In fact, I have already
started on draft v7 of the spec, incorporating the review comments. I
hope it will be ready for review soon.

I hope this approach will also help fix the misalignment between the
virtio-video spec and the driver development, as well as the V4L2
compliance issues with the driver. I believe those problems were caused
partly by poor communication between us and by the misalignment of our
development cycles, not by the driver's complexity.

So in my opinion it is OK to have different specs with overlapping
functionality for some time. My only concern is whether this would be
accepted by the community and the committee. How do things usually go
here: are features preferred and possible security issues tolerated, or
is it the other way around? Also, how acceptable are Linux-specific
protocols at all?

Also I still have concerns about memory management with V4L2 UAPI
pass-through. Please see below.

On 17.03.23 08:24, Alexandre Courbot wrote:
> Hi Alexander,
>
> On Thu, Mar 16, 2023 at 7:13 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>> Hi Alexandre,
>>
>> On 14.03.23 06:06, Alexandre Courbot wrote:
>>> The spec should indeed be considerably lighter. I'll wait for more
>>> feedback, but if the concept appeals to other people as well, I may
>>> give the spec a try soon.
>> Did you receive an email I sent on February 7? There was some feedback
>> there. It has been already established, that V4L2 UAPI pass-through is
>> technically possible. But I had a couple of points why it is not
>> desirable. Unfortunately I haven't received a reply. I also don't see
>> most of these points addressed in any subsequent emails from you.
>>
>> I have more to say now, but I'd like to make sure that you're interested
>> in the discussion first.
> Sorry about that, I dived head first into the code to see how viable
> the idea would be and forgot to come back to you. Let me try to answer
> your points now that I have a better idea of how this would work.
>
>>> If we find out that there is a benefit in going through the V4L2
>>> subsystem (which I cannot see for now), rebuilding the UAPI structures
>>> to communicate with the device is not different from building
>>> virtio-video specific structures like what we are currently doing.
>> Well, the V4L2 subsystem is there for a reason, right? It does some
>> important things too. I'm going to check all the v4l2_ioctl_ops
>> callbacks in the current virtio-video driver to make the list. Also if
>> you have some PoC spec/implementations, that would be nice to review. It
>> is always better to see the actual implementation, of course.
>>
>> I have these points so far:
>>
>> 1. Overall the V4L2 stateful decoder API looks significantly more
>> complex to me. Looks like you're a V4L2 expert, so this might not be
>> visible to you that much.
> V4L2 is more generic than virtio-video, so as a result specific uses
> tend to require a bit more operations. I would argue the mental
> overhead of working with it is less than significant, and most of it
> consists in not forgetting to call STREAMON on a queue after some
> operations. Things like format, resolution and buffer management do
> not get more complex (and V4L2 is actually more complete than our
> previous proposal on these).
>
> The counterpart of this marginal extra complexity is that you can
> virtualize more kinds of devices, and even within virtio-video support
> more formats than what has been specified so far. If your guest is
> Linux, the same kernel driver can be used to expose any kind of device
> supported by V4L2, and the driver is also much simpler than
> virtio-video, so you are actually reducing complexity significantly
> here. Even if you are not Linux, you can share the V4L2 structures
> definitions and low-layer code that sends V4L2 commands to the host
> between drivers. So while it is true that some specifics become
> slightly more complex, there is a lot of potential simplification when
> you look at the whole picture.
>
> It's an opinionated proposal, and it comes with a few compromises if
> you are mostly interested in codecs alone. But looking at the guest
> driver convinces me that this is the better approach when you look at
> the whole picture.

Sorry, I just see it differently, as I tried to describe above. The
problem is that we don't yet see the whole picture with the V4L2 UAPI
pass-through. I reviewed the code briefly. It is great that you have
already implemented the MMAP mode and host allocations, but I would
argue that this is the simplest case. Do you agree? Also, this mode of
operation is not supported in our hypervisor for the reasons mentioned
above, so in our case this PoC unfortunately doesn't prove anything yet.
I think the real complexity is yet to come.


>>     a. So V4L2 subsystem and the current virtio-video driver are already
>> reducing the complexity. And this seems as the right place to do this,
>> because the complexity is caused by the amount of V4L2 use cases and its
>> legacy. If somebody wants to use virtio-video in a Windows guest, they
>> would prefer a simpler API, right? I think this use-case is not purely
>> abstract at all.
> The V4L2 subsystem is there to factorize code that can be shared
> between drivers and manage their internal state. Our target is the
> V4L2 UAPI, so a Windows driver needs not be concerned about these
> details - it does what it would have done with virtio-video, and just
> uses the V4L2 structures to communicate with the host instead of the
> virtio-video ones.

It can also reuse the virtio-video structures. So I think that, despite
the ability to reuse the V4L2 structures, having to implement a
Linux-specific interface would still be the bigger pain.


>>     b. Less complex API is better from a security point of view too. When
>> V4L2 was developed, not many people were concerned with malicious USB
>> devices probably. At least exploiting a malicious USB device usually
>> requires physical access. With virtual devices and multiple VMs the
>> stakes are higher, I believe.
> That's probably true, but I fail to see how the fact we are using
> struct v4l2_buffer instead of struct virtio_video_buffer can have an
> impact on that?
>
> V4L2 has a larger UAPI surface because it manages more kinds of
> devices, but drivers only need to implement the ioctls they need. For
> the rest, they just return -ENOTTY, and evil actors are hopefully kept
> at bay.

Still, there are definitely more ways to do things wrong. It would be
harder to audit a larger API surface.


>> 2. We have a working virtio-video driver. So we need very good reasons
>> to start from scratch. You name two reasons AFAIR: simplicity and
>> possible use of cameras. Did I miss something else?
>>
>>     a. The simplicity is there only in case all the interfaces are V4L2,
>> both in the backend and in the guest. Otherwise the complexity is just
>> moved to backends. I haven't seen V4L2 in our setups so far, only some
>> proprietary OMX libraries. So from my point of view, this is not
>> simplicity in general, but an optimization for a specific narrow use case.
> V4L2 is not a narrow use-case when it comes to video devices on Linux
> - basically every user space application involving cameras or codecs
> can use it. Even the virtio-video driver exposes a V4L2 device, so
> unless you are using a different driver and proprietary userspace apps
> specifically written to interact with that driver, V4L2 is involved in
> your setup at some point.

Sorry, I mean it is a narrow use-case if we look at the other
possibilities:

1. Stateless V4L2 on the host.
2. Any other interface on the host.
3. Any guest other than Linux.

Our targets are several popular embedded SoCs. Unfortunately, we don't
have the luxury of simply having normal V4L2 devices there, and it
doesn't look like this is going to change.


> The guest driver that I wrote is, I think, a good example of the
> complexity you can expect in terms of guest driver size (as it is
> pretty functional already with its 1000 and some LoCs). For the UAPI
> complexity, the host device basically unpacks the information it needs
> and rebuilds the V4L2 structures before calling into the host device,
> and I don't see this process as more complex that the unpacking of
> virtio-video structs which we also did in crosvm.

Unfortunately, our hypervisor doesn't support mapping arbitrary host
pages into the guest. Statically allocated shared memory regions are
possible, but then we have to tell V4L2 to allocate buffers there, and
we'll need a region per virtual device. This is just very tedious and
inflexible. That's why we're mainly interested in having guest page
sharing in the virtio-video spec.
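
To make the constraint concrete: the best a guest driver could do on our
side is pick up a statically provisioned shared memory region and carve
every buffer out of it, along the lines of the sketch below. Only
virtio_get_shm_region() is an existing kernel helper; the shmid and the
function around it are made up:

  #include <linux/errno.h>
  #include <linux/types.h>
  #include <linux/virtio.h>
  #include <linux/virtio_config.h>

  /* Hypothetical shmid; a real spec would have to assign one. */
  #define VIRTIO_VIDEO_SHM_BUFFERS        0

  /* Look up the statically provisioned buffer pool of this device. */
  static int virtio_video_get_static_pool(struct virtio_device *vdev,
                                          u64 *base, u64 *len)
  {
          struct virtio_shm_region shm;

          if (!virtio_get_shm_region(vdev, &shm, VIRTIO_VIDEO_SHM_BUFFERS))
                  return -ENODEV;

          /* Every V4L2 buffer would have to come out of [addr, addr + len). */
          *base = shm.addr;
          *len = shm.len;
          return 0;
  }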


>>     b. For modern cameras the V4L2 interface is not enough anyway. This
>> was already discussed AFAIR. There is a separate virtio-camera
>> specification, that indeed is based on V4L2 UAPI as you said. But
>> combining these two specs is certainly not future proof, right? So I
>> think it is best to let the virtio-camera spec to be developed
>> independently.
> I don't know if virtio-camera has made progress that they have not
> published yet, but from what I have seen virtio-v4l2 can cover
> everything that the currently published driver does (I could not find
> a specification, but please point me to it if it exists), so there
> would be no conflict to resolve.
>
> V4L2 with requests support should be capable of handling complex
> camera configurations, but the effort indeed seems to have switched to
> KCAM when it comes to supporting complex native cameras natively. That
> being said:
>
> * KCAM is not merged yet, is probably not going to be for some time
> (https://lwn.net/Articles/904776/), and we don't know how we can
> handle virtualization with it,
> * The fact that the camera is complex on the host does not mean that
> all that complexity needs to be exposed to the guest. I don't know how
> the camera folks want to manage this, but one can imagine that the
> host could expose a simpler model for the virtual camera, with only
> the required knobs, while the host takes care of doing all the complex
> configuration.
> * The counter argument can be made that simple camera devices do not
> need a complex virtualization solution, so one can also invoke
> simplicity here to advocate for virtio-v4l2.
>
> My point is not to say that all other camera virtualization efforts
> should be abandoned - if indeed there is a need for something more
> specific, then nothing prevents us from having a virtio-camera
> specification added. However, we are nowhere close to this at the
> moment, and right now there is no official solution for camera
> virtualization, so I see no reason to deny the opportunity to support
> simple camera devices since its cost would just be to add "and cameras
> device" in the paragraph of the spec that explains what devices are
> supported.

Well, for the reasons described above it still seems perfectly fine to
me to have separate devices. OK, the argument that this approach also
seems more future-proof is not a strong one.


>> 3. More specifically I can see, that around 95% V4L2 drivers use
>> videobuf2. This includes the current virtio-video driver. Bypassing the
>> V4L2 subsystem means that vb2 can't be used, right? In various
>> discussions vb2 popped up as a thing, that would be hard to avoid. What
>> do you think about this? How are you going to deal with various V4L2
>> memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc), for example?
>> I'll try to dive deeper myself too...
> VB2 is entirely avoided in the current driver, but my understanding is
> that its helpers could be used if needed.
>
> In virtio-v4l2, MMAP means that the host is responsible for managing
> the buffers, so vb2 is entirely avoided. USERPTR means the guest
> passes a SG list of guest physical addresses as mapping memory. VB2
> may or may not be involved in managing this memory, but most likely
> not if that memory comes from the guest userspace. DMABUF means the
> guest passes a virtio object as the backing memory of the buffer.
> There again there is no particular management to be done on the guest
> side.
>
> I bypassed VB2 for the current driver, and the cost of doing this is
> that I had to write my own mmap() function.

The cost of it as of now is also that:

1. Only guest user-space applications that use V4L2_MEMORY_MMAP are
supported, AFAIU.
2. There is no flexibility to choose whatever way of memory management
the host and guest would like to use. Right now the guest user-space
application selects this.

The latter makes the solution much less flexible IMO. For example, this
won't work well with our hypervisor, and there might be other special
needs in other use-cases, like sharing those object UUIDs. This could
probably be handled by mapping, for example, V4L2_MEMORY_USERPTR to
guest-page sharing and V4L2_MEMORY_DMABUF to the UUIDs (which is not
quite correct IMHO). But that already means querying the device for the
supported sharing methods, rewriting the flow of V4L2 UAPI calls on the
fly, ensuring consistency, etc. This already looks hackish to me (see
the rough sketch below). Do you have a better plan? Also, this limits us
to only three methods, right? And what if there are more than three
methods in the future?

I think this inflexibility is a major problem with this approach.
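
Roughly, what I imagine the driver (or the device) would end up doing is
something like the sketch below. Only the V4L2_MEMORY_* constants are
real; the VIRTIO_V4L2_SHARING_* values and the function are invented for
illustration:

  #include <linux/errno.h>
  #include <linux/videodev2.h>

  /* Hypothetical sharing methods a virtio-v4l2 device could advertise. */
  enum virtio_v4l2_sharing {
          VIRTIO_V4L2_SHARING_HOST_ALLOC,         /* host-managed pages (MMAP today) */
          VIRTIO_V4L2_SHARING_GUEST_PAGES,        /* guest page SG list (USERPTR today) */
          VIRTIO_V4L2_SHARING_OBJECT_UUID,        /* virtio object UUID (DMABUF today) */
  };

  /* The existing V4L2 memory types leave room for only three mappings. */
  static int virtio_v4l2_pick_sharing(__u32 v4l2_memory)
  {
          switch (v4l2_memory) {
          case V4L2_MEMORY_MMAP:
                  return VIRTIO_V4L2_SHARING_HOST_ALLOC;
          case V4L2_MEMORY_USERPTR:
                  return VIRTIO_V4L2_SHARING_GUEST_PAGES;
          case V4L2_MEMORY_DMABUF:
                  return VIRTIO_V4L2_SHARING_OBJECT_UUID;
          default:
                  return -EINVAL;  /* a fourth method cannot be expressed */
          }
  }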


>>> Actually I don't think this is even something we need to think about -
>>> in its simplest form the V4L2 guest driver just needs to act as a
>>> proxy for the device. So which decoder API is used by the host is
>>> completely irrelevant to the guest driver - it can support a decoder,
>>> an encoder, or a camera - it doesn't even need to be aware of what
>>> kind of device it is exposing and that simplicity is another thing
>>> that I like with this design.
>> As I wrote above the design would be indeed simple only in case the
>> actual hardware is exposed to a backend through V4L2 too. Otherwise the
>> complexity is just moved to backends.
> Yes, and while I acknowledge that, this is not really more complex
> that what you would have to do with a virtio-video device which also
> needs to manage its own state and drive the hardware through backends.
> I say that based on the experience working on the virtio-video device
> in crosvm which follows that design too.

As I wrote above, we have a different use-case, and I see the current
state of virtio-video as a good common ground for different parties and
use-cases. Unfortunately, I don't see any upsides for our use-cases in
the V4L2 UAPI proposal, only downsides.


>>> This simplicity goes away if the guest device does not use V4L2 as its
>>> user-space interface (e.g. Windows?). In this case we would be in the
>>> exact same scenario as the current virtio-video spec, where we need to
>>> build device-specific structures from the guest driver's internal
>>> state.
>> IMO this is not quite correct. The scenario would not be not the same,
>> because the V4L2 stateful decoder API is more complex in comparison to
>> any virtio-video spec draft version. Probably it would be great to have
>> a list of differences. I hope to find some time for this later...
> There is not much difference between the V4L2 stateful decoder spec
> and the virtio-video spec. Actually that's the very reason why I am
> proposing to just virtualize V4L2, we were redoing the same thing.
>
> I have quickly parsed the V4L2 decoder spec and here are the
> differences I have found:
>
> * VIDIOC_STREAMON needs to be called on both queues to start decoding.
> * Frame crop is obtained using VIDIOC_G_SELECTION instead of being
> available alongside the format parameter.
> * End of drain requires to send the V4L2_DEC_CMD_START and call
> VIDIOC_STREAMON again.
> * Seeking is done by calling VIDIOC_STREAMOFF followed by
> VIDIOC_STREAMON on the OUTPUT queue instead of having a dedicated
> command.
>
> ... and that's basically it! Do we really need a new spec just to
> smoothen these differences?

If we look deeper there are more differences. I'm still preparing the
list. Sorry, it takes time.


> I hope I have somehow addressed your points. The main point here is to
> discuss whether the V4L2 UAPI is a suitable transport for guest/host
> accelerated codec work, regardless of what the guest or host
> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
> this is a viable solution. This PoC is largely simplified by the fact
> that V4L2 is used all along the way, but this is irrelevant - yes,
> actual devices will likely talk to other APIs and maintain more state,
> like a virtio-video device would do. What I want to demonstrate is
> that we can send encoding work and receive a valid stream, and that it
> is not costly, and only marginally more complex than our virtio-video
> spec attempts.
>
> ... and we can support cameras too, but that's just a convenient
> side-effect, not the ultimate solution to the camera virtualization
> problem (that's for the camera folks to decide).

Thanks for your answer!


>   Cheers,
> Alex.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-15  5:58                           ` Alexandre Courbot
@ 2023-04-17 12:56                             ` Cornelia Huck
  2023-04-17 13:13                               ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-04-17 12:56 UTC (permalink / raw)
  To: Alexandre Courbot, Albert Esteve
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra

On Sat, Apr 15 2023, Alexandre Courbot <acourbot@chromium.org> wrote:

> If nobody strongly objects, I think this can be pushed a bit more
> officially. Cornelia, would you consider it for inclusion if I
> switched the next version of the specification to use V4L2 as the
> host/guest protocol? This may take some more time as I want to confirm
> the last details with code, but it should definitely be faster to
> merge and to test with a real implementation than our previous
> virtio-video attempts.

Yes, please do post a new version of this spec; I agree that an existing
implementation is really helpful here.

[I have proposed July 1st as a "freeze" date for new features for 1.3,
with August 1st as an "everything must be in" date; I'd really like
virtio-video to be a part of 1.3, if possible :)]



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-17 12:56                             ` Cornelia Huck
@ 2023-04-17 13:13                               ` Alexander Gordeev
  2023-04-17 13:22                                 ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-17 13:13 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot, Albert Esteve
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra

Hello Cornelia,

On 17.04.23 14:56, Cornelia Huck wrote:
> On Sat, Apr 15 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>
>> If nobody strongly objects, I think this can be pushed a bit more
>> officially. Cornelia, would you consider it for inclusion if I
>> switched the next version of the specification to use V4L2 as the
>> host/guest protocol? This may take some more time as I want to confirm
>> the last details with code, but it should definitely be faster to
>> merge and to test with a real implementation than our previous
>> virtio-video attempts.
> Yes, please do post a new version of this spec; I agree that an existing
> implementation is really helpful here.
>
> [I have proposed July 1st as a "freeze" date for new features for 1.3,
> with August 1st as an "everything must be in" date; I'd really like
> virtio-video to be a part of 1.3, if possible :)]

I sent an email minutes ago with an alternative plan. I'd like to
volunteer to continue the evolutionary changes to virtio-video. I'm
already working on the v7 draft; I think it will be available next week.
Then I'll focus on making the driver V4L2 compliant. I think all of this
is achievable by July 1st, at least the spec part. I think the
revolutionary changes should go into a separate namespace; the rationale
is in the longer email. WDYT?

Kind regards,
Alexander Gordeev

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-17 13:13                               ` Alexander Gordeev
@ 2023-04-17 13:22                                 ` Cornelia Huck
  0 siblings, 0 replies; 97+ messages in thread
From: Cornelia Huck @ 2023-04-17 13:22 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot, Albert Esteve
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra

On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

> Hello Cornelia,
>
> On 17.04.23 14:56, Cornelia Huck wrote:
>> On Sat, Apr 15 2023, Alexandre Courbot <acourbot@chromium.org> wrote:
>>
>>> If nobody strongly objects, I think this can be pushed a bit more
>>> officially. Cornelia, would you consider it for inclusion if I
>>> switched the next version of the specification to use V4L2 as the
>>> host/guest protocol? This may take some more time as I want to confirm
>>> the last details with code, but it should definitely be faster to
>>> merge and to test with a real implementation than our previous
>>> virtio-video attempts.
>> Yes, please do post a new version of this spec; I agree that an existing
>> implementation is really helpful here.
>>
>> [I have proposed July 1st as a "freeze" date for new features for 1.3,
>> with August 1st as an "everything must be in" date; I'd really like
>> virtio-video to be a part of 1.3, if possible :)]
>
> I sent an email minutes ago with an alternative plan. I'd like to
> volunteer to continue the evolutionary changes to virtio-video. I'm
> already working on the v7 draft. I think it will be available next week.
> Then I'll focus on making the driver V4L2 compliant. I think all of this
> is achievable by July 1st. At least the spec part. I think the
> revolutionary changes should be in a separate namespace. The rationale
> is in the longer email. WDYT?

Seems our mails crossed mid-air... I'll go ahead and read your mail (and
probably answer there). My ultimate goal is to have a spec everyone is
happy with, regardless of how we arrive there.



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-17 12:51                             ` Alexander Gordeev
@ 2023-04-17 14:43                               ` Cornelia Huck
  2023-04-19  7:39                                 ` Alexander Gordeev
  2023-04-21  4:02                               ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-04-17 14:43 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

> Hi Alexandre,
>
> Thanks for you letter! Sorry, it took me some time to write an answer.
>
> First of all I'd like to describe my perspective a little bit because it
> seems, that in many cases we (and other people writing their feedbacks)
> simply have very different priorities and background.

Thank you for describing the environment you want to use this in; this
helps to understand the different use cases.

>
> OpenSynergy, the company that I work for, develops a proprietary
> hypervisor called COQOS mainly for automotive and aerospace domains. We
> have our proprietary device implementations, but overall our goal is to
> bring open standards into these quite closed domains and we're betting
> big on virtio. The idea is to run safety-critical functions like cockpit
> controller alongside with multimedia stuff in different VMs on the same
> physical board. Right now they have it on separate physical devices. So
> they already have maximum isolation. And we're trying to make this
> equally safe on a single board. The benefit is the reduced costs and
> some additional features. Of course, we also need features here, but at
> the same time security and ease of certification are among the top of
> our priorities. Nobody wants cars or planes to have security problems,
> right? Also nobody really needs DVB and even more exotic devices in cars
> and planes AFAIK.
>
> For the above mentioned reasons our COQOS hypervisor is running on bare
> metal. Also memory management for the guests is mostly static. It is
> possible to make a shared memory region between a device and a driver
> managed by device in advance. But definitely no mapping of random host
> pages on the fly is supported.
>
> AFAIU crosvm is about making Chrome OS more secure by putting every app
> in its own virtualized environment, right? Both the host and guest are
> linux. In this case I totally understand why V4L2 UAPI pass-through
> feels like a right move. I guess, you'd like to make the switch to
> virtualized apps as seemless as possible for your users. If they can't
> use their DVBs anymore, they complain. And adding the virtualization
> makes the whole thing more secure anyway. So I understand the desire to
> have the range of supported devices as broad as possible. It is also
> understandable that priorities are different with desktop
> virtualization. Also I'm not trying to diminish the great work, that you
> have done. It is just that from my perspective this looks like a step in
> the wrong direction because of the mentioned concerns. So I'm going to
> continue being a skeptic here, sorry.
>
> Of course, I don't expect that you continue working on the old approach
> now as you have put that many efforts into the V4L2 UAPI pass-through.
> So I think it is best to do the evolutionary changes in scope of virtio
> video device specification, and create a new device specification
> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> continue the virtio-video development. In fact I already started making
> draft v7 of the spec according to the comments. I hope it will be ready
> for review soon.
>
> I hope this approach will also help fix issues with virtio-video spec
> and driver development misalignment as well as V4L2 compliance issues
> with the driver. I believe the problems were caused partly by poor
> communication between us and by misalignment of our development cycles,
> not by the driver complexity.
>
> So in my opinion it is OK to have different specs with overlapping
> functionality for some time. My only concern is if this would be
> accepted by the community and the committee. How the things usually go
> here: preferring features and tolerating possible security issues or the
> other way around? Also how acceptable is having linux-specific protocols
> at all?

My main question is: What would be something that we can merge as a
spec, that would either cover the different use cases already, or that
could be easily extended to cover the use cases it does not handle
initially?

For example, can some of the features that would be useful in crosvm be
tucked behind some feature bit(s), so that the more restricted COQOS
hypervisor would simply not offer them? (Two feature bits covering two
different mechanisms, like the current approach and the v4l2 approach,
would also be good, as long as there's enough common ground between the
two.)

If a staged approach (adding features controlled by feature bits) is
possible, that would be my preferred way to do it.
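
Just to sketch what I mean (all the names here are invented for
illustration, nothing below comes from any spec draft):

  /* Invented feature bit names, for illustration only. */
  #define VIRTIO_VIDEO_F_CLASSIC_PROTO    0  /* current virtio-video command set */
  #define VIRTIO_VIDEO_F_V4L2_PROTO       1  /* V4L2 UAPI pass-through command set */
  #define VIRTIO_VIDEO_F_GUEST_PAGES      2  /* buffers backed by guest page SG lists */
  #define VIRTIO_VIDEO_F_HOST_ALLOC       3  /* buffers allocated and mapped by the host */

A restricted hypervisor could then offer, say, only CLASSIC_PROTO plus
GUEST_PAGES, while crosvm could offer V4L2_PROTO plus HOST_ALLOC, and a
driver would negotiate whatever subset it understands.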

Regarding the protocol: I think Linux-originating protocols (that can be
implemented on non-Linux setups) are fine, Linux-only protocols probably
not so much.

>
> Also I still have concerns about memory management with V4L2 UAPI
> pass-through. Please see below.
>
> On 17.03.23 08:24, Alexandre Courbot wrote:
>> Hi Alexander,
>>
>> On Thu, Mar 16, 2023 at 7:13 PM Alexander Gordeev
>> <alexander.gordeev@opensynergy.com> wrote:
>>> Hi Alexandre,
>>>
>>> On 14.03.23 06:06, Alexandre Courbot wrote:
>>>> The spec should indeed be considerably lighter. I'll wait for more
>>>> feedback, but if the concept appeals to other people as well, I may
>>>> give the spec a try soon.
>>> Did you receive an email I sent on February 7? There was some feedback
>>> there. It has been already established, that V4L2 UAPI pass-through is
>>> technically possible. But I had a couple of points why it is not
>>> desirable. Unfortunately I haven't received a reply. I also don't see
>>> most of these points addressed in any subsequent emails from you.
>>>
>>> I have more to say now, but I'd like to make sure that you're interested
>>> in the discussion first.
>> Sorry about that, I dived head first into the code to see how viable
>> the idea would be and forgot to come back to you. Let me try to answer
>> your points now that I have a better idea of how this would work.
>>
>>>> If we find out that there is a benefit in going through the V4L2
>>>> subsystem (which I cannot see for now), rebuilding the UAPI structures
>>>> to communicate with the device is not different from building
>>>> virtio-video specific structures like what we are currently doing.
>>> Well, the V4L2 subsystem is there for a reason, right? It does some
>>> important things too. I'm going to check all the v4l2_ioctl_ops
>>> callbacks in the current virtio-video driver to make the list. Also if
>>> you have some PoC spec/implementations, that would be nice to review. It
>>> is always better to see the actual implementation, of course.
>>>
>>> I have these points so far:
>>>
>>> 1. Overall the V4L2 stateful decoder API looks significantly more
>>> complex to me. Looks like you're a V4L2 expert, so this might not be
>>> visible to you that much.
>> V4L2 is more generic than virtio-video, so as a result specific uses
>> tend to require a bit more operations. I would argue the mental
>> overhead of working with it is less than significant, and most of it
>> consists in not forgetting to call STREAMON on a queue after some
>> operations. Things like format, resolution and buffer management do
>> not get more complex (and V4L2 is actually more complete than our
>> previous proposal on these).
>>
>> The counterpart of this marginal extra complexity is that you can
>> virtualize more kinds of devices, and even within virtio-video support
>> more formats than what has been specified so far. If your guest is
>> Linux, the same kernel driver can be used to expose any kind of device
>> supported by V4L2, and the driver is also much simpler than
>> virtio-video, so you are actually reducing complexity significantly
>> here. Even if you are not Linux, you can share the V4L2 structures
>> definitions and low-layer code that sends V4L2 commands to the host
>> between drivers. So while it is true that some specifics become
>> slightly more complex, there is a lot of potential simplification when
>> you look at the whole picture.
>>
>> It's an opinionated proposal, and it comes with a few compromises if
>> you are mostly interested in codecs alone. But looking at the guest
>> driver convinces me that this is the better approach when you look at
>> the whole picture.
>
> Sorry, I just see it differently as I tried to describe above. The
> problem is that we don't yet see the whole picture with the V4L2 UAPI
> pass-through. I reviewed the code briefly. It is great, that you already
> implemented the MMAP mode and host allocations already. But I would
> argue, that this is the simplest case. Do you agree? Also this mode of
> operation is not supported in our hypervisor for reasons mentioned
> above. So in our case this PoC doesn't yet prove anything unfortunately.
> I think the real complexity is yet to come.
>
>
>>>     a. So V4L2 subsystem and the current virtio-video driver are already
>>> reducing the complexity. And this seems as the right place to do this,
>>> because the complexity is caused by the amount of V4L2 use cases and its
>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
>>> would prefer a simpler API, right? I think this use-case is not purely
>>> abstract at all.
>> The V4L2 subsystem is there to factorize code that can be shared
>> between drivers and manage their internal state. Our target is the
>> V4L2 UAPI, so a Windows driver needs not be concerned about these
>> details - it does what it would have done with virtio-video, and just
>> uses the V4L2 structures to communicate with the host instead of the
>> virtio-video ones.
>
> It can also reuse the virtio-video structures. So I think despite the
> ability to reuse V4L2 structures, having to implement a linux-specific
> interface would still be a bigger pain.

Hm. Do the v4l2 structures drag in too many adjacent things that need to
be implemented? Can we match the virtio-video structures from the
current proposal with some v4l2 structures and extract a common wrapper
for those that match, with a feature-bit controlled backend? It would be
fine if either of those backends supported a slightly different subset
of the common parts, as long as the parts implemented by both would be
enough to implement a working device. (Mostly thinking out loud here.)

>
>
>>>     b. Less complex API is better from a security point of view too. When
>>> V4L2 was developed, not many people were concerned with malicious USB
>>> devices probably. At least exploiting a malicious USB device usually
>>> requires physical access. With virtual devices and multiple VMs the
>>> stakes are higher, I believe.
>> That's probably true, but I fail to see how the fact we are using
>> struct v4l2_buffer instead of struct virtio_video_buffer can have an
>> impact on that?
>>
>> V4L2 has a larger UAPI surface because it manages more kinds of
>> devices, but drivers only need to implement the ioctls they need. For
>> the rest, they just return -ENOTTY, and evil actors are hopefully kept
>> at bay.
>
> Still there are definitely more ways to do things wrong. It would be
> harder to audit a larger API surface.
>
>
>>> 2. We have a working virtio-video driver. So we need very good reasons
>>> to start from scratch. You name two reasons AFAIR: simplicity and
>>> possible use of cameras. Did I miss something else?
>>>
>>>     a. The simplicity is there only in case all the interfaces are V4L2,
>>> both in the backend and in the guest. Otherwise the complexity is just
>>> moved to backends. I haven't seen V4L2 in our setups so far, only some
>>> proprietary OMX libraries. So from my point of view, this is not
>>> simplicity in general, but an optimization for a specific narrow use case.
>> V4L2 is not a narrow use-case when it comes to video devices on Linux
>> - basically every user space application involving cameras or codecs
>> can use it. Even the virtio-video driver exposes a V4L2 device, so
>> unless you are using a different driver and proprietary userspace apps
>> specifically written to interact with that driver, V4L2 is involved in
>> your setup at some point.
>
> Sorry, I mean narrow use-case if we look into other possibilities:
>
> 1. Stateless V4L2 on the host.
> 2. Any other interface on the host.
> 3. Any other guest except Linux.
>
> Our targets are several popular embedded SoCs. Unfortunately we don't
> have the luxury of simply having normal V4L2 devices there. And it
> doesn't look like this is going to change.
>
>
>> The guest driver that I wrote is, I think, a good example of the
>> complexity you can expect in terms of guest driver size (as it is
>> pretty functional already with its 1000 and some LoCs). For the UAPI
>> complexity, the host device basically unpacks the information it needs
>> and rebuilds the V4L2 structures before calling into the host device,
>> and I don't see this process as more complex that the unpacking of
>> virtio-video structs which we also did in crosvm.
>
> Unfortunately our hypervisor doesn't support mapping random host pages
> in the guest. Static allocations of shared memory regions are possible.
> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> a region per virtual device. This is just very tedious and inflexible.
> That's why we're mainly interested in having the guest pages sharing in
> the virtio video spec.

This really sounds like you'll want a different approach -- two
mechanisms covered by two feature bits might indeed be the way to go.

>
>
>>>     b. For modern cameras the V4L2 interface is not enough anyway. This
>>> was already discussed AFAIR. There is a separate virtio-camera
>>> specification, that indeed is based on V4L2 UAPI as you said. But
>>> combining these two specs is certainly not future proof, right? So I
>>> think it is best to let the virtio-camera spec to be developed
>>> independently.
>> I don't know if virtio-camera has made progress that they have not
>> published yet, but from what I have seen virtio-v4l2 can cover
>> everything that the currently published driver does (I could not find
>> a specification, but please point me to it if it exists), so there
>> would be no conflict to resolve.
>>
>> V4L2 with requests support should be capable of handling complex
>> camera configurations, but the effort indeed seems to have switched to
>> KCAM when it comes to supporting complex native cameras natively. That
>> being said:
>>
>> * KCAM is not merged yet, is probably not going to be for some time
>> (https://lwn.net/Articles/904776/), and we don't know how we can
>> handle virtualization with it,
>> * The fact that the camera is complex on the host does not mean that
>> all that complexity needs to be exposed to the guest. I don't know how
>> the camera folks want to manage this, but one can imagine that the
>> host could expose a simpler model for the virtual camera, with only
>> the required knobs, while the host takes care of doing all the complex
>> configuration.
>> * The counter argument can be made that simple camera devices do not
>> need a complex virtualization solution, so one can also invoke
>> simplicity here to advocate for virtio-v4l2.
>>
>> My point is not to say that all other camera virtualization efforts
>> should be abandoned - if indeed there is a need for something more
>> specific, then nothing prevents us from having a virtio-camera
>> specification added. However, we are nowhere close to this at the
>> moment, and right now there is no official solution for camera
>> virtualization, so I see no reason to deny the opportunity to support
>> simple camera devices since its cost would just be to add "and cameras
>> device" in the paragraph of the spec that explains what devices are
>> supported.
>
> Well, for reasons described above it still seems perfectly fine to me to
> have separate devices. Ok, the argument, that this approach also seems
> more future-proof, is not a strong one.
>
>
>>> 3. More specifically I can see, that around 95% V4L2 drivers use
>>> videobuf2. This includes the current virtio-video driver. Bypassing the
>>> V4L2 subsystem means that vb2 can't be used, right? In various
>>> discussions vb2 popped up as a thing, that would be hard to avoid. What
>>> do you think about this? How are you going to deal with various V4L2
>>> memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc), for example?
>>> I'll try to dive deeper myself too...
>> VB2 is entirely avoided in the current driver, but my understanding is
>> that its helpers could be used if needed.
>>
>> In virtio-v4l2, MMAP means that the host is responsible for managing
>> the buffers, so vb2 is entirely avoided. USERPTR means the guest
>> passes a SG list of guest physical addresses as mapping memory. VB2
>> may or may not be involved in managing this memory, but most likely
>> not if that memory comes from the guest userspace. DMABUF means the
>> guest passes a virtio object as the backing memory of the buffer.
>> There again there is no particular management to be done on the guest
>> side.
>>
>> I bypassed VB2 for the current driver, and the cost of doing this is
>> that I had to write my own mmap() function.
>
> The cost of it as of now is also that:
>
> 1. Only guest user-space applications, that use V4L2_MEMORY_MMAP, are
> supported AFAIU.
> 2. There is no flexibility to choose whatever way of memory management
> host and guest would like to use. Now the guest user-space application
> selects this.
>
> The latter makes the solution much less flexible IMO. For example, this
> won't work well with our hypervisor. There might other special needs in
> other use-cases. Like sharing these object UUIDs. Probably this can
> handled by mapping, for example, V4L2_MEMORY_USERPTR to guest-pages
> sharing, V4L2_MEMORY_DMABUF to the UUIDs (which is not quite correct
> IMHO). So this already means querying the device for supported sharing
> methods, rewriting the flow of V4L2 UAPI calls on the fly, ensuring
> consistency, etc. This already looks hackish to me. Do you have a better
> plan? Also this limits us to only 3 methods, right? And what if there
> are more than 3 methods in the future?
>
> I think this inflexibility is a major problem with this approach.
>
>
>>>> Actually I don't think this is even something we need to think about -
>>>> in its simplest form the V4L2 guest driver just needs to act as a
>>>> proxy for the device. So which decoder API is used by the host is
>>>> completely irrelevant to the guest driver - it can support a decoder,
>>>> an encoder, or a camera - it doesn't even need to be aware of what
>>>> kind of device it is exposing and that simplicity is another thing
>>>> that I like with this design.
>>> As I wrote above the design would be indeed simple only in case the
>>> actual hardware is exposed to a backend through V4L2 too. Otherwise the
>>> complexity is just moved to backends.
>> Yes, and while I acknowledge that, this is not really more complex
>> that what you would have to do with a virtio-video device which also
>> needs to manage its own state and drive the hardware through backends.
>> I say that based on the experience working on the virtio-video device
>> in crosvm which follows that design too.
>
> As I wrote above we have a different use-case. And I see the current
> state of virtio video as a good common ground for different parties and
> use-cases. Unfortunately I don't see any upsides for our use-cases from
> the V4L2 UAPI proposal, only downsides.
>
>
>>>> This simplicity goes away if the guest device does not use V4L2 as its
>>>> user-space interface (e.g. Windows?). In this case we would be in the
>>>> exact same scenario as the current virtio-video spec, where we need to
>>>> build device-specific structures from the guest driver's internal
>>>> state.
>>> IMO this is not quite correct. The scenario would not be not the same,
>>> because the V4L2 stateful decoder API is more complex in comparison to
>>> any virtio-video spec draft version. Probably it would be great to have
>>> a list of differences. I hope to find some time for this later...
>> There is not much difference between the V4L2 stateful decoder spec
>> and the virtio-video spec. Actually that's the very reason why I am
>> proposing to just virtualize V4L2, we were redoing the same thing.
>>
>> I have quickly parsed the V4L2 decoder spec and here are the
>> differences I have found:
>>
>> * VIDIOC_STREAMON needs to be called on both queues to start decoding.
>> * Frame crop is obtained using VIDIOC_G_SELECTION instead of being
>> available alongside the format parameter.
>> * End of drain requires to send the V4L2_DEC_CMD_START and call
>> VIDIOC_STREAMON again.
>> * Seeking is done by calling VIDIOC_STREAMOFF followed by
>> VIDIOC_STREAMON on the OUTPUT queue instead of having a dedicated
>> command.
>>
>> ... and that's basically it! Do we really need a new spec just to
>> smoothen these differences?
>
> If we look deeper there are more differences. I'm still preparing the
> list. Sorry, it takes time.
>
>
>> I hope I have somehow addressed your points. The main point here is to
>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
>> accelerated codec work, regardless of what the guest or host
>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
>> this is a viable solution. This PoC is largely simplified by the fact
>> that V4L2 is used all along the way, but this is irrelevant - yes,
>> actual devices will likely talk to other APIs and maintain more state,
>> like a virtio-video device would do. What I want to demonstrate is
>> that we can send encoding work and receive a valid stream, and that it
>> is not costly, and only marginally more complex than our virtio-video
>> spec attempts.
>>
>> ... and we can support cameras too, but that's just a convenient
>> side-effect, not the ultimate solution to the camera virtualization
>> problem (that's for the camera folks to decide).
>
> Thanks for your answer!

Thanks everyone -- do you think the "two feature bits to cover different
approaches, but using a common infrastructure" idea could work? If yes,
I think that's the direction we should take. If we can implement this
with just one feature bit, that might also be a good route to extend it
later, but I'm not familiar enough with the whole infrastructure to make
any judgement here.



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-17 14:43                               ` Cornelia Huck
@ 2023-04-19  7:39                                 ` Alexander Gordeev
  2023-04-19 21:34                                   ` Enrico Granata
  2023-04-21  4:02                                   ` Alexandre Courbot
  0 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-19  7:39 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On 17.04.23 16:43, Cornelia Huck wrote:
> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>
>> Hi Alexandre,
>>
>> Thanks for you letter! Sorry, it took me some time to write an answer.
>>
>> First of all I'd like to describe my perspective a little bit because it
>> seems, that in many cases we (and other people writing their feedbacks)
>> simply have very different priorities and background.
> Thank you for describing the environment you want to use this in, this
> helps to understand the different use cases.

Yeah, I hope so too. I should have done that earlier. I think Dmitry
described our use-case, but that was several years ago, so of course
nobody remembers. We should reiterate all the use-cases once in a while.


>> OpenSynergy, the company that I work for, develops a proprietary
>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>> have our proprietary device implementations, but overall our goal is to
>> bring open standards into these quite closed domains and we're betting
>> big on virtio. The idea is to run safety-critical functions like cockpit
>> controller alongside with multimedia stuff in different VMs on the same
>> physical board. Right now they have it on separate physical devices. So
>> they already have maximum isolation. And we're trying to make this
>> equally safe on a single board. The benefit is the reduced costs and
>> some additional features. Of course, we also need features here, but at
>> the same time security and ease of certification are among the top of
>> our priorities. Nobody wants cars or planes to have security problems,
>> right? Also nobody really needs DVB and even more exotic devices in cars
>> and planes AFAIK.
>>
>> For the above mentioned reasons our COQOS hypervisor is running on bare
>> metal. Also memory management for the guests is mostly static. It is
>> possible to make a shared memory region between a device and a driver
>> managed by device in advance. But definitely no mapping of random host
>> pages on the fly is supported.
>>
>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>> in its own virtualized environment, right? Both the host and guest are
>> linux. In this case I totally understand why V4L2 UAPI pass-through
>> feels like a right move. I guess, you'd like to make the switch to
>> virtualized apps as seemless as possible for your users. If they can't
>> use their DVBs anymore, they complain. And adding the virtualization
>> makes the whole thing more secure anyway. So I understand the desire to
>> have the range of supported devices as broad as possible. It is also
>> understandable that priorities are different with desktop
>> virtualization. Also I'm not trying to diminish the great work, that you
>> have done. It is just that from my perspective this looks like a step in
>> the wrong direction because of the mentioned concerns. So I'm going to
>> continue being a skeptic here, sorry.
>>
>> Of course, I don't expect that you continue working on the old approach
>> now as you have put that many efforts into the V4L2 UAPI pass-through.
>> So I think it is best to do the evolutionary changes in scope of virtio
>> video device specification, and create a new device specification
>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
>> continue the virtio-video development. In fact I already started making
>> draft v7 of the spec according to the comments. I hope it will be ready
>> for review soon.
>>
>> I hope this approach will also help fix issues with virtio-video spec
>> and driver development misalignment as well as V4L2 compliance issues
>> with the driver. I believe the problems were caused partly by poor
>> communication between us and by misalignment of our development cycles,
>> not by the driver complexity.
>>
>> So in my opinion it is OK to have different specs with overlapping
>> functionality for some time. My only concern is if this would be
>> accepted by the community and the committee. How the things usually go
>> here: preferring features and tolerating possible security issues or the
>> other way around? Also how acceptable is having linux-specific protocols
>> at all?
> My main question is: What would be something that we can merge as a
> spec, that would either cover the different use cases already, or that
> could be easily extended to cover the use cases it does not handle
> initially?
>
> For example, can some of the features that would be useful in crosvm be
> tucked behind some feature bit(s), so that the more restricted COQOS
> hypervisor would simply not offer them? (Two feature bits covering two
> different mechanisms, like the current approach and the v4l2 approach,
> would also be good, as long as there's enough common ground between the
> two.)
>
> If a staged approach (adding features controled by feature bits) would
> be possible, that would be my preferred way to do it.

Hmm, I see several ways we could use the feature flags:
1. Define two feature flags: one for the current video spec and one for
the V4L2 UAPI pass-through (a rough sketch of such a split follows
below). This is essentially the same as having two different specs, but
within one device. I'm not sure which way is better; having two separate
devices would probably be easier to review and merge.
2. Find a subset of V4L2 that closely matches the current draft and
restrict everything else. Giving a well-reasoned answer here would
require going through all the V4L2 structures, which is a lot of work.
And even with a concrete plan, it would have to be implemented first. I
doubt it is feasible. Based on what I already know, this is going to be
a compromise on security anyway, so we're not happy about that. More on
that below.
3. Stop trying to simply pass the V4L2 UAPI through and instead make
the virtio-video spec as close to the V4L2 UAPI as possible, but with
an appropriate security model, so that the video device can later be
extended via a feature flag to something very close to the full V4L2
UAPI. A lot of work as well, I think. It also would not let us simply
reference the V4L2 UAPI in the spec and thereby reduce its size, which
is Alexandre's current goal, so Alexandre and his team are probably not
happy with this option either.
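
To make option 1 concrete, the split could look roughly like the
following. This is purely illustrative; the feature bit names and
values are invented here and do not exist in any spec or driver:

  /* Hypothetical feature bits, for illustration only. */
  #define VIRTIO_VIDEO_F_CLASSIC  0  /* current virtio-video command set */
  #define VIRTIO_VIDEO_F_V4L2     1  /* V4L2 UAPI pass-through command set */

A device would then offer one bit or the other (or both), and the
driver would negotiate whichever model it supports.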

From the security point of view, these are our goals from most to
least important, AFAIU:
1. Make the device secure. If a device is compromised, the whole
physical machine is at risk. Complexity is the enemy here. It helps a
lot to make the device as straightforward and easy to implement as
possible. Therefore it is good to make the spec device/function-centric
from this PoV.
2. Ensure that drivers are also secure, at least from the user-space
side, and maybe from the device side too.
3. Implement secure playback and make sure media doesn't leak. For
this case it is nice to have the object UUIDs as buffers.

Please correct me if there's something wrong here.

When we start looking from this perspective, even things like the
naming of the video buffer queues are problematic: V4L2 calls the queue
carrying device input "output" and the queue carrying device output
"capture". In my experience this naming scheme takes some time, and for
sure several coding mistakes, to get used to. Unfortunately it can't be
turned off with a feature flag. In contrast, virtio-video v6 names these
queues "input" and "output", which is perfectly fine when looking from
the device side. It is understandable that Alexandre's list of
differences between the V4L2 UAPI and the current state of virtio-video
doesn't include such things, but I think we have to count them. That's
why it takes me so long to make the list. :) So throwing away this
simplicity is still a compromise from the security perspective that
we're not happy about.
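
For reference, this is the mapping that causes the confusion. The V4L2
buffer type names below are the real UAPI constants; the comments show
what virtio-video v6 calls the same queues on a decoder:

  #include <linux/videodev2.h>

  /* A decoder, as seen through V4L2: */
  enum v4l2_buf_type bitstream_queue =
          V4L2_BUF_TYPE_VIDEO_OUTPUT;  /* device input  - "input" in v6  */
  enum v4l2_buf_type frame_queue =
          V4L2_BUF_TYPE_VIDEO_CAPTURE; /* device output - "output" in v6 */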

This is mostly because the V4L2 UAPI brings a hard dependency on V4L2,
its security model, legacy, use-cases, developers, etc. It can be
changed over time, but that is a long process because it means changing
the Linux UAPI. It also means that nothing can be removed from it, only
added (if the V4L2 community agrees). For example, we can't simply add
a new way of sharing buffers for potential new use-cases once we switch
to the V4L2 UAPI, AFAIU. The V4L2 community can simply reject changes
because this is a UAPI after all. We kind of have a weak dependency
already, because the only driver implementation is based on V4L2 and
we'd like to keep the spec as close to V4L2 as possible, but it is not
the same thing. So at the moment the V4L2 UAPI proposal does not look
very flexible. Alexandre said that we can simply not implement some of
the ioctls. Well, that definitely doesn't cover all the complexity, like
the structures and other subtle details.

Also, adding feature flags would probably defeat the stated purpose of
switching to the V4L2 UAPI anyway: the simplicity of the spec and of
the V4L2 driver.

So I have a lot of doubts about the feasibility of adding feature
flags. If Alexandre and his team want the V4L2 UAPI as-is, then it
looks like it is best to simply have two specs:
1. virtio-video for those interested in building from the ground up
with a security model appropriate for virtualization in mind. This is
going to take time and will probably never reach feature parity with
V4L2; some old devices might never get support this way.
2. virtio-v4l2 as a shortcut for those interested in reaching feature
parity with V4L2 fast - essentially a compatibility layer. This is
probably going to be used in Linux host + Linux guest use-cases only.
Maybe it gets obsoleted by the first spec in several years for most
modern use-cases.

Or maybe have these two cases within a single device spec as I wrote above.

This makes a lot of sense to me. If V4L2 UAPI pass-through is in fact
only needed for compatibility, then this way we can avoid a lot of work
going through all of V4L2 and trying to find suitable subsets, or
trying to construct something that is close to the V4L2 UAPI but
doesn't compromise on security. I'm not really interested in doing all
this work because we're already more or less satisfied with the current
state; we don't need feature parity with V4L2. On the other hand, for
Alexandre feature parity with V4L2 is clearly a higher priority than
all these subtle security model differences. In my opinion it also
doesn't make sense to invest that much time in something that looks
like a compatibility layer. So it seems both of us are interested in
avoiding all this extra work. Then I'd just prefer to have two
different specs so that everyone can work according to their priorities.


> Regarding the protocol: I think Linux-originating protocols (that can be
> implemented on non-Linux setups) are fine, Linux-only protocols probably
> not so much.

Thanks for the information. Well, it looks like the V4L2 UAPI could be
implemented on any platform, unless it needs a completely new way of
doing memory management, since all the V4L2_MEMORY_* constants are
already going to be in use, AFAIU.
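
For reference, these are the memory types currently defined in
linux/videodev2.h:

  enum v4l2_memory {
          V4L2_MEMORY_MMAP    = 1, /* buffers allocated by the driver/device */
          V4L2_MEMORY_USERPTR = 2, /* buffers backed by caller-provided memory */
          V4L2_MEMORY_OVERLAY = 3, /* legacy overlay mode */
          V4L2_MEMORY_DMABUF  = 4, /* buffers backed by DMABUF file descriptors */
  };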


>>>>      a. So V4L2 subsystem and the current virtio-video driver are already
>>>> reducing the complexity. And this seems as the right place to do this,
>>>> because the complexity is caused by the amount of V4L2 use cases and its
>>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
>>>> would prefer a simpler API, right? I think this use-case is not purely
>>>> abstract at all.
>>> The V4L2 subsystem is there to factorize code that can be shared
>>> between drivers and manage their internal state. Our target is the
>>> V4L2 UAPI, so a Windows driver needs not be concerned about these
>>> details - it does what it would have done with virtio-video, and just
>>> uses the V4L2 structures to communicate with the host instead of the
>>> virtio-video ones.
>> It can also reuse the virtio-video structures. So I think despite the
>> ability to reuse V4L2 structures, having to implement a linux-specific
>> interface would still be a bigger pain.
> Hm. Do the v4l2 structures drag in too many adjacent things that need to
> be implemented? Can we match the video-video structures from the current
> proposal with some v4l2 structures and extract a common wrapper for
> those that match, with a feature-bit controlled backend? It would be
> fine if any of those backends supported a slightly different subset of
> the common parts, as long as the parts implemented by both would be
> enough to implement a working device. (Mostly thinking out loud here.)

I don't think this is realistic, unfortunately. At the per-ioctl level
it is probably possible to disable some functionality, but the V4L2
structures are set in stone; we can only extend them.


>>> The guest driver that I wrote is, I think, a good example of the
>>> complexity you can expect in terms of guest driver size (as it is
>>> pretty functional already with its 1000 and some LoCs). For the UAPI
>>> complexity, the host device basically unpacks the information it needs
>>> and rebuilds the V4L2 structures before calling into the host device,
>>> and I don't see this process as more complex that the unpacking of
>>> virtio-video structs which we also did in crosvm.
>> Unfortunately our hypervisor doesn't support mapping random host pages
>> in the guest. Static allocations of shared memory regions are possible.
>> But then we have to tell V4L2 to allocate buffers there. Then we'll need
>> a region per virtual device. This is just very tedious and inflexible.
>> That's why we're mainly interested in having the guest pages sharing in
>> the virtio video spec.
> This really sounds like you'll want a different approach -- two
> mechanisms covered by two feature bits might indeed be the way to go.

Well, basically this is the way we have it now. I'm not sure what
Alexandre's plan is with the V4L2 UAPI approach. And even if this gets
solved, the solution already doesn't look future-proof, unfortunately.


>>> I hope I have somehow addressed your points. The main point here is to
>>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
>>> accelerated codec work, regardless of what the guest or host
>>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
>>> this is a viable solution. This PoC is largely simplified by the fact
>>> that V4L2 is used all along the way, but this is irrelevant - yes,
>>> actual devices will likely talk to other APIs and maintain more state,
>>> like a virtio-video device would do. What I want to demonstrate is
>>> that we can send encoding work and receive a valid stream, and that it
>>> is not costly, and only marginally more complex than our virtio-video
>>> spec attempts.
>>>
>>> ... and we can support cameras too, but that's just a convenient
>>> side-effect, not the ultimate solution to the camera virtualization
>>> problem (that's for the camera folks to decide).
>> Thanks for your answer!
> Thanks everyone -- do you think the "two feature bits to cover different
> approaches, but using a common infrastructure" idea could work? If yes,
> I think that's the direction we should take. If we can implement this
> with just one feature bit, that might also be a good route to extend it
> later, but I'm not familiar enough with the whole infrastructure to make
> any judgement here.

Thanks for your suggestions. Hopefully we end up with a good solution.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-19  7:39                                 ` Alexander Gordeev
@ 2023-04-19 21:34                                   ` Enrico Granata
  2023-04-21 14:48                                     ` Alexander Gordeev
  2023-04-21  4:02                                   ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Enrico Granata @ 2023-04-19 21:34 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, Alexandre Courbot, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

Inlined

Thanks,
- Enrico


On Wed, Apr 19, 2023 at 12:39 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 17.04.23 16:43, Cornelia Huck wrote:
> > On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >
> >> Hi Alexandre,
> >>
> >> Thanks for you letter! Sorry, it took me some time to write an answer.
> >>
> >> First of all I'd like to describe my perspective a little bit because it
> >> seems, that in many cases we (and other people writing their feedbacks)
> >> simply have very different priorities and background.
> > Thank you for describing the environment you want to use this in, this
> > helps to understand the different use cases.
>
> Yeah, I hope too. I should have done that earlier. I think Dmitry
> described our use-case, but it was several years ago, so nobody
> remembers of course. We should reiterate all the use-cases once in a while.
>
>
> >> OpenSynergy, the company that I work for, develops a proprietary
> >> hypervisor called COQOS mainly for automotive and aerospace domains. We
> >> have our proprietary device implementations, but overall our goal is to
> >> bring open standards into these quite closed domains and we're betting
> >> big on virtio. The idea is to run safety-critical functions like cockpit
> >> controller alongside with multimedia stuff in different VMs on the same
> >> physical board. Right now they have it on separate physical devices. So
> >> they already have maximum isolation. And we're trying to make this
> >> equally safe on a single board. The benefit is the reduced costs and
> >> some additional features. Of course, we also need features here, but at
> >> the same time security and ease of certification are among the top of
> >> our priorities. Nobody wants cars or planes to have security problems,
> >> right? Also nobody really needs DVB and even more exotic devices in cars
> >> and planes AFAIK.
> >>
> >> For the above mentioned reasons our COQOS hypervisor is running on bare
> >> metal. Also memory management for the guests is mostly static. It is
> >> possible to make a shared memory region between a device and a driver
> >> managed by device in advance. But definitely no mapping of random host
> >> pages on the fly is supported.
> >>
> >> AFAIU crosvm is about making Chrome OS more secure by putting every app
> >> in its own virtualized environment, right? Both the host and guest are
> >> linux. In this case I totally understand why V4L2 UAPI pass-through
> >> feels like a right move. I guess, you'd like to make the switch to
> >> virtualized apps as seemless as possible for your users. If they can't
> >> use their DVBs anymore, they complain. And adding the virtualization
> >> makes the whole thing more secure anyway. So I understand the desire to
> >> have the range of supported devices as broad as possible. It is also
> >> understandable that priorities are different with desktop
> >> virtualization. Also I'm not trying to diminish the great work, that you
> >> have done. It is just that from my perspective this looks like a step in
> >> the wrong direction because of the mentioned concerns. So I'm going to
> >> continue being a skeptic here, sorry.
> >>
> >> Of course, I don't expect that you continue working on the old approach
> >> now as you have put that many efforts into the V4L2 UAPI pass-through.
> >> So I think it is best to do the evolutionary changes in scope of virtio
> >> video device specification, and create a new device specification
> >> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> >> continue the virtio-video development. In fact I already started making
> >> draft v7 of the spec according to the comments. I hope it will be ready
> >> for review soon.
> >>
> >> I hope this approach will also help fix issues with virtio-video spec
> >> and driver development misalignment as well as V4L2 compliance issues
> >> with the driver. I believe the problems were caused partly by poor
> >> communication between us and by misalignment of our development cycles,
> >> not by the driver complexity.
> >>
> >> So in my opinion it is OK to have different specs with overlapping
> >> functionality for some time. My only concern is if this would be
> >> accepted by the community and the committee. How the things usually go
> >> here: preferring features and tolerating possible security issues or the
> >> other way around? Also how acceptable is having linux-specific protocols
> >> at all?
> > My main question is: What would be something that we can merge as a
> > spec, that would either cover the different use cases already, or that
> > could be easily extended to cover the use cases it does not handle
> > initially?
> >
> > For example, can some of the features that would be useful in crosvm be
> > tucked behind some feature bit(s), so that the more restricted COQOS
> > hypervisor would simply not offer them? (Two feature bits covering two
> > different mechanisms, like the current approach and the v4l2 approach,
> > would also be good, as long as there's enough common ground between the
> > two.)
> >
> > If a staged approach (adding features controled by feature bits) would
> > be possible, that would be my preferred way to do it.
>
> Hmm, I see several ways how we can use the feature flags:
> 1. Basically making two feature flags: one for the current video spec
> and one for the V4L2 UAPI pass through. Kind of the same as having two
> different specs, but within one device. Not sure which way is better.
> Probably having two separate devices would be easier to review and merge.

I agree with this. It may be worth renaming the specs to something
else, say virtio-vencdec and virtio-v4l2 </bikeshedding>.
Having one spec for what is really two devices forked by flags may
superficially seem simpler and less controversial (everyone gets what
they want, right?), but it feels bolted on.

> 2. Finding a subset of V4L2, that closely matches the current draft, and
> restrict everything else. A perfectly reasoned answer for this case will
> require a lot of work going through all the V4L2 structures I think. And
> even if we have a concrete plan, it has to be implemented first. I doubt
> it is possible. Based on the things, that I already know, this is going
> to be a compromise on security anyway, so we're not happy about that.
> More on that below.

To be fair, this could be done on the device side, right? I am not
saying it would be trivial, but the device implementation could
restrict the subset of V4L2 it accepts to whatever it feels is "safe"
and limit the rest.

> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
> the virtio video spec as close to the V4L2 UAPI as possible, but with
> the appropriate security model. So that the video device can be extended
> with a feature flag to something very close to full V4L2 UAPI. A lot of
> work as well, I think. And this won't allow us to simply link the V4L2
> UAPI in the spec and therefore reduce its size, which is Alexandre's
> current goal. So Alexandre and his team are not happy this way probably.
>
>  From the security point of view these are our goals from most to less
> important AFAIU:
> 1. Make the device secure. If a device is compromised, the whole
> physical machine is at risk. Complexity is the enemy here. It helps a
> lot to make the device as straightforward and easy to implement as
> possible. Therefore it is good to make the spec device/function-centric
> from this PoV.
> 2. Ensure, that drivers are also secure at least from user-space side.
> Maybe from device side too.
> 3. Implementing secure playback and making sure media doesn't leak. For
> this case it is nice to have these object UUIDs as buffers.
>
> Please correct me if there's something wrong here.
>
> When we start looking from this perspective even things like naming
> video buffer queues "output" for device input and "capture" for device
> output are problematic. In my experience this naming scheme takes some
> time and for sure several coding mistakes to get used to. Unfortunately
> this can't be turned off with some feature flags. In contrast virtio
> video v6 names these queues "input" and "output". This is perfectly fine
> if we look from the device side. It is understandable, that Alexandre's
> list of differences between V4L2 UAPI and the current state of virtio
> video doesn't include these things. But we have to count them, I think.
> That's why it takes me so long to make the list. :) So throwing away
> this simplicity is still going to be a compromise from the security
> perspective, that we're not happy about.
>
> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
> security model, legacy, use-cases, developers, etc. It can be changed
> over time, but this is a long process because this means changing the
> Linux UAPI. Also this means, that nothing can be removed from it, only
> added (if the V4L2 community agrees). For example, we can't simply add a
> new way of sharing buffers for potential new use-cases once we switch to
> V4L2 UAPI AFAIU. The V4L2 community can simply reject changes because
> this is a UAPI after all. We kind of have a weak dependency already,
> because the only driver implementation is based on V4L2 and we'd like to
> keep the spec as close to V4L2 as possible, but it is not the same
> thing. So at the moment it looks like the V4L2 UAPI proposal is not
> super flexible. Alexandre said, that we can simply not implement some of
> the ioctls. Well, this definitely doesn't cover all the complexity like
> the structures and other subtle details.
>
> Also adding the feature flags would probably defeat the stated purpose
> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
> V4L2 driver.
>
> So I have a lot of doubts about the feasibility of adding feature flags.
> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
> is best to simply have two specs:
> 1. virtio-video for those interested in building from ground up with the
> security model appropriate for virtualization in mind. This is going to
> take time and not going to reach feature parity with V4L2 ever I think.
> I mean some old devices might never get support this way.
> 2. virtio-v4l2 as a shortcut for those interested in having feature
> parity with V4L2 fast. Like a compatibility layer. Probably this is
> going to be used in linux host + linux guest use-cases only. Maybe it
> gets obsoleted by the first spec in several years for most modern use-cases.

This latter device also makes a lot of sense for Android (I know,
Linux based) where there are nifty implementations of a number of HALs
that will just work if you assume V4L2 and will not (or will not
without rework) if you assume anything else.

>
> Or maybe have these two cases within a single device spec as I wrote above.
>
> This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
> only needed for compatibility, then this way we can avoid a lot of work
> going through all of the V4L2 and trying to find different subsets or
> trying to construct something, that is close to V4L2 UAPI, but doesn't
> compromise on the security. I'm not really interested in doing all this
> work because we're already more or less satisfied with the current
> state. We don't need feature parity with V4L2. On the other hand for
> Alexandre the feature-parity with V4L2 is clearly of higher priority,
> than all these subtle security model differences. In my opinion it also
> doesn't make sense to invest that much time in something, that looks
> like a compatibility layer. So it seems both of us are interested in
> avoiding all this extra work. Then I'd just prefer to have two different
> specs so that everyone can work according to their priorities.
>
>
> > Regarding the protocol: I think Linux-originating protocols (that can be
> > implemented on non-Linux setups) are fine, Linux-only protocols probably
> > not so much.
>
> Thanks for the information. Well, it looks like the V4L2 UAPI could be
> implemented on any platform unless it needs a completely new way of
> memory management since all the V4L2_MEMORY_* constants are going to be
> used already AFAIU.
>
>
> >>>>      a. So V4L2 subsystem and the current virtio-video driver are already
> >>>> reducing the complexity. And this seems as the right place to do this,
> >>>> because the complexity is caused by the amount of V4L2 use cases and its
> >>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >>>> would prefer a simpler API, right? I think this use-case is not purely
> >>>> abstract at all.
> >>> The V4L2 subsystem is there to factorize code that can be shared
> >>> between drivers and manage their internal state. Our target is the
> >>> V4L2 UAPI, so a Windows driver needs not be concerned about these
> >>> details - it does what it would have done with virtio-video, and just
> >>> uses the V4L2 structures to communicate with the host instead of the
> >>> virtio-video ones.
> >> It can also reuse the virtio-video structures. So I think despite the
> >> ability to reuse V4L2 structures, having to implement a linux-specific
> >> interface would still be a bigger pain.
> > Hm. Do the v4l2 structures drag in too many adjacent things that need to
> > be implemented? Can we match the video-video structures from the current
> > proposal with some v4l2 structures and extract a common wrapper for
> > those that match, with a feature-bit controlled backend? It would be
> > fine if any of those backends supported a slightly different subset of
> > the common parts, as long as the parts implemented by both would be
> > enough to implement a working device. (Mostly thinking out loud here.)
>
> I don't think this is realistic unfortunately. On per ioctl level it is
> possible to disable some functionality probably, but the V4L2 structures
> are set in stone. We can only extend them.
>
>
> >>> The guest driver that I wrote is, I think, a good example of the
> >>> complexity you can expect in terms of guest driver size (as it is
> >>> pretty functional already with its 1000 and some LoCs). For the UAPI
> >>> complexity, the host device basically unpacks the information it needs
> >>> and rebuilds the V4L2 structures before calling into the host device,
> >>> and I don't see this process as more complex that the unpacking of
> >>> virtio-video structs which we also did in crosvm.
> >> Unfortunately our hypervisor doesn't support mapping random host pages
> >> in the guest. Static allocations of shared memory regions are possible.
> >> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> >> a region per virtual device. This is just very tedious and inflexible.
> >> That's why we're mainly interested in having the guest pages sharing in
> >> the virtio video spec.
> > This really sounds like you'll want a different approach -- two
> > mechanisms covered by two feature bits might indeed be the way to go.
>
> Well, basically this is the way we have it now. I'm not sure what is
> Alexandre's plan with the V4L2 UAPI approach. And if this is going to be
> solved, the solution already doesn't look future-proof anyway unfortunately.
>
>
> >>> I hope I have somehow addressed your points. The main point here is to
> >>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
> >>> accelerated codec work, regardless of what the guest or host
> >>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
> >>> this is a viable solution. This PoC is largely simplified by the fact
> >>> that V4L2 is used all along the way, but this is irrelevant - yes,
> >>> actual devices will likely talk to other APIs and maintain more state,
> >>> like a virtio-video device would do. What I want to demonstrate is
> >>> that we can send encoding work and receive a valid stream, and that it
> >>> is not costly, and only marginally more complex than our virtio-video
> >>> spec attempts.
> >>>
> >>> ... and we can support cameras too, but that's just a convenient
> >>> side-effect, not the ultimate solution to the camera virtualization
> >>> problem (that's for the camera folks to decide).
> >> Thanks for your answer!
> > Thanks everyone -- do you think the "two feature bits to cover different
> > approaches, but using a common infrastructure" idea could work? If yes,
> > I think that's the direction we should take. If we can implement this
> > with just one feature bit, that might also be a good route to extend it
> > later, but I'm not familiar enough with the whole infrastructure to make
> > any judgement here.
>
> Thanks for your suggestions. Hopefully we end up with a good solution.
>
>
> --
> Alexander Gordeev
> Senior Software Engineer
>
> OpenSynergy GmbH
> Rotherstr. 20, 10245 Berlin
>
> Phone: +49 30 60 98 54 0 - 88
> Fax: +49 (30) 60 98 54 0 - 99
> EMail: alexander.gordeev@opensynergy.com
>
> www.opensynergy.com
>
> Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
> Geschäftsführer/Managing Director: Régis Adjamah
>
>
> Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-17 12:51                             ` Alexander Gordeev
  2023-04-17 14:43                               ` Cornelia Huck
@ 2023-04-21  4:02                               ` Alexandre Courbot
  2023-04-26 15:11                                 ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-21  4:02 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

Hi Alexander,

On Mon, Apr 17, 2023 at 9:52 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> Hi Alexandre,
>
> Thanks for you letter! Sorry, it took me some time to write an answer.
>
> First of all I'd like to describe my perspective a little bit because it
> seems, that in many cases we (and other people writing their feedbacks)
> simply have very different priorities and background.
>
> OpenSynergy, the company that I work for, develops a proprietary
> hypervisor called COQOS mainly for automotive and aerospace domains. We
> have our proprietary device implementations, but overall our goal is to
> bring open standards into these quite closed domains and we're betting
> big on virtio. The idea is to run safety-critical functions like cockpit
> controller alongside with multimedia stuff in different VMs on the same
> physical board. Right now they have it on separate physical devices. So
> they already have maximum isolation. And we're trying to make this
> equally safe on a single board. The benefit is the reduced costs and
> some additional features. Of course, we also need features here, but at
> the same time security and ease of certification are among the top of
> our priorities. Nobody wants cars or planes to have security problems,
> right? Also nobody really needs DVB and even more exotic devices in cars
> and planes AFAIK.
>
> For the above mentioned reasons our COQOS hypervisor is running on bare
> metal. Also memory management for the guests is mostly static. It is
> possible to make a shared memory region between a device and a driver
> managed by device in advance. But definitely no mapping of random host
> pages on the fly is supported.
>
> AFAIU crosvm is about making Chrome OS more secure by putting every app
> in its own virtualized environment, right?

Not really, but for the discussion here you can assume that it is a
VMM similar to QEmu with KVM enabled.

> Both the host and guest are
> linux. In this case I totally understand why V4L2 UAPI pass-through
> feels like a right move. I guess, you'd like to make the switch to
> virtualized apps as seemless as possible for your users. If they can't
> use their DVBs anymore, they complain. And adding the virtualization
> makes the whole thing more secure anyway. So I understand the desire to
> have the range of supported devices as broad as possible. It is also
> understandable that priorities are different with desktop
> virtualization. Also I'm not trying to diminish the great work, that you
> have done. It is just that from my perspective this looks like a step in
> the wrong direction because of the mentioned concerns. So I'm going to
> continue being a skeptic here, sorry.
>
> Of course, I don't expect that you continue working on the old approach
> now as you have put that many efforts into the V4L2 UAPI pass-through.
> So I think it is best to do the evolutionary changes in scope of virtio
> video device specification, and create a new device specification
> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> continue the virtio-video development. In fact I already started making
> draft v7 of the spec according to the comments. I hope it will be ready
> for review soon.
>
> I hope this approach will also help fix issues with virtio-video spec
> and driver development misalignment as well as V4L2 compliance issues
> with the driver. I believe the problems were caused partly by poor
> communication between us and by misalignment of our development cycles,
> not by the driver complexity.
>
> So in my opinion it is OK to have different specs with overlapping
> functionality for some time. My only concern is if this would be
> accepted by the community and the committee. How the things usually go
> here: preferring features and tolerating possible security issues or the
> other way around? Also how acceptable is having linux-specific protocols
> at all?
>
> Also I still have concerns about memory management with V4L2 UAPI
> pass-through. Please see below.
>
> On 17.03.23 08:24, Alexandre Courbot wrote:
> > Hi Alexander,
> >
> > On Thu, Mar 16, 2023 at 7:13 PM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >> Hi Alexandre,
> >>
> >> On 14.03.23 06:06, Alexandre Courbot wrote:
> >>> The spec should indeed be considerably lighter. I'll wait for more
> >>> feedback, but if the concept appeals to other people as well, I may
> >>> give the spec a try soon.
> >> Did you receive an email I sent on February 7? There was some feedback
> >> there. It has been already established, that V4L2 UAPI pass-through is
> >> technically possible. But I had a couple of points why it is not
> >> desirable. Unfortunately I haven't received a reply. I also don't see
> >> most of these points addressed in any subsequent emails from you.
> >>
> >> I have more to say now, but I'd like to make sure that you're interested
> >> in the discussion first.
> > Sorry about that, I dived head first into the code to see how viable
> > the idea would be and forgot to come back to you. Let me try to answer
> > your points now that I have a better idea of how this would work.
> >
> >>> If we find out that there is a benefit in going through the V4L2
> >>> subsystem (which I cannot see for now), rebuilding the UAPI structures
> >>> to communicate with the device is not different from building
> >>> virtio-video specific structures like what we are currently doing.
> >> Well, the V4L2 subsystem is there for a reason, right? It does some
> >> important things too. I'm going to check all the v4l2_ioctl_ops
> >> callbacks in the current virtio-video driver to make the list. Also if
> >> you have some PoC spec/implementations, that would be nice to review. It
> >> is always better to see the actual implementation, of course.
> >>
> >> I have these points so far:
> >>
> >> 1. Overall the V4L2 stateful decoder API looks significantly more
> >> complex to me. Looks like you're a V4L2 expert, so this might not be
> >> visible to you that much.
> > V4L2 is more generic than virtio-video, so as a result specific uses
> > tend to require a bit more operations. I would argue the mental
> > overhead of working with it is less than significant, and most of it
> > consists in not forgetting to call STREAMON on a queue after some
> > operations. Things like format, resolution and buffer management do
> > not get more complex (and V4L2 is actually more complete than our
> > previous proposal on these).
> >
> > The counterpart of this marginal extra complexity is that you can
> > virtualize more kinds of devices, and even within virtio-video support
> > more formats than what has been specified so far. If your guest is
> > Linux, the same kernel driver can be used to expose any kind of device
> > supported by V4L2, and the driver is also much simpler than
> > virtio-video, so you are actually reducing complexity significantly
> > here. Even if you are not Linux, you can share the V4L2 structures
> > definitions and low-layer code that sends V4L2 commands to the host
> > between drivers. So while it is true that some specifics become
> > slightly more complex, there is a lot of potential simplification when
> > you look at the whole picture.
> >
> > It's an opinionated proposal, and it comes with a few compromises if
> > you are mostly interested in codecs alone. But looking at the guest
> > driver convinces me that this is the better approach when you look at
> > the whole picture.
>
> Sorry, I just see it differently as I tried to describe above. The
> problem is that we don't yet see the whole picture with the V4L2 UAPI
> pass-through. I reviewed the code briefly. It is great, that you already
> implemented the MMAP mode and host allocations already. But I would
> argue, that this is the simplest case. Do you agree?

I was trying to do a proof-of-concept here; of course it is not
feature-complete, and of course I started with the simplest case. I
don't see your point here.

> Also this mode of
> operation is not supported in our hypervisor for reasons mentioned
> above. So in our case this PoC doesn't yet prove anything unfortunately.

I did not have your use-case in mind while writing the PoC; its
purpose was to demonstrate the suitability of V4L2 as a protocol for
virtualizing video.

Now if your hypervisor does static memory management and pre-allocates
memory for guest buffers, then the V4L2 MMAP memory type actually
looks like the best fit for the job. There are no tokens like virtio
object UUIDs to manage, and the MMAP request can be as simple as
returning the pre-mapped address of the buffer in the guest PAS.

If instead it carves some predefined amount of memory out for the
whole guest and expects it to allocate buffer memory from there, then
the USERPTR memory type (which works like the guest pages of
virtio-video) is what you want to use.
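
To make that concrete, here is a minimal guest-side sketch of where the
two memory types diverge at the V4L2 level. These are standard UAPI
calls; the buffer count and the choice of the OUTPUT queue are
arbitrary:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  /* Request `count` buffers on the bitstream (OUTPUT) queue. With
   * V4L2_MEMORY_MMAP the device/host owns the allocation and the guest
   * maps it afterwards; with V4L2_MEMORY_USERPTR the guest provides
   * its own pages at QBUF time via buf.m.userptr and buf.length. */
  static int request_buffers(int fd, enum v4l2_memory memory,
                             unsigned int count)
  {
          struct v4l2_requestbuffers req;

          memset(&req, 0, sizeof(req));
          req.count  = count;
          req.type   = V4L2_BUF_TYPE_VIDEO_OUTPUT;
          req.memory = memory; /* V4L2_MEMORY_MMAP or V4L2_MEMORY_USERPTR */

          return ioctl(fd, VIDIOC_REQBUFS, &req);
  }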

> I think the real complexity is yet to come.

Evidence would be appreciated.

>
>
> >>     a. So V4L2 subsystem and the current virtio-video driver are already
> >> reducing the complexity. And this seems as the right place to do this,
> >> because the complexity is caused by the amount of V4L2 use cases and its
> >> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >> would prefer a simpler API, right? I think this use-case is not purely
> >> abstract at all.
> > The V4L2 subsystem is there to factorize code that can be shared
> > between drivers and manage their internal state. Our target is the
> > V4L2 UAPI, so a Windows driver needs not be concerned about these
> > details - it does what it would have done with virtio-video, and just
> > uses the V4L2 structures to communicate with the host instead of the
> > virtio-video ones.
>
> It can also reuse the virtio-video structures. So I think despite the
> ability to reuse V4L2 structures, having to implement a linux-specific
> interface would still be a bigger pain.

The only Linux-specific thing in this interface is that it
misleadingly has "Linux" in its name. Otherwise it's really similar to
what we previously had.

>
>
> >>     b. Less complex API is better from a security point of view too. When
> >> V4L2 was developed, not many people were concerned with malicious USB
> >> devices probably. At least exploiting a malicious USB device usually
> >> requires physical access. With virtual devices and multiple VMs the
> >> stakes are higher, I believe.
> > That's probably true, but I fail to see how the fact we are using
> > struct v4l2_buffer instead of struct virtio_video_buffer can have an
> > impact on that?
> >
> > V4L2 has a larger UAPI surface because it manages more kinds of
> > devices, but drivers only need to implement the ioctls they need. For
> > the rest, they just return -ENOTTY, and evil actors are hopefully kept
> > at bay.
>
> Still there are definitely more ways to do things wrong. It would be
> harder to audit a larger API surface.

If you write a video device, you don't need to support more of the API
than your device requires. All unsupported interfaces can simply
return ENOTTY.
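
As a sketch of what I mean on the device side (this is not taken from
any existing implementation; virtio_v4l2_dev and dispatch_supported are
made-up placeholders):

  #include <errno.h>
  #include <stdint.h>
  #include <linux/videodev2.h>

  struct virtio_v4l2_dev; /* hypothetical device state */
  int dispatch_supported(struct virtio_v4l2_dev *dev, uint32_t cmd,
                         void *arg);

  /* Anything outside the supported subset is rejected with ENOTTY,
   * which keeps the attack surface limited to what is implemented. */
  int handle_v4l2_request(struct virtio_v4l2_dev *dev, uint32_t cmd,
                          void *arg)
  {
          switch (cmd) {
          case VIDIOC_QUERYCAP:
          case VIDIOC_ENUM_FMT:
          case VIDIOC_G_FMT:
          case VIDIOC_S_FMT:
          case VIDIOC_REQBUFS:
          case VIDIOC_QBUF:
          case VIDIOC_DQBUF:
          case VIDIOC_STREAMON:
          case VIDIOC_STREAMOFF:
                  return dispatch_supported(dev, cmd, arg);
          default:
                  return -ENOTTY;
          }
  }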

>
>
> >> 2. We have a working virtio-video driver. So we need very good reasons
> >> to start from scratch. You name two reasons AFAIR: simplicity and
> >> possible use of cameras. Did I miss something else?
> >>
> >>     a. The simplicity is there only in case all the interfaces are V4L2,
> >> both in the backend and in the guest. Otherwise the complexity is just
> >> moved to backends. I haven't seen V4L2 in our setups so far, only some
> >> proprietary OMX libraries. So from my point of view, this is not
> >> simplicity in general, but an optimization for a specific narrow use case.
> > V4L2 is not a narrow use-case when it comes to video devices on Linux
> > - basically every user space application involving cameras or codecs
> > can use it. Even the virtio-video driver exposes a V4L2 device, so
> > unless you are using a different driver and proprietary userspace apps
> > specifically written to interact with that driver, V4L2 is involved in
> > your setup at some point.
>
> Sorry, I mean narrow use-case if we look into other possibilities:
>
> 1. Stateless V4L2 on the host.
> 2. Any other interface on the host.
> 3. Any other guest except Linux.
>
> Our targets are several popular embedded SoCs. Unfortunately we don't
> have the luxury of simply having normal V4L2 devices there. And it
> doesn't look like this is going to change.
>
>
> > The guest driver that I wrote is, I think, a good example of the
> > complexity you can expect in terms of guest driver size (as it is
> > pretty functional already with its 1000 and some LoCs). For the UAPI
> > complexity, the host device basically unpacks the information it needs
> > and rebuilds the V4L2 structures before calling into the host device,
> > and I don't see this process as more complex that the unpacking of
> > virtio-video structs which we also did in crosvm.
>
> Unfortunately our hypervisor doesn't support mapping random host pages
> in the guest.

The ability to map random host pages to the guest is *not* a
requirement of virtio-v4l2.

> Static allocations of shared memory regions are possible.
> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> a region per virtual device. This is just very tedious and inflexible.
> That's why we're mainly interested in having the guest pages sharing in
> the virtio video spec.

I'll be happy to update the PoC and make it able to use guest pages as
buffer backing memory. It just wasn't the priority to demonstrate the
global approach.

>
>
> >>     b. For modern cameras the V4L2 interface is not enough anyway. This
> >> was already discussed AFAIR. There is a separate virtio-camera
> >> specification, that indeed is based on V4L2 UAPI as you said. But
> >> combining these two specs is certainly not future proof, right? So I
> >> think it is best to let the virtio-camera spec to be developed
> >> independently.
> > I don't know if virtio-camera has made progress that they have not
> > published yet, but from what I have seen virtio-v4l2 can cover
> > everything that the currently published driver does (I could not find
> > a specification, but please point me to it if it exists), so there
> > would be no conflict to resolve.
> >
> > V4L2 with requests support should be capable of handling complex
> > camera configurations, but the effort indeed seems to have switched to
> > KCAM when it comes to supporting complex native cameras natively. That
> > being said:
> >
> > * KCAM is not merged yet, is probably not going to be for some time
> > (https://lwn.net/Articles/904776/), and we don't know how we can
> > handle virtualization with it,
> > * The fact that the camera is complex on the host does not mean that
> > all that complexity needs to be exposed to the guest. I don't know how
> > the camera folks want to manage this, but one can imagine that the
> > host could expose a simpler model for the virtual camera, with only
> > the required knobs, while the host takes care of doing all the complex
> > configuration.
> > * The counter argument can be made that simple camera devices do not
> > need a complex virtualization solution, so one can also invoke
> > simplicity here to advocate for virtio-v4l2.
> >
> > My point is not to say that all other camera virtualization efforts
> > should be abandoned - if indeed there is a need for something more
> > specific, then nothing prevents us from having a virtio-camera
> > specification added. However, we are nowhere close to this at the
> > moment, and right now there is no official solution for camera
> > virtualization, so I see no reason to deny the opportunity to support
> > simple camera devices since its cost would just be to add "and cameras
> > device" in the paragraph of the spec that explains what devices are
> > supported.
>
> Well, for reasons described above it still seems perfectly fine to me to
> have separate devices. Ok, the argument, that this approach also seems
> more future-proof, is not a strong one.

Please elaborate on its weaknesses then.

>
>
> >> 3. More specifically I can see, that around 95% V4L2 drivers use
> >> videobuf2. This includes the current virtio-video driver. Bypassing the
> >> V4L2 subsystem means that vb2 can't be used, right? In various
> >> discussions vb2 popped up as a thing, that would be hard to avoid. What
> >> do you think about this? How are you going to deal with various V4L2
> >> memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc), for example?
> >> I'll try to dive deeper myself too...
> > VB2 is entirely avoided in the current driver, but my understanding is
> > that its helpers could be used if needed.
> >
> > In virtio-v4l2, MMAP means that the host is responsible for managing
> > the buffers, so vb2 is entirely avoided. USERPTR means the guest
> > passes a SG list of guest physical addresses as mapping memory. VB2
> > may or may not be involved in managing this memory, but most likely
> > not if that memory comes from the guest userspace. DMABUF means the
> > guest passes a virtio object as the backing memory of the buffer.
> > There again there is no particular management to be done on the guest
> > side.
> >
> > I bypassed VB2 for the current driver, and the cost of doing this is
> > that I had to write my own mmap() function.
>
> The cost of it as of now is also that:
>
> 1. Only guest user-space applications, that use V4L2_MEMORY_MMAP, are
> supported AFAIU.

This has nothing to do with VB2. I wanted to demonstrate that V4L2
could be used as a host-guest protocol and did it on a single memory
type to release something quickly. Please stop strawmanning the design
because the PoC is still incomplete.

> 2. There is no flexibility to choose whatever way of memory management
> host and guest would like to use. Now the guest user-space application
> selects this.

Errr no. The guest user-space chooses a type of memory from what the
guest kernel exposes, which depends on what the host itself decides to
expose.

>
> The latter makes the solution much less flexible IMO. For example, this
> won't work well with our hypervisor. There might other special needs in
> other use-cases. Like sharing these object UUIDs. Probably this can
> handled by mapping, for example, V4L2_MEMORY_USERPTR to guest-pages
> sharing, V4L2_MEMORY_DMABUF to the UUIDs (which is not quite correct
> IMHO).

Please elaborate on why this is not correct.

> So this already means querying the device for supported sharing
> methods, rewriting the flow of V4L2 UAPI calls on the fly, ensuring
> consistency, etc. This already looks hackish to me. Do you have a better
> plan?

How do you support different kinds of memory without querying? Or do
you suggest we stick to a single one?

I am also not quite sure what you mean by "rewriting the flow of V4L2
UAPI calls on the fly". There is no "rewriting" - V4L2 structures are
just used to communicate with the host instead of virtio-video
structures.
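
Very roughly, the idea amounts to no more than this. The structure
below is invented purely to illustrate "V4L2 structs as the wire
format"; nothing like it is specified anywhere:

  #include <stdint.h>

  /* Hypothetical command descriptor the guest places on the virtqueue. */
  struct virtio_v4l2_cmd {
          uint32_t code;      /* the ioctl code, e.g. VIDIOC_S_FMT */
          uint8_t  payload[]; /* the matching V4L2 struct,
                                 e.g. struct v4l2_format */
  };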

> Also this limits us to only 3 methods, right? And what if there
> are more than 3 methods in the future?

Nothing prevents us from adding new virtio-specific memory types if
needed. But what other methods did you have in mind?

>
> I think this inflexibility is a major problem with this approach.
>
>
> >>> Actually I don't think this is even something we need to think about -
> >>> in its simplest form the V4L2 guest driver just needs to act as a
> >>> proxy for the device. So which decoder API is used by the host is
> >>> completely irrelevant to the guest driver - it can support a decoder,
> >>> an encoder, or a camera - it doesn't even need to be aware of what
> >>> kind of device it is exposing and that simplicity is another thing
> >>> that I like with this design.
> >> As I wrote above the design would be indeed simple only in case the
> >> actual hardware is exposed to a backend through V4L2 too. Otherwise the
> >> complexity is just moved to backends.
> > Yes, and while I acknowledge that, this is not really more complex
> > that what you would have to do with a virtio-video device which also
> > needs to manage its own state and drive the hardware through backends.
> > I say that based on the experience working on the virtio-video device
> > in crosvm which follows that design too.
>
> As I wrote above we have a different use-case. And I see the current
> state of virtio video as a good common ground for different parties and
> use-cases. Unfortunately I don't see any upsides for our use-cases from
> the V4L2 UAPI proposal, only downsides.

Well, AFAICT V4L2 provides the exact same set of capabilities as
virtio-video, with only minor differences. If virtio-video was suitable
for your use-case, V4L2 should be as well.

Maybe it makes things marginally more complex for your particular
proprietary bare-metal hypervisor. But it also makes things
dramatically easier and provides many more features for the vast
majority of the virtio audience, who run Linux guests and can now use
a much simpler driver. Which one do we want to prioritize?

I'm sorry but your answer is full of vague assertions about supposed
shortcomings of the approach without any concrete evidence of its
unsuitability. Please show us why this wouldn't work for you.

Thanks,
Alex.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-19  7:39                                 ` Alexander Gordeev
  2023-04-19 21:34                                   ` Enrico Granata
@ 2023-04-21  4:02                                   ` Alexandre Courbot
  2023-04-21 16:01                                     ` Alexander Gordeev
                                                       ` (2 more replies)
  1 sibling, 3 replies; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-21  4:02 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Wed, Apr 19, 2023 at 4:39 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 17.04.23 16:43, Cornelia Huck wrote:
> > On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >
> >> Hi Alexandre,
> >>
> >> Thanks for you letter! Sorry, it took me some time to write an answer.
> >>
> >> First of all I'd like to describe my perspective a little bit because it
> >> seems, that in many cases we (and other people writing their feedbacks)
> >> simply have very different priorities and background.
> > Thank you for describing the environment you want to use this in, this
> > helps to understand the different use cases.
>
> Yeah, I hope too. I should have done that earlier. I think Dmitry
> described our use-case, but it was several years ago, so nobody
> remembers of course. We should reiterate all the use-cases once in a while.
>
>
> >> OpenSynergy, the company that I work for, develops a proprietary
> >> hypervisor called COQOS mainly for automotive and aerospace domains. We
> >> have our proprietary device implementations, but overall our goal is to
> >> bring open standards into these quite closed domains and we're betting
> >> big on virtio. The idea is to run safety-critical functions like cockpit
> >> controller alongside with multimedia stuff in different VMs on the same
> >> physical board. Right now they have it on separate physical devices. So
> >> they already have maximum isolation. And we're trying to make this
> >> equally safe on a single board. The benefit is the reduced costs and
> >> some additional features. Of course, we also need features here, but at
> >> the same time security and ease of certification are among the top of
> >> our priorities. Nobody wants cars or planes to have security problems,
> >> right? Also nobody really needs DVB and even more exotic devices in cars
> >> and planes AFAIK.
> >>
> >> For the above mentioned reasons our COQOS hypervisor is running on bare
> >> metal. Also memory management for the guests is mostly static. It is
> >> possible to make a shared memory region between a device and a driver
> >> managed by device in advance. But definitely no mapping of random host
> >> pages on the fly is supported.
> >>
> >> AFAIU crosvm is about making Chrome OS more secure by putting every app
> >> in its own virtualized environment, right? Both the host and guest are
> >> linux. In this case I totally understand why V4L2 UAPI pass-through
> >> feels like a right move. I guess, you'd like to make the switch to
> >> virtualized apps as seamless as possible for your users. If they can't
> >> use their DVBs anymore, they complain. And adding the virtualization
> >> makes the whole thing more secure anyway. So I understand the desire to
> >> have the range of supported devices as broad as possible. It is also
> >> understandable that priorities are different with desktop
> >> virtualization. Also I'm not trying to diminish the great work, that you
> >> have done. It is just that from my perspective this looks like a step in
> >> the wrong direction because of the mentioned concerns. So I'm going to
> >> continue being a skeptic here, sorry.
> >>
> >> Of course, I don't expect that you continue working on the old approach
> >> now as you have put so much effort into the V4L2 UAPI pass-through.
> >> So I think it is best to do the evolutionary changes in scope of virtio
> >> video device specification, and create a new device specification
> >> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> >> continue the virtio-video development. In fact I already started making
> >> draft v7 of the spec according to the comments. I hope it will be ready
> >> for review soon.
> >>
> >> I hope this approach will also help fix issues with virtio-video spec
> >> and driver development misalignment as well as V4L2 compliance issues
> >> with the driver. I believe the problems were caused partly by poor
> >> communication between us and by misalignment of our development cycles,
> >> not by the driver complexity.
> >>
> >> So in my opinion it is OK to have different specs with overlapping
> >> functionality for some time. My only concern is if this would be
> >> accepted by the community and the committee. How do things usually go
> >> here: are features preferred and possible security issues tolerated, or the
> >> other way around? Also, how acceptable are Linux-specific protocols
> >> at all?
> > My main question is: What would be something that we can merge as a
> > spec, that would either cover the different use cases already, or that
> > could be easily extended to cover the use cases it does not handle
> > initially?
> >
> > For example, can some of the features that would be useful in crosvm be
> > tucked behind some feature bit(s), so that the more restricted COQOS
> > hypervisor would simply not offer them? (Two feature bits covering two
> > different mechanisms, like the current approach and the v4l2 approach,
> > would also be good, as long as there's enough common ground between the
> > two.)
> >
> > If a staged approach (adding features controlled by feature bits) would
> > be possible, that would be my preferred way to do it.
>
> Hmm, I see several ways we can use the feature flags:
> 1. Basically making two feature flags: one for the current video spec
> and one for the V4L2 UAPI pass through. Kind of the same as having two
> different specs, but within one device. Not sure which way is better.
> Probably having two separate devices would be easier to review and merge.

Having two different devices with their own IDs would indeed be less
confusing than using feature bits.

That being said, the whole point of proposing virtio-v4l2 is to end up
with *less* specification, not more. Having two concurrent and largely
overlapping approaches will result in fragmentation and duplicated
work, so my suggestion would be to decide on one or the other and
stick to it.

> 2. Finding a subset of V4L2, that closely matches the current draft, and
> restrict everything else. A perfectly reasoned answer for this case will
> require a lot of work going through all the V4L2 structures I think. And

It's pretty trivial, on the contrary. For a decoder device one would
just need to restrict to the structs and operations mentioned here:
https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html
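
To make this concrete, here is a rough sketch of what a device-side
whitelist for a decoder-only device could look like. This is just my
reading of the document linked above, not spec text, and the helper name
is made up; every ioctl listed exists in <linux/videodev2.h>, and anything
not on the list would simply be rejected:

  #include <stdbool.h>
  #include <linux/videodev2.h>

  /* Sketch only: accept the ioctls used by the stateful decoder
   * interface and nothing else. */
  static bool decoder_ioctl_allowed(unsigned int code)
  {
      switch (code) {
      case VIDIOC_QUERYCAP:
      case VIDIOC_ENUM_FMT:
      case VIDIOC_G_FMT:
      case VIDIOC_S_FMT:
      case VIDIOC_TRY_FMT:
      case VIDIOC_ENUM_FRAMESIZES:
      case VIDIOC_G_SELECTION:
      case VIDIOC_REQBUFS:
      case VIDIOC_QUERYBUF:
      case VIDIOC_QBUF:
      case VIDIOC_DQBUF:
      case VIDIOC_STREAMON:
      case VIDIOC_STREAMOFF:
      case VIDIOC_SUBSCRIBE_EVENT:
      case VIDIOC_UNSUBSCRIBE_EVENT:
      case VIDIOC_DQEVENT:
      case VIDIOC_DECODER_CMD:
      case VIDIOC_TRY_DECODER_CMD:
          return true;
      default:
          return false; /* e.g. answer with something like -ENOTTY */
      }
  }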

> even if we have a concrete plan, it has to be implemented first. I doubt
> it is possible. Based on the things, that I already know, this is going
> to be a compromise on security anyway, so we're not happy about that.
> More on that below.
> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
> the virtio video spec as close to the V4L2 UAPI as possible, but with
> the appropriate security model. So that the video device can be extended
> with a feature flag to something very close to full V4L2 UAPI. A lot of
> work as well, I think. And this won't allow us to simply link the V4L2
> UAPI in the spec and therefore reduce its size, which is Alexandre's
> current goal. So Alexandre and his team are not happy this way probably.

That would indeed be reinventing the wheel and a completely pointless
exercise IMHO.

>
>  From the security point of view these are our goals from most to least
> important AFAIU:
> 1. Make the device secure. If a device is compromised, the whole
> physical machine is at risk. Complexity is the enemy here. It helps a
> lot to make the device as straightforward and easy to implement as
> possible. Therefore it is good to make the spec device/function-centric
> from this PoV.

This seems to be the main point of contention we have right here. I do
not believe that V4L2 introduces significant complexity to the video
use-case, and certainly not to a degree that would justify writing a
new spec just for that.

> 2. Ensure, that drivers are also secure at least from user-space side.
> Maybe from device side too.

FWIW V4L2 has been in use for a looong time (including in Chromebooks
that do secure playback) and I am not aware of fundamental security
issues with it.

> 3. Implementing secure playback and making sure media doesn't leak. For
> this case it is nice to have these object UUIDs as buffers.
>
> Please correct me if there's something wrong here.
>
> When we start looking from this perspective even things like naming
> video buffer queues "output" for device input and "capture" for device
> output are problematic. In my experience this naming scheme takes some
> time and for sure several coding mistakes to get used to. Unfortunately
> this can't be turned off with some feature flags. In contrast virtio
> video v6 names these queues "input" and "output". This is perfectly fine
> if we look from the device side. It is understandable, that Alexandre's
> list of differences between V4L2 UAPI and the current state of virtio
> video doesn't include these things. But we have to count them, I think.
> That's why it takes me so long to make the list. :) So throwing away
> this simplicity is still going to be a compromise from the security
> perspective, that we're not happy about.
>
> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
> security model, legacy, use-cases, developers, etc. It can be changed
> over time, but this is a long process because this means changing the
> Linux UAPI. Also this means, that nothing can be removed from it, only
> added (if the V4L2 community agrees). For example, we can't simply add a
> new way of sharing buffers for potential new use-cases once we switch to
> V4L2 UAPI AFAIU.

You certainly can, it would just need to be in the virtio
specification. And again that's a very hypothetical case.
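
Just to illustrate what that would mean (the name and value below are
entirely made up, nothing like this exists today): V4L2 currently defines
V4L2_MEMORY_MMAP, V4L2_MEMORY_USERPTR, V4L2_MEMORY_OVERLAY and
V4L2_MEMORY_DMABUF, and the virtio spec could reserve further values for
buffer-sharing schemes that V4L2 itself does not know about, e.g.:

  /* Hypothetical sketch, not a proposal: a virtio-specific memory type,
   * defined only in the virtio spec, for buffers referenced by a virtio
   * object UUID. */
  #define VIRTIO_V4L2_MEMORY_OBJECT_UUID 0x1000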

> The V4L2 community can simply reject changes because
> this is a UAPI after all. We kind of have a weak dependency already,
> because the only driver implementation is based on V4L2 and we'd like to
> keep the spec as close to V4L2 as possible, but it is not the same
> thing. So at the moment it looks like the V4L2 UAPI proposal is not
> super flexible. Alexandre said, that we can simply not implement some of
> the ioctls. Well, this definitely doesn't cover all the complexity like
> the structures and other subtle details.

Mmm? Where did I say that? That sounds like a misunderstanding.

>
> Also adding the feature flags would probably defeat the stated purpose
> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
> V4L2 driver.
>
> So I have a lot of doubts about the feasibility of adding feature flags.
> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
> is best to simply have two specs:
> 1. virtio-video for those interested in building from ground up with the
> security model appropriate for virtualization in mind. This is going to
> take time and not going to reach feature parity with V4L2 ever I think.
> I mean some old devices might never get support this way.
> 2. virtio-v4l2 as a shortcut for those interested in having feature
> parity with V4L2 fast. Like a compatibility layer. Probably this is
> going to be used in linux host + linux guest use-cases only. Maybe it
> gets obsoleted by the first spec in several years for most modern use-cases.
>
> Or maybe have these two cases within a single device spec as I wrote above.
>
> This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
> only needed for compatibility, then this way we can avoid a lot of work
> going through all of the V4L2 and trying to find different subsets or
> trying to construct something, that is close to V4L2 UAPI, but doesn't
> compromise on the security. I'm not really interested in doing all this
> work because we're already more or less satisfied with the current
> state. We don't need feature parity with V4L2. On the other hand for
> Alexandre the feature-parity with V4L2 is clearly of higher priority,
> than all these subtle security model differences. In my opinion it also
> doesn't make sense to invest that much time in something, that looks
> like a compatibility layer. So it seems both of us are interested in
> avoiding all this extra work. Then I'd just prefer to have two different
> specs so that everyone can work according to their priorities.
>
>
> > Regarding the protocol: I think Linux-originating protocols (that can be
> > implemented on non-Linux setups) are fine, Linux-only protocols probably
> > not so much.
>
> Thanks for the information. Well, it looks like the V4L2 UAPI could be
> implemented on any platform unless it needs a completely new way of
> memory management since all the V4L2_MEMORY_* constants are going to be
> used already AFAIU.
>
>
> >>>>      a. So V4L2 subsystem and the current virtio-video driver are already
> >>>> reducing the complexity. And this seems as the right place to do this,
> >>>> because the complexity is caused by the amount of V4L2 use cases and its
> >>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >>>> would prefer a simpler API, right? I think this use-case is not purely
> >>>> abstract at all.
> >>> The V4L2 subsystem is there to factorize code that can be shared
> >>> between drivers and manage their internal state. Our target is the
> >>> V4L2 UAPI, so a Windows driver needs not be concerned about these
> >>> details - it does what it would have done with virtio-video, and just
> >>> uses the V4L2 structures to communicate with the host instead of the
> >>> virtio-video ones.
> >> It can also reuse the virtio-video structures. So I think despite the
> >> ability to reuse V4L2 structures, having to implement a linux-specific
> >> interface would still be a bigger pain.
> > Hm. Do the v4l2 structures drag in too many adjacent things that need to
> > be implemented? Can we match the virtio-video structures from the current
> > proposal with some v4l2 structures and extract a common wrapper for
> > those that match, with a feature-bit controlled backend? It would be
> > fine if any of those backends supported a slightly different subset of
> > the common parts, as long as the parts implemented by both would be
> > enough to implement a working device. (Mostly thinking out loud here.)
>
> I don't think this is realistic, unfortunately. On a per-ioctl level it is
> possible to disable some functionality probably, but the V4L2 structures
> are set in stone. We can only extend them.
>
>
> >>> The guest driver that I wrote is, I think, a good example of the
> >>> complexity you can expect in terms of guest driver size (as it is
> >>> pretty functional already with its 1000 and some LoCs). For the UAPI
> >>> complexity, the host device basically unpacks the information it needs
> >>> and rebuilds the V4L2 structures before calling into the host device,
> >>> and I don't see this process as more complex than the unpacking of
> >>> virtio-video structs which we also did in crosvm.
> >> Unfortunately our hypervisor doesn't support mapping random host pages
> >> in the guest. Static allocations of shared memory regions are possible.
> >> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> >> a region per virtual device. This is just very tedious and inflexible.
> >> That's why we're mainly interested in having the guest pages sharing in
> >> the virtio video spec.
> > This really sounds like you'll want a different approach -- two
> > mechanisms covered by two feature bits might indeed be the way to go.
>
> Well, basically this is the way we have it now. I'm not sure what
> Alexandre's plan is with the V4L2 UAPI approach. And even if this gets
> solved, the solution already doesn't look future-proof, unfortunately.
>
>
> >>> I hope I have somehow addressed your points. The main point here is to
> >>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
> >>> accelerated codec work, regardless of what the guest or host
> >>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
> >>> this is a viable solution. This PoC is largely simplified by the fact
> >>> that V4L2 is used all along the way, but this is irrelevant - yes,
> >>> actual devices will likely talk to other APIs and maintain more state,
> >>> like a virtio-video device would do. What I want to demonstrate is
> >>> that we can send encoding work and receive a valid stream, and that it
> >>> is not costly, and only marginally more complex than our virtio-video
> >>> spec attempts.
> >>>
> >>> ... and we can support cameras too, but that's just a convenient
> >>> side-effect, not the ultimate solution to the camera virtualization
> >>> problem (that's for the camera folks to decide).
> >> Thanks for your answer!
> > Thanks everyone -- do you think the "two feature bits to cover different
> > approaches, but using a common infrastructure" idea could work? If yes,
> > I think that's the direction we should take. If we can implement this
> > with just one feature bit, that might also be a good route to extend it
> > later, but I'm not familiar enough with the whole infrastructure to make
> > any judgement here.
>
> Thanks for your suggestions. Hopefully we end up with a good solution.

To summarize my position:

* I am still not convinced that V4L2 is lacking from a security
perspective. It would take just one valid example to change my mind
(and no, the way the queues are named is not valid). And btw, if it
really introduces security issues, then this makes it invalid for
inclusion in virtio entirely, not just OpSy's hypervisor.

* Having two overlapping specifications for video is overkill and will
just fragment virtio (as tempting as it is, I won't link to XKCD). I
strongly advise against that.

* If the goal is to provide a standard that is suitable and useful to
the greatest number, then we shouldn't downplay the benefit that
virtio-v4l2 brings to Linux guests.

Cheers,
Alex.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-19 21:34                                   ` Enrico Granata
@ 2023-04-21 14:48                                     ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-21 14:48 UTC (permalink / raw)
  To: Enrico Granata
  Cc: Cornelia Huck, Alexandre Courbot, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

Thanks for your feedback, Enrico!

On 19.04.23 23:34, Enrico Granata wrote:

> Inlined
>
> Thanks,
> - Enrico
>
> On Wed, Apr 19, 2023 at 12:39 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com>  wrote:
>> On 17.04.23 16:43, Cornelia Huck wrote:
>>> On Mon, Apr 17 2023, Alexander Gordeev<alexander.gordeev@opensynergy.com>  wrote:
>>>
>>>> OpenSynergy, the company that I work for, develops a proprietary
>>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>>>> have our proprietary device implementations, but overall our goal is to
>>>> bring open standards into these quite closed domains and we're betting
>>>> big on virtio. The idea is to run safety-critical functions like cockpit
>>>> controller alongside with multimedia stuff in different VMs on the same
>>>> physical board. Right now they have it on separate physical devices. So
>>>> they already have maximum isolation. And we're trying to make this
>>>> equally safe on a single board. The benefit is the reduced costs and
>>>> some additional features. Of course, we also need features here, but at
>>>> the same time security and ease of certification are among the top of
>>>> our priorities. Nobody wants cars or planes to have security problems,
>>>> right? Also nobody really needs DVB and even more exotic devices in cars
>>>> and planes AFAIK.
>>>>
>>>> For the above mentioned reasons our COQOS hypervisor is running on bare
>>>> metal. Also memory management for the guests is mostly static. It is
>>>> possible to make a shared memory region between a device and a driver
>>>> managed by device in advance. But definitely no mapping of random host
>>>> pages on the fly is supported.
>>>>
>>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>>>> in its own virtualized environment, right? Both the host and guest are
>>>> linux. In this case I totally understand why V4L2 UAPI pass-through
>>>> feels like a right move. I guess, you'd like to make the switch to
> >>>> virtualized apps as seamless as possible for your users. If they can't
>>>> use their DVBs anymore, they complain. And adding the virtualization
>>>> makes the whole thing more secure anyway. So I understand the desire to
>>>> have the range of supported devices as broad as possible. It is also
>>>> understandable that priorities are different with desktop
>>>> virtualization. Also I'm not trying to diminish the great work, that you
>>>> have done. It is just that from my perspective this looks like a step in
>>>> the wrong direction because of the mentioned concerns. So I'm going to
>>>> continue being a skeptic here, sorry.
>>>>
>>>> Of course, I don't expect that you continue working on the old approach
> >>>> now as you have put so much effort into the V4L2 UAPI pass-through.
>>>> So I think it is best to do the evolutionary changes in scope of virtio
>>>> video device specification, and create a new device specification
>>>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
>>>> continue the virtio-video development. In fact I already started making
>>>> draft v7 of the spec according to the comments. I hope it will be ready
>>>> for review soon.
>>>>
>>>> I hope this approach will also help fix issues with virtio-video spec
>>>> and driver development misalignment as well as V4L2 compliance issues
>>>> with the driver. I believe the problems were caused partly by poor
>>>> communication between us and by misalignment of our development cycles,
>>>> not by the driver complexity.
>>>>
>>>> So in my opinion it is OK to have different specs with overlapping
>>>> functionality for some time. My only concern is if this would be
>>>> accepted by the community and the committee. How do things usually go
>>>> here: are features preferred and possible security issues tolerated, or the
>>>> other way around? Also, how acceptable are Linux-specific protocols
>>>> at all?
>>> My main question is: What would be something that we can merge as a
>>> spec, that would either cover the different use cases already, or that
>>> could be easily extended to cover the use cases it does not handle
>>> initially?
>>>
>>> For example, can some of the features that would be useful in crosvm be
>>> tucked behind some feature bit(s), so that the more restricted COQOS
>>> hypervisor would simply not offer them? (Two feature bits covering two
>>> different mechanisms, like the current approach and the v4l2 approach,
>>> would also be good, as long as there's enough common ground between the
>>> two.)
>>>
>>> If a staged approach (adding features controlled by feature bits) would
>>> be possible, that would be my preferred way to do it.
>> Hmm, I see several ways we can use the feature flags:
>> 1. Basically making two feature flags: one for the current video spec
>> and one for the V4L2 UAPI pass through. Kind of the same as having two
>> different specs, but within one device. Not sure which way is better.
>> Probably having two separate devices would be easier to review and merge.
> I agree with this. It may be worth renaming the specs to something,
> say virtio-vencdec and virtio-v4l2 </bikeshedding>

Well, personally I'd prefer to keep the virtio-video name because vencdec is
a little harder to pronounce IMO. Also, after all, I'd like to simply
continue the current development, so this may be easier to track. But I
don't really mind renaming, of course.


> Having one spec for what is really two devices forked by flags may
> superficially seem simpler and less controversial (everyone gets what
> they want, right?) but it feels bolted on.

Yes, I have the same thoughts.


>> 2. Finding a subset of V4L2, that closely matches the current draft, and
>> restrict everything else. A perfectly reasoned answer for this case will
>> require a lot of work going through all the V4L2 structures I think. And
>> even if we have a concrete plan, it has to be implemented first. I doubt
>> it is possible. Based on the things, that I already know, this is going
>> to be a compromise on security anyway, so we're not happy about that.
>> More on that below.
> To be fair, this could be done on the device side, right? I am not
> saying it would be trivial, but the device implementation could
> restrict the subset of V4L2 it accepts to whatever it feels is "safe"
> and limit the rest.

Hmm, if I understood correctly this means the same thing as I
suggested, just without including the subset (if it exists) in the spec.
Right? Well, this subset has to be defined somewhere, I think. Otherwise
it would be hard to avoid compatibility problems with the drivers.
Anyway I have a lot of doubts and concerns about this possibility. So
we're not happy with this solution.


>> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
>> the virtio video spec as close to the V4L2 UAPI as possible, but with
>> the appropriate security model. So that the video device can be extended
>> with a feature flag to something very close to full V4L2 UAPI. A lot of
>> work as well, I think. And this won't allow us to simply link the V4L2
>> UAPI in the spec and therefore reduce its size, which is Alexandre's
>> current goal. So Alexandre and his team are not happy this way probably.
>>
>>  From the security point of view these are our goals from most to least
>> important AFAIU:
>> 1. Make the device secure. If a device is compromised, the whole
>> physical machine is at risk. Complexity is the enemy here. It helps a
>> lot to make the device as straightforward and easy to implement as
>> possible. Therefore it is good to make the spec device/function-centric
>> from this PoV.
>> 2. Ensure, that drivers are also secure at least from user-space side.
>> Maybe from device side too.
>> 3. Implementing secure playback and making sure media doesn't leak. For
>> this case it is nice to have these object UUIDs as buffers.
>>
>> Please correct me if there's something wrong here.
>>
>> When we start looking from this perspective even things like naming
>> video buffer queues "output" for device input and "capture" for device
>> output are problematic. In my experience this naming scheme takes some
>> time and for sure several coding mistakes to get used to. Unfortunately
>> this can't be turned off with some feature flags. In contrast virtio
>> video v6 names these queues "input" and "output". This is perfectly fine
>> if we look from the device side. It is understandable, that Alexandre's
>> list of differences between V4L2 UAPI and the current state of virtio
>> video doesn't include these things. But we have to count them, I think.
>> That's why it takes me so long to make the list. :) So throwing away
>> this simplicity is still going to be a compromise from the security
>> perspective, that we're not happy about.
>>
>> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
>> security model, legacy, use-cases, developers, etc. It can be changed
>> over time, but this is a long process because this means changing the
>> Linux UAPI. Also this means, that nothing can be removed from it, only
>> added (if the V4L2 community agrees). For example, we can't simply add a
>> new way of sharing buffers for potential new use-cases once we switch to
>> V4L2 UAPI AFAIU. The V4L2 community can simply reject changes because
>> this is a UAPI after all. We kind of have a weak dependency already,
>> because the only driver implementation is based on V4L2 and we'd like to
>> keep the spec as close to V4L2 as possible, but it is not the same
>> thing. So at the moment it looks like the V4L2 UAPI proposal is not
>> super flexible. Alexandre said, that we can simply not implement some of
>> the ioctls. Well, this definitely doesn't cover all the complexity like
>> the structures and other subtle details.
>>
>> Also adding the feature flags would probably defeat the stated purpose
>> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
>> V4L2 driver.
>>
>> So I have a lot of doubts about the feasibility of adding feature flags.
>> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
>> is best to simply have two specs:
>> 1. virtio-video for those interested in building from ground up with the
>> security model appropriate for virtualization in mind. This is going to
>> take time and not going to reach feature parity with V4L2 ever I think.
>> I mean some old devices might never get support this way.
>> 2. virtio-v4l2 as a shortcut for those interested in having feature
>> parity with V4L2 fast. Like a compatibility layer. Probably this is
>> going to be used in linux host + linux guest use-cases only. Maybe it
>> gets obsoleted by the first spec in several years for most modern use-cases.
> This latter device also makes a lot of sense for Android (I know,
> Linux based) where there are nifty implementations of a number of HALs
> that will just work if you assume V4L2 and will not (or will not
> without rework) if you assume anything else.

Yeah, I know. We also have issues with Android HALs. Still, we have a more
restricted hypervisor and we also believe that avoiding sharing host
pages is the right thing to do. Because of this, moving to the V4L2 UAPI
seems to bring us no benefits. So we'd like to continue digging here.
Hopefully with two devices we can both be unblocked. And hopefully at
some point our developments might become useful for you.


Kind regards,
Alexander Gordeev

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-21  4:02                                   ` Alexandre Courbot
@ 2023-04-21 16:01                                     ` Alexander Gordeev
  2023-04-24  7:52                                       ` Alexander Gordeev
  2023-04-26 15:52                                     ` Alexander Gordeev
       [not found]                                     ` <CALgKJBqKWng508cB_F_uD2fy9EAvQ36rYR3fRb57sFd3ihpUFw@mail.gmail.com>
  2 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-21 16:01 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

Hi Alexandre,

On 21.04.23 06:02, Alexandre Courbot wrote:
> * I am still not convinced that V4L2 is lacking from a security
> perspective. It would take just one valid example to change my mind
> (and no, the way the queues are named is not valid). And btw, if it
> really introduces security issues, then this makes it invalid for
> inclusion in virtio entirely, not just OpSy's hypervisor.

I'd like to start with this and then answer everything else later.

Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with
VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First,
let's compare the word count to get a very rough estimate of complexity.
I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the
parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206
words, they both use struct v4l2_buffer as a parameter. The struct takes
2716 words to be described. So the whole thing takes 3922 words. This is
6.7 times more than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the
definitions of the structs, it is also very obvious that the V4L2 UAPI is
almost an order of magnitude more complex.

Also please read:

https://medium.com/starting-up-security/evidence-of-absence-8148958da092

https://www.schneier.com/essays/archives/1999/11/a_plea_for_simplicit.html


Kind regards,
Alexander Gordeev

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-21 16:01                                     ` Alexander Gordeev
@ 2023-04-24  7:52                                       ` Alexander Gordeev
  2023-04-25 16:04                                         ` Cornelia Huck
  2023-04-26  5:52                                         ` Alexandre Courbot
  0 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-24  7:52 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve


On 21.04.23 18:01, Alexander Gordeev wrote:
> Hi Alexandre,
>
> On 21.04.23 06:02, Alexandre Courbot wrote:
>> * I am still not convinced that V4L2 is lacking from a security
>> perspective. It would take just one valid example to change my mind
>> (and no, the way the queues are named is not valid). And btw, if it
>> really introduces security issues, then this makes it invalid for
>> inclusion in virtio entirely, not just OpSy's hypervisor.
>
> I'd like to start with this and then answer everything else later.
>
> Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with
> VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First,
> let's compare the word count to get a very rough estimate of complexity.
> I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the
> parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206
> words, they both use struct v4l2_buffer as a parameter. The struct takes
> 2716 words to be described. So the whole thing takes 3922 words. This is
> 6.7 times more than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the
> definitions of the structs, it is also very obvious that the V4L2 UAPI is
> almost an order of magnitude more complex.


I think, it is best to add all the steps necessary to reproduce my calculations just in case.

VIRTIO_VIDEO_CMD_RESOURCE_QUEUE is doing essentially the same thing as VIDIOC_QBUF+VIDIOC_DQBUF, so we're comparing apples to apples (if we don't forget to compare their parameters too).

To get the word count for the VIRTIO_VIDEO_CMD_RESOURCE_QUEUE I opened the rendered PDF of video section only from the first email in this thread. Here is the link: https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing . Then I scrolled to page 11 and copied everything related into a text file. This is around two pages in the PDF. Then I removed page numbers from the copied text and used 'wc -w' to count words.

To get the word count for VIDIOC_QBUF+VIDIOC_DQBUF I opened this link: https://docs.kernel.org/userspace-api/media/v4l/vidioc-qbuf.html . Then I selected all the text except the table of contents and followed the same procedure.

To get the word count for struct v4l2_buffer and other types, that are referenced from it, I opened this link: https://docs.kernel.org/userspace-api/media/v4l/buffer.html#struct-v4l2-buffer . Then I selected all the text except the table of contents and the text above struct v4l2_buffer definition. The rest is the same.

Also it's quite obvious if you look at them how much bigger struct v4l2_buffer (including the referenced types) is compared to struct virtio_video_resource_queue.

Do we agree now, that V4L2 UAPI is not only marginally more complex?


> Also please read:
>
> https://medium.com/starting-up-security/evidence-of-absence-8148958da092


This reference probably needs a clarification. You argued, that V4L2 has a good track record so far. Here is the quote:

> FWIW V4L2 has been in use for a looong time (including in Chromebooks
> that do secure playback) and I am not aware of fundamental security
> issues with it.

But absence of found major security issues doesn't tell us much about the number of not found ones. Absence of evidence is not evidence of absence. At the link above a former Director of Security at Facebook shares his thoughts about what could be good evidence of absence of major security problems.

So a bug bounty program with high premiums that covers V4L2 would be a better argument in favor of *already written code* in my opinion. Not for new code. Also probably it is also an argument in favor of the spec, that is the V4L2 UAPI. Like that it is polished enough. Not so sure about that though.

There actually are several bug bounty programs, that cover the kernel. These are Google's kctf, ZDI's pwn2own, and zerodium AFAIK. However the premiums are not even close to the ones mentioned in my reference. Anyway this means, that using *the existing V4L2 code in the kernel* is probably OK. But this creates some limitations if we want the actual code to still be covered with these bug bounties, right? This means, that the host OS has to be Linux and the actual hardware has to be exposed through a stable V4L2 driver, that is in mainline for some time, and there has to be no or little processing on top. For us this is not possible unfortunately. In the end both things could be secure:

1. V4L2 pass through can be secure because of the bug bounty programs and a lot of attention to the kernel in general.
2. For the new code this doesn't work, so the spec should be as simple and device-centric as possible. Because, all other things being equal, there are fewer errors in simpler programs. So defining a subset of V4L2 UAPI including the data types looks like a good idea to me. The stateful decoder interface, that you point to, does not define a subset in the data types.

This is basically my reasoning.

Also these two specs don't need to compete with each other. They have different limitations and they are for different audiences. If you check the XKCD comic, it is about competing standards.


> https://www.schneier.com/essays/archives/1999/11/a_plea_for_simplicit.html


A quote from this article:

> The worst enemy of security is complexity.

I hope I've provided above some evidence, that V4L2 UAPI is significantly more complex. You asked for one example, I provided it. For us this is already something to care about.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-24  7:52                                       ` Alexander Gordeev
@ 2023-04-25 16:04                                         ` Cornelia Huck
  2023-04-26  6:29                                           ` Alexandre Courbot
  2023-04-27 14:10                                           ` Alexander Gordeev
  2023-04-26  5:52                                         ` Alexandre Courbot
  1 sibling, 2 replies; 97+ messages in thread
From: Cornelia Huck @ 2023-04-25 16:04 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve


[I'm replying here, as that seems to be the last message in the thread,
and my reply hopefully catches everyone interested here.]

To do a very high level summary, we have (at least) two use cases for
virtio-video, that unfortunately have quite different requirements. Both
want to encode/decode video, but in different environments.

- The "restricted" case: Priority is on security, and the attack surface
  should be kept as small as possible, for example, by avoiding unneeded
  complexity in the interface. Fancy allocations and management should
  be avoided. The required functionality is also quite clearly defined.
- The "feature-rich" case: Priority is on enabling features, and being
  able to re-use existing V4L2 support is considered a big plus. Both
  device and driver implementations will be implemented in a full OS
  environment, so all kind of helpers are already available.

(This is not to say that one case does not care about functionality or
security; it's mostly a case of different priorities and environments.)

I had been hoping that it would be possible to find kind of a common
ground between the two cases, but reading the thread, I'm not quite as
hopeful anymore... if we really don't manage to find an approach to make
the different requirements co-exist, a separate virtio-v4l2 device might
be the way to go -- but I've not totally given up hope yet.

Some remarks from my side:

- I'm not totally convinced that counting words is always a good proxy
  for complexity -- an interface might be simple on paper, but if the
  actual implementation would need to be quite involved to get it right,
  we'd again have a lot of opportunity for mistakes.
- How much of v4l2 does actually need to be in the device specification
  for a driver to make potentially good use of it? Sure, being able to
  directly map to v4l2 really gives a huge benefit, but is there a way
  to extract a subset that's not too complex, but can be easily wrapped
  for interfacing with v4l2? (Both interface and functionality wise.)
  Even if that means that a driver would need to implement some kind of
  shim, a layer that easily maps to v4l2 concepts would still be much
  easier to implement than one that needs to map two quite different
  interfaces. [I'm really relying on the good judgement of people
  familiar with the interfaces here :)]
- To which extent does security need to be baked into the device
  specification? We should avoid footguns, and avoiding needless
  complication is also a good idea, but while every new functionality
  means more attack surface, it also enables more use cases. That
  tension is hard to resolve; how much of it can we alleviate by making
  things optional?

I hope I have not muddied the waters here, but I'd really like to see an
agreement on how to continue (with two different devices, if there is
really no other way.)


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-24  7:52                                       ` Alexander Gordeev
  2023-04-25 16:04                                         ` Cornelia Huck
@ 2023-04-26  5:52                                         ` Alexandre Courbot
  2023-04-27 14:20                                           ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-26  5:52 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Mon, Apr 24, 2023 at 4:52 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 21.04.23 18:01, Alexander Gordeev wrote:
>
> Hi Alexandre,
>
> On 21.04.23 06:02, Alexandre Courbot wrote:
>
> * I am still not convinced that V4L2 is lacking from a security
> perspective. It would take just one valid example to change my mind
> (and no, the way the queues are named is not valid). And btw, if it
> really introduces security issues, then this makes it invalid for
> inclusion in virtio entirely, not just OpSy's hypervisor.
>
>
> I'd like to start with this and then answer everything else later.
>
> Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with
> VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First,
> let's compare the word count to get a very rough estimate of complexity.
> I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the
> parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206
> words, they both use struct v4l2_buffer as a parameter. The struct takes
> 2716 words to be described. So the whole thing takes 3922 words. This is
> 6.7 times more than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the
> definitions of the structs, it is also very obvious that the V4L2 UAPI is
> almost an order of magnitude more complex.
>
>
> I think, it is best to add all the steps necessary to reproduce my calculations just in case.
>
> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE is doing essentially the same thing as VIDIOC_QBUF+VIDIOC_DQBUF, so we're comparing apples to apples (if we don't forget to compare their parameters too).
>
> To get the word count for the VIRTIO_VIDEO_CMD_RESOURCE_QUEUE I opened the rendered PDF of video section only from the first email in this thread. Here is the link: https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing . Then I scrolled to page 11 and copied everything related into a text file. This is around two pages in the PDF. Then I removed page numbers from the copied text and used 'wc -w' to count words.
>
> To get the word count for VIDIOC_QBUF+VIDIOC_DQBUF I opened this link: https://docs.kernel.org/userspace-api/media/v4l/vidioc-qbuf.html . Then I selected all the text except the table of contents and followed the same procedure.
>
> To get the word count for struct v4l2_buffer and other types, that are referenced from it, I opened this link: https://docs.kernel.org/userspace-api/media/v4l/buffer.html#struct-v4l2-buffer . Then I selected all the text except the table of contents and the text above struct v4l2_buffer definition. The rest is the same.
>
> Also it's quite obvious if you look at them how much bigger struct v4l2_buffer (including the referenced types) is compared to struct virtio_video_resource_queue.

You are comparing not the complexity of the structures but the
verbosity of their documentation, which is written in a different
style and format, and by different people. And the V4L2 page also
contains the description of memory types, which is part of another
section in the virtio-video spec. There is no way to draw a meaningful
conclusion from this.

If you want to compare, do it with how the structures are actually
used. Here is how you would queue an input buffer with virtio-video:

  struct virtio_video_resource_queue queue_buf = {
      .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
      .stream_id = 42,
      .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
      .resource_id = 1,
      .timestamp = 0x10,
      .data_sizes = {
        [0] = 0x1000,
      },
  };

Now the same with virtio-v4l2:

  struct virtio_v4l2_queue_buf queue_buf = {
      .cmd = VIRTIO_V4L2_CMD_IOCTL,
      .code = VIDIOC_QBUF,
      .session_id = 42,
      .buffer.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
      .buffer.index = 1,
      .buffer.timestamp.tv_usec = 0x10,
      .buffer.memory = V4L2_MEMORY_MMAP,
      .planes = {
        [0] = { .bytesused = 0x1000 },
      }
  };

In both cases, you pass a structure with some members set, and the
rest to 0. The host receives basically the same thing - it's the same
data! The only difference is how it is laid out.

Also as mentioned by Bart, the apparent simplicity of
VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, which does not require a dequeue
operation as the dequeue is sent with its response, is actually a
fallacy: that design choice makes the specification simpler, but at
the cost of more complexity on the device side and the potential to
starve on command descriptors. By contrast, adopting the V4L2 model
resulted in simpler code on both sides and no possibility to deadlock.
That point could be addressed by revising the virtio-video spec, but
then you get even closer to V4L2.
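
For illustration, the matching dequeue in the prototype layout used above
would be its own command. This is a sketch only, reusing the same
hypothetical virtio_v4l2_queue_buf structure; how the response is filled
in is an assumption on my side:

  struct virtio_v4l2_queue_buf dequeue_buf = {
      .cmd = VIRTIO_V4L2_CMD_IOCTL,
      .code = VIDIOC_DQBUF,
      .session_id = 42,
      .buffer.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
      .buffer.memory = V4L2_MEMORY_MMAP,
  };
  /* The device would set .buffer.index, .buffer.timestamp and the plane
   * sizes in its response once a buffer can be returned on that queue,
   * so no queue descriptor has to stay in flight in the meantime. */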

>
> Do we agree now, that V4L2 UAPI is not only marginally more complex?

No, and I rest my case on the code samples above.

>
>
> Also please read:
>
> https://medium.com/starting-up-security/evidence-of-absence-8148958da092
>
>
> This reference probably needs a clarification. You argued, that V4L2 has a good track record so far. Here is the quote:
>
> FWIW V4L2 has been in use for a looong time (including in Chromebooks
> that do secure playback) and I am not aware of fundamental security
> issues with it.
>
> But absence of found major security issues doesn't tell us much about the number of not found ones. Absence of evidence is not evidence of absence. At the link above a former Director of Security at Facebook shares his thoughts about what could be good evidence of absence of major security problems.

You are just FUDing now. The exact same argument could be made about
virtio-video. If you want to borrow from the security world, I can ask
you how going with virtio-video when V4L2 exists is different from
rolling your own crypto.

>
> So a bug bounty program with high premiums that covers V4L2 would be a better argument in favor of *already written code* in my opinion. Not for new code. Also probably it is also an argument in favor of the spec, that is the V4L2 UAPI. Like that it is polished enough. Not so sure about that though.
>
> There actually are several bug bounty programs, that cover the kernel. These are Google's kctf, ZDI's pwn2own, and zerodium AFAIK. However the premiums are not even close to the ones mentioned in my reference. Anyway this means, that using *the existing V4L2 code in the kernel* is probably OK. But this creates some limitations if we want the actual code to still be covered with these bug bounties, right? This means, that the host OS has to be Linux and the actual hardware has to be exposed through a stable V4L2 driver, that is in mainline for some time, and there has to be no or little processing on top. For us this is not possible unfortunately. In the end both things could be secure:
>
> 1. V4L2 pass through can be secure because of the bug bounty programs and a lot of attention to the kernel in general.
> 2. For the new code this doesn't work, so the spec should be as simple and device-centric as possible. Because, all other things being equal, there are fewer errors in simpler programs. So defining a subset of V4L2 UAPI including the data types looks like a good idea to me. The stateful decoder interface, that you point to, does not define a subset in the data types.

Wouldn't you have exactly the same problem by using a new guest-host
protocol? Are you going to start a bounty program for virtio-video to
get an assurance that it is secure?

>
> This is basically my reasoning.
>
> Also these two specs don't need to compete with each other. They have different limitations and they are for different audiences. If you check the XKCD comic, it is about competing standards.

They allow exactly the same thing (virtualization of video
decoding/encoding) and there isn't any use-case of virtio-video that
could not be covered equally well by V4L2. Reinventing a new video
specification is pointless and will lead to unneeded fragmentation.

We should also expect reticence from the Linux community to upstream
two virtual video drivers that basically do the same thing.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-25 16:04                                         ` Cornelia Huck
@ 2023-04-26  6:29                                           ` Alexandre Courbot
  2023-04-27 14:10                                           ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-26  6:29 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Gordeev, virtio-dev, Keiichi Watanabe,
	Alex Bennée, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve

On Wed, Apr 26, 2023 at 1:04 AM Cornelia Huck <cohuck@redhat.com> wrote:
>
>
> [I'm replying here, as that seems to be the last message in the thread,
> and my reply hopefully catches everyone interested here.]
>
> To do a very high level summary, we have (at least) two use cases for
> virtio-video, that unfortunately have quite different requirements. Both
> want to encode/decode video, but in different environments.
>
> - The "restricted" case: Priority is on security, and the attack surface
>   should be kept as small as possible, for example, by avoiding unneeded
>   complexity in the interface. Fancy allocations and management should
>   be avoided. The required functionality is also quite clearly defined.
> - The "feature-rich" case: Priority is on enabling features, and being
>   able to re-use existing V4L2 support is considered a big plus. Both
>   device and driver implementations will be implemented in a full OS
>   environment, so all kind of helpers are already available.

I should highlight that virtio-v4l2 does not require any more allocations or
management than virtio-video does. It's just a different way to pass the same
messages between guest and host.

>
> (This is not to say that one case does not care about functionality or
> security; it's mostly a case of different priorities and environments.)
>
> I had been hoping that it would be possible to find kind of a common
> ground between the two cases, but reading the thread, I'm not quite as
> hopeful anymore... if we really don't manage to find an approach to make
> the different requirements co-exist, a separate virtio-v4l2 device might
> be the way to go -- but I've not totally given up hope yet.
>
> Some remarks from my side:
>
> - I'm not totally convinced that counting words is always a good proxy
>   for complexity -- an interface might be simple on paper, but if the
>   actual implementation would need to be quite involved to get it right,
>   we'd again have a lot of opportunity for mistakes.
> - How much of v4l2 does actually need to be in the device specification
>   for a driver to make potentially good use of it? Sure, being able to
>   directly map to v4l2 really gives a huge benefit, but is there a way
>   to extract a subset that's not too complex, but can be easily wrapped
>   for interfacing with v4l2? (Both interface and functionality wise.)
>   Even if that means that a driver would need to implement some kind of
>   shim, a layer that easily maps to v4l2 concepts would still be much
>   easier to implement than one that needs to map two quite different
>   interfaces. [I'm really relying on the good judgement of people
>   familiar with the interfaces here :)]

The current prototype is eschewing some legacy V4L2 features, so we are already
in that spirit. Even with a V4L2 guest there will be some light shimming.

One extra step we could maybe take, if we are worried about exposing the whole
V4L2 API, is to assign a virtio device ID to each kind of device (decoder,
encoder, camera, ...) and explicitly list which calls and features from V4L2
are allowed for that kind of device - anything else is invalid and must be
rejected by the host.

This would result in more spec and less flexibility as nobody could e.g. write
a camera device before we specify what a camera device can do.

But it would also unambiguously define the allowed functionality and set hard
boundaries as to what can be done, which may address one of Alexander's
concerns. As a bonus, it would also spare the guest from having to detect what
kind of video device it is dealing with, as the device ID carries that
information.

I don't think it is absolutely necessary, as this is already defined in the
V4L2 docs, and the host is under no obligation to support every V4L2 syscall -
it can just reject those it doesn't like, as many drivers do. But it could be a
way to mitigate concerns about the attack surface.
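
To make that mitigation a bit more tangible, a host device implementation
could keep a per-device-class allowlist and reject everything else. This is
only a sketch under my own assumptions - the device classes and the table
contents below are illustrative, not taken from any spec:

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/videodev2.h>

enum vdev_class { VDEV_DECODER, VDEV_ENCODER, VDEV_CAMERA };

/* Illustrative subset for a decoder; a real list would follow the V4L2
 * stateful decoder documentation. */
static const uint32_t decoder_allowed[] = {
        VIDIOC_QUERYCAP, VIDIOC_ENUM_FMT, VIDIOC_G_FMT, VIDIOC_S_FMT,
        VIDIOC_REQBUFS, VIDIOC_QBUF, VIDIOC_DQBUF,
        VIDIOC_STREAMON, VIDIOC_STREAMOFF,
        VIDIOC_DECODER_CMD, VIDIOC_SUBSCRIBE_EVENT, VIDIOC_DQEVENT,
};

static bool ioctl_allowed(enum vdev_class cls, uint32_t code)
{
        if (cls != VDEV_DECODER)
                return false;   /* other classes omitted from this sketch */
        for (size_t i = 0; i < sizeof(decoder_allowed) / sizeof(decoder_allowed[0]); i++)
                if (decoder_allowed[i] == code)
                        return true;
        return false;
}

/* The device's request handler simply rejects anything outside the subset. */
static int handle_request(enum vdev_class cls, uint32_t code, void *payload)
{
        if (!ioctl_allowed(cls, code))
                return -ENOTTY;
        /* ... otherwise dispatch 'payload' to the actual backend ... */
        return 0;
}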

> - To which extent does security need to be baked into the device
>   specification? We should avoid footguns, and avoiding needless
>   complication is also a good idea, but while every new functionality
>   means more attack surface, it also enables more use cases. That
>   tension is hard to resolve; how much of it can we alleviate by making
>   things optional?
>
> I hope I have not muddied the waters here, but I'd really like to see an
> agreement on how to continue (with two different devices, if there is
> really no other way.)

That would be an ironic way to end up with more work when we expected to have
less, but ultimately it's your call. :)

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-21  4:02                               ` Alexandre Courbot
@ 2023-04-26 15:11                                 ` Alexander Gordeev
  2023-04-27 13:16                                   ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-26 15:11 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 21.04.23 06:02, Alexandre Courbot wrote:
> Hi Alexander,
>
> On Mon, Apr 17, 2023 at 9:52 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> Hi Alexandre,
>>
>> Thanks for you letter! Sorry, it took me some time to write an answer.
>>
>> First of all I'd like to describe my perspective a little bit because it
>> seems, that in many cases we (and other people writing their feedbacks)
>> simply have very different priorities and background.
>>
>> OpenSynergy, the company that I work for, develops a proprietary
>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>> have our proprietary device implementations, but overall our goal is to
>> bring open standards into these quite closed domains and we're betting
>> big on virtio. The idea is to run safety-critical functions like cockpit
>> controller alongside with multimedia stuff in different VMs on the same
>> physical board. Right now they have it on separate physical devices. So
>> they already have maximum isolation. And we're trying to make this
>> equally safe on a single board. The benefit is the reduced costs and
>> some additional features. Of course, we also need features here, but at
>> the same time security and ease of certification are among the top of
>> our priorities. Nobody wants cars or planes to have security problems,
>> right? Also nobody really needs DVB and even more exotic devices in cars
>> and planes AFAIK.
>>
>> For the above mentioned reasons our COQOS hypervisor is running on bare
>> metal. Also memory management for the guests is mostly static. It is
>> possible to make a shared memory region between a device and a driver
>> managed by device in advance. But definitely no mapping of random host
>> pages on the fly is supported.
>>
>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>> in its own virtualized environment, right?
>
> Not really, but for the discussion here you can assume that it is a
> VMM similar to QEmu with KVM enabled.

Thanks for the clarification. If my idea about your use-case is not
totally correct, then it would be very helpful if you can provide more
details about it.

>> Both the host and guest are
>> linux. In this case I totally understand why V4L2 UAPI pass-through
>> feels like a right move. I guess, you'd like to make the switch to
>> virtualized apps as seemless as possible for your users. If they can't
>> use their DVBs anymore, they complain. And adding the virtualization
>> makes the whole thing more secure anyway. So I understand the desire to
>> have the range of supported devices as broad as possible. It is also
>> understandable that priorities are different with desktop
>> virtualization. Also I'm not trying to diminish the great work, that you
>> have done. It is just that from my perspective this looks like a step in
>> the wrong direction because of the mentioned concerns. So I'm going to
>> continue being a skeptic here, sorry.
>>
>> Of course, I don't expect that you continue working on the old approach
>> now as you have put that many efforts into the V4L2 UAPI pass-through.
>> So I think it is best to do the evolutionary changes in scope of virtio
>> video device specification, and create a new device specification
>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
>> continue the virtio-video development. In fact I already started making
>> draft v7 of the spec according to the comments. I hope it will be ready
>> for review soon.
>>
>> I hope this approach will also help fix issues with virtio-video spec
>> and driver development misalignment as well as V4L2 compliance issues
>> with the driver. I believe the problems were caused partly by poor
>> communication between us and by misalignment of our development cycles,
>> not by the driver complexity.
>>
>> So in my opinion it is OK to have different specs with overlapping
>> functionality for some time. My only concern is if this would be
>> accepted by the community and the committee. How the things usually go
>> here: preferring features and tolerating possible security issues or the
>> other way around? Also how acceptable is having linux-specific protocols
>> at all?
>>
>> Also I still have concerns about memory management with V4L2 UAPI
>> pass-through. Please see below.
>>
>> On 17.03.23 08:24, Alexandre Courbot wrote:
>>> Hi Alexander,
>>>
>>> On Thu, Mar 16, 2023 at 7:13 PM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>> Hi Alexandre,
>>>>
>>>> On 14.03.23 06:06, Alexandre Courbot wrote:
>>>>> If we find out that there is a benefit in going through the V4L2
>>>>> subsystem (which I cannot see for now), rebuilding the UAPI structures
>>>>> to communicate with the device is not different from building
>>>>> virtio-video specific structures like what we are currently doing.
>>>> Well, the V4L2 subsystem is there for a reason, right? It does some
>>>> important things too. I'm going to check all the v4l2_ioctl_ops
>>>> callbacks in the current virtio-video driver to make the list. Also if
>>>> you have some PoC spec/implementations, that would be nice to review. It
>>>> is always better to see the actual implementation, of course.
>>>>
>>>> I have these points so far:
>>>>
>>>> 1. Overall the V4L2 stateful decoder API looks significantly more
>>>> complex to me. Looks like you're a V4L2 expert, so this might not be
>>>> visible to you that much.
>>> V4L2 is more generic than virtio-video, so as a result specific uses
>>> tend to require a bit more operations. I would argue the mental
>>> overhead of working with it is less than significant, and most of it
>>> consists in not forgetting to call STREAMON on a queue after some
>>> operations. Things like format, resolution and buffer management do
>>> not get more complex (and V4L2 is actually more complete than our
>>> previous proposal on these).
>>>
>>> The counterpart of this marginal extra complexity is that you can
>>> virtualize more kinds of devices, and even within virtio-video support
>>> more formats than what has been specified so far. If your guest is
>>> Linux, the same kernel driver can be used to expose any kind of device
>>> supported by V4L2, and the driver is also much simpler than
>>> virtio-video, so you are actually reducing complexity significantly
>>> here. Even if you are not Linux, you can share the V4L2 structures
>>> definitions and low-layer code that sends V4L2 commands to the host
>>> between drivers. So while it is true that some specifics become
>>> slightly more complex, there is a lot of potential simplification when
>>> you look at the whole picture.
>>>
>>> It's an opinionated proposal, and it comes with a few compromises if
>>> you are mostly interested in codecs alone. But looking at the guest
>>> driver convinces me that this is the better approach when you look at
>>> the whole picture.
>>
>> Sorry, I just see it differently as I tried to describe above. The
>> problem is that we don't yet see the whole picture with the V4L2 UAPI
>> pass-through. I reviewed the code briefly. It is great, that you already
>> implemented the MMAP mode and host allocations already. But I would
>> argue, that this is the simplest case. Do you agree?
>
> I was trying to do a proof-of-concept here, of course it is not
> feature-complete and of course I started with the simplest case. I
> don't see your point here.

I understand that. The point is that the real complexity is yet to come.
Please see below. I think this is logical: if you have only implemented
the simplest case, then implementing more complex cases requires making
the implementation more complex.

>> Also this mode of
>> operation is not supported in our hypervisor for reasons mentioned
>> above. So in our case this PoC doesn't yet prove anything unfortunately.
>
> I did not have your use-case in mind while writing the PoC, its
> purpose was to demonstrate the suitability of V4L2 as a protocol for
> virtualizing video.
>
> Now if your hypervisor does static memory management and pre-allocates
> memory for guest buffers, then the V4L2 MMAP memory type actually
> looks like the best fit for the job. There are no tokens like virtio
> objects UUID to manage, and the MMAP request can be as simple as
> returning the pre-mapped address of the buffer in the guest PAS.
>
> If instead it carves some predefined amount of memory out for the
> whole guest and expects it to allocate buffer memory from there, then
> the USERPTR memory type (which works like the guest pages of
> virtio-video) is what you want to use.

It doesn't look like a good idea to us. This means preconfiguring memory
regions in the hypervisor config. It is hard to predict the amount of
memory that is necessary. If we allocate too much, this is a waste of
memory. If we allocate too little, it won't be enough. Then we don't
know yet how to make V4L2 allocate from that memory. Then this memory
has to be managed on the host side. And memory management is exactly the
thing that causes most security issues, right? So overall this is very
tedious, potentially wasteful and not flexible.


>> I think the real complexity is yet to come.
>
> Evidence would be appreciated.

Please check my comment above.

>>>>      a. So V4L2 subsystem and the current virtio-video driver are already
>>>> reducing the complexity. And this seems as the right place to do this,
>>>> because the complexity is caused by the amount of V4L2 use cases and its
>>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
>>>> would prefer a simpler API, right? I think this use-case is not purely
>>>> abstract at all.
>>> The V4L2 subsystem is there to factorize code that can be shared
>>> between drivers and manage their internal state. Our target is the
>>> V4L2 UAPI, so a Windows driver needs not be concerned about these
>>> details - it does what it would have done with virtio-video, and just
>>> uses the V4L2 structures to communicate with the host instead of the
>>> virtio-video ones.
>>
>> It can also reuse the virtio-video structures. So I think despite the
>> ability to reuse V4L2 structures, having to implement a linux-specific
>> interface would still be a bigger pain.
>
> The only Linux-specific thing in this interface is that it
> misleadingly has "Linux" in its name. Otherwise it's really similar to
> what we previously had.
>
>>
>>
>>>>      b. Less complex API is better from a security point of view too. When
>>>> V4L2 was developed, not many people were concerned with malicious USB
>>>> devices probably. At least exploiting a malicious USB device usually
>>>> requires physical access. With virtual devices and multiple VMs the
>>>> stakes are higher, I believe.
>>> That's probably true, but I fail to see how the fact we are using
>>> struct v4l2_buffer instead of struct virtio_video_buffer can have an
>>> impact on that?
>>>
>>> V4L2 has a larger UAPI surface because it manages more kinds of
>>> devices, but drivers only need to implement the ioctls they need. For
>>> the rest, they just return -ENOTTY, and evil actors are hopefully kept
>>> at bay.
>>
>> Still there are definitely more ways to do things wrong. It would be
>> harder to audit a larger API surface.
>
> If you write a video device you don't need to support more API than
> that requested for your device. All unsupported interfaces can simply
> return ENOTTY.



>>>> 2. We have a working virtio-video driver. So we need very good reasons
>>>> to start from scratch. You name two reasons AFAIR: simplicity and
>>>> possible use of cameras. Did I miss something else?
>>>>
>>>>      a. The simplicity is there only in case all the interfaces are V4L2,
>>>> both in the backend and in the guest. Otherwise the complexity is just
>>>> moved to backends. I haven't seen V4L2 in our setups so far, only some
>>>> proprietary OMX libraries. So from my point of view, this is not
>>>> simplicity in general, but an optimization for a specific narrow use case.
>>> V4L2 is not a narrow use-case when it comes to video devices on Linux
>>> - basically every user space application involving cameras or codecs
>>> can use it. Even the virtio-video driver exposes a V4L2 device, so
>>> unless you are using a different driver and proprietary userspace apps
>>> specifically written to interact with that driver, V4L2 is involved in
>>> your setup at some point.
>>
>> Sorry, I mean narrow use-case if we look into other possibilities:
>>
>> 1. Stateless V4L2 on the host.
>> 2. Any other interface on the host.
>> 3. Any other guest except Linux.
>>
>> Our targets are several popular embedded SoCs. Unfortunately we don't
>> have the luxury of simply having normal V4L2 devices there. And it
>> doesn't look like this is going to change.
>>
>>
>>> The guest driver that I wrote is, I think, a good example of the
>>> complexity you can expect in terms of guest driver size (as it is
>>> pretty functional already with its 1000 and some LoCs). For the UAPI
>>> complexity, the host device basically unpacks the information it needs
>>> and rebuilds the V4L2 structures before calling into the host device,
>>> and I don't see this process as more complex that the unpacking of
>>> virtio-video structs which we also did in crosvm.
>>
>> Unfortunately our hypervisor doesn't support mapping random host pages
>> in the guest.
>
> The ability to map random host pages to the guest is *not* a
> requirement of virtio-v4l2.
>
>> Static allocations of shared memory regions are possible.
>> But then we have to tell V4L2 to allocate buffers there. Then we'll need
>> a region per virtual device. This is just very tedious and inflexible.
>> That's why we're mainly interested in having the guest pages sharing in
>> the virtio video spec.
>
> I'll be happy to update the PoC and make it able to use guest pages as
> buffer backing memory. It just wasn't the priority to demonstrate the
> global approach.

Great, thank you. If you have a concrete plan already, I think it could
be beneficial to discuss it now. Otherwise I'd prefer to keep working on
the current approach until I see something concrete.

>>>>      b. For modern cameras the V4L2 interface is not enough anyway. This
>>>> was already discussed AFAIR. There is a separate virtio-camera
>>>> specification, that indeed is based on V4L2 UAPI as you said. But
>>>> combining these two specs is certainly not future proof, right? So I
>>>> think it is best to let the virtio-camera spec to be developed
>>>> independently.
>>> I don't know if virtio-camera has made progress that they have not
>>> published yet, but from what I have seen virtio-v4l2 can cover
>>> everything that the currently published driver does (I could not find
>>> a specification, but please point me to it if it exists), so there
>>> would be no conflict to resolve.
>>>
>>> V4L2 with requests support should be capable of handling complex
>>> camera configurations, but the effort indeed seems to have switched to
>>> KCAM when it comes to supporting complex native cameras natively. That
>>> being said:
>>>
>>> * KCAM is not merged yet, is probably not going to be for some time
>>> (https://lwn.net/Articles/904776/), and we don't know how we can
>>> handle virtualization with it,
>>> * The fact that the camera is complex on the host does not mean that
>>> all that complexity needs to be exposed to the guest. I don't know how
>>> the camera folks want to manage this, but one can imagine that the
>>> host could expose a simpler model for the virtual camera, with only
>>> the required knobs, while the host takes care of doing all the complex
>>> configuration.
>>> * The counter argument can be made that simple camera devices do not
>>> need a complex virtualization solution, so one can also invoke
>>> simplicity here to advocate for virtio-v4l2.
>>>
>>> My point is not to say that all other camera virtualization efforts
>>> should be abandoned - if indeed there is a need for something more
>>> specific, then nothing prevents us from having a virtio-camera
>>> specification added. However, we are nowhere close to this at the
>>> moment, and right now there is no official solution for camera
>>> virtualization, so I see no reason to deny the opportunity to support
>>> simple camera devices since its cost would just be to add "and cameras
>>> device" in the paragraph of the spec that explains what devices are
>>> supported.
>>
>> Well, for reasons described above it still seems perfectly fine to me to
>> have separate devices. Ok, the argument, that this approach also seems
>> more future-proof, is not a strong one.
>
> Please elaborate on its weaknesses then.

Well, basically as you said: the weakness of the argument is that
virtio-camera is not yet published and KCAM is not merged yet, so the
future is not clear actually.

BTW I just thought about one more case that is already real: sharing
camera streams with PipeWire. I think PipeWire doesn't provide a V4L2
UAPI interface, right?

>>>> 3. More specifically I can see, that around 95% V4L2 drivers use
>>>> videobuf2. This includes the current virtio-video driver. Bypassing the
>>>> V4L2 subsystem means that vb2 can't be used, right? In various
>>>> discussions vb2 popped up as a thing, that would be hard to avoid. What
>>>> do you think about this? How are you going to deal with various V4L2
>>>> memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc), for example?
>>>> I'll try to dive deeper myself too...
>>> VB2 is entirely avoided in the current driver, but my understanding is
>>> that its helpers could be used if needed.
>>>
>>> In virtio-v4l2, MMAP means that the host is responsible for managing
>>> the buffers, so vb2 is entirely avoided. USERPTR means the guest
>>> passes a SG list of guest physical addresses as mapping memory. VB2
>>> may or may not be involved in managing this memory, but most likely
>>> not if that memory comes from the guest userspace. DMABUF means the
>>> guest passes a virtio object as the backing memory of the buffer.
>>> There again there is no particular management to be done on the guest
>>> side.
>>>
>>> I bypassed VB2 for the current driver, and the cost of doing this is
>>> that I had to write my own mmap() function.
>>
>> The cost of it as of now is also that:
>>
>> 1. Only guest user-space applications, that use V4L2_MEMORY_MMAP, are
>> supported AFAIU.
>
> This has nothing to do with VB2. I wanted to demonstrate that V4L2
> could be used as a host-guest protocol and did it on a single memory
> type to release something quickly. Please stop strawmanning the design
> because the PoC is still incomplete.

Please stop putting labels like this on my arguments. This is not
helpful at all.

>> 2. There is no flexibility to choose whatever way of memory management
>> host and guest would like to use. Now the guest user-space application
>> selects this.
>
> Errr no. The guest user-space chooses a type of memory from what the
> guest kernel exposes, which depends on what the host itself decides to
> expose.

I don't agree. If an already written user-space app supports only MMAP,
then there is no way to force it to use USERPTR, right? Please correct me
if I'm wrong.
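
For reference, the difference is already visible in plain V4L2 UAPI usage
(standard calls only, error handling omitted): an MMAP application maps
driver-owned memory, while a USERPTR application allocates the memory itself,
so switching an existing app from one to the other indeed requires code
changes:

#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

/* V4L2_MEMORY_MMAP: the driver/device owns the memory, the app just maps it. */
static void *setup_mmap_buffer(int fd)
{
        struct v4l2_requestbuffers req = {0};
        struct v4l2_buffer buf = {0};

        req.count = 1;
        req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        req.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_REQBUFS, &req);

        buf.index = 0;
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        return mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED,
                    fd, buf.m.offset);
}

/* V4L2_MEMORY_USERPTR: the app allocates the memory and hands it over. */
static void setup_userptr_buffer(int fd, void *mem, size_t len)
{
        struct v4l2_requestbuffers req = {0};
        struct v4l2_buffer buf = {0};

        req.count = 1;
        req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        req.memory = V4L2_MEMORY_USERPTR;
        ioctl(fd, VIDIOC_REQBUFS, &req);

        buf.index = 0;
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_USERPTR;
        buf.m.userptr = (unsigned long)mem;
        buf.length = len;
        ioctl(fd, VIDIOC_QBUF, &buf);
}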

>> The latter makes the solution much less flexible IMO. For example, this
>> won't work well with our hypervisor. There might other special needs in
>> other use-cases. Like sharing these object UUIDs. Probably this can
>> handled by mapping, for example, V4L2_MEMORY_USERPTR to guest-pages
>> sharing, V4L2_MEMORY_DMABUF to the UUIDs (which is not quite correct
>> IMHO).
>
> Please elaborate on why this is not correct.

Because IMHO UUIDs pointing to memory allocated by virtio-gpu are quite
different from dmabufs created in the guest with udmabuf, for example. This
can be confusing.

>> So this already means querying the device for supported sharing
>> methods, rewriting the flow of V4L2 UAPI calls on the fly, ensuring
>> consistency, etc. This already looks hackish to me. Do you have a better
>> plan?
>
> How do you support different kinds of memory without querying? Or do
> you suggest we stick to a single one?
>
> I am also not quite sure what you mean by "rewriting the flow of V4L2
> UAPI calls on the fly". There is no "rewriting" - V4L2 structures are
> just used to communicate with the host instead of virtio-video
> structures.

I'd like to know your ideas, or better a concrete plan, for enabling
user-space apps that only support MMAP to work on top of a device that
supports only guest page sharing.
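
To keep the discussion concrete, one conceivable direction - purely a sketch
under my own assumptions, not a plan stated anywhere in this thread - would be
for the guest driver to allocate the backing pages itself, expose them to
MMAP-only applications through its own mmap() handler, and describe the very
same pages to the device as a guest-physical scatter list:

#include <linux/mm.h>

/* Sketch only: a guest-side buffer backed by driver-allocated guest pages. */
struct vbuf {
        struct page **pages;
        unsigned long num_pages;
};

/* mmap handler: map the driver-allocated pages into the application's VMA,
 * so user space sees an ordinary V4L2_MEMORY_MMAP buffer. */
static int vbuf_mmap(struct vbuf *b, struct vm_area_struct *vma)
{
        return vm_map_pages(vma, b->pages, b->num_pages);
}

/* At buffer creation, the same pages are described to the device as a
 * guest-physical scatter list; the helper name below is hypothetical. */
static void vbuf_share_with_device(struct vbuf *b)
{
        unsigned long i;

        for (i = 0; i < b->num_pages; i++) {
                u64 gpa = (u64)page_to_pfn(b->pages[i]) << PAGE_SHIFT;
                /* virtio_v4l2_add_sg_entry(gpa, PAGE_SIZE); -- hypothetical */
                (void)gpa;
        }
}

That would keep the MMAP contract towards the application while still passing
guest pages on the wire; whether it fits a statically managed setup is of
course part of the question being asked here.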

>> Also this limits us to only 3 methods, right? And what if there
>> are more than 3 methods in the future?
>
> Nothing prevents us from adding new virtio-specific memory types if
> needed. But what other methods did you have in mind?

You mean we can easily extend the V4L2 UAPI with our own memory types that
are not used in usual V4L2 drivers? Please provide some evidence.

>> I think this inflexibility is a major problem with this approach.
>>
>>
>>>>> Actually I don't think this is even something we need to think about -
>>>>> in its simplest form the V4L2 guest driver just needs to act as a
>>>>> proxy for the device. So which decoder API is used by the host is
>>>>> completely irrelevant to the guest driver - it can support a decoder,
>>>>> an encoder, or a camera - it doesn't even need to be aware of what
>>>>> kind of device it is exposing and that simplicity is another thing
>>>>> that I like with this design.
>>>> As I wrote above the design would be indeed simple only in case the
>>>> actual hardware is exposed to a backend through V4L2 too. Otherwise the
>>>> complexity is just moved to backends.
>>> Yes, and while I acknowledge that, this is not really more complex
>>> that what you would have to do with a virtio-video device which also
>>> needs to manage its own state and drive the hardware through backends.
>>> I say that based on the experience working on the virtio-video device
>>> in crosvm which follows that design too.
>>
>> As I wrote above we have a different use-case. And I see the current
>> state of virtio video as a good common ground for different parties and
>> use-cases. Unfortunately I don't see any upsides for our use-cases from
>> the V4L2 UAPI proposal, only downsides.
>
> Well AFAICT V4L2 provides the exact same set of capabilities as
> virtio-video, with only minor differences. If virtio-video was
> suitable for your use-case, V4L2 should be as well.
>
> Maybe it makes things marginally more complex for your particular
> proprietary bare-metal hypervisor. But it also makes things
> dramatically easier and provides much more features for the vast
> majority of the virtio audience who run Linux guests and can now use a
> much simpler driver. Which one do we want to prioritize?

This sounds like neglect of our use case. It is not helpful if we're going
to continue working on the same device, because it calls our ability to
cooperate into question. That's fine as long as we can continue developing
the current version separately.

> I'm sorry but your answer is full of vague assertions about supposed
> shortcomings of the approach without any concrete evidence of its
> unsuitability. Please show us why this wouldn't work for you.

I asked you what your plan is for guest page sharing. You probably didn't
see the question, because I don't see an answer in your email, so I'm
reiterating it here: what is your plan? Without that I can only share my own
ideas, and indeed the whole conversation can seem vague and hypothetical.

I also provided a benchmark in the other email. We haven't agreed on
that yet, but I hope it can help make things clear.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-21  4:02                                   ` Alexandre Courbot
  2023-04-21 16:01                                     ` Alexander Gordeev
@ 2023-04-26 15:52                                     ` Alexander Gordeev
  2023-04-27 13:23                                       ` Alexandre Courbot
       [not found]                                     ` <CALgKJBqKWng508cB_F_uD2fy9EAvQ36rYR3fRb57sFd3ihpUFw@mail.gmail.com>
  2 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-26 15:52 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 21.04.23 06:02, Alexandre Courbot wrote:
> On Wed, Apr 19, 2023 at 4:39 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 17.04.23 16:43, Cornelia Huck wrote:
>>> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>
>>>> OpenSynergy, the company that I work for, develops a proprietary
>>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>>>> have our proprietary device implementations, but overall our goal is to
>>>> bring open standards into these quite closed domains and we're betting
>>>> big on virtio. The idea is to run safety-critical functions like cockpit
>>>> controller alongside with multimedia stuff in different VMs on the same
>>>> physical board. Right now they have it on separate physical devices. So
>>>> they already have maximum isolation. And we're trying to make this
>>>> equally safe on a single board. The benefit is the reduced costs and
>>>> some additional features. Of course, we also need features here, but at
>>>> the same time security and ease of certification are among the top of
>>>> our priorities. Nobody wants cars or planes to have security problems,
>>>> right? Also nobody really needs DVB and even more exotic devices in cars
>>>> and planes AFAIK.
>>>>
>>>> For the above mentioned reasons our COQOS hypervisor is running on bare
>>>> metal. Also memory management for the guests is mostly static. It is
>>>> possible to make a shared memory region between a device and a driver
>>>> managed by device in advance. But definitely no mapping of random host
>>>> pages on the fly is supported.
>>>>
>>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>>>> in its own virtualized environment, right? Both the host and guest are
>>>> linux. In this case I totally understand why V4L2 UAPI pass-through
>>>> feels like a right move. I guess, you'd like to make the switch to
>>>> virtualized apps as seemless as possible for your users. If they can't
>>>> use their DVBs anymore, they complain. And adding the virtualization
>>>> makes the whole thing more secure anyway. So I understand the desire to
>>>> have the range of supported devices as broad as possible. It is also
>>>> understandable that priorities are different with desktop
>>>> virtualization. Also I'm not trying to diminish the great work, that you
>>>> have done. It is just that from my perspective this looks like a step in
>>>> the wrong direction because of the mentioned concerns. So I'm going to
>>>> continue being a skeptic here, sorry.
>>>>
>>>> Of course, I don't expect that you continue working on the old approach
>>>> now as you have put that many efforts into the V4L2 UAPI pass-through.
>>>> So I think it is best to do the evolutionary changes in scope of virtio
>>>> video device specification, and create a new device specification
>>>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
>>>> continue the virtio-video development. In fact I already started making
>>>> draft v7 of the spec according to the comments. I hope it will be ready
>>>> for review soon.
>>>>
>>>> I hope this approach will also help fix issues with virtio-video spec
>>>> and driver development misalignment as well as V4L2 compliance issues
>>>> with the driver. I believe the problems were caused partly by poor
>>>> communication between us and by misalignment of our development cycles,
>>>> not by the driver complexity.
>>>>
>>>> So in my opinion it is OK to have different specs with overlapping
>>>> functionality for some time. My only concern is if this would be
>>>> accepted by the community and the committee. How the things usually go
>>>> here: preferring features and tolerating possible security issues or the
>>>> other way around? Also how acceptable is having linux-specific protocols
>>>> at all?
>>> My main question is: What would be something that we can merge as a
>>> spec, that would either cover the different use cases already, or that
>>> could be easily extended to cover the use cases it does not handle
>>> initially?
>>>
>>> For example, can some of the features that would be useful in crosvm be
>>> tucked behind some feature bit(s), so that the more restricted COQOS
>>> hypervisor would simply not offer them? (Two feature bits covering two
>>> different mechanisms, like the current approach and the v4l2 approach,
>>> would also be good, as long as there's enough common ground between the
>>> two.)
>>>
>>> If a staged approach (adding features controled by feature bits) would
>>> be possible, that would be my preferred way to do it.
>>
>> Hmm, I see several ways how we can use the feature flags:
>> 1. Basically making two feature flags: one for the current video spec
>> and one for the V4L2 UAPI pass through. Kind of the same as having two
>> different specs, but within one device. Not sure which way is better.
>> Probably having two separate devices would be easier to review and merge.
>
> Having two different devices with their own IDs would indeed be less
> confusing than using feature bits.
>
> That being said, the whole point of proposing virtio-v4l2 is to end up
> with *less* specification, not more. Having two concurrent and largely
> overlapping approaches will result in fragmentation and duplicated
> work, so my suggestion would be to decide on one or the other and
> stick to it.

Hmm, Enrico pointed out that having virtio-v4l2 would also be good
because of much better compatibility with Android right now. I don't
think the specification length should be our ultimate goal. Cornelia
said that her ultimate goal is to have a spec everyone is happy with,
regardless of how we arrive there. Well, I can only say that I also
think this should be our goal.

>> 2. Finding a subset of V4L2, that closely matches the current draft, and
>> restrict everything else. A perfectly reasoned answer for this case will
>> require a lot of work going through all the V4L2 structures I think. And
>
> It's pretty trivial, on the contrary. For a decoder device one would
> just need to restrict to the structs and operations mentioned here:
> https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html

I don't agree, but we are already discussing this in other threads.

>> even if we have a concrete plan, it has to be implemented first. I doubt
>> it is possible. Based on the things, that I already know, this is going
>> to be a compromise on security anyway, so we're not happy about that.
>> More on that below.
>> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
>> the virtio video spec as close to the V4L2 UAPI as possible, but with
>> the appropriate security model. So that the video device can be extended
>> with a feature flag to something very close to full V4L2 UAPI. A lot of
>> work as well, I think. And this won't allow us to simply link the V4L2
>> UAPI in the spec and therefore reduce its size, which is Alexandre's
>> current goal. So Alexandre and his team are not happy this way probably.
>
> That would indeed be reinventing the wheel and a completely pointless
> exercise IMHO.

I agree.

>>   From the security point of view these are our goals from most to less
>> important AFAIU:
>> 1. Make the device secure. If a device is compromised, the whole
>> physical machine is at risk. Complexity is the enemy here. It helps a
>> lot to make the device as straightforward and easy to implement as
>> possible. Therefore it is good to make the spec device/function-centric
>> from this PoV.
>
> This seems to be the main point of contention we have right here. I do
> not believe that V4L2 introduces significant complexity to the video
> use-case, and certainly not to a degree that would justify writing a
> new spec just for that.

Ok, then we should try to agree on a benchmark, I think.

>> 2. Ensure, that drivers are also secure at least from user-space side.
>> Maybe from device side too.
>
> FWIW V4L2 has been in use for a looong time (including in Chromebooks
> that do secure playback) and I am not aware of fundamental security
> issues with it.

Already being discussed separately...

>> 3. Implementing secure playback and making sure media doesn't leak. For
>> this case it is nice to have these object UUIDs as buffers.
>>
>> Please correct me if there's something wrong here.
>>
>> When we start looking from this perspective even things like naming
>> video buffer queues "output" for device input and "capture" for device
>> output are problematic. In my experience this naming scheme takes some
>> time and for sure several coding mistakes to get used to. Unfortunately
>> this can't be turned off with some feature flags. In contrast virtio
>> video v6 names these queues "input" and "output". This is perfectly fine
>> if we look from the device side. It is understandable, that Alexandre's
>> list of differences between V4L2 UAPI and the current state of virtio
>> video doesn't include these things. But we have to count them, I think.
>> That's why it takes me so long to make the list. :) So throwing away
>> this simplicity is still going to be a compromise from the security
>> perspective, that we're not happy about.
>>
>> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
>> security model, legacy, use-cases, developers, etc. It can be changed
>> over time, but this is a long process because this means changing the
>> Linux UAPI. Also this means, that nothing can be removed from it, only
>> added (if the V4L2 community agrees). For example, we can't simply add a
>> new way of sharing buffers for potential new use-cases once we switch to
>> V4L2 UAPI AFAIU.
>
> You certainly can, it would just need to be in the virtio
> specification. And again that's a very hypothetical case.

Ok, maybe it is hypothetical.  We won't know until it happens. I still
don't like it. Overriding things in the virtio specification is not
particularly nice. It is much easier to understand when things are kept
together, not as patches.

>> The V4L2 community can simply reject changes because
>> this is a UAPI after all. We kind of have a weak dependency already,
>> because the only driver implementation is based on V4L2 and we'd like to
>> keep the spec as close to V4L2 as possible, but it is not the same
>> thing. So at the moment it looks like the V4L2 UAPI proposal is not
>> super flexible. Alexandre said, that we can simply not implement some of
>> the ioctls. Well, this definitely doesn't cover all the complexity like
>> the structures and other subtle details.
>
> Mmm? Where did I say that? That sounds like a misunderstanding.

Here is the quote from your email on March 17:

> V4L2 has a larger UAPI surface because it manages more kinds of
> devices, but drivers only need to implement the ioctls they need. For
> the rest, they just return -ENOTTY, and evil actors are hopefully kept
> at bay.


>> Also adding the feature flags would probably defeat the stated purpose
>> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
>> V4L2 driver.
>>
>> So I have a lot of doubts about the feasibility of adding feature flags.
>> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
>> is best to simply have two specs:
>> 1. virtio-video for those interested in building from ground up with the
>> security model appropriate for virtualization in mind. This is going to
>> take time and not going to reach feature parity with V4L2 ever I think.
>> I mean some old devices might never get support this way.
>> 2. virtio-v4l2 as a shortcut for those interested in having feature
>> parity with V4L2 fast. Like a compatibility layer. Probably this is
>> going to be used in linux host + linux guest use-cases only. Maybe it
>> gets obsoleted by the first spec in several years for most modern use-cases.
>>
>> Or maybe have these two cases within a single device spec as I wrote above.
>>
>> This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
>> only needed for compatibility, then this way we can avoid a lot of work
>> going through all of the V4L2 and trying to find different subsets or
>> trying to construct something, that is close to V4L2 UAPI, but doesn't
>> compromise on the security. I'm not really interested in doing all this
>> work because we're already more or less satisfied with the current
>> state. We don't need feature parity with V4L2. On the other hand for
>> Alexandre the feature-parity with V4L2 is clearly of higher priority,
>> than all these subtle security model differences. In my opinion it also
>> doesn't make sense to invest that much time in something, that looks
>> like a compatibility layer. So it seems both of us are interested in
>> avoiding all this extra work. Then I'd just prefer to have two different
>> specs so that everyone can work according to their priorities.
>>
>>
>>> Regarding the protocol: I think Linux-originating protocols (that can be
>>> implemented on non-Linux setups) are fine, Linux-only protocols probably
>>> not so much.
>>
>> Thanks for the information. Well, it looks like the V4L2 UAPI could be
>> implemented on any platform unless it needs a completely new way of
>> memory management since all the V4L2_MEMORY_* constants are going to be
>> used already AFAIU.
>>
>>
>>>>>>       a. So V4L2 subsystem and the current virtio-video driver are already
>>>>>> reducing the complexity. And this seems as the right place to do this,
>>>>>> because the complexity is caused by the amount of V4L2 use cases and its
>>>>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
>>>>>> would prefer a simpler API, right? I think this use-case is not purely
>>>>>> abstract at all.
>>>>> The V4L2 subsystem is there to factorize code that can be shared
>>>>> between drivers and manage their internal state. Our target is the
>>>>> V4L2 UAPI, so a Windows driver needs not be concerned about these
>>>>> details - it does what it would have done with virtio-video, and just
>>>>> uses the V4L2 structures to communicate with the host instead of the
>>>>> virtio-video ones.
>>>> It can also reuse the virtio-video structures. So I think despite the
>>>> ability to reuse V4L2 structures, having to implement a linux-specific
>>>> interface would still be a bigger pain.
>>> Hm. Do the v4l2 structures drag in too many adjacent things that need to
>>> be implemented? Can we match the video-video structures from the current
>>> proposal with some v4l2 structures and extract a common wrapper for
>>> those that match, with a feature-bit controlled backend? It would be
>>> fine if any of those backends supported a slightly different subset of
>>> the common parts, as long as the parts implemented by both would be
>>> enough to implement a working device. (Mostly thinking out loud here.)
>>
>> I don't think this is realistic unfortunately. On per ioctl level it is
>> possible to disable some functionality probably, but the V4L2 structures
>> are set in stone. We can only extend them.
>>
>>
>>>>> The guest driver that I wrote is, I think, a good example of the
>>>>> complexity you can expect in terms of guest driver size (as it is
>>>>> pretty functional already with its 1000 and some LoCs). For the UAPI
>>>>> complexity, the host device basically unpacks the information it needs
>>>>> and rebuilds the V4L2 structures before calling into the host device,
>>>>> and I don't see this process as more complex that the unpacking of
>>>>> virtio-video structs which we also did in crosvm.
>>>> Unfortunately our hypervisor doesn't support mapping random host pages
>>>> in the guest. Static allocations of shared memory regions are possible.
>>>> But then we have to tell V4L2 to allocate buffers there. Then we'll need
>>>> a region per virtual device. This is just very tedious and inflexible.
>>>> That's why we're mainly interested in having the guest pages sharing in
>>>> the virtio video spec.
>>> This really sounds like you'll want a different approach -- two
>>> mechanisms covered by two feature bits might indeed be the way to go.
>>
>> Well, basically this is the way we have it now. I'm not sure what is
>> Alexandre's plan with the V4L2 UAPI approach. And if this is going to be
>> solved, the solution already doesn't look future-proof anyway unfortunately.
>>
>>
>>>>> I hope I have somehow addressed your points. The main point here is to
>>>>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
>>>>> accelerated codec work, regardless of what the guest or host
>>>>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
>>>>> this is a viable solution. This PoC is largely simplified by the fact
>>>>> that V4L2 is used all along the way, but this is irrelevant - yes,
>>>>> actual devices will likely talk to other APIs and maintain more state,
>>>>> like a virtio-video device would do. What I want to demonstrate is
>>>>> that we can send encoding work and receive a valid stream, and that it
>>>>> is not costly, and only marginally more complex than our virtio-video
>>>>> spec attempts.
>>>>>
>>>>> ... and we can support cameras too, but that's just a convenient
>>>>> side-effect, not the ultimate solution to the camera virtualization
>>>>> problem (that's for the camera folks to decide).
>>>> Thanks for your answer!
>>> Thanks everyone -- do you think the "two feature bits to cover different
>>> approaches, but using a common infrastructure" idea could work? If yes,
>>> I think that's the direction we should take. If we can implement this
>>> with just one feature bit, that might also be a good route to extend it
>>> later, but I'm not familiar enough with the whole infrastructure to make
>>> any judgement here.
>>
>> Thanks for your suggestions. Hopefully we end up with a good solution.
>
> To summarize my position:
>
> * I am still not convinced that V4L2 is lacking from a security
> perspective. It would take just one valid example to change my mind
> (and no, the way the queues are named is not valid). And btw, if it
> really introduces security issues, then this makes it invalid for
> inclusion in virtio entirely, just not OpSy's hypervisor.

Already being discussed separately...

> * Having two overlapping specifications for video is overkill and will
> just fragment virtio (as tempting as it is, I won't link to XKCD). I
> strongly advise against that.

I think they're not going to create more problems than virtio-blk,
virtio-scsi and virtio-fs, for example.
The decision can be made like this:
1. You have a V4L2 device, you don't need any extra processing, and you just
want it inside a Linux/Android VM => use virtio-v4l2.
2. You don't have a V4L2 device, or your host is not Linux, or maybe your
guest is not Linux/Android, or you want some extra processing on the host
(say you have a third-party proprietary library or whatever)
=> use virtio-video.

The more I think about that, the more sense it makes to me...

> * If the goal is to provide a standard that is suitable and useful to
> the greater number, then we shouldn't downsize the benefit that
> virtio-v4l2 brings to Linux guests.

Please see my comments above about our goals. I think virtio-v4l2 +
virtio-video would be suitable and useful to an even greater number of
users than virtio-v4l2 alone.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
       [not found]                                     ` <CALgKJBqKWng508cB_F_uD2fy9EAvQ36rYR3fRb57sFd3ihpUFw@mail.gmail.com>
@ 2023-04-26 16:00                                       ` Alexander Gordeev
  2023-04-27 10:13                                         ` Bartłomiej Grzesik
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-26 16:00 UTC (permalink / raw)
  To: Bartłomiej Grzesik, Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve, bgrzesik, srosek, zyta,
	hmazur, mikrawczyk

Hi Bartłomiej,

On 21.04.23 13:49, Bartłomiej Grzesik wrote:
> +CC chromeos-arc-video-eng team that also works on virtio-video
>
> Hi everyone!
>
>  From the experience of working on virtio-video I can definitely agree
> with Alex Courbot, that moving to virtio-v4l2 will be a great move.
> This move will not only simply things a lot but also allow us things like
> vhost-net like implementation for some devices (however this is
> thinking way ahead and linux only).
>
> One added benefit of this move that I'd like to point out now, that
> probably
> haven't been mentioned before, is moving to asynchronous resource queue
> call. Previously `VIRTIO_VIDEO_CMD_RESOURCE_QUEUE` have been
> synchronous and caused one hard to debug bug caused by this flawed
> design. During the command execution the virtio queue descriptors are
> blocked, potentially leading to dead locking the device. The implementation
> of virtio-v4l2 (even as is - btw nice work Alex!) eliminates this issue
> by moving to asynchronous response of the resource queue (VIDIOC_QBUF).

Thanks for your valuable feedback! Could you please share some details
about the bug? That would be very helpful. I'm working on the next
version of the virtio-video draft, so I can change it there. I like the
idea of using V4L2 as a reference, so we should probably do it the way it is
done there, only simpler. Still, it would be interesting to know the
details, because we didn't have issues with the current design.

> I also agree that in this case v4l2 would only be a protocol similar to how
> virtio-video was, that has the benefit of allowing very simple guest
> drivers.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-26 16:00                                       ` Alexander Gordeev
@ 2023-04-27 10:13                                         ` Bartłomiej Grzesik
  2023-04-27 14:34                                           ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Bartłomiej Grzesik @ 2023-04-27 10:13 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Bartłomiej Grzesik, Alexandre Courbot, Cornelia Huck,
	virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve, srosek, zyta, hmazur,
	mikrawczyk

Hi Alexander

On Wed, Apr 26, 2023 at 6:00 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> Hi Bartłomiej,
>
> On 21.04.23 13:49, Bartłomiej Grzesik wrote:
> > +CC chromeos-arc-video-eng team that also works on virtio-video
> >
> > Hi everyone!
> >
> >  From the experience of working on virtio-video I can definitely agree
> > with Alex Courbot, that moving to virtio-v4l2 will be a great move.
> > This move will not only simply things a lot but also allow us things like
> > vhost-net like implementation for some devices (however this is
> > thinking way ahead and linux only).
> >
> > One added benefit of this move that I'd like to point out now, that
> > probably
> > haven't been mentioned before, is moving to asynchronous resource queue
> > call. Previously `VIRTIO_VIDEO_CMD_RESOURCE_QUEUE` have been
> > synchronous and caused one hard to debug bug caused by this flawed
> > design. During the command execution the virtio queue descriptors are
> > blocked, potentially leading to dead locking the device. The implementation
> > of virtio-v4l2 (even as is - btw nice work Alex!) eliminates this issue
> > by moving to asynchronous response of the resource queue (VIDIOC_QBUF).
>
> Thanks for your valuable feedback! Could you please share some details
> about the bug? That would be very helpful. I'm working on the next
> version of the virtio-video draft, so I can change it there. I like the
> idea to use V4L2 as a reference, so we should probably do it like it is
> done there, only simpler. Still it would be interesting to know the
> details, because we didn't have issues with the current design.

In this bug, an app preallocated and enqueued all output buffers (to the
CAPTURE queue). This is in line with V4L2 and, in the case of a virtualized
video stack, helps with latency. Those enqueued buffers hold virtqueue
descriptors until they are filled with a decoded frame. While this is not an
issue for one stream, for multiple simultaneous streams it quickly becomes a
serious problem. In our case all descriptors of the virtqueue were consumed
by enqueued output buffers and no other command could be issued to the
hypervisor. This deadlocked the entire driver by starving it of
descriptors - even a STREAM_DESTROY command could not be issued, and the
only solution was to reboot the guest.

While this is easily worked around by adjusting the size of the virtqueue,
it is a flaw in this design: the queue size would always have to be a
function of the maximum number of supported streams, and that number grows
rather quickly.
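
To put some made-up numbers on it (purely illustrative):

#include <stdio.h>

int main(void)
{
        unsigned int virtqueue_size = 256;      /* descriptors in the command queue */
        unsigned int streams = 16;              /* simultaneous streams */
        unsigned int prequeued_per_stream = 16; /* CAPTURE buffers queued up front */

        /* With a synchronous RESOURCE_QUEUE, every pre-queued buffer holds its
         * descriptor chain until a decoded frame is produced. */
        unsigned int held = streams * prequeued_per_stream;

        if (held >= virtqueue_size)
                printf("all %u descriptors held, no room left for any other command\n",
                       virtqueue_size);
        return 0;
}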

I remember having a few thoughts on how it could be solved, and I think
removing the need to hold those descriptors is the best approach. One could
argue that preallocating descriptors for this purpose, or splitting the
command queue into input, output and control queues (or per-stream queues),
might be a viable solution. However, that would only delay the issue or could
cause other streams to "starve".

Porting asynchronous dequeueing of resources would bring virtio-video
extremely close to virtio-v4l2, and therefore I support Alexandre's idea to
use V4L2 as a protocol.


> > I also agree that in this case v4l2 would only be a protocol similar to how
> > virtio-video was, that has the benefit of allowing very simple guest
> > drivers.
>
> --
> Alexander Gordeev
> Senior Software Engineer
>
> OpenSynergy GmbH
> Rotherstr. 20, 10245 Berlin
>
> Phone: +49 30 60 98 54 0 - 88
> Fax: +49 (30) 60 98 54 0 - 99
> EMail: alexander.gordeev@opensynergy.com
>
> www.opensynergy.com
>
> Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
> Geschäftsführer/Managing Director: Régis Adjamah
>



--
Best regards,
Bartek




* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-26 15:11                                 ` Alexander Gordeev
@ 2023-04-27 13:16                                   ` Alexandre Courbot
  2023-04-28  7:47                                     ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-27 13:16 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Thu, Apr 27, 2023 at 12:11 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 21.04.23 06:02, Alexandre Courbot wrote:
> > Hi Alexander,
> >
> > On Mon, Apr 17, 2023 at 9:52 PM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> Hi Alexandre,
> >>
> >> Thanks for your letter! Sorry, it took me some time to write an answer.
> >>
> >> First of all I'd like to describe my perspective a little bit because it
> >> seems, that in many cases we (and other people writing their feedbacks)
> >> simply have very different priorities and background.
> >>
> >> OpenSynergy, the company that I work for, develops a proprietary
> >> hypervisor called COQOS mainly for automotive and aerospace domains. We
> >> have our proprietary device implementations, but overall our goal is to
> >> bring open standards into these quite closed domains and we're betting
> >> big on virtio. The idea is to run safety-critical functions like cockpit
> >> controller alongside with multimedia stuff in different VMs on the same
> >> physical board. Right now they have it on separate physical devices. So
> >> they already have maximum isolation. And we're trying to make this
> >> equally safe on a single board. The benefit is the reduced costs and
> >> some additional features. Of course, we also need features here, but at
> >> the same time security and ease of certification are among the top of
> >> our priorities. Nobody wants cars or planes to have security problems,
> >> right? Also nobody really needs DVB and even more exotic devices in cars
> >> and planes AFAIK.
> >>
> >> For the above mentioned reasons our COQOS hypervisor is running on bare
> >> metal. Also memory management for the guests is mostly static. It is
> >> possible to make a shared memory region between a device and a driver
> >> managed by device in advance. But definitely no mapping of random host
> >> pages on the fly is supported.
> >>
> >> AFAIU crosvm is about making Chrome OS more secure by putting every app
> >> in its own virtualized environment, right?
> >
> > Not really, but for the discussion here you can assume that it is a
> > VMM similar to QEmu with KVM enabled.
>
> Thanks for the clarification. If my idea about your use-case is not
> totally correct, then it would be very helpful if you can provide more
> details about it.

It's nothing fancy: Linux host, Linux (e.g. Android) guests. But
virtio being a standard, we should focus on making something that is
usable by everyone rather than for individual use-cases.

>
> >> Both the host and guest are
> >> linux. In this case I totally understand why V4L2 UAPI pass-through
> >> feels like a right move. I guess, you'd like to make the switch to
> >> virtualized apps as seamless as possible for your users. If they can't
> >> use their DVBs anymore, they complain. And adding the virtualization
> >> makes the whole thing more secure anyway. So I understand the desire to
> >> have the range of supported devices as broad as possible. It is also
> >> understandable that priorities are different with desktop
> >> virtualization. Also I'm not trying to diminish the great work, that you
> >> have done. It is just that from my perspective this looks like a step in
> >> the wrong direction because of the mentioned concerns. So I'm going to
> >> continue being a skeptic here, sorry.
> >>
> >> Of course, I don't expect that you continue working on the old approach
> >> now as you have put that many efforts into the V4L2 UAPI pass-through.
> >> So I think it is best to do the evolutionary changes in scope of virtio
> >> video device specification, and create a new device specification
> >> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> >> continue the virtio-video development. In fact I already started making
> >> draft v7 of the spec according to the comments. I hope it will be ready
> >> for review soon.
> >>
> >> I hope this approach will also help fix issues with virtio-video spec
> >> and driver development misalignment as well as V4L2 compliance issues
> >> with the driver. I believe the problems were caused partly by poor
> >> communication between us and by misalignment of our development cycles,
> >> not by the driver complexity.
> >>
> >> So in my opinion it is OK to have different specs with overlapping
> >> functionality for some time. My only concern is if this would be
> >> accepted by the community and the committee. How the things usually go
> >> here: preferring features and tolerating possible security issues or the
> >> other way around? Also how acceptable is having linux-specific protocols
> >> at all?
> >>
> >> Also I still have concerns about memory management with V4L2 UAPI
> >> pass-through. Please see below.
> >>
> >> On 17.03.23 08:24, Alexandre Courbot wrote:
> >>> Hi Alexander,
> >>>
> >>> On Thu, Mar 16, 2023 at 7:13 PM Alexander Gordeev
> >>> <alexander.gordeev@opensynergy.com> wrote:
> >>>> Hi Alexandre,
> >>>>
> >>>> On 14.03.23 06:06, Alexandre Courbot wrote:
> >>>>> If we find out that there is a benefit in going through the V4L2
> >>>>> subsystem (which I cannot see for now), rebuilding the UAPI structures
> >>>>> to communicate with the device is not different from building
> >>>>> virtio-video specific structures like what we are currently doing.
> >>>> Well, the V4L2 subsystem is there for a reason, right? It does some
> >>>> important things too. I'm going to check all the v4l2_ioctl_ops
> >>>> callbacks in the current virtio-video driver to make the list. Also if
> >>>> you have some PoC spec/implementations, that would be nice to review. It
> >>>> is always better to see the actual implementation, of course.
> >>>>
> >>>> I have these points so far:
> >>>>
> >>>> 1. Overall the V4L2 stateful decoder API looks significantly more
> >>>> complex to me. Looks like you're a V4L2 expert, so this might not be
> >>>> visible to you that much.
> >>> V4L2 is more generic than virtio-video, so as a result specific uses
> >>> tend to require a bit more operations. I would argue the mental
> >>> overhead of working with it is less than significant, and most of it
> >>> consists in not forgetting to call STREAMON on a queue after some
> >>> operations. Things like format, resolution and buffer management do
> >>> not get more complex (and V4L2 is actually more complete than our
> >>> previous proposal on these).
> >>>
> >>> The counterpart of this marginal extra complexity is that you can
> >>> virtualize more kinds of devices, and even within virtio-video support
> >>> more formats than what has been specified so far. If your guest is
> >>> Linux, the same kernel driver can be used to expose any kind of device
> >>> supported by V4L2, and the driver is also much simpler than
> >>> virtio-video, so you are actually reducing complexity significantly
> >>> here. Even if you are not Linux, you can share the V4L2 structures
> >>> definitions and low-layer code that sends V4L2 commands to the host
> >>> between drivers. So while it is true that some specifics become
> >>> slightly more complex, there is a lot of potential simplification when
> >>> you look at the whole picture.
> >>>
> >>> It's an opinionated proposal, and it comes with a few compromises if
> >>> you are mostly interested in codecs alone. But looking at the guest
> >>> driver convinces me that this is the better approach when you look at
> >>> the whole picture.
> >>
> >> Sorry, I just see it differently as I tried to describe above. The
> >> problem is that we don't yet see the whole picture with the V4L2 UAPI
> >> pass-through. I reviewed the code briefly. It is great, that you already
> >> implemented the MMAP mode and host allocations already. But I would
> >> argue, that this is the simplest case. Do you agree?
> >
> > I was trying to do a proof-of-concept here, of course it is not
> > feature-complete and of course I started with the simplest case. I
> > don't see your point here.
>
> I understand that. The point is that the real complexity is yet to come.
> Please see below. I think this is logical: if you only have implemented
> the simplest case, then implementing more complex cases requires making
> the implementation more complex.
>
> >> Also this mode of
> >> operation is not supported in our hypervisor for reasons mentioned
> >> above. So in our case this PoC doesn't yet prove anything unfortunately.
> >
> > I did not have your use-case in mind while writing the PoC, its
> > purpose was to demonstrate the suitability of V4L2 as a protocol for
> > virtualizing video.
> >
> > Now if your hypervisor does static memory management and pre-allocates
> > memory for guest buffers, then the V4L2 MMAP memory type actually
> > looks like the best fit for the job. There are no tokens like virtio
> > objects UUID to manage, and the MMAP request can be as simple as
> > returning the pre-mapped address of the buffer in the guest PAS.
> >
> > If instead it carves some predefined amount of memory out for the
> > whole guest and expects it to allocate buffer memory from there, then
> > the USERPTR memory type (which works like the guest pages of
> > virtio-video) is what you want to use.
>
> It doesn't look like a good idea to us. This means preconfiguring memory
> regions in the hypervisor config. It is hard to predict the amount of
> memory, that is necessary. If we allocate too much, this is a waste of
> memory. If we allocate too little, it won't be enough. Then we don't
> know yet how to make V4L2 allocate from that memory. Then this memory
> has to be managed on the host side. And memory management is exactly the
> thing, that causes most security issues, right? So overall this is very
> tedious, potentially wasteful and not flexible.

My last paragraph mentions that you can also let the guest manage the
buffer memory from its own RAM. Or maybe I am missing how memory is
managed on your hypervisor, but if that's the case elaborate on where
you want the buffer memory to come from.

>
>
> >> I think the real complexity is yet to come.
> >
> > Evidence would be appreciated.
>
> Please check my comment above.
>
> >>>>      a. So V4L2 subsystem and the current virtio-video driver are already
> >>>> reducing the complexity. And this seems as the right place to do this,
> >>>> because the complexity is caused by the amount of V4L2 use cases and its
> >>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >>>> would prefer a simpler API, right? I think this use-case is not purely
> >>>> abstract at all.
> >>> The V4L2 subsystem is there to factorize code that can be shared
> >>> between drivers and manage their internal state. Our target is the
> >>> V4L2 UAPI, so a Windows driver needs not be concerned about these
> >>> details - it does what it would have done with virtio-video, and just
> >>> uses the V4L2 structures to communicate with the host instead of the
> >>> virtio-video ones.
> >>
> >> It can also reuse the virtio-video structures. So I think despite the
> >> ability to reuse V4L2 structures, having to implement a linux-specific
> >> interface would still be a bigger pain.
> >
> > The only Linux-specific thing in this interface is that it
> > misleadingly has "Linux" in its name. Otherwise it's really similar to
> > what we previously had.
> >
> >>
> >>
> >>>>      b. Less complex API is better from a security point of view too. When
> >>>> V4L2 was developed, not many people were concerned with malicious USB
> >>>> devices probably. At least exploiting a malicious USB device usually
> >>>> requires physical access. With virtual devices and multiple VMs the
> >>>> stakes are higher, I believe.
> >>> That's probably true, but I fail to see how the fact we are using
> >>> struct v4l2_buffer instead of struct virtio_video_buffer can have an
> >>> impact on that?
> >>>
> >>> V4L2 has a larger UAPI surface because it manages more kinds of
> >>> devices, but drivers only need to implement the ioctls they need. For
> >>> the rest, they just return -ENOTTY, and evil actors are hopefully kept
> >>> at bay.
> >>
> >> Still there are definitely more ways to do things wrong. It would be
> >> harder to audit a larger API surface.
> >
> > If you write a video device you don't need to support more API than
> > that requested for your device. All unsupported interfaces can simply
> > return ENOTTY.
>
>
>
> >>>> 2. We have a working virtio-video driver. So we need very good reasons
> >>>> to start from scratch. You name two reasons AFAIR: simplicity and
> >>>> possible use of cameras. Did I miss something else?
> >>>>
> >>>>      a. The simplicity is there only in case all the interfaces are V4L2,
> >>>> both in the backend and in the guest. Otherwise the complexity is just
> >>>> moved to backends. I haven't seen V4L2 in our setups so far, only some
> >>>> proprietary OMX libraries. So from my point of view, this is not
> >>>> simplicity in general, but an optimization for a specific narrow use case.
> >>> V4L2 is not a narrow use-case when it comes to video devices on Linux
> >>> - basically every user space application involving cameras or codecs
> >>> can use it. Even the virtio-video driver exposes a V4L2 device, so
> >>> unless you are using a different driver and proprietary userspace apps
> >>> specifically written to interact with that driver, V4L2 is involved in
> >>> your setup at some point.
> >>
> >> Sorry, I mean narrow use-case if we look into other possibilities:
> >>
> >> 1. Stateless V4L2 on the host.
> >> 2. Any other interface on the host.
> >> 3. Any other guest except Linux.
> >>
> >> Our targets are several popular embedded SoCs. Unfortunately we don't
> >> have the luxury of simply having normal V4L2 devices there. And it
> >> doesn't look like this is going to change.
> >>
> >>
> >>> The guest driver that I wrote is, I think, a good example of the
> >>> complexity you can expect in terms of guest driver size (as it is
> >>> pretty functional already with its 1000 and some LoCs). For the UAPI
> >>> complexity, the host device basically unpacks the information it needs
> >>> and rebuilds the V4L2 structures before calling into the host device,
> >>> and I don't see this process as more complex than the unpacking of
> >>> virtio-video structs which we also did in crosvm.
> >>
> >> Unfortunately our hypervisor doesn't support mapping random host pages
> >> in the guest.
> >
> > The ability to map random host pages to the guest is *not* a
> > requirement of virtio-v4l2.
> >
> >> Static allocations of shared memory regions are possible.
> >> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> >> a region per virtual device. This is just very tedious and inflexible.
> >> That's why we're mainly interested in having the guest pages sharing in
> >> the virtio video spec.
> >
> > I'll be happy to update the PoC and make it able to use guest pages as
> > buffer backing memory. It just wasn't the priority to demonstrate the
> > global approach.
>
> Great, thank you. If you have a concrete plan already, I think it could
> be beneficial to discuss it now. Otherwise I'd prefer to keep working on
> the current approach until I see something concrete.

Just give me a couple more weeks and I think I can produce the code.
But I'm afraid you have already made up your mind anyway.

>
> >>>>      b. For modern cameras the V4L2 interface is not enough anyway. This
> >>>> was already discussed AFAIR. There is a separate virtio-camera
> >>>> specification, that indeed is based on V4L2 UAPI as you said. But
> >>>> combining these two specs is certainly not future proof, right? So I
> >>>> think it is best to let the virtio-camera spec to be developed
> >>>> independently.
> >>> I don't know if virtio-camera has made progress that they have not
> >>> published yet, but from what I have seen virtio-v4l2 can cover
> >>> everything that the currently published driver does (I could not find
> >>> a specification, but please point me to it if it exists), so there
> >>> would be no conflict to resolve.
> >>>
> >>> V4L2 with requests support should be capable of handling complex
> >>> camera configurations, but the effort indeed seems to have switched to
> >>> KCAM when it comes to supporting complex native cameras natively. That
> >>> being said:
> >>>
> >>> * KCAM is not merged yet, is probably not going to be for some time
> >>> (https://lwn.net/Articles/904776/), and we don't know how we can
> >>> handle virtualization with it,
> >>> * The fact that the camera is complex on the host does not mean that
> >>> all that complexity needs to be exposed to the guest. I don't know how
> >>> the camera folks want to manage this, but one can imagine that the
> >>> host could expose a simpler model for the virtual camera, with only
> >>> the required knobs, while the host takes care of doing all the complex
> >>> configuration.
> >>> * The counter argument can be made that simple camera devices do not
> >>> need a complex virtualization solution, so one can also invoke
> >>> simplicity here to advocate for virtio-v4l2.
> >>>
> >>> My point is not to say that all other camera virtualization efforts
> >>> should be abandoned - if indeed there is a need for something more
> >>> specific, then nothing prevents us from having a virtio-camera
> >>> specification added. However, we are nowhere close to this at the
> >>> moment, and right now there is no official solution for camera
> >>> virtualization, so I see no reason to deny the opportunity to support
> >>> simple camera devices since its cost would just be to add "and cameras
> >>> device" in the paragraph of the spec that explains what devices are
> >>> supported.
> >>
> >> Well, for reasons described above it still seems perfectly fine to me to
> >> have separate devices. Ok, the argument, that this approach also seems
> >> more future-proof, is not a strong one.
> >
> > Please elaborate on its weaknesses then.
>
> Well, as you said basically. The weakness of the argument is that the
> virtio-camera is not yet published, the KCAM is not merged yet, so yeah,
> the future is not clear actually.
>
> BTW I just thought about one more case, that is already real: sharing
> camera streams with pipewire. I think pipewire doesn't provide a V4L2
> UAPI interface, right?

I believe it does: https://archlinux.org/packages/extra/x86_64/pipewire-v4l2/

But in any case, that's irrelevant to the guest-host interface, and I
think a big part of the disagreement stems from the misconception that
V4L2 absolutely needs to be used on either the guest or the host,
which is absolutely not the case.

>
> >>>> 3. More specifically I can see, that around 95% V4L2 drivers use
> >>>> videobuf2. This includes the current virtio-video driver. Bypassing the
> >>>> V4L2 subsystem means that vb2 can't be used, right? In various
> >>>> discussions vb2 popped up as a thing, that would be hard to avoid. What
> >>>> do you think about this? How are you going to deal with various V4L2
> >>>> memory types (V4L2_MEMORY_MMAP, V4L2_MEMORY_DMABUF, etc), for example?
> >>>> I'll try to dive deeper myself too...
> >>> VB2 is entirely avoided in the current driver, but my understanding is
> >>> that its helpers could be used if needed.
> >>>
> >>> In virtio-v4l2, MMAP means that the host is responsible for managing
> >>> the buffers, so vb2 is entirely avoided. USERPTR means the guest
> >>> passes a SG list of guest physical addresses as mapping memory. VB2
> >>> may or may not be involved in managing this memory, but most likely
> >>> not if that memory comes from the guest userspace. DMABUF means the
> >>> guest passes a virtio object as the backing memory of the buffer.
> >>> There again there is no particular management to be done on the guest
> >>> side.
> >>>
> >>> I bypassed VB2 for the current driver, and the cost of doing this is
> >>> that I had to write my own mmap() function.
> >>
> >> The cost of it as of now is also that:
> >>
> >> 1. Only guest user-space applications, that use V4L2_MEMORY_MMAP, are
> >> supported AFAIU.
> >
> > This has nothing to do with VB2. I wanted to demonstrate that V4L2
> > could be used as a host-guest protocol and did it on a single memory
> > type to release something quickly. Please stop strawmanning the design
> > because the PoC is still incomplete.
>
> Please stop putting labels like this on my arguments. This is not
> helpful at all.
>
> >> 2. There is no flexibility to choose whatever way of memory management
> >> host and guest would like to use. Now the guest user-space application
> >> selects this.
> >
> > Errr no. The guest user-space chooses a type of memory from what the
> > guest kernel exposes, which depends on what the host itself decides to
> > expose.
>
> I don't agree. If an already written user-space app supports only MMAP,
> then there is no way to force it to use USERPTR, right? Please correct
> me if I'm wrong.

The memory types that the guest kernel exposes to its userspace do not
need to match those exposed by the hypervisor, nor those that the guest
kernel chooses to use towards it.

For instance, imagine that the hypervisor does not support allocating
buffer memory - i.e. it does not support the MMAP memory type. The
guest will then have to allocate the buffers from its own memory, and
send them to the host with the USERPTR memory type.

Now if a guest user-space application only supports MMAP, that's not a
problem at all. Most V4L2 drivers allocate MMAP buffers from regular
memory. So when the application requests MMAP buffers, the guest
kernel can honor this request by allocating some memory itself, and
change the memory type to USERPTR when passing the request to the
hypervisor, so the hypervisor knows that guest memory is in use.
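
As a rough sketch, assuming hypothetical helper and type names (this is
not the actual PoC driver code), that translation in the guest driver
could look like this:

#include <stddef.h>             /* size_t */
#include <errno.h>              /* EINVAL */
#include <linux/videodev2.h>    /* struct v4l2_buffer, V4L2_MEMORY_MMAP */

/* Illustrative guest-driver types and helpers, not a real API. */
struct vv_buffer { void *pages; };
struct vv_queue  { struct vv_buffer bufs[32]; size_t sizeimage; };
void *vv_alloc_guest_pages(size_t size);
int vv_host_qbuf_sg(struct vv_queue *q, unsigned int index,
                    void *pages, size_t size);

/*
 * Honor a userspace V4L2_MEMORY_MMAP QBUF when the host only accepts
 * guest-owned memory: the guest kernel allocates the backing pages
 * itself and forwards the buffer to the host as a list of guest pages
 * (USERPTR today, or a dedicated "GUEST" type on the wire).
 */
static int guest_qbuf(struct vv_queue *q, struct v4l2_buffer *b)
{
        struct vv_buffer *vb = &q->bufs[b->index];

        if (b->memory != V4L2_MEMORY_MMAP)
                return -EINVAL;  /* other memory types handled elsewhere */

        if (!vb->pages)
                vb->pages = vv_alloc_guest_pages(q->sizeimage);

        return vv_host_qbuf_sg(q, b->index, vb->pages, q->sizeimage);
}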

I am responsible for this misconception since I insisted on using the
same memory types (MMAP, USERPTR, DMABUF) as V4L2 for guest/host
communication, which is misleading. It would probably be less
confusing to define new types (HOST, GUEST and VIRTIO_OBJ) just for
virtio-v4l2 and forbid the use of the kernel/user space memory types.

With these new names, I think it is clear that we have the exact
feature set of virtio-video (guest memory and virtio objects) covered,
plus another one where we allow the host to perform the allocation
itself (which may be useful if the video device has its own dedicated
memory). Again, this is only for host/guest communication. Guest
kernel/userspace is a different thing and can be implemented in
different ways depending on what the host supports.
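
For illustration, the renaming could look something like this (names and
values are purely illustrative, nothing here is part of any agreed
draft):

enum virtio_v4l2_memory {
        /* Buffers allocated and owned by the host/device
         * (what V4L2_MEMORY_MMAP maps to above). */
        VIRTIO_V4L2_MEMORY_HOST       = 1,
        /* Guest pages passed as a scatter-gather list
         * (what V4L2_MEMORY_USERPTR maps to above). */
        VIRTIO_V4L2_MEMORY_GUEST      = 2,
        /* A virtio object UUID as backing memory
         * (what V4L2_MEMORY_DMABUF maps to above). */
        VIRTIO_V4L2_MEMORY_VIRTIO_OBJ = 3,
};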

>
> >> The latter makes the solution much less flexible IMO. For example, this
> >> won't work well with our hypervisor. There might be other special needs
> >> in other use-cases, like sharing these object UUIDs. Probably this can
> >> be handled by mapping, for example, V4L2_MEMORY_USERPTR to guest-pages
> >> sharing, V4L2_MEMORY_DMABUF to the UUIDs (which is not quite correct
> >> IMHO).
> >
> > Please elaborate on why this is not correct.
>
> Because IMHO UUIDs pointing to memory allocated by virtio-gpu are quite
> different from dmabufs created in the guest with udmabuf, for example.
> This can be confusing.

True, and that's another reason to define our own memory types to
remove that confusion.

>
> >> So this already means querying the device for supported sharing
> >> methods, rewriting the flow of V4L2 UAPI calls on the fly, ensuring
> >> consistency, etc. This already looks hackish to me. Do you have a better
> >> plan?
> >
> > How do you support different kinds of memory without querying? Or do
> > you suggest we stick to a single one?
> >
> > I am also not quite sure what you mean by "rewriting the flow of V4L2
> > UAPI calls on the fly". There is no "rewriting" - V4L2 structures are
> > just used to communicate with the host instead of virtio-video
> > structures.
>
> I'd like to know your ideas or better a concrete plan for enabling
> user-space apps, that only support MMAP, to work on top of a device,
> that supports only guest pages sharing.

Hopefully my explanation above clears that up.

>
> >> Also this limits us to only 3 methods, right? And what if there
> >> are more than 3 methods in the future?
> >
> > Nothing prevents us from adding new virtio-specific memory types if
> > needed. But what other methods did you have in mind?
>
> You mean we can easily extend V4L2 UAPI with our own memory types, that
> are not used in usual V4L2 drivers? Please provide some evidence.

Err you were the one asking about adding more methods. I don't see the
need for it myself.

>
> >> I think this inflexibility is a major problem with this approach.
> >>
> >>
> >>>>> Actually I don't think this is even something we need to think about -
> >>>>> in its simplest form the V4L2 guest driver just needs to act as a
> >>>>> proxy for the device. So which decoder API is used by the host is
> >>>>> completely irrelevant to the guest driver - it can support a decoder,
> >>>>> an encoder, or a camera - it doesn't even need to be aware of what
> >>>>> kind of device it is exposing and that simplicity is another thing
> >>>>> that I like with this design.
> >>>> As I wrote above the design would be indeed simple only in case the
> >>>> actual hardware is exposed to a backend through V4L2 too. Otherwise the
> >>>> complexity is just moved to backends.
> >>> Yes, and while I acknowledge that, this is not really more complex
> >>> that what you would have to do with a virtio-video device which also
> >>> needs to manage its own state and drive the hardware through backends.
> >>> I say that based on the experience working on the virtio-video device
> >>> in crosvm which follows that design too.
> >>
> >> As I wrote above we have a different use-case. And I see the current
> >> state of virtio video as a good common ground for different parties and
> >> use-cases. Unfortunately I don't see any upsides for our use-cases from
> >> the V4L2 UAPI proposal, only downsides.
> >
> > Well AFAICT V4L2 provides the exact same set of capabilities as
> > virtio-video, with only minor differences. If virtio-video was
> > suitable for your use-case, V4L2 should be as well.
> >
> > Maybe it makes things marginally more complex for your particular
> > proprietary bare-metal hypervisor. But it also makes things
> > dramatically easier and provides much more features for the vast
> > majority of the virtio audience who run Linux guests and can now use a
> > much simpler driver. Which one do we want to prioritize?
>
> This sounds like a neglect for our use-case. This is not helpful, if
> we're going to continue working with the same device, because this
> questions our ability to cooperate. That's fine as long as we can
> continue developing the current version separately.

I said "marginally". AFAICT your use-case would be just fine with
virtio-v4l2. On my end, I'd like to make sure the vast majority of
device implementers don't have to make a choice between two equivalent
but incompatible standards.

>
> > I'm sorry but your answer is full of vague assertions about supposed
> > shortcomings of the approach without any concrete evidence of its
> > unsuitability. Please show us why this wouldn't work for you.
>
> I asked you what is your plan about the guest pages sharing. Probably
> you didn't see this question because I don't see the answer in your
> email. So I'm reiterating it here. What is your plan? Without that I can
> only share my own ideas, and indeed the whole conversation can seem
> vague and hypothetical.

The plan is that it's going to be basically the same as virtio-video
guest page sharing. I don't need to detail much more as we both know
how that works.




* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-26 15:52                                     ` Alexander Gordeev
@ 2023-04-27 13:23                                       ` Alexandre Courbot
  2023-04-27 15:12                                         ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-27 13:23 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Thu, Apr 27, 2023 at 12:52 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 21.04.23 06:02, Alexandre Courbot wrote:
> > On Wed, Apr 19, 2023 at 4:39 PM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> On 17.04.23 16:43, Cornelia Huck wrote:
> >>> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >>>
> >>>> [...]
> >>> My main question is: What would be something that we can merge as a
> >>> spec, that would either cover the different use cases already, or that
> >>> could be easily extended to cover the use cases it does not handle
> >>> initially?
> >>>
> >>> For example, can some of the features that would be useful in crosvm be
> >>> tucked behind some feature bit(s), so that the more restricted COQOS
> >>> hypervisor would simply not offer them? (Two feature bits covering two
> >>> different mechanisms, like the current approach and the v4l2 approach,
> >>> would also be good, as long as there's enough common ground between the
> >>> two.)
> >>>
> >>> If a staged approach (adding features controlled by feature bits) would
> >>> be possible, that would be my preferred way to do it.
> >>
> >> Hmm, I see several ways how we can use the feature flags:
> >> 1. Basically making two feature flags: one for the current video spec
> >> and one for the V4L2 UAPI pass through. Kind of the same as having two
> >> different specs, but within one device. Not sure which way is better.
> >> Probably having two separate devices would be easier to review and merge.
> >
> > Having two different devices with their own IDs would indeed be less
> > confusing than using feature bits.
> >
> > That being said, the whole point of proposing virtio-v4l2 is to end up
> > with *less* specification, not more. Having two concurrent and largely
> > overlapping approaches will result in fragmentation and duplicated
> > work, so my suggestion would be to decide on one or the other and
> > stick to it.
>
> Hmm, Enrico pointed out that having virtio-v4l2 would also be good
> because of much better compatibility with Android right now. I don't
> think the specification length should be our ultimate goal. Cornelia
> said that her ultimate goal is to have a spec everyone is happy with,
> regardless of how we arrive there. Well, I can only say that I also
> think this should be our goal.

Try to put yourself into the shoes of someone who needs to write a new
video device using virtio. Oh, there are two ways to do it. This guest
OS only supports virtio-video, but that guest OS only supports
virtio-v4l2. Great, now you need to support two interfaces with your
device, or write two different devices.

>
> >> 2. Finding a subset of V4L2, that closely matches the current draft, and
> >> restrict everything else. A perfectly reasoned answer for this case will
> >> require a lot of work going through all the V4L2 structures I think. And
> >
> > It's pretty trivial, on the contrary. For a decoder device one would
> > just need to restrict to the structs and operations mentioned here:
> > https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html
>
> I don't agree. But we already talk about this in other threads.
>
> >> even if we have a concrete plan, it has to be implemented first. I doubt
> >> it is possible. Based on the things, that I already know, this is going
> >> to be a compromise on security anyway, so we're not happy about that.
> >> More on that below.
> >> 3. Stop trying to simply pass the V4L2 UAPI through and focus on making
> >> the virtio video spec as close to the V4L2 UAPI as possible, but with
> >> the appropriate security model. So that the video device can be extended
> >> with a feature flag to something very close to full V4L2 UAPI. A lot of
> >> work as well, I think. And this won't allow us to simply link the V4L2
> >> UAPI in the spec and therefore reduce its size, which is Alexandre's
> >> current goal. So Alexandre and his team are not happy this way probably.
> >
> > That would indeed be reinventing the wheel and a completely pointless
> > exercise IMHO.
>
> I agree.
>
> >>   From the security point of view these are our goals from most to
> >> least important AFAIU:
> >> 1. Make the device secure. If a device is compromised, the whole
> >> physical machine is at risk. Complexity is the enemy here. It helps a
> >> lot to make the device as straightforward and easy to implement as
> >> possible. Therefore it is good to make the spec device/function-centric
> >> from this PoV.
> >
> > This seems to be the main point of contention we have right here. I do
> > not believe that V4L2 introduces significant complexity to the video
> > use-case, and certainly not to a degree that would justify writing a
> > new spec just for that.
>
> Ok, then we should try to agree on a benchmark, I think.
>
> >> 2. Ensure, that drivers are also secure at least from user-space side.
> >> Maybe from device side too.
> >
> > FWIW V4L2 has been in use for a looong time (including in Chromebooks
> > that do secure playback) and I am not aware of fundamental security
> > issues with it.
>
> Already being discussed separately...
>
> >> 3. Implementing secure playback and making sure media doesn't leak. For
> >> this case it is nice to have these object UUIDs as buffers.
> >>
> >> Please correct me if there's something wrong here.
> >>
> >> When we start looking from this perspective even things like naming
> >> video buffer queues "output" for device input and "capture" for device
> >> output are problematic. In my experience this naming scheme takes some
> >> time and for sure several coding mistakes to get used to. Unfortunately
> >> this can't be turned off with some feature flags. In contrast virtio
> >> video v6 names these queues "input" and "output". This is perfectly fine
> >> if we look from the device side. It is understandable, that Alexandre's
> >> list of differences between V4L2 UAPI and the current state of virtio
> >> video doesn't include these things. But we have to count them, I think.
> >> That's why it takes me so long to make the list. :) So throwing away
> >> this simplicity is still going to be a compromise from the security
> >> perspective, that we're not happy about.
> >>
> >> This is mostly because V4L2 UAPI brings a hard dependency on V4L2, its
> >> security model, legacy, use-cases, developers, etc. It can be changed
> >> over time, but this is a long process because this means changing the
> >> Linux UAPI. Also this means, that nothing can be removed from it, only
> >> added (if the V4L2 community agrees). For example, we can't simply add a
> >> new way of sharing buffers for potential new use-cases once we switch to
> >> V4L2 UAPI AFAIU.
> >
> > You certainly can, it would just need to be in the virtio
> > specification. And again that's a very hypothetical case.
>
> Ok, maybe it is hypothetical.  We won't know until it happens. I still
> don't like it. Overriding things in the virtio specification is not
> particularly nice. It is much easier to understand when things are kept
> together, not as patches.
>
> >> The V4L2 community can simply reject changes because
> >> this is a UAPI after all. We kind of have a weak dependency already,
> >> because the only driver implementation is based on V4L2 and we'd like to
> >> keep the spec as close to V4L2 as possible, but it is not the same
> >> thing. So at the moment it looks like the V4L2 UAPI proposal is not
> >> super flexible. Alexandre said, that we can simply not implement some of
> >> the ioctls. Well, this definitely doesn't cover all the complexity like
> >> the structures and other subtle details.
> >
> > Mmm? Where did I say that? That sounds like a misunderstanding.
>
> Here is the quote from your email on March 17:
>
> > V4L2 has a larger UAPI surface because it manages more kinds of
> > devices, but drivers only need to implement the ioctls they need. For
> > the rest, they just return -ENOTTY, and evil actors are hopefully kept
> > at bay.
>
>
> >> Also adding the feature flags would probably defeat the stated purpose
> >> of switching to V4L2 UAPI anyway: the simplicity of the spec and of the
> >> V4L2 driver.
> >>
> >> So I have a lot of doubts about the feasibility of adding feature flags.
> >> If Alexandre and his team want the V4L2 UAPI as is, then looks like it
> >> is best to simply have two specs:
> >> 1. virtio-video for those interested in building from ground up with the
> >> security model appropriate for virtualization in mind. This is going to
> >> take time and not going to reach feature parity with V4L2 ever I think.
> >> I mean some old devices might never get support this way.
> >> 2. virtio-v4l2 as a shortcut for those interested in having feature
> >> parity with V4L2 fast. Like a compatibility layer. Probably this is
> >> going to be used in linux host + linux guest use-cases only. Maybe it
> >> gets obsoleted by the first spec in several years for most modern use-cases.
> >>
> >> Or maybe have these two cases within a single device spec as I wrote above.
> >>
> >> This makes a lot of sense to me. If V4L2 UAPI pass through is in fact
> >> only needed for compatibility, then this way we can avoid a lot of work
> >> going through all of the V4L2 and trying to find different subsets or
> >> trying to construct something, that is close to V4L2 UAPI, but doesn't
> >> compromise on the security. I'm not really interested in doing all this
> >> work because we're already more or less satisfied with the current
> >> state. We don't need feature parity with V4L2. On the other hand for
> >> Alexandre the feature-parity with V4L2 is clearly of higher priority,
> >> than all these subtle security model differences. In my opinion it also
> >> doesn't make sense to invest that much time in something, that looks
> >> like a compatibility layer. So it seems both of us are interested in
> >> avoiding all this extra work. Then I'd just prefer to have two different
> >> specs so that everyone can work according to their priorities.
> >>
> >>
> >>> Regarding the protocol: I think Linux-originating protocols (that can be
> >>> implemented on non-Linux setups) are fine, Linux-only protocols probably
> >>> not so much.
> >>
> >> Thanks for the information. Well, it looks like the V4L2 UAPI could be
> >> implemented on any platform unless it needs a completely new way of
> >> memory management since all the V4L2_MEMORY_* constants are going to be
> >> used already AFAIU.
> >>
> >>
> >>>>>>       a. So V4L2 subsystem and the current virtio-video driver are already
> >>>>>> reducing the complexity. And this seems as the right place to do this,
> >>>>>> because the complexity is caused by the amount of V4L2 use cases and its
> >>>>>> legacy. If somebody wants to use virtio-video in a Windows guest, they
> >>>>>> would prefer a simpler API, right? I think this use-case is not purely
> >>>>>> abstract at all.
> >>>>> The V4L2 subsystem is there to factorize code that can be shared
> >>>>> between drivers and manage their internal state. Our target is the
> >>>>> V4L2 UAPI, so a Windows driver needs not be concerned about these
> >>>>> details - it does what it would have done with virtio-video, and just
> >>>>> uses the V4L2 structures to communicate with the host instead of the
> >>>>> virtio-video ones.
> >>>> It can also reuse the virtio-video structures. So I think despite the
> >>>> ability to reuse V4L2 structures, having to implement a linux-specific
> >>>> interface would still be a bigger pain.
> >>> Hm. Do the v4l2 structures drag in too many adjacent things that need to
> >>> be implemented? Can we match the virtio-video structures from the current
> >>> proposal with some v4l2 structures and extract a common wrapper for
> >>> those that match, with a feature-bit controlled backend? It would be
> >>> fine if any of those backends supported a slightly different subset of
> >>> the common parts, as long as the parts implemented by both would be
> >>> enough to implement a working device. (Mostly thinking out loud here.)
> >>
> >> I don't think this is realistic unfortunately. On per ioctl level it is
> >> possible to disable some functionality probably, but the V4L2 structures
> >> are set in stone. We can only extend them.
> >>
> >>
> >>>>> The guest driver that I wrote is, I think, a good example of the
> >>>>> complexity you can expect in terms of guest driver size (as it is
> >>>>> pretty functional already with its 1000 and some LoCs). For the UAPI
> >>>>> complexity, the host device basically unpacks the information it needs
> >>>>> and rebuilds the V4L2 structures before calling into the host device,
> >>>>> and I don't see this process as more complex than the unpacking of
> >>>>> virtio-video structs which we also did in crosvm.
> >>>> Unfortunately our hypervisor doesn't support mapping random host pages
> >>>> in the guest. Static allocations of shared memory regions are possible.
> >>>> But then we have to tell V4L2 to allocate buffers there. Then we'll need
> >>>> a region per virtual device. This is just very tedious and inflexible.
> >>>> That's why we're mainly interested in having the guest pages sharing in
> >>>> the virtio video spec.
> >>> This really sounds like you'll want a different approach -- two
> >>> mechanisms covered by two feature bits might indeed be the way to go.
> >>
> >> Well, basically this is the way we have it now. I'm not sure what is
> >> Alexandre's plan with the V4L2 UAPI approach. And if this is going to be
> >> solved, the solution already doesn't look future-proof anyway unfortunately.
> >>
> >>
> >>>>> I hope I have somehow addressed your points. The main point here is to
> >>>>> discuss whether the V4L2 UAPI is a suitable transport for guest/host
> >>>>> accelerated codec work, regardless of what the guest or host
> >>>>> ultimately uses as UAPI. The goal of the PoC is to demonstrate that
> >>>>> this is a viable solution. This PoC is largely simplified by the fact
> >>>>> that V4L2 is used all along the way, but this is irrelevant - yes,
> >>>>> actual devices will likely talk to other APIs and maintain more state,
> >>>>> like a virtio-video device would do. What I want to demonstrate is
> >>>>> that we can send encoding work and receive a valid stream, and that it
> >>>>> is not costly, and only marginally more complex than our virtio-video
> >>>>> spec attempts.
> >>>>>
> >>>>> ... and we can support cameras too, but that's just a convenient
> >>>>> side-effect, not the ultimate solution to the camera virtualization
> >>>>> problem (that's for the camera folks to decide).
> >>>> Thanks for your answer!
> >>> Thanks everyone -- do you think the "two feature bits to cover different
> >>> approaches, but using a common infrastructure" idea could work? If yes,
> >>> I think that's the direction we should take. If we can implement this
> >>> with just one feature bit, that might also be a good route to extend it
> >>> later, but I'm not familiar enough with the whole infrastructure to make
> >>> any judgement here.
> >>
> >> Thanks for your suggestions. Hopefully we end up with a good solution.
> >
> > To summarize my position:
> >
> > * I am still not convinced that V4L2 is lacking from a security
> > perspective. It would take just one valid example to change my mind
> > (and no, the way the queues are named is not valid). And btw, if it
> > really introduces security issues, then this makes it invalid for
> > inclusion in virtio entirely, just not OpSy's hypervisor.
>
> Already being discussed separately...
>
> > * Having two overlapping specifications for video is overkill and will
> > just fragment virtio (as tempting as it is, I won't link to XKCD). I
> > strongly advise against that.
>
> I think they're not going to create more problems than virtio-blk,
> virtio-scsi and virtio-fs, for example.

At least those devices work at different layers, which makes them more
justifiable. virtio-video and virtio-v4l2 are just going to provide
the same API for video devices, only with different structures and
commands.

> The decision can be made like this:
> 1. You have a V4L2 device, you don't need any more processing, just want
> it inside a Linux/Android VM => use virtio-v4l2.
> 2. You don't have a V4L2 device, or your host is not Linux, or maybe
> your guest is not Linux/Android, or you want some extra processing
> on the host (say you have a third-party proprietary library or whatever)
> => use virtio-video.

That would make sense if 2. could not be done just as easily by also
using virtio-v4l2, which I believe it can be.




* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-25 16:04                                         ` Cornelia Huck
  2023-04-26  6:29                                           ` Alexandre Courbot
@ 2023-04-27 14:10                                           ` Alexander Gordeev
  2023-04-28  4:02                                             ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-27 14:10 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On 25.04.23 18:04, Cornelia Huck wrote:
>
> [I'm replying here, as that seems to be the last message in the thread,
> and my reply hopefully catches everyone interested here.]
>
> To do a very high level summary, we have (at least) two use cases for
> virtio-video, that unfortunately have quite different requirements. Both
> want to encode/decode video, but in different environments.
>
> - The "restricted" case: Priority is on security, and the attack surface
>    should be kept as small as possible, for example, by avoiding unneded
>    complexity in the interface. Fancy allocations and management should
>    be avoided. The required functionality is also quite clearly defined.
> - The "feature-rich" case: Priority is on enabling features, and being
>    able to re-use existing V4L2 support is considered a big plus. Both
>    device and driver implementations will be implemented in a full OS
>    environment, so all kind of helpers are already available.
>
> (This is not to say that one case does not care about functionality or
> security; it's mostly a case of different priorities and environments.)

I'm thinking about the latter as more of a "compatibility" case, but
"feature-rich" is also a good name.

> I had been hoping that it would be possible to find kind of a common
> ground between the two cases, but reading the thread, I'm not quite as
> hopeful anymore... if we really don't manage to find an approach to make
> the different requirements co-exist, a separate virtio-v4l2 device might
> be the way to go -- but I've not totally given up hope yet.

From our side I can say that moving from the current state even to a
well-defined subset of V4L2 would require a lot of work and bring
literally zero advantages for our use-case, while bringing some
disadvantages. I think we have made good progress so far and don't want
to give up these achievements, and now this is a great opportunity to do
even better because our priorities will not collide anymore.

On the other side I don't think Alexandre and his team are really
interested in doing the extra work of clearly defining a subset of
V4L2, writing a larger specification, going through all the hassle of
making guest page sharing work (again) and supporting this case in
their driver for us, all for something that is planned to be a very
simple device and driver. Please correct me if I'm wrong here.

And even if they say they agree to do the work with us... I'm sorry, but
you can probably see that our communication doesn't go smoothly. My
emails are forgotten, our use-case is clearly not a priority for
Alexandre, and my arguments seem to be treated as obstacles. If we have a
single device, we have to cooperate actively, and I have a lot of doubts
that this is possible at the moment. In this case I'd prefer to just make
room for everybody. Maybe we'll cooperate in a fruitful way again later;
the priorities or use-cases might change, for example. For us this is an
opportunity to finally update the virtio-video driver against the latest
state of the spec and hopefully make it V4L2 compliant.

> Some remarks from my side:
>
> - I'm not totally convinced that counting words is always a good proxy
>    for complexity -- an interface might be simple on paper, but if the
>    actual implementation would need to be quite involved to get it right,
>    we'd again have a lot of opportunity for mistakes.

Well, I agree that this is not the best possible benchmark. At least it
is simple. My idea was that our output as spec writers is words. I hope
at some point we'll have a better benchmark, or we'll just agree that
these structures contain a lot of completely irrelevant stuff and that
it is not immediately clear what is necessary and what is not. Maybe the
number of different definitions (structs, struct fields, enum members,
defines, etc.) would be a better benchmark?

> - How much of v4l2 does actually need to be in the device specification
>    for a driver to make potentially good use of it? Sure, being able to
>    directly map to v4l2 really gives a huge benefit, but is there a way
>    to extract a subset that's not too complex, but can be easily wrapped
>    for interfacing with v4l2? (Both interface and functionality wise.)
>    Even if that means that a driver would need to implement some kind of
>    shim, a layer that easily maps to v4l2 concepts would still be much
>    easier to implement than one that needs to map two quite different
>    interfaces. [I'm really relying on the good judgement of people
>    familiar with the interfaces here :)]

I don't have an answer at the moment unfortunately.

> - To which extent does security need to be baked into the device
>    specification? We should avoid footguns, and avoiding needless
>    complication is also a good idea, but while every new functionality
>    means more attack surface, it also enables more use cases. That
>    tension is hard to resolve; how much of it can we alleviate by making
>    things optional?

My opinion is that it is best not to compromise on security right from
the start and not to give up what we already have, because it is hard to
add it back later. It is best to define clear, granular interfaces from
the beginning, use feature flags where possible and also have many
different devices. Then it is easy both to add features and to reduce
the attack surface by disabling them. No device, no problem. I think
V4L2 does too many things with no clear way of disabling the various
features. Defining the stateful decoder and encoder interfaces is a
great thing, and we can already benefit from it. But I can see that this
work is not finished yet.

> I hope I have not muddied the waters here, but I'd really like to see an
> agreement on how to continue (with two different devices, if there is
> really no other way.)

Thanks for your suggestions!

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-26  5:52                                         ` Alexandre Courbot
@ 2023-04-27 14:20                                           ` Alexander Gordeev
  2023-04-28  3:22                                             ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-27 14:20 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 26.04.23 07:52, Alexandre Courbot wrote:
> On Mon, Apr 24, 2023 at 4:52 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 21.04.23 18:01, Alexander Gordeev wrote:
>>
>>> On 21.04.23 06:02, Alexandre Courbot wrote:
>>>
>>>> * I am still not convinced that V4L2 is lacking from a security
>>>> perspective. It would take just one valid example to change my mind
>>>> (and no, the way the queues are named is not valid). And btw, if it
>>>> really introduces security issues, then this makes it invalid for
>>>> inclusion in virtio entirely, just not OpSy's hypervisor.
>>>
>>>
>>> I'd like to start with this and then answer everything else later.
>>>
>>> Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with
>>> VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First,
>>> let's compare the word count to get a very rough estimate of complexity.
>>> I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the
>>> parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206
>>> words, they both use struct v4l2_buffer as a parameter. The struct takes
>>> 2716 words to be described. So the whole thing takes 3922 words. This is
>>> 6.7 times more, than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the
>>> definitions of the structs, it is also very obvious, that V4L2 UAPI is
>>> almost like an order of magnitude more complex.
>>
>>
>> I think, it is best to add all the steps necessary to reproduce my calculations just in case.
>>
>> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE is doing essentially the same thing as VIDIOC_QBUF+VIDIOC_DQBUF, so we're comparing apples to apples (if we don't forget to compare their parameters too).
>>
>> To get the word count for the VIRTIO_VIDEO_CMD_RESOURCE_QUEUE I opened the rendered PDF of the video section only from the first email in this thread. Here is the link: https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing . Then I scrolled to page 11 and copied everything related into a text file. This is around two pages in the PDF. Then I removed page numbers from the copied text and used 'wc -w' to count words.
>>
>> To get the word count for VIDIOC_QBUF+VIDIOC_DQBUF I opened this link: https://docs.kernel.org/userspace-api/media/v4l/vidioc-qbuf.html . Then I selected all the text except the table of contents and followed the same procedure.
>>
>> To get the word count for struct v4l2_buffer and other types, that are referenced from it, I opened this link: https://docs.kernel.org/userspace-api/media/v4l/buffer.html#struct-v4l2-buffer . Then I selected all the text except the table of contents and the text above struct v4l2_buffer definition. The rest is the same.
>>
>> Also it's quite obvious if you look at them how much bigger struct v4l2_buffer (including the referenced types) is compared to struct virtio_video_resource_queue.
>
> You are comparing not the complexity of the structures but the
> verbosity of their documentation, which are written in a different
> style, format, and by different people.

I agree to some extent. At least this benchmark is simple, and it
prompts one to actually go and look at the definitions, which IMO should
be enough to already see the difference. What could be a better
benchmark? Maybe counting the number of various fields, flags and enum
cases that one has to read through?

> And the V4L2 page also
> contains the description of memory types, which is part of another
> section in the virtio-video spec.

You mean only enum v4l2_memory? Or anything else too?

> There is no way to draw a meaningful
> conclusion from this.
>
> If you want to compare, do it with how the structures are actually
> used. Here is how you would queue an input buffer with virtio-video:
>
>    struct virtio_video_resource_queue queue_buf = {
>        .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
>        .stream_id = 42,
>        .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
>        .resource_id = 1,
>        .timestamp = 0x10,
>        .data_sizes = {
>          [0] = 0x1000,
>        },
>    };
>
> Now the same with virtio-v4l2:
>
>    struct virtio_v4l2_queue_buf queue_buf = {
>        .cmd = VIRTIO_V4L2_CMD_IOCTL,
>        .code = VIDIOC_QBUF,
>        .session_id = 42,
>        .buffer.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
>        .buffer.index = 1,
>        .buffer.timestamp.tv_usec = 0x10,
>        .buffer.memory = V4L2_MEMORY_MMAP,
>        .planes = {
>          [0] = { .bytesused = 0x1000 },
>        }
>    };
>
> In both cases, you pass a structure with some members set, and the
> rest to 0. The host receives basically the same thing - it's the same
> data! The only difference is how it is laid out.

How do I know this from the text? How do I verify that this is correct?
Can I be sure whether this code is going to work between any device and
any driver?

With virtio-video you can just go to the spec/header file, take the
struct definition and simply set all the fields.

With the V4L2 UAPI I don't see any other way except going through the
whole buffer.html file, filtering out potentially irrelevant stuff, then
doing some trials, and maybe looking into the device or driver code.
Maybe also asking you for advice?
I think it is worth trying to imagine you're a newcomer to the project,
or a security auditor.

So I think we can't draw conclusions about the spec quality from these
code samples.

> Also as mentioned by Bart, the apparent simplicity of
> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, which does not require a dequeue
> operation as the dequeue is sent with its response, is actually a
> fallacy: that design choice makes the specification simpler, but at
> the cost of more complexity on the device side and the potential to
> starve on command descriptors. By contrast, adopting the V4L2 model
> resulted in simpler code on both sides and no possibility to deadlock.
> That point could be addressed by revising the virtio-video spec, but
> then you get even closer to V4L2.

Hmm, I understand, thanks. Well, I can fix this in the spec. I have some
ideas. I think I'd better describe them in an answer to Bart's email.

>> Do we agree now, that V4L2 UAPI is not only marginally more complex?
>
> No, and I rest my case on the code samples above.
>
>>>
>>>
>>> Also please read:
>>>
>>> https://medium.com/starting-up-security/evidence-of-absence-8148958da092
>>>
>>
>> This reference probably needs a clarification. You argued, that V4L2 has a good track record so far. Here is the quote:
>>
>>> FWIW V4L2 has been in use for a looong time (including in Chromebooks
>>> that do secure playback) and I am not aware of fundamental security
>>> issues with it.
>>
>> But absence of found major security issues doesn't tell us much about the number of not found ones. Absence of evidence is not an evidence of absence. At the link above a former Director of Security at Facebook shares his thoughts about what could be a good evidence of absence of major security problems.
>
> You are just FUDing now.

These are well established security concepts AFAIK. Are you gaslighting?

> The exact same argument could be made about
> virtio-video. If you want to borrow from the security world, I can ask
> you how going with virtio-video when V4L2 exists is different from
> rolling your own crypto.

It is very different because obviously neither of them is crypto.

>> So a bug bounty program with high premiums that covers V4L2 would be a better argument in favor of *already written code* in my opinion. Not for new code. Also probably it is also an argument in favor of the spec, that is the V4L2 UAPI. Like that it is polished enough. Not so sure about that though.
>>
>> There actually are several bug bounty programs, that cover the kernel. These are Google's kctf, ZDI's pwn2own, and zerodium AFAIK. However the premiums are not even close to the ones mentioned in my reference. Anyway this means, that using *the existing V4L2 code in the kernel* is probably OK. But this creates some limitations if we want the actual code to still be covered with these bug bounties, right? This means, that the host OS has to be Linux and the actual hardware has to be exposed through a stable V4L2 driver, that is in mainline for some time, and there has to be no or little processing on top. For us this is not possible unfortunately. In the end both things could be secure:
>>
>> 1. V4L2 pass through can be secure because of the bug bounty programs and a lot of attention to the kernel in general.
>> 2. For the new code this doesn't work, so the spec should be as simple and device-centric as possible. Because, all other things being equal, there are fewer errors in simpler programs. So defining a subset of V4L2 UAPI including the data types looks like a good idea to me. The stateful decoder interface, that you point to, does not define a subset in the data types.
>
> Wouldn't you have exactly the same problem by using a new guest-host
> protocol?

I think this is far-fetched, especially if our new protocol is
intentionally kept close to the reference protocol with a few things
made better and simpler. That would be my preferred route.

> Are you going to start a bounty program for virtio-video to
> get an assurance that it is secure?

I'm not sure it is possible because the device is proprietary. At least
the driver is going to be covered hopefully. So for the device we're
going to rely on simplicity, tests and audits. But who knows. I'm not
the one to make this decision.

>> This is basically my reasoning.
>>
>> Also these two specs don't need to compete with each other. They have different limitations and they are for different audiences. If you check the XKCD's comic, it is about competing standards.
>
> They allow exactly the same thing (virtualization of video
> decoding/encoding) and there isn't any use-case of virtio-video that
> could not be covered equally well by V4L2. Reinventing a new video
> specification is pointless and will lead to unneeded fragmentation.

I don't agree with these statements.

> We should also expect reticence from the Linux community to upstream
> two virtual video drivers that basically do the same thing.

This is hypothetical.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 10:13                                         ` Bartłomiej Grzesik
@ 2023-04-27 14:34                                           ` Alexander Gordeev
  2023-04-28  3:22                                             ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-27 14:34 UTC (permalink / raw)
  To: Bartłomiej Grzesik
  Cc: Bartłomiej Grzesik, Alexandre Courbot, Cornelia Huck,
	virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve, srosek, zyta, hmazur,
	mikrawczyk

On 27.04.23 12:13, Bartłomiej Grzesik wrote:
> Hi Alexander
>
> On Wed, Apr 26, 2023 at 6:00 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> Hi Bartłomiej,
>>
>> On 21.04.23 13:49, Bartłomiej Grzesik wrote:
>>> +CC chromeos-arc-video-eng team that also works on virtio-video
>>>
>>> Hi everyone!
>>>
>>>   From the experience of working on virtio-video I can definitely agree
>>> with Alex Courbot, that moving to virtio-v4l2 will be a great move.
>>> This move will not only simplify things a lot but also allow us things like
>>> vhost-net like implementation for some devices (however this is
>>> thinking way ahead and linux only).
>>>
>>> One added benefit of this move that I'd like to point out now, that
>>> probably
>>> haven't been mentioned before, is moving to asynchronous resource queue
>>> call. Previously `VIRTIO_VIDEO_CMD_RESOURCE_QUEUE` have been
>>> synchronous and caused one hard to debug bug caused by this flawed
>>> design. During the command execution the virtio queue descriptors are
>>> blocked, potentially leading to dead locking the device. The implementation
>>> of virtio-v4l2 (even as is - btw nice work Alex!) eliminates this issue
>>> by moving to asynchronous response of the resource queue (VIDIOC_QBUF).
>>
>> Thanks for your valuable feedback! Could you please share some details
>> about the bug? That would be very helpful. I'm working on the next
>> version of the virtio-video draft, so I can change it there. I like the
>> idea to use V4L2 as a reference, so we should probably do it like it is
>> done there, only simpler. Still it would be interesting to know the
>> details, because we didn't have issues with the current design.
>
> In this bug, an app preallocated and enqueued all output buffers (to the
> CAPTURE queue). This is in line with V4L2 and, in the case of a virtualized
> video stack, helps with latency. Those enqueued buffers hold virtqueue
> descriptors until they are filled with a decoded frame. While for one stream
> this is not an issue, for more simultaneous streams it quickly becomes a
> serious problem. In our case all descriptors from the virtqueue were consumed
> by enqueued output buffers and no other command could be issued
> to the hypervisor. This deadlocked the entire driver by starving it of
> descriptors - even the STREAM_DESTROY command could not be issued, and the
> only solution was to reboot the guest.
>
> While it is easily solvable by adjusting the size of the virtqueue, it
> is a flaw in this design. The virtqueue size would always have to be a
> function of the maximum number of supported streams, rising rather quickly.
>
> I remember having a few thoughts on how it could be solved, and I think
> that removing the need to block those descriptors is the best approach.
> One could argue that preallocating descriptors for this purpose, or
> splitting the command queue into input, output and control queues (or
> per-stream queues), might be a viable solution. However, it would only
> delay the issue or could cause other streams to "starve".

Thank you for the detailed description. This makes total sense to me.
I thought about this problem for some time and discussed it with my
colleagues. Indeed, it looks like it would be best to stop blocking these
descriptors. We can add more queues, but this doesn't look scalable
either. There are several ways to unblock the descriptors, I think.
First, we can do the same thing as in V4L2: add a separate (blocking?)
DEQUEUE command. But then we could theoretically have the same problem
with DRAIN, because it also blocks. So why not just use the event queue
to receive the completion events for both QUEUE and DRAIN commands
asynchronously? One could argue that the errors should maybe come out
of band. But at the same time we already use the event queue to deliver
dynamic resolution change events, which clearly should be delivered with
the flow of buffers.
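
Just to make the direction more concrete, a completion event could look
roughly like this. This is purely a sketch: the event type and all field
names below are invented and are not part of the current spec draft.

    #include <stdint.h>

    /* Hypothetical completion event, delivered on the existing event queue.
     * RESOURCE_QUEUE and DRAIN would then complete on the command queue right
     * away, so no descriptor is held while the device works on the buffer. */
    struct virtio_video_event_cmd_done {
            uint32_t event_type;   /* e.g. a new VIRTIO_VIDEO_EVENT_CMD_DONE */
            uint32_t stream_id;    /* stream the completion belongs to */
            uint32_t queue_type;   /* INPUT or OUTPUT queue */
            uint32_t resource_id;  /* resource that was consumed or filled */
            uint64_t timestamp;    /* copied from the original QUEUE command */
            uint32_t data_size;    /* bytes of valid data for output buffers */
            uint32_t flags;        /* error / end-of-stream indication */
    };

With something like this the command queue descriptors are recycled as soon
as the device has accepted the command, so the number of buffers that can be
queued is no longer bounded by the virtqueue size.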

> Porting asynchronous dequeueing of resources would bring the virtio-video
> extremely close to virtio-v4l2, and therefore I support Alexandre's idea to use
> v4l2 as a protocol.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 13:23                                       ` Alexandre Courbot
@ 2023-04-27 15:12                                         ` Alexander Gordeev
  2023-04-28  3:24                                           ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-27 15:12 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 27.04.23 15:23, Alexandre Courbot wrote:
> On Thu, Apr 27, 2023 at 12:52 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 21.04.23 06:02, Alexandre Courbot wrote:
>>> On Wed, Apr 19, 2023 at 4:39 PM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>> On 17.04.23 16:43, Cornelia Huck wrote:
>>>>> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>
>>>>>> OpenSynergy, the company that I work for, develops a proprietary
>>>>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>>>>>> have our proprietary device implementations, but overall our goal is to
>>>>>> bring open standards into these quite closed domains and we're betting
>>>>>> big on virtio. The idea is to run safety-critical functions like cockpit
>>>>>> controller alongside with multimedia stuff in different VMs on the same
>>>>>> physical board. Right now they have it on separate physical devices. So
>>>>>> they already have maximum isolation. And we're trying to make this
>>>>>> equally safe on a single board. The benefit is the reduced costs and
>>>>>> some additional features. Of course, we also need features here, but at
>>>>>> the same time security and ease of certification are among the top of
>>>>>> our priorities. Nobody wants cars or planes to have security problems,
>>>>>> right? Also nobody really needs DVB and even more exotic devices in cars
>>>>>> and planes AFAIK.
>>>>>>
>>>>>> For the above mentioned reasons our COQOS hypervisor is running on bare
>>>>>> metal. Also memory management for the guests is mostly static. It is
>>>>>> possible to make a shared memory region between a device and a driver
>>>>>> managed by device in advance. But definitely no mapping of random host
>>>>>> pages on the fly is supported.
>>>>>>
>>>>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>>>>>> in its own virtualized environment, right? Both the host and guest are
>>>>>> linux. In this case I totally understand why V4L2 UAPI pass-through
>>>>>> feels like a right move. I guess, you'd like to make the switch to
>>>>>> virtualized apps as seemless as possible for your users. If they can't
>>>>>> use their DVBs anymore, they complain. And adding the virtualization
>>>>>> makes the whole thing more secure anyway. So I understand the desire to
>>>>>> have the range of supported devices as broad as possible. It is also
>>>>>> understandable that priorities are different with desktop
>>>>>> virtualization. Also I'm not trying to diminish the great work, that you
>>>>>> have done. It is just that from my perspective this looks like a step in
>>>>>> the wrong direction because of the mentioned concerns. So I'm going to
>>>>>> continue being a skeptic here, sorry.
>>>>>>
>>>>>> Of course, I don't expect that you continue working on the old approach
>>>>>> now as you have put that many efforts into the V4L2 UAPI pass-through.
>>>>>> So I think it is best to do the evolutionary changes in scope of virtio
>>>>>> video device specification, and create a new device specification
>>>>>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
>>>>>> continue the virtio-video development. In fact I already started making
>>>>>> draft v7 of the spec according to the comments. I hope it will be ready
>>>>>> for review soon.
>>>>>>
>>>>>> I hope this approach will also help fix issues with virtio-video spec
>>>>>> and driver development misalignment as well as V4L2 compliance issues
>>>>>> with the driver. I believe the problems were caused partly by poor
>>>>>> communication between us and by misalignment of our development cycles,
>>>>>> not by the driver complexity.
>>>>>>
>>>>>> So in my opinion it is OK to have different specs with overlapping
>>>>>> functionality for some time. My only concern is if this would be
>>>>>> accepted by the community and the committee. How the things usually go
>>>>>> here: preferring features and tolerating possible security issues or the
>>>>>> other way around? Also how acceptable is having linux-specific protocols
>>>>>> at all?
>>>>> My main question is: What would be something that we can merge as a
>>>>> spec, that would either cover the different use cases already, or that
>>>>> could be easily extended to cover the use cases it does not handle
>>>>> initially?
>>>>>
>>>>> For example, can some of the features that would be useful in crosvm be
>>>>> tucked behind some feature bit(s), so that the more restricted COQOS
>>>>> hypervisor would simply not offer them? (Two feature bits covering two
>>>>> different mechanisms, like the current approach and the v4l2 approach,
>>>>> would also be good, as long as there's enough common ground between the
>>>>> two.)
>>>>>
>>>>> If a staged approach (adding features controled by feature bits) would
>>>>> be possible, that would be my preferred way to do it.
>>>>
>>>> Hmm, I see several ways how we can use the feature flags:
>>>> 1. Basically making two feature flags: one for the current video spec
>>>> and one for the V4L2 UAPI pass through. Kind of the same as having two
>>>> different specs, but within one device. Not sure which way is better.
>>>> Probably having two separate devices would be easier to review and merge.
>>>
>>> Having two different devices with their own IDs would indeed be less
>>> confusing than using feature bits.
>>>
>>> That being said, the whole point of proposing virtio-v4l2 is to end up
>>> with *less* specification, not more. Having two concurrent and largely
>>> overlapping approaches will result in fragmentation and duplicated
>>> work, so my suggestion would be to decide on one or the other and
>>> stick to it.
>>
>> Hmm, Enrico pointed out, that having virtio-v4l2 would also be good
>> because of much better compatibility with Android right now. I don't
>> think the specification length should be our ultimate goal. Cornelia
>> said, that her ultimate goal is to have a spec everyone is happy with,
>> regardless on how we arrive there. Well, I can only say, that I also
>> think this should be our goal.
>
> Try to put yourself into the shoes of someone who needs to write a new
> video device using virtio. Oh, there are two ways to do it. This guest
> OS only supports virtio-video. But this guest OS only supports
> virtio-v4l2. Great, now you need to support two interfaces with your
> device, or write two different devices.

I think this is a hypothetical issue. IMO it makes sense to use V4L2
UAPI only on Linux host with V4L2 devices + Linux/Android guest.
virtio-video driver is already available for Linux. We already know our
priorities and limitations, so building a decision tree for a potential
device developer would be very easy.

>>> * Having two overlapping specifications for video is overkill and will
>>> just fragment virtio (as tempting as it is, I won't link to XKCD). I
>>> strongly advise against that.
>>
>> I think they're not going to create more problems, than virtio-blk,
>> virtio-scsi and virtio-fs, for example.
>
> At least these devices work at different layers, that makes them more
> justifiable. virtio-video and virtio-v4l2 are just going to provide
> the same API for video devices, only with different structures and
> commands.

Hmm, I think virtio-blk and virtio-scsi work on the same level, don't
they? They could also say this is basically the same thing.

>> The decision can be made like this:
>> 1. You have a V4L2 device, you don't need any more processing, just want
>> it inside a Linux/Android VM => use virtio-v4l2.
>> 2. You don't have a V4L2 device, or your host is not Linux, or your
>> maybe your guest is not Linux/Android, or you want some extra processing
>> on the host (say you have a third-party proprietary library or whatever)
>> => use virtio-video.
>
> That would make sense if 2. could not be done just as easily by also
> using virtio-v4l2, which I believe it can be.

I'm sorry, I think a potential developer would just look into V4L2 docs,
see struct v4l2_buffer (AFAIU with patches in the spec for the
host/guest/object memory types on top) and run back to the cleanliness
and simplicity of virtio-video. That's basically my story. :)
Iterating on formats in VIDIOC_ENUM_FMT looks quite weird to me too.
Thankfully this is not a big deal usually.
Also it depends on the use-case of a potential developer: whether the
priority is on security, and whether they only need video
decoding/encoding or not.
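
For context, the enumeration pattern I'm referring to is the usual V4L2
one, roughly as below (just a sketch: the fd and the MPLANE buffer type
are arbitrary examples, error handling is omitted).

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Probe the formats of a decoder's OUTPUT queue by incrementing .index
     * until VIDIOC_ENUM_FMT fails; EINVAL marks the end of the list. */
    void enum_output_formats(int fd)
    {
            struct v4l2_fmtdesc fmt;

            for (int i = 0; ; i++) {
                    memset(&fmt, 0, sizeof(fmt));
                    fmt.index = i;
                    fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
                    if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) != 0)
                            break;
                    printf("format %d: %s\n", i, (const char *)fmt.description);
            }
    }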

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 14:20                                           ` Alexander Gordeev
@ 2023-04-28  3:22                                             ` Alexandre Courbot
  2023-04-28  8:22                                               ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-28  3:22 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Thu, Apr 27, 2023 at 11:20 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 26.04.23 07:52, Alexandre Courbot wrote:
> > On Mon, Apr 24, 2023 at 4:52 PM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> On 21.04.23 18:01, Alexander Gordeev wrote:
> >>
> >>> On 21.04.23 06:02, Alexandre Courbot wrote:
> >>>
> >>>> * I am still not convinced that V4L2 is lacking from a security
> >>>> perspective. It would take just one valid example to change my mind
> >>>> (and no, the way the queues are named is not valid). And btw, if it
> >>>> really introduces security issues, then this makes it invalid for
> >>>> inclusion in virtio entirely, just not OpSy's hypervisor.
> >>>
> >>>
> >>> I'd like to start with this and then answer everything else later.
> >>>
> >>> Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with
> >>> VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First,
> >>> let's compare the word count to get a very rough estimate of complexity.
> >>> I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the
> >>> parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206
> >>> words, they both use struct v4l2_buffer as a parameter. The struct takes
> >>> 2716 words to be described. So the whole thing takes 3922 words. This is
> >>> 6.7 times more, than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the
> >>> definitions of the structs, it is also very obvious, that V4L2 UAPI is
> >>> almost like an order of magnitude more complex.
> >>
> >>
> >> I think, it is best to add all the steps necessary to reproduce my calculations just in case.
> >>
> >> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE is doing essentially the same thing as VIDIOC_QBUF+VIDIOC_DQBUF, so we're comparing apples to apples (if we don't forget to compare their parameters too).
> >>
> >> To get the word count for the VIRTIO_VIDEO_CMD_RESOURCE_QUEUE I opened the rendered PDF of the video section only from the first email in this thread. Here is the link: https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing . Then I scrolled to page 11 and copied everything related into a text file. This is around two pages in the PDF. Then I removed page numbers from the copied text and used 'wc -w' to count words.
> >>
> >> To get the word count for VIDIOC_QBUF+VIDIOC_DQBUF I opened this link: https://docs.kernel.org/userspace-api/media/v4l/vidioc-qbuf.html . Then I selected all the text except the table of contents and followed the same procedure.
> >>
> >> To get the word count for struct v4l2_buffer and other types, that are referenced from it, I opened this link: https://docs.kernel.org/userspace-api/media/v4l/buffer.html#struct-v4l2-buffer . Then I selected all the text except the table of contents and the text above struct v4l2_buffer definition. The rest is the same.
> >>
> >> Also it's quite obvious if you look at them how much bigger struct v4l2_buffer (including the referenced types) is compared to struct virtio_video_resource_queue.
> >
> > You are comparing not the complexity of the structures but the
> > verbosity of their documentation, which are written in a different
> > style, format, and by different people.
>
> I agree to some extent. At least this benchmark is simple, and it
> prompts one to actually go and look at the definitions, which IMO should
> be enough to already see the difference. What could be a better
> benchmark? Maybe counting the number of various fields, flags and enum
> cases that one has to read through?

I gave you another point of comparison literally five lines further down in my email.

>
> > And the V4L2 page also
> > contains the description of memory types, which is part of another
> > section in the virtio-video spec.
>
> You mean only enum v4l2_memory? Or anything else too?
>
> > There is no way to draw a meaningful
> > conclusion from this.
> >
> > If you want to compare, do it with how the structures are actually
> > used. Here is how you would queue an input buffer with virtio-video:
> >
> >    struct virtio_video_resource_queue queue_buf = {
> >        .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
> >        .stream_id = 42,
> >        .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
> >        .resource_id = 1,
> >        .timestamp = 0x10,
> >        .data_sizes = {
> >          [0] = 0x1000,
> >        },
> >    };
> >
> > Now the same with virtio-v4l2:
> >
> >    struct virtio_v4l2_queue_buf queue_buf = {
> >        .cmd = VIRTIO_V4L2_CMD_IOCTL,
> >        .code = VIDIOC_QBUF,
> >        .session_id = 42,
> >        .buffer.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
> >        .buffer.index = 1,
> >        .buffer.timestamp.tv_usec = 0x10,
> >        .buffer.memory = V4L2_MEMORY_MMAP,
> >        .planes = {
> >          [0] = { .bytesused = 0x1000 },
> >        }
> >    };
> >
> > In both cases, you pass a structure with some members set, and the
> > rest to 0. The host receives basically the same thing - it's the same
> > data! The only difference is how it is laid out.
>
> How do I know this from the text? How do I verify that this is correct?
> Can I be sure whether this code is going to work between any device and
> any driver?
>
> With virtio-video you can just go to the spec/header file, take the
> struct definition and simply set all the fields.
>
> With the V4L2 UAPI I don't see any other way except going through the
> whole buffer.html file, filtering out potentially irrelevant stuff, then
> doing some trials, and maybe looking into the device or driver code.
> Maybe also asking you for advice?

The relevant fields for each decoding operation are clearly detailed
in https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html,
which I already linked to multiple times.

> I think it is worth trying to imagine you're a newcomer to the project,
> or a security auditor.
>
> So I think we can't draw conclusions about the spec quality from these
> code samples.
>
> > Also as mentioned by Bart, the apparent simplicity of
> > VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, which does not require a dequeue
> > operation as the dequeue is sent with its response, is actually a
> > fallacy: that design choice makes the specification simpler, but at
> > the cost of more complexity on the device side and the potential to
> > starve on command descriptors. By contrast, adopting the V4L2 model
> > resulted in simpler code on both sides and no possibility to deadlock.
> > That point could be addressed by revising the virtio-video spec, but
> > then you get even closer to V4L2.
>
> Hmm, I understand, thanks. Well, I can fix this in the spec. I have some
> ideas. I think I'd better describe them in an answer to Bart's email.
>
> >> Do we agree now, that V4L2 UAPI is not only marginally more complex?
> >
> > No, and I rest my case on the code samples above.
> >
> >>>
> >>>
> >>> Also please read:
> >>>
> >>> https://medium.com/starting-up-security/evidence-of-absence-8148958da092
> >>>
> >>
> >> This reference probably needs a clarification. You argued, that V4L2 has a good track record so far. Here is the quote:
> >>
> >>> FWIW V4L2 has been in use for a looong time (including in Chromebooks
> >>> that do secure playback) and I am not aware of fundamental security
> >>> issues with it.
> >>
> >> But absence of found major security issues doesn't tell us much about the number of not found ones. Absence of evidence is not an evidence of absence. At the link above a former Director of Security at Facebook shares his thoughts about what could be a good evidence of absence of major security problems.
> >
> > You are just FUDing now.
>
> These are well established security concepts AFAIK. Are you gaslighting?
>
> > The exact same argument could be made about
> > virtio-video. If you want to borrow from the security world, I can ask
> > you how going with virtio-video when V4L2 exists is different from
> > rolling your own crypto.
>
> It is very different because obviously neither of them is crypto.
>
> >> So a bug bounty program with high premiums that covers V4L2 would be a better argument in favor of *already written code* in my opinion. Not for new code. Also probably it is also an argument in favor of the spec, that is the V4L2 UAPI. Like that it is polished enough. Not so sure about that though.
> >>
> >> There actually are several bug bounty programs, that cover the kernel. These are Google's kctf, ZDI's pwn2own, and zerodium AFAIK. However the premiums are not even close to the ones mentioned in my reference. Anyway this means, that using *the existing V4L2 code in the kernel* is probably OK. But this creates some limitations if we want the actual code to still be covered with these bug bounties, right? This means, that the host OS has to be Linux and the actual hardware has to be exposed through a stable V4L2 driver, that is in mainline for some time, and there has to be no or little processing on top. For us this is not possible unfortunately. In the end both things could be secure:
> >>
> >> 1. V4L2 pass through can be secure because of the bug bounty programs and a lot of attention to the kernel in general.
> >> 2. For the new code this doesn't work, so the spec should be as simple and device-centric as possible. Because, all other things being equal, there are fewer errors in simpler programs. So defining a subset of V4L2 UAPI including the data types looks like a good idea to me. The stateful decoder interface, that you point to, does not define a subset in the data types.
> >
> > Wouldn't you have exactly the same problem by using a new guest-host
> > protocol?
>
> I think this is far-fetched, especially if our new protocol is
> intentionally kept close to the reference protocol with a few things
> made better and simpler. That would be my preferred route.
>
> > Are you going to start a bounty program for virtio-video to
> > get an assurance that it is secure?
>
> I'm not sure it is possible because the device is proprietary. At least
> the driver is going to be covered hopefully. So for the device we're
> going to rely on simplicity, tests and audits. But who knows. I'm not
> the one to make this decision.
>
> >> This is basically my reasoning.
> >>
> >> Also these two specs don't need to compete with each other. They have different limitations and they are for different audiences. If you check the XKCD's comic, it is about competing standards.
> >
> > They allow exactly the same thing (virtualization of video
> > decoding/encoding) and there isn't any use-case of virtio-video that
> > could not be covered equally well by V4L2. Reinventing a new video
> > specification is pointless and will lead to unneeded fragmentation.
>
> I don't agree with these statements.
>
> > We should also expect reticence from the Linux community to upstream
> > two virtual video drivers that basically do the same thing.
>
> This is hypothetical.
>
> --
> Alexander Gordeev
> Senior Software Engineer
>
> OpenSynergy GmbH
> Rotherstr. 20, 10245 Berlin
>
> Phone: +49 30 60 98 54 0 - 88
> Fax: +49 (30) 60 98 54 0 - 99
> EMail: alexander.gordeev@opensynergy.com
>
> www.opensynergy.com
>
> Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
> Geschäftsführer/Managing Director: Régis Adjamah
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 14:34                                           ` Alexander Gordeev
@ 2023-04-28  3:22                                             ` Alexandre Courbot
  2023-04-28  7:57                                               ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-28  3:22 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Bartłomiej Grzesik, Bartłomiej Grzesik, Cornelia Huck,
	virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve, srosek, zyta, hmazur,
	mikrawczyk

On Thu, Apr 27, 2023 at 11:35 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 27.04.23 12:13, Bartłomiej Grzesik wrote:
> > Hi Alexander
> >
> > On Wed, Apr 26, 2023 at 6:00 PM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> Hi Bartłomiej,
> >>
> >> On 21.04.23 13:49, Bartłomiej Grzesik wrote:
> >>> +CC chromeos-arc-video-eng team that also works on virtio-video
> >>>
> >>> Hi everyone!
> >>>
> >>>   From the experience of working on virtio-video I can definitely agree
> >>> with Alex Courbot, that moving to virtio-v4l2 will be a great move.
> >>> This move will not only simplify things a lot but also allow us things like
> >>> vhost-net like implementation for some devices (however this is
> >>> thinking way ahead and linux only).
> >>>
> >>> One added benefit of this move that I'd like to point out now, and that
> >>> probably hasn't been mentioned before, is moving to an asynchronous
> >>> resource queue call. Previously `VIRTIO_VIDEO_CMD_RESOURCE_QUEUE` has been
> >>> synchronous and caused one hard-to-debug bug due to this flawed
> >>> design. During the command execution the virtio queue descriptors are
> >>> blocked, potentially leading to deadlocking the device. The implementation
> >>> of virtio-v4l2 (even as is - btw nice work Alex!) eliminates this issue
> >>> by moving to an asynchronous response of the resource queue (VIDIOC_QBUF).
> >>
> >> Thanks for your valuable feedback! Could you please share some details
> >> about the bug? That would be very helpful. I'm working on the next
> >> version of the virtio-video draft, so I can change it there. I like the
> >> idea to use V4L2 as a reference, so we should probably do it like it is
> >> done there, only simpler. Still it would be interesting to know the
> >> details, because we didn't have issues with the current design.
> >
> > In this bug, an app preallocated and enqueued all output buffers (to the
> > CAPTURE queue). This is in line with V4L2 and, in the case of a virtualized
> > video stack, helps with latency. Those enqueued buffers hold virtqueue
> > descriptors until they are filled with a decoded frame. While for one stream
> > this is not an issue, for more simultaneous streams it quickly becomes a
> > serious problem. In our case all descriptors from the virtqueue were consumed
> > by enqueued output buffers and no other command could be issued
> > to the hypervisor. This deadlocked the entire driver by starving it of
> > descriptors - even the STREAM_DESTROY command could not be issued, and the
> > only solution was to reboot the guest.
> >
> > While it is easily solvable by adjusting the size of the virtqueue, it
> > is a flaw in this design. The virtqueue size would always have to be a
> > function of the maximum number of supported streams, rising rather quickly.
> >
> > I remember having a few thoughts on how it could be solved, and I think
> > that removing the need to block those descriptors is the best approach.
> > One could argue that preallocating descriptors for this purpose, or
> > splitting the command queue into input, output and control queues (or
> > per-stream queues), might be a viable solution. However, it would only
> > delay the issue or could cause other streams to "starve".
>
> Thank you for the detailed description. This makes total sense to me.
> I thought about this problem for some time and discussed it with my
> colleagues. Indeed, it looks like it would be best to stop blocking these
> descriptors. We can add more queues, but this doesn't look scalable
> either. There are several ways to unblock the descriptors, I think.
> First, we can do the same thing as in V4L2: add a separate (blocking?)
> DEQUEUE command. But then we could theoretically have the same problem
> with DRAIN, because it also blocks. So why not just use the event queue
> to receive the completion events for both QUEUE and DRAIN commands
> asynchronously? One could argue that the errors should maybe come out
> of band. But at the same time we already use the event queue to deliver
> dynamic resolution change events, which clearly should be delivered with
> the flow of buffers.

FWIW that's exactly how virtio-v4l2 handles things.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 15:12                                         ` Alexander Gordeev
@ 2023-04-28  3:24                                           ` Alexandre Courbot
  2023-04-28  8:31                                             ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-28  3:24 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Fri, Apr 28, 2023 at 12:12 AM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 27.04.23 15:23, Alexandre Courbot wrote:
> > On Thu, Apr 27, 2023 at 12:52 AM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> On 21.04.23 06:02, Alexandre Courbot wrote:
> >>> On Wed, Apr 19, 2023 at 4:39 PM Alexander Gordeev
> >>> <alexander.gordeev@opensynergy.com> wrote:
> >>>>
> >>>> On 17.04.23 16:43, Cornelia Huck wrote:
> >>>>> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >>>>>
> >>>>>> OpenSynergy, the company that I work for, develops a proprietary
> >>>>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
> >>>>>> have our proprietary device implementations, but overall our goal is to
> >>>>>> bring open standards into these quite closed domains and we're betting
> >>>>>> big on virtio. The idea is to run safety-critical functions like cockpit
> >>>>>> controller alongside with multimedia stuff in different VMs on the same
> >>>>>> physical board. Right now they have it on separate physical devices. So
> >>>>>> they already have maximum isolation. And we're trying to make this
> >>>>>> equally safe on a single board. The benefit is the reduced costs and
> >>>>>> some additional features. Of course, we also need features here, but at
> >>>>>> the same time security and ease of certification are among the top of
> >>>>>> our priorities. Nobody wants cars or planes to have security problems,
> >>>>>> right? Also nobody really needs DVB and even more exotic devices in cars
> >>>>>> and planes AFAIK.
> >>>>>>
> >>>>>> For the above mentioned reasons our COQOS hypervisor is running on bare
> >>>>>> metal. Also memory management for the guests is mostly static. It is
> >>>>>> possible to make a shared memory region between a device and a driver
> >>>>>> managed by device in advance. But definitely no mapping of random host
> >>>>>> pages on the fly is supported.
> >>>>>>
> >>>>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
> >>>>>> in its own virtualized environment, right? Both the host and guest are
> >>>>>> linux. In this case I totally understand why V4L2 UAPI pass-through
> >>>>>> feels like a right move. I guess, you'd like to make the switch to
> >>>>>> virtualized apps as seemless as possible for your users. If they can't
> >>>>>> use their DVBs anymore, they complain. And adding the virtualization
> >>>>>> makes the whole thing more secure anyway. So I understand the desire to
> >>>>>> have the range of supported devices as broad as possible. It is also
> >>>>>> understandable that priorities are different with desktop
> >>>>>> virtualization. Also I'm not trying to diminish the great work, that you
> >>>>>> have done. It is just that from my perspective this looks like a step in
> >>>>>> the wrong direction because of the mentioned concerns. So I'm going to
> >>>>>> continue being a skeptic here, sorry.
> >>>>>>
> >>>>>> Of course, I don't expect that you continue working on the old approach
> >>>>>> now as you have put that many efforts into the V4L2 UAPI pass-through.
> >>>>>> So I think it is best to do the evolutionary changes in scope of virtio
> >>>>>> video device specification, and create a new device specification
> >>>>>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
> >>>>>> continue the virtio-video development. In fact I already started making
> >>>>>> draft v7 of the spec according to the comments. I hope it will be ready
> >>>>>> for review soon.
> >>>>>>
> >>>>>> I hope this approach will also help fix issues with virtio-video spec
> >>>>>> and driver development misalignment as well as V4L2 compliance issues
> >>>>>> with the driver. I believe the problems were caused partly by poor
> >>>>>> communication between us and by misalignment of our development cycles,
> >>>>>> not by the driver complexity.
> >>>>>>
> >>>>>> So in my opinion it is OK to have different specs with overlapping
> >>>>>> functionality for some time. My only concern is if this would be
> >>>>>> accepted by the community and the committee. How the things usually go
> >>>>>> here: preferring features and tolerating possible security issues or the
> >>>>>> other way around? Also how acceptable is having linux-specific protocols
> >>>>>> at all?
> >>>>> My main question is: What would be something that we can merge as a
> >>>>> spec, that would either cover the different use cases already, or that
> >>>>> could be easily extended to cover the use cases it does not handle
> >>>>> initially?
> >>>>>
> >>>>> For example, can some of the features that would be useful in crosvm be
> >>>>> tucked behind some feature bit(s), so that the more restricted COQOS
> >>>>> hypervisor would simply not offer them? (Two feature bits covering two
> >>>>> different mechanisms, like the current approach and the v4l2 approach,
> >>>>> would also be good, as long as there's enough common ground between the
> >>>>> two.)
> >>>>>
> >>>>> If a staged approach (adding features controled by feature bits) would
> >>>>> be possible, that would be my preferred way to do it.
> >>>>
> >>>> Hmm, I see several ways how we can use the feature flags:
> >>>> 1. Basically making two feature flags: one for the current video spec
> >>>> and one for the V4L2 UAPI pass through. Kind of the same as having two
> >>>> different specs, but within one device. Not sure which way is better.
> >>>> Probably having two separate devices would be easier to review and merge.
> >>>
> >>> Having two different devices with their own IDs would indeed be less
> >>> confusing than using feature bits.
> >>>
> >>> That being said, the whole point of proposing virtio-v4l2 is to end up
> >>> with *less* specification, not more. Having two concurrent and largely
> >>> overlapping approaches will result in fragmentation and duplicated
> >>> work, so my suggestion would be to decide on one or the other and
> >>> stick to it.
> >>
> >> Hmm, Enrico pointed out, that having virtio-v4l2 would also be good
> >> because of much better compatibility with Android right now. I don't
> >> think the specification length should be our ultimate goal. Cornelia
> >> said, that her ultimate goal is to have a spec everyone is happy with,
> >> regardless on how we arrive there. Well, I can only say, that I also
> >> think this should be our goal.
> >
> > Try to put yourself into the shoes of someone who needs to write a new
> > video device using virtio. Oh, there are two ways to do it. This guest
> > OS only supports virtio-video. But this guest OS only supports
> > virtio-v4l2. Great, now you need to support two interfaces with your
> > device, or write two different devices.
>
> I think this is a hypothetical issue.

Issues have a strange tendency to become hypothetical when it's
convenient for you.

> IMO it makes sense to use V4L2
> UAPI only on Linux host with V4L2 devices + Linux/Android guest.
> virtio-video driver is already available for Linux. We already know our
> priorities and limitations, so building a decision tree for a potential
> device developer would be very easy.
>
> >>> * Having two overlapping specifications for video is overkill and will
> >>> just fragment virtio (as tempting as it is, I won't link to XKCD). I
> >>> strongly advise against that.
> >>
> >> I think they're not going to create more problems, than virtio-blk,
> >> virtio-scsi and virtio-fs, for example.
> >
> > At least these devices work at different layers, that makes them more
> > justifiable. virtio-video and virtio-v4l2 are just going to provide
> > the same API for video devices, only with different structures and
> > commands.
>
> Hmm, I think virtio-blk and virtio-scsi work on the same level, don't
> they? They could also say this is basically the same thing.
>
> >> The decision can be made like this:
> >> 1. You have a V4L2 device, you don't need any more processing, just want
> >> it inside a Linux/Android VM => use virtio-v4l2.
> >> 2. You don't have a V4L2 device, or your host is not Linux, or your
> >> maybe your guest is not Linux/Android, or you want some extra processing
> >> on the host (say you have a third-party proprietary library or whatever)
> >> => use virtio-video.
> >
> > That would make sense if 2. could not be done just as easily by also
> > using virtio-v4l2, which I believe it can be.
>
> I'm sorry, I think a potential developer would just look into V4L2 docs,
> see struct v4l2_buffer (AFAIU with patches in the spec for the
> host/guest/object memory types on top) and run back to the cleanliness
> and simplicity of virtio-video. That's basically my story. :)
> Iterating on formats in VIDIOC_ENUM_FMT looks quite weird to me too.

I think you will probably end up doing something similar in virtio-video,
due to the dependency between input and output formats, which is not easy
to represent in a single structure.
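
For instance, in the V4L2 stateful decoder model the formats offered on
the CAPTURE queue depend on the coded format currently set on the OUTPUT
queue, so a client sets that format first and only then enumerates. A
rough sketch (error handling omitted; the helper name and its parameters
are just examples):

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* The raw formats offered on CAPTURE depend on the coded format selected
     * on OUTPUT, so set the coded format first, then enumerate. */
    void enum_capture_formats_for(int fd, uint32_t coded_fourcc)
    {
            struct v4l2_format fmt;
            struct v4l2_fmtdesc desc;

            memset(&fmt, 0, sizeof(fmt));
            fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
            fmt.fmt.pix_mp.pixelformat = coded_fourcc; /* e.g. V4L2_PIX_FMT_H264 */
            ioctl(fd, VIDIOC_S_FMT, &fmt);

            for (int i = 0; ; i++) {
                    memset(&desc, 0, sizeof(desc));
                    desc.index = i;
                    desc.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
                    if (ioctl(fd, VIDIOC_ENUM_FMT, &desc) != 0)
                            break; /* formats valid for the coded format above */
            }
    }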

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 14:10                                           ` Alexander Gordeev
@ 2023-04-28  4:02                                             ` Alexandre Courbot
  2023-04-28  8:54                                               ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-04-28  4:02 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

This is going to be my last answer to this thread; I don't think I
have more technical arguments to give than I already have, and the
discussion is drifting into territory I am not interested in engaging in.
At the end of the day it's up to the virtio folks to make a decision
about what the best course of action is. If we end up with
fragmentation, so be it, it will still be better than the current
situation anyway.

On Thu, Apr 27, 2023 at 11:11 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 25.04.23 18:04, Cornelia Huck wrote:
> >
> > [I'm replying here, as that seems to be the last message in the thread,
> > and my reply hopefully catches everyone interested here.]
> >
> > To do a very high level summary, we have (at least) two use cases for
> > virtio-video, that unfortunately have quite different requirements. Both
> > want to encode/decode video, but in different environments.
> >
> > - The "restricted" case: Priority is on security, and the attack surface
> >    should be kept as small as possible, for example, by avoiding unneded
> >    complexity in the interface. Fancy allocations and management should
> >    be avoided. The required functionality is also quite clearly defined.
> > - The "feature-rich" case: Priority is on enabling features, and being
> >    able to re-use existing V4L2 support is considered a big plus. Both
> >    device and driver implementations will be implemented in a full OS
> >    environment, so all kind of helpers are already available.
> >
> > (This is not to say that one case does not care about functionality or
> > security; it's mostly a case of different priorities and environments.)
>
> I'm thinking about the latter as more like a "compatibility" case, but
> the "feature-rich" is also a good name.
>
> > I had been hoping that it would be possible to find kind of a common
> > ground between the two cases, but reading the thread, I'm not quite as
> > hopeful anymore... if we really don't manage to find an approach to make
> > the different requirements co-exist, a separate virtio-v4l2 device might
> > be the way to go -- but I've not totally given up hope yet.
>
>  From our side I can say, that moving from the current state even to a
> well-defined subset of V4L2 would require a lot of work, bring literally
> zero advantages for our use-case, while bringing some disadvantages. I
> think we had a good progress so far, we don't want to give up the
> achievements and now this is a great opportunity to do even better
> because our priorities will not collide anymore.

In other words, virtio-v4l2 does not bring you any direct benefit and
switching would have a cost, acknowledged. But I don't think it's
reasonable to split the standard just for one project.

>
> On the other side I don't think Alexandre and his team are really
> interested in doing the extra work of clearly defining the subset of
> V4L2, writing larger specifications, going through all the hassle with
> making the guest pages sharing work (again) and supporting this case in
> their driver for us for something that is planned to be a very simple
> device and driver. Please correct me if I'm wrong here.

You are wrong and stop making things up about what our intent is.
Guest pages are a must-have for us, as I've already said. I also
proposed in a former email to do just what you said I wouldn't
(defining a valid subset of V4L2 for each device), and you ignored it.

>
> And even if they say they agree to do the work with us... I'm sorry, but
> you can probably see, that our communication doesn't go smooth. My
> emails are forgotten, our use-case is clearly not a priority for
> Alexandre, my arguments seem to be considered as obstacles. If we have a
> single device, we have to cooperate actively. I have a lot of doubts
> that it is possible at the moment. In this case I'd prefer to just make
> room for everybody. Maybe we'll cooperate in a fruitful way again later.
> The priorities or use-cases might change, for example. For us this is an
> opportunity to finally update the virtio-video driver against the latest
> state and hopefully make it V4L2 compliant.

It's pretty rich to constantly misrepresent my proposals and then
pretend communication does not go smoothly (the implication being that
it is my fault).

I won't engage any more in this discussion as I don't think your
position can be moved, and I have better things to do with my time
than constantly repeat what I said earlier. Do your thing, I'll do
mine, and at the end the virtio folks will decide what they want.


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-27 13:16                                   ` Alexandre Courbot
@ 2023-04-28  7:47                                     ` Alexander Gordeev
  2023-05-03 14:04                                       ` Cornelia Huck
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-28  7:47 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 27.04.23 15:16, Alexandre Courbot wrote:
> On Thu, Apr 27, 2023 at 12:11 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 21.04.23 06:02, Alexandre Courbot wrote:
>>> Hi Alexander,
>>>
>>> On Mon, Apr 17, 2023 at 9:52 PM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>> Hi Alexandre,
>>>>
>>>> Thanks for you letter! Sorry, it took me some time to write an answer.
>>>>
>>>> First of all I'd like to describe my perspective a little bit because it
>>>> seems, that in many cases we (and other people writing their feedbacks)
>>>> simply have very different priorities and background.
>>>>
>>>> OpenSynergy, the company that I work for, develops a proprietary
>>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>>>> have our proprietary device implementations, but overall our goal is to
>>>> bring open standards into these quite closed domains and we're betting
>>>> big on virtio. The idea is to run safety-critical functions like cockpit
>>>> controller alongside with multimedia stuff in different VMs on the same
>>>> physical board. Right now they have it on separate physical devices. So
>>>> they already have maximum isolation. And we're trying to make this
>>>> equally safe on a single board. The benefit is the reduced costs and
>>>> some additional features. Of course, we also need features here, but at
>>>> the same time security and ease of certification are among the top of
>>>> our priorities. Nobody wants cars or planes to have security problems,
>>>> right? Also nobody really needs DVB and even more exotic devices in cars
>>>> and planes AFAIK.
>>>>
>>>> For the above mentioned reasons our COQOS hypervisor is running on bare
>>>> metal. Also memory management for the guests is mostly static. It is
>>>> possible to make a shared memory region between a device and a driver
>>>> managed by device in advance. But definitely no mapping of random host
>>>> pages on the fly is supported.
>>>>
>>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>>>> in its own virtualized environment, right?
>>>
>>> Not really, but for the discussion here you can assume that it is a
>>> VMM similar to QEmu with KVM enabled.
>>
>> Thanks for the clarification. If my idea about your use-case is not
>> totally correct, then it would be very helpful if you can provide more
>> details about it.
>
> It's nothing fancy ; Linux host, Linux (e.g. Android) guests.

Thanks for the explanation.

> But
> virtio being a standard, we should focus on making something that is
> usable by everyone instead of individual use-cases.

In an ideal world this would be possible. But in practice we have
various constraints and priorities. So sometimes it makes sense to
achieve more by digging in different directions instead of arguing
endlessly.

>>>> Also this mode of
>>>> operation is not supported in our hypervisor for reasons mentioned
>>>> above. So in our case this PoC doesn't yet prove anything unfortunately.
>>>
>>> I did not have your use-case in mind while writing the PoC, its
>>> purpose was to demonstrate the suitability of V4L2 as a protocol for
>>> virtualizing video.
>>>
>>> Now if your hypervisor does static memory management and pre-allocates
>>> memory for guest buffers, then the V4L2 MMAP memory type actually
>>> looks like the best fit for the job. There are no tokens like virtio
>>> objects UUID to manage, and the MMAP request can be as simple as
>>> returning the pre-mapped address of the buffer in the guest PAS.
>>>
>>> If instead it carves some predefined amount of memory out for the
>>> whole guest and expects it to allocate buffer memory from there, then
>>> the USERPTR memory type (which works like the guest pages of
>>> virtio-video) is what you want to use.
>>
>> It doesn't look like a good idea to us. This means preconfiguring memory
>> regions in the hypervisor config. It is hard to predict the amount of
>> memory, that is necessary. If we allocate too much, this is a waste of
>> memory. If we allocate too little, it won't be enough. Then we don't
>> know yet how to make V4L2 allocate from that memory. Then this memory
>> has to be managed on the host side. And memory management is exactly the
>> thing, that causes most security issues, right? So overall this is very
>> tedious, potentially wasteful and not flexible.
>
> My last paragraph mentions that you can also let the guest manage the
> buffer memory from its own RAM. Or maybe I am missing how memory is
> managed on your hypervisor, but if that's the case elaborate on where
> you want the buffer memory to come from.

Hmm, I think what you describe is indeed no different from using guest
memory as it is done in virtio-video; it maps to USERPTR. There is
nothing special about it: the hypervisor allocates memory for the
guest, and the guest is then free to manage it. It can also hand this
memory to the host.
It is also possible to statically create another memory region that is
managed by the device side and accessible by the guest. But as I wrote,
this is not what we want, for various reasons.

>>>> Static allocations of shared memory regions are possible.
>>>> But then we have to tell V4L2 to allocate buffers there. Then we'll need
>>>> a region per virtual device. This is just very tedious and inflexible.
>>>> That's why we're mainly interested in having the guest pages sharing in
>>>> the virtio video spec.
>>>
>>> I'll be happy to update the PoC and make it able to use guest pages as
>>> buffer backing memory. It just wasn't the priority to demonstrate the
>>> global approach.
>>
>> Great, thank you. If you have a concrete plan already, I think it could
>> be beneficial to discuss it now. Otherwise I'd prefer to keep working on
>> the current approach until I see something concrete.
>
> Just give me a couple more weeks and I think I can produce the code.
> But I'm afraid you have already made up your mind anyway.

I had indeed, so IMO it is pointless right now. I simply wanted to have
a detailed plan just in case, and to make sure that we both understand
it in the same way. If in the end we're forced to compromise, then this
has to be implemented (or reimplemented?).

>>>>>>       b. For modern cameras the V4L2 interface is not enough anyway. This
>>>>>> was already discussed AFAIR. There is a separate virtio-camera
>>>>>> specification, that indeed is based on V4L2 UAPI as you said. But
>>>>>> combining these two specs is certainly not future proof, right? So I
>>>>>> think it is best to let the virtio-camera spec to be developed
>>>>>> independently.
>>>>> I don't know if virtio-camera has made progress that they have not
>>>>> published yet, but from what I have seen virtio-v4l2 can cover
>>>>> everything that the currently published driver does (I could not find
>>>>> a specification, but please point me to it if it exists), so there
>>>>> would be no conflict to resolve.
>>>>>
>>>>> V4L2 with requests support should be capable of handling complex
>>>>> camera configurations, but the effort indeed seems to have switched to
>>>>> KCAM when it comes to supporting complex native cameras natively. That
>>>>> being said:
>>>>>
>>>>> * KCAM is not merged yet, is probably not going to be for some time
>>>>> (https://lwn.net/Articles/904776/), and we don't know how we can
>>>>> handle virtualization with it,
>>>>> * The fact that the camera is complex on the host does not mean that
>>>>> all that complexity needs to be exposed to the guest. I don't know how
>>>>> the camera folks want to manage this, but one can imagine that the
>>>>> host could expose a simpler model for the virtual camera, with only
>>>>> the required knobs, while the host takes care of doing all the complex
>>>>> configuration.
>>>>> * The counter argument can be made that simple camera devices do not
>>>>> need a complex virtualization solution, so one can also invoke
>>>>> simplicity here to advocate for virtio-v4l2.
>>>>>
>>>>> My point is not to say that all other camera virtualization efforts
>>>>> should be abandoned - if indeed there is a need for something more
>>>>> specific, then nothing prevents us from having a virtio-camera
>>>>> specification added. However, we are nowhere close to this at the
>>>>> moment, and right now there is no official solution for camera
>>>>> virtualization, so I see no reason to deny the opportunity to support
>>>>> simple camera devices since its cost would just be to add "and cameras
>>>>> device" in the paragraph of the spec that explains what devices are
>>>>> supported.
>>>>
>>>> Well, for reasons described above it still seems perfectly fine to me to
>>>> have separate devices. Ok, the argument, that this approach also seems
>>>> more future-proof, is not a strong one.
>>>
>>> Please elaborate on its weaknesses then.
>>
>> Well, as you said basically. The weakness of the argument is that the
>> virtio-camera is not yet published, the KCAM is not merged yet, so yeah,
>> the future is not clear actually.
>>
>> BTW I just thought about one more case, that is already real: sharing
>> camera streams with pipewire. I think pipewire doesn't provide a V4L2
>> UAPI interface, right?
>
> I believe it does: https://archlinux.org/packages/extra/x86_64/pipewire-v4l2/

Oh, indeed. Thanks for the link!

> This package contains an LD_PRELOAD library that redirects v4l2
> applications to PipeWire.

Well, it is cool that they did this for compatibility. But it doesn't
look clean and straightforward enough to use for new development.

> But in any case, that's irrelevant to the guest-host interface, and I
> think a big part of the disagreement stems from the misconception that
> V4L2 absolutely needs to be used on either the guest or the host,
> which is absolutely not the case.

I understand this, of course. I'm arguing that it is harder to
implement it, get it right and then maintain it over the years. It also
brings limitations that can sometimes be worked around in the virtio
spec, but this always comes at the cost of decreased readability and
increased complexity. Overall it clearly looks like a downgrade from
virtio-video for our use-case. And I believe it would be the same for
every developer that has to actually implement the spec, not just do
the pass-through. So if we think of V4L2 UAPI pass-through as a
compatibility device (which I believe it is), then it is fine to have
both and keep improving virtio-video, including taking the best ideas
from V4L2 and generally using it as a reference to make writing the
driver simpler.

>>>> 2. There is no flexibility to choose whatever way of memory management
>>>> host and guest would like to use. Now the guest user-space application
>>>> selects this.
>>>
>>> Errr no. The guest user-space chooses a type of memory from what the
>>> guest kernel exposes, which depends on what the host itself decides to
>>> expose.
>>
>> I don't agree. If an already written user-space app supports only MMAP,
>> then there is no way to force it use USERPTR, right? Please correct me
>> if I'm wrong.
>
> The memory types exposed by the guest kernel do not need to match
> those exposed by the hypervisor or that the guest kernel chooses to
> use.
>
> For instance, imagine that the hypervisor does not support allocating
> buffer memory - i.e. it does not support the MMAP memory type. The
> guest will then have to use its own memory for buffer allocation, and
> send them to the host with the USERPTR memory type.
>
> Now if a guest user-space application only supports MMAP, that's not a
> problem at all. Most V4L2 drivers allocate MMAP buffers from regular
> memory. So when the application requests MMAP buffers, the guest
> kernel can honor this request by allocating some memory itself, and
> changes it to USERPTR when passing the request to the hypervisor so it
> knows that guest memory is in use.
>
> I am responsible for this misconception since I insisted on using the
> same memory types (MMAP, USERPTR, DMABUF) as V4L2 for guest/host
> communication, which is misleading. It would probably be less
> confusing to define new types (HOST, GUEST and VIRTIO_OBJ) just for
> virtio-v4l2 and forbid the use of the kernel/user space memory types.
>
> With these new names, I think it is clear that we have the exact
> feature set of virtio-video (guest memory and virtio objects) covered,
> plus another one where we allow the host to perform the allocation
> itself (which may be useful if the video device has its own dedicated
> memory). Again, this is only for host/guest communication. Guest
> kernel/userspace is a different thing and can be implemented in
> different ways depending on what the host supports.

Thank you very much! This is basically the same thing that I called
"rewriting the flow of V4L2 UAPI calls on the fly" earlier. You can find
the quote further down in this email. As I wrote there, this already
seems hackish to me.

It means ensuring consistency everywhere and always remembering in which
context a particular VIDIOC_SOMETHING command is used. Is this command
used in the host-to-guest interaction or in the kernel-to-user-space
interaction? You're going to have problems grepping the code. Well,
maybe the two can be clearly separated from each other and from the
pass-through case, but I'm sure it will still create a lot of confusion.

You went further and suggested redefining the memory types for
host-guest interactions. I agree completely! This is absolutely the
right thing to do; mapping the original V4L2 memory types would be
awkward. At the same time you now have to remember that there are two
definitions of enum v4l2_memory, one in the spec and one in buffer.html.
And you have to remember that the one from the spec applies to
host-guest interactions, while the one from buffer.html applies to
kernel-to-user-space interactions. That's why I call it a workaround.

This reminds me of another benchmark for code quality: the number of
WTFs per minute. :)
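
For what it's worth, here is a minimal sketch of what this distinction
could look like in code. The VIRTIO_V4L2_MEMORY_* names, their values
and the helper below are only my own illustration, loosely following the
HOST/GUEST/VIRTIO_OBJ idea quoted above; nothing here is taken from a
published spec or driver:

    #include <stdbool.h>
    #include <linux/videodev2.h>

    /* Hypothetical host/guest memory types, distinct from enum v4l2_memory. */
    enum virtio_v4l2_memory {
        VIRTIO_V4L2_MEMORY_HOST = 1,       /* buffer allocated by the host/device */
        VIRTIO_V4L2_MEMORY_GUEST = 2,      /* buffer backed by guest pages */
        VIRTIO_V4L2_MEMORY_VIRTIO_OBJ = 3, /* buffer referenced by a virtio object UUID */
    };

    /*
     * Sketch of the guest driver picking a host/guest memory type for a
     * buffer requested by guest user space. A userspace MMAP request does
     * not have to be forwarded as-is: if the device cannot allocate
     * buffers, the guest kernel allocates the pages itself and exposes
     * them as GUEST memory.
     */
    static enum virtio_v4l2_memory to_host_memory(enum v4l2_memory requested,
                                                  bool host_can_allocate)
    {
        switch (requested) {
        case V4L2_MEMORY_MMAP:
            return host_can_allocate ? VIRTIO_V4L2_MEMORY_HOST
                                     : VIRTIO_V4L2_MEMORY_GUEST;
        case V4L2_MEMORY_USERPTR:
            return VIRTIO_V4L2_MEMORY_GUEST;
        case V4L2_MEMORY_DMABUF:
            return VIRTIO_V4L2_MEMORY_VIRTIO_OBJ;
        default:
            return VIRTIO_V4L2_MEMORY_GUEST;
        }
    }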

>>>> The latter makes the solution much less flexible IMO. For example, this
>>>> won't work well with our hypervisor. There might other special needs in
>>>> other use-cases. Like sharing these object UUIDs. Probably this can
>>>> handled by mapping, for example, V4L2_MEMORY_USERPTR to guest-pages
>>>> sharing, V4L2_MEMORY_DMABUF to the UUIDs (which is not quite correct
>>>> IMHO).
>>>
>>> Please elaborate on why this is not correct.
>>
>> Because IMHO UUIDs pointing to memory allocated by virtio-gpu are quite
>> different dmabufs created in the guest with udmabuf, for example. This
>> can be confusing.
>
> True, and that's another reason to define our own memory types to
> remove that confusion.

Yes, and I'm very happy we agree on this.

>>>> So this already means querying the device for supported sharing
>>>> methods, rewriting the flow of V4L2 UAPI calls on the fly, ensuring
>>>> consistency, etc. This already looks hackish to me. Do you have a better
>>>> plan?
>>>
>>> How do you support different kinds of memory without querying? Or do
>>> you suggest we stick to a single one?
>>>
>>> I am also not quite sure what you mean by "rewriting the flow of V4L2
>>> UAPI calls on the fly". There is no "rewriting" - V4L2 structures are
>>> just used to communicate with the host instead of virtio-video
>>> structures.
>>
>> I'd like to know your ideas or better a concrete plan for enabling
>> user-space apps, that only support MMAP, to work on top of a device,
>> that supports only guest pages sharing.
>
> Hopefully my explanation above clears that.

Yes, thank you.


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-28  3:22                                             ` Alexandre Courbot
@ 2023-04-28  7:57                                               ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-28  7:57 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Bartłomiej Grzesik, Bartłomiej Grzesik, Cornelia Huck,
	virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve, srosek, zyta, hmazur,
	mikrawczyk

On 28.04.23 05:22, Alexandre Courbot wrote:
> On Thu, Apr 27, 2023 at 11:35 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 27.04.23 12:13, Bartłomiej Grzesik wrote:
>>> Hi Alexander
>>>
>>> On Wed, Apr 26, 2023 at 6:00 PM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>> Hi Bartłomiej,
>>>>
>>>> On 21.04.23 13:49, Bartłomiej Grzesik wrote:
>>>>> +CC chromeos-arc-video-eng team that also works on virtio-video
>>>>>
>>>>> Hi everyone!
>>>>>
>>>>>    From the experience of working on virtio-video I can definitely agree
>>>>> with Alex Courbot, that moving to virtio-v4l2 will be a great move.
>>>>> This move will not only simply things a lot but also allow us things like
>>>>> vhost-net like implementation for some devices (however this is
>>>>> thinking way ahead and linux only).
>>>>>
>>>>> One added benefit of this move that I'd like to point out now, that
>>>>> probably
>>>>> haven't been mentioned before, is moving to asynchronous resource queue
>>>>> call. Previously `VIRTIO_VIDEO_CMD_RESOURCE_QUEUE` have been
>>>>> synchronous and caused one hard to debug bug caused by this flawed
>>>>> design. During the command execution the virtio queue descriptors are
>>>>> blocked, potentially leading to dead locking the device. The implementation
>>>>> of virtio-v4l2 (even as is - btw nice work Alex!) eliminates this issue
>>>>> by moving to asynchronous response of the resource queue (VIDIOC_QBUF).
>>>>
>>>> Thanks for your valuable feedback! Could you please share some details
>>>> about the bug? That would be very helpful. I'm working on the next
>>>> version of the virtio-video draft, so I can change it there. I like the
>>>> idea to use V4L2 as a reference, so we should probably do it like it is
>>>> done there, only simpler. Still it would be interesting to know the
>>>> details, because we didn't have issues with the current design.
>>>
>>> In this bug, an app preallocated and enqueued all output buffers (to
>>> CAPTURE queue). This is inline with V4L2 and in case of virtualized video
>>> stack helps with latency. Those enqueued buffers are holding virtqueue
>>> descriptors until filled with a decoded frame. While for one stream it is not an
>>> issue, but for more simultaneous streams, it quickly becomes a serious
>>> problem. In our case all descriptors from the virtqueue were consumed
>>> by enqueued output buffers and no other command could be issued
>>> to hypervisor. This dead locked the entire driver, by starving the driver with
>>> descriptors - even STREAM_DESTROY command could not be issued and
>>> only solution was to reboot the guest.
>>>
>>> While it is easily solvable by adjusting the size of the virtqueue, it
>>> is a flaw to
>>> this design. The number would always have to a function of maximum number
>>> of supported streams - raising rather quickly.
>>>
>>> I remember having a few thoughts on how it could be solved and I think that
>>> removing the need to block those descriptors is the best approach in my opinion.
>>> One would argue that preallocating descriptors for this purpose or splitting
>>> the command queue to input, output and control might be a viable solution
>>> or per stream. However it would only delay the issue in time or could
>>> cause other streams to "starve".
>>
>> Thank you for the detailed description. This makes total sense to me
>> indeed. I thought about this problem some time and discussed it with my
>> colleagues. Indeed looks like it would be best to stop blocking these
>> descriptors. We can add more queues, but this doesn't look scalable
>> indeed. There are several ways to unblock the descriptors, I think.
>> First, we can do the same thing as in V4L2: add a separate (blocking?)
>> DEQUEUE command. But then we theoretically can have the same problem
>> with DRAIN, because it also blocks. So why not just use the event queue
>> to receive the completion events for both QUEUE and DRAIN commands
>> asynchronously? One could argue, that the errors should maybe come out
>> of band. But at the same time we already use event queue to deliver
>> dynamic resolution change events, that clearly should be delivered with
>> the flow of buffers.
>
> FWIW that's exactly how virtio-v4l2 handles things.

I'm not sure you read my email to the end.
I'm proposing a different thing. We already have an event queue. It is
even used for something that should actually go with the flow of output
buffers: dynamic resolution change events. So we can simply add a
BUFFER_DEQUEUED event instead of doing the same thing in the command
queue by implementing a blocking DEQUEUE command as it is done in V4L2.
I think this is even better. This way we have blocked descriptors only
on the event queue, and that should never be a problem. Also, this way
we automatically get correct ordering of DEQUEUE and DRC. Add a DRAIN
completion event here, and the whole thing becomes even better: drain
completion also has to be synchronized with the flow of buffers, and
this way we don't have to block on DRAIN or dequeue empty buffers with
an EOS flag. The whole thing becomes simpler and more consistent. Right?
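
Just to make the proposal concrete, here is a rough sketch of what such
event-queue entries could look like. Every name and field below is made
up for illustration only and is not part of the current virtio-video
draft:

    #include <stdint.h>

    /* Illustrative event types only; RESOLUTION_CHANGED stands in for
     * whatever name the spec already uses for dynamic resolution change
     * events. */
    enum virtio_video_event_type {
        VIRTIO_VIDEO_EVENT_RESOLUTION_CHANGED = 1, /* already delivered via the event queue */
        VIRTIO_VIDEO_EVENT_BUFFER_DEQUEUED    = 2, /* proposed: replaces a blocking dequeue command */
        VIRTIO_VIDEO_EVENT_DRAIN_COMPLETE     = 3, /* proposed: replaces blocking on DRAIN */
    };

    /* One entry written by the device into an event queue buffer. Because
     * all three event types share the same queue, buffer dequeues,
     * resolution changes and drain completions are naturally ordered
     * relative to each other. */
    struct virtio_video_event {
        uint32_t event_type;  /* enum virtio_video_event_type */
        uint32_t stream_id;
        uint32_t queue_type;  /* input or output queue the event refers to */
        uint32_t resource_id; /* which buffer was dequeued (BUFFER_DEQUEUED only) */
        uint32_t flags;       /* e.g. error or end-of-stream indication */
        uint32_t padding;
        uint64_t timestamp;   /* timestamp of the dequeued buffer, if any */
    };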


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-28  3:22                                             ` Alexandre Courbot
@ 2023-04-28  8:22                                               ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-28  8:22 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 28.04.23 05:22, Alexandre Courbot wrote:
> On Thu, Apr 27, 2023 at 11:20 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 26.04.23 07:52, Alexandre Courbot wrote:
>>> On Mon, Apr 24, 2023 at 4:52 PM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>> On 21.04.23 18:01, Alexander Gordeev wrote:
>>>>
>>>>> Let's compare VIRTIO_VIDEO_CMD_RESOURCE_QUEUE with
>>>>> VIDIOC_QBUF+VIDIOC_DQBUF. Including the parameters, of course. First,
>>>>> let's compare the word count to get a very rough estimate of complexity.
>>>>> I counted 585 words for VIRTIO_VIDEO_CMD_RESOURCE_QUEUE, including the
>>>>> parameters. VIDIOC_QBUF+VIDIOC_DQBUF are defined together and take 1206
>>>>> words, they both use struct v4l2_buffer as a parameter. The struct takes
>>>>> 2716 words to be described. So the whole thing takes 3922 words. This is
>>>>> 6.7 times more, than VIRTIO_VIDEO_CMD_RESOURCE_QUEUE. If we check the
>>>>> definitions of the structs, it is also very obvious, that V4L2 UAPI is
>>>>> almost like an order of magnitude more complex.
>>>>
>>>>
>>>> I think, it is best to add all the steps necessary to reproduce my calculations just in case.
>>>>
>>>> VIRTIO_VIDEO_CMD_RESOURCE_QUEUE is doing essentially the same thing as VIDIOC_QBUF+VIDIOC_DQBUF, so we're comparing apples to apples (if we don't forget to compare their parameters too).
>>>>
>>>> To get the word count for the VIRTIO_VIDEO_CMD_RESOURCE_QUEUE I opened the rendered PDF of video section only from the first email in this thread. Here is the link: https://drive.google.com/file/d/1Sm6LSqvKqQiwYmDE9BXZ0po3XTKnKYlD/view?usp=sharing . Then I scrolled to page 11 and copied everything related a text file. This is around two pages in the PDF. Then I removed page numbers from the copied text and used 'wc -w' to count words.
>>>>
>>>> To get the word count for VIDIOC_QBUF+VIDIOC_DQBUF I opened this link: https://docs.kernel.org/userspace-api/media/v4l/vidioc-qbuf.html . Then I selected all the text except table of contents and did followed the same procedure.
>>>>
>>>> To get the word count for struct v4l2_buffer and other types, that are referenced from it, I opened this link: https://docs.kernel.org/userspace-api/media/v4l/buffer.html#struct-v4l2-buffer . Then I selected all the text except the table of contents and the text above struct v4l2_buffer definition. The rest is the same.
>>>>
>>>> Also it's quite obvious if you look at them how much bigger struct v4l2_buffer (including the referenced types) is compared to struct virtio_video_resource_queue.
>>>
>>> You are comparing not the complexity of the structures but the
>>> verbosity of their documentation, which are written in a different
>>> style, format, and by different people.
>>
>> I agree to some extent. At least this benchmark is simple and it
>> provokes to actually go and look at the definitions, which IMO should be
>> enough to already see the difference. What could be a better benchmark?
>> Maybe counting the number of various fields and flags and enum cases,
>> that one has to read through?
>
> I give you another point of comparison literally 5 lines down my email.
>
>>
>>> And the V4L2 page also
>>> contains the description of memory types, which is part of another
>>> section in the virtio-video spec.
>>
>> You mean only enum v4l2_memory? Or anything else too?
>>
>>> There is no way to draw a meaningful
>>> conclusion from this.
>>>
>>> If you want to compare, do it with how the structures are actually
>>> used. Here is how you would queue an input buffer with virtio-video:
>>>
>>>     struct virtio_video_resource_queue queue_buf = {
>>>         .cmd_type = VIRTIO_VIDEO_CMD_RESOURCE_QUEUE,
>>>         .stream_id = 42,
>>>         .queue_type = VIRTIO_VIDEO_QUEUE_TYPE_INPUT,
>>>         .resource_id = 1,
>>>         .timestamp = 0x10,
>>>         .data_sizes = {
>>>           [0] = 0x1000,
>>>         },
>>>     };
>>>
>>> Now the same with virtio-v4l2:
>>>
>>>     struct virtio_v4l2_queue_buf queue_buf = {
>>>         .cmd = VIRTIO_V4L2_CMD_IOCTL,
>>>         .code = VIDIOC_QBUF,
>>>         .session_id = 42,
>>>         .buffer.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
>>>         .buffer.index = 1,
>>>         .buffer.timestamp.tv_usec = 0x10,
>>>         .buffer.memory = V4L2_MEMORY_MMAP,
>>>         .planes = {
>>>           [0] = { .bytesused = 0x1000 },
>>>         }
>>>     };
>>>
>>> In both cases, you pass a structure with some members set, and the
>>> rest to 0. The host receives basically the same thing - it's the same
>>> data! The only difference is how it is laid out.
>>
>> How do I know this from the text? How do I verify, that this is correct?
>> Can I be sure this code is going to work or not between any device and
>> any driver?
>>
>> With virtio-video you can just go to the spec/header file, take the
>> struct definition and simply set all the fields.
>>
>> With V4L2 UAPI I don't see any other way except going through the whole
>> buffer.html file, filtering potentially irrelevant stuff, then doing
>> some trials, and maybe looking into the device or driver code. Maybe
>> also asking you for an advice?
>
> The relevant fields for each decoding operation are clearly detailed
> in https://www.kernel.org/doc/html/v5.18/userspace-api/media/v4l/dev-decoder.html,
> which I already linked to multiple times.

That's true for most of them. I'd prefer a single document without any
of the irrelevant stuff. As it is, this one is not very readable, and it
never asks to zero the fields that are not required. Still, this is
bearable.

But this is not true for struct v4l2_buffer. I couldn't find anything
about it in the document that you're linking to. I went there and
rechecked again, still nothing. Where is it exactly?


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-28  3:24                                           ` Alexandre Courbot
@ 2023-04-28  8:31                                             ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-28  8:31 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 28.04.23 05:24, Alexandre Courbot wrote:
> On Fri, Apr 28, 2023 at 12:12 AM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 27.04.23 15:23, Alexandre Courbot wrote:
>>> On Thu, Apr 27, 2023 at 12:52 AM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>> On 21.04.23 06:02, Alexandre Courbot wrote:
>>>>> On Wed, Apr 19, 2023 at 4:39 PM Alexander Gordeev
>>>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>>>
>>>>>> On 17.04.23 16:43, Cornelia Huck wrote:
>>>>>>> On Mon, Apr 17 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>>>
>>>>>>>> OpenSynergy, the company that I work for, develops a proprietary
>>>>>>>> hypervisor called COQOS mainly for automotive and aerospace domains. We
>>>>>>>> have our proprietary device implementations, but overall our goal is to
>>>>>>>> bring open standards into these quite closed domains and we're betting
>>>>>>>> big on virtio. The idea is to run safety-critical functions like cockpit
>>>>>>>> controller alongside with multimedia stuff in different VMs on the same
>>>>>>>> physical board. Right now they have it on separate physical devices. So
>>>>>>>> they already have maximum isolation. And we're trying to make this
>>>>>>>> equally safe on a single board. The benefit is the reduced costs and
>>>>>>>> some additional features. Of course, we also need features here, but at
>>>>>>>> the same time security and ease of certification are among the top of
>>>>>>>> our priorities. Nobody wants cars or planes to have security problems,
>>>>>>>> right? Also nobody really needs DVB and even more exotic devices in cars
>>>>>>>> and planes AFAIK.
>>>>>>>>
>>>>>>>> For the above mentioned reasons our COQOS hypervisor is running on bare
>>>>>>>> metal. Also memory management for the guests is mostly static. It is
>>>>>>>> possible to make a shared memory region between a device and a driver
>>>>>>>> managed by device in advance. But definitely no mapping of random host
>>>>>>>> pages on the fly is supported.
>>>>>>>>
>>>>>>>> AFAIU crosvm is about making Chrome OS more secure by putting every app
>>>>>>>> in its own virtualized environment, right? Both the host and guest are
>>>>>>>> linux. In this case I totally understand why V4L2 UAPI pass-through
>>>>>>>> feels like a right move. I guess, you'd like to make the switch to
>>>>>>>> virtualized apps as seemless as possible for your users. If they can't
>>>>>>>> use their DVBs anymore, they complain. And adding the virtualization
>>>>>>>> makes the whole thing more secure anyway. So I understand the desire to
>>>>>>>> have the range of supported devices as broad as possible. It is also
>>>>>>>> understandable that priorities are different with desktop
>>>>>>>> virtualization. Also I'm not trying to diminish the great work, that you
>>>>>>>> have done. It is just that from my perspective this looks like a step in
>>>>>>>> the wrong direction because of the mentioned concerns. So I'm going to
>>>>>>>> continue being a skeptic here, sorry.
>>>>>>>>
>>>>>>>> Of course, I don't expect that you continue working on the old approach
>>>>>>>> now as you have put that many efforts into the V4L2 UAPI pass-through.
>>>>>>>> So I think it is best to do the evolutionary changes in scope of virtio
>>>>>>>> video device specification, and create a new device specification
>>>>>>>> (virtio-v4l2 ?) for the revolutionary changes. Then I'd be glad to
>>>>>>>> continue the virtio-video development. In fact I already started making
>>>>>>>> draft v7 of the spec according to the comments. I hope it will be ready
>>>>>>>> for review soon.
>>>>>>>>
>>>>>>>> I hope this approach will also help fix issues with virtio-video spec
>>>>>>>> and driver development misalignment as well as V4L2 compliance issues
>>>>>>>> with the driver. I believe the problems were caused partly by poor
>>>>>>>> communication between us and by misalignment of our development cycles,
>>>>>>>> not by the driver complexity.
>>>>>>>>
>>>>>>>> So in my opinion it is OK to have different specs with overlapping
>>>>>>>> functionality for some time. My only concern is if this would be
>>>>>>>> accepted by the community and the committee. How the things usually go
>>>>>>>> here: preferring features and tolerating possible security issues or the
>>>>>>>> other way around? Also how acceptable is having linux-specific protocols
>>>>>>>> at all?
>>>>>>> My main question is: What would be something that we can merge as a
>>>>>>> spec, that would either cover the different use cases already, or that
>>>>>>> could be easily extended to cover the use cases it does not handle
>>>>>>> initially?
>>>>>>>
>>>>>>> For example, can some of the features that would be useful in crosvm be
>>>>>>> tucked behind some feature bit(s), so that the more restricted COQOS
>>>>>>> hypervisor would simply not offer them? (Two feature bits covering two
>>>>>>> different mechanisms, like the current approach and the v4l2 approach,
>>>>>>> would also be good, as long as there's enough common ground between the
>>>>>>> two.)
>>>>>>>
>>>>>>> If a staged approach (adding features controled by feature bits) would
>>>>>>> be possible, that would be my preferred way to do it.
>>>>>>
>>>>>> Hmm, I see several ways how we can use the feature flags:
>>>>>> 1. Basically making two feature flags: one for the current video spec
>>>>>> and one for the V4L2 UAPI pass through. Kind of the same as having two
>>>>>> different specs, but within one device. Not sure which way is better.
>>>>>> Probably having two separate devices would be easier to review and merge.
>>>>>
>>>>> Having two different devices with their own IDs would indeed be less
>>>>> confusing than using feature bits.
>>>>>
>>>>> That being said, the whole point of proposing virtio-v4l2 is to end up
>>>>> with *less* specification, not more. Having two concurrent and largely
>>>>> overlapping approaches will result in fragmentation and duplicated
>>>>> work, so my suggestion would be to decide on one or the other and
>>>>> stick to it.
>>>>
>>>> Hmm, Enrico pointed out, that having virtio-v4l2 would also be good
>>>> because of much better compatibility with Android right now. I don't
>>>> think the specification length should be our ultimate goal. Cornelia
>>>> said, that her ultimate goal is to have a spec everyone is happy with,
>>>> regardless on how we arrive there. Well, I can only say, that I also
>>>> think this should be our goal.
>>>
>>> Try to put yourself into the shoes of someone who needs to write a new
>>> video device using virtio. Oh, there are two ways to do it. This guest
>>> OS only supports virtio-video. But this guest OS only supports
>>> virtio-v4l2. Great, now you need to support two interfaces with your
>>> device, or write two different devices.
>>
>> I think this is a hypothetical issue.
>
> Issues have a strange tendency to become hypothetical when it's
> convenient for you.

I would be happy to discuss some of the hypothetical issues coming from
you if you agree to discuss the hypothetical issues coming from me.
Otherwise, let's just focus on the less hypothetical ones, please.

>> IMO it makes sense to use V4L2
>> UAPI only on Linux host with V4L2 devices + Linux/Android guest.
>> virtio-video driver is already available for Linux. We already know our
>> priorities and limitations, so building a decision tree for a potential
>> device developer would be very easy.
>>
>>>>> * Having two overlapping specifications for video is overkill and will
>>>>> just fragment virtio (as tempting as it is, I won't link to XKCD). I
>>>>> strongly advise against that.
>>>>
>>>> I think they're not going to create more problems, than virtio-blk,
>>>> virtio-scsi and virtio-fs, for example.
>>>
>>> At least these devices work at different layers, that makes them more
>>> justifiable. virtio-video and virtio-v4l2 are just going to provide
>>> the same API for video devices, only with different structures and
>>> commands.
>>
>> Hmm, I think virtio-blk and virtio-scsi work on the same level, aren't
>> they? They could also say this is basically the same thing.
>>
>>>> The decision can be made like this:
>>>> 1. You have a V4L2 device, you don't need any more processing, just want
>>>> it inside a Linux/Android VM => use virtio-v4l2.
>>>> 2. You don't have a V4L2 device, or your host is not Linux, or your
>>>> maybe your guest is not Linux/Android, or you want some extra processing
>>>> on the host (say you have a third-party proprietary library or whatever)
>>>> => use virtio-video.
>>>
>>> That would make sense if 2. could not be done just as easily by also
>>> using virtio-v4l2, which I believe it can be.
>>
>> I'm sorry, I think a potential developer would just look into V4L2 docs,
>> see struct v4l2_buffer (AFAIU with patches in the spec for the
>> host/guest/object memory types on top) and run back to the cleanliness
>> and simplicity of virtio-video. That's basically my story. :)
>> Iterating on formats in VIDIOC_ENUM_FMT looks quite weird to me too.
>
> I think you will probably end up doing something similar in virtio-video
> due to the dependency between input and output formats, which is not
> easy to represent in a single structure.

Yes! I'm already working on this, actually. But I'd like to have only
two round-trips over virtio, not 10+ like with VIDIOC_ENUM_FMT. Do you
agree that fewer interactions would be better?
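
Just for context, this is the kind of iteration VIDIOC_ENUM_FMT implies:
one ioctl per format, looping until the driver returns an error, and
with a pass-through design each iteration becomes a guest/host
round-trip. A minimal user-space sketch (plain V4L2, nothing
virtio-specific, my own example):

    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Enumerate the coded-stream (OUTPUT) formats of an already-open
     * decoder fd: one VIDIOC_ENUM_FMT ioctl per format, until the driver
     * returns an error. */
    static void enum_output_formats(int fd)
    {
        struct v4l2_fmtdesc fmt;

        memset(&fmt, 0, sizeof(fmt));
        fmt.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;

        for (fmt.index = 0; ioctl(fd, VIDIOC_ENUM_FMT, &fmt) == 0; fmt.index++)
            printf("format %u: %s (0x%08x)\n",
                   fmt.index, (const char *)fmt.description,
                   fmt.pixelformat);
    }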


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-28  4:02                                             ` Alexandre Courbot
@ 2023-04-28  8:54                                               ` Alexander Gordeev
  2023-05-02  1:07                                                 ` Alexandre Courbot
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-04-28  8:54 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 28.04.23 06:02, Alexandre Courbot wrote:
> This is going to be my last answer to this thread; I don't think I
> have more technical arguments to give than I already have, and the
> discussion is drifting into territory I am not interested in engaging in.
> At the end of the day it's up to the virtio folks to make a decision
> about what the best course of action is. If we end up with
> fragmentation, so be it, it will still be better than the current
> situation anyway.

I agree with the summary. I think we have discussed all the things in
depth and even reached some understanding.
Could you please at least share your feedback on my proposal for the
QUEUE/DRAIN completion handling?

> On Thu, Apr 27, 2023 at 11:11 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 25.04.23 18:04, Cornelia Huck wrote:
>>>
>>> [I'm replying here, as that seems to be the last message in the thread,
>>> and my reply hopefully catches everyone interested here.]
>>>
>>> To do a very high level summary, we have (at least) two use cases for
>>> virtio-video, that unfortunately have quite different requirements. Both
>>> want to encode/decode video, but in different environments.
>>>
>>> - The "restricted" case: Priority is on security, and the attack surface
>>>     should be kept as small as possible, for example, by avoiding unneded
>>>     complexity in the interface. Fancy allocations and management should
>>>     be avoided. The required functionality is also quite clearly defined.
>>> - The "feature-rich" case: Priority is on enabling features, and being
>>>     able to re-use existing V4L2 support is considered a big plus. Both
>>>     device and driver implementations will be implemented in a full OS
>>>     environment, so all kind of helpers are already available.
>>>
>>> (This is not to say that one case does not care about functionality or
>>> security; it's mostly a case of different priorities and environments.)
>>
>> I'm thinking about the latter as more like a "compatibility" case, but
>> the "feature-rich" is also a good name.
>>
>>> I had been hoping that it would be possible to find kind of a common
>>> ground between the two cases, but reading the thread, I'm not quite as
>>> hopeful anymore... if we really don't manage to find an approach to make
>>> the different requirements co-exist, a separate virtio-v4l2 device might
>>> be the way to go -- but I've not totally given up hope yet.
>>
>>   From our side I can say, that moving from the current state even to a
>> well-defined subset of V4L2 would require a lot of work, bring literally
>> zero advantages for our use-case, while bringing some disadvantages. I
>> think we had a good progress so far, we don't want to give up the
>> achievements and now this is a great opportunity to do even better
>> because our priorities will not collide anymore.
>
> In other words, virtio-v4l2 does not bring you any direct benefit and
> switching would have a cost, acknowledged. But I don't think it's
> reasonable to split the standard just for one project.

I've acknowledged this in many emails already. But I wouldn't argue
this long if it were only about cost. I'm a developer after all (I'm
not paying my salary myself). I believe that virtio-video is simply
better for a whole class of use-cases, including ours.

>> On the other side I don't think Alexandre and his team are really
>> interested in doing the extra work of clearly defining the subset of
>> V4L2, writing larger specifications, going through all the hassle with
>> making the guest pages sharing work (again) and supporting this case in
>> their driver for us for something that is planned to be a very simple
>> device and driver. Please correct me if I'm wrong here.
>
> You are wrong and stop making things up about what our intent is.
> Guest pages are a must-have for us, as I've already said. I also
> proposed in a former email to do just what you said I wouldn't
> (defining a valid subset of V4L2 for each device), and you ignored it.

Hmm, I haven't ignored it, I just didn't have time to answer, sorry. I
answered it a few minutes ago.
Are guest pages a must-have because some of the user-space apps want to
use them? Or is the conversion of memory types in the driver also a
must-have for you? As far as I can understand your use-case, the former
is a must-have, but the latter is not. At the same time, the latter is
a must-have for us. We already have this implemented in the virtio-video
spec and the virtio-video driver. Also, I believe the virtio-video spec
is much clearer and better defined than the V4L2 UAPI, so we'd like to
focus on improving the current state, not downgrading to the V4L2 UAPI.

> I won't engage any more in this discussion as I don't think your
> position can be moved, and I have better things to do with my time
> than constantly repeat what I said earlier. Do your thing, I'll do
> mine, and at the end the virtio folks will decide what they want.

So be it.


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-28  8:54                                               ` Alexander Gordeev
@ 2023-05-02  1:07                                                 ` Alexandre Courbot
  2023-05-02 11:12                                                   ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-05-02  1:07 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Fri, Apr 28, 2023 at 5:55 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 28.04.23 06:02, Alexandre Courbot wrote:
> > This is going to be my last answer to this thread; I don't think I
> > have more technical arguments to give than I already have, and the
> > discussion is drifting into territory I am not interested in engaging in.
> > At the end of the day it's up to the virtio folks to make a decision
> > about what the best course of action is. If we end up with
> > fragmentation, so be it, it will still be better than the current
> > situation anyway.
>
> I agree with the summary. I think we have discussed all the things in
> depth and even reached some understanding.
> Could you please at least share your feedback on my proposal for the
> QUEUE/DRAIN completion handling?

Well, I've re-read your email once more, and again I confirm that this is
exactly what virtio-v4l2 does. Here is the code that sends the dequeue
buffer event on the event queue after a buffer is dequeued on the
host: https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs#L1114.

IOW, there are no VIDIOC_DQBUF or VIDIOC_DQEVENT (or any command that
can block) taking place between the guest and host; these are sent to
the guest as events.

And yes, this design simplifies things quite a bit vs. retaining
command descriptors until some operation completes. Beyond the
increased potential for deadlock, there is also added complexity as
the descriptors must eventually be returned to the guest, even if the
requested operation is interrupted. This increases the amount of state
on the host which must keep track of pending operations and cancel
some on occasion (which is pretty error-prone), whereas having all
commands complete without delay could theoretically let us function
even with a single command descriptor. Retaining the descriptors was
supposed to reduce host/guest traffic a bit, but at that level of
interaction (60 ops/second for a high framerate video) I don't think
it's worth the cost.
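
To make this concrete, here is a rough sketch of the kind of event the
device could place on the eventq when a buffer completes (illustrative
only: the struct name, fields and layout below are mine, not taken from
the spec or from the crosvm code):

struct virtio_video_dequeued_buffer_event {
        le32 stream_id; /* stream the buffer belongs to */
        le32 buffer_id; /* buffer the host has finished with */
        le32 flags;     /* e.g. error or end-of-stream indication */
        le32 data_size; /* valid payload size for CAPTURE buffers */
};

The command that originally queued the buffer has already completed, so
the device simply writes such an event on the eventq and notifies the
guest; no command descriptor is held back while the hardware is working.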


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-02  1:07                                                 ` Alexandre Courbot
@ 2023-05-02 11:12                                                   ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-02 11:12 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 02.05.23 03:07, Alexandre Courbot wrote:
> On Fri, Apr 28, 2023 at 5:55 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 28.04.23 06:02, Alexandre Courbot wrote:
>>> This is going to be my last answer to this thread ; I don't think I
>>> have more technical arguments to give than I already have and the
>>> discussion is drifting into territory I am not interested in engaging.
>>> At the end of the day it's up to the virtio folks to make a decision
>>> about what the best course of action is. If we end up with
>>> fragmentation, so be it, it will still be better than the current
>>> situation anyway.
>>
>> I agree with the summary. I think we have discussed all the things in
>> depth and even reached some understanding.
>> Could you please at least share your feedback on my proposal for the
>> QUEUE/DRAIN completion handling?
>
> Well I've re-read your email once more and again I confirm, this is
> exactly what virtio-v4l2 does. Here is the code that sends the dequeue
> buffer event on the event queue after a buffer is dequeued on the
> host: https://github.com/Gnurou/crosvm/blob/virtio-v4l2/devices/src/virtio/v4l2/worker.rs#L1114.
>
> IOW, there are no VIDIOC_DQBUF or VIDIOC_DQEVENT (or any command that
> can block) taking place between the guest and host, these are sent to
> the guest as events.
>
> And yes, this design simplifies things quite a bit vs. retaining
> command descriptors until some operation completes. Beyond the
> increased potential for deadlock, there is also added complexity as
> the descriptors must eventually be returned to the guest, even if the
> requested operation is interrupted. This increases the amount of state
> on the host which must keep track of pending operations and cancel
> some on occasion (which is pretty error-prone), whereas having all
> commands complete without delay could theoretically let us function
> even with a single command descriptor. Retaining the descriptors was
> supposed to reduce host/guest traffic a bit, but at that level of
> interaction (60 ops/second for a high framerate video) I don't think
> it's worth the cost.

Thanks for the clarification! So this was a misunderstanding on my side.
Indeed, I was still thinking of virtio-v4l2 as a V4L2 UAPI pass-through,
so of course I assumed you had VIDIOC_DQBUF and VIDIOC_DQEVENT. The
design you actually have indeed looks better to me than passing these
ioctls through on the commandq, so doing the same thing in virtio-video
shouldn't break anything. I'm happy we agree on that, and I fully agree
with your arguments in favor of the events.

So virtio-v4l2 is going to carry quite a few patches on top of the V4L2
UAPI: the memory types, the way events are passed, and maybe also more
restrictions on the data types (hopefully that part can be upstreamed).
Did I miss anything else? It is probably better to present and think of
virtio-v4l2 as something like a "patched V4L2 UAPI" to avoid confusion.

Now I'm really curious what the virtio-v4l2 spec would look like. I'm
not saying that I'd like to give up on virtio-video, please don't get me
wrong. I still think it is much better to have a clean, readable spec
and to avoid all the patches, because they will certainly cause a lot of
confusion (like what we can see right now). I'm just glad that we're
going in the same direction from different starting points; that
probably means we are both doing the right thing. Good!

--
Alexander Gordeev
Senior Software Engineer


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-04-28  7:47                                     ` Alexander Gordeev
@ 2023-05-03 14:04                                       ` Cornelia Huck
  2023-05-03 15:11                                         ` Alex Bennée
  2023-05-05 11:54                                         ` Alexander Gordeev
  0 siblings, 2 replies; 97+ messages in thread
From: Cornelia Huck @ 2023-05-03 14:04 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:

> On 27.04.23 15:16, Alexandre Courbot wrote:
>> But in any case, that's irrelevant to the guest-host interface, and I
>> think a big part of the disagreement stems from the misconception that
>> V4L2 absolutely needs to be used on either the guest or the host,
>> which is absolutely not the case.
>
> I understand this, of course. I'm arguing, that it is harder to
> implement it, get it straight and then maintain it over years. Also it
> brings limitations, that sometimes can be workarounded in the virtio
> spec, but this always comes at a cost of decreased readability and
> increased complexity. Overall it looks clearly as a downgrade compared
> to virtio-video for our use-case. And I believe it would be the same for
> every developer, that has to actually implement the spec, not just do
> the pass through. So if we think of V4L2 UAPI pass through as a
> compatibility device (which I believe it is), then it is fine to have
> both and keep improving the virtio-video, including taking the best
> ideas from the V4L2 and overall using it as a reference to make writing
> the driver simpler.

Let me jump in here and ask another question:

Imagine that, some years in the future, somebody wants to add a virtio
device for handling video encoding/decoding to their hypervisor.

Option 1: There are different devices to choose from. How is the person
implementing this supposed to pick a device? They might have a narrow
use case, where it is clear which of the devices is the one that needs to
be supported; but they also might have multiple, diverse use cases, and
end up needing to implement all of the devices.

Option 2: There is one device with various optional features. The person
implementing this can start off with a certain subset of features
depending on their expected use cases, and add to it later, if needed;
but the upfront complexity might be too high for specialized use cases.

Leaving concrete references to V4L2 out of the picture, we're currently
trying to decide whether our future will be more like Option 1 or Option
2, with their respective trade-offs.

I'm slightly biased towards Option 2; does it look feasible at all, or
am I missing something essential here? (I had the impression that some
previous confusion had been cleared up; apologies in advance if I'm
misrepresenting things.)

I'd really love to see some kind of consensus for 1.3, if at all
possible :)



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-03 14:04                                       ` Cornelia Huck
@ 2023-05-03 15:11                                         ` Alex Bennée
  2023-05-03 15:53                                           ` Cornelia Huck
  2023-05-05 12:28                                           ` Alexander Gordeev
  2023-05-05 11:54                                         ` Alexander Gordeev
  1 sibling, 2 replies; 97+ messages in thread
From: Alex Bennée @ 2023-05-03 15:11 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Gordeev, Alexandre Courbot, virtio-dev,
	Keiichi Watanabe, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve


Cornelia Huck <cohuck@redhat.com> writes:

> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>
>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>> But in any case, that's irrelevant to the guest-host interface, and I
>>> think a big part of the disagreement stems from the misconception that
>>> V4L2 absolutely needs to be used on either the guest or the host,
>>> which is absolutely not the case.
>>
>> I understand this, of course. I'm arguing, that it is harder to
>> implement it, get it straight and then maintain it over years. Also it
>> brings limitations, that sometimes can be workarounded in the virtio
>> spec, but this always comes at a cost of decreased readability and
>> increased complexity. Overall it looks clearly as a downgrade compared
>> to virtio-video for our use-case. And I believe it would be the same for
>> every developer, that has to actually implement the spec, not just do
>> the pass through. So if we think of V4L2 UAPI pass through as a
>> compatibility device (which I believe it is), then it is fine to have
>> both and keep improving the virtio-video, including taking the best
>> ideas from the V4L2 and overall using it as a reference to make writing
>> the driver simpler.
>
> Let me jump in here and ask another question:
>
> Imagine that, some years in the future, somebody wants to add a virtio
> device for handling video encoding/decoding to their hypervisor.
>
> Option 1: There are different devices to chose from. How is the person
> implementing this supposed to pick a device? They might have a narrow
> use case, where it is clear which of the devices is the one that needs to
> be supported; but they also might have multiple, diverse use cases, and
> end up needing to implement all of the devices.
>
> Option 2: There is one device with various optional features. The person
> implementing this can start off with a certain subset of features
> depending on their expected use cases, and add to it later, if needed;
> but the upfront complexity might be too high for specialized use cases.
>
> Leaving concrete references to V4L2 out of the picture, we're currently
> trying to decide whether our future will be more like Option 1 or Option
> 2, with their respective trade-offs.
>
> I'm slightly biased towards Option 2; does it look feasible at all, or
> am I missing something essential here? (I had the impression that some
> previous confusion had been cleared up; apologies in advance if I'm
> misrepresenting things.)
>
> I'd really love to see some kind of consensus for 1.3, if at all
> possible :)

I think feature discovery and extensibility are a key part of the VirtIO
paradigm, which is why I find the virtio-v4l approach limiting. By
pegging the device to a Linux API we effectively limit the growth of the
device specification to the pace at which the Linux API changes. I'm not
fully immersed in v4l, but I don't think it is seeing many additional
features developed for it, and its limitations for cameras are one of
the reasons functionality is being pushed to userspace in solutions like
libcamera:

  How is libcamera different from V4L2?

  We see libcamera as a continuation of V4L2. One that can more easily
  handle the recent advances in hardware design. As embedded cameras have
  developed, all of the complexity has been pushed on to the developers.
  With libcamera, all of that complexity is simplified and a single model
  is presented to application developers.

That said, it's not entirely outside our experience to have virtio
devices act as simple pipes for some higher-level protocol. The
virtio-gpu spec says very little about the details of how 3D devices
work and simply offers an opaque pipe to push a (potentially
proprietary) command stream to the back end. As far as I'm aware, the
proposals for Vulkan and Wayland device support don't even offer a
feature bit but simply change the graphics stream type in the command
packets.

We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
incompatible with other feature bits, and make that the baseline
implementation, but it's not really in the spirit of what VirtIO is
trying to achieve.
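
Just to illustrate the shape of that idea (the bit name is the one
suggested above; the bit number and the helper functions are entirely
made up for this sketch):

#include <stdint.h>

#define VIRTIO_VIDEO_F_V4L (1ULL << 0) /* hypothetical bit number */

/* Made-up helpers standing for the two incompatible protocol flavours. */
static void speak_v4l2_shaped_protocol(void);
static void speak_native_virtio_video_protocol(void);

/* Driver side: pick the personality according to what was negotiated. */
static void select_protocol(uint64_t negotiated_features)
{
        if (negotiated_features & VIRTIO_VIDEO_F_V4L)
                speak_v4l2_shaped_protocol();
        else
                speak_native_virtio_video_protocol();
}

which is exactly the kind of either/or split that feature bits are not
really meant to express.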

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-03 15:11                                         ` Alex Bennée
@ 2023-05-03 15:53                                           ` Cornelia Huck
  2023-05-05  9:57                                             ` Alexander Gordeev
  2023-05-05 12:28                                           ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Cornelia Huck @ 2023-05-03 15:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Alexander Gordeev, Alexandre Courbot, virtio-dev,
	Keiichi Watanabe, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve

On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:

> Cornelia Huck <cohuck@redhat.com> writes:
>
>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>
>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>> think a big part of the disagreement stems from the misconception that
>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>> which is absolutely not the case.
>>>
>>> I understand this, of course. I'm arguing, that it is harder to
>>> implement it, get it straight and then maintain it over years. Also it
>>> brings limitations, that sometimes can be workarounded in the virtio
>>> spec, but this always comes at a cost of decreased readability and
>>> increased complexity. Overall it looks clearly as a downgrade compared
>>> to virtio-video for our use-case. And I believe it would be the same for
>>> every developer, that has to actually implement the spec, not just do
>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>> compatibility device (which I believe it is), then it is fine to have
>>> both and keep improving the virtio-video, including taking the best
>>> ideas from the V4L2 and overall using it as a reference to make writing
>>> the driver simpler.
>>
>> Let me jump in here and ask another question:
>>
>> Imagine that, some years in the future, somebody wants to add a virtio
>> device for handling video encoding/decoding to their hypervisor.
>>
>> Option 1: There are different devices to chose from. How is the person
>> implementing this supposed to pick a device? They might have a narrow
>> use case, where it is clear which of the devices is the one that needs to
>> be supported; but they also might have multiple, diverse use cases, and
>> end up needing to implement all of the devices.
>>
>> Option 2: There is one device with various optional features. The person
>> implementing this can start off with a certain subset of features
>> depending on their expected use cases, and add to it later, if needed;
>> but the upfront complexity might be too high for specialized use cases.
>>
>> Leaving concrete references to V4L2 out of the picture, we're currently
>> trying to decide whether our future will be more like Option 1 or Option
>> 2, with their respective trade-offs.
>>
>> I'm slightly biased towards Option 2; does it look feasible at all, or
>> am I missing something essential here? (I had the impression that some
>> previous confusion had been cleared up; apologies in advance if I'm
>> misrepresenting things.)
>>
>> I'd really love to see some kind of consensus for 1.3, if at all
>> possible :)
>
> I think feature discovery and extensibility is a key part of the VirtIO
> paradigm which is why I find the virtio-v4l approach limiting. By
> pegging the device to a Linux API we effectively limit the growth of the
> device specification to as fast as the Linux API changes. I'm not fully
> immersed in v4l but I don't think it is seeing any additional features
> developed for it and its limitations for camera are one of the reasons
> stuff is being pushed to userspace in solutions like libcamera:
>
>   How is libcamera different from V4L2?
>
>   We see libcamera as a continuation of V4L2. One that can more easily
>   handle the recent advances in hardware design. As embedded cameras have
>   developed, all of the complexity has been pushed on to the developers.
>   With libcamera, all of that complexity is simplified and a single model
>   is presented to application developers.

Ok, that is interesting; thanks for the information.

>
> That said its not totally our experience to have virtio devices act as
> simple pipes for some higher level protocol. The virtio-gpu spec says
> very little about the details of how 3D devices work and simply offers
> an opaque pipe to push a (potentially propriety) command stream to the
> back end. As far as I'm aware the proposals for Vulkan and Wayland
> device support doesn't even offer a feature bit but simply changes the
> graphics stream type in the command packets.
>
> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
> incompatible with other feature bits and make that the baseline
> implementation but it's not really in the spirit of what VirtIO is
> trying to achieve.

I'd not be in favour of an incompatible feature flag,
either... extensions are good, but conflicting features are something
that I'd like to avoid.

So, given that I'd still prefer to have a single device: How well does
the proposed virtio-video device map to a Linux driver implementation
that hooks into V4L2? If the general process flow is compatible and it
is mostly a question of wiring the parts together, I think pushing that
part of the complexity into the Linux driver is a reasonable
trade-off. Being able to use an existing protocol is nice, but if that
protocol is not perceived as flexible enough, it is probably not worth
encoding it into a spec. (Similar considerations apply to hooking up the
device in the hypervisor.)

Sorry about asking all those basic questions, but I really rely on the
judgment of people familiar with the infrastructure and use cases so
that we end up with a specification that is actually usable in the long
term. Too many details are likely to confuse me :)



* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-03 15:53                                           ` Cornelia Huck
@ 2023-05-05  9:57                                             ` Alexander Gordeev
       [not found]                                               ` <168329085253.1880445.14002473591422425775@Monstersaurus>
  0 siblings, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-05  9:57 UTC (permalink / raw)
  To: Cornelia Huck, Alex Bennée
  Cc: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On 03.05.23 17:53, Cornelia Huck wrote:
> On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>> Cornelia Huck <cohuck@redhat.com> writes:
>>
>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>
>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>> think a big part of the disagreement stems from the misconception that
>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>> which is absolutely not the case.
>>>>
>>>> I understand this, of course. I'm arguing, that it is harder to
>>>> implement it, get it straight and then maintain it over years. Also it
>>>> brings limitations, that sometimes can be workarounded in the virtio
>>>> spec, but this always comes at a cost of decreased readability and
>>>> increased complexity. Overall it looks clearly as a downgrade compared
>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>> every developer, that has to actually implement the spec, not just do
>>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>>> compatibility device (which I believe it is), then it is fine to have
>>>> both and keep improving the virtio-video, including taking the best
>>>> ideas from the V4L2 and overall using it as a reference to make writing
>>>> the driver simpler.
>>>
>>> Let me jump in here and ask another question:
>>>
>>> Imagine that, some years in the future, somebody wants to add a virtio
>>> device for handling video encoding/decoding to their hypervisor.
>>>
>>> Option 1: There are different devices to chose from. How is the person
>>> implementing this supposed to pick a device? They might have a narrow
>>> use case, where it is clear which of the devices is the one that needs to
>>> be supported; but they also might have multiple, diverse use cases, and
>>> end up needing to implement all of the devices.
>>>
>>> Option 2: There is one device with various optional features. The person
>>> implementing this can start off with a certain subset of features
>>> depending on their expected use cases, and add to it later, if needed;
>>> but the upfront complexity might be too high for specialized use cases.
>>>
>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>> trying to decide whether our future will be more like Option 1 or Option
>>> 2, with their respective trade-offs.
>>>
>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>> am I missing something essential here? (I had the impression that some
>>> previous confusion had been cleared up; apologies in advance if I'm
>>> misrepresenting things.)
>>>
>>> I'd really love to see some kind of consensus for 1.3, if at all
>>> possible :)
>>
>> I think feature discovery and extensibility is a key part of the VirtIO
>> paradigm which is why I find the virtio-v4l approach limiting. By
>> pegging the device to a Linux API we effectively limit the growth of the
>> device specification to as fast as the Linux API changes. I'm not fully
>> immersed in v4l but I don't think it is seeing any additional features
>> developed for it and its limitations for camera are one of the reasons
>> stuff is being pushed to userspace in solutions like libcamera:
>>
>>    How is libcamera different from V4L2?
>>
>>    We see libcamera as a continuation of V4L2. One that can more easily
>>    handle the recent advances in hardware design. As embedded cameras have
>>    developed, all of the complexity has been pushed on to the developers.
>>    With libcamera, all of that complexity is simplified and a single model
>>    is presented to application developers.
>
> Ok, that is interesting; thanks for the information.
>
>>
>> That said its not totally our experience to have virtio devices act as
>> simple pipes for some higher level protocol. The virtio-gpu spec says
>> very little about the details of how 3D devices work and simply offers
>> an opaque pipe to push a (potentially propriety) command stream to the
>> back end. As far as I'm aware the proposals for Vulkan and Wayland
>> device support doesn't even offer a feature bit but simply changes the
>> graphics stream type in the command packets.
>>
>> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
>> incompatible with other feature bits and make that the baseline
>> implementation but it's not really in the spirit of what VirtIO is
>> trying to achieve.
>
> I'd not be in favour of an incompatible feature flag,
> either... extensions are good, but conflicting features is something
> that I'd like to avoid.
>
> So, given that I'd still prefer to have a single device: How well does
> the proposed virtio-video device map to a Linux driver implementation
> that hooks into V4L2?

IMO it hooks into V4L2 pretty well, and I'm going to spend the next few
months making the existing driver fully V4L2 compliant. If that goal
requires changing the spec, then we still have time to do so. I don't
expect a lot of problems on this side. There might be problems with
Android using V4L2 in weird ways; well, let's see. Anyway, I think all
of this can be accomplished over time.
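
To give a rough idea of what that hooking looks like on the guest side,
the driver essentially translates the usual videobuf2 callbacks into
virtio commands. Sketch only: struct virtio_video_stream and the
virtio_video_cmd_*() helpers below are placeholders of mine, not actual
driver code; only the vb2 callback names are real kernel interfaces.

#include <media/videobuf2-core.h>

struct virtio_video_stream;

/* Placeholder helpers: build a command, put it on the commandq and wait
 * for its (immediate) completion. */
int virtio_video_cmd_resource_queue(struct virtio_video_stream *s,
                                    struct vb2_buffer *vb);
int virtio_video_cmd_stream_start(struct virtio_video_stream *s);

static void virtio_video_buf_queue(struct vb2_buffer *vb)
{
        struct virtio_video_stream *stream = vb2_get_drv_priv(vb->vb2_queue);

        /* The queue command completes right away; the "buffer done"
         * notification arrives later on the eventq, where it is turned
         * into a vb2_buffer_done() call. */
        virtio_video_cmd_resource_queue(stream, vb);
}

static int virtio_video_start_streaming(struct vb2_queue *q, unsigned int count)
{
        return virtio_video_cmd_stream_start(vb2_get_drv_priv(q));
}

static const struct vb2_ops virtio_video_vb2_ops = {
        .buf_queue       = virtio_video_buf_queue,
        .start_streaming = virtio_video_start_streaming,
        /* .queue_setup, .stop_streaming, etc. omitted */
};

The rest of the V4L2 plumbing is the same kind of glue, which is why I
don't expect a lot of problems on this side.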

> If the general process flow is compatible and it
> is mostly a question of wiring the parts together, I think pushing that
> part of the complexity into the Linux driver is a reasonable
> trade-off. Being able to use an existing protocol is nice, but if that
> protocol is not perceived as flexible enough, it is probably not worth
> encoding it into a spec. (Similar considerations apply to hooking up the
> device in the hypervisor.)

I very much agree with these statements. I think this is how it should
be: we start with a compact but usable device, then add features and
enable them using feature flags. Eventually we can cover all the
use-cases of V4L2, unless we decide to have separate devices for some of
them (virtio-camera, etc.), which I think would be better in the long
term. If everybody wants a single device that much, I think it should be
based on the current virtio-video. The virtio-v4l2 spec can be developed
out of tree for those who need 100% compatibility right now; maybe we
can reference it from virtio-video when it is ready?

> Sorry about asking all those basic questions, but I really rely on the
> judgment of people familiar with the infrastructure and use cases so
> that we end up with a specification that is actually usable in the long
> term. Too many details are likely to confuse me :)

No worries. :) Thank you!

--
Alexander Gordeev
Senior Software Engineer


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-03 14:04                                       ` Cornelia Huck
  2023-05-03 15:11                                         ` Alex Bennée
@ 2023-05-05 11:54                                         ` Alexander Gordeev
  2023-05-08  4:55                                           ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-05 11:54 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On 03.05.23 16:04, Cornelia Huck wrote:
> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>
>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>> But in any case, that's irrelevant to the guest-host interface, and I
>>> think a big part of the disagreement stems from the misconception that
>>> V4L2 absolutely needs to be used on either the guest or the host,
>>> which is absolutely not the case.
>>
>> I understand this, of course. I'm arguing, that it is harder to
>> implement it, get it straight and then maintain it over years. Also it
>> brings limitations, that sometimes can be workarounded in the virtio
>> spec, but this always comes at a cost of decreased readability and
>> increased complexity. Overall it looks clearly as a downgrade compared
>> to virtio-video for our use-case. And I believe it would be the same for
>> every developer, that has to actually implement the spec, not just do
>> the pass through. So if we think of V4L2 UAPI pass through as a
>> compatibility device (which I believe it is), then it is fine to have
>> both and keep improving the virtio-video, including taking the best
>> ideas from the V4L2 and overall using it as a reference to make writing
>> the driver simpler.
>
> Let me jump in here and ask another question:
>
> Imagine that, some years in the future, somebody wants to add a virtio
> device for handling video encoding/decoding to their hypervisor.
>
> Option 1: There are different devices to chose from. How is the person
> implementing this supposed to pick a device? They might have a narrow
> use case, where it is clear which of the devices is the one that needs to
> be supported; but they also might have multiple, diverse use cases, and
> end up needing to implement all of the devices.

I think in this case virtio-v4l2 should be used exclusively as a
compatibility device. That means discouraging any further growth of its
complexity through more patches in the spec. virtio-video should
eventually cover all the use-cases of V4L2, so I think it is reasonable
to use it both in complex use-cases and in simple use-cases where there
is no V4L2 decoder/encoder device on the host.

> Option 2: There is one device with various optional features. The person
> implementing this can start off with a certain subset of features
> depending on their expected use cases, and add to it later, if needed;
> but the upfront complexity might be too high for specialized use cases.

In this case I'd prefer to have the simpler device first, that is, the
current virtio-video, and then add features incrementally using feature
flags, taking the virtualization context into account. V4L2 is a complex
thing from a different context. There have already been attempts to
carve out some of its use-cases, like the stateful decoder/encoder API,
but this work is not finished (struct v4l2_buffer serves as evidence).
It is like dissecting a monolith. It also has to be patched to make it
more appropriate for virtualization (we can see this in Alexandre's PoC
already).

> Leaving concrete references to V4L2 out of the picture, we're currently
> trying to decide whether our future will be more like Option 1 or Option
> 2, with their respective trade-offs.

I'd like to rely on the opinions of people who know more about virtio
development and its goals. I would be happy to present or reiterate my
arguments to anyone interested, if necessary.

> I'm slightly biased towards Option 2; does it look feasible at all, or
> am I missing something essential here? (I had the impression that some
> previous confusion had been cleared up; apologies in advance if I'm
> misrepresenting things.)

Indeed, some of the previous confusion has been cleared up, but not the
key thing. Alexandre still claims, for example, that this patched V4L2
UAPI pass-through is only marginally more complex. I don't agree with
this, and I have evidence that we haven't finished discussing.

> I'd really love to see some kind of consensus for 1.3, if at all
> possible :)

--
Alexander Gordeev
Senior Software Engineer


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-03 15:11                                         ` Alex Bennée
  2023-05-03 15:53                                           ` Cornelia Huck
@ 2023-05-05 12:28                                           ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-05 12:28 UTC (permalink / raw)
  To: Alex Bennée, Cornelia Huck
  Cc: Alexandre Courbot, virtio-dev, Keiichi Watanabe, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On 03.05.23 17:11, Alex Bennée wrote:
>
> Cornelia Huck <cohuck@redhat.com> writes:
>
>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>
>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>> think a big part of the disagreement stems from the misconception that
>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>> which is absolutely not the case.
>>>
>>> I understand this, of course. I'm arguing, that it is harder to
>>> implement it, get it straight and then maintain it over years. Also it
>>> brings limitations, that sometimes can be workarounded in the virtio
>>> spec, but this always comes at a cost of decreased readability and
>>> increased complexity. Overall it looks clearly as a downgrade compared
>>> to virtio-video for our use-case. And I believe it would be the same for
>>> every developer, that has to actually implement the spec, not just do
>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>> compatibility device (which I believe it is), then it is fine to have
>>> both and keep improving the virtio-video, including taking the best
>>> ideas from the V4L2 and overall using it as a reference to make writing
>>> the driver simpler.
>>
>> Let me jump in here and ask another question:
>>
>> Imagine that, some years in the future, somebody wants to add a virtio
>> device for handling video encoding/decoding to their hypervisor.
>>
>> Option 1: There are different devices to chose from. How is the person
>> implementing this supposed to pick a device? They might have a narrow
>> use case, where it is clear which of the devices is the one that needs to
>> be supported; but they also might have multiple, diverse use cases, and
>> end up needing to implement all of the devices.
>>
>> Option 2: There is one device with various optional features. The person
>> implementing this can start off with a certain subset of features
>> depending on their expected use cases, and add to it later, if needed;
>> but the upfront complexity might be too high for specialized use cases.
>>
>> Leaving concrete references to V4L2 out of the picture, we're currently
>> trying to decide whether our future will be more like Option 1 or Option
>> 2, with their respective trade-offs.
>>
>> I'm slightly biased towards Option 2; does it look feasible at all, or
>> am I missing something essential here? (I had the impression that some
>> previous confusion had been cleared up; apologies in advance if I'm
>> misrepresenting things.)
>>
>> I'd really love to see some kind of consensus for 1.3, if at all
>> possible :)
>
> I think feature discovery and extensibility is a key part of the VirtIO
> paradigm which is why I find the virtio-v4l approach limiting. By
> pegging the device to a Linux API we effectively limit the growth of the
> device specification to as fast as the Linux API changes. I'm not fully
> immersed in v4l but I don't think it is seeing any additional features
> developed for it and its limitations for camera are one of the reasons
> stuff is being pushed to userspace in solutions like libcamera:
>
>    How is libcamera different from V4L2?
>
>    We see libcamera as a continuation of V4L2. One that can more easily
>    handle the recent advances in hardware design. As embedded cameras have
>    developed, all of the complexity has been pushed on to the developers.
>    With libcamera, all of that complexity is simplified and a single model
>    is presented to application developers.
>
> That said its not totally our experience to have virtio devices act as
> simple pipes for some higher level protocol. The virtio-gpu spec says
> very little about the details of how 3D devices work and simply offers
> an opaque pipe to push a (potentially propriety) command stream to the
> back end. As far as I'm aware the proposals for Vulkan and Wayland
> device support doesn't even offer a feature bit but simply changes the
> graphics stream type in the command packets.

I'd like to note that, AFAIU, virtio-v4l2 is not going to be a simple
pipe for the V4L2 UAPI anyway, because the virtio spec is going to carry
some patches to the forwarded protocol. AFAIK virtio-gpu doesn't do
this.

> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
> incompatible with other feature bits and make that the baseline
> implementation but it's not really in the spirit of what VirtIO is
> trying to achieve.

Thank you for your input!

--
Alexander Gordeev
Senior Software Engineer


* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
       [not found]                                               ` <168329085253.1880445.14002473591422425775@Monstersaurus>
@ 2023-05-05 15:55                                                 ` Alex Bennée
       [not found]                                                   ` <20230506081229.GA8114@pendragon.ideasonboard.com>
  2023-05-16 12:57                                                   ` Alexander Gordeev
  0 siblings, 2 replies; 97+ messages in thread
From: Alex Bennée @ 2023-05-05 15:55 UTC (permalink / raw)
  To: Kieran Bingham
  Cc: Alexander Gordeev, Cornelia Huck, Alexandre Courbot, virtio-dev,
	Keiichi Watanabe, Marcin Wojtas, Matti Möll, Andrew Gazizov,
	Enrico Granata, Gustavo Padovan, Peter Griffin,
	Bartłomiej Grzesik, Tomasz Figa, Daniel Almeida,
	Enric Balletbo i Serra, Albert Esteve, libcamera-devel


Kieran Bingham <kieran.bingham@ideasonboard.com> writes:

> Hi All,
>
> Coming in late, thanks to lei/lore spotting the libcamera keyword.
>
> + Cc: libcamera-devel to raise awareness of the discussion there.
>
> Quoting Alexander Gordeev (2023-05-05 10:57:29)
>> On 03.05.23 17:53, Cornelia Huck wrote:
>> > On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
>> >
>> >> Cornelia Huck <cohuck@redhat.com> writes:
>> >>
>> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>> >>>
>> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
>> >>>>> think a big part of the disagreement stems from the misconception that
>> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
>> >>>>> which is absolutely not the case.
>> >>>>
>> >>>> I understand this, of course. I'm arguing, that it is harder to
>> >>>> implement it, get it straight and then maintain it over years. Also it
>> >>>> brings limitations, that sometimes can be workarounded in the virtio
>> >>>> spec, but this always comes at a cost of decreased readability and
>> >>>> increased complexity. Overall it looks clearly as a downgrade compared
>> >>>> to virtio-video for our use-case. And I believe it would be the same for
>> >>>> every developer, that has to actually implement the spec, not just do
>> >>>> the pass through. So if we think of V4L2 UAPI pass through as a
>> >>>> compatibility device (which I believe it is), then it is fine to have
>> >>>> both and keep improving the virtio-video, including taking the best
>> >>>> ideas from the V4L2 and overall using it as a reference to make writing
>> >>>> the driver simpler.
>> >>>
>> >>> Let me jump in here and ask another question:
>> >>>
>> >>> Imagine that, some years in the future, somebody wants to add a virtio
>> >>> device for handling video encoding/decoding to their hypervisor.
>> >>>
>> >>> Option 1: There are different devices to chose from. How is the person
>> >>> implementing this supposed to pick a device? They might have a narrow
>> >>> use case, where it is clear which of the devices is the one that needs to
>> >>> be supported; but they also might have multiple, diverse use cases, and
>> >>> end up needing to implement all of the devices.
>> >>>
>> >>> Option 2: There is one device with various optional features. The person
>> >>> implementing this can start off with a certain subset of features
>> >>> depending on their expected use cases, and add to it later, if needed;
>> >>> but the upfront complexity might be too high for specialized use cases.
>> >>>
>> >>> Leaving concrete references to V4L2 out of the picture, we're currently
>> >>> trying to decide whether our future will be more like Option 1 or Option
>> >>> 2, with their respective trade-offs.
>> >>>
>> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
>> >>> am I missing something essential here? (I had the impression that some
>> >>> previous confusion had been cleared up; apologies in advance if I'm
>> >>> misrepresenting things.)
>> >>>
>> >>> I'd really love to see some kind of consensus for 1.3, if at all
>> >>> possible :)
>> >>
>> >> I think feature discovery and extensibility is a key part of the VirtIO
>> >> paradigm which is why I find the virtio-v4l approach limiting. By
>> >> pegging the device to a Linux API we effectively limit the growth of the
>> >> device specification to as fast as the Linux API changes. I'm not fully
>> >> immersed in v4l but I don't think it is seeing any additional features
>> >> developed for it and its limitations for camera are one of the reasons
>> >> stuff is being pushed to userspace in solutions like libcamera:
>> >>
>> >>    How is libcamera different from V4L2?
>> >>
>> >>    We see libcamera as a continuation of V4L2. One that can more easily
>> >>    handle the recent advances in hardware design. As embedded cameras have
>> >>    developed, all of the complexity has been pushed on to the developers.
>> >>    With libcamera, all of that complexity is simplified and a single model
>> >>    is presented to application developers.
>> >
>> > Ok, that is interesting; thanks for the information.
>> >
>> >>
>> >> That said its not totally our experience to have virtio devices act as
>> >> simple pipes for some higher level protocol. The virtio-gpu spec says
>> >> very little about the details of how 3D devices work and simply offers
>> >> an opaque pipe to push a (potentially propriety) command stream to the
>> >> back end. As far as I'm aware the proposals for Vulkan and Wayland
>> >> device support doesn't even offer a feature bit but simply changes the
>> >> graphics stream type in the command packets.
>> >>
>> >> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
>> >> incompatible with other feature bits and make that the baseline
>> >> implementation but it's not really in the spirit of what VirtIO is
>> >> trying to achieve.
>> >
>> > I'd not be in favour of an incompatible feature flag,
>> > either... extensions are good, but conflicting features is something
>> > that I'd like to avoid.
>> >
>> > So, given that I'd still prefer to have a single device: How well does
>> > the proposed virtio-video device map to a Linux driver implementation
>> > that hooks into V4L2?
>> 
>> IMO it hooks into V4L2 pretty well. And I'm going to spend next few
>> months making the existing driver fully V4L2 compliant. If this goal
>> requires changing the spec, than we still have time to do that. I don't
>> expect a lot of problems on this side. There might be problems with
>> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
>> of this can be accomplished over time.
>> 
>> > If the general process flow is compatible and it
>> > is mostly a question of wiring the parts together, I think pushing that
>> > part of the complexity into the Linux driver is a reasonable
>> > trade-off. Being able to use an existing protocol is nice, but if that
>> > protocol is not perceived as flexible enough, it is probably not worth
>> > encoding it into a spec. (Similar considerations apply to hooking up the
>> > device in the hypervisor.)
>> 
>> I very much agree with these statements. I think this is how it should
>> be: we start with a compact but usable device, then add features and
>> enable them using feature flags. Eventually we can cover all the
>> use-cases of V4L2 unless we decide to have separate devices for them
>> (virtio-camera, etc). This would be better in the long term I think.
>
> Camera's definitely have their quirks - mostly because many usecases are
> hard to convey over a single Video device node (with the hardware) but I
> think we might expect that complexity to be managed by the host, and
> probably offer a ready made stream to the guest. Of course how to handle
> multiple streams and configuration of the whole pipeline may get more
> difficult and warrant a specific 'virtio-camera' ... but I would think
> the basics could be covered generically to start with.
>
> It's not clear who's driving this implementation and spec, so I guess
> there's more reading to do.
>
> Anyway, I've added Cc libcamera-devel to raise awareness of this topic
> to camera list.
>
> I bet Laurent has some stronger opinions on how he'd see camera's exist
> in a virtio space.

Personally I would rather see a separate virtio-camera specification
that properly encapsulates all the various use cases we have for
cameras. In many ways just processing a stream of video is a much
simpler use case.

During Linaro's Project Stratos we got a lot of feedback from members
who professed interest in a virtio-camera initiative. However, we were
unable to get enough engineering resources from the various companies to
collaborate on a specification that would meet everyone's needs. The
problem space is wide, ranging from the numerous black-and-white sensor
cameras on cars to the full-on computational photography exposed by
modern camera systems on phones. If you want to read more on the topic,
I wrote a blog post at the time:

  https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/

Back to the topic of virtio-video: as I understand it, the principal
features/configurations are:

  - All the various CODECs, resolutions and pixel formats
  - Stateful vs stateless streams
  - Whether we want to support grabbing single frames from a source

My main concern about the V4L approach is that it pegs updates to the
interface to the continuing evolution of the V4L interface in Linux. Now
maybe video is a solved problem and there won't be (m)any new features
we need to add after the initial revision. However, I'm not a domain
expert here, so I just don't know.
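
Whatever shape the device ends up taking, I'd expect those items to be
discoverable up front, something along the lines of the following
(a completely invented layout, just to show what I mean by feature
discovery, not anything taken from the spec):

struct virtio_video_caps {      /* hypothetical */
        le32 num_coded_formats; /* CODECs the device can handle */
        le32 num_raw_formats;   /* pixel formats for raw frames */
        le32 max_width;         /* largest supported resolution */
        le32 max_height;
        le32 flags;             /* e.g. stateful vs stateless operation */
};

so that a guest can tell what it is talking to before committing to a
particular mode of operation.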

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


* Re: [libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
       [not found]                                                   ` <20230506081229.GA8114@pendragon.ideasonboard.com>
@ 2023-05-06  8:16                                                     ` Laurent Pinchart
  2023-05-08  8:00                                                         ` Alexandre Courbot
  2023-05-16 13:50                                                         ` Alexander Gordeev
  2023-05-17  3:58                                                     ` [virtio-dev] " Tomasz Figa
  1 sibling, 2 replies; 97+ messages in thread
From: Laurent Pinchart @ 2023-05-06  8:16 UTC (permalink / raw)
  To: Alex Bennée, virtio-dev, Albert Esteve, Matti Möll,
	Alexandre Courbot, Andrew Gazizov, Daniel Almeida, Cornelia Huck,
	Marcin Wojtas, Keiichi Watanabe, Gustavo Padovan,
	Alexander Gordeev, libcamera-devel, Bartłomiej Grzesik,
	Enrico Granata, Enric Balletbo i Serra, linux-media

I'm also CC'ing the linux-media@vger.kernel.org mailing list for these
discussions; I'm sure there are folks there who are interested in codec
and camera virtualization.

On Sat, May 06, 2023 at 11:12:29AM +0300, Laurent Pinchart via libcamera-devel wrote:
> On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
> > Kieran Bingham writes:
> > > Quoting Alexander Gordeev (2023-05-05 10:57:29)
> > >> On 03.05.23 17:53, Cornelia Huck wrote:
> > >> > On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
> > >> >> Cornelia Huck <cohuck@redhat.com> writes:
> > >> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> > >> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> > >> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
> > >> >>>>> think a big part of the disagreement stems from the misconception that
> > >> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
> > >> >>>>> which is absolutely not the case.
> > >> >>>>
> > >> >>>> I understand this, of course. I'm arguing, that it is harder to
> > >> >>>> implement it, get it straight and then maintain it over years. Also it
> > >> >>>> brings limitations, that sometimes can be workarounded in the virtio
> > >> >>>> spec, but this always comes at a cost of decreased readability and
> > >> >>>> increased complexity. Overall it looks clearly as a downgrade compared
> > >> >>>> to virtio-video for our use-case. And I believe it would be the same for
> > >> >>>> every developer, that has to actually implement the spec, not just do
> > >> >>>> the pass through. So if we think of V4L2 UAPI pass through as a
> > >> >>>> compatibility device (which I believe it is), then it is fine to have
> > >> >>>> both and keep improving the virtio-video, including taking the best
> > >> >>>> ideas from the V4L2 and overall using it as a reference to make writing
> > >> >>>> the driver simpler.
> > >> >>>
> > >> >>> Let me jump in here and ask another question:
> > >> >>>
> > >> >>> Imagine that, some years in the future, somebody wants to add a virtio
> > >> >>> device for handling video encoding/decoding to their hypervisor.
> > >> >>>
> > >> >>> Option 1: There are different devices to chose from. How is the person
> > >> >>> implementing this supposed to pick a device? They might have a narrow
> > >> >>> use case, where it is clear which of the devices is the one that needs to
> > >> >>> be supported; but they also might have multiple, diverse use cases, and
> > >> >>> end up needing to implement all of the devices.
> > >> >>>
> > >> >>> Option 2: There is one device with various optional features. The person
> > >> >>> implementing this can start off with a certain subset of features
> > >> >>> depending on their expected use cases, and add to it later, if needed;
> > >> >>> but the upfront complexity might be too high for specialized use cases.
> > >> >>>
> > >> >>> Leaving concrete references to V4L2 out of the picture, we're currently
> > >> >>> trying to decide whether our future will be more like Option 1 or Option
> > >> >>> 2, with their respective trade-offs.
> > >> >>>
> > >> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
> > >> >>> am I missing something essential here? (I had the impression that some
> > >> >>> previous confusion had been cleared up; apologies in advance if I'm
> > >> >>> misrepresenting things.)
> > >> >>>
> > >> >>> I'd really love to see some kind of consensus for 1.3, if at all
> > >> >>> possible :)
> > >> >>
> > >> >> I think feature discovery and extensibility is a key part of the VirtIO
> > >> >> paradigm which is why I find the virtio-v4l approach limiting. By
> > >> >> pegging the device to a Linux API we effectively limit the growth of the
> > >> >> device specification to as fast as the Linux API changes. I'm not fully
> > >> >> immersed in v4l but I don't think it is seeing any additional features
> > >> >> developed for it and its limitations for camera are one of the reasons
> > >> >> stuff is being pushed to userspace in solutions like libcamera:
> > >> >>
> > >> >>    How is libcamera different from V4L2?
> > >> >>
> > >> >>    We see libcamera as a continuation of V4L2. One that can more easily
> > >> >>    handle the recent advances in hardware design. As embedded cameras have
> > >> >>    developed, all of the complexity has been pushed on to the developers.
> > >> >>    With libcamera, all of that complexity is simplified and a single model
> > >> >>    is presented to application developers.
> > >> >
> > >> > Ok, that is interesting; thanks for the information.
> > >> >
> > >> >>
> > >> >> That said, it's not totally outside our experience to have virtio devices act as
> > >> >> simple pipes for some higher level protocol. The virtio-gpu spec says
> > >> >> very little about the details of how 3D devices work and simply offers
> > >> >> an opaque pipe to push a (potentially proprietary) command stream to the
> > >> >> back end. As far as I'm aware the proposals for Vulkan and Wayland
> > >> >> device support don't even offer a feature bit but simply change the
> > >> >> graphics stream type in the command packets.
> > >> >>
> > >> >> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
> > >> >> incompatible with other feature bits and make that the baseline
> > >> >> implementation but it's not really in the spirit of what VirtIO is
> > >> >> trying to achieve.
> > >> >
> > >> > I'd not be in favour of an incompatible feature flag,
> > >> > either... extensions are good, but conflicting features is something
> > >> > that I'd like to avoid.
> > >> >
> > >> > So, given that I'd still prefer to have a single device: How well does
> > >> > the proposed virtio-video device map to a Linux driver implementation
> > >> > that hooks into V4L2?
> > >> 
> > >> IMO it hooks into V4L2 pretty well. And I'm going to spend the next few
> > >> months making the existing driver fully V4L2 compliant. If this goal
> > >> requires changing the spec, then we still have time to do that. I don't
> > >> expect a lot of problems on this side. There might be problems with
> > >> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
> > >> of this can be accomplished over time.
> > >> 
> > >> > If the general process flow is compatible and it
> > >> > is mostly a question of wiring the parts together, I think pushing that
> > >> > part of the complexity into the Linux driver is a reasonable
> > >> > trade-off. Being able to use an existing protocol is nice, but if that
> > >> > protocol is not perceived as flexible enough, it is probably not worth
> > >> > encoding it into a spec. (Similar considerations apply to hooking up the
> > >> > device in the hypervisor.)
> > >> 
> > >> I very much agree with these statements. I think this is how it should
> > >> be: we start with a compact but usable device, then add features and
> > >> enable them using feature flags. Eventually we can cover all the
> > >> use-cases of V4L2 unless we decide to have separate devices for them
> > >> (virtio-camera, etc). This would be better in the long term I think.
> > >
> > > Cameras definitely have their quirks - mostly because many use cases are
> > > hard to convey over a single Video device node (with the hardware) but I
> > > think we might expect that complexity to be managed by the host, and
> > > probably offer a ready made stream to the guest. Of course how to handle
> > > multiple streams and configuration of the whole pipeline may get more
> > > difficult and warrant a specific 'virtio-camera' ... but I would think
> > > the basics could be covered generically to start with.
> > >
> > > It's not clear who's driving this implementation and spec, so I guess
> > > there's more reading to do.
> > >
> > > Anyway, I've added Cc libcamera-devel to raise awareness of this topic
> > > to camera list.
> > >
> > > I bet Laurent has some stronger opinions on how he'd see cameras exist
> > > in a virtio space.
> 
> You seem to think I have strong opinions about everything. This may not
> be a completely unfounded assumption ;-)
> 
> Overall I agree with you, I think cameras are too complex for a
> low-level virtualization protocol. I'd rather see a high-level protocol
> that exposes webcam-like devices, with the low-level complexity handled
> on the host side (using libcamera of course ;-)). This would support use
> cases that require sharing hardware blocks between multiple logical
> cameras, including sharing the same camera streams between multiple
> guests.
> 
> If a guest needs low-level access to the camera, including the ability
> to control the raw camera sensor or ISP, then I'd recommend passing the
> corresponding hardware blocks to the guest for exclusive access.
> 
> > Personally I would rather see a separate virtio-camera specification
> > that properly encapsulates all the various use cases we have for
> > cameras. In many ways just processing a stream of video is a much
> > simpler use case.
> > 
> > During Linaro's Project Stratos we got a lot of feedback from members
> > who professed interest in a virtio-camera initiative. However we were
> > unable to get enough engineering resources from the various companies to
> > collaborate in developing a specification that would meet everyone's
> > needs. The problem space is wide from having numerous black and white
> > sensor cameras on cars to the full on computational photography as
> > exposed by modern camera systems on phones. If you want to read more
> > words on the topic I wrote a blog post at the time:
> > 
> >   https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
> > 
> > Back to the topic of virtio-video: as I understand it, the principal
> > features/configurations are:
> > 
> >   - All the various CODECs, resolutions and pixel formats
> >   - Stateful vs Stateless streams
> >   - Whether we want to support grabbing single frames from a source
> > 
> > My main concern about the V4L approach is that it pegs updates to the
> > interface to the continuing evolution of the V4L interface in Linux. Now
> > maybe video is a solved problem and there won't be (m)any new features
> > we need to add after the initial revision. However I'm not a domain
> > expert here so I just don't know.
> 
> I've briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
> when we got a chance to meet face to face. I think the V4L2 kernel API
> is quite a good fit, in the sense that its level of abstraction works well
> when applied to video codecs and "simple" cameras (defined, more or less, as
> something resembling a USB webcam feature-wise). It doesn't mean that
> the virtio-video or virtio-camera specifications should necessarily
> reference V4L2 or use the exact same vocabulary, they could simply copy
> the concepts, and stay loosely-coupled with V4L2 in the sense that both
> specifications should try to evolve in compatible directions.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-05 11:54                                         ` Alexander Gordeev
@ 2023-05-08  4:55                                           ` Alexandre Courbot
  2023-05-11  8:50                                             ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-05-08  4:55 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 03.05.23 16:04, Cornelia Huck wrote:
> > On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >
> >> On 27.04.23 15:16, Alexandre Courbot wrote:
> >>> But in any case, that's irrelevant to the guest-host interface, and I
> >>> think a big part of the disagreement stems from the misconception that
> >>> V4L2 absolutely needs to be used on either the guest or the host,
> >>> which is absolutely not the case.
> >>
> >> I understand this, of course. I'm arguing that it is harder to
> >> implement it, get it straight and then maintain it over years. Also it
> >> brings limitations that can sometimes be worked around in the virtio
> >> spec, but this always comes at a cost of decreased readability and
> >> increased complexity. Overall it clearly looks like a downgrade compared
> >> to virtio-video for our use-case. And I believe it would be the same for
> >> every developer who has to actually implement the spec, not just do
> >> the pass through. So if we think of V4L2 UAPI pass through as a
> >> compatibility device (which I believe it is), then it is fine to have
> >> both and keep improving the virtio-video, including taking the best
> >> ideas from the V4L2 and overall using it as a reference to make writing
> >> the driver simpler.
> >
> > Let me jump in here and ask another question:
> >
> > Imagine that, some years in the future, somebody wants to add a virtio
> > device for handling video encoding/decoding to their hypervisor.
> >
> > Option 1: There are different devices to choose from. How is the person
> > implementing this supposed to pick a device? They might have a narrow
> > use case, where it is clear which of the devices is the one that needs to
> > be supported; but they also might have multiple, diverse use cases, and
> > end up needing to implement all of the devices.
>
> I think in this case virtio-v4l2 should be used as a compatibility
> device exclusively. This means discouraging increasing its complexity
> even more with more patches in the spec. virtio-video should eventually
> cover all the use-cases of V4L2, so I think it is reasonable to use it
> in both complex use-cases and in simple use-cases, where there is no
> decoder/encoder V4L2 device on the host.
>
> > Option 2: There is one device with various optional features. The person
> > implementing this can start off with a certain subset of features
> > depending on their expected use cases, and add to it later, if needed;
> > but the upfront complexity might be too high for specialized use cases.

I don't see that many negotiable features we can provide for a
decoder/encoder device - at least not many that are not considered
basic (like guest buffers). In terms of provided features for codecs
virtio-video and virtio-v4l2 are essentially equivalent.

The question is more: do we want a decoder/encoder specification and
another one for cameras or do we want something that covers video
devices in general? And is it desirable to reuse an already existing
protocol for communication between a non-privileged entity and a
privileged one (V4L2) or define our own?

About the latter point, Alex Bennée mentioned that it was difficult to
find the engineering time to define virtio-camera. And virtio-video
has also not been particularly fast to come together. virtio-v4l2
basically serves us both on a plate for a lower effort.

>
> In this case I'd prefer to have the simpler device first, that is the
> current virtio-video, then to add features incrementally using feature
> flags and taking into account the virtualization context. V4L2 is a
> complex thing from a different context. They already tried to carve out
> some of the use-cases like stateful decoder/encoder API, but this work
> is not finished (struct v4l2_buffer can serve as evidence). This is
> like dissecting a monolith. Also it has to be patched to make it more
> appropriate for virtualization (we can see this in Alexandre's PoC already).
>
> > Leaving concrete references to V4L2 out of the picture, we're currently
> > trying to decide whether our future will be more like Option 1 or Option
> > 2, with their respective trade-offs.
>
> I'd like to rely on opinions of people who know more about virtio
> development and goals. I would be happy to present or reiterate my
> arguments to anyone interested if necessary.
>
> > I'm slightly biased towards Option 2; does it look feasible at all, or
> > am I missing something essential here? (I had the impression that some
> > previous confusion had been cleared up; apologies in advance if I'm
> > misrepresenting things.)
>
> Indeed some of the previous confusion has been cleared up. But not the
> key thing. Alexandre still claims that this patched V4L2 UAPI pass
> through is only marginally more complex, for example. I don't agree with
> this and I have evidence. We haven't finished discussing this evidence.

Are you talking about v4l2_buffer?

https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/buffer.html#struct-v4l2-buffer

I think you implied that some of its fields were not relevant for
video decoding or encoding, which, if you examine them, is again
incorrect. That also answers your question of why the stateful decoder
spec did not mention the valid fields - because it is documented on
this page, which tells exactly which fields the driver/device are
expected to set for each queue.
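
To make that concrete, here is a minimal sketch (illustrative only, not the
code posted earlier) of queueing one bitstream buffer on a stateful decoder's
OUTPUT queue, assuming V4L2_MEMORY_MMAP and the multi-planar API; only the
fields relevant to that queue are filled:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Queue one buffer containing `bytesused` bytes of bitstream on the
 * OUTPUT (bitstream) queue of a stateful decoder. */
int queue_bitstream_buffer(int fd, unsigned int index, unsigned int bytesused)
{
        struct v4l2_buffer buf;
        struct v4l2_plane plane;

        memset(&buf, 0, sizeof(buf));
        memset(&plane, 0, sizeof(plane));

        plane.bytesused = bytesused;    /* amount of valid data in the plane */

        buf.index = index;              /* slot obtained via VIDIOC_REQBUFS */
        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.m.planes = &plane;
        buf.length = 1;                 /* number of planes */

        return ioctl(fd, VIDIOC_QBUF, &buf);
}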

Also I think I mentioned it before, but it would be really sweet if
you could stop mischaracterizing virtio-v4l2 as a "passthrough" since
it is very much not that, much like virgl and venus aren't either.



^ permalink raw reply	[flat|nested] 97+ messages in thread

* [virtio-dev] Re: [libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-06  8:16                                                     ` [libcamera-devel] " Laurent Pinchart
@ 2023-05-08  8:00                                                         ` Alexandre Courbot
  2023-05-16 13:50                                                         ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexandre Courbot @ 2023-05-08  8:00 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Alex Bennée, virtio-dev, Albert Esteve, Matti Möll,
	Andrew Gazizov, Daniel Almeida, Cornelia Huck, Marcin Wojtas,
	Keiichi Watanabe, Gustavo Padovan, Alexander Gordeev,
	libcamera-devel, Bartłomiej Grzesik, Enrico Granata,
	Enric Balletbo i Serra, linux-media

Just to add some context for linux-media@, as I think it may be
missing from the quoted thread:

The virtio-video specification has been dragging for quite some time,
and the more it progresses the more it starts looking like V4L2 with
different names. So I suggested that we just encapsulate V4L2 syscalls
into virtio descriptors and basically use the V4L2 model for video
device virtualization. The benefits would be a much shorter
virtio-video specification, and support for other kinds of V4L2
devices like cameras.

I tried to write a quick prototype to test the idea and it works well
enough to expose a USB webcam or the vicodec decoder/encoder using the
same guest driver:

https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c

(driver not supposed to be upstreamed as-is; it was quickly put
together to check whether the idea could fly).
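
To give a rough idea of what encapsulating V4L2 syscalls into virtio
descriptors could look like, here is a purely hypothetical sketch; the struct
and field names are invented for illustration and are not taken from the
prototype driver or any spec draft:

#include <linux/types.h>

/* Hypothetical request header placed in a device-readable descriptor,
 * followed by the ioctl payload (e.g. a struct v4l2_buffer); the device
 * would return the updated payload and the ioctl result in a
 * device-writable descriptor. */
struct virtio_v4l2_ioctl_req {
        __le32 session_id;      /* guest-side handle for the open device */
        __le32 code;            /* V4L2 ioctl code, e.g. VIDIOC_QBUF */
};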

It would be interesting to hear what the V4L2 maintainers and active
contributors think of this idea. IMHO it provides much more bang for
the buck than having to write new specifications for codec and camera
virtualization, but there are arguments that V4L2 would be too complex
for virtualizing video codecs, and is overall not specified as
precisely as virtio-video would be.

On Sat, May 6, 2023 at 5:16 PM Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> I'm also CC'ing the linux-media@vger.kernel.org mailing list for these
> discussions, I'm sure there are folks there who are interested in codec
> and camera virtualization.
>
> On Sat, May 06, 2023 at 11:12:29AM +0300, Laurent Pinchart via libcamera-devel wrote:
> > On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
> > > Kieran Bingham writes:
> > > > Quoting Alexander Gordeev (2023-05-05 10:57:29)
> > > >> On 03.05.23 17:53, Cornelia Huck wrote:
> > > >> > On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
> > > >> >> Cornelia Huck <cohuck@redhat.com> writes:
> > > >> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> > > >> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> > > >> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
> > > >> >>>>> think a big part of the disagreement stems from the misconception that
> > > >> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
> > > >> >>>>> which is absolutely not the case.
> > > >> >>>>
> > > >> >>>> I understand this, of course. I'm arguing that it is harder to
> > > >> >>>> implement it, get it straight and then maintain it over years. Also it
> > > >> >>>> brings limitations that can sometimes be worked around in the virtio
> > > >> >>>> spec, but this always comes at a cost of decreased readability and
> > > >> >>>> increased complexity. Overall it clearly looks like a downgrade compared
> > > >> >>>> to virtio-video for our use-case. And I believe it would be the same for
> > > >> >>>> every developer who has to actually implement the spec, not just do
> > > >> >>>> the pass through. So if we think of V4L2 UAPI pass through as a
> > > >> >>>> compatibility device (which I believe it is), then it is fine to have
> > > >> >>>> both and keep improving the virtio-video, including taking the best
> > > >> >>>> ideas from the V4L2 and overall using it as a reference to make writing
> > > >> >>>> the driver simpler.
> > > >> >>>
> > > >> >>> Let me jump in here and ask another question:
> > > >> >>>
> > > >> >>> Imagine that, some years in the future, somebody wants to add a virtio
> > > >> >>> device for handling video encoding/decoding to their hypervisor.
> > > >> >>>
> > > >> >>> Option 1: There are different devices to choose from. How is the person
> > > >> >>> implementing this supposed to pick a device? They might have a narrow
> > > >> >>> use case, where it is clear which of the devices is the one that needs to
> > > >> >>> be supported; but they also might have multiple, diverse use cases, and
> > > >> >>> end up needing to implement all of the devices.
> > > >> >>>
> > > >> >>> Option 2: There is one device with various optional features. The person
> > > >> >>> implementing this can start off with a certain subset of features
> > > >> >>> depending on their expected use cases, and add to it later, if needed;
> > > >> >>> but the upfront complexity might be too high for specialized use cases.
> > > >> >>>
> > > >> >>> Leaving concrete references to V4L2 out of the picture, we're currently
> > > >> >>> trying to decide whether our future will be more like Option 1 or Option
> > > >> >>> 2, with their respective trade-offs.
> > > >> >>>
> > > >> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
> > > >> >>> am I missing something essential here? (I had the impression that some
> > > >> >>> previous confusion had been cleared up; apologies in advance if I'm
> > > >> >>> misrepresenting things.)
> > > >> >>>
> > > >> >>> I'd really love to see some kind of consensus for 1.3, if at all
> > > >> >>> possible :)
> > > >> >>
> > > >> >> I think feature discovery and extensibility is a key part of the VirtIO
> > > >> >> paradigm which is why I find the virtio-v4l approach limiting. By
> > > >> >> pegging the device to a Linux API we effectively limit the growth of the
> > > >> >> device specification to as fast as the Linux API changes. I'm not fully
> > > >> >> immersed in v4l but I don't think it is seeing any additional features
> > > >> >> developed for it and its limitations for camera are one of the reasons
> > > >> >> stuff is being pushed to userspace in solutions like libcamera:
> > > >> >>
> > > >> >>    How is libcamera different from V4L2?
> > > >> >>
> > > >> >>    We see libcamera as a continuation of V4L2. One that can more easily
> > > >> >>    handle the recent advances in hardware design. As embedded cameras have
> > > >> >>    developed, all of the complexity has been pushed on to the developers.
> > > >> >>    With libcamera, all of that complexity is simplified and a single model
> > > >> >>    is presented to application developers.
> > > >> >
> > > >> > Ok, that is interesting; thanks for the information.
> > > >> >
> > > >> >>
> > > >> >> That said, it's not totally outside our experience to have virtio devices act as
> > > >> >> simple pipes for some higher level protocol. The virtio-gpu spec says
> > > >> >> very little about the details of how 3D devices work and simply offers
> > > >> >> an opaque pipe to push a (potentially proprietary) command stream to the
> > > >> >> back end. As far as I'm aware the proposals for Vulkan and Wayland
> > > >> >> device support don't even offer a feature bit but simply change the
> > > >> >> graphics stream type in the command packets.
> > > >> >>
> > > >> >> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
> > > >> >> incompatible with other feature bits and make that the baseline
> > > >> >> implementation but it's not really in the spirit of what VirtIO is
> > > >> >> trying to achieve.
> > > >> >
> > > >> > I'd not be in favour of an incompatible feature flag,
> > > >> > either... extensions are good, but conflicting features is something
> > > >> > that I'd like to avoid.
> > > >> >
> > > >> > So, given that I'd still prefer to have a single device: How well does
> > > >> > the proposed virtio-video device map to a Linux driver implementation
> > > >> > that hooks into V4L2?
> > > >>
> > > >> IMO it hooks into V4L2 pretty well. And I'm going to spend the next few
> > > >> months making the existing driver fully V4L2 compliant. If this goal
> > > >> requires changing the spec, then we still have time to do that. I don't
> > > >> expect a lot of problems on this side. There might be problems with
> > > >> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
> > > >> of this can be accomplished over time.
> > > >>
> > > >> > If the general process flow is compatible and it
> > > >> > is mostly a question of wiring the parts together, I think pushing that
> > > >> > part of the complexity into the Linux driver is a reasonable
> > > >> > trade-off. Being able to use an existing protocol is nice, but if that
> > > >> > protocol is not perceived as flexible enough, it is probably not worth
> > > >> > encoding it into a spec. (Similar considerations apply to hooking up the
> > > >> > device in the hypervisor.)
> > > >>
> > > >> I very much agree with these statements. I think this is how it should
> > > >> be: we start with a compact but usable device, then add features and
> > > >> enable them using feature flags. Eventually we can cover all the
> > > >> use-cases of V4L2 unless we decide to have separate devices for them
> > > >> (virtio-camera, etc). This would be better in the long term I think.
> > > >
> > > > Cameras definitely have their quirks - mostly because many use cases are
> > > > hard to convey over a single Video device node (with the hardware) but I
> > > > think we might expect that complexity to be managed by the host, and
> > > > probably offer a ready made stream to the guest. Of course how to handle
> > > > multiple streams and configuration of the whole pipeline may get more
> > > > difficult and warrant a specific 'virtio-camera' ... but I would think
> > > > the basics could be covered generically to start with.
> > > >
> > > > It's not clear who's driving this implementation and spec, so I guess
> > > > there's more reading to do.
> > > >
> > > > Anyway, I've added Cc libcamera-devel to raise awareness of this topic
> > > > to camera list.
> > > >
> > > > I bet Laurent has some stronger opinions on how he'd see cameras exist
> > > > in a virtio space.
> >
> > You seem to think I have strong opinions about everything. This may not
> > be a completely unfounded assumption ;-)
> >
> > Overall I agree with you, I think cameras are too complex for a
> > low-level virtualization protocol. I'd rather see a high-level protocol
> > that exposes webcam-like devices, with the low-level complexity handled
> > on the host side (using libcamera of course ;-)). This would support use
> > cases that require sharing hardware blocks between multiple logical
> > cameras, including sharing the same camera streams between multiple
> > guests.
> >
> > If a guest needs low-level access to the camera, including the ability
> > to control the raw camera sensor or ISP, then I'd recommend passing the
> > corresponding hardware blocks to the guest for exclusive access.
> >
> > > Personally I would rather see a separate virtio-camera specification
> > > that properly encapsulates all the various use cases we have for
> > > cameras. In many ways just processing a stream of video is a much
> > > simpler use case.
> > >
> > > During Linaro's Project Stratos we got a lot of feedback from members
> > > who professed interest in a virtio-camera initiative. However we were
> > > unable to get enough engineering resources from the various companies to
> > > collaborate in developing a specification that would meet everyone's
> > > needs. The problem space is wide from having numerous black and white
> > > sensor cameras on cars to the full on computational photography as
> > > exposed by modern camera systems on phones. If you want to read more
> > > words on the topic I wrote a blog post at the time:
> > >
> > >   https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
> > >
> > > Back to the topic of virtio-video: as I understand it, the principal
> > > features/configurations are:
> > >
> > >   - All the various CODECs, resolutions and pixel formats
> > >   - Stateful vs Stateless streams
> > >   - Whether we want to support grabbing single frames from a source
> > >
> > > My main concern about the V4L approach is that it pegs updates to the
> > > interface to the continuing evolution of the V4L interface in Linux. Now
> > > maybe video is a solved problem and there won't be (m)any new features
> > > we need to add after the initial revision. However I'm not a domain
> > > expert here so I just don't know.
> >
> > I've briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
> > when we got a chance to meet face to face. I think the V4L2 kernel API
> > is quite a good fit, in the sense that its level of abstraction works well
> > when applied to video codecs and "simple" cameras (defined, more or less, as
> > something resembling a USB webcam feature-wise). It doesn't mean that
> > the virtio-video or virtio-camera specifications should necessarily
> > reference V4L2 or use the exact same vocabulary, they could simply copy
> > the concepts, and stay loosely-coupled with V4L2 in the sense that both
> > specifications should try to evolve in compatible directions.
>
> --
> Regards,
>
> Laurent Pinchart



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-08  4:55                                           ` Alexandre Courbot
@ 2023-05-11  8:50                                             ` Alexander Gordeev
  2023-05-11  9:00                                               ` Alexander Gordeev
  2023-05-12  4:09                                               ` Alexandre Courbot
  0 siblings, 2 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-11  8:50 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 08.05.23 06:55, Alexandre Courbot wrote:
> On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 03.05.23 16:04, Cornelia Huck wrote:
>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>
>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>> think a big part of the disagreement stems from the misconception that
>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>> which is absolutely not the case.
>>>>
>>>> I understand this, of course. I'm arguing that it is harder to
>>>> implement it, get it straight and then maintain it over years. Also it
>>>> brings limitations that can sometimes be worked around in the virtio
>>>> spec, but this always comes at a cost of decreased readability and
>>>> increased complexity. Overall it clearly looks like a downgrade compared
>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>> every developer who has to actually implement the spec, not just do
>>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>>> compatibility device (which I believe it is), then it is fine to have
>>>> both and keep improving the virtio-video, including taking the best
>>>> ideas from the V4L2 and overall using it as a reference to make writing
>>>> the driver simpler.
>>>
>>> Let me jump in here and ask another question:
>>>
>>> Imagine that, some years in the future, somebody wants to add a virtio
>>> device for handling video encoding/decoding to their hypervisor.
>>>
>>> Option 1: There are different devices to choose from. How is the person
>>> implementing this supposed to pick a device? They might have a narrow
>>> use case, where it is clear which of the devices is the one that needs to
>>> be supported; but they also might have multiple, diverse use cases, and
>>> end up needing to implement all of the devices.
>>
>> I think in this case virtio-v4l2 should be used as a compatibility
>> device exclusively. This means discouraging increasing its complexity
>> even more with more patches in the spec. virtio-video should eventually
>> cover all the use-cases of V4L2, so I think it is reasonable to use it
>> in both complex use-cases and in simple use-cases, where there is no
>> decoder/encoder V4L2 device on the host.
>>
>>> Option 2: There is one device with various optional features. The person
>>> implementing this can start off with a certain subset of features
>>> depending on their expected use cases, and add to it later, if needed;
>>> but the upfront complexity might be too high for specialized use cases.
>
> I don't see that many negotiable features we can provide for a
> decoder/encoder device - at least not many that are not considered
> basic (like guest buffers). In terms of provided features for codecs
> virtio-video and virtio-v4l2 are essentially equivalent.

Actually I see a lot of potential in using the virtio feature flag
negotiation for virtio-video:

1. We already have some feature flags related to memory management.

2. I think it would be great to take V4L2 controls negotiation and turn
it into the feature flags negotiation. I really like this idea and I'd
like to implement it. Not all the controls at once, of course. Still it
would be very easy to port more given the existing process. They
correspond well enough to each other, I think. This way we don't need to
introduce something like the VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
don't need two mechanisms for feature negotiations (like it would be
with virtio-v4l2, right?), also all the features would be in one place.
Then we can directly reference some enums from V4L2, like
v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
I call taking the best from V4L2.

3. We can have things that V4L2 doesn't support in its stateful UAPI.
For example, dequeuing output buffers in decoder order.
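
As a purely hypothetical illustration of point 2 above (the flag names and
bit positions below are invented here, not taken from any spec draft):

/* Codec controls exposed as virtio feature bits, so the driver learns
 * what the device supports during feature negotiation rather than
 * through a VIDIOC_QUERYCTRL-style command. */
#define VIRTIO_VIDEO_F_H264_PROFILE  (1ULL << 5)  /* v4l2_mpeg_video_h264_profile supported */
#define VIRTIO_VIDEO_F_H264_LEVEL    (1ULL << 6)  /* v4l2_mpeg_video_h264_level supported */
#define VIRTIO_VIDEO_F_BITRATE       (1ULL << 7)  /* encoder bitrate control supported */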

> The question is more: do we want a decoder/encoder specification and
> another one for cameras or do we want something that covers video
> devices in general? And is it desirable to reuse an already existing
> protocol for communication between a non-privileged entity and a
> privileged one (V4L2) or define our own?

Please note that virtio-video could be extended to support everything
V4L2 does in the future. Including cameras if necessary. So it can cover
video devices in general too.

> About the latter point, Alex Bennée mentioned that it was difficult to
> find the engineering time to define virtio-camera. And virtio-video
> has also not been particularly fast to come together. virtio-v4l2
> basically serves us both on a plate for a lower effort.

If we talk only about codecs, the effort is lower only if you have
V4L2 codecs on the host. Otherwise the effort seems higher.
I also hope to be able to update virtio-video at a faster pace. Please
let me try.
It is understandable that working on the specs takes time. That's why
I'm all for making room for everybody to work. I think eventually
virtio-video or virtio-video + virtio-camera can replace virtio-v4l2. I
strongly believe this is a better solution for the long term.

>> In this case I'd prefer to have the simpler device first, that is the
>> current virtio-video, then to add features incrementally using feature
>> flags and taking into account the virtualization context. V4L2 is a
>> complex thing from a different context. They already tried to carve out
>> some of the use-cases like stateful decoder/encoder API, but this work
>> is not finished (struct v4l2_buffer can serve as evidence). This is
>> like dissecting a monolith. Also it has to be patched to make it more
>> appropriate for virtualization (we can see this in Alexandre's PoC already).
>>
>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>> trying to decide whether our future will be more like Option 1 or Option
>>> 2, with their respective trade-offs.
>>
>> I'd like to rely on opinions of people who know more about virtio
>> development and goals. I would be happy to present or reiterate my
>> arguments to anyone interested if necessary.
>>
>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>> am I missing something essential here? (I had the impression that some
>>> previous confusion had been cleared up; apologies in advance if I'm
>>> misrepresenting things.)
>>
>> Indeed some of the previous confusion has been cleared up. But not the
>> key thing. Alexandre still claims that this patched V4L2 UAPI pass
>> through is only marginally more complex, for example. I don't agree with
>> this and I have evidence. We haven't finished discussing this evidence.
>
> Are you talking about v4l2_buffer?
>
> https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/buffer.html#struct-v4l2-buffer
>
> I think you implied that some of its fields were not relevant for
> video decoding or encoding, which, if you examine them, is again
> incorrect. That also answers your question of why the stateful decoder
> spec did not mention the valid fields - because it is documented on
> this page, which tells exactly which fields the driver/device are
> expected to set for each queue.

I'm talking about struct v4l2_buffer, yes, but not only that: also about
struct v4l2_plane, enum v4l2_buf_type, the buffer flags, enum
v4l2_memory (though this one is comparable to virtio-video), and
timecodes. For example, the way the fields in struct v4l2_buffer and
struct v4l2_plane are filled and interpreted depends a lot on the type.
Here is enum v4l2_buf_type:

enum v4l2_buf_type {
         V4L2_BUF_TYPE_VIDEO_CAPTURE        = 1,
         V4L2_BUF_TYPE_VIDEO_OUTPUT         = 2,
         V4L2_BUF_TYPE_VIDEO_OVERLAY        = 3,
         V4L2_BUF_TYPE_VBI_CAPTURE          = 4,
         V4L2_BUF_TYPE_VBI_OUTPUT           = 5,
         V4L2_BUF_TYPE_SLICED_VBI_CAPTURE   = 6,
         V4L2_BUF_TYPE_SLICED_VBI_OUTPUT    = 7,
         V4L2_BUF_TYPE_VIDEO_OUTPUT_OVERLAY = 8,
         V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9,
         V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE  = 10,
         V4L2_BUF_TYPE_SDR_CAPTURE          = 11,
         V4L2_BUF_TYPE_SDR_OUTPUT           = 12,
         V4L2_BUF_TYPE_META_CAPTURE         = 13,
         V4L2_BUF_TYPE_META_OUTPUT          = 14,
         /* Deprecated, do not use */
         V4L2_BUF_TYPE_PRIVATE              = 0x80,
};

Of these 14 cases we only need 2 for the codecs. Right?
Also the flags. There are 22 of them. Are they all needed too? I don't
think so. We only have like 5 in virtio-video at the moment.
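
For illustration, a minimal sketch of that point (my own, assuming the
multi-planar interface is used on both queues of a memory-to-memory codec):

#include <stdbool.h>
#include <linux/videodev2.h>

/* Of the 14 types above, a decoder/encoder device only ever has to accept
 * these two: the bitstream/raw input queue and the decoded/encoded
 * output queue. */
static bool codec_buf_type_valid(enum v4l2_buf_type type)
{
        return type == V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE ||
               type == V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
}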

You posted code for filling v4l2_buffer in one of your previous emails.
What I'm trying to say is that a person who doesn't know this in
advance will have a hard time writing the same code if they only have
the virtio-v4l2 spec.

> Also I think I mentioned it before, but it would be really sweet if
> you could stop mischaracterizing virtio-v4l2 as a "passthrough" since
> it is very much not that, much like virgl and venus aren't either.

I try to use the "V4L2 passthrough" term only when I'm talking about the
actual pass-through use case like yours. Sorry if I misused it. I'll try
to be more careful.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com




^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-11  8:50                                             ` Alexander Gordeev
@ 2023-05-11  9:00                                               ` Alexander Gordeev
  2023-05-12  4:15                                                 ` Alexandre Courbot
  2023-05-12  4:09                                               ` Alexandre Courbot
  1 sibling, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-11  9:00 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 11.05.23 10:50, Alexander Gordeev wrote:
> On 08.05.23 06:55, Alexandre Courbot wrote:
>> On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
>> <alexander.gordeev@opensynergy.com> wrote:
>>>
>>> On 03.05.23 16:04, Cornelia Huck wrote:
>>>> On Fri, Apr 28 2023, Alexander Gordeev
>>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>> think a big part of the disagreement stems from the misconception
>>>>>> that
>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>> which is absolutely not the case.
>>>>>
>>>>> I understand this, of course. I'm arguing that it is harder to
>>>>> implement it, get it straight and then maintain it over years. Also it
>>>>> brings limitations that can sometimes be worked around in the virtio
>>>>> spec, but this always comes at a cost of decreased readability and
>>>>> increased complexity. Overall it clearly looks like a downgrade compared
>>>>> to virtio-video for our use-case. And I believe it would be the
>>>>> same for
>>>>> every developer who has to actually implement the spec, not just do
>>>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>> both and keep improving the virtio-video, including taking the best
>>>>> ideas from the V4L2 and overall using it as a reference to make
>>>>> writing
>>>>> the driver simpler.
>>>>
>>>> Let me jump in here and ask another question:
>>>>
>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>> device for handling video encoding/decoding to their hypervisor.
>>>>
>>>> Option 1: There are different devices to choose from. How is the person
>>>> implementing this supposed to pick a device? They might have a narrow
>>>> use case, where it is clear which of the devices is the one that
>>>> needs to
>>>> be supported; but they also might have multiple, diverse use cases, and
>>>> end up needing to implement all of the devices.
>>>
>>> I think in this case virtio-v4l2 should be used as a compatibility
>>> device exclusively. This means discouraging increasing its complexity
>>> even more with more patches in the spec. virtio-video should eventually
>>> cover all the use-cases of V4L2, so I think it is reasonable to use it
>>> in both complex use-cases and in simple use-cases, where there is no
>>> decoder/encoder V4L2 device on the host.
>>>
>>>> Option 2: There is one device with various optional features. The
>>>> person
>>>> implementing this can start off with a certain subset of features
>>>> depending on their expected use cases, and add to it later, if needed;
>>>> but the upfront complexity might be too high for specialized use cases.
>>
>> I don't see that many negotiable features we can provide for a
>> decoder/encoder device - at least not many that are not considered
>> basic (like guest buffers). In terms of provided features for codecs
>> virtio-video and virtio-v4l2 are essentially equivalent.
>
> Actually I see a lot of potential in using the virtio feature flag
> negotiation for virtio-video:
>
> 1. We already have some feature flags related to memory management.
>
> 2. I think it would be great to take V4L2 controls negotiation and turn
> it into the feature flags negotiation. I really like this idea and I'd
> like to implement it. Not all the controls at once, of course. Still it
> would be very easy to port more given the existing process. They
> correspond well enough to each other, I think. This way we don't need to
> introduce something like the VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
> don't need two mechanisms for feature negotiations (like it would be
> with virtio-v4l2, right?), also all the features would be in one place.
> Then we can directly reference some enums from V4L2, like
> v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
> I call taking the best from V4L2.
>
> 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
> For example, dequeuing output buffers in decoder order.

I'd like to also share my current roadmap for the draft v7 of
virtio-video (or maybe the draft v8 in some cases). The significant
changes would be:

1. Making querying the capabilities fully compatible with V4L2 but in 1
round-trip over virtio, not 10+. This is what I'm actively working on
right now.
2. Making all the commands non-blocking by providing completion events
over the event queue.
3. Adding back the controls from v5 and adding the corresponding feature
flags as I wrote in the quote above.

Also I'll address all the other comments, of course.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah

Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-11  8:50                                             ` Alexander Gordeev
  2023-05-11  9:00                                               ` Alexander Gordeev
@ 2023-05-12  4:09                                               ` Alexandre Courbot
  2023-05-16 14:53                                                 ` Alexander Gordeev
  2023-05-17 11:04                                                 ` Alexander Gordeev
  1 sibling, 2 replies; 97+ messages in thread
From: Alexandre Courbot @ 2023-05-12  4:09 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Thu, May 11, 2023 at 5:50 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 08.05.23 06:55, Alexandre Courbot wrote:
> > On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
> > <alexander.gordeev@opensynergy.com> wrote:
> >>
> >> On 03.05.23 16:04, Cornelia Huck wrote:
> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> >>>
> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
> >>>>> think a big part of the disagreement stems from the misconception that
> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
> >>>>> which is absolutely not the case.
> >>>>
> >>>> I understand this, of course. I'm arguing, that it is harder to
> >>>> implement it, get it straight and then maintain it over years. Also it
> >>>> brings limitations, that sometimes can be workarounded in the virtio
> >>>> spec, but this always comes at a cost of decreased readability and
> >>>> increased complexity. Overall it looks clearly as a downgrade compared
> >>>> to virtio-video for our use-case. And I believe it would be the same for
> >>>> every developer, that has to actually implement the spec, not just do
> >>>> the pass through. So if we think of V4L2 UAPI pass through as a
> >>>> compatibility device (which I believe it is), then it is fine to have
> >>>> both and keep improving the virtio-video, including taking the best
> >>>> ideas from the V4L2 and overall using it as a reference to make writing
> >>>> the driver simpler.
> >>>
> >>> Let me jump in here and ask another question:
> >>>
> >>> Imagine that, some years in the future, somebody wants to add a virtio
> >>> device for handling video encoding/decoding to their hypervisor.
> >>>
> >>> Option 1: There are different devices to chose from. How is the person
> >>> implementing this supposed to pick a device? They might have a narrow
> >>> use case, where it is clear which of the devices is the one that needs to
> >>> be supported; but they also might have multiple, diverse use cases, and
> >>> end up needing to implement all of the devices.
> >>
> >> I think in this case virtio-v4l2 should be used as a compatibility
> >> device exclusively. This means discouraging increasing its complexity
> >> even more with more patches in the spec. virtio-video should eventually
> >> cover all the use-cases of V4L2, so I think it is reasonable to use it
> >> in both complex use-cases and in simple use-cases, where there is no
> >> decoder/encoder V4L2 device on the host.
> >>
> >>> Option 2: There is one device with various optional features. The person
> >>> implementing this can start off with a certain subset of features
> >>> depending on their expected use cases, and add to it later, if needed;
> >>> but the upfront complexity might be too high for specialized use cases.
> >
> > I don't see that many negociable features we can provide for a
> > decoder/encoder device - at least not many that are not considered
> > basic (like guest buffers). In terms of provided features for codecs
> > virtio-video and virtio-v4l2 are essentially equivalent.
>
> Actually I see a lot of potential in using the virtio feature flag
> negotiation for virtio-video:
>
> 1. We already have some feature flags related to memory management.
>
> 2. I think it would be great to take V4L2 controls negotiation and turn
> it into the feature flags negotiation. I really like this idea and I'd
> like to implement it. Not all the controls at once, of course. Still it
> would be very easy to port more given the existing process. They
> correspond well enough to each other, I think. This way we don't need to
> introduce something like the VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
> don't need two mechanisms for feature negotiations (like it would be
> with virtio-v4l2, right?), also all the features would be in one place.
> Then we can directly reference some enums from V4L2, like
> v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
> I call taking the best from V4L2.
>
> 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
> For example, dequeuing output buffers in decoder order.

... provided the host can do that (i.e. has a stateless decoder
interface). What is the use-case for this btw?

>
> > The question is more: do we want a decoder/encoder specification and
> > another one for cameras or do we want something that covers video
> > devices in general. And is it desirable to reuse an already existing
> > protocol for communication between a non-privileged entity and a
> > privileged one (V4L2) or define our own?
>
> Please note, that virtio-video could be extended to support everything
> V4L2 does in the future. Including cameras if necessary. So it can cover
> video devices in general too.

That's a very, very bold claim, and if you do that, you will inevitably
need to add things that are not relevant to the codec case - which is
precisely your complaint about V4L2. Not to mention the many new pages
of spec that this would require.

>
> > About the latter point, Alex Bennée mentioned that it was difficult to
> > find the engineering time to define virtio-camera. And virtio-video
> > has also not been particularly fast to come together. virtio-v4l2
> > basically serves us both on a plate for a lower effort.
>
> If we talk only about codecs, the effort is lower only in case you have
> V4L2 codecs on the host. Otherwise the effort seems higher.

The effort is lower in terms of spec writing, and also lower if your
guest is Linux as you can support all video devices with a single
driver. That's a very large portion of the virtio users here.

> I also hope to be able to update virtio-video at a faster pace. Please
> let me try.

Please hold on a bit - there are two things here.

1) I'd like to settle the virtio-v4l2/virtio-video argument first to
make sure we don't get two things clashing head-on. As far as codecs
are concerned we certainly don't need both. Cornelia, I think we'll
need you to make a call on this, or at least tell us what you need in
order to make the call. If it helps, I can send a draft of what the
virtio-v4l2 spec would look like; it should be relatively short.

2) I (and other ChromeOS contributors) have been driving this spec so
far, and while I think virtio-v4l2 is a better solution, I have not
said I would give up on virtio-video if virtio-v4l2 is not adopted -
I will keep iterating on it in that case.

That said, your contributions are of course welcome if you have stuff
written down that you want to include. The current virtio-video spec
is available here:

https://github.com/Gnurou/virtio-video-spec

I'm writing it in Markdown (virtio-video.md) to avoid dealing with
LaTeX directly, and I use a pandoc filter to convert it before
submission - there is a Makefile that takes care of that. Feel free to
send a pull request with the changes you have worked on, including your
Signed-off-by in the patches, so I can carry them over into the v7
patch if we go that route.

> This is understandable, that working on the specs takes time. That's why
> I'm all for making room for everybody to work. I think eventually
> virtio-video or virtio-video + virtio-camera can replace virtio-v4l2. I
> strongly believe this is a better solution for the long term.
>
> >> In this case I'd prefer to have the simpler device first, that is the
> >> current virtio-video, then to add features incrementally using feature
> >> flags and taking into account the virtualization context. V4L2 is a
> >> complex thing from a different context. They already tried to carve out
> >> some of the use-cases like stateful decoder/encoder API, but this work
> >> is not finished (struct v4l2_buffer can serve as an evidence). This is
> >> like dissecting a monolith. Also it has to be patched to make it more
> >> appropriate for virtualization (we can see this in Alexandre's PoC already).
> >>
> >>> Leaving concrete references to V4L2 out of the picture, we're currently
> >>> trying to decide whether our future will be more like Option 1 or Option
> >>> 2, with their respective trade-offs.
> >>
> >> I'd like to rely on opinions of people, who know more about virtio
> >> development and goals. I would be happy to present or reiterate my
> >> arguments to anyone interested if necessary.
> >>
> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
> >>> am I missing something essential here? (I had the impression that some
> >>> previous confusion had been cleared up; apologies in advance if I'm
> >>> misrepresenting things.)
> >>
> >> Indeed some of the previous confusion has been cleared up. But not the
> >> key thing. Alexandre still claims, that this patched V4L2 UAPI pass
> >> through is only marginally more complex, for example. I don't agree with
> >> this and I have evidence. We haven't finished discussing this evidence.
> >
> > Are you talking about v4l2_buffer?
> >
> > https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/buffer.html#struct-v4l2-buffer
> >
> > I think you implied that some of its fields were not relevant for
> > video decoding or encoding, which if you examine them is again
> > incorrect. That also answers your question of why the stateful decoder
> > spec did not mention the valid fields - because it is documented on
> > this page, which tells exactly which fields the driver/device are
> > expected to set for each queue.
>
> I'm talking about struct v4l2_buffer, yes, but not only. Also about
> struct v4l2_plane, enum v4l2_buf_type, the buffer flags, enum
> v4l2_memory (but this one is comparable to virtio-video), timecodes.
> For example, the way the fields in the struct v4l2_buffer and struct
> v4l2_plane are filled and interpreted depends a lot on the type. Here is
> the enum v4l2_buf_type:
>
> enum v4l2_buf_type {
>          V4L2_BUF_TYPE_VIDEO_CAPTURE        = 1,
>          V4L2_BUF_TYPE_VIDEO_OUTPUT         = 2,
>          V4L2_BUF_TYPE_VIDEO_OVERLAY        = 3,
>          V4L2_BUF_TYPE_VBI_CAPTURE          = 4,
>          V4L2_BUF_TYPE_VBI_OUTPUT           = 5,
>          V4L2_BUF_TYPE_SLICED_VBI_CAPTURE   = 6,
>          V4L2_BUF_TYPE_SLICED_VBI_OUTPUT    = 7,
>          V4L2_BUF_TYPE_VIDEO_OUTPUT_OVERLAY = 8,
>          V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9,
>          V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE  = 10,
>          V4L2_BUF_TYPE_SDR_CAPTURE          = 11,
>          V4L2_BUF_TYPE_SDR_OUTPUT           = 12,
>          V4L2_BUF_TYPE_META_CAPTURE         = 13,
>          V4L2_BUF_TYPE_META_OUTPUT          = 14,
>          /* Deprecated, do not use */
>          V4L2_BUF_TYPE_PRIVATE              = 0x80,
> };
>
> Of these 14 cases we only need 2 for the codecs. Right?
> Also the flags. There are 22 of them. Are they all needed too? I don't
> think so. We only have like 5 in virtio-video at the moment.

Upon reading the V4L2 spec it is pretty clear which queue types are
valid for each device, and anything other than CAPTURE_MPLANE and
OUTPUT_MPLANE is unlikely to be used anyway.

We are really splitting hairs here, and this looks like a wild goose
chase for software purity - even within virtio-video there are already
flags that only make sense for an encoder, and you cannot remove them
without defining new, more specific structures and complicating things
overall. If you extend virtio-video to support more use-cases, you
will end up with more of these as well. So seriously, why is this such
a big deal when the instructions on how to use these structures for
each use case are one link click away?
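
For what it's worth, here is roughly what filling and queuing such a
buffer looks like through the existing V4L2 UAPI. This is a minimal
illustrative sketch only - the REQBUFS/mmap setup and error handling
are omitted and the helper name is made up - not code from either
proposal:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Illustrative sketch: queue one already-mapped bitstream buffer on the
 * decoder OUTPUT_MPLANE queue. Only the fields relevant to this queue
 * need to be filled; everything else stays zeroed. */
static int queue_bitstream_buffer(int fd, unsigned int index,
                                  unsigned int bytes_filled)
{
        struct v4l2_plane plane;
        struct v4l2_buffer buf;

        memset(&plane, 0, sizeof(plane));
        memset(&buf, 0, sizeof(buf));

        plane.bytesused = bytes_filled;     /* amount of bitstream data */

        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE; /* bitstream queue */
        buf.memory = V4L2_MEMORY_MMAP;
        buf.index = index;
        buf.m.planes = &plane;
        buf.length = 1;                     /* one plane for a bitstream */

        return ioctl(fd, VIDIOC_QBUF, &buf);
}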

>
> You posted code for filling v4l2_buffer in one of your previous emails.
> What I'm trying to say is that a person, who doesn't know this in
> advance, will have a hard time writing this same code if they only have
> the virtio-v4l2 spec.

Well yes, the counterpart of virtio-v4l2 being shorter is that you
need to refer to V4L2 as well - that's actually the point: to reduce
the burden on virtio by reusing a spec that already exists and
referring to it.

Earlier in this email you mentioned reusing enums like
v4l2_mpeg_video_h264_profile in virtio-video; that creates the same
dependency on the V4L2 spec. The difference is only in how much we
take from it.
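
To make that concrete, here is what using such an enum looks like today
through the V4L2 extended controls API - again only an illustrative
sketch with an invented helper name, not the wire format of either
proposal:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Illustrative sketch: request the H.264 High profile from an encoder
 * via the extended controls API. A virtio device could reference the
 * same V4L2_MPEG_VIDEO_H264_PROFILE_* values however it exposes them. */
static int set_h264_high_profile(int fd)
{
        struct v4l2_ext_control ctrl;
        struct v4l2_ext_controls ctrls;

        memset(&ctrl, 0, sizeof(ctrl));
        memset(&ctrls, 0, sizeof(ctrls));

        ctrl.id = V4L2_CID_MPEG_VIDEO_H264_PROFILE;
        ctrl.value = V4L2_MPEG_VIDEO_H264_PROFILE_HIGH;

        ctrls.which = V4L2_CTRL_WHICH_CUR_VAL;
        ctrls.count = 1;
        ctrls.controls = &ctrl;

        return ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls);
}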

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-11  9:00                                               ` Alexander Gordeev
@ 2023-05-12  4:15                                                 ` Alexandre Courbot
  2023-05-17  7:35                                                   ` Alexander Gordeev
  0 siblings, 1 reply; 97+ messages in thread
From: Alexandre Courbot @ 2023-05-12  4:15 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On Thu, May 11, 2023 at 6:00 PM Alexander Gordeev
<alexander.gordeev@opensynergy.com> wrote:
>
> On 11.05.23 10:50, Alexander Gordeev wrote:
> > On 08.05.23 06:55, Alexandre Courbot wrote:
> >> On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
> >> <alexander.gordeev@opensynergy.com> wrote:
> >>>
> >>> On 03.05.23 16:04, Cornelia Huck wrote:
> >>>> On Fri, Apr 28 2023, Alexander Gordeev
> >>>> <alexander.gordeev@opensynergy.com> wrote:
> >>>>
> >>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> >>>>>> But in any case, that's irrelevant to the guest-host interface, and I
> >>>>>> think a big part of the disagreement stems from the misconception
> >>>>>> that
> >>>>>> V4L2 absolutely needs to be used on either the guest or the host,
> >>>>>> which is absolutely not the case.
> >>>>>
> >>>>> I understand this, of course. I'm arguing, that it is harder to
> >>>>> implement it, get it straight and then maintain it over years. Also it
> >>>>> brings limitations, that sometimes can be workarounded in the virtio
> >>>>> spec, but this always comes at a cost of decreased readability and
> >>>>> increased complexity. Overall it looks clearly as a downgrade compared
> >>>>> to virtio-video for our use-case. And I believe it would be the
> >>>>> same for
> >>>>> every developer, that has to actually implement the spec, not just do
> >>>>> the pass through. So if we think of V4L2 UAPI pass through as a
> >>>>> compatibility device (which I believe it is), then it is fine to have
> >>>>> both and keep improving the virtio-video, including taking the best
> >>>>> ideas from the V4L2 and overall using it as a reference to make
> >>>>> writing
> >>>>> the driver simpler.
> >>>>
> >>>> Let me jump in here and ask another question:
> >>>>
> >>>> Imagine that, some years in the future, somebody wants to add a virtio
> >>>> device for handling video encoding/decoding to their hypervisor.
> >>>>
> >>>> Option 1: There are different devices to chose from. How is the person
> >>>> implementing this supposed to pick a device? They might have a narrow
> >>>> use case, where it is clear which of the devices is the one that
> >>>> needs to
> >>>> be supported; but they also might have multiple, diverse use cases, and
> >>>> end up needing to implement all of the devices.
> >>>
> >>> I think in this case virtio-v4l2 should be used as a compatibility
> >>> device exclusively. This means discouraging increasing its complexity
> >>> even more with more patches in the spec. virtio-video should eventually
> >>> cover all the use-cases of V4L2, so I think it is reasonable to use it
> >>> in both complex use-cases and in simple use-cases, where there is no
> >>> decoder/encoder V4L2 device on the host.
> >>>
> >>>> Option 2: There is one device with various optional features. The
> >>>> person
> >>>> implementing this can start off with a certain subset of features
> >>>> depending on their expected use cases, and add to it later, if needed;
> >>>> but the upfront complexity might be too high for specialized use cases.
> >>
> >> I don't see that many negociable features we can provide for a
> >> decoder/encoder device - at least not many that are not considered
> >> basic (like guest buffers). In terms of provided features for codecs
> >> virtio-video and virtio-v4l2 are essentially equivalent.
> >
> > Actually I see a lot of potential in using the virtio feature flag
> > negotiation for virtio-video:
> >
> > 1. We already have some feature flags related to memory management.
> >
> > 2. I think it would be great to take V4L2 controls negotiation and turn
> > it into the feature flags negotiation. I really like this idea and I'd
> > like to implement it. Not all the controls at once, of course. Still it
> > would be very easy to port more given the existing process. They
> > correspond well enough to each other, I think. This way we don't need to
> > introduce something like the VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
> > don't need two mechanisms for feature negotiations (like it would be
> > with virtio-v4l2, right?), also all the features would be in one place.
> > Then we can directly reference some enums from V4L2, like
> > v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
> > I call taking the best from V4L2.
> >
> > 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
> > For example, dequeuing output buffers in decoder order.
>
> I'd like to also share my current roadmap for the draft v7 of
> virtio-video (or maybe the draft v8 in some cases). The significant
> changes would be:
>
> 1. Making querying the capabilities fully compatible with V4L2 but in 1
> round-trip over virtio, not 10+. This is what I'm actively working on
> right now.

That sounds good, although it is not essential: capability negotiation
occurs before we start streaming, so these extra round trips should not
be perceptible to the user. But the current capability command is not
suitable anyway and needs to be improved.
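
Just to illustrate what "1 round-trip" could mean in practice - and to
be clear, the layout below is purely hypothetical, made up for
discussion and not taken from any draft - the device could return in
one reply everything the driver would otherwise enumerate iteratively:

#include <stdint.h>

/* Purely hypothetical layout: a single reply carrying what V4L2 would
 * otherwise need many VIDIOC_ENUM_FMT / VIDIOC_ENUM_FRAMESIZES /
 * VIDIOC_QUERYCTRL round trips for. A real spec would use le32 fields. */
struct hyp_format_entry {
        uint32_t fourcc;          /* coded or raw format, V4L2-style fourcc */
        uint32_t num_profiles;    /* e.g. v4l2_mpeg_video_h264_profile values */
        uint32_t num_frame_sizes; /* min/max/step ranges follow the entry */
        uint32_t reserved;
        /* followed by num_profiles values and num_frame_sizes ranges */
};

struct hyp_capability_reply {
        uint32_t num_coded_formats; /* entries for the bitstream queue */
        uint32_t num_raw_formats;   /* entries for the raw frame queue */
        /* followed by the variable-sized hyp_format_entry records */
};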

> 2. Making all the commands non-blocking by providing completion events
> over the event queue.

+1 on that too, as the experience with virtio-v4l2 suggests this
eliminates some headaches down the line.
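
For the sake of discussion, a completion event could look something
like the sketch below (hypothetical layout, all names invented, not
from any draft): the command queue reply would just acknowledge the
command, and the result would arrive later on the event queue.

#include <stdint.h>

/* Hypothetical completion event, names invented for illustration only;
 * a real spec would use le32 fields. */
struct hyp_cmd_complete_event {
        uint32_t stream_id; /* stream the command was issued on */
        uint32_t cmd_type;  /* which command completed, e.g. a drain */
        uint32_t result;    /* 0 on success, device-specific error code */
        uint32_t padding;
};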

> 3. Adding back the controls from v5 and adding the corresponding feature
> flags as I wrote in the quote above.

Be careful not to repeat the mistake of v5 (all controls under one
big structure). If you mean extending the parameters mechanism with
the things that are missing from v5, then yes, that should be done
anyway.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-05 15:55                                                 ` Alex Bennée
       [not found]                                                   ` <20230506081229.GA8114@pendragon.ideasonboard.com>
@ 2023-05-16 12:57                                                   ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-16 12:57 UTC (permalink / raw)
  To: Alex Bennée, Kieran Bingham
  Cc: Cornelia Huck, Alexandre Courbot, virtio-dev, Keiichi Watanabe,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve, libcamera-devel

On 05.05.23 17:55, Alex Bennée wrote:
>
> Kieran Bingham <kieran.bingham@ideasonboard.com> writes:
>
>> Hi All,
>>
>> Coming in late, thanks to lei/lore spotting the libcamera keyword.
>>
>> + Cc: libcamera-devel to raise awareness of the discussion there.
>>
>> Quoting Alexander Gordeev (2023-05-05 10:57:29)
>>> On 03.05.23 17:53, Cornelia Huck wrote:
>>>> On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>
>>>>> Cornelia Huck <cohuck@redhat.com> writes:
>>>>>
>>>>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>>
>>>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>>>> think a big part of the disagreement stems from the misconception that
>>>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>>>> which is absolutely not the case.
>>>>>>>
>>>>>>> I understand this, of course. I'm arguing, that it is harder to
>>>>>>> implement it, get it straight and then maintain it over years. Also it
>>>>>>> brings limitations, that sometimes can be workarounded in the virtio
>>>>>>> spec, but this always comes at a cost of decreased readability and
>>>>>>> increased complexity. Overall it looks clearly as a downgrade compared
>>>>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>>>>> every developer, that has to actually implement the spec, not just do
>>>>>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>>>> both and keep improving the virtio-video, including taking the best
>>>>>>> ideas from the V4L2 and overall using it as a reference to make writing
>>>>>>> the driver simpler.
>>>>>>
>>>>>> Let me jump in here and ask another question:
>>>>>>
>>>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>>>> device for handling video encoding/decoding to their hypervisor.
>>>>>>
>>>>>> Option 1: There are different devices to chose from. How is the person
>>>>>> implementing this supposed to pick a device? They might have a narrow
>>>>>> use case, where it is clear which of the devices is the one that needs to
>>>>>> be supported; but they also might have multiple, diverse use cases, and
>>>>>> end up needing to implement all of the devices.
>>>>>>
>>>>>> Option 2: There is one device with various optional features. The person
>>>>>> implementing this can start off with a certain subset of features
>>>>>> depending on their expected use cases, and add to it later, if needed;
>>>>>> but the upfront complexity might be too high for specialized use cases.
>>>>>>
>>>>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>>>>> trying to decide whether our future will be more like Option 1 or Option
>>>>>> 2, with their respective trade-offs.
>>>>>>
>>>>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>>>>> am I missing something essential here? (I had the impression that some
>>>>>> previous confusion had been cleared up; apologies in advance if I'm
>>>>>> misrepresenting things.)
>>>>>>
>>>>>> I'd really love to see some kind of consensus for 1.3, if at all
>>>>>> possible :)
>>>>>
>>>>> I think feature discovery and extensibility is a key part of the VirtIO
>>>>> paradigm which is why I find the virtio-v4l approach limiting. By
>>>>> pegging the device to a Linux API we effectively limit the growth of the
>>>>> device specification to as fast as the Linux API changes. I'm not fully
>>>>> immersed in v4l but I don't think it is seeing any additional features
>>>>> developed for it and its limitations for camera are one of the reasons
>>>>> stuff is being pushed to userspace in solutions like libcamera:
>>>>>
>>>>>     How is libcamera different from V4L2?
>>>>>
>>>>>     We see libcamera as a continuation of V4L2. One that can more easily
>>>>>     handle the recent advances in hardware design. As embedded cameras have
>>>>>     developed, all of the complexity has been pushed on to the developers.
>>>>>     With libcamera, all of that complexity is simplified and a single model
>>>>>     is presented to application developers.
>>>>
>>>> Ok, that is interesting; thanks for the information.
>>>>
>>>>>
>>>>> That said its not totally our experience to have virtio devices act as
>>>>> simple pipes for some higher level protocol. The virtio-gpu spec says
>>>>> very little about the details of how 3D devices work and simply offers
>>>>> an opaque pipe to push a (potentially propriety) command stream to the
>>>>> back end. As far as I'm aware the proposals for Vulkan and Wayland
>>>>> device support doesn't even offer a feature bit but simply changes the
>>>>> graphics stream type in the command packets.
>>>>>
>>>>> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
>>>>> incompatible with other feature bits and make that the baseline
>>>>> implementation but it's not really in the spirit of what VirtIO is
>>>>> trying to achieve.
>>>>
>>>> I'd not be in favour of an incompatible feature flag,
>>>> either... extensions are good, but conflicting features is something
>>>> that I'd like to avoid.
>>>>
>>>> So, given that I'd still prefer to have a single device: How well does
>>>> the proposed virtio-video device map to a Linux driver implementation
>>>> that hooks into V4L2?
>>>
>>> IMO it hooks into V4L2 pretty well. And I'm going to spend next few
>>> months making the existing driver fully V4L2 compliant. If this goal
>>> requires changing the spec, than we still have time to do that. I don't
>>> expect a lot of problems on this side. There might be problems with
>>> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
>>> of this can be accomplished over time.
>>>
>>>> If the general process flow is compatible and it
>>>> is mostly a question of wiring the parts together, I think pushing that
>>>> part of the complexity into the Linux driver is a reasonable
>>>> trade-off. Being able to use an existing protocol is nice, but if that
>>>> protocol is not perceived as flexible enough, it is probably not worth
>>>> encoding it into a spec. (Similar considerations apply to hooking up the
>>>> device in the hypervisor.)
>>>
>>> I very much agree with these statements. I think this is how it should
>>> be: we start with a compact but usable device, then add features and
>>> enable them using feature flags. Eventually we can cover all the
>>> use-cases of V4L2 unless we decide to have separate devices for them
>>> (virtio-camera, etc). This would be better in the long term I think.
>>
>> Camera's definitely have their quirks - mostly because many usecases are
>> hard to convey over a single Video device node (with the hardware) but I
>> think we might expect that complexity to be managed by the host, and
>> probably offer a ready made stream to the guest. Of course how to handle
>> multiple streams and configuration of the whole pipeline may get more
>> difficult and warrant a specific 'virtio-camera' ... but I would think
>> the basics could be covered generically to start with.
>>
>> It's not clear who's driving this implementation and spec, so I guess
>> there's more reading to do.
>>
>> Anyway, I've added Cc libcamera-devel to raise awareness of this topic
>> to camera list.
>>
>> I bet Laurent has some stronger opinions on how he'd see camera's exist
>> in a virtio space.
>
> Personally I would rather see a separate virtio-camera specification
> that properly encapsulates all the various use cases we have for
> cameras. In many ways just processing a stream of video is a much
> simpler use case.
>
> During Linaro's Project Stratos we got a lot of feedback from members
> who professed interest in a virtio-camera initiative. However we were
> unable to get enough engineering resources from the various companies to
> collaborate in developing a specification that would meet everyone's
> needs. The problem space is wide from having numerous black and white
> sensor cameras on cars to the full on computational photography as
> exposed by modern camera systems on phones. If you want to read more
> words on the topic I wrote a blog post at the time:
>
>    https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/

Thanks for the link, it is a very insightful read. There are actually
folks attempting to virtualize cameras at my company too; they should
probably reach out to you.

> Back to the topic of virtio-video as I understand it the principle
> features/configurations are:
>
>    - All the various CODECs, resolutions and pixel formats

These would be the basics. Encoders usually have even more parameters,
but those come with sensible defaults, so the ones mentioned above
should be enough in the simple case.

>    - Stateful vs Stateless streams

Yes. Still, I think everybody agrees at the moment that we should
consider the stateful interface first. For me the main reason is that
there is hardware that only exposes a stateful interface, while a
stateless interface can always be converted to a stateful one by
keeping the state on the host side.

>    - If we want support grabbing single frames from a source
>
> My main concern about the V4L approach is that it pegs updates to the
> interface to the continuing evolution of the V4L interface in Linux. Now
> maybe video is a solved problem and there won't be (m)any new features
> we need to add after the initial revision. However I'm not a domain
> expert here so I just don't know.

This is also my concern. AFAIU Alexandre proposed to "patch" the V4L2
stateful interfaces where we see fit. But this decreases readability
significantly IMO, so I'd prefer to avoid it.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah

Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [virtio-dev] Re: [libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-06  8:16                                                     ` [libcamera-devel] " Laurent Pinchart
@ 2023-05-16 13:50                                                         ` Alexander Gordeev
  2023-05-16 13:50                                                         ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-16 13:50 UTC (permalink / raw)
  To: Laurent Pinchart, Alex Bennée, virtio-dev, Albert Esteve,
	Matti Möll, Alexandre Courbot, Andrew Gazizov,
	Daniel Almeida, Cornelia Huck, Marcin Wojtas, Keiichi Watanabe,
	Gustavo Padovan, libcamera-devel, Bartłomiej Grzesik,
	Enrico Granata, Enric Balletbo i Serra, linux-media,
	Andrii Cherniavskyi

Hello Laurent,

+CC: Andrii

On 06.05.23 10:16, Laurent Pinchart wrote:
> I'm also CC'ing the linux-media@vger.kernel.org mailing list for these
> discussions, I'm sure there are folks there who are interested in codec
> and camera virtualization.
>
> On Sat, May 06, 2023 at 11:12:29AM +0300, Laurent Pinchart via libcamera-devel wrote:
>> On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
>>> Kieran Bingham writes:
>>>> Quoting Alexander Gordeev (2023-05-05 10:57:29)
>>>>> On 03.05.23 17:53, Cornelia Huck wrote:
>>>>>> On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>> Cornelia Huck <cohuck@redhat.com> writes:
>>>>>>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>>>>>> think a big part of the disagreement stems from the misconception that
>>>>>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>>>>>> which is absolutely not the case.
>>>>>>>>>
>>>>>>>>> I understand this, of course. I'm arguing, that it is harder to
>>>>>>>>> implement it, get it straight and then maintain it over years. Also it
>>>>>>>>> brings limitations, that sometimes can be workarounded in the virtio
>>>>>>>>> spec, but this always comes at a cost of decreased readability and
>>>>>>>>> increased complexity. Overall it looks clearly as a downgrade compared
>>>>>>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>>>>>>> every developer, that has to actually implement the spec, not just do
>>>>>>>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>>>>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>>>>>> both and keep improving the virtio-video, including taking the best
>>>>>>>>> ideas from the V4L2 and overall using it as a reference to make writing
>>>>>>>>> the driver simpler.
>>>>>>>>
>>>>>>>> Let me jump in here and ask another question:
>>>>>>>>
>>>>>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>>>>>> device for handling video encoding/decoding to their hypervisor.
>>>>>>>>
>>>>>>>> Option 1: There are different devices to chose from. How is the person
>>>>>>>> implementing this supposed to pick a device? They might have a narrow
>>>>>>>> use case, where it is clear which of the devices is the one that needs to
>>>>>>>> be supported; but they also might have multiple, diverse use cases, and
>>>>>>>> end up needing to implement all of the devices.
>>>>>>>>
>>>>>>>> Option 2: There is one device with various optional features. The person
>>>>>>>> implementing this can start off with a certain subset of features
>>>>>>>> depending on their expected use cases, and add to it later, if needed;
>>>>>>>> but the upfront complexity might be too high for specialized use cases.
>>>>>>>>
>>>>>>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>>>>>>> trying to decide whether our future will be more like Option 1 or Option
>>>>>>>> 2, with their respective trade-offs.
>>>>>>>>
>>>>>>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>>>>>>> am I missing something essential here? (I had the impression that some
>>>>>>>> previous confusion had been cleared up; apologies in advance if I'm
>>>>>>>> misrepresenting things.)
>>>>>>>>
>>>>>>>> I'd really love to see some kind of consensus for 1.3, if at all
>>>>>>>> possible :)
>>>>>>>
>>>>>>> I think feature discovery and extensibility is a key part of the VirtIO
>>>>>>> paradigm which is why I find the virtio-v4l approach limiting. By
>>>>>>> pegging the device to a Linux API we effectively limit the growth of the
>>>>>>> device specification to as fast as the Linux API changes. I'm not fully
>>>>>>> immersed in v4l but I don't think it is seeing any additional features
>>>>>>> developed for it and its limitations for camera are one of the reasons
>>>>>>> stuff is being pushed to userspace in solutions like libcamera:
>>>>>>>
>>>>>>>     How is libcamera different from V4L2?
>>>>>>>
>>>>>>>     We see libcamera as a continuation of V4L2. One that can more easily
>>>>>>>     handle the recent advances in hardware design. As embedded cameras have
>>>>>>>     developed, all of the complexity has been pushed on to the developers.
>>>>>>>     With libcamera, all of that complexity is simplified and a single model
>>>>>>>     is presented to application developers.
>>>>>>
>>>>>> Ok, that is interesting; thanks for the information.
>>>>>>
>>>>>>>
>>>>>>> That said its not totally our experience to have virtio devices act as
>>>>>>> simple pipes for some higher level protocol. The virtio-gpu spec says
>>>>>>> very little about the details of how 3D devices work and simply offers
>>>>>>> an opaque pipe to push a (potentially propriety) command stream to the
>>>>>>> back end. As far as I'm aware the proposals for Vulkan and Wayland
>>>>>>> device support doesn't even offer a feature bit but simply changes the
>>>>>>> graphics stream type in the command packets.
>>>>>>>
>>>>>>> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
>>>>>>> incompatible with other feature bits and make that the baseline
>>>>>>> implementation but it's not really in the spirit of what VirtIO is
>>>>>>> trying to achieve.
>>>>>>
>>>>>> I'd not be in favour of an incompatible feature flag,
>>>>>> either... extensions are good, but conflicting features is something
>>>>>> that I'd like to avoid.
>>>>>>
>>>>>> So, given that I'd still prefer to have a single device: How well does
>>>>>> the proposed virtio-video device map to a Linux driver implementation
>>>>>> that hooks into V4L2?
>>>>>
>>>>> IMO it hooks into V4L2 pretty well. And I'm going to spend next few
>>>>> months making the existing driver fully V4L2 compliant. If this goal
>>>>> requires changing the spec, than we still have time to do that. I don't
>>>>> expect a lot of problems on this side. There might be problems with
>>>>> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
>>>>> of this can be accomplished over time.
>>>>>
>>>>>> If the general process flow is compatible and it
>>>>>> is mostly a question of wiring the parts together, I think pushing that
>>>>>> part of the complexity into the Linux driver is a reasonable
>>>>>> trade-off. Being able to use an existing protocol is nice, but if that
>>>>>> protocol is not perceived as flexible enough, it is probably not worth
>>>>>> encoding it into a spec. (Similar considerations apply to hooking up the
>>>>>> device in the hypervisor.)
>>>>>
>>>>> I very much agree with these statements. I think this is how it should
>>>>> be: we start with a compact but usable device, then add features and
>>>>> enable them using feature flags. Eventually we can cover all the
>>>>> use-cases of V4L2 unless we decide to have separate devices for them
>>>>> (virtio-camera, etc). This would be better in the long term I think.
>>>>
>>>> Camera's definitely have their quirks - mostly because many usecases are
>>>> hard to convey over a single Video device node (with the hardware) but I
>>>> think we might expect that complexity to be managed by the host, and
>>>> probably offer a ready made stream to the guest. Of course how to handle
>>>> multiple streams and configuration of the whole pipeline may get more
>>>> difficult and warrant a specific 'virtio-camera' ... but I would think
>>>> the basics could be covered generically to start with.
>>>>
>>>> It's not clear who's driving this implementation and spec, so I guess
>>>> there's more reading to do.
>>>>
>>>> Anyway, I've added Cc libcamera-devel to raise awareness of this topic
>>>> to camera list.
>>>>
>>>> I bet Laurent has some stronger opinions on how he'd see camera's exist
>>>> in a virtio space.
>>
>> You seem to think I have strong opinions about everything. This may not
>> be a complitely unfounded assumption ;-)
>>
>> Overall I agree with you, I think cameras are too complex for a
>> low-level virtualization protocol. I'd rather see a high-level protocol
>> that exposes webcam-like devices, with the low-level complexity handled
>> on the host side (using libcamera of course ;-)). This would support use
>> cases that require sharing hardware blocks between multiple logical
>> cameras, including sharing the same camera streams between multiple
>> guests.
>>
>> If a guest needs low-level access to the camera, including the ability
>> to control the raw camera sensor or ISP, then I'd recommend passing the
>> corresponding hardware blocks to the guest for exclusive access.
>>
>>> Personally I would rather see a separate virtio-camera specification
>>> that properly encapsulates all the various use cases we have for
>>> cameras. In many ways just processing a stream of video is a much
>>> simpler use case.
>>>
>>> During Linaro's Project Stratos we got a lot of feedback from members
>>> who professed interest in a virtio-camera initiative. However we were
>>> unable to get enough engineering resources from the various companies to
>>> collaborate in developing a specification that would meet everyone's
>>> needs. The problem space is wide from having numerous black and white
>>> sensor cameras on cars to the full on computational photography as
>>> exposed by modern camera systems on phones. If you want to read more
>>> words on the topic I wrote a blog post at the time:
>>>
>>>    https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
>>>
>>> Back to the topic of virtio-video as I understand it the principle
>>> features/configurations are:
>>>
>>>    - All the various CODECs, resolutions and pixel formats
>>>    - Stateful vs Stateless streams
>>>    - If we want support grabbing single frames from a source
>>>
>>> My main concern about the V4L approach is that it pegs updates to the
>>> interface to the continuing evolution of the V4L interface in Linux. Now
>>> maybe video is a solved problem and there won't be (m)any new features
>>> we need to add after the initial revision. However I'm not a domain
>>> expert here so I just don't know.
>>
>> I've briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
>> when we got a chance to meet face to face. I think the V4L2 kernel API
>> is a quite good fit in the sense that its level of abstraction, when
>> applied to video codecs and "simple" cameras (defined, more or less, as
>> something ressembling a USB webcam feature-wise). It doesn't mean that
>> the virtio-video or virtio-camera specifications should necessarily
>> reference V4L2 or use the exact same vocabulary, they could simply copy
>> the concepts, and stay loosely-coupled with V4L2 in the sense that both
>> specification should try to evolve in compatible directions.

Thanks for the info.

Would everybody agree to have only a simple USB webcam-like virtual
camera, and to expect more complex devices to be passed through to a
guest for exclusive access? I don't have my own opinion at the moment.
If we have an agreement here, then it would definitely help us move
forward with the virtio-video/virtio-v4l2 discussion. AFAIU this is what
Alex Bennée called "catering to the lowest common denominator" in his
article, right? So he prefers to avoid this by using the feature
negotiation built into virtio. Well, I also like to have flexibility.
Andrii, what do you think?

Would V4L2 be enough for virtual cameras in the future? For me the
existence of libcamera is already a sign that V4L2 (or the way it is
developed) might not be flexible enough for everybody. If somebody has
an issue in the future, they might want to create a new device with an
overlapping scope, and then the same questions would be discussed again.

If we don't yet know the answers to these questions and decide to
postpone the decision, then this means no device could be merged,
right? For me this would be another argument to keep things separate,
because we already know how to do the codecs. I think there is no
disagreement on this.

Well, virtio-video has basically taken a lot of ideas from V4L2. So in
a sense it is, or tries to be, a subset of V4L2 for codecs only, but
adapted for the virtualization case. I think it is much better defined
than V4L2 within this scope, and I believe it could be extended to
support simple cameras if necessary. However, at the moment I'd prefer
to see a dedicated virtio-camera device, as I said. So I agree with
Alex Bennée.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah

Please mind our privacy notice<https://www.opensynergy.com/datenschutzerklaerung/privacy-notice-for-business-partners-pursuant-to-article-13-of-the-general-data-protection-regulation-gdpr/> pursuant to Art. 13 GDPR. // Unsere Hinweise zum Datenschutz gem. Art. 13 DSGVO finden Sie hier.<https://www.opensynergy.com/de/datenschutzerklaerung/datenschutzhinweise-fuer-geschaeftspartner-gem-art-13-dsgvo/>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
@ 2023-05-16 13:50                                                         ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-16 13:50 UTC (permalink / raw)
  To: Laurent Pinchart, Alex Bennée, virtio-dev, Albert Esteve,
	Matti Möll, Alexandre Courbot, Andrew Gazizov,
	Daniel Almeida, Cornelia Huck, Marcin Wojtas, Keiichi Watanabe,
	Gustavo Padovan, libcamera-devel, Bartłomiej Grzesik,
	Enrico Granata, Enric Balletbo i Serra, linux-media,
	Andrii Cherniavskyi

Hello Laurent,

+CC: Andrii

On 06.05.23 10:16, Laurent Pinchart wrote:
> I'm also CC'ing the linux-media@vger.kernel.org mailing list for these
> discussions, I'm sure there are folks there who are interested in codec
> and camera virtualization.
>
> On Sat, May 06, 2023 at 11:12:29AM +0300, Laurent Pinchart via libcamera-devel wrote:
>> On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
>>> Kieran Bingham writes:
>>>> Quoting Alexander Gordeev (2023-05-05 10:57:29)
>>>>> On 03.05.23 17:53, Cornelia Huck wrote:
>>>>>> On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>> Cornelia Huck <cohuck@redhat.com> writes:
>>>>>>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>>>>>> think a big part of the disagreement stems from the misconception that
>>>>>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>>>>>> which is absolutely not the case.
>>>>>>>>>
>>>>>>>>> I understand this, of course. I'm arguing, that it is harder to
>>>>>>>>> implement it, get it straight and then maintain it over years. Also it
>>>>>>>>> brings limitations, that sometimes can be workarounded in the virtio
>>>>>>>>> spec, but this always comes at a cost of decreased readability and
>>>>>>>>> increased complexity. Overall it looks clearly as a downgrade compared
>>>>>>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>>>>>>> every developer, that has to actually implement the spec, not just do
>>>>>>>>> the pass through. So if we think of V4L2 UAPI pass through as a
>>>>>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>>>>>> both and keep improving the virtio-video, including taking the best
>>>>>>>>> ideas from the V4L2 and overall using it as a reference to make writing
>>>>>>>>> the driver simpler.
>>>>>>>>
>>>>>>>> Let me jump in here and ask another question:
>>>>>>>>
>>>>>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>>>>>> device for handling video encoding/decoding to their hypervisor.
>>>>>>>>
>>>>>>>> Option 1: There are different devices to chose from. How is the person
>>>>>>>> implementing this supposed to pick a device? They might have a narrow
>>>>>>>> use case, where it is clear which of the devices is the one that needs to
>>>>>>>> be supported; but they also might have multiple, diverse use cases, and
>>>>>>>> end up needing to implement all of the devices.
>>>>>>>>
>>>>>>>> Option 2: There is one device with various optional features. The person
>>>>>>>> implementing this can start off with a certain subset of features
>>>>>>>> depending on their expected use cases, and add to it later, if needed;
>>>>>>>> but the upfront complexity might be too high for specialized use cases.
>>>>>>>>
>>>>>>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>>>>>>> trying to decide whether our future will be more like Option 1 or Option
>>>>>>>> 2, with their respective trade-offs.
>>>>>>>>
>>>>>>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>>>>>>> am I missing something essential here? (I had the impression that some
>>>>>>>> previous confusion had been cleared up; apologies in advance if I'm
>>>>>>>> misrepresenting things.)
>>>>>>>>
>>>>>>>> I'd really love to see some kind of consensus for 1.3, if at all
>>>>>>>> possible :)
>>>>>>>
>>>>>>> I think feature discovery and extensibility is a key part of the VirtIO
>>>>>>> paradigm which is why I find the virtio-v4l approach limiting. By
>>>>>>> pegging the device to a Linux API we effectively limit the growth of the
>>>>>>> device specification to as fast as the Linux API changes. I'm not fully
>>>>>>> immersed in v4l but I don't think it is seeing any additional features
>>>>>>> developed for it and its limitations for camera are one of the reasons
>>>>>>> stuff is being pushed to userspace in solutions like libcamera:
>>>>>>>
>>>>>>>     How is libcamera different from V4L2?
>>>>>>>
>>>>>>>     We see libcamera as a continuation of V4L2. One that can more easily
>>>>>>>     handle the recent advances in hardware design. As embedded cameras have
>>>>>>>     developed, all of the complexity has been pushed on to the developers.
>>>>>>>     With libcamera, all of that complexity is simplified and a single model
>>>>>>>     is presented to application developers.
>>>>>>
>>>>>> Ok, that is interesting; thanks for the information.
>>>>>>
>>>>>>>
>>>>>>> That said its not totally our experience to have virtio devices act as
>>>>>>> simple pipes for some higher level protocol. The virtio-gpu spec says
>>>>>>> very little about the details of how 3D devices work and simply offers
>>>>>>> an opaque pipe to push a (potentially propriety) command stream to the
>>>>>>> back end. As far as I'm aware the proposals for Vulkan and Wayland
>>>>>>> device support doesn't even offer a feature bit but simply changes the
>>>>>>> graphics stream type in the command packets.
>>>>>>>
>>>>>>> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
>>>>>>> incompatible with other feature bits and make that the baseline
>>>>>>> implementation but it's not really in the spirit of what VirtIO is
>>>>>>> trying to achieve.
>>>>>>
>>>>>> I'd not be in favour of an incompatible feature flag,
>>>>>> either... extensions are good, but conflicting features is something
>>>>>> that I'd like to avoid.
>>>>>>
>>>>>> So, given that I'd still prefer to have a single device: How well does
>>>>>> the proposed virtio-video device map to a Linux driver implementation
>>>>>> that hooks into V4L2?
>>>>>
>>>>> IMO it hooks into V4L2 pretty well. And I'm going to spend next few
>>>>> months making the existing driver fully V4L2 compliant. If this goal
>>>>> requires changing the spec, than we still have time to do that. I don't
>>>>> expect a lot of problems on this side. There might be problems with
>>>>> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
>>>>> of this can be accomplished over time.
>>>>>
>>>>>> If the general process flow is compatible and it
>>>>>> is mostly a question of wiring the parts together, I think pushing that
>>>>>> part of the complexity into the Linux driver is a reasonable
>>>>>> trade-off. Being able to use an existing protocol is nice, but if that
>>>>>> protocol is not perceived as flexible enough, it is probably not worth
>>>>>> encoding it into a spec. (Similar considerations apply to hooking up the
>>>>>> device in the hypervisor.)
>>>>>
>>>>> I very much agree with these statements. I think this is how it should
>>>>> be: we start with a compact but usable device, then add features and
>>>>> enable them using feature flags. Eventually we can cover all the
>>>>> use-cases of V4L2 unless we decide to have separate devices for them
>>>>> (virtio-camera, etc). This would be better in the long term I think.
>>>>
>>>> Camera's definitely have their quirks - mostly because many usecases are
>>>> hard to convey over a single Video device node (with the hardware) but I
>>>> think we might expect that complexity to be managed by the host, and
>>>> probably offer a ready made stream to the guest. Of course how to handle
>>>> multiple streams and configuration of the whole pipeline may get more
>>>> difficult and warrant a specific 'virtio-camera' ... but I would think
>>>> the basics could be covered generically to start with.
>>>>
>>>> It's not clear who's driving this implementation and spec, so I guess
>>>> there's more reading to do.
>>>>
>>>> Anyway, I've added Cc libcamera-devel to raise awareness of this topic
>>>> to camera list.
>>>>
>>>> I bet Laurent has some stronger opinions on how he'd see camera's exist
>>>> in a virtio space.
>>
>> You seem to think I have strong opinions about everything. This may not
>> be a complitely unfounded assumption ;-)
>>
>> Overall I agree with you, I think cameras are too complex for a
>> low-level virtualization protocol. I'd rather see a high-level protocol
>> that exposes webcam-like devices, with the low-level complexity handled
>> on the host side (using libcamera of course ;-)). This would support use
>> cases that require sharing hardware blocks between multiple logical
>> cameras, including sharing the same camera streams between multiple
>> guests.
>>
>> If a guest needs low-level access to the camera, including the ability
>> to control the raw camera sensor or ISP, then I'd recommend passing the
>> corresponding hardware blocks to the guest for exclusive access.
>>
>>> Personally I would rather see a separate virtio-camera specification
>>> that properly encapsulates all the various use cases we have for
>>> cameras. In many ways just processing a stream of video is a much
>>> simpler use case.
>>>
>>> During Linaro's Project Stratos we got a lot of feedback from members
>>> who professed interest in a virtio-camera initiative. However we were
>>> unable to get enough engineering resources from the various companies to
>>> collaborate in developing a specification that would meet everyone's
>>> needs. The problem space is wide from having numerous black and white
>>> sensor cameras on cars to the full on computational photography as
>>> exposed by modern camera systems on phones. If you want to read more
>>> words on the topic I wrote a blog post at the time:
>>>
>>>    https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
>>>
>>> Back to the topic of virtio-video: as I understand it, the principal
>>> features/configurations are:
>>>
>>>    - All the various CODECs, resolutions and pixel formats
>>>    - Stateful vs stateless streams
>>>    - Whether we want to support grabbing single frames from a source
>>>
>>> My main concern about the V4L approach is that it pegs updates to the
>>> interface to the continuing evolution of the V4L interface in Linux. Now
>>> maybe video is a solved problem and there won't be (m)any new features
>>> we need to add after the initial revision. However I'm not a domain
>>> expert here so I just don't know.
>>
>> I've briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
>> when we got a chance to meet face to face. I think the V4L2 kernel API
>> is quite a good fit in terms of its level of abstraction, when applied
>> to video codecs and "simple" cameras (defined, more or less, as
>> something resembling a USB webcam feature-wise). That doesn't mean the
>> virtio-video or virtio-camera specifications should necessarily
>> reference V4L2 or use the exact same vocabulary; they could simply copy
>> the concepts and stay loosely coupled with V4L2, in the sense that both
>> specifications should try to evolve in compatible directions.

Thanks for the info.

Would everybody agree to have only a simple USB webcam-like virtual
camera and expect more complex devices to be passed through for
exclusive access to a guest? I don't have my own opinion at the moment.
If we have an agreement here, then it would definitely help us move
forward with the virtio-video/virtio-v4l2 discussion. AFAIU this is what
Alex Bennée called "catering to the lowest common denominator" in his
article. Right? So he prefers to avoid this using the feature negotiation
built into virtio. Well, I also like to have flexibility. Andrii, what do
you think?

Would V4L2 be enough for the virtual cameras in the future? For me the
existence of libcamera is already a sign that V4L2 (or the way it is
developed) might not be flexible enough for everybody. If somebody has
an issue in the future, they might want to create a new device with an
overlapping scope. Then the same questions would be discussed again.

If we don't yet know the answers to these questions, and we decide to
postpone the decision, then this means no devices could be merged,
right? For me this would be another argument to keep things separate.
Because we already know how to do the codecs. I think there is no
disagreement on this.

Well, basically virtio-video has taken a lot of ideas from V4L2. So in a
sense it is or it tries to be a subset of V4L2 for the codecs only, but
adapted for the virtualization case. I think it is much better defined
compared to V4L2 for this scope. I believe it can be extended to support
the simple cameras if necessary. However at the moment I'd prefer to see
a dedicated virtio-camera device as I said. So I agree with Alex Bennée.


--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-12  4:09                                               ` Alexandre Courbot
@ 2023-05-16 14:53                                                 ` Alexander Gordeev
  2023-05-17 16:28                                                   ` Cornelia Huck
  2023-05-17 11:04                                                 ` Alexander Gordeev
  1 sibling, 1 reply; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-16 14:53 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 12.05.23 06:09, Alexandre Courbot wrote:
> On Thu, May 11, 2023 at 5:50 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 08.05.23 06:55, Alexandre Courbot wrote:
>>> On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>
>>>> On 03.05.23 16:04, Cornelia Huck wrote:
>>>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>
>>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>>> think a big part of the disagreement stems from the misconception that
>>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>>> which is absolutely not the case.
>>>>>>
>>>>>> I understand this, of course. I'm arguing that it is harder to
>>>>>> implement, get straight and then maintain over the years. Also it
>>>>>> brings limitations that can sometimes be worked around in the virtio
>>>>>> spec, but this always comes at the cost of decreased readability and
>>>>>> increased complexity. Overall it clearly looks like a downgrade compared
>>>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>>>> every developer that has to actually implement the spec, not just do
>>>>>> the pass-through. So if we think of V4L2 UAPI pass-through as a
>>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>>> both and keep improving virtio-video, including taking the best
>>>>>> ideas from V4L2 and overall using it as a reference to make writing
>>>>>> the driver simpler.
>>>>>
>>>>> Let me jump in here and ask another question:
>>>>>
>>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>>> device for handling video encoding/decoding to their hypervisor.
>>>>>
>>>>> Option 1: There are different devices to chose from. How is the person
>>>>> implementing this supposed to pick a device? They might have a narrow
>>>>> use case, where it is clear which of the devices is the one that needs to
>>>>> be supported; but they also might have multiple, diverse use cases, and
>>>>> end up needing to implement all of the devices.
>>>>
>>>> I think in this case virtio-v4l2 should be used as a compatibility
>>>> device exclusively. This means discouraging increasing its complexity
>>>> even more with more patches in the spec. virtio-video should eventually
>>>> cover all the use-cases of V4L2, so I think it is reasonable to use it
>>>> in both complex use-cases and in simple use-cases, where there is no
>>>> decoder/encoder V4L2 device on the host.
>>>>
>>>>> Option 2: There is one device with various optional features. The person
>>>>> implementing this can start off with a certain subset of features
>>>>> depending on their expected use cases, and add to it later, if needed;
>>>>> but the upfront complexity might be too high for specialized use cases.
>>>
>>> I don't see that many negotiable features we can provide for a
>>> decoder/encoder device - at least not many that are not considered
>>> basic (like guest buffers). In terms of provided features for codecs
>>> virtio-video and virtio-v4l2 are essentially equivalent.
>>
>> Actually I see a lot of potential in using the virtio feature flag
>> negotiation for virtio-video:
>>
>> 1. We already have some feature flags related to memory management.
>>
>> 2. I think it would be great to take V4L2 controls negotiation and turn
>> it into feature flag negotiation. I really like this idea and I'd
>> like to implement it. Not all the controls at once, of course. Still it
>> would be very easy to port more given the existing process. They
>> correspond well enough to each other, I think. This way we don't need to
>> introduce something like VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
>> don't need two mechanisms for feature negotiation (as it would be
>> with virtio-v4l2, right?), and all the features would be in one place.
>> Then we can directly reference some enums from V4L2, like
>> v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
>> I call taking the best from V4L2 (a rough sketch of this idea follows
>> below).
>>
>> 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
>> For example, dequeuing output buffers in decoder order.
>
> ... provided the host can do that (i.e. has a stateless decoder
> interface). What is the use-case for this btw?

We have discussed this already. All the relevant quotes can be seen
closer to the end of the email here:
https://markmail.org/message/skotipxlqfiijj7c#query:+page:1+mid:hvkyxsj4tjyq56hj+state:results
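
To make point 2 of the quoted list above a little more concrete, here is a
purely illustrative sketch. The specific names below
(VIRTIO_VIDEO_F_H264_PROFILE_LEVEL, struct virtio_video_h264_caps) are
invented for this sketch and do not come from any spec draft; only the V4L2
constants and the <linux/v4l2-controls.h> header are real. The point is
merely that a negotiated feature bit could gate a capability whose values
reuse the existing V4L2 enums instead of redefining them:

  /* Illustration only: the two names below are invented for this sketch.
   * The V4L2 control enums come from the real <linux/v4l2-controls.h>. */
  #include <stdint.h>
  #include <linux/v4l2-controls.h>

  /* Hypothetical feature bit: device reports H.264 profile/level limits. */
  #define VIRTIO_VIDEO_F_H264_PROFILE_LEVEL  (1ULL << 5)

  /* Hypothetical capability data, only meaningful once the feature bit
   * above has been negotiated; the values reuse the V4L2 enums directly. */
  struct virtio_video_h264_caps {
          uint32_t max_profile;   /* e.g. V4L2_MPEG_VIDEO_H264_PROFILE_HIGH */
          uint32_t max_level;     /* e.g. V4L2_MPEG_VIDEO_H264_LEVEL_5_1 */
  };

Whether such values would live in config space or behind a capability query
is left open here; the sketch only shows that the enum values themselves
would not need to be re-specified.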

>>> The question is more: do we want a decoder/encoder specification and
>>> another one for cameras or do we want something that covers video
>>> devices in general. And is it desirable to reuse an already existing
>>> protocol for communication between a non-privileged entity and a
>>> privileged one (V4L2) or define our own?
>>
>> Please note that virtio-video could be extended to support everything
>> V4L2 does in the future. Including cameras if necessary. So it can cover
>> video devices in general too.
>
> That's a very, very bold claim, and if you do that, you will inevitably need
> to add things that are not relevant to the codec case, which is your
> complaint about V4L2. Not to mention the many new pages of spec that
> this will require.

Hmm, you're right. My statement was not specific enough. Indeed V4L2
does a lot of stuff. I'll rephrase.
I think virtio-video could be extended to support at least the more or
less basic devices like USB cameras, maybe a little more. Everything
beyond this is possible, but should be avoided.

>>> About the latter point, Alex Bennée mentioned that it was difficult to
>>> find the engineering time to define virtio-camera. And virtio-video
>>> has also not been particularly fast to come together. virtio-v4l2
>>> basically serves us both on a plate for a lower effort.
>>
>> If we talk only about codecs, the effort is lower only if you have
>> V4L2 codecs on the host. Otherwise the effort seems higher.
>
> The effort is lower in terms of spec writing, and also lower if your
> guest is Linux as you can support all video devices with a single
> driver. That's a very large portion of the virtio users here.

As we found out already, V4L2 probably has to be patched to be
actually useful in virtio. So I'm not convinced that more complex
devices wouldn't require more patches. So in the end it is hard to
compare the effort.
For the spec I'm also not so sure about that. I think we'll have to see
it first and let it pass a couple of review rounds.
Anyway, the effort of writing the devices would be higher in cases when
pass-through is not possible. That's our use-case. I think the goal of
making it easy to write devices should have higher priority compared to
the length of the device part in the virtio spec.
But I think we already wrote all these points, so we're going in circles.

>> I also hope to be able to update virtio-video at a faster pace. Please
>> let me try.
>
> Please hold on a bit - there are two things here.
>
> 1) I'd like to settle the virtio-v4l2/virtio-video argument first to
> make sure we don't get two things clashing head-on. As far as codecs
> are concerned we certainly don't need both. Cornelia, I think we'll
> need you to make a call on this, or at least tell us what you need to
> make the call. If it helps I can send a draft of what the virtio-v4l2
> spec would look like, it should be relatively short.
>
> 2) I (and other ChromeOS contributors) have been driving this spec so
> far and while I think virtio-v4l2 is a better solution, I have not
> said I would give up on virtio-video if virtio-v4l2 was not adopted
> and will keep iterating on it in that case.

It actually very much looks like you gave up. I mean, you developed it
for years, and now you'd like to throw it away. Well, I meant something
like "please have some faith in my ability to update virtio-video at a
faster pace". I think I already have the permission to continue the
development. You know, OpenSynergy also spent some time on the
virtio-video. This includes the draft v1 version and the V4L2 driver. I
think this was kind of an informal agreement between us, that you do the
spec and we do the driver. It didn't work well enough since draft v4, I
think. Now as our interests are not aligned anymore, it is fine to
continue separately. I'm not going to wait until you change your mind on
virtio-v4l2. Obviously you're very busy with it right now. If I stop and
wait now, we won't have the video device in the 1.3 release for sure. I
think when we are ready to develop virtio-video together again, we'll
need a better agreement.
That said, I very much appreciate your efforts with the spec. It
definitely received a lot of improvements.

> That said, your contributions are of course welcome if you have stuff
> written down that you want to include. The current virtio-video spec
> is available here:
>
> https://github.com/Gnurou/virtio-video-spec
>
> I'm writing it in Markdown (virtio-video.md) to avoid dealing with
> LaTeX directly and use a pandoc filter to convert it before submission
> - there is a Makefile that takes care of that. Feel free to send a
> pull request with changes you have worked on, including your
> Signed-off-by in the patches so I can carry it on into the v7 patch if
> we go that route.

Sorry, I tried this, but I had a lot of trouble with any maths in
Markdown. So after some time I just moved to TeX. Now I have a great
development environment for TeX. I have the final pdf file on the right
side, and it gets updated every time I save a file. I can immediately
see the formatted result as it would be in the spec. So I'd prefer to
stay with TeX.

Also I'm not completely sure if the OASIS IPR license allows this kind
of workflow. Simply sending patches to the mailing list seems like a
safer choice. Could anyone please help me understand if it is OK when
several developers work in a separate repository and then one of them
publishes the combined changes to the mailing list?

>> It is understandable that working on the specs takes time. That's why
>> I'm all for making room for everybody to work. I think eventually
>> virtio-video or virtio-video + virtio-camera can replace virtio-v4l2. I
>> strongly believe this is a better solution for the long term.
>>
>>>> In this case I'd prefer to have the simpler device first, that is the
>>>> current virtio-video, then to add features incrementally using feature
>>>> flags and taking into account the virtualization context. V4L2 is a
>>>> complex thing from a different context. They already tried to carve out
>>>> some of the use-cases like the stateful decoder/encoder API, but this work
>>>> is not finished (struct v4l2_buffer can serve as evidence). This is
>>>> like dissecting a monolith. Also it has to be patched to make it more
>>>> appropriate for virtualization (we can see this in Alexandre's PoC already).
>>>>
>>>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>>>> trying to decide whether our future will be more like Option 1 or Option
>>>>> 2, with their respective trade-offs.
>>>>
>>>> I'd like to rely on the opinions of people who know more about virtio
>>>> development and goals. I would be happy to present or reiterate my
>>>> arguments to anyone interested if necessary.
>>>>
>>>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>>>> am I missing something essential here? (I had the impression that some
>>>>> previous confusion had been cleared up; apologies in advance if I'm
>>>>> misrepresenting things.)
>>>>
>>>> Indeed some of the previous confusion has been cleared up. But not the
>>>> key thing. Alexandre still claims that this patched V4L2 UAPI pass
>>>> through is only marginally more complex, for example. I don't agree with
>>>> this and I have evidence. We haven't finished discussing this evidence.
>>>
>>> Are you talking about v4l2_buffer?
>>>
>>> https://www.kernel.org/doc/html/v4.9/media/uapi/v4l/buffer.html#struct-v4l2-buffer
>>>
>>> I think you implied that some of its fields were not relevant for
>>> video decoding or encoding, which if you examine them is again
>>> incorrect. That also answers your question of why the stateful decoder
>>> spec did not mention the valid fields - because it is documented on
>>> this page, which tells exactly which fields the driver/device are
>>> expected to set for each queue.
>>
>> I'm talking about struct v4l2_buffer, yes, but not only. Also about
>> struct v4l2_plane, enum v4l2_buf_type, the buffer flags, enum
>> v4l2_memory (but this one is comparable to virtio-video), timecodes.
>> For example, the way the fields in the struct v4l2_buffer and struct
>> v4l2_plane are filled and interpreted depends a lot on the type. Here is
>> the enum v4l2_buf_type:
>>
>> enum v4l2_buf_type {
>>           V4L2_BUF_TYPE_VIDEO_CAPTURE        = 1,
>>           V4L2_BUF_TYPE_VIDEO_OUTPUT         = 2,
>>           V4L2_BUF_TYPE_VIDEO_OVERLAY        = 3,
>>           V4L2_BUF_TYPE_VBI_CAPTURE          = 4,
>>           V4L2_BUF_TYPE_VBI_OUTPUT           = 5,
>>           V4L2_BUF_TYPE_SLICED_VBI_CAPTURE   = 6,
>>           V4L2_BUF_TYPE_SLICED_VBI_OUTPUT    = 7,
>>           V4L2_BUF_TYPE_VIDEO_OUTPUT_OVERLAY = 8,
>>           V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE = 9,
>>           V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE  = 10,
>>           V4L2_BUF_TYPE_SDR_CAPTURE          = 11,
>>           V4L2_BUF_TYPE_SDR_OUTPUT           = 12,
>>           V4L2_BUF_TYPE_META_CAPTURE         = 13,
>>           V4L2_BUF_TYPE_META_OUTPUT          = 14,
>>           /* Deprecated, do not use */
>>           V4L2_BUF_TYPE_PRIVATE              = 0x80,
>> };
>>
>> Of these 14 cases we only need 2 for the codecs. Right?
>> Also the flags. There are 22 of them. Are they all needed too? I don't
>> think so. We only have like 5 in virtio-video at the moment.
>
> Upon reading the V4L2 spec it is pretty clear which queue types are
> valid for each device, and anything other than CAPTURE_MPLANE and
> OUTPUT_MPLANE is unlikely to be used anyway.
>
> We are really splitting hairs here and this looks like a wild goose
> chase for software purity - even within virtio-video there are already
> flags that only make sense for an encoder, and you cannot remove them
> without defining new more specific structures and complicating things
> overall. If you extend virtio-video to support more use-cases, you
> will end up with more of these as well. So seriously, why is this such
> a big deal when the instructions on how to use these structures for
> each use case are at the other end of a link click?

I don't agree with this description. I have written my arguments several
times. It took me a lot of time to write all these emails. It is very
disappointing that you still don't seem to take them seriously.

>> You posted code for filling v4l2_buffer in one of your previous emails.
>> What I'm trying to say is that a person who doesn't know this in
>> advance will have a hard time writing this same code if they only have
>> the virtio-v4l2 spec.
>
> Well yes, the counterpart of virtio-v4l2 being shorter is that you
> need to refer to V4L2 as well - that's actually the point, to reduce
> the burden on virtio by reusing a spec that already exists and
> referring to it.

As I wrote, I don't agree that reducing the burden on virtio is our
primary goal.
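
For concreteness, this is roughly the kind of code being discussed above: a
simplified, non-normative sketch of queueing one filled bitstream buffer on
the OUTPUT_MPLANE queue of a V4L2 stateful decoder, setting only the fields
a codec actually needs (error handling omitted):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  /* Queue one bitstream buffer (already filled with `bytesused` bytes of
   * coded data) on the decoder's OUTPUT_MPLANE queue. */
  static int queue_bitstream_buffer(int fd, unsigned int index,
                                    unsigned int bytesused)
  {
          struct v4l2_plane plane;
          struct v4l2_buffer buf;

          memset(&plane, 0, sizeof(plane));
          memset(&buf, 0, sizeof(buf));

          plane.bytesused = bytesused;            /* amount of coded data */

          buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
          buf.memory = V4L2_MEMORY_MMAP;          /* buffers from VIDIOC_REQBUFS */
          buf.index = index;                      /* which of those buffers */
          buf.m.planes = &plane;
          buf.length = 1;                         /* number of planes */

          return ioctl(fd, VIDIOC_QBUF, &buf);
  }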

> Earlier in this email you mentioned reusing enums like
> v4l2_mpeg_video_h264_profile in virtio-video, which creates the same
> dependency on the V4L2 spec. The difference is how much we are taking
> from it.

Yes, exactly. I think taking all of the V4L2 spec is too much. I'd
prefer to only take the parts that are well-defined, well aligned with
virtio and don't need patches on top.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* [virtio-dev] Re: [libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
       [not found]                                                   ` <20230506081229.GA8114@pendragon.ideasonboard.com>
  2023-05-06  8:16                                                     ` [libcamera-devel] " Laurent Pinchart
@ 2023-05-17  3:58                                                     ` Tomasz Figa
  1 sibling, 0 replies; 97+ messages in thread
From: Tomasz Figa @ 2023-05-17  3:58 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Alex Bennée, virtio-dev, Albert Esteve, Matti Möll,
	Alexandre Courbot, Andrew Gazizov, Daniel Almeida, Cornelia Huck,
	Marcin Wojtas, Keiichi Watanabe, Gustavo Padovan,
	Alexander Gordeev, libcamera-devel, Bartłomiej Grzesik,
	Enrico Granata, Enric Balletbo i Serra

On Sat, May 6, 2023 at 5:12 PM Laurent Pinchart via libcamera-devel
<libcamera-devel@lists.libcamera.org> wrote:
>
> Hello,
>
> On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
> > Kieran Bingham writes:
> >
> > > Hi All,
> > >
> > > Coming in late, thanks to lei/lore spotting the libcamera keyword.
> > >
> > > + Cc: libcamera-devel to raise awareness of the discussion there.
> > >
> > > Quoting Alexander Gordeev (2023-05-05 10:57:29)
> > >> On 03.05.23 17:53, Cornelia Huck wrote:
> > >> > On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
> > >> >> Cornelia Huck <cohuck@redhat.com> writes:
> > >> >>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
> > >> >>>> On 27.04.23 15:16, Alexandre Courbot wrote:
> > >> >>>>> But in any case, that's irrelevant to the guest-host interface, and I
> > >> >>>>> think a big part of the disagreement stems from the misconception that
> > >> >>>>> V4L2 absolutely needs to be used on either the guest or the host,
> > >> >>>>> which is absolutely not the case.
> > >> >>>>
> > >> >>>> I understand this, of course. I'm arguing that it is harder to
> > >> >>>> implement, get straight and then maintain over the years. Also it
> > >> >>>> brings limitations that can sometimes be worked around in the virtio
> > >> >>>> spec, but this always comes at the cost of decreased readability and
> > >> >>>> increased complexity. Overall it clearly looks like a downgrade compared
> > >> >>>> to virtio-video for our use-case. And I believe it would be the same for
> > >> >>>> every developer that has to actually implement the spec, not just do
> > >> >>>> the pass-through. So if we think of V4L2 UAPI pass-through as a
> > >> >>>> compatibility device (which I believe it is), then it is fine to have
> > >> >>>> both and keep improving virtio-video, including taking the best
> > >> >>>> ideas from V4L2 and overall using it as a reference to make writing
> > >> >>>> the driver simpler.
> > >> >>>
> > >> >>> Let me jump in here and ask another question:
> > >> >>>
> > >> >>> Imagine that, some years in the future, somebody wants to add a virtio
> > >> >>> device for handling video encoding/decoding to their hypervisor.
> > >> >>>
> > >> >>> Option 1: There are different devices to chose from. How is the person
> > >> >>> implementing this supposed to pick a device? They might have a narrow
> > >> >>> use case, where it is clear which of the devices is the one that needs to
> > >> >>> be supported; but they also might have multiple, diverse use cases, and
> > >> >>> end up needing to implement all of the devices.
> > >> >>>
> > >> >>> Option 2: There is one device with various optional features. The person
> > >> >>> implementing this can start off with a certain subset of features
> > >> >>> depending on their expected use cases, and add to it later, if needed;
> > >> >>> but the upfront complexity might be too high for specialized use cases.
> > >> >>>
> > >> >>> Leaving concrete references to V4L2 out of the picture, we're currently
> > >> >>> trying to decide whether our future will be more like Option 1 or Option
> > >> >>> 2, with their respective trade-offs.
> > >> >>>
> > >> >>> I'm slightly biased towards Option 2; does it look feasible at all, or
> > >> >>> am I missing something essential here? (I had the impression that some
> > >> >>> previous confusion had been cleared up; apologies in advance if I'm
> > >> >>> misrepresenting things.)
> > >> >>>
> > >> >>> I'd really love to see some kind of consensus for 1.3, if at all
> > >> >>> possible :)
> > >> >>
> > >> >> I think feature discovery and extensibility is a key part of the VirtIO
> > >> >> paradigm which is why I find the virtio-v4l approach limiting. By
> > >> >> pegging the device to a Linux API we effectively limit the growth of the
> > >> >> device specification to as fast as the Linux API changes. I'm not fully
> > >> >> immersed in v4l but I don't think it is seeing any additional features
> > >> >> developed for it and its limitations for camera are one of the reasons
> > >> >> stuff is being pushed to userspace in solutions like libcamera:
> > >> >>
> > >> >>    How is libcamera different from V4L2?
> > >> >>
> > >> >>    We see libcamera as a continuation of V4L2. One that can more easily
> > >> >>    handle the recent advances in hardware design. As embedded cameras have
> > >> >>    developed, all of the complexity has been pushed on to the developers.
> > >> >>    With libcamera, all of that complexity is simplified and a single model
> > >> >>    is presented to application developers.
> > >> >
> > >> > Ok, that is interesting; thanks for the information.
> > >> >
> > >> >>
> > >> >> That said it's not totally our experience to have virtio devices act as
> > >> >> simple pipes for some higher level protocol. The virtio-gpu spec says
> > >> >> very little about the details of how 3D devices work and simply offers
> > >> >> an opaque pipe to push a (potentially proprietary) command stream to the
> > >> >> back end. As far as I'm aware the proposals for Vulkan and Wayland
> > >> >> device support don't even offer a feature bit but simply change the
> > >> >> graphics stream type in the command packets.
> > >> >>
> > >> >> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
> > >> >> incompatible with other feature bits and make that the baseline
> > >> >> implementation but it's not really in the spirit of what VirtIO is
> > >> >> trying to achieve.
> > >> >
> > >> > I'd not be in favour of an incompatible feature flag,
> > >> > either... extensions are good, but conflicting features are something
> > >> > that I'd like to avoid.
> > >> >
> > >> > So, given that I'd still prefer to have a single device: How well does
> > >> > the proposed virtio-video device map to a Linux driver implementation
> > >> > that hooks into V4L2?
> > >>
> > >> IMO it hooks into V4L2 pretty well. And I'm going to spend the next few
> > >> months making the existing driver fully V4L2 compliant. If this goal
> > >> requires changing the spec, then we still have time to do that. I don't
> > >> expect a lot of problems on this side. There might be problems with
> > >> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
> > >> of this can be accomplished over time.
> > >>
> > >> > If the general process flow is compatible and it
> > >> > is mostly a question of wiring the parts together, I think pushing that
> > >> > part of the complexity into the Linux driver is a reasonable
> > >> > trade-off. Being able to use an existing protocol is nice, but if that
> > >> > protocol is not perceived as flexible enough, it is probably not worth
> > >> > encoding it into a spec. (Similar considerations apply to hooking up the
> > >> > device in the hypervisor.)
> > >>
> > >> I very much agree with these statements. I think this is how it should
> > >> be: we start with a compact but usable device, then add features and
> > >> enable them using feature flags. Eventually we can cover all the
> > >> use-cases of V4L2 unless we decide to have separate devices for them
> > >> (virtio-camera, etc). This would be better in the long term I think.
> > >
> > > Cameras definitely have their quirks - mostly because many use cases are
> > > hard to convey over a single video device node (with the hardware), but I
> > > think we might expect that complexity to be managed by the host, and
> > > probably offer a ready made stream to the guest. Of course how to handle
> > > multiple streams and configuration of the whole pipeline may get more
> > > difficult and warrant a specific 'virtio-camera' ... but I would think
> > > the basics could be covered generically to start with.
> > >
> > > It's not clear who's driving this implementation and spec, so I guess
> > > there's more reading to do.
> > >
> > > Anyway, I've added Cc libcamera-devel to raise awareness of this topic
> > > to camera list.
> > >
> > > I bet Laurent has some stronger opinions on how he'd see camera's exist
> > > in a virtio space.
>
> You seem to think I have strong opinions about everything. This may not
> be a completely unfounded assumption ;-)
>
> Overall I agree with you, I think cameras are too complex for a
> low-level virtualization protocol. I'd rather see a high-level protocol
> that exposes webcam-like devices, with the low-level complexity handled
> on the host side (using libcamera of course ;-)). This would support use
> cases that require sharing hardware blocks between multiple logical
> cameras, including sharing the same camera streams between multiple
> guests.
>
> If a guest needs low-level access to the camera, including the ability
> to control the raw camera sensor or ISP, then I'd recommend passing the
> corresponding hardware blocks to the guest for exclusive access.

I'd argue that to be rather a serious limitation for systems like
ChromeOS, which run multiple rich OS environments in virtual machines,
Android being one. Having to pass through the entire hardware
subsystem to get slightly more control over the camera (RAW capture
and reprocessing afterwards is by no means an advanced capability
these days) would impose a number of serious implications:

1) can't share access to the camera subsystem across different
applications in different VMs (given that it's a common SoC design to
have a single set of ISP devices, which can handle more than 1 stream
simultaneously),

2) need to have all the low level code to deal with the hardware
duplicated (or often reimplemented) for all the different VM guest
OSes,

3) lack of a way to introduce a security layer between the camera hw
subsystem and guest OS.

I'd argue that for proper camera virtualization for consumer
applications, we would need something on an abstraction level closer
to the Android Camera HAL3 API, or actually, the libcamera native API.

Best regards,
Tomasz

>
> > Personally I would rather see a separate virtio-camera specification
> > that properly encapsulates all the various use cases we have for
> > cameras. In many ways just processing a stream of video is a much
> > simpler use case.
> >
> > During Linaro's Project Stratos we got a lot of feedback from members
> > who professed interest in a virtio-camera initiative. However we were
> > unable to get enough engineering resources from the various companies to
> > collaborate in developing a specification that would meet everyone's
> > needs. The problem space is wide from having numerous black and white
> > sensor cameras on cars to the full on computational photography as
> > exposed by modern camera systems on phones. If you want to read more
> > words on the topic I wrote a blog post at the time:
> >
> >   https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
> >
> > Back to the topic of virtio-video: as I understand it, the principal
> > features/configurations are:
> >
> >   - All the various CODECs, resolutions and pixel formats
> >   - Stateful vs stateless streams
> >   - Whether we want to support grabbing single frames from a source
> >
> > My main concern about the V4L approach is that it pegs updates to the
> > interface to the continuing evolution of the V4L interface in Linux. Now
> > maybe video is a solved problem and there won't be (m)any new features
> > we need to add after the initial revision. However I'm not a domain
> > expert here so I just don't know.
>
> I've briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
> when we got a chance to meet face to face. I think the V4L2 kernel API
> is quite a good fit in terms of its level of abstraction, when applied
> to video codecs and "simple" cameras (defined, more or less, as
> something resembling a USB webcam feature-wise). That doesn't mean the
> virtio-video or virtio-camera specifications should necessarily
> reference V4L2 or use the exact same vocabulary; they could simply copy
> the concepts and stay loosely coupled with V4L2, in the sense that both
> specifications should try to evolve in compatible directions.
>
> --
> Regards,
>
> Laurent Pinchart

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-12  4:15                                                 ` Alexandre Courbot
@ 2023-05-17  7:35                                                   ` Alexander Gordeev
  0 siblings, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-17  7:35 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 12.05.23 06:15, Alexandre Courbot wrote:
> On Thu, May 11, 2023 at 6:00 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> On 11.05.23 10:50, Alexander Gordeev wrote:
>>> On 08.05.23 06:55, Alexandre Courbot wrote:
>>>> On Fri, May 5, 2023 at 8:55 PM Alexander Gordeev
>>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>>
>>>>> On 03.05.23 16:04, Cornelia Huck wrote:
>>>>>> On Fri, Apr 28 2023, Alexander Gordeev
>>>>>> <alexander.gordeev@opensynergy.com> wrote:
>>>>>>
>>>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>>>> think a big part of the disagreement stems from the misconception
>>>>>>>> that
>>>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>>>> which is absolutely not the case.
>>>>>>>
>>>>>>> I understand this, of course. I'm arguing that it is harder to
>>>>>>> implement, get straight and then maintain over the years. Also it
>>>>>>> brings limitations that can sometimes be worked around in the virtio
>>>>>>> spec, but this always comes at the cost of decreased readability and
>>>>>>> increased complexity. Overall it clearly looks like a downgrade compared
>>>>>>> to virtio-video for our use-case. And I believe it would be the same for
>>>>>>> every developer that has to actually implement the spec, not just do
>>>>>>> the pass-through. So if we think of V4L2 UAPI pass-through as a
>>>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>>>> both and keep improving virtio-video, including taking the best
>>>>>>> ideas from V4L2 and overall using it as a reference to make writing
>>>>>>> the driver simpler.
>>>>>>
>>>>>> Let me jump in here and ask another question:
>>>>>>
>>>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>>>> device for handling video encoding/decoding to their hypervisor.
>>>>>>
>>>>>> Option 1: There are different devices to chose from. How is the person
>>>>>> implementing this supposed to pick a device? They might have a narrow
>>>>>> use case, where it is clear which of the devices is the one that
>>>>>> needs to
>>>>>> be supported; but they also might have multiple, diverse use cases, and
>>>>>> end up needing to implement all of the devices.
>>>>>
>>>>> I think in this case virtio-v4l2 should be used as a compatibility
>>>>> device exclusively. This means discouraging increasing its complexity
>>>>> even more with more patches in the spec. virtio-video should eventually
>>>>> cover all the use-cases of V4L2, so I think it is reasonable to use it
>>>>> in both complex use-cases and in simple use-cases, where there is no
>>>>> decoder/encoder V4L2 device on the host.
>>>>>
>>>>>> Option 2: There is one device with various optional features. The
>>>>>> person
>>>>>> implementing this can start off with a certain subset of features
>>>>>> depending on their expected use cases, and add to it later, if needed;
>>>>>> but the upfront complexity might be too high for specialized use cases.
>>>>
>>>> I don't see that many negotiable features we can provide for a
>>>> decoder/encoder device - at least not many that are not considered
>>>> basic (like guest buffers). In terms of provided features for codecs
>>>> virtio-video and virtio-v4l2 are essentially equivalent.
>>>
>>> Actually I see a lot of potential in using the virtio feature flag
>>> negotiation for virtio-video:
>>>
>>> 1. We already have some feature flags related to memory management.
>>>
>>> 2. I think it would be great to take V4L2 controls negotiation and turn
>>> it into the feature flags negotiation. I really like this idea and I'd
>>> like to implement it. Not all the controls at once, of course. Still it
>>> would be very easy to port more given the existing process. They
>>> correspond well enough to each other, I think. This way we don't need to
>>> introduce something like the VIDIOC_QUERYCTRL/VIDIOC_QUERY_EXT_CTRL, we
>>> don't need two mechanisms for feature negotiations (like it would be
>>> with virtio-v4l2, right?), also all the features would be in one place.
>>> Then we can directly reference some enums from V4L2, like
>>> v4l2_mpeg_video_h264_profile or v4l2_mpeg_video_h264_level. That's what
>>> I call taking the best from V4L2.
>>>
>>> 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
>>> For example, dequeuing output buffers in decoder order.
>>
>> I'd like to also share my current roadmap for the draft v7 of
>> virtio-video (or maybe the draft v8 in some cases). The significant
>> changes would be:
>>
>> 1. Making querying the capabilities fully compatible with V4L2 but in 1
>> round-trip over virtio, not 10+. This is what I'm actively working on
>> right now.
>
> That sounds good, while not essential since the capability negotiation
> occurs before we start streaming so these extra trips should not be
> perceptible by the user. But the current capability command was not
> suitable and needs to be improved anyway.

Great!
Well, there are a lot of steps in capability negotiations in V4L2:
1. enumeration of coded formats,
2. setting the coded format on OUTPUT/CAPTURE,
3. enumeration of raw formats,
4. enumeration of resolutions.
5. enumeration of intervals (optional),
6. enumeration of controls and their ranges (if the range is a menu, it
needs its own enumeration).

Each enumeration is 1 API call per element + 1 more at the end. So it
may require up to like 200 round trips overall if there are many
controls I guess. I believe only the first step can be done before any
streams are created. Well, maybe one can create a fake stream and do all
the enumerations during startup.
Right now it doesn't look like a big problem. If somebody tries to run
the video processing over a local network, they may notice the startup
delay already. Please correct me if I'm wrong.
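
To make the round-trip pattern concrete, here is a simplified sketch of the
enumeration loops for steps 3 and 4 above on a plain V4L2 device node (the
pattern for step 1 is identical, just on the other queue). These are the
standard V4L2 ioctls, nothing virtio-specific; each loop iteration is one
ioctl, so each one becomes a separate round trip if forwarded individually:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  /* Enumerate the formats on the CAPTURE queue and, for each format, the
   * supported frame sizes. Each ioctl returns one element; a failing call
   * (EINVAL) marks the end of the list. */
  static void enumerate_capture_formats(int fd)
  {
          unsigned int i, j;

          for (i = 0; ; i++) {
                  struct v4l2_fmtdesc fmt;

                  memset(&fmt, 0, sizeof(fmt));
                  fmt.index = i;
                  fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
                  if (ioctl(fd, VIDIOC_ENUM_FMT, &fmt) < 0)
                          break;                  /* no more formats */

                  for (j = 0; ; j++) {
                          struct v4l2_frmsizeenum size;

                          memset(&size, 0, sizeof(size));
                          size.index = j;
                          size.pixel_format = fmt.pixelformat;
                          if (ioctl(fd, VIDIOC_ENUM_FRAMESIZES, &size) < 0)
                                  break;          /* no more sizes */
                          /* VIDIOC_ENUM_FRAMEINTERVALS would nest here (step 5). */
                  }
          }
  }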

>> 2. Making all the commands non-blocking by providing completion events
>> over the event queue.
>
> +1 on that too, as the experience with virtio-v4l2 suggests this
> eliminates some headaches down the line.

Super!
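
(As a purely illustrative aside: a command-completion event on the event
queue could be as small as the struct below. The name and fields are
invented here and do not appear in any virtio-video draft; an actual spec
would also define the fields as little-endian. It only shows the shape such
an event might take.)

  /* Hypothetical sketch only - not part of any virtio-video draft. */
  #include <stdint.h>

  struct virtio_video_cmd_complete {
          uint32_t stream_id;     /* stream the command was issued on */
          uint32_t cmd_type;      /* which command has completed */
          uint32_t cmd_cookie;    /* driver-chosen id to match the request */
          uint32_t result;        /* 0 on success, error code otherwise */
  };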

>> 3. Adding back the controls from v5 and adding the corresponding feature
>> flags as I wrote in the quote above.
>
> Beware of not repeating the same mistake as v5 (all controls under one
> big structure). If you mean extending the parameters mechanisms with
> the things that are missing from v5, then yes that should be done
> anyway.

Thanks, noted.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-12  4:09                                               ` Alexandre Courbot
  2023-05-16 14:53                                                 ` Alexander Gordeev
@ 2023-05-17 11:04                                                 ` Alexander Gordeev
  1 sibling, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-17 11:04 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Cornelia Huck, virtio-dev, Keiichi Watanabe, Alex Bennée,
	Marcin Wojtas, Matti Möll, Andrew Gazizov, Enrico Granata,
	Gustavo Padovan, Peter Griffin, Bartłomiej Grzesik,
	Tomasz Figa, Daniel Almeida, Enric Balletbo i Serra,
	Albert Esteve

On 12.05.23 06:09, Alexandre Courbot wrote:
> On Thu, May 11, 2023 at 5:50 PM Alexander Gordeev
> <alexander.gordeev@opensynergy.com> wrote:
>>
>> 3. We can have things, that V4L2 doesn't support in their stateful UAPI.
>> For example, dequeuing output buffers in decoder order.
>
> ... provided the host can do that (i.e. has a stateless decoder
> interface).

Indeed dequeuing in decoding order is obviously possible with stateless
interfaces. But it is often possible with stateful interfaces too AFAIK,
just not with V4L2. For example, we have this option on some of our
targets because the codecs are exposed using OMX with some vendor
extensions. This is a stateful interface.
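
Purely as an illustration of how this could become a negotiable option, a
minimal sketch follows. The names below are invented for this sketch and do
not exist in any virtio-video draft (and a real spec would use
little-endian fields); they only show the shape of such an option:

  /* Hypothetical sketch only - not part of any virtio-video draft. */
  #include <stdint.h>

  /* Imaginary feature bit: the device can return decoded frames in decode
   * order instead of display order. */
  #define VIRTIO_VIDEO_F_DECODE_ORDER_OUTPUT  (1ULL << 6)

  /* Imaginary per-stream parameter selecting the delivery order. */
  enum virtio_video_output_order {
          VIRTIO_VIDEO_OUTPUT_ORDER_DISPLAY = 0,  /* default */
          VIRTIO_VIDEO_OUTPUT_ORDER_DECODE  = 1,  /* requires the feature bit */
  };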

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-16 14:53                                                 ` Alexander Gordeev
@ 2023-05-17 16:28                                                   ` Cornelia Huck
  2023-05-18  6:29                                                     ` Alexander Gordeev
  2023-05-18 19:35                                                     ` Michael S. Tsirkin
  0 siblings, 2 replies; 97+ messages in thread
From: Cornelia Huck @ 2023-05-17 16:28 UTC (permalink / raw)
  To: Alexander Gordeev, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

>>> I also hope to be able to update virtio-video at a faster pace. Please
>>> let me try.
>>
>> Please hold on a bit - there are two things here.
>>
>> 1) I'd like to settle the virtio-v4l2/virtio-video argument first to
>> make sure we don't get two things clashing head-on. As far as codecs
>> are concerned we certainly don't need both. Cornelia, I think we'll
>> need you to make a call on this, or at least tell us what you need to
>> make the call. If it helps I can send a draft of what the virtio-v4l2
>> spec would look like, it should be relatively short.
>>
>> 2) I (and other ChromeOS contributors) have been driving this spec so
>> far and while I think virtio-v4l2 is a better solution, I have not
>> said I would give up on virtio-video if virtio-v4l2 was not adopted
>> and will keep iterating on it in that case.
>
> It actually very much looks like you gave up. I mean, you developed it
> for years, and now you'd like to throw it away. Well, I meant something
> like "please have some faith in my ability to update virtio-video at a
> faster pace". I think I already have the permission to continue the
> development. You know, OpenSynergy also spent some time on the
> virtio-video. This includes the draft v1 version and the V4L2 driver. I
> think this was kind of an informal agreement between us, that you do the
> spec and we do the driver. It didn't work well enough since draft v4, I
> think. Now as our interests are not aligned anymore, it is fine to
> continue separately. I'm not going to wait until you change your mind on
> virtio-v4l2. Obviously you're very busy with it right now. If I stop and
> wait now, we won't have the video device in the 1.3 release for sure. I
> think when we are ready to develop virtio-video together again, we'll
> need a better agreement.

Please, can we just calm down here? This thread is painful to read
already, and pointing at each other is not going to help us to come up
with something that is generally agreeable. In the end, I want _a_ spec,
and I don't really have a horse in V4L2 vs. video race, but having to
dig through all of this in the hope of moderating things is just
impossible if you're arguing in circles, and much of it seemingly being
a rehashing about who said/did something or not.

> That said, I very much appreciate your efforts with the spec. It
> definitely received a lot of improvements.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-17 16:28                                                   ` Cornelia Huck
@ 2023-05-18  6:29                                                     ` Alexander Gordeev
  2023-05-18 19:35                                                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 97+ messages in thread
From: Alexander Gordeev @ 2023-05-18  6:29 UTC (permalink / raw)
  To: Cornelia Huck, Alexandre Courbot
  Cc: virtio-dev, Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On 17.05.23 18:28, Cornelia Huck wrote:
>>>> I also hope to be able to update virtio-video at a faster pace. Please
>>>> let me try.
>>>
>>> Please hold on a bit - there are two things here.
>>>
>>> 1) I'd like to settle the virtio-v4l2/virtio-video argument first to
>>> make sure we don't get two things clashing head-on. As far as codecs
>>> are concerned we certainly don't need both. Cornelia, I think we'll
>>> need you to make a call on this, or at least tell us what you need to
>>> make the call. If it helps I can send a draft of what the virtio-v4l2
>>> spec would look like, it should be relatively short.
>>>
>>> 2) I (and other ChromeOS contributors) have been driving this spec so
>>> far and while I think virtio-v4l2 is a better solution, I have not
>>> said I would give up on virtio-video if virtio-v4l2 was not adopted
>>> and will keep iterating on it in that case.
>>
>> It actually very much looks like you gave up. I mean, you developed it
>> for years, and now you'd like to throw it away. Well, I meant something
>> like "please have some faith in my ability to update virtio-video at a
>> faster pace". I think I already have the permission to continue the
>> development. You know, OpenSynergy also spent some time on the
>> virtio-video. This includes the draft v1 version and the V4L2 driver. I
>> think this was kind of an informal agreement between us, that you do the
>> spec and we do the driver. It didn't work well enough since draft v4, I
>> think. Now as our interests are not aligned anymore, it is fine to
>> continue separately. I'm not going to wait until you change your mind on
>> virtio-v4l2. Obviously you're very busy with it right now. If I stop and
>> wait now, we won't have the video device in the 1.3 release for sure. I
>> think when we are ready to develop virtio-video together again, we'll
>> need a better agreement.
>
> Please, can we just calm down here? This thread is painful to read
> already, and pointing at each other is not going to help us to come up
> with something that is generally agreeable. In the end, I want _a_ spec,
> and I don't really have a horse in V4L2 vs. video race, but having to
> dig through all of this in the hope of moderating things is just
> impossible if you're arguing in circles, and much of it seemingly being
> a rehashing about who said/did something or not.

I'm very sorry for all the noise. I think it is fine to reiterate the
arguments a couple times to make them more digestible. But then it is
sometimes hard to stop. Sorry for that.

Also I didn't want to show any disrespect to Alexandre and Keiichi. In
fact I have deep respect for what they have achieved: going from v1 with
lots of TBD placeholders to fairly complete v3 and then trying to
simplify things further. It just seems to me that a few final steps are
missing. Also I very much like the idea of having a repository with all
the intermediate changes. This already helps increase transparency of
the spec development. I'd like to sort out licensing questions first and
then continue with the same thing. In the end it might not be relevant
anymore who exactly sends the patch to the mailing list.

I'd like to focus on the draft v7 and I'm also going to send it to this
mailing list (or to virtio-comments first) even though I totally
understand that the decision is not made yet and that we may end up
throwing it all away. If the decision is made on June 30, it would be
better for everyone to have a spec that is ready to be merged.

--
Alexander Gordeev
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone: +49 30 60 98 54 0 - 88
Fax: +49 (30) 60 98 54 0 - 99
EMail: alexander.gordeev@opensynergy.com

www.opensynergy.com

Handelsregister/Commercial Registry: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Régis Adjamah


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-17 16:28                                                   ` Cornelia Huck
  2023-05-18  6:29                                                     ` Alexander Gordeev
@ 2023-05-18 19:35                                                     ` Michael S. Tsirkin
  1 sibling, 0 replies; 97+ messages in thread
From: Michael S. Tsirkin @ 2023-05-18 19:35 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alexander Gordeev, Alexandre Courbot, virtio-dev,
	Keiichi Watanabe, Alex Bennée, Marcin Wojtas,
	Matti Möll, Andrew Gazizov, Enrico Granata, Gustavo Padovan,
	Peter Griffin, Bartłomiej Grzesik, Tomasz Figa,
	Daniel Almeida, Enric Balletbo i Serra, Albert Esteve

On Wed, May 17, 2023 at 06:28:29PM +0200, Cornelia Huck wrote:
> >>> I also hope to be able to update virtio-video at a faster pace. Please
> >>> let me try.
> >>
> >> Please hold on a bit - there are two things here.
> >>
> >> 1) I'd like to settle the virtio-v4l2/virtio-video argument first to
> >> make sure we don't get two things clashing head-on. As far as codecs
> >> are concerned we certainly don't need both. Cornelia, I think we'll
> >> need you to make a call on this, or at least tell us what you need to
> >> make the call. If it helps I can send a draft of what the virtio-v4l2
> >> spec would look like, it should be relatively short.
> >>
> >> 2) I (and other ChromeOS contributors) have been driving this spec so
> >> far and while I think virtio-v4l2 is a better solution, I have not
> >> said I would give up on virtio-video if virtio-v4l2 was not adopted
> >> and will keep iterating on it in that case.
> >
> > It actually very much looks like you gave up. I mean, you developed it
> > for years, and now you'd like to throw it away. Well, I meant something
> > like "please have some faith in my ability to update virtio-video at a
> > faster pace". I think I already have permission to continue the
> > development. You know, OpenSynergy also spent some time on
> > virtio-video. This includes the draft v1 version and the V4L2 driver. I
> > think there was a kind of informal agreement between us that you do the
> > spec and we do the driver. It hasn't worked well enough since draft v4, I
> > think. Now that our interests are no longer aligned, it is fine to
> > continue separately. I'm not going to wait until you change your mind on
> > virtio-v4l2. Obviously you're very busy with it right now. If I stop and
> > wait now, we won't have the video device in the 1.3 release for sure. I
> > think when we are ready to develop virtio-video together again, we'll
> > need a better agreement.
> 
> Please, can we just calm down here? This thread is painful to read
> already, and pointing at each other is not going to help us to come up
> with something that is generally agreeable. In the end, I want _a_ spec,
> and I don't really have a horse in the V4L2 vs. video race, but having to
> dig through all of this in the hope of moderating things is just
> impossible if you're arguing in circles, with much of it seemingly being
> a rehashing of who said or did what.

Well said, Cornelia, thanks for writing this.

I had really stopped reading this thread; hopefully things will be better
from now on. One thing that will help is if future submissions
include a summary of alternatives and past discussion in
the cover letter.

> > That said, I very much appreciate your efforts with the spec. It
> > definitely received a lot of improvements.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [libcamera-devel] [virtio-dev] Re: [RFC PATCH v6] virtio-video: Add virtio video device specification
  2023-05-08  8:00                                                         ` Alexandre Courbot
  (?)
@ 2023-08-09  7:34                                                         ` Hans Verkuil
  -1 siblings, 0 replies; 97+ messages in thread
From: Hans Verkuil @ 2023-08-09  7:34 UTC (permalink / raw)
  To: Alexandre Courbot, Laurent Pinchart
  Cc: Alex Bennée, virtio-dev, Albert Esteve, Matti Möll,
	Andrew Gazizov, Daniel Almeida, Cornelia Huck, Marcin Wojtas,
	Keiichi Watanabe, Gustavo Padovan, Alexander Gordeev,
	libcamera-devel, Bartłomiej Grzesik, Enrico Granata,
	Enric Balletbo i Serra, linux-media

Hi all,

On 08/05/2023 10:00, Alexandre Courbot wrote:
> Just to add some context for linux-media@, as I think it may be
> missing from the quoted thread:
> 
> The virtio-video specification has been dragging for quite some time,
> and the more it progresses the more it starts looking like V4L2 with
> different names. So I suggested that we just encapsulate V4L2 syscalls
> into virtio descriptors and basically use the V4L2 model for video
> device virtualization. The benefits would be a much shorter
> virtio-video specification, and support for other kinds of V4L2
> devices like cameras.
> 
> I tried to write a quick prototype to test the idea and it works well
> enough to expose a USB webcam or the vicodec decoder/encoder using the
> same guest driver:
> 
> https://github.com/Gnurou/linux/blob/virtio-v4l2/drivers/media/virtio-v4l2/virtio_v4l2_driver.c
> 
> (the driver is not supposed to be upstreamed as-is; it was quickly put
> together to check whether the idea could fly).
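
To make the idea concrete, here is a minimal sketch of what such an
encapsulation could look like. The structure names and layout below are
purely illustrative assumptions; they are neither the wire format used by
the prototype above nor anything proposed for the spec. The point is only
that each V4L2 ioctl and its argument can be carried as-is in a virtio
descriptor chain and answered with an errno-style result:

    /* Illustrative sketch only, not an actual wire format. The guest driver
     * places the ioctl code and its argument in a device-readable buffer and
     * receives the updated argument plus a result code in a device-writable
     * one. */
    #include <linux/types.h>

    struct virtio_v4l2_cmd_hdr {
            __le32 ioctl_code;  /* e.g. VIDIOC_QBUF, as defined by the V4L2 UAPI */
            __le32 payload_len; /* size of the ioctl argument that follows */
    };

    struct virtio_v4l2_resp_hdr {
            __le32 result;      /* 0 on success, a positive errno value otherwise */
            __le32 payload_len; /* size of the argument written back by the device */
    };

Fields are kept little-endian, following the usual convention for modern
virtio devices; how buffer memory itself is shared is deliberately left out
of this sketch.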
> 
> It would be interesting to hear what the V4L2 maintainers and active
> contributors think of this idea. IMHO it provides much more bang for
> the buck than having to write new specifications for codec and camera
> virtualization, but there are arguments that V4L2 would be too complex
> for virtualizing video codecs, and is overall not specified as
> precisely as virtio-video would be.

One of the main problems with cameras and codecs is that technology keeps
evolving and APIs need to adapt continuously. It's a painful process and
replicating that process in another API seems to me something you would
want to avoid if at all possible.

So Alexander's proposal is IMHO something you should seriously consider.

And w.r.t. codecs, whatever API you come up with will be very similar to the
V4L2 API: there aren't that many ways you can do decoding, for example, and
a lot of that is defined by the underlying standards.
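
For readers who have not worked with it, the flow being referred to here is
the V4L2 stateful (memory-to-memory) decoder model: compressed bitstream
buffers are queued on the OUTPUT queue and decoded frames are dequeued from
the CAPTURE queue. A rough sketch against the existing V4L2 UAPI, with device
setup, buffer allocation and error handling omitted and single-plane
bitstream buffers assumed:

    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Queue one compressed bitstream buffer on the OUTPUT (input) queue. */
    static int queue_bitstream(int fd, unsigned int index, unsigned int used)
    {
            struct v4l2_buffer buf;
            struct v4l2_plane plane;

            memset(&buf, 0, sizeof(buf));
            memset(&plane, 0, sizeof(plane));
            buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
            buf.memory = V4L2_MEMORY_MMAP;
            buf.index = index;
            buf.m.planes = &plane;
            buf.length = 1;
            plane.bytesused = used;
            return ioctl(fd, VIDIOC_QBUF, &buf);
    }

    /* Dequeue one decoded frame from the CAPTURE (output) queue. */
    static int dequeue_frame(int fd)
    {
            struct v4l2_buffer buf;
            struct v4l2_plane planes[VIDEO_MAX_PLANES];

            memset(&buf, 0, sizeof(buf));
            memset(planes, 0, sizeof(planes));
            buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
            buf.memory = V4L2_MEMORY_MMAP;
            buf.m.planes = planes;
            buf.length = VIDEO_MAX_PLANES;
            return ioctl(fd, VIDIOC_DQBUF, &buf);
    }

Any codec transport, V4L2-based or not, ends up expressing essentially these
two operations plus stream start/stop and format negotiation, which is why
the competing proposals look so much alike.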

Regards,

	Hans

> 
> On Sat, May 6, 2023 at 5:16 PM Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
>>
>> I'm also CC'ing the linux-media@vger.kernel.org mailing list for these
>> discussions, I'm sure there are folks there who are interested in codec
>> and camera virtualization.
>>
>> On Sat, May 06, 2023 at 11:12:29AM +0300, Laurent Pinchart via libcamera-devel wrote:
>>> On Fri, May 05, 2023 at 04:55:33PM +0100, Alex Bennée via libcamera-devel wrote:
>>>> Kieran Bingham writes:
>>>>> Quoting Alexander Gordeev (2023-05-05 10:57:29)
>>>>>> On 03.05.23 17:53, Cornelia Huck wrote:
>>>>>>> On Wed, May 03 2023, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>>>>>> Cornelia Huck <cohuck@redhat.com> writes:
>>>>>>>>> On Fri, Apr 28 2023, Alexander Gordeev <alexander.gordeev@opensynergy.com> wrote:
>>>>>>>>>> On 27.04.23 15:16, Alexandre Courbot wrote:
>>>>>>>>>>> But in any case, that's irrelevant to the guest-host interface, and I
>>>>>>>>>>> think a big part of the disagreement stems from the misconception that
>>>>>>>>>>> V4L2 absolutely needs to be used on either the guest or the host,
>>>>>>>>>>> which is absolutely not the case.
>>>>>>>>>>
>>>>>>>>>> I understand this, of course. I'm arguing that it is harder to
>>>>>>>>>> implement it, get it right, and then maintain it over the years. Also it
>>>>>>>>>> brings limitations that can sometimes be worked around in the virtio
>>>>>>>>>> spec, but this always comes at the cost of decreased readability and
>>>>>>>>>> increased complexity. Overall it clearly looks like a downgrade compared
>>>>>>>>>> to virtio-video for our use case. And I believe it would be the same for
>>>>>>>>>> every developer that has to actually implement the spec, not just do
>>>>>>>>>> the pass-through. So if we think of V4L2 UAPI pass-through as a
>>>>>>>>>> compatibility device (which I believe it is), then it is fine to have
>>>>>>>>>> both and keep improving virtio-video, including taking the best
>>>>>>>>>> ideas from V4L2 and overall using it as a reference to make writing
>>>>>>>>>> the driver simpler.
>>>>>>>>>
>>>>>>>>> Let me jump in here and ask another question:
>>>>>>>>>
>>>>>>>>> Imagine that, some years in the future, somebody wants to add a virtio
>>>>>>>>> device for handling video encoding/decoding to their hypervisor.
>>>>>>>>>
>>>>>>>>> Option 1: There are different devices to chose from. How is the person
>>>>>>>>> implementing this supposed to pick a device? They might have a narrow
>>>>>>>>> use case, where it is clear which of the devices is the one that needs to
>>>>>>>>> be supported; but they also might have multiple, diverse use cases, and
>>>>>>>>> end up needing to implement all of the devices.
>>>>>>>>>
>>>>>>>>> Option 2: There is one device with various optional features. The person
>>>>>>>>> implementing this can start off with a certain subset of features
>>>>>>>>> depending on their expected use cases, and add to it later, if needed;
>>>>>>>>> but the upfront complexity might be too high for specialized use cases.
>>>>>>>>>
>>>>>>>>> Leaving concrete references to V4L2 out of the picture, we're currently
>>>>>>>>> trying to decide whether our future will be more like Option 1 or Option
>>>>>>>>> 2, with their respective trade-offs.
>>>>>>>>>
>>>>>>>>> I'm slightly biased towards Option 2; does it look feasible at all, or
>>>>>>>>> am I missing something essential here? (I had the impression that some
>>>>>>>>> previous confusion had been cleared up; apologies in advance if I'm
>>>>>>>>> misrepresenting things.)
>>>>>>>>>
>>>>>>>>> I'd really love to see some kind of consensus for 1.3, if at all
>>>>>>>>> possible :)
>>>>>>>>
>>>>>>>> I think feature discovery and extensibility are a key part of the VirtIO
>>>>>>>> paradigm, which is why I find the virtio-v4l approach limiting. By
>>>>>>>> pegging the device to a Linux API we effectively limit the growth of the
>>>>>>>> device specification to the pace at which the Linux API changes. I'm not
>>>>>>>> fully immersed in V4L2, but I don't think it is seeing many additional
>>>>>>>> features developed for it, and its limitations for cameras are one of the
>>>>>>>> reasons functionality is being pushed to userspace in solutions like libcamera:
>>>>>>>>
>>>>>>>>    How is libcamera different from V4L2?
>>>>>>>>
>>>>>>>>    We see libcamera as a continuation of V4L2. One that can more easily
>>>>>>>>    handle the recent advances in hardware design. As embedded cameras have
>>>>>>>>    developed, all of the complexity has been pushed on to the developers.
>>>>>>>>    With libcamera, all of that complexity is simplified and a single model
>>>>>>>>    is presented to application developers.
>>>>>>>
>>>>>>> Ok, that is interesting; thanks for the information.
>>>>>>>
>>>>>>>>
>>>>>>>> That said, it's not totally outside our experience to have virtio devices
>>>>>>>> act as simple pipes for some higher-level protocol. The virtio-gpu spec says
>>>>>>>> very little about the details of how 3D devices work and simply offers
>>>>>>>> an opaque pipe to push a (potentially proprietary) command stream to the
>>>>>>>> back end. As far as I'm aware, the proposals for Vulkan and Wayland
>>>>>>>> device support don't even offer a feature bit but simply change the
>>>>>>>> graphics stream type in the command packets.
>>>>>>>>
>>>>>>>> We could just offer a VIRTIO_VIDEO_F_V4L feature bit, document it as
>>>>>>>> incompatible with other feature bits, and make that the baseline
>>>>>>>> implementation, but it's not really in the spirit of what VirtIO is
>>>>>>>> trying to achieve.
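
As a purely hypothetical illustration (the bit number and the helper below
are invented for this sketch; only the VIRTIO_VIDEO_F_V4L name comes from
the paragraph above, and only the virtio feature-negotiation mechanism is
real), such a personality bit would be negotiated like any other virtio
feature in a Linux guest driver:

    #include <linux/virtio.h>
    #include <linux/virtio_config.h>

    /* Invented bit number, for illustration only. */
    #define VIRTIO_VIDEO_F_V4L 0

    static bool virtio_video_use_v4l2_passthrough(struct virtio_device *vdev)
    {
            /* If the device offers the V4L2 pass-through personality, use it;
             * otherwise fall back to the native virtio-video command set. */
            return virtio_has_feature(vdev, VIRTIO_VIDEO_F_V4L);
    }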
>>>>>>>
>>>>>>> I'd not be in favour of an incompatible feature flag,
>>>>>>> either... extensions are good, but conflicting features are something
>>>>>>> that I'd like to avoid.
>>>>>>>
>>>>>>> So, given that I'd still prefer to have a single device: How well does
>>>>>>> the proposed virtio-video device map to a Linux driver implementation
>>>>>>> that hooks into V4L2?
>>>>>>
>>>>>> IMO it hooks into V4L2 pretty well. And I'm going to spend the next few
>>>>>> months making the existing driver fully V4L2 compliant. If this goal
>>>>>> requires changing the spec, then we still have time to do that. I don't
>>>>>> expect a lot of problems on this side. There might be problems with
>>>>>> Android using V4L2 in weird ways. Well, let's see. Anyway, I think all
>>>>>> of this can be accomplished over time.
>>>>>>
>>>>>>> If the general process flow is compatible and it
>>>>>>> is mostly a question of wiring the parts together, I think pushing that
>>>>>>> part of the complexity into the Linux driver is a reasonable
>>>>>>> trade-off. Being able to use an existing protocol is nice, but if that
>>>>>>> protocol is not perceived as flexible enough, it is probably not worth
>>>>>>> encoding it into a spec. (Similar considerations apply to hooking up the
>>>>>>> device in the hypervisor.)
>>>>>>
>>>>>> I very much agree with these statements. I think this is how it should
>>>>>> be: we start with a compact but usable device, then add features and
>>>>>> use cases of V4L2, unless we decide to have separate devices for them
>>>>>> (virtio-camera, etc.). This would be better in the long term, I think.
>>>>>> (virtio-camera, etc). This would be better in the long term I think.
>>>>>
>>>>> Cameras definitely have their quirks - mostly because many use cases are
>>>>> hard to convey over a single video device node (with the hardware) but I
>>>>> think we might expect that complexity to be managed by the host, which
>>>>> would probably offer a ready-made stream to the guest. Of course, how to
>>>>> handle multiple streams and configuration of the whole pipeline may get more
>>>>> difficult and warrant a specific 'virtio-camera' ... but I would think
>>>>> the basics could be covered generically to start with.
>>>>>
>>>>> It's not clear who's driving this implementation and spec, so I guess
>>>>> there's more reading to do.
>>>>>
>>>>> Anyway, I've added Cc libcamera-devel to raise awareness of this topic
>>>>> to camera list.
>>>>>
>>>>> I bet Laurent has some stronger opinions on how he'd see cameras exist
>>>>> in a virtio space.
>>>
>>> You seem to think I have strong opinions about everything. This may not
>>> be a completely unfounded assumption ;-)
>>>
>>> Overall I agree with you: I think cameras are too complex for a
>>> low-level virtualization protocol. I'd rather see a high-level protocol
>>> that exposes webcam-like devices, with the low-level complexity handled
>>> on the host side (using libcamera of course ;-)). This would support use
>>> cases that require sharing hardware blocks between multiple logical
>>> cameras, including sharing the same camera streams between multiple
>>> guests.
>>>
>>> If a guest needs low-level access to the camera, including the ability
>>> to control the raw camera sensor or ISP, then I'd recommend passing the
>>> corresponding hardware blocks to the guest for exclusive access.
>>>
>>>> Personally I would rather see a separate virtio-camera specification
>>>> that properly encapsulates all the various use cases we have for
>>>> cameras. In many ways just processing a stream of video is a much
>>>> simpler use case.
>>>>
>>>> During Linaro's Project Stratos we got a lot of feedback from members
>>>> who professed interest in a virtio-camera initiative. However we were
>>>> unable to get enough engineering resources from the various companies to
>>>> collaborate in developing a specification that would meet everyone's
>>>> needs. The problem space is wide, ranging from numerous black-and-white
>>>> sensor cameras on cars to the full-on computational photography
>>>> exposed by modern camera systems on phones. If you want to read more
>>>> on the topic, I wrote a blog post at the time:
>>>>
>>>>   https://www.linaro.org/blog/the-challenges-of-abstracting-virtio/
>>>>
>>>> Back to the topic of virtio-video: as I understand it, the principal
>>>> features/configurations are:
>>>>
>>>>   - All the various codecs, resolutions and pixel formats
>>>>   - Stateful vs. stateless streams
>>>>   - Whether we want to support grabbing single frames from a source
>>>>
>>>> My main concern about the V4L approach is that it pegs updates to the
>>>> interface to the continuing evolution of the V4L interface in Linux. Now
>>>> maybe video is a solved problem and there won't be (m)any new features
>>>> we need to add after the initial revision. However I'm not a domain
>>>> expert here so I just don't know.
>>>
>>> I briefly discussed "virtio-v4l2" with Alex Courbot a few weeks ago
>>> when we got a chance to meet face to face. I think the V4L2 kernel API
>>> is quite a good fit in the sense that its level of abstraction works well
>>> when applied to video codecs and "simple" cameras (defined, more or less,
>>> as something resembling a USB webcam feature-wise). It doesn't mean that
>>> the virtio-video or virtio-camera specifications should necessarily
>>> reference V4L2 or use the exact same vocabulary; they could simply copy
>>> the concepts and stay loosely coupled with V4L2, in the sense that both
>>> specifications should try to evolve in compatible directions.
>>
>> --
>> Regards,
>>
>> Laurent Pinchart


^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2023-08-09  7:34 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-08  7:23 [virtio-dev] [RFC PATCH v6] virtio-video: Add virtio video device specification Alexandre Courbot
2022-12-08 15:00 ` Cornelia Huck
2022-12-27  5:38   ` Alexandre Courbot
2023-01-11  8:45     ` Cornelia Huck
2023-01-12  6:32       ` Alexandre Courbot
2023-01-12 15:23         ` Cornelia Huck
2022-12-19 16:59 ` [virtio-dev] " Alexander Gordeev
2022-12-20  9:51   ` Cornelia Huck
2022-12-20 10:35     ` Alexander Gordeev
2022-12-20 17:39       ` Cornelia Huck
2022-12-21 14:56         ` Alexander Gordeev
2022-12-27  7:31   ` Alexandre Courbot
2023-01-11 18:42     ` Alexander Gordeev
2023-01-11 20:13       ` Alex Bennée
2023-01-12  6:40         ` Alexandre Courbot
2023-01-12  6:39       ` Alexandre Courbot
2023-01-18 23:06         ` Alexander Gordeev
2023-02-06 14:12           ` Cornelia Huck
2023-02-07  6:16             ` Alexandre Courbot
2023-02-07 13:59               ` Cornelia Huck
2023-03-10 10:50                 ` Cornelia Huck
2023-03-10 13:19                   ` Alexandre Courbot
2023-03-10 14:20                     ` Cornelia Huck
2023-03-14  5:06                       ` Alexandre Courbot
2023-03-16 10:12                         ` Alexander Gordeev
2023-03-17  7:24                           ` Alexandre Courbot
2023-04-17 12:51                             ` Alexander Gordeev
2023-04-17 14:43                               ` Cornelia Huck
2023-04-19  7:39                                 ` Alexander Gordeev
2023-04-19 21:34                                   ` Enrico Granata
2023-04-21 14:48                                     ` Alexander Gordeev
2023-04-21  4:02                                   ` Alexandre Courbot
2023-04-21 16:01                                     ` Alexander Gordeev
2023-04-24  7:52                                       ` Alexander Gordeev
2023-04-25 16:04                                         ` Cornelia Huck
2023-04-26  6:29                                           ` Alexandre Courbot
2023-04-27 14:10                                           ` Alexander Gordeev
2023-04-28  4:02                                             ` Alexandre Courbot
2023-04-28  8:54                                               ` Alexander Gordeev
2023-05-02  1:07                                                 ` Alexandre Courbot
2023-05-02 11:12                                                   ` Alexander Gordeev
2023-04-26  5:52                                         ` Alexandre Courbot
2023-04-27 14:20                                           ` Alexander Gordeev
2023-04-28  3:22                                             ` Alexandre Courbot
2023-04-28  8:22                                               ` Alexander Gordeev
2023-04-26 15:52                                     ` Alexander Gordeev
2023-04-27 13:23                                       ` Alexandre Courbot
2023-04-27 15:12                                         ` Alexander Gordeev
2023-04-28  3:24                                           ` Alexandre Courbot
2023-04-28  8:31                                             ` Alexander Gordeev
     [not found]                                     ` <CALgKJBqKWng508cB_F_uD2fy9EAvQ36rYR3fRb57sFd3ihpUFw@mail.gmail.com>
2023-04-26 16:00                                       ` Alexander Gordeev
2023-04-27 10:13                                         ` Bartłomiej Grzesik
2023-04-27 14:34                                           ` Alexander Gordeev
2023-04-28  3:22                                             ` Alexandre Courbot
2023-04-28  7:57                                               ` Alexander Gordeev
2023-04-21  4:02                               ` Alexandre Courbot
2023-04-26 15:11                                 ` Alexander Gordeev
2023-04-27 13:16                                   ` Alexandre Courbot
2023-04-28  7:47                                     ` Alexander Gordeev
2023-05-03 14:04                                       ` Cornelia Huck
2023-05-03 15:11                                         ` Alex Bennée
2023-05-03 15:53                                           ` Cornelia Huck
2023-05-05  9:57                                             ` Alexander Gordeev
     [not found]                                               ` <168329085253.1880445.14002473591422425775@Monstersaurus>
2023-05-05 15:55                                                 ` Alex Bennée
     [not found]                                                   ` <20230506081229.GA8114@pendragon.ideasonboard.com>
2023-05-06  8:16                                                     ` [libcamera-devel] " Laurent Pinchart
2023-05-08  8:00                                                       ` [virtio-dev] " Alexandre Courbot
2023-05-08  8:00                                                         ` Alexandre Courbot
2023-08-09  7:34                                                         ` Hans Verkuil
2023-05-16 13:50                                                       ` [virtio-dev] " Alexander Gordeev
2023-05-16 13:50                                                         ` Alexander Gordeev
2023-05-17  3:58                                                     ` [virtio-dev] " Tomasz Figa
2023-05-16 12:57                                                   ` Alexander Gordeev
2023-05-05 12:28                                           ` Alexander Gordeev
2023-05-05 11:54                                         ` Alexander Gordeev
2023-05-08  4:55                                           ` Alexandre Courbot
2023-05-11  8:50                                             ` Alexander Gordeev
2023-05-11  9:00                                               ` Alexander Gordeev
2023-05-12  4:15                                                 ` Alexandre Courbot
2023-05-17  7:35                                                   ` Alexander Gordeev
2023-05-12  4:09                                               ` Alexandre Courbot
2023-05-16 14:53                                                 ` Alexander Gordeev
2023-05-17 16:28                                                   ` Cornelia Huck
2023-05-18  6:29                                                     ` Alexander Gordeev
2023-05-18 19:35                                                     ` Michael S. Tsirkin
2023-05-17 11:04                                                 ` Alexander Gordeev
2023-03-27 13:00                         ` Albert Esteve
2023-04-15  5:58                           ` Alexandre Courbot
2023-04-17 12:56                             ` Cornelia Huck
2023-04-17 13:13                               ` Alexander Gordeev
2023-04-17 13:22                                 ` Cornelia Huck
2023-02-07 11:11             ` Alexander Gordeev
2023-02-07  6:51           ` Alexandre Courbot
2023-02-07 10:57             ` Alexander Gordeev
2023-01-11 17:04 ` Alexander Gordeev
2023-01-12  6:32   ` Alexandre Courbot
2023-01-12 22:24     ` Alexander Gordeev
2023-01-11 18:45 ` Alexander Gordeev
