* [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
@ 2019-09-19  9:34 ` Keiichi Watanabe
  0 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-09-19  9:34 UTC (permalink / raw)
  To: virtio-dev
  Cc: keiichiw, acourbot, alexlau, dgreid, marcheu, posciak, stevensd,
	tfiga, hverkuil, linux-media

[Resending because of some issues with sending to virtio-dev. Sorry for the noise.]

This patch proposes a virtio specification for a new virtio video
decode device.
This device provides hardware-accelerated video decoding of encoded
video content provided by the guest into frame buffers accessible to
the guest.

We have a prototype implementation for VMs on Chrome OS:
* virtio-vdec device in crosvm [1]
* virtio-vdec driver in Linux kernel v4.19 [2]
  - This driver follows V4L2 stateful video decoder API [3].

Our prototype implementation uses [4], which allows the virtio-vdec
device to use buffers allocated by the virtio-gpu device.

Any feedback would be greatly appreciated. Thank you.

[1] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
[2] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
[3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be merged to Linux 5.4)
[4] https://lkml.org/lkml/2019/9/12/157

Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
---
 content.tex     |   1 +
 virtio-vdec.tex | 750 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 751 insertions(+)
 create mode 100644 virtio-vdec.tex

diff --git a/content.tex b/content.tex
index 37a2190..b57d4a9 100644
--- a/content.tex
+++ b/content.tex
@@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
 \input{virtio-input.tex}
 \input{virtio-crypto.tex}
 \input{virtio-vsock.tex}
+\input{virtio-vdec.tex}

 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}

diff --git a/virtio-vdec.tex b/virtio-vdec.tex
new file mode 100644
index 0000000..d117129
--- /dev/null
+++ b/virtio-vdec.tex
@@ -0,0 +1,750 @@
+\section{Video Decode Device}
+\label{sec:Device Types / Video Decode Device}
+
+virtio-vdec is a virtio-based video decoder. This device provides
+hardware-accelerated video decoding of encoded video content provided
+by the guest into frame buffers accessible to the guest.
+
+\subsection{Device ID}
+\label{sec:Device Types / Video Decode Device / Device ID}
+
+28
+
+\subsection{Virtqueues}
+\label{sec:Device Types / Video Decode Device / Virtqueues}
+
+\begin{description}
+\item[0] outq - queue for sending requests from the driver to the
+  device
+\item[1] inq - queue for sending requests from the device to the
+  driver
+\end{description}
+
+Each queue is used unidirectionally: outq carries requests from the
+driver to the device (i.e., guest requests), and inq carries requests
+in the other direction (i.e., host requests).
+
+\subsection{Feature bits}
+\label{sec:Device Types / Video Decode Device / Feature bits}
+
+There are currently no feature bits defined for this device.
+
+\subsection{Device configuration layout}
+\label{sec:Device Types / Video Decode Device / Device configuration layout}
+
+None.
+
+\subsection{Device Requirements: Device Initialization}
+\label{sec:Device Types / Video Decode Device / Device Requirements: Device Initialization}
+
+The virtqueues are initialized.
+
+\subsection{Device Operation}
+\label{sec:Device Types / Video Decode Device / Device Operation}
+
+\subsubsection{Video Buffers}
+\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
+
+A virtio-vdec driver and device use two types of video buffers:
+\emph{bitstream buffers} and \emph{frame buffers}. A bitstream buffer
+contains encoded video stream data; it is similar to an OUTPUT buffer
+in the Video for Linux Two (V4L2) API. A frame buffer contains
+decoded video frame data, like a CAPTURE buffer in the V4L2 API. The
+driver and the device share these buffers, and each buffer is
+identified by a unique integer called a \emph{resource handle}.
+
+\subsubsection{Guest Request}
+
+The driver queues requests to the outq virtqueue. The device MAY
+process requests out-of-order. All requests on outq use the following
+structure:
+
+\begin{lstlisting}
+enum virtio_vdec_guest_req_type {
+        VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
+
+        /* Global */
+        VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
+
+        /* Per instance */
+        VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
+        VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
+        VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
+        VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
+        VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
+        VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
+        VIRTIO_VDEC_GUEST_REQ_DRAIN,
+        VIRTIO_VDEC_GUEST_REQ_FLUSH,
+        VIRTIO_VDEC_GUEST_REQ_CLOSE,
+};
+
+struct virtio_vdec_guest_req {
+        le32 type;
+        le32 instance_id;
+        union {
+                struct virtio_vdec_guest_req_open open;
+                struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
+                struct virtio_vdec_guest_req_register_buffer register_buffer;
+                struct virtio_vdec_guest_req_bitstream_buffer bitstream_buffer;
+                struct virtio_vdec_guest_req_frame_buffer frame_buffer;
+        };
+};
+\end{lstlisting}
+
+This structure includes the following generic fields to identify the
+request:
+\begin{description}
+\item[\field{type}] specifies the type of the guest request using
+  values from \field{enum virtio_vdec_guest_req_type}.
+\item[\field{instance_id}] specifies an instance ID. This field is
+  ignored unless \field{type} represents a per-instance request.
+\end{description}
+
+The union fields are additional per-request data used only when
+\field{type} has a specific value:
+
+\begin{description}
+\item[\field{open}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_OPEN. See \ref{sec:Device Types / Video Decode
+    Device / Device Operation / Instance Open}.
+\item[\field{set_buffer_count}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Set Buffer Count}.
+\item[\field{register_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Buffer Registration}.
+\item[\field{bitstream_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Bitstream Buffer
+    Requests}.
+\item[\field{frame_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Frame Buffer Requests}.
+\end{description}
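+
+For illustration, a driver might populate the common header of a
+guest request as follows. This is a non-normative sketch; the
+cpu_to_le32() conversion helper is an assumption borrowed from the
+Linux kernel.
+
+\begin{lstlisting}
+/* Non-normative example: a global QUERY request. The instance_id
+ * field is ignored for global requests and is left as zero here. */
+struct virtio_vdec_guest_req req = {};
+
+req.type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_QUERY);
+/* The request is then placed in a driver-owned buffer and made
+ * available to the device on outq; the device answers on inq with a
+ * VIRTIO_VDEC_HOST_REQ_QUERY. */
+\end{lstlisting}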
+
+\subsubsection{Host Request}
+
+The device queues requests to the inq virtqueue. All requests on inq
+use the following structure:
+
+\begin{lstlisting}
+enum virtio_vdec_host_req_type {
+        VIRTIO_VDEC_HOST_REQ_UNDEFINED = 0,
+
+        /* Global */
+        VIRTIO_VDEC_HOST_REQ_QUERY = 0x0100,
+
+        /* Per instance */
+        VIRTIO_VDEC_HOST_REQ_OPEN = 0x0200,
+        VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT,
+        VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER,
+        VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER,
+        VIRTIO_VDEC_HOST_REQ_STREAM_INFO,
+        VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER,
+        VIRTIO_VDEC_HOST_REQ_DRAINED,
+        VIRTIO_VDEC_HOST_REQ_FLUSHED,
+        VIRTIO_VDEC_HOST_REQ_CLOSE,
+        VIRTIO_VDEC_HOST_REQ_EOS,
+
+        /* Error response */
+        VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR = 0x1100,
+};
+
+enum virtio_vdec_host_req_result {
+        /* Success */
+        VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS = 0,
+
+        /* Error */
+        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_UNSPEC = 0x1000,
+        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_INVALID_REQUEST,
+};
+
+struct virtio_vdec_host_req {
+        le32 type;     /* VIRTIO_VDEC_HOST_REQ_* */
+        le32 result;   /* VIRTIO_VDEC_HOST_REQ_RESULT_ */
+        le32 instance_id;
+        le32 reserved; /* for 64-bit alignment */
+        union {
+                struct virtio_vdec_host_req_query query;
+                struct virtio_vdec_host_req_set_buffer_count set_buffer_count;
+                struct virtio_vdec_host_req_register_buffer register_buffer;
+                struct virtio_vdec_host_req_bitstream_buffer bitstream_buffer;
+                struct virtio_vdec_host_req_stream_info stream_info;
+                struct virtio_vdec_host_req_frame_buffer frame_buffer;
+        };
+};
+\end{lstlisting}
+
+This structure includes the following generic fields:
+\begin{description}
+\item[\field{type}] specifies the type of the host request using
+  values from \field{enum virtio_vdec_host_req_type}.
+\item[\field{result}] specifies the result of the operation using
+  values from \field{enum virtio_vdec_host_req_result}.
+\item[\field{instance_id}] specifies an instance ID. This field is
+  ignored unless \field{type} represents a per-instance request.
+\end{description}
+
+The union fields are additional per-request data used only when
+\field{type} has a specific value:
+
+\begin{description}
+\item[\field{query}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_QUERY. See \ref{sec:Device Types / Video Decode
+    Device / Device Operation / Query Capabilities}.
+\item[\field{set_buffer_count}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Set Buffer Count}.
+\item[\field{register_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Buffer Registration}.
+\item[\field{bitstream_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Bitstream Buffer
+    Requests}.
+\item[\field{stream_info}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types / Video
+    Decode Device / Device Operation / Stream Information}.
+\item[\field{frame_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER. See \ref{sec:Device Types / Video
+    Decode Device / Device Operation / Frame Buffer Requests}.
+\end{description}
+
+\subsubsection{Query Capabilities}
+\label{sec:Device Types / Video Decode Device / Device Operation / Query Capabilities}
+
+The driver can send VIRTIO_VDEC_GUEST_REQ_QUERY to query supported
+bitstream formats and frame formats. This is similar to
+VIDIOC_ENUM_FMT ioctl and VIDIOC_ENUM_FRAMESIZES ioctl for V4L2
+devices.
+
+The device sends supported formats as VIRTIO_VDEC_HOST_REQ_QUERY with
+the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_range {
+        le32 min;
+        le32 max;
+        le32 step;
+};
+
+struct virtio_vdec_format_desc {
+        le32 fourcc;
+        le32 mask;
+        struct virtio_vdec_range width;
+        struct virtio_vdec_range height;
+};
+
+#define VIRTIO_VDEC_NUM_FORMATS 32
+
+struct virtio_vdec_host_req_query {
+        struct virtio_vdec_format_desc bitstream_formats[VIRTIO_VDEC_NUM_FORMATS];
+        struct virtio_vdec_format_desc frame_formats[VIRTIO_VDEC_NUM_FORMATS];
+};
+\end{lstlisting}
+
+The \field{virtio_vdec_host_req_query} describes the set of supported
+bitstream and frame formats, with the corresponding ranges of image
+resolutions and a compatibility mask that determines which coded
+formats can be decoded into which raw formats.
+
+\begin{description}
+\item[fourcc] specifies the FOURCC of a supported video codec or
+  pixel format.
+\item[mask] represents the supported combinations of video codec and
+  pixel format. If the i-th bit is set in \field{mask} of
+  \field{bitstream_formats[j]}, the device MUST support decoding from
+  the video codec of \field{bitstream_formats[j]} to the pixel format
+  of \field{frame_formats[i]}.
+\item[width/height] represent either a supported stream resolution or
+  a supported frame buffer resolution.
+\end{description}
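+
+For example, a driver could test whether a given coded format can be
+decoded into a given raw format with a check along these lines (a
+non-normative sketch; le32_to_cpu() is an assumption borrowed from
+the Linux kernel):
+
+\begin{lstlisting}
+/* Non-normative example: does bitstream_formats[j] decode to
+ * frame_formats[i]? */
+static bool vdec_can_decode(const struct virtio_vdec_host_req_query *q,
+                            unsigned int j, unsigned int i)
+{
+        uint32_t mask = le32_to_cpu(q->bitstream_formats[j].mask);
+
+        return i < VIRTIO_VDEC_NUM_FORMATS && (mask & (1u << i));
+}
+\end{lstlisting}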
+
+\devicenormative{\paragraph}{Query Capabilities}{Device Types / Video
+  Decode Device / Device Operation / Query Capabilities}
+
+In VIRTIO_VDEC_HOST_REQ_QUERY, the device MUST set \field{mask} to 0
+for every \field{struct virtio_vdec_format_desc} entry that is
+invalid (unused).
+
+\subsubsection{Instance Open}
+\label{sec:Device Types / Video Decode Device / Device Operation / Instance Open}
+
+To acquire a decoder instance for a given coded format, the driver
+sends a VIRTIO_VDEC_GUEST_REQ_OPEN request. The driver specifies a
+video codec with the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_guest_req_open {
+        le32 bitstream_fourcc;
+};
+\end{lstlisting}
+
+Upon receiving this guest request, the device allocates the instance
+and sends a VIRTIO_VDEC_HOST_REQ_OPEN request.
+
+Once the driver receives VIRTIO_VDEC_HOST_REQ_OPEN, the driver can
+send per-instance guest requests to the opened instance by specifying
+\field{instance_id}. The device MUST process per-instance requests
+with the same value of \field{instance_id} in order.
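+
+For illustration, opening an instance for H.264 bitstreams might look
+as follows (a non-normative sketch; V4L2_PIX_FMT_H264 is assumed here
+to be a valid bitstream FOURCC):
+
+\begin{lstlisting}
+/* Non-normative example: open instance 1 for H.264 decoding. */
+struct virtio_vdec_guest_req req = {};
+
+req.type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_OPEN);
+req.instance_id = cpu_to_le32(1); /* a previously unused ID */
+req.open.bitstream_fourcc = cpu_to_le32(V4L2_PIX_FMT_H264);
+/* The device replies with VIRTIO_VDEC_HOST_REQ_OPEN carrying the
+ * same instance_id and a result code. */
+\end{lstlisting}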
+
+\drivernormative{\paragraph}{Instance Open}{Device Types / Video
+  Decode Device / Device Operation / Instance Open}
+
+In VIRTIO_VDEC_GUEST_REQ_OPEN,
+\begin{itemize*}
+\item The driver MUST set \field{bitstream_fourcc} to a FOURCC of a
+  video codec that is supported by the device.
+\item The driver MUST set \field{instance_id} to an integer that has
+  not been used as an \field{instance_id} before.
+\end{itemize*}
+
+\devicenormative{\paragraph}{Instance Open}{Device Types / Video
+  Decode Device / Device Operation / Instance Open}
+
+If the device fails to allocate an instance, the device MUST set
+\field{result} to a value other than
+VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS.
+
+\subsubsection{Set Buffer Count}
+\label{sec:Device Types / Video Decode Device / Device Operation / Set Buffer Count}
+
+Once an instance is acquired, the driver and the device negotiate the
+number of buffers to use during the decoding process by
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT and
+VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. These requests are similar to
+the VIDIOC_REQBUFS ioctl for V4L2 devices. First, the driver sends
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to tell the device how many
+buffers it needs, using the following structure:
+
+\begin{lstlisting}
+enum virtio_vdec_buffer_type {
+        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM = 0,
+        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME = 1,
+};
+
+struct virtio_vdec_guest_req_set_buffer_count {
+        le32 type;
+        le32 buffer_count;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[type] specifies a buffer type with a value of
+  \field{enum virtio_vdec_buffer_type}.
+\item[buffer_count] is the minimum number of buffers that the driver
+  needs to use.
+\end{description}
+
+The device responds with a VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT to
+notify the driver of the number of buffers that it can accept, using
+the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_set_buffer_count {
+        le32 type;
+        le32 buffer_count;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[type] specifies a buffer type with a value of
+  \field{enum virtio_vdec_buffer_type}.
+\item[buffer_count] is the maximum number of buffers that the device
+  can accept.
+\end{description}
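+
+In effect, the driver asks for at least \field{buffer_count} buffers
+and the device answers with the count it can actually serve. A
+non-normative sketch of the device-side reply follows (the limit of
+16 is hypothetical):
+
+\begin{lstlisting}
+/* Non-normative example: device-side reply to SET_BUFFER_COUNT. */
+uint32_t requested = le32_to_cpu(guest_req->set_buffer_count.buffer_count);
+uint32_t limit = 16; /* hypothetical device limit */
+
+host_req->type = cpu_to_le32(VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT);
+host_req->instance_id = guest_req->instance_id;
+host_req->set_buffer_count.type = guest_req->set_buffer_count.type;
+host_req->set_buffer_count.buffer_count =
+        cpu_to_le32(requested < limit ? requested : limit);
+\end{lstlisting}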
+
+\drivernormative{\paragraph}{Set Buffer Count}{Device Types / Video
+  Decode Device / Device Operation / Set Buffer Count}
+
+The driver MUST set \field{buffer_count} to a non-zero value in
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. The value of
+\field{buffer_count} for frame buffers MUST match the requirement
+given via VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types
+  / Video Decode Device / Device Operation / Stream Information}.
+
+\devicenormative{\paragraph}{Set Buffer Count}{Device Types / Video
+  Decode Device / Device Operation / Set Buffer Count}
+
+The device MUST set \field{buffer_count} to the number of buffers
+that it can accept. The value MAY differ from the value of
+\field{buffer_count} in the corresponding
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT request.
+
+\subsubsection{Buffer Registration}
+\label{sec:Device Types / Video Decode Device / Device Operation / Buffer Registration}
+
+Once the number of buffers is set, the driver registers buffer
+metadata via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER with the following
+structure:
+
+\begin{lstlisting}
+#define VIRTIO_VDEC_MAX_PLANES 8
+
+struct virtio_vdec_guest_req_register_buffer {
+        le32 type;
+        le32 num_planes;
+        struct virtio_vdec_plane {
+                le32 handle;
+                le32 offset;
+                le32 length;
+        } planes[VIRTIO_VDEC_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[VIRTIO_VDEC_MAX_PLANES] is the maximum number of planes that a
+  video format needs. It is similar to VIDEO_MAX_PLANES in
+  \href{https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/videodev2.h}{include/uapi/linux/videodev2.h}
+  in the Linux source tree.
+\item[type] specifies a buffer type with a value of
+  \field{enum virtio_vdec_buffer_type}.
+\item[num_planes] is the number of planes (i.e. separate memory
+  buffers) for this format. This indicates the number of valid entries
+  in \field{planes}.
+\item[planes] is an array of structures describing each plane.
+  \begin{description}
+  \item[handle] is a resource handle for the buffer.
+  \item[offset] is the offset in bytes from the start of the plane to
+    the start of data.
+  \item[length] is the size of this plane in bytes.
+  \end{description}
+\end{description}
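+
+For example, registering a two-plane NV12 frame buffer of 1280x720
+could look like this (a non-normative sketch; the resource handles
+and sizes are hypothetical):
+
+\begin{lstlisting}
+/* Non-normative example: register a 2-plane NV12 frame buffer. */
+struct virtio_vdec_guest_req req = {};
+
+req.type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER);
+req.instance_id = cpu_to_le32(instance_id);
+req.register_buffer.type =
+        cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME);
+req.register_buffer.num_planes = cpu_to_le32(2);
+/* Y plane */
+req.register_buffer.planes[0].handle = cpu_to_le32(y_handle);
+req.register_buffer.planes[0].offset = cpu_to_le32(0);
+req.register_buffer.planes[0].length = cpu_to_le32(1280 * 720);
+/* interleaved UV plane, half the size of the Y plane */
+req.register_buffer.planes[1].handle = cpu_to_le32(uv_handle);
+req.register_buffer.planes[1].offset = cpu_to_le32(0);
+req.register_buffer.planes[1].length = cpu_to_le32(1280 * 720 / 2);
+\end{lstlisting}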
+
+Once the device accepts the buffer registration, it responds with a
+VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER using the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_register_buffer {
+        le32 type;
+        le32 handles[VIRTIO_VDEC_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[type] specifies a buffer type with a value of
+  \field{enum virtio_vdec_buffer_type}.
+\item[handles] is an array of resource handles from the corresponding
+  guest request.
+\end{description}
+
+\drivernormative{\paragraph}{Buffer Registration}{Device Types / Video
+  Decode Device / Device Operation / Buffer Registration}
+
+The VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER requires the following:
+
+\begin{itemize*}
+\item The driver MUST set \field{type}.
+\item The driver MUST set \field{num_planes} to a non-zero value that
+  is less than or equal to VIRTIO_VDEC_MAX_PLANES.
+\item The driver MUST fill out the fields of \field{planes[i]} for
+  each \(0 \le i < \field{num_planes} \).
+\item The driver MUST NOT send VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER
+  more than the number of times that the device requested via
+  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT.
+\end{itemize*}
+
+\devicenormative{\paragraph}{Buffer Registration}{Device Types / Video
+  Decode Device / Device Operation / Buffer Registration}
+
+The VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER requires the following:
+
+\begin{itemize*}
+\item The device MUST set \field{type} to the value that the
+  corresponding VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER used.
+\item The device MUST set \field{handles[i]} to the value of
+  \field{register_buffer.planes[i].handle} in the corresponding
+  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
+\end{itemize*}
+
+\subsubsection{Bitstream Buffer Requests}
+\label{sec:Device Types / Video Decode Device / Device Operation / Bitstream Buffer Requests}
+
+The driver sends VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER to ask the
+device to decode the contents of a registered bitstream buffer. This
+request is similar to V4L2's VIDIOC_QBUF ioctl for OUTPUT buffers.
+The following structure is used to specify the buffer:
+
+\begin{lstlisting}
+struct virtio_vdec_guest_req_bitstream_buffer {
+        le32 handle;
+        le32 offset;
+        le32 length;
+        le64 cookie;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[handle] is a resource handle for the bitstream buffer to be
+  decoded.
+\item[offset] is the offset in bytes to video data in the buffer.
+\item[length] is the size of valid data in the buffer.
+\item[cookie] is an identifier for the frame decoded from this
+  bitstream data. This cookie will be copied to the matching frame
+  buffer. See also \field{virtio_vdec_host_req_frame_buffer} in
+  \ref{sec:Device Types / Video Decode Device / Device Operation /
+    Frame Buffer Requests}.
+\end{description}
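+
+For illustration, a driver might queue one unit of encoded data and
+tag it with a monotonically increasing cookie so that the decoded
+frame can later be matched back to this input (a non-normative
+sketch):
+
+\begin{lstlisting}
+/* Non-normative example: queue encoded data for decoding. */
+static uint64_t next_cookie;
+
+req.type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER);
+req.instance_id = cpu_to_le32(instance_id);
+req.bitstream_buffer.handle = cpu_to_le32(bitstream_handle);
+req.bitstream_buffer.offset = cpu_to_le32(0);
+req.bitstream_buffer.length = cpu_to_le32(bytes_used);
+req.bitstream_buffer.cookie = cpu_to_le64(next_cookie++);
+\end{lstlisting}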
+
+Once the device consumes the bitstream passed by the
+VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request, it sends
+VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER with the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_bitstream_buffer {
+        le32 handle;
+};
+\end{lstlisting}
+\begin{description}
+\item[handle] is a resource handle of the bitstream buffer that the
+  device completed processing.
+\end{description}
+
+The driver can reuse a bitstream buffer once
+VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER has been sent for it.
+
+\drivernormative{\paragraph}{Bitstream Buffer Requests}{Device Types /
+  Video Decode Device / Device Operation / Bitstream Buffer Requests}
+
+\begin{itemize*}
+\item The driver MUST use a \field{handle} that was already
+  registered via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
+\end{itemize*}
+
+\subsubsection{Stream Information}
+\label{sec:Device Types / Video Decode Device / Device Operation / Stream Information}
+
+To obtain the information required to allocate frame buffers
+(resolution, etc.), the driver registers bitstream buffers first and
+proceeds with queuing VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER requests
+until stream information is received via a
+VIRTIO_VDEC_HOST_REQ_STREAM_INFO host request. The
+VIRTIO_VDEC_HOST_REQ_STREAM_INFO is also sent when a run-time
+resolution change is detected. This request is similar to the
+V4L2_EVENT_SOURCE_CHANGE event for V4L2 devices.
+
+The following structure is used for the
+VIRTIO_VDEC_HOST_REQ_STREAM_INFO:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_stream_info {
+        le32 frame_fourcc;
+        le32 fb_width;
+        le32 fb_height;
+        le32 min_frame_buffer_count;
+        le32 max_frame_buffer_count;
+        struct virtio_vdec_host_req_stream_crop {
+                le32 left;
+                le32 top;
+                le32 width;
+                le32 height;
+        } crop;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[frame_fourcc] indicates the pixel format that the video is
+  decoded to.
+\item[fb_width] is the width of a required frame buffer.
+\item[fb_height] is the height of a required frame buffer.
+\item[min_frame_buffer_count] is the minimum number of frame buffers
+  that the device requires.
+\item[max_frame_buffer_count] is the maximum number of frame buffers
+  that the device can accept.
+\item[crop] indicates video cropping information.
+    \begin{description}
+    \item[left] is the horizontal offset of the left-most valid pixel
+      of the video in pixels.
+    \item[top] is the vertical offset of the top-most valid line of
+      the video in lines.
+    \item[width] is the width of the rectangle in pixels.
+    \item[height] is the height of the rectangle in pixels.
+    \end{description}
+\end{description}
+
+The driver responds with VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO once
+it has processed the incoming stream information.
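+
+For illustration, a driver might react to new stream information
+roughly as follows (a non-normative sketch; allocate_frame_buffers()
+is a hypothetical helper):
+
+\begin{lstlisting}
+/* Non-normative example: handle STREAM_INFO, then acknowledge it. */
+uint32_t w = le32_to_cpu(info->fb_width);
+uint32_t h = le32_to_cpu(info->fb_height);
+uint32_t n = le32_to_cpu(info->min_frame_buffer_count);
+
+allocate_frame_buffers(n, w, h, le32_to_cpu(info->frame_fourcc));
+
+ack.type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO);
+ack.instance_id = cpu_to_le32(instance_id);
+/* The new frame buffers are then registered and queued again via
+ * REGISTER_BUFFER and FRAME_BUFFER requests. */
+\end{lstlisting}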
+
+\devicenormative{\paragraph}{Stream Information}{Device Types / Video
+  Decode Device / Device Operation / Stream Information}
+
+\begin{itemize*}
+\item After the device sends VIRTIO_VDEC_HOST_REQ_STREAM_INFO, the
+  device MUST empty its list of registered frame buffers and skip any
+  pending VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests until a
+  VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO request is received.
+\end{itemize*}
+
+\drivernormative{\paragraph}{Stream Information}{Device Types / Video
+  Decode Device / Device Operation / Stream Information}
+
+\begin{itemize*}
+\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO
+  once it processes a VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
+\item If VIRTIO_VDEC_HOST_REQ_STREAM_INFO arrives while the driver is
+  waiting for a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request, the driver
+  MUST enqueue the corresponding VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
+  again after sending VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for a
+  frame buffer matching the new stream information.
+\end{itemize*}
+
+\subsubsection{Frame Buffer Requests}
+\label{sec:Device Types / Video Decode Device / Device Operation / Frame Buffer Requests}
+
+The driver provides frame buffers via
+VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER, which the device fills with
+decoded frames. This request is similar to V4L2's VIDIOC_QBUF ioctl
+for CAPTURE buffers. The following structure is used for the request:
+
+\begin{lstlisting}
+struct virtio_vdec_guest_req_frame_buffer {
+        struct virtio_vdec_frame_buffer_plane {
+                le32 handle;
+                le32 offset;
+                le32 stride;
+                le32 length;
+        } planes[VIRTIO_VDEC_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[planes] is an array of planes that a frame buffer consists of.
+    \begin{description}
+    \item[handle] is a resource handle to indicate the buffer for a
+      plane.
+    \item[offset] is the offset in bytes to the location where the
+      decoded frame is written.
+    \item[stride] is the line stride.
+    \item[length] is the length in bytes of the plane that the device
+      can fill with decoded data.
+    \end{description}
+\end{description}
+
+When the device has filled a buffer with decoded frame data, it
+notifies the driver with VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER using the
+following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_frame_buffer {
+        le32 handles[VIRTIO_VDEC_MAX_PLANES];
+        le64 cookie;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[handles] is an array of handles passed via the
+  \field{virtio_vdec_guest_req_frame_buffer} in the corresponding
+  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER.
+\item[cookie] is the \field{cookie} passed via a
+  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. This indicates the
+  bitstream buffer from which the frame was decoded.
+\end{description}
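+
+The \field{cookie} lets the driver associate each returned frame with
+the bitstream buffer it was decoded from, e.g. to recover a
+presentation timestamp (a non-normative sketch; lookup_timestamp()
+and deliver_frame() are hypothetical helpers):
+
+\begin{lstlisting}
+/* Non-normative example: match a decoded frame to its input. */
+uint64_t cookie = le64_to_cpu(host_req->frame_buffer.cookie);
+int64_t pts = lookup_timestamp(cookie);
+
+deliver_frame(host_req->frame_buffer.handles, pts);
+\end{lstlisting}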
+
+\drivernormative{\paragraph}{Frame Buffer Requests}{Device Types /
+  Video Decode Device / Device Operation / Frame Buffer Requests}
+
+\begin{itemize*}
+\item Before the driver sends the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
+  for a frame buffer, the driver MUST have sent the
+  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for the buffer.
+\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER for
+  a frame buffer that satisfies the restriction passed by
+  VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
+\end{itemize*}
+
+\devicenormative{\paragraph}{Frame Buffer Requests}{Device Types /
+  Video Decode Device / Device Operation / Frame Buffer Requests}
+
+When the device has successfully processed a frame buffer, it MUST
+set \field{cookie} to the value passed via the
+VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request that provided the
+corresponding bitstream.
+
+\subsubsection{Drain}
+\label{sec:Device Types / Video Decode Device / Device Operation / Drain}
+
+To ensure that any pending requests are processed and decoded frames
+returned in appropriate replies, the driver MAY issue a
+VIRTIO_VDEC_GUEST_REQ_DRAIN request. The device will continue
+processing the requests as usual, but once all the requests of the
+given context preceding the DRAIN request are processed, the device
+will issue a VIRTIO_VDEC_HOST_REQ_DRAINED host request to notify the
+driver of this fact.
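+
+In terms of request flow, a drain acts as a fence on the instance's
+pipeline (a non-normative sketch):
+
+\begin{lstlisting}
+/* Non-normative example: fence all previously queued work. */
+req.type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_DRAIN);
+req.instance_id = cpu_to_le32(instance_id);
+/* Queue on outq, then wait until the device returns
+ * VIRTIO_VDEC_HOST_REQ_DRAINED for this instance; by then all frames
+ * resulting from earlier bitstream buffers have been returned. */
+\end{lstlisting}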
+
+\devicenormative{\paragraph}{Drain}{Device Types / Video Decode Device
+  / Device Operation / Drain}
+
+\begin{itemize*}
+\item When VIRTIO_VDEC_GUEST_REQ_DRAIN arrives, the device MUST process
+  all pending guest requests. After processing them, the device MUST
+  send VIRTIO_VDEC_HOST_REQ_DRAINED to the driver.
+\item While processing a drain request, the device MUST ignore any
+  guest requests other than Flush and Close requests. See
+  \ref{sec:Device Types / Video Decode Device / Device Operation / Flush}
+  and
+  \ref{sec:Device Types / Video Decode Device / Device Operation / Instance
+    Close}.
+\end{itemize*}
+
+\subsubsection{Flush}
+\label{sec:Device Types / Video Decode Device / Device Operation / Flush}
+
+To drop any pending decode requests and decoded frames pending on
+output buffers, the driver MAY issue a VIRTIO_VDEC_GUEST_REQ_FLUSH
+request. Any frames that would result from the previously enqueued
+bitstream data will be dropped and the device will not write to any of
+the frame buffers provided earlier.
+
+\devicenormative{\paragraph}{Flush}{Device Types / Video Decode Device
+  / Device Operation / Flush}
+
+\begin{itemize*}
+\item The device MUST send VIRTIO_VDEC_HOST_REQ_FLUSHED when finishing
+  the flush process.
+\item If VIRTIO_VDEC_GUEST_REQ_FLUSH arrives while guest requests are
+  being processed, the device MUST stop processing them.
+\item The device MUST NOT process any guest request that comes after
+  VIRTIO_VDEC_GUEST_REQ_FLUSH until it sends
+  VIRTIO_VDEC_HOST_REQ_FLUSHED.
+\item The device MUST dequeue all unread requests in outq when
+  VIRTIO_VDEC_GUEST_REQ_FLUSH comes.
+\end{itemize*}
+
+\subsubsection{EOS Notification}
+\label{sec:Device Types / Video Decode Device / Device Operation / EOS Notification}
+
+The device sends VIRTIO_VDEC_HOST_REQ_EOS when the end of a stream
+(EOS) is reached while decoding.
+
+This request is similar to the V4L2_EVENT_EOS for V4L2 devices.
+
+\subsubsection{Instance Close}
+\label{sec:Device Types / Video Decode Device / Device Operation / Instance Close}
+
+If a decoding instance is no longer needed, the driver SHOULD issue a
+VIRTIO_VDEC_GUEST_REQ_CLOSE request. This will trigger an implicit
+flush operation (as in a VIRTIO_VDEC_GUEST_REQ_FLUSH request; without
+a following host request) and free the instance ID and any associated
+resources on the device side.
+
+The instance is successfully terminated once the device responds to
+the request. After that, it is guaranteed that any buffers registered
+by the driver to the given instance will no longer be accessed by the
+device (unless also provided to another instance).
+
+\subsubsection{Error Reporting}
+\label{sec:Device Types / Video Decode Device / Device Operation / Error Reporting}
+
+The device has two ways of reporting errors to the driver: one is for
+errors associated with a host request, the other is for global errors.
+
+If an error is associated with a host request, the device sets
+\field{type} in \field{virtio_vdec_host_req} to the type of that
+request and \field{result} to a value of
+\field{enum virtio_vdec_host_req_result} describing the error.
+
+If the device cannot proceed with decoding due to an error that is
+not associated with any request, the device MUST send
+VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR. In such a case, the driver
+MUST stop sending decoding requests.
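+
+For illustration, a driver's inq handler might dispatch on these
+fields along the following lines (a non-normative sketch;
+stop_all_decoding() and report_request_error() are hypothetical
+helpers):
+
+\begin{lstlisting}
+/* Non-normative example: minimal error handling on inq. */
+uint32_t type = le32_to_cpu(host_req->type);
+uint32_t result = le32_to_cpu(host_req->result);
+
+if (type == VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR)
+        stop_all_decoding();            /* cease sending decode requests */
+else if (result != VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS)
+        report_request_error(host_req); /* per-request error path */
+\end{lstlisting}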
--
2.23.0.237.gc6a4ce50a0-goog

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
@ 2019-09-19  9:34 ` Keiichi Watanabe
  0 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-09-19  9:34 UTC (permalink / raw)
  To: virtio-dev
  Cc: keiichiw, acourbot, alexlau, dgreid, marcheu, posciak, stevensd,
	tfiga, hverkuil, linux-media

[Resending because of some issues with sending to virtio-dev. Sorry for the noise.]

This patch proposes virtio specification for new virtio video decode
device.
This device provides the functionality of hardware accelerated video
decoding from encoded video contents provided by the guest into frame
buffers accessible by the guest.

We have prototype implementation for VMs on Chrome OS:
* virtio-vdec device in crosvm [1]
* virtio-vdec driver in Linux kernel v4.19 [2]
  - This driver follows V4L2 stateful video decoder API [3].

Our prototype implementation uses [4], which allows the virtio-vdec
device to use buffers allocated by virtio-gpu device.

Any feedback would be greatly appreciated. Thank you.

[1] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
[2] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
[3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be merged to Linux 5.4)
[4] https://lkml.org/lkml/2019/9/12/157

Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
---
 content.tex     |   1 +
 virtio-vdec.tex | 750 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 751 insertions(+)
 create mode 100644 virtio-vdec.tex

diff --git a/content.tex b/content.tex
index 37a2190..b57d4a9 100644
--- a/content.tex
+++ b/content.tex
@@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
 \input{virtio-input.tex}
 \input{virtio-crypto.tex}
 \input{virtio-vsock.tex}
+\input{virtio-vdec.tex}

 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}

diff --git a/virtio-vdec.tex b/virtio-vdec.tex
new file mode 100644
index 0000000..d117129
--- /dev/null
+++ b/virtio-vdec.tex
@@ -0,0 +1,750 @@
+\section{Video Decode Device}
+\label{sec:Device Types / Video Decode Device}
+
+virtio-vdec is a virtio based video decoder. This device provides the
+functionality of hardware accelerated video decoding from encoded
+video contents provided by the guest into frame buffers accessible by
+the guest.
+
+\subsection{Device ID}
+\label{sec:Device Types / Video Decode Device / Device ID}
+
+28
+
+\subsection{Virtqueues}
+\label{sec:Device Types / Video Decode Device / Virtqueues}
+
+\begin{description}
+\item[0] outq - queue for sending requests from the driver to the
+  device
+\item[1] inq - queue for sending requests from the device to the
+  driver
+\end{description}
+
+Each queue is used uni-directionally. outq is used to send requests
+from the driver to the device (i.e., guest requests) and inq is used
+to send requests in the other direction (i.e., host requests).
+
+\subsection{Feature bits}
+\label{sec:Device Types / Video Decode Device / Feature bits}
+
+There are currently no feature bits defined for this device.
+
+\subsection{Device configuration layout}
+\label{sec:Device Types / Video Decode Device / Device configuration layout}
+
+None.
+
+\subsection{Device Requirements: Device Initialization}
+\label{sec:Device Types / Video Decode Device / Device Requirements: Device Initialization}
+
+The virtqueues are initialized.
+
+\subsection{Device Operation}
+\label{sec:Device Types / Video Decode Device / Device Operation}
+
+\subsubsection{Video Buffers}
+\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
+
+A virtio-vdec driver and a device use two types of video buffers:
+\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
+contains encoded video stream data. This buffer is similar to an
+OUTPUT buffer for Video for Linux Two (V4L2) API. A frame buffer
+contains decoded video frame data like a CAPTURE buffers for V4L2 API.
+The driver and the device share these buffers, and each buffer is
+identified by a unique integer called a \emph{resource handle}.
+
+\subsubsection{Guest Request}
+
+The driver queues requests to the outq virtqueue. The device MAY
+process requests out-of-order. All requests on outq use the following
+structure:
+
+\begin{lstlisting}
+enum virtio_vdec_guest_req_type {
+        VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
+
+        /* Global */
+        VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
+
+        /* Per instance */
+        VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
+        VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
+        VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
+        VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
+        VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
+        VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
+        VIRTIO_VDEC_GUEST_REQ_DRAIN,
+        VIRTIO_VDEC_GUEST_REQ_FLUSH,
+        VIRTIO_VDEC_GUEST_REQ_CLOSE,
+};
+
+struct virtio_vdec_guest_req {
+        le32 type;
+        le32 instance_id;
+        union {
+                struct virtio_vdec_guest_req_open open;
+                struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
+                struct virtio_vdec_guest_req_register_buffer register_buffer;
+                struct virtio_vdec_guest_req_bitstream_buffer bitstream_buffer;
+                struct virtio_vdec_guest_req_frame_buffer frame_buffer;
+        };
+};
+\end{lstlisting}
+
+This structure includes the following generic fields to identify the
+request:
+\begin{description}
+\item[\field{type}] specifies the type of the guest request using
+  values from \field{enum virtio_vdec_guest_req_type}.
+\item[\field{instance_id}] specifies an instance ID. This field is
+  ignored unless \field{type} represents a per-instance request.
+\end{description}
+
+The union fields are additional per-request data used only when
+\field{type} has a specific value:
+
+\begin{description}
+\item[\field{open}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_OPEN. See \ref{sec:Device Types / Video Decode
+    Device / Device Operation / Instance Open}.
+\item[\field{set_buffer_count}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Set Buffer Count}.
+\item[\field{register_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Buffer Registration}.
+\item[\field{bitstream_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Bitstream Buffer
+    Requests}.
+\item[\field{frame_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Frame Buffer Requests}.
+\end{description}
+
+\subsubsection{Host Request}
+
+The device queues requests to the inq virtqueue. All requests on inq
+use the following structure:
+
+\begin{lstlisting}
+enum virtio_vdec_host_req_type {
+        VIRTIO_VDEC_HOST_REQ_UNDEFINED = 0,
+
+        /* Global */
+        VIRTIO_VDEC_HOST_REQ_QUERY = 0x0100,
+
+        /* Per instance */
+        VIRTIO_VDEC_HOST_REQ_OPEN = 0x0200,
+        VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT,
+        VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER,
+        VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER,
+        VIRTIO_VDEC_HOST_REQ_STREAM_INFO,
+        VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER,
+        VIRTIO_VDEC_HOST_REQ_DRAINED,
+        VIRTIO_VDEC_HOST_REQ_FLUSHED,
+        VIRTIO_VDEC_HOST_REQ_CLOSE,
+        VIRTIO_VDEC_HOST_REQ_EOS,
+
+        /* Error response */
+        VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR = 0x1100,
+};
+
+enum virtio_vdec_host_req_result {
+        /* Success */
+        VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS = 0,
+
+        /* Error */
+        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_UNSPEC = 0x1000,
+        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_INVALID_REQUEST,
+};
+
+struct virtio_vdec_host_req {
+        le32 type;     /* VIRTIO_VDEC_HOST_REQ_* */
+        le32 result;   /* VIRTIO_VDEC_HOST_REQ_RESULT_ */
+        le32 instance_id;
+        le32 reserved; /* for 64-bit alignment */
+        union {
+                struct virtio_vdec_host_req_query query;
+                struct virtio_vdec_host_req_set_buffer_count set_buffer_count;
+                struct virtio_vdec_host_req_register_buffer register_buffer;
+                struct virtio_vdec_host_req_bitstream_buffer bitstream_buffer;
+                struct virtio_vdec_host_req_stream_info stream_info;
+                struct virtio_vdec_host_req_frame_buffer frame_buffer;
+        };
+};
+\end{lstlisting}
+
+This structure includes the following generic fields:
+\begin{description}
+\item[\field{type}] specifies the type of the host request using
+  values from \field{enum virtio_vdec_host_req_type}.
+\item[\field{result}] specifies the result of the operation using the
+  \field{enum virtio_vdec_host_req_result}.
+\item[\field{instance_id}] specifies an instance ID. This field is
+  ignored unless \field{type} represents a per-instance request.
+\end{description}
+
+The union fields are additional per-request data used only when
+\field{type} has a specific value:
+
+\begin{description}
+\item[\field{query}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_QUERY. See \ref{sec:Device Types / Video Decode
+    Device / Device Operation / Query Capabilities}.
+\item[\field{set_buffer_count}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Set Buffer Count}.
+\item[\field{register_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Buffer Registration}.
+\item[\field{bitstream_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
+    Video Decode Device / Device Operation / Bitstream Buffer
+    Requests}.
+\item[\field{stream_info}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types / Video
+    Decode Device / Device Operation / Stream Information}.
+\item[\field{frame_buffer}] is used when \field{type} is set to
+  VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER. See \ref{sec:Device Types / Video
+    Decode Device / Device Operation / Frame Buffer Requests}.
+\end{description}
+
+\subsubsection{Query Capabilities}
+\label{sec:Device Types / Video Decode Device / Device Operation / Query Capabilities}
+
+The driver can send VIRTIO_VDEC_GUEST_REQ_QUERY to query supported
+bitstream formats and frame formats. This is similar to
+VIDIOC_ENUM_FMT ioctl and VIDIOC_ENUM_FRAMESIZES ioctl for V4L2
+devices.
+
+The device sends supported formats as VIRTIO_VDEC_HOST_REQ_QUERY with
+the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_range {
+        le32 min;
+        le32 max;
+        le32 step;
+};
+
+struct virtio_vdec_format_desc {
+        le32 fourcc;
+        le32 mask;
+        struct virtio_vdec_range width;
+        struct virtio_vdec_range height;
+};
+
+#define VIRTIO_VDEC_NUM_FORMATS 32
+
+struct virtio_vdec_host_req_query {
+        struct virtio_vdec_format_desc bitstream_formats[VIRTIO_VDEC_NUM_FORMATS];
+        struct virtio_vdec_format_desc frame_formats[VIRTIO_VDEC_NUM_FORMATS];
+};
+\end{lstlisting}
+
+The \field{virtio_vdec_host_req_query} describes the set of supported
+bitstream and frame formats, with corresponding ranges of image
+resolution and compatibility mask to determine the compatibility
+between coded and raw formats.
+
+\begin{description}
+\item[fourcc] specifies a FOURCC of a supported video codec or a pixel
+  format.
+\item[mask] is used to represent supported combination of video codec
+  and pixel format. If i-th bit is set in \field{mask} of the
+  \field{bitstream_formats[j]}, the device MUST support decoding from
+  the video codec of \field{bitstream_formats[j]} to the pixel format
+  of \field{frame_formats[i]}.
+\item[width/height] represents either a supported stream resolution or
+  a supported frame buffer resolution.
+\end{description}
+
+\devicenormative{\paragraph}{Query Capabilities}{Device Types / Video
+  Decode Device / Device Operation / Query Capabilities}
+
+The device MUST set \field{mask} to 0 if the \field{struct
+  virtio_vdec_format_desc} is invalid in VIRTIO_VDEC_HOST_REQ_QUERY.
+
+\subsubsection{Instance Open}
+\label{sec:Device Types / Video Decode Device / Device Operation / Instance Open}
+
+To acquire a decoder instance for a given coded format, the driver
+sends a VIRTIO_VDEC_GUEST_REQ_OPEN request. The driver specifies a
+video codec with the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_guest_req_open {
+        le32 bitstream_fourcc;
+};
+\end{lstlisting}
+
+When the guest request comes, the device allocates the instance and
+sends a VIRTIO_VDEC_HOST_REQ_OPEN request.
+
+Once the driver recieves VIRTIO_VDEC_HOST_REQ_OPEN, the driver can
+send per-instance guest requests to the opened instance by specifying
+\field{instance_id}. The device MUST process per-instance requests
+with a same value of \field{instance_id} in order.
+
+\drivernormative{\paragraph}{Instance Open}{Device Types / Video
+  Decode Device / Device Operation / Instance Open}
+
+In VIRTIO_VDEC_GUEST_REQ_OPEN,
+\begin{itemize*}
+\item The driver MUST set \field{bitstream_fourcc} to a FOURCC of a
+  video codec that is supported by the device.
+\item The driver MUST set \field{instance_id} to an integer that is
+  not used as an \field{instance_id} before.
+\end{itemize*}
+
+\devicenormative{\paragraph}{Instance Open}{Device Types / Video
+  Decode Device / Device Operation / Instance Open}
+
+If the device fails to allocate an instance, the device MUST set
+\field{result} to a value other than
+VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS.
+
+\subsubsection{Set Buffer Count}
+\label{sec:Device Types / Video Decode Device / Device Operation / Set Buffer Count}
+
+Once an instance is acquired, the driver and the device negotiate the
+number of buffers to use during the decoding process by
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT and
+VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. These requests are similar to
+VIDIOC_REQBUFS controls for V4L2 devices. First, the driver sends
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to tell how many buffers the
+driver needs by using the following structure:
+
+\begin{lstlisting}
+enum virtio_vdec_buffer_type {
+        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM = 0,
+        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME = 1,
+};
+
+struct virtio_vdec_guest_req_set_buffer_count {
+        le32 type;
+        le32 buffer_count;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[type] specifies a buffer type with a value of the
+  \field{virtio_vdec_buffer_type}.
+\item[buffer_count] is the minimum number of buffers that the driver
+  needs to use.
+\end{description}
+
+The device responds the VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to
+notify the number of buffers that it can accept with the following
+structure.
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_set_buffer_count {
+        le32 type;
+        le32 buffer_count;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[type] specifies a buffer type with a value of the
+  \field{virtio_vdec_buffer_type}.
+\item[buffer_count] is the maximum number of buffers that the device
+  can accept.
+\end{description}
+
+\drivernormative{\paragraph}{Set Buffer Count}{Device Types / Video
+  Decode Device / Device Operation / Set Buffer Count}
+
+The driver MUST set \field{buffer_count} to a non-zero value in
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. The value of
+\field{buffer_count} for frame buffers MUST match the requirement
+given via VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types
+  / Video Decode Device / Device Operation / Stream Information}.
+
+\devicenormative{\paragraph}{Set Buffer Count}{Device Types / Video
+  Decode Device / Device Operation / Set Buffer Count}
+
+The device MUST set \field{buffer_count} to a number of buffers that
+the device can accept. The value MAY be different from the value the
+driver set \field{buffer_count} to in the corresponding
+VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT request.
+
+\subsubsection{Buffer Registration}
+\label{sec:Device Types / Video Decode Device / Device Operation / Buffer Registration}
+
+Once the number of buffers is set, the driver registers metadata of
+buffers via the VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER with the
+following structure:
+
+\begin{lstlisting}
+#define VIRTIO_VDEC_MAX_PLANES 8
+
+struct virtio_vdec_guest_req_register_buffer {
+        le32 type;
+        le32 num_planes;
+        struct virtio_vdec_plane {
+                le32 handle;
+                le32 offset;
+                le32 length;
+        } planes[VIRTIO_VDEC_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[VIRTIO_VDEC_MAX_PLANES] is the maximum number of planes that a
+  video format needs. It is similar to VIDEO_MAX_PLANES in
+  \href{https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/videodev2.h}{include/uapi/linux/videodev2.h}
+  in the Linux source tree.
+\item[type] specifies a buffer type with a value of the
+  \field{virtio_vdec_buffer_type}.
+\item[num_planes] is the number of planes (i.e. separate memory
+  buffers) for this format. This indicates the number of valid entries
+  in \field{planes}.
+\item[planes] is an array of structures describing information of each
+  plane.
+  \begin{description}
+  \item[handle] is a resource handle for the buffer.
+  \item[offset] equals to the offset in the plane to the start of
+    data.
+  \item[length] is the size of this plane in bytes.
+  \end{description}
+\end{description}
+
+Once the device accepts the buffer registration, the device responds
+the VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER with the following structure.
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_register_buffer {
+        le32 type;
+        le32 handles[VIRTIO_VDEC_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[type] specifies a buffer type with a value of the
+  \field{virtio_vdec_buffer_type}.
+\item[handles] is an array of resource handles from the corresponding
+  guest request.
+\end{description}
+
+\drivernormative{\paragraph}{Buffer Registration}{Device Types / Video
+  Decode Device / Device Operation / Buffer Registration}
+
+The VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER requires followings:
+
+\begin{itemize*}
+\item The driver MUST set \field{type}.
+\item The driver MUST set \field{num_planes} to a non-zero value that
+  is less than of equal to VIRTIO_VDEC_MAX_PLANES.
+\item The driver MUST fill out the fields of \field{planes[i]} for
+  each \(0 \le i < \field{num_planes} \).
+\item The driver MUST NOT send VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER
+  more than the number of times that the device requested via
+  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT.
+\end{itemize*}
+
+\devicenormative{\paragraph}{Buffer Registration}{Device Types / Video
+  Decode Device / Device Operation / Buffer Registration}
+
+The VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER requires followings:
+
+\begin{itemize*}
+\item The device MUST set \field{type} to the value that the
+  corresponding VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER used.
+\item The device MUST set \field{handles[i]} to the value that equals
+  to \field{register_buffer.planes[i].handle} in the corresponding
+  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
+\end{itemize*}
+
+\subsubsection{Bitstream Buffer Requests}
+\label{sec:Device Types / Video Decode Device / Device Operation / Bitstream Buffer Requests}
+
+The driver sends VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER to ask the
+device to decode contents in a registered bitstream buffer. This
+request is similar to the V4L2's VIDIOC_QBUF ioctl for the OUTPUT
+buffer. The following structure is used to specify the buffer:
+
+\begin{lstlisting}
+struct virtio_vdec_guest_req_bitstream_buffer {
+        le32 handle;
+        le32 offset;
+        le32 length;
+        le64 cookie;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[handle] is a resource handle for the bitstream buffer to be
+  decoded.
+\item[offset] is the offset in bytes to video data in the buffer.
+\item[length] is the size of valid data in the buffer.
+\item[cookie] is an identifier for the frame decoded from this
+  bitstream data. This cookie will be copied to the matching frame
+  buffer. See also \field{virtio_vdec_host_req_frame_buffer} in
+  \ref{sec:Device Types / Video Decode Device / Device Operation /
+    Frame Buffer Requests}.
+\end{description}
+
+Once the device consumes the bitstream passed by the
+VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request, it sends
+VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER with the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_bitstream_buffer {
+        le32 handle;
+};
+\end{lstlisting}
+\begin{description}
+\item[handle] is a resource handle of the bitstream buffer that the
+  device completed processing.
+\end{description}
+
+The driver can reuse bitstream buffers that
+VIRTIO_VDEC_HOST_REQ_BITSTREAM is sent for.
+
+\drivernormative{\paragraph}{Bitstream Buffer Requests}{Device Types /
+  Video Decode Device / Device Operation / Bitstream Buffer Requests}
+
+\begin{itemize*}
+\item The driver MUST use \field{handle} which is already registered
+  by the VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
+\end{itemize*}
+
+\subsubsection{Stream Information}
+\label{sec:Device Types / Video Decode Device / Device Operation / Stream Information}
+
+To obtain the information required to allocate frame buffers
+(resolution, etc.), the driver registers bitstream buffers first and
+proceeds with queuing VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER requests
+until stream information is received via a
+VIRTIO_VDEC_HOST_REQ_STREAM_INFO host request. The
+VIRTIO_VDEC_HOST_REQ_STREAM_INFO is also sent when a run-time
+resolution change is detected. This request is similar to
+V4L2_EVENT_SOURCE_CHANGE events for V4L2 devices.
+
+The following structure is used for the
+VIRTIO_VDEC_HOST_REQ_STREAM_INFO:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_stream_info {
+        le32 frame_fourcc;
+        le32 fb_width;
+        le32 fb_height;
+        le32 min_frame_buffer_count;
+        le32 max_frame_buffer_count;
+        struct virtio_vdec_host_req_stream_crop {
+                le32 left;
+                le32 top;
+                le32 width;
+                le32 height;
+        } crop;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[frame_fourcc] indicates the pixel format that the video is
+  decoded to.
+\item[fb_width] is the width of a required frame buffer.
+\item[fb_height] is the height of a required frame buffer.
+\item[min_frame_buffer_count] is the minimum number of frame buffers
+  that the device requires.
+\item[max_frame_buffer_count] is the maximum number of frame buffers
+  that the device can accept.
+\item[crop] indicates video cropping information.
+    \begin{description}
+    \item[left] is the horizontal offset of the left-most valid pixel
+      of the video in pixels.
+    \item[top] is the vertical offset of the top-most valid line of
+      the video in lines.
+    \item[width] is the width of the rectangle in pixels.
+    \item[height] is the height of the rectangle in pixels.
+    \end{description}
+\end{description}
+
+The driver responds the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO once it
+processes the incoming stream information.
+
+\devicenormative{\paragraph}{Stream Information}{Device Types / Video
+  Decode Device / Device Operation / Stream Information}
+
+\begin{itemize*}
+\item After the device send the VIRTIO_VDEC_HOST_REQ_STREAM_INFO, the
+  device MUST empty its list of registered frame buffers and skip any
+  pending VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests until a
+  VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO request is received.
+\end{itemize*}
+
+\drivernormative{\paragraph}{Stream Information}{Device Types / Video
+  Decode Device / Device Operation / Stream Information}
+
+\begin{itemize*}
+\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO
+  once it processes a VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
+\item If VIRTIO_VDEC_HOST_REQ_STREAM_INFO came when the driver is
+  waiting for a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request, the driver
+  MUST enqueue the corresponding VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
+  again after sending the VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for a
+  frame buffer with new stream information.
+\end{itemize*}
+
+\subsubsection{Frame Buffer Requests}
+\label{sec:Device Types / Video Decode Device / Device Operation / Frame Buffer Requests}
+
+The driver provides frame buffers via
+VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests, and the device fills the
+buffers with decoded frames. This request is similar to V4L2's
+VIDIOC_QBUF ioctl for CAPTURE buffers. The following structure is used
+for the request:
+
+\begin{lstlisting}
+struct virtio_vdec_guest_req_frame_buffer {
+        struct virtio_vdec_frame_buffer_plane {
+                le32 handle;
+                le32 offset;
+                le32 stride;
+                le32 length;
+        } planes[VIRTIO_VDEC_MAX_PLANES];
+};
+\end{lstlisting}
+
+\begin{description}
+\item[planes] is an array of planes that a frame buffer consists of.
+    \begin{description}
+    \item[handle] is a resource handle indicating the buffer used for
+      this plane.
+    \item[offset] is the offset in bytes at which the device writes
+      decoded frame data.
+    \item[stride] is the line stride in bytes.
+    \item[length] is the size in bytes of the plane area that the
+      device can fill with decoded data.
+    \end{description}
+\end{description}
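+
+As an illustration (not part of the normative text), queuing a
+two-plane NV12 frame buffer of 1920x1080 whose planes were registered
+with the hypothetical resource handles 5 and 6 could look as follows;
+queue_guest_req() is a hypothetical helper that places a request on
+outq:
+
+\begin{lstlisting}
+/* Hypothetical example; endianness conversion omitted. */
+struct virtio_vdec_guest_req req = {
+        .type        = VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
+        .instance_id = 1, /* a previously opened instance */
+};
+
+/* Y plane: one byte per pixel. */
+req.frame_buffer.planes[0].handle = 5;
+req.frame_buffer.planes[0].offset = 0;
+req.frame_buffer.planes[0].stride = 1920;        /* bytes per line */
+req.frame_buffer.planes[0].length = 1920 * 1080;
+
+/* Interleaved CbCr plane at half vertical resolution. */
+req.frame_buffer.planes[1].handle = 6;
+req.frame_buffer.planes[1].offset = 0;
+req.frame_buffer.planes[1].stride = 1920;
+req.frame_buffer.planes[1].length = 1920 * 1080 / 2;
+
+queue_guest_req(&req);
+\end{lstlisting}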
+
+When the device has filled a buffer with decoded frame data, it
+notifies the driver with a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request
+using the following structure:
+
+\begin{lstlisting}
+struct virtio_vdec_host_req_frame_buffer {
+        le32 handles[VIRTIO_VDEC_MAX_PLANES];
+        le64 cookie;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[handles] is an array of the handles passed via
+  \field{virtio_vdec_guest_req_frame_buffer} in the corresponding
+  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER request.
+\item[cookie] is the \field{cookie} passed via a
+  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request. It identifies the
+  bitstream buffer from which this frame was decoded.
+\end{description}
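+
+The \field{cookie} allows the driver to associate a decoded frame with
+the bitstream buffer it originated from, and hence with driver-side
+metadata such as a presentation timestamp. A minimal sketch, assuming
+the driver keeps a hypothetical cookie-indexed table of timestamps:
+
+\begin{lstlisting}
+/* Illustrative sketch, not normative; timestamp_for_cookie() and
+ * deliver_decoded_frame() are hypothetical driver helpers. */
+static void handle_host_frame_buffer(const struct virtio_vdec_host_req *req)
+{
+        /* The cookie was chosen by the driver when it sent the matching
+         * VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER, e.g. a sequence
+         * number indexing a driver-private table. */
+        uint64_t ts = timestamp_for_cookie(req->frame_buffer.cookie);
+
+        deliver_decoded_frame(req->frame_buffer.handles, ts);
+}
+\end{lstlisting}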
+
+\drivernormative{\paragraph}{Frame Buffer Requests}{Device Types /
+  Video Decode Device / Device Operation / Frame Buffer Requests}
+
+\begin{itemize*}
+\item Before the driver sends the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
+  for a frame buffer, the driver MUST have sent the
+  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for the buffer.
+\item The driver MUST send a VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER only
+  for a frame buffer that satisfies the requirements given by the most
+  recent VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
+\end{itemize*}
+
+\devicenormative{\paragraph}{Frame Buffer Requests}{Device Types /
+  Video Decode Device / Device Operation / Frame Buffer Requests}
+
+When the device has successfully processed a frame buffer, it MUST set
+\field{cookie} to the value passed via the
+VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request that provided the
+corresponding bitstream data.
+
+\subsubsection{Drain}
+\label{sec:Device Types / Video Decode Device / Device Operation / Drain}
+
+To ensure that all pending requests are processed and the resulting
+decoded frames are returned in the appropriate replies, the driver MAY
+issue a VIRTIO_VDEC_GUEST_REQ_DRAIN request. The device continues
+processing requests as usual, but once all requests of the given
+instance preceding the drain request have been processed, the device
+issues a VIRTIO_VDEC_HOST_REQ_DRAINED host request to notify the
+driver of this fact.
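+
+For example, a driver could implement a blocking drain on top of these
+requests; this is an illustrative sketch in which queue_guest_req() and
+wait_for_host_req() are hypothetical helpers:
+
+\begin{lstlisting}
+/* Illustrative sketch: synchronous drain of one instance. */
+static void vdec_drain_sync(uint32_t instance_id)
+{
+        struct virtio_vdec_guest_req req = {
+                .type        = VIRTIO_VDEC_GUEST_REQ_DRAIN,
+                .instance_id = instance_id,
+        };
+
+        queue_guest_req(&req);
+
+        /* Completions for all requests queued before the drain are
+         * delivered first; DRAINED arrives last. */
+        wait_for_host_req(instance_id, VIRTIO_VDEC_HOST_REQ_DRAINED);
+}
+\end{lstlisting}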
+
+\devicenormative{\paragraph}{Drain}{Device Types / Video Decode Device
+  / Device Operation / Drain}
+
+\begin{itemize*}
+\item When a VIRTIO_VDEC_GUEST_REQ_DRAIN request arrives, the device
+  MUST process all pending guest requests. After processing them, the
+  device MUST send VIRTIO_VDEC_HOST_REQ_DRAINED to the driver.
+\item While processing a drain request, the device MUST ignore any
+  guest requests other than Flush and Close requests. See
+  \ref{sec:Device Types / Video Decode Device / Device Operation / Flush}
+  and
+  \ref{sec:Device Types / Video Decode Device / Device Operation / Instance
+    Close}.
+\end{itemize*}
+
+\subsubsection{Flush}
+\label{sec:Device Types / Video Decode Device / Device Operation / Flush}
+
+To drop all pending decode requests and any decoded frames pending in
+frame buffers, the driver MAY issue a VIRTIO_VDEC_GUEST_REQ_FLUSH
+request. Any frames that would result from the previously enqueued
+bitstream data are dropped, and the device does not write to any of
+the frame buffers provided earlier.
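+
+A typical use is seeking; in the illustrative sketch below the driver
+flushes, waits for the VIRTIO_VDEC_HOST_REQ_FLUSHED described in the
+normative text that follows, and then feeds bitstream data from the new
+position (queue_guest_req(), wait_for_host_req() and
+queue_bitstream_from() are hypothetical helpers):
+
+\begin{lstlisting}
+/* Illustrative sketch: flush as part of a seek operation. */
+static void vdec_seek(uint32_t instance_id, uint64_t new_position)
+{
+        struct virtio_vdec_guest_req req = {
+                .type        = VIRTIO_VDEC_GUEST_REQ_FLUSH,
+                .instance_id = instance_id,
+        };
+
+        queue_guest_req(&req);
+        wait_for_host_req(instance_id, VIRTIO_VDEC_HOST_REQ_FLUSHED);
+
+        /* Pending frames have been dropped; resume decoding by queueing
+         * bitstream buffers starting at the new position. */
+        queue_bitstream_from(instance_id, new_position);
+}
+\end{lstlisting}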
+
+\devicenormative{\paragraph}{Flush}{Device Types / Video Decode Device
+  / Device Operation / Flush}
+
+\begin{itemize*}
+\item The device MUST send VIRTIO_VDEC_HOST_REQ_FLUSHED when it has
+  finished the flush process.
+\item If a VIRTIO_VDEC_GUEST_REQ_FLUSH request arrives while guest
+  requests are being processed, the device MUST stop processing them.
+\item The device MUST NOT process any guest request that arrives after
+  VIRTIO_VDEC_GUEST_REQ_FLUSH until it sends
+  VIRTIO_VDEC_HOST_REQ_FLUSHED.
+\item The device MUST dequeue all unread requests in outq when a
+  VIRTIO_VDEC_GUEST_REQ_FLUSH request arrives.
+\end{itemize*}
+
+\subsubsection{EOS Notification}
+\label{sec:Device Types / Video Decode Device / Device Operation / EOS Notification}
+
+The device sends VIRTIO_VDEC_HOST_REQ_EOS when the end of a stream
+(EOS) is reached while decoding.
+
+This request is similar to the V4L2_EVENT_EOS for V4L2 devices.
+
+\subsubsection{Instance Close}
+\label{sec:Device Types / Video Decode Device / Device Operation / Instance Close}
+
+If a decoding instance is no longer needed, the driver SHOULD issue a
+VIRTIO_VDEC_GUEST_REQ_CLOSE request. This triggers an implicit flush
+operation (as with a VIRTIO_VDEC_GUEST_REQ_FLUSH request, but without
+a corresponding host request) and frees the instance ID and any
+associated resources on the device side.
+
+The instance is successfully terminated once the device responds to
+the request. After that, it is guaranteed that any buffers registered
+by the driver for the given instance will no longer be accessed by the
+device (unless they are also provided to another instance).
+
+\subsubsection{Error Reporting}
+\label{sec:Device Types / Video Decode Device / Device Operation / Error Reporting}
+
+The device has two ways of reporting errors to the driver: one for
+errors associated with a host request, and one for global errors.
+
+If an error is associated with a host request, the device sets
+\field{type} in \field{virtio_vdec_host_req} to the type of the failed
+host request and \field{result} to a value of \field{enum
+  virtio_vdec_host_req_result} describing the error.
+
+If the device cannot proceed with decoding due to an error that is not
+associated with any particular request, the device MUST send a
+VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR. In that case, the driver
+MUST stop sending decode requests.
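+
+An illustrative driver-side sketch covering both error paths, with
+hypothetical vdec_enter_error_state(), vdec_fail_request() and
+vdec_dispatch() helpers:
+
+\begin{lstlisting}
+/* Illustrative sketch, not normative. */
+static void handle_host_req(const struct virtio_vdec_host_req *req)
+{
+        if (req->type == VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR) {
+                /* Global error: stop submitting decode requests. */
+                vdec_enter_error_state();
+                return;
+        }
+
+        if (req->result != VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS) {
+                /* Per-request error: type identifies the failed host
+                 * request; result describes the error. */
+                vdec_fail_request(req->instance_id, req->type, req->result);
+                return;
+        }
+
+        vdec_dispatch(req); /* normal completion path */
+}
+\end{lstlisting}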
--
2.23.0.237.gc6a4ce50a0-goog


* Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-09-19  9:34 ` [virtio-dev] " Keiichi Watanabe
@ 2019-09-19  9:52 ` Hans Verkuil
  2019-09-19 11:15     ` [virtio-dev] " Keiichi Watanabe
  -1 siblings, 1 reply; 59+ messages in thread
From: Hans Verkuil @ 2019-09-19  9:52 UTC (permalink / raw)
  To: Keiichi Watanabe, virtio-dev
  Cc: acourbot, alexlau, dgreid, marcheu, posciak, stevensd, tfiga,
	linux-media

Hi Keiichi,

On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> [Resending because of some issues with sending to virtio-dev. Sorry for the noise.]
> 
> This patch proposes virtio specification for new virtio video decode
> device.
> This device provides the functionality of hardware accelerated video
> decoding from encoded video contents provided by the guest into frame
> buffers accessible by the guest.
> 
> We have prototype implementation for VMs on Chrome OS:
> * virtio-vdec device in crosvm [1]
> * virtio-vdec driver in Linux kernel v4.19 [2]
>   - This driver follows V4L2 stateful video decoder API [3].
> 
> Our prototype implementation uses [4], which allows the virtio-vdec
> device to use buffers allocated by virtio-gpu device.
> 
> Any feedback would be greatly appreciated. Thank you.

I'm not a virtio expert, but as I understand it the virtio-vdec driver
looks like a regular v4l2 stateful decoder device to the guest, while
on the host there is a driver (or something like that) that maps the
virtio-vdec requests to the actual decoder hardware, right?

What concerns me a bit (but there may be good reasons for this) is that
this virtio driver is so specific for stateful decoders.

How does this scale to stateful encoders? Stateless codecs? Other M2M
devices like deinterlacers or colorspace converters? What about webcams?

In other words, I would like to see a bigger picture here.

Note that there is also an effort for Xen to expose webcams to a guest:

https://www.spinics.net/lists/linux-media/msg148629.html

This may or may not be of interest. This was an RFC only, and I haven't
seen any follow-up patches with actual code.

There will be a half-day meeting of media developers during the ELCE
in October about codecs. I know Alexandre and Tomasz will be there.
It might be a good idea to discuss this in more detail if needed.

Regards,

	Hans


* Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-09-19  9:52 ` Hans Verkuil
@ 2019-09-19 11:15     ` Keiichi Watanabe
  0 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-09-19 11:15 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: virtio-dev, Alexandre Courbot, Alex Lau, Dylan Reid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Tomasz Figa, Linux Media Mailing List

Hi Hans,
Thank you for your feedback.

On Thu, Sep 19, 2019 at 6:53 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi Keiichi,
>
> On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> > [Resending because of some issues with sending to virtio-dev. Sorry for the noise.]
> >
> > This patch proposes virtio specification for new virtio video decode
> > device.
> > This device provides the functionality of hardware accelerated video
> > decoding from encoded video contents provided by the guest into frame
> > buffers accessible by the guest.
> >
> > We have prototype implementation for VMs on Chrome OS:
> > * virtio-vdec device in crosvm [1]
> > * virtio-vdec driver in Linux kernel v4.19 [2]
> >   - This driver follows V4L2 stateful video decoder API [3].
> >
> > Our prototype implementation uses [4], which allows the virtio-vdec
> > device to use buffers allocated by virtio-gpu device.
> >
> > Any feedback would be greatly appreciated. Thank you.
>
> I'm not a virtio expert, but as I understand it the virtio-vdec driver
> looks like a regular v4l2 stateful decoder device to the guest, while
> on the host there is a driver (or something like that) that maps the
> virtio-vdec requests to the actual decoder hardware, right?
>
> What concerns me a bit (but there may be good reasons for this) is that
> this virtio driver is so specific for stateful decoders.
>

We aim to design a platform-independent interface. The virtio-vdec
protocol should eventually be workable regardless of API, OS, and
platform.

Our prototype virtio-vdec device translates the virtio-vdec protocol to
Chrome's video decode acceleration API instead of talking to hardware
decoders directly. This Chrome API is an abstraction layer over
multiple decoder APIs on Linux, such as V4L2 stateful, V4L2 slice, and
Intel's VAAPI.

That is to say, the guest driver translates the V4L2 stateful API to
virtio-vdec, and the host device translates virtio-vdec to Chrome's
API. So I could say that this is already more general than a mere V4L2
stateful API wrapper, at least.

I'd appreciate it if you could let me know which parts are still
specific to V4L2.


> How does this scale to stateful encoders? Stateless codecs? Other M2M
> devices like deinterlacers or colorspace converters? What about webcams?
>

We're designing a virtio protocol for encoders as well, but we are at
an early stage, so I'm not sure whether we should or can handle
decoding and encoding in one protocol. I don't have any plans for other
media devices.


> In other words, I would like to see a bigger picture here.
>
> Note that there is also an effort for Xen to expose webcams to a guest:
>
> https://www.spinics.net/lists/linux-media/msg148629.html
>

Good to know. Thanks.

> This may or may not be of interest. This was an RFC only, and I haven't
> seen any follow-up patches with actual code.
>
> There will be a half-day meeting of media developers during the ELCE
> in October about codecs. I know Alexandre and Tomasz will be there.
> It might be a good idea to discuss this in more detail if needed.
>

Sounds good. They are closely working with me.

Regards,
Keiichi

> Regards,
>
>         Hans
>
> >
> > [1] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> > [2] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> > [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be merged to Linux 5.4)
> > [4] https://lkml.org/lkml/2019/9/12/157
> >
> > Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> > ---
> >  content.tex     |   1 +
> >  virtio-vdec.tex | 750 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 751 insertions(+)
> >  create mode 100644 virtio-vdec.tex
> >
> > diff --git a/content.tex b/content.tex
> > index 37a2190..b57d4a9 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> >  \input{virtio-input.tex}
> >  \input{virtio-crypto.tex}
> >  \input{virtio-vsock.tex}
> > +\input{virtio-vdec.tex}
> >
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> > diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> > new file mode 100644
> > index 0000000..d117129
> > --- /dev/null
> > +++ b/virtio-vdec.tex
> > @@ -0,0 +1,750 @@
> > +\section{Video Decode Device}
> > +\label{sec:Device Types / Video Decode Device}
> > +
> > +virtio-vdec is a virtio based video decoder. This device provides the
> > +functionality of hardware accelerated video decoding from encoded
> > +video contents provided by the guest into frame buffers accessible by
> > +the guest.
> > +
> > +\subsection{Device ID}
> > +\label{sec:Device Types / Video Decode Device / Device ID}
> > +
> > +28
> > +
> > +\subsection{Virtqueues}
> > +\label{sec:Device Types / Video Decode Device / Virtqueues}
> > +
> > +\begin{description}
> > +\item[0] outq - queue for sending requests from the driver to the
> > +  device
> > +\item[1] inq - queue for sending requests from the device to the
> > +  driver
> > +\end{description}
> > +
> > +Each queue is used uni-directionally. outq is used to send requests
> > +from the driver to the device (i.e., guest requests) and inq is used
> > +to send requests in the other direction (i.e., host requests).
> > +
> > +\subsection{Feature bits}
> > +\label{sec:Device Types / Video Decode Device / Feature bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}
> > +\label{sec:Device Types / Video Decode Device / Device configuration layout}
> > +
> > +None.
> > +
> > +\subsection{Device Requirements: Device Initialization}
> > +\label{sec:Device Types / Video Decode Device / Device Requirements: Device Initialization}
> > +
> > +The virtqueues are initialized.
> > +
> > +\subsection{Device Operation}
> > +\label{sec:Device Types / Video Decode Device / Device Operation}
> > +
> > +\subsubsection{Video Buffers}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
> > +
> > +A virtio-vdec driver and a device use two types of video buffers:
> > +\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
> > +contains encoded video stream data. This buffer is similar to an
> > +OUTPUT buffer for Video for Linux Two (V4L2) API. A frame buffer
> > +contains decoded video frame data like a CAPTURE buffers for V4L2 API.
> > +The driver and the device share these buffers, and each buffer is
> > +identified by a unique integer called a \emph{resource handle}.
> > +
> > +\subsubsection{Guest Request}
> > +
> > +The driver queues requests to the outq virtqueue. The device MAY
> > +process requests out-of-order. All requests on outq use the following
> > +structure:
> > +
> > +\begin{lstlisting}
> > +enum virtio_vdec_guest_req_type {
> > +        VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
> > +
> > +        /* Global */
> > +        VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
> > +
> > +        /* Per instance */
> > +        VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
> > +        VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
> > +        VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
> > +        VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
> > +        VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
> > +        VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
> > +        VIRTIO_VDEC_GUEST_REQ_DRAIN,
> > +        VIRTIO_VDEC_GUEST_REQ_FLUSH,
> > +        VIRTIO_VDEC_GUEST_REQ_CLOSE,
> > +};
> > +
> > +struct virtio_vdec_guest_req {
> > +        le32 type;
> > +        le32 instance_id;
> > +        union {
> > +                struct virtio_vdec_guest_req_open open;
> > +                struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
> > +                struct virtio_vdec_guest_req_register_buffer register_buffer;
> > +                struct virtio_vdec_guest_req_bitstream_buffer bitstream_buffer;
> > +                struct virtio_vdec_guest_req_frame_buffer frame_buffer;
> > +        };
> > +};
> > +\end{lstlisting}
> > +
> > +This structure includes the following generic fields to identify the
> > +request:
> > +\begin{description}
> > +\item[\field{type}] specifies the type of the guest request using
> > +  values from \field{enum virtio_vdec_guest_req_type}.
> > +\item[\field{instance_id}] specifies an instance ID. This field is
> > +  ignored unless \field{type} represents a per-instance request.
> > +\end{description}
> > +
> > +The union fields are additional per-request data used only when
> > +\field{type} has a specific value:
> > +
> > +\begin{description}
> > +\item[\field{open}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_OPEN. See \ref{sec:Device Types / Video Decode
> > +    Device / Device Operation / Instance Open}.
> > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > +\item[\field{register_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Buffer Registration}.
> > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Bitstream Buffer
> > +    Requests}.
> > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Frame Buffer Requests}.
> > +\end{description}
> > +
> > +\subsubsection{Host Request}
> > +
> > +The device queues requests to the inq virtqueue. All requests on inq
> > +use the following structure:
> > +
> > +\begin{lstlisting}
> > +enum virtio_vdec_host_req_type {
> > +        VIRTIO_VDEC_HOST_REQ_UNDEFINED = 0,
> > +
> > +        /* Global */
> > +        VIRTIO_VDEC_HOST_REQ_QUERY = 0x0100,
> > +
> > +        /* Per instance */
> > +        VIRTIO_VDEC_HOST_REQ_OPEN = 0x0200,
> > +        VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT,
> > +        VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER,
> > +        VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER,
> > +        VIRTIO_VDEC_HOST_REQ_STREAM_INFO,
> > +        VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER,
> > +        VIRTIO_VDEC_HOST_REQ_DRAINED,
> > +        VIRTIO_VDEC_HOST_REQ_FLUSHED,
> > +        VIRTIO_VDEC_HOST_REQ_CLOSE,
> > +        VIRTIO_VDEC_HOST_REQ_EOS,
> > +
> > +        /* Error response */
> > +        VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR = 0x1100,
> > +};
> > +
> > +enum virtio_vdec_host_req_result {
> > +        /* Success */
> > +        VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS = 0,
> > +
> > +        /* Error */
> > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_UNSPEC = 0x1000,
> > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_INVALID_REQUEST,
> > +};
> > +
> > +struct virtio_vdec_host_req {
> > +        le32 type;     /* VIRTIO_VDEC_HOST_REQ_* */
> > +        le32 result;   /* VIRTIO_VDEC_HOST_REQ_RESULT_ */
> > +        le32 instance_id;
> > +        le32 reserved; /* for 64-bit alignment */
> > +        union {
> > +                struct virtio_vdec_host_req_query query;
> > +                struct virtio_vdec_host_req_set_buffer_count set_buffer_count;
> > +                struct virtio_vdec_host_req_register_buffer register_buffer;
> > +                struct virtio_vdec_host_req_bitstream_buffer bitstream_buffer;
> > +                struct virtio_vdec_host_req_stream_info stream_info;
> > +                struct virtio_vdec_host_req_frame_buffer frame_buffer;
> > +        };
> > +};
> > +\end{lstlisting}
> > +
> > +This structure includes the following generic fields:
> > +\begin{description}
> > +\item[\field{type}] specifies the type of the host request using
> > +  values from \field{enum virtio_vdec_host_req_type}.
> > +\item[\field{result}] specifies the result of the operation using the
> > +  \field{enum virtio_vdec_host_req_result}.
> > +\item[\field{instance_id}] specifies an instance ID. This field is
> > +  ignored unless \field{type} represents a per-instance request.
> > +\end{description}
> > +
> > +The union fields are additional per-request data used only when
> > +\field{type} has a specific value:
> > +
> > +\begin{description}
> > +\item[\field{query}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_QUERY. See \ref{sec:Device Types / Video Decode
> > +    Device / Device Operation / Query Capabilities}.
> > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > +\item[\field{register_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Buffer Registration}.
> > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Bitstream Buffer
> > +    Requests}.
> > +\item[\field{stream_info}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types / Video
> > +    Decode Device / Device Operation / Stream Information}.
> > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER. See \ref{sec:Device Types / Video
> > +    Decode Device / Device Operation / Frame Buffer Requests}.
> > +\end{description}
> > +
> > +\subsubsection{Query Capabilities}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Query Capabilities}
> > +
> > +The driver can send VIRTIO_VDEC_GUEST_REQ_QUERY to query supported
> > +bitstream formats and frame formats. This is similar to
> > +VIDIOC_ENUM_FMT ioctl and VIDIOC_ENUM_FRAMESIZES ioctl for V4L2
> > +devices.
> > +
> > +The device sends supported formats as VIRTIO_VDEC_HOST_REQ_QUERY with
> > +the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_range {
> > +        le32 min;
> > +        le32 max;
> > +        le32 step;
> > +};
> > +
> > +struct virtio_vdec_format_desc {
> > +        le32 fourcc;
> > +        le32 mask;
> > +        struct virtio_vdec_range width;
> > +        struct virtio_vdec_range height;
> > +};
> > +
> > +#define VIRTIO_VDEC_NUM_FORMATS 32
> > +
> > +struct virtio_vdec_host_req_query {
> > +        struct virtio_vdec_format_desc bitstream_formats[VIRTIO_VDEC_NUM_FORMATS];
> > +        struct virtio_vdec_format_desc frame_formats[VIRTIO_VDEC_NUM_FORMATS];
> > +};
> > +\end{lstlisting}
> > +
> > +The \field{virtio_vdec_host_req_query} describes the set of supported
> > +bitstream and frame formats, with corresponding ranges of image
> > +resolution and compatibility mask to determine the compatibility
> > +between coded and raw formats.
> > +
> > +\begin{description}
> > +\item[fourcc] specifies a FOURCC of a supported video codec or a pixel
> > +  format.
> > +\item[mask] represents the supported combinations of video codec and
> > +  pixel format. If the i-th bit is set in \field{mask} of
> > +  \field{bitstream_formats[j]}, the device MUST support decoding from
> > +  the video codec of \field{bitstream_formats[j]} to the pixel format
> > +  of \field{frame_formats[i]}.
> > +\item[width/height] represents either a supported stream resolution or
> > +  a supported frame buffer resolution.
> > +\end{description}
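> > +
> > +For illustration, a guest driver could test whether decoding from
> > +\field{bitstream_formats[j]} to \field{frame_formats[i]} is supported
> > +with a helper like the following. This is an informal sketch, not
> > +part of the device interface; \field{le32_to_cpu} stands for the
> > +guest's little-endian conversion helper:
> > +
> > +\begin{lstlisting}
> > +/* Sketch: true if the device can decode the coded format
> > + * bitstream_formats[j] into the pixel format frame_formats[i],
> > + * per the mask semantics described above. */
> > +bool vdec_can_decode(const struct virtio_vdec_host_req_query *q,
> > +                     unsigned int j, unsigned int i)
> > +{
> > +        return (le32_to_cpu(q->bitstream_formats[j].mask) >> i) & 1;
> > +}
> > +\end{lstlisting}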
> > +
> > +\devicenormative{\paragraph}{Query Capabilities}{Device Types / Video
> > +  Decode Device / Device Operation / Query Capabilities}
> > +
> > +The device MUST set \field{mask} to 0 for any invalid \field{struct
> > +  virtio_vdec_format_desc} entry in VIRTIO_VDEC_HOST_REQ_QUERY.
> > +
> > +\subsubsection{Instance Open}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Open}
> > +
> > +To acquire a decoder instance for a given coded format, the driver
> > +sends a VIRTIO_VDEC_GUEST_REQ_OPEN request. The driver specifies a
> > +video codec with the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_guest_req_open {
> > +        le32 bitstream_fourcc;
> > +};
> > +\end{lstlisting}
> > +
> > +When this guest request arrives, the device allocates an instance and
> > +sends a VIRTIO_VDEC_HOST_REQ_OPEN request.
> > +
> > +Once the driver receives VIRTIO_VDEC_HOST_REQ_OPEN, the driver can
> > +send per-instance guest requests to the opened instance by specifying
> > +\field{instance_id}. The device MUST process per-instance requests
> > +with the same value of \field{instance_id} in order.
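> > +
> > +As an informal sketch (not part of this specification), a driver
> > +opening an instance for H.264 bitstreams might fill in the guest
> > +request as follows, assuming the device reported the 'H264' FOURCC in
> > +VIRTIO_VDEC_HOST_REQ_QUERY and that \field{cpu_to_le32} is the
> > +guest's endianness helper:
> > +
> > +\begin{lstlisting}
> > +/* Sketch: open instance 1 for an H.264 bitstream. */
> > +struct virtio_vdec_guest_req req = {
> > +        .type        = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_OPEN),
> > +        .instance_id = cpu_to_le32(1),
> > +        .open = {
> > +                /* 'H264' little-endian FOURCC. */
> > +                .bitstream_fourcc = cpu_to_le32(0x34363248),
> > +        },
> > +};
> > +/* The request is then queued on outq. */
> > +\end{lstlisting}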
> > +
> > +\drivernormative{\paragraph}{Instance Open}{Device Types / Video
> > +  Decode Device / Device Operation / Instance Open}
> > +
> > +In VIRTIO_VDEC_GUEST_REQ_OPEN,
> > +\begin{itemize*}
> > +\item The driver MUST set \field{bitstream_fourcc} to a FOURCC of a
> > +  video codec that is supported by the device.
> > +\item The driver MUST set \field{instance_id} to an integer that has
> > +  not been used as an \field{instance_id} before.
> > +\end{itemize*}
> > +
> > +\devicenormative{\paragraph}{Instance Open}{Device Types / Video
> > +  Decode Device / Device Operation / Instance Open}
> > +
> > +If the device fails to allocate an instance, the device MUST set
> > +\field{result} to a value other than
> > +VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS.
> > +
> > +\subsubsection{Set Buffer Count}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Set Buffer Count}
> > +
> > +Once an instance is acquired, the driver and the device negotiate the
> > +number of buffers to use during the decoding process by
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT and
> > +VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. These requests are similar to
> > +the VIDIOC_REQBUFS ioctl for V4L2 devices. First, the driver sends
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to tell the device how many
> > +buffers it needs, using the following structure:
> > +
> > +\begin{lstlisting}
> > +enum virtio_vdec_buffer_type {
> > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM = 0,
> > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME = 1,
> > +};
> > +
> > +struct virtio_vdec_guest_req_set_buffer_count {
> > +        le32 type;
> > +        le32 buffer_count;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[buffer_count] is the minimum number of buffers that the driver
> > +  needs to use.
> > +\end{description}
> > +
> > +The device responds with VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT to
> > +report the number of buffers that it can accept, using the following
> > +structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_set_buffer_count {
> > +        le32 type;
> > +        le32 buffer_count;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[buffer_count] is the maximum number of buffers that the device
> > +  can accept.
> > +\end{description}
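> > +
> > +As an informal sketch of the negotiation (the helper names and the
> > +buffer count below are illustrative only), a driver wanting four
> > +bitstream buffers might proceed as follows:
> > +
> > +\begin{lstlisting}
> > +/* Sketch: ask for 4 bitstream buffers ... */
> > +struct virtio_vdec_guest_req_set_buffer_count want = {
> > +        .type = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM),
> > +        .buffer_count = cpu_to_le32(4),
> > +};
> > +/* ... send on outq, then, on the matching
> > + * VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT, never register more
> > + * buffers than the device accepted: */
> > +uint32_t accepted = le32_to_cpu(host_req->set_buffer_count.buffer_count);
> > +uint32_t num_buffers = accepted < 4 ? accepted : 4;
> > +\end{lstlisting}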
> > +
> > +\drivernormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > +  Decode Device / Device Operation / Set Buffer Count}
> > +
> > +The driver MUST set \field{buffer_count} to a non-zero value in
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. The value of
> > +\field{buffer_count} for frame buffers MUST match the requirement
> > +given via VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types
> > +  / Video Decode Device / Device Operation / Stream Information}.
> > +
> > +\devicenormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > +  Decode Device / Device Operation / Set Buffer Count}
> > +
> > +The device MUST set \field{buffer_count} to the number of buffers that
> > +the device can accept. The value MAY differ from the value of
> > +\field{buffer_count} in the corresponding
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT request.
> > +
> > +\subsubsection{Buffer Registration}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffer Registration}
> > +
> > +Once the number of buffers is set, the driver registers buffer
> > +metadata via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER, using the
> > +following structure:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_VDEC_MAX_PLANES 8
> > +
> > +struct virtio_vdec_guest_req_register_buffer {
> > +        le32 type;
> > +        le32 num_planes;
> > +        struct virtio_vdec_plane {
> > +                le32 handle;
> > +                le32 offset;
> > +                le32 length;
> > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[VIRTIO_VDEC_MAX_PLANES] is the maximum number of planes that a
> > +  video format needs. It is similar to VIDEO_MAX_PLANES in
> > +  \href{https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/videodev2.h}{include/uapi/linux/videodev2.h}
> > +  in the Linux source tree.
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[num_planes] is the number of planes (i.e. separate memory
> > +  buffers) for this format. This indicates the number of valid entries
> > +  in \field{planes}.
> > +\item[planes] is an array of structures describing information of each
> > +  plane.
> > +  \begin{description}
> > +  \item[handle] is a resource handle for the buffer.
> > +  \item[offset] is the offset in bytes to the start of data in the
> > +    plane.
> > +  \item[length] is the size of this plane in bytes.
> > +  \end{description}
> > +\end{description}
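> > +
> > +For illustration, registering a two-plane frame buffer (an NV12
> > +layout with both planes in one 1280x720 resource) might look as
> > +follows; this is an informal sketch and the handle and size values
> > +are made up:
> > +
> > +\begin{lstlisting}
> > +/* Sketch: register a 2-plane NV12 frame buffer backed by a
> > + * single resource (handle 42). */
> > +struct virtio_vdec_guest_req_register_buffer buf = {
> > +        .type       = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME),
> > +        .num_planes = cpu_to_le32(2),
> > +        .planes = {
> > +                { /* Y plane */
> > +                        .handle = cpu_to_le32(42),
> > +                        .offset = cpu_to_le32(0),
> > +                        .length = cpu_to_le32(1280 * 720),
> > +                },
> > +                { /* interleaved CbCr plane */
> > +                        .handle = cpu_to_le32(42),
> > +                        .offset = cpu_to_le32(1280 * 720),
> > +                        .length = cpu_to_le32(1280 * 720 / 2),
> > +                },
> > +        },
> > +};
> > +\end{lstlisting}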
> > +
> > +Once the device accepts the buffer registration, it responds with
> > +VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER, using the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_register_buffer {
> > +        le32 type;
> > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[handles] is an array of resource handles from the corresponding
> > +  guest request.
> > +\end{description}
> > +
> > +\drivernormative{\paragraph}{Buffer Registration}{Device Types / Video
> > +  Decode Device / Device Operation / Buffer Registration}
> > +
> > +The following requirements apply to VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER:
> > +
> > +\begin{itemize*}
> > +\item The driver MUST set \field{type}.
> > +\item The driver MUST set \field{num_planes} to a non-zero value that
> > +  is less than or equal to VIRTIO_VDEC_MAX_PLANES.
> > +\item The driver MUST fill out the fields of \field{planes[i]} for
> > +  each \(0 \le i < \field{num_planes} \).
> > +\item The driver MUST NOT send VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER
> > +  more times than the number of buffers that the device reported via
> > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT.
> > +\end{itemize*}
> > +
> > +\devicenormative{\paragraph}{Buffer Registration}{Device Types / Video
> > +  Decode Device / Device Operation / Buffer Registration}
> > +
> > +The following requirements apply to VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER:
> > +
> > +\begin{itemize*}
> > +\item The device MUST set \field{type} to the value that the
> > +  corresponding VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER used.
> > +\item The device MUST set \field{handles[i]} to a value equal to
> > +  \field{register_buffer.planes[i].handle} in the corresponding
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > +\end{itemize*}
> > +
> > +\subsubsection{Bitstream Buffer Requests}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > +
> > +The driver sends VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER to ask the
> > +device to decode the contents of a registered bitstream buffer. This
> > +request is similar to V4L2's VIDIOC_QBUF ioctl for OUTPUT
> > +buffers. The following structure is used to specify the buffer:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_guest_req_bitstream_buffer {
> > +        le32 handle;
> > +        le32 offset;
> > +        le32 length;
> > +        le64 cookie;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[handle] is a resource handle for the bitstream buffer to be
> > +  decoded.
> > +\item[offset] is the offset in bytes to video data in the buffer.
> > +\item[length] is the size of valid data in the buffer.
> > +\item[cookie] is an identifier for the frame decoded from this
> > +  bitstream data. This cookie will be copied to the matching frame
> > +  buffer. See also \field{virtio_vdec_host_req_frame_buffer} in
> > +  \ref{sec:Device Types / Video Decode Device / Device Operation /
> > +    Frame Buffer Requests}.
> > +\end{description}
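> > +
> > +One natural use of \field{cookie}, sketched informally below, is to
> > +carry the frame's presentation timestamp so that it comes back with
> > +the decoded frame (\field{data_size} and \field{pts_ns} are
> > +illustrative variables, not part of this specification):
> > +
> > +\begin{lstlisting}
> > +/* Sketch: queue bitstream buffer 7 for decoding, tagging it
> > + * with the presentation timestamp as the cookie. */
> > +struct virtio_vdec_guest_req_bitstream_buffer bs = {
> > +        .handle = cpu_to_le32(7),
> > +        .offset = cpu_to_le32(0),
> > +        .length = cpu_to_le32(data_size),
> > +        .cookie = cpu_to_le64(pts_ns),
> > +};
> > +\end{lstlisting}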
> > +
> > +Once the device consumes the bitstream passed by the
> > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request, it sends
> > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER with the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_bitstream_buffer {
> > +        le32 handle;
> > +};
> > +\end{lstlisting}
> > +\begin{description}
> > +\item[handle] is a resource handle of the bitstream buffer that the
> > +  device completed processing.
> > +\end{description}
> > +
> > +The driver can reuse a bitstream buffer once
> > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER has been sent for it.
> > +
> > +\drivernormative{\paragraph}{Bitstream Buffer Requests}{Device Types /
> > +  Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > +
> > +\begin{itemize*}
> > +\item The driver MUST set \field{handle} to a value that has already
> > +  been registered via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > +\end{itemize*}
> > +
> > +\subsubsection{Stream Information}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Stream Information}
> > +
> > +To obtain the information required to allocate frame buffers
> > +(resolution, etc.), the driver registers bitstream buffers first and
> > +proceeds with queuing VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER requests
> > +until stream information is received via a
> > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO host request. The
> > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO is also sent when a run-time
> > +resolution change is detected. This request is similar to
> > +V4L2_EVENT_SOURCE_CHANGE events for V4L2 devices.
> > +
> > +The following structure is used for the
> > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_stream_info {
> > +        le32 frame_fourcc;
> > +        le32 fb_width;
> > +        le32 fb_height;
> > +        le32 min_frame_buffer_count;
> > +        le32 max_frame_buffer_count;
> > +        struct virtio_vdec_host_req_stream_crop {
> > +                le32 left;
> > +                le32 top;
> > +                le32 width;
> > +                le32 height;
> > +        } crop;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[frame_fourcc] indicates the pixel format that the video is
> > +  decoded to.
> > +\item[fb_width] is the width of a required frame buffer.
> > +\item[fb_height] is the height of a required frame buffer.
> > +\item[min_frame_buffer_count] is the minimum number of frame buffers
> > +  that the device requires.
> > +\item[max_frame_buffer_count] is the maximum number of frame buffers
> > +  that the device can accept.
> > +\item[crop] indicates video cropping information.
> > +    \begin{description}
> > +    \item[left] is the horizontal offset of the left-most valid pixel
> > +      of the video in pixels.
> > +    \item[top] is the vertical offset of the top-most valid line of
> > +      the video in lines.
> > +    \item[width] is the width of the rectangle in pixels.
> > +    \item[height] is the height of the rectangle in pixels.
> > +    \end{description}
> > +\end{description}
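> > +
> > +On receiving this request, the driver would typically choose a frame
> > +buffer count within the advertised range before acknowledging. An
> > +informal sketch (the extra-buffer margin is an arbitrary choice):
> > +
> > +\begin{lstlisting}
> > +/* Sketch: pick a frame buffer count inside the advertised range. */
> > +const struct virtio_vdec_host_req_stream_info *info =
> > +        &host_req->stream_info;
> > +uint32_t min_fb = le32_to_cpu(info->min_frame_buffer_count);
> > +uint32_t max_fb = le32_to_cpu(info->max_frame_buffer_count);
> > +uint32_t want   = min_fb + 2; /* some slack for pipelining */
> > +uint32_t count  = want > max_fb ? max_fb : want;
> > +/* ... allocate 'count' buffers of fb_width x fb_height, send
> > + * VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO, and negotiate the count
> > + * via VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT ... */
> > +\end{lstlisting}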
> > +
> > +The driver responds with VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO once it
> > +has processed the incoming stream information.
> > +
> > +\devicenormative{\paragraph}{Stream Information}{Device Types / Video
> > +  Decode Device / Device Operation / Stream Information}
> > +
> > +\begin{itemize*}
> > +\item After the device sends VIRTIO_VDEC_HOST_REQ_STREAM_INFO, the
> > +  device MUST empty its list of registered frame buffers and skip any
> > +  pending VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests until a
> > +  VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO request is received.
> > +\end{itemize*}
> > +
> > +\drivernormative{\paragraph}{Stream Information}{Device Types / Video
> > +  Decode Device / Device Operation / Stream Information}
> > +
> > +\begin{itemize*}
> > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO
> > +  once it processes a VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > +\item If a VIRTIO_VDEC_HOST_REQ_STREAM_INFO request arrives while the
> > +  driver is waiting for a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request,
> > +  the driver MUST enqueue the corresponding
> > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER again after sending
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for a frame buffer that
> > +  matches the new stream information.
> > +\end{itemize*}
> > +
> > +\subsubsection{Frame Buffer Requests}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Frame Buffer Requests}
> > +
> > +The driver provides frame buffers via
> > +VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER; the device fills them with
> > +decoded frames. This request is similar to V4L2's VIDIOC_QBUF ioctl
> > +for CAPTURE buffers. The following structure is used for the request:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_guest_req_frame_buffer {
> > +        struct virtio_vdec_frame_buffer_plane {
> > +                le32 handle;
> > +                le32 offset;
> > +                le32 stride;
> > +                le32 length;
> > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[planes] is an array of planes that a frame buffer consists of.
> > +    \begin{description}
> > +    \item[handle] is a resource handle to indicate the buffer for a
> > +      plane.
> > +    \item[offset] is the offset in bytes to the place where the
> > +      decoded frame is written.
> > +    \item[stride] is the line stride.
> > +    \item[length] is the size in bytes of the plane area that the
> > +      device can fill with decoded data.
> > +    \end{description}
> > +\end{description}
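> > +
> > +Continuing the NV12 example from the Buffer Registration section
> > +(informal sketch; values are illustrative), queuing that frame buffer
> > +could look like this:
> > +
> > +\begin{lstlisting}
> > +/* Sketch: queue the 1280x720 NV12 frame buffer registered
> > + * earlier (both planes in resource handle 42). */
> > +struct virtio_vdec_guest_req_frame_buffer fb = {
> > +        .planes = {
> > +                { /* Y plane */
> > +                        .handle = cpu_to_le32(42),
> > +                        .offset = cpu_to_le32(0),
> > +                        .stride = cpu_to_le32(1280),
> > +                        .length = cpu_to_le32(1280 * 720),
> > +                },
> > +                { /* interleaved CbCr plane */
> > +                        .handle = cpu_to_le32(42),
> > +                        .offset = cpu_to_le32(1280 * 720),
> > +                        .stride = cpu_to_le32(1280),
> > +                        .length = cpu_to_le32(1280 * 720 / 2),
> > +                },
> > +        },
> > +};
> > +\end{lstlisting}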
> > +
> > +When the device has filled a buffer with decoded frame data, it
> > +notifies the driver with VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER, using the
> > +following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_frame_buffer {
> > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > +        le64 cookie;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[handles] is an array of handles passed via the
> > +  \field{virtio_vdec_guest_req_frame_buffer} in the corresponding
> > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER.
> > +\item[cookie] is the \field{cookie} passed via a
> > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. It identifies the bitstream
> > +  buffer from which the frame was decoded.
> > +\end{description}
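> > +
> > +Because the cookie travels from the bitstream buffer to the decoded
> > +frame, the driver can use it to recover per-frame metadata. An
> > +informal sketch, continuing the timestamp example from the Bitstream
> > +Buffer Requests section:
> > +
> > +\begin{lstlisting}
> > +/* Sketch: recover the presentation timestamp recorded when the
> > + * matching bitstream buffer was queued. */
> > +uint64_t pts_ns = le64_to_cpu(host_req->frame_buffer.cookie);
> > +/* ... hand the decoded frame to the application, stamped with
> > + * pts_ns ... */
> > +\end{lstlisting}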
> > +
> > +\drivernormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > +
> > +\begin{itemize*}
> > +\item Before the driver sends the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
> > +  for a frame buffer, the driver MUST have sent the
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for the buffer.
> > +\item The driver MUST send VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER only
> > +  for a frame buffer that satisfies the restrictions passed via
> > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > +\end{itemize*}
> > +
> > +\devicenormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > +
> > +When the device has successfully processed a frame buffer, it MUST
> > +set \field{cookie} to the value passed via the
> > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request that provided the
> > +corresponding bitstream.
> > +
> > +\subsubsection{Drain}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Drain}
> > +
> > +To ensure that all pending requests are processed and decoded frames
> > +are returned in the appropriate replies, the driver MAY issue a
> > +VIRTIO_VDEC_GUEST_REQ_DRAIN request. The device will continue
> > +processing requests as usual, but once all requests of the given
> > +context preceding the drain request have been processed, the device
> > +will issue a VIRTIO_VDEC_HOST_REQ_DRAINED host request to notify the
> > +driver of this fact.
> > +
> > +\devicenormative{\paragraph}{Drain}{Device Types / Video Decode Device
> > +  / Device Operation / Drain}
> > +
> > +\begin{itemize*}
> > +\item When a VIRTIO_VDEC_GUEST_REQ_DRAIN request arrives, the device
> > +  MUST process all pending guest requests. After processing them, the
> > +  device MUST send VIRTIO_VDEC_HOST_REQ_DRAINED to the driver.
> > +\item While processing a drain request, the device MUST ignore any
> > +  guest requests other than Flush and Close requests. See
> > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > +  and
> > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Instance
> > +    Close}.
> > +\end{itemize*}
> > +
> > +\subsubsection{Flush}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > +
> > +To drop any pending decode requests and decoded frames pending on
> > +output buffers, the driver MAY issue a VIRTIO_VDEC_GUEST_REQ_FLUSH
> > +request. Any frames that would result from the previously enqueued
> > +bitstream data will be dropped and the device will not write to any of
> > +the frame buffers provided earlier.
> > +
> > +\devicenormative{\paragraph}{Flush}{Device Types / Video Decode Device
> > +  / Device Operation / Flush}
> > +
> > +\begin{itemize*}
> > +\item The device MUST send VIRTIO_VDEC_HOST_REQ_FLUSHED when finishing
> > +  the flush process.
> > +\item If a VIRTIO_VDEC_GUEST_REQ_FLUSH request arrives while the
> > +  device is processing guest requests, the device MUST stop
> > +  processing them.
> > +\item The device MUST NOT process any guest request that comes after
> > +  VIRTIO_VDEC_GUEST_REQ_FLUSH until it sends
> > +  VIRTIO_VDEC_HOST_REQ_FLUSHED.
> > +\item The device MUST dequeue all unread requests in outq when a
> > +  VIRTIO_VDEC_GUEST_REQ_FLUSH request arrives.
> > +\end{itemize*}
> > +
> > +\subsubsection{EOS Notification}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / EOS Notification}
> > +
> > +The device sends VIRTIO_VDEC_HOST_REQ_EOS when the end of a stream
> > +(EOS) is reached while decoding.
> > +
> > +This request is similar to the V4L2_EVENT_EOS event for V4L2 devices.
> > +
> > +\subsubsection{Instance Close}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Close}
> > +
> > +If a decoding instance is no longer needed, the driver SHOULD issue a
> > +VIRTIO_VDEC_GUEST_REQ_CLOSE request. This will trigger an implicit
> > +flush operation (as in a VIRTIO_VDEC_GUEST_REQ_FLUSH request; without
> > +a following host request) and free the instance ID and any associated
> > +resources on the device side.
> > +
> > +The instance is successfully terminated once the device responds to
> > +the request. After that, it is guaranteed that any buffers registered
> > +by the driver to the given instance will no longer be accessed by the
> > +device (unless also provided to another instance).
> > +
> > +\subsubsection{Error Reporting}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Error Reporting}
> > +
> > +The device has two ways of reporting errors to the driver: one is for
> > +errors associated with a host request, the other is for global errors.
> > +
> > +If an error is associated with a host request, the device sets
> > +\field{type} in \field{virtio_vdec_host_req} to the type of the
> > +failing request and \field{result} to a value of \field{enum
> > +  virtio_vdec_host_req_result} describing the error.
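> > +
> > +In practice the driver would check \field{result} on every host
> > +request before acting on its payload; an informal sketch
> > +(\field{handle_vdec_error} is a hypothetical helper):
> > +
> > +\begin{lstlisting}
> > +/* Sketch: generic result check on a received host request. */
> > +if (le32_to_cpu(host_req->result) !=
> > +    VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS) {
> > +        /* The operation named by host_req->type failed. */
> > +        handle_vdec_error(host_req); /* hypothetical helper */
> > +        return;
> > +}
> > +\end{lstlisting}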
> > +
> > +If the device cannot continue decoding due to an error that is not
> > +associated with any particular request, the device MUST send
> > +VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR. In that case, the driver
> > +MUST stop sending decoding requests.
> > --
> > 2.23.0.237.gc6a4ce50a0-goog
> >
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [virtio-dev] Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
@ 2019-09-19 11:15     ` Keiichi Watanabe
  0 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-09-19 11:15 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: virtio-dev, Alexandre Courbot, Alex Lau, Dylan Reid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Tomasz Figa, Linux Media Mailing List

Hi Hans,
Thank you for your feedback.

On Thu, Sep 19, 2019 at 6:53 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
> Hi Keiichi,
>
> On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> > [Resending because of some issues with sending to virtio-dev. Sorry for the noise.]
> >
> > This patch proposes virtio specification for new virtio video decode
> > device.
> > This device provides the functionality of hardware accelerated video
> > decoding from encoded video contents provided by the guest into frame
> > buffers accessible by the guest.
> >
> > We have prototype implementation for VMs on Chrome OS:
> > * virtio-vdec device in crosvm [1]
> > * virtio-vdec driver in Linux kernel v4.19 [2]
> >   - This driver follows V4L2 stateful video decoder API [3].
> >
> > Our prototype implementation uses [4], which allows the virtio-vdec
> > device to use buffers allocated by virtio-gpu device.
> >
> > Any feedback would be greatly appreciated. Thank you.
>
> I'm not a virtio expert, but as I understand it the virtio-vdec driver
> looks like a regular v4l2 stateful decoder device to the guest, while
> on the host there is a driver (or something like that) that maps the
> virtio-vdec requests to the actual decoder hardware, right?
>
> What concerns me a bit (but there may be good reasons for this) is that
> this virtio driver is so specific for stateful decoders.
>

We aim to design a platform-independent interface. The virtio-vdec protocol
should eventually be workable regardless of API, OS, and platform.

Our prototype virtio-vdec device translates the virtio-vdec protocol to
Chrome's video decode acceleration API instead of talking to hardware decoders
directly. This Chrome API is an abstraction layer over multiple decoder APIs
on Linux, such as V4L2 stateful, V4L2 slice, and Intel's VAAPI.

That is to say, the guest driver translates the V4L2 stateful API to
virtio-vdec, and the host device translates virtio-vdec to Chrome's API. So I
could say that this is already more general than a mere V4L2 stateful API
wrapper.

I'd appreciate it if you could let me know if some parts are still specific to V4L2.


> How does this scale to stateful encoders? Stateless codecs? Other M2M
> devices like deinterlacers or colorspace converters? What about webcams?
>

We're designing a virtio protocol for encoders as well, but we are at an
early stage, so I'm not sure whether we should or can handle decoding and
encoding in one protocol. I don't have any plans for other media devices.


> In other words, I would like to see a bigger picture here.
>
> Note that there is also an effort for Xen to expose webcams to a guest:
>
> https://www.spinics.net/lists/linux-media/msg148629.html
>

Good to know. Thanks.

> This may or may not be of interest. This was an RFC only, and I haven't
> seen any follow-up patches with actual code.
>
> There will be a half-day meeting of media developers during the ELCE
> in October about codecs. I know Alexandre and Tomasz will be there.
> It might be a good idea to discuss this in more detail if needed.
>

Sounds good. They are closely working with me.

Regards,
Keiichi

> Regards,
>
>         Hans
>
> >
> > [1] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> > [2] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> > [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be merged to Linux 5.4)
> > [4] https://lkml.org/lkml/2019/9/12/157
> >
> > Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> > ---
> >  content.tex     |   1 +
> >  virtio-vdec.tex | 750 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 751 insertions(+)
> >  create mode 100644 virtio-vdec.tex
> >
> > diff --git a/content.tex b/content.tex
> > index 37a2190..b57d4a9 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> >  \input{virtio-input.tex}
> >  \input{virtio-crypto.tex}
> >  \input{virtio-vsock.tex}
> > +\input{virtio-vdec.tex}
> >
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> > diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> > new file mode 100644
> > index 0000000..d117129
> > --- /dev/null
> > +++ b/virtio-vdec.tex
> > @@ -0,0 +1,750 @@
> > +\section{Video Decode Device}
> > +\label{sec:Device Types / Video Decode Device}
> > +
> > +virtio-vdec is a virtio based video decoder. This device provides the
> > +functionality of hardware accelerated video decoding from encoded
> > +video contents provided by the guest into frame buffers accessible by
> > +the guest.
> > +
> > +\subsection{Device ID}
> > +\label{sec:Device Types / Video Decode Device / Device ID}
> > +
> > +28
> > +
> > +\subsection{Virtqueues}
> > +\label{sec:Device Types / Video Decode Device / Virtqueues}
> > +
> > +\begin{description}
> > +\item[0] outq - queue for sending requests from the driver to the
> > +  device
> > +\item[1] inq - queue for sending requests from the device to the
> > +  driver
> > +\end{description}
> > +
> > +Each queue is used uni-directionally. outq is used to send requests
> > +from the driver to the device (i.e., guest requests) and inq is used
> > +to send requests in the other direction (i.e., host requests).
> > +
> > +\subsection{Feature bits}
> > +\label{sec:Device Types / Video Decode Device / Feature bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}
> > +\label{sec:Device Types / Video Decode Device / Device configuration layout}
> > +
> > +None.
> > +
> > +\subsection{Device Requirements: Device Initialization}
> > +\label{sec:Device Types / Video Decode Device / Device Requirements: Device Initialization}
> > +
> > +The virtqueues are initialized.
> > +
> > +\subsection{Device Operation}
> > +\label{sec:Device Types / Video Decode Device / Device Operation}
> > +
> > +\subsubsection{Video Buffers}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
> > +
> > +A virtio-vdec driver and a device use two types of video buffers:
> > +\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
> > +contains encoded video stream data. This buffer is similar to an
> > +OUTPUT buffer for Video for Linux Two (V4L2) API. A frame buffer
> > +contains decoded video frame data like a CAPTURE buffers for V4L2 API.
> > +The driver and the device share these buffers, and each buffer is
> > +identified by a unique integer called a \emph{resource handle}.
> > +
> > +\subsubsection{Guest Request}
> > +
> > +The driver queues requests to the outq virtqueue. The device MAY
> > +process requests out-of-order. All requests on outq use the following
> > +structure:
> > +
> > +\begin{lstlisting}
> > +enum virtio_vdec_guest_req_type {
> > +        VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
> > +
> > +        /* Global */
> > +        VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
> > +
> > +        /* Per instance */
> > +        VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
> > +        VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
> > +        VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
> > +        VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
> > +        VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
> > +        VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
> > +        VIRTIO_VDEC_GUEST_REQ_DRAIN,
> > +        VIRTIO_VDEC_GUEST_REQ_FLUSH,
> > +        VIRTIO_VDEC_GUEST_REQ_CLOSE,
> > +};
> > +
> > +struct virtio_vdec_guest_req {
> > +        le32 type;
> > +        le32 instance_id;
> > +        union {
> > +                struct virtio_vdec_guest_req_open open;
> > +                struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
> > +                struct virtio_vdec_guest_req_register_buffer register_buffer;
> > +                struct virtio_vdec_guest_req_bitstream_buffer bitstream_buffer;
> > +                struct virtio_vdec_guest_req_frame_buffer frame_buffer;
> > +        };
> > +};
> > +\end{lstlisting}
> > +
> > +This structure includes the following generic fields to identify the
> > +request:
> > +\begin{description}
> > +\item[\field{type}] specifies the type of the guest request using
> > +  values from \field{enum virtio_vdec_guest_req_type}.
> > +\item[\field{instance_id}] specifies an instance ID. This field is
> > +  ignored unless \field{type} represents a per-instance request.
> > +\end{description}
> > +
> > +The union fields are additional per-request data used only when
> > +\field{type} has a specific value:
> > +
> > +\begin{description}
> > +\item[\field{open}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_OPEN. See \ref{sec:Device Types / Video Decode
> > +    Device / Device Operation / Instance Open}.
> > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > +\item[\field{register_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Buffer Registration}.
> > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Bitstream Buffer
> > +    Requests}.
> > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Frame Buffer Requests}.
> > +\end{description}
> > +
> > +\subsubsection{Host Request}
> > +
> > +The device queues requests to the inq virtqueue. All requests on inq
> > +use the following structure:
> > +
> > +\begin{lstlisting}
> > +enum virtio_vdec_host_req_type {
> > +        VIRTIO_VDEC_HOST_REQ_UNDEFINED = 0,
> > +
> > +        /* Global */
> > +        VIRTIO_VDEC_HOST_REQ_QUERY = 0x0100,
> > +
> > +        /* Per instance */
> > +        VIRTIO_VDEC_HOST_REQ_OPEN = 0x0200,
> > +        VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT,
> > +        VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER,
> > +        VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER,
> > +        VIRTIO_VDEC_HOST_REQ_STREAM_INFO,
> > +        VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER,
> > +        VIRTIO_VDEC_HOST_REQ_DRAINED,
> > +        VIRTIO_VDEC_HOST_REQ_FLUSHED,
> > +        VIRTIO_VDEC_HOST_REQ_CLOSE,
> > +        VIRTIO_VDEC_HOST_REQ_EOS,
> > +
> > +        /* Error response */
> > +        VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR = 0x1100,
> > +};
> > +
> > +enum virtio_vdec_host_req_result {
> > +        /* Success */
> > +        VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS = 0,
> > +
> > +        /* Error */
> > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_UNSPEC = 0x1000,
> > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_INVALID_REQUEST,
> > +};
> > +
> > +struct virtio_vdec_host_req {
> > +        le32 type;     /* VIRTIO_VDEC_HOST_REQ_* */
> > +        le32 result;   /* VIRTIO_VDEC_HOST_REQ_RESULT_ */
> > +        le32 instance_id;
> > +        le32 reserved; /* for 64-bit alignment */
> > +        union {
> > +                struct virtio_vdec_host_req_query query;
> > +                struct virtio_vdec_host_req_set_buffer_count set_buffer_count;
> > +                struct virtio_vdec_host_req_register_buffer register_buffer;
> > +                struct virtio_vdec_host_req_bitstream_buffer bitstream_buffer;
> > +                struct virtio_vdec_host_req_stream_info stream_info;
> > +                struct virtio_vdec_host_req_frame_buffer frame_buffer;
> > +        };
> > +};
> > +\end{lstlisting}
> > +
> > +This structure includes the following generic fields:
> > +\begin{description}
> > +\item[\field{type}] specifies the type of the host request using
> > +  values from \field{enum virtio_vdec_host_req_type}.
> > +\item[\field{result}] specifies the result of the operation using the
> > +  \field{enum virtio_vdec_host_req_result}.
> > +\item[\field{instance_id}] specifies an instance ID. This field is
> > +  ignored unless \field{type} represents a per-instance request.
> > +\end{description}
> > +
> > +The union fields are additional per-request data used only when
> > +\field{type} has a specific value:
> > +
> > +\begin{description}
> > +\item[\field{query}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_QUERY. See \ref{sec:Device Types / Video Decode
> > +    Device / Device Operation / Query Capabilities}.
> > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > +\item[\field{register_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Buffer Registration}.
> > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > +    Video Decode Device / Device Operation / Bitstream Buffer
> > +    Requests}.
> > +\item[\field{stream_info}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types / Video
> > +    Decode Device / Device Operation / Stream Information}.
> > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > +  VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER. See \ref{sec:Device Types / Video
> > +    Decode Device / Device Operation / Frame Buffer Requests}.
> > +\end{description}
> > +
> > +\subsubsection{Query Capabilities}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Query Capabilities}
> > +
> > +The driver can send VIRTIO_VDEC_GUEST_REQ_QUERY to query supported
> > +bitstream formats and frame formats. This is similar to
> > +VIDIOC_ENUM_FMT ioctl and VIDIOC_ENUM_FRAMESIZES ioctl for V4L2
> > +devices.
> > +
> > +The device sends supported formats as VIRTIO_VDEC_HOST_REQ_QUERY with
> > +the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_range {
> > +        le32 min;
> > +        le32 max;
> > +        le32 step;
> > +};
> > +
> > +struct virtio_vdec_format_desc {
> > +        le32 fourcc;
> > +        le32 mask;
> > +        struct virtio_vdec_range width;
> > +        struct virtio_vdec_range height;
> > +};
> > +
> > +#define VIRTIO_VDEC_NUM_FORMATS 32
> > +
> > +struct virtio_vdec_host_req_query {
> > +        struct virtio_vdec_format_desc bitstream_formats[VIRTIO_VDEC_NUM_FORMATS];
> > +        struct virtio_vdec_format_desc frame_formats[VIRTIO_VDEC_NUM_FORMATS];
> > +};
> > +\end{lstlisting}
> > +
> > +The \field{virtio_vdec_host_req_query} describes the set of supported
> > +bitstream and frame formats, with corresponding ranges of image
> > +resolution and compatibility mask to determine the compatibility
> > +between coded and raw formats.
> > +
> > +\begin{description}
> > +\item[fourcc] specifies a FOURCC of a supported video codec or a pixel
> > +  format.
> > +\item[mask] is used to represent supported combination of video codec
> > +  and pixel format. If i-th bit is set in \field{mask} of the
> > +  \field{bitstream_formats[j]}, the device MUST support decoding from
> > +  the video codec of \field{bitstream_formats[j]} to the pixel format
> > +  of \field{frame_formats[i]}.
> > +\item[width/height] represents either a supported stream resolution or
> > +  a supported frame buffer resolution.
> > +\end{description}
> > +
> > +\devicenormative{\paragraph}{Query Capabilities}{Device Types / Video
> > +  Decode Device / Device Operation / Query Capabilities}
> > +
> > +The device MUST set \field{mask} to 0 if the \field{struct
> > +  virtio_vdec_format_desc} is invalid in VIRTIO_VDEC_HOST_REQ_QUERY.
> > +
> > +\subsubsection{Instance Open}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Open}
> > +
> > +To acquire a decoder instance for a given coded format, the driver
> > +sends a VIRTIO_VDEC_GUEST_REQ_OPEN request. The driver specifies a
> > +video codec with the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_guest_req_open {
> > +        le32 bitstream_fourcc;
> > +};
> > +\end{lstlisting}
> > +
> > +When the guest request comes, the device allocates the instance and
> > +sends a VIRTIO_VDEC_HOST_REQ_OPEN request.
> > +
> > +Once the driver recieves VIRTIO_VDEC_HOST_REQ_OPEN, the driver can
> > +send per-instance guest requests to the opened instance by specifying
> > +\field{instance_id}. The device MUST process per-instance requests
> > +with a same value of \field{instance_id} in order.
> > +
> > +\drivernormative{\paragraph}{Instance Open}{Device Types / Video
> > +  Decode Device / Device Operation / Instance Open}
> > +
> > +In VIRTIO_VDEC_GUEST_REQ_OPEN,
> > +\begin{itemize*}
> > +\item The driver MUST set \field{bitstream_fourcc} to a FOURCC of a
> > +  video codec that is supported by the device.
> > +\item The driver MUST set \field{instance_id} to an integer that is
> > +  not used as an \field{instance_id} before.
> > +\end{itemize*}
> > +
> > +\devicenormative{\paragraph}{Instance Open}{Device Types / Video
> > +  Decode Device / Device Operation / Instance Open}
> > +
> > +If the device fails to allocate an instance, the device MUST set
> > +\field{result} to a value other than
> > +VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS.
> > +
> > +\subsubsection{Set Buffer Count}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Set Buffer Count}
> > +
> > +Once an instance is acquired, the driver and the device negotiate the
> > +number of buffers to use during the decoding process by
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT and
> > +VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. These requests are similar to
> > +VIDIOC_REQBUFS controls for V4L2 devices. First, the driver sends
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to tell how many buffers the
> > +driver needs by using the following structure:
> > +
> > +\begin{lstlisting}
> > +enum virtio_vdec_buffer_type {
> > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM = 0,
> > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME = 1,
> > +};
> > +
> > +struct virtio_vdec_guest_req_set_buffer_count {
> > +        le32 type;
> > +        le32 buffer_count;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[buffer_count] is the minimum number of buffers that the driver
> > +  needs to use.
> > +\end{description}
> > +
> > +The device responds the VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to
> > +notify the number of buffers that it can accept with the following
> > +structure.
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_set_buffer_count {
> > +        le32 type;
> > +        le32 buffer_count;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[buffer_count] is the maximum number of buffers that the device
> > +  can accept.
> > +\end{description}
> > +
> > +\drivernormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > +  Decode Device / Device Operation / Set Buffer Count}
> > +
> > +The driver MUST set \field{buffer_count} to a non-zero value in
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. The value of
> > +\field{buffer_count} for frame buffers MUST match the requirement
> > +given via VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types
> > +  / Video Decode Device / Device Operation / Stream Information}.
> > +
> > +\devicenormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > +  Decode Device / Device Operation / Set Buffer Count}
> > +
> > +The device MUST set \field{buffer_count} to a number of buffers that
> > +the device can accept. The value MAY be different from the value the
> > +driver set \field{buffer_count} to in the corresponding
> > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT request.
> > +
> > +\subsubsection{Buffer Registration}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffer Registration}
> > +
> > +Once the number of buffers is set, the driver registers metadata of
> > +buffers via the VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER with the
> > +following structure:
> > +
> > +\begin{lstlisting}
> > +#define VIRTIO_VDEC_MAX_PLANES 8
> > +
> > +struct virtio_vdec_guest_req_register_buffer {
> > +        le32 type;
> > +        le32 num_planes;
> > +        struct virtio_vdec_plane {
> > +                le32 handle;
> > +                le32 offset;
> > +                le32 length;
> > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[VIRTIO_VDEC_MAX_PLANES] is the maximum number of planes that a
> > +  video format needs. It is similar to VIDEO_MAX_PLANES in
> > +  \href{https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/videodev2.h}{include/uapi/linux/videodev2.h}
> > +  in the Linux source tree.
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[num_planes] is the number of planes (i.e. separate memory
> > +  buffers) for this format. This indicates the number of valid entries
> > +  in \field{planes}.
> > +\item[planes] is an array of structures describing information of each
> > +  plane.
> > +  \begin{description}
> > +  \item[handle] is a resource handle for the buffer.
> > +  \item[offset] equals to the offset in the plane to the start of
> > +    data.
> > +  \item[length] is the size of this plane in bytes.
> > +  \end{description}
> > +\end{description}
> > +
> > +Once the device accepts the buffer registration, the device responds
> > +the VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER with the following structure.
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_register_buffer {
> > +        le32 type;
> > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[type] specifies a buffer type with a value of the
> > +  \field{virtio_vdec_buffer_type}.
> > +\item[handles] is an array of resource handles from the corresponding
> > +  guest request.
> > +\end{description}
> > +
> > +\drivernormative{\paragraph}{Buffer Registration}{Device Types / Video
> > +  Decode Device / Device Operation / Buffer Registration}
> > +
> > +The VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER requires followings:
> > +
> > +\begin{itemize*}
> > +\item The driver MUST set \field{type}.
> > +\item The driver MUST set \field{num_planes} to a non-zero value that
> > +  is less than of equal to VIRTIO_VDEC_MAX_PLANES.
> > +\item The driver MUST fill out the fields of \field{planes[i]} for
> > +  each \(0 \le i < \field{num_planes} \).
> > +\item The driver MUST NOT send VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER
> > +  more than the number of times that the device requested via
> > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT.
> > +\end{itemize*}
> > +
> > +\devicenormative{\paragraph}{Buffer Registration}{Device Types / Video
> > +  Decode Device / Device Operation / Buffer Registration}
> > +
> > +The VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER requires followings:
> > +
> > +\begin{itemize*}
> > +\item The device MUST set \field{type} to the value that the
> > +  corresponding VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER used.
> > +\item The device MUST set \field{handles[i]} to the value that equals
> > +  to \field{register_buffer.planes[i].handle} in the corresponding
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > +\end{itemize*}
> > +
> > +\subsubsection{Bitstream Buffer Requests}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > +
> > +The driver sends VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER to ask the
> > +device to decode contents in a registered bitstream buffer. This
> > +request is similar to the V4L2's VIDIOC_QBUF ioctl for the OUTPUT
> > +buffer. The following structure is used to specify the buffer:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_guest_req_bitstream_buffer {
> > +        le32 handle;
> > +        le32 offset;
> > +        le32 length;
> > +        le64 cookie;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[handle] is a resource handle for the bitstream buffer to be
> > +  decoded.
> > +\item[offset] is the offset in bytes to video data in the buffer.
> > +\item[length] is the size of valid data in the buffer.
> > +\item[cookie] is an identifier for the frame decoded from this
> > +  bitstream data. This cookie will be copied to the matching frame
> > +  buffer. See also \field{virtio_vdec_host_req_frame_buffer} in
> > +  \ref{sec:Device Types / Video Decode Device / Device Operation /
> > +    Frame Buffer Requests}.
> > +\end{description}
> > +
> > +Once the device consumes the bitstream passed by the
> > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request, it sends
> > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER with the following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_bitstream_buffer {
> > +        le32 handle;
> > +};
> > +\end{lstlisting}
> > +\begin{description}
> > +\item[handle] is a resource handle of the bitstream buffer that the
> > +  device completed processing.
> > +\end{description}
> > +
> > +The driver can reuse bitstream buffers that
> > +VIRTIO_VDEC_HOST_REQ_BITSTREAM is sent for.
> > +
> > +\drivernormative{\paragraph}{Bitstream Buffer Requests}{Device Types /
> > +  Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > +
> > +\begin{itemize*}
> > +\item The driver MUST use \field{handle} which is already registered
> > +  by the VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > +\end{itemize*}
> > +
> > +\subsubsection{Stream Information}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Stream Information}
> > +
> > +To obtain the information required to allocate frame buffers
> > +(resolution, etc.), the driver registers bitstream buffers first and
> > +proceeds with queuing VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER requests
> > +until stream information is received via a
> > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO host request. The
> > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO is also sent when a run-time
> > +resolution change is detected. This request is similar to
> > +V4L2_EVENT_SOURCE_CHANGE events for V4L2 devices.
> > +
> > +The following structure is used for the
> > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_stream_info {
> > +        le32 frame_fourcc;
> > +        le32 fb_width;
> > +        le32 fb_height;
> > +        le32 min_frame_buffer_count;
> > +        le32 max_frame_buffer_count;
> > +        struct virtio_vdec_host_req_stream_crop {
> > +                le32 left;
> > +                le32 top;
> > +                le32 width;
> > +                le32 height;
> > +        } crop;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[frame_fourcc] indicates the pixel format that the video is
> > +  decoded to.
> > +\item[fb_width] is the width of a required frame buffer.
> > +\item[fb_height] is the height of a required frame buffer.
> > +\item[min_frame_buffer_count] is the minimum number of frame buffers
> > +  that the device requires.
> > +\item[max_frame_buffer_count] is the maximum number of frame buffers
> > +  that the device can accept.
> > +\item[crop] indicates video cropping information.
> > +    \begin{description}
> > +    \item[left] is the horizontal offset of the left-most valid pixel
> > +      of the video in pixels.
> > +    \item[top] is the vertical offset of the top-most valid line of
> > +      the video in lines.
> > +    \item[width] is the width of the rectangle in pixels.
> > +    \item[height] is the height of the rectangle in pixels.
> > +    \end{description}
> > +\end{description}
> > +
> > +The driver responds the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO once it
> > +processes the incoming stream information.
> > +
> > +\devicenormative{\paragraph}{Stream Information}{Device Types / Video
> > +  Decode Device / Device Operation / Stream Information}
> > +
> > +\begin{itemize*}
> > +\item After the device send the VIRTIO_VDEC_HOST_REQ_STREAM_INFO, the
> > +  device MUST empty its list of registered frame buffers and skip any
> > +  pending VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests until a
> > +  VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO request is received.
> > +\end{itemize*}
> > +
> > +\drivernormative{\paragraph}{Stream Information}{Device Types / Video
> > +  Decode Device / Device Operation / Stream Information}
> > +
> > +\begin{itemize*}
> > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO
> > +  once it processes a VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > +\item If VIRTIO_VDEC_HOST_REQ_STREAM_INFO came when the driver is
> > +  waiting for a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request, the driver
> > +  MUST enqueue the corresponding VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
> > +  again after sending the VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for a
> > +  frame buffer with new stream information.
> > +\end{itemize*}
> > +
> > +\subsubsection{Frame Buffer Requests}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Frame Buffer Requests}
> > +
> > +The driver provides frame buffers via
> > +VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER, in which the device will fill
> > +decoded frames. This request is similar to V4L2's VIDIOC_QBUF ioctl
> > +for CAPTURE buffers. The following structure is used for the request:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_guest_req_frame_buffer {
> > +        struct virtio_vdec_frame_buffer_plane {
> > +                le32 handle;
> > +                le32 offset;
> > +                le32 stride;
> > +                le32 length;
> > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[planes] is an array of planes that a frame buffer consists of.
> > +    \begin{description}
> > +    \item[handle] is a resource handle to indicate the buffer for a
> > +      plane.
> > +    \item[offset] is the offset in bytes to the place where decoded
> > +      frame is filled in.
> > +    \item[stride] is the line stride.
> > +    \item[length] is the length of plane where the device can fill in
> > +      decoded data.
> > +    \end{description}
> > +\end{description}
> > +
> > +When the device fills the buffer with decoded frame data, the device
> > +notifies to the driver by VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER with the
> > +following structure:
> > +
> > +\begin{lstlisting}
> > +struct virtio_vdec_host_req_frame_buffer {
> > +        le32 handle[VIRTIO_VDEC_MAX_PLANES];
> > +        le64 cookie;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[handles] is an array of handles passed via the
> > +  \field{virtio_vdec_guest_req_frame_buffer} in the corresponding
> > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER.
> > +\item[cookie] is a \field{cookie} passed via a
> > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. This indicates a bitstream
> > +  buffer which a frame was decoded from.
> > +\end{description}
> > +
> > +\drivernormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > +
> > +\begin{itemize*}
> > +\item Before the driver sends the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
> > +  for a frame buffer, the driver MUST have sent the
> > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for the buffer.
> > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER for
> > +  a frame buffer that satisfies the restriction passed by
> > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > +\end{itemize*}
> > +
> > +\devicenormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > +
> > +When the device successfully processed the frame buffer, it MUST set
> > +\field{cookie} to a value that is passed via a
> > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request that provided the
> > +corresponding bitstream.
> > +
> > +\subsubsection{Drain}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Drain}
> > +
> > +To ensure that any pending requests are processed and decoded frames
> > +returned in appropriate replies, the driver MAY issue a
> > +VIRTIO_VDEC_GUEST_REQ_DRAIN request. The device will continue
> > +processing the requests as usual, but once all the requests of given
> > +context preceding the DRAIN request are processed, the device will
> > +issue a VIRTIO_VDEC_HOST_REQ_DRAINED host request to notify the driver
> > +of this fact.
> > +
> > +\devicenormative{\paragraph}{Drain}{Device Types / Video Decode Device
> > +  / Device Operation / Drain}
> > +
> > +\begin{itemize*}
> > +\item When VIRTIO_VDEC_GUEST_REQ_DRAIN comes, the device MUST process
> > +  all pending guest requests. After processing them, the device MUST
> > +  send VIRTIO_VDEC_HOST_REQ_DRAINED to the driver.
> > +\item While processing a drain request, the device MUST ignore any
> > +  guest requests other than Flush and Close requests. See
> > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > +  and
> > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Instance
> > +    Close}.
> > +\end{itemize*}
> > +
> > +\subsubsection{Flush}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > +
> > +To drop any pending decode requests and decoded frames pending on
> > +output buffers, the driver MAY issue a VIRTIO_VDEC_GUEST_REQ_FLUSH
> > +request. Any frames that would result from the previously enqueued
> > +bitstream data will be dropped and the device will not write to any of
> > +the frame buffers provided earlier.
> > +
> > +\devicenormative{\paragraph}{Flush}{Device Types / Video Decode Device
> > +  / Device Operation / Flush}
> > +
> > +\begin{itemize*}
> > +\item The device MUST send VIRTIO_VDEC_HOST_REQ_FLUSHED when finishing
> > +  the flush process.
> > +\item If VIRTIO_VDEC_GUEST_REQ_FLUSH comes while processing guest
> > +  requests, the device MUST stop them.
> > +\item The device MUST NOT process any guest request that comes after
> > +  VIRTIO_VDEC_GUEST_REQ_FLUSH until it sends
> > +  VIRTIO_VDEC_HOST_REQ_FLUSHED.
> > +\item The device MUST dequeue all unread requests in outq when
> > +  VIRTIO_VDEC_GUEST_REQ_FLUSH comes.
> > +\end{itemize*}
> > +
> > +\subsubsection{EOS Notification}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / EOS Notification}
> > +
> > +The device sends VIRTIO_VDEC_HOST_REQ_EOS when the end of a stream
> > +(EOS) is reached while decoding.
> > +
> > +This request is similar to the V4L2_EVENT_EOS event for V4L2 devices.
> > +
> > +\subsubsection{Instance Close}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Close}
> > +
> > +If a decoding instance is no longer needed, the driver SHOULD issue a
> > +VIRTIO_VDEC_GUEST_REQ_CLOSE request. This will trigger an implicit
> > +flush operation (as in a VIRTIO_VDEC_GUEST_REQ_FLUSH request; without
> > +a following host request) and free the instance ID and any associated
> > +resources on the device side.
> > +
> > +The instance is successfully terminated once the device responds to
> > +the request. After that, it is guaranteed that any buffers registered
> > +by the driver to the given instance will no longer be accessed by the
> > +device (unless also provided to another instance).
> > +
> > +\subsubsection{Error Reporting}
> > +\label{sec:Device Types / Video Decode Device / Device Operation / Error Reporting}
> > +
> > +The device has two ways of reporting errors to the driver: one is for
> > +errors associated with a host request, the other is for global errors.
> > +
> > +If an error is associated with a host request, the device sets
> > +\field{type} to the type of that host request and \field{result} to
> > +a value of \field{enum virtio_vdec_host_req_result} that describes
> > +the error in the \field{virtio_vdec_host_req} structure.
> > +
> > +If the device cannot proceed with decoding due to an error that is
> > +not associated with any request, the device MUST send
> > +VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR. In that case, the driver
> > +MUST stop sending decode requests.
> > --
> > 2.23.0.237.gc6a4ce50a0-goog
> >
>



* Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-09-19 11:15     ` [virtio-dev] " Keiichi Watanabe
@ 2019-09-19 11:17       ` Keiichi Watanabe
  -1 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-09-19 11:17 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: virtio-dev, Alexandre Courbot, Alex Lau, Dylan Reid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Tomasz Figa, Linux Media Mailing List

I shared PDF version of this RFC on Google drive:
https://drive.google.com/drive/folders/1hed-mTVI7dj0M_iab4DTfx5kPMLoeX8R

On Thu, Sep 19, 2019 at 8:15 PM Keiichi Watanabe <keiichiw@chromium.org> wrote:
>
> Hi Hans,
> Thank you for your feedback.
>
> On Thu, Sep 19, 2019 at 6:53 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >
> > Hi Keiichi,
> >
> > On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> > > [Resending because of some issues with sending to virtio-dev. Sorry for the noise.]
> > >
> > > This patch proposes virtio specification for new virtio video decode
> > > device.
> > > This device provides the functionality of hardware accelerated video
> > > decoding from encoded video contents provided by the guest into frame
> > > buffers accessible by the guest.
> > >
> > > We have prototype implementation for VMs on Chrome OS:
> > > * virtio-vdec device in crosvm [1]
> > > * virtio-vdec driver in Linux kernel v4.19 [2]
> > >   - This driver follows V4L2 stateful video decoder API [3].
> > >
> > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > device to use buffers allocated by virtio-gpu device.
> > >
> > > Any feedback would be greatly appreciated. Thank you.
> >
> > I'm not a virtio expert, but as I understand it the virtio-vdec driver
> > looks like a regular v4l2 stateful decoder device to the guest, while
> > on the host there is a driver (or something like that) that maps the
> > virtio-vdec requests to the actual decoder hardware, right?
> >
> > What concerns me a bit (but there may be good reasons for this) is that
> > this virtio driver is so specific for stateful decoders.
> >
>
> We aim to design a platform-independent interface. The virtio-vdec protocol
> should eventually be designed to work regardless of the API, OS, and platform.
>
> Our prototype virtio-vdec device translates the virtio-vdec protocol to
> Chrome's video decode acceleration API instead of talking to hardware decoders
> directly. This Chrome API is an abstraction layer for multiple decoder APIs on
> Linux, such as V4L2 stateful, V4L2 slice, and Intel's VAAPI.
>
> That is to say, the guest driver translates the V4L2 stateful API to virtio-vdec and
> the host device translates virtio-vdec to Chrome's API. So, I could say that
> this is already more general than a mere V4L2 stateful API wrapper, at least.
>
> I'd appreciate it if you could let me know if any parts are still specific to V4L2.
>
>
> > How does this scale to stateful encoders? Stateless codecs? Other M2M
> > devices like deinterlacers or colorspace converters? What about webcams?
> >
>
> We're designing a virtio protocol for encoders as well, but we are at an early
> stage. So, I'm not sure if we should/can handle decoder and encoder in one
> protocol. I don't have any plans for other media devices.
>
>
> > In other words, I would like to see a bigger picture here.
> >
> > Note that there is also an effort for Xen to expose webcams to a guest:
> >
> > https://www.spinics.net/lists/linux-media/msg148629.html
> >
>
> Good to know. Thanks.
>
> > This may or may not be of interest. This was an RFC only, and I haven't
> > seen any follow-up patches with actual code.
> >
> > There will be a half-day meeting of media developers during the ELCE
> > in October about codecs. I know Alexandre and Tomasz will be there.
> > It might be a good idea to discuss this in more detail if needed.
> >
>
> Sounds good. They are closely working with me.
>
> Regards,
> Keiichi
>
> > Regards,
> >
> >         Hans
> >
> > >
> > > [1] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> > > [2] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> > > [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be merged to Linux 5.4)
> > > [4] https://lkml.org/lkml/2019/9/12/157
> > >
> > > Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> > > ---
> > >  content.tex     |   1 +
> > >  virtio-vdec.tex | 750 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 751 insertions(+)
> > >  create mode 100644 virtio-vdec.tex
> > >
> > > diff --git a/content.tex b/content.tex
> > > index 37a2190..b57d4a9 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> > >  \input{virtio-input.tex}
> > >  \input{virtio-crypto.tex}
> > >  \input{virtio-vsock.tex}
> > > +\input{virtio-vdec.tex}
> > >
> > >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >
> > > diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> > > new file mode 100644
> > > index 0000000..d117129
> > > --- /dev/null
> > > +++ b/virtio-vdec.tex
> > > @@ -0,0 +1,750 @@
> > > +\section{Video Decode Device}
> > > +\label{sec:Device Types / Video Decode Device}
> > > +
> > > +virtio-vdec is a virtio based video decoder. This device provides the
> > > +functionality of hardware accelerated video decoding from encoded
> > > +video contents provided by the guest into frame buffers accessible by
> > > +the guest.
> > > +
> > > +\subsection{Device ID}
> > > +\label{sec:Device Types / Video Decode Device / Device ID}
> > > +
> > > +28
> > > +
> > > +\subsection{Virtqueues}
> > > +\label{sec:Device Types / Video Decode Device / Virtqueues}
> > > +
> > > +\begin{description}
> > > +\item[0] outq - queue for sending requests from the driver to the
> > > +  device
> > > +\item[1] inq - queue for sending requests from the device to the
> > > +  driver
> > > +\end{description}
> > > +
> > > +Each queue is used uni-directionally. outq is used to send requests
> > > +from the driver to the device (i.e., guest requests) and inq is used
> > > +to send requests in the other direction (i.e., host requests).
> > > +
> > > +\subsection{Feature bits}
> > > +\label{sec:Device Types / Video Decode Device / Feature bits}
> > > +
> > > +There are currently no feature bits defined for this device.
> > > +
> > > +\subsection{Device configuration layout}
> > > +\label{sec:Device Types / Video Decode Device / Device configuration layout}
> > > +
> > > +None.
> > > +
> > > +\subsection{Device Requirements: Device Initialization}
> > > +\label{sec:Device Types / Video Decode Device / Device Requirements: Device Initialization}
> > > +
> > > +The virtqueues are initialized.
> > > +
> > > +\subsection{Device Operation}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation}
> > > +
> > > +\subsubsection{Video Buffers}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
> > > +
> > > +A virtio-vdec driver and a device use two types of video buffers:
> > > +\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
> > > +contains encoded video stream data. This buffer is similar to an
> > > +OUTPUT buffer for the Video for Linux Two (V4L2) API. A frame buffer
> > > +contains decoded video frame data, like a CAPTURE buffer for the V4L2 API.
> > > +The driver and the device share these buffers, and each buffer is
> > > +identified by a unique integer called a \emph{resource handle}.
> > > +
> > > +\subsubsection{Guest Request}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Guest Request}
> > > +
> > > +The driver queues requests to the outq virtqueue. The device MAY
> > > +process requests out-of-order. All requests on outq use the following
> > > +structure:
> > > +
> > > +\begin{lstlisting}
> > > +enum virtio_vdec_guest_req_type {
> > > +        VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
> > > +
> > > +        /* Global */
> > > +        VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
> > > +
> > > +        /* Per instance */
> > > +        VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
> > > +        VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
> > > +        VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
> > > +        VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
> > > +        VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
> > > +        VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
> > > +        VIRTIO_VDEC_GUEST_REQ_DRAIN,
> > > +        VIRTIO_VDEC_GUEST_REQ_FLUSH,
> > > +        VIRTIO_VDEC_GUEST_REQ_CLOSE,
> > > +};
> > > +
> > > +struct virtio_vdec_guest_req {
> > > +        le32 type;
> > > +        le32 instance_id;
> > > +        union {
> > > +                struct virtio_vdec_guest_req_open open;
> > > +                struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
> > > +                struct virtio_vdec_guest_req_register_buffer register_buffer;
> > > +                struct virtio_vdec_guest_req_bitstream_buffer bitstream_buffer;
> > > +                struct virtio_vdec_guest_req_frame_buffer frame_buffer;
> > > +        };
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +This structure includes the following generic fields to identify the
> > > +request:
> > > +\begin{description}
> > > +\item[\field{type}] specifies the type of the guest request using
> > > +  values from \field{enum virtio_vdec_guest_req_type}.
> > > +\item[\field{instance_id}] specifies an instance ID. This field is
> > > +  ignored unless \field{type} represents a per-instance request.
> > > +\end{description}
> > > +
> > > +The union fields are additional per-request data used only when
> > > +\field{type} has a specific value:
> > > +
> > > +\begin{description}
> > > +\item[\field{open}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_OPEN. See \ref{sec:Device Types / Video Decode
> > > +    Device / Device Operation / Instance Open}.
> > > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > > +\item[\field{register_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Buffer Registration}.
> > > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Bitstream Buffer
> > > +    Requests}.
> > > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Frame Buffer Requests}.
> > > +\end{description}
> > > +
> > > +\subsubsection{Host Request}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Host Request}
> > > +
> > > +The device queues requests to the inq virtqueue. All requests on inq
> > > +use the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +enum virtio_vdec_host_req_type {
> > > +        VIRTIO_VDEC_HOST_REQ_UNDEFINED = 0,
> > > +
> > > +        /* Global */
> > > +        VIRTIO_VDEC_HOST_REQ_QUERY = 0x0100,
> > > +
> > > +        /* Per instance */
> > > +        VIRTIO_VDEC_HOST_REQ_OPEN = 0x0200,
> > > +        VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT,
> > > +        VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER,
> > > +        VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER,
> > > +        VIRTIO_VDEC_HOST_REQ_STREAM_INFO,
> > > +        VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER,
> > > +        VIRTIO_VDEC_HOST_REQ_DRAINED,
> > > +        VIRTIO_VDEC_HOST_REQ_FLUSHED,
> > > +        VIRTIO_VDEC_HOST_REQ_CLOSE,
> > > +        VIRTIO_VDEC_HOST_REQ_EOS,
> > > +
> > > +        /* Error response */
> > > +        VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR = 0x1100,
> > > +};
> > > +
> > > +enum virtio_vdec_host_req_result {
> > > +        /* Success */
> > > +        VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS = 0,
> > > +
> > > +        /* Error */
> > > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_UNSPEC = 0x1000,
> > > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_INVALID_REQUEST,
> > > +};
> > > +
> > > +struct virtio_vdec_host_req {
> > > +        le32 type;     /* VIRTIO_VDEC_HOST_REQ_* */
> > > +        le32 result;   /* VIRTIO_VDEC_HOST_REQ_RESULT_ */
> > > +        le32 instance_id;
> > > +        le32 reserved; /* for 64-bit alignment */
> > > +        union {
> > > +                struct virtio_vdec_host_req_query query;
> > > +                struct virtio_vdec_host_req_set_buffer_count set_buffer_count;
> > > +                struct virtio_vdec_host_req_register_buffer register_buffer;
> > > +                struct virtio_vdec_host_req_bitstream_buffer bitstream_buffer;
> > > +                struct virtio_vdec_host_req_stream_info stream_info;
> > > +                struct virtio_vdec_host_req_frame_buffer frame_buffer;
> > > +        };
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +This structure includes the following generic fields:
> > > +\begin{description}
> > > +\item[\field{type}] specifies the type of the host request using
> > > +  values from \field{enum virtio_vdec_host_req_type}.
> > > +\item[\field{result}] specifies the result of the operation using
> > > +  values from \field{enum virtio_vdec_host_req_result}.
> > > +\item[\field{instance_id}] specifies an instance ID. This field is
> > > +  ignored unless \field{type} represents a per-instance request.
> > > +\end{description}
> > > +
> > > +The union fields are additional per-request data used only when
> > > +\field{type} has a specific value:
> > > +
> > > +\begin{description}
> > > +\item[\field{query}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_QUERY. See \ref{sec:Device Types / Video Decode
> > > +    Device / Device Operation / Query Capabilities}.
> > > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > > +\item[\field{register_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Buffer Registration}.
> > > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Bitstream Buffer
> > > +    Requests}.
> > > +\item[\field{stream_info}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types / Video
> > > +    Decode Device / Device Operation / Stream Information}.
> > > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER. See \ref{sec:Device Types / Video
> > > +    Decode Device / Device Operation / Frame Buffer Requests}.
> > > +\end{description}
> > > +
> > > +\subsubsection{Query Capabilities}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Query Capabilities}
> > > +
> > > +The driver can send VIRTIO_VDEC_GUEST_REQ_QUERY to query supported
> > > +bitstream formats and frame formats. This is similar to the
> > > +VIDIOC_ENUM_FMT and VIDIOC_ENUM_FRAMESIZES ioctls for V4L2
> > > +devices.
> > > +
> > > +The device sends the supported formats in VIRTIO_VDEC_HOST_REQ_QUERY with
> > > +the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_range {
> > > +        le32 min;
> > > +        le32 max;
> > > +        le32 step;
> > > +};
> > > +
> > > +struct virtio_vdec_format_desc {
> > > +        le32 fourcc;
> > > +        le32 mask;
> > > +        struct virtio_vdec_range width;
> > > +        struct virtio_vdec_range height;
> > > +};
> > > +
> > > +#define VIRTIO_VDEC_NUM_FORMATS 32
> > > +
> > > +struct virtio_vdec_host_req_query {
> > > +        struct virtio_vdec_format_desc bitstream_formats[VIRTIO_VDEC_NUM_FORMATS];
> > > +        struct virtio_vdec_format_desc frame_formats[VIRTIO_VDEC_NUM_FORMATS];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +The \field{virtio_vdec_host_req_query} describes the set of supported
> > > +bitstream and frame formats, with corresponding ranges of image
> > > +resolutions and a compatibility mask that determines the compatibility
> > > +between coded and raw formats.
> > > +
> > > +\begin{description}
> > > +\item[fourcc] specifies a FOURCC of a supported video codec or a pixel
> > > +  format.
> > > +\item[mask] represents the supported combinations of video codec
> > > +  and pixel format. If the i-th bit is set in \field{mask} of
> > > +  \field{bitstream_formats[j]}, the device MUST support decoding from
> > > +  the video codec of \field{bitstream_formats[j]} to the pixel format
> > > +  of \field{frame_formats[i]} (see the sketch after this list).
> > > +\item[width/height] represents either a supported stream resolution or
> > > +  a supported frame buffer resolution.
> > > +\end{description}
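> > > +
> > > +For illustration, a minimal non-normative sketch of the mask check
> > > +follows; it assumes the le32 fields have already been converted to
> > > +host byte order by the transport code:
> > > +
> > > +\begin{lstlisting}
> > > +/* Can bitstream format j be decoded into frame format i? */
> > > +static bool vdec_can_decode(const struct virtio_vdec_host_req_query *q,
> > > +                            unsigned int j, unsigned int i)
> > > +{
> > > +        if (j >= VIRTIO_VDEC_NUM_FORMATS || i >= VIRTIO_VDEC_NUM_FORMATS)
> > > +                return false;
> > > +        return (q->bitstream_formats[j].mask >> i) & 1;
> > > +}
> > > +\end{lstlisting}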
> > > +
> > > +\devicenormative{\paragraph}{Query Capabilities}{Device Types / Video
> > > +  Decode Device / Device Operation / Query Capabilities}
> > > +
> > > +The device MUST set \field{mask} to 0 if the \field{struct
> > > +  virtio_vdec_format_desc} entry is invalid (unused) in
> > > +VIRTIO_VDEC_HOST_REQ_QUERY.
> > > +
> > > +\subsubsection{Instance Open}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Open}
> > > +
> > > +To acquire a decoder instance for a given coded format, the driver
> > > +sends a VIRTIO_VDEC_GUEST_REQ_OPEN request. The driver specifies a
> > > +video codec with the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_open {
> > > +        le32 bitstream_fourcc;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +When the guest request arrives, the device allocates an instance and
> > > +sends a VIRTIO_VDEC_HOST_REQ_OPEN request.
> > > +
> > > +Once the driver receives VIRTIO_VDEC_HOST_REQ_OPEN, it can
> > > +send per-instance guest requests to the opened instance by specifying
> > > +\field{instance_id}. The device MUST process per-instance requests
> > > +with the same value of \field{instance_id} in order.
> > > +
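> > > +A non-normative sketch of opening an H.264 instance follows; the
> > > +FOURCC value, the host-endian notation, and the
> > > +next_unused_instance_id() helper are illustrative assumptions:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req req = {0};
> > > +
> > > +req.type = VIRTIO_VDEC_GUEST_REQ_OPEN;
> > > +req.instance_id = next_unused_instance_id(); /* hypothetical helper */
> > > +req.open.bitstream_fourcc = 0x34363248;      /* 'H264' */
> > > +/* The request is then placed on outq. */
> > > +\end{lstlisting}
> > > +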
> > > +\drivernormative{\paragraph}{Instance Open}{Device Types / Video
> > > +  Decode Device / Device Operation / Instance Open}
> > > +
> > > +In VIRTIO_VDEC_GUEST_REQ_OPEN,
> > > +\begin{itemize*}
> > > +\item The driver MUST set \field{bitstream_fourcc} to a FOURCC of a
> > > +  video codec that is supported by the device.
> > > +\item The driver MUST set \field{instance_id} to an integer that has
> > > +  not been used as an \field{instance_id} before.
> > > +\end{itemize*}
> > > +
> > > +\devicenormative{\paragraph}{Instance Open}{Device Types / Video
> > > +  Decode Device / Device Operation / Instance Open}
> > > +
> > > +If the device fails to allocate an instance, the device MUST set
> > > +\field{result} to a value other than
> > > +VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS.
> > > +
> > > +\subsubsection{Set Buffer Count}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Set Buffer Count}
> > > +
> > > +Once an instance is acquired, the driver and the device negotiate the
> > > +number of buffers to use during the decoding process by
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT and
> > > +VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. These requests are similar to
> > > +the VIDIOC_REQBUFS ioctl for V4L2 devices. First, the driver sends
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to indicate how many buffers
> > > +the driver needs, using the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +enum virtio_vdec_buffer_type {
> > > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM = 0,
> > > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME = 1,
> > > +};
> > > +
> > > +struct virtio_vdec_guest_req_set_buffer_count {
> > > +        le32 type;
> > > +        le32 buffer_count;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[type] specifies a buffer type with a value of the
> > > +  \field{virtio_vdec_buffer_type}.
> > > +\item[buffer_count] is the minimum number of buffers that the driver
> > > +  needs to use.
> > > +\end{description}
> > > +
> > > +The device responds with VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT to
> > > +report the number of buffers that it can accept, using the following
> > > +structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_set_buffer_count {
> > > +        le32 type;
> > > +        le32 buffer_count;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[type] specifies a buffer type with a value of the
> > > +  \field{virtio_vdec_buffer_type}.
> > > +\item[buffer_count] is the maximum number of buffers that the device
> > > +  can accept.
> > > +\end{description}
> > > +
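> > > +A non-normative sketch of how a driver might reconcile the two
> > > +counts follows; the policy of registering exactly the requested
> > > +number is an illustrative assumption, not mandated by this
> > > +specification:
> > > +
> > > +\begin{lstlisting}
> > > +/* requested: driver's minimum; accepted: device's maximum. */
> > > +static int vdec_resolve_buffer_count(uint32_t requested,
> > > +                                     uint32_t accepted, uint32_t *out)
> > > +{
> > > +        if (accepted < requested)
> > > +                return -1; /* the device cannot satisfy the driver */
> > > +        *out = requested;
> > > +        return 0;
> > > +}
> > > +\end{lstlisting}
> > > +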
> > > +\drivernormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > > +  Decode Device / Device Operation / Set Buffer Count}
> > > +
> > > +The driver MUST set \field{buffer_count} to a non-zero value in
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. The value of
> > > +\field{buffer_count} for frame buffers MUST match the requirement
> > > +given via VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types
> > > +  / Video Decode Device / Device Operation / Stream Information}.
> > > +
> > > +\devicenormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > > +  Decode Device / Device Operation / Set Buffer Count}
> > > +
> > > +The device MUST set \field{buffer_count} to the number of buffers that
> > > +the device can accept. The value MAY differ from the value of
> > > +\field{buffer_count} in the corresponding
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT request.
> > > +
> > > +\subsubsection{Buffer Registration}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffer Registration}
> > > +
> > > +Once the number of buffers is set, the driver registers buffer
> > > +metadata via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER with the
> > > +following structure:
> > > +
> > > +\begin{lstlisting}
> > > +#define VIRTIO_VDEC_MAX_PLANES 8
> > > +
> > > +struct virtio_vdec_guest_req_register_buffer {
> > > +        le32 type;
> > > +        le32 num_planes;
> > > +        struct virtio_vdec_plane {
> > > +                le32 handle;
> > > +                le32 offset;
> > > +                le32 length;
> > > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[VIRTIO_VDEC_MAX_PLANES] is the maximum number of planes that a
> > > +  video format needs. It is similar to VIDEO_MAX_PLANES in
> > > +  \href{https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/videodev2.h}{include/uapi/linux/videodev2.h}
> > > +  in the Linux source tree.
> > > +\item[type] specifies a buffer type with a value of the
> > > +  \field{virtio_vdec_buffer_type}.
> > > +\item[num_planes] is the number of planes (i.e. separate memory
> > > +  buffers) for this format. This indicates the number of valid entries
> > > +  in \field{planes}.
> > > +\item[planes] is an array of structures describing information of each
> > > +  plane.
> > > +  \begin{description}
> > > +  \item[handle] is a resource handle for the buffer.
> > > +  \item[offset] equals to the offset in the plane to the start of
> > > +    data.
> > > +  \item[length] is the size of this plane in bytes.
> > > +  \end{description}
> > > +\end{description}
> > > +
> > > +Once the device accepts the buffer registration, it responds with
> > > +VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER, using the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_register_buffer {
> > > +        le32 type;
> > > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[type] specifies a buffer type with a value of the
> > > +  \field{virtio_vdec_buffer_type}.
> > > +\item[handles] is an array of resource handles from the corresponding
> > > +  guest request.
> > > +\end{description}
> > > +
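> > > +A non-normative sketch of registering a two-plane NV12 frame buffer
> > > +follows; the resource handles, the 1280x720 geometry, and the
> > > +host-endian notation are illustrative assumptions:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req req = {0};
> > > +
> > > +req.type = VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER;
> > > +req.instance_id = instance_id;
> > > +req.register_buffer.type = VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME;
> > > +req.register_buffer.num_planes = 2;
> > > +/* Y plane */
> > > +req.register_buffer.planes[0].handle = y_handle;
> > > +req.register_buffer.planes[0].offset = 0;
> > > +req.register_buffer.planes[0].length = 1280 * 720;
> > > +/* interleaved CbCr plane, half the size of the Y plane */
> > > +req.register_buffer.planes[1].handle = uv_handle;
> > > +req.register_buffer.planes[1].offset = 0;
> > > +req.register_buffer.planes[1].length = 1280 * 720 / 2;
> > > +\end{lstlisting}
> > > +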
> > > +\drivernormative{\paragraph}{Buffer Registration}{Device Types / Video
> > > +  Decode Device / Device Operation / Buffer Registration}
> > > +
> > > +The VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER request requires the following:
> > > +
> > > +\begin{itemize*}
> > > +\item The driver MUST set \field{type}.
> > > +\item The driver MUST set \field{num_planes} to a non-zero value that
> > > +  is less than or equal to VIRTIO_VDEC_MAX_PLANES.
> > > +\item The driver MUST fill out the fields of \field{planes[i]} for
> > > +  each \(0 \le i < \field{num_planes} \).
> > > +\item The driver MUST NOT send VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER
> > > +  more than the number of times that the device requested via
> > > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT.
> > > +\end{itemize*}
> > > +
> > > +\devicenormative{\paragraph}{Buffer Registration}{Device Types / Video
> > > +  Decode Device / Device Operation / Buffer Registration}
> > > +
> > > +The VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER request requires the following:
> > > +
> > > +\begin{itemize*}
> > > +\item The device MUST set \field{type} to the value that the
> > > +  corresponding VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER used.
> > > +\item The device MUST set \field{handles[i]} to the value of
> > > +  \field{register_buffer.planes[i].handle} in the corresponding
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Bitstream Buffer Requests}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > > +
> > > +The driver sends VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER to ask the
> > > +device to decode the contents of a registered bitstream buffer. This
> > > +request is similar to V4L2's VIDIOC_QBUF ioctl for OUTPUT
> > > +buffers. The following structure is used to specify the buffer:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_bitstream_buffer {
> > > +        le32 handle;
> > > +        le32 offset;
> > > +        le32 length;
> > > +        le64 cookie;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[handle] is a resource handle for the bitstream buffer to be
> > > +  decoded.
> > > +\item[offset] is the offset in bytes to video data in the buffer.
> > > +\item[length] is the size of valid data in the buffer.
> > > +\item[cookie] is an identifier for the frame decoded from this
> > > +  bitstream data. This cookie will be copied to the matching frame
> > > +  buffer. See also \field{virtio_vdec_host_req_frame_buffer} in
> > > +  \ref{sec:Device Types / Video Decode Device / Device Operation /
> > > +    Frame Buffer Requests}.
> > > +\end{description}
> > > +
> > > +Once the device consumes the bitstream passed by the
> > > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request, it sends
> > > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER with the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_bitstream_buffer {
> > > +        le32 handle;
> > > +};
> > > +\end{lstlisting}
> > > +\begin{description}
> > > +\item[handle] is a resource handle of the bitstream buffer that the
> > > +  device completed processing.
> > > +\end{description}
> > > +
> > > +The driver can reuse a bitstream buffer once
> > > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER has been sent for it.
> > > +
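> > > +A non-normative sketch of submitting bitstream data follows; the
> > > +bitstream_handle, bytes_written, and cookie_counter variables and the
> > > +host-endian notation are illustrative assumptions:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req req = {0};
> > > +
> > > +req.type = VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER;
> > > +req.instance_id = instance_id;
> > > +req.bitstream_buffer.handle = bitstream_handle; /* registered earlier */
> > > +req.bitstream_buffer.offset = 0;
> > > +req.bitstream_buffer.length = bytes_written;
> > > +req.bitstream_buffer.cookie = ++cookie_counter; /* tags the output frame */
> > > +\end{lstlisting}
> > > +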
> > > +\drivernormative{\paragraph}{Bitstream Buffer Requests}{Device Types /
> > > +  Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > > +
> > > +\begin{itemize*}
> > > +\item The driver MUST set \field{handle} to a value that was already
> > > +  registered via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Stream Information}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Stream Information}
> > > +
> > > +To obtain the information required to allocate frame buffers
> > > +(resolution, etc.), the driver registers bitstream buffers first and
> > > +proceeds with queuing VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER requests
> > > +until stream information is received via a
> > > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO host request. The
> > > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO is also sent when a run-time
> > > +resolution change is detected. This request is similar to the
> > > +V4L2_EVENT_SOURCE_CHANGE event for V4L2 devices.
> > > +
> > > +The following structure is used for the
> > > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_stream_info {
> > > +        le32 frame_fourcc;
> > > +        le32 fb_width;
> > > +        le32 fb_height;
> > > +        le32 min_frame_buffer_count;
> > > +        le32 max_frame_buffer_count;
> > > +        struct virtio_vdec_host_req_stream_crop {
> > > +                le32 left;
> > > +                le32 top;
> > > +                le32 width;
> > > +                le32 height;
> > > +        } crop;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[frame_fourcc] indicates the pixel format that the video is
> > > +  decoded to.
> > > +\item[fb_width] is the width of a required frame buffer.
> > > +\item[fb_height] is the height of a required frame buffer.
> > > +\item[min_frame_buffer_count] is the minimum number of frame buffers
> > > +  that the device requires.
> > > +\item[max_frame_buffer_count] is the maximum number of frame buffers
> > > +  that the device can accept.
> > > +\item[crop] indicates video cropping information.
> > > +    \begin{description}
> > > +    \item[left] is the horizontal offset of the left-most valid pixel
> > > +      of the video in pixels.
> > > +    \item[top] is the vertical offset of the top-most valid line of
> > > +      the video in lines.
> > > +    \item[width] is the width of the rectangle in pixels.
> > > +    \item[height] is the height of the rectangle in pixels.
> > > +    \end{description}
> > > +\end{description}
> > > +
> > > +The driver responds with VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO once it
> > > +has processed the incoming stream information.
> > > +
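> > > +A non-normative sketch of the driver side follows; the dec storage
> > > +structure and the queue_guest_req() helper are illustrative
> > > +assumptions:
> > > +
> > > +\begin{lstlisting}
> > > +/* hr points at a received VIRTIO_VDEC_HOST_REQ_STREAM_INFO request. */
> > > +const struct virtio_vdec_host_req_stream_info *info = &hr->stream_info;
> > > +struct virtio_vdec_guest_req ack = {0};
> > > +
> > > +dec->frame_fourcc = info->frame_fourcc;
> > > +dec->fb_width     = info->fb_width;
> > > +dec->fb_height    = info->fb_height;
> > > +dec->fb_count     = info->min_frame_buffer_count;
> > > +
> > > +ack.type = VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO;
> > > +ack.instance_id = instance_id;
> > > +queue_guest_req(&ack); /* then (re)register frame buffers */
> > > +\end{lstlisting}
> > > +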
> > > +\devicenormative{\paragraph}{Stream Information}{Device Types / Video
> > > +  Decode Device / Device Operation / Stream Information}
> > > +
> > > +\begin{itemize*}
> > > +\item After the device sends VIRTIO_VDEC_HOST_REQ_STREAM_INFO, the
> > > +  device MUST empty its list of registered frame buffers and skip any
> > > +  pending VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests until a
> > > +  VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO request is received.
> > > +\end{itemize*}
> > > +
> > > +\drivernormative{\paragraph}{Stream Information}{Device Types / Video
> > > +  Decode Device / Device Operation / Stream Information}
> > > +
> > > +\begin{itemize*}
> > > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO
> > > +  once it processes a VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > > +\item If VIRTIO_VDEC_HOST_REQ_STREAM_INFO arrives while the driver is
> > > +  waiting for a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request, the driver
> > > +  MUST enqueue the corresponding VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
> > > +  again after sending VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for a
> > > +  frame buffer that matches the new stream information.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Frame Buffer Requests}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Frame Buffer Requests}
> > > +
> > > +The driver provides frame buffers via
> > > +VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER, which the device will fill with
> > > +decoded frames. This request is similar to V4L2's VIDIOC_QBUF ioctl
> > > +for CAPTURE buffers. The following structure is used for the request:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_frame_buffer {
> > > +        struct virtio_vdec_frame_buffer_plane {
> > > +                le32 handle;
> > > +                le32 offset;
> > > +                le32 stride;
> > > +                le32 length;
> > > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[planes] is an array of planes that a frame buffer consists of.
> > > +    \begin{description}
> > > +    \item[handle] is a resource handle to indicate the buffer for a
> > > +      plane.
> > > +    \item[offset] is the offset in bytes to the location where the
> > > +      decoded frame is written.
> > > +    \item[stride] is the line stride.
> > > +    \item[length] is the length in bytes of the plane area that the
> > > +      device can fill with decoded data.
> > > +    \end{description}
> > > +\end{description}
> > > +
> > > +When the device has filled the buffer with decoded frame data, it
> > > +notifies the driver via VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER with the
> > > +following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_frame_buffer {
> > > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > > +        le64 cookie;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[handles] is an array of handles passed via the
> > > +  \field{virtio_vdec_guest_req_frame_buffer} in the corresponding
> > > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER.
> > > +\item[cookie] is the \field{cookie} passed via a
> > > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request. It identifies the
> > > +  bitstream buffer from which this frame was decoded.
> > > +\end{description}
> > > +
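> > > +A non-normative sketch of correlating a returned frame with its
> > > +bitstream submission follows; pending_lookup() and deliver_frame()
> > > +are hypothetical driver helpers:
> > > +
> > > +\begin{lstlisting}
> > > +/* On VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER: match via the cookie. */
> > > +struct pending_decode *p = pending_lookup(hr->frame_buffer.cookie);
> > > +if (p)
> > > +        deliver_frame(p, hr->frame_buffer.handles);
> > > +\end{lstlisting}
> > > +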
> > > +\drivernormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > > +
> > > +\begin{itemize*}
> > > +\item Before the driver sends the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
> > > +  for a frame buffer, the driver MUST have sent the
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for the buffer.
> > > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER for
> > > +  a frame buffer that satisfies the restrictions given by
> > > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > > +\end{itemize*}
> > > +
> > > +\devicenormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > > +
> > > +When the device has successfully processed the frame buffer, it MUST
> > > +set \field{cookie} to the value that was passed via the
> > > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request that provided the
> > > +corresponding bitstream.
> > > +
> > > +\subsubsection{Drain}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Drain}
> > > +
> > > +To ensure that any pending requests are processed and decoded frames
> > > +returned in appropriate replies, the driver MAY issue a
> > > +VIRTIO_VDEC_GUEST_REQ_DRAIN request. The device will continue
> > > +processing requests as usual, but once all the requests of the given
> > > +context preceding the drain request have been processed, the device
> > > +will issue a VIRTIO_VDEC_HOST_REQ_DRAINED host request to notify the
> > > +driver of this fact.
> > > +
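> > > +A non-normative sketch of the drain sequence follows;
> > > +queue_guest_req() and wait_for_host_req() are hypothetical helpers
> > > +that queue a guest request and block until the named host request
> > > +arrives for this instance. The flush sequence in the next
> > > +subsection follows the same pattern with VIRTIO_VDEC_GUEST_REQ_FLUSH
> > > +and VIRTIO_VDEC_HOST_REQ_FLUSHED:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req req = {0};
> > > +
> > > +req.type = VIRTIO_VDEC_GUEST_REQ_DRAIN;
> > > +req.instance_id = instance_id;
> > > +queue_guest_req(&req);
> > > +wait_for_host_req(VIRTIO_VDEC_HOST_REQ_DRAINED, instance_id);
> > > +/* All frames from earlier bitstream buffers have now been returned. */
> > > +\end{lstlisting}
> > > +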
> > > +\devicenormative{\paragraph}{Drain}{Device Types / Video Decode Device
> > > +  / Device Operation / Drain}
> > > +
> > > +\begin{itemize*}
> > > +\item When VIRTIO_VDEC_GUEST_REQ_DRAIN arrives, the device MUST process
> > > +  all pending guest requests. After processing them, the device MUST
> > > +  send VIRTIO_VDEC_HOST_REQ_DRAINED to the driver.
> > > +\item While processing a drain request, the device MUST ignore any
> > > +  guest requests other than Flush and Close requests. See
> > > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > > +  and
> > > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Instance
> > > +    Close}.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Flush}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > > +
> > > +To drop any pending decode requests and decoded frames pending in
> > > +output buffers, the driver MAY issue a VIRTIO_VDEC_GUEST_REQ_FLUSH
> > > +request. Any frames that would result from the previously enqueued
> > > +bitstream data will be dropped and the device will not write to any of
> > > +the frame buffers provided earlier.
> > > +
> > > +\devicenormative{\paragraph}{Flush}{Device Types / Video Decode Device
> > > +  / Device Operation / Flush}
> > > +
> > > +\begin{itemize*}
> > > +\item The device MUST send VIRTIO_VDEC_HOST_REQ_FLUSHED when finishing
> > > +  the flush process.
> > > +\item If VIRTIO_VDEC_GUEST_REQ_FLUSH arrives while the device is
> > > +  processing guest requests, the device MUST stop processing them.
> > > +\item The device MUST NOT process any guest request that comes after
> > > +  VIRTIO_VDEC_GUEST_REQ_FLUSH until it sends
> > > +  VIRTIO_VDEC_HOST_REQ_FLUSHED.
> > > +\item The device MUST dequeue all unread requests in outq when
> > > +  VIRTIO_VDEC_GUEST_REQ_FLUSH arrives.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{EOS Notification}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / EOS Notification}
> > > +
> > > +The device sends VIRTIO_VDEC_HOST_REQ_EOS when the end of a stream
> > > +(EOS) is reached while decoding.
> > > +
> > > +This request is similar to the V4L2_EVENT_EOS event for V4L2 devices.
> > > +
> > > +\subsubsection{Instance Close}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Close}
> > > +
> > > +If a decoding instance is no longer needed, the driver SHOULD issue a
> > > +VIRTIO_VDEC_GUEST_REQ_CLOSE request. This will trigger an implicit
> > > +flush operation (as in a VIRTIO_VDEC_GUEST_REQ_FLUSH request; without
> > > +a following host request) and free the instance ID and any associated
> > > +resources on the device side.
> > > +
> > > +The instance is successfully terminated once the device responds to
> > > +the request. After that, it is guaranteed that any buffers registered
> > > +by the driver to the given instance will no longer be accessed by the
> > > +device (unless also provided to another instance).
> > > +
> > > +\subsubsection{Error Reporting}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Error Reporting}
> > > +
> > > +The device has two ways of reporting errors to the driver: one is for
> > > +errors associated with a host request, the other is for global errors.
> > > +
> > > +If an error is associated with a host request, the device sets
> > > +\field{type} to the type of that host request and \field{result} to
> > > +a value of \field{enum virtio_vdec_host_req_result} that describes
> > > +the error in the \field{virtio_vdec_host_req} structure.
> > > +
> > > +If the device cannot proceed with decoding due to an error that is
> > > +not associated with any request, the device MUST send
> > > +VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR. In that case, the driver
> > > +MUST stop sending decode requests.
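> > > +
> > > +A non-normative sketch of dispatching host requests with error
> > > +handling follows; stop_all_decoding() and handle_request_error() are
> > > +hypothetical driver helpers, and fields are assumed to have been
> > > +converted to host byte order:
> > > +
> > > +\begin{lstlisting}
> > > +static void vdec_handle_host_req(struct virtio_vdec_host_req *hr)
> > > +{
> > > +        if (hr->type == VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR) {
> > > +                stop_all_decoding();      /* hypothetical */
> > > +                return;
> > > +        }
> > > +        if (hr->result != VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS) {
> > > +                handle_request_error(hr); /* hypothetical */
> > > +                return;
> > > +        }
> > > +        /* ... normal per-type handling ... */
> > > +}
> > > +\end{lstlisting}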
> > > --
> > > 2.23.0.237.gc6a4ce50a0-goog
> > >
> >

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [virtio-dev] Re: [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
@ 2019-09-19 11:17       ` Keiichi Watanabe
  0 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-09-19 11:17 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: virtio-dev, Alexandre Courbot, Alex Lau, Dylan Reid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Tomasz Figa, Linux Media Mailing List

I shared PDF version of this RFC on Google drive:
https://drive.google.com/drive/folders/1hed-mTVI7dj0M_iab4DTfx5kPMLoeX8R

On Thu, Sep 19, 2019 at 8:15 PM Keiichi Watanabe <keiichiw@chromium.org> wrote:
>
> Hi Hans,
> Thank you for your feedback.
>
> On Thu, Sep 19, 2019 at 6:53 PM Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >
> > Hi Keiichi,
> >
> > On 9/19/19 11:34 AM, Keiichi Watanabe wrote:
> > > [Resending because of some issues with sending to virtio-dev. Sorry for the noise.]
> > >
> > > This patch proposes virtio specification for new virtio video decode
> > > device.
> > > This device provides the functionality of hardware accelerated video
> > > decoding from encoded video contents provided by the guest into frame
> > > buffers accessible by the guest.
> > >
> > > We have prototype implementation for VMs on Chrome OS:
> > > * virtio-vdec device in crosvm [1]
> > > * virtio-vdec driver in Linux kernel v4.19 [2]
> > >   - This driver follows V4L2 stateful video decoder API [3].
> > >
> > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > device to use buffers allocated by virtio-gpu device.
> > >
> > > Any feedback would be greatly appreciated. Thank you.
> >
> > I'm not a virtio expert, but as I understand it the virtio-vdec driver
> > looks like a regular v4l2 stateful decoder devices to the guest, while
> > on the host there is a driver (or something like that) that maps the
> > virtio-vdec requests to the actual decoder hardware, right?
> >
> > What concerns me a bit (but there may be good reasons for this) is that
> > this virtio driver is so specific for stateful decoders.
> >
>
> We aim to design a platform-independent interface. The virtio-vdec protocol
> should be designed to be workable regardless of APIs, OS, and platforms
> eventually.
>
> Our prototype virtio-vdec device translates the virtio-vdec protocol to a
> Chrome's video decode acceleration API instead of talking to hardware decoders
> directly. This Chrome's API is an abstract layer for multiple decoder APIs on
> Linux such as V4L2 stateful, V4L2 slice, Intel's VAAPI.
>
> That is to say the guest driver translates V4L2 stateful API to virtio-vdec and
> the host device translates virtio-vdec to Chrome's API. So, I could say that
> this is already more general than a mere V4L2 stateful API wrapper, at least.
>
> I'd appreciate if you could let me know some parts are still specific to V4L2.
>
>
> > How does this scale to stateful encoders? Stateless codecs? Other M2M
> > devices like deinterlacers or colorspace converters? What about webcams?
> >
>
> We're designing virtio protocol for encoder as well, but we are at an early
> stage. So, I'm not sure if we should/can handle decoder and encoder in one
> protocol. I don't have any plans for other media devices.
>
>
> > In other words, I would like to see a bigger picture here.
> >
> > Note that there is also an effort for Xen to expose webcams to a guest:
> >
> > https://www.spinics.net/lists/linux-media/msg148629.html
> >
>
> Good to know. Thanks.
>
> > This may or may not be of interest. This was an RFC only, and I haven't
> > seen any follow-up patches with actual code.
> >
> > There will be a half-day meeting of media developers during the ELCE
> > in October about codecs. I know Alexandre and Tomasz will be there.
> > It might be a good idea to discuss this in more detail if needed.
> >
>
> Sounds good. They are closely working with me.
>
> Regards,
> Keiichi
>
> > Regards,
> >
> >         Hans
> >
> > >
> > > [1] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-device%22+(status:open%20OR%20status:merged)
> > > [2] https://chromium-review.googlesource.com/q/hashtag:%22virtio-vdec-driver%22+(status:open%20OR%20status:merged)
> > > [3] https://hverkuil.home.xs4all.nl/codec-api/uapi/v4l/dev-decoder.html (to be merged to Linux 5.4)
> > > [4] https://lkml.org/lkml/2019/9/12/157
> > >
> > > Signed-off-by: Keiichi Watanabe <keiichiw@chromium.org>
> > > ---
> > >  content.tex     |   1 +
> > >  virtio-vdec.tex | 750 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 751 insertions(+)
> > >  create mode 100644 virtio-vdec.tex
> > >
> > > diff --git a/content.tex b/content.tex
> > > index 37a2190..b57d4a9 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -5682,6 +5682,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> > >  \input{virtio-input.tex}
> > >  \input{virtio-crypto.tex}
> > >  \input{virtio-vsock.tex}
> > > +\input{virtio-vdec.tex}
> > >
> > >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >
> > > diff --git a/virtio-vdec.tex b/virtio-vdec.tex
> > > new file mode 100644
> > > index 0000000..d117129
> > > --- /dev/null
> > > +++ b/virtio-vdec.tex
> > > @@ -0,0 +1,750 @@
> > > +\section{Video Decode Device}
> > > +\label{sec:Device Types / Video Decode Device}
> > > +
> > > +virtio-vdec is a virtio based video decoder. This device provides the
> > > +functionality of hardware accelerated video decoding from encoded
> > > +video contents provided by the guest into frame buffers accessible by
> > > +the guest.
> > > +
> > > +\subsection{Device ID}
> > > +\label{sec:Device Types / Video Decode Device / Device ID}
> > > +
> > > +28
> > > +
> > > +\subsection{Virtqueues}
> > > +\label{sec:Device Types / Video Decode Device / Virtqueues}
> > > +
> > > +\begin{description}
> > > +\item[0] outq - queue for sending requests from the driver to the
> > > +  device
> > > +\item[1] inq - queue for sending requests from the device to the
> > > +  driver
> > > +\end{description}
> > > +
> > > +Each queue is used uni-directionally. outq is used to send requests
> > > +from the driver to the device (i.e., guest requests) and inq is used
> > > +to send requests in the other direction (i.e., host requests).
> > > +
> > > +\subsection{Feature bits}
> > > +\label{sec:Device Types / Video Decode Device / Feature bits}
> > > +
> > > +There are currently no feature bits defined for this device.
> > > +
> > > +\subsection{Device configuration layout}
> > > +\label{sec:Device Types / Video Decode Device / Device configuration layout}
> > > +
> > > +None.
> > > +
> > > +\subsection{Device Requirements: Device Initialization}
> > > +\label{sec:Device Types / Video Decode Device / Device Requirements: Device Initialization}
> > > +
> > > +The virtqueues are initialized.
> > > +
> > > +\subsection{Device Operation}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation}
> > > +
> > > +\subsubsection{Video Buffers}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffers}
> > > +
> > > +A virtio-vdec driver and a device use two types of video buffers:
> > > +\emph{bitstream buffer} and \emph{frame buffer}. A bitstream buffer
> > > +contains encoded video stream data. This buffer is similar to an
> > > +OUTPUT buffer for Video for Linux Two (V4L2) API. A frame buffer
> > > +contains decoded video frame data like a CAPTURE buffers for V4L2 API.
> > > +The driver and the device share these buffers, and each buffer is
> > > +identified by a unique integer called a \emph{resource handle}.
> > > +
> > > +\subsubsection{Guest Request}
> > > +
> > > +The driver queues requests to the outq virtqueue. The device MAY
> > > +process requests out-of-order. All requests on outq use the following
> > > +structure:
> > > +
> > > +\begin{lstlisting}
> > > +enum virtio_vdec_guest_req_type {
> > > +        VIRTIO_VDEC_GUEST_REQ_UNDEFINED = 0,
> > > +
> > > +        /* Global */
> > > +        VIRTIO_VDEC_GUEST_REQ_QUERY = 0x0100,
> > > +
> > > +        /* Per instance */
> > > +        VIRTIO_VDEC_GUEST_REQ_OPEN = 0x0200,
> > > +        VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT,
> > > +        VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER,
> > > +        VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO,
> > > +        VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER,
> > > +        VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER,
> > > +        VIRTIO_VDEC_GUEST_REQ_DRAIN,
> > > +        VIRTIO_VDEC_GUEST_REQ_FLUSH,
> > > +        VIRTIO_VDEC_GUEST_REQ_CLOSE,
> > > +};
> > > +
> > > +struct virtio_vdec_guest_req {
> > > +        le32 type;
> > > +        le32 instance_id;
> > > +        union {
> > > +                struct virtio_vdec_guest_req_open open;
> > > +                struct virtio_vdec_guest_req_set_buffer_count set_buffer_count;
> > > +                struct virtio_vdec_guest_req_register_buffer register_buffer;
> > > +                struct virtio_vdec_guest_req_bitstream_buffer bitstream_buffer;
> > > +                struct virtio_vdec_guest_req_frame_buffer frame_buffer;
> > > +        };
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +This structure includes the following generic fields to identify the
> > > +request:
> > > +\begin{description}
> > > +\item[\field{type}] specifies the type of the guest request using
> > > +  values from \field{enum virtio_vdec_guest_req_type}.
> > > +\item[\field{instance_id}] specifies an instance ID. This field is
> > > +  ignored unless \field{type} represents a per-instance request.
> > > +\end{description}
> > > +
> > > +The union fields are additional per-request data used only when
> > > +\field{type} has a specific value:
> > > +
> > > +\begin{description}
> > > +\item[\field{open}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_OPEN. See \ref{sec:Device Types / Video Decode
> > > +    Device / Device Operation / Instance Open}.
> > > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > > +\item[\field{register_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Buffer Registration}.
> > > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Bitstream Buffer
> > > +    Requests}.
> > > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Frame Buffer Requests}.
> > > +\end{description}
> > > +
> > > +\subsubsection{Host Request}
> > > +
> > > +The device queues requests to the inq virtqueue. All requests on inq
> > > +use the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +enum virtio_vdec_host_req_type {
> > > +        VIRTIO_VDEC_HOST_REQ_UNDEFINED = 0,
> > > +
> > > +        /* Global */
> > > +        VIRTIO_VDEC_HOST_REQ_QUERY = 0x0100,
> > > +
> > > +        /* Per instance */
> > > +        VIRTIO_VDEC_HOST_REQ_OPEN = 0x0200,
> > > +        VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT,
> > > +        VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER,
> > > +        VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER,
> > > +        VIRTIO_VDEC_HOST_REQ_STREAM_INFO,
> > > +        VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER,
> > > +        VIRTIO_VDEC_HOST_REQ_DRAINED,
> > > +        VIRTIO_VDEC_HOST_REQ_FLUSHED,
> > > +        VIRTIO_VDEC_HOST_REQ_CLOSE,
> > > +        VIRTIO_VDEC_HOST_REQ_EOS,
> > > +
> > > +        /* Error response */
> > > +        VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR = 0x1100,
> > > +};
> > > +
> > > +enum virtio_vdec_host_req_result {
> > > +        /* Success */
> > > +        VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS = 0,
> > > +
> > > +        /* Error */
> > > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_UNSPEC = 0x1000,
> > > +        VIRTIO_VDEC_HOST_REQ_RESULT_ERROR_INVALID_REQUEST,
> > > +};
> > > +
> > > +struct virtio_vdec_host_req {
> > > +        le32 type;     /* VIRTIO_VDEC_HOST_REQ_* */
> > > +        le32 result;   /* VIRTIO_VDEC_HOST_REQ_RESULT_ */
> > > +        le32 instance_id;
> > > +        le32 reserved; /* for 64-bit alignment */
> > > +        union {
> > > +                struct virtio_vdec_host_req_query query;
> > > +                struct virtio_vdec_host_req_set_buffer_count set_buffer_count;
> > > +                struct virtio_vdec_host_req_register_buffer register_buffer;
> > > +                struct virtio_vdec_host_req_bitstream_buffer bitstream_buffer;
> > > +                struct virtio_vdec_host_req_stream_info stream_info;
> > > +                struct virtio_vdec_host_req_frame_buffer frame_buffer;
> > > +        };
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +This structure includes the following generic fields:
> > > +\begin{description}
> > > +\item[\field{type}] specifies the type of the host request using
> > > +  values from \field{enum virtio_vdec_host_req_type}.
> > > +\item[\field{result}] specifies the result of the operation using the
> > > +  \field{enum virtio_vdec_host_req_result}.
> > > +\item[\field{instance_id}] specifies an instance ID. This field is
> > > +  ignored unless \field{type} represents a per-instance request.
> > > +\end{description}
> > > +
> > > +The union fields are additional per-request data used only when
> > > +\field{type} has a specific value:
> > > +
> > > +\begin{description}
> > > +\item[\field{query}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_QUERY. See \ref{sec:Device Types / Video Decode
> > > +    Device / Device Operation / Query Capabilities}.
> > > +\item[\field{set_buffer_count}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Set Buffer Count}.
> > > +\item[\field{register_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Buffer Registration}.
> > > +\item[\field{bitstream_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER. See \ref{sec:Device Types /
> > > +    Video Decode Device / Device Operation / Bitstream Buffer
> > > +    Requests}.
> > > +\item[\field{stream_info}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types / Video
> > > +    Decode Device / Device Operation / Stream Information}.
> > > +\item[\field{frame_buffer}] is used when \field{type} is set to
> > > +  VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER. See \ref{sec:Device Types / Video
> > > +    Decode Device / Device Operation / Frame Buffer Requests}.
> > > +\end{description}
> > > +
> > > +\subsubsection{Query Capabilities}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Query Capabilities}
> > > +
> > > +The driver can send VIRTIO_VDEC_GUEST_REQ_QUERY to query supported
> > > +bitstream formats and frame formats. This is similar to
> > > +VIDIOC_ENUM_FMT ioctl and VIDIOC_ENUM_FRAMESIZES ioctl for V4L2
> > > +devices.
> > > +
> > > +The device sends supported formats as VIRTIO_VDEC_HOST_REQ_QUERY with
> > > +the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_range {
> > > +        le32 min;
> > > +        le32 max;
> > > +        le32 step;
> > > +};
> > > +
> > > +struct virtio_vdec_format_desc {
> > > +        le32 fourcc;
> > > +        le32 mask;
> > > +        struct virtio_vdec_range width;
> > > +        struct virtio_vdec_range height;
> > > +};
> > > +
> > > +#define VIRTIO_VDEC_NUM_FORMATS 32
> > > +
> > > +struct virtio_vdec_host_req_query {
> > > +        struct virtio_vdec_format_desc bitstream_formats[VIRTIO_VDEC_NUM_FORMATS];
> > > +        struct virtio_vdec_format_desc frame_formats[VIRTIO_VDEC_NUM_FORMATS];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +The \field{virtio_vdec_host_req_query} structure describes the sets
> > > +of supported bitstream and frame formats, together with the
> > > +corresponding ranges of image resolution and a compatibility mask
> > > +that indicates which coded formats can be decoded into which raw
> > > +formats.
> > > +
> > > +\begin{description}
> > > +\item[fourcc] specifies a FOURCC of a supported video codec or a pixel
> > > +  format.
> > > +\item[mask] represents the supported combinations of video codec
> > > +  and pixel format. If the i-th bit is set in \field{mask} of
> > > +  \field{bitstream_formats[j]}, the device MUST support decoding from
> > > +  the video codec of \field{bitstream_formats[j]} to the pixel format
> > > +  of \field{frame_formats[i]}.
> > > +\item[width/height] represents either a supported stream resolution or
> > > +  a supported frame buffer resolution.
> > > +\end{description}
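> > > +
> > > +For example, a driver could check whether a given coded format can
> > > +be decoded into a given raw format with a helper like the following
> > > +(informative sketch only):
> > > +
> > > +\begin{lstlisting}
> > > +/* Returns true if bitstream_formats[j] can be decoded into
> > > + * frame_formats[i], per the mask semantics defined above. */
> > > +static bool vdec_can_decode(const struct virtio_vdec_host_req_query *q,
> > > +                            unsigned int j, unsigned int i)
> > > +{
> > > +        return (le32_to_cpu(q->bitstream_formats[j].mask) >> i) & 1;
> > > +}
> > > +\end{lstlisting}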
> > > +
> > > +\devicenormative{\paragraph}{Query Capabilities}{Device Types / Video
> > > +  Decode Device / Device Operation / Query Capabilities}
> > > +
> > > +The device MUST set \field{mask} to 0 for every invalid or unused
> > > +\field{struct virtio_vdec_format_desc} entry in VIRTIO_VDEC_HOST_REQ_QUERY.
> > > +
> > > +\subsubsection{Instance Open}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Open}
> > > +
> > > +To acquire a decoder instance for a given coded format, the driver
> > > +sends a VIRTIO_VDEC_GUEST_REQ_OPEN request. The driver specifies a
> > > +video codec with the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_open {
> > > +        le32 bitstream_fourcc;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +When this guest request arrives, the device allocates an instance and
> > > +sends a VIRTIO_VDEC_HOST_REQ_OPEN request.
> > > +
> > > +Once the driver receives VIRTIO_VDEC_HOST_REQ_OPEN, the driver can
> > > +send per-instance guest requests to the opened instance by specifying
> > > +\field{instance_id}. The device MUST process per-instance requests
> > > +with the same value of \field{instance_id} in order.
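> > > +
> > > +As a non-normative example, a driver opening an instance for H.264
> > > +bitstreams might fill the request as follows (the FOURCC helper
> > > +macro and the code value are illustrative):
> > > +
> > > +\begin{lstlisting}
> > > +#define VDEC_FOURCC(a, b, c, d) \
> > > +        ((u32)(a) | ((u32)(b) << 8) | ((u32)(c) << 16) | ((u32)(d) << 24))
> > > +
> > > +struct virtio_vdec_guest_req_open open_req = {
> > > +        .bitstream_fourcc = cpu_to_le32(VDEC_FOURCC('H', '2', '6', '4')),
> > > +};
> > > +\end{lstlisting}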
> > > +
> > > +\drivernormative{\paragraph}{Instance Open}{Device Types / Video
> > > +  Decode Device / Device Operation / Instance Open}
> > > +
> > > +In VIRTIO_VDEC_GUEST_REQ_OPEN,
> > > +\begin{itemize*}
> > > +\item The driver MUST set \field{bitstream_fourcc} to a FOURCC of a
> > > +  video codec that is supported by the device.
> > > +\item The driver MUST set \field{instance_id} to an integer that has
> > > +  not previously been used as an \field{instance_id}.
> > > +\end{itemize*}
> > > +
> > > +\devicenormative{\paragraph}{Instance Open}{Device Types / Video
> > > +  Decode Device / Device Operation / Instance Open}
> > > +
> > > +If the device fails to allocate an instance, the device MUST set
> > > +\field{result} to a value other than
> > > +VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS.
> > > +
> > > +\subsubsection{Set Buffer Count}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Set Buffer Count}
> > > +
> > > +Once an instance is acquired, the driver and the device negotiate the
> > > +number of buffers to use during the decoding process via
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT and
> > > +VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT. These requests are similar to
> > > +the VIDIOC_REQBUFS ioctl for V4L2 devices. First, the driver sends
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT to tell the device how many
> > > +buffers it needs, using the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +enum virtio_vdec_buffer_type {
> > > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_BITSTREAM = 0,
> > > +        VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME = 1,
> > > +};
> > > +
> > > +struct virtio_vdec_guest_req_set_buffer_count {
> > > +        le32 type;
> > > +        le32 buffer_count;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[type] specifies a buffer type with a value from
> > > +  \field{enum virtio_vdec_buffer_type}.
> > > +\item[buffer_count] is the minimum number of buffers that the driver
> > > +  needs to use.
> > > +\end{description}
> > > +
> > > +The device responds to the VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT
> > > +with a VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT that carries the number
> > > +of buffers the device can accept, using the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_set_buffer_count {
> > > +        le32 type;
> > > +        le32 buffer_count;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[type] specifies a buffer type with a value from
> > > +  \field{enum virtio_vdec_buffer_type}.
> > > +\item[buffer_count] is the maximum number of buffers that the device
> > > +  can accept.
> > > +\end{description}
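> > > +
> > > +Informatively, the number of buffers the driver then registers is
> > > +bounded by the device's response; a simple driver-side choice could
> > > +be the following (sketch only, not mandated by this specification;
> > > +\field{host_req} is a hypothetical pointer to the received request):
> > > +
> > > +\begin{lstlisting}
> > > +u32 needed  = 8; /* minimum count the driver asked for */
> > > +u32 granted = le32_to_cpu(host_req->set_buffer_count.buffer_count);
> > > +
> > > +/* Never register more buffers than the device can accept. */
> > > +u32 count = needed <= granted ? needed : granted;
> > > +\end{lstlisting}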
> > > +
> > > +\drivernormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > > +  Decode Device / Device Operation / Set Buffer Count}
> > > +
> > > +The driver MUST set \field{buffer_count} to a non-zero value in
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT. The value of
> > > +\field{buffer_count} for frame buffers MUST match the requirement
> > > +given via VIRTIO_VDEC_HOST_REQ_STREAM_INFO. See \ref{sec:Device Types
> > > +  / Video Decode Device / Device Operation / Stream Information}.
> > > +
> > > +\devicenormative{\paragraph}{Set Buffer Count}{Device Types / Video
> > > +  Decode Device / Device Operation / Set Buffer Count}
> > > +
> > > +The device MUST set \field{buffer_count} to the number of buffers
> > > +that the device can accept. The value MAY differ from the value of
> > > +\field{buffer_count} in the corresponding
> > > +VIRTIO_VDEC_GUEST_REQ_SET_BUFFER_COUNT request.
> > > +
> > > +\subsubsection{Buffer Registration}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Buffer Registration}
> > > +
> > > +Once the number of buffers is set, the driver registers the metadata
> > > +of each buffer via a VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER request
> > > +with the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +#define VIRTIO_VDEC_MAX_PLANES 8
> > > +
> > > +struct virtio_vdec_guest_req_register_buffer {
> > > +        le32 type;
> > > +        le32 num_planes;
> > > +        struct virtio_vdec_plane {
> > > +                le32 handle;
> > > +                le32 offset;
> > > +                le32 length;
> > > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[VIRTIO_VDEC_MAX_PLANES] is the maximum number of planes that a
> > > +  video format needs. It is similar to VIDEO_MAX_PLANES in
> > > +  \href{https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/videodev2.h}{include/uapi/linux/videodev2.h}
> > > +  in the Linux source tree.
> > > +\item[type] specifies a buffer type with a value from
> > > +  \field{enum virtio_vdec_buffer_type}.
> > > +\item[num_planes] is the number of planes (i.e. separate memory
> > > +  buffers) for this format. This indicates the number of valid entries
> > > +  in \field{planes}.
> > > +\item[planes] is an array of structures describing information of each
> > > +  plane.
> > > +  \begin{description}
> > > +  \item[handle] is a resource handle for the buffer.
> > > +  \item[offset] is the offset in bytes to the start of data in the
> > > +    plane.
> > > +  \item[length] is the size of this plane in bytes.
> > > +  \end{description}
> > > +\end{description}
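> > > +
> > > +For illustration, registering a two-plane frame buffer (e.g. an NV12
> > > +frame with separate Y and UV planes) could look as follows; the
> > > +handle values, resolution, offsets and lengths are purely
> > > +illustrative:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_register_buffer reg = {
> > > +        .type       = cpu_to_le32(VIRTIO_VDEC_GUEST_REQ_BUFFER_TYPE_FRAME),
> > > +        .num_planes = cpu_to_le32(2),
> > > +        .planes = {
> > > +                /* Y plane */
> > > +                { .handle = cpu_to_le32(0x10),
> > > +                  .offset = cpu_to_le32(0),
> > > +                  .length = cpu_to_le32(1280 * 720) },
> > > +                /* UV plane: half the Y plane size for NV12 */
> > > +                { .handle = cpu_to_le32(0x11),
> > > +                  .offset = cpu_to_le32(0),
> > > +                  .length = cpu_to_le32(1280 * 720 / 2) },
> > > +        },
> > > +};
> > > +\end{lstlisting}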
> > > +
> > > +Once the device accepts the buffer registration, the device responds
> > > +with VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER using the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_register_buffer {
> > > +        le32 type;
> > > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[type] specifies a buffer type with a value from
> > > +  \field{enum virtio_vdec_buffer_type}.
> > > +\item[handles] is an array of resource handles from the corresponding
> > > +  guest request.
> > > +\end{description}
> > > +
> > > +\drivernormative{\paragraph}{Buffer Registration}{Device Types / Video
> > > +  Decode Device / Device Operation / Buffer Registration}
> > > +
> > > +The VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER request requires the following:
> > > +
> > > +\begin{itemize*}
> > > +\item The driver MUST set \field{type}.
> > > +\item The driver MUST set \field{num_planes} to a non-zero value that
> > > +  is less than or equal to VIRTIO_VDEC_MAX_PLANES.
> > > +\item The driver MUST fill out the fields of \field{planes[i]} for
> > > +  each \(0 \le i < \field{num_planes} \).
> > > +\item The driver MUST NOT send VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER
> > > +  more times than the buffer count that the device returned via
> > > +  VIRTIO_VDEC_HOST_REQ_SET_BUFFER_COUNT.
> > > +\end{itemize*}
> > > +
> > > +\devicenormative{\paragraph}{Buffer Registration}{Device Types / Video
> > > +  Decode Device / Device Operation / Buffer Registration}
> > > +
> > > +The VIRTIO_VDEC_HOST_REQ_REGISTER_BUFFER request requires the following:
> > > +
> > > +\begin{itemize*}
> > > +\item The device MUST set \field{type} to the value that the
> > > +  corresponding VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER used.
> > > +\item The device MUST set \field{handles[i]} to the value of
> > > +  \field{register_buffer.planes[i].handle} in the corresponding
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Bitstream Buffer Requests}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > > +
> > > +The driver sends VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER to ask the
> > > +device to decode contents in a registered bitstream buffer. This
> > > +request is similar to V4L2's VIDIOC_QBUF ioctl for OUTPUT
> > > +buffers. The following structure is used to specify the buffer:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_bitstream_buffer {
> > > +        le32 handle;
> > > +        le32 offset;
> > > +        le32 length;
> > > +        le64 cookie;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[handle] is a resource handle for the bitstream buffer to be
> > > +  decoded.
> > > +\item[offset] is the offset in bytes to video data in the buffer.
> > > +\item[length] is the size in bytes of the valid data in the buffer.
> > > +\item[cookie] is an identifier for the frame decoded from this
> > > +  bitstream data. This cookie will be copied to the matching frame
> > > +  buffer. See also \field{virtio_vdec_host_req_frame_buffer} in
> > > +  \ref{sec:Device Types / Video Decode Device / Device Operation /
> > > +    Frame Buffer Requests}.
> > > +\end{description}
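> > > +
> > > +As an informative example, a driver can carry a presentation
> > > +timestamp in \field{cookie} so that it comes back with the decoded
> > > +frame (\field{buf_handle}, \field{au_size} and \field{pts} are
> > > +hypothetical values):
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_bitstream_buffer bs = {
> > > +        .handle = cpu_to_le32(buf_handle), /* a registered buffer */
> > > +        .offset = cpu_to_le32(0),          /* data starts at the head */
> > > +        .length = cpu_to_le32(au_size),    /* bytes of valid data */
> > > +        .cookie = cpu_to_le64(pts),        /* echoed back with the frame */
> > > +};
> > > +\end{lstlisting}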
> > > +
> > > +Once the device consumes the bitstream passed by the
> > > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request, it sends
> > > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER with the following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_bitstream_buffer {
> > > +        le32 handle;
> > > +};
> > > +\end{lstlisting}
> > > +\begin{description}
> > > +\item[handle] is a resource handle of the bitstream buffer that the
> > > +  device completed processing.
> > > +\end{description}
> > > +
> > > +The driver can reuse a bitstream buffer once
> > > +VIRTIO_VDEC_HOST_REQ_BITSTREAM_BUFFER has been sent for it.
> > > +
> > > +\drivernormative{\paragraph}{Bitstream Buffer Requests}{Device Types /
> > > +  Video Decode Device / Device Operation / Bitstream Buffer Requests}
> > > +
> > > +\begin{itemize*}
> > > +\item The driver MUST set \field{handle} to a value that has already
> > > +  been registered via VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Stream Information}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Stream Information}
> > > +
> > > +To obtain the information required to allocate frame buffers
> > > +(resolution, etc.), the driver registers bitstream buffers first and
> > > +proceeds with queuing VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER requests
> > > +until stream information is received via a
> > > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO host request. The
> > > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO is also sent when a run-time
> > > +resolution change is detected. This request is similar to
> > > +V4L2_EVENT_SOURCE_CHANGE events for V4L2 devices.
> > > +
> > > +The following structure is used for the
> > > +VIRTIO_VDEC_HOST_REQ_STREAM_INFO:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_stream_info {
> > > +        le32 frame_fourcc;
> > > +        le32 fb_width;
> > > +        le32 fb_height;
> > > +        le32 min_frame_buffer_count;
> > > +        le32 max_frame_buffer_count;
> > > +        struct virtio_vdec_host_req_stream_crop {
> > > +                le32 left;
> > > +                le32 top;
> > > +                le32 width;
> > > +                le32 height;
> > > +        } crop;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[frame_fourcc] indicates the pixel format that the video is
> > > +  decoded to.
> > > +\item[fb_width] is the width of a required frame buffer.
> > > +\item[fb_height] is the height of a required frame buffer.
> > > +\item[min_frame_buffer_count] is the minimum number of frame buffers
> > > +  that the device requires.
> > > +\item[max_frame_buffer_count] is the maximum number of frame buffers
> > > +  that the device can accept.
> > > +\item[crop] indicates video cropping information.
> > > +    \begin{description}
> > > +    \item[left] is the horizontal offset of the left-most valid pixel
> > > +      of the video in pixels.
> > > +    \item[top] is the vertical offset of the top-most valid line of
> > > +      the video in lines.
> > > +    \item[width] is the width of the rectangle in pixels.
> > > +    \item[height] is the height of the rectangle in pixels.
> > > +    \end{description}
> > > +\end{description}
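> > > +
> > > +Informatively, a driver would pick a frame buffer count within the
> > > +advertised bounds; allocating a couple of buffers beyond the minimum
> > > +for its own queueing is a common, purely illustrative choice
> > > +(\field{info} is a hypothetical pointer to the received stream
> > > +information):
> > > +
> > > +\begin{lstlisting}
> > > +u32 min_bufs = le32_to_cpu(info->min_frame_buffer_count);
> > > +u32 max_bufs = le32_to_cpu(info->max_frame_buffer_count);
> > > +
> > > +/* Use two spare buffers if the device allows it. */
> > > +u32 count = (min_bufs + 2 <= max_bufs) ? min_bufs + 2 : max_bufs;
> > > +\end{lstlisting}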
> > > +
> > > +The driver responds with VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO once
> > > +it has processed the incoming stream information.
> > > +
> > > +\devicenormative{\paragraph}{Stream Information}{Device Types / Video
> > > +  Decode Device / Device Operation / Stream Information}
> > > +
> > > +\begin{itemize*}
> > > +\item After the device sends the VIRTIO_VDEC_HOST_REQ_STREAM_INFO, the
> > > +  device MUST empty its list of registered frame buffers and skip any
> > > +  pending VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER requests until a
> > > +  VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO request is received.
> > > +\end{itemize*}
> > > +
> > > +\drivernormative{\paragraph}{Stream Information}{Device Types / Video
> > > +  Decode Device / Device Operation / Stream Information}
> > > +
> > > +\begin{itemize*}
> > > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_ACK_STREAM_INFO
> > > +  once it processes a VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > > +\item If a VIRTIO_VDEC_HOST_REQ_STREAM_INFO arrives while the driver
> > > +  is waiting for a VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER request, the
> > > +  driver MUST enqueue the corresponding
> > > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER again after sending the
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for a frame buffer that
> > > +  matches the new stream information.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Frame Buffer Requests}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Frame Buffer Requests}
> > > +
> > > +The driver provides frame buffers via
> > > +VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER, into which the device writes
> > > +decoded frames. This request is similar to V4L2's VIDIOC_QBUF ioctl
> > > +for CAPTURE buffers. The following structure is used for the request:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_guest_req_frame_buffer {
> > > +        struct virtio_vdec_frame_buffer_plane {
> > > +                le32 handle;
> > > +                le32 offset;
> > > +                le32 stride;
> > > +                le32 length;
> > > +        } planes[VIRTIO_VDEC_MAX_PLANES];
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[planes] is an array of planes that a frame buffer consists of.
> > > +    \begin{description}
> > > +    \item[handle] is a resource handle to indicate the buffer for a
> > > +      plane.
> > > +    \item[offset] is the offset in bytes to the location where the
> > > +      decoded frame data is written.
> > > +    \item[stride] is the line stride in bytes.
> > > +    \item[length] is the size in bytes of the plane area that the
> > > +      device can fill with decoded data.
> > > +    \end{description}
> > > +\end{description}
> > > +
> > > +When the device fills a buffer with decoded frame data, the device
> > > +notifies the driver with VIRTIO_VDEC_HOST_REQ_FRAME_BUFFER using the
> > > +following structure:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req_frame_buffer {
> > > +        le32 handles[VIRTIO_VDEC_MAX_PLANES];
> > > +        le64 cookie;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[handles] is an array of handles passed via the
> > > +  \field{virtio_vdec_guest_req_frame_buffer} in the corresponding
> > > +  VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER.
> > > +\item[cookie] is the \field{cookie} passed via the
> > > +  VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER that carried the bitstream
> > > +  data this frame was decoded from.
> > > +\end{description}
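> > > +
> > > +A driver might use the echoed \field{cookie} to match a decoded
> > > +frame back to the submitted bitstream data, e.g. to recover a
> > > +timestamp (non-normative sketch; the delivery helper is
> > > +hypothetical):
> > > +
> > > +\begin{lstlisting}
> > > +static void on_frame_decoded(const struct virtio_vdec_host_req_frame_buffer *fb)
> > > +{
> > > +        /* The cookie was set by the matching bitstream buffer request. */
> > > +        u64 pts = le64_to_cpu(fb->cookie);
> > > +
> > > +        deliver_frame(le32_to_cpu(fb->handles[0]), pts);
> > > +}
> > > +\end{lstlisting}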
> > > +
> > > +\drivernormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > > +
> > > +\begin{itemize*}
> > > +\item Before the driver sends the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER
> > > +  for a frame buffer, the driver MUST have sent the
> > > +  VIRTIO_VDEC_GUEST_REQ_REGISTER_BUFFER for the buffer.
> > > +\item The driver MUST send the VIRTIO_VDEC_GUEST_REQ_FRAME_BUFFER for
> > > +  a frame buffer that satisfies the restrictions provided by
> > > +  VIRTIO_VDEC_HOST_REQ_STREAM_INFO.
> > > +\end{itemize*}
> > > +
> > > +\devicenormative{\paragraph}{Frame Buffer Requests}{Device Types /
> > > +  Video Decode Device / Device Operation / Frame Buffer Requests}
> > > +
> > > +When the device has successfully processed a frame buffer, it MUST
> > > +set \field{cookie} to the value passed via the
> > > +VIRTIO_VDEC_GUEST_REQ_BITSTREAM_BUFFER request that provided the
> > > +corresponding bitstream.
> > > +
> > > +\subsubsection{Drain}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Drain}
> > > +
> > > +To ensure that any pending requests are processed and decoded frames
> > > +returned in appropriate replies, the driver MAY issue a
> > > +VIRTIO_VDEC_GUEST_REQ_DRAIN request. The device will continue
> > > +processing the requests as usual, but once all the requests of the
> > > +given context preceding the DRAIN request are processed, the device will
> > > +issue a VIRTIO_VDEC_HOST_REQ_DRAINED host request to notify the driver
> > > +of this fact.
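> > > +
> > > +A driver-side drain thus reduces to a simple request/response pair
> > > +(informative sketch; the send/wait helpers are hypothetical):
> > > +
> > > +\begin{lstlisting}
> > > +send_guest_req(VIRTIO_VDEC_GUEST_REQ_DRAIN, instance_id);
> > > +wait_for_host_req(VIRTIO_VDEC_HOST_REQ_DRAINED, instance_id);
> > > +/* All guest requests queued before the drain are now processed. */
> > > +\end{lstlisting}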
> > > +
> > > +\devicenormative{\paragraph}{Drain}{Device Types / Video Decode Device
> > > +  / Device Operation / Drain}
> > > +
> > > +\begin{itemize*}
> > > +\item When a VIRTIO_VDEC_GUEST_REQ_DRAIN request arrives, the device
> > > +  MUST process all pending guest requests. After processing them, the
> > > +  device MUST send VIRTIO_VDEC_HOST_REQ_DRAINED to the driver.
> > > +\item While processing a drain request, the device MUST ignore any
> > > +  guest requests other than Flush and Close requests. See
> > > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > > +  and
> > > +  \ref{sec:Device Types / Video Decode Device / Device Operation / Instance
> > > +    Close}.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{Flush}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Flush}
> > > +
> > > +To drop any pending decode requests and decoded frames held in
> > > +output buffers, the driver MAY issue a VIRTIO_VDEC_GUEST_REQ_FLUSH
> > > +request. Any frames that would result from the previously enqueued
> > > +bitstream data will be dropped and the device will not write to any of
> > > +the frame buffers provided earlier.
> > > +
> > > +\devicenormative{\paragraph}{Flush}{Device Types / Video Decode Device
> > > +  / Device Operation / Flush}
> > > +
> > > +\begin{itemize*}
> > > +\item The device MUST send VIRTIO_VDEC_HOST_REQ_FLUSHED when it
> > > +  finishes the flush process.
> > > +\item If a VIRTIO_VDEC_GUEST_REQ_FLUSH arrives while guest requests
> > > +  are being processed, the device MUST stop processing them.
> > > +\item The device MUST NOT process any guest request that comes after
> > > +  VIRTIO_VDEC_GUEST_REQ_FLUSH until it sends
> > > +  VIRTIO_VDEC_HOST_REQ_FLUSHED.
> > > +\item The device MUST dequeue all unread requests in outq when a
> > > +  VIRTIO_VDEC_GUEST_REQ_FLUSH arrives.
> > > +\end{itemize*}
> > > +
> > > +\subsubsection{EOS Notification}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / EOS Notification}
> > > +
> > > +The device sends VIRTIO_VDEC_HOST_REQ_EOS when the end of a stream
> > > +(EOS) is reached while decoding.
> > > +
> > > +This request is similar to the V4L2_EVENT_EOS for V4L2 devices.
> > > +
> > > +\subsubsection{Instance Close}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Instance Close}
> > > +
> > > +If a decoding instance is no longer needed, the driver SHOULD issue a
> > > +VIRTIO_VDEC_GUEST_REQ_CLOSE request. This will trigger an implicit
> > > +flush operation (as in a VIRTIO_VDEC_GUEST_REQ_FLUSH request; without
> > > +a following host request) and free the instance ID and any associated
> > > +resources on the device side.
> > > +
> > > +The instance is successfully terminated once the device responds to
> > > +the request. After that, it is guaranteed that any buffers registered
> > > +by the driver to the given instance will no longer be accessed by the
> > > +device (unless also provided to another instance).
> > > +
> > > +\subsubsection{Error Reporting}
> > > +\label{sec:Device Types / Video Decode Device / Device Operation / Error Reporting}
> > > +
> > > +The device has two ways of reporting errors to the driver: one is for
> > > +errors associated with a host request, the other is for global errors.
> > > +
> > > +If an error is associated with a host request, the device sets
> > > +\field{type} in \field{virtio_vdec_host_req} to the type of that host
> > > +request and \field{result} to a value of
> > > +\field{enum virtio_vdec_host_req_result} describing the error.
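> > > +
> > > +For instance, a failed open could be reported as follows (the
> > > +concrete error code name is hypothetical; any value other than
> > > +VIRTIO_VDEC_HOST_REQ_RESULT_SUCCESS signals failure):
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_vdec_host_req err = {
> > > +        .type        = cpu_to_le32(VIRTIO_VDEC_HOST_REQ_OPEN),
> > > +        .result      = cpu_to_le32(VIRTIO_VDEC_HOST_REQ_RESULT_ERROR),
> > > +        .instance_id = cpu_to_le32(instance_id),
> > > +};
> > > +\end{lstlisting}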
> > > +
> > > +If the device cannot proceed with decoding due to an error that is
> > > +not associated with any request, the device MUST send
> > > +VIRTIO_VDEC_HOST_REQ_NOTIFY_GLOBAL_ERROR. In that case, the driver
> > > +MUST stop sending decoding requests.
> > > --
> > > 2.23.0.237.gc6a4ce50a0-goog
> > >
> >


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-09-19  9:34 ` [virtio-dev] " Keiichi Watanabe
@ 2019-09-23  8:56   ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-09-23  8:56 UTC (permalink / raw)
  To: Keiichi Watanabe
  Cc: virtio-dev, acourbot, alexlau, dgreid, marcheu, posciak,
	stevensd, tfiga, hverkuil, linux-media

  Hi,

> Our prototype implementation uses [4], which allows the virtio-vdec
> device to use buffers allocated by virtio-gpu device.

> [4] https://lkml.org/lkml/2019/9/12/157

Well.  I think before even discussing the protocol details we need a
reasonable plan for buffer handling.  I think using virtio-gpu buffers
should be an optional optimization and not a requirement.  Also the
motivation for that should be clear (Let the host decoder write directly
to virtio-gpu resources, to display video without copying around the
decoded framebuffers from one device to another).

Referencing virtio-gpu buffers needs a better plan than just re-using
virtio-gpu resource handles.  The handles are device-specific.  What if
there are multiple virtio-gpu devices present in the guest?

I think we need a framework for cross-device buffer sharing.  One
possible option would be to have some kind of buffer registry, where
buffers can be registered for cross-device sharing and get a unique
id (a uuid maybe?).  Drivers would typically register buffers on
dma-buf export.

Another option would be to pass around both buffer handle and buffer
owner, i.e. instead of "u32 handle" have something like this:

struct buffer_reference {
	enum device_type; /* pci, virtio-mmio, ... */
	union device_address {
		struct pci_address pci_addr;
		u64 virtio_mmio_addr;
		[ ... ]
	};
	u64 device_buffer_handle; /* device-specific, virtio-gpu could use resource ids here */
};

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-09-23  8:56   ` Gerd Hoffmann
@ 2019-10-05  6:08     ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-05  6:08 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Keiichi Watanabe, virtio-dev, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, stevensd, Hans Verkuil,
	Linux Media Mailing List, Daniel Vetter

Hi Gerd,

On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > Our prototype implementation uses [4], which allows the virtio-vdec
> > device to use buffers allocated by virtio-gpu device.
>
> > [4] https://lkml.org/lkml/2019/9/12/157

First of all, thanks for taking a look at this RFC and for valuable
feedback. Sorry for the late reply.

For reference, Keiichi is working with me and David Stevens on
accelerated video support for virtual machines and integration with
other virtual devices, like virtio-gpu for rendering or our
currently-downstream virtio-wayland for display (I believe there is
ongoing work to solve this problem in upstream too).

>
> Well.  I think before even discussing the protocol details we need a
> reasonable plan for buffer handling.  I think using virtio-gpu buffers
> should be an optional optimization and not a requirement.  Also the
> motivation for that should be clear (Let the host decoder write directly
> to virtio-gpu resources, to display video without copying around the
> decoded framebuffers from one device to another).

Just to make sure we're on the same page, what would the buffers come
from if we don't use this optimization?

I can imagine a setup like this:
 1) host device allocates host memory appropriate for usage with host
video decoder,
 2) guest driver allocates arbitrary guest pages for storage
accessible to the guest software,
 3) guest userspace writes input for the decoder to guest pages,
 4) guest driver passes the list of pages for the input and output
buffers to the host device
 5) host device copies data from input guest pages to host buffer
 6) host device runs the decoding
 7) host device copies decoded frame to output guest pages
 8) guest userspace can access decoded frame from those pages; back to 3

Is that something you have in mind?

>
> Referencing virtio-gpu buffers needs a better plan than just re-using
> virtio-gpu resource handles.  The handles are device-specific.  What if
> there are multiple virtio-gpu devices present in the guest?
>
> I think we need a framework for cross-device buffer sharing.  One
> possible option would be to have some kind of buffer registry, where
> buffers can be registered for cross-device sharing and get a unique
> id (a uuid maybe?).  Drivers would typically register buffers on
> dma-buf export.

This approach could possibly let us handle this transparently to
importers, which would work for guest kernel subsystems that rely on
the ability to handle buffers like native memory (e.g. having a
sgtable or DMA address) for them.

How about allocating guest physical addresses for memory corresponding
to those buffers? On the virtio-gpu example, that could work like
this:
 - by default a virtio-gpu buffer has only a resource handle,
 - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
virtio-gpu device export the buffer to a host framework (inside the
VMM) that would allocate guest page addresses for it, which the
command would return in a response to the guest,
 - virtio-gpu driver could then create a regular DMA-buf object for
such memory, because it's just backed by pages (even though they may
not be accessible to the guest; just like in the case of TrustZone
memory protection on bare metal systems),
 - any consumer would be able to handle such buffer like a regular
guest memory, passing low-level scatter-gather tables to the host as
buffer descriptors - this would nicely integrate with the basic case
without buffer sharing, as described above.

Another interesting side effect of the above approach would be the
ease of integration with virtio-iommu. If the virtio master device is
put behind a virtio-iommu, the guest page addresses become the input
to iommu page tables and IOVA addresses go to the host via the virtio
master device protocol, inside the low-level scatter-gather tables.

What do you think?

Best regards,
Tomasz

>
> Another option would be to pass around both buffer handle and buffer
> owner, i.e. instead of "u32 handle" have something like this:
>
> struct buffer_reference {
>         enum device_type; /* pci, virtio-mmio, ... */
>         union device_address {
>                 struct pci_address pci_addr;
>                 u64 virtio_mmio_addr;
>                 [ ... ]
>         };
>         u64 device_buffer_handle; /* device-specific, virtio-gpu could use resource ids here */
> };
>
> cheers,
>   Gerd
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-05  6:08     ` Tomasz Figa
@ 2019-10-07 14:00       ` Dmitry Morozov
  -1 siblings, 0 replies; 59+ messages in thread
From: Dmitry Morozov @ 2019-10-07 14:00 UTC (permalink / raw)
  To: virtio-dev
  Cc: Tomasz Figa, Gerd Hoffmann, Keiichi Watanabe, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak, stevensd,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

Hello,

We at OpenSynergy are also working on an abstract paravirtualized video 
streaming device that operates input and/or output data buffers and can be used 
as a generic video decoder/encoder/input/output device.

We would be glad to share our thoughts and contribute to the discussion. 
Please see some comments regarding buffer allocation inline.

Best regards,
Dmitry.

On Saturday, October 5, 2019 08:08:12 CEST Tomasz Figa wrote:
> Hi Gerd,
> 
> On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >   Hi,
> >   
> > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > device to use buffers allocated by virtio-gpu device.
> > > 
> > > [4] https://lkml.org/lkml/2019/9/12/157
> 
> First of all, thanks for taking a look at this RFC and for valuable
> feedback. Sorry for the late reply.
> 
> For reference, Keiichi is working with me and David Stevens on
> accelerated video support for virtual machines and integration with
> other virtual devices, like virtio-gpu for rendering or our
> currently-downstream virtio-wayland for display (I believe there is
> ongoing work to solve this problem in upstream too).
> 
> > Well.  I think before even discussing the protocol details we need a
> > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > should be an optional optimization and not a requirement.  Also the
> > motivation for that should be clear (Let the host decoder write directly
> > to virtio-gpu resources, to display video without copying around the
> > decoded framebuffers from one device to another).
> 
> Just to make sure we're on the same page, what would the buffers come
> from if we don't use this optimization?
> 
> I can imagine a setup like this;
>  1) host device allocates host memory appropriate for usage with host
> video decoder,
>  2) guest driver allocates arbitrary guest pages for storage
> accessible to the guest software,
>  3) guest userspace writes input for the decoder to guest pages,
>  4) guest driver passes the list of pages for the input and output
> buffers to the host device
>  5) host device copies data from input guest pages to host buffer
>  6) host device runs the decoding
>  7) host device copies decoded frame to output guest pages
>  8) guest userspace can access decoded frame from those pages; back to 3
> 
> Is that something you have in mind?
While GPU side allocations can be useful (especially in case of decoder), it 
could be more practical to stick to driver side allocations. This is also due 
to the fact that paravirtualized encoders and cameras do not necessarily 
require a GPU device.

Also, the v4l2 framework already features convenient helpers for CMA and SG 
allocations. The buffers can be used in the same manner as in virtio-gpu: 
buffers are first attached to an already allocated buffer/resource descriptor and 
then are made available for processing by the device using a dedicated command 
from the driver.
> 
> > Referencing virtio-gpu buffers needs a better plan than just re-using
> > virtio-gpu resource handles.  The handles are device-specific.  What if
> > there are multiple virtio-gpu devices present in the guest?
> > 
> > I think we need a framework for cross-device buffer sharing.  One
> > possible option would be to have some kind of buffer registry, where
> > buffers can be registered for cross-device sharing and get a unique
> > id (a uuid maybe?).  Drivers would typically register buffers on
> > dma-buf export.
> 
> This approach could possibly let us handle this transparently to
> importers, which would work for guest kernel subsystems that rely on
> the ability to handle buffers like native memory (e.g. having a
> sgtable or DMA address) for them.
> 
> How about allocating guest physical addresses for memory corresponding
> to those buffers? On the virtio-gpu example, that could work like
> this:
>  - by default a virtio-gpu buffer has only a resource handle,
>  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> virtio-gpu device export the buffer to a host framework (inside the
> VMM) that would allocate guest page addresses for it, which the
> command would return in a response to the guest,
>  - virtio-gpu driver could then create a regular DMA-buf object for
> such memory, because it's just backed by pages (even though they may
> not be accessible to the guest; just like in the case of TrustZone
> memory protection on bare metal systems),
>  - any consumer would be able to handle such buffer like a regular
> guest memory, passing low-level scatter-gather tables to the host as
> buffer descriptors - this would nicely integrate with the basic case
> without buffer sharing, as described above.
> 
> Another interesting side effect of the above approach would be the
> ease of integration with virtio-iommu. If the virtio master device is
> put behind a virtio-iommu, the guest page addresses become the input
> to iommu page tables and IOVA addresses go to the host via the virtio
> master device protocol, inside the low-level scatter-gather tables.
> 
> What do you think?
> 
> Best regards,
> Tomasz
> 
> > Another option would be to pass around both buffer handle and buffer
> > owner, i.e. instead of "u32 handle" have something like this:
> > 
> > struct buffer_reference {
> > 
> >         enum device_type; /* pci, virtio-mmio, ... */
> >         union device_address {
> >         
> >                 struct pci_address pci_addr;
> >                 u64 virtio_mmio_addr;
> >                 [ ... ]
> >         
> >         };
> >         u64 device_buffer_handle; /* device-specific, virtio-gpu could use
> >         resource ids here */> 
> > };
> > 
> > cheers,
> > 
> >   Gerd
> 



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-07 14:00       ` Dmitry Morozov
@ 2019-10-07 14:14         ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-07 14:14 UTC (permalink / raw)
  To: Dmitry Morozov
  Cc: virtio-dev, Gerd Hoffmann, Keiichi Watanabe, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak, stevensd,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

Hi Dmitry,

On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
<dmitry.morozov@opensynergy.com> wrote:
>
> Hello,
>
> We at OpenSynergy are also working on an abstract paravirtualized video
> streaming device that operates input and/or output data buffers and can be used
> as a generic video decoder/encoder/input/output device.
>
> We would be glad to share our thoughts and contribute to the discussion.
> Please see some comments regarding buffer allocation inline.
>
> Best regards,
> Dmitry.
>
> On Saturday, October 5, 2019 08:08:12 CEST Tomasz Figa wrote:
> > Hi Gerd,
> >
> > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >   Hi,
> > >
> > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > device to use buffers allocated by virtio-gpu device.
> > > >
> > > > [4] https://lkml.org/lkml/2019/9/12/157
> >
> > First of all, thanks for taking a look at this RFC and for valuable
> > feedback. Sorry for the late reply.
> >
> > For reference, Keiichi is working with me and David Stevens on
> > accelerated video support for virtual machines and integration with
> > other virtual devices, like virtio-gpu for rendering or our
> > currently-downstream virtio-wayland for display (I believe there is
> > ongoing work to solve this problem in upstream too).
> >
> > > Well.  I think before even discussing the protocol details we need a
> > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > should be an optional optimization and not a requirement.  Also the
> > > motivation for that should be clear (Let the host decoder write directly
> > > to virtio-gpu resources, to display video without copying around the
> > > decoded framebuffers from one device to another).
> >
> > Just to make sure we're on the same page, what would the buffers come
> > from if we don't use this optimization?
> >
> > I can imagine a setup like this;
> >  1) host device allocates host memory appropriate for usage with host
> > video decoder,
> >  2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> >  3) guest userspace writes input for the decoder to guest pages,
> >  4) guest driver passes the list of pages for the input and output
> > buffers to the host device
> >  5) host device copies data from input guest pages to host buffer
> >  6) host device runs the decoding
> >  7) host device copies decoded frame to output guest pages
> >  8) guest userspace can access decoded frame from those pages; back to 3
> >
> > Is that something you have in mind?
> While GPU side allocations can be useful (especially in case of decoder), it
> could be more practical to stick to driver side allocations. This is also due
> to the fact that paravirtualized encoders and cameras do not necessarily
> require a GPU device.
>
> Also, the v4l2 framework already features convenient helpers for CMA and SG
> allocations. The buffers can be used in the same manner as in virtio-gpu:
> buffers are first attached to an already allocated buffer/resource descriptor and
> then are made available for processing by the device using a dedicated command
> from the driver.

First of all, thanks a lot for your input. This is a relatively new
area of virtualization and we definitely need to collect various
possible perspectives in the discussion.

From the Chrome OS point of view, there are several aspects for which the
guest side allocation doesn't really work well:
1) host-side hardware has a lot of specific low level allocation
requirements, like alignments, paddings, address space limitations and
so on, which is not something that can be (easily) taught to the guest
OS,
2) allocation system is designed to be centralized, like Android
gralloc, because there is almost never a case when a buffer is to be
used only with 1 specific device. 99% of the cases are pipelines like
decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
encoder and so on, which means that allocations need to take into
account multiple hardware constraints.
3) protected content decoding: the memory for decoded video frames
must not be accessible to the guest at all

That said, the common desktop Linux model bases on allocating from the
producer device (which is why videobuf2 has allocation capability) and
we definitely need to consider this model, even if we just think about
Linux V4L2 compliance. That's why I'm suggesting the unified memory
handling based on guest physical addresses, which would handle both
guest-allocated and host-allocated memory.

Best regards,
Tomasz

> >
> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing.  One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > dma-buf export.
> >
> > This approach could possibly let us handle this transparently to
> > importers, which would work for guest kernel subsystems that rely on
> > the ability to handle buffers like native memory (e.g. having a
> > sgtable or DMA address) for them.
> >
> > How about allocating guest physical addresses for memory corresponding
> > to those buffers? On the virtio-gpu example, that could work like
> > this:
> >  - by default a virtio-gpu buffer has only a resource handle,
> >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > virtio-gpu device export the buffer to a host framework (inside the
> > VMM) that would allocate guest page addresses for it, which the
> > command would return in a response to the guest,
> >  - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
> >  - any consumer would be able to handle such buffer like a regular
> > guest memory, passing low-level scatter-gather tables to the host as
> > buffer descriptors - this would nicely integrate with the basic case
> > without buffer sharing, as described above.
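
For illustration, the proposed (not yet specified) VIRTIO_GPU_RESOURCE_EXPORT
exchange could carry something like the following; the layout is a sketch
only:

struct virtio_gpu_resource_export {
        /* struct virtio_gpu_ctrl_hdr hdr; */
        le32 resource_id;  /* buffer to expose at guest physical addresses */
        le32 padding;
};

struct virtio_gpu_resp_resource_export {
        /* struct virtio_gpu_ctrl_hdr hdr; */
        le32 num_entries;  /* guest-physical ranges now backing the resource,
                            * one { le64 addr; le32 length; le32 padding; }
                            * item each */
        le32 padding;
};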
> >
> > Another interesting side effect of the above approach would be the
> > ease of integration with virtio-iommu. If the virtio master device is
> > put behind a virtio-iommu, the guest page addresses become the input
> > to iommu page tables and IOVA addresses go to the host via the virtio
> > master device protocol, inside the low-level scatter-gather tables.
> >
> > What do you think?
> >
> > Best regards,
> > Tomasz
> >
> > > Another option would be to pass around both buffer handle and buffer
> > > owner, i.e. instead of "u32 handle" have something like this:
> > >
> > > struct buffer_reference {
> > >         enum device_type; /* pci, virtio-mmio, ... */
> > >         union device_address {
> > >                 struct pci_address pci_addr;
> > >                 u64 virtio_mmio_addr;
> > >                 [ ... ]
> > >         };
> > >         u64 device_buffer_handle; /* device-specific, virtio-gpu could use
> > >         resource ids here */
> > > };
> > >
> > > cheers,
> > >
> > >   Gerd
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-07 14:14         ` Tomasz Figa
@ 2019-10-07 15:09           ` Dmitry Morozov
  0 siblings, 0 replies; 59+ messages in thread
From: Dmitry Morozov @ 2019-10-07 15:09 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: virtio-dev, Gerd Hoffmann, Keiichi Watanabe, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak, stevensd,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

Hi Tomasz,

On Monday, 7 October 2019 16:14:13 CEST Tomasz Figa wrote:
> Hi Dmitry,
> 
> On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> <dmitry.morozov@opensynergy.com> wrote:
> > Hello,
> > 
> > We at OpenSynergy are also working on an abstract paravirtualized video
> > streaming device that operates input and/or output data buffers and can be
> > used as a generic video decoder/encoder/input/output device.
> > 
> > We would be glad to share our thoughts and contribute to the discussion.
> > Please see some comments regarding buffer allocation inline.
> > 
> > Best regards,
> > Dmitry.
> > 
On Saturday, 5 October 2019 08:08:12 CEST Tomasz Figa wrote:
> > > Hi Gerd,
> > > 
> > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > >   Hi,
> > > >   
> > > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > > device to use buffers allocated by virtio-gpu device.
> > > > > 
> > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > 
> > > First of all, thanks for taking a look at this RFC and for valuable
> > > feedback. Sorry for the late reply.
> > > 
> > > For reference, Keiichi is working with me and David Stevens on
> > > accelerated video support for virtual machines and integration with
> > > other virtual devices, like virtio-gpu for rendering or our
> > > currently-downstream virtio-wayland for display (I believe there is
> > > ongoing work to solve this problem in upstream too).
> > > 
> > > > Well.  I think before even discussing the protocol details we need a
> > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > > should be an optional optimization and not a requirement.  Also the
> > > > motivation for that should be clear (Let the host decoder write directly
> > > > to virtio-gpu resources, to display video without copying around the
> > > > decoded framebuffers from one device to another).
> > > 
> > > Just to make sure we're on the same page, what would the buffers come
> > > from if we don't use this optimization?
> > > 
> > > I can imagine a setup like this:
> > >
> > >  1) host device allocates host memory appropriate for usage with host
> > > video decoder,
> > >  2) guest driver allocates arbitrary guest pages for storage
> > > accessible to the guest software,
> > >  3) guest userspace writes input for the decoder to guest pages,
> > >  4) guest driver passes the list of pages for the input and output
> > > buffers to the host device
> > >  5) host device copies data from input guest pages to host buffer
> > >  6) host device runs the decoding
> > >  7) host device copies decoded frame to output guest pages
> > >  8) guest userspace can access decoded frame from those pages; back to 3
> > >
> > > Is that something you have in mind?
> > 
> > While GPU side allocations can be useful (especially in case of decoder),
> > it could be more practical to stick to driver side allocations. This is
> > also due to the fact that paravirtualized encoders and cameras do not
> > necessarily require a GPU device.
> >
> > Also, the v4l2 framework already features convenient helpers for CMA and
> > SG allocations. The buffers can be used in the same manner as in virtio-gpu:
> > buffers are first attached to an already allocated buffer/resource
> > descriptor and then are made available for processing by the device using
> > a dedicated command from the driver.
> 
> First of all, thanks a lot for your input. This is a relatively new
> area of virtualization and we definitely need to collect various
> possible perspectives in the discussion.
> 
> From Chrome OS point of view, there are several aspects for which the
> guest side allocation doesn't really work well:
> 1) host-side hardware has a lot of specific low level allocation
> requirements, like alignments, paddings, address space limitations and
> so on, which is not something that can be (easily) taught to the guest
> OS,
I couldn't agree more. There are some changes by Gerd to add support for
querying GPU buffer metadata. Probably those changes could be integrated with
'a framework for cross-device buffer sharing' (something that Gerd mentioned
earlier in the thread and that would totally make sense).

> 2) allocation system is designed to be centralized, like Android
> gralloc, because there is almost never a case when a buffer is to be
> used only with 1 specific device. 99% of the cases are pipelines like
> decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> encoder and so on, which means that allocations need to take into
> account multiple hardware constraints.
> 3) protected content decoding: the memory for decoded video frames
> must not be accessible to the guest at all
This looks like a valid use case. Would it also be possible for instance to 
allocate memory from a secure ION heap on the guest and then to provide the sgt
to the device? We don't necessarily need to map that sgt for guest access.
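
For illustration, a rough kernel-side sketch of that idea: import the
dma-buf and hand its sg_table to the device without ever mapping the pages
for guest CPU access. Error handling is omitted and the function itself is
hypothetical:

#include <linux/dma-buf.h>
#include <linux/scatterlist.h>

static int import_protected_buffer(struct device *dev, struct dma_buf *buf)
{
        struct dma_buf_attachment *att = dma_buf_attach(buf, dev);
        struct sg_table *sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);
        struct scatterlist *sg;
        int i;

        for_each_sg(sgt->sgl, sg, sgt->nents, i) {
                /* pass sg_dma_address(sg) / sg_dma_len(sg) to the device;
                 * no kmap/vmap here, so the guest CPU never touches the
                 * protected pages */
        }
        return 0;
}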

Best regards,
Dmitry.

> 
> That said, the common desktop Linux model is based on allocating from the
> producer device (which is why videobuf2 has allocation capability) and
> we definitely need to consider this model, even if we just think about
> Linux V4L2 compliance. That's why I'm suggesting the unified memory
> handling based on guest physical addresses, which would handle both
> guest-allocated and host-allocated memory.
> 
> Best regards,
> Tomasz
> 
> > > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > > there are multiple virtio-gpu devices present in the guest?
> > > > 
> > > > I think we need a framework for cross-device buffer sharing.  One
> > > > possible option would be to have some kind of buffer registry, where
> > > > buffers can be registered for cross-device sharing and get a unique
> > > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > > dma-buf export.
> > > 
> > > This approach could possibly let us handle this transparently to
> > > importers, which would work for guest kernel subsystems that rely on
> > > the ability to handle buffers like native memory (e.g. having a
> > > sgtable or DMA address) for them.
> > > 
> > > How about allocating guest physical addresses for memory corresponding
> > > to those buffers? On the virtio-gpu example, that could work like this:
> > >  - by default a virtio-gpu buffer has only a resource handle,
> > >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > > virtio-gpu device export the buffer to a host framework (inside the
> > > VMM) that would allocate guest page addresses for it, which the
> > > command would return in a response to the guest,
> > >  - virtio-gpu driver could then create a regular DMA-buf object for
> > > such memory, because it's just backed by pages (even though they may
> > > not be accessible to the guest; just like in the case of TrustZone
> > > memory protection on bare metal systems),
> > >  - any consumer would be able to handle such buffer like a regular
> > > guest memory, passing low-level scatter-gather tables to the host as
> > > buffer descriptors - this would nicely integrate with the basic case
> > > without buffer sharing, as described above.
> > > 
> > > Another interesting side effect of the above approach would be the
> > > ease of integration with virtio-iommu. If the virtio master device is
> > > put behind a virtio-iommu, the guest page addresses become the input
> > > to iommu page tables and IOVA addresses go to the host via the virtio
> > > master device protocol, inside the low-level scatter-gather tables.
> > > 
> > > What do you think?
> > > 
> > > Best regards,
> > > Tomasz
> > > 
> > > > Another option would be to pass around both buffer handle and buffer
> > > > owner, i.e. instead of "u32 handle" have something like this:
> > > > 
> > > > struct buffer_reference {
> > > >         enum device_type; /* pci, virtio-mmio, ... */
> > > >         union device_address {
> > > >                 struct pci_address pci_addr;
> > > >                 u64 virtio_mmio_addr;
> > > >                 [ ... ]
> > > >         };
> > > >         u64 device_buffer_handle; /* device-specific, virtio-gpu could
> > > >         use resource ids here */
> > > > };
> > > > 
> > > > cheers,
> > > > 
> > > >   Gerd
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
-- 

Dmitry Morozov
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone:    +49 30 60 98 54 0 - 910
Fax:      +49 30 60 98 54 0 - 99



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-07 15:09           ` Dmitry Morozov
@ 2019-10-09  3:55             ` Tomasz Figa
  0 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-09  3:55 UTC (permalink / raw)
  To: Dmitry Morozov, Gerd Hoffmann, stevensd
  Cc: virtio-dev, Keiichi Watanabe, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, Hans Verkuil,
	Linux Media Mailing List, Daniel Vetter

On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov
<dmitry.morozov@opensynergy.com> wrote:
>
> Hi Tomasz,
>
> On Monday, 7 October 2019 16:14:13 CEST Tomasz Figa wrote:
> > Hi Dmitry,
> >
> > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> > <dmitry.morozov@opensynergy.com> wrote:
> > > Hello,
> > >
> > > We at OpenSynergy are also working on an abstract paravirtualized video
> > > streaming device that operates input and/or output data buffers and can be
> > > used as a generic video decoder/encoder/input/output device.
> > >
> > > We would be glad to share our thoughts and contribute to the discussion.
> > > Please see some comments regarding buffer allocation inline.
> > >
> > > Best regards,
> > > Dmitry.
> > >
> > On Saturday, 5 October 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > Hi Gerd,
> > > >
> > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > >   Hi,
> > > > >
> > > > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > > > device to use buffers allocated by virtio-gpu device.
> > > > > >
> > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > >
> > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > feedback. Sorry for the late reply.
> > > >
> > > > For reference, Keiichi is working with me and David Stevens on
> > > > accelerated video support for virtual machines and integration with
> > > > other virtual devices, like virtio-gpu for rendering or our
> > > > currently-downstream virtio-wayland for display (I believe there is
> > > > ongoing work to solve this problem in upstream too).
> > > >
> > > > > Well.  I think before even discussing the protocol details we need a
> > > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > > > should be an optional optimization and not a requirement.  Also the
> > > > > motivation for that should be clear (Let the host decoder write directly
> > > > > to virtio-gpu resources, to display video without copying around the
> > > > > decoded framebuffers from one device to another).
> > > >
> > > > Just to make sure we're on the same page, what would the buffers come
> > > > from if we don't use this optimization?
> > > >
> > > > I can imagine a setup like this:
> > > >
> > > >  1) host device allocates host memory appropriate for usage with host
> > > > video decoder,
> > > >  2) guest driver allocates arbitrary guest pages for storage
> > > > accessible to the guest software,
> > > >  3) guest userspace writes input for the decoder to guest pages,
> > > >  4) guest driver passes the list of pages for the input and output
> > > > buffers to the host device
> > > >  5) host device copies data from input guest pages to host buffer
> > > >  6) host device runs the decoding
> > > >  7) host device copies decoded frame to output guest pages
> > > >  8) guest userspace can access decoded frame from those pages; back to 3
> > > >
> > > > Is that something you have in mind?
> > >
> > > While GPU side allocations can be useful (especially in case of decoder),
> > > it could be more practical to stick to driver side allocations. This is
> > > also due to the fact that paravirtualized encoders and cameras do not
> > > necessarily require a GPU device.
> > >
> > > Also, the v4l2 framework already features convenient helpers for CMA and
> > > SG allocations. The buffers can be used in the same manner as in virtio-gpu:
> > > buffers are first attached to an already allocated buffer/resource
> > > descriptor and then are made available for processing by the device using
> > > a dedicated command from the driver.
> >
> > First of all, thanks a lot for your input. This is a relatively new
> > area of virtualization and we definitely need to collect various
> > possible perspectives in the discussion.
> >
> > From Chrome OS point of view, there are several aspects for which the
> > guest side allocation doesn't really work well:
> > 1) host-side hardware has a lot of specific low level allocation
> > requirements, like alignments, paddings, address space limitations and
> > so on, which is not something that can be (easily) taught to the guest
> > OS,
> I couldn't agree more. There are some changes by Gerd to add support for
> querying GPU buffer metadata. Probably those changes could be integrated with
> 'a framework for cross-device buffer sharing' (something that Gerd mentioned
> earlier in the thread and that would totally make sense).
>

Did you mean one of Gerd's proposals?

I think we need some clarification there, as it's not clear to me
whether the framework is host-side, guest-side or both. The approach I
suggested would rely on a host-side framework and guest-side wouldn't
need any special handling for sharing, because the memory would behave
as on bare metal.

However, allocation would still need some special API to express high-level
buffer parameters and delegate the exact allocation requirements
to the host. Currently virtio-gpu already has such an interface and also
has a DRM driver, which were the two main reasons for us to use it as
the allocator in Chrome OS. (minigbm/cros_gralloc are implemented on
top of the Linux DRM API.)
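
For illustration, the kind of high-level parameters meant here, in the
spirit of virtio-gpu's resource-create commands: the guest states what the
buffer is for and the host derives alignment, padding and placement. The
fields are hypothetical:

struct video_buffer_create {
        le32 width;
        le32 height;
        le32 format;      /* fourcc, e.g. NV12 */
        le32 usage_mask;  /* e.g. decoder output | display scanout |
                           * encoder input, so the host can satisfy every
                           * device in the pipeline */
};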

> > 2) allocation system is designed to be centralized, like Android
> > gralloc, because there is almost never a case when a buffer is to be
> > used only with 1 specific device. 99% of the cases are pipelines like
> > decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> > encoder and so on, which means that allocations need to take into
> > account multiple hardware constraints.
> > 3) protected content decoding: the memory for decoded video frames
> > must not be accessible to the guest at all
> This looks like a valid use case. Would it also be possible for instance to
> allocate memory from a secure ION heap on the guest and then to provide the sgt
> to the device? We don't necessarily need to map that sgt for guest access.

Could you elaborate on how that would work? Would the secure ION heap
implementation use some protocol to request the allocation from the
host?

Another aspect is that on Chrome OS we don't support pre-reserved
carveout heaps, so we need this memory to be allocated by the host
dynamically.

>
> Best regards,
> Dmitry.
>
> >
> > That said, the common desktop Linux model is based on allocating from the
> > producer device (which is why videobuf2 has allocation capability) and
> > we definitely need to consider this model, even if we just think about
> > Linux V4L2 compliance. That's why I'm suggesting the unified memory
> > handling based on guest physical addresses, which would handle both
> > guest-allocated and host-allocated memory.
> >
> > Best regards,
> > Tomasz
> >
> > > > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > > > there are multiple virtio-gpu devices present in the guest?
> > > > >
> > > > > I think we need a framework for cross-device buffer sharing.  One
> > > > > possible option would be to have some kind of buffer registry, where
> > > > > buffers can be registered for cross-device sharing and get a unique
> > > > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > > > dma-buf export.
> > > >
> > > > This approach could possibly let us handle this transparently to
> > > > importers, which would work for guest kernel subsystems that rely on
> > > > the ability to handle buffers like native memory (e.g. having a
> > > > sgtable or DMA address) for them.
> > > >
> > > > How about allocating guest physical addresses for memory corresponding
> > > > to those buffers? On the virtio-gpu example, that could work like this:
> > > >  - by default a virtio-gpu buffer has only a resource handle,
> > > >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > > > virtio-gpu device export the buffer to a host framework (inside the
> > > > VMM) that would allocate guest page addresses for it, which the
> > > > command would return in a response to the guest,
> > > >  - virtio-gpu driver could then create a regular DMA-buf object for
> > > > such memory, because it's just backed by pages (even though they may
> > > > not be accessible to the guest; just like in the case of TrustZone
> > > > memory protection on bare metal systems),
> > > >  - any consumer would be able to handle such buffer like a regular
> > > > guest memory, passing low-level scatter-gather tables to the host as
> > > > buffer descriptors - this would nicely integrate with the basic case
> > > > without buffer sharing, as described above.
> > > >
> > > > Another interesting side effect of the above approach would be the
> > > > ease of integration with virtio-iommu. If the virtio master device is
> > > > put behind a virtio-iommu, the guest page addresses become the input
> > > > to iommu page tables and IOVA addresses go to the host via the virtio
> > > > master device protocol, inside the low-level scatter-gather tables.
> > > >
> > > > What do you think?
> > > >

I was recently thinking about emulating real devices, like a USB
camera (via an emulated USB host controller) and realized that this
approach would also make it possible for such hardware to share
buffers with virtio (or paravirtualized in general) devices in a
zero-copy manner, because the memory would be described as on a native
system, using a scatter-gather list of DMA addresses.

Best regards,
Tomasz

> > > > Best regards,
> > > > Tomasz
> > > >
> > > > > Another option would be to pass around both buffer handle and buffer
> > > > > owner, i.e. instead of "u32 handle" have something like this:
> > > > >
> > > > > struct buffer_reference {
> > > > >         enum device_type; /* pci, virtio-mmio, ... */
> > > > >         union device_address {
> > > > >                 struct pci_address pci_addr;
> > > > >                 u64 virtio_mmio_addr;
> > > > >                 [ ... ]
> > > > >         };
> > > > >         u64 device_buffer_handle; /* device-specific, virtio-gpu could
> > > > >         use resource ids here */
> > > > > };
> > > > >
> > > > > cheers,
> > > > >
> > > > >   Gerd
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> --
>
> Dmitry Morozov
> Senior Software Engineer
>
> OpenSynergy GmbH
> Rotherstr. 20, 10245 Berlin
>
> Phone:    +49 30 60 98 54 0 - 910
> Fax:      +49 30 60 98 54 0 - 99
>
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-09  3:55             ` Tomasz Figa
@ 2019-10-11  8:53               ` Dmitry Morozov
  0 siblings, 0 replies; 59+ messages in thread
From: Dmitry Morozov @ 2019-10-11  8:53 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Gerd Hoffmann, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

Hi Tomasz,

On Wednesday, 9 October 2019 05:55:45 CEST Tomasz Figa wrote:
> On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov
> <dmitry.morozov@opensynergy.com> wrote:
> > Hi Tomasz,
> > 
> > On Monday, 7 October 2019 16:14:13 CEST Tomasz Figa wrote:
> > > Hi Dmitry,
> > > 
> > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> > > <dmitry.morozov@opensynergy.com> wrote:
> > > > Hello,
> > > > 
> > > > We at OpenSynergy are also working on an abstract paravirtualized video
> > > > streaming device that operates input and/or output data buffers and can
> > > > be used as a generic video decoder/encoder/input/output device.
> > > > 
> > > > We would be glad to share our thoughts and contribute to the
> > > > discussion.
> > > > Please see some comments regarding buffer allocation inline.
> > > > 
> > > > Best regards,
> > > > Dmitry.
> > > > 
> > > > On Saturday, 5 October 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > > Hi Gerd,
> > > > > 
> > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > > >   Hi,
> > > > > >
> > > > > > > Our prototype implementation uses [4], which allows the virtio-vdec
> > > > > > > device to use buffers allocated by virtio-gpu device.
> > > > > > > 
> > > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > > > 
> > > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > > feedback. Sorry for the late reply.
> > > > > 
> > > > > For reference, Keiichi is working with me and David Stevens on
> > > > > accelerated video support for virtual machines and integration with
> > > > > other virtual devices, like virtio-gpu for rendering or our
> > > > > currently-downstream virtio-wayland for display (I believe there is
> > > > > ongoing work to solve this problem in upstream too).
> > > > > 
> > > > > > Well.  I think before even discussing the protocol details we need a
> > > > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > > > > should be an optional optimization and not a requirement.  Also the
> > > > > > motivation for that should be clear (Let the host decoder write directly
> > > > > > to virtio-gpu resources, to display video without copying around the
> > > > > > decoded framebuffers from one device to another).
> > > > > 
> > > > > Just to make sure we're on the same page, what would the buffers come
> > > > > from if we don't use this optimization?
> > > > >
> > > > > I can imagine a setup like this:
> > > > >
> > > > >  1) host device allocates host memory appropriate for usage with host
> > > > > video decoder,
> > > > >  2) guest driver allocates arbitrary guest pages for storage
> > > > > accessible to the guest software,
> > > > >  3) guest userspace writes input for the decoder to guest pages,
> > > > >  4) guest driver passes the list of pages for the input and output
> > > > > buffers to the host device
> > > > >  5) host device copies data from input guest pages to host buffer
> > > > >  6) host device runs the decoding
> > > > >  7) host device copies decoded frame to output guest pages
> > > > >  8) guest userspace can access decoded frame from those pages; back to 3
> > > > >
> > > > > Is that something you have in mind?
> > > > 
> > > > While GPU side allocations can be useful (especially in case of decoder),
> > > > it could be more practical to stick to driver side allocations. This is
> > > > also due to the fact that paravirtualized encoders and cameras do not
> > > > necessarily require a GPU device.
> > > >
> > > > Also, the v4l2 framework already features convenient helpers for CMA and
> > > > SG allocations. The buffers can be used in the same manner as in virtio-gpu:
> > > > buffers are first attached to an already allocated buffer/resource
> > > > descriptor and then are made available for processing by the device using
> > > > a dedicated command from the driver.
> > > 
> > > First of all, thanks a lot for your input. This is a relatively new
> > > area of virtualization and we definitely need to collect various
> > > possible perspectives in the discussion.
> > > 
> > > From the Chrome OS point of view, there are several aspects for which
> > > guest-side allocation doesn't really work well:
> > > 1) host-side hardware has a lot of specific low-level allocation
> > > requirements, like alignments, paddings, address space limitations and
> > > so on, which is not something that can be (easily) taught to the guest
> > > OS,
> > 
> > I couldn't agree more. There are some changes by Gerd to add support for
> > querying GPU buffer metadata. Probably those changes could be integrated
> > with 'a framework for cross-device buffer sharing' (something that Gerd
> > mentioned earlier in the thread and that would totally make sense).
> 
> Did you mean one of Gerd's proposals?
> 
> I think we need some clarification there, as it's not clear to me
> whether the framework is host-side, guest-side or both. The approach I
> suggested would rely on a host-side framework and guest-side wouldn't
> need any special handling for sharing, because the memory would behave
> as on bare metal.
> 
> However, allocation would still need some special API to express
> high-level buffer parameters and delegate the exact allocation
> requirements to the host. Currently virtio-gpu already has such an
> interface and also has a DRM driver, which were the two main reasons for
> us to use it as the allocator in Chrome OS. (minigbm/cros_gralloc are
> implemented on top of the Linux DRM API.)
>
Yes, it was about Gerd's proposals. To be honest, I was considering guest
allocations only. The operation flow in that case might look more or less
the same: the driver (GPU, codec/camera) first allocates a resource
descriptor on the host side. Then the driver uses the framework from above (so
support on both sides might be required) to request buffer metadata and does
allocations on the guest side accordingly. Finally, it attaches the backing
storage to the host resource.
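
For reference, virtio-gpu already models that last step with its
attach-backing command, where the guest hands the host a scatter list of
its own pages (layout as in the virtio-gpu spec, shown here only to
illustrate the flow):

  struct virtio_gpu_resource_attach_backing {
          struct virtio_gpu_ctrl_hdr hdr;  /* VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING */
          __le32 resource_id;
          __le32 nr_entries;               /* followed by nr_entries entries */
  };

  struct virtio_gpu_mem_entry {
          __le64 addr;                     /* guest-physical address of one chunk */
          __le32 length;
          __le32 padding;
  };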
> > > 2) allocation system is designed to be centralized, like Android
> > > gralloc, because there is almost never a case when a buffer is to be
> > > used only with 1 specific device. 99% of the cases are pipelines like
> > > decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> > > encoder and so on, which means that allocations need to take into
> > > account multiple hardware constraints.
> > > 3) protected content decoding: the memory for decoded video frames
> > > must not be accessible to the guest at all
> > 
> > This looks like a valid use case. Would it also be possible, for instance,
> > to allocate memory from a secure ION heap on the guest and then to provide
> > the sgt to the device? We don't necessarily need to map that sgt for guest
> > access.
> Could you elaborate on how that would work? Would the secure ION heap
> implementation use some protocol to request the allocation from the
> host?
> 
> Another aspect is that on Chrome OS we don't support pre-reserved
> carveout heaps, so we need this memory to be allocated by the host
> dynamically.
>
My take on this (for a decoder) would be to allocate memory for output buffers
from a secure ION heap, import it into the v4l2 driver, and then provide those
buffers to the device using virtio. The device side then uses the dma-buf
framework to make the buffers accessible to the hardware. I'm not sure about
that, it's just an idea.
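
In V4L2 terms the import half of that idea is just the standard DMABUF
memory type; a minimal sketch, assuming a single-planar CAPTURE queue and
a dmabuf_fd exported by the allocator, with all error handling omitted:

  /* needs <sys/ioctl.h> and <linux/videodev2.h> */
  struct v4l2_requestbuffers reqbufs = {
          .count  = 4,
          .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
          .memory = V4L2_MEMORY_DMABUF,
  };
  ioctl(video_fd, VIDIOC_REQBUFS, &reqbufs);

  struct v4l2_buffer buf = {
          .index  = 0,
          .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
          .memory = V4L2_MEMORY_DMABUF,
          .m.fd   = dmabuf_fd,  /* fd of the (secure) buffer being imported */
  };
  ioctl(video_fd, VIDIOC_QBUF, &buf);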
> > Best regards,
> > Dmitry.
> > 
> > > That said, the common desktop Linux model is based on allocating from
> > > the producer device (which is why videobuf2 has allocation capability) and
> > > we definitely need to consider this model, even if we just think about
> > > Linux V4L2 compliance. That's why I'm suggesting the unified memory
> > > handling based on guest physical addresses, which would handle both
> > > guest-allocated and host-allocated memory.
> > > 
> > > Best regards,
> > > Tomasz
> > > 
> > > > > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > > > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > > > > there are multiple virtio-gpu devices present in the guest?
> > > > > > 
> > > > > > I think we need a framework for cross-device buffer sharing.  One
> > > > > > possible option would be to have some kind of buffer registry, where
> > > > > > buffers can be registered for cross-device sharing and get a unique
> > > > > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > > > > dma-buf export.
> > > > > 
> > > > > This approach could possibly let us handle this transparently to
> > > > > importers, which would work for guest kernel subsystems that rely on
> > > > > the ability to handle buffers like native memory (e.g. having an
> > > > > sg-table or DMA address) for them.
> > > > > 
> > > > > How about allocating guest physical addresses for memory corresponding
> > > > > to those buffers? On the virtio-gpu example, that could work like this:
> > > > > 
> > > > >  - by default a virtio-gpu buffer has only a resource handle,
> > > > >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > > > >    virtio-gpu device export the buffer to a host framework (inside the
> > > > >    VMM) that would allocate guest page addresses for it, which the
> > > > >    command would return in a response to the guest,
> > > > >  - virtio-gpu driver could then create a regular DMA-buf object for
> > > > >    such memory, because it's just backed by pages (even though they may
> > > > >    not be accessible to the guest; just like in the case of TrustZone
> > > > >    memory protection on bare metal systems),
> > > > >  - any consumer would be able to handle such a buffer like regular
> > > > >    guest memory, passing low-level scatter-gather tables to the host as
> > > > >    buffer descriptors - this would nicely integrate with the basic case
> > > > >    without buffer sharing, as described above.
> > > > > 
> > > > > Another interesting side effect of the above approach would be the
> > > > > ease of integration with virtio-iommu. If the virtio master device is
> > > > > put behind a virtio-iommu, the guest page addresses become the input
> > > > > to iommu page tables and IOVA addresses go to the host via the virtio
> > > > > master device protocol, inside the low-level scatter-gather tables.
> 
> I was recently thinking about emulating real devices, like a USB
> camera (via an emulated USB host controller) and realized that this
> approach would also make it possible for such hardware to share
> buffers with virtio (or paravirtualized in general) devices in a
> zero-copy manner, because the memory would be described as on a native
> system, using a scatter-gather list of DMA addresses.
> 
> Best regards,
> Tomasz
> 
> > > > > Best regards,
> > > > > Tomasz
> > > > > 
> > > > > > Another option would be to pass around both buffer handle and
> > > > > > buffer owner, i.e. instead of "u32 handle" have something like this:
> > > > > > 
> > > > > > struct buffer_reference {
> > > > > >         enum device_type type;    /* pci, virtio-mmio, ... */
> > > > > >         union device_address {
> > > > > >                 struct pci_address pci_addr;
> > > > > >                 u64 virtio_mmio_addr;
> > > > > >                 [ ... ]
> > > > > >         } addr;
> > > > > >         u64 device_buffer_handle; /* device-specific, virtio-gpu
> > > > > >                                      could use resource ids here */
> > > > > > };
> > > > > > 
> > > > > > cheers,
> > > > > > 
> > > > > >   Gerd
> > 
> > --
> > 
> > Dmitry Morozov
> > Senior Software Engineer
> > 
> > OpenSynergy GmbH
> > Rotherstr. 20, 10245 Berlin
> > 
> > Phone:    +49 30 60 98 54 0 - 910
> > Fax:      +49 30 60 98 54 0 - 99
-- 

Dmitry Morozov
Senior Software Engineer

OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin

Phone:    +49 30 60 98 54 0 - 910
Fax:      +49 30 60 98 54 0 - 99



^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-05  6:08     ` Tomasz Figa
@ 2019-10-14 12:19       ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-14 12:19 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Keiichi Watanabe, virtio-dev, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, stevensd, Hans Verkuil,
	Linux Media Mailing List, Daniel Vetter

> > Well.  I think before even discussing the protocol details we need a
> > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > should be an optional optimization and not a requirement.  Also the
> > motivation for that should be clear (Let the host decoder write directly
> > to virtio-gpu resources, to display video without copying around the
> > decoded framebuffers from one device to another).
> 
> Just to make sure we're on the same page, what would the buffers come
> from if we don't use this optimization?
> 
> I can imagine a setup like this:
>  1) host device allocates host memory appropriate for usage with host
> video decoder,
>  2) guest driver allocates arbitrary guest pages for storage
> accessible to the guest software,
>  3) guest userspace writes input for the decoder to guest pages,
>  4) guest driver passes the list of pages for the input and output
> buffers to the host device
>  5) host device copies data from input guest pages to host buffer
>  6) host device runs the decoding
>  7) host device copies decoded frame to output guest pages
>  8) guest userspace can access decoded frame from those pages; back to 3
> 
> Is that something you have in mind?

I don't have any specific workflow in mind.

If you want to display the decoded video frames you want to use dma-bufs
shared by the video decoder and the gpu, right?  So the userspace
application (video player probably) would create the buffers using one of
the drivers, export them as dma-buf, then import them into the other
driver.  Just like you would do on physical hardware.  So, when using
virtio-gpu buffers:

  (1) guest app creates buffers using virtio-gpu.
  (2) guest app exports the virtio-gpu buffers as dma-bufs.
  (3) guest app imports the dma-bufs into virtio-vdec.
  (4) guest app asks the virtio-vdec driver to write the decoded
      frames into the dma-bufs.
  (5) guest app asks the virtio-gpu driver to display the decoded
      frame.

The guest video decoder driver passes the dma-buf pages to the host, and
it is the host driver's job to fill the buffer.  How exactly this is done
might depend on hardware capabilities (whether a host-allocated bounce
buffer is needed or whether the hardware can decode directly to the
dma-buf passed by the guest driver) and is an implementation detail.
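
As a sketch, step (2) on the guest side would be the usual PRIME export
(libdrm; drm_fd and bo_handle assumed to come from the virtio-gpu DRM
driver, no error handling):

  /* export the virtio-gpu buffer object as a dma-buf fd; the fd can then
   * be queued to the decoder as a V4L2_MEMORY_DMABUF buffer in step (3) */
  int prime_fd;
  drmPrimeHandleToFD(drm_fd, bo_handle, DRM_CLOEXEC | DRM_RDWR, &prime_fd);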

Now, with cross-device sharing added the virtio-gpu would attach some
kind of identifier to the dma-buf, virtio-vdec could fetch the
identifier and pass it to the host too, and the host virtio-vdec device
can use the identifier to get a host dma-buf handle for the (virtio-gpu)
buffer.  Ask the host video decoder driver to import the host dma-buf.
If it all worked fine it can ask the host hardware to decode directly to
the host virtio-gpu resource.

> > Referencing virtio-gpu buffers needs a better plan than just re-using
> > virtio-gpu resource handles.  The handles are device-specific.  What if
> > there are multiple virtio-gpu devices present in the guest?
> >
> > I think we need a framework for cross-device buffer sharing.  One
> > possible option would be to have some kind of buffer registry, where
> > buffers can be registered for cross-device sharing and get a unique
> > id (a uuid maybe?).  Drivers would typically register buffers on
> > dma-buf export.
> 
> This approach could possibly let us handle this transparently to
> importers, which would work for guest kernel subsystems that rely on
> the ability to handle buffers like native memory (e.g. having a
> sgtable or DMA address) for them.
> 
> How about allocating guest physical addresses for memory corresponding
> to those buffers? On the virtio-gpu example, that could work like
> this:
>  - by default a virtio-gpu buffer has only a resource handle,
>  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> virtio-gpu device export the buffer to a host framework (inside the
> VMM) that would allocate guest page addresses for it, which the
> command would return in a response to the guest,

Hmm, the cross-device buffer sharing framework I have in mind would
basically be a buffer registry.  virtio-gpu would create buffers as
usual, create an identifier somehow (details to be hashed out), and attach
the identifier to the dma-buf so it can be used as outlined above.
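
Purely as a sketch of what such a registry handshake could look like on
the virtio-gpu side (hypothetical command, names made up, not part of any
spec):

  /* hypothetical: attach a shareable 128-bit UUID to a resource; the
   * guest reads the UUID from the response and passes it to other
   * devices (e.g. virtio-vdec) to reference the same buffer */
  struct virtio_gpu_resource_assign_uuid {
          struct virtio_gpu_ctrl_hdr hdr;  /* e.g. VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID */
          __le32 resource_id;
          __le32 padding;
  };

  struct virtio_gpu_resp_resource_uuid {
          struct virtio_gpu_ctrl_hdr hdr;  /* e.g. VIRTIO_GPU_RESP_OK_RESOURCE_UUID */
          __u8 uuid[16];                   /* the cross-device identifier */
  };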

Also note that the guest manages the address space, so the host can't
simply allocate guest page addresses.  Mapping host virtio-gpu resources
into guest address space is planned, it'll most likely use a pci memory
bar to reserve some address space.  The host can map resources into that
pci bar, on guest request.

>  - virtio-gpu driver could then create a regular DMA-buf object for
> such memory, because it's just backed by pages (even though they may
> not be accessible to the guest; just like in the case of TrustZone
> memory protection on bare metal systems),

Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
able to look up buffers using device and dma address, without explicitly
creating some identifier.  Not investigated yet in detail.

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-11  8:53               ` Dmitry Morozov
@ 2019-10-14 12:34                 ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-14 12:34 UTC (permalink / raw)
  To: Dmitry Morozov
  Cc: Tomasz Figa, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

  Hi,

> My take on this (for a decoder) would be to allocate memory for output buffers 
> from a secure ION heap, import in the v4l2 driver, and then to provide those 
> to the device using virtio. The device side then uses the dmabuf framework to 
> make the buffers accessible for the hardware. I'm not sure about that, it's 
> just an idea.

Virtualization aside, how does the complete video decoding workflow
work?  I assume along the lines of ...

  (1) allocate buffer for decoded video frames (from ion).
  (2) export those buffers as dma-buf.
  (3) import dma-buf to video decoder.
  (4) import dma-buf to gpu.

... to establish buffers shared between video decoder and gpu?

Then feed the video stream into the decoder, which decodes into the ion
buffers?  Ask the gpu to scanout the ion buffers to show the video?

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-14 12:34                 ` Gerd Hoffmann
@ 2019-10-14 13:05                   ` Dmitry Morozov
  -1 siblings, 0 replies; 59+ messages in thread
From: Dmitry Morozov @ 2019-10-14 13:05 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Tomasz Figa, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter


On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
>   Hi,
> 
> > My take on this (for a decoder) would be to allocate memory for output
> > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > provide those to the device using virtio. The device side then uses the
> > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > sure about that, it's just an idea.
> 
> Virtualization aside, how does the complete video decoding workflow
> work?  I assume along the lines of ...
> 
>   (1) allocate buffer for decoded video frames (from ion).
>   (2) export those buffers as dma-buf.
>   (3) import dma-buf to video decoder.
>   (4) import dma-buf to gpu.
> 
> ... to establish buffers shared between video decoder and gpu?
> 
> Then feed the video stream into the decoder, which decodes into the ion
> buffers?  Ask the gpu to scanout the ion buffers to show the video?
> 
> cheers,
>   Gerd

Yes, exactly.

[decoder]
1) Input buffers are allocated using VIDIOC_*BUFS.
2) Output buffers are allocated in a guest-specific manner (ION, gbm).
3) Both input and output buffers are exported as dma-bufs (see the sketch
after this list).
4) The backing storage of both inputs and outputs is made available to the
device.
5) Decoder hardware writes to output buffers directly.
6) Back on the guest side, the output dma-bufs are used by (virtio-)gpu.
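
A minimal sketch of steps 1) and 3) for the input queue, using only
standard V4L2 uAPI (single-planar, video_fd assumed open, no error
handling):

  /* 1) allocate input (bitstream) buffers in the driver */
  struct v4l2_requestbuffers req = {
          .count  = 2,
          .type   = V4L2_BUF_TYPE_VIDEO_OUTPUT,
          .memory = V4L2_MEMORY_MMAP,
  };
  ioctl(video_fd, VIDIOC_REQBUFS, &req);

  /* 3) export buffer 0 as a dma-buf fd */
  struct v4l2_exportbuffer exp = {
          .type  = V4L2_BUF_TYPE_VIDEO_OUTPUT,
          .index = 0,
          .flags = O_CLOEXEC,
  };
  ioctl(video_fd, VIDIOC_EXPBUF, &exp);  /* exp.fd now holds the dma-buf */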

Best regards,
Dmitry 



^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-14 13:05                   ` Dmitry Morozov
@ 2019-10-15  7:54                     ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-15  7:54 UTC (permalink / raw)
  To: Dmitry Morozov
  Cc: Tomasz Figa, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> 
> On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > My take on this (for a decoder) would be to allocate memory for output
> > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > provide those to the device using virtio. The device side then uses the
> > > dmabuf framework to make the buffers accessible for the hardware. I'm not
> > > sure about that, it's just an idea.
> > 
> > Virtualization aside, how does the complete video decoding workflow
> > work?  I assume along the lines of ...
> > 
> >   (1) allocate buffer for decoded video frames (from ion).
> >   (2) export those buffers as dma-buf.
> >   (3) import dma-buf to video decoder.
> >   (4) import dma-buf to gpu.
> > 
> > ... to establish buffers shared between video decoder and gpu?
> > 
> > Then feed the video stream into the decoder, which decodes into the ion
> > buffers?  Ask the gpu to scanout the ion buffers to show the video?
> > 
> > cheers,
> >   Gerd
> 
> Yes, exactly.
> 
> [decoder]
> 1) Input buffers are allocated using  VIDIOC_*BUFS.

Ok.

> 2) Output buffers are allocated in a guest specific manner (ION, gbm).

Who decides whether ION or gbm is used?  The phrase "secure ION heap"
used above sounds like using ION is required for decoding drm-protected
content.

So, do we have to worry about ION here?  Or can we just use gbm?

[ Note: don't know much about ion, other than that it is used by
        android, is in staging right now and patches to move it
        out of staging are floating around @ dri-devel ]

> 3) Both input and output buffers are exported as dma-bufs.
> 4) The backing storage of both inputs and outputs is made available to the 
> device.
> 5) Decoder hardware writes to output buffers directly.

As expected.

> 6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.

Ok.  So, virtio-gpu has support for dma-buf exports (in drm-misc-next,
should land upstream in kernel 5.5).  dma-buf imports are not that
simple unfortunately.  When using the gbm allocation route dma-buf
exports are good enough though.

The virtio-gpu resources have both a host buffer and a guest buffer[1].
Data can be copied using the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST
ioctls.  The dma-buf export will export the guest buffer (which lives
in guest ram).
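
For illustration, such a transfer looks roughly like this from the guest
(virtgpu uAPI sketch; bo_handle, width and height assumed, no error
handling):

  /* pull the host-side resource contents into the guest pages backing
   * bo_handle; uAPI from include/uapi/drm/virtgpu_drm.h */
  struct drm_virtgpu_3d_transfer_from_host xfer = {
          .bo_handle = bo_handle,
          .box = { .x = 0, .y = 0, .z = 0, .w = width, .h = height, .d = 1 },
  };
  ioctl(drm_fd, DRM_IOCTL_VIRTGPU_TRANSFER_FROM_HOST, &xfer);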

It would make sense for the decoded video to go directly to the host
buffer though.  First because we want to avoid copying the video frames
for performance reasons, and second because we might not be able to copy
video frames (drm ...).

This is where the buffer registry idea comes in.  Attach a (host)
identifier to (guest) dma-bufs, which then allows host device emulations
to share buffers, i.e. virtio-vdec device emulation could decode to a
dma-buf it got from virtio-gpu device emulation.

Alternatively we could use virtual ION (or whatever it becomes after
de-staging) for buffer management, with both virtio-vdec and virtio-gpu
importing dma-bufs from virtual ION on both guest and host side.

cheers,
  Gerd

[1] support for shared buffers is in progress.

^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-15  7:54                     ` Gerd Hoffmann
@ 2019-10-15 14:06                       ` Dmitry Morozov
  -1 siblings, 0 replies; 59+ messages in thread
From: Dmitry Morozov @ 2019-10-15 14:06 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Tomasz Figa, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

Hello Gerd,

On Dienstag, 15. Oktober 2019 09:54:22 CEST Gerd Hoffmann wrote:
> On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> > On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > >   Hi,
> > >   
> > > > My take on this (for a decoder) would be to allocate memory for output
> > > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > > provide those to the device using virtio. The device side then uses the
> > > > dmabuf framework to make the buffers accessible for the hardware. I'm
> > > > not sure about that, it's just an idea.
> > > 
> > > Virtualization aside, how does the complete video decoding workflow
> > > work?  I assume along the lines of ...
> > > 
> > >   (1) allocate buffer for decoded video frames (from ion).
> > >   (2) export those buffers as dma-buf.
> > >   (3) import dma-buf to video decoder.
> > >   (4) import dma-buf to gpu.
> > > 
> > > ... to establish buffers shared between video decoder and gpu?
> > > 
> > > Then feed the video stream into the decoder, which decodes into the ion
> > > buffers?  Ask the gpu to scanout the ion buffers to show the video?
> > > 
> > > cheers,
> > > 
> > >   Gerd
> > 
> > Yes, exactly.
> > 
> > [decoder]
> > 1) Input buffers are allocated using  VIDIOC_*BUFS.
> 
> Ok.
> 
> > 2) Output buffers are allocated in a guest specific manner (ION, gbm).
> 
> Who decides whenever ION or gbm is used?  The phrase "secure ION heap"
> used above sounds like using ION is required for decoding drm-protected
> content.

I mention the secure ION heap to address this Chrome OS related point:
> 3) protected content decoding: the memory for decoded video frames
> must not be accessible to the guest at all

There was an RFC to implement a secure memory allocation framework, but 
apparently it was not accepted: https://lwn.net/Articles/661549/.

In the case of Android, GPU buffers are allocated for output frames, so it
is the gralloc implementation that decides how to allocate memory. It can
use a dedicated ION heap or libgbm, or even some proprietary
implementation.

> 
> So, do we have to worry about ION here?  Or can we just use gbm?

If we replace vendor-specific code in the Android guest and provide a way to
communicate metadata for buffer allocations from the device to the driver, we
can use gbm. In the PC world it might be easier.
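
For the gbm route, a sketch of allocating and exporting one output frame
(the format and usage flags here are placeholders; in practice they would
come from the communicated metadata):

  /* needs <gbm.h>; drm_fd is the render node of the allocating device */
  struct gbm_device *gbm = gbm_create_device(drm_fd);
  struct gbm_bo *bo = gbm_bo_create(gbm, width, height,
                                    GBM_FORMAT_NV12, GBM_BO_USE_LINEAR);
  int dmabuf_fd = gbm_bo_get_fd(bo);  /* dma-buf fd for the v4l2 driver */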

> 
> [ Note: don't know much about ion, other than that it is used by
>         android, is in staging right now and patches to move it
>         out of staging are floating around @ dri-devel ]
> 
> > 3) Both input and output buffers are exported as dma-bufs.
> > 4) The backing storage of both inputs and outputs is made available to the
> > device.
> > 5) Decoder hardware writes to output buffers directly.
> 
> As expected.
> 
> > 6) Back to the guest side, the output dma-bufs are used by (virtio-) gpu.
> 
> Ok.  So, virtio-gpu has support for dma-buf exports (in drm-misc-next,
> should land upstream in kernel 5.5).  dma-buf imports are not that
> simple unfortunately.  When using the gbm allocation route dma-buf
> exports are good enough though.
>
> The virtio-gpu resources have both a host buffer and a guest buffer[1]
> Data can be copied using the DRM_IOCTL_VIRTGPU_TRANSFER_{FROM,TO}_HOST
> ioctls.  The dma-buf export will export the guest buffer (which lives
> in guest ram).
> 
> It would make sense for the decoded video to go directly to the host
> buffer though.  First because we want avoid copying the video frames for
> performance reasons, and second because we might not be able to copy
> video frames (drm ...).
> 
> This is where the buffer registry idea comes in.  Attach a (host)
> identifier to (guest) dma-bufs, which then allows host device emulation
> share buffers, i.e. virtio-vdec device emulation could decode to a
> dma-buf it got from virtio-gpu device emulation.

Yes. Also, as I mentioned above, in the case of gbm the buffers can already
originate from the GPU.

Best regards,
Dmitry.

> 
> Alternatively we could use virtual ION (or whatever it becomes after
> de-staging) for buffer management, with both virtio-vdec and virtio-gpu
> importing dma-bufs from virtual ION on both guest and host side.
> 
> cheers,
>   Gerd
> 
> [1] support for shared buffers is in progress.




^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-11  8:53               ` Dmitry Morozov
@ 2019-10-17  6:40                 ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-17  6:40 UTC (permalink / raw)
  To: Dmitry Morozov
  Cc: Gerd Hoffmann, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

On Fri, Oct 11, 2019 at 5:54 PM Dmitry Morozov
<dmitry.morozov@opensynergy.com> wrote:
>
> Hi Tomasz,
>
> On Mittwoch, 9. Oktober 2019 05:55:45 CEST Tomasz Figa wrote:
> > On Tue, Oct 8, 2019 at 12:09 AM Dmitry Morozov
> >
> > <dmitry.morozov@opensynergy.com> wrote:
> > > Hi Tomasz,
> > >
> > > On Montag, 7. Oktober 2019 16:14:13 CEST Tomasz Figa wrote:
> > > > Hi Dmitry,
> > > >
> > > > On Mon, Oct 7, 2019 at 11:01 PM Dmitry Morozov
> > > >
> > > > <dmitry.morozov@opensynergy.com> wrote:
> > > > > Hello,
> > > > >
> > > > > We at OpenSynergy are also working on an abstract paravirtualized
> > > > > video
> > > > > streaming device that operates on input and/or output data buffers and
> > > > > can be
> > > > > used as a generic video decoder/encoder/input/output device.
> > > > >
> > > > > We would be glad to share our thoughts and contribute to the
> > > > > discussion.
> > > > > Please see some comments regarding buffer allocation inline.
> > > > >
> > > > > Best regards,
> > > > > Dmitry.
> > > > >
> > > > > On Samstag, 5. Oktober 2019 08:08:12 CEST Tomasz Figa wrote:
> > > > > > Hi Gerd,
> > > > > >
> > > > > > On Mon, Sep 23, 2019 at 5:56 PM Gerd Hoffmann <kraxel@redhat.com>
> wrote:
> > > > > > >   Hi,
> > > > > > >
> > > > > > > > Our prototype implementation uses [4], which allows the
> > > > > > > > virtio-vdec
> > > > > > > > device to use buffers allocated by virtio-gpu device.
> > > > > > > >
> > > > > > > > [4] https://lkml.org/lkml/2019/9/12/157
> > > > > >
> > > > > > First of all, thanks for taking a look at this RFC and for valuable
> > > > > > feedback. Sorry for the late reply.
> > > > > >
> > > > > > For reference, Keiichi is working with me and David Stevens on
> > > > > > accelerated video support for virtual machines and integration with
> > > > > > other virtual devices, like virtio-gpu for rendering or our
> > > > > > currently-downstream virtio-wayland for display (I believe there is
> > > > > > ongoing work to solve this problem in upstream too).
> > > > > >
> > > > > > > Well.  I think before even discussing the protocol details we need
> > > > > > > a
> > > > > > > reasonable plan for buffer handling.  I think using virtio-gpu
> > > > > > > buffers
> > > > > > > should be an optional optimization and not a requirement.  Also
> > > > > > > the
> > > > > > > motivation for that should be clear (Let the host decoder write
> > > > > > > directly
> > > > > > > to virtio-gpu resources, to display video without copying around
> > > > > > > the
> > > > > > > decoded framebuffers from one device to another).
> > > > > >
> > > > > > Just to make sure we're on the same page, what would the buffers
> > > > > > come
> > > > > > from if we don't use this optimization?
> > > > > >
> > > > > > I can imagine a setup like this;
> > > > > >
> > > > > >  1) host device allocates host memory appropriate for usage with
> > > > > >  host
> > > > > >
> > > > > > video decoder,
> > > > > >
> > > > > >  2) guest driver allocates arbitrary guest pages for storage
> > > > > >
> > > > > > accessible to the guest software,
> > > > > >
> > > > > >  3) guest userspace writes input for the decoder to guest pages,
> > > > > >  4) guest driver passes the list of pages for the input and output
> > > > > >
> > > > > > buffers to the host device
> > > > > >
> > > > > >  5) host device copies data from input guest pages to host buffer
> > > > > >  6) host device runs the decoding
> > > > > >  7) host device copies decoded frame to output guest pages
> > > > > >  8) guest userspace can access decoded frame from those pages; back
> > > > > >  to 3
> > > > > >
> > > > > > Is that something you have in mind?
> > > > >
> > > > > While GPU side allocations can be useful (especially in case of
> > > > > decoder),
> > > > > it could be more practical to stick to driver side allocations. This
> > > > > is
> > > > > also due to the fact that paravirtualized encoders and cameras do not
> > > > > necessarily require a GPU device.
> > > > >
> > > > > Also, the v4l2 framework already features convenient helpers for CMA
> > > > > and
> > > > > SG
> > > > > allocations. The buffers can be used in the same manner as in
> > > > > virtio-gpu:
> > > > > buffers are first attached to an already allocated buffer/resource
> > > > > descriptor and then are made available for processing by the device
> > > > > using
> > > > > a dedicated command from the driver.
> > > >
> > > > First of all, thanks a lot for your input. This is a relatively new
> > > > area of virtualization and we definitely need to collect various
> > > > possible perspectives in the discussion.
> > > >
> > > > From Chrome OS point of view, there are several aspects for which the
> > > > guest side allocation doesn't really work well:
> > > > 1) host-side hardware has a lot of specific low level allocation
> > > > requirements, like alignments, paddings, address space limitations and
> > > > so on, which is not something that can be (easily) taught to the guest
> > > > OS,
> > >
> > > I couldn't agree more. There are some changes by Greg to add support for
> > > querying GPU buffer metadata. Probably those changes could be integrated
> > > with 'a framework for cross-device buffer sharing' (something that Greg
> > > mentioned earlier in the thread and that would totally make sense).
> >
> > Did you mean one of Gerd's proposals?
> >
> > I think we need some clarification there, as it's not clear to me
> > whether the framework is host-side, guest-side or both. The approach I
> > suggested would rely on a host-side framework and guest-side wouldn't
> > need any special handling for sharing, because the memory would behave
> > as on bare metal.
> >
> > However allocation would still need some special API to express high
> > level buffer parameters and delegate the exact allocation requirements
> > to the host. Currently virtio-gpu already has such interface and also
> > has a DRM driver, which were the 2 main reasons for us to use it as
> > the allocator in Chrome OS. (minigbm/cros_gralloc are implemented on
> > top of the Linux DRM API)
> >
> Yes, it was about Gerd's proposals. To be honest, I was considering guest
> allocations only. The operation flow in that case might look more or less
> the same: the driver (GPU, Codec/Camera) first allocates a resource
> descriptor on the host side. Then the driver uses the framework from above (so
> support on both sides might be required) to request buffer metadata and does
> allocations on the guest side accordingly. Then it attaches backing storage to
> the host resource.
> > > > 2) allocation system is designed to be centralized, like Android
> > > > gralloc, because there is almost never a case when a buffer is to be
> > > > used only with 1 specific device. 99% of the cases are pipelines like
> > > > decoder -> GPU/display, camera -> encoder + GPU/display, GPU ->
> > > > encoder and so on, which means that allocations need to take into
> > > > account multiple hardware constraints.
> > > > 3) protected content decoding: the memory for decoded video frames
> > > > must not be accessible to the guest at all
> > >
> > > This looks like a valid use case. Would it also be possible for instance
> > > to
> > > allocate mem from a secure ION heap on the guest and then to provide the
> > > sgt to the device? We don't necessarily need to map that sgt for guest
> > > access.
> >
> > Could you elaborate on how that would work? Would the secure ION heap
> > implementation use some protocol to request the allocation from the
> > host?
> >
> > Another aspect is that on Chrome OS we don't support pre-reserved
> > carveout heaps, so we need this memory to be allocated by the host
> > dynamically.
> >
> My take on this (for a decoder) would be to allocate memory for output buffers
> from a secure ION heap, import in the v4l2 driver, and then to provide those
> to the device using virtio. The device side then uses the dmabuf framework to
> make the buffers accessible for the hardware. I'm not sure about that, it's
> just an idea.

Where is the secure ION heap implemented? On the host or on the guest?
If the latter, how is it ensured that it's really secure?

That said, Chrome OS would use a similar model, except that we don't
use ION. We would likely use minigbm backed by virtio-gpu to allocate
appropriate secure buffers for us and then import them to the V4L2
driver.
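
The import itself would use the standard V4L2 DMABUF memory type. A
minimal single-planar sketch (decoder CAPTURE queue assumed, error
handling simplified):

  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/videodev2.h>

  int import_dmabuf(int v4l2_fd, int dmabuf_fd, unsigned int index)
  {
          struct v4l2_requestbuffers reqbufs;
          struct v4l2_buffer buf;

          memset(&reqbufs, 0, sizeof(reqbufs));
          reqbufs.count = 4;
          reqbufs.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
          reqbufs.memory = V4L2_MEMORY_DMABUF;
          if (ioctl(v4l2_fd, VIDIOC_REQBUFS, &reqbufs) < 0)
                  return -1;

          memset(&buf, 0, sizeof(buf));
          buf.index = index;
          buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
          buf.memory = V4L2_MEMORY_DMABUF;
          buf.m.fd = dmabuf_fd;  /* fd from the allocator (minigbm etc.) */
          return ioctl(v4l2_fd, VIDIOC_QBUF, &buf);
  }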

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-14 12:19       ` Gerd Hoffmann
@ 2019-10-17  6:58         ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-17  6:58 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Keiichi Watanabe, virtio-dev, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

On Mon, Oct 14, 2019 at 9:19 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> > > Well.  I think before even discussing the protocol details we need a
> > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > should be an optional optimization and not a requirement.  Also the
> > > motivation for that should be clear (Let the host decoder write directly
> > > to virtio-gpu resources, to display video without copying around the
> > > decoded framebuffers from one device to another).
> >
> > Just to make sure we're on the same page, what would the buffers come
> > from if we don't use this optimization?
> >
> > I can imagine a setup like this;
> >  1) host device allocates host memory appropriate for usage with host
> > video decoder,
> >  2) guest driver allocates arbitrary guest pages for storage
> > accessible to the guest software,
> >  3) guest userspace writes input for the decoder to guest pages,
> >  4) guest driver passes the list of pages for the input and output
> > buffers to the host device
> >  5) host device copies data from input guest pages to host buffer
> >  6) host device runs the decoding
> >  7) host device copies decoded frame to output guest pages
> >  8) guest userspace can access decoded frame from those pages; back to 3
> >
> > Is that something you have in mind?
>
> I don't have any specific workflow in mind.
>
> If you want to display the decoded video frames you want to use dma-bufs shared
> by video decoder and gpu, right?  So the userspace application (video
> player probably) would create the buffers using one of the drivers,
> export them as dma-buf, then import them into the other driver.  Just
> like you would do on physical hardware.  So, when using virtio-gpu
> buffers:
>
>   (1) guest app creates buffers using virtio-gpu.
>   (2) guest app exports virtio-gpu buffers as dma-buf.
>   (3) guest app imports the dma-bufs into virtio-vdec.
>   (4) guest app asks the virtio-vdec driver to write the decoded
>       frames into the dma-bufs.
>   (5) guest app asks the virtio-gpu driver to display the decoded
>       frame.
>
> The guest video decoder driver passes the dma-buf pages to the host, and
> it is the host driver's job to fill the buffer.  How this is done
> exactly might depend on hardware capabilities (whether a host-allocated
> bounce buffer is needed or whether the hardware can decode directly to
> the dma-buf passed by the guest driver) and is an implementation detail.
>
> Now, with cross-device sharing added the virtio-gpu would attach some
> kind of identifier to the dma-buf, virtio-vdec could fetch the
> identifier and pass it to the host too, and the host virtio-vdec device
> can use the identifier to get a host dma-buf handle for the (virtio-gpu)
> buffer.  Ask the host video decoder driver to import the host dma-buf.
> If it all worked fine it can ask the host hardware to decode directly to
> the host virtio-gpu resource.
>

Agreed.
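
In userspace terms, steps (1) and (2) of that flow are the usual PRIME
dance; roughly (error handling omitted, a dumb buffer used only to keep
the sketch short):

  /* (1) create a buffer with the virtio-gpu DRM driver */
  struct drm_mode_create_dumb create = {
          .width = 1920, .height = 1080, .bpp = 32,
  };
  ioctl(drm_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);

  /* (2) export it as a dma-buf */
  struct drm_prime_handle prime = {
          .handle = create.handle, .flags = DRM_CLOEXEC,
  };
  ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime);

  /* (3)-(5): hand prime.fd to the decoder as a V4L2_MEMORY_DMABUF
   * buffer, queue it, and finally flip the same buffer for display. */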

> > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > there are multiple virtio-gpu devices present in the guest?
> > >
> > > I think we need a framework for cross-device buffer sharing.  One
> > > possible option would be to have some kind of buffer registry, where
> > > buffers can be registered for cross-device sharing and get a unique
> > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > dma-buf export.
> >
> > This approach could possibly let us handle this transparently to
> > importers, which would work for guest kernel subsystems that rely on
> > the ability to handle buffers like native memory (e.g. having a
> > sgtable or DMA address) for them.
> >
> > How about allocating guest physical addresses for memory corresponding
> > to those buffers? On the virtio-gpu example, that could work like
> > this:
> >  - by default a virtio-gpu buffer has only a resource handle,
> >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > virtio-gpu device export the buffer to a host framework (inside the
> > VMM) that would allocate guest page addresses for it, which the
> > command would return in a response to the guest,
>
> Hmm, the cross-device buffer sharing framework I have in mind would
> basically be a buffer registry.  virtio-gpu would create buffers as
> usual, create an identifier somehow (details to be hashed out), attach
> the identifier to the dma-buf so it can be used as outlined above.
>
> Also note that the guest manages the address space, so the host can't
> simply allocate guest page addresses.

Is this really true? I'm not an expert in this area, but on a bare
metal system it's the hardware or firmware that sets up the various
physical address allocations on a hardware level and most of the time
most of the addresses are already pre-assigned in hardware (like the
DRAM base, various IOMEM spaces, etc.).

I think that means that we could have a reserved region that could be
used by the host for dynamic memory hot-plug-like operation. The
reference to memory hot-plug here is fully intentional, we could even
use this feature of Linux to get struct pages for such memory if we
really wanted.

> Mapping host virtio-gpu resources
> into guest address space is planned, it'll most likely use a pci memory
> bar to reserve some address space.  The host can map resources into that
> pci bar, on guest request.

Sounds like a viable option too. Do you have a pointer to some
description on how this would work on both host and guest side?

>
> >  - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
>
> Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> able to lookup buffers using device and dma address, without explicitly
> creating some identifier.  Not investigated yet in detail.

Not backed by pages as in "struct page", but those are still regular
pages of the physical address space. That said, currently the sg_table
interface is only able to describe physical memory using struct page
pointers. It's been a long standing limitation affecting even bare
metal systems, so perhaps it's just the right time to make them
possible to use some other identifiers, like PFNs? There is at least
one more free bit in the page_link field, which could mean that the
field contains a PFN instead of a struct page pointer.

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-14 12:19       ` Gerd Hoffmann
  (?)
  (?)
@ 2019-10-17  7:06       ` David Stevens
  -1 siblings, 0 replies; 59+ messages in thread
From: David Stevens @ 2019-10-17  7:06 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Tomasz Figa, Keiichi Watanabe, virtio-dev, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

> Hmm, the cross-device buffer sharing framework I have in mind would
> basically be a buffer registry.  virtio-gpu would create buffers as
> usual, create a identifier somehow (details to be hashed out), attach
> the identifier to the dma-buf so it can be used as outlined above.

Using physical addresses to identify buffers is using the guest
physical address space as the buffer registry. Especially if every
device should be able to operate in isolation, then each virtio
protocol will have some way to allocate buffers that are accessible to
the guest and host. This requires guest physical addresses, and the
guest physical address of the start of the buffer can serve as the
unique identifier for the buffer in both the guest and the host. Even
with buffers that are only accessible to the host, I think it's
reasonable to allocate guest physical addresses since the pages still
exist (in the same way physical addresses for secure physical memory
make sense).

This approach also sidesteps the need for explicit registration. With
explicit registration, either there would need to be some centralized
buffer exporter device or each protocol would need to have its own
export function. Using guest physical addresses means that buffers get
a unique identifier during creation. For example, in the virtio-gpu
protocol, buffers would get this identifier through
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING, or through
VIRTIO_GPU_CMD_RESOURCE_CREATE_V2 with impending additions to resource
creation.
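
For reference, the entries passed with
VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING already describe a buffer as a
list of guest-physical ranges, so the identifier falls out of the
existing protocol:

  struct virtio_gpu_mem_entry {
          __le64 addr;    /* guest physical address of the range */
          __le32 length;
          __le32 padding;
  };

  /* addr of the first entry can double as the cross-device buffer
   * identifier described above. */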

> Also note that the guest manages the address space, so the host can't
> simply allocate guest page addresses.  Mapping host virtio-gpu resources
> into guest address space is planned, it'll most likely use a pci memory
> bar to reserve some address space.  The host can map resources into that
> pci bar, on guest request.
>
> >  - virtio-gpu driver could then create a regular DMA-buf object for
> > such memory, because it's just backed by pages (even though they may
> > not be accessible to the guest; just like in the case of TrustZone
> > memory protection on bare metal systems),
>
> Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> able to lookup buffers using device and dma address, without explicitly
> creating some identifier.  Not investigated yet in detail.

For the linux guest implementation, mapping a dma-buf doesn't
necessarily require actual pages. The exporting driver's map_dma_buf
function just needs to provide an sg_table with populated dma_address
fields; it doesn't actually need to populate the sg_table with pages.
At the very least, there are places such as i915_gem_stolen.c and
(some situations of) videobuf-dma-sg.c that take this approach.
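
A stripped-down sketch of such an exporter (my_buffer is a made-up
driver type here; note that only the DMA address and length get filled
in, no struct page):

  static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *at,
                                         enum dma_data_direction dir)
  {
          struct my_buffer *buf = at->dmabuf->priv;
          struct sg_table *sgt;

          sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
          if (!sgt || sg_alloc_table(sgt, 1, GFP_KERNEL)) {
                  kfree(sgt);
                  return ERR_PTR(-ENOMEM);
          }
          /* no page behind this: describe the buffer by DMA address */
          sg_dma_address(sgt->sgl) = buf->dma_addr;
          sg_dma_len(sgt->sgl) = buf->size;
          return sgt;
  }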

Cheers,
David

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  6:40                 ` Tomasz Figa
@ 2019-10-17  7:19                   ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-17  7:19 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Dmitry Morozov, stevensd, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

  Hi,

> That said, Chrome OS would use a similar model, except that we don't
> use ION. We would likely use minigbm backed by virtio-gpu to allocate
> appropriate secure buffers for us and then import them to the V4L2
> driver.

What exactly is a "secure buffer"?  I guess a gem object where read
access is not allowed, only scanout to display?  Who enforces this?
The hardware?  Or the kernel driver?

It might make sense for virtio-gpu to know that concept, to allow guests
to ask for secure buffers.

And of course we'll need some way to pass around identifiers for these
(and maybe other) buffers (from virtio-gpu device via guest drivers to
virtio-vdec device).  virtio-gpu guest driver could generate a uuid for
that, attach it to the dma-buf and also notify the host so qemu can
maintain a uuid -> buffer lookup table.
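
None of that is specified anywhere yet, but the wire format could be as
small as (names invented for illustration):

  /* driver -> device: associate a uuid with a resource */
  struct virtio_gpu_resource_assign_uuid {
          struct virtio_gpu_ctrl_hdr hdr;
          __le32 resource_id;
          __le32 padding;
  };

  /* device -> driver: the uuid other devices can use for lookup */
  struct virtio_gpu_resp_resource_uuid {
          struct virtio_gpu_ctrl_hdr hdr;
          __u8 uuid[16];
  };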

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  6:58         ` Tomasz Figa
@ 2019-10-17  7:44           ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-17  7:44 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Keiichi Watanabe, virtio-dev, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

  Hi,

> > Also note that the guest manages the address space, so the host can't
> > simply allocate guest page addresses.
> 
> Is this really true? I'm not an expert in this area, but on a bare
> metal system it's the hardware or firmware that sets up the various
> physical address allocations on a hardware level and most of the time
> most of the addresses are already pre-assigned in hardware (like the
> DRAM base, various IOMEM spaces, etc.).

Yes, the firmware does it.  Same in a VM, ovmf or seabios (which runs
inside the guest) typically does it.  And sometimes the linux kernel
too.

> I think that means that we could have a reserved region that could be
> used by the host for dynamic memory hot-plug-like operation. The
> reference to memory hot-plug here is fully intentional, we could even
> use this feature of Linux to get struct pages for such memory if we
> really wanted.

We try to avoid such quirks whenever possible.  Negotiating such things
between qemu and firmware can be done if really needed (and actually is
done for memory hotplug support), but it's an extra interface which
needs maintenance.

> > Mapping host virtio-gpu resources
> > into guest address space is planned, it'll most likely use a pci memory
> > bar to reserve some address space.  The host can map resources into that
> > pci bar, on guest request.
> 
> Sounds like a viable option too. Do you have a pointer to some
> description on how this would work on both host and guest side?

Some early code:
  https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-memory-v2
  https://git.kraxel.org/cgit/linux/log/?h=drm-virtio-memory-v2

Branches have other stuff too, look for "hostmem" commits.

Not much code yet beyond creating a pci bar on the host and detecting
presence in the guest.

On the host side qemu would create subregions inside the hostmem memory
region for the resources.

On the guest side we can ioremap stuff, like vram.
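
In qemu terms the host-side mapping would then be something like this
(res and vgpu are placeholders for whatever the resource and device
structs end up being):

  /* hostmem is the MemoryRegion backing the pci bar */
  MemoryRegion *mr = g_new0(MemoryRegion, 1);

  memory_region_init_ram_ptr(mr, OBJECT(vgpu), "virtio-gpu-hostmem-res",
                             res->size, res->hostptr);
  memory_region_add_subregion(hostmem, res->bar_offset, mr);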

> > Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> > Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> > able to lookup buffers using device and dma address, without explicitly
> > creating some identifier.  Not investigated yet in detail.
> 
> Not backed by pages as in "struct page", but those are still regular
> pages of the physical address space.

Well, maybe not.  Host gem objects could live in device memory, and if we
map them into the guest ...

> That said, currently the sg_table interface is only able to describe
> physical memory using struct page pointers.  It's been a long standing
> limitation affecting even bare metal systems, so perhaps it's just the
> right time to make them possible to use some other identifiers, like
> PFNs?

I doubt you can handle pci memory bars like regular ram when it comes to
dma and iommu support.  There is a reason we have p2pdma in the first
place ...

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-15 14:06                       ` Dmitry Morozov
@ 2019-10-17  8:06                         ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-17  8:06 UTC (permalink / raw)
  To: Dmitry Morozov, Gerd Hoffmann
  Cc: David Stevens, virtio-dev, Keiichi Watanabe, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

On Tue, Oct 15, 2019 at 11:06 PM Dmitry Morozov
<dmitry.morozov@opensynergy.com> wrote:
>
> Hello Gerd,
>
> On Dienstag, 15. Oktober 2019 09:54:22 CEST Gerd Hoffmann wrote:
> > On Mon, Oct 14, 2019 at 03:05:03PM +0200, Dmitry Morozov wrote:
> > > On Montag, 14. Oktober 2019 14:34:43 CEST Gerd Hoffmann wrote:
> > > >   Hi,
> > > >
> > > > > My take on this (for a decoder) would be to allocate memory for output
> > > > > buffers from a secure ION heap, import in the v4l2 driver, and then to
> > > > > provide those to the device using virtio. The device side then uses
> > > > > the
> > > > > dmabuf framework to make the buffers accessible for the hardware. I'm
> > > > > not
> > > > > sure about that, it's just an idea.
> > > >
> > > > Virtualization aside, how does the complete video decoding workflow
> > > > work?  I assume along the lines of ...
> > > >
> > > >   (1) allocate buffer for decoded video frames (from ion).
> > > >   (2) export those buffers as dma-buf.
> > > >   (3) import dma-buf to video decoder.
> > > >   (4) import dma-buf to gpu.
> > > >
> > > > ... to establish buffers shared between video decoder and gpu?
> > > >
> > > > Then feed the video stream into the decoder, which decodes into the ion
> > > > buffers?  Ask the gpu to scanout the ion buffers to show the video?
> > > >
> > > > cheers,
> > > >
> > > >   Gerd
> > >
> > > Yes, exactly.
> > >
> > > [decoder]
> > > 1) Input buffers are allocated using  VIDIOC_*BUFS.
> >
> > Ok.
> >
> > > 2) Output buffers are allocated in a guest specific manner (ION, gbm).
> >
> > Who decides whether ION or gbm is used?  The phrase "secure ION heap"
> > used above sounds like using ION is required for decoding drm-protected
> > content.
>
> I mention the secure ION heap to address this Chrome OS related point:
> > 3) protected content decoding: the memory for decoded video frames
> > must not be accessible to the guest at all
>
> There was an RFC to implement a secure memory allocation framework, but
> apparently it was not accepted: https://lwn.net/Articles/661549/.
>
> In case of Android, it allocates GPU buffers for output frames, so it is the
> gralloc implementation that decides how to allocate memory. It can use some
> dedicated ION heap or can use libgbm. It can also be some proprietary
> implementation.
>
> >
> > So, do we have to worry about ION here?  Or can we just use gbm?
>
> If we replace vendor specific code in the Android guest and provide a way to
> communicate metadata for buffer allocations from the device to the driver, we
> can use gbm. In the PC world it might be easier.
>
> >
> > [ Note: don't know much about ion, other than that it is used by
> >         android, is in staging right now and patches to move it
> >         out of staging are floating around @ dri-devel ]

Chrome OS has cros_gralloc, which is an open source implementation of
gralloc on top of minigbm (which itself is built on top of the Linux
DRM interfaces). It's not limited to Chrome OS and I believe Intel
also uses it for their native Android setups. With that, we could
completely disregard ION, but I feel like it's not a core problem
here. Whoever wants to use ION should be still able to do so if they
back the allocations with guest pages or memory coming from the host
using some other interface and it can be described using an identifier
compatible with what we're discussing here.
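
For reference, whichever layer sits on top (gralloc, minigbm, plain
libgbm), the allocation itself boils down to something like:

  #include <fcntl.h>
  #include <gbm.h>

  int drm_fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);
  struct gbm_device *gbm = gbm_create_device(drm_fd);
  struct gbm_bo *bo = gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_NV12,
                                    GBM_BO_USE_SCANOUT | GBM_BO_USE_LINEAR);
  int fd = gbm_bo_get_fd(bo);  /* dma-buf fd, ready for V4L2 import */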

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  7:19                   ` Gerd Hoffmann
@ 2019-10-17  8:11                     ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-17  8:11 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Dmitry Morozov, David Stevens, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

On Thu, Oct 17, 2019 at 4:19 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > That said, Chrome OS would use a similar model, except that we don't
> > use ION. We would likely use minigbm backed by virtio-gpu to allocate
> > appropriate secure buffers for us and then import them to the V4L2
> > driver.
>
> What exactly is a "secure buffer"?  I guess a gem object where read
> access is not allowed, only scanout to display?  Who enforces this?
> The hardware?  Or the kernel driver?

In general, it's a buffer which can be accessed only by a specific set
of entities. The set depends on the use case and the level of security
you want to achieve. In Chrome OS we at least want to make such
buffers completely inaccessible for the guest, enforced by the VMM,
for example by not installing corresponding memory into the guest
address space (and not allowing transfers if the virtio-gpu shadow
buffer model is used).
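
As a rough illustration of that enforcement, the VMM could keep a per-buffer
flag and refuse to install flagged buffers into the guest at all. The
structures and the vm_map_memory() call below are hypothetical, not any
existing VMM API:

#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vm; /* hypothetical VMM handle */
int vm_map_memory(struct vm *vm, uint64_t gpa, void *hva, size_t size);

/* A secure buffer is tracked like any other, but never installed into
 * the guest physical address space. */
struct host_buffer {
	void *host_ptr;     /* host mapping used by the decoder */
	size_t size;
	uint64_t guest_gpa; /* where the guest asked to map it */
	bool secure;        /* protected content, guest must not see it */
};

static int expose_to_guest(struct vm *vm, struct host_buffer *buf)
{
	if (buf->secure)
		return -EPERM; /* no mapping, no transfers */
	return vm_map_memory(vm, buf->guest_gpa, buf->host_ptr, buf->size);
}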

Beyond that, the host memory itself could be further protected by some
hardware mechanisms or another hypervisor running above the host OS,
like in the ARM TrustZone model. That shouldn't matter for a VM guest,
though.

>
> It might make sense for virtio-gpu to know that concept, to allow guests
> to ask for secure buffers.
>
> And of course we'll need some way to pass around identifiers for these
> (and maybe other) buffers (from virtio-gpu device via guest drivers to
> virtio-vdec device).  virtio-gpu guest driver could generate a uuid for
> that, attach it to the dma-buf and also notify the host so qemu can
> maintain a uuid -> buffer lookup table.

That could still be a guest physical address. Like on a bare metal
system with TrustZone, there could be physical memory that is not
accessible to the CPU.
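
Whichever identifier wins (a uuid or a guest physical address), the host
side boils down to a small registry mapping the identifier back to the host
resource. A minimal sketch, hypothetical code with a linear scan for brevity:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The VMM records an identifier for each exported buffer; other devices
 * (e.g. virtio-vdec) resolve it back to the host resource on import. */
struct exported_buffer {
	uint8_t uuid[16]; /* generated by the virtio-gpu guest driver */
	void *host_resource;
};

static void *lookup_exported(const struct exported_buffer *tbl, size_t n,
			     const uint8_t uuid[16])
{
	for (size_t i = 0; i < n; i++)
		if (memcmp(tbl[i].uuid, uuid, 16) == 0)
			return tbl[i].host_resource;
	return NULL; /* unknown identifier: fail the import */
}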

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  7:44           ` Gerd Hoffmann
@ 2019-10-17  8:23             ` Tomasz Figa
  -1 siblings, 0 replies; 59+ messages in thread
From: Tomasz Figa @ 2019-10-17  8:23 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Keiichi Watanabe, virtio-dev, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

On Thu, Oct 17, 2019 at 4:44 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
>   Hi,
>
> > > Also note that the guest manages the address space, so the host can't
> > > simply allocate guest page addresses.
> >
> > Is this really true? I'm not an expert in this area, but on a bare
> > metal system it's the hardware or firmware that sets up the various
> > physical address allocations on a hardware level and most of the time
> > most of the addresses are already pre-assigned in hardware (like the
> > DRAM base, various IOMEM spaces, etc.).
>
> Yes, the firmware does it.  Same in a VM, ovmf or seabios (which runs
> inside the guest) typically does it.  And sometimes the linux kernel
> too.
>
> > I think that means that we could have a reserved region that could be
> > used by the host for dynamic memory hot-plug-like operation. The
> > reference to memory hot-plug here is fully intentional, we could even
> > use this feature of Linux to get struct pages for such memory if we
> > really wanted.
>
> We try to avoid such quirks whenever possible.  Negotiating such things
> between qemu and firmware can be done if really needed (and actually is
> done for memory hotplug support), but it's an extra interface which
> needs maintenance.
>
> > > Mapping host virtio-gpu resources
> > > into guest address space is planned, it'll most likely use a pci memory
> > > bar to reserve some address space.  The host can map resources into that
> > > pci bar, on guest request.
> >
> > Sounds like a viable option too. Do you have a pointer to some
> > description on how this would work on both host and guest side?
>
> Some early code:
>   https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-memory-v2
>   https://git.kraxel.org/cgit/linux/log/?h=drm-virtio-memory-v2
>
> Branches have other stuff too, look for "hostmem" commits.
>
> Not much code yet beyond creating a pci bar on the host and detecting
> presence in the guest.
>
> On the host side qemu would create subregions inside the hostmem memory
> region for the resources.
>
> On the guest side we can ioremap stuff, like vram.
>
> > > Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> > > Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> > > able to lookup buffers using device and dma address, without explicitly
> > > creating some identifier.  Not investigated yet in detail.
> >
> > Not backed by pages as in "struct page", but those are still regular
> > pages of the physical address space.
>
> Well, maybe not.  Host gem object could live in device memory, and if we
> map them into the guest ...

That's an interesting scenario, but in that case would we still want
to map it into the guest? I think in such a case we may need to have
some shadow buffer in regular RAM, and that's already implemented in
virtio-gpu.

>
> > That said, currently the sg_table interface is only able to describe
> > physical memory using struct page pointers.  It's been a long standing
> > limitation affecting even bare metal systems, so perhaps it's just the
> > right time to make them possible to use some other identifiers, like
> > PFNs?
>
> I doubt you can handle pci memory bars like regular ram when it comes to
> dma and iommu support.  There is a reason we have p2pdma in the first
> place ...

The thing is that such bars would actually be backed by regular host
RAM. Do we really need the complexity of real PCI bar handling for
that?
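
For reference, the host side of the hostmem idea quoted above could be as
small as carving a subregion out of the BAR region with the QEMU memory API.
A sketch only; the function and its arguments are assumptions, not code from
the branches linked above:

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Place one resource's host allocation at an offset inside the hostmem
 * BAR region, so the guest can ioremap that subregion. */
static void hostmem_map_resource(MemoryRegion *hostmem, MemoryRegion *mr,
                                 Object *owner, uint64_t bar_offset,
                                 void *host_ptr, uint64_t size)
{
    memory_region_init_ram_ptr(mr, owner, "virtio-gpu-hostmem-res",
                               size, host_ptr);
    memory_region_add_subregion(hostmem, bar_offset, mr);
}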

Best regards,
Tomasz

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  8:11                     ` Tomasz Figa
@ 2019-10-17 10:13                       ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-17 10:13 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Dmitry Morozov, David Stevens, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, alexlau, dgreid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

  Hi,

> That could still be a guest physical address. Like on a bare metal
> system with TrustZone, there could be physical memory that is not
> accessible to the CPU.

Hmm.  Yes, maybe.  We could use the dma address of the (first page of
the) guest buffer.  In the case of a secure buffer the guest has no
access, so the guest buffer would be unused, but it would at least make
sure that things don't crash in case someone tries to map & access the
buffer.

The host should be able to figure out the corresponding host buffer
from the guest buffer address.
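
In other words, the dma address of the first guest page becomes the buffer
identifier. Host-side resolution could then look like this minimal sketch
(hypothetical VMM code, linear scan for brevity):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical table filled at ATTACH_BACKING time: one entry per
 * buffer, keyed by the dma address of its first guest page. */
struct registered_buffer {
	uint64_t first_page_dma; /* identifier passed by the guest */
	void *host_ptr;          /* corresponding host-side buffer */
	size_t size;
};

static struct registered_buffer *
lookup_by_dma(struct registered_buffer *tbl, size_t n, uint64_t dma_addr)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].first_page_dma == dma_addr)
			return &tbl[i];
	return NULL; /* unknown address: reject the request */
}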

When running drm-misc-next you should be able to test whether that'll
actually work without any virtio-gpu driver changes.

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  8:23             ` Tomasz Figa
@ 2019-10-17 10:22               ` Gerd Hoffmann
  -1 siblings, 0 replies; 59+ messages in thread
From: Gerd Hoffmann @ 2019-10-17 10:22 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Keiichi Watanabe, virtio-dev, Alexandre Courbot, alexlau, dgreid,
	Stéphane Marchesin, Pawel Osciak, David Stevens,
	Hans Verkuil, Linux Media Mailing List, Daniel Vetter

> > I doubt you can handle pci memory bars like regular ram when it comes to
> > dma and iommu support.  There is a reason we have p2pdma in the first
> > place ...
> 
> The thing is that such bars would actually be backed by regular host
> RAM. Do we really need the complexity of real PCI bar handling for
> that?

Well, taking shortcuts because of virtualization-specific assumptions
already caused problems in the past.  See the messy iommu handling we
have in virtio-pci for example.

So I don't feel like going the "we know it's just normal pages, so let's
simplify things" route.

Besides that, hostmap isn't important for secure buffers; we wouldn't
allow the guest to map them anyway ;)

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17  6:58         ` Tomasz Figa
  (?)
  (?)
@ 2019-10-17 15:00         ` Frank Yang
  2019-10-17 16:22           ` Frank Yang
  -1 siblings, 1 reply; 59+ messages in thread
From: Frank Yang @ 2019-10-17 15:00 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Gerd Hoffmann, Keiichi Watanabe, virtio-dev, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak,
	David Stevens, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

On Wed, Oct 16, 2019 at 11:58 PM Tomasz Figa <tfiga@chromium.org> wrote:

> On Mon, Oct 14, 2019 at 9:19 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > > > Well.  I think before even discussing the protocol details we need a
> > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
> > > > should be an optional optimization and not a requirement.  Also the
> > > > motivation for that should be clear (Let the host decoder write directly
> > > > to virtio-gpu resources, to display video without copying around the
> > > > decoded framebuffers from one device to another).
> > >
> > > Just to make sure we're on the same page, where would the buffers come
> > > from if we don't use this optimization?
> > >
> > > I can imagine a setup like this:
> > >  1) host device allocates host memory appropriate for usage with host
> > > video decoder,
> > >  2) guest driver allocates arbitrary guest pages for storage
> > > accessible to the guest software,
> > >  3) guest userspace writes input for the decoder to guest pages,
> > >  4) guest driver passes the list of pages for the input and output
> > > buffers to the host device
> > >  5) host device copies data from input guest pages to host buffer
> > >  6) host device runs the decoding
> > >  7) host device copies decoded frame to output guest pages
> > >  8) guest userspace can access decoded frame from those pages; back to 3
> > >
> > > Is that something you have in mind?
> >
> > I don't have any specific workflow in mind.
> >
> > If you want display the decoded video frames you want use dma-bufs shared
> > by video decoder and gpu, right?  So the userspace application (video
> > player probably) would create the buffers using one of the drivers,
> > export them as dma-buf, then import them into the other driver.  Just
> > like you would do on physical hardware.  So, when using virtio-gpu
> > buffers:
> >
> >   (1) guest app creates buffers using virtio-gpu.
> >   (2) guest app exports virtio-gpu buffers as dma-buf.
> >   (3) guest app imports the dma-bufs into virtio-vdec.
> >   (4) guest app asks the virtio-vdec driver to write the decoded
> >       frames into the dma-bufs.
> >   (5) guest app asks the virtio-gpu driver to display the decoded
> >       frame.
> >
> > The guest video decoder driver passes the dma-buf pages to the host, and
> > it is the host driver's job to fill the buffer.  How this is done
> > exactly might depend on hardware capabilities (whether a host-allocated
> > bounce buffer is needed or whether the hardware can decode directly to
> > the dma-buf passed by the guest driver) and is an implementation detail.
> >
> > Now, with cross-device sharing added the virtio-gpu would attach some
> > kind of identifier to the dma-buf, virtio-vdec could fetch the
> > identifier and pass it to the host too, and the host virtio-vdec device
> > can use the identifier to get a host dma-buf handle for the (virtio-gpu)
> > buffer.  Ask the host video decoder driver to import the host dma-buf.
> > If it all worked fine it can ask the host hardware to decode directly to
> > the host virtio-gpu resource.
> >
>
> Agreed.
>
> > > > Referencing virtio-gpu buffers needs a better plan than just re-using
> > > > virtio-gpu resource handles.  The handles are device-specific.  What if
> > > > there are multiple virtio-gpu devices present in the guest?
> > > >
> > > > I think we need a framework for cross-device buffer sharing.  One
> > > > possible option would be to have some kind of buffer registry, where
> > > > buffers can be registered for cross-device sharing and get a unique
> > > > id (a uuid maybe?).  Drivers would typically register buffers on
> > > > dma-buf export.
> > >
> > > This approach could possibly let us handle this transparently to
> > > importers, which would work for guest kernel subsystems that rely on
> > > the ability to handle buffers like native memory (e.g. having a
> > > sgtable or DMA address) for them.
> > >
> > > How about allocating guest physical addresses for memory corresponding
> > > to those buffers? On the virtio-gpu example, that could work like
> > > this:
> > >  - by default a virtio-gpu buffer has only a resource handle,
> > >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
> > > virtio-gpu device export the buffer to a host framework (inside the
> > > VMM) that would allocate guest page addresses for it, which the
> > > command would return in a response to the guest,
> >
> > Hmm, the cross-device buffer sharing framework I have in mind would
> > basically be a buffer registry.  virtio-gpu would create buffers as
> > usual, create a identifier somehow (details to be hashed out), attach
> > the identifier to the dma-buf so it can be used as outlined above.
> >
> > Also note that the guest manages the address space, so the host can't
> > simply allocate guest page addresses.
>
> Is this really true? I'm not an expert in this area, but on a bare
> metal system it's the hardware or firmware that sets up the various
> physical address allocations on a hardware level and most of the time
> most of the addresses are already pre-assigned in hardware (like the
> DRAM base, various IOMEM spaces, etc.).
>
> I think that means that we could have a reserved region that could be
> used by the host for dynamic memory hot-plug-like operation. The
> reference to memory hot-plug here is fully intentional, we could even
> use this feature of Linux to get struct pages for such memory if we
> really wanted.
>
>
Wouldn't normal hotplug result in Linux trying to claim what you just hot
plugged?


> > Mapping host virtio-gpu resources
> > into guest address space is planned, it'll most likely use a pci memory
> > bar to reserve some address space.  The host can map resources into that
> > pci bar, on guest request.
>
> Sounds like a viable option too. Do you have a pointer to some
> description on how this would work on both host and guest side?
>
>
I've been following the conv. and I'd like to have a more in depth
discussion sometime. On the Android Emulator, we've successfully been
mapping host memory into the PCI bar for graphics applications where host
visible/coherent memory is needed (Vulkan) or has benefits for
performance/compatibility (gralloc with CPU usage, as a substitute for
ashmem).

We use the address space driver
<https://android.googlesource.com/kernel/goldfish/+/refs/heads/android-goldfish-4.14-dev/drivers/misc/goldfish_address_space.c>
 + device
<https://android.googlesource.com/platform/external/qemu/+/refs/heads/emu-master-dev/hw/pci/goldfish_address_space.c>,
which through the PING ioctl allows notifications from guest to host and
allows all of the implementation to live in the VMM or guest userspace. I had
a previous spec
<https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex>(virtio-hostmem)
on making a version of the addr. space device for virtio, but we got
sidetracked in various ways. One technical point from there was deciding
whether the host needed to allocate everything up front. I wonder if, now
that it's acceptable to map host memory into the guest on request, we can
revisit the virtio-hostmem idea.
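
For the host-memory-in-PCI-BAR approach being discussed, the guest-side
mapping can be quite small. A sketch with generic PCI helpers, not the
goldfish driver's actual code; how the guest learns the offset is assumed
to be negotiated elsewhere:

#include <linux/io.h>
#include <linux/pci.h>

/* Map just the subregion of a memory BAR where the device placed a host
 * allocation for us. */
static void __iomem *map_host_subregion(struct pci_dev *pdev, int bar,
					resource_size_t offset,
					resource_size_t len)
{
	if (offset + len > pci_resource_len(pdev, bar))
		return NULL;
	return ioremap(pci_resource_start(pdev, bar) + offset, len);
}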


> >
> > >  - virtio-gpu driver could then create a regular DMA-buf object for
> > > such memory, because it's just backed by pages (even though they may
> > > not be accessible to the guest; just like in the case of TrustZone
> > > memory protection on bare metal systems),
> >
> > Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
> > Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
> > able to lookup buffers using device and dma address, without explicitly
> > creating some identifier.  Not investigated yet in detail.
>
> Not backed by pages as in "struct page", but those are still regular
> pages of the physical address space. That said, currently the sg_table
> interface is only able to describe physical memory using struct page
> pointers. It's been a long standing limitation affecting even bare
> metal systems, so perhaps it's just the right time to make them
> possible to use some other identifiers, like PFNs? There is at least
> one more free bit in the page_link field, which could mean that the
> field contains a PFN instead of a struct page pointer.
>
> Best regards,
> Tomasz
>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17 15:00         ` Frank Yang
@ 2019-10-17 16:22           ` Frank Yang
  0 siblings, 0 replies; 59+ messages in thread
From: Frank Yang @ 2019-10-17 16:22 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Gerd Hoffmann, Keiichi Watanabe, virtio-dev, Alexandre Courbot,
	alexlau, dgreid, Stéphane Marchesin, Pawel Osciak,
	David Stevens, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

On Thu, Oct 17, 2019 at 8:00 AM Frank Yang <lfy@google.com> wrote:

>
>
> On Wed, Oct 16, 2019 at 11:58 PM Tomasz Figa <tfiga@chromium.org> wrote:
>
>> On Mon, Oct 14, 2019 at 9:19 PM Gerd Hoffmann <kraxel@redhat.com> wrote:
>> >
>> > > > Well.  I think before even discussing the protocol details we need a
>> > > > reasonable plan for buffer handling.  I think using virtio-gpu buffers
>> > > > should be an optional optimization and not a requirement.  Also the
>> > > > motivation for that should be clear (Let the host decoder write directly
>> > > > to virtio-gpu resources, to display video without copying around the
>> > > > decoded framebuffers from one device to another).
>> > >
>> > > Just to make sure we're on the same page, where would the buffers come
>> > > from if we don't use this optimization?
>> > >
>> > > I can imagine a setup like this:
>> > >  1) host device allocates host memory appropriate for usage with host
>> > > video decoder,
>> > >  2) guest driver allocates arbitrary guest pages for storage
>> > > accessible to the guest software,
>> > >  3) guest userspace writes input for the decoder to guest pages,
>> > >  4) guest driver passes the list of pages for the input and output
>> > > buffers to the host device
>> > >  5) host device copies data from input guest pages to host buffer
>> > >  6) host device runs the decoding
>> > >  7) host device copies decoded frame to output guest pages
>> > >  8) guest userspace can access decoded frame from those pages; back to 3
>> > >
>> > > Is that something you have in mind?
>> >
>> > I don't have any specific workflow in mind.
>> >
>> > If you want to display the decoded video frames you want to use dma-bufs shared
>> > by video decoder and gpu, right?  So the userspace application (video
>> > player probably) would create the buffers using one of the drivers,
>> > export them as dma-buf, then import them into the other driver.  Just
>> > like you would do on physical hardware.  So, when using virtio-gpu
>> > buffers:
>> >
>> >   (1) guest app creates buffers using virtio-gpu.
>> >   (2) guest app exports virtio-gpu buffers as dma-buf.
>> >   (3) guest app imports the dma-bufs into virtio-vdec.
>> >   (4) guest app asks the virtio-vdec driver to write the decoded
>> >       frames into the dma-bufs.
>> >   (5) guest app asks the virtio-gpu driver to display the decoded
>> >       frame.
>> >
>> > The guest video decoder driver passes the dma-buf pages to the host, and
>> > it is the host driver's job to fill the buffer.  How this is done
>> > exactly might depend on hardware capabilities (whether a host-allocated
>> > bounce buffer is needed or whether the hardware can decode directly to
>> > the dma-buf passed by the guest driver) and is an implementation detail.
>> >
>> > Now, with cross-device sharing added the virtio-gpu would attach some
>> > kind of identifier to the dma-buf, virtio-vdec could fetch the
>> > identifier and pass it to the host too, and the host virtio-vdec device
>> > can use the identifier to get a host dma-buf handle for the (virtio-gpu)
>> > buffer.  Ask the host video decoder driver to import the host dma-buf.
>> > If it all worked fine it can ask the host hardware to decode directly to
>> > the host virtio-gpu resource.
>> >
>>
>> Agreed.
>>
>> > > > Referencing virtio-gpu buffers needs a better plan than just re-using
>> > > > virtio-gpu resource handles.  The handles are device-specific.  What if
>> > > > there are multiple virtio-gpu devices present in the guest?
>> > > >
>> > > > I think we need a framework for cross-device buffer sharing.  One
>> > > > possible option would be to have some kind of buffer registry, where
>> > > > buffers can be registered for cross-device sharing and get a unique
>> > > > id (a uuid maybe?).  Drivers would typically register buffers on
>> > > > dma-buf export.
>> > >
>> > > This approach could possibly let us handle this transparently to
>> > > importers, which would work for guest kernel subsystems that rely on
>> > > the ability to handle buffers like native memory (e.g. having a
>> > > sgtable or DMA address) for them.
>> > >
>> > > How about allocating guest physical addresses for memory corresponding
>> > > to those buffers? On the virtio-gpu example, that could work like
>> > > this:
>> > >  - by default a virtio-gpu buffer has only a resource handle,
>> > >  - VIRTIO_GPU_RESOURCE_EXPORT command could be called to have the
>> > > virtio-gpu device export the buffer to a host framework (inside the
>> > > VMM) that would allocate guest page addresses for it, which the
>> > > command would return in a response to the guest,
>> >
>> > Hmm, the cross-device buffer sharing framework I have in mind would
>> > basically be a buffer registry.  virtio-gpu would create buffers as
>> > usual, create an identifier somehow (details to be hashed out), attach
>> > the identifier to the dma-buf so it can be used as outlined above.
>> >
>> > Also note that the guest manages the address space, so the host can't
>> > simply allocate guest page addresses.
>>
>> Is this really true? I'm not an expert in this area, but on a bare
>> metal system it's the hardware or firmware that sets up the various
>> physical address allocations on a hardware level and most of the time
>> most of the addresses are already pre-assigned in hardware (like the
>> DRAM base, various IOMEM spaces, etc.).
>>
>> I think that means that we could have a reserved region that could be
>> used by the host for dynamic memory hot-plug-like operation. The
>> reference to memory hot-plug here is fully intentional, we could even
>> use this feature of Linux to get struct pages for such memory if we
>> really wanted.
>>
>>
> Wouldn't normal hotplug result in Linux trying to claim what you just hot
> plugged?
>
>
>> > Mapping host virtio-gpu resources
>> > into guest address space is planned, it'll most likely use a pci memory
>> > bar to reserve some address space.  The host can map resources into that
>> > pci bar, on guest request.
>>
>> Sounds like a viable option too. Do you have a pointer to some
>> description on how this would work on both host and guest side?
>>
>>
> I've been following the conv. and I'd like to have a more in depth
> discussion sometime. On the Android Emulator, we've successfully been
> mapping host memory into the PCI bar for graphics applications where host
> visible/coherent memory is needed (Vulkan) or has benefits for
> performance/compatibility (gralloc with CPU usage, as a substitute for
> ashmem).
>
> We use the address space driver
> <https://android.googlesource.com/kernel/goldfish/+/refs/heads/android-goldfish-4.14-dev/drivers/misc/goldfish_address_space.c>
>  + device
> <https://android.googlesource.com/platform/external/qemu/+/refs/heads/emu-master-dev/hw/pci/goldfish_address_space.c>,
> which through the PING ioctl allows notifications from guest to host and
> allows all the implementation to be in VMM or guest userspace. I had a
> previous spec
> <https://github.com/741g/virtio-spec/blob/master/virtio-hostmem.tex>(virtio-hostmem)
> on making a version of the addr. space device for virtio, but we got
> sidetracked in various ways. One technical point from there was deciding on
> whether the host needed to allocate everything up front. Wonder if now that
> it's acceptable to map host memory on guest request into that, we can
> revisit the virtio-hostmem idea.
>
>

In terms of video decode specifically, we're able to allocate host memory
for the video codec, then map it to the guest via the PCI bar, and then the
guest ends up using the decoded frame in graphics. But, once it does so,
and the guest wants to upload the decoded frame as a texture or something,
on the VMM side we'd already be aware of the physical address (host virtual
address) where the decoded frame lives, regardless of how many layers and
other abstractions the decoded frame travels through (assuming it doesn't
get copied, but if it does, then we properly emulate the fact).

The situation is probably a bit more complex for video decode that is done
on hardware without copying to cpu memory. We haven't really addressed that
yet. I imagine on the VMM side, video codec decodes in hardware to some
image or texture that is associated with a dmabuf, or a native_handle_t, or
an OpenGL texture. But then the guest userspace shouldn't really care that
they are related on the VMM side as long as the VMM has a consistent
interface with whatever shared image kernel abstraction / library (layers above
virtio) the guest uses?

>
>> > >  - virtio-gpu driver could then create a regular DMA-buf object for
>> > > such memory, because it's just backed by pages (even though they may
>> > > not be accessible to the guest; just like in the case of TrustZone
>> > > memory protection on bare metal systems),
>> >
>> > Hmm, well, pci memory bars are *not* backed by pages.  Maybe we can use
>> > Documentation/driver-api/pci/p2pdma.rst though.  With that we might be
>> > able to lookup buffers using device and dma address, without explicitly
>> > creating some identifier.  Not investigated yet in detail.
>>
>> Not backed by pages as in "struct page", but those are still regular
>> pages of the physical address space. That said, currently the sg_table
>> interface is only able to describe physical memory using struct page
>> pointers. It's been a long standing limitation affecting even bare
>> metal systems, so perhaps it's just the right time to make them
>> possible to use some other identifiers, like PFNs? There is at least
>> one more free bit in the page_link field, which could mean that the
>> field contains a PFN instead of a struct page pointer.
>>
>> Best regards,
>> Tomasz
>>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17 10:13                       ` Gerd Hoffmann
  (?)
@ 2019-10-29  7:39                       ` David Stevens
  2019-10-31  7:30                           ` Keiichi Watanabe
  -1 siblings, 1 reply; 59+ messages in thread
From: David Stevens @ 2019-10-29  7:39 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Tomasz Figa, Dmitry Morozov, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, Alex Lau, Dylan Reid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

> When running drm-misc-next you should be able to test whether that'll
> actually work without any virtio-gpu driver changes.

I did some experimentation with the Chrome OS kernel-next branch
(based on v5.4-rc3) plus drm-misc-next. I looked at both the chromeos
downstream virtio-wayland driver and the virtio-vdec driver that was
recently proposed upstream.

Using the dma address of buffers generally works. However, it does
require the addition of some synchronization in the virtio-gpu driver
to prevent races between the virtio-gpu device registering the buffer
with the guest dma address (which happens with the ATTACH_BACKING
command) and other virtio devices using the guest dma address as a
buffer identifier. I've included a patch that adds this
synchronization.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 drivers/gpu/drm/virtio/virtgpu_drv.h    |  2 +
 drivers/gpu/drm/virtio/virtgpu_object.c | 15 +++++-
 drivers/gpu/drm/virtio/virtgpu_vq.c     | 64 +++++++++++++++++++++----
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 314e02f94d9c4..00a2c0e6b6382 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -72,6 +72,8 @@ struct virtio_gpu_object {
        uint32_t mapped;
        bool dumb;
        bool created;
+
+       bool attach_backing_complete;
 };
 #define gem_to_virtio_gpu_obj(gobj) \
        container_of((gobj), struct virtio_gpu_object, base.base)
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index 017a9e0fc3bb8..812a0a48f6385 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -75,13 +75,26 @@ static void virtio_gpu_free_object(struct drm_gem_object *obj)
        drm_gem_shmem_free_object(obj);
 }

+int virtio_gpu_gem_object_pin(struct drm_gem_object *obj)
+{
+       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+       struct virtio_gpu_device *vgdev = obj->dev->dev_private;
+
+       if (!bo->attach_backing_complete)
+               wait_event(vgdev->resp_wq, bo->attach_backing_complete);
+       if (!bo->attach_backing_complete)
+               return -EFAULT;
+
+       return drm_gem_shmem_pin(obj);
+}
+
 static const struct drm_gem_object_funcs virtio_gpu_gem_funcs = {
        .free = virtio_gpu_free_object,
        .open = virtio_gpu_gem_object_open,
        .close = virtio_gpu_gem_object_close,

        .print_info = drm_gem_shmem_print_info,
-       .pin = drm_gem_shmem_pin,
+       .pin = virtio_gpu_gem_object_pin,
        .unpin = drm_gem_shmem_unpin,
        .get_sg_table = drm_gem_shmem_get_sg_table,
        .vmap = drm_gem_shmem_vmap,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 80176f379ad51..8bc2359a6d625 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -579,26 +579,42 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
        virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
 }

+static void
+virtio_gpu_cmd_resource_attach_backing_cb(struct virtio_gpu_device *vgdev,
+                                         struct virtio_gpu_vbuffer *vbuf)
+{
+       struct virtio_gpu_object_array *objs = vbuf->objs;
+       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]);
+
+       bo->attach_backing_complete = true;
+       wake_up_all(&vgdev->resp_wq);
+}
+
 static void
 virtio_gpu_cmd_resource_attach_backing(struct virtio_gpu_device *vgdev,
-                                      uint32_t resource_id,
+                                      struct virtio_gpu_object *bo,
                                       struct virtio_gpu_mem_entry *ents,
                                       uint32_t nents,
+                                      struct virtio_gpu_object_array *objs,
                                       struct virtio_gpu_fence *fence)
 {
        struct virtio_gpu_resource_attach_backing *cmd_p;
        struct virtio_gpu_vbuffer *vbuf;

-       cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
+       cmd_p = virtio_gpu_alloc_cmd_resp(
+               vgdev, virtio_gpu_cmd_resource_attach_backing_cb, &vbuf,
+               sizeof(*cmd_p), sizeof(struct virtio_gpu_ctrl_hdr), NULL);
        memset(cmd_p, 0, sizeof(*cmd_p));

        cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING);
-       cmd_p->resource_id = cpu_to_le32(resource_id);
+       cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
        cmd_p->nr_entries = cpu_to_le32(nents);

        vbuf->data_buf = ents;
        vbuf->data_size = sizeof(*ents) * nents;

+       vbuf->objs = objs;
+
        virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
 }

@@ -1048,9 +1064,10 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
                             struct virtio_gpu_fence *fence)
 {
        bool use_dma_api = !virtio_has_iommu_quirk(vgdev->vdev);
+       struct virtio_gpu_object_array *objs = NULL;
        struct virtio_gpu_mem_entry *ents;
        struct scatterlist *sg;
-       int si, nents, ret;
+       int si, nents, ret = 0;

        if (WARN_ON_ONCE(!obj->created))
                return -EINVAL;
@@ -1063,8 +1080,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,

        obj->pages = drm_gem_shmem_get_sg_table(&obj->base.base);
        if (obj->pages == NULL) {
-               drm_gem_shmem_unpin(&obj->base.base);
-               return -EINVAL;
+               ret = -EINVAL;
+               goto err_unpin;
        }

        if (use_dma_api) {
@@ -1081,7 +1098,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
                             GFP_KERNEL);
        if (!ents) {
                DRM_ERROR("failed to allocate ent list\n");
-               return -ENOMEM;
+               ret = -ENOMEM;
+               goto err_unmap;
        }

        for_each_sg(obj->pages->sgl, sg, nents, si) {
@@ -1092,10 +1110,38 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
                ents[si].padding = 0;
        }

-       virtio_gpu_cmd_resource_attach_backing(vgdev, obj->hw_res_handle,
+       objs = virtio_gpu_array_alloc(1);
+       if (!objs) {
+               ret = -ENOMEM;
+               goto err_free_ents;
+       }
+       virtio_gpu_array_add_obj(objs, &obj->base.base);
+
+       if (fence) {
+               ret = virtio_gpu_array_lock_resv(objs);
+               if (ret != 0)
+                       goto err_put_objs;
+       }
+
+       virtio_gpu_cmd_resource_attach_backing(vgdev, obj,
                                               ents, nents,
-                                              fence);
+                                              objs, fence);
        return 0;
+
+err_put_objs:
+       virtio_gpu_array_put_free(objs);
+err_free_ents:
+       kfree(ents);
+err_unmap:
+       if (use_dma_api) {
+               dma_unmap_sg(vgdev->vdev->dev.parent,
+                            obj->pages->sgl, obj->pages->nents,
+                            DMA_TO_DEVICE);
+               obj->mapped = 0;
+       }
+err_unpin:
+       drm_gem_shmem_unpin(&obj->base.base);
+       return ret;
 }

 void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev,
--
2.24.0.rc0.303.g954a862665-goog

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-29  7:39                       ` David Stevens
@ 2019-10-31  7:30                           ` Keiichi Watanabe
  0 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-10-31  7:30 UTC (permalink / raw)
  To: David Stevens
  Cc: Gerd Hoffmann, Tomasz Figa, Dmitry Morozov, virtio-dev,
	Alexandre Courbot, Alex Lau, Dylan Reid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

Hi, I just sent a patch for our virtio-vdec driver to the Linux Media
mailing list:
https://patchwork.kernel.org/patch/11220661/

This implementation uses the initial version of virtio-vdec I proposed
in this thread and doesn't contain David's recent patch for DMA
addresses of virtio-gpu buffers.
If his patch looks good to everyone, I'm happy to update my patch to
use them instead of exported virtio-gpu resource handles.

Any feedback will be appreciated. Thanks in advance.

Best regards,
Keiichi

On Tue, Oct 29, 2019 at 4:39 PM David Stevens <stevensd@chromium.org> wrote:
>
> > When running drm-misc-next you should be able to test whether that'll
> > actually work without any virtio-gpu driver changes.
>
> I did some experimentation with the Chrome OS kernel-next branch
> (based on v5.4-rc3) plus drm-misc-next. I looked at both the chromeos
> downstream virtio-wayland driver and the virtio-vdec driver that was
> recently proposed upstream.
>
> Using the dma address of buffers generally works. However, it does
> require the addition of some synchronization in the virtio-gpu driver
> to prevent races between the virtio-gpu device registering the buffer
> with the guest dma address (which happens with the ATTACH_BACKING
> command) and other virtio devices using the guest dma address as a
> buffer identifier. I've included a patch that adds this
> synchronization.
>
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h    |  2 +
>  drivers/gpu/drm/virtio/virtgpu_object.c | 15 +++++-
>  drivers/gpu/drm/virtio/virtgpu_vq.c     | 64 +++++++++++++++++++++----
>  3 files changed, 71 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 314e02f94d9c4..00a2c0e6b6382 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -72,6 +72,8 @@ struct virtio_gpu_object {
>         uint32_t mapped;
>         bool dumb;
>         bool created;
> +
> +       bool attach_backing_complete;
>  };
>  #define gem_to_virtio_gpu_obj(gobj) \
>         container_of((gobj), struct virtio_gpu_object, base.base)
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
> index 017a9e0fc3bb8..812a0a48f6385 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -75,13 +75,26 @@ static void virtio_gpu_free_object(struct drm_gem_object *obj)
>         drm_gem_shmem_free_object(obj);
>  }
>
> +int virtio_gpu_gem_object_pin(struct drm_gem_object *obj)
> +{
> +       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> +       struct virtio_gpu_device *vgdev = obj->dev->dev_private;
> +
> +       if (!bo->attach_backing_complete)
> +               wait_event(vgdev->resp_wq, bo->attach_backing_complete);
> +       if (!bo->attach_backing_complete)
> +               return -EFAULT;
> +
> +       return drm_gem_shmem_pin(obj);
> +}
> +
>  static const struct drm_gem_object_funcs virtio_gpu_gem_funcs = {
>         .free = virtio_gpu_free_object,
>         .open = virtio_gpu_gem_object_open,
>         .close = virtio_gpu_gem_object_close,
>
>         .print_info = drm_gem_shmem_print_info,
> -       .pin = drm_gem_shmem_pin,
> +       .pin = virtio_gpu_gem_object_pin,
>         .unpin = drm_gem_shmem_unpin,
>         .get_sg_table = drm_gem_shmem_get_sg_table,
>         .vmap = drm_gem_shmem_vmap,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 80176f379ad51..8bc2359a6d625 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -579,26 +579,42 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
>         virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
>  }
>
> +static void
> +virtio_gpu_cmd_resource_attach_backing_cb(struct virtio_gpu_device *vgdev,
> +                                         struct virtio_gpu_vbuffer *vbuf)
> +{
> +       struct virtio_gpu_object_array *objs = vbuf->objs;
> +       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]);
> +
> +       bo->attach_backing_complete = true;
> +       wake_up_all(&vgdev->resp_wq);
> +}
> +
>  static void
>  virtio_gpu_cmd_resource_attach_backing(struct virtio_gpu_device *vgdev,
> -                                      uint32_t resource_id,
> +                                      struct virtio_gpu_object *bo,
>                                        struct virtio_gpu_mem_entry *ents,
>                                        uint32_t nents,
> +                                      struct virtio_gpu_object_array *objs,
>                                        struct virtio_gpu_fence *fence)
>  {
>         struct virtio_gpu_resource_attach_backing *cmd_p;
>         struct virtio_gpu_vbuffer *vbuf;
>
> -       cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
> +       cmd_p = virtio_gpu_alloc_cmd_resp(
> +               vgdev, virtio_gpu_cmd_resource_attach_backing_cb, &vbuf,
> +               sizeof(*cmd_p), sizeof(struct virtio_gpu_ctrl_hdr), NULL);
>         memset(cmd_p, 0, sizeof(*cmd_p));
>
>         cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING);
> -       cmd_p->resource_id = cpu_to_le32(resource_id);
> +       cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
>         cmd_p->nr_entries = cpu_to_le32(nents);
>
>         vbuf->data_buf = ents;
>         vbuf->data_size = sizeof(*ents) * nents;
>
> +       vbuf->objs = objs;
> +
>         virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
>  }
>
> @@ -1048,9 +1064,10 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>                              struct virtio_gpu_fence *fence)
>  {
>         bool use_dma_api = !virtio_has_iommu_quirk(vgdev->vdev);
> +       struct virtio_gpu_object_array *objs = NULL;
>         struct virtio_gpu_mem_entry *ents;
>         struct scatterlist *sg;
> -       int si, nents, ret;
> +       int si, nents, ret = 0;
>
>         if (WARN_ON_ONCE(!obj->created))
>                 return -EINVAL;
> @@ -1063,8 +1080,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>
>         obj->pages = drm_gem_shmem_get_sg_table(&obj->base.base);
>         if (obj->pages == NULL) {
> -               drm_gem_shmem_unpin(&obj->base.base);
> -               return -EINVAL;
> +               ret = -EINVAL;
> +               goto err_unpin;
>         }
>
>         if (use_dma_api) {
> @@ -1081,7 +1098,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>                              GFP_KERNEL);
>         if (!ents) {
>                 DRM_ERROR("failed to allocate ent list\n");
> -               return -ENOMEM;
> +               ret = -ENOMEM;
> +               goto err_unmap;
>         }
>
>         for_each_sg(obj->pages->sgl, sg, nents, si) {
> @@ -1092,10 +1110,38 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>                 ents[si].padding = 0;
>         }
>
> -       virtio_gpu_cmd_resource_attach_backing(vgdev, obj->hw_res_handle,
> +       objs = virtio_gpu_array_alloc(1);
> +       if (!objs) {
> +               ret = -ENOMEM;
> +               goto err_free_ents;
> +       }
> +       virtio_gpu_array_add_obj(objs, &obj->base.base);
> +
> +       if (fence) {
> +               ret = virtio_gpu_array_lock_resv(objs);
> +               if (ret != 0)
> +                       goto err_put_objs;
> +       }
> +
> +       virtio_gpu_cmd_resource_attach_backing(vgdev, obj,
>                                                ents, nents,
> -                                              fence);
> +                                              objs, fence);
>         return 0;
> +
> +err_put_objs:
> +       virtio_gpu_array_put_free(objs);
> +err_free_ents:
> +       kfree(ents);
> +err_unmap:
> +       if (use_dma_api) {
> +               dma_unmap_sg(vgdev->vdev->dev.parent,
> +                            obj->pages->sgl, obj->pages->nents,
> +                            DMA_TO_DEVICE);
> +               obj->mapped = 0;
> +       }
> +err_unpin:
> +       drm_gem_shmem_unpin(&obj->base.base);
> +       return ret;
>  }
>
>  void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev,
> --
> 2.24.0.rc0.303.g954a862665-goog

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-17 10:13                       ` Gerd Hoffmann
@ 2019-10-31  9:10                         ` David Stevens
  -1 siblings, 0 replies; 59+ messages in thread
From: David Stevens @ 2019-10-31  9:10 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Tomasz Figa, Dmitry Morozov, virtio-dev, Keiichi Watanabe,
	Alexandre Courbot, Alex Lau, Dylan Reid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter

[Resending after subscribing to virtio-dev, sorry for the noise]

> When running drm-misc-next you should be able to test whether that'll
> actually work without any virtio-gpu driver changes.

I did some experimentation with the Chrome OS kernel-next branch
(based on v5.4-rc3) plus drm-misc-next. I looked at both the Chrome OS
downstream virtio-wayland driver and the virtio-vdec driver that was
recently proposed upstream.

Using the DMA addresses of buffers generally works. However, it
requires adding some synchronization to the virtio-gpu driver to
prevent races between the virtio-gpu device registering a buffer's
guest DMA address (which happens with the ATTACH_BACKING command) and
other virtio devices using that guest DMA address as a buffer
identifier. I've included a patch that adds this synchronization.

Signed-off-by: David Stevens <stevensd@chromium.org>
---
 drivers/gpu/drm/virtio/virtgpu_drv.h    |  2 +
 drivers/gpu/drm/virtio/virtgpu_object.c | 15 +++++-
 drivers/gpu/drm/virtio/virtgpu_vq.c     | 64 +++++++++++++++++++++----
 3 files changed, 71 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 314e02f94d9c4..00a2c0e6b6382 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -72,6 +72,8 @@ struct virtio_gpu_object {
        uint32_t mapped;
        bool dumb;
        bool created;
+
+       bool attach_backing_complete;
 };
 #define gem_to_virtio_gpu_obj(gobj) \
        container_of((gobj), struct virtio_gpu_object, base.base)
diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
index 017a9e0fc3bb8..812a0a48f6385 100644
--- a/drivers/gpu/drm/virtio/virtgpu_object.c
+++ b/drivers/gpu/drm/virtio/virtgpu_object.c
@@ -75,13 +75,26 @@ static void virtio_gpu_free_object(struct drm_gem_object *obj)
        drm_gem_shmem_free_object(obj);
 }

+int virtio_gpu_gem_object_pin(struct drm_gem_object *obj)
+{
+       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
+       struct virtio_gpu_device *vgdev = obj->dev->dev_private;
+
+       if (!bo->attach_backing_complete)
+               wait_event(vgdev->resp_wq, bo->attach_backing_complete);
+       if (!bo->attach_backing_complete)
+               return -EFAULT;
+
+       return drm_gem_shmem_pin(obj);
+}
+
 static const struct drm_gem_object_funcs virtio_gpu_gem_funcs = {
        .free = virtio_gpu_free_object,
        .open = virtio_gpu_gem_object_open,
        .close = virtio_gpu_gem_object_close,

        .print_info = drm_gem_shmem_print_info,
-       .pin = drm_gem_shmem_pin,
+       .pin = virtio_gpu_gem_object_pin,
        .unpin = drm_gem_shmem_unpin,
        .get_sg_table = drm_gem_shmem_get_sg_table,
        .vmap = drm_gem_shmem_vmap,
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 80176f379ad51..8bc2359a6d625 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -579,26 +579,42 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
        virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
 }

+static void
+virtio_gpu_cmd_resource_attach_backing_cb(struct virtio_gpu_device *vgdev,
+                                         struct virtio_gpu_vbuffer *vbuf)
+{
+       struct virtio_gpu_object_array *objs = vbuf->objs;
+       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]);
+
+       bo->attach_backing_complete = true;
+       wake_up_all(&vgdev->resp_wq);
+}
+
 static void
 virtio_gpu_cmd_resource_attach_backing(struct virtio_gpu_device *vgdev,
-                                      uint32_t resource_id,
+                                      struct virtio_gpu_object *bo,
                                       struct virtio_gpu_mem_entry *ents,
                                       uint32_t nents,
+                                      struct virtio_gpu_object_array *objs,
                                       struct virtio_gpu_fence *fence)
 {
        struct virtio_gpu_resource_attach_backing *cmd_p;
        struct virtio_gpu_vbuffer *vbuf;

-       cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
+       cmd_p = virtio_gpu_alloc_cmd_resp(
+               vgdev, virtio_gpu_cmd_resource_attach_backing_cb, &vbuf,
+               sizeof(*cmd_p), sizeof(struct virtio_gpu_ctrl_hdr), NULL);
        memset(cmd_p, 0, sizeof(*cmd_p));

        cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING);
-       cmd_p->resource_id = cpu_to_le32(resource_id);
+       cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
        cmd_p->nr_entries = cpu_to_le32(nents);

        vbuf->data_buf = ents;
        vbuf->data_size = sizeof(*ents) * nents;

+       vbuf->objs = objs;
+
        virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
 }

@@ -1048,9 +1064,10 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
                             struct virtio_gpu_fence *fence)
 {
        bool use_dma_api = !virtio_has_iommu_quirk(vgdev->vdev);
+       struct virtio_gpu_object_array *objs = NULL;
        struct virtio_gpu_mem_entry *ents;
        struct scatterlist *sg;
-       int si, nents, ret;
+       int si, nents, ret = 0;

        if (WARN_ON_ONCE(!obj->created))
                return -EINVAL;
@@ -1063,8 +1080,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,

        obj->pages = drm_gem_shmem_get_sg_table(&obj->base.base);
        if (obj->pages == NULL) {
-               drm_gem_shmem_unpin(&obj->base.base);
-               return -EINVAL;
+               ret = -EINVAL;
+               goto err_unpin;
        }

        if (use_dma_api) {
@@ -1081,7 +1098,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
                             GFP_KERNEL);
        if (!ents) {
                DRM_ERROR("failed to allocate ent list\n");
-               return -ENOMEM;
+               ret = -ENOMEM;
+               goto err_unmap;
        }

        for_each_sg(obj->pages->sgl, sg, nents, si) {
@@ -1092,10 +1110,38 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
                ents[si].padding = 0;
        }

-       virtio_gpu_cmd_resource_attach_backing(vgdev, obj->hw_res_handle,
+       objs = virtio_gpu_array_alloc(1);
+       if (!objs) {
+               ret = -ENOMEM;
+               goto err_free_ents;
+       }
+       virtio_gpu_array_add_obj(objs, &obj->base.base);
+
+       if (fence) {
+               ret = virtio_gpu_array_lock_resv(objs);
+               if (ret != 0)
+                       goto err_put_objs;
+       }
+
+       virtio_gpu_cmd_resource_attach_backing(vgdev, obj,
                                               ents, nents,
-                                              fence);
+                                              objs, fence);
        return 0;
+
+err_put_objs:
+       virtio_gpu_array_put_free(objs);
+err_free_ents:
+       kfree(ents);
+err_unmap:
+       if (use_dma_api) {
+               dma_unmap_sg(vgdev->vdev->dev.parent,
+                            obj->pages->sgl, obj->pages->nents,
+                            DMA_TO_DEVICE);
+               obj->mapped = 0;
+       }
+err_unpin:
+       drm_gem_shmem_unpin(&obj->base.base);
+       return ret;
 }

 void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev,
--
2.24.0.rc0.303.g954a862665-goog

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [virtio-dev] [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification
  2019-10-31  9:10                         ` David Stevens
@ 2019-11-07  8:29                           ` Keiichi Watanabe
  -1 siblings, 0 replies; 59+ messages in thread
From: Keiichi Watanabe @ 2019-11-07  8:29 UTC (permalink / raw)
  To: David Stevens
  Cc: Gerd Hoffmann, Tomasz Figa, Dmitry Morozov, virtio-dev,
	Alexandre Courbot, Alex Lau, Dylan Reid, Stéphane Marchesin,
	Pawel Osciak, Hans Verkuil, Linux Media Mailing List,
	Daniel Vetter, nicolas.dufresne

Hi everyone,

Thanks for all the comments.

Since Gerd created a separate thread for buffer sharing
(https://markmail.org/message/jeh5xjjxvylyrbur), I wonder if we can now start a
discussion on the protocol details in this thread. I think the details of
buffer sharing shouldn't affect the overall design of the virtio-vdec protocol
much.

Any feedback would be greatly appreciated. Thanks in advance.

Best regards,
Keiichi

On Thu, Oct 31, 2019 at 6:11 PM David Stevens <stevensd@chromium.org> wrote:
>
> [Resending after subscribing to virtio-dev, sorry for the noise]
>
> > When running drm-misc-next you should be able to test whether that'll
> > actually work without any virtio-gpu driver changes.
>
> I did some experimentation with the Chrome OS kernel-next branch
> (based on v5.4-rc3) plus drm-misc-next. I looked at both the Chrome OS
> downstream virtio-wayland driver and the virtio-vdec driver that was
> recently proposed upstream.
>
> Using the DMA addresses of buffers generally works. However, it
> requires adding some synchronization to the virtio-gpu driver to
> prevent races between the virtio-gpu device registering a buffer's
> guest DMA address (which happens with the ATTACH_BACKING command) and
> other virtio devices using that guest DMA address as a buffer
> identifier. I've included a patch that adds this synchronization.
>
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.h    |  2 +
>  drivers/gpu/drm/virtio/virtgpu_object.c | 15 +++++-
>  drivers/gpu/drm/virtio/virtgpu_vq.c     | 64 +++++++++++++++++++++----
>  3 files changed, 71 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 314e02f94d9c4..00a2c0e6b6382 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -72,6 +72,8 @@ struct virtio_gpu_object {
>         uint32_t mapped;
>         bool dumb;
>         bool created;
> +
> +       bool attach_backing_complete;
>  };
>  #define gem_to_virtio_gpu_obj(gobj) \
>         container_of((gobj), struct virtio_gpu_object, base.base)
> diff --git a/drivers/gpu/drm/virtio/virtgpu_object.c b/drivers/gpu/drm/virtio/virtgpu_object.c
> index 017a9e0fc3bb8..812a0a48f6385 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_object.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_object.c
> @@ -75,13 +75,26 @@ static void virtio_gpu_free_object(struct drm_gem_object *obj)
>         drm_gem_shmem_free_object(obj);
>  }
>
> +int virtio_gpu_gem_object_pin(struct drm_gem_object *obj)
> +{
> +       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj);
> +       struct virtio_gpu_device *vgdev = obj->dev->dev_private;
> +
> +       if (!bo->attach_backing_complete)
> +               wait_event(vgdev->resp_wq, bo->attach_backing_complete);
> +       if (!bo->attach_backing_complete)
> +               return -EFAULT;
> +
> +       return drm_gem_shmem_pin(obj);
> +}
> +
>  static const struct drm_gem_object_funcs virtio_gpu_gem_funcs = {
>         .free = virtio_gpu_free_object,
>         .open = virtio_gpu_gem_object_open,
>         .close = virtio_gpu_gem_object_close,
>
>         .print_info = drm_gem_shmem_print_info,
> -       .pin = drm_gem_shmem_pin,
> +       .pin = virtio_gpu_gem_object_pin,
>         .unpin = drm_gem_shmem_unpin,
>         .get_sg_table = drm_gem_shmem_get_sg_table,
>         .vmap = drm_gem_shmem_vmap,
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 80176f379ad51..8bc2359a6d625 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -579,26 +579,42 @@ void virtio_gpu_cmd_transfer_to_host_2d(struct virtio_gpu_device *vgdev,
>         virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
>  }
>
> +static void
> +virtio_gpu_cmd_resource_attach_backing_cb(struct virtio_gpu_device *vgdev,
> +                                         struct virtio_gpu_vbuffer *vbuf)
> +{
> +       struct virtio_gpu_object_array *objs = vbuf->objs;
> +       struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(objs->objs[0]);
> +
> +       bo->attach_backing_complete = true;
> +       wake_up_all(&vgdev->resp_wq);
> +}
> +
>  static void
>  virtio_gpu_cmd_resource_attach_backing(struct virtio_gpu_device *vgdev,
> -                                      uint32_t resource_id,
> +                                      struct virtio_gpu_object *bo,
>                                        struct virtio_gpu_mem_entry *ents,
>                                        uint32_t nents,
> +                                      struct virtio_gpu_object_array *objs,
>                                        struct virtio_gpu_fence *fence)
>  {
>         struct virtio_gpu_resource_attach_backing *cmd_p;
>         struct virtio_gpu_vbuffer *vbuf;
>
> -       cmd_p = virtio_gpu_alloc_cmd(vgdev, &vbuf, sizeof(*cmd_p));
> +       cmd_p = virtio_gpu_alloc_cmd_resp(
> +               vgdev, virtio_gpu_cmd_resource_attach_backing_cb, &vbuf,
> +               sizeof(*cmd_p), sizeof(struct virtio_gpu_ctrl_hdr), NULL);
>         memset(cmd_p, 0, sizeof(*cmd_p));
>
>         cmd_p->hdr.type = cpu_to_le32(VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING);
> -       cmd_p->resource_id = cpu_to_le32(resource_id);
> +       cmd_p->resource_id = cpu_to_le32(bo->hw_res_handle);
>         cmd_p->nr_entries = cpu_to_le32(nents);
>
>         vbuf->data_buf = ents;
>         vbuf->data_size = sizeof(*ents) * nents;
>
> +       vbuf->objs = objs;
> +
>         virtio_gpu_queue_fenced_ctrl_buffer(vgdev, vbuf, &cmd_p->hdr, fence);
>  }
>
> @@ -1048,9 +1064,10 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>                              struct virtio_gpu_fence *fence)
>  {
>         bool use_dma_api = !virtio_has_iommu_quirk(vgdev->vdev);
> +       struct virtio_gpu_object_array *objs = NULL;
>         struct virtio_gpu_mem_entry *ents;
>         struct scatterlist *sg;
> -       int si, nents, ret;
> +       int si, nents, ret = 0;
>
>         if (WARN_ON_ONCE(!obj->created))
>                 return -EINVAL;
> @@ -1063,8 +1080,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>
>         obj->pages = drm_gem_shmem_get_sg_table(&obj->base.base);
>         if (obj->pages == NULL) {
> -               drm_gem_shmem_unpin(&obj->base.base);
> -               return -EINVAL;
> +               ret = -EINVAL;
> +               goto err_unpin;
>         }
>
>         if (use_dma_api) {
> @@ -1081,7 +1098,8 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>                              GFP_KERNEL);
>         if (!ents) {
>                 DRM_ERROR("failed to allocate ent list\n");
> -               return -ENOMEM;
> +               ret = -ENOMEM;
> +               goto err_unmap;
>         }
>
>         for_each_sg(obj->pages->sgl, sg, nents, si) {
> @@ -1092,10 +1110,38 @@ int virtio_gpu_object_attach(struct virtio_gpu_device *vgdev,
>                 ents[si].padding = 0;
>         }
>
> -       virtio_gpu_cmd_resource_attach_backing(vgdev, obj->hw_res_handle,
> +       objs = virtio_gpu_array_alloc(1);
> +       if (!objs) {
> +               ret = -ENOMEM;
> +               goto err_free_ents;
> +       }
> +       virtio_gpu_array_add_obj(objs, &obj->base.base);
> +
> +       if (fence) {
> +               ret = virtio_gpu_array_lock_resv(objs);
> +               if (ret != 0)
> +                       goto err_put_objs;
> +       }
> +
> +       virtio_gpu_cmd_resource_attach_backing(vgdev, obj,
>                                                ents, nents,
> -                                              fence);
> +                                              objs, fence);
>         return 0;
> +
> +err_put_objs:
> +       virtio_gpu_array_put_free(objs);
> +err_free_ents:
> +       kfree(ents);
> +err_unmap:
> +       if (use_dma_api) {
> +               dma_unmap_sg(vgdev->vdev->dev.parent,
> +                            obj->pages->sgl, obj->pages->nents,
> +                            DMA_TO_DEVICE);
> +               obj->mapped = 0;
> +       }
> +err_unpin:
> +       drm_gem_shmem_unpin(&obj->base.base);
> +       return ret;
>  }
>
>  void virtio_gpu_object_detach(struct virtio_gpu_device *vgdev,
> --
> 2.24.0.rc0.303.g954a862665-goog

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2019-11-07  8:29 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-19  9:34 [PATCH] [RFC RESEND] vdec: Add virtio video decode device specification Keiichi Watanabe
2019-09-19  9:34 ` [virtio-dev] " Keiichi Watanabe
2019-09-19  9:52 ` Hans Verkuil
2019-09-19 11:15   ` Keiichi Watanabe
2019-09-19 11:15     ` [virtio-dev] " Keiichi Watanabe
2019-09-19 11:17     ` Keiichi Watanabe
2019-09-19 11:17       ` [virtio-dev] " Keiichi Watanabe
2019-09-23  8:56 ` [virtio-dev] " Gerd Hoffmann
2019-09-23  8:56   ` Gerd Hoffmann
2019-10-05  6:08   ` Tomasz Figa
2019-10-05  6:08     ` Tomasz Figa
2019-10-07 14:00     ` Dmitry Morozov
2019-10-07 14:00       ` Dmitry Morozov
2019-10-07 14:14       ` Tomasz Figa
2019-10-07 14:14         ` Tomasz Figa
2019-10-07 15:09         ` Dmitry Morozov
2019-10-07 15:09           ` Dmitry Morozov
2019-10-09  3:55           ` Tomasz Figa
2019-10-09  3:55             ` Tomasz Figa
2019-10-11  8:53             ` Dmitry Morozov
2019-10-11  8:53               ` Dmitry Morozov
2019-10-14 12:34               ` Gerd Hoffmann
2019-10-14 12:34                 ` Gerd Hoffmann
2019-10-14 13:05                 ` Dmitry Morozov
2019-10-14 13:05                   ` Dmitry Morozov
2019-10-15  7:54                   ` Gerd Hoffmann
2019-10-15  7:54                     ` Gerd Hoffmann
2019-10-15 14:06                     ` Dmitry Morozov
2019-10-15 14:06                       ` Dmitry Morozov
2019-10-17  8:06                       ` Tomasz Figa
2019-10-17  8:06                         ` Tomasz Figa
2019-10-17  6:40               ` Tomasz Figa
2019-10-17  6:40                 ` Tomasz Figa
2019-10-17  7:19                 ` Gerd Hoffmann
2019-10-17  7:19                   ` Gerd Hoffmann
2019-10-17  8:11                   ` Tomasz Figa
2019-10-17  8:11                     ` Tomasz Figa
2019-10-17 10:13                     ` Gerd Hoffmann
2019-10-17 10:13                       ` Gerd Hoffmann
2019-10-29  7:39                       ` David Stevens
2019-10-31  7:30                         ` Keiichi Watanabe
2019-10-31  7:30                           ` Keiichi Watanabe
2019-10-31  9:10                       ` David Stevens
2019-10-31  9:10                         ` David Stevens
2019-11-07  8:29                         ` Keiichi Watanabe
2019-11-07  8:29                           ` Keiichi Watanabe
2019-10-14 12:19     ` Gerd Hoffmann
2019-10-14 12:19       ` Gerd Hoffmann
2019-10-17  6:58       ` Tomasz Figa
2019-10-17  6:58         ` Tomasz Figa
2019-10-17  7:44         ` Gerd Hoffmann
2019-10-17  7:44           ` Gerd Hoffmann
2019-10-17  8:23           ` Tomasz Figa
2019-10-17  8:23             ` Tomasz Figa
2019-10-17 10:22             ` Gerd Hoffmann
2019-10-17 10:22               ` Gerd Hoffmann
2019-10-17 15:00         ` Frank Yang
2019-10-17 16:22           ` Frank Yang
2019-10-17  7:06       ` David Stevens
