All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] Introduce admin virtqueue as a new transport
@ 2021-08-03  3:20 Jason Wang
  2021-08-03 12:40 ` Max Gurtovoy
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Jason Wang @ 2021-08-03  3:20 UTC (permalink / raw)
  To: virtio-comment
  Cc: mst, cohuck, stefanha, mgurtovoy, eperezma, lulu, Jason Wang

This patch introduces a new transport - the admin virtqueue. This
transport is useful for implementing virtual devices with a limited
transport specific resources or presenting the virtual device in a
transport independent way.

This means, all the basic device facilities are provided solely via
the the admin virtqueue. Additionally, the admin virtqueue is also in
charge of the creating and destroying of the virtual device.

To be self-contained and not depend on the platform specific
feature. Device MMU is also introduced for providing the DMA isolation
among virtual devices.

With the help of the admin virtqueue, the presenting of the virtual
device is done via the co-operation between the management device and
its driver.

This is just a draft for demonstrating the basic ideas. Some possible
enhancements:

- admin event virtqueue for reporting events like interrupts (on the
  platform withouth MSI) and MMU translation failure
- hardware friendly MMU translation table (e.g in the memory instead
  of using control virtqueue commands)
- command to kick the virtqueue

Comments are more than welcomed.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 content.tex | 639 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 639 insertions(+)

diff --git a/content.tex b/content.tex
index 620c0e2..1f66d42 100644
--- a/content.tex
+++ b/content.tex
@@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
   drive the device.
 
+\item[DEVICE_MMU_FAIL (32)] Indicates that the device MMU has
+  experienced an error from which it can't recover.
+
 \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
   an error from which it can't recover.
 \end{description}
@@ -515,6 +518,642 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
 Virtio can use various different buses, thus the standard is split
 into virtio general and bus-specific sections.
 
+\section{Virtio Over Admin Virtqueue}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue}
+
+Sometimes it's hard to implement the device in a transport specific
+method. One example is that a physical device may try to present
+multiple virtual devices with a limited transport specific
+resources. Another example is to implement virtual devices which is
+transport independent. In those cases, the admin virtqueue provided by
+the management device could be used to replace the transport specific
+method to implement the virtual device. Then the presenting of the
+virtual device is done through the cooperation between the admin
+virtqueue and the driver.
+
+\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over Admin Virtqueue / Basic Concepts}
+
+The device that offers the admin virtqueue (via feature
+VIRTIO_F_ADMIN_VQ) is the management device of the virtual
+devices. All commands are of the following form:
+
+\begin{lstlisting}
+struct virtio_admin_ctrl {
+        u64 device_id;
+        u16 class;
+        u16 command;
+        u8 command-out-data[];
+        u8 ack;
+        u8 command-in-data[]
+};
+
+/* ack values */
+#define VIRTIO_ADMIN_OK     0
+#define VIRTIO_ADMIN_ERR    1
+\end{lstlisting}
+
+The device_id, class, command and command-out-data are set by
+the driver, and the device sets the ack and command-in-data. 0 is used
+for identify the management device itself.
+
+\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
+
+The virtual device MUST not offer VIRTIO_F_ADMIN_VQ feature.
+
+\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
+
+The driver SHOULD negotiate VIRTIO_F_ADMIN_VQ if the device offers it.
+
+\subsection{Virtual Device Discovery}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtual Device Discovery}
+
+The management device is discovered through a transport and device
+specific method. Virtual devices is created and discovered via the
+admin virtqueue.
+
+\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
+
+The capabilites that are supported by the admin virtqueue could be
+fetched through the following commands:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_CAP    0
+ #define VIRTIO_ADMIN_CTRL_CAP_GET        0
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
+supported by the admin virtqueue through a u64 which is a bit mask of
+the capabilies in command-in-data. There's no command-out-data.
+
+The capabilies that is currently supported are:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_F_CAP_VDEV    1
+\end{lstlisting}
+
+The VIRTIO_ADMIN_F_CAP_VDEV capability demonstrates that the virtual
+devices is created, configured and destroyed through admin
+virtqueue. That means the admin virtqueue is the transport for the
+virtual devices.
+
+\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
+
+The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
+VIRTIO_F_ADMIN_VQ is offered.
+
+The management device MUST fail VIRTIO_ADMIN_CTRL_CAP class when the
+\field{device_id} is not zero.
+
+\drivernormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
+
+The driver MUST use 0 as \field{device_id} for VIRTIO_ADMIN_CTRL_CAP
+class.
+
+\subsection{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capibility,
+virtual devices must be created and discovered through the admin
+virtqueue.
+
+\begin{lstlisting}
+struct virtio_admin_ctrl_vdev_attribute {
+       u32 device_id;
+       u8 config[];
+};
+
+#define VIRTIO_ADMIN_CTRL_VDEV    2
+ #define VIRTIO_ADMIN_CTRL_VDEV_CREATE        0
+ #define VIRTIO_ADMIN_CTRL_VDEV_DESTROY        1
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_VDEV_CREAT command is used to create a virtual
+device. The command-out-data for VIRTIO_ADMIN_CTRL_CREATE is the
+virtio device id (\field{device_id}) and device specific configuration
+(\field{config}) for creating the virtual device. When succeed, the
+device returns a u64 as a unique identifier of the created virtual
+device in command-in-data.
+
+The VIRTIO_ADMIN_CTRL_VDEV_DESTROY command is used to destroy a
+virtual device which is identified by its 64bit identifier
+\field{virtual_device_id}. There's no command-in-data for
+VIRTIO_ADMIN_CTRL_DESTROY command.
+
+\devicenormative{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
+
+The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_CREATE if
+\field{device_id} is not 0.
+
+The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_DESTROY if
+\field{device_id} is 0.
+
+All virtual devices MUST be created via admin virtqueue if the admin
+virtqueue offers VIRTIO_F_CTRL_VDEV.
+
+The management device MAY map implement the virtual device in a
+transport specific way.
+
+\drivernormative{Driver Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
+
+The management driver MUST use 0 as \field{device_id} for
+VIRTIO_ADMIN_CTRL_VDEV_CREATE command.
+
+The management driver SHOULD make sure the virtual device is not used
+by any driver before trying to destroy it.
+
+\subsection{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
+the feature negotiation of virtual devices could be done by the
+following commands:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_FEAT    3
+ #define VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET        0
+ #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET        1
+ #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET        2
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET is to get the features offered
+by a virtual device.
+
+The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET is for driver to accept feature
+bits offered by the virtual device.
+
+The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET is to get the features accepted
+by both the virtual driver and the device.
+
+The features is 64 bits mask of the virtio features bit. For
+VIRTIO_ADMIN_CTRL_DRIVER_SET, the feature is passed to the device
+through command-out-data. For VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET and
+VIRTIO_ADMIN_CTRL_DRIVER_GET the feature is returned for the device
+through command-in-data.
+
+\devicenormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
+
+The management device MUST fail VIRTIO_ADMIN_F_CTRL_FEAT class for the
+command that use 0 as its \field{virtual_device_id}.
+
+\drivernormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
+
+The management driver MAY mediate between the feature negotiation
+request of the virtual devices and the admin virtqueue. E.g when
+offering features to the virtual device, the management driver MAY
+exclude some features in order to limit the behaviour of the virtual
+device.
+
+\subsection{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
+
+When the admin virtqueue offers VIRTIO_ADNIN_F_CAP_VDEV capability,
+the status of virtual device could be accessed by the following
+commands:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_STATUS    4
+ #define VIRTIO_ADMIN_CTRL_STATUS_SET        0
+ #define VIRTIO_ADMIN_CTRL_STATUS_GET        1
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_STATUS_SET is used to set the device status of
+the virtual device here. The command-out-data is the one byte status
+to set to the device. There's no command-in-data for this command.
+
+The VIRTIO_ADMIN_CTRL_STATUS_GET is used to get the device status of
+the virtual device. The command-in-data is the one byte status
+returned from the device. There's no command-out-data for this
+command.
+
+\devicenormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
+
+The management device MUST start the reset of a virtual device when 0
+is written via VIRTIO_ADMIN_CTRL_STATUS_SET, the success of this
+command demonstrate the success of the reset.
+
+The management device MUST present 0 through
+VIRTIO_ADMIN_CTRL_STATUS_GET once the reset is done.
+
+The management device MUST fail the device status access if
+\field{device_id} is zero.
+
+\drivernormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
+
+After writing 0 via VIRTIO_ADMIN_CTRL_STATUS_SET, the driver MUST wait
+for the success of the command before re-initializing the device.
+
+\subsection{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Genreation}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
+the device generation could be read from the following commands:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_GENERATION    5
+ #define VIRTIO_ADMIN_CTRL_GENERATION_GET        0
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_GENERATION_GET is used to get the device generation
+of the virtual device. The command-in-data is the u32 device
+generation returned from the device. There's no command-out-data for
+this command.
+
+\devicenormative{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Generation}
+
+The device MUST present a changed config_generation after the driver
+has read a device-specific configuration value which has changed since
+any part of the device-specific configuration was last read.
+
+The device MUST fail the device generation access if \field{device_id} is zero.
+
+\subsection{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the
+config space of a virtual device could be accessed from
+VIRTIO_ADMIN_CTRL_CONFIG_GET and VIRTIO_ADMIN_CTRL_CONFIG_SET.
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_CONFIG    6
+  #define VIRTIO_ADMIN_CTRL_CONFIG_GET        0
+  #define VIRTIO_ADMIN_CTRL_CONFIG_SET        1
+
+struct virtio_admin_ctrl_vdev_config_get {
+       u32 offset;
+       u32 size;
+};
+
+struct virtio_admin_ctrl_vdev_config_set {
+       u32 offset;
+       u32 size;
+       u8  data[];
+};
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_CONFIG_GET is used to read data from the
+device configuration space. As described in struct
+virtio_admin_ctrl_vdev_config_get, The command-out-data is the offset
+since the start of the config space and the size of the data. The
+command-in-data is the array of u8 data that read from the config
+space.
+
+The VIRTIO_ADMIN_CTRL_CONFIG_SET is used to write data to the device
+configuration space. As described in struct
+virtio_admin_ctrl_vdev_config_set, the command-out-data contains the
+offset since the start of the config space, the size of the data and
+the data that will be wrote. There's no command-in-data for this
+command.
+
+\devicenormative{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
+
+The management device MUST fail the device configuration space access
+if the driver want to access the range which is out of the config
+space.
+
+The management device MUST fail the device configuration space access
+if \field{device_id} is zero.
+
+\subsection{MSI Configuration}label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the MSI entry
+for a specific virtqueue could be set through following command:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_MSI    7
+ #define VIRTIO_ADMIN_CTRL_MSI_VQ_SET        0
+ #define VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE     1
+ #define VIRTIO_ADMIN_CTRL_MSI_VQ_MASK       2
+ #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET    3
+ #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE 4
+ #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK   5
+
+struct virtio_admin_ctrl_vdev_msi_vq_set {
+       u16 queue_index;
+       u64 addr;
+       u32 data;
+};
+
+struct virtio_admin_ctrl_vdev_msi_vq_enable {
+       u16 queue_index;
+       u8 enable;
+};
+
+struct virtio_admin_ctrl_vdev_msi_vq_mask {
+       u16 queue_index;
+       u8 mask;
+};
+
+struct virtio_admin_ctrl_vdev_msi_config {
+       u64 addr;
+       u32 data;
+};
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_MSI_VQ_SET is used to set the MSI entry for a
+specific virtqueue. The command-out-data is the virtqueue index and
+the MSI address and data (as described in struct
+virtio_admin_ctrl_vdev_msix_vq_set).
+
+The VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE and is used to enable or disable
+MSI interrupt for a specific virtqueue. The command-out-data is the
+virtqueue index and whether to enable the MSI: 0 means to enable and 1
+means to disable (as described in struct
+virtio_admin_ctrl_vdev_msi_vq_enable).
+
+The VIRTIO_ADMIN_CTRL_MSI_VQ_MASK and is used to mask or unmask MSI
+interrupt for a specific virtqueue. The command-out-data is the
+virtqueue index and the mask status: 0 means unmak and 1 means mask
+(as described in struct virtio_admin_ctrl_vdev_msi_vq_mask).
+
+The VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET is used to set the MSI entry
+for the config interrupt. The command-out-data is the MSI address and
+data (as described in struct virtio_admin_ctrl_vdev_msix_config).
+
+The VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE is used to enable and disable
+MSI for config space. The command-out-data is an u8: 0 means to
+disable and 1 means to enable.
+
+The VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK is used to mask and unmask MSI
+interrupt for config space. The command-out-data is an u8: 0 means to
+mask and 1 means to unmask.
+
+There's no command-in-data for all the above MSI commands.
+
+\devicenormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
+
+The virtual device MUST record the pending MSI interrupt and
+generate the MSI interrupt is it was pending after unmasking.
+
+The virtual MUST disable the MSI for both virtqueue and config space
+upon reset.
+
+\drivernormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
+
+The driver MUST allocate transport or platform specific MSI entries
+for both virtqueue and config space if it wants to use interrupt.
+
+The driver MAY choose disable the MSI if polling is used.
+
+\subsection{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Address}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the address of
+a specific virtqueue could be done through the following command:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_VQ_ADDR    9
+ #define VIRTIO_ADMIN_CTRL_VQ_ADDR_SET        0
+
+struct virtio_admin_ctrl_vdev_vq_addr {
+       u16 queue_index;
+       u64 device_area;
+       u64 descriptor_area;
+       u64 driver_area;
+};
+\end{lstlisting}
+
+The command-out-data is the queue index, the addresses of device area,
+descriptor area and driver area (as described in struct
+virtio_admin_ctrl_vdev_vq_addr); There's no command-in-data.
+
+\devicenormative{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueeu Address}
+
+The management device MUST fail the commands of class
+VIRTIO_ADMIN_CTRL_VQ_ADDR if \field{device_id} is zero.
+
+\subsection{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
+
+When the admin virtqueue offers VIRTIO_F_ADMIN_F_CAP_VDEV, virtqueue
+status could be set and get through the following command:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_VQ_ENABLE    10
+ #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET       0
+ #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET       1
+
+struct virtio_admin_ctrl_vq_status_set {
+  u16 queue_index;
+  u8 status;
+};
+
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET is used to set the status to a
+specific virtqueue. The command-out-data is the queue index, the
+status that is set to the virtqueue (0 disabled, 1 enabled); There's
+no command-in-data.
+
+The VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET is used to get the status of a
+specific virtqueue. The command-out-data is the u16 of queue
+index. The command-in-data is the virtqueue status (0 disalbed, 1
+enabled).
+
+\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
+
+When disabled, the virtual device MUST stop processing requests from
+this virtqueue.
+
+The management device MUST present a 0 via
+VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET on reset of the virtual device.
+
+The management device MUST fail the virtqueue status access if
+\field{device_id} is zero.
+
+\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
+
+The driver MUST configure the other virtqueue fields before enabling
+the virtqueue with VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET.
+
+\subsection{Virtqueue Size}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, virtqueue size
+could be accessed through the following command:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_VQ_SIZE    11
+ #define VIRTIO_ADMIN_CTRL_VQ_SIZE_SET       0
+ #define VIRTIO_ADMIN_CTRL_VQ_SIZE_GET       1
+
+struct virtio_admin_ctrl_vdev_vq_size_set {
+       u16 queue_index;
+       u16 size;
+};
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_VQ_SIZE_SET command is used to set the virtqueue
+size. The command-out-data is the queue index and the size of the
+virtqueue (as described in struct
+virtio_admin_ctrl_vdev_vq_size_set). There's no command-in-data.
+
+The VIRTIO_ADMIN_CTRL_VQ_SIZE_GET command is used to get the virtqueue
+size. On reset, the maximum queue size supported by the device is
+returned. The command-out-data is the u16 of the virtqueue index. The
+command-in-data is the u16 of queue size for the virtqueue.
+
+\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
+
+The management device MUST fail the virtqueue size access if
+\field{device_id} is zero.
+
+\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the virtqueue
+notification could be done through the following commands:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_CTRL_VQ_NOTIFY    12
+ #define VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET          1
+
+struct virtio_admin_ctrl_vdev_vq_notification_area {
+       le64 addr
+       le64 size;
+};
+\end{lstlisting}
+
+The VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET is used to get the transport
+specific address area that can be used to notify a virtqueue. The
+command-out-data is a u16 of the virtqueue index. The command-in-data
+contains the address and the size of the notification area (as
+described in struct virtio_admin_ctrl_vdev_vq_notification_area).
+
+\devicenormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
+
+The management device MUST fail the VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET if
+there's no transport specific notification address for a virtqueue of
+its virtual device.
+
+The management device MUST fail the virtqueue notification access if
+\field{device_id} is zero.
+
+The management device MUST forbid the notification area of a specific
+virtual device to be accessed from another virtual device.
+
+\drivernormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
+
+The driver MAY choose to notify the virtqueue by writing the queue
+index at address \field{addr} which is fetched from the
+VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET command.
+
+\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
+
+When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, management device
+offers a device MMU for a secure DMA context for each virtual
+device. The device MMU will translate I/O Virtual Address to transport
+specific DMA address before using a transport specific way for DMA:
+
+\begin{lstlisting}
+#define VIRTIO_ADMIN_VQ_CTRL_MMU    13
+ #define VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE       1
+ #define VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET     2
+ #define VIRTIO_ADMIN_VQ_CTRL_MMU_MAP          3
+ #define VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP        4
+ #define VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET      5
+
+struct virtio_admin_ctrl_vdev_mmu_asid_set {
+  le16 queue_index;
+  le64 asid;
+};
+
+struct virtio_admin_ctrl_vdev_mmu_map {
+  le64 iova_start;
+  le64 iova_end;
+  le64 dma_start;
+  le32 flags;
+};
+
+/* Read access is allowed */
+#define VIRTIO_ADMIN_VQ_MAP_F_READ   (1 << 0)
+/* Write access is allowed */
+#define VIRTIO_IOMMU_VQ_MAP_F_WRITE  (1 << 1)
+
+struct virtio_admin_ctrl_vdev_mmu_err {
+  le32 reason;
+  le16 queue_index;
+  le64 asid;
+  le64 iova_start;
+  le64 iova_end;
+  le32 flags;
+};
+
+/* Mapping does not exist */
+#define VIRTIO_ADMIN_VQ_MAP_ERR_NON_EXIST (1 << 0)
+/* Access violates the permission */
+#define VIRTIO_ADMIN_VQ_MAP_ERR_ACC_VIOLATION (1 << 1)
+
+\end{lstlisting}
+
+The VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE is used to enable or disable
+device MMU for a specific virtual device. The command-out-data is a u8
+for telling whether device MMU is enabled for the virtual device: 0
+means to enable and 1 means to disable.
+
+The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
+address space id to a virtqueue. The command-out-data is the queue
+index (\field{queue_index}) and the address space ID (\field{asid})
+assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
+
+The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
+Address range [\field{iova_start}, \field{iova_end}] to transport
+specific DMA address range [\field{dma_start}, \field{dma_start} +
+ \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
+specify the device access permission.
+
+The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
+Virtual Address ranges that are intersected with the range
+[\field{iova_start}, \field{iova_end}].
+
+There's no command-in-data for all the above four commands.
+
+The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
+information of the device MMU. There's no command-out-data, the
+command-in-date is the queue index and its asid, the iova range and
+the access of the operation (as described in struct
+virtio_admin_ctrl_vdev_mmu_err).
+
+\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
+
+The management device MUST fail the device MMU command if \field{device_id} is
+zero.
+
+The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
+command if the iova range is intersected with a existing range.
+
+The management device MUST set both DEVICE_NEEDS_RESET and
+DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
+virtual device.
+
+The device MMU for the virtual device MUST be disabled upon its reset.
+
+Upon reset, the virtual device must reset the Address Space ID for
+each virtqueue to 0.
+
+\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
+  / Virtio Over Admin Virtqueue / Device MMU}
+
+The driver MAY choose to disable the device MMU but it MUST make sure
+the transport specific method could be used to provide a secure DMA
+context for each virtual device.
+
+The driver MAY query the error of device MMU after DEVICE_MMU_FAIL is set.
+
+\subsection{Presenting Virtual Device}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Presenting Virtual Device}
+
+If VIRTIO_ADMIN_F_VDEV is offered by the device. The presenting of
+the virtual device requires co-operation between the management
+driver and the admin virtqueue. This means, from the view of the
+virtual device driver, the transport is done via the communication
+with the management device driver. It's up to the software to decide
+what kind of method that is needed be used for those communications.
+
+The management driver typically do the following steps for creating a
+virtual device:
+
+\begin{enumerate}
+\item Determine the virtio id and device specific configuration.
+\item Create the virtual devices using VIRTIO_ADMIN_CTRL_VDEV_CREATE
+command.
+\item Optionally, configure the MSI.
+\item Optionally, enable and initialize the device MMU.
+\item Setup the necessary communication methods with virtual device driver.
+\item Perform device specific setups.
+\item Let the virtual device to be probed by the virtual device
+driver. The management driver will then use the admin virtqueue to
+implement the requests of basic facility from the virtual device
+driver.
+\end{enumerate}
+
 \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
 
 Virtio devices are commonly implemented as PCI devices.
-- 
2.24.3 (Apple Git-128)


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-03  3:20 [RFC PATCH] Introduce admin virtqueue as a new transport Jason Wang
@ 2021-08-03 12:40 ` Max Gurtovoy
  2021-08-04  1:37   ` Jason Wang
  2021-08-03 14:51 ` Stefan Hajnoczi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Max Gurtovoy @ 2021-08-03 12:40 UTC (permalink / raw)
  To: Jason Wang, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


On 8/3/2021 6:20 AM, Jason Wang wrote:
> This patch introduces a new transport - the admin virtqueue. This
> transport is useful for implementing virtual devices with a limited
> transport specific resources or presenting the virtual device in a
> transport independent way.
>
> This means, all the basic device facilities are provided solely via
> the the admin virtqueue. Additionally, the admin virtqueue is also in
> charge of the creating and destroying of the virtual device.
>
> To be self-contained and not depend on the platform specific
> feature. Device MMU is also introduced for providing the DMA isolation
> among virtual devices.
>
> With the help of the admin virtqueue, the presenting of the virtual
> device is done via the co-operation between the management device and
> its driver.
>
> This is just a draft for demonstrating the basic ideas. Some possible
> enhancements:
>
> - admin event virtqueue for reporting events like interrupts (on the
>    platform withouth MSI) and MMU translation failure
> - hardware friendly MMU translation table (e.g in the memory instead
>    of using control virtqueue commands)
> - command to kick the virtqueue
>
> Comments are more than welcomed.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   content.tex | 639 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 639 insertions(+)
>
> diff --git a/content.tex b/content.tex
> index 620c0e2..1f66d42 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>     drive the device.
>   
> +\item[DEVICE_MMU_FAIL (32)] Indicates that the device MMU has
> +  experienced an error from which it can't recover.
> +
>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>     an error from which it can't recover.
>   \end{description}
> @@ -515,6 +518,642 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>   Virtio can use various different buses, thus the standard is split
>   into virtio general and bus-specific sections.
>   
> +\section{Virtio Over Admin Virtqueue}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue}
> +
> +Sometimes it's hard to implement the device in a transport specific
> +method. One example is that a physical device may try to present
> +multiple virtual devices with a limited transport specific
> +resources. Another example is to implement virtual devices which is
> +transport independent. In those cases, the admin virtqueue provided by
> +the management device could be used to replace the transport specific
> +method to implement the virtual device. Then the presenting of the
> +virtual device is done through the cooperation between the admin
> +virtqueue and the driver.

maybe it's me, but I can't understand how admin queue is a transport.

And how can I use admin queue transport to migrate VFs that are 
controlled by virtio PCI PF.

And why the regular admin queue that is part of the device queues can't 
fit to your needs ?

Can you explain your needs ? is it to create a vDPA device from some SW 
interface ?

I don't follow.

> +
> +\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over Admin Virtqueue / Basic Concepts}
> +
> +The device that offers the admin virtqueue (via feature
> +VIRTIO_F_ADMIN_VQ) is the management device of the virtual
> +devices. All commands are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_admin_ctrl {
> +        u64 device_id;
> +        u16 class;
> +        u16 command;
> +        u8 command-out-data[];
> +        u8 ack;
> +        u8 command-in-data[]
> +};
> +
> +/* ack values */
> +#define VIRTIO_ADMIN_OK     0
> +#define VIRTIO_ADMIN_ERR    1
> +\end{lstlisting}
> +
> +The device_id, class, command and command-out-data are set by
> +the driver, and the device sets the ack and command-in-data. 0 is used
> +for identify the management device itself.
> +
> +\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
> +
> +The virtual device MUST not offer VIRTIO_F_ADMIN_VQ feature.
> +
> +\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
> +
> +The driver SHOULD negotiate VIRTIO_F_ADMIN_VQ if the device offers it.
> +
> +\subsection{Virtual Device Discovery}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtual Device Discovery}
> +
> +The management device is discovered through a transport and device
> +specific method. Virtual devices is created and discovered via the
> +admin virtqueue.
> +
> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The capabilites that are supported by the admin virtqueue could be
> +fetched through the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_CAP    0
> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
> +supported by the admin virtqueue through a u64 which is a bit mask of
> +the capabilies in command-in-data. There's no command-out-data.
> +
> +The capabilies that is currently supported are:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_F_CAP_VDEV    1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_F_CAP_VDEV capability demonstrates that the virtual
> +devices is created, configured and destroyed through admin
> +virtqueue. That means the admin virtqueue is the transport for the
> +virtual devices.
> +
> +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
> +VIRTIO_F_ADMIN_VQ is offered.
> +
> +The management device MUST fail VIRTIO_ADMIN_CTRL_CAP class when the
> +\field{device_id} is not zero.
> +
> +\drivernormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The driver MUST use 0 as \field{device_id} for VIRTIO_ADMIN_CTRL_CAP
> +class.
> +
> +\subsection{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capibility,
> +virtual devices must be created and discovered through the admin
> +virtqueue.
> +
> +\begin{lstlisting}
> +struct virtio_admin_ctrl_vdev_attribute {
> +       u32 device_id;
> +       u8 config[];
> +};
> +
> +#define VIRTIO_ADMIN_CTRL_VDEV    2
> + #define VIRTIO_ADMIN_CTRL_VDEV_CREATE        0
> + #define VIRTIO_ADMIN_CTRL_VDEV_DESTROY        1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VDEV_CREAT command is used to create a virtual
> +device. The command-out-data for VIRTIO_ADMIN_CTRL_CREATE is the
> +virtio device id (\field{device_id}) and device specific configuration
> +(\field{config}) for creating the virtual device. When succeed, the
> +device returns a u64 as a unique identifier of the created virtual
> +device in command-in-data.
> +
> +The VIRTIO_ADMIN_CTRL_VDEV_DESTROY command is used to destroy a
> +virtual device which is identified by its 64bit identifier
> +\field{virtual_device_id}. There's no command-in-data for
> +VIRTIO_ADMIN_CTRL_DESTROY command.
> +
> +\devicenormative{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_CREATE if
> +\field{device_id} is not 0.
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_DESTROY if
> +\field{device_id} is 0.
> +
> +All virtual devices MUST be created via admin virtqueue if the admin
> +virtqueue offers VIRTIO_F_CTRL_VDEV.
> +
> +The management device MAY map implement the virtual device in a
> +transport specific way.
> +
> +\drivernormative{Driver Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +The management driver MUST use 0 as \field{device_id} for
> +VIRTIO_ADMIN_CTRL_VDEV_CREATE command.
> +
> +The management driver SHOULD make sure the virtual device is not used
> +by any driver before trying to destroy it.
> +
> +\subsection{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
> +the feature negotiation of virtual devices could be done by the
> +following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_FEAT    3
> + #define VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET        0
> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET        1
> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET        2
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET is to get the features offered
> +by a virtual device.
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET is for driver to accept feature
> +bits offered by the virtual device.
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET is to get the features accepted
> +by both the virtual driver and the device.
> +
> +The features is 64 bits mask of the virtio features bit. For
> +VIRTIO_ADMIN_CTRL_DRIVER_SET, the feature is passed to the device
> +through command-out-data. For VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET and
> +VIRTIO_ADMIN_CTRL_DRIVER_GET the feature is returned for the device
> +through command-in-data.
> +
> +\devicenormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +The management device MUST fail VIRTIO_ADMIN_F_CTRL_FEAT class for the
> +command that use 0 as its \field{virtual_device_id}.
> +
> +\drivernormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +The management driver MAY mediate between the feature negotiation
> +request of the virtual devices and the admin virtqueue. E.g when
> +offering features to the virtual device, the management driver MAY
> +exclude some features in order to limit the behaviour of the virtual
> +device.
> +
> +\subsection{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +When the admin virtqueue offers VIRTIO_ADNIN_F_CAP_VDEV capability,
> +the status of virtual device could be accessed by the following
> +commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_STATUS    4
> + #define VIRTIO_ADMIN_CTRL_STATUS_SET        0
> + #define VIRTIO_ADMIN_CTRL_STATUS_GET        1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_STATUS_SET is used to set the device status of
> +the virtual device here. The command-out-data is the one byte status
> +to set to the device. There's no command-in-data for this command.
> +
> +The VIRTIO_ADMIN_CTRL_STATUS_GET is used to get the device status of
> +the virtual device. The command-in-data is the one byte status
> +returned from the device. There's no command-out-data for this
> +command.
> +
> +\devicenormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +The management device MUST start the reset of a virtual device when 0
> +is written via VIRTIO_ADMIN_CTRL_STATUS_SET, the success of this
> +command demonstrate the success of the reset.
> +
> +The management device MUST present 0 through
> +VIRTIO_ADMIN_CTRL_STATUS_GET once the reset is done.
> +
> +The management device MUST fail the device status access if
> +\field{device_id} is zero.
> +
> +\drivernormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +After writing 0 via VIRTIO_ADMIN_CTRL_STATUS_SET, the driver MUST wait
> +for the success of the command before re-initializing the device.
> +
> +\subsection{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Genreation}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
> +the device generation could be read from the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_GENERATION    5
> + #define VIRTIO_ADMIN_CTRL_GENERATION_GET        0
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_GENERATION_GET is used to get the device generation
> +of the virtual device. The command-in-data is the u32 device
> +generation returned from the device. There's no command-out-data for
> +this command.
> +
> +\devicenormative{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Generation}
> +
> +The device MUST present a changed config_generation after the driver
> +has read a device-specific configuration value which has changed since
> +any part of the device-specific configuration was last read.
> +
> +The device MUST fail the device generation access if \field{device_id} is zero.
> +
> +\subsection{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the
> +config space of a virtual device could be accessed from
> +VIRTIO_ADMIN_CTRL_CONFIG_GET and VIRTIO_ADMIN_CTRL_CONFIG_SET.
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_CONFIG    6
> +  #define VIRTIO_ADMIN_CTRL_CONFIG_GET        0
> +  #define VIRTIO_ADMIN_CTRL_CONFIG_SET        1
> +
> +struct virtio_admin_ctrl_vdev_config_get {
> +       u32 offset;
> +       u32 size;
> +};
> +
> +struct virtio_admin_ctrl_vdev_config_set {
> +       u32 offset;
> +       u32 size;
> +       u8  data[];
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_CONFIG_GET is used to read data from the
> +device configuration space. As described in struct
> +virtio_admin_ctrl_vdev_config_get, The command-out-data is the offset
> +since the start of the config space and the size of the data. The
> +command-in-data is the array of u8 data that read from the config
> +space.
> +
> +The VIRTIO_ADMIN_CTRL_CONFIG_SET is used to write data to the device
> +configuration space. As described in struct
> +virtio_admin_ctrl_vdev_config_set, the command-out-data contains the
> +offset since the start of the config space, the size of the data and
> +the data that will be wrote. There's no command-in-data for this
> +command.
> +
> +\devicenormative{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
> +
> +The management device MUST fail the device configuration space access
> +if the driver want to access the range which is out of the config
> +space.
> +
> +The management device MUST fail the device configuration space access
> +if \field{device_id} is zero.
> +
> +\subsection{MSI Configuration}label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the MSI entry
> +for a specific virtqueue could be set through following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_MSI    7
> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_SET        0
> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE     1
> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_MASK       2
> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET    3
> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE 4
> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK   5
> +
> +struct virtio_admin_ctrl_vdev_msi_vq_set {
> +       u16 queue_index;
> +       u64 addr;
> +       u32 data;
> +};
> +
> +struct virtio_admin_ctrl_vdev_msi_vq_enable {
> +       u16 queue_index;
> +       u8 enable;
> +};
> +
> +struct virtio_admin_ctrl_vdev_msi_vq_mask {
> +       u16 queue_index;
> +       u8 mask;
> +};
> +
> +struct virtio_admin_ctrl_vdev_msi_config {
> +       u64 addr;
> +       u32 data;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_MSI_VQ_SET is used to set the MSI entry for a
> +specific virtqueue. The command-out-data is the virtqueue index and
> +the MSI address and data (as described in struct
> +virtio_admin_ctrl_vdev_msix_vq_set).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE and is used to enable or disable
> +MSI interrupt for a specific virtqueue. The command-out-data is the
> +virtqueue index and whether to enable the MSI: 0 means to enable and 1
> +means to disable (as described in struct
> +virtio_admin_ctrl_vdev_msi_vq_enable).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_VQ_MASK and is used to mask or unmask MSI
> +interrupt for a specific virtqueue. The command-out-data is the
> +virtqueue index and the mask status: 0 means unmak and 1 means mask
> +(as described in struct virtio_admin_ctrl_vdev_msi_vq_mask).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET is used to set the MSI entry
> +for the config interrupt. The command-out-data is the MSI address and
> +data (as described in struct virtio_admin_ctrl_vdev_msix_config).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE is used to enable and disable
> +MSI for config space. The command-out-data is an u8: 0 means to
> +disable and 1 means to enable.
> +
> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK is used to mask and unmask MSI
> +interrupt for config space. The command-out-data is an u8: 0 means to
> +mask and 1 means to unmask.
> +
> +There's no command-in-data for all the above MSI commands.
> +
> +\devicenormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
> +
> +The virtual device MUST record the pending MSI interrupt and
> +generate the MSI interrupt is it was pending after unmasking.
> +
> +The virtual MUST disable the MSI for both virtqueue and config space
> +upon reset.
> +
> +\drivernormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
> +
> +The driver MUST allocate transport or platform specific MSI entries
> +for both virtqueue and config space if it wants to use interrupt.
> +
> +The driver MAY choose disable the MSI if polling is used.
> +
> +\subsection{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Address}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the address of
> +a specific virtqueue could be done through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_ADDR    9
> + #define VIRTIO_ADMIN_CTRL_VQ_ADDR_SET        0
> +
> +struct virtio_admin_ctrl_vdev_vq_addr {
> +       u16 queue_index;
> +       u64 device_area;
> +       u64 descriptor_area;
> +       u64 driver_area;
> +};
> +\end{lstlisting}
> +
> +The command-out-data is the queue index, the addresses of device area,
> +descriptor area and driver area (as described in struct
> +virtio_admin_ctrl_vdev_vq_addr); There's no command-in-data.
> +
> +\devicenormative{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueeu Address}
> +
> +The management device MUST fail the commands of class
> +VIRTIO_ADMIN_CTRL_VQ_ADDR if \field{device_id} is zero.
> +
> +\subsection{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +When the admin virtqueue offers VIRTIO_F_ADMIN_F_CAP_VDEV, virtqueue
> +status could be set and get through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_ENABLE    10
> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET       0
> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET       1
> +
> +struct virtio_admin_ctrl_vq_status_set {
> +  u16 queue_index;
> +  u8 status;
> +};
> +
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET is used to set the status to a
> +specific virtqueue. The command-out-data is the queue index, the
> +status that is set to the virtqueue (0 disabled, 1 enabled); There's
> +no command-in-data.
> +
> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET is used to get the status of a
> +specific virtqueue. The command-out-data is the u16 of queue
> +index. The command-in-data is the virtqueue status (0 disalbed, 1
> +enabled).
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +When disabled, the virtual device MUST stop processing requests from
> +this virtqueue.
> +
> +The management device MUST present a 0 via
> +VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET on reset of the virtual device.
> +
> +The management device MUST fail the virtqueue status access if
> +\field{device_id} is zero.
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +The driver MUST configure the other virtqueue fields before enabling
> +the virtqueue with VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET.
> +
> +\subsection{Virtqueue Size}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, virtqueue size
> +could be accessed through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_SIZE    11
> + #define VIRTIO_ADMIN_CTRL_VQ_SIZE_SET       0
> + #define VIRTIO_ADMIN_CTRL_VQ_SIZE_GET       1
> +
> +struct virtio_admin_ctrl_vdev_vq_size_set {
> +       u16 queue_index;
> +       u16 size;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_SIZE_SET command is used to set the virtqueue
> +size. The command-out-data is the queue index and the size of the
> +virtqueue (as described in struct
> +virtio_admin_ctrl_vdev_vq_size_set). There's no command-in-data.
> +
> +The VIRTIO_ADMIN_CTRL_VQ_SIZE_GET command is used to get the virtqueue
> +size. On reset, the maximum queue size supported by the device is
> +returned. The command-out-data is the u16 of the virtqueue index. The
> +command-in-data is the u16 of queue size for the virtqueue.
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
> +
> +The management device MUST fail the virtqueue size access if
> +\field{device_id} is zero.
> +
> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the virtqueue
> +notification could be done through the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_NOTIFY    12
> + #define VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET          1
> +
> +struct virtio_admin_ctrl_vdev_vq_notification_area {
> +       le64 addr
> +       le64 size;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET is used to get the transport
> +specific address area that can be used to notify a virtqueue. The
> +command-out-data is a u16 of the virtqueue index. The command-in-data
> +contains the address and the size of the notification area (as
> +described in struct virtio_admin_ctrl_vdev_vq_notification_area).
> +
> +\devicenormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET if
> +there's no transport specific notification address for a virtqueue of
> +its virtual device.
> +
> +The management device MUST fail the virtqueue notification access if
> +\field{device_id} is zero.
> +
> +The management device MUST forbid the notification area of a specific
> +virtual device to be accessed from another virtual device.
> +
> +\drivernormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
> +
> +The driver MAY choose to notify the virtqueue by writing the queue
> +index at address \field{addr} which is fetched from the
> +VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET command.
> +
> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, management device
> +offers a device MMU for a secure DMA context for each virtual
> +device. The device MMU will translate I/O Virtual Address to transport
> +specific DMA address before using a transport specific way for DMA:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_VQ_CTRL_MMU    13
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE       1
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET     2
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_MAP          3
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP        4
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET      5
> +
> +struct virtio_admin_ctrl_vdev_mmu_asid_set {
> +  le16 queue_index;
> +  le64 asid;
> +};
> +
> +struct virtio_admin_ctrl_vdev_mmu_map {
> +  le64 iova_start;
> +  le64 iova_end;
> +  le64 dma_start;
> +  le32 flags;
> +};
> +
> +/* Read access is allowed */
> +#define VIRTIO_ADMIN_VQ_MAP_F_READ   (1 << 0)
> +/* Write access is allowed */
> +#define VIRTIO_IOMMU_VQ_MAP_F_WRITE  (1 << 1)
> +
> +struct virtio_admin_ctrl_vdev_mmu_err {
> +  le32 reason;
> +  le16 queue_index;
> +  le64 asid;
> +  le64 iova_start;
> +  le64 iova_end;
> +  le32 flags;
> +};
> +
> +/* Mapping does not exist */
> +#define VIRTIO_ADMIN_VQ_MAP_ERR_NON_EXIST (1 << 0)
> +/* Access violates the permission */
> +#define VIRTIO_ADMIN_VQ_MAP_ERR_ACC_VIOLATION (1 << 1)
> +
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE is used to enable or disable
> +device MMU for a specific virtual device. The command-out-data is a u8
> +for telling whether device MMU is enabled for the virtual device: 0
> +means to enable and 1 means to disable.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
> +address space id to a virtqueue. The command-out-data is the queue
> +index (\field{queue_index}) and the address space ID (\field{asid})
> +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
> +Address range [\field{iova_start}, \field{iova_end}] to transport
> +specific DMA address range [\field{dma_start}, \field{dma_start} +
> + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
> +specify the device access permission.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
> +Virtual Address ranges that are intersected with the range
> +[\field{iova_start}, \field{iova_end}].
> +
> +There's no command-in-data for all the above four commands.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
> +information of the device MMU. There's no command-out-data, the
> +command-in-date is the queue index and its asid, the iova range and
> +the access of the operation (as described in struct
> +virtio_admin_ctrl_vdev_mmu_err).
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> +
> +The management device MUST fail the device MMU command if \field{device_id} is
> +zero.
> +
> +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
> +command if the iova range is intersected with a existing range.
> +
> +The management device MUST set both DEVICE_NEEDS_RESET and
> +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
> +virtual device.
> +
> +The device MMU for the virtual device MUST be disabled upon its reset.
> +
> +Upon reset, the virtual device must reset the Address Space ID for
> +each virtqueue to 0.
> +
> +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
> +  / Virtio Over Admin Virtqueue / Device MMU}
> +
> +The driver MAY choose to disable the device MMU but it MUST make sure
> +the transport specific method could be used to provide a secure DMA
> +context for each virtual device.
> +
> +The driver MAY query the error of device MMU after DEVICE_MMU_FAIL is set.
> +
> +\subsection{Presenting Virtual Device}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Presenting Virtual Device}
> +
> +If VIRTIO_ADMIN_F_VDEV is offered by the device. The presenting of
> +the virtual device requires co-operation between the management
> +driver and the admin virtqueue. This means, from the view of the
> +virtual device driver, the transport is done via the communication
> +with the management device driver. It's up to the software to decide
> +what kind of method that is needed be used for those communications.
> +
> +The management driver typically do the following steps for creating a
> +virtual device:
> +
> +\begin{enumerate}
> +\item Determine the virtio id and device specific configuration.
> +\item Create the virtual devices using VIRTIO_ADMIN_CTRL_VDEV_CREATE
> +command.
> +\item Optionally, configure the MSI.
> +\item Optionally, enable and initialize the device MMU.
> +\item Setup the necessary communication methods with virtual device driver.
> +\item Perform device specific setups.
> +\item Let the virtual device to be probed by the virtual device
> +driver. The management driver will then use the admin virtqueue to
> +implement the requests of basic facility from the virtual device
> +driver.
> +\end{enumerate}
> +
>   \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
>   
>   Virtio devices are commonly implemented as PCI devices.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-03  3:20 [RFC PATCH] Introduce admin virtqueue as a new transport Jason Wang
  2021-08-03 12:40 ` Max Gurtovoy
@ 2021-08-03 14:51 ` Stefan Hajnoczi
  2021-08-04  3:01   ` Jason Wang
  2021-08-04  8:09 ` Stefan Hajnoczi
  2021-08-04 13:36 ` Michael S. Tsirkin
  3 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-03 14:51 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu

[-- Attachment #1: Type: text/plain, Size: 17563 bytes --]

On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> This patch introduces a new transport - the admin virtqueue. This
> transport is useful for implementing virtual devices with a limited
> transport specific resources or presenting the virtual device in a
> transport independent way.
> 
> This means, all the basic device facilities are provided solely via
> the the admin virtqueue. Additionally, the admin virtqueue is also in
> charge of the creating and destroying of the virtual device.
> 
> To be self-contained and not depend on the platform specific
> feature. Device MMU is also introduced for providing the DMA isolation
> among virtual devices.
> 
> With the help of the admin virtqueue, the presenting of the virtual
> device is done via the co-operation between the management device and
> its driver.
> 
> This is just a draft for demonstrating the basic ideas. Some possible
> enhancements:
> 
> - admin event virtqueue for reporting events like interrupts (on the
>   platform withouth MSI) and MMU translation failure
> - hardware friendly MMU translation table (e.g in the memory instead
>   of using control virtqueue commands)
> - command to kick the virtqueue
> 
> Comments are more than welcomed.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  content.tex | 639 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 639 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 620c0e2..1f66d42 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>  
> +\item[DEVICE_MMU_FAIL (32)] Indicates that the device MMU has
> +  experienced an error from which it can't recover.

Will DEVICE_NEEDS_RESET be set together with this bit? The description
suggests that the device operation cannot proceed, so I guess a reset is
required.

> +
>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -515,6 +518,642 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>  Virtio can use various different buses, thus the standard is split
>  into virtio general and bus-specific sections.
>  
> +\section{Virtio Over Admin Virtqueue}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue}
> +
> +Sometimes it's hard to implement the device in a transport specific
> +method. One example is that a physical device may try to present
> +multiple virtual devices with a limited transport specific
> +resources. Another example is to implement virtual devices which is

s/which is/which are/

> +transport independent. In those cases, the admin virtqueue provided by
> +the management device could be used to replace the transport specific

"management device" is being defined here but the sentence reads as if
referring to a previously-defined term ("the management device").

I suggest saying "a so-called management device" instead or defining it
more formally ("A management device is a device that acts as a container
for virtual devices. Virtual devices are controlled via the management
device's admin virtqueue." ... plus maybe something about whether
virtual devices are static or can be created/deleted dynamically).

> +method to implement the virtual device. Then the presenting of the
> +virtual device is done through the cooperation between the admin
> +virtqueue and the driver.

"the driver" == the management device's driver? (There is also the
virtual device's driver, so it can be confusing.)

> +
> +\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over Admin Virtqueue / Basic Concepts}
> +
> +The device that offers the admin virtqueue (via feature
> +VIRTIO_F_ADMIN_VQ) is the management device of the virtual
> +devices. All commands are of the following form:

I wonder if the name "admin" should be replaced by "management" or
"mgmt". That way it's clear that the virtqueue is located on management
devices, not a generic admin virtqueue that non-management devices can
offer.

> +
> +\begin{lstlisting}
> +struct virtio_admin_ctrl {
> +        u64 device_id;
> +        u16 class;
> +        u16 command;
> +        u8 command-out-data[];
> +        u8 ack;
> +        u8 command-in-data[]
> +};
> +
> +/* ack values */
> +#define VIRTIO_ADMIN_OK     0
> +#define VIRTIO_ADMIN_ERR    1
> +\end{lstlisting}
> +
> +The device_id, class, command and command-out-data are set by
> +the driver, and the device sets the ack and command-in-data. 0 is used

More explicit:
s/0/The device_id value 0/

> +for identify the management device itself.
> +
> +\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
> +
> +The virtual device MUST not offer VIRTIO_F_ADMIN_VQ feature.

Is there a design reason why this isn't possible?

> +
> +\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
> +
> +The driver SHOULD negotiate VIRTIO_F_ADMIN_VQ if the device offers it.
> +
> +\subsection{Virtual Device Discovery}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtual Device Discovery}
> +
> +The management device is discovered through a transport and device
> +specific method. Virtual devices is created and discovered via the

s/devices is created/devices are created/

> +admin virtqueue.
> +
> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The capabilites that are supported by the admin virtqueue could be
> +fetched through the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_CAP    0
> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
> +supported by the admin virtqueue through a u64 which is a bit mask of
> +the capabilies in command-in-data. There's no command-out-data.

I'm not sure what this paragraph is describing. Is VIRTIO_ADMIN_CTRL_CAP
a struct virtio_admin_ctrl::command value?

> +
> +The capabilies that is currently supported are:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_F_CAP_VDEV    1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_F_CAP_VDEV capability demonstrates that the virtual
> +devices is created, configured and destroyed through admin

s/devices is created/devices are created/

s/through admin/through the admin/

> +virtqueue. That means the admin virtqueue is the transport for the
> +virtual devices.

At this point 3 terms have been used: admin, management, and vdev. I
think admin is more general than management or vdev, since there is a
vdev capability and other capabilities could be added later? I think
management and vdev mean the same thing?

Does this mean there are effectively two separate concepts:
- Admin virtqueue: a generic mechanism for special commands
- Vdev: the ability for a management device to create, configure, and
  destroy virtual devices
?

Another question: I wonder why there is an admin virtqueue feature bit
instead of a new VIRTIO device ID for the management device? Does this
mean regular virtio-net, virtio-blk, etc devices can have an admin queue
in addition to their normal role?

> +
> +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when

Oh, VIRTIO_ADMIN_CTRL_CAP is a class value. That wasn't clear when the
constant value was defined.

> +VIRTIO_F_ADMIN_VQ is offered.
> +
> +The management device MUST fail VIRTIO_ADMIN_CTRL_CAP class when the
> +\field{device_id} is not zero.
> +
> +\drivernormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The driver MUST use 0 as \field{device_id} for VIRTIO_ADMIN_CTRL_CAP
> +class.
> +
> +\subsection{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capibility,

s/offers/offers the/

s/capibility/capability/

> +virtual devices must be created and discovered through the admin
> +virtqueue.

Are management devices with statically pre-allocated virtual devices
supported? The text makes it sound like vdev creation is dynamic and
always performed by the driver.

> +\begin{lstlisting}
> +struct virtio_admin_ctrl_vdev_attribute {
> +       u32 device_id;

This name is easy to confuse with the struct virtio_admin_ctrl's
device_id. They have different meanings. Want to rename struct
virtio_admin_ctrl's field to "vdev_id"?

> +       u8 config[];

This field contains device-specific creation attributes, not the initial
contents of the Device Configuration Space? Other names like args,
params, or attrs are less likely to be confused with the Device
Configuration Space.

> +};
> +
> +#define VIRTIO_ADMIN_CTRL_VDEV    2
> + #define VIRTIO_ADMIN_CTRL_VDEV_CREATE        0
> + #define VIRTIO_ADMIN_CTRL_VDEV_DESTROY        1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VDEV_CREAT command is used to create a virtual

s/CREAT/CREATE/

> +device. The command-out-data for VIRTIO_ADMIN_CTRL_CREATE is the
> +virtio device id (\field{device_id}) and device specific configuration
> +(\field{config}) for creating the virtual device. When succeed, the
> +device returns a u64 as a unique identifier of the created virtual
> +device in command-in-data.

Is there a way to distinguish between different types of errors:
- No more resources to create another virtual device
- Invalid config value
- Unsupported device_id value
?

> +
> +The VIRTIO_ADMIN_CTRL_VDEV_DESTROY command is used to destroy a
> +virtual device which is identified by its 64bit identifier
> +\field{virtual_device_id}. There's no command-in-data for
> +VIRTIO_ADMIN_CTRL_DESTROY command.
> +
> +\devicenormative{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_CREATE if
> +\field{device_id} is not 0.

This is where the naming really gets confusing. This "device_id" is the
struct virtio_admin_ctrl field and not the struct
virtio_admin_ctrl_vdev_attribute "device_id" field. Please rename one of
them.

> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_DESTROY if
> +\field{device_id} is 0.
> +
> +All virtual devices MUST be created via admin virtqueue if the admin
> +virtqueue offers VIRTIO_F_CTRL_VDEV.

I'm not sure what the purpose of this statement is. Does it imply that
all virtual devices are destroy on device reset?

> +
> +The management device MAY map implement the virtual device in a
> +transport specific way.

I'm not sure what the purpose of this statement is.

> +
> +\drivernormative{Driver Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +The management driver MUST use 0 as \field{device_id} for
> +VIRTIO_ADMIN_CTRL_VDEV_CREATE command.
> +
> +The management driver SHOULD make sure the virtual device is not used
> +by any driver before trying to destroy it.
> +
> +\subsection{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,

s/VIRTIO_ADMIN_F_CAP_VDEV capability/the VIRTIO_ADMIN_F_CAP_VDEV capability/

> +the feature negotiation of virtual devices could be done by the
> +following commands:

What does "could" mean? IIUC this mechanism is the only way to negotiate
features because this admin queue *is* the VIRTIO Transport for the
virtual device. Therefore:
s/could be/is/

> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_FEAT    3
> + #define VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET        0
> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET        1
> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET        2
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET is to get the features offered
> +by a virtual device.
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET is for driver to accept feature
> +bits offered by the virtual device.
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET is to get the features accepted
> +by both the virtual driver and the device.
> +
> +The features is 64 bits mask of the virtio features bit. For
> +VIRTIO_ADMIN_CTRL_DRIVER_SET, the feature is passed to the device
> +through command-out-data. For VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET and
> +VIRTIO_ADMIN_CTRL_DRIVER_GET the feature is returned for the device
> +through command-in-data.
> +
> +\devicenormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +The management device MUST fail VIRTIO_ADMIN_F_CTRL_FEAT class for the
> +command that use 0 as its \field{virtual_device_id}.

s/virtual_device_id/device_id/ in this version of the document, but I
think virtual_device_id would be clearer :)

> +
> +\drivernormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +The management driver MAY mediate between the feature negotiation
> +request of the virtual devices and the admin virtqueue. E.g when
> +offering features to the virtual device, the management driver MAY
> +exclude some features in order to limit the behaviour of the virtual
> +device.
> +
> +\subsection{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +When the admin virtqueue offers VIRTIO_ADNIN_F_CAP_VDEV capability,

s/VIRTIO_ADNIN_F_CAP_VDEV/the VIRTIO_ADNIN_F_CAP_VDEV/

s/ADNIN/ADMIN/

> +the status of virtual device could be accessed by the following
> +commands:

s/could be/is/

> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_STATUS    4
> + #define VIRTIO_ADMIN_CTRL_STATUS_SET        0
> + #define VIRTIO_ADMIN_CTRL_STATUS_GET        1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_STATUS_SET is used to set the device status of
> +the virtual device here. The command-out-data is the one byte status
> +to set to the device. There's no command-in-data for this command.
> +
> +The VIRTIO_ADMIN_CTRL_STATUS_GET is used to get the device status of
> +the virtual device. The command-in-data is the one byte status
> +returned from the device. There's no command-out-data for this
> +command.
> +
> +\devicenormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +The management device MUST start the reset of a virtual device when 0
> +is written via VIRTIO_ADMIN_CTRL_STATUS_SET, the success of this
> +command demonstrate the success of the reset.
> +
> +The management device MUST present 0 through
> +VIRTIO_ADMIN_CTRL_STATUS_GET once the reset is done.
> +
> +The management device MUST fail the device status access if
> +\field{device_id} is zero.
> +
> +\drivernormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +After writing 0 via VIRTIO_ADMIN_CTRL_STATUS_SET, the driver MUST wait
> +for the success of the command before re-initializing the device.
> +
> +\subsection{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Genreation}

s/Genreation/Generation/

> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
> +the device generation could be read from the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_GENERATION    5
> + #define VIRTIO_ADMIN_CTRL_GENERATION_GET        0
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_GENERATION_GET is used to get the device generation
> +of the virtual device. The command-in-data is the u32 device
> +generation returned from the device. There's no command-out-data for
> +this command.

The term "device generation" is undefined. IIUC this is the
"configuration generation field", "config_generation", or "configuration
atomicity value" in the spec. Please use one of those existing terms.

I have run out of time and am pausing the review here for now.

Will drivers sleep or busy wait for admin vq command completion? I guess
it's unavoidable since other transports usually offer synchronous
operations. The driver code may not able to deschedule since it was not
necessary with the other transports.

An interesting by-product of this approach may be that the admin
virtqueue transport can enable inter-VM device emulation. It might be a
natural way to let VM A emulate devices for VM B by letting VM A handle
the admin virtqueue. Basically a variant of the virtio-vhost-user
approach I tried out a few years ago, but now we're using the admin
virtqueue instead of the vhost-user protocol for inter-VM device
emulation.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-03 12:40 ` Max Gurtovoy
@ 2021-08-04  1:37   ` Jason Wang
  2021-08-04 10:20     ` Max Gurtovoy
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-04  1:37 UTC (permalink / raw)
  To: Max Gurtovoy, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>
>> +Sometimes it's hard to implement the device in a transport specific
>> +method. One example is that a physical device may try to present
>> +multiple virtual devices with a limited transport specific
>> +resources. Another example is to implement virtual devices which is
>> +transport independent. In those cases, the admin virtqueue provided by
>> +the management device could be used to replace the transport specific
>> +method to implement the virtual device. Then the presenting of the
>> +virtual device is done through the cooperation between the admin
>> +virtqueue and the driver.
>
> maybe it's me, but I can't understand how admin queue is a transport.


The transport is the method that provides basic facility. In this 
proposal, the admin virtqueue is used to provide basic facility for the 
virtual device. That is to say, it's the transport for virtual device.


>
> And how can I use admin queue transport to migrate VFs that are 
> controlled by virtio PCI PF.


This live migration support and the admin virtqueue transport are 
orthogonal. The main motivation of this proposal is used for 
implementing virtual device transport via admin virtqueue. It's not hard 
to add new commands for doing live migration for the virtual device, I 
don't do that since I believe it's expected to be addressed in your 
proposal.

For virtual device, it's a independent virtio device that could be 
assigned to secure DMA context/domain,  it is functional equivalent ADI 
or SF. The difference is that it can work with or without platform 
support (SIOV or PASID).


>
> And why the regular admin queue that is part of the device queues 
> can't fit to your needs ?


For "regular admin queue", did you mean your proposal. Actually, it's 
not conflict, we can unify the interface though the motivation is different.


>
> Can you explain your needs ? is it to create a vDPA device from some 
> SW interface ?


As stated in the patch, the needs are:

- Presenting virtual devices with limited transport specific resources
- Presenting virtual devices without platform support (e.g SR-IOV or SIOV)

We want virtio to have hyper-scalability via slicing at virtio level. 
It's not directly related to vDPA.

For vDPA, vendor are freed to have their own technology to be hyper 
scalable (e.g SF, ADI or other stuffs).

Thanks


>
> I don't follow. 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-03 14:51 ` Stefan Hajnoczi
@ 2021-08-04  3:01   ` Jason Wang
  2021-08-04  6:39     ` Stefan Hajnoczi
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-04  3:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu


在 2021/8/3 下午10:51, Stefan Hajnoczi 写道:
> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>> This patch introduces a new transport - the admin virtqueue. This
>> transport is useful for implementing virtual devices with a limited
>> transport specific resources or presenting the virtual device in a
>> transport independent way.
>>
>> This means, all the basic device facilities are provided solely via
>> the the admin virtqueue. Additionally, the admin virtqueue is also in
>> charge of the creating and destroying of the virtual device.
>>
>> To be self-contained and not depend on the platform specific
>> feature. Device MMU is also introduced for providing the DMA isolation
>> among virtual devices.
>>
>> With the help of the admin virtqueue, the presenting of the virtual
>> device is done via the co-operation between the management device and
>> its driver.
>>
>> This is just a draft for demonstrating the basic ideas. Some possible
>> enhancements:
>>
>> - admin event virtqueue for reporting events like interrupts (on the
>>    platform withouth MSI) and MMU translation failure
>> - hardware friendly MMU translation table (e.g in the memory instead
>>    of using control virtqueue commands)
>> - command to kick the virtqueue
>>
>> Comments are more than welcomed.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   content.tex | 639 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 639 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 620c0e2..1f66d42 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>   
>> +\item[DEVICE_MMU_FAIL (32)] Indicates that the device MMU has
>> +  experienced an error from which it can't recover.
> Will DEVICE_NEEDS_RESET be set together with this bit? The description
> suggests that the device operation cannot proceed, so I guess a reset is
> required.


Yes, the device MMU part mandates DEVICE_NEEDS_RESET with DEVICE_MMU_FAIL.


>
>> +
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -515,6 +518,642 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>>   Virtio can use various different buses, thus the standard is split
>>   into virtio general and bus-specific sections.
>>   
>> +\section{Virtio Over Admin Virtqueue}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue}
>> +
>> +Sometimes it's hard to implement the device in a transport specific
>> +method. One example is that a physical device may try to present
>> +multiple virtual devices with a limited transport specific
>> +resources. Another example is to implement virtual devices which is
> s/which is/which are/


will fix.


>
>> +transport independent. In those cases, the admin virtqueue provided by
>> +the management device could be used to replace the transport specific
> "management device" is being defined here but the sentence reads as if
> referring to a previously-defined term ("the management device").
>
> I suggest saying "a so-called management device" instead or defining it
> more formally ("A management device is a device that acts as a container
> for virtual devices. Virtual devices are controlled via the management
> device's admin virtqueue." ... plus maybe something about whether
> virtual devices are static or can be created/deleted dynamically).


I will define it formally in the next version.


>
>> +method to implement the virtual device. Then the presenting of the
>> +virtual device is done through the cooperation between the admin
>> +virtqueue and the driver.
> "the driver" == the management device's driver? (There is also the
> virtual device's driver, so it can be confusing.)


Yes, will fix.


>
>> +
>> +\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over Admin Virtqueue / Basic Concepts}
>> +
>> +The device that offers the admin virtqueue (via feature
>> +VIRTIO_F_ADMIN_VQ) is the management device of the virtual
>> +devices. All commands are of the following form:
> I wonder if the name "admin" should be replaced by "management" or
> "mgmt". That way it's clear that the virtqueue is located on management
> devices, not a generic admin virtqueue that non-management devices can
> offer.


That's fine.


>
>> +
>> +\begin{lstlisting}
>> +struct virtio_admin_ctrl {
>> +        u64 device_id;
>> +        u16 class;
>> +        u16 command;
>> +        u8 command-out-data[];
>> +        u8 ack;
>> +        u8 command-in-data[]
>> +};
>> +
>> +/* ack values */
>> +#define VIRTIO_ADMIN_OK     0
>> +#define VIRTIO_ADMIN_ERR    1
>> +\end{lstlisting}
>> +
>> +The device_id, class, command and command-out-data are set by
>> +the driver, and the device sets the ack and command-in-data. 0 is used
> More explicit:
> s/0/The device_id value 0/


Ok.


>
>> +for identify the management device itself.
>> +
>> +\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
>> +
>> +The virtual device MUST not offer VIRTIO_F_ADMIN_VQ feature.
> Is there a design reason why this isn't possible?


I think not. We can allow a virtual device has the admin/mgmt virtqueue. 
But I'm not sure how much useful is it.


>
>> +
>> +\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
>> +
>> +The driver SHOULD negotiate VIRTIO_F_ADMIN_VQ if the device offers it.
>> +
>> +\subsection{Virtual Device Discovery}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtual Device Discovery}
>> +
>> +The management device is discovered through a transport and device
>> +specific method. Virtual devices is created and discovered via the
> s/devices is created/devices are created/


Ok.


>
>> +admin virtqueue.
>> +
>> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>> +
>> +The capabilites that are supported by the admin virtqueue could be
>> +fetched through the following commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_CAP    0
>> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
>> +supported by the admin virtqueue through a u64 which is a bit mask of
>> +the capabilies in command-in-data. There's no command-out-data.
> I'm not sure what this paragraph is describing. Is VIRTIO_ADMIN_CTRL_CAP
> a struct virtio_admin_ctrl::command value?


VIRTIO_ADMIN_CTRL_CAP is the class, VIRTIO_ADMIN_CTRL_CAP_GET is the 
command.


>
>> +
>> +The capabilies that is currently supported are:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_F_CAP_VDEV    1
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_F_CAP_VDEV capability demonstrates that the virtual
>> +devices is created, configured and destroyed through admin
> s/devices is created/devices are created/
>
> s/through admin/through the admin/


Will fix.


>
>> +virtqueue. That means the admin virtqueue is the transport for the
>> +virtual devices.
> At this point 3 terms have been used: admin, management, and vdev. I
> think admin is more general than management or vdev, since there is a
> vdev capability and other capabilities could be added later?


Yes, it is designed to be extensible for other features like migration.


> I think
> management and vdev mean the same thing?


Nope.

The management device is the device with admin virtqueue. But I agree 
that the terminology is kind of confusing. I can switch to use 
management virtqueue in the next version.

The virtual device is created by the management device, and the 
management provides the transport for the virtual device via the admin 
virtqueue.


>
> Does this mean there are effectively two separate concepts:
> - Admin virtqueue: a generic mechanism for special commands
> - Vdev: the ability for a management device to create, configure, and
>    destroy virtual devices
> ?


Yes.


>
> Another question: I wonder why there is an admin virtqueue feature bit
> instead of a new VIRTIO device ID for the management device? Does this
> mean regular virtio-net, virtio-blk, etc devices can have an admin queue
> in addition to their normal role?


I think so, I just follow the normal networking PF role which is usually 
a network device which allows some kind of remote management.

But it doesn't forbid us to create the management device.

I think it's better to not mandate the management device for now, or is 
there any reason for doing that?


>
>> +
>> +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>> +
>> +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
> Oh, VIRTIO_ADMIN_CTRL_CAP is a class value. That wasn't clear when the
> constant value was defined.


Any suggestion to make it clear? I just follow the style of the current 
virtio-net control vq command definitions.


>
>> +VIRTIO_F_ADMIN_VQ is offered.
>> +
>> +The management device MUST fail VIRTIO_ADMIN_CTRL_CAP class when the
>> +\field{device_id} is not zero.
>> +
>> +\drivernormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>> +
>> +The driver MUST use 0 as \field{device_id} for VIRTIO_ADMIN_CTRL_CAP
>> +class.
>> +
>> +\subsection{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capibility,
> s/offers/offers the/
>
> s/capibility/capability/


Will fix.


>
>> +virtual devices must be created and discovered through the admin
>> +virtqueue.
> Are management devices with statically pre-allocated virtual devices
> supported?


It is supported when I wrote the patch, but for simplicity I remove that 
part. I can add them back.


> The text makes it sound like vdev creation is dynamic and
> always performed by the driver.


Yes.


>
>> +\begin{lstlisting}
>> +struct virtio_admin_ctrl_vdev_attribute {
>> +       u32 device_id;
> This name is easy to confuse with the struct virtio_admin_ctrl's
> device_id. They have different meanings. Want to rename struct
> virtio_admin_ctrl's field to "vdev_id"?


This is the virtio device id, but rethink of the design, it's meangless 
if we don't have a dedicated management device.

Since we don't want a virtio-net management device to create a 
virtio-blk virtual device.


>
>> +       u8 config[];
> This field contains device-specific creation attributes, not the initial
> contents of the Device Configuration Space?


Yes. I tend to leave this device specific.

The detailed format needs more thought. It should be at least the device 
features plus the config space. (Since the config space itself is not 
self-contained).


>   Other names like args,
> params, or attrs are less likely to be confused with the Device
> Configuration Space.


Yes.


>
>> +};
>> +
>> +#define VIRTIO_ADMIN_CTRL_VDEV    2
>> + #define VIRTIO_ADMIN_CTRL_VDEV_CREATE        0
>> + #define VIRTIO_ADMIN_CTRL_VDEV_DESTROY        1
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_VDEV_CREAT command is used to create a virtual
> s/CREAT/CREATE/


Will fix.


>
>> +device. The command-out-data for VIRTIO_ADMIN_CTRL_CREATE is the
>> +virtio device id (\field{device_id}) and device specific configuration
>> +(\field{config}) for creating the virtual device. When succeed, the
>> +device returns a u64 as a unique identifier of the created virtual
>> +device in command-in-data.
> Is there a way to distinguish between different types of errors:
> - No more resources to create another virtual device
> - Invalid config value
> - Unsupported device_id value
> ?


Yes we can. I can add a new filed in the virtio_admin_ctrl for error code.


>
>> +
>> +The VIRTIO_ADMIN_CTRL_VDEV_DESTROY command is used to destroy a
>> +virtual device which is identified by its 64bit identifier
>> +\field{virtual_device_id}. There's no command-in-data for
>> +VIRTIO_ADMIN_CTRL_DESTROY command.
>> +
>> +\devicenormative{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_CREATE if
>> +\field{device_id} is not 0.
> This is where the naming really gets confusing. This "device_id" is the
> struct virtio_admin_ctrl field and not the struct
> virtio_admin_ctrl_vdev_attribute "device_id" field. Please rename one of
> them.


Ok.


>
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_DESTROY if
>> +\field{device_id} is 0.
>> +
>> +All virtual devices MUST be created via admin virtqueue if the admin
>> +virtqueue offers VIRTIO_F_CTRL_VDEV.
> I'm not sure what the purpose of this statement is. Does it imply that
> all virtual devices are destroy on device reset?


Nope, it means there should not be static pre-allocated virtual devices.

I will try to add the static pre-allocated virtual devices support.


>
>> +
>> +The management device MAY map implement the virtual device in a
>> +transport specific way.
> I'm not sure what the purpose of this statement is.


Right, this is meaningless and contract to the motivation. If the 
virtual device can be implemented in a transport specific way (e.g the 
virtual function), the admin virtqueue is useless.


>> +
>> +\drivernormative{Driver Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
>> +
>> +The management driver MUST use 0 as \field{device_id} for
>> +VIRTIO_ADMIN_CTRL_VDEV_CREATE command.
>> +
>> +The management driver SHOULD make sure the virtual device is not used
>> +by any driver before trying to destroy it.
>> +
>> +\subsection{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
> s/VIRTIO_ADMIN_F_CAP_VDEV capability/the VIRTIO_ADMIN_F_CAP_VDEV capability/


ok.


>
>> +the feature negotiation of virtual devices could be done by the
>> +following commands:
> What does "could" mean? IIUC this mechanism is the only way to negotiate
> features because this admin queue *is* the VIRTIO Transport for the
> virtual device. Therefore:
> s/could be/is/


Yes, will fix.


>
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_FEAT    3
>> + #define VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET        0
>> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET        1
>> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET        2
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET is to get the features offered
>> +by a virtual device.
>> +
>> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET is for driver to accept feature
>> +bits offered by the virtual device.
>> +
>> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET is to get the features accepted
>> +by both the virtual driver and the device.
>> +
>> +The features is 64 bits mask of the virtio features bit. For
>> +VIRTIO_ADMIN_CTRL_DRIVER_SET, the feature is passed to the device
>> +through command-out-data. For VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET and
>> +VIRTIO_ADMIN_CTRL_DRIVER_GET the feature is returned for the device
>> +through command-in-data.
>> +
>> +\devicenormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
>> +
>> +The management device MUST fail VIRTIO_ADMIN_F_CTRL_FEAT class for the
>> +command that use 0 as its \field{virtual_device_id}.
> s/virtual_device_id/device_id/ in this version of the document, but I
> think virtual_device_id would be clearer :)


Right, let me use that in the next version.


>
>> +
>> +\drivernormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
>> +
>> +The management driver MAY mediate between the feature negotiation
>> +request of the virtual devices and the admin virtqueue. E.g when
>> +offering features to the virtual device, the management driver MAY
>> +exclude some features in order to limit the behaviour of the virtual
>> +device.
>> +
>> +\subsection{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
>> +
>> +When the admin virtqueue offers VIRTIO_ADNIN_F_CAP_VDEV capability,
> s/VIRTIO_ADNIN_F_CAP_VDEV/the VIRTIO_ADNIN_F_CAP_VDEV/
>
> s/ADNIN/ADMIN/


will fix.


>
>> +the status of virtual device could be accessed by the following
>> +commands:
> s/could be/is/


ok.


>
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_STATUS    4
>> + #define VIRTIO_ADMIN_CTRL_STATUS_SET        0
>> + #define VIRTIO_ADMIN_CTRL_STATUS_GET        1
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_STATUS_SET is used to set the device status of
>> +the virtual device here. The command-out-data is the one byte status
>> +to set to the device. There's no command-in-data for this command.
>> +
>> +The VIRTIO_ADMIN_CTRL_STATUS_GET is used to get the device status of
>> +the virtual device. The command-in-data is the one byte status
>> +returned from the device. There's no command-out-data for this
>> +command.
>> +
>> +\devicenormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
>> +
>> +The management device MUST start the reset of a virtual device when 0
>> +is written via VIRTIO_ADMIN_CTRL_STATUS_SET, the success of this
>> +command demonstrate the success of the reset.
>> +
>> +The management device MUST present 0 through
>> +VIRTIO_ADMIN_CTRL_STATUS_GET once the reset is done.
>> +
>> +The management device MUST fail the device status access if
>> +\field{device_id} is zero.
>> +
>> +\drivernormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
>> +
>> +After writing 0 via VIRTIO_ADMIN_CTRL_STATUS_SET, the driver MUST wait
>> +for the success of the command before re-initializing the device.
>> +
>> +\subsection{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Genreation}
> s/Genreation/Generation/
>
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
>> +the device generation could be read from the following commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_GENERATION    5
>> + #define VIRTIO_ADMIN_CTRL_GENERATION_GET        0
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_GENERATION_GET is used to get the device generation
>> +of the virtual device. The command-in-data is the u32 device
>> +generation returned from the device. There's no command-out-data for
>> +this command.
> The term "device generation" is undefined. IIUC this is the
> "configuration generation field", "config_generation", or "configuration
> atomicity value" in the spec. Please use one of those existing terms.


Yes.


>
> I have run out of time and am pausing the review here for now.


Appreciate for the reviewing.


>
> Will drivers sleep or busy wait for admin vq command completion? I guess
> it's unavoidable since other transports usually offer synchronous
> operations. The driver code may not able to deschedule since it was not
> necessary with the other transports.


My understanding is the driver can choose to sleep as some transport did.


>
> An interesting by-product of this approach may be that the admin
> virtqueue transport can enable inter-VM device emulation. It might be a
> natural way to let VM A emulate devices for VM B by letting VM A handle
> the admin virtqueue. Basically a variant of the virtio-vhost-user
> approach I tried out a few years ago, but now we're using the admin
> virtqueue instead of the vhost-user protocol for inter-VM device
> emulation.


Yes, that's possible.

Thanks


>
> Stefan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04  3:01   ` Jason Wang
@ 2021-08-04  6:39     ` Stefan Hajnoczi
  2021-08-04  8:39       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04  6:39 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu

[-- Attachment #1: Type: text/plain, Size: 5163 bytes --]

On Wed, Aug 04, 2021 at 11:01:39AM +0800, Jason Wang wrote:
> 在 2021/8/3 下午10:51, Stefan Hajnoczi 写道:
> > On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> > > +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> > > +
> > > +The capabilites that are supported by the admin virtqueue could be
> > > +fetched through the following commands:
> > > +
> > > +\begin{lstlisting}
> > > +#define VIRTIO_ADMIN_CTRL_CAP    0
> > > + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
> > > +\end{lstlisting}
> > > +
> > > +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
> > > +supported by the admin virtqueue through a u64 which is a bit mask of
> > > +the capabilies in command-in-data. There's no command-out-data.
> > I'm not sure what this paragraph is describing. Is VIRTIO_ADMIN_CTRL_CAP
> > a struct virtio_admin_ctrl::command value?
> 
> 
> VIRTIO_ADMIN_CTRL_CAP is the class, VIRTIO_ADMIN_CTRL_CAP_GET is the
> command.

Okay. I found the admin virtqueue command descriptions hard to read, not
just because the class/command definitions weren't obvious to me, but
also because the command/response layout is described in English instead
of a table or C-like notation.

I think something like this would make the commands easier to
understand:

  Capabilities supported by the admin virtqueue are fetched as follows:

  Driver->Device:
  Field       Value                          Type
  -----------------------------------------------
  device_id   0                              u64
  class       VIRTIO_ADMIN_CTRL_CAP (0)      u16
  command     VIRTIO_ADMIN_CTRL_CAP_GET (0)  u16

  Device->Driver:
  Field         Value                        Type
  -----------------------------------------------
  ack           VIRTIO_ADMIN_OK (0)          u8
  capabilities  <supported capability bits>  u64

> > Another question: I wonder why there is an admin virtqueue feature bit
> > instead of a new VIRTIO device ID for the management device? Does this
> > mean regular virtio-net, virtio-blk, etc devices can have an admin queue
> > in addition to their normal role?
> 
> 
> I think so, I just follow the normal networking PF role which is usually a
> network device which allows some kind of remote management.
> 
> But it doesn't forbid us to create the management device.
> 
> I think it's better to not mandate the management device for now, or is
> there any reason for doing that?

Since the admin virtqueue is generic infrastructure (not specific to
vdev management) I think it makes sense to use a feature bit and not a
separate management device. This wasn't obvious to me from this
document, maybe the text can be tweaked to clearly separate the
(generic) admin virtqueue from vdev management.

> 
> 
> > 
> > > +
> > > +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> > > +
> > > +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
> > Oh, VIRTIO_ADMIN_CTRL_CAP is a class value. That wasn't clear when the
> > constant value was defined.
> 
> 
> Any suggestion to make it clear? I just follow the style of the current
> virtio-net control vq command definitions.

#define VIRTIO_ADMIN_CLASS_CAP 0
 #define VIRTIO_ADMIN_CMD_CAP_GET 0

There are many other possible variations that would be equally clear, I
don't really mind.

> > > +virtual devices must be created and discovered through the admin
> > > +virtqueue.
> > Are management devices with statically pre-allocated virtual devices
> > supported?
> 
> 
> It is supported when I wrote the patch, but for simplicity I remove that
> part. I can add them back.

I guess the advantage is that static vdevs don't need creation
parameters. So they could be useful in cases where the hardware defines
the creation parameters (i.e. because they are fixed and unknown to the
management driver). But I don't know whether there is any real-world use
case...

I don't mind if only dynamic vdevs are supported, but please document it
so the scope/intent is clear.

> > 
> > > +\begin{lstlisting}
> > > +struct virtio_admin_ctrl_vdev_attribute {
> > > +       u32 device_id;
> > This name is easy to confuse with the struct virtio_admin_ctrl's
> > device_id. They have different meanings. Want to rename struct
> > virtio_admin_ctrl's field to "vdev_id"?
> 
> 
> This is the virtio device id, but rethink of the design, it's meangless if
> we don't have a dedicated management device.
> 
> Since we don't want a virtio-net management device to create a virtio-blk
> virtual device.

I see. This means that a physical device wishing to support multiple
device types needs to expose multiple management devices, even though
they may share resources (e.g. you can only instantiate 8 vdevs in total
regardless of their type).

I don't have enough experience with device splitting to say whether this
design is flexible enough. It seems okay to me.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-03  3:20 [RFC PATCH] Introduce admin virtqueue as a new transport Jason Wang
  2021-08-03 12:40 ` Max Gurtovoy
  2021-08-03 14:51 ` Stefan Hajnoczi
@ 2021-08-04  8:09 ` Stefan Hajnoczi
  2021-08-04  8:51   ` Jason Wang
  2021-08-04 13:36 ` Michael S. Tsirkin
  3 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04  8:09 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu

[-- Attachment #1: Type: text/plain, Size: 10391 bytes --]

On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:

It might be clearer to call the management device the "parent"
and virtual devices "children". That way the term "virtual" is avoided -
it's already used a lot in the spec :).

> +The VIRTIO_ADMIN_CTRL_CONFIG_SET is used to write data to the device
> +configuration space. As described in struct
> +virtio_admin_ctrl_vdev_config_set, the command-out-data contains the
> +offset since the start of the config space, the size of the data and
> +the data that will be wrote. There's no command-in-data for this

s/wrote/written/

> +command.
> +
> +\devicenormative{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
> +
> +The management device MUST fail the device configuration space access
> +if the driver want to access the range which is out of the config

s/want/wants/

s/out of/outside/

> +\subsection{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Address}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the address of
> +a specific virtqueue could be done through the following command:

s/could be done/is set/

> +\subsection{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +When the admin virtqueue offers VIRTIO_F_ADMIN_F_CAP_VDEV, virtqueue
> +status could be set and get through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_ENABLE    10
> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET       0
> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET       1
> +
> +struct virtio_admin_ctrl_vq_status_set {
> +  u16 queue_index;
> +  u8 status;
> +};
> +
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET is used to set the status to a
> +specific virtqueue. The command-out-data is the queue index, the
> +status that is set to the virtqueue (0 disabled, 1 enabled); There's
> +no command-in-data.
> +
> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET is used to get the status of a
> +specific virtqueue. The command-out-data is the u16 of queue
> +index. The command-in-data is the virtqueue status (0 disalbed, 1

s/disalbed/disabled/

> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, management device

Is the device MMU feature mandatory when VIRTIO_ADMIN_F_VDEV is
supported?

> +offers a device MMU for a secure DMA context for each virtual

"MMU" is not used elsewhere in the VIRTIO specification so it's an
undefined term. Should this paragraph mention VIRTIO_F_ACCESS_PLATFORM
in order to explain how the Device MMU affects the virtual device?

> +device. The device MMU will translate I/O Virtual Address to transport

s/Address/Addresses/

> +specific DMA address before using a transport specific way for DMA:

s/address/addresses/

> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_VQ_CTRL_MMU    13
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE       1
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET     2
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_MAP          3
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP        4
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET      5
> +
> +struct virtio_admin_ctrl_vdev_mmu_asid_set {
> +  le16 queue_index;
> +  le64 asid;
> +};
> +
> +struct virtio_admin_ctrl_vdev_mmu_map {
> +  le64 iova_start;
> +  le64 iova_end;
> +  le64 dma_start;
> +  le32 flags;
> +};
> +
> +/* Read access is allowed */
> +#define VIRTIO_ADMIN_VQ_MAP_F_READ   (1 << 0)
> +/* Write access is allowed */
> +#define VIRTIO_IOMMU_VQ_MAP_F_WRITE  (1 << 1)
> +
> +struct virtio_admin_ctrl_vdev_mmu_err {
> +  le32 reason;
> +  le16 queue_index;
> +  le64 asid;
> +  le64 iova_start;
> +  le64 iova_end;
> +  le32 flags;
> +};
> +
> +/* Mapping does not exist */
> +#define VIRTIO_ADMIN_VQ_MAP_ERR_NON_EXIST (1 << 0)
> +/* Access violates the permission */
> +#define VIRTIO_ADMIN_VQ_MAP_ERR_ACC_VIOLATION (1 << 1)
> +
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE is used to enable or disable
> +device MMU for a specific virtual device. The command-out-data is a u8

s/device MMU/the device MMU/

> +for telling whether device MMU is enabled for the virtual device: 0
> +means to enable and 1 means to disable.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
> +address space id to a virtqueue. The command-out-data is the queue
> +index (\field{queue_index}) and the address space ID (\field{asid})
> +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
> +Address range [\field{iova_start}, \field{iova_end}] to transport
> +specific DMA address range [\field{dma_start}, \field{dma_start} +
> + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
> +specify the device access permission.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
> +Virtual Address ranges that are intersected with the range
> +[\field{iova_start}, \field{iova_end}].
> +
> +There's no command-in-data for all the above four commands.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
> +information of the device MMU. There's no command-out-data, the
> +command-in-date is the queue index and its asid, the iova range and
> +the access of the operation (as described in struct
> +virtio_admin_ctrl_vdev_mmu_err).
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}

Does the virtual device need to support running with the device MMU
disabled?

> +
> +The management device MUST fail the device MMU command if \field{device_id} is
> +zero.
> +
> +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
> +command if the iova range is intersected with a existing range.
> +
> +The management device MUST set both DEVICE_NEEDS_RESET and
> +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
> +virtual device.
> +
> +The device MMU for the virtual device MUST be disabled upon its reset.
> +
> +Upon reset, the virtual device must reset the Address Space ID for
> +each virtqueue to 0.
> +
> +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
> +  / Virtio Over Admin Virtqueue / Device MMU}
> +
> +The driver MAY choose to disable the device MMU but it MUST make sure
> +the transport specific method could be used to provide a secure DMA
> +context for each virtual device.

What does this mean?

During which stages of the virtual device's lifecycle (ACKNOWLEDGE,
DRIVER, FEATURES_OK, DRIVER_OK) may the management driver enable/disable
the device MMU?

> +
> +The driver MAY query the error of device MMU after DEVICE_MMU_FAIL is set.
> +
> +\subsection{Presenting Virtual Device}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Presenting Virtual Device}
> +
> +If VIRTIO_ADMIN_F_VDEV is offered by the device. The presenting of
> +the virtual device requires co-operation between the management
> +driver and the admin virtqueue. This means, from the view of the
> +virtual device driver, the transport is done via the communication
> +with the management device driver. It's up to the software to decide
> +what kind of method that is needed be used for those communications.

I think what this is saying is that the management device's admin
virtqueue is the VIRTIO Transport for the virtual device. In practice
the VMM will emulate another VIRTIO Transport like virtio-pci for the
virtual device and then forward operations to the management device's
admin virtqueue?

> +The management driver typically do the following steps for creating a

s/do/takes/

> +virtual device:
> +
> +\begin{enumerate}
> +\item Determine the virtio id and device specific configuration.
> +\item Create the virtual devices using VIRTIO_ADMIN_CTRL_VDEV_CREATE

s/VIRTIO_ADMIN_CTRL_VDEV_CREATE/the VIRTIO_ADMIN_CTRL_VDEV_CREATE/

> +command.
> +\item Optionally, configure the MSI.
> +\item Optionally, enable and initialize the device MMU.
> +\item Setup the necessary communication methods with virtual device driver.
> +\item Perform device specific setups.

s/setups/setup/

I tried to imagine what the virtio-blk vdev creation parameters need to
look like. Here is what I came up with:

  Virtual Device Creation Parameters for Block Devices
  ----------------------------------------------------
  The following creation parameters specify the details of a new virtual
  block device:

  Field        Type   Meaning
  ----------------------------------------------------------------------
  blkdev_id    u64    Identifier of the underlying block device that
                      provides storage. The enumeration and creation of
                      underlying block devices is
                      implementation-specific.
  num_queues   u16    Number of request virtqueues.
  features_len u8     Number of elements in features[].
  features[]   u32    Device feature bits to report.

  Creation error codes are as follows:

  Error               Meaning
  ----------------------------------------------------------------------
  INVALID_BLKDEV_ID   The underlying block device does not exist.
  BLKDEV_BUSY         The underlying block device is already in use.
  BLKDEV_READ_ONLY    The underlying block device is read-only.
  INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
  UNSUPPORTED_FEATURE A feature bit was given that the device does not
                      support.

  If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
  block device is made available for read-only access.

  Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
  already in use is given.

  Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
  does not support writes and the VIRTIO_BLK_F_RO bit is not set in
  features[].

  The configuration space parameters (see 5.2.4 Device configuration
  layout) are determined by the device based on the underlying block
  device capacity, block size, etc.

Note that this doesn't allow overriding configuration space parameters
(e.g. block size). We probably need to support that in the future for
live migration compatibility.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04  6:39     ` Stefan Hajnoczi
@ 2021-08-04  8:39       ` Jason Wang
  2021-08-04 12:56         ` Stefan Hajnoczi
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-04  8:39 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu


在 2021/8/4 下午2:39, Stefan Hajnoczi 写道:
> On Wed, Aug 04, 2021 at 11:01:39AM +0800, Jason Wang wrote:
>> 在 2021/8/3 下午10:51, Stefan Hajnoczi 写道:
>>> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>>>> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>>>> +
>>>> +The capabilites that are supported by the admin virtqueue could be
>>>> +fetched through the following commands:
>>>> +
>>>> +\begin{lstlisting}
>>>> +#define VIRTIO_ADMIN_CTRL_CAP    0
>>>> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
>>>> +\end{lstlisting}
>>>> +
>>>> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
>>>> +supported by the admin virtqueue through a u64 which is a bit mask of
>>>> +the capabilies in command-in-data. There's no command-out-data.
>>> I'm not sure what this paragraph is describing. Is VIRTIO_ADMIN_CTRL_CAP
>>> a struct virtio_admin_ctrl::command value?
>>
>> VIRTIO_ADMIN_CTRL_CAP is the class, VIRTIO_ADMIN_CTRL_CAP_GET is the
>> command.
> Okay. I found the admin virtqueue command descriptions hard to read, not
> just because the class/command definitions weren't obvious to me, but
> also because the command/response layout is described in English instead
> of a table or C-like notation.
>
> I think something like this would make the commands easier to
> understand:
>
>    Capabilities supported by the admin virtqueue are fetched as follows:
>
>    Driver->Device:
>    Field       Value                          Type
>    -----------------------------------------------
>    device_id   0                              u64
>    class       VIRTIO_ADMIN_CTRL_CAP (0)      u16
>    command     VIRTIO_ADMIN_CTRL_CAP_GET (0)  u16
>
>    Device->Driver:
>    Field         Value                        Type
>    -----------------------------------------------
>    ack           VIRTIO_ADMIN_OK (0)          u8
>    capabilities  <supported capability bits>  u64


I just follows styles that is used in virtio-net control vq:


\begin{lstlisting}
#define VIRTIO_NET_CTRL_MQ    4
  #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0 (for automatic 
receive steering)
  #define VIRTIO_NET_CTRL_MQ_RSS_CONFIG          1 (for configurable 
receive steering)
  #define VIRTIO_NET_CTRL_MQ_HASH_CONFIG         2 (for configurable 
hash calculation)
\end{lstlisting}

...

Do you suggest to change that as well?


>
>>> Another question: I wonder why there is an admin virtqueue feature bit
>>> instead of a new VIRTIO device ID for the management device? Does this
>>> mean regular virtio-net, virtio-blk, etc devices can have an admin queue
>>> in addition to their normal role?
>>
>> I think so, I just follow the normal networking PF role which is usually a
>> network device which allows some kind of remote management.
>>
>> But it doesn't forbid us to create the management device.
>>
>> I think it's better to not mandate the management device for now, or is
>> there any reason for doing that?
> Since the admin virtqueue is generic infrastructure (not specific to
> vdev management) I think it makes sense to use a feature bit and not a
> separate management device. This wasn't obvious to me from this
> document, maybe the text can be tweaked to clearly separate the
> (generic) admin virtqueue from vdev management.


Maybe adding a device normative like

"

The management device MUST offer the admin virtqueue is a device 
specific virtqueue.

"


>
>>
>>>> +
>>>> +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>>>> +
>>>> +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
>>> Oh, VIRTIO_ADMIN_CTRL_CAP is a class value. That wasn't clear when the
>>> constant value was defined.
>>
>> Any suggestion to make it clear? I just follow the style of the current
>> virtio-net control vq command definitions.
> #define VIRTIO_ADMIN_CLASS_CAP 0
>   #define VIRTIO_ADMIN_CMD_CAP_GET 0
>
> There are many other possible variations that would be equally clear, I
> don't really mind.


I see, I can change.


>
>>>> +virtual devices must be created and discovered through the admin
>>>> +virtqueue.
>>> Are management devices with statically pre-allocated virtual devices
>>> supported?
>>
>> It is supported when I wrote the patch, but for simplicity I remove that
>> part. I can add them back.
> I guess the advantage is that static vdevs don't need creation
> parameters. So they could be useful in cases where the hardware defines
> the creation parameters (i.e. because they are fixed and unknown to the
> management driver). But I don't know whether there is any real-world use
> case...


Ok.


> I don't mind if only dynamic vdevs are supported, but please document it
> so the scope/intent is clear.
>
>>>> +\begin{lstlisting}
>>>> +struct virtio_admin_ctrl_vdev_attribute {
>>>> +       u32 device_id;
>>> This name is easy to confuse with the struct virtio_admin_ctrl's
>>> device_id. They have different meanings. Want to rename struct
>>> virtio_admin_ctrl's field to "vdev_id"?
>>
>> This is the virtio device id, but rethink of the design, it's meangless if
>> we don't have a dedicated management device.
>>
>> Since we don't want a virtio-net management device to create a virtio-blk
>> virtual device.
> I see. This means that a physical device wishing to support multiple
> device types needs to expose multiple management devices, even though
> they may share resources (e.g. you can only instantiate 8 vdevs in total
> regardless of their type).
>
> I don't have enough experience with device splitting to say whether this
> design is flexible enough. It seems okay to me.


Ok.

Thanks



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04  8:09 ` Stefan Hajnoczi
@ 2021-08-04  8:51   ` Jason Wang
  2021-08-04 12:50     ` Stefan Hajnoczi
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-04  8:51 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu


在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>
> It might be clearer to call the management device the "parent"
> and virtual devices "children". That way the term "virtual" is avoided -
> it's already used a lot in the spec :).
>
>> +The VIRTIO_ADMIN_CTRL_CONFIG_SET is used to write data to the device
>> +configuration space. As described in struct
>> +virtio_admin_ctrl_vdev_config_set, the command-out-data contains the
>> +offset since the start of the config space, the size of the data and
>> +the data that will be wrote. There's no command-in-data for this
> s/wrote/written/
>
>> +command.
>> +
>> +\devicenormative{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
>> +
>> +The management device MUST fail the device configuration space access
>> +if the driver want to access the range which is out of the config
> s/want/wants/
>
> s/out of/outside/
>
>> +\subsection{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Address}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the address of
>> +a specific virtqueue could be done through the following command:
> s/could be done/is set/


Will fix all of those.


>
>> +\subsection{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
>> +
>> +When the admin virtqueue offers VIRTIO_F_ADMIN_F_CAP_VDEV, virtqueue
>> +status could be set and get through the following command:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_VQ_ENABLE    10
>> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET       0
>> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET       1
>> +
>> +struct virtio_admin_ctrl_vq_status_set {
>> +  u16 queue_index;
>> +  u8 status;
>> +};
>> +
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET is used to set the status to a
>> +specific virtqueue. The command-out-data is the queue index, the
>> +status that is set to the virtqueue (0 disabled, 1 enabled); There's
>> +no command-in-data.
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET is used to get the status of a
>> +specific virtqueue. The command-out-data is the u16 of queue
>> +index. The command-in-data is the virtqueue status (0 disalbed, 1
> s/disalbed/disabled/
>
>> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, management device
> Is the device MMU feature mandatory when VIRTIO_ADMIN_F_VDEV is
> supported?


Currently yes, but a second thought is that, the device MMU could be 
optional. This makes sure this design could be used with the platform 
specific MMU isolation like SIOV/ADI.


>
>> +offers a device MMU for a secure DMA context for each virtual
> "MMU" is not used elsewhere in the VIRTIO specification so it's an
> undefined term. Should this paragraph mention VIRTIO_F_ACCESS_PLATFORM
> in order to explain how the Device MMU affects the virtual device?


Yes, will do.


>
>> +device. The device MMU will translate I/O Virtual Address to transport
> s/Address/Addresses/
>
>> +specific DMA address before using a transport specific way for DMA:
> s/address/addresses/
>
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_VQ_CTRL_MMU    13
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE       1
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET     2
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_MAP          3
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP        4
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET      5
>> +
>> +struct virtio_admin_ctrl_vdev_mmu_asid_set {
>> +  le16 queue_index;
>> +  le64 asid;
>> +};
>> +
>> +struct virtio_admin_ctrl_vdev_mmu_map {
>> +  le64 iova_start;
>> +  le64 iova_end;
>> +  le64 dma_start;
>> +  le32 flags;
>> +};
>> +
>> +/* Read access is allowed */
>> +#define VIRTIO_ADMIN_VQ_MAP_F_READ   (1 << 0)
>> +/* Write access is allowed */
>> +#define VIRTIO_IOMMU_VQ_MAP_F_WRITE  (1 << 1)
>> +
>> +struct virtio_admin_ctrl_vdev_mmu_err {
>> +  le32 reason;
>> +  le16 queue_index;
>> +  le64 asid;
>> +  le64 iova_start;
>> +  le64 iova_end;
>> +  le32 flags;
>> +};
>> +
>> +/* Mapping does not exist */
>> +#define VIRTIO_ADMIN_VQ_MAP_ERR_NON_EXIST (1 << 0)
>> +/* Access violates the permission */
>> +#define VIRTIO_ADMIN_VQ_MAP_ERR_ACC_VIOLATION (1 << 1)
>> +
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE is used to enable or disable
>> +device MMU for a specific virtual device. The command-out-data is a u8
> s/device MMU/the device MMU/
>
>> +for telling whether device MMU is enabled for the virtual device: 0
>> +means to enable and 1 means to disable.
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
>> +address space id to a virtqueue. The command-out-data is the queue
>> +index (\field{queue_index}) and the address space ID (\field{asid})
>> +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
>> +Address range [\field{iova_start}, \field{iova_end}] to transport
>> +specific DMA address range [\field{dma_start}, \field{dma_start} +
>> + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
>> +specify the device access permission.
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
>> +Virtual Address ranges that are intersected with the range
>> +[\field{iova_start}, \field{iova_end}].
>> +
>> +There's no command-in-data for all the above four commands.
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
>> +information of the device MMU. There's no command-out-data, the
>> +command-in-date is the queue index and its asid, the iova range and
>> +the access of the operation (as described in struct
>> +virtio_admin_ctrl_vdev_mmu_err).
>> +
>> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> Does the virtual device need to support running with the device MMU
> disabled?


Yes, device MMU could be disabled.


>
>> +
>> +The management device MUST fail the device MMU command if \field{device_id} is
>> +zero.
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
>> +command if the iova range is intersected with a existing range.
>> +
>> +The management device MUST set both DEVICE_NEEDS_RESET and
>> +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
>> +virtual device.
>> +
>> +The device MMU for the virtual device MUST be disabled upon its reset.
>> +
>> +Upon reset, the virtual device must reset the Address Space ID for
>> +each virtqueue to 0.
>> +
>> +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
>> +  / Virtio Over Admin Virtqueue / Device MMU}
>> +
>> +The driver MAY choose to disable the device MMU but it MUST make sure
>> +the transport specific method could be used to provide a secure DMA
>> +context for each virtual device.
> What does this mean?


It means when device MMU is disabled. The management driver must make 
sure a platform specific feature like PASID capable IOMMU is used to 
isolate DMA between the virtual devices.


>
> During which stages of the virtual device's lifecycle (ACKNOWLEDGE,
> DRIVER, FEATURES_OK, DRIVER_OK) may the management driver enable/disable
> the device MMU?


My understanding is that it doesn't matter which stage. E.g it should be 
safe to be enabled before DRIVER_OK since it doesn't have any virtqueue 
operations.

The driver just need to make sure:

- device MMU is enabled before DRIVER_OK
- when device MMU is disabled, the transport/platform DMA hardware has 
been setup


>
>> +
>> +The driver MAY query the error of device MMU after DEVICE_MMU_FAIL is set.
>> +
>> +\subsection{Presenting Virtual Device}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Presenting Virtual Device}
>> +
>> +If VIRTIO_ADMIN_F_VDEV is offered by the device. The presenting of
>> +the virtual device requires co-operation between the management
>> +driver and the admin virtqueue. This means, from the view of the
>> +virtual device driver, the transport is done via the communication
>> +with the management device driver. It's up to the software to decide
>> +what kind of method that is needed be used for those communications.
> I think what this is saying is that the management device's admin
> virtqueue is the VIRTIO Transport for the virtual device. In practice
> the VMM will emulate another VIRTIO Transport like virtio-pci for the
> virtual device and then forward operations to the management device's
> admin virtqueue?


Yes.


>
>> +The management driver typically do the following steps for creating a
> s/do/takes/
>
>> +virtual device:
>> +
>> +\begin{enumerate}
>> +\item Determine the virtio id and device specific configuration.
>> +\item Create the virtual devices using VIRTIO_ADMIN_CTRL_VDEV_CREATE
> s/VIRTIO_ADMIN_CTRL_VDEV_CREATE/the VIRTIO_ADMIN_CTRL_VDEV_CREATE/
>
>> +command.
>> +\item Optionally, configure the MSI.
>> +\item Optionally, enable and initialize the device MMU.
>> +\item Setup the necessary communication methods with virtual device driver.
>> +\item Perform device specific setups.
> s/setups/setup/
>
> I tried to imagine what the virtio-blk vdev creation parameters need to
> look like. Here is what I came up with:
>
>    Virtual Device Creation Parameters for Block Devices
>    ----------------------------------------------------
>    The following creation parameters specify the details of a new virtual
>    block device:
>
>    Field        Type   Meaning
>    ----------------------------------------------------------------------
>    blkdev_id    u64    Identifier of the underlying block device that
>                        provides storage. The enumeration and creation of
>                        underlying block devices is
>                        implementation-specific.
>    num_queues   u16    Number of request virtqueues.
>    features_len u8     Number of elements in features[].


For 'elements' do you mean the 'u32 elements'?


>    features[]   u32    Device feature bits to report.
>
>    Creation error codes are as follows:
>
>    Error               Meaning
>    ----------------------------------------------------------------------
>    INVALID_BLKDEV_ID   The underlying block device does not exist.
>    BLKDEV_BUSY         The underlying block device is already in use.
>    BLKDEV_READ_ONLY    The underlying block device is read-only.
>    INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
>    UNSUPPORTED_FEATURE A feature bit was given that the device does not
>                        support.
>
>    If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
>    block device is made available for read-only access.
>
>    Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
>    already in use is given.
>
>    Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
>    does not support writes and the VIRTIO_BLK_F_RO bit is not set in
>    features[].
>
>    The configuration space parameters (see 5.2.4 Device configuration
>    layout) are determined by the device based on the underlying block
>    device capacity, block size, etc.
>
> Note that this doesn't allow overriding configuration space parameters
> (e.g. block size). We probably need to support that in the future for
> live migration compatibility.


I wonder do we need those configuration to be self-descriptive? E.g how 
did the device know that the config contains the blk_size. (I guess it's 
not a good practice to infer this from the config len).

Thanks


>
> Stefan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04  1:37   ` Jason Wang
@ 2021-08-04 10:20     ` Max Gurtovoy
  2021-08-05  1:36       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Max Gurtovoy @ 2021-08-04 10:20 UTC (permalink / raw)
  To: Jason Wang, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


On 8/4/2021 4:37 AM, Jason Wang wrote:
>
> 在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>>
>>> +Sometimes it's hard to implement the device in a transport specific
>>> +method. One example is that a physical device may try to present
>>> +multiple virtual devices with a limited transport specific
>>> +resources. Another example is to implement virtual devices which is
>>> +transport independent. In those cases, the admin virtqueue provided by
>>> +the management device could be used to replace the transport specific
>>> +method to implement the virtual device. Then the presenting of the
>>> +virtual device is done through the cooperation between the admin
>>> +virtqueue and the driver.
>>
>> maybe it's me, but I can't understand how admin queue is a transport.
>
>
> The transport is the method that provides basic facility. In this 
> proposal, the admin virtqueue is used to provide basic facility for 
> the virtual device. That is to say, it's the transport for virtual 
> device.
>
>
>>
>> And how can I use admin queue transport to migrate VFs that are 
>> controlled by virtio PCI PF.
>
>
> This live migration support and the admin virtqueue transport are 
> orthogonal. The main motivation of this proposal is used for 
> implementing virtual device transport via admin virtqueue. It's not 
> hard to add new commands for doing live migration for the virtual 
> device, I don't do that since I believe it's expected to be addressed 
> in your proposal.

so why do you call it in the same name that I used in my RFC ? This is 
confusing and causing problems.

You are working on a parallel feature and reviewing my RFC as if it was 
instead of your proposal.

IIUC, in your proposal a non SRIOV device parent will create admin queue 
and using this admin queue you'll be able to create children devices and 
their transport will be admin queue.

That means that the configuration cycles will be trapped by the parent 
device somehow.

This also means we need to merge my RFC first to create infrastructure 
for this RFC.

For admin management that you'll need probably virtio-cli tool from user 
space.

So this proposal is complementary to mine. Your management device will 
negotiate "my" admin_queue feature and you'll need to add more commands 
to this admin queue that are probably in transport specific domain to 
create children.

You do need to handle the configuration cycles that this management 
parent device will need to support.


>
> For virtual device, it's a independent virtio device that could be 
> assigned to secure DMA context/domain,  it is functional equivalent 
> ADI or SF. The difference is that it can work with or without platform 
> support (SIOV or PASID).
>
>
>>
>> And why the regular admin queue that is part of the device queues 
>> can't fit to your needs ?
>
>
> For "regular admin queue", did you mean your proposal. Actually, it's 
> not conflict, we can unify the interface though the motivation is 
> different.
>
>
>>
>> Can you explain your needs ? is it to create a vDPA device from some 
>> SW interface ?
>
>
> As stated in the patch, the needs are:
>
> - Presenting virtual devices with limited transport specific resources
> - Presenting virtual devices without platform support (e.g SR-IOV or 
> SIOV)
>
> We want virtio to have hyper-scalability via slicing at virtio level. 
> It's not directly related to vDPA.
>
> For vDPA, vendor are freed to have their own technology to be hyper 
> scalable (e.g SF, ADI or other stuffs).
>
> Thanks
>
>
>>
>> I don't follow. 
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04  8:51   ` Jason Wang
@ 2021-08-04 12:50     ` Stefan Hajnoczi
  2021-08-05  6:32       ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04 12:50 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu

[-- Attachment #1: Type: text/plain, Size: 6983 bytes --]

On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
> 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
> > On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> > > +for telling whether device MMU is enabled for the virtual device: 0
> > > +means to enable and 1 means to disable.
> > > +
> > > +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
> > > +address space id to a virtqueue. The command-out-data is the queue
> > > +index (\field{queue_index}) and the address space ID (\field{asid})
> > > +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
> > > +
> > > +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
> > > +Address range [\field{iova_start}, \field{iova_end}] to transport
> > > +specific DMA address range [\field{dma_start}, \field{dma_start} +
> > > + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
> > > +specify the device access permission.
> > > +
> > > +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
> > > +Virtual Address ranges that are intersected with the range
> > > +[\field{iova_start}, \field{iova_end}].
> > > +
> > > +There's no command-in-data for all the above four commands.
> > > +
> > > +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
> > > +information of the device MMU. There's no command-out-data, the
> > > +command-in-date is the queue index and its asid, the iova range and
> > > +the access of the operation (as described in struct
> > > +virtio_admin_ctrl_vdev_mmu_err).
> > > +
> > > +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> > Does the virtual device need to support running with the device MMU
> > disabled?
> 
> 
> Yes, device MMU could be disabled.

I'm a little surprised that there is no way to refuse to run without the
device MMU because I got the impression using MMUs is required for
security/isolation on modern systems.

If I have a physical non-SR-IOV PCI management device, how does the
virtual device access memory when the device MMU is disabled? Would its
address space be identical to the management device's IOVA space?

Maybe you can write a paragraph or two explaining how the address spaces
of management and virtual devices are related?

> 
> 
> > 
> > > +
> > > +The management device MUST fail the device MMU command if \field{device_id} is
> > > +zero.
> > > +
> > > +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
> > > +command if the iova range is intersected with a existing range.
> > > +
> > > +The management device MUST set both DEVICE_NEEDS_RESET and
> > > +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
> > > +virtual device.
> > > +
> > > +The device MMU for the virtual device MUST be disabled upon its reset.
> > > +
> > > +Upon reset, the virtual device must reset the Address Space ID for
> > > +each virtqueue to 0.
> > > +
> > > +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
> > > +  / Virtio Over Admin Virtqueue / Device MMU}
> > > +
> > > +The driver MAY choose to disable the device MMU but it MUST make sure
> > > +the transport specific method could be used to provide a secure DMA
> > > +context for each virtual device.
> > What does this mean?
> 
> 
> It means when device MMU is disabled. The management driver must make sure a
> platform specific feature like PASID capable IOMMU is used to isolate DMA
> between the virtual devices.

Ah, this is related to what I asked above. It's still not 100% clear to
me.

> 
> 
> > 
> > During which stages of the virtual device's lifecycle (ACKNOWLEDGE,
> > DRIVER, FEATURES_OK, DRIVER_OK) may the management driver enable/disable
> > the device MMU?
> 
> 
> My understanding is that it doesn't matter which stage. E.g it should be
> safe to be enabled before DRIVER_OK since it doesn't have any virtqueue
> operations.
> 
> The driver just need to make sure:
> 
> - device MMU is enabled before DRIVER_OK
> - when device MMU is disabled, the transport/platform DMA hardware has been
> setup

Thanks, please include that information.

> > I tried to imagine what the virtio-blk vdev creation parameters need to
> > look like. Here is what I came up with:
> > 
> >    Virtual Device Creation Parameters for Block Devices
> >    ----------------------------------------------------
> >    The following creation parameters specify the details of a new virtual
> >    block device:
> > 
> >    Field        Type   Meaning
> >    ----------------------------------------------------------------------
> >    blkdev_id    u64    Identifier of the underlying block device that
> >                        provides storage. The enumeration and creation of
> >                        underlying block devices is
> >                        implementation-specific.
> >    num_queues   u16    Number of request virtqueues.
> >    features_len u8     Number of elements in features[].
> 
> 
> For 'elements' do you mean the 'u32 elements'?

Yes, u32 array elements.

> >    features[]   u32    Device feature bits to report.
> > 
> >    Creation error codes are as follows:
> > 
> >    Error               Meaning
> >    ----------------------------------------------------------------------
> >    INVALID_BLKDEV_ID   The underlying block device does not exist.
> >    BLKDEV_BUSY         The underlying block device is already in use.
> >    BLKDEV_READ_ONLY    The underlying block device is read-only.
> >    INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
> >    UNSUPPORTED_FEATURE A feature bit was given that the device does not
> >                        support.
> > 
> >    If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
> >    block device is made available for read-only access.
> > 
> >    Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
> >    already in use is given.
> > 
> >    Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
> >    does not support writes and the VIRTIO_BLK_F_RO bit is not set in
> >    features[].
> > 
> >    The configuration space parameters (see 5.2.4 Device configuration
> >    layout) are determined by the device based on the underlying block
> >    device capacity, block size, etc.
> > 
> > Note that this doesn't allow overriding configuration space parameters
> > (e.g. block size). We probably need to support that in the future for
> > live migration compatibility.
> 
> 
> I wonder do we need those configuration to be self-descriptive? E.g how did
> the device know that the config contains the blk_size. (I guess it's not a
> good practice to infer this from the config len).

The device configuration space size and layout is determined by the
device feature bits.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04  8:39       ` Jason Wang
@ 2021-08-04 12:56         ` Stefan Hajnoczi
  2021-08-05  6:33           ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04 12:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu

[-- Attachment #1: Type: text/plain, Size: 4101 bytes --]

On Wed, Aug 04, 2021 at 04:39:24PM +0800, Jason Wang wrote:
> 在 2021/8/4 下午2:39, Stefan Hajnoczi 写道:
> > On Wed, Aug 04, 2021 at 11:01:39AM +0800, Jason Wang wrote:
> > > 在 2021/8/3 下午10:51, Stefan Hajnoczi 写道:
> > > > On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> > > > > +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> > > > > +
> > > > > +The capabilites that are supported by the admin virtqueue could be
> > > > > +fetched through the following commands:
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +#define VIRTIO_ADMIN_CTRL_CAP    0
> > > > > + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
> > > > > +supported by the admin virtqueue through a u64 which is a bit mask of
> > > > > +the capabilies in command-in-data. There's no command-out-data.
> > > > I'm not sure what this paragraph is describing. Is VIRTIO_ADMIN_CTRL_CAP
> > > > a struct virtio_admin_ctrl::command value?
> > > 
> > > VIRTIO_ADMIN_CTRL_CAP is the class, VIRTIO_ADMIN_CTRL_CAP_GET is the
> > > command.
> > Okay. I found the admin virtqueue command descriptions hard to read, not
> > just because the class/command definitions weren't obvious to me, but
> > also because the command/response layout is described in English instead
> > of a table or C-like notation.
> > 
> > I think something like this would make the commands easier to
> > understand:
> > 
> >    Capabilities supported by the admin virtqueue are fetched as follows:
> > 
> >    Driver->Device:
> >    Field       Value                          Type
> >    -----------------------------------------------
> >    device_id   0                              u64
> >    class       VIRTIO_ADMIN_CTRL_CAP (0)      u16
> >    command     VIRTIO_ADMIN_CTRL_CAP_GET (0)  u16
> > 
> >    Device->Driver:
> >    Field         Value                        Type
> >    -----------------------------------------------
> >    ack           VIRTIO_ADMIN_OK (0)          u8
> >    capabilities  <supported capability bits>  u64
> 
> 
> I just follows styles that is used in virtio-net control vq:
> 
> 
> \begin{lstlisting}
> #define VIRTIO_NET_CTRL_MQ    4
>  #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0 (for automatic receive
> steering)
>  #define VIRTIO_NET_CTRL_MQ_RSS_CONFIG          1 (for configurable receive
> steering)
>  #define VIRTIO_NET_CTRL_MQ_HASH_CONFIG         2 (for configurable hash
> calculation)
> \end{lstlisting}
> 
> ...
> 
> Do you suggest to change that as well?

My preference is to change it because I think describing inputs/outputs
in English makes it hard to understand the exact binary layout.

> 
> 
> > 
> > > > Another question: I wonder why there is an admin virtqueue feature bit
> > > > instead of a new VIRTIO device ID for the management device? Does this
> > > > mean regular virtio-net, virtio-blk, etc devices can have an admin queue
> > > > in addition to their normal role?
> > > 
> > > I think so, I just follow the normal networking PF role which is usually a
> > > network device which allows some kind of remote management.
> > > 
> > > But it doesn't forbid us to create the management device.
> > > 
> > > I think it's better to not mandate the management device for now, or is
> > > there any reason for doing that?
> > Since the admin virtqueue is generic infrastructure (not specific to
> > vdev management) I think it makes sense to use a feature bit and not a
> > separate management device. This wasn't obvious to me from this
> > document, maybe the text can be tweaked to clearly separate the
> > (generic) admin virtqueue from vdev management.
> 
> 
> Maybe adding a device normative like
> 
> "
> 
> The management device MUST offer the admin virtqueue is a device specific
> virtqueue.
> 
> "

Yes, with
s/is/as/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-03  3:20 [RFC PATCH] Introduce admin virtqueue as a new transport Jason Wang
                   ` (2 preceding siblings ...)
  2021-08-04  8:09 ` Stefan Hajnoczi
@ 2021-08-04 13:36 ` Michael S. Tsirkin
  2021-08-05  2:07   ` Jason Wang
  3 siblings, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2021-08-04 13:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, cohuck, stefanha, mgurtovoy, eperezma, lulu

On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> This patch introduces a new transport - the admin virtqueue. This
> transport is useful for implementing virtual devices with a limited
> transport specific resources or presenting the virtual device in a
> transport independent way.
> 
> This means, all the basic device facilities are provided solely via
> the the admin virtqueue. Additionally, the admin virtqueue is also in
> charge of the creating and destroying of the virtual device.
> 
> To be self-contained and not depend on the platform specific
> feature. Device MMU is also introduced for providing the DMA isolation
> among virtual devices.
> 
> With the help of the admin virtqueue, the presenting of the virtual
> device is done via the co-operation between the management device and
> its driver.
> 
> This is just a draft for demonstrating the basic ideas. Some possible
> enhancements:
> 
> - admin event virtqueue for reporting events like interrupts (on the
>   platform withouth MSI) and MMU translation failure
> - hardware friendly MMU translation table (e.g in the memory instead
>   of using control virtqueue commands)
> - command to kick the virtqueue
> 
> Comments are more than welcomed.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  content.tex | 639 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 639 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 620c0e2..1f66d42 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>  
> +\item[DEVICE_MMU_FAIL (32)] Indicates that the device MMU has
> +  experienced an error from which it can't recover.
> +
>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -515,6 +518,642 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>  Virtio can use various different buses, thus the standard is split
>  into virtio general and bus-specific sections.
>  
> +\section{Virtio Over Admin Virtqueue}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue}
> +
> +Sometimes it's hard to implement the device in a transport specific
> +method. One example is that a physical device may try to present
> +multiple virtual devices with a limited transport specific
> +resources. Another example is to implement virtual devices which is
> +transport independent. In those cases, the admin virtqueue provided by
> +the management device could be used to replace the transport specific
> +method to implement the virtual device.

terminology here needs clarification. Especially talking about
virtual devices is confusing. I propose management device and
managed device. Other options exist. Also pls give examples
such as PF/VF.


> Then the presenting of the
> +virtual device is done through the cooperation between the admin
> +virtqueue and the driver.

A natural question to ask is why is this a VQ and not a device?
Is this because people want to implement a VQ as part
of an arbitrary device?



> +\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over Admin Virtqueue / Basic Concepts}
> +
> +The device that offers the admin virtqueue (via feature
> +VIRTIO_F_ADMIN_VQ) is the management device of the virtual
> +devices.

Don't we need a way to specify how many such VQs are there and what
their numbers are? Doing this in a device specific way seems a bit
annoying ...



> All commands are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_admin_ctrl {
> +        u64 device_id;
> +        u16 class;
> +        u16 command;
> +        u8 command-out-data[];
> +        u8 ack;
> +        u8 command-in-data[]
> +};
> +
> +/* ack values */
> +#define VIRTIO_ADMIN_OK     0
> +#define VIRTIO_ADMIN_ERR    1
> +\end{lstlisting}
> +
> +The device_id, class, command and command-out-data are set by
> +the driver, and the device sets the ack and command-in-data. 0 is used
> +for identify the management device itself.
> +
> +\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
> +
> +The virtual device MUST not offer VIRTIO_F_ADMIN_VQ feature.
> +
> +\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
> +
> +The driver SHOULD negotiate VIRTIO_F_ADMIN_VQ if the device offers it.
> +
> +\subsection{Virtual Device Discovery}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtual Device Discovery}
> +
> +The management device is discovered through a transport and device
> +specific method. Virtual devices is created and discovered via the
> +admin virtqueue.
> +
> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The capabilites that are supported by the admin virtqueue could be
> +fetched through the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_CAP    0
> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
> +supported by the admin virtqueue through a u64 which is a bit mask of
> +the capabilies in command-in-data. There's no command-out-data.
> +
> +The capabilies that is currently supported are:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_F_CAP_VDEV    1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_F_CAP_VDEV capability demonstrates that the virtual
> +devices is created, configured and destroyed through admin
> +virtqueue. That means the admin virtqueue is the transport for the
> +virtual devices.

How about using a feature bit for this? Or having this in the config space.
This might call for a generic config space feature, but it's not
the first time we want that, maybe it's time.


> +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
> +VIRTIO_F_ADMIN_VQ is offered.
> +
> +The management device MUST fail VIRTIO_ADMIN_CTRL_CAP class when the
> +\field{device_id} is not zero.
> +
> +\drivernormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
> +
> +The driver MUST use 0 as \field{device_id} for VIRTIO_ADMIN_CTRL_CAP
> +class.
> +
> +\subsection{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capibility,
> +virtual devices must be created and discovered through the admin
> +virtqueue.
> +
> +\begin{lstlisting}
> +struct virtio_admin_ctrl_vdev_attribute {
> +       u32 device_id;
> +       u8 config[];
> +};
> +
> +#define VIRTIO_ADMIN_CTRL_VDEV    2
> + #define VIRTIO_ADMIN_CTRL_VDEV_CREATE        0
> + #define VIRTIO_ADMIN_CTRL_VDEV_DESTROY        1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VDEV_CREAT command is used to create a virtual
> +device. The command-out-data for VIRTIO_ADMIN_CTRL_CREATE is the
> +virtio device id (\field{device_id}) and device specific configuration
> +(\field{config}) for creating the virtual device. When succeed, the
> +device returns a u64 as a unique identifier of the created virtual
> +device in command-in-data.

So how are we going to specify config? Per device type?


> +The VIRTIO_ADMIN_CTRL_VDEV_DESTROY command is used to destroy a
> +virtual device which is identified by its 64bit identifier
> +\field{virtual_device_id}. There's no command-in-data for
> +VIRTIO_ADMIN_CTRL_DESTROY command.

So I am confused here. Rest of the spec seems to map driver
actions to commands on the admin VQ. However where do the create and
destroy commands coming from? If they have a separate source
from driver commands, why do they share the same VQ?


> +\devicenormative{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_CREATE if
> +\field{device_id} is not 0.
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_DESTROY if
> +\field{device_id} is 0.
> +
> +All virtual devices MUST be created via admin virtqueue if the admin
> +virtqueue offers VIRTIO_F_CTRL_VDEV.
> +
> +The management device MAY map implement the virtual device in a
> +transport specific way.


I'm not sure what does this mean.

> +\drivernormative{Driver Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
> +
> +The management driver MUST use 0 as \field{device_id} for
> +VIRTIO_ADMIN_CTRL_VDEV_CREATE command.
> +
> +The management driver SHOULD make sure the virtual device is not used
> +by any driver before trying to destroy it.

Device drivers are within guests. Not sure how this can be
accomplished.

> +
> +\subsection{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
> +the feature negotiation of virtual devices could be done by the
> +following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_FEAT    3
> + #define VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET        0
> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET        1
> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET        2
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET is to get the features offered
> +by a virtual device.
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET is for driver to accept feature
> +bits offered by the virtual device.
> +
> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET is to get the features accepted
> +by both the virtual driver and the device.

So there's a lot of text here to basically pass config read/writes
over a VQ. How about specifying admin VQ in terms of e.g. virtio PCI
transport? Thus basically supply read/write commands and that's it?


> +The features is 64 bits mask of the virtio features bit. For
> +VIRTIO_ADMIN_CTRL_DRIVER_SET, the feature is passed to the device
> +through command-out-data. For VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET and
> +VIRTIO_ADMIN_CTRL_DRIVER_GET the feature is returned for the device
> +through command-in-data.
> +
> +\devicenormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +The management device MUST fail VIRTIO_ADMIN_F_CTRL_FEAT class for the
> +command that use 0 as its \field{virtual_device_id}.
> +
> +\drivernormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
> +
> +The management driver MAY mediate between the feature negotiation
> +request of the virtual devices and the admin virtqueue. E.g when
> +offering features to the virtual device, the management driver MAY
> +exclude some features in order to limit the behaviour of the virtual
> +device.
> +
> +\subsection{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +When the admin virtqueue offers VIRTIO_ADNIN_F_CAP_VDEV capability,
> +the status of virtual device could be accessed by the following
> +commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_STATUS    4
> + #define VIRTIO_ADMIN_CTRL_STATUS_SET        0
> + #define VIRTIO_ADMIN_CTRL_STATUS_GET        1
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_STATUS_SET is used to set the device status of
> +the virtual device here. The command-out-data is the one byte status
> +to set to the device. There's no command-in-data for this command.
> +
> +The VIRTIO_ADMIN_CTRL_STATUS_GET is used to get the device status of
> +the virtual device. The command-in-data is the one byte status
> +returned from the device. There's no command-out-data for this
> +command.
> +
> +\devicenormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +The management device MUST start the reset of a virtual device when 0
> +is written via VIRTIO_ADMIN_CTRL_STATUS_SET, the success of this
> +command demonstrate the success of the reset.
> +
> +The management device MUST present 0 through
> +VIRTIO_ADMIN_CTRL_STATUS_GET once the reset is done.
> +
> +The management device MUST fail the device status access if
> +\field{device_id} is zero.
> +
> +\drivernormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
> +
> +After writing 0 via VIRTIO_ADMIN_CTRL_STATUS_SET, the driver MUST wait
> +for the success of the command before re-initializing the device.
> +
> +\subsection{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Genreation}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
> +the device generation could be read from the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_GENERATION    5
> + #define VIRTIO_ADMIN_CTRL_GENERATION_GET        0
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_GENERATION_GET is used to get the device generation
> +of the virtual device. The command-in-data is the u32 device
> +generation returned from the device. There's no command-out-data for
> +this command.
> +
> +\devicenormative{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Generation}
> +
> +The device MUST present a changed config_generation after the driver
> +has read a device-specific configuration value which has changed since
> +any part of the device-specific configuration was last read.
> +
> +The device MUST fail the device generation access if \field{device_id} is zero.
> +
> +\subsection{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the
> +config space of a virtual device could be accessed from
> +VIRTIO_ADMIN_CTRL_CONFIG_GET and VIRTIO_ADMIN_CTRL_CONFIG_SET.
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_CONFIG    6
> +  #define VIRTIO_ADMIN_CTRL_CONFIG_GET        0
> +  #define VIRTIO_ADMIN_CTRL_CONFIG_SET        1
> +
> +struct virtio_admin_ctrl_vdev_config_get {
> +       u32 offset;
> +       u32 size;
> +};
> +
> +struct virtio_admin_ctrl_vdev_config_set {
> +       u32 offset;
> +       u32 size;
> +       u8  data[];
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_CONFIG_GET is used to read data from the
> +device configuration space. As described in struct
> +virtio_admin_ctrl_vdev_config_get, The command-out-data is the offset
> +since the start of the config space and the size of the data. The
> +command-in-data is the array of u8 data that read from the config
> +space.
> +
> +The VIRTIO_ADMIN_CTRL_CONFIG_SET is used to write data to the device
> +configuration space. As described in struct
> +virtio_admin_ctrl_vdev_config_set, the command-out-data contains the
> +offset since the start of the config space, the size of the data and
> +the data that will be wrote. There's no command-in-data for this
> +command.
> +
> +\devicenormative{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
> +
> +The management device MUST fail the device configuration space access
> +if the driver want to access the range which is out of the config
> +space.
> +
> +The management device MUST fail the device configuration space access
> +if \field{device_id} is zero.
> +
> +\subsection{MSI Configuration}label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the MSI entry
> +for a specific virtqueue could be set through following command:
>

I think this is a bit problematic. E.g. for PCI MSI is programmed through
standard registers. Specifying address is data is insufficient, neither
is masking and enabling through device specific registers.
Referring to a vector seems more correct.
Further, need to think about how will all this be generalized
outside of PCI.

> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_MSI    7
> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_SET        0
> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE     1
> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_MASK       2
> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET    3
> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE 4
> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK   5
> +
> +struct virtio_admin_ctrl_vdev_msi_vq_set {
> +       u16 queue_index;
> +       u64 addr;
> +       u32 data;
> +};
> +
> +struct virtio_admin_ctrl_vdev_msi_vq_enable {
> +       u16 queue_index;
> +       u8 enable;
> +};
> +
> +struct virtio_admin_ctrl_vdev_msi_vq_mask {
> +       u16 queue_index;
> +       u8 mask;
> +};
> +
> +struct virtio_admin_ctrl_vdev_msi_config {
> +       u64 addr;
> +       u32 data;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_MSI_VQ_SET is used to set the MSI entry for a
> +specific virtqueue. The command-out-data is the virtqueue index and
> +the MSI address and data (as described in struct
> +virtio_admin_ctrl_vdev_msix_vq_set).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE and is used to enable or disable
> +MSI interrupt for a specific virtqueue. The command-out-data is the
> +virtqueue index and whether to enable the MSI: 0 means to enable and 1
> +means to disable (as described in struct
> +virtio_admin_ctrl_vdev_msi_vq_enable).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_VQ_MASK and is used to mask or unmask MSI
> +interrupt for a specific virtqueue. The command-out-data is the
> +virtqueue index and the mask status: 0 means unmak and 1 means mask
> +(as described in struct virtio_admin_ctrl_vdev_msi_vq_mask).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET is used to set the MSI entry
> +for the config interrupt. The command-out-data is the MSI address and
> +data (as described in struct virtio_admin_ctrl_vdev_msix_config).
> +
> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE is used to enable and disable
> +MSI for config space. The command-out-data is an u8: 0 means to
> +disable and 1 means to enable.
> +
> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK is used to mask and unmask MSI
> +interrupt for config space. The command-out-data is an u8: 0 means to
> +mask and 1 means to unmask.
> +
> +There's no command-in-data for all the above MSI commands.
> +
> +\devicenormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
> +
> +The virtual device MUST record the pending MSI interrupt and
> +generate the MSI interrupt is it was pending after unmasking.
> +
> +The virtual MUST disable the MSI for both virtqueue and config space
> +upon reset.
> +
> +\drivernormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
> +
> +The driver MUST allocate transport or platform specific MSI entries
> +for both virtqueue and config space if it wants to use interrupt.
> +
> +The driver MAY choose disable the MSI if polling is used.
> +
> +\subsection{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Address}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the address of
> +a specific virtqueue could be done through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_ADDR    9
> + #define VIRTIO_ADMIN_CTRL_VQ_ADDR_SET        0
> +
> +struct virtio_admin_ctrl_vdev_vq_addr {
> +       u16 queue_index;
> +       u64 device_area;
> +       u64 descriptor_area;
> +       u64 driver_area;
> +};
> +\end{lstlisting}
> +
> +The command-out-data is the queue index, the addresses of device area,
> +descriptor area and driver area (as described in struct
> +virtio_admin_ctrl_vdev_vq_addr); There's no command-in-data.
> +
> +\devicenormative{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueeu Address}
> +
> +The management device MUST fail the commands of class
> +VIRTIO_ADMIN_CTRL_VQ_ADDR if \field{device_id} is zero.
> +
> +\subsection{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +When the admin virtqueue offers VIRTIO_F_ADMIN_F_CAP_VDEV, virtqueue
> +status could be set and get through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_ENABLE    10
> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET       0
> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET       1
> +
> +struct virtio_admin_ctrl_vq_status_set {
> +  u16 queue_index;
> +  u8 status;
> +};
> +
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET is used to set the status to a
> +specific virtqueue. The command-out-data is the queue index, the
> +status that is set to the virtqueue (0 disabled, 1 enabled); There's
> +no command-in-data.
> +
> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET is used to get the status of a
> +specific virtqueue. The command-out-data is the u16 of queue
> +index. The command-in-data is the virtqueue status (0 disalbed, 1
> +enabled).
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +When disabled, the virtual device MUST stop processing requests from
> +this virtqueue.
> +
> +The management device MUST present a 0 via
> +VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET on reset of the virtual device.
> +
> +The management device MUST fail the virtqueue status access if
> +\field{device_id} is zero.
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
> +
> +The driver MUST configure the other virtqueue fields before enabling
> +the virtqueue with VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET.
> +
> +\subsection{Virtqueue Size}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, virtqueue size
> +could be accessed through the following command:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_SIZE    11
> + #define VIRTIO_ADMIN_CTRL_VQ_SIZE_SET       0
> + #define VIRTIO_ADMIN_CTRL_VQ_SIZE_GET       1
> +
> +struct virtio_admin_ctrl_vdev_vq_size_set {
> +       u16 queue_index;
> +       u16 size;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_SIZE_SET command is used to set the virtqueue
> +size. The command-out-data is the queue index and the size of the
> +virtqueue (as described in struct
> +virtio_admin_ctrl_vdev_vq_size_set). There's no command-in-data.
> +
> +The VIRTIO_ADMIN_CTRL_VQ_SIZE_GET command is used to get the virtqueue
> +size. On reset, the maximum queue size supported by the device is
> +returned. The command-out-data is the u16 of the virtqueue index. The
> +command-in-data is the u16 of queue size for the virtqueue.
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
> +
> +The management device MUST fail the virtqueue size access if
> +\field{device_id} is zero.
> +
> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the virtqueue
> +notification could be done through the following commands:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_CTRL_VQ_NOTIFY    12
> + #define VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET          1
> +
> +struct virtio_admin_ctrl_vdev_vq_notification_area {
> +       le64 addr
> +       le64 size;
> +};
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET is used to get the transport
> +specific address area that can be used to notify a virtqueue. The
> +command-out-data is a u16 of the virtqueue index. The command-in-data
> +contains the address and the size of the notification area (as
> +described in struct virtio_admin_ctrl_vdev_vq_notification_area).
> +
> +\devicenormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
> +
> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET if
> +there's no transport specific notification address for a virtqueue of
> +its virtual device.
> +
> +The management device MUST fail the virtqueue notification access if
> +\field{device_id} is zero.
> +
> +The management device MUST forbid the notification area of a specific
> +virtual device to be accessed from another virtual device.
> +
> +\drivernormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
> +
> +The driver MAY choose to notify the virtqueue by writing the queue
> +index at address \field{addr} which is fetched from the
> +VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET command.
> +
> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> +
> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, management device
> +offers a device MMU for a secure DMA context for each virtual
> +device. The device MMU will translate I/O Virtual Address to transport
> +specific DMA address before using a transport specific way for DMA:
> +
> +\begin{lstlisting}
> +#define VIRTIO_ADMIN_VQ_CTRL_MMU    13
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE       1
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET     2
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_MAP          3
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP        4
> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET      5
> +
> +struct virtio_admin_ctrl_vdev_mmu_asid_set {
> +  le16 queue_index;
> +  le64 asid;
> +};
> +
> +struct virtio_admin_ctrl_vdev_mmu_map {
> +  le64 iova_start;
> +  le64 iova_end;
> +  le64 dma_start;
> +  le32 flags;
> +};
> +
> +/* Read access is allowed */
> +#define VIRTIO_ADMIN_VQ_MAP_F_READ   (1 << 0)
> +/* Write access is allowed */
> +#define VIRTIO_IOMMU_VQ_MAP_F_WRITE  (1 << 1)
> +
> +struct virtio_admin_ctrl_vdev_mmu_err {
> +  le32 reason;
> +  le16 queue_index;
> +  le64 asid;
> +  le64 iova_start;
> +  le64 iova_end;
> +  le32 flags;
> +};
> +
> +/* Mapping does not exist */
> +#define VIRTIO_ADMIN_VQ_MAP_ERR_NON_EXIST (1 << 0)
> +/* Access violates the permission */
> +#define VIRTIO_ADMIN_VQ_MAP_ERR_ACC_VIOLATION (1 << 1)
> +
> +\end{lstlisting}
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE is used to enable or disable
> +device MMU for a specific virtual device. The command-out-data is a u8
> +for telling whether device MMU is enabled for the virtual device: 0
> +means to enable and 1 means to disable.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
> +address space id to a virtqueue. The command-out-data is the queue
> +index (\field{queue_index}) and the address space ID (\field{asid})
> +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
> +Address range [\field{iova_start}, \field{iova_end}] to transport
> +specific DMA address range [\field{dma_start}, \field{dma_start} +
> + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
> +specify the device access permission.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
> +Virtual Address ranges that are intersected with the range
> +[\field{iova_start}, \field{iova_end}].
> +
> +There's no command-in-data for all the above four commands.
> +
> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
> +information of the device MMU. There's no command-out-data, the
> +command-in-date is the queue index and its asid, the iova range and
> +the access of the operation (as described in struct
> +virtio_admin_ctrl_vdev_mmu_err).
> +
> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
> +
> +The management device MUST fail the device MMU command if \field{device_id} is
> +zero.
> +
> +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
> +command if the iova range is intersected with a existing range.
> +
> +The management device MUST set both DEVICE_NEEDS_RESET and
> +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
> +virtual device.
> +
> +The device MMU for the virtual device MUST be disabled upon its reset.
> +
> +Upon reset, the virtual device must reset the Address Space ID for
> +each virtqueue to 0.
> +
> +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
> +  / Virtio Over Admin Virtqueue / Device MMU}
> +
> +The driver MAY choose to disable the device MMU but it MUST make sure
> +the transport specific method could be used to provide a secure DMA
> +context for each virtual device.
> +
> +The driver MAY query the error of device MMU after DEVICE_MMU_FAIL is set.
> +
> +\subsection{Presenting Virtual Device}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Presenting Virtual Device}
> +
> +If VIRTIO_ADMIN_F_VDEV is offered by the device. The presenting of
> +the virtual device requires co-operation between the management
> +driver and the admin virtqueue. This means, from the view of the
> +virtual device driver, the transport is done via the communication
> +with the management device driver. It's up to the software to decide
> +what kind of method that is needed be used for those communications.
> +
> +The management driver typically do the following steps for creating a
> +virtual device:
> +
> +\begin{enumerate}
> +\item Determine the virtio id and device specific configuration.
> +\item Create the virtual devices using VIRTIO_ADMIN_CTRL_VDEV_CREATE
> +command.
> +\item Optionally, configure the MSI.
> +\item Optionally, enable and initialize the device MMU.
> +\item Setup the necessary communication methods with virtual device driver.
> +\item Perform device specific setups.
> +\item Let the virtual device to be probed by the virtual device
> +driver. The management driver will then use the admin virtqueue to
> +implement the requests of basic facility from the virtual device
> +driver.
> +\end{enumerate}
> +
>  \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
>  
>  Virtio devices are commonly implemented as PCI devices.
> -- 
> 2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04 10:20     ` Max Gurtovoy
@ 2021-08-05  1:36       ` Jason Wang
  2021-08-05 12:37         ` Max Gurtovoy
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-05  1:36 UTC (permalink / raw)
  To: Max Gurtovoy, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


在 2021/8/4 下午6:20, Max Gurtovoy 写道:
>
> On 8/4/2021 4:37 AM, Jason Wang wrote:
>>
>> 在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>>>
>>>> +Sometimes it's hard to implement the device in a transport specific
>>>> +method. One example is that a physical device may try to present
>>>> +multiple virtual devices with a limited transport specific
>>>> +resources. Another example is to implement virtual devices which is
>>>> +transport independent. In those cases, the admin virtqueue 
>>>> provided by
>>>> +the management device could be used to replace the transport specific
>>>> +method to implement the virtual device. Then the presenting of the
>>>> +virtual device is done through the cooperation between the admin
>>>> +virtqueue and the driver.
>>>
>>> maybe it's me, but I can't understand how admin queue is a transport.
>>
>>
>> The transport is the method that provides basic facility. In this 
>> proposal, the admin virtqueue is used to provide basic facility for 
>> the virtual device. That is to say, it's the transport for virtual 
>> device.
>>
>>
>>>
>>> And how can I use admin queue transport to migrate VFs that are 
>>> controlled by virtio PCI PF.
>>
>>
>> This live migration support and the admin virtqueue transport are 
>> orthogonal. The main motivation of this proposal is used for 
>> implementing virtual device transport via admin virtqueue. It's not 
>> hard to add new commands for doing live migration for the virtual 
>> device, I don't do that since I believe it's expected to be addressed 
>> in your proposal.
>
> so why do you call it in the same name that I used in my RFC ? This is 
> confusing and causing problems.


Max,

I really think the game of "who comes first" is meaningless.

I've used the terminology like "admin virtqueue" sometime early this year:

https://lists.oasis-open.org/archives/virtio-comment/202101/msg00034.html

I start the work of the admin virtqueue as a transport since then and I 
think I don't say "you call it the same name that I used before".


>
> You are working on a parallel feature and reviewing my RFC as if it 
> was instead of your proposal.


Firstly, though they may have the same interface/commands, the two 
proposals serves for completely different goals.

Secondly, it's about how to justify your proposal in the community, and 
I think I don't get the convincing answers for the following two points:

1) I have said that your proposal of using admin virtqueue for doing 
live migration makes sense, but we need the per function interface for 
nested virt

2) I've pointed out that using the general vitqueue for carrying vendor 
specific command breaks the efforts of spec as a standard device

Lastly, this proposal is RFC, it's not perfect for sure. The most 
important thing for the current stage is not about how and when this can 
be merged but whether or not this approach can work. I post them now 
since a talk about the hyper scalability will be given at the KVM Forum 
then I need to post this as one of the approach before as a reference. 
There are vendors that are asking something like this as a reference for 
having better scalability than SR-IOV.


>
> IIUC, in your proposal a non SRIOV device parent will create admin 
> queue and using this admin queue you'll be able to create children 
> devices and their transport will be admin queue.
>
> That means that the configuration cycles will be trapped by the parent 
> device somehow.


That's the way for having better scalability. And this is also the 
approach that SIOV used.


>
> This also means we need to merge my RFC first to create infrastructure 
> for this RFC.


It doesn't matter which will be merged. We should guarantee:

1) The idea is justified by the community
2) The merged proposal is extensible for accepting new features and commands

For 2) I think both proposals can do that. And to me, it's not hard to 
switch to the similar interface as you invent.


>
> For admin management that you'll need probably virtio-cli tool from 
> user space.


What does virtio-cli do? We've already had vdpa that is integrated into 
iproute2.

And let me clarify the concept again: vDPA is a superset of virtio. That 
means virtio could be treated as one kind of vDPA.

In this sense, I don't see a value of re-inventing the wheels again in 
the virtio-cli.


>
> So this proposal is complementary to mine. Your management device will 
> negotiate "my" admin_queue feature and you'll need to add more 
> commands to this admin queue that are probably in transport specific 
> domain to create children.


I think we can go either:

1) Make two virtqueues separately

or

2) Using a single virtqueue

And I would like to co-operate if 2) makes more sense.


>
> You do need to handle the configuration cycles that this management 
> parent device will need to support.


My proposals have already supported basic management like device 
creation and destroy. And I think it's not hard to extend it to other.

To reduce the complexity, I've stripped out a lot of features from the 
first RFC. It's better to start from the minimal set.

Thanks


>
>
>>
>> For virtual device, it's a independent virtio device that could be 
>> assigned to secure DMA context/domain,  it is functional equivalent 
>> ADI or SF. The difference is that it can work with or without 
>> platform support (SIOV or PASID).
>>
>>
>>>
>>> And why the regular admin queue that is part of the device queues 
>>> can't fit to your needs ?
>>
>>
>> For "regular admin queue", did you mean your proposal. Actually, it's 
>> not conflict, we can unify the interface though the motivation is 
>> different.
>>
>>
>>>
>>> Can you explain your needs ? is it to create a vDPA device from some 
>>> SW interface ?
>>
>>
>> As stated in the patch, the needs are:
>>
>> - Presenting virtual devices with limited transport specific resources
>> - Presenting virtual devices without platform support (e.g SR-IOV or 
>> SIOV)
>>
>> We want virtio to have hyper-scalability via slicing at virtio level. 
>> It's not directly related to vDPA.
>>
>> For vDPA, vendor are freed to have their own technology to be hyper 
>> scalable (e.g SF, ADI or other stuffs).
>>
>> Thanks
>>
>>
>>>
>>> I don't follow. 
>>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04 13:36 ` Michael S. Tsirkin
@ 2021-08-05  2:07   ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2021-08-05  2:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, stefanha, mgurtovoy, eperezma, lulu


在 2021/8/4 下午9:36, Michael S. Tsirkin 写道:
> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>> This patch introduces a new transport - the admin virtqueue. This
>> transport is useful for implementing virtual devices with a limited
>> transport specific resources or presenting the virtual device in a
>> transport independent way.
>>
>> This means, all the basic device facilities are provided solely via
>> the the admin virtqueue. Additionally, the admin virtqueue is also in
>> charge of the creating and destroying of the virtual device.
>>
>> To be self-contained and not depend on the platform specific
>> feature. Device MMU is also introduced for providing the DMA isolation
>> among virtual devices.
>>
>> With the help of the admin virtqueue, the presenting of the virtual
>> device is done via the co-operation between the management device and
>> its driver.
>>
>> This is just a draft for demonstrating the basic ideas. Some possible
>> enhancements:
>>
>> - admin event virtqueue for reporting events like interrupts (on the
>>    platform withouth MSI) and MMU translation failure
>> - hardware friendly MMU translation table (e.g in the memory instead
>>    of using control virtqueue commands)
>> - command to kick the virtqueue
>>
>> Comments are more than welcomed.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   content.tex | 639 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 639 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 620c0e2..1f66d42 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>   
>> +\item[DEVICE_MMU_FAIL (32)] Indicates that the device MMU has
>> +  experienced an error from which it can't recover.
>> +
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -515,6 +518,642 @@ \chapter{Virtio Transport Options}\label{sec:Virtio Transport Options}
>>   Virtio can use various different buses, thus the standard is split
>>   into virtio general and bus-specific sections.
>>   
>> +\section{Virtio Over Admin Virtqueue}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue}
>> +
>> +Sometimes it's hard to implement the device in a transport specific
>> +method. One example is that a physical device may try to present
>> +multiple virtual devices with a limited transport specific
>> +resources. Another example is to implement virtual devices which is
>> +transport independent. In those cases, the admin virtqueue provided by
>> +the management device could be used to replace the transport specific
>> +method to implement the virtual device.
> terminology here needs clarification. Especially talking about
> virtual devices is confusing. I propose management device and
> managed device. Other options exist.


Sure, will do.


>   Also pls give examples
> such as PF/VF.


Ok.


>
>
>> Then the presenting of the
>> +virtual device is done through the cooperation between the admin
>> +virtqueue and the driver.
> A natural question to ask is why is this a VQ and not a device?
> Is this because people want to implement a VQ as part
> of an arbitrary device?


Yes, and actually, I don't want to exclude the possibility of having a 
dedicated management device.

But using an arbitrary device may have advantages (e.g for net it have 
the interface for remote management).

And based on the discussion with Stefan, I think we can add the 
following text:

"
The management device MUST offer the admin virtqueue in a device 
specific virtqueue
"


>
>
>
>> +\subsection{Basic Concepts}\label{sec:Virtio Transport Options / Virtio over Admin Virtqueue / Basic Concepts}
>> +
>> +The device that offers the admin virtqueue (via feature
>> +VIRTIO_F_ADMIN_VQ) is the management device of the virtual
>> +devices.
> Don't we need a way to specify how many such VQs are there and what
> their numbers are?


Yes.


>   Doing this in a device specific way seems a bit
> annoying ...
>

I'm not sure I get the comment here, the above text is the definition of 
the management device.


>
>> All commands are of the following form:
>> +
>> +\begin{lstlisting}
>> +struct virtio_admin_ctrl {
>> +        u64 device_id;
>> +        u16 class;
>> +        u16 command;
>> +        u8 command-out-data[];
>> +        u8 ack;
>> +        u8 command-in-data[]
>> +};
>> +
>> +/* ack values */
>> +#define VIRTIO_ADMIN_OK     0
>> +#define VIRTIO_ADMIN_ERR    1
>> +\end{lstlisting}
>> +
>> +The device_id, class, command and command-out-data are set by
>> +the driver, and the device sets the ack and command-in-data. 0 is used
>> +for identify the management device itself.
>> +
>> +\devicenormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
>> +
>> +The virtual device MUST not offer VIRTIO_F_ADMIN_VQ feature.
>> +
>> +\drivernormative{\subsubsection}{Basic Concepts}{Virtio Transport Options / Virtio Over Admin Virtqueue / Basic Concepts}
>> +
>> +The driver SHOULD negotiate VIRTIO_F_ADMIN_VQ if the device offers it.
>> +
>> +\subsection{Virtual Device Discovery}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtual Device Discovery}
>> +
>> +The management device is discovered through a transport and device
>> +specific method. Virtual devices is created and discovered via the
>> +admin virtqueue.
>> +
>> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>> +
>> +The capabilites that are supported by the admin virtqueue could be
>> +fetched through the following commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_CAP    0
>> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
>> +supported by the admin virtqueue through a u64 which is a bit mask of
>> +the capabilies in command-in-data. There's no command-out-data.
>> +
>> +The capabilies that is currently supported are:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_F_CAP_VDEV    1
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_F_CAP_VDEV capability demonstrates that the virtual
>> +devices is created, configured and destroyed through admin
>> +virtqueue. That means the admin virtqueue is the transport for the
>> +virtual devices.
> How about using a feature bit for this? Or having this in the config space.
> This might call for a generic config space feature, but it's not
> the first time we want that, maybe it's time.


Feature bit works and seems better than the config space. I can change.


>
>
>> +\devicenormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>> +
>> +The management device MUST support VIRTIO_ADMIN_CTRL_CAP class when
>> +VIRTIO_F_ADMIN_VQ is offered.
>> +
>> +The management device MUST fail VIRTIO_ADMIN_CTRL_CAP class when the
>> +\field{device_id} is not zero.
>> +
>> +\drivernormative{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>> +
>> +The driver MUST use 0 as \field{device_id} for VIRTIO_ADMIN_CTRL_CAP
>> +class.
>> +
>> +\subsection{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capibility,
>> +virtual devices must be created and discovered through the admin
>> +virtqueue.
>> +
>> +\begin{lstlisting}
>> +struct virtio_admin_ctrl_vdev_attribute {
>> +       u32 device_id;
>> +       u8 config[];
>> +};
>> +
>> +#define VIRTIO_ADMIN_CTRL_VDEV    2
>> + #define VIRTIO_ADMIN_CTRL_VDEV_CREATE        0
>> + #define VIRTIO_ADMIN_CTRL_VDEV_DESTROY        1
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_VDEV_CREAT command is used to create a virtual
>> +device. The command-out-data for VIRTIO_ADMIN_CTRL_CREATE is the
>> +virtio device id (\field{device_id}) and device specific configuration
>> +(\field{config}) for creating the virtual device. When succeed, the
>> +device returns a u64 as a unique identifier of the created virtual
>> +device in command-in-data.
> So how are we going to specify config? Per device type?


Yes, the config format is device specific. I will add some text for this.


>
>
>> +The VIRTIO_ADMIN_CTRL_VDEV_DESTROY command is used to destroy a
>> +virtual device which is identified by its 64bit identifier
>> +\field{virtual_device_id}. There's no command-in-data for
>> +VIRTIO_ADMIN_CTRL_DESTROY command.
> So I am confused here. Rest of the spec seems to map driver
> actions to commands on the admin VQ. However where do the create and
> destroy commands coming from? If they have a separate source
> from driver commands, why do they share the same VQ?


It means the virtual/managed devices are provisioned dynamically. That 
is the hardware can structure the resources of virtual/managed device 
based on the requests from the user.

E.g when user is asking to create a networking device with 4 queue 
pairs. The hardware build up the resources dynamically during VDEV_CREAT 
and the resources are freed during VDEV_DESTROY.

I think we can add the support for statically pre-allocated 
virtual/managed device.


>
>
>> +\devicenormative{Device Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_CREATE if
>> +\field{device_id} is not 0.
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VDEV_DESTROY if
>> +\field{device_id} is 0.
>> +
>> +All virtual devices MUST be created via admin virtqueue if the admin
>> +virtqueue offers VIRTIO_F_CTRL_VDEV.
>> +
>> +The management device MAY map implement the virtual device in a
>> +transport specific way.
>
> I'm not sure what does this mean.


Yes, will remove.


>
>> +\drivernormative{Driver Management}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Management}
>> +
>> +The management driver MUST use 0 as \field{device_id} for
>> +VIRTIO_ADMIN_CTRL_VDEV_CREATE command.
>> +
>> +The management driver SHOULD make sure the virtual device is not used
>> +by any driver before trying to destroy it.
> Device drivers are within guests. Not sure how this can be
> accomplished.


I can remove this. But even if drivers are within guests, form the host 
it's still bound to a driver.


>
>> +
>> +\subsection{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
>> +the feature negotiation of virtual devices could be done by the
>> +following commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_FEAT    3
>> + #define VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET        0
>> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET        1
>> + #define VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET        2
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET is to get the features offered
>> +by a virtual device.
>> +
>> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_SET is for driver to accept feature
>> +bits offered by the virtual device.
>> +
>> +The VIRTIO_ADMIN_CTRL_FEAT_DRIVER_GET is to get the features accepted
>> +by both the virtual driver and the device.
> So there's a lot of text here to basically pass config read/writes
> over a VQ. How about specifying admin VQ in terms of e.g. virtio PCI
> transport? Thus basically supply read/write commands and that's it?


I just simply duplicate the functions of PCI transport. Maybe you can 
give an example of your idea? Is it something like read and write with 
an offset and define the commands as the offset?


>
>
>> +The features is 64 bits mask of the virtio features bit. For
>> +VIRTIO_ADMIN_CTRL_DRIVER_SET, the feature is passed to the device
>> +through command-out-data. For VIRTIO_ADMIN_CTRL_FEAT_DEVICE_GET and
>> +VIRTIO_ADMIN_CTRL_DRIVER_GET the feature is returned for the device
>> +through command-in-data.
>> +
>> +\devicenormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
>> +
>> +The management device MUST fail VIRTIO_ADMIN_F_CTRL_FEAT class for the
>> +command that use 0 as its \field{virtual_device_id}.
>> +
>> +\drivernormative{Features Negotiation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Features Negotiation}
>> +
>> +The management driver MAY mediate between the feature negotiation
>> +request of the virtual devices and the admin virtqueue. E.g when
>> +offering features to the virtual device, the management driver MAY
>> +exclude some features in order to limit the behaviour of the virtual
>> +device.
>> +
>> +\subsection{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
>> +
>> +When the admin virtqueue offers VIRTIO_ADNIN_F_CAP_VDEV capability,
>> +the status of virtual device could be accessed by the following
>> +commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_STATUS    4
>> + #define VIRTIO_ADMIN_CTRL_STATUS_SET        0
>> + #define VIRTIO_ADMIN_CTRL_STATUS_GET        1
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_STATUS_SET is used to set the device status of
>> +the virtual device here. The command-out-data is the one byte status
>> +to set to the device. There's no command-in-data for this command.
>> +
>> +The VIRTIO_ADMIN_CTRL_STATUS_GET is used to get the device status of
>> +the virtual device. The command-in-data is the one byte status
>> +returned from the device. There's no command-out-data for this
>> +command.
>> +
>> +\devicenormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
>> +
>> +The management device MUST start the reset of a virtual device when 0
>> +is written via VIRTIO_ADMIN_CTRL_STATUS_SET, the success of this
>> +command demonstrate the success of the reset.
>> +
>> +The management device MUST present 0 through
>> +VIRTIO_ADMIN_CTRL_STATUS_GET once the reset is done.
>> +
>> +The management device MUST fail the device status access if
>> +\field{device_id} is zero.
>> +
>> +\drivernormative{Device Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Status}
>> +
>> +After writing 0 via VIRTIO_ADMIN_CTRL_STATUS_SET, the driver MUST wait
>> +for the success of the command before re-initializing the device.
>> +
>> +\subsection{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Genreation}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV capability,
>> +the device generation could be read from the following commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_GENERATION    5
>> + #define VIRTIO_ADMIN_CTRL_GENERATION_GET        0
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_GENERATION_GET is used to get the device generation
>> +of the virtual device. The command-in-data is the u32 device
>> +generation returned from the device. There's no command-out-data for
>> +this command.
>> +
>> +\devicenormative{Device Generation}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Generation}
>> +
>> +The device MUST present a changed config_generation after the driver
>> +has read a device-specific configuration value which has changed since
>> +any part of the device-specific configuration was last read.
>> +
>> +The device MUST fail the device generation access if \field{device_id} is zero.
>> +
>> +\subsection{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the
>> +config space of a virtual device could be accessed from
>> +VIRTIO_ADMIN_CTRL_CONFIG_GET and VIRTIO_ADMIN_CTRL_CONFIG_SET.
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_CONFIG    6
>> +  #define VIRTIO_ADMIN_CTRL_CONFIG_GET        0
>> +  #define VIRTIO_ADMIN_CTRL_CONFIG_SET        1
>> +
>> +struct virtio_admin_ctrl_vdev_config_get {
>> +       u32 offset;
>> +       u32 size;
>> +};
>> +
>> +struct virtio_admin_ctrl_vdev_config_set {
>> +       u32 offset;
>> +       u32 size;
>> +       u8  data[];
>> +};
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_CONFIG_GET is used to read data from the
>> +device configuration space. As described in struct
>> +virtio_admin_ctrl_vdev_config_get, The command-out-data is the offset
>> +since the start of the config space and the size of the data. The
>> +command-in-data is the array of u8 data that read from the config
>> +space.
>> +
>> +The VIRTIO_ADMIN_CTRL_CONFIG_SET is used to write data to the device
>> +configuration space. As described in struct
>> +virtio_admin_ctrl_vdev_config_set, the command-out-data contains the
>> +offset since the start of the config space, the size of the data and
>> +the data that will be wrote. There's no command-in-data for this
>> +command.
>> +
>> +\devicenormative{Device Specific Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device Specific Configuration}
>> +
>> +The management device MUST fail the device configuration space access
>> +if the driver want to access the range which is out of the config
>> +space.
>> +
>> +The management device MUST fail the device configuration space access
>> +if \field{device_id} is zero.
>> +
>> +\subsection{MSI Configuration}label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the MSI entry
>> +for a specific virtqueue could be set through following command:
>>
> I think this is a bit problematic. E.g. for PCI MSI is programmed through
> standard registers.


That's not conflict AFAIK.

E.g if the management device is PCI, the standard registers will still work.

But the virtual/managed device is not a PCI device, we need another way 
to associate MSI to them

Another important reason is that, we want to scale better than the PCI 
standard registers which only support about 2048 MSI vectors.


> Specifying address is data is insufficient,


We have both addr and data and associate it with a specific virtqueue. I 
think it can work or anything I missed?


>   neither
> is masking and enabling through device specific registers.
> Referring to a vector seems more correct.


See above, we want to scale more than 2048, and the masking is per vq 
not per vector.

Or technically we can use vector if you wish, e.g have MSI-X table per 
virtual device.


> Further, need to think about how will all this be generalized
> outside of PCI.


AFAIK, MSI is already used beyond the scope of PCI, the examples are 
some platform device and even the APIC interrupt. E.g in x86 a dedicated 
range of address space is used for MSI interrupt.

Thanks


>
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_MSI    7
>> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_SET        0
>> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE     1
>> + #define VIRTIO_ADMIN_CTRL_MSI_VQ_MASK       2
>> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET    3
>> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE 4
>> + #define VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK   5
>> +
>> +struct virtio_admin_ctrl_vdev_msi_vq_set {
>> +       u16 queue_index;
>> +       u64 addr;
>> +       u32 data;
>> +};
>> +
>> +struct virtio_admin_ctrl_vdev_msi_vq_enable {
>> +       u16 queue_index;
>> +       u8 enable;
>> +};
>> +
>> +struct virtio_admin_ctrl_vdev_msi_vq_mask {
>> +       u16 queue_index;
>> +       u8 mask;
>> +};
>> +
>> +struct virtio_admin_ctrl_vdev_msi_config {
>> +       u64 addr;
>> +       u32 data;
>> +};
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_MSI_VQ_SET is used to set the MSI entry for a
>> +specific virtqueue. The command-out-data is the virtqueue index and
>> +the MSI address and data (as described in struct
>> +virtio_admin_ctrl_vdev_msix_vq_set).
>> +
>> +The VIRTIO_ADMIN_CTRL_MSI_VQ_ENABLE and is used to enable or disable
>> +MSI interrupt for a specific virtqueue. The command-out-data is the
>> +virtqueue index and whether to enable the MSI: 0 means to enable and 1
>> +means to disable (as described in struct
>> +virtio_admin_ctrl_vdev_msi_vq_enable).
>> +
>> +The VIRTIO_ADMIN_CTRL_MSI_VQ_MASK and is used to mask or unmask MSI
>> +interrupt for a specific virtqueue. The command-out-data is the
>> +virtqueue index and the mask status: 0 means unmak and 1 means mask
>> +(as described in struct virtio_admin_ctrl_vdev_msi_vq_mask).
>> +
>> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_SET is used to set the MSI entry
>> +for the config interrupt. The command-out-data is the MSI address and
>> +data (as described in struct virtio_admin_ctrl_vdev_msix_config).
>> +
>> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_ENABLE is used to enable and disable
>> +MSI for config space. The command-out-data is an u8: 0 means to
>> +disable and 1 means to enable.
>> +
>> +The VIRTIO_ADMIN_CTRL_MSI_CONFIG_MASK is used to mask and unmask MSI
>> +interrupt for config space. The command-out-data is an u8: 0 means to
>> +mask and 1 means to unmask.
>> +
>> +There's no command-in-data for all the above MSI commands.
>> +
>> +\devicenormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
>> +
>> +The virtual device MUST record the pending MSI interrupt and
>> +generate the MSI interrupt is it was pending after unmasking.
>> +
>> +The virtual MUST disable the MSI for both virtqueue and config space
>> +upon reset.
>> +
>> +\drivernormative{MSI Configuration}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / MSI Configuration}
>> +
>> +The driver MUST allocate transport or platform specific MSI entries
>> +for both virtqueue and config space if it wants to use interrupt.
>> +
>> +The driver MAY choose disable the MSI if polling is used.
>> +
>> +\subsection{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Address}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_CAP_VDEV, the address of
>> +a specific virtqueue could be done through the following command:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_VQ_ADDR    9
>> + #define VIRTIO_ADMIN_CTRL_VQ_ADDR_SET        0
>> +
>> +struct virtio_admin_ctrl_vdev_vq_addr {
>> +       u16 queue_index;
>> +       u64 device_area;
>> +       u64 descriptor_area;
>> +       u64 driver_area;
>> +};
>> +\end{lstlisting}
>> +
>> +The command-out-data is the queue index, the addresses of device area,
>> +descriptor area and driver area (as described in struct
>> +virtio_admin_ctrl_vdev_vq_addr); There's no command-in-data.
>> +
>> +\devicenormative{Virtqueue Address}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueeu Address}
>> +
>> +The management device MUST fail the commands of class
>> +VIRTIO_ADMIN_CTRL_VQ_ADDR if \field{device_id} is zero.
>> +
>> +\subsection{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
>> +
>> +When the admin virtqueue offers VIRTIO_F_ADMIN_F_CAP_VDEV, virtqueue
>> +status could be set and get through the following command:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_VQ_ENABLE    10
>> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET       0
>> + #define VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET       1
>> +
>> +struct virtio_admin_ctrl_vq_status_set {
>> +  u16 queue_index;
>> +  u8 status;
>> +};
>> +
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET is used to set the status to a
>> +specific virtqueue. The command-out-data is the queue index, the
>> +status that is set to the virtqueue (0 disabled, 1 enabled); There's
>> +no command-in-data.
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET is used to get the status of a
>> +specific virtqueue. The command-out-data is the u16 of queue
>> +index. The command-in-data is the virtqueue status (0 disalbed, 1
>> +enabled).
>> +
>> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
>> +
>> +When disabled, the virtual device MUST stop processing requests from
>> +this virtqueue.
>> +
>> +The management device MUST present a 0 via
>> +VIRTIO_ADMIN_CTRL_VQ_ENABLE_GET on reset of the virtual device.
>> +
>> +The management device MUST fail the virtqueue status access if
>> +\field{device_id} is zero.
>> +
>> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Status}
>> +
>> +The driver MUST configure the other virtqueue fields before enabling
>> +the virtqueue with VIRTIO_ADMIN_CTRL_VQ_ENABLE_SET.
>> +
>> +\subsection{Virtqueue Size}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, virtqueue size
>> +could be accessed through the following command:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_VQ_SIZE    11
>> + #define VIRTIO_ADMIN_CTRL_VQ_SIZE_SET       0
>> + #define VIRTIO_ADMIN_CTRL_VQ_SIZE_GET       1
>> +
>> +struct virtio_admin_ctrl_vdev_vq_size_set {
>> +       u16 queue_index;
>> +       u16 size;
>> +};
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_SIZE_SET command is used to set the virtqueue
>> +size. The command-out-data is the queue index and the size of the
>> +virtqueue (as described in struct
>> +virtio_admin_ctrl_vdev_vq_size_set). There's no command-in-data.
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_SIZE_GET command is used to get the virtqueue
>> +size. On reset, the maximum queue size supported by the device is
>> +returned. The command-out-data is the u16 of the virtqueue index. The
>> +command-in-data is the u16 of queue size for the virtqueue.
>> +
>> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Size}
>> +
>> +The management device MUST fail the virtqueue size access if
>> +\field{device_id} is zero.
>> +
>> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, the virtqueue
>> +notification could be done through the following commands:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_CTRL_VQ_NOTIFY    12
>> + #define VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET          1
>> +
>> +struct virtio_admin_ctrl_vdev_vq_notification_area {
>> +       le64 addr
>> +       le64 size;
>> +};
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET is used to get the transport
>> +specific address area that can be used to notify a virtqueue. The
>> +command-out-data is a u16 of the virtqueue index. The command-in-data
>> +contains the address and the size of the notification area (as
>> +described in struct virtio_admin_ctrl_vdev_vq_notification_area).
>> +
>> +\devicenormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET if
>> +there's no transport specific notification address for a virtqueue of
>> +its virtual device.
>> +
>> +The management device MUST fail the virtqueue notification access if
>> +\field{device_id} is zero.
>> +
>> +The management device MUST forbid the notification area of a specific
>> +virtual device to be accessed from another virtual device.
>> +
>> +\drivernormative{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Virtqueue Notification}
>> +
>> +The driver MAY choose to notify the virtqueue by writing the queue
>> +index at address \field{addr} which is fetched from the
>> +VIRTIO_ADMIN_CTRL_VQ_NOTIFY_GET command.
>> +
>> +\subsection{Virtqueue Notification}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
>> +
>> +When the admin virtqueue offers VIRTIO_ADMIN_F_VDEV, management device
>> +offers a device MMU for a secure DMA context for each virtual
>> +device. The device MMU will translate I/O Virtual Address to transport
>> +specific DMA address before using a transport specific way for DMA:
>> +
>> +\begin{lstlisting}
>> +#define VIRTIO_ADMIN_VQ_CTRL_MMU    13
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE       1
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET     2
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_MAP          3
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP        4
>> + #define VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET      5
>> +
>> +struct virtio_admin_ctrl_vdev_mmu_asid_set {
>> +  le16 queue_index;
>> +  le64 asid;
>> +};
>> +
>> +struct virtio_admin_ctrl_vdev_mmu_map {
>> +  le64 iova_start;
>> +  le64 iova_end;
>> +  le64 dma_start;
>> +  le32 flags;
>> +};
>> +
>> +/* Read access is allowed */
>> +#define VIRTIO_ADMIN_VQ_MAP_F_READ   (1 << 0)
>> +/* Write access is allowed */
>> +#define VIRTIO_IOMMU_VQ_MAP_F_WRITE  (1 << 1)
>> +
>> +struct virtio_admin_ctrl_vdev_mmu_err {
>> +  le32 reason;
>> +  le16 queue_index;
>> +  le64 asid;
>> +  le64 iova_start;
>> +  le64 iova_end;
>> +  le32 flags;
>> +};
>> +
>> +/* Mapping does not exist */
>> +#define VIRTIO_ADMIN_VQ_MAP_ERR_NON_EXIST (1 << 0)
>> +/* Access violates the permission */
>> +#define VIRTIO_ADMIN_VQ_MAP_ERR_ACC_VIOLATION (1 << 1)
>> +
>> +\end{lstlisting}
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ENABLE is used to enable or disable
>> +device MMU for a specific virtual device. The command-out-data is a u8
>> +for telling whether device MMU is enabled for the virtual device: 0
>> +means to enable and 1 means to disable.
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
>> +address space id to a virtqueue. The command-out-data is the queue
>> +index (\field{queue_index}) and the address space ID (\field{asid})
>> +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
>> +Address range [\field{iova_start}, \field{iova_end}] to transport
>> +specific DMA address range [\field{dma_start}, \field{dma_start} +
>> + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
>> +specify the device access permission.
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
>> +Virtual Address ranges that are intersected with the range
>> +[\field{iova_start}, \field{iova_end}].
>> +
>> +There's no command-in-data for all the above four commands.
>> +
>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
>> +information of the device MMU. There's no command-out-data, the
>> +command-in-date is the queue index and its asid, the iova range and
>> +the access of the operation (as described in struct
>> +virtio_admin_ctrl_vdev_mmu_err).
>> +
>> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
>> +
>> +The management device MUST fail the device MMU command if \field{device_id} is
>> +zero.
>> +
>> +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
>> +command if the iova range is intersected with a existing range.
>> +
>> +The management device MUST set both DEVICE_NEEDS_RESET and
>> +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
>> +virtual device.
>> +
>> +The device MMU for the virtual device MUST be disabled upon its reset.
>> +
>> +Upon reset, the virtual device must reset the Address Space ID for
>> +each virtqueue to 0.
>> +
>> +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
>> +  / Virtio Over Admin Virtqueue / Device MMU}
>> +
>> +The driver MAY choose to disable the device MMU but it MUST make sure
>> +the transport specific method could be used to provide a secure DMA
>> +context for each virtual device.
>> +
>> +The driver MAY query the error of device MMU after DEVICE_MMU_FAIL is set.
>> +
>> +\subsection{Presenting Virtual Device}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Presenting Virtual Device}
>> +
>> +If VIRTIO_ADMIN_F_VDEV is offered by the device. The presenting of
>> +the virtual device requires co-operation between the management
>> +driver and the admin virtqueue. This means, from the view of the
>> +virtual device driver, the transport is done via the communication
>> +with the management device driver. It's up to the software to decide
>> +what kind of method that is needed be used for those communications.
>> +
>> +The management driver typically do the following steps for creating a
>> +virtual device:
>> +
>> +\begin{enumerate}
>> +\item Determine the virtio id and device specific configuration.
>> +\item Create the virtual devices using VIRTIO_ADMIN_CTRL_VDEV_CREATE
>> +command.
>> +\item Optionally, configure the MSI.
>> +\item Optionally, enable and initialize the device MMU.
>> +\item Setup the necessary communication methods with virtual device driver.
>> +\item Perform device specific setups.
>> +\item Let the virtual device to be probed by the virtual device
>> +driver. The management driver will then use the admin virtqueue to
>> +implement the requests of basic facility from the virtual device
>> +driver.
>> +\end{enumerate}
>> +
>>   \section{Virtio Over PCI Bus}\label{sec:Virtio Transport Options / Virtio Over PCI Bus}
>>   
>>   Virtio devices are commonly implemented as PCI devices.
>> -- 
>> 2.24.3 (Apple Git-128)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04 12:50     ` Stefan Hajnoczi
@ 2021-08-05  6:32       ` Jason Wang
  2021-08-05 13:59         ` Stefan Hajnoczi
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-05  6:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu, Parav Pandit


在 2021/8/4 下午8:50, Stefan Hajnoczi 写道:
> On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
>> 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
>>> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>>>> +for telling whether device MMU is enabled for the virtual device: 0
>>>> +means to enable and 1 means to disable.
>>>> +
>>>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ASID_SET is used to assign a device
>>>> +address space id to a virtqueue. The command-out-data is the queue
>>>> +index (\field{queue_index}) and the address space ID (\field{asid})
>>>> +assigned to it (as described in struct virtio_admin_ctrl_vdev_mmu_asid_set).
>>>> +
>>>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_MAP is used to map the I/O Virtual
>>>> +Address range [\field{iova_start}, \field{iova_end}] to transport
>>>> +specific DMA address range [\field{dma_start}, \field{dma_start} +
>>>> + \field{iova_end} - \field{iova_start} + 1]. \field{flags} is used to
>>>> +specify the device access permission.
>>>> +
>>>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_UNMAP is used to unmap all the mapped I/O
>>>> +Virtual Address ranges that are intersected with the range
>>>> +[\field{iova_start}, \field{iova_end}].
>>>> +
>>>> +There's no command-in-data for all the above four commands.
>>>> +
>>>> +The VIRTIO_ADMIN_VQ_CTRL_MMU_ERR_GET is used to get the error
>>>> +information of the device MMU. There's no command-out-data, the
>>>> +command-in-date is the queue index and its asid, the iova range and
>>>> +the access of the operation (as described in struct
>>>> +virtio_admin_ctrl_vdev_mmu_err).
>>>> +
>>>> +\devicenormative{Virtqueue Status}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Device MMU}
>>> Does the virtual device need to support running with the device MMU
>>> disabled?
>>
>> Yes, device MMU could be disabled.
> I'm a little surprised that there is no way to refuse to run without the
> device MMU because I got the impression using MMUs is required for
> security/isolation on modern systems.


Yes but only if the platform can't provide sub device isolation (e.g PASID).


>
> If I have a physical non-SR-IOV PCI management device, how does the
> virtual device access memory when the device MMU is disabled? Would its
> address space be identical to the management device's IOVA space?


Yes, it's the transport specific DMA address space.


> Maybe you can write a paragraph or two explaining how the address spaces
> of management and virtual devices are related?


Ok.


>
>>
>>>> +
>>>> +The management device MUST fail the device MMU command if \field{device_id} is
>>>> +zero.
>>>> +
>>>> +The management device MUST fail the VIRTIO_ADMIN_VQ_CTRL_MMU_MAP
>>>> +command if the iova range is intersected with a existing range.
>>>> +
>>>> +The management device MUST set both DEVICE_NEEDS_RESET and
>>>> +DEVICE_MMU_FAIL when the device MMU fails to do the translation for a
>>>> +virtual device.
>>>> +
>>>> +The device MMU for the virtual device MUST be disabled upon its reset.
>>>> +
>>>> +Upon reset, the virtual device must reset the Address Space ID for
>>>> +each virtqueue to 0.
>>>> +
>>>> +\drivernormative{Virtqueue Status}\label{sec:Virtio Transport Options
>>>> +  / Virtio Over Admin Virtqueue / Device MMU}
>>>> +
>>>> +The driver MAY choose to disable the device MMU but it MUST make sure
>>>> +the transport specific method could be used to provide a secure DMA
>>>> +context for each virtual device.
>>> What does this mean?
>>
>> It means when device MMU is disabled. The management driver must make sure a
>> platform specific feature like PASID capable IOMMU is used to isolate DMA
>> between the virtual devices.
> Ah, this is related to what I asked above. It's still not 100% clear to
> me.


It means, e.g the transport or platform support PASID. The management 
driver can choose to disable device MMU but use PASID for DMA isolation.


>
>>
>>> During which stages of the virtual device's lifecycle (ACKNOWLEDGE,
>>> DRIVER, FEATURES_OK, DRIVER_OK) may the management driver enable/disable
>>> the device MMU?
>>
>> My understanding is that it doesn't matter which stage. E.g it should be
>> safe to be enabled before DRIVER_OK since it doesn't have any virtqueue
>> operations.
>>
>> The driver just need to make sure:
>>
>> - device MMU is enabled before DRIVER_OK
>> - when device MMU is disabled, the transport/platform DMA hardware has been
>> setup
> Thanks, please include that information.


Ok.


>
>>> I tried to imagine what the virtio-blk vdev creation parameters need to
>>> look like. Here is what I came up with:
>>>
>>>     Virtual Device Creation Parameters for Block Devices
>>>     ----------------------------------------------------
>>>     The following creation parameters specify the details of a new virtual
>>>     block device:
>>>
>>>     Field        Type   Meaning
>>>     ----------------------------------------------------------------------
>>>     blkdev_id    u64    Identifier of the underlying block device that
>>>                         provides storage. The enumeration and creation of
>>>                         underlying block devices is
>>>                         implementation-specific.
>>>     num_queues   u16    Number of request virtqueues.
>>>     features_len u8     Number of elements in features[].
>>
>> For 'elements' do you mean the 'u32 elements'?
> Yes, u32 array elements.
>
>>>     features[]   u32    Device feature bits to report.
>>>
>>>     Creation error codes are as follows:
>>>
>>>     Error               Meaning
>>>     ----------------------------------------------------------------------
>>>     INVALID_BLKDEV_ID   The underlying block device does not exist.
>>>     BLKDEV_BUSY         The underlying block device is already in use.
>>>     BLKDEV_READ_ONLY    The underlying block device is read-only.
>>>     INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
>>>     UNSUPPORTED_FEATURE A feature bit was given that the device does not
>>>                         support.
>>>
>>>     If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
>>>     block device is made available for read-only access.
>>>
>>>     Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
>>>     already in use is given.
>>>
>>>     Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
>>>     does not support writes and the VIRTIO_BLK_F_RO bit is not set in
>>>     features[].
>>>
>>>     The configuration space parameters (see 5.2.4 Device configuration
>>>     layout) are determined by the device based on the underlying block
>>>     device capacity, block size, etc.
>>>
>>> Note that this doesn't allow overriding configuration space parameters
>>> (e.g. block size). We probably need to support that in the future for
>>> live migration compatibility.
>>
>> I wonder do we need those configuration to be self-descriptive? E.g how did
>> the device know that the config contains the blk_size. (I guess it's not a
>> good practice to infer this from the config len).
> The device configuration space size and layout is determined by the
> device feature bits.


So blk_size doesn't belong to any feature. I guess it means we should 
start the support of blk_size from day 0.

I had some discussion with Parav about this in the series that 
introduces the netlink extension for setting up the device.

I guess this is what we want:

struct virtio_config {
attribute_X; //only exist when feature X existing
attribute_Y; //only exist when feature Y existing
...
};

Thanks


>
> Stefan


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-04 12:56         ` Stefan Hajnoczi
@ 2021-08-05  6:33           ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2021-08-05  6:33 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu


在 2021/8/4 下午8:56, Stefan Hajnoczi 写道:
> On Wed, Aug 04, 2021 at 04:39:24PM +0800, Jason Wang wrote:
>> 在 2021/8/4 下午2:39, Stefan Hajnoczi 写道:
>>> On Wed, Aug 04, 2021 at 11:01:39AM +0800, Jason Wang wrote:
>>>> 在 2021/8/3 下午10:51, Stefan Hajnoczi 写道:
>>>>> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>>>>>> +\subsection{Admin Virtqueue Capabilities}\label{sec:Virtio Transport Options / Virtio Over Admin Virtqueue / Admin Virtqueue Capabilities}
>>>>>> +
>>>>>> +The capabilites that are supported by the admin virtqueue could be
>>>>>> +fetched through the following commands:
>>>>>> +
>>>>>> +\begin{lstlisting}
>>>>>> +#define VIRTIO_ADMIN_CTRL_CAP    0
>>>>>> + #define VIRTIO_ADMIN_CTRL_CAP_GET        0
>>>>>> +\end{lstlisting}
>>>>>> +
>>>>>> +The VIRTIO_ADMIN_CTRL_CAP_GET is used to get the capabilites that are
>>>>>> +supported by the admin virtqueue through a u64 which is a bit mask of
>>>>>> +the capabilies in command-in-data. There's no command-out-data.
>>>>> I'm not sure what this paragraph is describing. Is VIRTIO_ADMIN_CTRL_CAP
>>>>> a struct virtio_admin_ctrl::command value?
>>>> VIRTIO_ADMIN_CTRL_CAP is the class, VIRTIO_ADMIN_CTRL_CAP_GET is the
>>>> command.
>>> Okay. I found the admin virtqueue command descriptions hard to read, not
>>> just because the class/command definitions weren't obvious to me, but
>>> also because the command/response layout is described in English instead
>>> of a table or C-like notation.
>>>
>>> I think something like this would make the commands easier to
>>> understand:
>>>
>>>     Capabilities supported by the admin virtqueue are fetched as follows:
>>>
>>>     Driver->Device:
>>>     Field       Value                          Type
>>>     -----------------------------------------------
>>>     device_id   0                              u64
>>>     class       VIRTIO_ADMIN_CTRL_CAP (0)      u16
>>>     command     VIRTIO_ADMIN_CTRL_CAP_GET (0)  u16
>>>
>>>     Device->Driver:
>>>     Field         Value                        Type
>>>     -----------------------------------------------
>>>     ack           VIRTIO_ADMIN_OK (0)          u8
>>>     capabilities  <supported capability bits>  u64
>>
>> I just follows styles that is used in virtio-net control vq:
>>
>>
>> \begin{lstlisting}
>> #define VIRTIO_NET_CTRL_MQ    4
>>   #define VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET        0 (for automatic receive
>> steering)
>>   #define VIRTIO_NET_CTRL_MQ_RSS_CONFIG          1 (for configurable receive
>> steering)
>>   #define VIRTIO_NET_CTRL_MQ_HASH_CONFIG         2 (for configurable hash
>> calculation)
>> \end{lstlisting}
>>
>> ...
>>
>> Do you suggest to change that as well?
> My preference is to change it because I think describing inputs/outputs
> in English makes it hard to understand the exact binary layout.


Ok.


>
>>
>>>>> Another question: I wonder why there is an admin virtqueue feature bit
>>>>> instead of a new VIRTIO device ID for the management device? Does this
>>>>> mean regular virtio-net, virtio-blk, etc devices can have an admin queue
>>>>> in addition to their normal role?
>>>> I think so, I just follow the normal networking PF role which is usually a
>>>> network device which allows some kind of remote management.
>>>>
>>>> But it doesn't forbid us to create the management device.
>>>>
>>>> I think it's better to not mandate the management device for now, or is
>>>> there any reason for doing that?
>>> Since the admin virtqueue is generic infrastructure (not specific to
>>> vdev management) I think it makes sense to use a feature bit and not a
>>> separate management device. This wasn't obvious to me from this
>>> document, maybe the text can be tweaked to clearly separate the
>>> (generic) admin virtqueue from vdev management.
>>
>> Maybe adding a device normative like
>>
>> "
>>
>> The management device MUST offer the admin virtqueue is a device specific
>> virtqueue.
>>
>> "
> Yes, with
> s/is/as/


Right.

Thanks



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-05  1:36       ` Jason Wang
@ 2021-08-05 12:37         ` Max Gurtovoy
  2021-08-06  2:26           ` Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Max Gurtovoy @ 2021-08-05 12:37 UTC (permalink / raw)
  To: Jason Wang, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


On 8/5/2021 4:36 AM, Jason Wang wrote:
>
> 在 2021/8/4 下午6:20, Max Gurtovoy 写道:
>>
>> On 8/4/2021 4:37 AM, Jason Wang wrote:
>>>
>>> 在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>>>>
>>>>> +Sometimes it's hard to implement the device in a transport specific
>>>>> +method. One example is that a physical device may try to present
>>>>> +multiple virtual devices with a limited transport specific
>>>>> +resources. Another example is to implement virtual devices which is
>>>>> +transport independent. In those cases, the admin virtqueue 
>>>>> provided by
>>>>> +the management device could be used to replace the transport 
>>>>> specific
>>>>> +method to implement the virtual device. Then the presenting of the
>>>>> +virtual device is done through the cooperation between the admin
>>>>> +virtqueue and the driver.
>>>>
>>>> maybe it's me, but I can't understand how admin queue is a transport.
>>>
>>>
>>> The transport is the method that provides basic facility. In this 
>>> proposal, the admin virtqueue is used to provide basic facility for 
>>> the virtual device. That is to say, it's the transport for virtual 
>>> device.
>>>
>>>
>>>>
>>>> And how can I use admin queue transport to migrate VFs that are 
>>>> controlled by virtio PCI PF.
>>>
>>>
>>> This live migration support and the admin virtqueue transport are 
>>> orthogonal. The main motivation of this proposal is used for 
>>> implementing virtual device transport via admin virtqueue. It's not 
>>> hard to add new commands for doing live migration for the virtual 
>>> device, I don't do that since I believe it's expected to be 
>>> addressed in your proposal.
>>
>> so why do you call it in the same name that I used in my RFC ? This 
>> is confusing and causing problems.
>
>
> Max,
>
> I really think the game of "who comes first" is meaningless.
>
> I've used the terminology like "admin virtqueue" sometime early this 
> year:
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F202101%2Fmsg00034.html&amp;data=04%7C01%7Cmgurtovoy%40nvidia.com%7Cffc813bcfeb743b309cd08d957b17eed%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637637242113472461%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=GwU%2Fz5U4HIjsS2MH1%2BaI1u7ff8dIG%2F15zE5CwDJ91Cg%3D&amp;reserved=0 
>
>
> I start the work of the admin virtqueue as a transport since then and 
> I think I don't say "you call it the same name that I used before".

I didn't say it matters who call it first.

But it does matter for us to have different naming since it causes 
confusion.

In any case, the management device will introduce an admin queue 
feature. So my RFC is defining this infrastructure. Your first commit 
should be on top of my RFC introducing new admin commands for creating 
virtualized devices (you can call it 
VIRTIO_ADMIN_PCI_VIRTUALIZATION_MANAGEMENT class in the transport 
command range). The second commit should define the new transport and 
configuration cycles to get to a point that the SF virtio device is 
ready to be used by virtio driver.


>
>
>>
>> You are working on a parallel feature and reviewing my RFC as if it 
>> was instead of your proposal.
>
>
> Firstly, though they may have the same interface/commands, the two 
> proposals serves for completely different goals.
>
> Secondly, it's about how to justify your proposal in the community, 
> and I think I don't get the convincing answers for the following two 
> points:
>
> 1) I have said that your proposal of using admin virtqueue for doing 
> live migration makes sense, but we need the per function interface for 
> nested virt

You need to add the needed commands for vDPA to stop/start queues and 
you'll have the nested migration. BTW, this stop/start is not part of 
virtio migration. It's a capabiliy that vDPA will use to implement vDPA 
live migration.

So please don't mix virtio migration and vDPA migration.

Also in case we'll want nexted migration, the L1 VF (that is seem as PF 
to the VM) can expose admin_q and migration caps to manage migration 
process for L2 VFs.

I already mentioned this.

>
> 2) I've pointed out that using the general vitqueue for carrying 
> vendor specific command breaks the efforts of spec as a standard device

It's not breaking anything. Most of the specifications (NVMe, SCSI, FC, 
more..) allow Vendors to innovate, and so does VIRTIO today. I don't 
understand the objection and this is a contradiction to the vendor 
specific cfg area you added for some reason in the past. I'm repeating 
myself again and again.

In case Virtio doesn't want to be innovative and encourage vendors to 
innovate their products, lets keep the 192-255 classes reserved for now 
and discuss it in the future again.

>
> Lastly, this proposal is RFC, it's not perfect for sure. The most 
> important thing for the current stage is not about how and when this 
> can be merged but whether or not this approach can work. I post them 
> now since a talk about the hyper scalability will be given at the KVM 
> Forum then I need to post this as one of the approach before as a 
> reference. There are vendors that are asking something like this as a 
> reference for having better scalability than SR-IOV.
>
>
>>
>> IIUC, in your proposal a non SRIOV device parent will create admin 
>> queue and using this admin queue you'll be able to create children 
>> devices and their transport will be admin queue.
>>
>> That means that the configuration cycles will be trapped by the 
>> parent device somehow.
>
>
> That's the way for having better scalability. And this is also the 
> approach that SIOV used.
>
>
>>
>> This also means we need to merge my RFC first to create 
>> infrastructure for this RFC.
>
>
> It doesn't matter which will be merged. We should guarantee:
>
> 1) The idea is justified by the community
> 2) The merged proposal is extensible for accepting new features and 
> commands
>
> For 2) I think both proposals can do that. And to me, it's not hard to 
> switch to the similar interface as you invent.
>
so please be supportive and please understand that this RFC will allow 
easier addition to your proposal for virtio SFs.

>
>>
>> For admin management that you'll need probably virtio-cli tool from 
>> user space.
>
>
> What does virtio-cli do? We've already had vdpa that is integrated 
> into iproute2.
>
> And let me clarify the concept again: vDPA is a superset of virtio. 
> That means virtio could be treated as one kind of vDPA.
>
> In this sense, I don't see a value of re-inventing the wheels again in 
> the virtio-cli.

virtio-cli has nothing to do with vDPA.

It's a tool to configure virtio device from cmdline.


>
>
>>
>> So this proposal is complementary to mine. Your management device 
>> will negotiate "my" admin_queue feature and you'll need to add more 
>> commands to this admin queue that are probably in transport specific 
>> domain to create children.
>
>
> I think we can go either:
>
> 1) Make two virtqueues separately
>
> or
>
> 2) Using a single virtqueue
>
> And I would like to co-operate if 2) makes more sense.
>
>
>>
>> You do need to handle the configuration cycles that this management 
>> parent device will need to support.
>
>
> My proposals have already supported basic management like device 
> creation and destroy. And I think it's not hard to extend it to other.

see my suggestion above. Need to build this on top of my RFC before 
defining the new transport.


>
> To reduce the complexity, I've stripped out a lot of features from the 
> first RFC. It's better to start from the minimal set.
>
> Thanks
>
>
>>
>>
>>>
>>> For virtual device, it's a independent virtio device that could be 
>>> assigned to secure DMA context/domain,  it is functional equivalent 
>>> ADI or SF. The difference is that it can work with or without 
>>> platform support (SIOV or PASID).
>>>
>>>
>>>>
>>>> And why the regular admin queue that is part of the device queues 
>>>> can't fit to your needs ?
>>>
>>>
>>> For "regular admin queue", did you mean your proposal. Actually, 
>>> it's not conflict, we can unify the interface though the motivation 
>>> is different.
>>>
>>>
>>>>
>>>> Can you explain your needs ? is it to create a vDPA device from 
>>>> some SW interface ?
>>>
>>>
>>> As stated in the patch, the needs are:
>>>
>>> - Presenting virtual devices with limited transport specific resources
>>> - Presenting virtual devices without platform support (e.g SR-IOV or 
>>> SIOV)
>>>
>>> We want virtio to have hyper-scalability via slicing at virtio 
>>> level. It's not directly related to vDPA.
>>>
>>> For vDPA, vendor are freed to have their own technology to be hyper 
>>> scalable (e.g SF, ADI or other stuffs).
>>>
>>> Thanks
>>>
>>>
>>>>
>>>> I don't follow. 
>>>
>>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-05  6:32       ` Jason Wang
@ 2021-08-05 13:59         ` Stefan Hajnoczi
  2021-08-05 19:19           ` Michael S. Tsirkin
  0 siblings, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-05 13:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtio-comment, mst, cohuck, mgurtovoy, eperezma, lulu, Parav Pandit

[-- Attachment #1: Type: text/plain, Size: 5337 bytes --]

On Thu, Aug 05, 2021 at 02:32:31PM +0800, Jason Wang wrote:
> 
> 在 2021/8/4 下午8:50, Stefan Hajnoczi 写道:
> > On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
> > > 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
> > > > On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> > > > I tried to imagine what the virtio-blk vdev creation parameters need to
> > > > look like. Here is what I came up with:
> > > > 
> > > >     Virtual Device Creation Parameters for Block Devices
> > > >     ----------------------------------------------------
> > > >     The following creation parameters specify the details of a new virtual
> > > >     block device:
> > > > 
> > > >     Field        Type   Meaning
> > > >     ----------------------------------------------------------------------
> > > >     blkdev_id    u64    Identifier of the underlying block device that
> > > >                         provides storage. The enumeration and creation of
> > > >                         underlying block devices is
> > > >                         implementation-specific.
> > > >     num_queues   u16    Number of request virtqueues.
> > > >     features_len u8     Number of elements in features[].
> > > 
> > > For 'elements' do you mean the 'u32 elements'?
> > Yes, u32 array elements.
> > 
> > > >     features[]   u32    Device feature bits to report.
> > > > 
> > > >     Creation error codes are as follows:
> > > > 
> > > >     Error               Meaning
> > > >     ----------------------------------------------------------------------
> > > >     INVALID_BLKDEV_ID   The underlying block device does not exist.
> > > >     BLKDEV_BUSY         The underlying block device is already in use.
> > > >     BLKDEV_READ_ONLY    The underlying block device is read-only.
> > > >     INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
> > > >     UNSUPPORTED_FEATURE A feature bit was given that the device does not
> > > >                         support.
> > > > 
> > > >     If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
> > > >     block device is made available for read-only access.
> > > > 
> > > >     Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
> > > >     already in use is given.
> > > > 
> > > >     Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
> > > >     does not support writes and the VIRTIO_BLK_F_RO bit is not set in
> > > >     features[].
> > > > 
> > > >     The configuration space parameters (see 5.2.4 Device configuration
> > > >     layout) are determined by the device based on the underlying block
> > > >     device capacity, block size, etc.
> > > > 
> > > > Note that this doesn't allow overriding configuration space parameters
> > > > (e.g. block size). We probably need to support that in the future for
> > > > live migration compatibility.
> > > 
> > > I wonder do we need those configuration to be self-descriptive? E.g how did
> > > the device know that the config contains the blk_size. (I guess it's not a
> > > good practice to infer this from the config len).
> > The device configuration space size and layout is determined by the
> > device feature bits.
> 
> 
> So blk_size doesn't belong to any feature. I guess it means we should start
> the support of blk_size from day 0.

The device creation parameters can either include a full configuration
space-sized blob:

  Field        Type                      Meaning
  ----------------------------------------------------------------------
  init_config  struct virtio_blk_config  Initial contents of the
                                         configuration space.

or they can include individual fields (basically
re-define them outside struct virtio_foo_config):

  Field        Type        Meaning
  ----------------------------------------------------------------------
  blk_size     u32         Block size.

Which approach to use depends on how much of the configuration space
should be settable at device creation time. If most of it will be
initialized by the device and isn't configurable, then embedding the
entire struct is not necessary.

Additionally, there must be a flags field in the device creation
parameters for indicating which configuration space fields or individual
fields described above to use. This allows you to accept the default
blk_size value instead of providing your own value:

  Field       Type         Meaning
  ----------------------------------------------------------------------
  init_flags  u64          Use the corresponding field value to
                           initialize the device configuration space if
			   the flag is set:

			     INIT_BLK_SIZE (1 << 0)

> I had some discussion with Parav about this in the series that introduces
> the netlink extension for setting up the device.
> 
> I guess this is what we want:
> 
> struct virtio_config {
> attribute_X; //only exist when feature X existing
> attribute_Y; //only exist when feature Y existing
> ...
> };

That's more or less how configuration space layout works today. We don't
have explicit comments in the header file but when feature X is enabled
the driver may access virtio_config::attribute_X.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-05 13:59         ` Stefan Hajnoczi
@ 2021-08-05 19:19           ` Michael S. Tsirkin
  2021-08-06  2:39             ` Jason Wang
  2021-08-19 12:54             ` Stefan Hajnoczi
  0 siblings, 2 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2021-08-05 19:19 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Jason Wang, virtio-comment, cohuck, mgurtovoy, eperezma, lulu,
	Parav Pandit

On Thu, Aug 05, 2021 at 02:59:04PM +0100, Stefan Hajnoczi wrote:
> On Thu, Aug 05, 2021 at 02:32:31PM +0800, Jason Wang wrote:
> > 
> > 在 2021/8/4 下午8:50, Stefan Hajnoczi 写道:
> > > On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
> > > > 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
> > > > > On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> > > > > I tried to imagine what the virtio-blk vdev creation parameters need to
> > > > > look like. Here is what I came up with:
> > > > > 
> > > > >     Virtual Device Creation Parameters for Block Devices
> > > > >     ----------------------------------------------------
> > > > >     The following creation parameters specify the details of a new virtual
> > > > >     block device:
> > > > > 
> > > > >     Field        Type   Meaning
> > > > >     ----------------------------------------------------------------------
> > > > >     blkdev_id    u64    Identifier of the underlying block device that
> > > > >                         provides storage. The enumeration and creation of
> > > > >                         underlying block devices is
> > > > >                         implementation-specific.
> > > > >     num_queues   u16    Number of request virtqueues.
> > > > >     features_len u8     Number of elements in features[].
> > > > 
> > > > For 'elements' do you mean the 'u32 elements'?
> > > Yes, u32 array elements.
> > > 
> > > > >     features[]   u32    Device feature bits to report.
> > > > > 
> > > > >     Creation error codes are as follows:
> > > > > 
> > > > >     Error               Meaning
> > > > >     ----------------------------------------------------------------------
> > > > >     INVALID_BLKDEV_ID   The underlying block device does not exist.
> > > > >     BLKDEV_BUSY         The underlying block device is already in use.
> > > > >     BLKDEV_READ_ONLY    The underlying block device is read-only.
> > > > >     INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
> > > > >     UNSUPPORTED_FEATURE A feature bit was given that the device does not
> > > > >                         support.
> > > > > 
> > > > >     If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
> > > > >     block device is made available for read-only access.
> > > > > 
> > > > >     Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
> > > > >     already in use is given.
> > > > > 
> > > > >     Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
> > > > >     does not support writes and the VIRTIO_BLK_F_RO bit is not set in
> > > > >     features[].
> > > > > 
> > > > >     The configuration space parameters (see 5.2.4 Device configuration
> > > > >     layout) are determined by the device based on the underlying block
> > > > >     device capacity, block size, etc.
> > > > > 
> > > > > Note that this doesn't allow overriding configuration space parameters
> > > > > (e.g. block size). We probably need to support that in the future for
> > > > > live migration compatibility.
> > > > 
> > > > I wonder do we need those configuration to be self-descriptive? E.g how did
> > > > the device know that the config contains the blk_size. (I guess it's not a
> > > > good practice to infer this from the config len).
> > > The device configuration space size and layout is determined by the
> > > device feature bits.
> > 
> > 
> > So blk_size doesn't belong to any feature. I guess it means we should start
> > the support of blk_size from day 0.
> 
> The device creation parameters can either include a full configuration
> space-sized blob:
> 
>   Field        Type                      Meaning
>   ----------------------------------------------------------------------
>   init_config  struct virtio_blk_config  Initial contents of the
>                                          configuration space.
> 
> or they can include individual fields (basically
> re-define them outside struct virtio_foo_config):
> 
>   Field        Type        Meaning
>   ----------------------------------------------------------------------
>   blk_size     u32         Block size.
> 
> Which approach to use depends on how much of the configuration space
> should be settable at device creation time. If most of it will be
> initialized by the device and isn't configurable, then embedding the
> entire struct is not necessary.
> Additionally, there must be a flags field in the device creation
> parameters for indicating which configuration space fields or individual
> fields described above to use. This allows you to accept the default
> blk_size value instead of providing your own value:
> 
>   Field       Type         Meaning
>   ----------------------------------------------------------------------
>   init_flags  u64          Use the corresponding field value to
>                            initialize the device configuration space if
> 			   the flag is set:
> 
> 			     INIT_BLK_SIZE (1 << 0)
> 
> > I had some discussion with Parav about this in the series that introduces
> > the netlink extension for setting up the device.
> > 
> > I guess this is what we want:
> > 
> > struct virtio_config {
> > attribute_X; //only exist when feature X existing
> > attribute_Y; //only exist when feature Y existing
> > ...
> > };
> 
> That's more or less how configuration space layout works today. We don't
> have explicit comments in the header file but when feature X is enabled
> the driver may access virtio_config::attribute_X.
> 
> Stefan


Two things I know about network devices
- some VF configuration isn't in config space at all
  since config space describes guest visible fields.
  E.g. a vlan tag to be attached to all packets.

- some VF configuration is normally settable by PF without
  destroying/recreating the VF.
  E.g. the default MAC address.

Does blk have such configuration?




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-05 12:37         ` Max Gurtovoy
@ 2021-08-06  2:26           ` Jason Wang
  2021-08-11 10:00             ` Max Gurtovoy
  0 siblings, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-06  2:26 UTC (permalink / raw)
  To: Max Gurtovoy, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


在 2021/8/5 下午8:37, Max Gurtovoy 写道:
>
> On 8/5/2021 4:36 AM, Jason Wang wrote:
>>
>> 在 2021/8/4 下午6:20, Max Gurtovoy 写道:
>>>
>>> On 8/4/2021 4:37 AM, Jason Wang wrote:
>>>>
>>>> 在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>>>>>
>>>>>> +Sometimes it's hard to implement the device in a transport specific
>>>>>> +method. One example is that a physical device may try to present
>>>>>> +multiple virtual devices with a limited transport specific
>>>>>> +resources. Another example is to implement virtual devices which is
>>>>>> +transport independent. In those cases, the admin virtqueue 
>>>>>> provided by
>>>>>> +the management device could be used to replace the transport 
>>>>>> specific
>>>>>> +method to implement the virtual device. Then the presenting of the
>>>>>> +virtual device is done through the cooperation between the admin
>>>>>> +virtqueue and the driver.
>>>>>
>>>>> maybe it's me, but I can't understand how admin queue is a transport.
>>>>
>>>>
>>>> The transport is the method that provides basic facility. In this 
>>>> proposal, the admin virtqueue is used to provide basic facility for 
>>>> the virtual device. That is to say, it's the transport for virtual 
>>>> device.
>>>>
>>>>
>>>>>
>>>>> And how can I use admin queue transport to migrate VFs that are 
>>>>> controlled by virtio PCI PF.
>>>>
>>>>
>>>> This live migration support and the admin virtqueue transport are 
>>>> orthogonal. The main motivation of this proposal is used for 
>>>> implementing virtual device transport via admin virtqueue. It's not 
>>>> hard to add new commands for doing live migration for the virtual 
>>>> device, I don't do that since I believe it's expected to be 
>>>> addressed in your proposal.
>>>
>>> so why do you call it in the same name that I used in my RFC ? This 
>>> is confusing and causing problems.
>>
>>
>> Max,
>>
>> I really think the game of "who comes first" is meaningless.
>>
>> I've used the terminology like "admin virtqueue" sometime early this 
>> year:
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F202101%2Fmsg00034.html&amp;data=04%7C01%7Cmgurtovoy%40nvidia.com%7Cffc813bcfeb743b309cd08d957b17eed%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637637242113472461%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=GwU%2Fz5U4HIjsS2MH1%2BaI1u7ff8dIG%2F15zE5CwDJ91Cg%3D&amp;reserved=0 
>>
>>
>> I start the work of the admin virtqueue as a transport since then and 
>> I think I don't say "you call it the same name that I used before".
>
> I didn't say it matters who call it first.
>
> But it does matter for us to have different naming since it causes 
> confusion.
>
> In any case, the management device will introduce an admin queue 
> feature. So my RFC is defining this infrastructure. Your first commit 
> should be on top of my RFC introducing new admin commands for creating 
> virtualized devices (you can call it 
> VIRTIO_ADMIN_PCI_VIRTUALIZATION_MANAGEMENT class in the transport 
> command range).


Both proposals are RFC, why need to mandate that now? It's always not 
late to switch to whatever has been agreed or justified. No?


> The second commit should define the new transport and configuration 
> cycles to get to a point that the SF virtio device is ready to be used 
> by virtio driver.
>
>
>>
>>
>>>
>>> You are working on a parallel feature and reviewing my RFC as if it 
>>> was instead of your proposal.
>>
>>
>> Firstly, though they may have the same interface/commands, the two 
>> proposals serves for completely different goals.
>>
>> Secondly, it's about how to justify your proposal in the community, 
>> and I think I don't get the convincing answers for the following two 
>> points:
>>
>> 1) I have said that your proposal of using admin virtqueue for doing 
>> live migration makes sense, but we need the per function interface 
>> for nested virt
>
> You need to add the needed commands for vDPA to stop/start queues and 
> you'll have the nested migration. BTW, this stop/start is not part of 
> virtio migration.


Why not? My proposal is for virtio spec.


> It's a capabiliy that vDPA will use to implement vDPA live migration.


For vDPA device, we don't need to bother the spec. All vDPA devices 
supported by the Linux has already supported stop/start and indices 
save/restore (I meant mlx5e and IFCVF).


>
> So please don't mix virtio migration and vDPA migration.


It's not me but you that mixes the concept.

Virtio migration and vDPA migration are functional equivalent. Adding 
the live migration support for virtio spec will help for the vendor that 
doesn't want to go for vendor specific control path.


>
> Also in case we'll want nexted migration, the L1 VF (that is seem as 
> PF to the VM) can expose admin_q and migration caps to manage 
> migration process for L2 VFs.
>
> I already mentioned this.


But you tend to ignore the issues I've pointed out:

- How to migrate L1 in this case?
- If you want to migrate L(N) you need admin virtqueue in L0 to L(N-1)?

And ignore the suggestion that has nothing conflict with your proposal:

1) introduce both start/stop and device states as basic facility, that 
is to define the semantic and format
2) introduce the admin virtqueue and the commands for implementing the 
above facility

Then we leave the chance for the per-function interface which fits 
naturally for nested virtualization and for the vendor that doesn't want 
to have admin virtqueue. And we don't need to expose the admin virtqueue 
in the nested layers.


>
>>
>> 2) I've pointed out that using the general vitqueue for carrying 
>> vendor specific command breaks the efforts of spec as a standard device
>
> It's not breaking anything. Most of the specifications (NVMe, SCSI, 
> FC, more..) allow Vendors to innovate, and so does VIRTIO today. I 
> don't understand the objection and this is a contradiction to the 
> vendor specific cfg area you added for some reason in the past. I'm 
> repeating myself again and again.


The repeating is because you ignore my concerns. Such design violates 
the goal of the virtio spec for being a standard device:

- customer doesn't want to be locked by a specific vendor
- the spec doesn't prevent you from doing innovation under the virtio level
- the vendor specific cfg has not been used by any vendor so far, and we 
had that doesn't mean it's a good practice (or maybe we can try to 
deprecate that)
- the vendor specific feature may greatly complicate the live migration 
and its compatibility, virtio is deigned to be capable of migration, 
that's the most important difference compared to NVM and other 
architecture. A simple blob of (vendor specific) states just won't work, 
and they are a lot of other things you need to consider, e.g the 
migration comparability with the existing software backend or machine 
types. If you check the qemu git history, you can see how hard for 
maintaining the migration compatibility for the past 10 years. I don't 
think it can work well if we want to allow vendor specific state to be 
migrated.

NVME mandates the BAR layout so it must explicitly reserve BAR for 
vendor specific usage. Virtio-pci is much more flexible and doesn't 
mandate BAR layout and the driver discover the virtio facility via 
virtio capabilities. That means, the spec doesn't prevent you from 
adding vendor specific stuffs by using dedicated BARs. I've told you 
that they are other vendor that reserved the BAR for vendor specific 
functions which is just ignored by you.


>
> In case Virtio doesn't want to be innovative 


That's not true.  The innovation should be done at virtio level if it's 
a general feature.

For the innovation that depends on the vendor, it's not the innovation 
of virtio itself but the vendor. Let's don't do that at the spec since 
it's the wrong layer.


> and encourage vendors to innovate their products, lets keep the 
> 192-255 classes reserved for now and discuss it in the future again.


That should work.


>
>>
>> Lastly, this proposal is RFC, it's not perfect for sure. The most 
>> important thing for the current stage is not about how and when this 
>> can be merged but whether or not this approach can work. I post them 
>> now since a talk about the hyper scalability will be given at the KVM 
>> Forum then I need to post this as one of the approach before as a 
>> reference. There are vendors that are asking something like this as a 
>> reference for having better scalability than SR-IOV.
>>
>>
>>>
>>> IIUC, in your proposal a non SRIOV device parent will create admin 
>>> queue and using this admin queue you'll be able to create children 
>>> devices and their transport will be admin queue.
>>>
>>> That means that the configuration cycles will be trapped by the 
>>> parent device somehow.
>>
>>
>> That's the way for having better scalability. And this is also the 
>> approach that SIOV used.
>>
>>
>>>
>>> This also means we need to merge my RFC first to create 
>>> infrastructure for this RFC.
>>
>>
>> It doesn't matter which will be merged. We should guarantee:
>>
>> 1) The idea is justified by the community
>> 2) The merged proposal is extensible for accepting new features and 
>> commands
>>
>> For 2) I think both proposals can do that. And to me, it's not hard 
>> to switch to the similar interface as you invent.
>>
> so please be supportive and please understand that this RFC will allow 
> easier addition to your proposal for virtio SFs.


I really want to co-operate, but please make sure to answer the concerns 
that were raised. Since we're discussing the spec, it won't be as fast 
as a patch for Linux. We must be careful so please be patient.


>
>>
>>>
>>> For admin management that you'll need probably virtio-cli tool from 
>>> user space.
>>
>>
>> What does virtio-cli do? We've already had vdpa that is integrated 
>> into iproute2.
>>
>> And let me clarify the concept again: vDPA is a superset of virtio. 
>> That means virtio could be treated as one kind of vDPA.
>>
>> In this sense, I don't see a value of re-inventing the wheels again 
>> in the virtio-cli.
>
> virtio-cli has nothing to do with vDPA.
>
> It's a tool to configure virtio device from cmdline.


Again, please look at how Linux implement vDPA. It has already supported 
virtio-pci via the vp_vdpa driver. From its point of view, virtio-pci is 
yet another vendor specific implementation of virtio.

It's just as simple as to implement the vdpa management device in the 
vp_vdpa driver to leverage all the existing management capability of the 
vdpa tool for virtio device.

Let's don't duplicate the efforts.

Thanks


>
>
>>
>>
>>>
>>> So this proposal is complementary to mine. Your management device 
>>> will negotiate "my" admin_queue feature and you'll need to add more 
>>> commands to this admin queue that are probably in transport specific 
>>> domain to create children.
>>
>>
>> I think we can go either:
>>
>> 1) Make two virtqueues separately
>>
>> or
>>
>> 2) Using a single virtqueue
>>
>> And I would like to co-operate if 2) makes more sense.
>>
>>
>>>
>>> You do need to handle the configuration cycles that this management 
>>> parent device will need to support.
>>
>>
>> My proposals have already supported basic management like device 
>> creation and destroy. And I think it's not hard to extend it to other.
>
> see my suggestion above. Need to build this on top of my RFC before 
> defining the new transport.
>
>
>>
>> To reduce the complexity, I've stripped out a lot of features from 
>> the first RFC. It's better to start from the minimal set.
>>
>> Thanks
>>
>>
>>>
>>>
>>>>
>>>> For virtual device, it's a independent virtio device that could be 
>>>> assigned to secure DMA context/domain,  it is functional equivalent 
>>>> ADI or SF. The difference is that it can work with or without 
>>>> platform support (SIOV or PASID).
>>>>
>>>>
>>>>>
>>>>> And why the regular admin queue that is part of the device queues 
>>>>> can't fit to your needs ?
>>>>
>>>>
>>>> For "regular admin queue", did you mean your proposal. Actually, 
>>>> it's not conflict, we can unify the interface though the motivation 
>>>> is different.
>>>>
>>>>
>>>>>
>>>>> Can you explain your needs ? is it to create a vDPA device from 
>>>>> some SW interface ?
>>>>
>>>>
>>>> As stated in the patch, the needs are:
>>>>
>>>> - Presenting virtual devices with limited transport specific resources
>>>> - Presenting virtual devices without platform support (e.g SR-IOV 
>>>> or SIOV)
>>>>
>>>> We want virtio to have hyper-scalability via slicing at virtio 
>>>> level. It's not directly related to vDPA.
>>>>
>>>> For vDPA, vendor are freed to have their own technology to be hyper 
>>>> scalable (e.g SF, ADI or other stuffs).
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>
>>>>> I don't follow. 
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-05 19:19           ` Michael S. Tsirkin
@ 2021-08-06  2:39             ` Jason Wang
       [not found]               ` <20210806044426-mutt-send-email-mst@kernel.org>
  2021-08-19 12:54             ` Stefan Hajnoczi
  1 sibling, 1 reply; 26+ messages in thread
From: Jason Wang @ 2021-08-06  2:39 UTC (permalink / raw)
  To: Michael S. Tsirkin, Stefan Hajnoczi
  Cc: virtio-comment, cohuck, mgurtovoy, eperezma, lulu, Parav Pandit


在 2021/8/6 上午3:19, Michael S. Tsirkin 写道:
> On Thu, Aug 05, 2021 at 02:59:04PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Aug 05, 2021 at 02:32:31PM +0800, Jason Wang wrote:
>>> 在 2021/8/4 下午8:50, Stefan Hajnoczi 写道:
>>>> On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
>>>>> 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
>>>>>> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>>>>>> I tried to imagine what the virtio-blk vdev creation parameters need to
>>>>>> look like. Here is what I came up with:
>>>>>>
>>>>>>      Virtual Device Creation Parameters for Block Devices
>>>>>>      ----------------------------------------------------
>>>>>>      The following creation parameters specify the details of a new virtual
>>>>>>      block device:
>>>>>>
>>>>>>      Field        Type   Meaning
>>>>>>      ----------------------------------------------------------------------
>>>>>>      blkdev_id    u64    Identifier of the underlying block device that
>>>>>>                          provides storage. The enumeration and creation of
>>>>>>                          underlying block devices is
>>>>>>                          implementation-specific.
>>>>>>      num_queues   u16    Number of request virtqueues.
>>>>>>      features_len u8     Number of elements in features[].
>>>>> For 'elements' do you mean the 'u32 elements'?
>>>> Yes, u32 array elements.
>>>>
>>>>>>      features[]   u32    Device feature bits to report.
>>>>>>
>>>>>>      Creation error codes are as follows:
>>>>>>
>>>>>>      Error               Meaning
>>>>>>      ----------------------------------------------------------------------
>>>>>>      INVALID_BLKDEV_ID   The underlying block device does not exist.
>>>>>>      BLKDEV_BUSY         The underlying block device is already in use.
>>>>>>      BLKDEV_READ_ONLY    The underlying block device is read-only.
>>>>>>      INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
>>>>>>      UNSUPPORTED_FEATURE A feature bit was given that the device does not
>>>>>>                          support.
>>>>>>
>>>>>>      If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
>>>>>>      block device is made available for read-only access.
>>>>>>
>>>>>>      Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
>>>>>>      already in use is given.
>>>>>>
>>>>>>      Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
>>>>>>      does not support writes and the VIRTIO_BLK_F_RO bit is not set in
>>>>>>      features[].
>>>>>>
>>>>>>      The configuration space parameters (see 5.2.4 Device configuration
>>>>>>      layout) are determined by the device based on the underlying block
>>>>>>      device capacity, block size, etc.
>>>>>>
>>>>>> Note that this doesn't allow overriding configuration space parameters
>>>>>> (e.g. block size). We probably need to support that in the future for
>>>>>> live migration compatibility.
>>>>> I wonder do we need those configuration to be self-descriptive? E.g how did
>>>>> the device know that the config contains the blk_size. (I guess it's not a
>>>>> good practice to infer this from the config len).
>>>> The device configuration space size and layout is determined by the
>>>> device feature bits.
>>>
>>> So blk_size doesn't belong to any feature. I guess it means we should start
>>> the support of blk_size from day 0.
>> The device creation parameters can either include a full configuration
>> space-sized blob:
>>
>>    Field        Type                      Meaning
>>    ----------------------------------------------------------------------
>>    init_config  struct virtio_blk_config  Initial contents of the
>>                                           configuration space.
>>
>> or they can include individual fields (basically
>> re-define them outside struct virtio_foo_config):
>>
>>    Field        Type        Meaning
>>    ----------------------------------------------------------------------
>>    blk_size     u32         Block size.
>>
>> Which approach to use depends on how much of the configuration space
>> should be settable at device creation time. If most of it will be
>> initialized by the device and isn't configurable, then embedding the
>> entire struct is not necessary.
>> Additionally, there must be a flags field in the device creation
>> parameters for indicating which configuration space fields or individual
>> fields described above to use. This allows you to accept the default
>> blk_size value instead of providing your own value:
>>
>>    Field       Type         Meaning
>>    ----------------------------------------------------------------------
>>    init_flags  u64          Use the corresponding field value to
>>                             initialize the device configuration space if
>> 			   the flag is set:
>>
>> 			     INIT_BLK_SIZE (1 << 0)
>>
>>> I had some discussion with Parav about this in the series that introduces
>>> the netlink extension for setting up the device.
>>>
>>> I guess this is what we want:
>>>
>>> struct virtio_config {
>>> attribute_X; //only exist when feature X existing
>>> attribute_Y; //only exist when feature Y existing
>>> ...
>>> };
>> That's more or less how configuration space layout works today. We don't
>> have explicit comments in the header file but when feature X is enabled
>> the driver may access virtio_config::attribute_X.
>>
>> Stefan
>
> Two things I know about network devices
> - some VF configuration isn't in config space at all
>    since config space describes guest visible fields.
>    E.g. a vlan tag to be attached to all packets.


Yes, they are done via cvq. But we are discussing the way to implement 
the virtual device provisioning. In this case we probably don't need to 
care about vlan.


>
> - some VF configuration is normally settable by PF without
>    destroying/recreating the VF.
>    E.g. the default MAC address.


Right, it depends on the device features which should be contained in 
the config blob. So if the device doesn't allow the mac to be changed, 
we can fail the device creation.


>
> Does blk have such configuration?


I guess so. E.g the geometry and topology?

Thanks


>
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
       [not found]               ` <20210806044426-mutt-send-email-mst@kernel.org>
@ 2021-08-09  3:10                 ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2021-08-09  3:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, virtio-comment, cohuck, mgurtovoy, eperezma,
	lulu, Parav Pandit


在 2021/8/6 下午4:47, Michael S. Tsirkin 写道:
> On Fri, Aug 06, 2021 at 10:39:48AM +0800, Jason Wang wrote:
>> 在 2021/8/6 上午3:19, Michael S. Tsirkin 写道:
>>> On Thu, Aug 05, 2021 at 02:59:04PM +0100, Stefan Hajnoczi wrote:
>>>> On Thu, Aug 05, 2021 at 02:32:31PM +0800, Jason Wang wrote:
>>>>> 在 2021/8/4 下午8:50, Stefan Hajnoczi 写道:
>>>>>> On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
>>>>>>> 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
>>>>>>>> On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
>>>>>>>> I tried to imagine what the virtio-blk vdev creation parameters need to
>>>>>>>> look like. Here is what I came up with:
>>>>>>>>
>>>>>>>>       Virtual Device Creation Parameters for Block Devices
>>>>>>>>       ----------------------------------------------------
>>>>>>>>       The following creation parameters specify the details of a new virtual
>>>>>>>>       block device:
>>>>>>>>
>>>>>>>>       Field        Type   Meaning
>>>>>>>>       ----------------------------------------------------------------------
>>>>>>>>       blkdev_id    u64    Identifier of the underlying block device that
>>>>>>>>                           provides storage. The enumeration and creation of
>>>>>>>>                           underlying block devices is
>>>>>>>>                           implementation-specific.
>>>>>>>>       num_queues   u16    Number of request virtqueues.
>>>>>>>>       features_len u8     Number of elements in features[].
>>>>>>> For 'elements' do you mean the 'u32 elements'?
>>>>>> Yes, u32 array elements.
>>>>>>
>>>>>>>>       features[]   u32    Device feature bits to report.
>>>>>>>>
>>>>>>>>       Creation error codes are as follows:
>>>>>>>>
>>>>>>>>       Error               Meaning
>>>>>>>>       ----------------------------------------------------------------------
>>>>>>>>       INVALID_BLKDEV_ID   The underlying block device does not exist.
>>>>>>>>       BLKDEV_BUSY         The underlying block device is already in use.
>>>>>>>>       BLKDEV_READ_ONLY    The underlying block device is read-only.
>>>>>>>>       INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
>>>>>>>>       UNSUPPORTED_FEATURE A feature bit was given that the device does not
>>>>>>>>                           support.
>>>>>>>>
>>>>>>>>       If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
>>>>>>>>       block device is made available for read-only access.
>>>>>>>>
>>>>>>>>       Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
>>>>>>>>       already in use is given.
>>>>>>>>
>>>>>>>>       Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
>>>>>>>>       does not support writes and the VIRTIO_BLK_F_RO bit is not set in
>>>>>>>>       features[].
>>>>>>>>
>>>>>>>>       The configuration space parameters (see 5.2.4 Device configuration
>>>>>>>>       layout) are determined by the device based on the underlying block
>>>>>>>>       device capacity, block size, etc.
>>>>>>>>
>>>>>>>> Note that this doesn't allow overriding configuration space parameters
>>>>>>>> (e.g. block size). We probably need to support that in the future for
>>>>>>>> live migration compatibility.
>>>>>>> I wonder do we need those configuration to be self-descriptive? E.g how did
>>>>>>> the device know that the config contains the blk_size. (I guess it's not a
>>>>>>> good practice to infer this from the config len).
>>>>>> The device configuration space size and layout is determined by the
>>>>>> device feature bits.
>>>>> So blk_size doesn't belong to any feature. I guess it means we should start
>>>>> the support of blk_size from day 0.
>>>> The device creation parameters can either include a full configuration
>>>> space-sized blob:
>>>>
>>>>     Field        Type                      Meaning
>>>>     ----------------------------------------------------------------------
>>>>     init_config  struct virtio_blk_config  Initial contents of the
>>>>                                            configuration space.
>>>>
>>>> or they can include individual fields (basically
>>>> re-define them outside struct virtio_foo_config):
>>>>
>>>>     Field        Type        Meaning
>>>>     ----------------------------------------------------------------------
>>>>     blk_size     u32         Block size.
>>>>
>>>> Which approach to use depends on how much of the configuration space
>>>> should be settable at device creation time. If most of it will be
>>>> initialized by the device and isn't configurable, then embedding the
>>>> entire struct is not necessary.
>>>> Additionally, there must be a flags field in the device creation
>>>> parameters for indicating which configuration space fields or individual
>>>> fields described above to use. This allows you to accept the default
>>>> blk_size value instead of providing your own value:
>>>>
>>>>     Field       Type         Meaning
>>>>     ----------------------------------------------------------------------
>>>>     init_flags  u64          Use the corresponding field value to
>>>>                              initialize the device configuration space if
>>>> 			   the flag is set:
>>>>
>>>> 			     INIT_BLK_SIZE (1 << 0)
>>>>
>>>>> I had some discussion with Parav about this in the series that introduces
>>>>> the netlink extension for setting up the device.
>>>>>
>>>>> I guess this is what we want:
>>>>>
>>>>> struct virtio_config {
>>>>> attribute_X; //only exist when feature X existing
>>>>> attribute_Y; //only exist when feature Y existing
>>>>> ...
>>>>> };
>>>> That's more or less how configuration space layout works today. We don't
>>>> have explicit comments in the header file but when feature X is enabled
>>>> the driver may access virtio_config::attribute_X.
>>>>
>>>> Stefan
>>> Two things I know about network devices
>>> - some VF configuration isn't in config space at all
>>>     since config space describes guest visible fields.
>>>     E.g. a vlan tag to be attached to all packets.
>>
>> Yes, they are done via cvq. But we are discussing the way to implement the
>> virtual device provisioning. In this case we probably don't need to care
>> about vlan.
> Looks like I'm unclear. Host might want to attach a vlan to all packets
> going through a VF transparently to guest. This is not something that
> can be controlled through a cvq, that is guest visible.


I see, so I think we need to standardize them in the spec first. Then we 
can introduce the API for device provisioning.


>
>
>>> - some VF configuration is normally settable by PF without
>>>     destroying/recreating the VF.
>>>     E.g. the default MAC address.
>>
>> Right, it depends on the device features which should be contained in the
>> config blob. So if the device doesn't allow the mac to be changed, we can
>> fail the device creation.
> Again I am trying to make a distinction. Guest settable mac can be off but
> host can set mac almost universally. this is not something that is
> covered by features - features cover guest visible aspects.


Right, see above. I think we need to standardize the mac configuration 
as well.

Thanks


>>> Does blk have such configuration?
>>
>> I guess so. E.g the geometry and topology?
>>
>> Thanks
>>
>>
>>>
>>>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-06  2:26           ` Jason Wang
@ 2021-08-11 10:00             ` Max Gurtovoy
  2021-08-12  3:02               ` [virtio-comment] " Jason Wang
  0 siblings, 1 reply; 26+ messages in thread
From: Max Gurtovoy @ 2021-08-11 10:00 UTC (permalink / raw)
  To: Jason Wang, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


On 8/6/2021 5:26 AM, Jason Wang wrote:
>
> 在 2021/8/5 下午8:37, Max Gurtovoy 写道:
>>
>> On 8/5/2021 4:36 AM, Jason Wang wrote:
>>>
>>> 在 2021/8/4 下午6:20, Max Gurtovoy 写道:
>>>>
>>>> On 8/4/2021 4:37 AM, Jason Wang wrote:
>>>>>
>>>>> 在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>>>>>>
>>>>>>> +Sometimes it's hard to implement the device in a transport 
>>>>>>> specific
>>>>>>> +method. One example is that a physical device may try to present
>>>>>>> +multiple virtual devices with a limited transport specific
>>>>>>> +resources. Another example is to implement virtual devices 
>>>>>>> which is
>>>>>>> +transport independent. In those cases, the admin virtqueue 
>>>>>>> provided by
>>>>>>> +the management device could be used to replace the transport 
>>>>>>> specific
>>>>>>> +method to implement the virtual device. Then the presenting of the
>>>>>>> +virtual device is done through the cooperation between the admin
>>>>>>> +virtqueue and the driver.
>>>>>>
>>>>>> maybe it's me, but I can't understand how admin queue is a 
>>>>>> transport.
>>>>>
>>>>>
>>>>> The transport is the method that provides basic facility. In this 
>>>>> proposal, the admin virtqueue is used to provide basic facility 
>>>>> for the virtual device. That is to say, it's the transport for 
>>>>> virtual device.
>>>>>
>>>>>
>>>>>>
>>>>>> And how can I use admin queue transport to migrate VFs that are 
>>>>>> controlled by virtio PCI PF.
>>>>>
>>>>>
>>>>> This live migration support and the admin virtqueue transport are 
>>>>> orthogonal. The main motivation of this proposal is used for 
>>>>> implementing virtual device transport via admin virtqueue. It's 
>>>>> not hard to add new commands for doing live migration for the 
>>>>> virtual device, I don't do that since I believe it's expected to 
>>>>> be addressed in your proposal.
>>>>
>>>> so why do you call it in the same name that I used in my RFC ? This 
>>>> is confusing and causing problems.
>>>
>>>
>>> Max,
>>>
>>> I really think the game of "who comes first" is meaningless.
>>>
>>> I've used the terminology like "admin virtqueue" sometime early this 
>>> year:
>>>
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F202101%2Fmsg00034.html&amp;data=04%7C01%7Cmgurtovoy%40nvidia.com%7Cc99993bf49e244bba93108d958819b84%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637638135950808128%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=F4XXvOttuggAOy34SYHMRigm80xp%2BLiUqGb9UVVhQJU%3D&amp;reserved=0 
>>>
>>>
>>> I start the work of the admin virtqueue as a transport since then 
>>> and I think I don't say "you call it the same name that I used before".
>>
>> I didn't say it matters who call it first.
>>
>> But it does matter for us to have different naming since it causes 
>> confusion.
>>
>> In any case, the management device will introduce an admin queue 
>> feature. So my RFC is defining this infrastructure. Your first commit 
>> should be on top of my RFC introducing new admin commands for 
>> creating virtualized devices (you can call it 
>> VIRTIO_ADMIN_PCI_VIRTUALIZATION_MANAGEMENT class in the transport 
>> command range).
>
>
> Both proposals are RFC, why need to mandate that now? It's always not 
> late to switch to whatever has been agreed or justified. No?

I would like to cooperate and not block each other.

>
>
>> The second commit should define the new transport and configuration 
>> cycles to get to a point that the SF virtio device is ready to be 
>> used by virtio driver.
>>
>>
>>>
>>>
>>>>
>>>> You are working on a parallel feature and reviewing my RFC as if it 
>>>> was instead of your proposal.
>>>
>>>
>>> Firstly, though they may have the same interface/commands, the two 
>>> proposals serves for completely different goals.
>>>
>>> Secondly, it's about how to justify your proposal in the community, 
>>> and I think I don't get the convincing answers for the following two 
>>> points:
>>>
>>> 1) I have said that your proposal of using admin virtqueue for doing 
>>> live migration makes sense, but we need the per function interface 
>>> for nested virt
>>
>> You need to add the needed commands for vDPA to stop/start queues and 
>> you'll have the nested migration. BTW, this stop/start is not part of 
>> virtio migration.
>
>
> Why not? My proposal is for virtio spec.
>
>
>> It's a capabiliy that vDPA will use to implement vDPA live migration.
>
>
> For vDPA device, we don't need to bother the spec. All vDPA devices 
> supported by the Linux has already supported stop/start and indices 
> save/restore (I meant mlx5e and IFCVF).
>
>
>>
>> So please don't mix virtio migration and vDPA migration.
>
>
> It's not me but you that mixes the concept.
>
> Virtio migration and vDPA migration are functional equivalent. Adding 
> the live migration support for virtio spec will help for the vendor 
> that doesn't want to go for vendor specific control path.

so how will virtio Live migration spec help vDPA ?

Vendor create only virtq acceleration to implement virtio.

>
>
>>
>> Also in case we'll want nexted migration, the L1 VF (that is seem as 
>> PF to the VM) can expose admin_q and migration caps to manage 
>> migration process for L2 VFs.
>>
>> I already mentioned this.
>
>
> But you tend to ignore the issues I've pointed out:
>
> - How to migrate L1 in this case?
> - If you want to migrate L(N) you need admin virtqueue in L0 to L(N-1)?

I don't ignore it.

For some reason you ask about Live migration in the RFC for AdminQ. I'm 
trying to build a concept and infrastructure.

Lets focus.

>
> And ignore the suggestion that has nothing conflict with your proposal:
>
> 1) introduce both start/stop and device states as basic facility, that 
> is to define the semantic and format
> 2) introduce the admin virtqueue and the commands for implementing the 
> above facility
>
> Then we leave the chance for the per-function interface which fits 
> naturally for nested virtualization and for the vendor that doesn't 
> want to have admin virtqueue. And we don't need to expose the admin 
> virtqueue in the nested layers.

As mentioned, I want to agree on adminq infrastructure as proposed here.

Please don't involve the LM and nested LM.

>
>
>>
>>>
>>> 2) I've pointed out that using the general vitqueue for carrying 
>>> vendor specific command breaks the efforts of spec as a standard device
>>
>> It's not breaking anything. Most of the specifications (NVMe, SCSI, 
>> FC, more..) allow Vendors to innovate, and so does VIRTIO today. I 
>> don't understand the objection and this is a contradiction to the 
>> vendor specific cfg area you added for some reason in the past. I'm 
>> repeating myself again and again.
>
>
> The repeating is because you ignore my concerns. Such design violates 
> the goal of the virtio spec for being a standard device:
>
> - customer doesn't want to be locked by a specific vendor

Same as NVMe.

> - the spec doesn't prevent you from doing innovation under the virtio 
> level

do you mean in transport level ?

> - the vendor specific cfg has not been used by any vendor so far, and 
> we had that doesn't mean it's a good practice (or maybe we can try to 
> deprecate that)

I guess you added it for a reason.

> - the vendor specific feature may greatly complicate the live 
> migration and its compatibility, virtio is deigned to be capable of 
> migration, that's the most important difference compared to NVM and 
> other architecture. A simple blob of (vendor specific) states just 
> won't work, and they are a lot of other things you need to consider, 
> e.g the migration comparability with the existing software backend or 
> machine types. If you check the qemu git history, you can see how hard 
> for maintaining the migration compatibility for the past 10 years. I 
> don't think it can work well if we want to allow vendor specific state 
> to be migrated.

SW migration and HW migration is different.

Migrating between 2 different HW vendors will not be supported at stage 1.

And again, the discussion is not related to this RFC.

>
> NVME mandates the BAR layout so it must explicitly reserve BAR for 
> vendor specific usage. Virtio-pci is much more flexible and doesn't 
> mandate BAR layout and the driver discover the virtio facility via 
> virtio capabilities. That means, the spec doesn't prevent you from 
> adding vendor specific stuffs by using dedicated BARs. I've told you 
> that they are other vendor that reserved the BAR for vendor specific 
> functions which is just ignored by you.

But NVMe allows vendor specific IO and Admin commands that are not 
related to the BAR.

Why is this a problem for virtio ?

>
>
>>
>> In case Virtio doesn't want to be innovative 
>
>
> That's not true.  The innovation should be done at virtio level if 
> it's a general feature.
>
> For the innovation that depends on the vendor, it's not the innovation 
> of virtio itself but the vendor. Let's don't do that at the spec since 
> it's the wrong layer.

I disagree.

The window should be opened in the spec.


>
>
>> and encourage vendors to innovate their products, lets keep the 
>> 192-255 classes reserved for now and discuss it in the future again.
>
>
> That should work.
>
>
>>
>>>
>>> Lastly, this proposal is RFC, it's not perfect for sure. The most 
>>> important thing for the current stage is not about how and when this 
>>> can be merged but whether or not this approach can work. I post them 
>>> now since a talk about the hyper scalability will be given at the 
>>> KVM Forum then I need to post this as one of the approach before as 
>>> a reference. There are vendors that are asking something like this 
>>> as a reference for having better scalability than SR-IOV.
>>>
>>>
>>>>
>>>> IIUC, in your proposal a non SRIOV device parent will create admin 
>>>> queue and using this admin queue you'll be able to create children 
>>>> devices and their transport will be admin queue.
>>>>
>>>> That means that the configuration cycles will be trapped by the 
>>>> parent device somehow.
>>>
>>>
>>> That's the way for having better scalability. And this is also the 
>>> approach that SIOV used.
>>>
>>>
>>>>
>>>> This also means we need to merge my RFC first to create 
>>>> infrastructure for this RFC.
>>>
>>>
>>> It doesn't matter which will be merged. We should guarantee:
>>>
>>> 1) The idea is justified by the community
>>> 2) The merged proposal is extensible for accepting new features and 
>>> commands
>>>
>>> For 2) I think both proposals can do that. And to me, it's not hard 
>>> to switch to the similar interface as you invent.
>>>
>> so please be supportive and please understand that this RFC will 
>> allow easier addition to your proposal for virtio SFs.
>
>
> I really want to co-operate, but please make sure to answer the 
> concerns that were raised. Since we're discussing the spec, it won't 
> be as fast as a patch for Linux. We must be careful so please be patient.
>
>
>>
>>>
>>>>
>>>> For admin management that you'll need probably virtio-cli tool from 
>>>> user space.
>>>
>>>
>>> What does virtio-cli do? We've already had vdpa that is integrated 
>>> into iproute2.
>>>
>>> And let me clarify the concept again: vDPA is a superset of virtio. 
>>> That means virtio could be treated as one kind of vDPA.
>>>
>>> In this sense, I don't see a value of re-inventing the wheels again 
>>> in the virtio-cli.
>>
>> virtio-cli has nothing to do with vDPA.
>>
>> It's a tool to configure virtio device from cmdline.
>
>
> Again, please look at how Linux implement vDPA. It has already 
> supported virtio-pci via the vp_vdpa driver. From its point of view, 
> virtio-pci is yet another vendor specific implementation of virtio.
>
> It's just as simple as to implement the vdpa management device in the 
> vp_vdpa driver to leverage all the existing management capability of 
> the vdpa tool for virtio device.
>
> Let's don't duplicate the efforts.

You're looking only on Linux and I look on the virtio as a spec that is 
being used by other OS as well.

I understand your will for vDPA, but you need to make a separation 
between vDPA and virtio.


>
> Thanks
>
>
>>
>>
>>>
>>>
>>>>
>>>> So this proposal is complementary to mine. Your management device 
>>>> will negotiate "my" admin_queue feature and you'll need to add more 
>>>> commands to this admin queue that are probably in transport 
>>>> specific domain to create children.
>>>
>>>
>>> I think we can go either:
>>>
>>> 1) Make two virtqueues separately
>>>
>>> or
>>>
>>> 2) Using a single virtqueue
>>>
>>> And I would like to co-operate if 2) makes more sense.
>>>
>>>
>>>>
>>>> You do need to handle the configuration cycles that this management 
>>>> parent device will need to support.
>>>
>>>
>>> My proposals have already supported basic management like device 
>>> creation and destroy. And I think it's not hard to extend it to other.
>>
>> see my suggestion above. Need to build this on top of my RFC before 
>> defining the new transport.
>>
>>
>>>
>>> To reduce the complexity, I've stripped out a lot of features from 
>>> the first RFC. It's better to start from the minimal set.
>>>
>>> Thanks
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> For virtual device, it's a independent virtio device that could be 
>>>>> assigned to secure DMA context/domain,  it is functional 
>>>>> equivalent ADI or SF. The difference is that it can work with or 
>>>>> without platform support (SIOV or PASID).
>>>>>
>>>>>
>>>>>>
>>>>>> And why the regular admin queue that is part of the device queues 
>>>>>> can't fit to your needs ?
>>>>>
>>>>>
>>>>> For "regular admin queue", did you mean your proposal. Actually, 
>>>>> it's not conflict, we can unify the interface though the 
>>>>> motivation is different.
>>>>>
>>>>>
>>>>>>
>>>>>> Can you explain your needs ? is it to create a vDPA device from 
>>>>>> some SW interface ?
>>>>>
>>>>>
>>>>> As stated in the patch, the needs are:
>>>>>
>>>>> - Presenting virtual devices with limited transport specific 
>>>>> resources
>>>>> - Presenting virtual devices without platform support (e.g SR-IOV 
>>>>> or SIOV)
>>>>>
>>>>> We want virtio to have hyper-scalability via slicing at virtio 
>>>>> level. It's not directly related to vDPA.
>>>>>
>>>>> For vDPA, vendor are freed to have their own technology to be 
>>>>> hyper scalable (e.g SF, ADI or other stuffs).
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>>
>>>>>> I don't follow. 
>>>>>
>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [virtio-comment] Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-11 10:00             ` Max Gurtovoy
@ 2021-08-12  3:02               ` Jason Wang
  0 siblings, 0 replies; 26+ messages in thread
From: Jason Wang @ 2021-08-12  3:02 UTC (permalink / raw)
  To: Max Gurtovoy, virtio-comment; +Cc: mst, cohuck, stefanha, eperezma, lulu


在 2021/8/11 下午6:00, Max Gurtovoy 写道:
>
> On 8/6/2021 5:26 AM, Jason Wang wrote:
>>
>> 在 2021/8/5 下午8:37, Max Gurtovoy 写道:
>>>
>>> On 8/5/2021 4:36 AM, Jason Wang wrote:
>>>>
>>>> 在 2021/8/4 下午6:20, Max Gurtovoy 写道:
>>>>>
>>>>> On 8/4/2021 4:37 AM, Jason Wang wrote:
>>>>>>
>>>>>> 在 2021/8/3 下午8:40, Max Gurtovoy 写道:
>>>>>>>>
>>>>>>>> +Sometimes it's hard to implement the device in a transport 
>>>>>>>> specific
>>>>>>>> +method. One example is that a physical device may try to present
>>>>>>>> +multiple virtual devices with a limited transport specific
>>>>>>>> +resources. Another example is to implement virtual devices 
>>>>>>>> which is
>>>>>>>> +transport independent. In those cases, the admin virtqueue 
>>>>>>>> provided by
>>>>>>>> +the management device could be used to replace the transport 
>>>>>>>> specific
>>>>>>>> +method to implement the virtual device. Then the presenting of 
>>>>>>>> the
>>>>>>>> +virtual device is done through the cooperation between the admin
>>>>>>>> +virtqueue and the driver.
>>>>>>>
>>>>>>> maybe it's me, but I can't understand how admin queue is a 
>>>>>>> transport.
>>>>>>
>>>>>>
>>>>>> The transport is the method that provides basic facility. In this 
>>>>>> proposal, the admin virtqueue is used to provide basic facility 
>>>>>> for the virtual device. That is to say, it's the transport for 
>>>>>> virtual device.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> And how can I use admin queue transport to migrate VFs that are 
>>>>>>> controlled by virtio PCI PF.
>>>>>>
>>>>>>
>>>>>> This live migration support and the admin virtqueue transport are 
>>>>>> orthogonal. The main motivation of this proposal is used for 
>>>>>> implementing virtual device transport via admin virtqueue. It's 
>>>>>> not hard to add new commands for doing live migration for the 
>>>>>> virtual device, I don't do that since I believe it's expected to 
>>>>>> be addressed in your proposal.
>>>>>
>>>>> so why do you call it in the same name that I used in my RFC ? 
>>>>> This is confusing and causing problems.
>>>>
>>>>
>>>> Max,
>>>>
>>>> I really think the game of "who comes first" is meaningless.
>>>>
>>>> I've used the terminology like "admin virtqueue" sometime early 
>>>> this year:
>>>>
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.oasis-open.org%2Farchives%2Fvirtio-comment%2F202101%2Fmsg00034.html&amp;data=04%7C01%7Cmgurtovoy%40nvidia.com%7Cc99993bf49e244bba93108d958819b84%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637638135950808128%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=F4XXvOttuggAOy34SYHMRigm80xp%2BLiUqGb9UVVhQJU%3D&amp;reserved=0 
>>>>
>>>>
>>>> I start the work of the admin virtqueue as a transport since then 
>>>> and I think I don't say "you call it the same name that I used 
>>>> before".
>>>
>>> I didn't say it matters who call it first.
>>>
>>> But it does matter for us to have different naming since it causes 
>>> confusion.
>>>
>>> In any case, the management device will introduce an admin queue 
>>> feature. So my RFC is defining this infrastructure. Your first 
>>> commit should be on top of my RFC introducing new admin commands for 
>>> creating virtualized devices (you can call it 
>>> VIRTIO_ADMIN_PCI_VIRTUALIZATION_MANAGEMENT class in the transport 
>>> command range).
>>
>>
>> Both proposals are RFC, why need to mandate that now? It's always not 
>> late to switch to whatever has been agreed or justified. No?
>
> I would like to cooperate and not block each other.


It doesn't block since they are both RFCs.

I start the work from Jan, though it's trivial but it still a lot of 
work of switching to your proposal.

As I said, it's not late to switch to your format if it is agreed.


>
>>
>>
>>> The second commit should define the new transport and configuration 
>>> cycles to get to a point that the SF virtio device is ready to be 
>>> used by virtio driver.
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> You are working on a parallel feature and reviewing my RFC as if 
>>>>> it was instead of your proposal.
>>>>
>>>>
>>>> Firstly, though they may have the same interface/commands, the two 
>>>> proposals serves for completely different goals.
>>>>
>>>> Secondly, it's about how to justify your proposal in the community, 
>>>> and I think I don't get the convincing answers for the following 
>>>> two points:
>>>>
>>>> 1) I have said that your proposal of using admin virtqueue for 
>>>> doing live migration makes sense, but we need the per function 
>>>> interface for nested virt
>>>
>>> You need to add the needed commands for vDPA to stop/start queues 
>>> and you'll have the nested migration. BTW, this stop/start is not 
>>> part of virtio migration.
>>
>>
>> Why not? My proposal is for virtio spec.
>>
>>
>>> It's a capabiliy that vDPA will use to implement vDPA live migration.
>>
>>
>> For vDPA device, we don't need to bother the spec. All vDPA devices 
>> supported by the Linux has already supported stop/start and indices 
>> save/restore (I meant mlx5e and IFCVF).
>>
>>
>>>
>>> So please don't mix virtio migration and vDPA migration.
>>
>>
>> It's not me but you that mixes the concept.
>>
>> Virtio migration and vDPA migration are functional equivalent. Adding 
>> the live migration support for virtio spec will help for the vendor 
>> that doesn't want to go for vendor specific control path.
>
> so how will virtio Live migration spec help vDPA ?


Virtio spec defines want to migrate, vDPA can use vendor specific way to 
save and get the state.


>
> Vendor create only virtq acceleration to implement virtio.


So let's no consider the vDPA as somehow an acceleration. It's actual a 
vendor specific transport/control path. This has been clarified in the 
kernel commit log.

We can add a vDPA transport to the spec if it helps to reduce the confusion.


>
>>
>>
>>>
>>> Also in case we'll want nexted migration, the L1 VF (that is seem as 
>>> PF to the VM) can expose admin_q and migration caps to manage 
>>> migration process for L2 VFs.
>>>
>>> I already mentioned this.
>>
>>
>> But you tend to ignore the issues I've pointed out:
>>
>> - How to migrate L1 in this case?
>> - If you want to migrate L(N) you need admin virtqueue in L0 to L(N-1)?
>
> I don't ignore it.
>
> For some reason you ask about Live migration in the RFC for AdminQ. 
> I'm trying to build a concept and infrastructure.


Live migration is very important for virtio, When we design a 
infrastructure, we should make sure it can be migrated easily instead of 
introducing blockers.


>
> Lets focus.
>
>>
>> And ignore the suggestion that has nothing conflict with your proposal:
>>
>> 1) introduce both start/stop and device states as basic facility, 
>> that is to define the semantic and format
>> 2) introduce the admin virtqueue and the commands for implementing 
>> the above facility
>>
>> Then we leave the chance for the per-function interface which fits 
>> naturally for nested virtualization and for the vendor that doesn't 
>> want to have admin virtqueue. And we don't need to expose the admin 
>> virtqueue in the nested layers.
>
> As mentioned, I want to agree on adminq infrastructure as proposed here.


Well in your patch you mentioned that it supports live migration.


>
> Please don't involve the LM and nested LM.


Again, live migration is very important, let's consider that from the 
beginning.

For the case of nest, live migration is just one of the issues. 
Basically if you tie a feature to admin virtqueue, it would be very hard 
for the guest to use that.

You may say VF can have an admin virtqueue, then you end up with more 
sophisticated problem:

1) we need isolate the DMA since admin virtqueue is expected to be used 
by VMM, this means something like a PASID is a must

2) you need to live migration admin virtqueue


>
>>
>>
>>>
>>>>
>>>> 2) I've pointed out that using the general vitqueue for carrying 
>>>> vendor specific command breaks the efforts of spec as a standard 
>>>> device
>>>
>>> It's not breaking anything. Most of the specifications (NVMe, SCSI, 
>>> FC, more..) allow Vendors to innovate, and so does VIRTIO today. I 
>>> don't understand the objection and this is a contradiction to the 
>>> vendor specific cfg area you added for some reason in the past. I'm 
>>> repeating myself again and again.
>>
>>
>> The repeating is because you ignore my concerns. Such design violates 
>> the goal of the virtio spec for being a standard device:
>>
>> - customer doesn't want to be locked by a specific vendor
>
> Same as NVMe.


Well, you try to expose a "standard" virtio device for guest but some 
features must be used through a vendor specific way (e.g the cli you've 
mentioned). Isn't this a lock?


>
>> - the spec doesn't prevent you from doing innovation under the virtio 
>> level
>
> do you mean in transport level ?


Yes.


>
>> - the vendor specific cfg has not been used by any vendor so far, and 
>> we had that doesn't mean it's a good practice (or maybe we can try to 
>> deprecate that)
>
> I guess you added it for a reason.


So the spec said:

"
The optional Vendor data capability allows the device to present
vendor-specific data to the driver, without
conflicts, for debugging and/or reporting purposes,
and without conflicting with standard functionality.
"

It is only used for present data, or debugging. It's not used for adding 
innovations.

Obviously, it's a blocker for live migration.


>
>> - the vendor specific feature may greatly complicate the live 
>> migration and its compatibility, virtio is deigned to be capable of 
>> migration, that's the most important difference compared to NVM and 
>> other architecture. A simple blob of (vendor specific) states just 
>> won't work, and they are a lot of other things you need to consider, 
>> e.g the migration comparability with the existing software backend or 
>> machine types. If you check the qemu git history, you can see how 
>> hard for maintaining the migration compatibility for the past 10 
>> years. I don't think it can work well if we want to allow vendor 
>> specific state to be migrated.
>
> SW migration and HW migration is different.


We are talking about HW migration, isn't it?

But keeping the migration compatibility with software migration is very 
important.

The design will be less interested if it doesn't support his.


>
> Migrating between 2 different HW vendors will not be supported at 
> stage 1.


Why? We are pretty sure it can be done at least for virtio-net and 
virtio-blk. Let's don't go back.


>
> And again, the discussion is not related to this RFC.
>
>>
>> NVME mandates the BAR layout so it must explicitly reserve BAR for 
>> vendor specific usage. Virtio-pci is much more flexible and doesn't 
>> mandate BAR layout and the driver discover the virtio facility via 
>> virtio capabilities. That means, the spec doesn't prevent you from 
>> adding vendor specific stuffs by using dedicated BARs. I've told you 
>> that they are other vendor that reserved the BAR for vendor specific 
>> functions which is just ignored by you.
>
> But NVMe allows vendor specific IO and Admin commands that are not 
> related to the BAR.


It's a migration blocker.


>
> Why is this a problem for virtio ?


Virtio is born for virt. Please remember, (cross vendor/implementation) 
migration is one of the key advantage for virtio. This is different from 
NVME.

AFAIK, in order to live migrate NVME, the (smartNIC) vendor have done a 
lot of vendor specific tricks which is fragile and makes it impossible 
to cross the vendor.


>
>>
>>
>>>
>>> In case Virtio doesn't want to be innovative 
>>
>>
>> That's not true.  The innovation should be done at virtio level if 
>> it's a general feature.
>>
>> For the innovation that depends on the vendor, it's not the 
>> innovation of virtio itself but the vendor. Let's don't do that at 
>> the spec since it's the wrong layer.
>
> I disagree.
>
> The window should be opened in the spec.


You don't answer why is must be done in the spec, especially:

1) it breaks migration
2) spec doesn't forbid it at transport level
3) a concrete example has been given for how to do that (IFCVF)


>
>
>>
>>
>>> and encourage vendors to innovate their products, lets keep the 
>>> 192-255 classes reserved for now and discuss it in the future again.
>>
>>
>> That should work.
>>
>>
>>>
>>>>
>>>> Lastly, this proposal is RFC, it's not perfect for sure. The most 
>>>> important thing for the current stage is not about how and when 
>>>> this can be merged but whether or not this approach can work. I 
>>>> post them now since a talk about the hyper scalability will be 
>>>> given at the KVM Forum then I need to post this as one of the 
>>>> approach before as a reference. There are vendors that are asking 
>>>> something like this as a reference for having better scalability 
>>>> than SR-IOV.
>>>>
>>>>
>>>>>
>>>>> IIUC, in your proposal a non SRIOV device parent will create admin 
>>>>> queue and using this admin queue you'll be able to create children 
>>>>> devices and their transport will be admin queue.
>>>>>
>>>>> That means that the configuration cycles will be trapped by the 
>>>>> parent device somehow.
>>>>
>>>>
>>>> That's the way for having better scalability. And this is also the 
>>>> approach that SIOV used.
>>>>
>>>>
>>>>>
>>>>> This also means we need to merge my RFC first to create 
>>>>> infrastructure for this RFC.
>>>>
>>>>
>>>> It doesn't matter which will be merged. We should guarantee:
>>>>
>>>> 1) The idea is justified by the community
>>>> 2) The merged proposal is extensible for accepting new features and 
>>>> commands
>>>>
>>>> For 2) I think both proposals can do that. And to me, it's not hard 
>>>> to switch to the similar interface as you invent.
>>>>
>>> so please be supportive and please understand that this RFC will 
>>> allow easier addition to your proposal for virtio SFs.
>>
>>
>> I really want to co-operate, but please make sure to answer the 
>> concerns that were raised. Since we're discussing the spec, it won't 
>> be as fast as a patch for Linux. We must be careful so please be 
>> patient.
>>
>>
>>>
>>>>
>>>>>
>>>>> For admin management that you'll need probably virtio-cli tool 
>>>>> from user space.
>>>>
>>>>
>>>> What does virtio-cli do? We've already had vdpa that is integrated 
>>>> into iproute2.
>>>>
>>>> And let me clarify the concept again: vDPA is a superset of virtio. 
>>>> That means virtio could be treated as one kind of vDPA.
>>>>
>>>> In this sense, I don't see a value of re-inventing the wheels again 
>>>> in the virtio-cli.
>>>
>>> virtio-cli has nothing to do with vDPA.
>>>
>>> It's a tool to configure virtio device from cmdline.
>>
>>
>> Again, please look at how Linux implement vDPA. It has already 
>> supported virtio-pci via the vp_vdpa driver. From its point of view, 
>> virtio-pci is yet another vendor specific implementation of virtio.
>>
>> It's just as simple as to implement the vdpa management device in the 
>> vp_vdpa driver to leverage all the existing management capability of 
>> the vdpa tool for virtio device.
>>
>> Let's don't duplicate the efforts.
>
> You're looking only on Linux and I look on the virtio as a spec that 
> is being used by other OS as well.


Well, you've mentioned the virtio-cli which must be OS specific. I don't 
think the linux specific stuffs like netlink can work for other OSes.

As I repeat many times. vDPA is a hardware concept not software and not 
specific to Linux. And I don't think the vendor will design a hardware 
that just work for Linux.


>
> I understand your will for vDPA, but you need to make a separation 
> between vDPA and virtio.


Let's treat vDPA as a vendor specific transport. This will make the 
discussion more constructive.

Thanks


>
>
>>
>> Thanks
>>
>>
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> So this proposal is complementary to mine. Your management device 
>>>>> will negotiate "my" admin_queue feature and you'll need to add 
>>>>> more commands to this admin queue that are probably in transport 
>>>>> specific domain to create children.
>>>>
>>>>
>>>> I think we can go either:
>>>>
>>>> 1) Make two virtqueues separately
>>>>
>>>> or
>>>>
>>>> 2) Using a single virtqueue
>>>>
>>>> And I would like to co-operate if 2) makes more sense.
>>>>
>>>>
>>>>>
>>>>> You do need to handle the configuration cycles that this 
>>>>> management parent device will need to support.
>>>>
>>>>
>>>> My proposals have already supported basic management like device 
>>>> creation and destroy. And I think it's not hard to extend it to other.
>>>
>>> see my suggestion above. Need to build this on top of my RFC before 
>>> defining the new transport.
>>>
>>>
>>>>
>>>> To reduce the complexity, I've stripped out a lot of features from 
>>>> the first RFC. It's better to start from the minimal set.
>>>>
>>>> Thanks
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> For virtual device, it's a independent virtio device that could 
>>>>>> be assigned to secure DMA context/domain,  it is functional 
>>>>>> equivalent ADI or SF. The difference is that it can work with or 
>>>>>> without platform support (SIOV or PASID).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> And why the regular admin queue that is part of the device 
>>>>>>> queues can't fit to your needs ?
>>>>>>
>>>>>>
>>>>>> For "regular admin queue", did you mean your proposal. Actually, 
>>>>>> it's not conflict, we can unify the interface though the 
>>>>>> motivation is different.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Can you explain your needs ? is it to create a vDPA device from 
>>>>>>> some SW interface ?
>>>>>>
>>>>>>
>>>>>> As stated in the patch, the needs are:
>>>>>>
>>>>>> - Presenting virtual devices with limited transport specific 
>>>>>> resources
>>>>>> - Presenting virtual devices without platform support (e.g SR-IOV 
>>>>>> or SIOV)
>>>>>>
>>>>>> We want virtio to have hyper-scalability via slicing at virtio 
>>>>>> level. It's not directly related to vDPA.
>>>>>>
>>>>>> For vDPA, vendor are freed to have their own technology to be 
>>>>>> hyper scalable (e.g SF, ADI or other stuffs).
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I don't follow. 
>>>>>>
>>>>>
>>>>
>>>
>>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: 
> https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] Introduce admin virtqueue as a new transport
  2021-08-05 19:19           ` Michael S. Tsirkin
  2021-08-06  2:39             ` Jason Wang
@ 2021-08-19 12:54             ` Stefan Hajnoczi
  1 sibling, 0 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2021-08-19 12:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtio-comment, cohuck, mgurtovoy, eperezma, lulu,
	Parav Pandit

[-- Attachment #1: Type: text/plain, Size: 6627 bytes --]

On Thu, Aug 05, 2021 at 03:19:31PM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 05, 2021 at 02:59:04PM +0100, Stefan Hajnoczi wrote:
> > On Thu, Aug 05, 2021 at 02:32:31PM +0800, Jason Wang wrote:
> > > 
> > > 在 2021/8/4 下午8:50, Stefan Hajnoczi 写道:
> > > > On Wed, Aug 04, 2021 at 04:51:15PM +0800, Jason Wang wrote:
> > > > > 在 2021/8/4 下午4:09, Stefan Hajnoczi 写道:
> > > > > > On Tue, Aug 03, 2021 at 11:20:06AM +0800, Jason Wang wrote:
> > > > > > I tried to imagine what the virtio-blk vdev creation parameters need to
> > > > > > look like. Here is what I came up with:
> > > > > > 
> > > > > >     Virtual Device Creation Parameters for Block Devices
> > > > > >     ----------------------------------------------------
> > > > > >     The following creation parameters specify the details of a new virtual
> > > > > >     block device:
> > > > > > 
> > > > > >     Field        Type   Meaning
> > > > > >     ----------------------------------------------------------------------
> > > > > >     blkdev_id    u64    Identifier of the underlying block device that
> > > > > >                         provides storage. The enumeration and creation of
> > > > > >                         underlying block devices is
> > > > > >                         implementation-specific.
> > > > > >     num_queues   u16    Number of request virtqueues.
> > > > > >     features_len u8     Number of elements in features[].
> > > > > 
> > > > > For 'elements' do you mean the 'u32 elements'?
> > > > Yes, u32 array elements.
> > > > 
> > > > > >     features[]   u32    Device feature bits to report.
> > > > > > 
> > > > > >     Creation error codes are as follows:
> > > > > > 
> > > > > >     Error               Meaning
> > > > > >     ----------------------------------------------------------------------
> > > > > >     INVALID_BLKDEV_ID   The underlying block device does not exist.
> > > > > >     BLKDEV_BUSY         The underlying block device is already in use.
> > > > > >     BLKDEV_READ_ONLY    The underlying block device is read-only.
> > > > > >     INVALID_NUM_QUEUES  The number of request queues was 0 or too large.
> > > > > >     UNSUPPORTED_FEATURE A feature bit was given that the device does not
> > > > > >                         support.
> > > > > > 
> > > > > >     If the VIRTIO_BLK_F_RO bit is set in features[] then the underlying
> > > > > >     block device is made available for read-only access.
> > > > > > 
> > > > > >     Creation MAY fail with BLKDEV_BUSY if a blkdev_id value that is
> > > > > >     already in use is given.
> > > > > > 
> > > > > >     Creation MAY fail with BLKDEV_READ_ONLY if the underlying block device
> > > > > >     does not support writes and the VIRTIO_BLK_F_RO bit is not set in
> > > > > >     features[].
> > > > > > 
> > > > > >     The configuration space parameters (see 5.2.4 Device configuration
> > > > > >     layout) are determined by the device based on the underlying block
> > > > > >     device capacity, block size, etc.
> > > > > > 
> > > > > > Note that this doesn't allow overriding configuration space parameters
> > > > > > (e.g. block size). We probably need to support that in the future for
> > > > > > live migration compatibility.
> > > > > 
> > > > > I wonder do we need those configuration to be self-descriptive? E.g how did
> > > > > the device know that the config contains the blk_size. (I guess it's not a
> > > > > good practice to infer this from the config len).
> > > > The device configuration space size and layout is determined by the
> > > > device feature bits.
> > > 
> > > 
> > > So blk_size doesn't belong to any feature. I guess it means we should start
> > > the support of blk_size from day 0.
> > 
> > The device creation parameters can either include a full configuration
> > space-sized blob:
> > 
> >   Field        Type                      Meaning
> >   ----------------------------------------------------------------------
> >   init_config  struct virtio_blk_config  Initial contents of the
> >                                          configuration space.
> > 
> > or they can include individual fields (basically
> > re-define them outside struct virtio_foo_config):
> > 
> >   Field        Type        Meaning
> >   ----------------------------------------------------------------------
> >   blk_size     u32         Block size.
> > 
> > Which approach to use depends on how much of the configuration space
> > should be settable at device creation time. If most of it will be
> > initialized by the device and isn't configurable, then embedding the
> > entire struct is not necessary.
> > Additionally, there must be a flags field in the device creation
> > parameters for indicating which configuration space fields or individual
> > fields described above to use. This allows you to accept the default
> > blk_size value instead of providing your own value:
> > 
> >   Field       Type         Meaning
> >   ----------------------------------------------------------------------
> >   init_flags  u64          Use the corresponding field value to
> >                            initialize the device configuration space if
> > 			   the flag is set:
> > 
> > 			     INIT_BLK_SIZE (1 << 0)
> > 
> > > I had some discussion with Parav about this in the series that introduces
> > > the netlink extension for setting up the device.
> > > 
> > > I guess this is what we want:
> > > 
> > > struct virtio_config {
> > > attribute_X; //only exist when feature X existing
> > > attribute_Y; //only exist when feature Y existing
> > > ...
> > > };
> > 
> > That's more or less how configuration space layout works today. We don't
> > have explicit comments in the header file but when feature X is enabled
> > the driver may access virtio_config::attribute_X.
> > 
> > Stefan
> 
> 
> Two things I know about network devices
> - some VF configuration isn't in config space at all
>   since config space describes guest visible fields.
>   E.g. a vlan tag to be attached to all packets.

blk has additional configuration that is not visible in struct
virtio_blk_config. For example, the request virtqueue size.

Such device creation parameters can be defined as fields just like the
ones from struct virtio_blk_config that I showed above, so I think we're
fine there.

> - some VF configuration is normally settable by PF without
>   destroying/recreating the VF.
>   E.g. the default MAC address.

Sounds like a job for an admin virtqueue :).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-08-19 12:54 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-03  3:20 [RFC PATCH] Introduce admin virtqueue as a new transport Jason Wang
2021-08-03 12:40 ` Max Gurtovoy
2021-08-04  1:37   ` Jason Wang
2021-08-04 10:20     ` Max Gurtovoy
2021-08-05  1:36       ` Jason Wang
2021-08-05 12:37         ` Max Gurtovoy
2021-08-06  2:26           ` Jason Wang
2021-08-11 10:00             ` Max Gurtovoy
2021-08-12  3:02               ` [virtio-comment] " Jason Wang
2021-08-03 14:51 ` Stefan Hajnoczi
2021-08-04  3:01   ` Jason Wang
2021-08-04  6:39     ` Stefan Hajnoczi
2021-08-04  8:39       ` Jason Wang
2021-08-04 12:56         ` Stefan Hajnoczi
2021-08-05  6:33           ` Jason Wang
2021-08-04  8:09 ` Stefan Hajnoczi
2021-08-04  8:51   ` Jason Wang
2021-08-04 12:50     ` Stefan Hajnoczi
2021-08-05  6:32       ` Jason Wang
2021-08-05 13:59         ` Stefan Hajnoczi
2021-08-05 19:19           ` Michael S. Tsirkin
2021-08-06  2:39             ` Jason Wang
     [not found]               ` <20210806044426-mutt-send-email-mst@kernel.org>
2021-08-09  3:10                 ` Jason Wang
2021-08-19 12:54             ` Stefan Hajnoczi
2021-08-04 13:36 ` Michael S. Tsirkin
2021-08-05  2:07   ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.