* [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
@ 2022-01-13 14:50 Max Gurtovoy
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
                   ` (5 more replies)
  0 siblings, 6 replies; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:50 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

Hi,

In a PCI SR-IOV configuration, a device's MSI-X vectors are a precious
resource. Hence, making efficient use of them based on the use case
that aligns with the VM configuration is desired for best system
performance.

For example, today's static assignment of MSI-X vectors does not allow
sophisticated utilization of resources.

A typical cloud provider SR-IOV use case is to create many VFs for
use by guest VMs. Each VM might have a different purpose and,
accordingly, a different amount of resources (e.g. number of CPUs). A
driver's use of a device's MSI-X vectors is commonly proportional to
the number of CPUs in the VM. Since the system administrator knows the
number of CPUs in the requested VM, they can configure the VF's MSI-X
vector count proportionally to that number of CPUs. In this way, the
utilization of the physical hardware is improved.

Some operating systems already support provisioning MSI-X vectors for
PCI VFs.

Update the specification with a method to change the number of MSI-X
vectors supported by a VF using the PF's admin virtqueue interface. To that
end, create a generic infrastructure for managing the PCI resources of a
managed VF via its parent PF.

Patches (1/5)-(2/5) introduce the admin virtqueue concept and feature bits.
Patches (3/5)-(4/5) add the admin virtqueue to virtio-blk and virtio-net
devices.
Patch (5/5) introduces MSI-X management support.

Max Gurtovoy (5):
  Add virtio Admin Virtqueue specification
  Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  virtio-blk: add support for VIRTIO_F_ADMIN_VQ
  virtio-net: add support for VIRTIO_F_ADMIN_VQ
  Add support for dynamic MSI-X vector mgmt for VFs

 admin-virtq.tex | 145 ++++++++++++++++++++++++++++++++++++++++++++++++
 content.tex     |  91 +++++++++++++++++++++++++++---
 packed-ring.tex |  26 ++++-----
 split-ring.tex  |  35 ++++++++----
 4 files changed, 263 insertions(+), 34 deletions(-)
 create mode 100644 admin-virtq.tex

-- 
2.21.0



* [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
@ 2022-01-13 14:50 ` Max Gurtovoy
  2022-01-13 17:53   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:50 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

In many use cases a user wants to query or manipulate features and
configuration of a virtio device regardless of the device type
(net/block/console). Some of this configuration is generic enough,
e.g. the number of MSI-X vectors of a virtio PCI VF device, and needs
to be queried and manipulated via the VF's parent PCI PF.

Currently the virtio specification defines a control virtqueue to
manipulate features and configuration of the device it operates on.
However, control virtqueue commands are device-type specific, which
makes it very difficult to extend them with device-agnostic commands.
The control virtqueue is also limited to in-order completion for a
device that negotiates the VIRTIO_F_IN_ORDER feature, which prevents
completing unrelated feature-manipulation commands out of order.

To support these requirements and overcome the above two limitations
in an elegant way, this patch introduces a new admin virtqueue. The
admin virtqueue uses the same command format for all types of virtio
devices.

Subsequent patches make use of this admin virtqueue.
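
As an illustration (a minimal driver-side sketch, not part of the spec
text; the admin_vq_submit() helper and the fixed 64-byte payload sizes
are hypothetical), an admin command could be built and submitted as two
buffers on the admin virtqueue, the device-readable part followed by
the device-writable part:

    #include <stdint.h>
    #include <stddef.h>

    /* Device-readable part: command opcode plus command-specific data. */
    struct admin_cmd_in {
            uint8_t command;
            uint8_t data[64];           /* command-specific-data */
    };

    /* Device-writable part: status plus command-specific result. */
    struct admin_cmd_out {
            uint8_t status;             /* VIRTIO_ADMIN_STATUS_* */
            uint8_t result[64];         /* command-specific-result */
    };

    /* Hypothetical helper: post the two buffers as one descriptor chain
     * on the admin virtqueue and wait for the device to use them. */
    int admin_vq_submit(const struct admin_cmd_in *in, size_t in_len,
                        struct admin_cmd_out *out, size_t out_len);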

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 content.tex     |  9 +++++++--
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 admin-virtq.tex

diff --git a/admin-virtq.tex b/admin-virtq.tex
new file mode 100644
index 0000000..ad20f89
--- /dev/null
+++ b/admin-virtq.tex
@@ -0,0 +1,49 @@
+\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
+
+The admin virtqueue is used to send administrative commands that
+manipulate various features of the device which would not easily map
+into the configuration space.
+
+Use of the admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
+feature bit.
+
+The admin virtqueue index may vary among different device types.
+
+The Admin command set defines the commands that may be issued only to the admin
+virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature MUST
+support all the mandatory admin commands. A device MAY also support one or more
+optional admin commands. All commands are of the following form:
+
+\begin{lstlisting}
+struct virtio_admin_cmd {
+        /* Device-readable part */
+        u8 command;
+        u8 command-specific-data[];
+
+        /* Device-writable part */
+        u8 status;
+        u8 command-specific-result[];
+};
+
+/* status values */
+#define VIRTIO_ADMIN_STATUS_OK 0
+#define VIRTIO_ADMIN_STATUS_ERR 1
+#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
+\end{lstlisting}
+
+The \field{command} and \field{command-specific-data} are
+set by the driver, and the device sets the \field{status} and the
+\field{command-specific-result}, if needed.
+
+The following table describes the Admin command set:
+
+\begin{tabular}{|l|l|l|l|}
+\hline
+Opcode (bits) & Opcode (hex) & Command & M/O \\
+\hline \hline
+ -  & 00h - 7Fh   & Generic admin cmds    & -  \\
+\hline
+ -  & 80h - FFh   & Reserved    & - \\
+\hline
+\end{tabular}
+
diff --git a/content.tex b/content.tex
index 32de668..c524fab 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23] Feature bits for the specific device type
 
-\item[24 to 40] Feature bits reserved for extensions to the queue and
+\item[24 to 41] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[41 and above] Feature bits reserved for future extensions.
+\item[42 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
 types. It is RECOMMENDED that devices generate version 4
 UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
 
+\input{admin-virtq.tex}
+
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
 
 We start with an overview of device initialization, then expand on the
@@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   that the driver can reset a queue individually.
   See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
 
+  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
+  the device supports administration virtqueue negotiation.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
-- 
2.21.0



* [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 15:33   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

These new features are parallel to VIRTIO_F_INDIRECT_DESC and
VIRTIO_F_IN_ORDER. Some devices might support these features only for
admin virtqueues, some for both admin virtqueues and request
virtqueues, and some only for non-admin virtqueues. Different
optimizations can be made for each type of virtqueue, thus we separate
these features for the different virtqueue types.
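
For instance, a driver's ring handling could key its ordering logic on
the virtqueue type (a minimal sketch under the bit numbers proposed in
this series; has_feature() is a hypothetical helper):

    #include <stdbool.h>
    #include <stdint.h>

    #define VIRTIO_F_IN_ORDER           35
    #define VIRTIO_F_ADMIN_VQ_IN_ORDER  43

    static bool has_feature(uint64_t features, unsigned int bit)
    {
            return (features >> bit) & 1;
    }

    /* With the split proposed here, in-order semantics are evaluated
     * per virtqueue type rather than once for the whole device. */
    static bool vq_is_in_order(uint64_t features, bool is_admin_vq)
    {
            if (is_admin_vq)
                    return has_feature(features, VIRTIO_F_ADMIN_VQ_IN_ORDER);
            return has_feature(features, VIRTIO_F_IN_ORDER);
    }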

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
 packed-ring.tex | 26 +++++++++++++-------------
 split-ring.tex  | 35 +++++++++++++++++++++++------------
 3 files changed, 75 insertions(+), 33 deletions(-)

diff --git a/content.tex b/content.tex
index c524fab..cc3e648 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23] Feature bits for the specific device type
 
-\item[24 to 41] Feature bits reserved for extensions to the queue and
+\item[24 to 43] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[42 and above] Feature bits reserved for future extensions.
+\item[44 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
 
 Some devices always use descriptors in the same order in which
 they have been made available. These devices can offer the
-VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
-might allow optimizations or simplify driver and/or device code.
+VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
+If negotiated, this knowledge might allow optimizations or
+simplify driver and/or device code.
 
 Each virtqueue can consist of up to 3 parts:
 \begin{itemize}
@@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
 Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
 Virtqueues / The Virtqueue Descriptor Table / Indirect
-Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
+Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excludes descriptors sent via the admin virtqueue.
   \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
   and the \field{avail_event} fields as described in
 \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
@@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   support for the packed virtqueue layout as described in
   \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
   \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
-  that all buffers are used by the device in the same
-  order in which they have been made available.
+  that all buffers are used by the device, excluding buffers used by
+  the admin virtqueue, in the same order in which they have been made
+  available.
   \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
   that memory accesses by the driver and the device are ordered
   in a way described by the platform.
@@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
   the device supports administration virtqueue negotiation.
 
+  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
+  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
+  flag set, as described in \ref{sec:Basic Facilities of a Virtio
+Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
+Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
+Virtqueues / The Virtqueue Descriptor Table / Indirect
+Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers only to descriptors sent via the admin
+  virtqueue, excluding descriptors sent via other virtqueues.
+  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
+  that all buffers are used by the admin virtqueue of the device in
+  the same order in which they have been made available.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
 A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
 
+A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
+VIRTIO_F_ADMIN_VQ.
+
+A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
+VIRTIO_F_ADMIN_VQ.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
@@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 accepted.
 
 If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
-buffers in the same order in which they have been available.
+buffers in the same order in which they have been available. This refers
+to buffers used by virtqueues other than the admin virtqueue.
+
+If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
+buffers in the same order in which they have been available. This refers
+only to buffers used by the admin virtqueue.
 
 A device MAY fail to operate further if
 VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
@@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 and presents a PCI SR-IOV capability structure, otherwise
 it MUST NOT offer VIRTIO_F_SR_IOV.
 
+A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
+VIRTIO_F_ADMIN_VQ.
+
+A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
+VIRTIO_F_ADMIN_VQ.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
diff --git a/packed-ring.tex b/packed-ring.tex
index a9e6c16..ef1dbc2 100644
--- a/packed-ring.tex
+++ b/packed-ring.tex
@@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
 \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
 
 Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
-ring capacity the driver can store a (read-only by the device) table of indirect
-descriptors anywhere in memory, and insert a descriptor in the main
-virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
-a buffer element
-containing this indirect descriptor table; \field{addr} and \field{len}
-refer to the indirect table address and length in bytes,
+of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
+features allow this. To increase ring capacity the driver can store a (read-only
+by the device) table of indirect descriptors anywhere in memory, and insert a
+descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
+that refers to a buffer element containing this indirect descriptor table;
+\field{addr} and \field{len} refer to the indirect table address and length in bytes,
 respectively.
 \begin{lstlisting}
 /* This means the element contains a table of descriptors. */
@@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
 
 Some devices always use descriptors in the same order in which
 they have been made available. These devices can offer the
-VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
-devices to notify the use of a batch of buffers to the driver by
-only writing out a single used descriptor with the Buffer ID
-corresponding to the last descriptor in the batch.
+VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
+If negotiated, this knowledge allows devices to notify the use of
+a batch of buffers to the driver by only writing out a single used
+descriptor with the Buffer ID corresponding to the last descriptor
+in the batch.
 
 The device then skips forward in the ring according to the size of
 the batch. The driver needs to look up the used Buffer ID and
@@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
 
 \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 The driver MUST NOT set the DESC_F_INDIRECT flag unless the
-VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
-set any flags except DESC_F_WRITE within an indirect descriptor.
+VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC feature was negotiated.
+The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
 
 A driver MUST NOT create a descriptor chain longer than allowed
 by the device.
diff --git a/split-ring.tex b/split-ring.tex
index de94038..cd53840 100644
--- a/split-ring.tex
+++ b/split-ring.tex
@@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
 descriptors in ring order: starting from offset 0 in the table,
 and wrapping around at the end of the table.
 
+If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, driver uses admin
+virtqueue descriptors in ring order: starting from offset 0 in the
+table, and wrapping around at the end of the table.
+
 \begin{note}
 The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
 referred to this structure as vring_desc, and the constants as
@@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
 Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
 this implies that loops in the descriptor chain are forbidden!
 
-If VIRTIO_F_IN_ORDER has been negotiated, and when making a
-descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
-$x$ in the table available to the device, driver MUST set
+If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
+and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
+offset $x$ in the table available to the device, driver MUST set
 \field{next} to $0$ for the last descriptor in the table
 (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
+This applies to admin virtqueue descriptors and to descriptors of all other virtqueue types, respectively.
 
 \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 
 Some devices benefit by concurrently dispatching a large number
-of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
+of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
+features allow this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
 ring capacity the driver can store a table of indirect
 descriptors anywhere in memory, and insert a descriptor in main
 virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
@@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
 A single indirect descriptor
 table can include both device-readable and device-writable descriptors.
 
-If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
-use sequential indices, in-order: index 0 followed by index 1
+If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
+for non-admin virtqueues use sequential indices, in-order: index 0 followed
+by index 1 followed by index 2, etc.
+
+If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
+descriptors use sequential indices, in-order: index 0 followed by index 1
 followed by index 2, etc.
 
 \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
-VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
-set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
-one table per descriptor).
+VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC feature was negotiated.
+The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
+descriptor (ie. only one table per descriptor).
 
 A driver MUST NOT create a descriptor chain longer than the Queue Size of
 the device.
@@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
 A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
 in \field{flags}.
 
-If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
-MUST appear sequentially, with \field{next} taking the value
-of 1 for the 1st descriptor, 2 for the 2nd one, etc.
+If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
+indirect descriptors MUST appear sequentially, with \field{next} taking the
+value of 1 for the 1st descriptor, 2 for the 2nd one, etc., for the admin
+virtqueue and all other virtqueue types, respectively.
 
 \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
 The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
-- 
2.21.0



* [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
  2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 18:24   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

Set the admin virtqueue index for the block device when
VIRTIO_F_ADMIN_VQ is negotiated.
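
For example (a hypothetical driver-side helper, assuming the layout
added below):

    /* Request queues keep indices 0..N-1; when VIRTIO_F_ADMIN_VQ is
     * negotiated the admin queue takes index N, where N comes from
     * num_queues if VIRTIO_BLK_F_MQ is negotiated and is 1 otherwise. */
    static unsigned int blk_admin_vq_index(unsigned int num_request_queues)
    {
            return num_request_queues;
    }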

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 content.tex | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/content.tex b/content.tex
index cc3e648..0ae4b68 100644
--- a/content.tex
+++ b/content.tex
@@ -4518,10 +4518,19 @@ \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
   2
 
 \subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
+ If VIRTIO_F_ADMIN_VQ is not negotiated, the request queue layout is as follows:
 \begin{description}
 \item[0] requestq1
 \item[\ldots]
 \item[N-1] requestqN
+\end{description}
+
+ If VIRTIO_F_ADMIN_VQ is negotiated, the queue layout is as follows:
+\begin{description}
+\item[0] requestq1
+\item[\ldots]
+\item[N-1] requestqN
+\item[N] adminq
 \end{description}
 
  N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by
@@ -4590,7 +4599,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
 bits as indicated above.
 
 The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
-the number of queues.
+the number of request queues. This field does not account for the admin virtqueue.
 
 The parameters in the configuration space of the device \field{max_discard_sectors}
 \field{discard_sector_alignment} are expressed in 512-byte units if the
-- 
2.21.0



* [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
                   ` (2 preceding siblings ...)
  2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 17:56   ` Michael S. Tsirkin
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
  2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
  5 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

Set the admin virtqueue index for the network device when
VIRTIO_F_ADMIN_VQ is negotiated.
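
For example (a hypothetical driver-side helper, assuming the layout
added below):

    #include <stdbool.h>

    /* receiveq/transmitq pairs occupy indices 0..2N-1; controlq, if
     * negotiated, takes 2N, and adminq comes right after it. */
    static unsigned int net_admin_vq_index(unsigned int n_queue_pairs,
                                           bool has_ctrl_vq)
    {
            return 2 * n_queue_pairs + (has_ctrl_vq ? 1 : 0);
    }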

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 content.tex | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/content.tex b/content.tex
index 0ae4b68..e9c2383 100644
--- a/content.tex
+++ b/content.tex
@@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
 \item[2(N-1)] receiveqN
 \item[2(N-1)+1] transmitqN
 \item[2N] controlq
+\item[2N + 1] adminq (or \textbf{2N} if VIRTIO_NET_F_CTRL_VQ is not set)
 \end{description}
 
  N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
@@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
 
  controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
 
+ adminq only exists if VIRTIO_F_ADMIN_VQ set.
+
 \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
 
 \begin{description}
-- 
2.21.0



* [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
                   ` (3 preceding siblings ...)
  2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
@ 2022-01-13 14:51 ` Max Gurtovoy
  2022-01-13 18:20   ` Michael S. Tsirkin
  2022-01-18 10:38   ` Michael S. Tsirkin
  2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
  5 siblings, 2 replies; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 14:51 UTC (permalink / raw)
  To: virtio-comment, mst, cohuck, virtio-dev, jasowang
  Cc: parav, shahafs, oren, stefanha, Max Gurtovoy

A typical cloud provider SR-IOV use case is to create many VFs for
use by guest VMs. The VFs may not be assigned to a VM until a user
requests a VM of a certain size, e.g., number of CPUs. A VF may need
MSI-X vectors proportional to the number of CPUs in the VM, but there is
no standard way today in the spec to change the number of MSI-X vectors
supported by a VF, although some operating systems already support this.

Introduce new feature bits for a generic PCI virtualization management
mechanism and a specific mechanism to manage the MSI-X vector assignment
of virtual/managed functions through the parent virtio device's admin
virtqueue. For now, virtio supports only PCI virtual function
virtualization, thus the virtualization manager device will be the PF
and the managed device will be the VF.
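
As an illustration, provisioning MSI-X vectors for a VF could look
roughly as follows (a minimal sketch; the struct mirrors the
command-specific data added below, while the chosen values and the
host-endian field writes are hypothetical):

    #include <stdint.h>
    #include <string.h>

    struct virtio_admin_pci_virt_property_set_data {
            uint16_t vf_number;       /* le16: target VF, 1..NumVFs */
            uint64_t property_mask;   /* le64: bit 0 selects msix_count */
            uint16_t msix_count;      /* le16: requested MSI-X vectors */
    } __attribute__((packed));

    /* Provision 4 MSI-X vectors for VF 1 before the VF is initialized
     * and handed to a VM (fields shown as host-endian for brevity). */
    static void fill_msix_set(struct virtio_admin_pci_virt_property_set_data *d)
    {
            memset(d, 0, sizeof(*d));
            d->vf_number = 1;
            d->property_mask = 1ULL << 0;   /* modify msix_count only */
            d->msix_count = 4;
    }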

Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---
 admin-virtq.tex | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
 content.tex     | 29 ++++++++++++++-
 2 files changed, 124 insertions(+), 3 deletions(-)

diff --git a/admin-virtq.tex b/admin-virtq.tex
index ad20f89..4ee8a32 100644
--- a/admin-virtq.tex
+++ b/admin-virtq.tex
@@ -41,9 +41,105 @@ \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin
 \hline
 Opcode (bits) & Opcode (hex) & Command & M/O \\
 \hline \hline
- -  & 00h - 7Fh   & Generic admin cmds    & -  \\
+ 00000000b  & 00h   & VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY    & O  \\
+\hline
+ 00000001b  & 01h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET    & O  \\
+\hline
+ 00000010b  & 02h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET    & O  \\
+\hline
+ -  & 03h - 7Fh   & Generic admin cmds    & -  \\
 \hline
  -  & 80h - FFh   & Reserved    & - \\
 \hline
 \end{tabular}
 
+\subsection{VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}
+
+The VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY command has no command-specific data set by the driver.
+This command, upon success, returns a data buffer that describes information about PCI virtualization
+management attributes. This information is of the following form:
+\begin{lstlisting}
+struct virtio_admin_pci_virt_mgmt_attr_identify_result {
+        /* For compatibility - indicates which of the below fields are valid (1 means valid):
+         * Bit 0x0 - total_free_vfs_msix_count
+         * Bit 0x1 - per_vf_max_msix_count
+         * Bits 0x2 - 0x3F - reserved for future fields
+         */
+        le64 mask;
+        /* Number of free msix in the global msix pool for VFs */
+        le32 total_free_vfs_msix_count;
+        /* Max number of msix vectors that can be assigned for a single VF */
+        le16 per_vf_max_msix_count;
+};
+\end{lstlisting}
+
+\subsection{VIRTIO ADMIN PCI VIRT PROPERTY SET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY SET command}
+
+The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command is used to modify the values of VF properties.
+The command-specific data set by the driver is of the following form:
+\begin{lstlisting}
+struct virtio_admin_pci_virt_property_set_data {
+        /* The virtual function number */
+        le16 vf_number;
+        /* For compatibility - indicates which of the below properties should be
+         * modified (1 means that field should be modified):
+         * Bit 0x0 - msix_count
+         * Bits 0x1 - 0x3F - reserved for future fields
+         */
+        le64 property_mask;
+        /* The amount of MSI-X vectors */
+        le16 msix_count;
+};
+\end{lstlisting}
+
+\begin{note}
+{vf_number cannot be greater than the NumVFs value as defined in the PCI specification
+or smaller than 1. Otherwise, an error status will be returned.}
+\end{note}
+
+This command has no command-specific result set by the device. Upon success, the device guarantees
+that all the requested properties were modified to the given values. Otherwise, an error will be returned.
+
+\begin{note}
+{Before setting the msix_count property, the virtual/managed device (VF) shall be uninitialized and not in use by the driver.}
+\end{note}
+
+\subsection{VIRTIO ADMIN PCI VIRT PROPERTY GET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY GET command}
+
+The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET command is used to obtain the values of VF properties.
+The command-specific data set by the driver is of the following form:
+\begin{lstlisting}
+struct virtio_admin_pci_virt_property_get_data {
+        /* The virtual function number */
+        le16 vf_number;
+        /* For compatibility - indicates which of the below properties should be
+         * queried (1 means that field should be queried):
+         * Bit 0x0 - msix_count
+         * Bits 0x1 - 0x3F - reserved for future fields
+         */
+        le64 property_mask;
+        /* The amount of MSI-X vectors */
+        le16 msix_count;
+};
+\end{lstlisting}
+
+\begin{note}
+{vf_number cannot be greater than the NumVFs value as defined in the PCI specification
+or smaller than 1. Otherwise, an error status will be returned.}
+\end{note}
+
+This command, upon success, returns a data buffer that describes the properties that were requested
+and their values for the subject virtio VF device according to the given vf_number.
+This information is of the following form:
+\begin{lstlisting}
+struct virtio_admin_pci_virt_property_get_result {
+        /* For compatibility - indicates which of the below fields were returned
+         * (1 means that field was returned):
+         * Bit 0x0 - msix_count
+         * Bits 0x1 - 0x3F - reserved for future fields
+         */
+        le64 property_mask;
+        /* The amount of MSI-X vectors */
+        le16 msix_count;
+};
+\end{lstlisting}
diff --git a/content.tex b/content.tex
index e9c2383..64678f0 100644
--- a/content.tex
+++ b/content.tex
@@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
 \begin{description}
 \item[0 to 23] Feature bits for the specific device type
 
-\item[24 to 43] Feature bits reserved for extensions to the queue and
+\item[24 to 45] Feature bits reserved for extensions to the queue and
   feature negotiation mechanisms
 
-\item[44 and above] Feature bits reserved for future extensions.
+\item[46 and above] Feature bits reserved for future extensions.
 \end{description}
 
 \begin{note}
@@ -6878,6 +6878,17 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   that all buffers are used by the admin virtqueue of the device in
   the same order in which they have been made available.
 
+  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates
+  that the device can manage PCI related capabilities for its managed PCI VF
+  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
+  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
+  admin commands. This feature can be supported only by PCI devices.
+
+  \item[VIRTIO_F_ADMIN_MSIX_MGMT (45)] This feature indicates
+  that the device supports management of the MSI-X vectors for its
+  managed PCI VF devices using VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and
+  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET admin commands.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
@@ -6920,6 +6931,14 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
 VIRTIO_F_ADMIN_VQ.
 
+A driver MAY accept VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it accepts
+VIRTIO_F_ADMIN_VQ.
+
+A driver MAY accept VIRTIO_F_ADMIN_MSIX_MGMT only if it accepts
+VIRTIO_F_ADMIN_VQ and VIRTIO_F_ADMIN_PCI_VIRT_MANAGER. Currently only
+MSI-X management of PCI virtual functions is supported, so the driver
+MUST NOT negotiate VIRTIO_F_ADMIN_MSIX_MGMT if VIRTIO_F_SR_IOV is not negotiated.
+
 \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
 
 A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
@@ -6960,6 +6979,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
 VIRTIO_F_ADMIN_VQ.
 
+A PCI device MAY offer VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it
+offers VIRTIO_F_ADMIN_VQ.
+
+A PCI device MAY offer VIRTIO_F_ADMIN_MSIX_MGMT only if it
+offers VIRTIO_F_ADMIN_VQ, VIRTIO_F_ADMIN_PCI_VIRT_MANAGER and VIRTIO_F_SR_IOV.
+
 \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
 
 Transitional devices MAY offer the following:
-- 
2.21.0



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
@ 2022-01-13 15:33   ` Michael S. Tsirkin
  2022-01-13 17:07     ` Max Gurtovoy
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 15:33 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> VIRTIO_F_IN_ORDER. Some devices might support these features only for
> admin virtqueues and some might support them for both admin virtqueues
> and request virtqueues or only for non-admin virtqueues. Some
> optimization can be made for each type of virtqueue, thus we separate
> these features for the different virtqueue types.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

That seems vague as motivation.
Why do we need to optimize admin queues? Aren't they
fundamentally a control path feature?
Why would we want to special-case these features specifically?
Should we allow control of features per VQ generally?


> ---
>  content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
>  packed-ring.tex | 26 +++++++++++++-------------
>  split-ring.tex  | 35 +++++++++++++++++++++++------------
>  3 files changed, 75 insertions(+), 33 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index c524fab..cc3e648 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 41] Feature bits reserved for extensions to the queue and
> +\item[24 to 43] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[42 and above] Feature bits reserved for future extensions.
> +\item[44 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
>  
>  Some devices always use descriptors in the same order in which
>  they have been made available. These devices can offer the
> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> -might allow optimizations or simplify driver and/or device code.
> +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> +If negotiated, this knowledge might allow optimizations or
> +simplify driver and/or device code.
>  
>  Each virtqueue can consist of up to 3 parts:
>  \begin{itemize}
> @@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
>  Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
>  Virtqueues / The Virtqueue Descriptor Table / Indirect
> -Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
> +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excluding the descriptors sent via the admin virtqueue.
>    \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
>    and the \field{avail_event} fields as described in
>  \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
> @@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    support for the packed virtqueue layout as described in
>    \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
>    \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
> -  that all buffers are used by the device in the same
> -  order in which they have been made available.
> +  that all buffers are used by the device, excluding buffers used by
> +  the admin virtqueue, in the same order in which they have been made
> +  available.
>    \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
>    that memory accesses by the driver and the device are ordered
>    in a way described by the platform.
> @@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>    the device supports administration virtqueue negotiation.
>  
> +  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
> +  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> +  flag set, as described in \ref{sec:Basic Facilities of a Virtio
> +Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> +Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
> +Virtqueues / The Virtqueue Descriptor Table / Indirect
> +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers to descriptors sent via the admin
> +  virtqueue and excluding the descriptors that sent via other virtqueues.
> +  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
> +  that all buffers are used by the admin virtqueue of the device in
> +  the same order in which they have been made available.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> @@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
>  A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
>  
> +A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
> +A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
>  \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>  
>  A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> @@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  accepted.
>  
>  If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> -buffers in the same order in which they have been available.
> +buffers in the same order in which they have been available. This refers
> +to buffers that are used by virtqueue that is not the admin virtqueue.
> +
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
> +buffers in the same order in which they have been available. This refers
> +only for buffers that are used by the admin virtqueue.
>  
>  A device MAY fail to operate further if
>  VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
> @@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  and presents a PCI SR-IOV capability structure, otherwise
>  it MUST NOT offer VIRTIO_F_SR_IOV.
>  
> +A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
> +VIRTIO_F_ADMIN_VQ.
> +
> +A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
> +VIRTIO_F_ADMIN_VQ.
> +
>  \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>  
>  Transitional devices MAY offer the following:
> diff --git a/packed-ring.tex b/packed-ring.tex
> index a9e6c16..ef1dbc2 100644
> --- a/packed-ring.tex
> +++ b/packed-ring.tex
> @@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
>  \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
>  
>  Some devices benefit by concurrently dispatching a large number
> -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
> -ring capacity the driver can store a (read-only by the device) table of indirect
> -descriptors anywhere in memory, and insert a descriptor in the main
> -virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
> -a buffer element
> -containing this indirect descriptor table; \field{addr} and \field{len}
> -refer to the indirect table address and length in bytes,
> +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> +features allows this. To increase ring capacity the driver can store a (read-only
> +by the device) table of indirect descriptors anywhere in memory, and insert a
> +descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
> +that refers to a buffer element containing this indirect descriptor table;
> +\field{addr} and \field{len} refer to the indirect table address and length in bytes,
>  respectively.
>  \begin{lstlisting}
>  /* This means the element contains a table of descriptors. */
> @@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
>  
>  Some devices always use descriptors in the same order in which
>  they have been made available. These devices can offer the
> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> -devices to notify the use of a batch of buffers to the driver by
> -only writing out a single used descriptor with the Buffer ID
> -corresponding to the last descriptor in the batch.
> +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> +If negotiated, this knowledge allows devices to notify the use of
> +a batch of buffers to the driver by only writing out a single used
> +descriptor with the Buffer ID corresponding to the last descriptor
> +in the batch.
>  
>  The device then skips forward in the ring according to the size of
>  the batch. The driver needs to look up the used Buffer ID and
> @@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
>  
>  \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  The driver MUST NOT set the DESC_F_INDIRECT flag unless the
> -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> -set any flags except DESC_F_WRITE within an indirect descriptor.
> +VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> +The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
>  
>  A driver MUST NOT create a descriptor chain longer than allowed
>  by the device.
> diff --git a/split-ring.tex b/split-ring.tex
> index de94038..cd53840 100644
> --- a/split-ring.tex
> +++ b/split-ring.tex
> @@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
>  descriptors in ring order: starting from offset 0 in the table,
>  and wrapping around at the end of the table.
>  
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, driver uses
> +descriptors in admin virtqueue ring order: starting from offset 0 in the
> +table, and wrapping around at the end of the table.
> +
>  \begin{note}
>  The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
>  referred to this structure as vring_desc, and the constants as
> @@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
>  Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
>  this implies that loops in the descriptor chain are forbidden!
>  
> -If VIRTIO_F_IN_ORDER has been negotiated, and when making a
> -descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
> -$x$ in the table available to the device, driver MUST set
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> +and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
> +offset $x$ in the table available to the device, driver MUST set
>  \field{next} to $0$ for the last descriptor in the table
>  (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
> +This refers to admin virtqueue descriptors and rest other virtqueues types descriptors respectively.
>  
>  \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  
>  Some devices benefit by concurrently dispatching a large number
> -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
> +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> +features allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
>  ring capacity the driver can store a table of indirect
>  descriptors anywhere in memory, and insert a descriptor in main
>  virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
> @@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>  A single indirect descriptor
>  table can include both device-readable and device-writable descriptors.
>  
> -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> -use sequential indices, in-order: index 0 followed by index 1
> +If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors,
> +for non admin virtqueue, use sequential indices, in-order: index 0 followed
> +by index 1 followed by index 2, etc.
> +
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
> +descriptors use sequential indices, in-order: index 0 followed by index 1
>  followed by index 2, etc.
>  
>  \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
> -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> -set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
> -one table per descriptor).
> +VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> +The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
> +descriptor (ie. only one table per descriptor).
>  
>  A driver MUST NOT create a descriptor chain longer than the Queue Size of
>  the device.
> @@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>  A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
>  in \field{flags}.
>  
> -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> -MUST appear sequentially, with \field{next} taking the value
> -of 1 for the 1st descriptor, 2 for the 2nd one, etc.
> +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> +indirect descriptors MUST appear sequentially, with \field{next} taking the
> +value of 1 for the 1st descriptor, 2 for the 2nd one, etc for admin virtqueue
> +and rest other virtqueues types respectively.
>  
>  \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>  The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
> -- 
> 2.21.0



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 15:33   ` Michael S. Tsirkin
@ 2022-01-13 17:07     ` Max Gurtovoy
  2022-01-13 17:25       ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-13 17:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha


On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
>> These new features are parallel to VIRTIO_F_INDIRECT_DESC and
>> VIRTIO_F_IN_ORDER. Some devices might support these features only for
>> admin virtqueues and some might support them for both admin virtqueues
>> and request virtqueues or only for non-admin virtqueues. Some
>> optimization can be made for each type of virtqueue, thus we separate
>> these features for the different virtqueue types.
>>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> That seems vague as motivation.
> Why do we need to optimize admin queues? Aren't they
> fundamentally a control path feature?
> Why would we want to special-case these features specifically?
> Should we allow control of features per VQ generally?

We would like to allow executing admin commands out of order and IO
requests in order, for efficiency.

And also the other way around.

IO cmds and admin cmds have different considerations in many cases.

>
>
>> ---
>>   content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
>>   packed-ring.tex | 26 +++++++++++++-------------
>>   split-ring.tex  | 35 +++++++++++++++++++++++------------
>>   3 files changed, 75 insertions(+), 33 deletions(-)
>>
>> diff --git a/content.tex b/content.tex
>> index c524fab..cc3e648 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>>   \begin{description}
>>   \item[0 to 23] Feature bits for the specific device type
>>   
>> -\item[24 to 41] Feature bits reserved for extensions to the queue and
>> +\item[24 to 43] Feature bits reserved for extensions to the queue and
>>     feature negotiation mechanisms
>>   
>> -\item[42 and above] Feature bits reserved for future extensions.
>> +\item[44 and above] Feature bits reserved for future extensions.
>>   \end{description}
>>   
>>   \begin{note}
>> @@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
>>   
>>   Some devices always use descriptors in the same order in which
>>   they have been made available. These devices can offer the
>> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
>> -might allow optimizations or simplify driver and/or device code.
>> +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
>> +If negotiated, this knowledge might allow optimizations or
>> +simplify driver and/or device code.
>>   
>>   Each virtqueue can consist of up to 3 parts:
>>   \begin{itemize}
>> @@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
>>   Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
>>   Virtqueues / The Virtqueue Descriptor Table / Indirect
>> -Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
>> +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excluding the descriptors sent via the admin virtqueue.
>>     \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
>>     and the \field{avail_event} fields as described in
>>   \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
>> @@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     support for the packed virtqueue layout as described in
>>     \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
>>     \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
>> -  that all buffers are used by the device in the same
>> -  order in which they have been made available.
>> +  that all buffers are used by the device, excluding buffers used by
>> +  the admin virtqueue, in the same order in which they have been made
>> +  available.
>>     \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
>>     that memory accesses by the driver and the device are ordered
>>     in a way described by the platform.
>> @@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>>     the device supports administration virtqueue negotiation.
>>   
>> +  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
>> +  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
>> +  flag set, as described in \ref{sec:Basic Facilities of a Virtio
>> +Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
>> +Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
>> +Virtqueues / The Virtqueue Descriptor Table / Indirect
>> +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers to descriptors sent via the admin
>> +  virtqueue and excludes descriptors sent via other virtqueues.
>> +  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
>> +  that all buffers are used by the admin virtqueue of the device in
>> +  the same order in which they have been made available.
>> +
>>   \end{description}
>>   
>>   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>> @@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   
>>   A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
>>   
>> +A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
>> +VIRTIO_F_ADMIN_VQ.
>> +
>> +A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
>> +VIRTIO_F_ADMIN_VQ.
>> +
>>   \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>   
>>   A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
>> @@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   accepted.
>>   
>>   If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
>> -buffers in the same order in which they have been available.
>> +buffers in the same order in which they have been available. This refers
>> +to buffers used by any virtqueue other than the admin virtqueue.
>> +
>> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
>> +buffers in the same order in which they have been available. This refers
>> +only to buffers used by the admin virtqueue.
>>   
>>   A device MAY fail to operate further if
>>   VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
>> @@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   and presents a PCI SR-IOV capability structure, otherwise
>>   it MUST NOT offer VIRTIO_F_SR_IOV.
>>   
>> +A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
>> +VIRTIO_F_ADMIN_VQ.
>> +
>> +A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
>> +VIRTIO_F_ADMIN_VQ.
>> +
>>   \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>>   
>>   Transitional devices MAY offer the following:
>> diff --git a/packed-ring.tex b/packed-ring.tex
>> index a9e6c16..ef1dbc2 100644
>> --- a/packed-ring.tex
>> +++ b/packed-ring.tex
>> @@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
>>   \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
>>   
>>   Some devices benefit by concurrently dispatching a large number
>> -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
>> -ring capacity the driver can store a (read-only by the device) table of indirect
>> -descriptors anywhere in memory, and insert a descriptor in the main
>> -virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
>> -a buffer element
>> -containing this indirect descriptor table; \field{addr} and \field{len}
>> -refer to the indirect table address and length in bytes,
>> +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
>> +features allow this. To increase ring capacity the driver can store a (read-only
>> +by the device) table of indirect descriptors anywhere in memory, and insert a
>> +descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
>> +that refers to a buffer element containing this indirect descriptor table;
>> +\field{addr} and \field{len} refer to the indirect table address and length in bytes,
>>   respectively.
>>   \begin{lstlisting}
>>   /* This means the element contains a table of descriptors. */
>> @@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
>>   
>>   Some devices always use descriptors in the same order in which
>>   they have been made available. These devices can offer the
>> -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
>> -devices to notify the use of a batch of buffers to the driver by
>> -only writing out a single used descriptor with the Buffer ID
>> -corresponding to the last descriptor in the batch.
>> +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
>> +If negotiated, this knowledge allows devices to notify the use of
>> +a batch of buffers to the driver by only writing out a single used
>> +descriptor with the Buffer ID corresponding to the last descriptor
>> +in the batch.
>>   
>>   The device then skips forward in the ring according to the size of
>>   the batch. The driver needs to look up the used Buffer ID and
>> @@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
>>   
>>   \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>>   The driver MUST NOT set the DESC_F_INDIRECT flag unless the
>> -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
>> -set any flags except DESC_F_WRITE within an indirect descriptor.
>> +VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
>> +The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
>>   
>>   A driver MUST NOT create a descriptor chain longer than allowed
>>   by the device.
>> diff --git a/split-ring.tex b/split-ring.tex
>> index de94038..cd53840 100644
>> --- a/split-ring.tex
>> +++ b/split-ring.tex
>> @@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
>>   descriptors in ring order: starting from offset 0 in the table,
>>   and wrapping around at the end of the table.
>>   
>> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, the driver uses
>> +descriptors in admin virtqueue ring order: starting from offset 0 in the
>> +table, and wrapping around at the end of the table.
>> +
>>   \begin{note}
>>   The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
>>   referred to this structure as vring_desc, and the constants as
>> @@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
>>   Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
>>   this implies that loops in the descriptor chain are forbidden!
>>   
>> -If VIRTIO_F_IN_ORDER has been negotiated, and when making a
>> -descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
>> -$x$ in the table available to the device, driver MUST set
>> +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
>> +and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
>> +offset $x$ in the table available to the device, driver MUST set
>>   \field{next} to $0$ for the last descriptor in the table
>>   (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
>> +This applies to admin virtqueue descriptors and to descriptors of all other virtqueue types, respectively.
>>   
>>   \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>>   
>>   Some devices benefit by concurrently dispatching a large number
>> -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
>> +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
>> +features allow this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
>>   ring capacity the driver can store a table of indirect
>>   descriptors anywhere in memory, and insert a descriptor in main
>>   virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
>> @@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>>   A single indirect descriptor
>>   table can include both device-readable and device-writable descriptors.
>>   
>> -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
>> -use sequential indices, in-order: index 0 followed by index 1
>> +If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors,
>> +for non-admin virtqueues, use sequential indices, in-order: index 0 followed
>> +by index 1 followed by index 2, etc.
>> +
>> +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
>> +descriptors use sequential indices, in-order: index 0 followed by index 1
>>   followed by index 2, etc.
>>   
>>   \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>>   The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
>> -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
>> -set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
>> -one table per descriptor).
>> +VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
>> +The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
>> +descriptor (ie. only one table per descriptor).
>>   
>>   A driver MUST NOT create a descriptor chain longer than the Queue Size of
>>   the device.
>> @@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
>>   A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
>>   in \field{flags}.
>>   
>> -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
>> -MUST appear sequentially, with \field{next} taking the value
>> -of 1 for the 1st descriptor, 2 for the 2nd one, etc.
>> +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
>> +indirect descriptors MUST appear sequentially, with \field{next} taking the
>> +value of 1 for the 1st descriptor, 2 for the 2nd one, etc., for the admin
>> +virtqueue and all other virtqueue types, respectively.
>>   
>>   \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
>>   The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
>> -- 
>> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 17:07     ` Max Gurtovoy
@ 2022-01-13 17:25       ` Michael S. Tsirkin
  2022-01-17 13:59         ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 17:25 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 07:07:53PM +0200, Max Gurtovoy wrote:
> 
> On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> > > These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> > > VIRTIO_F_IN_ORDER. Some devices might support these features only for
> > > admin virtqueues and some might support them for both admin virtqueues
> > > and request virtqueues or only for non-admin virtqueues. Some
> > > optimization can be made for each type of virtqueue, thus we separate
> > > these features for the different virtqueue types.
> > > 
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > That seems vague as motivation.
> > Why do we need to optimize admin queues? Aren't they
> > fundamentally a control path feature?
> > Why would we want to special-case these features specifically?
> > Should we allow control of features per VQ generally?
> 
> We would like to allow executing admin commands out of order and IO
> requests in order, for efficiency.

It's a control queue. Why do we worry?


> 
> And also the other way around.

what exactly does this mean?

> IO cmds and admin cmds have different considerations in many cases.

That's still pretty vague. So do other types of VQ, such as RX/TX.

E.g. I can see how a hardware vendor might want to avoid supporting
indirect with RX for virtio net with mergeable buffers, but still
support it for TX.


I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
I think you want to reorder admin commands dealing with
unrelated VFs but keep io vqs in order for speed.
Just guessing, you should spell the real motivation out.
However, I think a better way to do that is with finalizing the
VIRTIO_F_PARTIAL_ORDER proposal from august.
Pls review and let me know. If there's finally a use for it
I'll prioritize finalizing that idea.
Don't see much point in tweaking INDIRECT at all.



> > 
> > 
> > > ---
> > >   content.tex     | 47 +++++++++++++++++++++++++++++++++++++++--------
> > >   packed-ring.tex | 26 +++++++++++++-------------
> > >   split-ring.tex  | 35 +++++++++++++++++++++++------------
> > >   3 files changed, 75 insertions(+), 33 deletions(-)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index c524fab..cc3e648 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
> > >   \begin{description}
> > >   \item[0 to 23] Feature bits for the specific device type
> > > -\item[24 to 41] Feature bits reserved for extensions to the queue and
> > > +\item[24 to 43] Feature bits reserved for extensions to the queue and
> > >     feature negotiation mechanisms
> > > -\item[42 and above] Feature bits reserved for future extensions.
> > > +\item[44 and above] Feature bits reserved for future extensions.
> > >   \end{description}
> > >   \begin{note}
> > > @@ -318,8 +318,9 @@ \section{Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Virtqueues}
> > >   Some devices always use descriptors in the same order in which
> > >   they have been made available. These devices can offer the
> > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge
> > > -might allow optimizations or simplify driver and/or device code.
> > > +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> > > +If negotiated, this knowledge might allow optimizations or
> > > +simplify driver and/or device code.
> > >   Each virtqueue can consist of up to 3 parts:
> > >   \begin{itemize}
> > > @@ -6768,7 +6769,7 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> > >   Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
> > >   Virtqueues / The Virtqueue Descriptor Table / Indirect
> > > -Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}.
> > > +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This excludes descriptors sent via the admin virtqueue.
> > >     \item[VIRTIO_F_EVENT_IDX(29)] This feature enables the \field{used_event}
> > >     and the \field{avail_event} fields as described in
> > >   \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Used Buffer Notification Suppression}, \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Used Ring} and \ref{sec:Packed Virtqueues / Driver and Device Event Suppression}.
> > > @@ -6800,8 +6801,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >     support for the packed virtqueue layout as described in
> > >     \ref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}~\nameref{sec:Basic Facilities of a Virtio Device / Packed Virtqueues}.
> > >     \item[VIRTIO_F_IN_ORDER(35)] This feature indicates
> > > -  that all buffers are used by the device in the same
> > > -  order in which they have been made available.
> > > +  that all buffers are used by the device, excluding buffers used by
> > > +  the admin virtqueue, in the same order in which they have been made
> > > +  available.
> > >     \item[VIRTIO_F_ORDER_PLATFORM(36)] This feature indicates
> > >     that memory accesses by the driver and the device are ordered
> > >     in a way described by the platform.
> > > @@ -6852,6 +6854,18 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >     \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
> > >     the device supports administration virtqueue negotiation.
> > > +  \item[VIRTIO_F_ADMIN_VQ_INDIRECT_DESC (42)] Negotiating this feature
> > > +  indicates that the driver can use descriptors with the VIRTQ_DESC_F_INDIRECT
> > > +  flag set, as described in \ref{sec:Basic Facilities of a Virtio
> > > +Device / Virtqueues / The Virtqueue Descriptor Table / Indirect
> > > +Descriptors}~\nameref{sec:Basic Facilities of a Virtio Device /
> > > +Virtqueues / The Virtqueue Descriptor Table / Indirect
> > > +Descriptors} and \ref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}~\nameref{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}. This refers to descriptors sent via the admin
> > > +  virtqueue and excludes descriptors sent via other virtqueues.
> > > +  \item[VIRTIO_F_ADMIN_VQ_IN_ORDER (43)] This feature indicates
> > > +  that all buffers are used by the admin virtqueue of the device in
> > > +  the same order in which they have been made available.
> > > +
> > >   \end{description}
> > >   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > @@ -6888,6 +6902,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   A driver SHOULD accept VIRTIO_F_NOTIF_CONFIG_DATA if it is offered.
> > > +A driver MAY accept VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it accepts
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > > +A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > >   \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > >   A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> > > @@ -6902,7 +6922,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   accepted.
> > >   If VIRTIO_F_IN_ORDER has been negotiated, a device MUST use
> > > -buffers in the same order in which they have been available.
> > > +buffers in the same order in which they have been available. This refers
> > > +to buffers used by any virtqueue other than the admin virtqueue.
> > > +
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, a device MUST use
> > > +buffers in the same order in which they have been available. This refers
> > > +only to buffers used by the admin virtqueue.
> > >   A device MAY fail to operate further if
> > >   VIRTIO_F_ORDER_PLATFORM is offered but not accepted.
> > > @@ -6917,6 +6942,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   and presents a PCI SR-IOV capability structure, otherwise
> > >   it MUST NOT offer VIRTIO_F_SR_IOV.
> > > +A device MAY offer VIRTIO_F_ADMIN_VQ_INDIRECT_DESC only if it offers
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > > +A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
> > > +VIRTIO_F_ADMIN_VQ.
> > > +
> > >   \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
> > >   Transitional devices MAY offer the following:
> > > diff --git a/packed-ring.tex b/packed-ring.tex
> > > index a9e6c16..ef1dbc2 100644
> > > --- a/packed-ring.tex
> > > +++ b/packed-ring.tex
> > > @@ -240,13 +240,12 @@ \subsection{Indirect Flag: Scatter-Gather Support}
> > >   \label{sec:Packed Virtqueues / Indirect Flag: Scatter-Gather Support}
> > >   Some devices benefit by concurrently dispatching a large number
> > > -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this. To increase
> > > -ring capacity the driver can store a (read-only by the device) table of indirect
> > > -descriptors anywhere in memory, and insert a descriptor in the main
> > > -virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on) that refers to
> > > -a buffer element
> > > -containing this indirect descriptor table; \field{addr} and \field{len}
> > > -refer to the indirect table address and length in bytes,
> > > +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> > > +features allow this. To increase ring capacity the driver can store a (read-only
> > > +by the device) table of indirect descriptors anywhere in memory, and insert a
> > > +descriptor in the main virtqueue (with \field{Flags} bit VIRTQ_DESC_F_INDIRECT on)
> > > +that refers to a buffer element containing this indirect descriptor table;
> > > +\field{addr} and \field{len} refer to the indirect table address and length in bytes,
> > >   respectively.
> > >   \begin{lstlisting}
> > >   /* This means the element contains a table of descriptors. */
> > > @@ -279,10 +278,11 @@ \subsection{In-order use of descriptors}
> > >   Some devices always use descriptors in the same order in which
> > >   they have been made available. These devices can offer the
> > > -VIRTIO_F_IN_ORDER feature. If negotiated, this knowledge allows
> > > -devices to notify the use of a batch of buffers to the driver by
> > > -only writing out a single used descriptor with the Buffer ID
> > > -corresponding to the last descriptor in the batch.
> > > +VIRTIO_F_IN_ORDER and/or VIRTIO_F_ADMIN_VQ_IN_ORDER features.
> > > +If negotiated, this knowledge allows devices to notify the use of
> > > +a batch of buffers to the driver by only writing out a single used
> > > +descriptor with the Buffer ID corresponding to the last descriptor
> > > +in the batch.
> > >   The device then skips forward in the ring according to the size of
> > >   the batch. The driver needs to look up the used Buffer ID and
> > > @@ -500,8 +500,8 @@ \subsection{Event Suppression Structure Format}\label{sec:Basic
> > >   \drivernormative{\subsection}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Packed Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   The driver MUST NOT set the DESC_F_INDIRECT flag unless the
> > > -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> > > -set any flags except DESC_F_WRITE within an indirect descriptor.
> > > +VIRTIO_F_INDIRECT_DESC or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> > > +The driver MUST NOT set any flags except DESC_F_WRITE within an indirect descriptor.
> > >   A driver MUST NOT create a descriptor chain longer than allowed
> > >   by the device.
> > > diff --git a/split-ring.tex b/split-ring.tex
> > > index de94038..cd53840 100644
> > > --- a/split-ring.tex
> > > +++ b/split-ring.tex
> > > @@ -208,6 +208,10 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
> > >   descriptors in ring order: starting from offset 0 in the table,
> > >   and wrapping around at the end of the table.
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, the driver uses
> > > +descriptors in admin virtqueue ring order: starting from offset 0 in the
> > > +table, and wrapping around at the end of the table.
> > > +
> > >   \begin{note}
> > >   The legacy \hyperref[intro:Virtio PCI Draft]{[Virtio PCI Draft]}
> > >   referred to this structure as vring_desc, and the constants as
> > > @@ -223,16 +227,18 @@ \subsection{The Virtqueue Descriptor Table}\label{sec:Basic Facilities of a Virt
> > >   Drivers MUST NOT add a descriptor chain longer than $2^{32}$ bytes in total;
> > >   this implies that loops in the descriptor chain are forbidden!
> > > -If VIRTIO_F_IN_ORDER has been negotiated, and when making a
> > > -descriptor with VRING_DESC_F_NEXT set in \field{flags} at offset
> > > -$x$ in the table available to the device, driver MUST set
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> > > +and when making a descriptor with VRING_DESC_F_NEXT set in \field{flags} at
> > > +offset $x$ in the table available to the device, driver MUST set
> > >   \field{next} to $0$ for the last descriptor in the table
> > >   (where $x = queue\_size - 1$) and to $x + 1$ for the rest of the descriptors.
> > > +This applies to admin virtqueue descriptors and to descriptors of all other virtqueue types, respectively.
> > >   \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   Some devices benefit by concurrently dispatching a large number
> > > -of large requests. The VIRTIO_F_INDIRECT_DESC feature allows this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
> > > +of large requests. The VIRTIO_F_INDIRECT_DESC and VIRTIO_F_ADMIN_VQ_INDIRECT_DESC
> > > +features allow this (see \ref{sec:virtio-queue.h}~\nameref{sec:virtio-queue.h}). To increase
> > >   ring capacity the driver can store a table of indirect
> > >   descriptors anywhere in memory, and insert a descriptor in main
> > >   virtqueue (with \field{flags}\&VIRTQ_DESC_F_INDIRECT on) that refers to memory buffer
> > > @@ -258,15 +264,19 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
> > >   A single indirect descriptor
> > >   table can include both device-readable and device-writable descriptors.
> > > -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> > > -use sequential indices, in-order: index 0 followed by index 1
> > > +If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors,
> > > +for non-admin virtqueues, use sequential indices, in-order: index 0 followed
> > > +by index 1 followed by index 2, etc.
> > > +
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER has been negotiated, admin virtqueue indirect
> > > +descriptors use sequential indices, in-order: index 0 followed by index 1
> > >   followed by index 2, etc.
> > >   \drivernormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag unless the
> > > -VIRTIO_F_INDIRECT_DESC feature was negotiated.   The driver MUST NOT
> > > -set the VIRTQ_DESC_F_INDIRECT flag within an indirect descriptor (ie. only
> > > -one table per descriptor).
> > > +VIRTIO_F_INDIRECT_DESC and/or VIRTIO_F_ADMIN_VQ_INDIRECT_DESC features were negotiated.
> > > +The driver MUST NOT set the VIRTQ_DESC_F_INDIRECT flag within an indirect
> > > +descriptor (ie. only one table per descriptor).
> > >   A driver MUST NOT create a descriptor chain longer than the Queue Size of
> > >   the device.
> > > @@ -274,9 +284,10 @@ \subsubsection{Indirect Descriptors}\label{sec:Basic Facilities of a Virtio Devi
> > >   A driver MUST NOT set both VIRTQ_DESC_F_INDIRECT and VIRTQ_DESC_F_NEXT
> > >   in \field{flags}.
> > > -If VIRTIO_F_IN_ORDER has been negotiated, indirect descriptors
> > > -MUST appear sequentially, with \field{next} taking the value
> > > -of 1 for the 1st descriptor, 2 for the 2nd one, etc.
> > > +If VIRTIO_F_ADMIN_VQ_IN_ORDER and/or VIRTIO_F_IN_ORDER has been negotiated,
> > > +indirect descriptors MUST appear sequentially, with \field{next} taking the
> > > +value of 1 for the 1st descriptor, 2 for the 2nd one, etc., for the admin
> > > +virtqueue and all other virtqueue types, respectively.
> > >   \devicenormative{\paragraph}{Indirect Descriptors}{Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Descriptor Table / Indirect Descriptors}
> > >   The device MUST ignore the write-only flag (\field{flags}\&VIRTQ_DESC_F_WRITE) in the descriptor that refers to an indirect table.
> > > -- 
> > > 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
@ 2022-01-13 17:53   ` Michael S. Tsirkin
  2022-01-17  9:56     ` Max Gurtovoy
  2022-01-17 14:12     ` Parav Pandit
  0 siblings, 2 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 17:53 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
> In one of the many use cases a user wants to manipulate features and
> configuration of the virtio devices regardless of the device type
> (net/block/console). Some of this configuration is generic enough. i.e
> Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
> such features query and manipulation by its parent PCI PF.
> 
> Currently virtio specification defines control virtqueue to manipulate
> features and configuration of the device it operates on. However,
> control virtqueue commands are device type specific, which makes it very
> difficult to extend for device agnostic commands. Control virtqueue is
> also limited to follow in order completion for the device which
> negotiates VIRTIO_F_IN_ORDER feature. This feature limits the use of
> control virtqueue for feature manipulation in out of order manner for
> unrelated commands.
> 
> To support these requirements which overcome above two limitations in
> elegant way, this patch introduces a new admin virtqueue. Admin
> virtqueue will use the same command format for all types of virtio
> devices.
> 
> Subsequent patches make use of this admin virtqueue.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  content.tex     |  9 +++++++--
>  2 files changed, 56 insertions(+), 2 deletions(-)
>  create mode 100644 admin-virtq.tex
> 
> diff --git a/admin-virtq.tex b/admin-virtq.tex
> new file mode 100644
> index 0000000..ad20f89
> --- /dev/null
> +++ b/admin-virtq.tex
> @@ -0,0 +1,49 @@
> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
> +
> +The admin virtqueue is used to send administrative commands to manipulate
> +various features of the device which would not easily map into the
> +configuration space.

IMHO this is too vague to be useful. E.g. I don't really see
why the commands specified in the next patch would not map to config space.


We had an off-list meeting where I proposed addressing one device
from another or grouping multiple devices as a more specific
scope. That would be one way to address this.

Following this idea, all commands would then gain fields for addressing
one device from another.

Not everything maps well to a queue. E.g. it would be great to have
a list of available commands in memory.
Figuring out max vectors also looks like a good
fit for memory rather than for a command.
The VQ # of the admin VQ could also be made more discoverable.
How about an SRIOV capability describing this stuff then?
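
Roughly along these lines (a sketch only; every field name here is
invented):

  struct virtio_pci_sriov_cap {
          struct virtio_pci_cap cap;      /* generic capability header */
          le16 admin_queue_index;         /* fixed VQ # of the admin VQ */
          le16 padding;
          le32 supported_admin_cmds[4];   /* bitmap of available commands */
          le32 total_free_vfs_msix_count; /* free vectors in the VF pool */
          le16 per_vf_max_msix_count;     /* max vectors for a single VF */
          le16 padding2;
  };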




> +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
> +feature bit.
> +
> +The admin virtqueue index may vary among different device types.
> +
> +The Admin command set defines the commands that may be issued only to the admin
> +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature MUST
> +support all the mandatory admin commands. A device MAY also support one or more
> +optional admin commands. All commands are of the following form:
> +
> +\begin{lstlisting}
> +struct virtio_admin_cmd {
> +        /* Device-readable part */
> +        u8 command;
> +        u8 command-specific-data[];
> +
> +        /* Device-writable part */
> +        u8 status;
> +        u8 command-specific-result[];
> +};
> +
> +/* status values */
> +#define VIRTIO_ADMIN_STATUS_OK 0
> +#define VIRTIO_ADMIN_STATUS_ERR 1
> +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
> +\end{lstlisting}
> +
> +The \field{command} and \field{command-specific-data} are
> +set by the driver, and the device sets the \field{status} and the
> +\field{command-specific-result}, if needed.
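
To make the flow concrete, a Linux-style submission sketch (cmd_buf,
res_buf and admin_vq are assumed to exist; the virtqueue calls are the
existing driver API):

  /* cmd_buf holds command + command-specific-data (device-readable),
   * res_buf holds status + command-specific-result (device-writable). */
  struct scatterlist out_sg, in_sg, *sgs[] = { &out_sg, &in_sg };

  sg_init_one(&out_sg, cmd_buf, cmd_len);
  sg_init_one(&in_sg, res_buf, res_len);
  virtqueue_add_sgs(admin_vq, sgs, 1, 1, cmd_buf, GFP_KERNEL);
  virtqueue_kick(admin_vq);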
> +
> +The following table describes the Admin command set:
> +
> +\begin{tabular}{|l|l|l|l|}
> +\hline
> +Opcode (bits) & Opcode (hex) & Command & M/O \\
> +\hline \hline
> + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> +\hline
> + -  & 80h - FFh   & Reserved    & - \\
> +\hline
> +\end{tabular}
> +

Add conformance clauses pls. If this section is too generic to have any then
this functionality is too generic to be useful ;)

> diff --git a/content.tex b/content.tex
> index 32de668..c524fab 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 40] Feature bits reserved for extensions to the queue and
> +\item[24 to 41] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[41 and above] Feature bits reserved for future extensions.
> +\item[42 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  types. It is RECOMMENDED that devices generate version 4
>  UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>  
> +\input{admin-virtq.tex}
> +
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>  
>  We start with an overview of device initialization, then expand on the
> @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that the driver can reset a queue individually.
>    See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>  
> +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
> +  the device supports administration virtqueue negotiation.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
@ 2022-01-13 17:56   ` Michael S. Tsirkin
  2022-01-16  9:47     ` Max Gurtovoy
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 17:56 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

So admin VQ # is only known when all features are negotiated.
Which is quite annoying if the hypervisor wants to partition
things e.g. handling admin q in process and handling vqs
by an external process or by hardware.

I think we can allow devices to set the VQ# for the admin queue
instead. Would that work?
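
E.g. a read-only register in the common configuration structure
(sketch, the field name is invented):

  /* Appended to struct virtio_pci_common_cfg; valid whenever
   * VIRTIO_F_ADMIN_VQ is offered, regardless of negotiation. */
  le16 admin_queue_index;         /* device-chosen VQ # of the admin VQ */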


> ---
>  content.tex | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0ae4b68..e9c2383 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>  \item[2(N-1)] receiveqN
>  \item[2(N-1)+1] transmitqN
>  \item[2N] controlq
> +\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
>  \end{description}
>  
>   N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
> @@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>  
>   controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
>  
> + adminq only exists if VIRTIO_F_ADMIN_VQ set.
> +
>  \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
>  
>  \begin{description}
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
@ 2022-01-13 18:20   ` Michael S. Tsirkin
  2022-01-18 10:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 18:20 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:03PM +0200, Max Gurtovoy wrote:
> A typical cloud provider SR-IOV use case is to create many VFs for
> use by guest VMs. The VFs may not be assigned to a VM until a user
> requests a VM of a certain size, e.g., number of CPUs. A VF may need
> MSI-X vectors proportional to the number of CPUs in the VM, but there is
> no standard way today in the spec to change the number of MSI-X vectors
> supported by a VF, although there are some operating systems that
> support this.
> 
> Introduce new feature bits for generic PCI virtualization management
> mechanism and a specific mechanism to manage the MSI-X vector assignment
> process of virtual/managed functions by its parent virtio device via its
> admin virtqueue. For now, virtio supports only PCI virtual function
> virtualization, thus the virt manager device will be the PF and the
> managed device will be the VF.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

So, we have the concept of vectors.


> ---
>  admin-virtq.tex | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  content.tex     | 29 ++++++++++++++-
>  2 files changed, 124 insertions(+), 3 deletions(-)
> 
> diff --git a/admin-virtq.tex b/admin-virtq.tex
> index ad20f89..4ee8a32 100644
> --- a/admin-virtq.tex
> +++ b/admin-virtq.tex
> @@ -41,9 +41,105 @@ \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin
>  \hline
>  Opcode (bits) & Opcode (hex) & Command & M/O \\
>  \hline \hline
> - -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> + 00000000b  & 00h   & VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY    & O  \\
> +\hline
> + 00000001b  & 01h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET    & O  \\
> +\hline
> + 00000010b  & 02h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET    & O  \\
> +\hline
> + -  & 03h - 7Fh   & Generic admin cmds    & -  \\

What are these?

>  \hline
>   -  & 80h - FFh   & Reserved    & - \\
>  \hline
>  \end{tabular}
>  

What are the rules for these commands? Can they be issued when any VFs
are in use? What happens then? I don't exactly understand how this
interacts with existing virtio devices binding to VFs.
Does the device fail assignment of a vector # out of range,
falling back to a smaller # of vectors?
Generally the # of VQs to use and the # of interrupts are related, otherwise
performance might suffer - e.g. it's pointless to have many more
interrupts than VQs.
Shouldn't we control # of per-device VQs with these commands too then?


> +\subsection{VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY command has no command specific data set by the driver.
> +This command upon success, returns a data buffer that describes information about PCI virtualization
> +management attributes. This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_mgmt_attr_identify_result {
> +        /* For compatibility - indicates which of the below fields are valid (1 means valid):
> +         * Bit 0x0 - total_free_vfs_msix_count
> +         * Bit 0x1 - per_vf_max_msix_count
> +         * Bits 0x2 - 0x3F - reserved for future fields
> +         */
> +        le64 mask;
> +        /* Number of free msix in the global msix pool for VFs */
> +        le32 total_free_vfs_msix_count;
> +        /* Max number of msix vectors that can be assigned for a single VF */
> +        le16 per_vf_max_msix_count;
> +};
> +\end{lstlisting}

Looks like something that should be memory mapped. In fact
you reinvented a features/capability mask here which would
make memory-mapping this quite easy.
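
E.g. a read-only structure with the same fields (sketch):

  struct virtio_pci_virt_mgmt_attrs {     /* memory mapped, read-only */
          le64 mask;                      /* which fields below are valid */
          le32 total_free_vfs_msix_count; /* free vectors in the VF pool */
          le16 per_vf_max_msix_count;     /* max vectors for a single VF */
          le16 padding;
  };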

> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY SET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY SET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command is used to modify the values of VF properties.

VF appears completely out of the blue here. You need some description
and quote relevant specs to introduce this to the reader.
Since this depends on a feature and feature depends on VIRTIO_F_SR_IOV
and that in turn is only for PFs, I conclude that this is also
only for PFs. But would not hurt to spell this out.
Also can other transport types support partitioning?
Or is that always a PCI thing?


> +The command-specific data set by the driver is of the form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_set_data {
> +        /* The virtual function number */
> +        le16 vf_number;
> +        /* For compatibility - indicates which of the below properties should be
> +         * modified (1 means that field should be modified):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */

Addressing specific VFs seems like something that should be
a generic capability rather than a command-specific one.
No? Why not?


> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than the NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}

Meaning VIRTIO_ADMIN_STATUS_ERR?



> +\end{note}
> +
> +This command has no command-specific result set by the device. Upon success, the device guarantees
> +that all the requested properties were modified to the given values. Otherwise, an error will be returned.
> +
> +\begin{note}
> +{Before setting the msix_count property, the virtual/managed device (VF) shall be uninitialized and not in use by a driver.}
> +\end{note}
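
So the expected provisioning flow is presumably something like this
(sketch, helper names made up):

  /* The VF must not be bound to a driver while msix_count changes. */
  vf_driver_unbind(pf, vf_number);
  admin_property_set(pf_admin_vq, vf_number,       /* PROPERTY_SET cmd */
                     PROP_MSIX_COUNT, vm_num_cpus);
  vf_driver_bind(pf, vf_number);                   /* VF sees new count */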
> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY GET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY GET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET command is used to obtain the values of VF properties.
> +The command-specific data set by the driver is of the form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_get_data {
> +        /* The virtual function number */
> +        le16 vf_number;

How will we extend this for things like scalable IOV partitioning?
Defining a completely new set of commands for that expected use case
seems weird ...

> +        /* For compatibility - indicates which of the below properties should be
> +         * queried (1 means that field should be queried):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};


Pls change the layout, adding padding so that fields are length-aligned.

Unclear. So why does the query send msix_count?


> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than the NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}
> +\end{note}
> +
> +This command, upon success, returns a data buffer that describes the properties that were requested
> +and their values for the subject virtio VF device according to the given vf_number.
> +This information is of the form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_get_result {
> +        /* For compatibility - indicates which of the below fields were returned
> +         * (1 means that field was returned):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};

Seems the same except for VF #. So how about reusing it so reader does
not need to parse this twice?
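
E.g. (sketch):

  /* Shared layout: used as SET/GET command data (preceded by vf_number)
   * and as the GET result. */
  struct virtio_admin_pci_virt_properties {
          le64 property_mask;     /* Bit 0x0 - msix_count; rest reserved */
          le16 msix_count;        /* the amount of MSI-X vectors */
          le16 padding[3];        /* keep future fields length-aligned */
  };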

> +\end{lstlisting}

Please describe the various fields in the document body. The structure formatting
should follow a format similar to

struct virtio_pci_cap {
        u8 cap_vndr;    /* Short description */
	...
};

We should describe this in the introduction; I notice that we do not
currently do this.

Also pls use bitfields, defines etc as explained in introduction.tex

Pls also add conformance statements about the use here.



> diff --git a/content.tex b/content.tex
> index e9c2383..64678f0 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 43] Feature bits reserved for extensions to the queue and
> +\item[24 to 45] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[44 and above] Feature bits reserved for future extensions.
> +\item[46 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -6878,6 +6878,17 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that all buffers are used by the admin virtqueue of the device in
>    the same order in which they have been made available.
>  
> +  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates
> +  that the device can manage PCI related capabilities for its managed PCI VF
> +  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
> +  admin commands. This feature can be supported only by PCI devices.


Not sure what _VIRT_ stands for here.

> +
> +  \item[VIRTIO_F_ADMIN_MSIX_MGMT (45)] This feature indicates
> +  that the device supports management of the MSI-X vectors for its
> +  managed PCI VF devices using VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET admin commands.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> @@ -6920,6 +6931,14 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
>  VIRTIO_F_ADMIN_VQ.
>  
> +A driver MAY accept VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
> +A driver MAY accept VIRTIO_F_ADMIN_MSIX_MGMT only if it accepts
> +VIRTIO_F_ADMIN_VQ and VIRTIO_F_ADMIN_PCI_VIRT_MANAGER. Currently only
> +MSI-X management of PCI virtual functions is supported, so the driver
> +MUST NOT negotiate VIRTIO_F_ADMIN_MSIX_MGMT if VIRTIO_F_SR_IOV is not negotiated.
> +
>  \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>  
>  A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> @@ -6960,6 +6979,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
>  VIRTIO_F_ADMIN_VQ.
>  
> +A PCI device MAY offer VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it
> +offers VIRTIO_F_ADMIN_VQ.
> +
> +A PCI device MAY offer VIRTIO_F_ADMIN_MSIX_MGMT only if it
> +offers VIRTIO_F_ADMIN_VQ, VIRTIO_F_ADMIN_PCI_VIRT_MANAGER and VIRTIO_F_SR_IOV.
> +
>  \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>  
>  Transitional devices MAY offer the following:
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
@ 2022-01-13 18:24   ` Michael S. Tsirkin
  0 siblings, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 18:24 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:01PM +0200, Max Gurtovoy wrote:
> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>

Igh. Needing to update each and every device just so it can get
generic commands seems very annoying.

> ---
>  content.tex | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/content.tex b/content.tex
> index cc3e648..0ae4b68 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -4518,10 +4518,19 @@ \subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
>    2
>  
>  \subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
> + If VIRTIO_F_ADMIN_VQ is not negotiated, the request queue layout is as follows:
>  \begin{description}
>  \item[0] requestq1
>  \item[\ldots]
>  \item[N-1] requestqN
> +\end{description}
> +
> + If VIRTIO_F_ADMIN_VQ is negotiated, the queue layout is as follows:
> +\begin{description}
> +\item[0] requestq1
> +\item[\ldots]
> +\item[N-1] requestqN
> +\item[N] adminq
>  \end{description}
>  
>   N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by
> @@ -4590,7 +4599,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / Block Device /
>  bits as indicated above.
>  
>  The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
> -the number of queues.
> +the number of request queues. This field doesn't account for the admin virtqueue.
>  
>  The parameters in the configuration space of the device \field{max_discard_sectors}
>  \field{discard_sector_alignment} are expressed in 512-byte units if the
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
  2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
                   ` (4 preceding siblings ...)
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
@ 2022-01-13 18:32 ` Michael S. Tsirkin
  2022-01-17 10:00   ` Shahaf Shuler
  5 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-13 18:32 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:50:58PM +0200, Max Gurtovoy wrote:
> Hi,
> 
> In a PCI SR-IOV configuration, MSI-X vectors of the device is precious
> device resource. Hence making efficient use of it based on the use case
> that aligns to the VM configuration is desired for best system
> performance.
> 
> For example, today's static assignment of the amount of MSI-X vectors
> doesn't allow sophisticated utilization of resources.
> 
> A typical cloud provider SR-IOV use case is to create many VFs for
> use by guest VMs. Each VM might have a different purpose and different
> amount of resources accordingly (e.g. number of CPUs). A common driver
> usage of device's MSI-X vectors is proportional to the number of CPUs in
> the VM. Since the system administrator might know the amount of CPUs in
> the requested VM, he can also configure the VF's MSI-X vectors amount
> proportional to the number of CPUs in the VM. In this way, the
> utilization of the physical hardware will be improved.
> 
> Today we have some operating systems that support provisioning MSI-X
> vectors for PCI VFs.
> 
> Update the specification to have a method to change the number of MSI-X
> vectors supported by a VF using the PF admin virtqueue interface. For that,
> create a generic infrastructure for managing PCI resources of the managed
> VF by its parent PF.

Can you describe in the cover letter or the commit log of
the admin VQ patch the motivation for using a VQ and not
memory mapped space for this capability?
In fact I feel at least some commands would be better replaced
with a memory mapped structure.


> Patches (1/5)-(2/5) introduce the admin virtqueue concept and feature bits.
> Patches (3/5)-(4/5) add the admin virtq to virtio-blk and virtio-net
> devices.
> Patch (5/5) introduce MSI-X mgmt support.
> 
> Max Gurtovoy (5):
>   Add virtio Admin Virtqueue specification
>   Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
>   virtio-blk: add support for VIRTIO_F_ADMIN_VQ
>   virtio-net: add support for VIRTIO_F_ADMIN_VQ
>   Add support for dynamic MSI-X vector mgmt for VFs
> 
>  admin-virtq.tex | 145 ++++++++++++++++++++++++++++++++++++++++++++++++
>  content.tex     |  91 +++++++++++++++++++++++++++---
>  packed-ring.tex |  26 ++++-----
>  split-ring.tex  |  35 ++++++++----
>  4 files changed, 263 insertions(+), 34 deletions(-)
>  create mode 100644 admin-virtq.tex
> 
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-13 17:56   ` Michael S. Tsirkin
@ 2022-01-16  9:47     ` Max Gurtovoy
  2022-01-16 16:45       ` Michael S. Tsirkin
  2022-01-17 14:07       ` Parav Pandit
  0 siblings, 2 replies; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-16  9:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha


On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
>> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
>>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> So admin VQ # is only known when all features are negotiated.

No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ are 
set by the device.

Negotiation is not a must.

Let's say CTRL_VQ is supported by the device, and driver A would like to
use it while driver B wouldn't - in both cases the admin VQ
# would be 2N + 1.
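
I.e. (sketch):

  /* adminq index depends only on what the device offers,
   * not on what the driver accepts: */
  adminq_idx = (dev_features & (1ULL << VIRTIO_NET_F_CTRL_VQ)) ?
               2 * N + 1 : 2 * N;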

> Which is quite annoying if the hypervisor wants to partition
> things e.g. handling admin q in process and handling vqs
> by an external process or by hardware.
>
> I think we can allow devices to set the VQ# for the admin queue
> instead. Would that work?
>
>
>> ---
>>   content.tex | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0ae4b68..e9c2383 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>>   \item[2(N-1)] receiveqN
>>   \item[2(N-1)+1] transmitqN
>>   \item[2N] controlq
>> +\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
>>   \end{description}
>>   
>>    N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
>> @@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
>>   
>>    controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
>>   
>> + adminq only exists if VIRTIO_F_ADMIN_VQ set.
>> +
>>   \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
>>   
>>   \begin{description}
>> -- 
>> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-16  9:47     ` Max Gurtovoy
@ 2022-01-16 16:45       ` Michael S. Tsirkin
  2022-01-17 14:07       ` Parav Pandit
  1 sibling, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-16 16:45 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Sun, Jan 16, 2022 at 11:47:30AM +0200, Max Gurtovoy wrote:
> 
> On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > > Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > > 
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > So admin VQ # is only known when all features are negotiated.
> 
> No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ are set
> by the device.
> 
> Negotiation is not a must.
> 
> Let's say CTRL_VQ is supported by the device and driver A would like to use
> it and driver B wouldn't like to use it - in both cases the admin VQ # would
> be 2N + 1.

What's N here though?

> > Which is quite annoying if hypervisor wants to partition
> > things e.g. handling admin q in process and handling vqs
> > by an external process or by hardware.


This part stands.

> > 
> > I think we can allow devices to set the VQ# for the admin queue
> > instead. Would that work?
> > 
> > 
> > > ---
> > >   content.tex | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 0ae4b68..e9c2383 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -3034,6 +3034,7 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
> > >   \item[2(N-1)] receiveqN
> > >   \item[2(N-1)+1] transmitqN
> > >   \item[2N] controlq
> > > +\item[2N + 1] adminq (or \textbf{2N} in case VIRTIO_NET_F_CTRL_VQ is not set)
> > >   \end{description}
> > >    N=1 if neither VIRTIO_NET_F_MQ nor VIRTIO_NET_F_RSS are negotiated, otherwise N is set by
> > > @@ -3041,6 +3042,8 @@ \subsection{Virtqueues}\label{sec:Device Types / Network Device / Virtqueues}
> > >    controlq only exists if VIRTIO_NET_F_CTRL_VQ set.
> > > + adminq only exists if VIRTIO_F_ADMIN_VQ set.
> > > +
> > >   \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits}
> > >   \begin{description}
> > > -- 
> > > 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 17:53   ` Michael S. Tsirkin
@ 2022-01-17  9:56     ` Max Gurtovoy
  2022-01-17 21:30       ` Michael S. Tsirkin
  2022-01-17 14:12     ` Parav Pandit
  1 sibling, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-17  9:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha


On 1/13/2022 7:53 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
>> In one of the many use cases a user wants to manipulate features and
>> configuration of the virtio devices regardless of the device type
>> (net/block/console). Some of this configuration is generic enough. i.e
>> Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
>> such features query and manipulation by its parent PCI PF.
>>
>> Currently virtio specification defines control virtqueue to manipulate
>> features and configuration of the device it operates on. However,
>> control virtqueue commands are device type specific, which makes it very
>> difficult to extend for device agnostic commands. The control virtqueue is
>> also limited to in-order completion for a device which
>> negotiates the VIRTIO_F_IN_ORDER feature. This limits the use of
>> the control virtqueue for feature manipulation in an out-of-order manner for
>> unrelated commands.
>>
>> To support these requirements and overcome the above two limitations in an
>> elegant way, this patch introduces a new admin virtqueue. Admin
>> virtqueue will use the same command format for all types of virtio
>> devices.
>>
>> Subsequent patches make use of this admin virtqueue.
>>
>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>> ---
>>   admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>   content.tex     |  9 +++++++--
>>   2 files changed, 56 insertions(+), 2 deletions(-)
>>   create mode 100644 admin-virtq.tex
>>
>> diff --git a/admin-virtq.tex b/admin-virtq.tex
>> new file mode 100644
>> index 0000000..ad20f89
>> --- /dev/null
>> +++ b/admin-virtq.tex
>> @@ -0,0 +1,49 @@
>> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
>> +
>> +Admin virtqueue is used to send administrative commands to manipulate
>> +various features of the device which would not easily map into the
>> +configuration space.
> IMHO this is too vague to be useful. E.g. I don't really see
> why the commands specified in the next patch would not map to config space.

Well I took this sentence from the current spec :)


>
>
> We had an off-list meeting where I proposed addressing one device
> from another or grouping multiple devices as a more specific
> scope. That would be one way to address this.

Are you suggesting the creation of a virtio subsystem or a virtio group 
definition?

Devices will be part of this subsystem: one primary/manager device and 
many secondary/managed devices?

Each subsystem will have a unique UUID and each device will have a 
unique vdev_id within this subsystem.

If this is the direction, I can prepare something..

>
> Following this idea, all commands would then gain fields for addressing
> one device from another.
>
> Not everything maps well to a queue. E.g. it would be great to have
> list of available commands in memory.

I'm not sure I agree. Why can't it map to a queue?
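
For the sake of argument, even a read-only query like "list available 
commands" could be expressed as just another admin command; a rough 
sketch (hypothetical opcode and layout, not part of this series):

/* Hypothetical LIST_COMMANDS admin command following the generic
 * virtio_admin_cmd layout from patch 1; purely illustrative. */
struct virtio_admin_cmd_list_commands {
        /* device-readable part */
        u8 command;           /* e.g. a LIST_COMMANDS opcode */
        /* device-writable part */
        u8 status;
        u8 supported[16];     /* bitmap covering opcodes 00h-7Fh */
};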


> Figuring out max vectors also looks like a good
> example for memory and not through a command.

Any explanation of why it looks good? Or better?

> VQ # of the admin VQ could also be made more discoverable.
> How about an SRIOV capability describing this stuff then?
>
>
>
>
>> +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
>> +feature bit.
>> +
>> +Admin virtqueue index may vary among different device types.
>> +
>> +The Admin command set defines the commands that may be issued only to the admin
>> +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
>> +support all the mandatory admin commands. A device MAY support also one or more
>> +optional admin commands. All commands are of the following form:
>> +
>> +\begin{lstlisting}
>> +struct virtio_admin_cmd {
>> +        /* Device-readable part */
>> +        u8 command;
>> +        u8 command-specific-data[];
>> +
>> +        /* Device-writable part */
>> +        u8 status;
>> +        u8 command-specific-result[];
>> +};
>> +
>> +/* status values */
>> +#define VIRTIO_ADMIN_STATUS_OK 0
>> +#define VIRTIO_ADMIN_STATUS_ERR 1
>> +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
>> +\end{lstlisting}
>> +
>> +The \field{command} and \field{command-specific-data} are
>> +set by the driver, and the device sets the \field{status} and the
>> +\field{command-specific-result}, if needed.
>> +
>> +The following table describes the Admin command set:
>> +
>> +\begin{tabular}{|l|l|l|l|}
>> +\hline
>> +Opcode (bits) & Opcode (hex) & Command & M/O \\
>> +\hline \hline
>> + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
>> +\hline
>> + -  & 80h - FFh   & Reserved    & - \\
>> +\hline
>> +\end{tabular}
>> +
> Add conformance clauses pls. If this section is too generic to have any then
> this functionality is too generic to be useful ;)
>
>> diff --git a/content.tex b/content.tex
>> index 32de668..c524fab 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>>   \begin{description}
>>   \item[0 to 23] Feature bits for the specific device type
>>   
>> -\item[24 to 40] Feature bits reserved for extensions to the queue and
>> +\item[24 to 41] Feature bits reserved for extensions to the queue and
>>     feature negotiation mechanisms
>>   
>> -\item[41 and above] Feature bits reserved for future extensions.
>> +\item[42 and above] Feature bits reserved for future extensions.
>>   \end{description}
>>   
>>   \begin{note}
>> @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\input{admin-virtq.tex}
>> +
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>   
>>   We start with an overview of device initialization, then expand on the
>> @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     that the driver can reset a queue individually.
>>     See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>>   
>> +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>> +  the device supports administration virtqueue negotiation.
>> +
>>   \end{description}
>>   
>>   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>> -- 
>> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
  2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
@ 2022-01-17 10:00   ` Shahaf Shuler
  2022-01-17 21:41     ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Shahaf Shuler @ 2022-01-17 10:00 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Parav Pandit,
	Oren Duer, stefanha

Thursday, January 13, 2022 8:32 PM, Michael S. Tsirkin:
> Subject: Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a
> VF
> 
> On Thu, Jan 13, 2022 at 04:50:58PM +0200, Max Gurtovoy wrote:
> > Hi,
> >
> > In a PCI SR-IOV configuration, MSI-X vectors of the device is precious
> > device resource. Hence making efficient use of it based on the use
> > case that aligns to the VM configuration is desired for best system
> > performance.
> >
> > For example, today's static assignment of the amount of MSI-X vectors
> > doesn't allow sophisticated utilization of resources.
> >
> > A typical cloud provider SR-IOV use case is to create many VFs for use
> > by guest VMs. Each VM might have a different purpose and different
> > amount of resources accordingly (e.g. number of CPUs). A common driver
> > usage of device's MSI-X vectors is proportional to the number of CPUs
> > in the VM. Since the system administrator might know the amount of
> > CPUs in the requested VM, he can also configure the VF's MSI-X vectors
> > amount proportional to the number of CPUs in the VM. In this way, the
> > utilization of the physical hardware will be improved.
> >
> > Today we have some operating systems that support provisioning MSI-X
> > vectors for PCI VFs.
> >
> > Update the specification to have a method to change the number of
> > MSI-X vectors supported by a VF using the PF admin virtqueue
> > interface. For that, create a generic infrastructure for managing PCI
> > resources of the managed VF by its parent PF.
> 
> Can you describe in the cover letter or the commit log of the admin VQ patch
> the motivation for using a VQ and not memory mapped space for this
> capability?
> In fact I feel at least some commands would be better replaced with a
> memory mapped structure.

I am wondering what the motivation is to go for memory mapped structures for such control operations.

I can fully understand why data plane related fields should be placed on MMIO structures. However, for control, memory mapped commands are:
1. More constraining for the device implementor and thus not scalable. MMIO direct access implies on-die resources to be allocated. See, for example, the IMS section of the Scalable IOV spec [1], which follows this exact design.
2. Hard to maintain - each new command may add new MMIO fields, making the device BAR complex.
3. Implies a non-uniform design - some commands are memory mapped, some commands are VQ based. How do we provide the guiding rules to decide? Isn't it simpler to have a single i/f for all the control? 

[1]
https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-scalable-io-virtualization.html


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-13 17:25       ` Michael S. Tsirkin
@ 2022-01-17 13:59         ` Parav Pandit
  2022-01-17 22:14           ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-17 13:59 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, January 13, 2022 10:56 PM
> 
> On Thu, Jan 13, 2022 at 07:07:53PM +0200, Max Gurtovoy wrote:
> >
> > On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> > > On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> > > > These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> > > > VIRTIO_F_IN_ORDER. Some devices might support these features only
> > > > for admin virtqueues and some might support them for both admin
> > > > virtqueues and request virtqueues or only for non-admin
> > > > virtqueues. Some optimization can be made for each type of
> > > > virtqueue, thus we separate these features for the different virtqueue
> types.
> > > >
> > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > That seems vague as motivation.
> > > Why do we need to optimize admin queues? Aren't they fundamentally a
> > > control path feature?
> > > Why would we want to special-case these features specifically?
> > > Should we allow control of features per VQ generally?
> >
> > We would like to allow executing admin commands out of order and IO
> > requests in order for efficiency.
> 
> It's a control queue. Why do we worry?
It is used to control/manage the resources of a VF, which is usually deployed to a VM.
So the higher the latency, the longer it takes to deploy and start the VM.
Hence, it is better to have this basic functionality in place; it is useful beyond MSI-X config.
It is not functionally a must. But riding AQ command ordering on VIRTIO_F_IN_ORDER for now and later driving it based on a new field requires dual handling.
Better to start with the AQ's own ordering and a single scheme.

> 
> 
> >
> > And also the other way around.
> 
> what exactly does this mean?
> 
IO commands out of order (for, say, a block device), but AQ commands in order.
Maybe AQ command execution can always be treated as out of order, even when VIRTIO_F_IN_ORDER is negotiated.
This way it will be an even simpler design for both driver and device.

> > IO cmds and admin cmds have different considerations in many cases.
> 
> That's still pretty vague.  so do other types of VQ, such as RX/TX.
> 
> E.g. I can see how a hardware vendor might want to avoid supporting indirect
> with RX for virtio net with mergeable buffers, but still support it for TX.
> 
> 
> I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
I agree. It only helps the driver to ensure that AQ commands are processed in order, so it doesn't need to serialize them.
But yes, the driver can always serialize if needed when the AQ is always out of order.
I think we should word it so that the AQ is always out of order.

> I think you want to reorder admin commands dealing with unrelated VFs but
> keep io vqs in order for speed.
> Just guessing, you should spell the real motivation out.
> However, I think a better way to do that is with finalizing the
> VIRTIO_F_PARTIAL_ORDER proposal from august.
I read the partial order proposal at [1].
It still appears IN_ORDER from the driver's POV.
So I am not sure if the driver can complete AQ commands out of order. Can it?
I think the data path needs more plumbing than just the PARTIAL_ORDER flag, to process descriptors differently on the tx and rx sides.
Not sure merging the AQ into it is useful, given that we agree the AQ should always behave as out of order from the beginning.

[1] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html

> Pls review and let me know. If there's finally a use for it I'll prioritize finalizing
> that idea.
> Don't see much point in tweaking INDIRECT at all.
Common negotiation of INDIRECT on the AQ and other queues forces the data path to handle it as well.
It is better not to force the device to handle indirect descriptors on non-AQ queues just because the AQ prefers to handle them.
Often AQ and data path queues are not handled by the same set of processing engines, given that they do different tasks.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-16  9:47     ` Max Gurtovoy
  2022-01-16 16:45       ` Michael S. Tsirkin
@ 2022-01-17 14:07       ` Parav Pandit
  2022-01-17 22:22         ` Michael S. Tsirkin
  1 sibling, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-17 14:07 UTC (permalink / raw)
  To: Max Gurtovoy, Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> Sent: Sunday, January 16, 2022 3:18 PM
> 
> 
> On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> >> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> >>
> >> Reviewed-by: Parav Pandit <parav@nvidia.com>
> >> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > So admin VQ # is only known when all features are negotiated.
> 
> No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> are set by the device.
> 
> Negotiation is not a must.
> 
> Lets say CTRL_VQ is supported by the device and driver A would like to use it
> and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
> + 1.
> 
> > Which is quite annoying if hypervisor wants to partition things e.g.
> > handling admin q in process and handling vqs by an external process or
> > by hardware.
> >
> > I think we can allow devices to set the VQ# for the admin queue
> > instead. Would that work?
MSI-X vector count configuration and VQ count configuration are two different things, though they are strongly correlated.
Configuring the number of queues seems a very device specific configuration (even though num_queues is a generic field in struct virtio_pci_common_cfg).

So VQ count configuration is a different command, likely combined with other device specific config such as mac or rss.
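
A rough sketch of what such a device specific command could look like for 
virtio-net (hypothetical layout and names, not part of this series):

/* Hypothetical device-specific VF provisioning command, separate from
 * the generic MSI-X command; all fields are illustrative. */
struct virtio_admin_net_vf_cfg {
        le16 vf_number;             /* which VF to configure */
        le16 max_virtqueue_pairs;   /* VQ count, device specific */
        u8   mac[6];                /* default MAC */
        le16 rss_max_key_size;      /* rss-related provisioning */
};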

^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-13 17:53   ` Michael S. Tsirkin
  2022-01-17  9:56     ` Max Gurtovoy
@ 2022-01-17 14:12     ` Parav Pandit
  2022-01-17 22:03       ` Michael S. Tsirkin
  1 sibling, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-17 14:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, January 13, 2022 11:24 PM

> 
> We had an off-list meeting where I proposed addressing one device from
> another or grouping multiple devices as a more specific scope. That would be
> one way to address this.
> 
> Following this idea, all commands would then gain fields for addressing one
> device from another.
> 
Can you please explain your idea in more detail, and the need for grouping?
What do you want to group? VFs of parent pci device?
How to refer to each VF within a group?

If you have notes of the off-list meeting, it will be useful for us to read through.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17  9:56     ` Max Gurtovoy
@ 2022-01-17 21:30       ` Michael S. Tsirkin
  2022-01-18  3:22         ` Parav Pandit
  2022-01-19  3:04         ` Jason Wang
  0 siblings, 2 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 21:30 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Mon, Jan 17, 2022 at 11:56:09AM +0200, Max Gurtovoy wrote:
> 
> On 1/13/2022 7:53 PM, Michael S. Tsirkin wrote:
> > On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
> > > In one of the many use cases a user wants to manipulate features and
> > > configuration of the virtio devices regardless of the device type
> > > (net/block/console). Some of this configuration is generic enough. i.e
> > > Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
> > > such features query and manipulation by its parent PCI PF.
> > > 
> > > Currently virtio specification defines control virtqueue to manipulate
> > > features and configuration of the device it operates on. However,
> > > control virtqueue commands are device type specific, which makes it very
> > > difficult to extend for device agnostic commands. The control virtqueue is
> > > also limited to in-order completion for a device which
> > > negotiates the VIRTIO_F_IN_ORDER feature. This limits the use of
> > > the control virtqueue for feature manipulation in an out-of-order manner for
> > > unrelated commands.
> > > 
> > > To support these requirements and overcome the above two limitations in an
> > > elegant way, this patch introduces a new admin virtqueue. Admin
> > > virtqueue will use the same command format for all types of virtio
> > > devices.
> > > 
> > > Subsequent patches make use of this admin virtqueue.
> > > 
> > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > ---
> > >   admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >   content.tex     |  9 +++++++--
> > >   2 files changed, 56 insertions(+), 2 deletions(-)
> > >   create mode 100644 admin-virtq.tex
> > > 
> > > diff --git a/admin-virtq.tex b/admin-virtq.tex
> > > new file mode 100644
> > > index 0000000..ad20f89
> > > --- /dev/null
> > > +++ b/admin-virtq.tex
> > > @@ -0,0 +1,49 @@
> > > +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
> > > +
> > > +Admin virtqueue is used to send administrative commands to manipulate
> > > +various features of the device which would not easily map into the
> > > +configuration space.
> > IMHO this is too vague to be useful. E.g. I don't really see
> > why the commands specified in the next patch would not map to config space.
> 
> Well I took this sentence from the current spec :)

Well, in the current spec it applies to things like MAC address filtering,
which does not easily map into config space because the number of MACs
varies.


> 
> > 
> > 
> > We had an off-list meeting where I proposed addressing one device
> > from another or grouping multiple devices as a more specific
> > scope. That would be one way to address this.
> 
> Are you suggestion a creation of a virtio subsystem or a virtio group
> definition ?
> 
> Devices will be part of this subsystem: one primary/manager device and many
> secondary/managed devices ?
> 
> Each subsystem will have a unique UUID and each device will have a unique
> vdev_id within this subsystem.
> 
> If this is the direction, I can prepare something..

I was merely saying that what is special about the admin queue is that it
allows controlling one device from another within some group.
Or maybe that it allows grouping multiple devices.
*Not* that these are things that do not map to config space.

Let me give you another example: imagine that you want to handle
page faults from the device.  Clearly a generic thing that does not map to
config space.  It could be a good candidate for the admin queue; however,
it would require that lots of buffers are pre-added to the queue. So it
looks like it will need another distinct fault queue.  Further, it is
possible that you want to handle faults within the guest, by the driver. In
that case you do not want it in the admin queue, since that is controlled
by the hypervisor; you want it in a separate queue controlled by the driver.


I don't recall discussion about UUID so I can't really say what
I think about that. Do we need a UUID? I'm not sure I understand why.
It can't hurt to abstract things a bit so it's not all tied to
PFs/VFs since we know we'll want subfunctions down the road, too,
if that is what you mean.



> > 
> > Following this idea, all commands would then gain fields for addressing
> > one device from another.
> > 
> > Not everything maps well to a queue. E.g. it would be great to have
> > list of available commands in memory.
> 
> I'm not sure I agree. Why can't it map to a queue ?

You can map it to a queue, yes. But something static
and read only, such as a list of commands, maps well to
config space. And it's not controlling one device from
another, so does not really seem to belong in the admin queue.

> 
> > Figuring out max vectors also looks like a good
> > example for memory and not through a command.
> 
> Any explanation why is it looks good ? or better ?

Why is memory easier to operate than a VQ?
It's much simpler and so less error prone.  You can have multiple actors
read such a field at the same time without races, so e.g. there could
be a sysfs attribute that reads from the device on each access, and no
special error handling is needed.
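
A minimal sketch of such an attribute, kernel style; vp_read_max_vectors() 
is a made-up helper standing in for the memory-mapped config read:

#include <linux/device.h>
#include <linux/virtio.h>

/* Sketch only: expose a read-only device config field via sysfs.
 * Concurrent readers are fine since each access is a plain read. */
static ssize_t max_vectors_show(struct device *dev,
                                struct device_attribute *attr, char *buf)
{
        struct virtio_device *vdev = dev_to_virtio(dev);

        return sysfs_emit(buf, "%u\n", vp_read_max_vectors(vdev));
}
static DEVICE_ATTR_RO(max_vectors);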

> > VQ # of the admin VQ could also be made more discoverable.
> > How about an SRIOV capability describing this stuff then?
> > 
> > 
> > 
> > 
> > > +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
> > > +feature bit.
> > > +
> > > +Admin virtqueue index may vary among different device types.
> > > +
> > > +The Admin command set defines the commands that may be issued only to the admin
> > > +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
> > > +support all the mandatory admin commands. A device MAY support also one or more
> > > +optional admin commands. All commands are of the following form:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_admin_cmd {
> > > +        /* Device-readable part */
> > > +        u8 command;
> > > +        u8 command-specific-data[];
> > > +
> > > +        /* Device-writable part */
> > > +        u8 status;
> > > +        u8 command-specific-result[];
> > > +};
> > > +
> > > +/* status values */
> > > +#define VIRTIO_ADMIN_STATUS_OK 0
> > > +#define VIRTIO_ADMIN_STATUS_ERR 1
> > > +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
> > > +\end{lstlisting}
> > > +
> > > +The \field{command} and \field{command-specific-data} are
> > > +set by the driver, and the device sets the \field{status} and the
> > > +\field{command-specific-result}, if needed.
> > > +
> > > +The following table describes the Admin command set:
> > > +
> > > +\begin{tabular}{|l|l|l|l|}
> > > +\hline
> > > +Opcode (bits) & Opcode (hex) & Command & M/O \\
> > > +\hline \hline
> > > + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> > > +\hline
> > > + -  & 80h - FFh   & Reserved    & - \\
> > > +\hline
> > > +\end{tabular}
> > > +
> > Add conformance clauses pls. If this section is too generic to have any then
> > this functionality is too generic to be useful ;)
> > 
> > > diff --git a/content.tex b/content.tex
> > > index 32de668..c524fab 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
> > >   \begin{description}
> > >   \item[0 to 23] Feature bits for the specific device type
> > > -\item[24 to 40] Feature bits reserved for extensions to the queue and
> > > +\item[24 to 41] Feature bits reserved for extensions to the queue and
> > >     feature negotiation mechanisms
> > > -\item[41 and above] Feature bits reserved for future extensions.
> > > +\item[42 and above] Feature bits reserved for future extensions.
> > >   \end{description}
> > >   \begin{note}
> > > @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
> > >   types. It is RECOMMENDED that devices generate version 4
> > >   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
> > > +\input{admin-virtq.tex}
> > > +
> > >   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > >   We start with an overview of device initialization, then expand on the
> > > @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >     that the driver can reset a queue individually.
> > >     See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
> > > +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
> > > +  the device supports administration virtqueue negotiation.
> > > +
> > >   \end{description}
> > >   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > -- 
> > > 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF
  2022-01-17 10:00   ` Shahaf Shuler
@ 2022-01-17 21:41     ` Michael S. Tsirkin
  0 siblings, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 21:41 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Parav Pandit, Oren Duer, stefanha

On Mon, Jan 17, 2022 at 10:00:21AM +0000, Shahaf Shuler wrote:
> Thursday, January 13, 2022 8:32 PM, Michael S. Tsirkin:
> > Subject: Re: [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a
> > VF
> > 
> > On Thu, Jan 13, 2022 at 04:50:58PM +0200, Max Gurtovoy wrote:
> > > Hi,
> > >
> > > In a PCI SR-IOV configuration, MSI-X vectors of the device is precious
> > > device resource. Hence making efficient use of it based on the use
> > > case that aligns to the VM configuration is desired for best system
> > > performance.
> > >
> > > For example, today's static assignment of the amount of MSI-X vectors
> > > doesn't allow sophisticated utilization of resources.
> > >
> > > A typical cloud provider SR-IOV use case is to create many VFs for use
> > > by guest VMs. Each VM might have a different purpose and different
> > > amount of resources accordingly (e.g. number of CPUs). A common driver
> > > usage of device's MSI-X vectors is proportional to the number of CPUs
> > > in the VM. Since the system administrator might know the amount of
> > > CPUs in the requested VM, he can also configure the VF's MSI-X vectors
> > > amount proportional to the number of CPUs in the VM. In this way, the
> > > utilization of the physical hardware will be improved.
> > >
> > > Today we have some operating systems that support provisioning MSI-X
> > > vectors for PCI VFs.
> > >
> > > Update the specification to have a method to change the number of
> > > MSI-X vectors supported by a VF using the PF admin virtqueue
> > > interface. For that, create a generic infrastructure for managing PCI
> > > resources of the managed VF by its parent PF.
> > 
> > Can you describe in the cover letter or the commit log of the admin VQ patch
> > the motivation for using a VQ and not memory mapped space for this
> > capability?
> > In fact I feel at least some commands would be better replaced with a
> > memory mapped structure.
> 
> I am wondering what is the motivation to go for memory mapped structures for such control operations. 
> 
> I can fully understand why data plane related fields should be placed on MMIO structures.

Actually, data plane is usually in a VQ for us, since MMIO accesses
trigger VM exits.

> However, for control, memory mapped commands are:
> 1. More constraining for the device implementor and thus not scalable. MMIO direct access implies on-die resources to be allocated. See, for example, the IMS section of the Scalable IOV spec [1], which follows this exact design.

Oh, it's a PCIe thing, right? A read cannot depend on another read?
So this is one of the reasons we don't put big structures in MMIO.
But a couple of bytes is really no big deal IMHO.

> 2. Hard to maintain - each new command may add new MMIO fields, making the device BAR complex.

Well, actually we have very nice APIs to handle dependencies
between memory and feature bits. It's much harder to abstract
away VQ commands; we don't have anything uniform for that.

> 3. Implies a non-uniform design - some commands are memory mapped,
> some commands are VQ based. How do we provide the guiding rules to
> decide? Isn't it simpler to have a single i/f for all the control? 

newdevice.tex has some guiding principles, see "What Device
Configuration Space Layout?".

But yes, if the answer is "commands A,B,C do not fit in
config space, we placed commands D,E in a VQ for consistency"
then that is an ok answer, but it's something to be mentioned
in the commit log.



> 
> [1]
> https://www.intel.com/content/www/us/en/developer/articles/technical/introducing-intel-scalable-io-virtualization.html

config space is generally more robust, requires less code
on both host and guest side.



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 14:12     ` Parav Pandit
@ 2022-01-17 22:03       ` Michael S. Tsirkin
  2022-01-18  3:36         ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 22:03 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Mon, Jan 17, 2022 at 02:12:33PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, January 13, 2022 11:24 PM
> 
> > 
> > We had an off-list meeting where I proposed addressing one device from
> > another or grouping multiple devices as a more specific scope. That would be
> > one way to address this.
> > 
> > Following this idea, all commands would then gain fields for addressing one
> > device from another.
> > 
> Can you please explain your idea more and a need for grouping?
> What do you want to group? VFs of parent pci device?
> How to refer to each VF within a group?

So for example, VFs of a PF are a group right? And they are all
controlled by a PF.

I can think of setups like nesting where we might want to
create a group of VFs and pass them to L1, with one of the
VFs acting as an admin for the rest of them for purposes
of L2.  Subfunctions with PASID etc. are another
example. I am not asking you to add such mechanisms straight away
but the current proposal kind of obscures this to the point
where I don't see how we would extend it with these things
down the road.


> If you have notes of the off-list meeting, it will be useful for us to read through.

Sorry didn't take notes.

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-17 13:59         ` Parav Pandit
@ 2022-01-17 22:14           ` Michael S. Tsirkin
  2022-01-18  4:44             ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 22:14 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Mon, Jan 17, 2022 at 01:59:29PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, January 13, 2022 10:56 PM
> > 
> > On Thu, Jan 13, 2022 at 07:07:53PM +0200, Max Gurtovoy wrote:
> > >
> > > On 1/13/2022 5:33 PM, Michael S. Tsirkin wrote:
> > > > On Thu, Jan 13, 2022 at 04:51:00PM +0200, Max Gurtovoy wrote:
> > > > > These new features are parallel to VIRTIO_F_INDIRECT_DESC and
> > > > > VIRTIO_F_IN_ORDER. Some devices might support these features only
> > > > > for admin virtqueues and some might support them for both admin
> > > > > virtqueues and request virtqueues or only for non-admin
> > > > > virtqueues. Some optimization can be made for each type of
> > > > > virtqueue, thus we separate these features for the different virtqueue
> > types.
> > > > >
> > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > That seems vague as motivation.
> > > > Why do we need to optimize admin queues? Aren't they fundamentally a
> > > > control path feature?
> > > > Why would we want to special-case these features specifically?
> > > > Should we allow control of features per VQ generally?
> > >
> > > We would like to allow executing admin commands out of order and IO
> > > requests in order for efficiency.
> > 
> > It's a control queue. Why do we worry?
> It is used to control/manage the resources of a VF, which is usually deployed to a VM.
> So the higher the latency, the longer it takes to deploy and start the VM.

What are the savings here, in real terms? Boot times for smallest VMs
are in 10s of milliseconds. Is reordering of a queue somehow
going to save more than microseconds?

> Hence, it is better to have this basic functionality in place; it is useful beyond MSI-X config.
> It is not functionally a must. But riding AQ command ordering on VIRTIO_F_IN_ORDER for now and later driving it based on a new field requires dual handling.
> Better to start with the AQ's own ordering and a single scheme.

Sorry I'm still scratching my head.


> > 
> > 
> > >
> > > And also the other way around.
> > 
> > what exactly does this mean?
> > 
> IO commands out of order (for, say, a block device), but AQ commands in order.
> Maybe AQ command execution can always be treated as out of order, even when VIRTIO_F_IN_ORDER is negotiated.
> This way it will be an even simpler design for both driver and device.
> 
> > > IO cmds and admin cmds have different considerations in many cases.
> > 
> > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > 
> > E.g. I can see how a hardware vendor might want to avoid supporting indirect
> > with RX for virtio net with mergeable buffers, but still support it for TX.
> > 
> > 
> > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> I agree. It only helps the driver to ensure that AQ commands are processed in order, so it doesn't need to serialize them.
> But yes, the driver can always serialize if needed when the AQ is always out of order.
> I think we should word it so that the AQ is always out of order.
> 
> > I think you want to reorder admin commands dealing with unrelated VFs but
> > keep io vqs in order for speed.
> > Just guessing, you should spell the real motivation out.
> > However, I think a better way to do that is with finalizing the
> > VIRTIO_F_PARTIAL_ORDER proposal from august.
> I read the partial order proposal at [1].
> It still appears IN_ORDER from the driver's POV.
> So I am not sure if the driver can complete AQ commands out of order. Can it?

complete commands == use buffers?
drivers do not use buffers.

> I think the data path needs more plumbing than just the PARTIAL_ORDER flag, to process descriptors differently on the tx and rx sides.
> Not sure merging the AQ into it is useful, given that we agree the AQ should always behave as out of order from the beginning.
> 
> [1] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html

You mean *device*. Driver does not control the order.
The point of PARTIAL_ORDER is basically that some
descriptors are in order some out of order and its up to device. So it is
even finer resolution.


> > Pls review and let me know. If there's finally a use for it I'll prioritize finalizing
> > that idea.
> > Don't see much point in tweaking INDIRECT at all.
> Common negotiation of INDIRECT on the AQ and other queues forces the data path to handle it as well.

I don't see why admin queue needs indirect descriptors.

> It is better not to force the device to handle indirect descriptors on non-AQ queues just because the AQ prefers to handle them.
> Often AQ and data path queues are not handled by the same set of processing engines, given that they do different tasks.

so for example, many guests want to use indirect for tx but not for rx.
if you are worrying about things like that, maybe a per-vq control
over indirect support makes sense.
adding complexity like that should really be much better motivated,
and maybe have some PoC code or back of the napkin math
showing the expected gains.


-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-17 14:07       ` Parav Pandit
@ 2022-01-17 22:22         ` Michael S. Tsirkin
  2022-01-18  2:18           ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-17 22:22 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
> 
> > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > Sent: Sunday, January 16, 2022 3:18 PM
> > 
> > 
> > On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > >> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > >>
> > >> Reviewed-by: Parav Pandit <parav@nvidia.com>
> > >> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > So admin VQ # is only known when all features are negotiated.
> > 
> > No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> > are set by the device.
> > 
> > Negotiation is not a must.
> > 
> > Let's say CTRL_VQ is supported by the device and driver A would like to use it
> > and driver B wouldn't like to use it - in both cases the admin VQ # would be 2N
> > + 1.
> > 
> > > Which is quite annoying if hypervisor wants to partition things e.g.
> > > handling admin q in process and handling vqs by an external process or
> > > by hardware.
> > >
> > > I think we can allow devices to set the VQ# for the admin queue
> > > instead. Would that work?
> MSI-X vector count configuration and VQ count configuration are two different things,


I was talking about the number of the VQ used for admin commands. Not
about the number of VQs.

> though they are strongly correlated.
> Configuring the number of queues seems a very device specific configuration (even though num_queues is a generic field in struct virtio_pci_common_cfg).
> 
> So VQ count configuration is a different command, likely combined with other device specific config such as mac or rss.

I was not talking about that at all, but since you mention that,
to me it looks like something that many device types can support.
It's not necessarily rss related, MQ config would benefit too,
so I am not sure why not have a command for controlling number
of queues. Looks like it could be quite generic.

Since current guests only support two modes: a vector
per VQ and a shared vector for all VQs, it follows that
it is important when configuring vectors per VF to also configure
VQs per VF. This makes me wonder whether ability to configure
vectors per VF in isolation without ability to configure or
at least query VQs per VF even has value.


-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-17 22:22         ` Michael S. Tsirkin
@ 2022-01-18  2:18           ` Jason Wang
  2022-01-18  5:25             ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-18  2:18 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha


On 1/18/2022 6:22 AM, Michael S. Tsirkin wrote:
> On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
>>> From: Max Gurtovoy <mgurtovoy@nvidia.com>
>>> Sent: Sunday, January 16, 2022 3:18 PM
>>>
>>>
>>> On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
>>>> On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
>>>>> Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
>>>>>
>>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> So admin VQ # is only known when all features are negotiated.
>>> No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
>>> are set by the device.
>>>
>>> Negotiation is not a must.
>>>
>>> Let's say CTRL_VQ is supported by the device and driver A would like to use it
>>> and driver B wouldn't like to use it - in both cases the admin VQ # would be 2N
>>> + 1.
>>>
>>>> Which is quite annoying if hypervisor wants to partition things e.g.
>>>> handling admin q in process and handling vqs by an external process or
>>>> by hardware.
>>>>
>>>> I think we can allow devices to set the VQ# for the admin queue
>>>> instead. Would that work?
>> MSI-X vector count configuration and VQ count configuration are two different things,
>
> I was talking about the number of the VQ used for admin commands. Not
> about the number of VQs.
>
>> though they are strongly correlated.
>> Configuring the number of queues seems a very device specific configuration (even though num_queues is a generic field in struct virtio_pci_common_cfg).
>>
>> So VQ count configuration is a different command, likely combined with other device specific config such as mac or rss.
> I was not talking about that at all, but since you mention that,
> to me it looks like something that many device types can support.
> It's not necessarily rss related, MQ config would benefit too,
> so I am not sure why not have a command for controlling number
> of queues. Looks like it could be quite generic.
>
> Since current guests only support two modes: a vector
> per VQ and a shared vector for all VQs, it follows that
> it is important when configuring vectors per VF to also configure
> VQs per VF. This makes me wonder whether ability to configure
> vectors per VF in isolation without ability to configure or
> at least query VQs per VF even has value.


So I had some thoughts in the past; it looks to me like we need a generic 
provisioning interface that contains all the necessary attributes:

1) #queues
2) device_features
3) #msi_vectors
4) device specific configurations

It could be either an admin virtqueue interface [1] or a dedicated 
capability [2] (the latter seems easier).
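
A sketch of such a provisioning command, assuming the admin virtqueue 
variant; field names and layout are illustrative, taken from neither 
proposal:

/* Hypothetical per-VF provisioning command carrying the four
 * attributes listed above; purely illustrative. */
struct virtio_admin_cmd_provision {
        le16 vf_number;          /* which VF to provision */
        le16 num_queues;         /* 1) #queues */
        le64 device_features;    /* 2) features the VF may offer */
        le16 msix_vectors;       /* 3) #msi_vectors */
        u8   dev_specific[];     /* 4) e.g. mac, rss defaults */
};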

Thanks

[1] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
[2] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


>
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 21:30       ` Michael S. Tsirkin
@ 2022-01-18  3:22         ` Parav Pandit
  2022-01-18  6:17           ` Michael S. Tsirkin
  2022-01-19  3:04         ` Jason Wang
  1 sibling, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  3:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:01 AM

> On Mon, Jan 17, 2022 at 11:56:09AM +0200, Max Gurtovoy wrote:
> > > > +Admin virtqueue is used to send administrative commands to
> > > > +manipulate various features of the device which would not easily
> > > > +map into the configuration space.
> > > IMHO this is too vague to be useful. E.g. I don't really see why
> > > the commands specified in the next patch would not map to config space.
> >
> > Well I took this sentence from the current spec :)
> 
> Well in current spec it applies to things like MAC address filtering, which does
> not easily map into config space because number of MACs varies.

It doesn't map well to config space for the primary reason that it is read+write access that the driver should be able to perform in an async manner.
Yes, we will improve this part of the commit log to describe that doing it via the AQ enables the driver to not get blocked by a previous outstanding command.

> 
> 
> >
> > >
> > >
> > > We had an off-list meeting where I proposed addressing one device
> > > from another or grouping multiple devices as a more specific scope.
> > > That would be one way to address this.
> >
> > Are you suggestion a creation of a virtio subsystem or a virtio group
> > definition ?
> >
> > Devices will be part of this subsystem: one primary/manager device and
> > many secondary/managed devices ?
> >
> > Each subsystem will have a unique UUID and each device will have a
> > unique vdev_id within this subsystem.
> >
> > If this is the direction, I can prepare something..
> 
> I was merely saying that what is special about the admin queue is that it allows
> controlling one device from another within some group.
> Or maybe that it allows grouping multiple devices.
> *Not* that these are things that do not map to config space.
> 
> Let me give you another example: imagine that you want to handle page faults
> from the device.  Clearly a generic thing that does not map to config space.  It
> could be a good candidate for the admin queue; however, it would require that
> lots of buffers are pre-added to the queue. So it looks like it will need another
> distinct fault queue.  
Right, a page fault queue is an async queue located in the hypervisor and/or guest, more like a net device rq.
The AQ is a request-response queue.
A page fault queue likely needs multiple instances to have any reasonable bw; per-CPU is one option.

> Further, it is possible that you want to handle faults
> within the guest, by the driver. In that case you do not want it in the admin queue
> since that is controlled by the hypervisor; you want it in a separate queue
> controlled by the driver.
> 
Yes. So it is better not to merge the page fault queue with the admin queue.

> 
> I don't recall discussion about UUID so I can't really say what I think about that.
> Do we need a UUID? I'm not sure I understand why.
> It can't hurt to abstract things a bit so it's not all tied to PFs/VFs since we know
> we'll want subfunctions down the road, too, if that is what you mean.
>
I still didn't find any reason in the discussion for why device grouping is needed.
The current AQ proposal implicitly indicates that the VFs of a PF are managed by their parent PF.
And if for some reason this work is done by one of the VFs, this role assignment can certainly be a new command on the AQ, as a group command or some other command.
 
> 
> 
> > >
> > > Following this idea, all commands would then gain fields for
> > > addressing one device from another.
> > >
> > > Not everything maps well to a queue. E.g. it would be great to have
> > > list of available commands in memory.
> >
> > I'm not sure I agree. Why can't it map to a queue ?
> 
> You can map it to a queue, yes. But something static and read only, such as a list
> of commands maps well to config space. And it's not controlling one device
> from another, so does not really seem to belong in the admin queue.
> 
The AQ serves writing device config too, in patch 5 of this patchset.

> >
> > > Figuring out max vectors also looks like a good example for memory
> > > and not through a command.
> >
> > Any explanation why is it looks good ? or better ?
> 
> Why is memory easier to operate than a VQ?
> It's much simpler and so less error prone.  You can have multiple actors read
> such a field at the same time without races, so e.g. there could be a sysfs
> attribute that reads from the device on each access, and no special error handling
> is needed.
>
Writing fields is an inherent part of the AQ, without getting blocked on previous writes.
I see you acked that the AQ is fine in the cover letter thread, as quoted below, so we are in sync on the motivation now.
Yes, will update the commit log as you suggested.

 " if the answer is "commands A,B,C do not fit in config space, we placed commands D,E in a VQ for consistency"
then that is an ok answer, but it's something to be mentioned in the commit log"


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 22:03       ` Michael S. Tsirkin
@ 2022-01-18  3:36         ` Parav Pandit
  2022-01-18  7:07           ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  3:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:33 AM
> 
> On Mon, Jan 17, 2022 at 02:12:33PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, January 13, 2022 11:24 PM
> >
> > >
> > > We had an off-list meeting where I proposed addressing one device
> > > from another or grouping multiple devices as a more specific scope.
> > > That would be one way to address this.
> > >
> > > Following this idea, all commands would then gain fields for
> > > addressing one device from another.
> > >
> > Can you please explain your idea more and a need for grouping?
> > What do you want to group? VFs of parent pci device?
> > How to refer to each VF within a group?
> 
> So for example, VFs of a PF are a group right? And they are all controlled by a
> PF.
> 
> I can think of setups like nesting where we might want to create a group of VFs
> and pass them to L1, with one of the VFs acting as an admin for the rest of them for
> purposes of L2.  Subfunctions with PASID etc. are another example. 

Subfunctions with PASID can be similarly managed by extending the device identification and its MSI-X/IMS vector details.
Maybe vf_number should be put in a union, as:

union device_id {
	struct pci_vf vf_id; /* current */
	struct pci_sf sf_id; /* future */
};

So that both can use the same command opcode.

> I am not
> asking you to add such mechanisms straight away but the current proposal
> kind of obscures this to the point where I don't see how we would extend it
> with these things down the road.
> 
Which part specifically makes it obscure? A new device type can be identified by the above union.

Maybe a better structure for patch 5 would be something like below:

struct virtio_admin_pci_virt_property_set {
	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction */
	union virtio_device_identifier {
		struct virtio_pci_dev_id pf_vf; /* current */
		struct virtio_subfunction sf; /* future */
	} id;
	enum virtio_interrupt_type interrupt_type; /* msix, ims=device specific, intx, something else */
	union virtio_interrupt_config {
		struct virtio_pci_msix_config msix_config;
	} intr_config;
};

struct virtio_pci_msix_config {
	le16 msix_count;
};
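
For illustration, a usage sketch with the structure above, configuring 4 
MSI-X vectors on VF 5 (the enum values and the vf_number field are 
hypothetical):

/* Hypothetical usage: provision VF 5 with 4 MSI-X vectors. */
struct virtio_admin_pci_virt_property_set cmd = {
	.type           = VIRTIO_DEVICE_ID_PCI_VF,  /* hypothetical */
	.id.pf_vf       = { .vf_number = 5 },
	.interrupt_type = VIRTIO_INTERRUPT_MSIX,    /* hypothetical */
	.intr_config.msix_config = { .msix_count = 4 },
};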


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-17 22:14           ` Michael S. Tsirkin
@ 2022-01-18  4:44             ` Parav Pandit
  2022-01-18  6:23               ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  4:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:44 AM

> > > It's a control queue. Why do we worry?
> > It is used to control/manage the resources of a VF, which is usually deployed
> to a VM.
> > So the higher the latency, the longer it takes to deploy and start the VM.
> 
> What are the savings here, in real terms? Boot times for smallest VMs are in
> 10s of milliseconds. Is reordering of a queue somehow going to save more than
> microseconds?
>
It is probably better not to pick on a specific vendor implementation.
But for real numbers, I see that an implementation takes in the 54 usec to 500 usec range for a simple configuration.

It is better for a small VM's 4-vector configuration not to take longer because there was a previous AQ command for 64 vectors.
 
> > Hence, it is better to have this basic functionality in place; it is useful
> beyond MSI-X config.
> > It is not functionally a must. But riding AQ command ordering on
> VIRTIO_F_IN_ORDER for now and later driving it based on a new field requires
> dual handling.
> > Better to start with the AQ's own ordering and a single scheme.
> 
> Sorry I'm still scratching my head.

if (DEV.IN_ORDER_NEGOTIATED /* current */ ||
    AQ.IN_ORDER_NEGOTIATED /* new */) {
	/* handle AQ descriptors the in-order way */
} else {
	/* handle AQ descriptors the out-of-order way */
}

By always doing AQ commands out of order, we simplify both the driver and the device by avoiding in-order execution.

> > > >
> > > > And also the other way around.
> > >
> > > what exactly does this mean?
> > >
> > IO commands out of order (for say block device), but AQ commands in order.
> > May be AQ command execution can be always treated as out of order, even
> when VIRTIO_F_IN_ORDER is negotiated.
> > This way it will be even more simpler design for driver and device.
> >
> > > > IO cmds and admin cmds have different considerations in many cases.
> > >
> > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > >
> > > E.g. I can see how a hardware vendor might want to avoid supporting
> > > indirect with RX for virtio net with mergeable buffers, but still support it for
> TX.
> > >
> > >
> > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > I agree. It only helps driver to ensure that AQ commands are processed in
> order, so it doesn't need to serialize it.
> > But yes, driver can always serialize it if needed when AQ is always out of
> order.
> > I think we should word it that AQ is always out of order.
> >
> > > I think you want to reorder admin commands dealing with unrelated
> > > VFs but keep io vqs in order for speed.
> > > Just guessing, you should spell the real motivation out.
> > > However, I think a better way to do that is with finalizing the
> > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > I read the partial order proposal at [1].
> > It still appears IN_ORDER from driver POV.
> > So I am not sure if driver can complete AQ commands out of order. Can it?
> 
> complete commands == use buffers?
Complete descriptors out of order.
I used the term "command" for the commands carried by AQ descriptors.
Will rephrase.

> drivers do not use buffers.
>
 
> > I think data path needs more plumbing that just PARTIAL_ORDER flag, for
> descriptor processing differently on tx and rx side.
> > Not sure merging AQ to it is useful, given that we agree that AQ should
> always behave as out of order from beginning.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html
> 
> You mean *device*. Driver does not control the order.
By data path I meant both the device and the driver.
The driver doesn't control the order, but it should be ready to handle used descriptors out of order when PARTIAL_ORDER is negotiated.

> The point of PARTIAL_ORDER is basically that some descriptors are in order
> some out of order and its up to device. So it is even finer resolution.
> 
> 
> > > Pls review and let me know. If there's finally a use for it I'll
> > > prioritize finalizing that idea.
> > > Don't see much point in tweaking INDIRECT at all.
> > Common negotiation of INDIRECT on AQ and other queues forces data path
> also to handle that.
> 
> I don't see why admin queue needs indirect descriptors.
> 
Probably yes. The simple idea is not to impose indirect descriptors on the AQ just because the txq/rxq prefer to use them.
It is not that the AQ needs them.
At the same time, you don't want the AQ object in the spec to be limited to always operating without indirect descriptors.

> > It is better to not impact the device to handler indirect descriptors on non
> AQ queues, just because AQ prefers to handle it.
> > Often AQ and data path queues are not handled by same set of processing
> engines given they both do different tasks.
> 
> so for example, many guests want to use indirect for tx but not for rx.
> if you are worrying about things like that, maybe a per-vq control over indirect
> support makes sense.
> adding complexity like that should really be much better motivated, and
> maybe have some PoC code or back of the napkin math showing the expected
> gains.
I do not have gains handy for tx vs. rx queues; that was your example in the partial order page fault thread.
So maybe you can share those results and/or PoC code?

The motivation for the AQ is simple, as I explained above: the txq/rxq indirect descriptor capability should not be imposed on the AQ.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-18  2:18           ` Jason Wang
@ 2022-01-18  5:25             ` Michael S. Tsirkin
  2022-01-19  4:16               ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  5:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 10:18:29AM +0800, Jason Wang wrote:
> 
> 在 2022/1/18 上午6:22, Michael S. Tsirkin 写道:
> > On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
> > > > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > Sent: Sunday, January 16, 2022 3:18 PM
> > > > 
> > > > 
> > > > On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > > > > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > > > > > Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > > > > > 
> > > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > So admin VQ # is only known when all features are negotiated.
> > > > No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> > > > are set by the device.
> > > > 
> > > > Negotiation is not a must.
> > > > 
> > > > Lets say CTRL_VQ is supported by the device and driver A would like to use it
> > > > and driver B wouldn't like to use it - in both cases the admiq VQ # would be 2N
> > > > + 1.
> > > > 
> > > > > Which is quite annoying if hypervisor wants to partition things e.g.
> > > > > handling admin q in process and handling vqs by an external process or
> > > > > by hardware.
> > > > > 
> > > > > I think we can allow devices to set the VQ# for the admin queue
> > > > > instead. Would that work?
> > > Number of MSI-X configuration and number of VQs config are two different,
> > 
> > I was talking about the number of the VQ used for admin commands. Not
> > about the number of VQs.
> > 
> > > though it has strong correlation.
> > > Configuring number of queues seems a very device specific configuration (even though num_queues is generic field in struct virtio_pci_common_cfg).
> > > 
> > > So num VQ configuration is a different command likely combined with other device specific config such as mac or rss or others.
> > I was not talking about that at all, but since you mention that,
> > to me it looks like something that many device types can support.
> > It's not necessarily rss related, MQ config would benefit too,
> > so I am not sure why not have a command for controlling number
> > of queues. Looks like it could quite be generic.
> > 
> > Since current guests only support two modes: a vector
> > per VQ and a shared vector for all VQs, it follows that
> > it is important when configuring vectors per VF to also configure
> > VQs per VF. This makes me wonder whether ability to configure
> > vectors per VF in isolation without ability to configure or
> > at least query VQs per VF even has value.
> 
> 
> So I had some thought in the past, it looks to me we need a generic
> provision interface that contains all the necessary attributes:
> 
> 1) #queues
> 2) device_features
> 3) #msi_vectors
> 4) device specific configurations
> 
> It could be either an admin virtqueue interface[1] or a dedicated
> capability[2], (the latter seems easier).
> 
> Thanks
> 
> [1]
> https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
> [2]
> https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> 

We also need 
- something like injecting cvq commands to control rx mode from the admin device
- page fault / dirty page handling

these two seem to call for a vq.


> > 
> > 


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  3:22         ` Parav Pandit
@ 2022-01-18  6:17           ` Michael S. Tsirkin
  2022-01-18  7:57             ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  6:17 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 03:22:27AM +0000, Parav Pandit wrote:
> > I don't recall discussion about UUID so I can't really say what I think about that.
> > Do we need a UUID? I'm not sure I understand why.
> > It can't hurt to abstract things a bit so it's not all tied to PFs/VFs since we know
> > we'll want subfunctions down the road, too, if that is what you mean.
> >
> I still didn't find any reason in the discussion to find out why grouping device is needed.

VFs are already grouped with their PF. However we should spell this out
as the motivation for the admin queue.

> Current AQ proposal implicitly indicates that VFs of a PF are managed by its parent PF.
> And if for some reason this work is done by one of the VFs, this role
> assignment can certainly be a new command on the AQ, as a group command or
> some other command.

> > 
> > 
> > > >
> > > > Following this idea, all commands would then gain fields for
> > > > addressing one device from another.
> > > >
> > > > Not everything maps well to a queue. E.g. it would be great to have
> > > > list of available commands in memory.
> > >
> > > I'm not sure I agree. Why can't it map to a queue ?
> > 
> > You can map it to a queue, yes. But something static and read only such as list
> > of commands maps well to config space. And it's not controlling one device
> > from another, so does not really seem to belong in the admin queue.
> > 
> The AQ also serves writing the device config, in patch-5 of this patchset.

List of available admin commands does not need to be written.

> > >
> > > > Figuring out max vectors also looks like a good example for memory
> > > > and not through a command.
> > >
> > > Any explanation why is it looks good ? or better ?
> > 
> > why is memory easier to operate than a VQ?
> > It's much simpler and so less error prone.  you can have multiple actors read
> > such a field at the same time without races, so e.g.  there could be a sysfs
> > attribute that reads from device on each access, and not special error handling
> > is needed.
> >
> Writing fields is inherent part of the aq without getting blocked on previous writes.
> I see you acked that AQ is fine in cover letter patch as below, so we are sync on the motivation now.
> Yes, will update the commit log as you suggested.
> 
>  " if the answer is "commands A,B,C do not fit in config space, we placed commands D,E in a VQ for consistency"
> then that is an ok answer, but it's something to be mentioned in the commit log"


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  4:44             ` Parav Pandit
@ 2022-01-18  6:23               ` Michael S. Tsirkin
  2022-01-18  6:32                 ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  6:23 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 3:44 AM
> 
> > > > It's a control queue. Why do we worry?
> > > It is used to control/manage the resource of a VF which is deployed usually
> > to a VM.
> > > So higher the latency, higher the time it takes to deploy start the VM.
> > 
> > What are the savings here, in real terms? Boot times for smallest VMs are in
> > 10s of milliseconds. Is reordering of a queue somehow going to save more than
> > microseconds?
> >
> It is probably better not to pick on a specific vendor implementation.
> But for real numbers, I see that an implementation takes 54usec to 500 usec range for simple configuration.
> 
> It is better to not small VM 4 vector configuration to take longer because there was previous AQ command for 64 vectors.

So virtio discovery on boot includes multiple of vmexits, each costs ~1000
cycles.  And people do not seem to worry about it.
You want a compelling argument for working on performance of config.
I frankly think it's not really useful but I especially think
you should cut this out of the current proposal, it's too big as it is.

> > > Hence, it is better to have this basic functionality in place, being useful
> > beyond MSI-X config.
> > > It is not functionally must. But riding AQ command ordering on
> > VIRTIO_F_IN_ORDER for now and later on driving based on new field requires
> > dual handling.
> > > Better to start with its AQ's own ordering and one scheme.
> > 
> > Sorry I'm still scratching my head.
> 
> if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
>     AQ.IN_ORDERED_NEGOTIATED /* new */) {
> 	/* handle AQ descriptors in-order way */
> } else { 
> 	/* handle AQ desc out-of order way */
> }
> 
> By always doing AQ commands out of order, we simplify the driver and device to avoid in-order execution.

No idea what this means. Needs much more motivational discussion, and
more thought about using generic infrastructure.
How about making this a separate proposal?


> > > > >
> > > > > And also the other way around.
> > > >
> > > > what exactly does this mean?
> > > >
> > > IO commands out of order (for say block device), but AQ commands in order.
> > > May be AQ command execution can be always treated as out of order, even
> > when VIRTIO_F_IN_ORDER is negotiated.
> > > This way it will be even more simpler design for driver and device.
> > >
> > > > > IO cmds and admin cmds have different considerations in many cases.
> > > >
> > > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > > >
> > > > E.g. I can see how a hardware vendor might want to avoid supporting
> > > > indirect with RX for virtio net with mergeable buffers, but still support it for
> > TX.
> > > >
> > > >
> > > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > > I agree. It only helps driver to ensure that AQ commands are processed in
> > order, so it doesn't need to serialize it.
> > > But yes, driver can always serialize it if needed when AQ is always out of
> > order.
> > > I think we should word it that AQ is always out of order.
> > >
> > > > I think you want to reorder admin commands dealing with unrelated
> > > > VFs but keep io vqs in order for speed.
> > > > Just guessing, you should spell the real motivation out.
> > > > However, I think a better way to do that is with finalizing the
> > > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > > I read the partial order proposal at [1].
> > > It still appears IN_ORDER from driver POV.
> > > So I am not sure if driver can complete AQ commands out of order. Can it?
> > 
> > complete commands == use buffers?
> Complete descriptors out of order.
> I used term command as AQ descriptor used commands.
> Will rephase.

So in that case just work on VIRTIO_F_PARTIAL_ORDER please. I think
there's a way to make it work for your usecase.

> > drivers do not use buffers.
> >
>  
> > > I think data path needs more plumbing that just PARTIAL_ORDER flag, for
> > descriptor processing differently on tx and rx side.
> > > Not sure merging AQ to it is useful, given that we agree that AQ should
> > always behave as out of order from beginning.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.html
> > 
> > You mean *device*. Driver does not control the order.
> Data path I meant device and driver both.
> Driver doesn't control the order, but should be ready to handle used descriptors out of order when PARTIAL is negotiated.
> 
> > The point of PARTIAL_ORDER is basically that some descriptors are in order
> > some out of order and its up to device. So it is even finer resolution.
> > 
> > 
> > > > Pls review and let me know. If there's finally a use for it I'll
> > > > prioritize finalizing that idea.
> > > > Don't see much point in tweaking INDIRECT at all.
> > > Common negotiation of INDIRECT on AQ and other queues forces data path
> > also to handle that.
> > 
> > I don't see why admin queue needs indirect descriptors.
> > 
> Probably yes. the simple idea is, not to impose indirect descriptors on AQ because txq/rxq prefers to use it.
> Not that AQ needs it.
> At the same time, you don't want AQ object in spec to be limited to always operate without indirect descriptor.
> 
> > > It is better to not impact the device to handler indirect descriptors on non
> > AQ queues, just because AQ prefers to handle it.
> > > Often AQ and data path queues are not handled by same set of processing
> > engines given they both do different tasks.
> > 
> > so for example, many guests want to use indirect for tx but not for rx.
> > if you are worrying about things like that, maybe a per-vq control over indirect
> > support makes sense.
> > adding complexity like that should really be much better motivated, and
> > maybe have some PoC code or back of the napkin math showing the expected
> > gains.
> I do not have gains handy for tx vs rx q. It was in your example of partial order page fault thread.
> So may be you can share those results and/or poc code?
> 
> The motivation for AQ is simple as I explained about, i.e. txq/rxq indirect descriptor capability should not be imposed on AQ.


If I were you I would defer this, the AQ patch is already too big.

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  6:23               ` Michael S. Tsirkin
@ 2022-01-18  6:32                 ` Parav Pandit
  2022-01-18  6:54                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  6:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 11:54 AM
> 
> On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 3:44 AM
> >
> > > > > It's a control queue. Why do we worry?
> > > > It is used to control/manage the resource of a VF which is
> > > > deployed usually
> > > to a VM.
> > > > So higher the latency, higher the time it takes to deploy start the VM.
> > >
> > > What are the savings here, in real terms? Boot times for smallest
> > > VMs are in 10s of milliseconds. Is reordering of a queue somehow
> > > going to save more than microseconds?
> > >
> > It is probably better not to pick on a specific vendor implementation.
> > But for real numbers, I see that an implementation takes 54usec to 500 usec
> range for simple configuration.
> >
> > It is better to not small VM 4 vector configuration to take longer because
> there was previous AQ command for 64 vectors.
> 
> So virtio discovery on boot includes multiple of vmexits, each costs ~1000
> cycles.  And people do not seem to worry about it.
It is not the vector configuration done by the guest VM.
It is the AQ command that provisions the number of MSI-X vectors for the VF that takes tens to hundreds of usecs.
These are the commands in patch-5 of this proposal.

> You want a compelling argument for working on performance of config.
> I frankly think it's not really useful but I especially think you should cut this out
> of the current proposal, it's too big as it is.
> 
Ok. We can do a follow-on proposal after the AQ.
We already see the need for an out-of-order AQ in internal performance tests we are running.
But fine, we can defer.

> > > > Hence, it is better to have this basic functionality in place,
> > > > being useful
> > > beyond MSI-X config.
> > > > It is not functionally must. But riding AQ command ordering on
> > > VIRTIO_F_IN_ORDER for now and later on driving based on new field
> > > requires dual handling.
> > > > Better to start with its AQ's own ordering and one scheme.
> > >
> > > Sorry I'm still scratching my head.
> >
> > if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
> >     AQ.IN_ORDERED_NEGOTIATED /* new */) {
> > 	/* handle AQ descriptors in-order way */ } else {
> > 	/* handle AQ desc out-of order way */ }
> >
> > By always doing AQ commands out of order, we simplify the driver and
> device to avoid in-order execution.
> 
> No idea what this means. Needs much more motivational discussion, and
> more thought about using generic infrastructure.
> How about making this a separate proposal?
>
Got it. Will drive it in a separate follow-on proposal.
 
> 
> > > > > >
> > > > > > And also the other way around.
> > > > >
> > > > > what exactly does this mean?
> > > > >
> > > > IO commands out of order (for say block device), but AQ commands in
> order.
> > > > May be AQ command execution can be always treated as out of order,
> > > > even
> > > when VIRTIO_F_IN_ORDER is negotiated.
> > > > This way it will be even more simpler design for driver and device.
> > > >
> > > > > > IO cmds and admin cmds have different considerations in many cases.
> > > > >
> > > > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > > > >
> > > > > E.g. I can see how a hardware vendor might want to avoid
> > > > > supporting indirect with RX for virtio net with mergeable
> > > > > buffers, but still support it for
> > > TX.
> > > > >
> > > > >
> > > > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > > > I agree. It only helps driver to ensure that AQ commands are
> > > > processed in
> > > order, so it doesn't need to serialize it.
> > > > But yes, driver can always serialize it if needed when AQ is
> > > > always out of
> > > order.
> > > > I think we should word it that AQ is always out of order.
> > > >
> > > > > I think you want to reorder admin commands dealing with
> > > > > unrelated VFs but keep io vqs in order for speed.
> > > > > Just guessing, you should spell the real motivation out.
> > > > > However, I think a better way to do that is with finalizing the
> > > > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > > > I read the partial order proposal at [1].
> > > > It still appears IN_ORDER from driver POV.
> > > > So I am not sure if driver can complete AQ commands out of order. Can
> it?
> > >
> > > complete commands == use buffers?
> > Complete descriptors out of order.
> > I used term command as AQ descriptor used commands.
> > Will rephase.
> 
> So in that case just work on VIRTIO_F_PARTIAL_ORDER please. I think there's a
> way to make it work for your usecase.
> 
> > > drivers do not use buffers.
> > >
> >
> > > > I think data path needs more plumbing that just PARTIAL_ORDER
> > > > flag, for
> > > descriptor processing differently on tx and rx side.
> > > > Not sure merging AQ to it is useful, given that we agree that AQ
> > > > should
> > > always behave as out of order from beginning.
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.h
> > > > tml
> > >
> > > You mean *device*. Driver does not control the order.
> > Data path I meant device and driver both.
> > Driver doesn't control the order, but should be ready to handle used
> descriptors out of order when PARTIAL is negotiated.
> >
> > > The point of PARTIAL_ORDER is basically that some descriptors are in
> > > order some out of order and its up to device. So it is even finer resolution.
> > >
> > >
> > > > > Pls review and let me know. If there's finally a use for it I'll
> > > > > prioritize finalizing that idea.
> > > > > Don't see much point in tweaking INDIRECT at all.
> > > > Common negotiation of INDIRECT on AQ and other queues forces data
> > > > path
> > > also to handle that.
> > >
> > > I don't see why admin queue needs indirect descriptors.
> > >
> > Probably yes. the simple idea is, not to impose indirect descriptors on AQ
> because txq/rxq prefers to use it.
> > Not that AQ needs it.
> > At the same time, you don't want AQ object in spec to be limited to always
> operate without indirect descriptor.
> >
> > > > It is better to not impact the device to handler indirect
> > > > descriptors on non
> > > AQ queues, just because AQ prefers to handle it.
> > > > Often AQ and data path queues are not handled by same set of
> > > > processing
> > > engines given they both do different tasks.
> > >
> > > so for example, many guests want to use indirect for tx but not for rx.
> > > if you are worrying about things like that, maybe a per-vq control
> > > over indirect support makes sense.
> > > adding complexity like that should really be much better motivated,
> > > and maybe have some PoC code or back of the napkin math showing the
> > > expected gains.
> > I do not have gains handy for tx vs rx q. It was in your example of partial
> order page fault thread.
> > So may be you can share those results and/or poc code?
> >
> > The motivation for AQ is simple as I explained about, i.e. txq/rxq indirect
> descriptor capability should not be imposed on AQ.
> 
> 
> If I were you I would defer this, the AQ patch is already too big.
O.k., we can defer.
But if you look at it from the other side, the AQ always following the INDIRECT_DESC feature bit forces the device implementation to support indirect descriptors.
Doesn't that make the device implementation big from the beginning, even though it may not be needed?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  6:32                 ` Parav Pandit
@ 2022-01-18  6:54                   ` Michael S. Tsirkin
  2022-01-18  7:07                     ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  6:54 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 06:32:50AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 11:54 AM
> > 
> > On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 3:44 AM
> > >
> > > > > > It's a control queue. Why do we worry?
> > > > > It is used to control/manage the resource of a VF which is
> > > > > deployed usually
> > > > to a VM.
> > > > > So higher the latency, higher the time it takes to deploy start the VM.
> > > >
> > > > What are the savings here, in real terms? Boot times for smallest
> > > > VMs are in 10s of milliseconds. Is reordering of a queue somehow
> > > > going to save more than microseconds?
> > > >
> > > It is probably better not to pick on a specific vendor implementation.
> > > But for real numbers, I see that an implementation takes 54usec to 500 usec
> > range for simple configuration.
> > >
> > > It is better to not small VM 4 vector configuration to take longer because
> > there was previous AQ command for 64 vectors.
> > 
> > So virtio discovery on boot includes multiple of vmexits, each costs ~1000
> > cycles.  And people do not seem to worry about it.
> It is not the vector configuration by guest VM.
> It is the AQ command that provisions number of msix vectors for the VF that takes tens to hundreds of usecs.
> These are the command in patch-5 in this proposal.

Hundreds of usecs is negligible compared to VM boot time.
Sorry I don't really see why we worry about indirect in that case.


> > You want a compelling argument for working on performance of config.
> > I frankly think it's not really useful but I especially think you should cut this out
> > of the current proposal, it's too big as it is.
> > 
> Ok. We can do follow on proposal after AQ.
> We already see need of out of order AQ in internal performance tests we are running.

OK so first of all you can avoid declaring IN_ORDER.  If you see that
IN_ORDER improves performance for you and therefore need it, then please
look at PARTIAL_ORDER. And if that does not address your needs then let's
discuss; I'd rather have a generic solution since the requirement does
not seem to be specific to the AQ.

> But fine, we can differ.
> 
> > > > > Hence, it is better to have this basic functionality in place,
> > > > > being useful
> > > > beyond MSI-X config.
> > > > > It is not functionally must. But riding AQ command ordering on
> > > > VIRTIO_F_IN_ORDER for now and later on driving based on new field
> > > > requires dual handling.
> > > > > Better to start with its AQ's own ordering and one scheme.
> > > >
> > > > Sorry I'm still scratching my head.
> > >
> > > if (DEV.IN_ORDER_NEGOTIATED /*current */ ||
> > >     AQ.IN_ORDERED_NEGOTIATED /* new */) {
> > > 	/* handle AQ descriptors in-order way */ } else {
> > > 	/* handle AQ desc out-of order way */ }
> > >
> > > By always doing AQ commands out of order, we simplify the driver and
> > device to avoid in-order execution.
> > 
> > No idea what this means. Needs much more motivational discussion, and
> > more thought about using generic infrastructure.
> > How about making this a separate proposal?
> >
> Got it. Will drive it in follow on separate proposal.
>  
> > 
> > > > > > >
> > > > > > > And also the other way around.
> > > > > >
> > > > > > what exactly does this mean?
> > > > > >
> > > > > IO commands out of order (for say block device), but AQ commands in
> > order.
> > > > > May be AQ command execution can be always treated as out of order,
> > > > > even
> > > > when VIRTIO_F_IN_ORDER is negotiated.
> > > > > This way it will be even more simpler design for driver and device.
> > > > >
> > > > > > > IO cmds and admin cmds have different considerations in many cases.
> > > > > >
> > > > > > That's still pretty vague.  so do other types of VQ, such as RX/TX.
> > > > > >
> > > > > > E.g. I can see how a hardware vendor might want to avoid
> > > > > > supporting indirect with RX for virtio net with mergeable
> > > > > > buffers, but still support it for
> > > > TX.
> > > > > >
> > > > > >
> > > > > > I can vaguely see the usefulness of VIRTIO_F_ADMIN_VQ_IN_ORDER.
> > > > > I agree. It only helps driver to ensure that AQ commands are
> > > > > processed in
> > > > order, so it doesn't need to serialize it.
> > > > > But yes, driver can always serialize it if needed when AQ is
> > > > > always out of
> > > > order.
> > > > > I think we should word it that AQ is always out of order.
> > > > >
> > > > > > I think you want to reorder admin commands dealing with
> > > > > > unrelated VFs but keep io vqs in order for speed.
> > > > > > Just guessing, you should spell the real motivation out.
> > > > > > However, I think a better way to do that is with finalizing the
> > > > > > VIRTIO_F_PARTIAL_ORDER proposal from august.
> > > > > I read the partial order proposal at [1].
> > > > > It still appears IN_ORDER from driver POV.
> > > > > So I am not sure if driver can complete AQ commands out of order. Can
> > it?
> > > >
> > > > complete commands == use buffers?
> > > Complete descriptors out of order.
> > > I used term command as AQ descriptor used commands.
> > > Will rephase.
> > 
> > So in that case just work on VIRTIO_F_PARTIAL_ORDER please. I think there's a
> > way to make it work for your usecase.
> > 
> > > > drivers do not use buffers.
> > > >
> > >
> > > > > I think data path needs more plumbing that just PARTIAL_ORDER
> > > > > flag, for
> > > > descriptor processing differently on tx and rx side.
> > > > > Not sure merging AQ to it is useful, given that we agree that AQ
> > > > > should
> > > > always behave as out of order from beginning.
> > > > >
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-dev/202008/msg00001.h
> > > > > tml
> > > >
> > > > You mean *device*. Driver does not control the order.
> > > Data path I meant device and driver both.
> > > Driver doesn't control the order, but should be ready to handle used
> > descriptors out of order when PARTIAL is negotiated.
> > >
> > > > The point of PARTIAL_ORDER is basically that some descriptors are in
> > > > order some out of order and its up to device. So it is even finer resolution.
> > > >
> > > >
> > > > > > Pls review and let me know. If there's finally a use for it I'll
> > > > > > prioritize finalizing that idea.
> > > > > > Don't see much point in tweaking INDIRECT at all.
> > > > > Common negotiation of INDIRECT on AQ and other queues forces data
> > > > > path
> > > > also to handle that.
> > > >
> > > > I don't see why admin queue needs indirect descriptors.
> > > >
> > > Probably yes. the simple idea is, not to impose indirect descriptors on AQ
> > because txq/rxq prefers to use it.
> > > Not that AQ needs it.
> > > At the same time, you don't want AQ object in spec to be limited to always
> > operate without indirect descriptor.
> > >
> > > > > It is better to not impact the device to handler indirect
> > > > > descriptors on non
> > > > AQ queues, just because AQ prefers to handle it.
> > > > > Often AQ and data path queues are not handled by same set of
> > > > > processing
> > > > engines given they both do different tasks.
> > > >
> > > > so for example, many guests want to use indirect for tx but not for rx.
> > > > if you are worrying about things like that, maybe a per-vq control
> > > > over indirect support makes sense.
> > > > adding complexity like that should really be much better motivated,
> > > > and maybe have some PoC code or back of the napkin math showing the
> > > > expected gains.
> > > I do not have gains handy for tx vs rx q. It was in your example of partial
> > order page fault thread.
> > > So may be you can share those results and/or poc code?
> > >
> > > The motivation for AQ is simple as I explained about, i.e. txq/rxq indirect
> > descriptor capability should not be imposed on AQ.
> > 
> > 
> > If I were you I would defer this, the AQ patch is already too big.
> o.k. we can differ.
> But if you see on the other side, AQ always following INDIRECT_DESC feature bit, forces device implementation to support indirect descriptors.
> Isn't that make device implementation big from beginning, even though it may not be needed?

The problem is not unique to the AQ though. RX queues for virtio net have
the same issue.
"If it's there but not used, punt it to a slow path in firmware" was
always our approach. If not, it is worth thinking of a generic solution.

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  6:54                   ` Michael S. Tsirkin
@ 2022-01-18  7:07                     ` Parav Pandit
  2022-01-18  7:12                       ` Michael S. Tsirkin
                                         ` (2 more replies)
  0 siblings, 3 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  7:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:25 PM
> 
> On Tue, Jan 18, 2022 at 06:32:50AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 11:54 AM
> > >
> > > On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, January 18, 2022 3:44 AM
> > > >
> > > > > > > It's a control queue. Why do we worry?
> > > > > > It is used to control/manage the resource of a VF which is
> > > > > > deployed usually
> > > > > to a VM.
> > > > > > So higher the latency, higher the time it takes to deploy start the VM.
> > > > >
> > > > > What are the savings here, in real terms? Boot times for
> > > > > smallest VMs are in 10s of milliseconds. Is reordering of a
> > > > > queue somehow going to save more than microseconds?
> > > > >
> > > > It is probably better not to pick on a specific vendor implementation.
> > > > But for real numbers, I see that an implementation takes 54usec to
> > > > 500 usec
> > > range for simple configuration.
> > > >
> > > > It is better to not small VM 4 vector configuration to take longer
> > > > because
> > > there was previous AQ command for 64 vectors.
> > >
> > > So virtio discovery on boot includes multiple of vmexits, each costs
> > > ~1000 cycles.  And people do not seem to worry about it.
> > It is not the vector configuration by guest VM.
> > It is the AQ command that provisions number of msix vectors for the VF that
> takes tens to hundreds of usecs.
> > These are the command in patch-5 in this proposal.
> 
> Hundreds of usecs is negligeable compared to VM boot time.
> Sorry I don't really see why we worry about indirect in that case.
> 
> 
Ok, we will do an incremental proposal after this one for the wider use case.

> > > You want a compelling argument for working on performance of config.
> > > I frankly think it's not really useful but I especially think you
> > > should cut this out of the current proposal, it's too big as it is.
> > >
> > Ok. We can do follow on proposal after AQ.
> > We already see need of out of order AQ in internal performance tests we are
> running.
> 
> OK so first of all you can avoid declaring IN_ORDER.  
This will force non-IN_ORDER on the other txq and rxq too, which causes higher latency.
But fine, the initial implementation can start without it.

> If you see that IN_ORDER
> improves performance for you so you need it, then look at PARTIAL_ORDER
> pls. 
Ok, will consider PARTIAL_ORDER more in a future proposal.

> And if that does not address your needs then let's discuss, I'd rather have a
> generic solution since the requirement does not seem to be specific to AQ.
> 
> > But fine, we can differ.

So far I have gathered the summary below of what needs to be addressed in v2.

1. Use the AQ for MSI-X query and config
2. AQ follows IN_ORDER and INDIRECT_DESC negotiation like the rest of the queues
3. Update the commit log to describe why config space was not chosen (scale, on-die registers, uniform way to handle all AQ commands)
4. Improve the documentation around MSI-X config to link to the SR-IOV section of the virtio spec
5. Describe the error that, if the VF is bound to the device, admin commands targeting the VF can fail, and describe this error code (a hypothetical sketch follows this list)
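
For item 5, a status-code sketch; the names and values here are invented
for illustration, not taken from the proposal:

/* Hypothetical admin command status codes. */
enum virtio_admin_status {
	VIRTIO_ADMIN_STATUS_OK     = 0,
	VIRTIO_ADMIN_STATUS_EINVAL = 1,	/* malformed or unsupported command */
	VIRTIO_ADMIN_STATUS_EBUSY  = 2,	/* e.g. the target VF is bound and in use */
};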

Did I miss anything?

Yet to receive your feedback on grouping: if/why it is needed, why/if it must be in this proposal, and what prevents doing it as a follow-on.

Cornelia, Jason,
Can you please review the current proposal as well before we revise it for v2?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  3:36         ` Parav Pandit
@ 2022-01-18  7:07           ` Michael S. Tsirkin
  2022-01-18  7:14             ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:07 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 03:36:19AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 3:33 AM
> > 
> > On Mon, Jan 17, 2022 at 02:12:33PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Thursday, January 13, 2022 11:24 PM
> > >
> > > >
> > > > We had an off-list meeting where I proposed addressing one device
> > > > from another or grouping multiple devices as a more specific scope.
> > > > That would be one way to address this.
> > > >
> > > > Following this idea, all commands would then gain fields for
> > > > addressing one device from another.
> > > >
> > > Can you please explain your idea more and a need for grouping?
> > > What do you want to group? VFs of parent pci device?
> > > How to refer to each VF within a group?
> > 
> > So for example, VFs of a PF are a group right? And they are all controlled by a
> > PF.
> > 
> > I can think of setups like nesting where we might want to create a group of VFs
> and pass them to L1, one of the VFs to act as an admin for the rest of them for
> > purposes of L2.  subfunctions with PASID etc are another example. 
> 
> Subfunctions with PASID can be similarly managed by extending device identification and its MSIX/IMS vector details.
> May be vf_number should be put in the union as,
> 
> union device_id {
> 	struct pci_vf vf_id; /* current */
> 	struct pci_sf sf_id; /* future */
> };
> 
> So that they both can use command opcode.

"device id" is not a good name, but yes. However this is why I think we
should have slightly more generic terminology, and more space for
these IDs, and then we'd have a specific binding for VFs.


> > I am not
> > asking you to add such mechanisms straight away but the current proposal
> > kind of obscures this to the point where I don't see how would we extend it
> > with these things down the road.
> > 
> Which part in specific make it obscure?

Just that the text is not generic. It would be nicer if adding
new types involved changing only one or two places.

> New device type can be identifiable by above union.
> 
> May be a better structure would be in patch-5 is:
> Something like below,
> 
> struct virtio_admin_pci_virt_property_set {
> 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction */
> 	union virtio_device_identifier {
> 		struct virtio_pci_dev_id pf_vf; /* current */
> 		struct virtio_subfunction sf; /* future */
> 	};
> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device specific, intx, something else */
> 	union virtio_interrupt_config {
> 		struct virtio_pci_msix_config msix_config;
> 	};
> };
> 
> struct virtio_pci_interrupt_config {
> 	le16 msix_count;
> };

You do not need a union straight away. Simply use something like this "device
identifier" everywhere and then add some text explaining that currently
it is a VF number and that the admin device is a PF.

However, we need better names; "device ID" is already used in the spec for
enumeration/discovery. Come up with something else please.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:07                     ` Parav Pandit
@ 2022-01-18  7:12                       ` Michael S. Tsirkin
  2022-01-18  7:30                         ` Parav Pandit
  2022-01-18  7:13                       ` Michael S. Tsirkin
  2022-01-19  4:03                       ` Jason Wang
  2 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> 1. Use AQ for msix query and config
> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)
> 4. Improve documentation around msix config to link to sriov section of virtio spec
> 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
> 
> Did I miss anything?

Better to document in the spec text just what the scope of the AQ is.


> Yet to receive your feedback on group, if/why is it needed and, why/if it must be in this proposal, what pieces prevents it do as follow-on.

I think this is related to the subfunction use case or other future
use cases. In the PF/VF case, grouping is implicit through the SR-IOV
capability. It would be nice to have things somewhat generic in
most of the text though, since we already know this will be needed.
E.g. Jason sent a proposal for commands to add/delete subfunctions;
take a look at it. Somehow the AQ needs to be extendable to support that
functionality too.

> Cornelia, Jason,
> Can you please review current proposal as well before we revise v2?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:07                     ` Parav Pandit
  2022-01-18  7:12                       ` Michael S. Tsirkin
@ 2022-01-18  7:13                       ` Michael S. Tsirkin
  2022-01-18  7:21                         ` Parav Pandit
  2022-01-19  4:03                       ` Jason Wang
  2 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:13 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> Can you please review current proposal as well before we revise v2?

I think what you listed amounts to a significant rework and will make
things easier to review. Not 100% sure you need more feedback at this
point.

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:07           ` Michael S. Tsirkin
@ 2022-01-18  7:14             ` Parav Pandit
  2022-01-18  7:20               ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:38 PM

> > Subfunctions with PASID can be similarly managed by extending device
> identification and its MSIX/IMS vector details.
> > May be vf_number should be put in the union as,
> >
> > union device_id {
> > 	struct pci_vf vf_id; /* current */
> > 	struct pci_sf sf_id; /* future */
> > };
> >
> > So that they both can use command opcode.
> 
> device id is not a good name, but yes. However this is why I think we should
> have a slightly more generic terminology, and more space for these IDs, and
> then we'd have a specific binding for VFs.
> 
I couldn't think of a better name for identifying a PCI VF. But yes, we have to think of a better name for sure.

> > > I am not
> > > asking you to add such mechanisms straight away but the current
> > > proposal kind of obscures this to the point where I don't see how
> > > would we extend it with these things down the road.
> > >
> > Which part in specific make it obscure?
> 
> just that the text is not generic. would be nicer if adding new types would
> involve only changing one or two places
> 
> > New device type can be identifiable by above union.
> >
> > May be a better structure would be in patch-5 is:
> > Something like below,
> >
> > struct virtio_admin_pci_virt_property_set {
> > 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction
> */
> > 	union virtio_device_identifier {
> > 		struct virtio_pci_dev_id pf_vf; /* current */
> > 		struct virtio_subfunction sf; /* future */
> > 	};
> > 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
> specific, intx, something else */
> > 	union virtio_interrupt_config {
> > 		struct virtio_pci_msix_config msix_config;
> > 	};
> > };
> >
> > struct virtio_pci_interrupt_config {
> > 	le16 msix_count;
> > };
> 
> you do not need a union straight away, Simply use something like this "device
> identifier" everywhere and then add some text explaining that currently it is a
> VF number and that admin device is a PF.
Unless we reserve some bytes, I fail to see how it can be future compatible with an unknown device id type for a subfunction.
pci_vf_number is very crisp for a PCI VF specific command.
So I am ruling out reserving an arbitrary number of bytes,
and instead splitting the command into two pieces:
1. command opcode
2. command content (PCI VF specific); this will be a different structure for a subfunction or for a non-PCI device

Would that be more elegant?
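
For illustration, one possible shape of that split; every name below is
hypothetical:

/* Hypothetical two-piece command: common opcode, target-specific content. */
#include <stdint.h>

struct virtio_admin_cmd {
	uint16_t opcode;		/* piece 1: command opcode */
	uint16_t reserved;
	union {				/* piece 2: command content */
		struct {		/* PCI VF specific, current */
			uint16_t vf_number;
			uint16_t msix_count;
		} pci_vf;
		/* subfunction / non-PCI variants would be added here */
	} content;
};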

> 
> However we need better names, device ID is already used in the spec for
> enumeration/discovery. come up with something else please.
Yes.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:14             ` Parav Pandit
@ 2022-01-18  7:20               ` Michael S. Tsirkin
  2022-01-19 11:33                 ` Max Gurtovoy
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:14:56AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 12:38 PM
> 
> > > Subfunctions with PASID can be similarly managed by extending device
> > identification and its MSIX/IMS vector details.
> > > May be vf_number should be put in the union as,
> > >
> > > union device_id {
> > > 	struct pci_vf vf_id; /* current */
> > > 	struct pci_sf sf_id; /* future */
> > > };
> > >
> > > So that they both can use command opcode.
> > 
> > device id is not a good name, but yes. However this is why I think we should
> > have a slightly more generic terminology, and more space for these IDs, and
> > then we'd have a specific binding for VFs.
> > 
> I couldn't think of better name for identifying a PCI VF. But yes have to think better name for sure.
> 
> > > > I am not
> > > > asking you to add such mechanisms straight away but the current
> > > > proposal kind of obscures this to the point where I don't see how
> > > > would we extend it with these things down the road.
> > > >
> > > Which part in specific make it obscure?
> > 
> > just that the text is not generic. would be nicer if adding new types would
> > involve only changing one or two places
> > 
> > > New device type can be identifiable by above union.
> > >
> > > May be a better structure would be in patch-5 is:
> > > Something like below,
> > >
> > > struct virtio_admin_pci_virt_property_set {
> > > 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction
> > */
> > > 	union virtio_device_identifier {
> > > 		struct virtio_pci_dev_id pf_vf; /* current */
> > > 		struct virtio_subfunction sf; /* future */
> > > 	};
> > > 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
> > specific, intx, something else */
> > > 	union virtio_interrupt_config {
> > > 		struct virtio_pci_msix_config msix_config;
> > > 	};
> > > };
> > >
> > > struct virtio_pci_interrupt_config {
> > > 	le16 msix_count;
> > > };
> > 
> > you do not need a union straight away, Simply use something like this "device
> > identifier" everywhere and then add some text explaining that currently it is a
> > VF number and that admin device is a PF.
> Unless we reserve some bytes, I fail to see how can it be future compatible for unknown device id type for subfunction.

So reserve some bytes then. 4 should be plenty.

> pci_vf_number is very crisp for the pci device for a PCI VF specific command.
> So I am ruling out arbitrary number of bytes reservation.

We already know we'll need subfunctions, so I would say make it 4 bytes.

> And split the command to two pieces.
> 1. command opcode
> 2. command content (pci vf specific). This will be different structure for subfunction or for non pci device
> 
> Would that be more elegant?

No idea about non-PCI. We do know about subfunctions, so let us not
pretend they are this unknown entity that is very hard to reason about;
it's something that's just around the corner.
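
A minimal sketch of that 4-byte reservation; the struct and field names
are invented here for illustration:

/* Hypothetical common header: a 4-byte generic target identifier,
 * currently interpreted as a VF number, with room for future
 * subfunction ids without changing the command layout. */
#include <stdint.h>

struct virtio_admin_cmd_hdr {
	uint16_t opcode;
	uint16_t flags;
	uint32_t target_id;	/* today: VF number; future: subfunction id */
};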

> > 
> > However we need better names, device ID is already used in the spec for
> > enumeration/discovery. come up with something else please.
> Yes.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:13                       ` Michael S. Tsirkin
@ 2022-01-18  7:21                         ` Parav Pandit
  2022-01-18  7:37                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  7:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:44 PM
> 
> On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > Can you please review current proposal as well before we revise v2?
> 
> I think what you listed amounts to a significant rework and will make things
> easier to review. Not 100% sure you need more feedback at this point.

With
(a) the motivation that Jason mentioned for configuring VQs, vectors, etc.,
(b) the MSI-X config/query of this proposal,
(c) your description of handling them in a uniform way, and
(d) the understanding of the scale inefficiency, on-die resources, and multiple outstanding commands discussed in the thread,

I would like to receive feedback that we all agree to configure these values via the AQ.

The rest of the AQ plumbing to address comments will be completed in v2, once this looks ok.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:12                       ` Michael S. Tsirkin
@ 2022-01-18  7:30                         ` Parav Pandit
  2022-01-18  7:40                           ` Michael S. Tsirkin
  2022-01-18 10:38                           ` Michael S. Tsirkin
  0 siblings, 2 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  7:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 12:42 PM
> 
> On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > 1. Use AQ for msix query and config
> > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
> > the queues 3. Update commit log to describe why config space is not
> > chosen (scale, on-die registers, uniform way to handle all aq cmds) 4.
> > Improve documentation around msix config to link to sriov section of
> > virtio spec 5. Describe error that if VF is bound to the device, admin
> > commands targeting VF can fail, describe this error code
> >
> > Did I miss anything?
> 
> Better document in spec text just what is the scope for AQ.
>
Yes, will improve this spec.
 
> 
> > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> be in this proposal, what pieces prevents it do as follow-on.
> 
> I think this is related to the subfunction usecase or other future usecase. In
> case of PF/VF grouping is implicit through the SRIOV capability. It would be
> nice to have things somewhat generic in most of the text though since we
> already know this will be needed.
> E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> look at it, somehow AQ needs to be extendable to support that functionality
> too.
I looked briefly at it. The AQ can be used for such a purpose. The current proposal adds only the MSI-X config piece.
But more commands can be added in the future.

What I wanted to check with you and others is: is a 7-bit command opcode enough?
127 is a lot of admin commands. 😊
But given the virtio spec's diversity of transports and device types, I was thinking of keeping it at 15 bits for future proofing.
What do you think?
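
One way to picture the 15-bit option; this is purely illustrative, and
the reserved top bit is my assumption, not something agreed in this
thread:

/* Illustrative only: a 15-bit opcode space within a 16-bit field. */
#define VIRTIO_ADMIN_OPCODE_MASK	0x7fffu	/* up to 32768 opcodes */
#define VIRTIO_ADMIN_OPCODE_RSVD_BIT	0x8000u	/* kept for future use */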

An AQ command unrelated to this one, in Jason's proposal [1], is about " The management driver MUST create a managed device by allocating".
We see that the creator of the subfunction is often not the only entity managing it.
The model where they are the same is finding fewer and fewer users in the new era.
So this piece needs more discussion whenever we address that.

[1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:21                         ` Parav Pandit
@ 2022-01-18  7:37                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:37 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:21:02AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 12:44 PM
> > 
> > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > Can you please review current proposal as well before we revise v2?
> > 
> > I think what you listed amounts to a significant rework and will make things
> > easier to review. Not 100% sure you need more feedback at this point.
> 
> With 
> (a) the motivation that Jason mentioned for config vqs, vectors etc,
> (b) the msix config/query of this proposal
> (c) your description to handle them in uniform way,
> (d) understanding the scale inefficiency, on-die resources, multiple outstanding cmds discussion in the thread,
> 
> I would like to receive feedback that we all agree to configure these values via AQ.
> Rest of the plumbing on AQ etc to address comments to complete in v2, once this looks ok.

Go ahead and wait if you like; that was just my advice, because
personally, if I see a mega-thread like this one on the list, I just wait
for the next version. Review time has to be viewed as more precious than
developer time, otherwise things do not scale.

Or to put it more succinctly: iterating quickly is a recipe for success.
-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:30                         ` Parav Pandit
@ 2022-01-18  7:40                           ` Michael S. Tsirkin
  2022-01-19  4:21                             ` Jason Wang
  2022-01-18 10:38                           ` Michael S. Tsirkin
  1 sibling, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  7:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 12:42 PM
> > 
> > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > 1. Use AQ for msix query and config
> > > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
> > > the queues 3. Update commit log to describe why config space is not
> > > chosen (scale, on-die registers, uniform way to handle all aq cmds) 4.
> > > Improve documentation around msix config to link to sriov section of
> > > virtio spec 5. Describe error that if VF is bound to the device, admin
> > > commands targeting VF can fail, describe this error code
> > >
> > > Did I miss anything?
> > 
> > Better document in spec text just what is the scope for AQ.
> >
> Yes, will improve this spec.
>  
> > 
> > > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> > be in this proposal, what pieces prevents it do as follow-on.
> > 
> > I think this is related to the subfunction usecase or other future usecase. In
> > case of PF/VF grouping is implicit through the SRIOV capability. It would be
> > nice to have things somewhat generic in most of the text though since we
> > already know this will be needed.
> > E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> > look at it, somehow AQ needs to be extendable to support that functionality
> > too.
> I looked briefly to it. AQ can be used for such purpose. Current proposal adds only msix config piece.
> But more commands can be added in future.
> 
> What I wanted to check with you and other is, do we want command opcode to be 7-bit enough? 
> #127 is lot of admin commands. 😊
> But given virtio spec diversity of transport and device types, I was thinking to keep it 15-bit for future proofing.
> What do you think?

I agree, we are not short on bits.

> An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> We see that creator of the subfunction is often not the only entity managing it.

I think whoever does it can go through the main function driver.

> They being same in new era finding less and less users.
> So this piece needs more discussion whenever we address that.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  6:17           ` Michael S. Tsirkin
@ 2022-01-18  7:57             ` Parav Pandit
  2022-01-18  8:05               ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  7:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 11:47 AM
> 
> On Tue, Jan 18, 2022 at 03:22:27AM +0000, Parav Pandit wrote:
> > > I don't recall discussion about UUID so I can't really say what I think about
> that.
> > > Do we need a UUID? I'm not sure I understand why.
> > > It can't hurt to abstract things a bit so it's not all tied to
> > > PFs/VFs since we know we'll want subfunctions down the road, too, if that
> is what you mean.
> > >
> > I still didn't find any reason in the discussion to find out why grouping device
> is needed.
> 
> VFs are already grouped with their PF. However we should spell this out as the
> motivation for the admin queue.
>
Ok, so for now we are not introducing any explicit grouping concept.
In v2, I will revise the description to read:

PCI VFs of a parent PCI PF device are grouped together. These devices can optionally be managed by their parent PCI PF.
 
> > Current AQ proposal implicitly indicates that VFs of a PF are managed by its
> parent PF.
> > And for some reason this work by one of the VF, this role assignment
> > can be certainly a new command on AQ as group command or some other
> > command.
> 
> > >
> > >
> > > > >
> > > > > Following this idea, all commands would then gain fields for
> > > > > addressing one device from another.
> > > > >
> > > > > Not everything maps well to a queue. E.g. it would be great to
> > > > > have list of available commands in memory.
> > > >
> > > > I'm not sure I agree. Why can't it map to a queue ?
> > >
> > > You can map it to a queue, yes. But something static and read only
> > > such as list of commands maps well to config space. And it's not
> > > controlling one device from another, so does not really seem to belong in
> the admin queue.
> > >
> > Aq serves the writing device config too in patch-5 in this patchset.
> 
> List of available admin commands does not need to be written.
>
It is not written via the AQ commands.
It is part of the feature bit VIRTIO_F_ADMIN_PCI_VIRT_MANAGER, which in patch-5 indicates whether a given functionality is supported or not.
And a structure like virtio_admin_pci_virt_mgmt_attr_identify_result can potentially grow, and storing those fields in on-chip resources is less efficient.
Hence, they are shared via the AQ.
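
As an unofficial sketch of how a driver might consume that mask-guarded result (the struct is from patch-5; the helper names are Linux-flavored assumptions):

	/* Sketch only: a field in the identify result is valid only when
	 * its mask bit is set, which is what lets the structure grow
	 * compatibly. Assumes Linux le*_to_cpu helpers. */
	static void parse_identify(
	        const struct virtio_admin_pci_virt_mgmt_attr_identify_result *r)
	{
	        if (le64_to_cpu(r->mask) & (1ULL << 0))   /* bit 0: pool size */
	                pr_info("free VF MSI-X pool: %u\n",
	                        le32_to_cpu(r->total_free_vfs_msix_count));
	        if (le64_to_cpu(r->mask) & (1ULL << 1))   /* bit 1: per-VF max */
	                pr_info("per-VF MSI-X max: %u\n",
	                        le16_to_cpu(r->per_vf_max_msix_count));
	}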

 
> > > >
> > > > > Figuring out max vectors also looks like a good example for
> > > > > memory and not through a command.
> > > >
> > > > Any explanation why is it looks good ? or better ?
> > >
> > > why is memory easier to operate than a VQ?
> > > It's much simpler and so less error prone.  you can have multiple
> > > actors read such a field at the same time without races, so e.g.
> > > there could be a sysfs attribute that reads from device on each
> > > access, and not special error handling is needed.
> > >
> > Writing fields is inherent part of the aq without getting blocked on previous
> writes.
> > I see you acked that AQ is fine in cover letter patch as below, so we are sync
> on the motivation now.
> > Yes, will update the commit log as you suggested.
> >
> >  " if the answer is "commands A,B,C do not fit in config space, we placed
> commands D,E in a VQ for consistency"
> > then that is an ok answer, but it's something to be mentioned in the commit
> log"


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:57             ` Parav Pandit
@ 2022-01-18  8:05               ` Michael S. Tsirkin
  2022-01-18  8:23                 ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18  8:05 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:57:36AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 11:47 AM
> > 
> > On Tue, Jan 18, 2022 at 03:22:27AM +0000, Parav Pandit wrote:
> > > > I don't recall discussion about UUID so I can't really say what I think about
> > that.
> > > > Do we need a UUID? I'm not sure I understand why.
> > > > It can't hurt to abstract things a bit so it's not all tied to
> > > > PFs/VFs since we know we'll want subfunctions down the road, too, if that
> > is what you mean.
> > > >
> > > I still didn't find any reason in the discussion to find out why grouping device
> > is needed.
> > 
> > VFs are already grouped with their PF. However we should spell this out as the
> > motivation for the admin queue.
> >
> Ok. so for now, we are not introducing any explicitly grouping concept.
> In v2, will revise to have description as,
> 
> PCI VFs of a parent PCI PF device are grouped together. These devices can be optionally managed by its parent PCI PF.
>  
> > > Current AQ proposal implicitly indicates that VFs of a PF are managed by its
> > parent PF.
> > > And for some reason this work by one of the VF, this role assignment
> > > can be certainly a new command on AQ as group command or some other
> > > command.
> > 
> > > >
> > > >
> > > > > >
> > > > > > Following this idea, all commands would then gain fields for
> > > > > > addressing one device from another.
> > > > > >
> > > > > > Not everything maps well to a queue. E.g. it would be great to
> > > > > > have list of available commands in memory.
> > > > >
> > > > > I'm not sure I agree. Why can't it map to a queue ?
> > > >
> > > > You can map it to a queue, yes. But something static and read only
> > > > such as list of commands maps well to config space. And it's not
> > > > controlling one device from another, so does not really seem to belong in
> > the admin queue.
> > > >
> > > Aq serves the writing device config too in patch-5 in this patchset.
> > 
> > List of available admin commands does not need to be written.
> >
> It is not written into the aq commands.
> It is part of the feature bit VIRTIO_F_ADMIN_PCI_VIRT_MANAGER indicating a given functionality supported or not in patch-5.


Btw I don't see what "VIRT_MANAGER" means here. "manager" is just a
generic thing that means nothing, and VIRT just repeats VIRTIO.


> And structure like, virtio_admin_pci_virt_mgmt_attr_identify_result can potentially grow and storing those fields on on-chip resource is less efficient.
> Hence, they are shared via AQ.

The issue is this: VIRTIO_F_ADMIN_PCI_VIRT_MANAGER seems to mean
there are PCI-related admin commands. OK, I guess. However, you then
say this same feature bit implies generic functionality, like the
list of supported commands. Confusing.


>  
> > > > >
> > > > > > Figuring out max vectors also looks like a good example for
> > > > > > memory and not through a command.
> > > > >
> > > > > Any explanation why is it looks good ? or better ?
> > > >
> > > > why is memory easier to operate than a VQ?
> > > > It's much simpler and so less error prone.  you can have multiple
> > > > actors read such a field at the same time without races, so e.g.
> > > > there could be a sysfs attribute that reads from device on each
> > > > access, and not special error handling is needed.
> > > >
> > > Writing fields is inherent part of the aq without getting blocked on previous
> > writes.
> > > I see you acked that AQ is fine in cover letter patch as below, so we are sync
> > on the motivation now.
> > > Yes, will update the commit log as you suggested.
> > >
> > >  " if the answer is "commands A,B,C do not fit in config space, we placed
> > commands D,E in a VQ for consistency"
> > > then that is an ok answer, but it's something to be mentioned in the commit
> > log"


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  8:05               ` Michael S. Tsirkin
@ 2022-01-18  8:23                 ` Parav Pandit
  2022-01-18 10:26                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18  8:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 1:36 PM

[..]
> > > > > You can map it to a queue, yes. But something static and read
> > > > > only such as list of commands maps well to config space. And
> > > > > it's not controlling one device from another, so does not really
> > > > > seem to belong in
> > > the admin queue.
> > > > >
> > > > Aq serves the writing device config too in patch-5 in this patchset.
> > >
> > > List of available admin commands does not need to be written.
> > >
> > It is not written into the aq commands.
> > It is part of the feature bit VIRTIO_F_ADMIN_PCI_VIRT_MANAGER indicating
> a given functionality supported or not in patch-5.
> 
> 
> Btw I don't see what does "VIRT_MANAGER" mean here. "manager" is just a
> generic thing that means nothing, and VIRT just repeats VIRTIO.
> 
VIRT doesn't repeat VIRTIO; VIRT indicates PCI virtual functions.
"Manager" is the generic entity doing the managing.
Below is the snippet from patch-5.

+  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates  
+ that the device can manage PCI related capabilities for its managed 
+ PCI VF  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
+  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and 
+ VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
+  admin commands. This feature can be supported only by PCI devices.
+

> 
> > And structure like, virtio_admin_pci_virt_mgmt_attr_identify_result can
> potentially grow and storing those fields on on-chip resource is less efficient.
> > Hence, they are shared via AQ.
> 
> The issue is this: VIRTIO_F_ADMIN_PCI_VIRT_MANAGER seems to mean there
> are pci related admin commands. OK I guess. 
Right.
> However you then say this same
> feature bit implies generic functionality like list of supported commands.
> Confusing.
>
I'm not sure where I mentioned generic functionality.
This feature bit implies PCI virtualization related functionality by means of the above listed commands.
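
To make the gating concrete, a hedged sketch (the feature bit numbers are from this series; the identify helper is hypothetical):

	/* Only issue the PCI virtualization admin commands when both the
	 * admin VQ (bit 41, patch-1) and the PCI virt manager feature
	 * (bit 44, patch-5) were offered and accepted. */
	if (virtio_has_feature(vdev, 41 /* VIRTIO_F_ADMIN_VQ */) &&
	    virtio_has_feature(vdev, 44 /* VIRTIO_F_ADMIN_PCI_VIRT_MANAGER */))
	        err = virtio_admin_identify(vdev, &result);  /* hypothetical */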


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  8:23                 ` Parav Pandit
@ 2022-01-18 10:26                   ` Michael S. Tsirkin
  2022-01-18 10:30                     ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 10:26 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha

On Tue, Jan 18, 2022 at 08:23:57AM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 1:36 PM
> 
> [..]
> > > > > > You can map it to a queue, yes. But something static and read
> > > > > > only such as list of commands maps well to config space. And
> > > > > > it's not controlling one device from another, so does not really
> > > > > > seem to belong in
> > > > the admin queue.
> > > > > >
> > > > > Aq serves the writing device config too in patch-5 in this patchset.
> > > >
> > > > List of available admin commands does not need to be written.
> > > >
> > > It is not written into the aq commands.
> > > It is part of the feature bit VIRTIO_F_ADMIN_PCI_VIRT_MANAGER indicating
> > a given functionality supported or not in patch-5.
> > 
> > 
> > Btw I don't see what does "VIRT_MANAGER" mean here. "manager" is just a
> > generic thing that means nothing, and VIRT just repeats VIRTIO.
> > 
> VIRT doesn't repeat VIRTIO.

It's literally a substring ;)

> VIRT indicates PCI virtual functions.

I'd use something like "SRIOV" then.

> Manager is generic thing to manage.

generic to the point of being meaningless.

> Below is the snippet from patch-5. 
> 
> +  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates  
> + that the device can manage PCI related capabilities for its managed 
> + PCI VF  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,

"manage" appears twice here. VFs are always managed, and "manage" here
again does not mean much. I'd say something like:
"allow control over VFs through the admin vq of the PF"

> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and 
> + VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
> +  admin commands. This feature can be supported only by PCI devices.
> +
> 
> > 
> > > And structure like, virtio_admin_pci_virt_mgmt_attr_identify_result can
> > potentially grow and storing those fields on on-chip resource is less efficient.
> > > Hence, they are shared via AQ.
> > 
> > The issue is this: VIRTIO_F_ADMIN_PCI_VIRT_MANAGER seems to mean there
> > are pci related admin commands. OK I guess. 
> Right.
> > However you then say this same
> > feature bit implies generic functionality like list of supported commands.
> > Confusing.
> >
> Not sure where I mentioned generic functionality.
> This feature bit implies pci virtualization related functionality by means of above listed commands.

PCI virtualization being SRIOV here, with the PF controlling VFs.

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18 10:26                   ` Michael S. Tsirkin
@ 2022-01-18 10:30                     ` Parav Pandit
  2022-01-18 10:41                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18 10:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 3:57 PM
> 
> On Tue, Jan 18, 2022 at 08:23:57AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 1:36 PM
> >
> > [..]
> > > > > > > You can map it to a queue, yes. But something static and
> > > > > > > read only such as list of commands maps well to config
> > > > > > > space. And it's not controlling one device from another, so
> > > > > > > does not really seem to belong in
> > > > > the admin queue.
> > > > > > >
> > > > > > Aq serves the writing device config too in patch-5 in this patchset.
> > > > >
> > > > > List of available admin commands does not need to be written.
> > > > >
> > > > It is not written into the aq commands.
> > > > It is part of the feature bit VIRTIO_F_ADMIN_PCI_VIRT_MANAGER
> > > > indicating
> > > a given functionality supported or not in patch-5.
> > >
> > >
> > > Btw I don't see what does "VIRT_MANAGER" mean here. "manager" is
> > > just a generic thing that means nothing, and VIRT just repeats VIRTIO.
> > >
> > VIRT doesn't repeat VIRTIO.
> 
> It's literally a substring ;)
> 
> > VIRT indicates PCI virtual functions.
> 
> I'd use something like "SRIOV" then.
Yeah, this is a good short string.
How about VIRTIO_F_ADMIN_PCI_SRIOV?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs
  2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
  2022-01-13 18:20   ` Michael S. Tsirkin
@ 2022-01-18 10:38   ` Michael S. Tsirkin
  1 sibling, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 10:38 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, parav, shahafs,
	oren, stefanha

On Thu, Jan 13, 2022 at 04:51:03PM +0200, Max Gurtovoy wrote:
> A typical cloud provider SR-IOV use case is to create many VFs for
> use by guest VMs. The VFs may not be assigned to a VM until a user
> requests a VM of a certain size, e.g., number of CPUs.
> A VF may need
> MSI-X vectors proportional to the number of CPUs in the VM,

The problem is, it does not work like that.  A VF needs vectors
proportional to the # of VQs, and yes, the # of VQs is proportional to the # of CPUs.

So I am not sure what control over the # of vectors gets us
without control over the # of VQs. Something to explain better in
the cover letter.
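
For rough illustration only (my numbers and layout, not from the patch or spec): a virtio-net VF that scales queue pairs with guest CPUs might size its vector demand as below, assuming the common one-vector-per-VQ-plus-config layout.

	/* Hypothetical helper: vector demand follows VQ count, which
	 * follows CPU count -- hence the coupling noted above. */
	static unsigned int vf_msix_wanted(unsigned int guest_cpus)
	{
	        unsigned int vq_pairs = guest_cpus;  /* one RX/TX pair per CPU */

	        return 2 * vq_pairs   /* RX + TX data VQs */
	               + 1            /* control VQ */
	               + 1;           /* config change interrupt */
	}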


> but there is
> no standard way today in the spec to change the number of MSI-X vectors
> supported by a VF, although there are some operating systems that
> support this.
> 
> Introduce new feature bits for generic PCI virtualization management
> mechanism and a specific mechanism to manage the MSI-X vector assignment
> process of virtual/managed functions by its parent virtio device via its
> admin virtqueue. For now, virtio supports only PCI virtual function
> virtualization, thus the virt manager device will be the PF and the
> managed device will be the VF.
> 
> Reviewed-by: Parav Pandit <parav@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> ---
>  admin-virtq.tex | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  content.tex     | 29 ++++++++++++++-
>  2 files changed, 124 insertions(+), 3 deletions(-)
> 
> diff --git a/admin-virtq.tex b/admin-virtq.tex
> index ad20f89..4ee8a32 100644
> --- a/admin-virtq.tex
> +++ b/admin-virtq.tex
> @@ -41,9 +41,105 @@ \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin
>  \hline
>  Opcode (bits) & Opcode (hex) & Command & M/O \\
>  \hline \hline
> - -  & 00h - 7Fh   & Generic admin cmds    & -  \\
> + 00000000b  & 00h   & VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY    & O  \\
> +\hline
> + 00000001b  & 01h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET    & O  \\
> +\hline
> + 00000010b  & 02h   & VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET    & O  \\
> +\hline
> + -  & 03h - 7Fh   & Generic admin cmds    & -  \\
>  \hline
>   -  & 80h - FFh   & Reserved    & - \\
>  \hline
>  \end{tabular}
>  
> +\subsection{VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT MGMT ATTR IDENTIFY command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY command has no command specific data set by the driver.
> +This command upon success, returns a data buffer that describes information about PCI virtualization
> +management attributes. This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_mgmt_attr_identify_result {
> +        /* For compatibility - indicates which of the below fields are valid (1 means valid):
> +         * Bit 0x0 - total_free_vfs_msix_count
> +         * Bit 0x1 - per_vf_max_msix_count
> +         * Bits 0x2 - 0x3F - reserved for future fields
> +         */
> +        le64 mask;
> +        /* Number of free msix in the global msix pool for VFs */
> +        le32 total_free_vfs_msix_count;
> +        /* Max number of msix vectors that can be assigned for a single VF */
> +        le16 per_vf_max_msix_count;
> +};
> +\end{lstlisting}
> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY SET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY SET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET command is used to modify the values of VF properties.
> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_set_data {
> +        /* The virtual function number */
> +        le16 vf_number;
> +        /* For compatibility - indicates which of the below properties should be
> +         * modified (1 means that field should be modified):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}
> +\end{note}
> +
> +This command has no command specific result set by the device. Upon success, the device guarantees
> +that all the requested properties were modified to the given values. Otherwise, error will be returned.
> +
> +\begin{note}
> +{Before setting msix_count property the virtual/managed device (VF) shall be un-initialized and may not be used by the driver.}
> +\end{note}
> +
> +\subsection{VIRTIO ADMIN PCI VIRT PROPERTY GET command}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues / VIRTIO ADMIN PCI VIRT PROPERTY GET command}
> +
> +The VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET command is used to obtain the values of VF properties.
> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_get_data {
> +        /* The virtual function number */
> +        le16 vf_number;
> +        /* For compatibility - indicates which of the below properties should be
> +         * queried (1 means that field should be queried):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{vf_number can't be greater than NumVFs value as defined in the PCI specification
> +or smaller than 1. An error status will be returned otherwise.}
> +\end{note}
> +
> +This command, upon success, returns a data buffer that describes the properties that were requested
> +and their values for the subject virtio VF device according to the given vf_number.
> +This information is of form:
> +\begin{lstlisting}
> +struct virtio_admin_pci_virt_property_get_result {
> +        /* For compatibility - indicates which of the below fields were returned
> +         * (1 means that field was returned):
> +         * Bit 0x0 - msix_count
> +         * Bits 0x1 - 0x3F - reserved for future fields
> +         */
> +        le64 property_mask;
> +        /* The amount of MSI-X vectors */
> +        le16 msix_count;
> +};
> +\end{lstlisting}
> diff --git a/content.tex b/content.tex
> index e9c2383..64678f0 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>  \begin{description}
>  \item[0 to 23] Feature bits for the specific device type
>  
> -\item[24 to 43] Feature bits reserved for extensions to the queue and
> +\item[24 to 45] Feature bits reserved for extensions to the queue and
>    feature negotiation mechanisms
>  
> -\item[44 and above] Feature bits reserved for future extensions.
> +\item[46 and above] Feature bits reserved for future extensions.
>  \end{description}
>  
>  \begin{note}
> @@ -6878,6 +6878,17 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    that all buffers are used by the admin virtqueue of the device in
>    the same order in which they have been made available.
>  
> +  \item[VIRTIO_F_ADMIN_PCI_VIRT_MANAGER (44)] This feature indicates
> +  that the device can manage PCI related capabilities for its managed PCI VF
> +  devices and supports VIRTIO_ADMIN_PCI_VIRT_MGMT_ATTR_IDENTIFY,
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET
> +  admin commands. This feature can be supported only by PCI devices.
> +
> +  \item[VIRTIO_F_ADMIN_MSIX_MGMT (45)] This feature indicates
> +  that the device supports management of the MSI-X vectors for its
> +  managed PCI VF devices using VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET and
> +  VIRTIO_ADMIN_PCI_VIRT_PROPERTY_GET admin commands.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> @@ -6920,6 +6931,14 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A driver MAY accept VIRTIO_F_ADMIN_VQ_IN_ORDER only if it accepts
>  VIRTIO_F_ADMIN_VQ.
>  
> +A driver MAY accept VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it accepts
> +VIRTIO_F_ADMIN_VQ.
> +
> +A driver MAY accept VIRTIO_F_ADMIN_MSIX_MGMT only if it accepts
> +VIRTIO_F_ADMIN_VQ and VIRTIO_F_ADMIN_PCI_VIRT_MANAGER. Currently only
> +MSI-X management of PCI virtual functions is supported, so the driver
> +MUST NOT negotiate VIRTIO_F_ADMIN_MSIX_MGMT if VIRTIO_F_SR_IOV is not negotiated.
> +
>  \devicenormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>  
>  A device MUST offer VIRTIO_F_VERSION_1.  A device MAY fail to operate further
> @@ -6960,6 +6979,12 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  A device MAY offer VIRTIO_F_ADMIN_VQ_IN_ORDER only if it offers
>  VIRTIO_F_ADMIN_VQ.
>  
> +A PCI device MAY offer VIRTIO_F_ADMIN_PCI_VIRT_MANAGER only if it
> +offers VIRTIO_F_ADMIN_VQ.
> +
> +A PCI device MAY offer VIRTIO_F_ADMIN_MSIX_MGMT only if it
> +offers VIRTIO_F_ADMIN_VQ, VIRTIO_F_ADMIN_PCI_VIRT_MANAGER and VIRTIO_F_SR_IOV.
> +
>  \section{Legacy Interface: Reserved Feature Bits}\label{sec:Reserved Feature Bits / Legacy Interface: Reserved Feature Bits}
>  
>  Transitional devices MAY offer the following:
> -- 
> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:30                         ` Parav Pandit
  2022-01-18  7:40                           ` Michael S. Tsirkin
@ 2022-01-18 10:38                           ` Michael S. Tsirkin
  2022-01-18 10:50                             ` Parav Pandit
  1 sibling, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 10:38 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> We see that creator of the subfunction is often not the only entity managing it.
> They being same in new era finding less and less users.
> So this piece needs more discussion whenever we address that.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html

This reminds me. How do AQ commands interact with VF lifecycle?
E.g. can one change number of vectors for an active VF?
Need to specify this.

Also, I started worrying about compatibility here.
Let's say the msix capability in a VF specifies 16 vectors.
Can PF specify 32? If yes how will driver program them?
Can PF specify 8? If yes how do we make sure driver does not
attempt to use 16? And what happens if it does?
Again, something to address.


-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18 10:30                     ` Parav Pandit
@ 2022-01-18 10:41                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 10:41 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha

On Tue, Jan 18, 2022 at 10:30:38AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 3:57 PM
> > 
> > On Tue, Jan 18, 2022 at 08:23:57AM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 1:36 PM
> > >
> > > [..]
> > > > > > > > You can map it to a queue, yes. But something static and
> > > > > > > > read only such as list of commands maps well to config
> > > > > > > > space. And it's not controlling one device from another, so
> > > > > > > > does not really seem to belong in
> > > > > > the admin queue.
> > > > > > > >
> > > > > > > Aq serves the writing device config too in patch-5 in this patchset.
> > > > > >
> > > > > > List of available admin commands does not need to be written.
> > > > > >
> > > > > It is not written into the aq commands.
> > > > > It is part of the feature bit VIRTIO_F_ADMIN_PCI_VIRT_MANAGER
> > > > > indicating
> > > > a given functionality supported or not in patch-5.
> > > >
> > > >
> > > > Btw I don't see what does "VIRT_MANAGER" mean here. "manager" is
> > > > just a generic thing that means nothing, and VIRT just repeats VIRTIO.
> > > >
> > > VIRT doesn't repeat VIRTIO.
> > 
> > It's literally a substring ;)
> > 
> > > VIRT indicates PCI virtual functions.
> > 
> > I'd use something like "SRIOV" then.
> Yeah. This is good short string.
> How about VIRTIO_F_ADMIN_PCI_SRIOV?

OK

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 10:38                           ` Michael S. Tsirkin
@ 2022-01-18 10:50                             ` Parav Pandit
  2022-01-18 15:09                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18 10:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 4:09 PM
> 
> On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > An unrelated command to AQ in Jason's proposal [1] is about " The
> management driver MUST create a managed device by allocating".
> > We see that creator of the subfunction is often not the only entity managing
> it.
> > They being same in new era finding less and less users.
> > So this piece needs more discussion whenever we address that.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.h
> > tml
> 
> This reminds me. How do AQ commands interact with VF lifecycle?
VF device usage is controlled by the same system that is configuring the VF via its parent PF device.
So the VF device shouldn't be in use. Any configuration change while the VF device is in use will result in the AQ command failing.

> E.g. can one change number of vectors for an active VF?
> Need to specify this.
> 
> Also, I started worrying about compatibility here.
> Let's say the msix capability in a VF specifies 16 vectors.
> Can PF specify 32? If yes how will driver program them?
Yes, the PF can change it to 32. When the VF driver queries the PCI capability, it will reflect 32 instead of 16.
> Can PF specify 8? If yes how do we make sure driver does not attempt to use
> 16? And what happens if it does?
The PF programs the VF msix capability in the device, so the virtio pci driver operating the PCI VF device cannot access vectors beyond the max value programmed by the PF driver.

> Again, something to address.
Yes, this has to be described in the spec text. I will state it more clearly in v2.
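
For reference, a hedged sketch of the PF-side flow being described (the struct and command name are from patch-5; the submit helper and its signature are hypothetical):

	/* Provision a VF's MSI-X count from the PF while the VF is not
	 * bound to any driver; per the above, the command fails if the
	 * VF is in use. */
	u16 vf = 1;  /* target VF; must be between 1 and NumVFs */
	struct virtio_admin_pci_virt_property_set_data set = {
	        .vf_number     = cpu_to_le16(vf),
	        .property_mask = cpu_to_le64(1ULL << 0), /* bit 0: msix_count */
	        .msix_count    = cpu_to_le16(32),
	};

	err = virtio_admin_submit(pf_vdev, VIRTIO_ADMIN_PCI_VIRT_PROPERTY_SET,
	                          &set, sizeof(set));    /* hypothetical */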

> 
> 
> --
> MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 10:50                             ` Parav Pandit
@ 2022-01-18 15:09                               ` Michael S. Tsirkin
  2022-01-18 17:17                                 ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-18 15:09 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 10:50:52AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 4:09 PM
> > 
> > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > An unrelated command to AQ in Jason's proposal [1] is about " The
> > management driver MUST create a managed device by allocating".
> > > We see that creator of the subfunction is often not the only entity managing
> > it.
> > > They being same in new era finding less and less users.
> > > So this piece needs more discussion whenever we address that.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.h
> > > tml
> > 
> > This reminds me. How do AQ commands interact with VF lifecycle?
> VF device usage is controlled by the same system which is configuring the VF via its parent PF device.
> So VF device shouldn't be in use. Any configuration change while VF device is in use will result in failing the AQ command.
> 
> > E.g. can one change number of vectors for an active VF?
> > Need to specify this.
> > 
> > Also, I started worrying about compatibility here.
> > Let's say the msix capability in a VF specifies 16 vectors.
> > Can PF specify 32? If yes how will driver program them?
> Yes, PF can change to 32. When VF driver queries the PCI capability, it will reflect 32 instead of 16.
> > Can PF specify 8? If yes how do we make sure driver does not attempt to use
> > 16? And what happens if it does?
> PF is programming the VF msix capability in the device. So virtio pci driver operating the PCI VF device cannot access vectors beyond the max value programmed by the PF driver.

Um. Interesting. This means that the msix capability of the VF changes?
Is that in fact spec compliant? Could some OSes cache the value of the
capability even if the device is not in active use? E.g. I can see how
this might happen in order to map the MSIX tables even before loading
the driver.

The spec says:
	Depending upon system software policy, system software, device driver software, or each at
	different times or environments may configure a function’s MSI-X capability and table
	structures with suitable vectors.

So MSI-X configuration might not be up to the driver.

We actually ask the driver to read back any vector assigned to a VQ,
so it's possible to fail vector assignment. Maybe that's better.
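
The read-back in question is the existing queue_msix_vector handshake; roughly, as a Linux-flavored sketch:

	/* Existing virtio-pci pattern: vector assignment can fail, and
	 * the driver detects that by reading the value back. */
	vp_iowrite16(vector, &cfg->queue_msix_vector);
	if (vp_ioread16(&cfg->queue_msix_vector) == VIRTIO_MSI_NO_VECTOR)
	        return -EBUSY;  /* device rejected the vector */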

> > Again, something to address.
> Yes, this has to be described in the spec text. Will add more clearly in v2.
> 
> > 
> > 
> > --
> > MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 15:09                               ` Michael S. Tsirkin
@ 2022-01-18 17:17                                 ` Parav Pandit
  2022-01-19  7:20                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-18 17:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 18, 2022 8:39 PM
> 
> On Tue, Jan 18, 2022 at 10:50:52AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 18, 2022 4:09 PM
> > >
> > > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > > An unrelated command to AQ in Jason's proposal [1] is about " The
> > > management driver MUST create a managed device by allocating".
> > > > We see that creator of the subfunction is often not the only
> > > > entity managing
> > > it.
> > > > They being same in new era finding less and less users.
> > > > So this piece needs more discussion whenever we address that.
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio-comment/202108/msg001
> > > > 36.h
> > > > tml
> > >
> > > This reminds me. How do AQ commands interact with VF lifecycle?
> > VF device usage is controlled by the same system which is configuring the VF
> via its parent PF device.
> > So VF device shouldn't be in use. Any configuration change while VF device is
> in use will result in failing the AQ command.
> >
> > > E.g. can one change number of vectors for an active VF?
> > > Need to specify this.
> > >
> > > Also, I started worrying about compatibility here.
> > > Let's say the msix capability in a VF specifies 16 vectors.
> > > Can PF specify 32? If yes how will driver program them?
> > Yes, PF can change to 32. When VF driver queries the PCI capability, it will
> reflect 32 instead of 16.
> > > Can PF specify 8? If yes how do we make sure driver does not attempt
> > > to use 16? And what happens if it does?
> > PF is programming the VF msix capability in the device. So virtio pci driver
> operating the PCI VF device cannot access vectors beyond the max value
> programmed by the PF driver.
> 
> Um. Interesting. This means that the msix capability of the VF changes?
Yes.
> Is that in fact spec compliant? Could some OSes cache the value of the
> capability even if the device is not in active use? E.g. I can see how this might
> happen in order to map the MSIX tables even before loading the driver.
> 
The PCI subsystem can cache the value before the device driver loads.
Generally a device supports intx/msix or intx/msi, so the PCI subsystem is not aware of what will be used by its upper layer device drivers.
So it usually defers such initialization to a later stage, until it is actually used.

Whichever OS driver implements msix configuration will have to either not cache it or flush and rebuild the cache.

> The spec says:
> 	Depending upon system software policy, system software, device driver
> software, or each at
> 	different times or environments may configure a function’s MSI-X
> capability and table
> 	structures with suitable vectors.
> 
> So MSIX canfiguation might not be up to the driver.
> 
> We actually ask driver to read back any vector assigned to a VQ so it's possible
> to fail vector assignment. Maybe that's better.
> 
The virtio driver should not incur any additional complexity in re-reading vectors etc.
All the msix config should happen well before a driver gets loaded for the VF.
It is up to the PCI layer of the HV to provide the virtio device driver a stable device whose msix table is not changing while the virtio device driver is operating on it.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-17 21:30       ` Michael S. Tsirkin
  2022-01-18  3:22         ` Parav Pandit
@ 2022-01-19  3:04         ` Jason Wang
  2022-01-19  8:11           ` Michael S. Tsirkin
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-19  3:04 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: virtio-comment, cohuck, virtio-dev, parav, shahafs, oren, stefanha


On 2022/1/18 5:30 AM, Michael S. Tsirkin wrote:
> On Mon, Jan 17, 2022 at 11:56:09AM +0200, Max Gurtovoy wrote:
>> On 1/13/2022 7:53 PM, Michael S. Tsirkin wrote:
>>> On Thu, Jan 13, 2022 at 04:50:59PM +0200, Max Gurtovoy wrote:
>>>> In one of the many use cases a user wants to manipulate features and
>>>> configuration of the virtio devices regardless of the device type
>>>> (net/block/console). Some of this configuration is generic enough. i.e
>>>> Number of MSI-X vectors of a virtio PCI VF device. There is a need to do
>>>> such features query and manipulation by its parent PCI PF.
>>>>
>>>> Currently virtio specification defines control virtqueue to manipulate
>>>> features and configuration of the device it operates on. However,
>>>> control virtqueue commands are device type specific, which makes it very
>>>> difficult to extend for device agnostic commands. Control virtqueue is
>>>> also limited to follow in order completion for the device which
>>>> negotiates VIRTIO_F_IN_ORDER feature. This feature limits the use of
>>>> control virtqueue for feature manipulation in out of order manner for
>>>> unrelated commands.
>>>>
>>>> To support these requirements which overcome above two limitations in
>>>> elegant way, this patch introduces a new admin virtqueue. Admin
>>>> virtqueue will use the same command format for all types of virtio
>>>> devices.
>>>>
>>>> Subsequent patches make use of this admin virtqueue.
>>>>
>>>> Reviewed-by: Parav Pandit <parav@nvidia.com>
>>>> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
>>>> ---
>>>>    admin-virtq.tex | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    content.tex     |  9 +++++++--
>>>>    2 files changed, 56 insertions(+), 2 deletions(-)
>>>>    create mode 100644 admin-virtq.tex
>>>>
>>>> diff --git a/admin-virtq.tex b/admin-virtq.tex
>>>> new file mode 100644
>>>> index 0000000..ad20f89
>>>> --- /dev/null
>>>> +++ b/admin-virtq.tex
>>>> @@ -0,0 +1,49 @@
>>>> +\section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
>>>> +
>>>> +Admin virtqueue is used to send administrative commands to manipulate
>>>> +various features of the device which would not easily map into the
>>>> +configuration space.
>>> IMHO this is too vague to be useful. E.g. I don't really see
>>> why would not commands specified in the next patch map to config space.
>> Well I took this sentence from the current spec :)
> Well in current spec it applies to things like MAC address filtering,
> which does not easily map into config space because number of MACs
> varies.
>
>
>>>
>>> We had an off-list meeting where I proposed addressing one device
>>> from another or grouping multiple devices as a more specific
>>> scope. That would be one way to address this.
>> Are you suggestion a creation of a virtio subsystem or a virtio group
>> definition ?
>>
>> Devices will be part of this subsystem: one primary/manager device and many
>> secondary/managed devices ?
>>
>> Each subsystem will have a unique UUID and each device will have a unique
>> vdev_id within this subsystem.
>>
>> If this is the direction, I can prepare something..
> I was merely saying that what is special about admin queue is that it
> allows controlling one device from another within some group.
> Or maybe that it allows grouping multiple devices.
> *Not* that these are things that do not map to config space.
>
> Let me give you another example, imagine that you want to handle
> pagefaults from device.  Clearly a generic thing that does not map to
> config space.  It could be a good candidate for the admin queue, however
> it would require that lots of buffers are pre-added to the queue. So it
> looks like it will need another distinct fault queue.


That seems to duplicate the PRS queue implemented in the
AMD/Intel IOMMUs, which I'm not sure is worth it.


>   Further it is
> possible that you want to handle faults within guest, by the driver. In
> that case you do not want it in the admin queue since that is controlled
> by hypervisor, you want it in a separate queue controlled by driver.


Exactly, another call for using the PRS queue instead. But generally
speaking, the admin virtqueue limits or complicates the functions that can
be exported to the guest. That's why I suggest decoupling all the possible
features from the admin virtqueue, and making them available via both the
admin virtqueue and a transport specific method (e.g. a capability).

Thanks


>
>
> I don't recall discussion about UUID so I can't really say what
> I think about that. Do we need a UUID? I'm not sure I understand why.
> It can't hurt to abstract things a bit so it's not all tied to
> PFs/VFs since we know we'll want subfunctions down the road, too,
> if that is what you mean.
>
>
>
>>> Following this idea, all commands would then gain fields for addressing
>>> one device from another.
>>>
>>> Not everything maps well to a queue. E.g. it would be great to have
>>> list of available commands in memory.
>> I'm not sure I agree. Why can't it map to a queue ?
> You can map it to a queue, yes. But something static
> and read only such as list of commands maps well to
> config space. And it's not controlling one device from
> another, so does not really seem to belong in the admin queue.
>
>>> Figuring out max vectors also looks like a good
>>> example for memory and not through a command.
>> Any explanation why is it looks good ? or better ?
> why is memory easier to operate than a VQ?
> It's much simpler and so less error prone.  you can have multiple actors
> read such a field at the same time without races, so e.g.  there could
> be a sysfs attribute that reads from device on each access, and not
> special error handling is needed.
>
>>> VQ # of the admin VQ could also be made more discoverable.
>>> How about an SRIOV capability describing this stuff then?
>>>
>>>
>>>
>>>
>>>> +Use of Admin virtqueue is negotiated by the VIRTIO_F_ADMIN_VQ
>>>> +feature bit.
>>>> +
>>>> +Admin virtqueue index may vary among different device types.
>>>> +
>>>> +The Admin command set defines the commands that may be issued only to the admin
>>>> +virtqueue. Each virtio device that advertises the VIRTIO_F_ADMIN_VQ feature, MUST
>>>> +support all the mandatory admin commands. A device MAY support also one or more
>>>> +optional admin commands. All commands are of the following form:
>>>> +
>>>> +\begin{lstlisting}
>>>> +struct virtio_admin_cmd {
>>>> +        /* Device-readable part */
>>>> +        u8 command;
>>>> +        u8 command-specific-data[];
>>>> +
>>>> +        /* Device-writable part */
>>>> +        u8 status;
>>>> +        u8 command-specific-result[];
>>>> +};
>>>> +
>>>> +/* status values */
>>>> +#define VIRTIO_ADMIN_STATUS_OK 0
>>>> +#define VIRTIO_ADMIN_STATUS_ERR 1
>>>> +#define VIRTIO_ADMIN_STATUS_COMMAND_UNSUPPORTED 2
>>>> +\end{lstlisting}
>>>> +
>>>> +The \field{command} and \field{command-specific-data} are
>>>> +set by the driver, and the device sets the \field{status} and the
>>>> +\field{command-specific-result}, if needed.
>>>> +
>>>> +The following table describes the Admin command set:
>>>> +
>>>> +\begin{tabular}{|l|l|l|l|}
>>>> +\hline
>>>> +Opcode (bits) & Opcode (hex) & Command & M/O \\
>>>> +\hline \hline
>>>> + -  & 00h - 7Fh   & Generic admin cmds    & -  \\
>>>> +\hline
>>>> + -  & 80h - FFh   & Reserved    & - \\
>>>> +\hline
>>>> +\end{tabular}
>>>> +
>>> Add conformance clauses pls. If this section is too generic to have any then
>>> this functionality is too generic to be useful ;)
>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 32de668..c524fab 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -99,10 +99,10 @@ \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature B
>>>>    \begin{description}
>>>>    \item[0 to 23] Feature bits for the specific device type
>>>> -\item[24 to 40] Feature bits reserved for extensions to the queue and
>>>> +\item[24 to 41] Feature bits reserved for extensions to the queue and
>>>>      feature negotiation mechanisms
>>>> -\item[41 and above] Feature bits reserved for future extensions.
>>>> +\item[42 and above] Feature bits reserved for future extensions.
>>>>    \end{description}
>>>>    \begin{note}
>>>> @@ -449,6 +449,8 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>>>    types. It is RECOMMENDED that devices generate version 4
>>>>    UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>>> +\input{admin-virtq.tex}
>>>> +
>>>>    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>>    We start with an overview of device initialization, then expand on the
>>>> @@ -6847,6 +6849,9 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>>      that the driver can reset a queue individually.
>>>>      See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset}.
>>>> +  \item[VIRTIO_F_ADMIN_VQ (41)] This feature indicates that
>>>> +  the device supports administration virtqueue negotiation.
>>>> +
>>>>    \end{description}
>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>> -- 
>>>> 2.21.0


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:07                     ` Parav Pandit
  2022-01-18  7:12                       ` Michael S. Tsirkin
  2022-01-18  7:13                       ` Michael S. Tsirkin
@ 2022-01-19  4:03                       ` Jason Wang
  2022-01-19  4:48                         ` Parav Pandit
  2 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-19  4:03 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha


On 2022/1/18 3:07 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Tuesday, January 18, 2022 12:25 PM
>>
>> On Tue, Jan 18, 2022 at 06:32:50AM +0000, Parav Pandit wrote:
>>>
>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>> Sent: Tuesday, January 18, 2022 11:54 AM
>>>>
>>>> On Tue, Jan 18, 2022 at 04:44:36AM +0000, Parav Pandit wrote:
>>>>>
>>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>>> Sent: Tuesday, January 18, 2022 3:44 AM
>>>>>>>> It's a control queue. Why do we worry?
>>>>>>> It is used to control/manage the resource of a VF which is
>>>>>>> deployed usually
>>>>>> to a VM.
>>>>>>> So higher the latency, higher the time it takes to deploy start the VM.
>>>>>> What are the savings here, in real terms? Boot times for
>>>>>> smallest VMs are in 10s of milliseconds. Is reordering of a
>>>>>> queue somehow going to save more than microseconds?
>>>>>>
>>>>> It is probably better not to pick on a specific vendor implementation.
>>>>> But for real numbers, I see that an implementation takes 54usec to
>>>>> 500 usec
>>>> range for simple configuration.
>>>>> It is better to not small VM 4 vector configuration to take longer
>>>>> because
>>>> there was previous AQ command for 64 vectors.
>>>>
>>>> So virtio discovery on boot includes multiple of vmexits, each costs
>>>> ~1000 cycles.  And people do not seem to worry about it.
>>> It is not the vector configuration by guest VM.
>>> It is the AQ command that provisions number of msix vectors for the VF that
>> takes tens to hundreds of usecs.
>>> These are the command in patch-5 in this proposal.
>> Hundreds of usecs is negligeable compared to VM boot time.
>> Sorry I don't really see why we worry about indirect in that case.
>>
>>
> Ok. we will do incremental proposal after this for wider use case.
>
>>>> You want a compelling argument for working on performance of config.
>>>> I frankly think it's not really useful but I especially think you
>>>> should cut this out of the current proposal, it's too big as it is.
>>>>
>>> Ok. We can do follow on proposal after AQ.
>>> We already see need of out of order AQ in internal performance tests we are
>> running.
>>
>> OK so first of all you can avoid declaring IN_ORDER.
> This will force non IN_ORDER on other txq and rxq too that causes higher latency.
> But fine, initial implementation can start without it.
>
>> If you see that IN_ORDER
>> improves performance for you so you need it, then look at PARTIAL_ORDER
>> pls.
> Ok. will consider PARTIAL_ORDER more in future proposal.
>
>> And if that does not address your needs then let's discuss, I'd rather have a
>> generic solution since the requirement does not seem to be specific to AQ.
>>
>>> But fine, we can differ.
> So far I gather below summary that needs to be addressed in v2.
>
> 1. Use AQ for msix query and config


If it means IMS, there's already a proposal [1] that introduces MSI
commands via the admin virtqueue. And we had a similar requirement for
virtio-MMIO [2] and managed devices or SFs [3], so I would rather
introduce IMS (it needs a better name though) as a basic facility instead
of tying it to any specific transport.


> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of the queues
> 3. Update commit log to describe why config space is not chosen (scale, on-die registers, uniform way to handle all aq cmds)


I fail to understand the scale/registers issues. With one of my
previous proposals (the device selector), technically we don't even need
any config space or BAR for a VF or SF, by multiplexing the registers of the PF.

I do see one advantage though: the admin virtqueue is transport
independent (or it could be used as a transport).


> 4. Improve documentation around msix config to link to sriov section of virtio spec
> 5. Describe error that if VF is bound to the device, admin commands targeting VF can fail, describe this error code
>
> Did I miss anything?
>
> Yet to receive your feedback on group, if/why is it needed and, why/if it must be in this proposal, what pieces prevents it do as follow-on.
>
> Cornelia, Jason,
> Can you please review current proposal as well before we revise v2?


If I understand correctly, most of the features (except for the admin
virtqueue in_order stuff) are not specific to the admin virtqueue. As
discussed in the previous versions, I still think it's better:

1) adding sections in the basic device facility or data structure for
provisioning and MSI
2) introducing the admin virtqueue on top as a device interface for those
features

This leaves the chance for future extensions to allow those features to 
be used by a transport specific interface, which will benefit:

1) vendors that don't want a transport specific method (MMIO or PCIe 
capability) [4]
2) features that can be used by a guest or nesting environment (L1)

Thanks

[1] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html

[2] https://lkml.org/lkml/2020/1/21/31

[3] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00134.html

[4] 
https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


>



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-18  5:25             ` Michael S. Tsirkin
@ 2022-01-19  4:16               ` Jason Wang
  2022-01-19  9:26                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-19  4:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 1:25 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 18, 2022 at 10:18:29AM +0800, Jason Wang wrote:
> >
> > On 2022/1/18 6:22 AM, Michael S. Tsirkin wrote:
> > > On Mon, Jan 17, 2022 at 02:07:51PM +0000, Parav Pandit wrote:
> > > > > From: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > Sent: Sunday, January 16, 2022 3:18 PM
> > > > >
> > > > >
> > > > > On 1/13/2022 7:56 PM, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jan 13, 2022 at 04:51:02PM +0200, Max Gurtovoy wrote:
> > > > > > > Set the relevant index in case of VIRTIO_F_ADMIN_VQ negotiation.
> > > > > > >
> > > > > > > Reviewed-by: Parav Pandit <parav@nvidia.com>
> > > > > > > Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
> > > > > > So admin VQ # is only known when all features are negotiated.
> > > > > No. The driver should see if VIRTIO_NET_F_CTRL_VQ/VIRTIO_F_ADMIN_VQ
> > > > > are set by the device.
> > > > >
> > > > > Negotiation is not a must.
> > > > >
> > > > > Let's say CTRL_VQ is supported by the device and driver A would like to use it
> > > > > and driver B wouldn't like to use it - in both cases the admin VQ # would be 2N
> > > > > + 1.
> > > > >
> > > > > > Which is quite annoying if the hypervisor wants to partition things, e.g.
> > > > > > handling the admin q in-process and handling the vqs by an external process or
> > > > > > by hardware.
> > > > > >
> > > > > > I think we can allow devices to set the VQ# for the admin queue
> > > > > > instead. Would that work?
> > > > Number of MSI-X vectors configuration and number of VQs configuration are two different things,
> > >
> > > I was talking about the number of the VQ used for admin commands. Not
> > > about the number of VQs.
> > >
> > > > though they have a strong correlation.
> > > > Configuring the number of queues seems a very device specific configuration (even though num_queues is a generic field in struct virtio_pci_common_cfg).
> > > >
> > > > So num-VQ configuration is a different command, likely combined with other device specific config such as mac or rss or others.
> > > I was not talking about that at all, but since you mention it,
> > > to me it looks like something that many device types can support.
> > > It's not necessarily rss related, MQ config would benefit too,
> > > so I am not sure why not have a command for controlling the number
> > > of queues. It looks like it could be quite generic.
> > >
> > > Since current guests only support two modes: a vector
> > > per VQ and a shared vector for all VQs, it follows that
> > > it is important when configuring vectors per VF to also configure
> > > VQs per VF. This makes me wonder whether the ability to configure
> > > vectors per VF in isolation, without the ability to configure or
> > > at least query VQs per VF, even has value.
> >
> >
> > So I had some thoughts in the past; it looks to me we need a generic
> > provisioning interface that contains all the necessary attributes:
> >
> > 1) #queues
> > 2) device_features
> > 3) #msi_vectors
> > 4) device specific configurations
> >
> > It could be either an admin virtqueue interface[1] or a dedicated
> > capability[2] (the latter seems easier).
> >
> > Thanks
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
> > [2]
> > https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> >
>
> We also need
> - something like injecting cvq commands to control rx mode from the admin device
> - page fault / dirty page handling
>
> these two seem to call for a vq.

Right, but the vq does not necessarily belong to the PF if we had PASID. And with
PASID we don't even need a dedicated new cvq.

Thanks

>
>
> > >
> > >
>



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18  7:40                           ` Michael S. Tsirkin
@ 2022-01-19  4:21                             ` Jason Wang
  2022-01-19  9:30                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-19  4:21 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha


On 2022/1/18 3:40 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Tuesday, January 18, 2022 12:42 PM
>>>
>>> On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
>>>> 1. Use AQ for msix query and config
>>>> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
>>>> the queues 3. Update commit log to describe why config space is not
>>>> chosen (scale, on-die registers, uniform way to handle all aq cmds) 4.
>>>> Improve documentation around msix config to link to sriov section of
>>>> virtio spec 5. Describe error that if VF is bound to the device, admin
>>>> commands targeting VF can fail, describe this error code
>>>>
>>>> Did I miss anything?
>>> Better to document in the spec text just what the scope of the AQ is.
>>>
>> Yes, will improve this spec.
>>   
>>>> Yet to receive your feedback on group, if/why is it needed and, why/if it must
>>> be in this proposal, what pieces prevents it do as follow-on.
>>>
>>> I think this is related to the subfunction use case or other future use cases. In
>>> the PF/VF case, grouping is implicit through the SRIOV capability. It would be
>>> nice to have things somewhat generic in most of the text though, since we
>>> already know this will be needed.
>>> E.g. Jason sent a proposal for commands to add/delete subfunctions; take a
>>> look at it, somehow AQ needs to be extendable to support that functionality
>>> too.
>> I looked briefly at it. AQ can be used for such a purpose. The current proposal adds only the msix config piece.
>> But more commands can be added in the future.
>>
>> What I wanted to check with you and others is: is a 7-bit command opcode enough?
>> 127 is a lot of admin commands. 😊
>> But given the virtio spec's diversity of transports and device types, I was thinking to keep it 15-bit for future proofing.
>> What do you think?
> I agree, we are not short on bits.
>
>> A command unrelated to AQ in Jason's proposal [1] is about "The management driver MUST create a managed device by allocating".
>> We see that the creator of the subfunction is often not the only entity managing it.
> I think whoever does it can go through the main function driver.
>
>> The creator and the manager being the same entity is finding fewer and fewer users in the new era.
>> So this piece needs more discussion whenever we address that.
>>
>> [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html


Yes, I do that for dynamic provisioning, which seems a requirement (or 
at least nice to have) for the SIOV spec. We can extend or tweak it for static 
provisioning.

Thanks




* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  4:03                       ` Jason Wang
@ 2022-01-19  4:48                         ` Parav Pandit
  2022-01-19 20:25                           ` Parav Pandit
  2022-01-25  3:29                           ` Jason Wang
  0 siblings, 2 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-19  4:48 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 19, 2022 9:33 AM
> 
> 
> It it means IMS, there's already a proposal[1] that introduce MSI commands
> via the admin virtqueue. And we had similar requirement for virtio-MMIO[2]
> and managed device or SF [3], so I would rather to introduce IMS (need a
> better name though) as a basic facility instead of tie it to any specific
> transport.
> 
IMS of [1] is interrupt configuration by the virtio driver for the device it is driving, which needs a queue.
So regardless of whether the device type is PCI PF/VF/SF/ADI, there is a desire to have a generic admin queue not attached to a device type.
And the AQ in this proposal serves exactly this purpose.

A device configuring its own IMS vectors vs. the PCI PF configuring a VF's maximum MSI-X vector count are two different functionalities.
Both of these commands can ride on a generic queue.
However the queue is not the same, because
the PF owns its own admin queue (for VF msix config), 
while a VF or SF operates its own admin queue (for IMS config).

So a good example is:
1. The PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using the PF's AQ in the HV.
2. The PCI VF, when using IMS, configures IMS data, vector, mask etc. using the VF's AQ in the GVM.
Both functions will have the AQ feature bit set.

Fair enough, so we have more users of admin queue than just MSI-X config.

> 
> > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
> > the queues 3. Update commit log to describe why config space is not
> > chosen (scale, on-die registers, uniform way to handle all aq cmds)
> 
> 
> I fail to understand the scale/registeres issues. With the one of my previous
> proposal (device selector), technically we don't even need any config space or
> BAR for VF or SF by multiplexing the registers for PF.
>
The scale issue is: when you want to create, query, and manipulate hundreds of objects, having a shared MMIO register or configuration register will be too slow.
Additionally, such a register set doesn't scale to allow sharing a large number of bytes, as DMA cannot be done.

From the physical device's perspective, it doesn't scale because the device needs to have those resources ready to answer MMIO reads, and for hundreds to thousands of devices it just cannot do it.
This is one of the reasons for the birth of IMS.

> I do see one advantage is that the admin virtqueue is transport independent
> (or it could be used as a transport).
> 
I am yet to read the transport part from [1].

> 
> > 4. Improve documentation around msix config to link to sriov section of virtio
> spec
> > 5. Describe error that if VF is bound to the device, admin commands
> targeting VF can fail, describe this error code
> >
> > Did I miss anything?
> >
> > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> be in this proposal, what pieces prevents it do as follow-on.
> >
> > Cornelia, Jason,
> > Can you please review current proposal as well before we revise v2?
> 
> 
> If I understand correctly, most of the features (except for the admin
> virtqueue in_order stuffs) are not specific to the admin virtqueue. As
> discussed in the previous versions, I still think it's better:
> 
> 1) adding sections in the basic device facility or data structure for
> provisioning and MSI
> 2) introduce admin virtqueue on top as an device interface for those
> features
>
I didn't follow your suggestion. Can you please explain?
Specifically, "data structure for provisioning and MSI".
 
> The leaves the chance for future extensions to allow those features to
> be used by transport specific interface which will benefit for
> 
The AQ allows communication (command, response) between driver and device in a transport independent way.
Sometimes it queries/sets transport specific fields, like the MSI-X vectors of a VF.
Sometimes the device configures its own IMS interrupts.
Something else in the future.
So it is really a generic request-response queue.
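
For illustration, a minimal sketch of such a generic request-response
framing, assuming the 15-bit opcode discussed earlier; all field names
and values below are hypothetical, not proposed spec text:

struct virtio_admin_cmd {
	le16 opcode;		/* 15-bit command code */
	le16 reserved;
	u8 command_data[];	/* opcode specific request, device-readable */
};

struct virtio_admin_cmd_result {
	u8 status;		/* 0x0 = success, otherwise an error code */
	u8 result_data[];	/* opcode specific response, device-writable */
};

The same framing could carry both the PF-issued VF msix config command
and a VF/SF-issued IMS config command; only the opcode and payload differ.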

> [1]
> https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html


* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-18 17:17                                 ` Parav Pandit
@ 2022-01-19  7:20                                   ` Michael S. Tsirkin
  2022-01-19  8:15                                     ` [virtio-dev] " Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  7:20 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 18, 2022 at 05:17:06PM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 18, 2022 8:39 PM
> > 
> > On Tue, Jan 18, 2022 at 10:50:52AM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 4:09 PM
> > > >
> > > > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > > > An unrelated command to AQ in Jason's proposal [1] is about " The
> > > > management driver MUST create a managed device by allocating".
> > > > > We see that creator of the subfunction is often not the only
> > > > > entity managing
> > > > it.
> > > > > They being same in new era finding less and less users.
> > > > > So this piece needs more discussion whenever we address that.
> > > > >
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202108/msg001
> > > > > 36.h
> > > > > tml
> > > >
> > > > This reminds me. How do AQ commands interact with VF lifecycle?
> > > VF device usage is controlled by the same system that is configuring the VF
> > > via its parent PF device.
> > > So the VF device shouldn't be in use. Any configuration change while the VF device
> > > is in use will result in the AQ command failing.
> > >
> > > > E.g. can one change number of vectors for an active VF?
> > > > Need to specify this.
> > > >
> > > > Also, I started worrying about compatibility here.
> > > > Let's say the msix capability in a VF specifies 16 vectors.
> > > > Can PF specify 32? If yes how will driver program them?
> > > Yes, the PF can change it to 32. When the VF driver queries the PCI capability, it will
> > > reflect 32 instead of 16.
> > > > Can PF specify 8? If yes how do we make sure driver does not attempt
> > > > to use 16? And what happens if it does?
> > > The PF is programming the VF msix capability in the device. So the virtio pci driver
> > > operating the PCI VF device cannot access vectors beyond the max value
> > > programmed by the PF driver.
> > 
> > Um. Interesting. This means that the msix capability of the VF changes?
> Yes.
> > Is that in fact spec compliant? Could some OSes cache the value of the
> > capability even if the device is not in active use? E.g. I can see how this might
> > happen in order to map the MSIX tables even before loading the driver.
> > 
> The PCI subsystem can cache the value before the device driver loads.
> Generally a device supports intx/msix or intx/msi, so the PCI subsystem is not aware of what will be used by its upper layer device drivers.
> So it usually defers such initialization to a later stage, until it is actually used.
> 
> Whichever OS driver implements msix configuration will have to either not cache it or flush and rebuild the cache.

Seems to contradict what the spec says (below).

> > The spec says:
> > 	Depending upon system software policy, system software, device driver
> > software, or each at
> > 	different times or environments may configure a function’s MSI-X
> > capability and table
> > 	structures with suitable vectors.
> > 
> > So MSI-X configuration might not be up to the driver.
> > 
> > We actually ask the driver to read back any vector assigned to a VQ, so it's possible
> > to fail vector assignment. Maybe that's better.
> > 
> The virtio driver should not incur any additional complexity in re-reading vectors etc.

I think it does this already.

> All the msix config should happen well before a driver gets loaded for the VF.
> It is up to the PCI layer of the HV to provide to the virtio device driver a stable device that is not undergoing msix table changes while the virtio device driver is operating on it.


The problem is in the guest though. I'm not sure we can rely on this part
being part of the driver and not part of the OS.

-- 
MST



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19  3:04         ` Jason Wang
@ 2022-01-19  8:11           ` Michael S. Tsirkin
  2022-01-25  3:35             ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  8:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, parav, shahafs,
	oren, stefanha

On Wed, Jan 19, 2022 at 11:04:50AM +0800, Jason Wang wrote:
> Exactly, another call for using the PRS queue instead. But generally
> speaking, the admin virtqueue limits or complicates the functions that can be
> exported to the guest. That's why I suggest decoupling all the possible
> features out of the admin virtqueue, and making them available via both the admin
> virtqueue and a transport specific method (e.g. capability).

I'm not exactly sure what's wrong with starting with a queue; if there's a
need to also allow passing that over another transport we can add that.
In particular, I think it's useful to have a capability to inject
requests as if they had been passed through a VQ.
Such a capability would address this need, wouldn't it?

-- 
MST



* RE: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  7:20                                   ` Michael S. Tsirkin
@ 2022-01-19  8:15                                     ` Parav Pandit
  2022-01-19  8:21                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-19  8:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha


> From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
> Behalf Of Michael S. Tsirkin
> Sent: Wednesday, January 19, 2022 12:51 PM

> > > Um. Interesting. This means that the msix capability of the VF changes?
> > Yes.
> > > Is that in fact spec compliant? Could some OSes cache the value of
> > > the capability even if the device is not in active use? E.g. I can
> > > see how this might happen in order to map the MSIX tables even before
> loading the driver.
> > >
> > PCI subsystem can catch the value before the device driver can load.
> > Generally a device support intx/msix or intx/msi. So PCI subsystem is not
> aware what will be used by its upper layer device drivers.
> > So it usually differs such initialization to a later stage until it is actually used.
> >
> > Whichever OS driver which implements msix configuration, will have to
> either not cache it or flush+ rebuild the cache.
> 
> Seems to contradict what the spec says (below).
No it doesn't. The spec says that the dependency is on system software policy, system sw, and driver sw.
So this system, which contains the policy, sw, and driver sw, will implement the virtio extension.

In the above sentence, "whichever OS driver" covers all sw components involved in this functionality.
> 
> > > The spec says:
> > > 	Depending upon system software policy, system software, device
> > > driver software, or each at
> > > 	different times or environments may configure a function’s MSI-X
> > > capability and table
> > > 	structures with suitable vectors.
> > >
> > > So MSIX canfiguation might not be up to the driver.
> > >
> > > We actually ask driver to read back any vector assigned to a VQ so
> > > it's possible to fail vector assignment. Maybe that's better.
> > >
> > Virtio driver should not incur any additional complexity in re-reading vector
> etc.
> 
> I think it does this already.
When does it re-read?
I do not follow your point of "ask driver to read back any vector". When do you want to do this?

> 
> > All the msix config should happen much before drivers gets loaded for the VF.
> > It is PCI layer of the HV to provide a stable device to virtio device driver which
> is not undergoing msix table changes, when virtio device driver is operating on
> it.
> 
> 
> Problem is in the guest though. I'm not sure we can rely on this part being part
> of the driver and not part of the OS.
It is part of the system software that consists of the virtio driver, pci subsystem and user interface.
I do not follow your comment about "problem is in guest though".
Can you please explain?
The VF is simply not available to the guest when the HV has not given it. And when it is given, the HV doesn't modify the msix in some random manner.


* Re: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  8:15                                     ` [virtio-dev] " Parav Pandit
@ 2022-01-19  8:21                                       ` Michael S. Tsirkin
  2022-01-19 10:10                                         ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  8:21 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 08:15:50AM +0000, Parav Pandit wrote:
> > > Virtio driver should not incur any additional complexity in re-reading vector
> > etc.
> > 
> > I think it does this already.
> When does it re-read?
> I do not follow your point of "ask driver to read back any vector". When do you want to do this?


After mapping an event to vector, the
driver MUST verify success by reading the Vector field value: on
success, the previously written value is returned, and on
failure, NO_VECTOR is returned. If a mapping failure is detected,
the driver MAY retry mapping with fewer vectors, disable MSI-X
or report device failure.
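
For illustration, a minimal sketch of that readback rule (vp_write16()
and vp_read16() are hypothetical stand-ins for a transport's register
accessors; VIRTIO_MSI_NO_VECTOR is 0xffff in the PCI transport):

static int map_vq_msix_vector(struct vp_dev *vp, u16 vq_index, u16 vector)
{
	vp_write16(vp, vq_index, vector);	/* map VQ event to vector */
	if (vp_read16(vp, vq_index) == VIRTIO_MSI_NO_VECTOR)
		return -ENOSPC;	/* caller may retry with fewer vectors,
				 * disable MSI-X or report device failure */
	return 0;
}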



> > 
> > > All the msix config should happen much before drivers gets loaded for the VF.
> > > It is PCI layer of the HV to provide a stable device to virtio device driver which
> > is not undergoing msix table changes, when virtio device driver is operating on
> > it.
> > 
> > 
> > Problem is in the guest though. I'm not sure we can rely on this part being part
> > of the driver and not part of the OS.
> It is part of the system software that consist of virtio driver, pci subsystem and user interface.
> I do not follow your comment about "problem is in guest though".

Sorry, I meant host of course.

> Can you please explain?
> VF is simply not available to the guest, when HV has not given it. And
> when its given, HV doesn’t modify the msix in some random manner.

I am concerned that we cannot be sure that changing the MSI-X capability
while the device is present is safe, since the spec does not promise
the capability is not read by the host at boot. However, given the device can
instead fail to map events to vectors, even if it is not safe we have other
ways to fail gracefully. It's probably a good idea to mention all of
this in the spec text.

-- 
MST



* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-19  4:16               ` Jason Wang
@ 2022-01-19  9:26                 ` Michael S. Tsirkin
  2022-01-25  3:53                   ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  9:26 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > We also need
> > - something like injecting cvq commands to control rx mode from the admin device
> > - page fault / dirty page handling
> >
> > these two seem to call for a vq.
> 
> Right, but vq is not necessarily for PF if we had PASID. And with
> PASID we don't even need a dedicated new cvq.

I don't think it's a good idea to mix transactions from
multiple PASIDs on the same vq.

Attaching a PASID to a queue seems more reasonable.
cvq is under guest control, so yes I think a separate
vq is preferable.

What is true is that with subfunctions you would have
a PASID per subfunction and then one subfunction for control.

I think a sketch of how things will work with scalable iov can't hurt as
part of this proposal.  And, I'm not sure we should have so much
flexibility: if there's an interface that works for SRIOV and SIOV then
that seems preferable to having distinct transports for SRIOV and
SIOV.


-- 
MST



* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  4:21                             ` Jason Wang
@ 2022-01-19  9:30                               ` Michael S. Tsirkin
  2022-01-25  3:39                                 ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19  9:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 12:21:36PM +0800, Jason Wang wrote:
> 
> On 2022/1/18 3:40 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > 
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 18, 2022 12:42 PM
> > > > 
> > > > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > > > 1. Use AQ for msix query and config
> > > > > 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
> > > > > the queues 3. Update commit log to describe why config space is not
> > > > > chosen (scale, on-die registers, uniform way to handle all aq cmds) 4.
> > > > > Improve documentation around msix config to link to sriov section of
> > > > > virtio spec 5. Describe error that if VF is bound to the device, admin
> > > > > commands targeting VF can fail, describe this error code
> > > > > 
> > > > > Did I miss anything?
> > > > Better document in spec text just what is the scope for AQ.
> > > > 
> > > Yes, will improve this spec.
> > > > > Yet to receive your feedback on group, if/why is it needed and, why/if it must
> > > > be in this proposal, what pieces prevents it do as follow-on.
> > > > 
> > > > I think this is related to the subfunction usecase or other future usecase. In
> > > > case of PF/VF grouping is implicit through the SRIOV capability. It would be
> > > > nice to have things somewhat generic in most of the text though since we
> > > > already know this will be needed.
> > > > E.g. jason sent a proposal for commands to add/delete subfunctions, take a
> > > > look at it, somehow AQ needs to be extendable to support that functionality
> > > > too.
> > > I looked briefly to it. AQ can be used for such purpose. Current proposal adds only msix config piece.
> > > But more commands can be added in future.
> > > 
> > > What I wanted to check with you and other is, do we want command opcode to be 7-bit enough?
> > > #127 is lot of admin commands. 😊
> > > But given virtio spec diversity of transport and device types, I was thinking to keep it 15-bit for future proofing.
> > > What do you think?
> > I agree, we are not short on bits.
> > 
> > > An unrelated command to AQ in Jason's proposal [1] is about " The management driver MUST create a managed device by allocating".
> > > We see that creator of the subfunction is often not the only entity managing it.
> > I think whoever does it can go through the main function driver.
> > 
> > > They being same in new era finding less and less users.
> > > So this piece needs more discussion whenever we address that.
> > > 
> > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> 
> 
> Yes, I do that for dynamic provisioning which seems a requirement (or better
> to have) for SIOV spec. We can extend or tweak it for static provisioning.
> 
> Thanks
> 

So you are basically saying that since with scalable iov we need
commands to create subfunctions, let's straight away teach
people to use them to manage VFs.
So before a VF can be used, you are asking that people "allocate" it
through a PF.  Is that right?

I have to say that addresses one concern I just had, which is that
it's unclear what the status of a VF is before any commands are
issued.


-- 
MST



* RE: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  8:21                                       ` Michael S. Tsirkin
@ 2022-01-19 10:10                                         ` Parav Pandit
  2022-01-19 16:40                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-19 10:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, January 19, 2022 1:51 PM
> 
> On Wed, Jan 19, 2022 at 08:15:50AM +0000, Parav Pandit wrote:
> > > > Virtio driver should not incur any additional complexity in
> > > > re-reading vector
> > > etc.
> > >
> > > I think it does this already.
> > When does it re-read?
> > I do not follow your point of "ask driver to read back any vector". When do
> you want to do this?
> 
> 
> After mapping an event to vector, the
> driver MUST verify success by reading the Vector field value: on success, the
> previously written value is returned, and on failure, NO_VECTOR is returned. If
> a mapping failure is detected, the driver MAY retry mapping with fewer
> vectors, disable MSI-X or report device failure.
Ok, I got it now.
But an insane HV can attempt to change the value of this vector even after this read was successful.
And it will obviously break the VM.
This isn't the usage model.
The PF (admin device) user giving the VF to a VM (= system software) has to ensure that they don't give the VF to the VM while it is in the middle of configuration.
We must add this to the spec in v2.

> 
> > >
> > > > All the msix config should happen much before drivers gets loaded for the
> VF.
> > > > It is PCI layer of the HV to provide a stable device to virtio
> > > > device driver which
> > > is not undergoing msix table changes, when virtio device driver is
> > > operating on it.
> > >
> > >
> > > Problem is in the guest though. I'm not sure we can rely on this
> > > part being part of the driver and not part of the OS.
> > It is part of the system software that consist of virtio driver, pci subsystem
> and user interface.
> > I do not follow your comment about "problem is in guest though".
> 
> sorry I meant host of course.
> 
> > Can you please explain?
> > VF is simply not available to the guest, when HV has not given it. And
> > when its given, HV doesn’t modify the msix in some random manner.
> 
> I am concerned that we can not be sure that changing MSIX capability while
> device is present is safe since spec does not promise the capability is not read
> by host at boot. However, given device can instead fail to map events to
> vectors, even if it is not safe we have other ways to fail gracefully. It's probably
> a good idea to mention all this in the spec text.
It is the system that implements the virtio spec that has to ensure it doesn't change the msix capability while the device is in use.
The virtio spec should define minimum expectations from the system, such as flushing the cache, not caching at all, or not using the device while it is undergoing config.

For sure, this will be added to v2.


* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-18  7:20               ` Michael S. Tsirkin
@ 2022-01-19 11:33                 ` Max Gurtovoy
  2022-01-19 12:21                   ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-19 11:33 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


On 1/18/2022 9:20 AM, Michael S. Tsirkin wrote:
> On Tue, Jan 18, 2022 at 07:14:56AM +0000, Parav Pandit wrote:
>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Tuesday, January 18, 2022 12:38 PM
>>>> Subfunctions with PASID can be similarly managed by extending device
>>> identification and its MSIX/IMS vector details.
>>>> May be vf_number should be put in the union as,
>>>>
>>>> union device_id {
>>>> 	struct pci_vf vf_id; /* current */
>>>> 	struct pci_sf sf_id; /* future */
>>>> };
>>>>
>>>> So that they both can use command opcode.
>>> device id is not a good name, but yes. However this is why I think we should
>>> have a slightly more generic terminology, and more space for these IDs, and
>>> then we'd have a specific binding for VFs.
>>>
>> I couldn't think of a better name for identifying a PCI VF. But yes, have to think of a better name for sure.
>>
>>>>> I am not
>>>>> asking you to add such mechanisms straight away, but the current
>>>>> proposal kind of obscures this to the point where I don't see how
>>>>> we would extend it with these things down the road.
>>>>>
>>>> Which part specifically makes it obscure?
>>> Just that the text is not generic. It would be nicer if adding new types
>>> involved changing only one or two places.
>>>
>>>> A new device type can be identified by the above union.
>>>>
>>>> Maybe a better structure for patch 5 would be:
>>>> Something like below,
>>>>
>>>> struct virtio_admin_pci_virt_property_set {
>>>> 	enum virtio_device_identifier_type type; /* pci pf, pci vf, subfunction
>>> */
>>>> 	union virtio_device_identifier {
>>>> 		struct virtio_pci_dev_id pf_vf; /* current */
>>>> 		struct virtio_subfunction sf; /* future */
>>>> 	};
>>>> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
>>> specific, intx, something else */
>>>> 	union virtio_interrupt_config {
>>>> 		struct virtio_pci_msix_config msix_config;
>>>> 	};
>>>> };
>>>>
>>>> struct virtio_pci_interrupt_config {
>>>> 	le16 msix_count;
>>>> };
>>> You do not need a union straight away. Simply use something like this "device
>>> identifier" everywhere and then add some text explaining that currently it is a
>>> VF number and that the admin device is a PF.
>> Unless we reserve some bytes, I fail to see how it can be future compatible with an unknown device id type for subfunctions.
> So reserve some bytes then. 4 should be plenty.

Ok, so in v2 we'll use 4 bytes as a device identifier to be generic.

We can call it lid (local id) or vlid (virtio local id)?

Are we ok with one of the above names?




* RE: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 11:33                 ` Max Gurtovoy
@ 2022-01-19 12:21                   ` Parav Pandit
  2022-01-19 14:47                     ` Max Gurtovoy
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-19 12:21 UTC (permalink / raw)
  To: Max Gurtovoy, Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


> From: Max Gurtovoy <mgurtovoy@nvidia.com>
> Sent: Wednesday, January 19, 2022 5:04 PM

[..]
> >>>> struct virtio_admin_pci_virt_property_set {
> >>>> 	enum virtio_device_identifier_type type; /* pci pf, pci vf,
> >>>> subfunction
> >>> */
> >>>> 	union virtio_device_identifier {
> >>>> 		struct virtio_pci_dev_id pf_vf; /* current */
> >>>> 		struct virtio_subfunction sf; /* future */
> >>>> 	};
> >>>> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
> >>> specific, intx, something else */
> >>>> 	union virtio_interrupt_config {
> >>>> 		struct virtio_pci_msix_config msix_config;
> >>>> 	};
> >>>> };
> >>>>
> >>>> struct virtio_pci_interrupt_config {
> >>>> 	le16 msix_count;
> >>>> };
> >>> you do not need a union straight away, Simply use something like
> >>> this "device identifier" everywhere and then add some text
> >>> explaining that currently it is a VF number and that admin device is a PF.
> >> Unless we reserve some bytes, I fail to see how can it be future compatible
> for unknown device id type for subfunction.
> > So reserve some bytes then. 4 should be plenty.
> 
I am not comfortable reserving 4 bytes for the sf, though it is a good option and already in use in one OS for more than a year now.

> Ok so in V2 we'll use 4 bytes as device identifier to be generic.
> 
> We can call it lid (local id) of vlid (virtio local id) ?
> 
> Are we ok with one of the above names ?
> 
I went back to rethink the structure, and I don't see a need to abstract something which is so well defined.

I see the need for the below structures; how should they be made more abstract without breaking backward compat and without defining them as TLVs?

struct virtio_admin_pci_vf_interrupt_config {
	/* v1 current */
	le64 property_mask; /* bit 0 valid */
	le16 vf_number;
	le16 msix_count;
};

struct virtio_admin_pci_vf_interrupt_config {
	/* v2 near future, backward compatible */
	le64 property_mask; /* bit 0, 1 valid */
	le16 vf_number;
	le16 msix_count;
	le16 ims_count;
};

struct virtio_admin_pci_sf_interrupt_config {
	/* v3 future, new struct, no need of backward compat */
	le64 property_mask; /* bit 0,1,2, valid */
	le32 sf_number;
	/* is 4 bytes enough to describe sf,
	 * what if community decides uuid to identify sf?
	 * How about we take out device identifier outside of this struct?
	 */
	le16 msix_count;
	le16 ims_count;
	le16 pci_caps; /* pci atomics enable */
};

struct virtio_unknown_transport_interrupt_config {
	/* vX future */
	le64 property_mask;
	<unknown len> device identifier;
	le16 msix_count;
	le16 ims_count;
};
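
For what it's worth, a sketch of how a device could consume property_mask
so that the v1/v2 layouts above stay backward compatible; the bit names
and provision_vf_*() helpers are made up for illustration:

#define VIRTIO_ADMIN_INT_CFG_MSIX	(1ULL << 0)
#define VIRTIO_ADMIN_INT_CFG_IMS	(1ULL << 1)

static int handle_vf_interrupt_config(
		const struct virtio_admin_pci_vf_interrupt_config *cfg,
		u64 supported_mask)
{
	u64 mask = le64_to_cpu(cfg->property_mask);

	if (mask & ~supported_mask)
		return -EOPNOTSUPP;	/* property unknown to this device */
	if (mask & VIRTIO_ADMIN_INT_CFG_MSIX)
		provision_vf_msix(le16_to_cpu(cfg->vf_number),
				  le16_to_cpu(cfg->msix_count));
	if (mask & VIRTIO_ADMIN_INT_CFG_IMS)
		provision_vf_ims(le16_to_cpu(cfg->vf_number),
				 le16_to_cpu(cfg->ims_count));
	return 0;
}

A v1 device simply never advertises bit 1, so a newer driver setting it
gets a clean error instead of silent truncation.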


* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 12:21                   ` Parav Pandit
@ 2022-01-19 14:47                     ` Max Gurtovoy
  2022-01-19 15:38                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-19 14:47 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: virtio-comment, cohuck, virtio-dev, jasowang, Shahaf Shuler,
	Oren Duer, stefanha


On 1/19/2022 2:21 PM, Parav Pandit wrote:
>> From: Max Gurtovoy <mgurtovoy@nvidia.com>
>> Sent: Wednesday, January 19, 2022 5:04 PM
> [..]
>>>>>> struct virtio_admin_pci_virt_property_set {
>>>>>> 	enum virtio_device_identifier_type type; /* pci pf, pci vf,
>>>>>> subfunction
>>>>> */
>>>>>> 	union virtio_device_identifier {
>>>>>> 		struct virtio_pci_dev_id pf_vf; /* current */
>>>>>> 		struct virtio_subfunction sf; /* future */
>>>>>> 	};
>>>>>> 	enum virtio_interrupt_type interrupt_type; /* msix, ims=device
>>>>> specific, intx, something else */
>>>>>> 	union virtio_interrupt_config {
>>>>>> 		struct virtio_pci_msix_config msix_config;
>>>>>> 	};
>>>>>> };
>>>>>>
>>>>>> struct virtio_pci_interrupt_config {
>>>>>> 	le16 msix_count;
>>>>>> };
>>>>> you do not need a union straight away, Simply use something like
>>>>> this "device identifier" everywhere and then add some text
>>>>> explaining that currently it is a VF number and that admin device is a PF.
>>>> Unless we reserve some bytes, I fail to see how can it be future compatible
>> for unknown device id type for subfunction.
>>> So reserve some bytes then. 4 should be plenty.
> I am not comfortable reserving 4 bytes for sf, though it is good option and already in use in one OS for more a year now.
>
>> Ok so in V2 we'll use 4 bytes as device identifier to be generic.
>>
>> We can call it lid (local id) of vlid (virtio local id) ?
>>
>> Are we ok with one of the above names ?
>>
> I go back to rethink the structure, and don’t see a need to abstract something which is so well defined.
>
> I see need of below structures, how should it be made more abstract without breaking backward compat and without defining as TLV.

I agree, I don't see why it's not possible to use a different command 
opcode for vf interrupt configuration and sf interrupt configuration.

We're not short on opcodes and it's very elegant and extendable IMO.

I think the order should be:

1. add adminq to virtio spec with one simple example (say MSIX config 
for VFs)

2. in parallel, submissions for admin commands: S-IOV support, num VQs 
config, feature bits config and more...

The below example emphasizes that the adminq protocol is flexible and 
easily extendable.

>
> struct virtio_admin_pci_vf_interrupt_config {
> 	/* v1 current */
> 	le64 property_mask; /* bit 0 valid */
> 	le16 vf_number;
> 	le16 msix_count;
> };
>
> struct virtio_admin_pci_vf_interrupt_config {
> 	/* v2 near future, backward compatible */
> 	le64 property_mask; /* bit 0, 1 valid */
> 	le16 vf_number;
> 	le16 msix_count;
> 	le16 ims_count;
> };
>
> struct virtio_admin_pci_sf_interrupt_config {
> 	/* v3 future, new struct, no need of backward compat */
> 	le64 property_mask; /* bit 0,1,2, valid */
> 	le32 sf_number;
> 	/* is 4 bytes enough to describe sf,
> 	 * what if community decides uuid to identify sf?
> 	 * How about we take out device identifier outside of this struct?
> 	 */
> 	le16 msix_count;
> 	le16 ims_count;
> 	le16 pci_caps; /* pci atomics enable */
> };
>
> virtio_unknown_transport_interrupt_config {
> 	/* vX future */
> 	le64 property_mask;
> 	<unknown len> device identifier;
> 	le16 msix_count;
> 	le16 ims_count;
> };



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 14:47                     ` Max Gurtovoy
@ 2022-01-19 15:38                       ` Michael S. Tsirkin
  2022-01-19 15:47                         ` Max Gurtovoy
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19 15:38 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Parav Pandit, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 04:47:10PM +0200, Max Gurtovoy wrote:
> I agree, I don't see why it's not possible to use a different command opcode
> for vf interrupt configuration and sf interrupt configuration.
> 
> We're not short in opcodes and it's very elegant and extendable IMO.
> 
> I think the order should be:
> 
> 1. add adminq to virtio spec with one simple example (say MSIX config for
> VFs)
> 
> 2. in parallel submission for admin commands: S-IOV support, num VQs config,
> feature bits config and more...

Up to you for sure, but didn't you guys try this already?  I think there
are concerns such as how this will be extended to support subfunctions.
I can't say what you want to do about that in v2; ignoring them
completely is probably not a good way to get more support in the TC.
Just my two cents, hope this helps.

-- 
MST



* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19 15:38                       ` Michael S. Tsirkin
@ 2022-01-19 15:47                         ` Max Gurtovoy
  0 siblings, 0 replies; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-19 15:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha


On 1/19/2022 5:38 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 19, 2022 at 04:47:10PM +0200, Max Gurtovoy wrote:
>> I agree, I don't see why it's not possible to use a different command opcode
>> for vf interrupt configuration and sf interrupt configuration.
>>
>> We're not short in opcodes and it's very elegant and extendable IMO.
>>
>> I think the order should be:
>>
>> 1. add adminq to virtio spec with one simple example (say MSIX config for
>> VFs)
>>
>> 2. in parallel submission for admin commands: S-IOV support, num VQs config,
>> feature bits config and more...
> Up to you for sure but didn't you guys try this already?  I think there
> are concerns such as how this will be extended to support subfunctions.
> I can't say what do you want to do about that in v2, ignoring them
> completely is probably not a good way to get more support in the TC.
> Just my two cents, hope this helps.

I think that Parav demonstrated the flexibility and extendability of 
this interface.

And I also think you mentioned that you don't expect us to implement 
that solution in this series.

During this discussion we agreed that using admin commands one can 
manage SRIOV and SIOV, didn't we?



* Re: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19 10:10                                         ` Parav Pandit
@ 2022-01-19 16:40                                           ` Michael S. Tsirkin
  2022-01-19 17:07                                             ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-19 16:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 10:10:38AM +0000, Parav Pandit wrote:
> Virtio spec should define a minimum expectations from the system such
> as flushing the cache or no cache

Well, one of the things virtio is trying to do is be compatible with a
wide range of hypervisors/OSes. It might be tricky to change how they
work internally. If we are relying on tricks like this it might be
necessary to poke at some popular systems to see what they do.
Lots of work ...

-- 
MST



* RE: [virtio-dev] Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19 16:40                                           ` Michael S. Tsirkin
@ 2022-01-19 17:07                                             ` Parav Pandit
  0 siblings, 0 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-19 17:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, cohuck, virtio-dev, jasowang,
	Shahaf Shuler, Oren Duer, stefanha



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, January 19, 2022 10:10 PM
> 
> On Wed, Jan 19, 2022 at 10:10:38AM +0000, Parav Pandit wrote:
> > Virtio spec should define a minimum expectations from the system such
> > as flushing the cache or no cache
> 
> well one of the things virtio is trying to do is being compatible with a wide
> range of hypervisors/OSes. 
We are not breaking any compatibility with this optional enhancement.

> it might be tricky to change how they work
> internally. if we are relying on tricks like this it might be necessary to poke at
> some popular systems to see what they do.
Sure, it should work on a wide range of hypervisors/OSes. It's a new feature, so those will implement it when scale is critical for them.

If we consider Linux as a popular system, then the Linux pci subsystem and mlx5 driver already implement it in upstream kernel 5.13.
(It doesn't cache it.)

A similar implementation for non-virtio also exists in an _other_ popular OS, which I should avoid naming here.



* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  4:48                         ` Parav Pandit
@ 2022-01-19 20:25                           ` Parav Pandit
  2022-01-25  3:45                             ` Jason Wang
  2022-01-25  3:29                           ` Jason Wang
  1 sibling, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-19 20:25 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha

Hi Jason,

> From: Parav Pandit
> Sent: Wednesday, January 19, 2022 10:18 AM
> 
> 
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, January 19, 2022 9:33 AM
> >
> >
> > It it means IMS, there's already a proposal[1] that introduce MSI
> > commands via the admin virtqueue. And we had similar requirement for
> > virtio-MMIO[2] and managed device or SF [3], so I would rather to
> > introduce IMS (need a better name though) as a basic facility instead
> > of tie it to any specific transport.
> >
> IMS of [1] is a interrupt configuration by the virtio driver for the device is it
> driving, which needs a queue.
> So regardless of the device type as PCI PF/VF/SF/ADI, there is desire to have a
> generic admin queue not attached to device type.
> And AQ in this proposal exactly serves this purpose.
> 
> Device configuring its own IMS vector vs PCI PF configuring VF's MSI-X max
> vector count are two different functionality.
> Both of these commands can ride on a generic queue.
> However the queue is not same, because
> PF owns its own admin queue (for vf msix config), VF or SF operates its own
> admin queue (for IMS config).
> 
> So a good example is,
> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> 2. PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ in
> GVM.
> Both the functions will have AQ feature bit set.
> 
> Fair enough, so we have more users of admin queue than just MSI-X config.
> 
If we treat the AQ as any other VQ, then IMS configuration by the virtio driver of #2 for a VF or SF cannot be done through the AQ.
Because the AQ needs to be live before setting DRIVER_OK.
And the spec mandates "The driver MUST configure the other virtqueue fields before enabling the virtqueue with queue_enable."
This includes the vector configuration.
So the AQ will operate in polling mode, which is not so good.
And it cannot be disabled to update the MSI-X/IMS vector, because the spec says "The driver MUST NOT write a 0 to queue_enable.".

Making a special exception for the AQ is equally not good in a physical device implementation, given all the queue config is located in a single config space.

So a better approach for IMS configuration would be to have limited config registers to bootstrap the IMS configuration.
Something like,

struct virtio_pci_dev_interrupt_cfg {
	le16 max_device_interrupts; /* read-only tells maximum ims interrupts */
	le16 refer_index; /* write-only interrupt index to program */
	u8 status;	/* read only response by device, 0x0 = success, 0x1=busy, other error codes */
	u8 enable;	/* enable/disable interrupt referred by index */
	le16 reserved_pad;
	u64 addr; /* write only interrupt addr handle on MSIX TLPs */
	u32 data; /* write only interrupt data handle.. */
};
This way, a VF or SF can configure a large number of vectors before starting the queue and without on-chip resources.
The only on-chip resource required per VF is the 20 bytes of the above cfg registers.
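
For illustration, a hypothetical driver-side sequence against these
registers; the cfg_write*/cfg_read8 accessors are stand-ins for the
transport's real register access helpers, and a real driver would bound
the busy-wait:

static int program_ims_vector(struct virtio_pci_dev_interrupt_cfg *cfg,
			      u16 index, u64 msg_addr, u32 msg_data)
{
	cfg_write16(&cfg->refer_index, index);	/* select vector to program */
	cfg_write64(&cfg->addr, msg_addr);	/* interrupt address handle */
	cfg_write32(&cfg->data, msg_data);	/* interrupt data handle */
	cfg_write8(&cfg->enable, 1);		/* enable the selected vector */

	while (cfg_read8(&cfg->status) == 0x1)	/* 0x1 = busy */
		cpu_relax();
	return cfg_read8(&cfg->status) == 0x0 ? 0 : -EIO;
}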


* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  4:48                         ` Parav Pandit
  2022-01-19 20:25                           ` Parav Pandit
@ 2022-01-25  3:29                           ` Jason Wang
  2022-01-25  3:52                             ` Parav Pandit
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-25  3:29 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


On 2022/1/19 12:48 PM, Parav Pandit wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Wednesday, January 19, 2022 9:33 AM
>>
>>
>> It it means IMS, there's already a proposal[1] that introduce MSI commands
>> via the admin virtqueue. And we had similar requirement for virtio-MMIO[2]
>> and managed device or SF [3], so I would rather to introduce IMS (need a
>> better name though) as a basic facility instead of tie it to any specific
>> transport.
>>
> IMS of [1] is a interrupt configuration by the virtio driver for the device is it driving, which needs a queue.
> So regardless of the device type as PCI PF/VF/SF/ADI, there is desire to have a generic admin queue not attached to device type.
> And AQ in this proposal exactly serves this purpose.
>
> Device configuring its own IMS vector vs PCI PF configuring VF's MSI-X max vector count are two different functionality.
> Both of these commands can ride on a generic queue.
> However the queue is not same, because
> PF owns its own admin queue (for vf msix config),
> VF or SF operates its own admin queue (for IMS config).


So I think in the next version we need to clarify:

1) is there a single admin virtqueue shared by all the VFs and PF

or

2) a per-VF/PF admin virtqueue, and how the driver knows how to find 
the corresponding admin virtqueue


>
> So a good example is,
> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> 2. PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ in GVM.
> Both the functions will have AQ feature bit set.


Where does the VF_AQ sit? I guess it belongs to the VF. But if this is 
true, don't we need some kind of address isolation like PASID?


>
> Fair enough, so we have more users of admin queue than just MSI-X config.


Well, what I really meant is that we actually have more users of IMS. 
That's exactly what virtio-mmio wants. In this case introducing an admin 
queue looks too heavyweight for that.


>
>>> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
>>> the queues 3. Update commit log to describe why config space is not
>>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
>>
>> I fail to understand the scale/registeres issues. With the one of my previous
>> proposal (device selector), technically we don't even need any config space or
>> BAR for VF or SF by multiplexing the registers for PF.
>>
> Scale issue is: when you want to create, query, manipulate hundreds of objects, having shared MMIO register or configuration register, will be too slow.


Ok, this needs to be clarified in the commit log. And we need to make sure 
it's not an issue that only happens for some specific vendor. I was 
told by some DPU vendors that an MMIO register is just DRAM for them.


> And additionally such register set doesn't scale to allow sharing large number of bytes as DMA cannot be done.


That's true.


>
>  From physical device perspective, it doesn’t scale because device needs to have those resources ready to answer on MMIO reads and for hundreds to thousand of devices it just cannot do it.
> This is one of the reason for birth of IMS.


IMS allows the table to be stored in memory and cached by the device 
to achieve the best scalability. But I had other questions:

1) if we have a single admin virtqueue, there will still be contention 
on the driver side

2) if we have a per-VF admin virtqueue, it still doesn't scale since it 
occupies more hardware resources


>
>> I do see one advantage is that the admin virtqueue is transport independent
>> (or it could be used as a transport).
>>
> I am yet to read the transport part from [1].


Yes, the main goal is to be compatible with SIOV.

Thanks


>
>>> 4. Improve documentation around msix config to link to sriov section of virtio
>> spec
>>> 5. Describe error that if VF is bound to the device, admin commands
>> targeting VF can fail, describe this error code
>>> Did I miss anything?
>>>
>>> Yet to receive your feedback on group, if/why is it needed and, why/if it must
>> be in this proposal, what pieces prevents it do as follow-on.
>>> Cornelia, Jason,
>>> Can you please review current proposal as well before we revise v2?
>>
>> If I understand correctly, most of the features (except for the admin
>> virtqueue in_order stuffs) are not specific to the admin virtqueue. As
>> discussed in the previous versions, I still think it's better:
>>
>> 1) adding sections in the basic device facility or data structure for
>> provisioning and MSI
>> 2) introduce admin virtqueue on top as an device interface for those
>> features
>>
> I didn't follow your suggestion. Can you please explain?
> Specifically "data structure for provisioning and MSI"..


I meant:

There's a chapter "Basic Facilities of a Virtio Device", we can 
introduce the concepts there like:

1) Managed device and Management device (terminology proposed by 
Michael), and can use PF and VF as a example

2) Managed device provisioning (the data structure to specify the 
attributes of a managed device (VF))

3) MSI

And then we can introduce the admin virtqueue in either

1) transport part

or

2) PCI transport

In the admin virtqueue, there will be commands to provision and 
configure MSI.
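
As a concrete illustration of point 2) above, the provisioning data
structure could be little more than the attributes the management driver
fills in before the managed device is activated. A minimal sketch, with
all field names hypothetical:

struct virtio_managed_dev_attrs {
	le32 device_id;       /* virtio device type of the managed device */
	le64 device_features; /* feature bits the managed device may offer */
	le16 num_virtqueues;  /* number of virtqueues to provision */
	le16 num_msix;        /* number of MSI-X (or IMS) vectors to provision */
};

Such a structure is deliberately transport independent: an admin
virtqueue command, a config capability, or any future interface could
carry it equally well.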


>   
>> That leaves the chance for future extensions to allow those features to
>> be used by a transport-specific interface, which will benefit for
>>
> AQ allows communication (command, response) between driver and device in a transport independent way.
> Sometimes it queries/sets transport specific fields like the MSI-X vectors of a VF.
> Sometimes the device configures its own IMS interrupt.
> Something else in the future.
> So it is really a generic request-response queue.


I agree, but I think we can't mandate new features to a specific transport.

Thanks


>
>> [1]
>> https://lists.oasis-open.org/archives/virtio-comment/202108/msg00025.html
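
To make the "generic request-response queue" idea concrete, a minimal
sketch of a command and status layout, using the 16-bit opcode field
discussed elsewhere in this thread (the structure and field names are
illustrative only, not defined by the proposal):

struct virtio_admin_cmd_hdr {
	le16 opcode; /* command identifier; leaves room for a 15-bit opcode space */
	le16 flags;  /* command-specific flags, zero for now */
	le32 len;    /* length in bytes of the command-specific payload */
};

struct virtio_admin_cmd_status {
	u8 status;      /* 0 = success, non-zero = command-specific error */
	u8 reserved[7];
};

The driver places the header and payload in device-readable descriptors
and the status in a device-writable descriptor, exactly like any other
virtqueue request; nothing in the layout is tied to PCI, which is what
keeps it transport independent.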


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 1/5] Add virtio Admin Virtqueue specification
  2022-01-19  8:11           ` Michael S. Tsirkin
@ 2022-01-25  3:35             ` Jason Wang
  0 siblings, 0 replies; 110+ messages in thread
From: Jason Wang @ 2022-01-25  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, virtio-comment, Cornelia Huck, Virtio-Dev,
	Parav Pandit, Shahaf Shuler, Oren Duer, Stefan Hajnoczi

On Wed, Jan 19, 2022 at 4:11 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 11:04:50AM +0800, Jason Wang wrote:
> > Exactly, another call for using the PRS queue instead. But generally
> > speaking, an admin virtqueue limits or complicates the functions that can be
> > exported to the guest. That's why I suggest decoupling all the possible
> > features from the admin virtqueue, and making them available via both the admin
> > virtqueue and the transport-specific method (e.g. capability).
>
> I'm not exactly sure what's wrong with starting with a queue; if there's
> a need to also allow passing that over another transport, we can add that.

Nothing wrong, but I think we can't mandate the features to be
implemented solely via the admin virtqueue. Each transport has its own use
cases. Making the admin virtqueue visible in a nested environment
will be a challenge.

> In particular, I think it's useful to have a capability to inject
> requests as if they have been passed through a VQ.
> Such a capability would address this need, wouldn't it?

It really depends on the requirement; for simple requests like MSI
support in virtio-mmio, introducing such a large change seems
less optimal than an ad-hoc MSI interface.

Thanks


>
> --
> MST
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19  9:30                               ` Michael S. Tsirkin
@ 2022-01-25  3:39                                 ` Jason Wang
  0 siblings, 0 replies; 110+ messages in thread
From: Jason Wang @ 2022-01-25  3:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 5:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 12:21:36PM +0800, Jason Wang wrote:
> >
> > On 2022/1/18 3:40 PM, Michael S. Tsirkin wrote:
> > > On Tue, Jan 18, 2022 at 07:30:34AM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, January 18, 2022 12:42 PM
> > > > >
> > > > > On Tue, Jan 18, 2022 at 07:07:03AM +0000, Parav Pandit wrote:
> > > > > > 1. Use AQ for msix query and config
> > > > > > 2. AQ to follow IN_ORDER and INDIRECT_DESC negotiation like the rest of
> > > > > > the queues 3. Update commit log to describe why config space is not
> > > > > > chosen (scale, on-die registers, uniform way to handle all aq cmds) 4.
> > > > > > Improve documentation around msix config to link to sriov section of
> > > > > > virtio spec 5. Describe the error that if a VF is bound to the device, admin
> > > > > > commands targeting the VF can fail; describe this error code
> > > > > >
> > > > > > Did I miss anything?
> > > > > Better to document in the spec text just what the scope of the AQ is.
> > > > >
> > > > Yes, will improve this spec.
> > > > > > I have yet to receive your feedback on the group concept: if/why it is needed, why/if it must
> > > > > be in this proposal, and what prevents doing it as a follow-on.
> > > > >
> > > > > I think this is related to the subfunction use case or other future use cases. In
> > > > > the case of PF/VF, grouping is implicit through the SR-IOV capability. It would be
> > > > > nice to have things somewhat generic in most of the text though, since we
> > > > > already know this will be needed.
> > > > > E.g. Jason sent a proposal for commands to add/delete subfunctions; take a
> > > > > look at it, somehow the AQ needs to be extendable to support that functionality
> > > > > too.
> > > > I looked at it briefly. The AQ can be used for such a purpose. The current proposal adds only the msix config piece.
> > > > But more commands can be added in the future.
> > > >
> > > > What I wanted to check with you and others is: is a 7-bit command opcode enough?
> > > > #127 is a lot of admin commands. 😊
> > > > But given the virtio spec's diversity of transports and device types, I was thinking of keeping it 15-bit for future proofing.
> > > > What do you think?
> > > I agree, we are not short on bits.
> > >
> > > > A command unrelated to the AQ in Jason's proposal [1] is "The management driver MUST create a managed device by allocating".
> > > > We see that the creator of the subfunction is often not the only entity managing it.
> > > I think whoever does it can go through the main function driver.
> > >
> > > > The two being the same entity is finding fewer and fewer users in the new era.
> > > > So this piece needs more discussion whenever we address that.
> > > >
> > > > [1] https://lists.oasis-open.org/archives/virtio-comment/202108/msg00136.html
> >
> >
> > Yes, I did that for dynamic provisioning, which seems to be a requirement (or nice
> > to have) for the SIOV spec. We can extend or tweak it for static provisioning.
> >
> > Thanks
> >
>
> So you are basically saying that since with scalable iov we need
> commands to create subfunctions, let's straight away teach
> people to use them to manage VFs.
> So before a VF can be used, you are asking that people "allocate" it
> through a PF.  Is that right?

Right.

>
> I have to say that addresses one concern I just had, which is that
> it's unclear what the status of a VF is before any commands are
> issued.

I'm not even sure it's possible; my understanding is that most vendors
choose to go with static provisioning via sriov_numvfs. So such
dynamic on-demand provisioning might be tricky.

For SR-IOV there is another subtle limitation that mandates all VFs
have the same device type.

Thanks

>
>
> --
> MST
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-19 20:25                           ` Parav Pandit
@ 2022-01-25  3:45                             ` Jason Wang
  2022-01-25  4:07                               ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-25  3:45 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


On 2022/1/20 4:25 AM, Parav Pandit wrote:
> Hi Jason,
>
>> From: Parav Pandit
>> Sent: Wednesday, January 19, 2022 10:18 AM
>>
>>
>>> From: Jason Wang <jasowang@redhat.com>
>>> Sent: Wednesday, January 19, 2022 9:33 AM
>>>
>>>
>>> If it means IMS, there's already a proposal[1] that introduces MSI
>>> commands via the admin virtqueue. And we had a similar requirement for
>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
>>> introduce IMS (need a better name though) as a basic facility instead
>>> of tying it to any specific transport.
>>>
>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
>> driving, which needs a queue.
>> So regardless of the device type as PCI PF/VF/SF/ADI, there is a desire to have a
>> generic admin queue not attached to a device type.
>> And the AQ in this proposal serves exactly this purpose.
>>
>> A device configuring its own IMS vector vs. the PCI PF configuring a VF's MSI-X max
>> vector count are two different functionalities.
>> Both of these commands can ride on a generic queue.
>> However the queue is not the same, because
>> the PF owns its own admin queue (for vf msix config), while a VF or SF operates its own
>> admin queue (for IMS config).
>>
>> So a good example is,
>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
>> 2. PCI VF, when using IMS, configures IMS data, vector, mask, etc. using VF_AQ in
>> GVM.
>> Both functions will have the AQ feature bit set.
>>
>> Fair enough, so we have more users of admin queue than just MSI-X config.
>>
> If we treat the AQ as any other VQ, then the IMS configuration of #2 by the virtio driver for a VF or SF cannot be done through the AQ.
> Because the AQ needs to be live before setting DRIVER_OK.
> And the spec mandates "The driver MUST configure the other virtqueue fields before enabling the virtqueue with queue_enable."
> This includes the vector configuration.
> So the AQ would have to operate in polling mode, which is not so good.
> And it cannot be disabled to update the MSI-X/IMS vector, because the spec says "The driver MUST NOT write a 0 to queue_enable.".
>
> Making a special exception for the AQ is equally not good in a physical device implementation, given that all the queue config is located in a single config space.
>
> So a better approach for IMS configuration would be to have limited config registers to bootstrap IMS configuration.
> Something like:
>
> struct virtio_pci_dev_interrupt_cfg {
> 	le16 max_device_interrupts; /* read-only tells maximum ims interrupts */


Is it better to use the same terminology as the MSI capability, like 
"max_device_vectors"?


> 	le16 refer_index; /* write-only interrupt index to program */
> 	u8 status;	/* read only response by device, 0x0 = success, 0x1=busy, other error codes */
> 	u8 enable;	/* enable/disable interrupt referred by index */


Any reason we need this, assuming we already have PCI_NO_VECTOR in 
common cfg? I think what's better to have is the ability to mask and 
unmask a vector.


> 	le16 reserved_pad;
> 	u64 addr; /* write only interrupt addr handle on MSIX TLPs */
> 	u32 data; /* write only interrupt data handle.. */
> };


Is it better to split the above into different commands?

Thanks


> This way, a VF or SF can configure a large number of vectors before starting the queue and without on-chip resources.
> The only on-chip resource required per VF is the 20 bytes of the above cfg registers.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25  3:29                           ` Jason Wang
@ 2022-01-25  3:52                             ` Parav Pandit
  2022-01-25 10:59                               ` Max Gurtovoy
  2022-01-26  5:04                               ` Jason Wang
  0 siblings, 2 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-25  3:52 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha

Hi Jason,

> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, January 25, 2022 8:59 AM
> 
> On 2022/1/19 12:48 PM, Parav Pandit wrote:
> >> From: Jason Wang <jasowang@redhat.com>
> >> Sent: Wednesday, January 19, 2022 9:33 AM
> >>
> >>
> >> If it means IMS, there's already a proposal[1] that introduces MSI
> >> commands via the admin virtqueue. And we had a similar requirement for
> >> virtio-MMIO[2] and managed device or SF [3], so I would rather
> >> introduce IMS (need a better name though) as a basic facility instead
> >> of tying it to any specific transport.
> >>
> > IMS of [1] is an interrupt configuration by the virtio driver for the device it is
> driving, which needs a queue.
> > So regardless of the device type as PCI PF/VF/SF/ADI, there is a desire to have a
> generic admin queue not attached to a device type.
> > And the AQ in this proposal serves exactly this purpose.
> >
> > A device configuring its own IMS vector vs. the PCI PF configuring a VF's MSI-X max
> vector count are two different functionalities.
> > Both of these commands can ride on a generic queue.
> > However the queue is not the same, because the PF owns its own admin queue
> > (for vf msix config), while a VF or SF operates its own admin queue (for IMS
> > config).
> 
> 
> So I think in the next version we need to clarify:
> 
> 1) is there a single admin virtqueue shared by all the VFs and PF
> 
> or
> 
> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
> corresponding admin virtqueue
>

The admin queue is not per VF.
Let's take concrete examples.
1. For example, a PCI PF can have one AQ.
This AQ carries commands to query/config the MSI-X vectors of VFs.

2. In the second example, the PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.

3. A PCI VF has its own AQ to configure some of its own generic attributes; I don't know which attribute that is today.
Maybe something that is extremely hard to do over feature bits.
The currently proposed v2 doesn't restrict the admin queue to a PCI PF or VF and, for that matter, doesn't limit it to any particular transport.
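
For example 1, a minimal sketch of the command pair the PF's AQ could
carry for VF MSI-X management. The opcodes and structure below are
hypothetical, not the definitions from the v2 proposal:

/* Hypothetical admin command opcodes. */
#define VIRTIO_ADMIN_PCI_VF_MSIX_GET 1
#define VIRTIO_ADMIN_PCI_VF_MSIX_SET 2

struct virtio_admin_pci_vf_msix_cmd {
	le16 vf_number;  /* VF to operate on, 1-based as in SR-IOV */
	le16 msix_count; /* SET: requested vector count; GET: returned by device */
};

A SET targeting a VF that is currently bound to a driver would fail
with an error status, matching point 5 of the earlier summary.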
 
> 
> >
> > So a good example is,
> > 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> > 2. PCI VF, when using IMS, configures IMS data, vector, mask, etc. using VF_AQ
> in GVM.
> > Both functions will have the AQ feature bit set.
> 
> 
> Where does the VF_AQ sit? I guess it belongs to the VF. But if this is
> true, don't we need some kind of address isolation like PASID?
>
The one above for IMS is not a good example. I replied with the reasoning for it last week.
 
> 
> >
> > Fair enough, so we have more users of admin queue than just MSI-X config.
> 
> 
> Well, what I really meant is that we actually have more users of IMS.
> That is exactly what virtio-mmio wants. In this case introducing an admin
> queue looks too heavyweight for that.
> 
IMS config cannot be done over the AQ, as described in a previous email in this thread.

> 
> >
> >>> 2. AQ to follow IN_ORDER and INDIRECT_DESC negotiation like the rest of
> >>> the queues 3. Update commit log to describe why config space is not
> >>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
> >>
> >> I fail to understand the scale/registers issues. With one of my previous
> >> proposals (device selector), technically we don't even need any config space
> or
> >> BAR for VF or SF by multiplexing the registers of the PF.
> >>
> > The scale issue is: when you want to create, query, or manipulate hundreds of
> objects, having a shared MMIO register or configuration register will be too
> slow.
> 
> 
> Ok, this needs to be clarified in the commit log. And we need to make sure
> it's not an issue that only happens for some specific vendor.
It is present in the v2 commit log cover letter.
Please let me know if you think it should be in the actual patch commit log.


> > And additionally such a register set doesn't scale to allow sharing a large
> number of bytes, as DMA cannot be done.
> 
> 
> That's true.
> 
> 
> >
> >  From the physical device perspective, it doesn’t scale because the device needs to
> have those resources ready to answer MMIO reads, and for hundreds to
> thousands of devices it just cannot do it.
> > This is one of the reasons for the birth of IMS.
> 
> 
> IMS allows the table to be stored in memory and cached by the device
> to have the best scalability. But I had other questions:
> 
> 1) if we have a single admin virtqueue, there will still be contention
> on the driver side
>
The AQ inherently allows out-of-order command execution.
It shouldn't face contention. For example, a 1K-depth AQ should be serving hundreds of commands in parallel for SF creation, VF MSI-X config and more.

Which areas/commands do you think can lead to contention?
 
> 2) if we have a per-VF admin virtqueue, it still doesn't scale since it
> occupies more hardware resources
>
That is too heavy and doesn’t scale. The proposal is to not have a per-VF admin queue.
The proposal is to have one admin queue in a virtio device.
 
> 
> >
> >> One advantage I do see is that the admin virtqueue is transport
> independent
> >> (or it could be used as a transport).
> >>
> > I have yet to read the transport part from [1].
> 
> 
> Yes, the main goal is to be compatible with SIOV.
> 
The admin queue is a command interface transport on which higher-layer services can be built.
This includes SR-IOV config and SIOV config.
And v2 enables implementing SIOV commands whenever they are ready.

> >
> >>> 4. Improve documentation around msix config to link to sriov section of
> virtio
> >> spec
> >>> 5. Describe error that if VF is bound to the device, admin commands
> >> targeting VF can fail, describe this error code
> >>> Did I miss anything?
> >>>
> >>> I have yet to receive your feedback on the group concept: if/why it is needed, why/if it
> must
> >> be in this proposal, and what prevents doing it as a follow-on.
> >>> Cornelia, Jason,
> >>> Can you please review current proposal as well before we revise v2?
> >>
> >> If I understand correctly, most of the features (except for the admin
> >> virtqueue in_order stuff) are not specific to the admin virtqueue. As
> >> discussed in the previous versions, I still think it's better:
> >>
> >> 1) adding sections in the basic device facility or data structure for
> >> provisioning and MSI
> >> 2) introduce admin virtqueue on top as an device interface for those
> >> features
> >>
> > I didn't follow your suggestion. Can you please explain?
> > Specifically "data structure for provisioning and MSI"..
> 
> 
> I meant:
> 
> There's a chapter "Basic Facilities of a Virtio Device", we can
> introduce the concepts there like:
> 
> 1) Managed device and Management device (terminology proposed by
> Michael), and can use PF and VF as a example
> 
> 2) Managed device provisioning (the data structure to specify the
> attributes of a managed device (VF))
> 
> 3) MSI
>
The above is a good idea. I will revisit v2 if it is not arranged this way.
 
> And then we can introduce the admin virtqueue in either
> 
> 1) transport part
> 
> or
> 
> 2) PCI transport
>
It is not specific to the PCI transport, and currently it is not a transport either.
So the admin queue will be kept as a general entity for admin work.
 
> In the admin virtqueue, there will be commands to provision and
> configure MSI.
> 
Please review v2 if it is not arranged this way.

> 
> >
> >> That leaves the chance for future extensions to allow those features to
> >> be used by a transport-specific interface, which will benefit for
> >>
> > AQ allows communication (command, response) between driver and device
> in a transport independent way.
> > Sometimes it queries/sets transport specific fields like the MSI-X vectors of a VF.
> > Sometimes the device configures its own IMS interrupt.
> > Something else in the future.
> > So it is really a generic request-response queue.
> 
> 
> I agree, but I think we can't mandate new features to a specific transport.
>
Certainly. The admin queue is transport independent.
PCI MSI-X configuration is a PCI transport specific command, so the structures are defined accordingly.
It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg, etc.

Any other transport will have its own transport-specific interrupt configuration. So it will be defined accordingly whenever that occurs.
For example, IMS for a VF or IMS for an SF.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-19  9:26                 ` Michael S. Tsirkin
@ 2022-01-25  3:53                   ` Jason Wang
  2022-01-25  7:19                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-25  3:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > We also need
> > > - something like injecting cvq commands to control rx mode from the admin device
> > > - page fault / dirty page handling
> > >
> > > these two seem to call for a vq.
> >
> > Right, but vq is not necessarily for PF if we had PASID. And with
> > PASID we don't even need a dedicated new cvq.
>
> I don't think it's a good idea to mix transactions from
> multiple PASIDs on the same vq.

To be clear, I don't mean to let a single vq use multiple PASIDs.

>
> Attaching a PASID to a queue seems more reasonable.
> cvq is under guest control, so yes I think a separate
> vq is preferable.

Sorry, I don't get it here. E.g. in the case of virtio-net, it's more than
sufficient to assign a dedicated PASID to the cvq; any reason for yet
another one?

>
> What is true is that with subfunctions you would have
> PASID per subfunction and then one subfunction for control.

Well, it's possible, but it's also possible to have everything
self-contained in a single subfunction. Then the cvq can be assigned to a
PASID that is used only by the hypervisor.

>
> I think a sketch of how things will work with scalable iov can't hurt as
> part of this proposal.  And, I'm not sure we should have so much
> flexibility: if there's an interface that works for SRIOV and SIOV then
> that seems preferable to having distinct transports for SRIOV and
> SIOV.

Some of my understanding of SR-IOV vs SIOV:

1) SR-IOV doesn't require a transport; VFs use PCI config space, but
SIOV requires one
2) SR-IOV doesn't support dynamic on-demand provisioning, whereas SIOV can

So I'm not sure how hard it is if we want to unify the management
plane of the above two.

Thanks


>
>
> --
> MST
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25  3:45                             ` Jason Wang
@ 2022-01-25  4:07                               ` Parav Pandit
  0 siblings, 0 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-25  4:07 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, January 25, 2022 9:16 AM
> 
> On 2022/1/20 4:25 AM, Parav Pandit wrote:
> > Hi Jason,
> >
> >> From: Parav Pandit
> >> Sent: Wednesday, January 19, 2022 10:18 AM
> >>
> >>
> >>> From: Jason Wang <jasowang@redhat.com>
> >>> Sent: Wednesday, January 19, 2022 9:33 AM
> >>>
> >>>
> >>> If it means IMS, there's already a proposal[1] that introduces MSI
> >>> commands via the admin virtqueue. And we had a similar requirement for
> >>> virtio-MMIO[2] and managed device or SF [3], so I would rather
> >>> introduce IMS (need a better name though) as a basic facility
> >>> instead of tying it to any specific transport.
> >>>
> >> IMS of [1] is an interrupt configuration by the virtio driver for the
> >> device it is driving, which needs a queue.
> >> So regardless of the device type as PCI PF/VF/SF/ADI, there is a desire
> >> to have a generic admin queue not attached to a device type.
> >> And the AQ in this proposal serves exactly this purpose.
> >>
> >> A device configuring its own IMS vector vs. the PCI PF configuring a VF's
> >> MSI-X max vector count are two different functionalities.
> >> Both of these commands can ride on a generic queue.
> >> However the queue is not the same, because the PF owns its own admin queue
> >> (for vf msix config), while a VF or SF operates its own admin queue (for IMS
> >> config).
> >>
> >> So a good example is,
> >> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> >> 2. PCI VF, when using IMS, configures IMS data, vector, mask, etc. using
> >> VF_AQ in GVM.
> >> Both functions will have the AQ feature bit set.
> >>
> >> Fair enough, so we have more users of admin queue than just MSI-X config.
> >>
> > If we treat the AQ as any other VQ, then the IMS configuration of #2 by the virtio
> driver for a VF or SF cannot be done through the AQ.
> > Because the AQ needs to be live before setting DRIVER_OK.
> > And the spec mandates "The driver MUST configure the other virtqueue fields
> before enabling the virtqueue with queue_enable."
> > This includes the vector configuration.
> > So the AQ would have to operate in polling mode, which is not so good.
> > And it cannot be disabled to update the MSI-X/IMS vector, because the spec says
> "The driver MUST NOT write a 0 to queue_enable.".
> >
> > Making a special exception for the AQ is equally not good in a physical device
> implementation, given that all the queue config is located in a single config space.
> >
> > So a better approach for IMS configuration would be to have limited config
> registers to bootstrap IMS configuration.
> > Something like:
> >
> > struct virtio_pci_dev_interrupt_cfg {
> > 	le16 max_device_interrupts; /* read-only tells maximum ims
> interrupts
> > */
> 
> 
> Is it better to use the same terminology as the MSI capability, like
> "max_device_vectors"?
>
Yep. 
 
> 
> > 	le16 refer_index; /* write-only interrupt index to program */
> > 	u8 status;	/* read only response by device, 0x0 = success,
> 0x1=busy, other error codes */
> > 	u8 enable;	/* enable/disable interrupt referred by index */
> 
> 
> Any reason we need this, assuming we already have PCI_NO_VECTOR in
> common cfg? I think what's better to have is the ability to mask and
> unmask a vector.
> 
Yes, mask/unmask is more appropriate. I just put down a high-level sketch of what we need to do outside of the AQ.
NO_VECTOR is something stored inside the queue config.
The above MMIO structure is to enable/disable (mask/unmask) a specific vector.

> 
> > 	le16 reserved_pad;
> > 	u64 addr; /* write only interrupt addr handle on MSIX TLPs */
> > 	u32 data; /* write only interrupt data handle.. */
> > };
> 
> 
> Is it better to split the above into different commands?
>
Probably yes. I think we also need the ability to see whether addr + data are needed.
I heard in an offline discussion that addr + data are not always necessary and can be burdensome.
So virtio adapting to this flexibility will probably be better.
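
Folding these review comments together, a minimal sketch of how the
bootstrap registers could end up looking. This is a hypothetical layout
for illustration, not a concrete proposal:

struct virtio_pci_dev_interrupt_cfg {
	le16 max_device_vectors; /* read-only: maximum device interrupt vectors */
	le16 vector_index;       /* write-only: vector to operate on */
	u8 op;                   /* write-only: 0x0 = set addr/data, 0x1 = mask, 0x2 = unmask */
	u8 status;               /* read-only: 0x0 = success, 0x1 = busy, others = errors */
	le16 flags;              /* read-only: bit 0 = addr/data required by this device */
	le64 addr;               /* write-only: message address, if flags bit 0 is set */
	le32 data;               /* write-only: message data, if flags bit 0 is set */
};

This stays at the same 20 bytes of on-chip registers per function while
splitting set/mask/unmask into distinct operations and letting the
device advertise whether addr + data are needed at all.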



^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-25  3:53                   ` Jason Wang
@ 2022-01-25  7:19                     ` Michael S. Tsirkin
  2022-01-26  5:49                       ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-25  7:19 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > We also need
> > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > - page fault / dirty page handling
> > > >
> > > > these two seem to call for a vq.
> > >
> > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > PASID we don't even need a dedicated new cvq.
> >
> > I don't think it's a good idea to mix transactions from
> > multiple PASIDs on the same vq.
> 
> To be clear, I don't mean to let a single vq use multiple PASIDs.
> 
> >
> > Attaching a PASID to a queue seems more reasonable.
> > cvq is under guest control, so yes I think a separate
> > vq is preferable.
> 
> Sorry, I don't get it here. E.g. in the case of virtio-net, it's more than
> sufficient to assign a dedicated PASID to the cvq; any reason for yet
> another one?

Well, I'm not sure how cheap it is to have an extra PASID.
In theory you can share page tables, making it not that
expensive. In practice, is it hard for the MMU to do so?
If page tables are not shared, extra PASIDs become expensive.


> >
> > What is true is that with subfunctions you would have
> > PASID per subfunction and then one subfunction for control.
> 
> Well, it's possible, but it's also possible to have everything self
> self-contained in a single subfunction. Then the cvq can be assigned to a PASID
> that is used only for the hypervisor.
> 
> >
> > I think a sketch of how things will work with scalable iov can't hurt as
> > part of this proposal.  And, I'm not sure we should have so much
> > flexibility: if there's an interface that works for SRIOV and SIOV then
> > that seems preferable to having distinct transports for SRIOV and
> > SIOV.
> 
> Some of my understanding of SR-IOV vs SIOV:
> 
> 1) SR-IOV doesn't require a transport; VFs use PCI config space, but
> SIOV requires one
> 2) SR-IOV doesn't support dynamic on-demand provisioning, whereas SIOV can
> 
> So I'm not sure how hard it is if we want to unify the management
> plane of the above two.
> 
> Thanks

Interesting. So are you fine with a proposal which ignores the PASID
things completely then? If yes, can we take that discussion to
a different thread? This one is already too long...


> 
> >
> >
> > --
> > MST
> >


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25  3:52                             ` Parav Pandit
@ 2022-01-25 10:59                               ` Max Gurtovoy
  2022-01-25 12:09                                 ` Michael S. Tsirkin
  2022-01-26  7:03                                 ` Jason Wang
  2022-01-26  5:04                               ` Jason Wang
  1 sibling, 2 replies; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-25 10:59 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Michael S. Tsirkin
  Cc: cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


On 1/25/2022 5:52 AM, Parav Pandit wrote:
> Hi Jason,
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Tuesday, January 25, 2022 8:59 AM
>>
>> 在 2022/1/19 下午12:48, Parav Pandit 写道:
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Wednesday, January 19, 2022 9:33 AM
>>>>
>>>>
>>>> If it means IMS, there's already a proposal[1] that introduces MSI
>>>> commands via the admin virtqueue. And we had a similar requirement for
>>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
>>>> introduce IMS (need a better name though) as a basic facility instead
>>>> of tying it to any specific transport.
>>>>
>>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
>> driving, which needs a queue.
>>> So regardless of the device type as PCI PF/VF/SF/ADI, there is a desire to have a
>> generic admin queue not attached to a device type.
>>> And the AQ in this proposal serves exactly this purpose.
>>>
>>> A device configuring its own IMS vector vs. the PCI PF configuring a VF's MSI-X max
>> vector count are two different functionalities.
>>> Both of these commands can ride on a generic queue.
>>> However the queue is not the same, because the PF owns its own admin queue
>>> (for vf msix config), while a VF or SF operates its own admin queue (for IMS
>>> config).
>>
>> So I think in the next version we need to clarify:
>>
>> 1) is there a single admin virtqueue shared by all the VFs and PF
>>
>> or
>>
>> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
>> corresponding admin virtqueue
>>
> The admin queue is not per VF.
> Let's take concrete examples.
> 1. For example, a PCI PF can have one AQ.
> This AQ carries commands to query/config the MSI-X vectors of VFs.
>
> 2. In the second example, the PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
>
> 3. A PCI VF has its own AQ to configure some of its own generic attributes; I don't know which attribute that is today.
> Maybe something that is extremely hard to do over feature bits.
> The currently proposed v2 doesn't restrict the admin queue to a PCI PF or VF and, for that matter, doesn't limit it to any particular transport.
>   
>>> So a good example is,
>>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
>>> 2. PCI VF, when using IMS, configures IMS data, vector, mask, etc. using VF_AQ
>> in GVM.
>>> Both functions will have the AQ feature bit set.
>>
>> Where does the VF_AQ sit? I guess it belongs to the VF. But if this is
>> true, don't we need some kind of address isolation like PASID?
>>
> The one above for IMS is not a good example. I replied with the reasoning for it last week.
>   
>>> Fair enough, so we have more users of admin queue than just MSI-X config.
>>
>> Well, what I really meant is that we actually have more users of IMS.
>> That is exactly what virtio-mmio wants. In this case introducing an admin
>> queue looks too heavyweight for that.
>>
> IMS config cannot be done over the AQ, as described in a previous email in this thread.
>
>>>>> 2. AQ to follow IN_ORDER and INDIRECT_DESC negotiation like the rest of
>>>>> the queues 3. Update commit log to describe why config space is not
>>>>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
>>>> I fail to understand the scale/registers issues. With one of my previous
>>>> proposals (device selector), technically we don't even need any config space
>> or
>>>> BAR for VF or SF by multiplexing the registers of the PF.
>>>>
>>> The scale issue is: when you want to create, query, or manipulate hundreds of
>> objects, having a shared MMIO register or configuration register will be too
>> slow.
>>
>>
>> Ok, this needs to be clarified in the commit log. And we need to make sure
>> it's not an issue that only happens for some specific vendor.
> It is present in the v2 commit log cover letter.
> Please let me know if you think it should be in the actual patch commit log.
>
>
>>> And additionally such a register set doesn't scale to allow sharing a large
>> number of bytes, as DMA cannot be done.
>>
>>
>> That's true.
>>
>>
>>>   From the physical device perspective, it doesn’t scale because the device needs to
>> have those resources ready to answer MMIO reads, and for hundreds to
>> thousands of devices it just cannot do it.
>>> This is one of the reasons for the birth of IMS.
>>
>> IMS allows the table to be stored in memory and cached by the device
>> to have the best scalability. But I had other questions:
>>
>> 1) if we have a single admin virtqueue, there will still be contention
>> on the driver side
>>
> The AQ inherently allows out-of-order command execution.
> It shouldn't face contention. For example, a 1K-depth AQ should be serving hundreds of commands in parallel for SF creation, VF MSI-X config and more.
>
> Which areas/commands do you think can lead to contention?
>   
>> 2) if we have a per-VF admin virtqueue, it still doesn't scale since it
>> occupies more hardware resources
>>
> That is too heavy and doesn’t scale. The proposal is to not have a per-VF admin queue.
> The proposal is to have one admin queue in a virtio device.

Right? Where did we mention something that could imply otherwise?


>   
>>>> One advantage I do see is that the admin virtqueue is transport
>> independent
>>>> (or it could be used as a transport).
>>>>
>>> I have yet to read the transport part from [1].
>>
>> Yes, the main goal is to be compatible with SIOV.
>>
> The admin queue is a command interface transport on which higher-layer services can be built.
> This includes SR-IOV config and SIOV config.
> And v2 enables implementing SIOV commands whenever they are ready.
>
>>>>> 4. Improve documentation around msix config to link to sriov section of
>> virtio
>>>> spec
>>>>> 5. Describe error that if VF is bound to the device, admin commands
>>>> targeting VF can fail, describe this error code
>>>>> Did I miss anything?
>>>>>
>>>>> I have yet to receive your feedback on the group concept: if/why it is needed, why/if it
>> must
>>>> be in this proposal, and what prevents doing it as a follow-on.
>>>>> Cornelia, Jason,
>>>>> Can you please review current proposal as well before we revise v2?
>>>> If I understand correctly, most of the features (except for the admin
>>>> virtqueue in_order stuff) are not specific to the admin virtqueue. As
>>>> discussed in the previous versions, I still think it's better:
>>>>
>>>> 1) adding sections in the basic device facility or data structure for
>>>> provisioning and MSI
>>>> 2) introduce admin virtqueue on top as an device interface for those
>>>> features
>>>>
>>> I didn't follow your suggestion. Can you please explain?
>>> Specifically "data structure for provisioning and MSI"..
>>
>> I meant:
>>
>> There's a chapter "Basic Facilities of a Virtio Device", we can
>> introduce the concepts there like:
>>
>> 1) Managed device and Management device (terminology proposed by
>> Michael), and can use PF and VF as a example
>>
>> 2) Managed device provisioning (the data structure to specify the
>> attributes of a managed device (VF))
>>
>> 3) MSI
>>
> The above is a good idea. I will revisit v2 if it is not arranged this way.

Let me make sure I understand: you would like to see a new chapter under 
"Basic Facilities of a Virtio Device" that is
called "Device management", and this chapter will explain the concept 
in a few words and point to another chapter under "Basic Facilities 
of a Virtio Device"
that was introduced here, "Admin Virtqueues"?

So you do agree that managing a managed device (create/destroy/setup/etc...) 
will be done using the AQ of the managing device?

>   
>> And then we can introduce the admin virtqueue in either
>>
>> 1) transport part
>>
>> or
>>
>> 2) PCI transport
>>
> It is not specific to the PCI transport, and currently it is not a transport either.
> So the admin queue will be kept as a general entity for admin work.
>   
>> In the admin virtqueue, there will be commands to provision and
>> configure MSI.
>>
> Please review v2 if it is not arranged this way.
>
>>>> That leaves the chance for future extensions to allow those features to
>>>> be used by a transport-specific interface, which will benefit for
>>>
>>> AQ allows communication (command, response) between driver and device
>> in a transport independent way.
>>> Sometimes it queries/sets transport specific fields like the MSI-X vectors of a VF.
>>> Sometimes the device configures its own IMS interrupt.
>>> Something else in the future.
>>> So it is really a generic request-response queue.
>>
>> I agree, but I think we can't mandate new features to a specific transport.
>>
> Certainly. The admin queue is transport independent.
> PCI MSI-X configuration is a PCI transport specific command, so the structures are defined accordingly.
> It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg, etc.
>
> Any other transport will have its own transport-specific interrupt configuration. So it will be defined accordingly whenever that occurs.
> For example, IMS for a VF or IMS for an SF.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25 10:59                               ` Max Gurtovoy
@ 2022-01-25 12:09                                 ` Michael S. Tsirkin
  2022-01-26 13:29                                   ` Parav Pandit
  2022-01-26  7:03                                 ` Jason Wang
  1 sibling, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-25 12:09 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Parav Pandit, Jason Wang, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha

On Tue, Jan 25, 2022 at 12:59:16PM +0200, Max Gurtovoy wrote:
> 
> On 1/25/2022 5:52 AM, Parav Pandit wrote:
> > Hi Jason,
> > 
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, January 25, 2022 8:59 AM
> > > 
> > > On 2022/1/19 12:48 PM, Parav Pandit wrote:
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Wednesday, January 19, 2022 9:33 AM
> > > > > 
> > > > > 
> > > > > If it means IMS, there's already a proposal[1] that introduces MSI
> > > > > commands via the admin virtqueue. And we had a similar requirement for
> > > > > virtio-MMIO[2] and managed device or SF [3], so I would rather
> > > > > introduce IMS (need a better name though) as a basic facility instead
> > > > > of tying it to any specific transport.
> > > > > 
> > > > IMS of [1] is an interrupt configuration by the virtio driver for the device it is
> > > driving, which needs a queue.
> > > > So regardless of the device type as PCI PF/VF/SF/ADI, there is a desire to have a
> > > generic admin queue not attached to a device type.
> > > > And the AQ in this proposal serves exactly this purpose.
> > > > 
> > > > A device configuring its own IMS vector vs. the PCI PF configuring a VF's MSI-X max
> > > vector count are two different functionalities.
> > > > Both of these commands can ride on a generic queue.
> > > > However the queue is not the same, because the PF owns its own admin queue
> > > > (for vf msix config), while a VF or SF operates its own admin queue (for IMS
> > > > config).
> > > 
> > > So I think in the next version we need to clarify:
> > > 
> > > 1) is there a single admin virtqueue shared by all the VFs and PF
> > > 
> > > or
> > > 
> > > 2) per VF/PF admin virtqueue, and how does the driver know how to find the
> > > corresponding admin virtqueue
> > > 
> > The admin queue is not per VF.
> > Let's take concrete examples.
> > 1. For example, a PCI PF can have one AQ.
> > This AQ carries commands to query/config the MSI-X vectors of VFs.
> > 
> > 2. In the second example, the PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
> > 
> > 3. A PCI VF has its own AQ to configure some of its own generic attributes; I don't know which attribute that is today.
> > Maybe something that is extremely hard to do over feature bits.
> > The currently proposed v2 doesn't restrict the admin queue to a PCI PF or VF and, for that matter, doesn't limit it to any particular transport.
> > > > So a good example is,
> > > > 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> > > > 2. PCI VF, when using IMS, configures IMS data, vector, mask, etc. using VF_AQ
> > > in GVM.
> > > > Both functions will have the AQ feature bit set.
> > > 
> > > Where does the VF_AQ sit? I guess it belongs to the VF. But if this is
> > > true, don't we need some kind of address isolation like PASID?
> > > 
> > The one above for IMS is not a good example. I replied with the reasoning for it last week.
> > > > Fair enough, so we have more users of admin queue than just MSI-X config.
> > > 
> > > Well, what I really meant is that we actually have more users of IMS.
> > > That is exactly what virtio-mmio wants. In this case introducing an admin
> > > queue looks too heavyweight for that.
> > > 
> > IMS config cannot be done over the AQ, as described in a previous email in this thread.
> > 
> > > > > > 2. AQ to follow IN_ORDER and INDIRECT_DESC negotiation like the rest of
> > > > > > the queues 3. Update commit log to describe why config space is not
> > > > > > chosen (scale, on-die registers, uniform way to handle all aq cmds)
> > > > > I fail to understand the scale/registers issues. With one of my previous
> > > > > proposals (device selector), technically we don't even need any config space
> > > or
> > > > > BAR for VF or SF by multiplexing the registers of the PF.
> > > > > 
> > > > The scale issue is: when you want to create, query, or manipulate hundreds of
> > > objects, having a shared MMIO register or configuration register will be too
> > > slow.
> > > 
> > > 
> > > Ok, this needs to be clarified in the commit log. And we need to make sure
> > > it's not an issue that only happens for some specific vendor.
> > It is present in the v2 commit log cover letter.
> > Please let me know if you think it should be in the actual patch commit log.
> > 
> > 
> > > > And additionally such a register set doesn't scale to allow sharing a large
> > > number of bytes, as DMA cannot be done.
> > > 
> > > 
> > > That's true.
> > > 
> > > 
> > > >   From the physical device perspective, it doesn’t scale because the device needs to
> > > have those resources ready to answer MMIO reads, and for hundreds to
> > > thousands of devices it just cannot do it.
> > > > This is one of the reasons for the birth of IMS.
> > > 
> > > IMS allows the table to be stored in memory and cached by the device
> > > to have the best scalability. But I had other questions:
> > > 
> > > 1) if we have a single admin virtqueue, there will still be contention
> > > on the driver side
> > > 
> > The AQ inherently allows out-of-order command execution.
> > It shouldn't face contention. For example, a 1K-depth AQ should be serving hundreds of commands in parallel for SF creation, VF MSI-X config and more.
> > 
> > Which areas/commands do you think can lead to contention?
> > > 2) if we have a per-VF admin virtqueue, it still doesn't scale since it
> > > occupies more hardware resources
> > > 
> > That is too heavy and doesn’t scale. The proposal is to not have a per-VF admin queue.
> > The proposal is to have one admin queue in a virtio device.
> 
> Right? Where did we mention something that could imply otherwise?
> 
> 
> > > > > One advantage I do see is that the admin virtqueue is transport
> > > independent
> > > > > (or it could be used as a transport).
> > > > > 
> > > > I have yet to read the transport part from [1].
> > > 
> > > Yes, the main goal is to be compatible with SIOV.
> > > 
> > The admin queue is a command interface transport on which higher-layer services can be built.
> > This includes SR-IOV config and SIOV config.
> > And v2 enables implementing SIOV commands whenever they are ready.
> > 
> > > > > > 4. Improve documentation around msix config to link to sriov section of
> > > virtio
> > > > > spec
> > > > > > 5. Describe error that if VF is bound to the device, admin commands
> > > > > targeting VF can fail, describe this error code
> > > > > > Did I miss anything?
> > > > > > 
> > > > > > I have yet to receive your feedback on the group concept: if/why it is needed, why/if it
> > > must
> > > > > be in this proposal, and what prevents doing it as a follow-on.
> > > > > > Cornelia, Jason,
> > > > > > Can you please review current proposal as well before we revise v2?
> > > > > If I understand correctly, most of the features (except for the admin
> > > > > virtqueue in_order stuff) are not specific to the admin virtqueue. As
> > > > > discussed in the previous versions, I still think it's better:
> > > > > 
> > > > > 1) adding sections in the basic device facility or data structure for
> > > > > provisioning and MSI
> > > > > 2) introduce admin virtqueue on top as an device interface for those
> > > > > features
> > > > > 
> > > > I didn't follow your suggestion. Can you please explain?
> > > > Specifically "data structure for provisioning and MSI"..
> > > 
> > > I meant:
> > > 
> > > There's a chapter "Basic Facilities of a Virtio Device", we can
> > > introduce the concepts there like:
> > > 
> > > 1) Managed device and Management device (terminology proposed by
> > > Michael), and can use PF and VF as a example
> > > 
> > > 2) Managed device provisioning (the data structure to specify the
> > > attributes of a managed device (VF))
> > > 
> > > 3) MSI
> > > 
> > The above is a good idea. I will revisit v2 if it is not arranged this way.
> 
> Let me make sure I understand: you would like to see a new chapter under
> "Basic Facilities of a Virtio Device" that is
> called "Device management", and this chapter will explain the concept
> in a few words and point to another chapter under "Basic Facilities of a
> Virtio Device"
> that was introduced here, "Admin Virtqueues"?
> 
> So you do agree that managing a managed device (create/destroy/setup/etc...) will
> be done using the AQ of the managing device?

I think Jason asked that the management commands be split from the
queue itself, such that they can be implemented in more ways down the
road.

> > > And then we can introduce the admin virtqueue in either
> > > 
> > > 1) transport part
> > > 
> > > or
> > > 
> > > 2) PCI transport
> > > 
> > It is not specific to the PCI transport, and currently it is not a transport either.
> > So the admin queue will be kept as a general entity for admin work.
> > > In the admin virtqueue, there will be commands to provision and
> > > configure MSI.
> > > 
> > Please review v2 if it is not arranged this way.
> > 
> > > > > That leaves the chance for future extensions to allow those features to
> > > > > be used by a transport-specific interface, which will benefit for
> > > > > 
> > > > AQ allows communication (command, response) between driver and device
> > > in a transport independent way.
> > > > Sometimes it queries/sets transport specific fields like the MSI-X vectors of a VF.
> > > > Sometimes the device configures its own IMS interrupt.
> > > > Something else in the future.
> > > > So it is really a generic request-response queue.
> > > 
> > > I agree, but I think we can't mandate new features to a specific transport.
> > > 
> > Certainly. The admin queue is transport independent.
> > PCI MSI-X configuration is a PCI transport specific command, so the structures are defined accordingly.
> > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg, etc.
> > 
> > Any other transport will have its own transport-specific interrupt configuration. So it will be defined accordingly whenever that occurs.
> > For example, IMS for a VF or IMS for an SF.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25  3:52                             ` Parav Pandit
  2022-01-25 10:59                               ` Max Gurtovoy
@ 2022-01-26  5:04                               ` Jason Wang
  2022-01-26  5:26                                 ` Parav Pandit
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  5:04 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


On 2022/1/25 11:52 AM, Parav Pandit wrote:
> Hi Jason,
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Tuesday, January 25, 2022 8:59 AM
>>
>> On 2022/1/19 12:48 PM, Parav Pandit wrote:
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Wednesday, January 19, 2022 9:33 AM
>>>>
>>>>
>>>> If it means IMS, there's already a proposal[1] that introduces MSI
>>>> commands via the admin virtqueue. And we had a similar requirement for
>>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
>>>> introduce IMS (need a better name though) as a basic facility instead
>>>> of tying it to any specific transport.
>>>>
>>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
>> driving, which needs a queue.
>>> So regardless of the device type as PCI PF/VF/SF/ADI, there is a desire to have a
>> generic admin queue not attached to a device type.
>>> And the AQ in this proposal serves exactly this purpose.
>>>
>>> A device configuring its own IMS vector vs. the PCI PF configuring a VF's MSI-X max
>> vector count are two different functionalities.
>>> Both of these commands can ride on a generic queue.
>>> However the queue is not the same, because the PF owns its own admin queue
>>> (for vf msix config), while a VF or SF operates its own admin queue (for IMS
>>> config).
>>
>> So I think in the next version we need to clarify:
>>
>> 1) is there a single admin virtqueue shared by all the VFs and PF
>>
>> or
>>
>> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
>> corresponding admin virtqueue
>>
> The admin queue is not per VF.
> Let's take concrete examples.
> 1. For example, a PCI PF can have one AQ.
> This AQ carries commands to query/config the MSI-X vectors of VFs.
>
> 2. In the second example, the PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
>
> 3. A PCI VF has its own AQ to configure some of its own generic attributes; I don't know which attribute that is today.


So this could be useful if we can create an SF on top of a VF. But as 
discussed, we'd better generalize the concept (management device vs. 
managed device).


> Maybe something that is extremely hard to do over feature bits.
> The currently proposed v2 doesn't restrict the admin queue to a PCI PF or VF and, for that matter, doesn't limit it to any particular transport.
>   
>>> So a good example is,
>>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
>>> 2. PCI VF, when using IMS, configures IMS data, vector, mask, etc. using VF_AQ
>> in GVM.
>>> Both functions will have the AQ feature bit set.
>>
>> Where does the VF_AQ sit? I guess it belongs to the VF. But if this is
>> true, don't we need some kind of address isolation like PASID?
>>
> The one above for IMS is not a good example. I replied with the reasoning for it last week.
>   
>>> Fair enough, so we have more users of admin queue than just MSI-X config.
>>
>> Well, what I really meant is that we actually have more users of IMS.
>> That is exactly what virtio-mmio wants. In this case introducing an admin
>> queue looks too heavyweight for that.
>>
> IMS config cannot be done over the AQ, as described in a previous email in this thread.
>
>>>>> 2. AQ to follow IN_ORDER and INDIRECT_DESC negotiation like the rest of
>>>>> the queues 3. Update commit log to describe why config space is not
>>>>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
>>>> I fail to understand the scale/registers issues. With one of my previous
>>>> proposals (device selector), technically we don't even need any config space
>> or
>>>> BAR for VF or SF by multiplexing the registers of the PF.
>>>>
>>> The scale issue is: when you want to create, query, or manipulate hundreds of
>> objects, having a shared MMIO register or configuration register will be too
>> slow.
>>
>>
>> Ok, this needs to be clarified in the commit log. And we need to make sure
>> it's not an issue that only happens for some specific vendor.
> It is present in the v2 commit log cover letter.
> Please let me know if you think it should be in the actual patch commit log.
>
>
>>> And additionally such a register set doesn't scale to allow sharing a large
>> number of bytes, as DMA cannot be done.
>>
>>
>> That's true.
>>
>>
>>>   From the physical device perspective, it doesn’t scale because the device needs to
>> have those resources ready to answer MMIO reads, and for hundreds to
>> thousands of devices it just cannot do it.
>>> This is one of the reasons for the birth of IMS.
>>
>> IMS allows the table to be stored in memory and cached by the device
>> to have the best scalability. But I had other questions:
>>
>> 1) if we have a single admin virtqueue, there will still be contention
>> on the driver side
>>
> The AQ inherently allows out-of-order command execution.
> It shouldn't face contention. For example, a 1K-depth AQ should be serving hundreds of commands in parallel for SF creation, VF MSI-X config and more.
>
> Which areas/commands do you think can lead to contention?


Unless we have a self-contained descriptor which contains a per-descriptor 
writeback address. Even if we have OOO, don't the enqueue and 
dequeue still need to be serialized?
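
A minimal sketch of such a self-contained command, with a per-command
writeback address so completions need not be collected from the used
ring in order (hypothetical layout, not from any posted proposal):

struct virtio_admin_selfcontained_cmd {
	le16 opcode;         /* command to execute */
	le16 reserved;
	le32 payload_len;    /* length of the inline payload below */
	le64 writeback_addr; /* driver memory where the device writes the status */
	u8 payload[];        /* command-specific data */
};

With this, the driver serializes only on descriptor allocation; each
command's completion lands in its own memory slot, so completion order
no longer matters.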


>   
>> 2) if we have a per-VF admin virtqueue, it still doesn't scale since it
>> occupies more hardware resources
>>
> That is too heavy and doesn’t scale. The proposal is to not have a per-VF admin queue.
> The proposal is to have one admin queue in a virtio device.


Ok.


>   
>>>> One advantage I do see is that the admin virtqueue is transport
>> independent
>>>> (or it could be used as a transport).
>>>>
>>> I have yet to read the transport part from [1].
>>
>> Yes, the main goal is to be compatible with SIOV.
>>
> The admin queue is a command interface transport on which higher-layer services can be built.
> This includes SR-IOV config and SIOV config.
> And v2 enables implementing SIOV commands whenever they are ready.
>
>>>>> 4. Improve documentation around msix config to link to sriov section of
>> virtio
>>>> spec
>>>>> 5. Describe error that if VF is bound to the device, admin commands
>>>> targeting VF can fail, describe this error code
>>>>> Did I miss anything?
>>>>>
>>>>> I have yet to receive your feedback on the group concept: if/why it is needed, why/if it
>> must
>>>> be in this proposal, and what prevents doing it as a follow-on.
>>>>> Cornelia, Jason,
>>>>> Can you please review current proposal as well before we revise v2?
>>>> If I understand correctly, most of the features (except for the admin
>>>> virtqueue in_order stuff) are not specific to the admin virtqueue. As
>>>> discussed in the previous versions, I still think it's better:
>>>>
>>>> 1) adding sections in the basic device facility or data structure for
>>>> provisioning and MSI
>>>> 2) introduce the admin virtqueue on top as a device interface for those
>>>> features
>>>>
>>> I didn't follow your suggestion. Can you please explain?
>>> Specifically "data structure for provisioning and MSI"..
>>
>> I meant:
>>
>> There's a chapter "Basic Facilities of a Virtio Device" where we can
>> introduce concepts like:
>>
>> 1) Managed device and Management device (terminology proposed by
>> Michael), using PF and VF as an example
>>
>> 2) Managed device provisioning (the data structure to specify the
>> attributes of a managed device (VF))
>>
>> 3) MSI
>>
> The above is a good idea. I will revisit v2 if it is not arranged this way.
>   
>> And then we can introduce the admin virtqueue in either
>>
>> 1) transport part
>>
>> or
>>
>> 2) PCI transport
>>
> It is not specific to the PCI transport, and currently it is not a transport either.


Kind of, it allows configuring some basic attributes somehow. I think 
we'd better try not to couple any features to the admin virtqueue.


> So the admin queue will be kept as a general entity for admin work.
>   
>> In the admin virtqueue, there will be commands to provision and
>> configure MSI.
>>
> Please review v2 if it is not arranged this way.


Ok.


>
>>>> That leaves the chance for future extensions to allow those features to
>>>> be used by a transport-specific interface, which will benefit
>>>>
>>> AQ allows communication (command, response) between driver and device
>> in a transport-independent way.
>>> Sometimes it queries/sets transport-specific fields like the MSI-X vectors of a VF.
>>> Sometimes the device configures its own IMS interrupt.
>>> Something else in the future.
>>> So it is really a generic request-response queue.
>>
>> I agree, but I think we can't mandate new features to a specific transport.
>>
> Certainly. The admin queue is transport independent.
> PCI MSI-X configuration is a PCI transport-specific command, so the structures are defined accordingly.
> It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
>
> Any other transport will have its own transport-specific interrupt configuration, so it will be defined accordingly whenever that occurs.
> For example, IMS for VF or IMS for SF.


I don't think IMS is PCI-specific stuff; we had similar requests for MMIO.

Thanks



^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  5:04                               ` Jason Wang
@ 2022-01-26  5:26                                 ` Parav Pandit
  2022-01-26  5:45                                   ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-26  5:26 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Max Gurtovoy, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha



> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 26, 2022 10:35 AM
> 
> On 2022/1/25 11:52 AM, Parav Pandit wrote:
> > Hi Jason,
> >
> >> From: Jason Wang <jasowang@redhat.com>
> >> Sent: Tuesday, January 25, 2022 8:59 AM
> >>
>> On 2022/1/19 12:48 PM, Parav Pandit wrote:
> >>>> From: Jason Wang <jasowang@redhat.com>
> >>>> Sent: Wednesday, January 19, 2022 9:33 AM
> >>>>
> >>>>
> >>>> If it means IMS, there's already a proposal[1] that introduces MSI
> >>>> commands via the admin virtqueue. And we had a similar requirement
> >>>> for virtio-MMIO[2] and managed device or SF [3], so I would rather
> >>>> introduce IMS (need a better name though) as a basic facility
> >>>> instead of tying it to any specific transport.
> >>>>
> >>> IMS of [1] is an interrupt configuration by the virtio driver for the
> >>> device it is
> >> driving, which needs a queue.
> >>> So regardless of the device type (PCI PF/VF/SF/ADI), there is a
> >>> desire to have a
> >> generic admin queue not attached to the device type.
> >>> And the AQ in this proposal serves exactly this purpose.
> >>>
> >>> A device configuring its own IMS vector vs the PCI PF configuring a VF's
> >>> MSI-X max
> >> vector count are two different functionalities.
> >>> Both of these commands can ride on a generic queue.
> >>> However the queue is not the same, because the PF owns its own admin queue
> >>> (for VF MSI-X config), while a VF or SF operates its own admin queue (for IMS
> >>> config).
> >>
> >> So I think in the next version we need to clarify:
> >>
> >> 1) is there a single admin virtqueue shared by all the VFs and PF
> >>
> >> or
> >>
> >> 2) per VF/PF admin virtqueue, and how does the driver know how to
> >> find the corresponding admin virtqueue
> >>
> > The admin queue is not per VF.
> > Let's take concrete examples.
> > 1. So for example, a PCI PF can have one AQ.
> > This AQ carries commands to query/configure the MSI-X vectors of VFs.
> >
> > 2. In the second example, the PCI PF creates/destroys SFs. This is again done by
> using the AQ of the PCI PF.
> >
> > 3. A PCI VF has its own AQ to configure some of its own generic attributes;
> I don't know which those are today.
> 
> 
> So this could be useful if we can create an SF on top of a VF. But as discussed we'd
> better generalize the concept (management device vs managed device).
> 
It does not matter if the SF is created over a PCI PF or VF. It's on top of a PCI virtio device.
When/if someone creates an SF over a PCI VF, the PCI VF is the management device, and the PCI SF is the managed device.

When/if the SF is created over a PCI PF, the PCI PF is the management device, and the PCI SF is the managed device.

In either case the AQ on the PCI device is transporting the SF create/destroy commands.

> > AQ inherently allows out of order commands execution.
> > It shouldn't face contention. For example 1K depth AQ should be serving
> hundreds of descriptors commands in parallel for SF creation, VF MSI-X config
> and more.
> >
> > Which area/commands etc you think can lead to the contention?
> 
> 
> > Unless we have a self-contained descriptor which contains a per-descriptor
> > writeback address. Even if we have OOO, the enqueue and dequeue still need
> > to be serialized?
>
No, we don't need to define any behavior.
The AQ always behaves the way a VQ behaves when VIRTIO_F_IN_ORDER is not negotiated.
And any synchronization to be done is done in the driver, like today. Usually when posting descriptors the driver needs to hold a lock for a short interval.
And this cannot lead to contention, because the descriptor posting time is very short.
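
As a rough illustration of that short-lived locking, here is a minimal
sketch in Linux-style C (all names are illustrative, not from any driver):

/* Post one AQ command; the lock is held only for the ring update.
 * Completions are consumed out of order, outside this lock. */
struct aq {
        spinlock_t lock;        /* serializes ring writes only */
        __le16 *avail_ring;     /* shared with the device */
        __le16 *avail_idx;      /* shared with the device */
        u16 next_idx;           /* driver-private shadow of the index */
        u16 num;                /* ring size */
};

static void aq_enqueue(struct aq *aq, u16 desc_head)
{
        spin_lock(&aq->lock);
        aq->avail_ring[aq->next_idx % aq->num] = cpu_to_le16(desc_head);
        virt_wmb();             /* entry visible before the index bump */
        *aq->avail_idx = cpu_to_le16(++aq->next_idx);
        spin_unlock(&aq->lock);
}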
 
> 
> >
> >> 2) if we have per vf admin virtqueue, it still doesn't scale since it
> >> occupies more hardware resources
> >>
> > That is too heavy, and doesn't scale. The proposal is not to have a per-VF admin
> queue.
> > The proposal is to have one admin queue in a virtio device.
> 
> 
> Ok.
> 
> 
> >
> >>>> I do see one advantage is that the admin virtqueue is transport
> >> independent
> >>>> (or it could be used as a transport).
> >>>>
> >>> I am yet to read the transport part from [1].
> >>
> >> Yes, the main goal is to be compatible with SIOV.
> >>
> > The admin queue is a command interface transport on which higher layer services
> can be built.
> > This includes SR-IOV config, SIOV config.
> > And v2 enables SIOV commands implementation whenever they are ready.
> >
> >>>>> 4. Improve documentation around msix config to link to sriov
> >>>>> section of
> >> virtio
> >>>> spec
> >>>>> 5. Describe error that if VF is bound to the device, admin
> >>>>> commands
> >>>> targeting VF can fail, describe this error code
> >>>>> Did I miss anything?
> >>>>>
> >>>>> Yet to receive your feedback on group, if/why is it needed and,
> >>>>> why/if it
> >> must
> >>>> be in this proposal, what pieces prevents it do as follow-on.
> >>>>> Cornelia, Jason,
> >>>>> Can you please review current proposal as well before we revise v2?
> >>>> If I understand correctly, most of the features (except for the
> >>>> admin virtqueue in_order stuffs) are not specific to the admin
> >>>> virtqueue. As discussed in the previous versions, I still think it's better:
> >>>>
> >>>> 1) adding sections in the basic device facility or data structure
> >>>> for provisioning and MSI
> >>>> 2) introduce admin virtqueue on top as an device interface for
> >>>> those features
> >>>>
> >>> I didn't follow your suggestion. Can you please explain?
> >>> Specifically "data structure for provisioning and MSI"..
> >>
> >> I meant:
> >>
> >> There's a chapter "Basic Facilities of a Virtio Device", we can
> >> introduce the concepts there like:
> >>
> >> 1) Managed device and Management device (terminology proposed by
> >> Michael), and can use PF and VF as a example
> >>
> >> 2) Managed device provisioning (the data structure to specify the
> >> attributes of a managed device (VF))
> >>
> >> 3) MSI
> >>
> > Above is good idea. Will revisit v2, if it is not arranged this way.
> >
> >> And then we can introduced admin virtqueue in either
> >>
> >> 1) transport part
> >>
> >> or
> >>
> >> 2) PCI transport
> >>
> > It is not specific to PCI transport, and currently it is not a transport either.
> 
> 
> Kind of, it allows to configure some basic attributes somehow. I think we'd
> better try not to couple any features to admin virtqueue.
> 
I am fine with defining a virtio_mgmt_cmd that somehow can be issued without the admin queue.
For example, struct virtio_fs_req is detached from the request queue, but the only way it can be issued today is via the request queue.
So we can draft the specification this way.

But I repeatedly fail to see an explanation of why that is needed.
Where in the recent spec is a new queue added that has its request structure detached from the queue?
I would like to see a reference to the spec that indicates that 
a. struct virtio_fs_req can be issued by means other than the request queue.
b. Currently the negotiation is done by such-and-such feature bit to do so via a request queue.
c. "hence down the road something else can be used to carry struct virtio_fs_req instead of the request queue".

And that would give a good explanation of why the admin queue should follow some recently added queue which has its structure detached from the queue
(not just in the form of the structure name, but also in the form of feature negotiation plumbing etc.).

Otherwise, detaching the mgmt. command from the admin queue is a vague requirement to me that doesn't justify the detachment.

> > Certainly. Admin queue is transport independent.
> > > PCI MSI-X configuration is a PCI transport-specific command, so structures are
> defined accordingly.
> > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> >
> > Any other transport will have transport specific interrupt configuration. So it
> will be defined accordingly whenever that occurs.
> > For example, IMS for VF or IMS for SF.
> 
> 
> I don't think IMS is PCI specific stuffs, we had similar requests for MMIO.
Sure, but even for that there will be an SF-specific command for IMS configuration.
The main difference of this command from the VF one will be the SF identifier vs the VF identifier.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  5:26                                 ` Parav Pandit
@ 2022-01-26  5:45                                   ` Jason Wang
  2022-01-26  5:58                                     ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  5:45 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 1:26 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, January 26, 2022 10:35 AM
> >
> > On 2022/1/25 11:52 AM, Parav Pandit wrote:
> > > Hi Jason,
> > >
> > >> From: Jason Wang <jasowang@redhat.com>
> > >> Sent: Tuesday, January 25, 2022 8:59 AM
> > >>
> > >> On 2022/1/19 12:48 PM, Parav Pandit wrote:
> > >>>> From: Jason Wang <jasowang@redhat.com>
> > >>>> Sent: Wednesday, January 19, 2022 9:33 AM
> > >>>>
> > >>>>
> > >>>> If it means IMS, there's already a proposal[1] that introduces MSI
> > >>>> commands via the admin virtqueue. And we had a similar requirement
> > >>>> for virtio-MMIO[2] and managed device or SF [3], so I would rather
> > >>>> introduce IMS (need a better name though) as a basic facility
> > >>>> instead of tying it to any specific transport.
> > >>>>
> > >>> IMS of [1] is an interrupt configuration by the virtio driver for the
> > >>> device it is
> > >> driving, which needs a queue.
> > >>> So regardless of the device type (PCI PF/VF/SF/ADI), there is a
> > >>> desire to have a
> > >> generic admin queue not attached to the device type.
> > >>> And the AQ in this proposal serves exactly this purpose.
> > >>>
> > >>> A device configuring its own IMS vector vs the PCI PF configuring a VF's
> > >>> MSI-X max
> > >> vector count are two different functionalities.
> > >>> Both of these commands can ride on a generic queue.
> > >>> However the queue is not the same, because the PF owns its own admin queue
> > >>> (for VF MSI-X config), while a VF or SF operates its own admin queue (for IMS
> > >>> config).
> > >>
> > >> So I think in the next version we need to clarify:
> > >>
> > >> 1) is there a single admin virtqueue shared by all the VFs and PF
> > >>
> > >> or
> > >>
> > >> 2) per VF/PF admin virtqueue, and how does the driver know how to
> > >> find the corresponding admin virtqueue
> > >>
> > > Admin queue is not per VF.
> > > Lets take concrete examples.
> > > 1. So for example, PCI PF can have one AQ.
> > > This AQ carries command to query/config MSI-X vector of VFs.
> > >
> > > 2. In second example, PCI PF is creating/destroying SFs. This is again done by
> > using the AQ of the PCI PF.
> > >
> > > 3. A PCI VF has its own AQ to configure some of its own generic attribute,
> > don't know which is that today.
> >
> >
> > So this could be useful if we can create SF on top of VF. But as discussed we'd
> > better to generalize the concept (management device vs managed device).
> >
> It does not matter if the SF is created over PCI PF or VF. Its on top of PCI virtio device.
> When/if someone creates SF over PCI VF, PCI VF is management device, and PCI SF is managed device.
>
> When/if SF is created over PCI PF, PCI PF is management device, and PCI SF is managed device.
>
> In either case the AQ on the PCI device is transporting SF create/destroy commands.

That's exactly what I meant.

>
> > > AQ inherently allows out of order commands execution.
> > > It shouldn't face contention. For example 1K depth AQ should be serving
> > hundreds of descriptors commands in parallel for SF creation, VF MSI-X config
> > and more.
> > >
> > > Which area/commands etc you think can lead to the contention?
> >
> >
> > Unless we have a self-contained descriptor which contains a per-descriptor
> > writeback address. Even if we have OOO, the enqueue and dequeue still need
> > to be serialized?
> >
> No, we don't need to define any behavior.
> The AQ always behaves the way a VQ behaves when VIRTIO_F_IN_ORDER is not negotiated.
> And any synchronization to be done is done in the driver, like today. Usually when posting descriptors the driver needs to hold a lock for a short interval.
> And this cannot lead to contention, because the descriptor posting time is very short.

Probably, but it really depends on the number of objects that
you want to manage via the admin virtqueue. A 1K queue size may work for
1K objects but not for 10K or 100K.

The lock is not the only thing to care about; the (busy) waiting
for the completion of the command may still take time.

>
> >
> > >
> > >> 2) if we have per vf admin virtqueue, it still doesn't scale since it
> > >> occupies more hardware resources
> > >>
> > > That is too heavy, and doesn’t scale. Proposal is to not have per vf admin
> > queue.
> > > Proposal is to have one admin queue in a virtio device.
> >
> >
> > Ok.
> >
> >
> > >
> > >>>> I do see one advantage is that the admin virtqueue is transport
> > >> independent
> > >>>> (or it could be used as a transport).
> > >>>>
> > >>> I am yet to read the transport part from [1].
> > >>
> > >> Yes, the main goal is to be compatible with SIOV.
> > >>
> > > Admin queue is a command interface transport where higher layer services
> > can be buit.
> > > This includes SR-IOV config, SIOV config.
> > > And v2 enables SIOV commands implementation whenever they are ready.
> > >
> > >>>>> 4. Improve documentation around msix config to link to sriov
> > >>>>> section of
> > >> virtio
> > >>>> spec
> > >>>>> 5. Describe error that if VF is bound to the device, admin
> > >>>>> commands
> > >>>> targeting VF can fail, describe this error code
> > >>>>> Did I miss anything?
> > >>>>>
> > >>>>> Yet to receive your feedback on group, if/why is it needed and,
> > >>>>> why/if it
> > >> must
> > >>>> be in this proposal, what pieces prevents it do as follow-on.
> > >>>>> Cornelia, Jason,
> > >>>>> Can you please review current proposal as well before we revise v2?
> > >>>> If I understand correctly, most of the features (except for the
> > >>>> admin virtqueue in_order stuffs) are not specific to the admin
> > >>>> virtqueue. As discussed in the previous versions, I still think it's better:
> > >>>>
> > >>>> 1) adding sections in the basic device facility or data structure
> > >>>> for provisioning and MSI
> > >>>> 2) introduce admin virtqueue on top as an device interface for
> > >>>> those features
> > >>>>
> > >>> I didn't follow your suggestion. Can you please explain?
> > >>> Specifically "data structure for provisioning and MSI"..
> > >>
> > >> I meant:
> > >>
> > >> There's a chapter "Basic Facilities of a Virtio Device", we can
> > >> introduce the concepts there like:
> > >>
> > >> 1) Managed device and Management device (terminology proposed by
> > >> Michael), and can use PF and VF as a example
> > >>
> > >> 2) Managed device provisioning (the data structure to specify the
> > >> attributes of a managed device (VF))
> > >>
> > >> 3) MSI
> > >>
> > > Above is good idea. Will revisit v2, if it is not arranged this way.
> > >
> > >> And then we can introduced admin virtqueue in either
> > >>
> > >> 1) transport part
> > >>
> > >> or
> > >>
> > >> 2) PCI transport
> > >>
> > > It is not specific to PCI transport, and currently it is not a transport either.
> >
> >
> > Kind of, it allows to configure some basic attributes somehow. I think we'd
> > better try not to couple any features to admin virtqueue.
> >
> I am fine by defining virtio_mgmt_cmd that somehow can be issued without the admin queue.
> For example, struct virtio_fs_req is detached from the request queue, but only way it can be issued today is with request queue.
> So we can draft the specification this way.
>
> But I repeatedly miss to see an explanation why is that needed.
> Where in the recent spec a new queue is added that has request structure detached from queue.
> I would like to see reference to the spec that indicates that
> a. struct virtio_fs_req can be issued by other means other than request queue.
> b. Currently the negotiation is done by so and so feature bit to do so via a request queue.
> c. "hence down the road something else can be used to carry struct virtio_fs_req instead of request queue".
>
> And that will give good explanation why admin queue should follow some recently added queue which has structure detached from the queue.
> (not just in form of structure name, but also in form on feature negotiation plumbing etc).
>
> Otherwise detach mgmt. cmd from admin queue is vague requirement to me, that doesn’t require detachment.

So what I meant is not specific to any type of device. Device-specific
operations should be done via a virtqueue.

What I see is that we should not limit the interface for the
device-independent basic device facility to the admin virtqueue only:

E.g. for IMS, we should allow it to be configured in various ways

1) a transport-independent way: e.g. the admin virtqueue (which will
eventually become another transport)

or

2) a transport-specific way, e.g. a simple PCI(e) capability or MMIO registers (rough sketches of both follow).
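
As a rough sketch of what those two options could look like on the wire
(both layouts below are hypothetical; neither is from the spec or this
proposal):

/* 1) Transport-independent: a hypothetical IMS configuration command
 *    carried by the admin virtqueue, selected by a generic opcode. */
struct virtio_admin_ims_cfg {
        le16 opcode;            /* hypothetical VIRTIO_ADMIN_IMS_CFG */
        le16 reserved;
        le32 vector_count;      /* requested number of IMS vectors */
};

/* 2) Transport-specific: the same knob exposed through a PCI vendor
 *    capability (or, on MMIO, a dedicated register). */
struct virtio_pci_ims_cap {
        u8 cap_vndr;            /* PCI_CAP_ID_VNDR */
        u8 cap_next;
        u8 cap_len;
        u8 cfg_type;            /* hypothetical new cfg type */
        le32 vector_count;
};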

>
> > > Certainly. Admin queue is transport independent.
> > > PCI MSI-X configuration is PCI transport specific command, so structures are
> > defined it accordingly.
> > > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> > >
> > > Any other transport will have transport specific interrupt configuration. So it
> > will be defined accordingly whenever that occurs.
> > > For example, IMS for VF or IMS for SF.
> >
> >
> > I don't think IMS is PCI specific stuffs, we had similar requests for MMIO.
> Sure, but even for that there will be an SF-specific command for IMS configuration.
> The main difference of this command from the VF one will be the SF identifier vs the VF identifier.

I think it's not hard to have a single identifier and just say it's
transport-specific? Or simply reserve IDs for VFs.

Thanks


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-25  7:19                     ` Michael S. Tsirkin
@ 2022-01-26  5:49                       ` Jason Wang
  2022-01-26  7:02                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  5:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 25, 2022 at 3:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> > On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > > We also need
> > > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > > - page fault / dirty page handling
> > > > >
> > > > > these two seem to call for a vq.
> > > >
> > > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > > PASID we don't even need a dedicated new cvq.
> > >
> > > I don't think it's a good idea to mix transactions from
> > > multiple PASIDs on the same vq.
> >
> > To be clear, I don't mean to let a single vq use multiple PASIDs.
> >
> > >
> > > Attaching a PASID to a queue seems more reasonable.
> > > cvq is under guest control, so yes I think a separate
> > > vq is preferable.
> >
> > Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> > sufficient to assign a dedicated PASID to cvq, any reason for yet
> > another one?
>
> Well I'm not sure how cheap it is to have an extra PASID.
> In theory you can share page tables making it not that
> expensive.

I think it should not be expensive since PASID is per RID according to
the PCIe spec.

> In practice is it hard for the MMU to do so?
> If page tables are not shared extra PASIDs become expensive.

Why? For CVQ, we don't need to share page tables; maintaining one
dedicated buffer for command forwarding is sufficient.

>
>
> > >
> > > What is true is that with subfunctions you would have
> > > PASID per subfunction and then one subfunction for control.
> >
> > Well, it's possible, but it's also possible to have everything
> > self-contained in a single subfunction. Then cvq can be assigned to a PASID
> > that is used only for the hypervisor.
> >
> > >
> > > I think a sketch of how things will work with scalable iov can't hurt as
> > > part of this proposal.  And, I'm not sure we should have so much
> > > flexibility: if there's an interface that works for SRIOV and SIOV then
> > > that seems preferable to having distinct transports for SRIOV and
> > > SIOV.
> >
> > Some of my understanding of SR-IOV vs SIOV:
> >
> > 1) SR-IOV doesn't require a transport, VFs use PCI config space; but
> > SIOV requires one
> > 2) SR-IOV doesn't support dynamic on-demand provisioning whereas SIOV does
> >
> > So I'm not sure how hard it is if we want to unify the management
> > plane of the above two.
> >
> > Thanks
>
> Interesting. So are you fine with a proposal which ignores the PASID
> things completely then?

I'm fine, just a note that:

The main advantage of using an admin virtqueue in another device (the PF)
is that the DMA is isolated, but with the help of PASID there's no need
to do that, and we will have a better interface for nesting.

Thanks

> If yes can we take that discussion to
> a different thread then? This one is already too long ...
>
>
> >
> > >
> > >
> > > --
> > > MST
> > >
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  5:45                                   ` Jason Wang
@ 2022-01-26  5:58                                     ` Parav Pandit
  2022-01-26  6:06                                       ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-26  5:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha



> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 26, 2022 11:15 AM
> > It does not matter if the SF is created over PCI PF or VF. Its on top of PCI virtio
> device.
> > When/if someone creates SF over PCI VF, PCI VF is management device, and
> PCI SF is managed device.
> >
> > When/if SF is created over PCI PF, PCI PF is management device, and PCI SF is
> managed device.
> >
> > In either case the AQ on the PCI device is transporting SF create/destroy
> commands.
> 
> That's exactly what I meant.
Ok. cool. So we are in sync here. :)

> 
> Probably but it really depends on the magnitude of the objects that you want to
> manage via the admin virtqueue. 1K queue size may work for 1K objects but not
> for 10K or 100K.
> 
We can have a higher queue depth.
I'm not sure all 10K will be active at the same time, even though 10K or 100K devices are there in total.
We don't see that with current Linux subfunction users.

> The lock is not the only thing that needs to care, the (busy) waiting for the
> completion of the command may still take time.
There is no need for busy waiting for completion.
It's an admin command issued from process context; it should be like a blk request.
When the completion arrives, a notifier will wake the caller.
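
A minimal sketch of that pattern in Linux-style C (names are illustrative,
not from any driver):

/* Issue an admin command from process context and sleep until the
 * device completes it; no busy waiting is involved. */
struct aq_cmd {
        struct completion done; /* signaled from the AQ notification */
        int status;
};

static int aq_issue_and_wait(struct aq_cmd *cmd)
{
        init_completion(&cmd->done);
        /* ... post the descriptor(s) for cmd to the AQ here ... */
        wait_for_completion(&cmd->done); /* sleep, like a blk request */
        return cmd->status;
}

/* Called from the AQ's used-buffer notification handler. */
static void aq_complete(struct aq_cmd *cmd, int status)
{
        cmd->status = status;
        complete(&cmd->done);   /* wake the waiting caller */
}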

> > I am fine by defining virtio_mgmt_cmd that somehow can be issued without
> the admin queue.
> > For example, struct virtio_fs_req is detached from the request queue, but
> only way it can be issued today is with request queue.
> > So we can draft the specification this way.
> >
> > But I repeatedly miss to see an explanation why is that needed.
> > Where in the recent spec a new queue is added that has request structure
> detached from queue.
> > I would like to see reference to the spec that indicates that a.
> > struct virtio_fs_req can be issued by other means other than request queue.
> > b. Currently the negotiation is done by so and so feature bit to do so via a
> request queue.
> > c. "hence down the road something else can be used to carry struct
> virtio_fs_req instead of request queue".
> >
> > And that will give good explanation why admin queue should follow some
> recently added queue which has structure detached from the queue.
> > (not just in form of structure name, but also in form on feature negotiation
> plumbing etc).
> >
> > Otherwise detach mgmt. cmd from admin queue is vague requirement to
> me, that doesn’t require detachment.
> 
> So what I meant is not specific to any type of device. Device specific operations
> should be done via virtqueue.
> 
> What I see is, we should not limit the interface for the device independent basic
> device facility to be admin virtqueue only:
Can you explain why?

> 
> E.g for IMS, we should allow it to be configured with various ways
> 
IMS configuration is very abstract.
Let's talk specifics.
I want to make sure what you mean when you say IMS configuration.

Do you mean the HV is configuring the number of IMS vectors for the VF/SF?
If it's this, then it is similar to how the HV provisions MSI-X for a VF.

Or do you mean a guest driver of a VF or SF is configuring its IMS to later consume for the VQ?
If it's this, then I explained that the admin queue is not the vehicle to do so, and we discussed the other structure yesterday.

> 1) transport independent way: e.g admin virtqueue (which will be eventually
> became another transport)
> 
IMS by a guest driver cannot be configured by the AQ.

> or
> 
> > 2) a transport-specific way, e.g. a simple PCI(e) capability or MMIO registers.
> 
This is practical.

> >
> > > > Certainly. Admin queue is transport independent.
> > > > PCI MSI-X configuration is PCI transport specific command, so
> > > > structures are
> > > defined it accordingly.
> > > > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> > > >
> > > > Any other transport will have transport specific interrupt
> > > > configuration. So it
> > > will be defined accordingly whenever that occurs.
> > > > For example, IMS for VF or IMS for SF.
> > >
> > >
> > > I don't think IMS is PCI specific stuffs, we had similar requests for MMIO.
> > Sure, but even for that there will be SF specific command for IMS
> configuration.
> > This command will have main difference from VF will be the SF identifier vs
> VF identifier.
> 
> I think it's not hard to have a single identifier and just say it's transport specific?
It is hard when SFs are not defined.

> Or simply reserving IDs for VF.
When SFs are not defined, it doesn't make sense to reserve any bytes for them.
Linux has a 4-byte SF identifier.
The community might go the UUID way or some other way.
We cannot define arbitrary bytes that may or may not be enough.

When SFs are defined, they will anyway have an SF identifier, and it will be super easy to define a new vector configuration structure for SFs.
Let's not overload the VF MSI-X configuration proposal by intermixing it with SFs.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  5:58                                     ` Parav Pandit
@ 2022-01-26  6:06                                       ` Jason Wang
  2022-01-26  6:24                                         ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  6:06 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 1:58 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, January 26, 2022 11:15 AM
> > > It does not matter if the SF is created over PCI PF or VF. Its on top of PCI virtio
> > device.
> > > When/if someone creates SF over PCI VF, PCI VF is management device, and
> > PCI SF is managed device.
> > >
> > > When/if SF is created over PCI PF, PCI PF is management device, and PCI SF is
> > managed device.
> > >
> > > In either case the AQ on the PCI device is transporting SF create/destroy
> > commands.
> >
> > That's exactly what I meant.
> Ok. cool. So we are in sync here. :)
>
> >
> > Probably but it really depends on the magnitude of the objects that you want to
> > manage via the admin virtqueue. 1K queue size may work for 1K objects but not
> > for 10K or 100K.
> >
> We can have higher queue depth.
> Not sure if all 10K will be active at same time, even though total 10K or 100K devices are there.
> We don’t see the same in current Linux subfunctions users.

Not specific to this proposal, but we see at least a 10K+ requirement.

>
> > The lock is not the only thing that needs to care, the (busy) waiting for the
> > completion of the command may still take time.
> There is no need for busy waiting for completion.

Yes, that's why I put busy in parentheses.

> Its admin command issued from the process context, it should be like blk request.
> When completion arrives, notifier will awake the caller.
>
> > > I am fine by defining virtio_mgmt_cmd that somehow can be issued without
> > the admin queue.
> > > For example, struct virtio_fs_req is detached from the request queue, but
> > only way it can be issued today is with request queue.
> > > So we can draft the specification this way.
> > >
> > > But I repeatedly miss to see an explanation why is that needed.
> > > Where in the recent spec a new queue is added that has request structure
> > detached from queue.
> > > I would like to see reference to the spec that indicates that a.
> > > struct virtio_fs_req can be issued by other means other than request queue.
> > > b. Currently the negotiation is done by so and so feature bit to do so via a
> > request queue.
> > > c. "hence down the road something else can be used to carry struct
> > virtio_fs_req instead of request queue".
> > >
> > > And that will give good explanation why admin queue should follow some
> > recently added queue which has structure detached from the queue.
> > > (not just in form of structure name, but also in form on feature negotiation
> > plumbing etc).
> > >
> > > Otherwise detach mgmt. cmd from admin queue is vague requirement to
> > me, that doesn’t require detachment.
> >
> > So what I meant is not specific to any type of device. Device specific operations
> > should be done via virtqueue.
> >
> > What I see is, we should not limit the interface for the device independent basic
> > device facility to be admin virtqueue only:
> Can you explain, why?

For:

1) vendors and transports that don't want to use the admin virtqueue
2) a simpler interface for L1

>
> >
> > E.g for IMS, we should allow it to be configured with various ways
> >
> IMS configuration is very abstract.
> Lets talk specific.
> I want to make sure when you say IMS configuration,
>
> Do you me HV is configuring IMS number of vectors for the VF/SF?
> If it's this, than, it is similar to how HV provision MSIX for a VF.

Can't it be done by introducing a capability in the PF, like this?

struct msix_provision {
        u32 device_select;
        u16 msix_vectors;
        u16 padding;
};
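
For illustration, a PF driver might program such a capability like this
(a sketch only; it assumes the structure above is memory-mapped through a
BAR, and the helper name is hypothetical):

/* Hypothetical: provision 'vectors' MSI-X vectors for VF 'vf_id'. */
static void provision_vf_msix(struct msix_provision __iomem *cap,
                              u32 vf_id, u16 vectors)
{
        iowrite32(vf_id, &cap->device_select);  /* select the target VF */
        iowrite16(vectors, &cap->msix_vectors); /* request vector count */
}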

>
> Or you mean, a guest driver of VF or SF is configuring its IMS to later consume for the VQ?
> If its this, than I explained that admin queue is not the vehicle to do so, and we discussed the other structure yday.

Yes, I guess that's the nesting case I mentioned above.

>
> > 1) transport independent way: e.g admin virtqueue (which will be eventually
> > became another transport)
> >
> IMS by guest driver cannot be configured by AQ.

Yes, that's one point.

>
> > or
> >
> > 2) transport specific way, E.g a simple PCI(e) capability or MMIO registeres.
> >
> This is practical.

Right.

>
> > >
> > > > > Certainly. Admin queue is transport independent.
> > > > > PCI MSI-X configuration is PCI transport specific command, so
> > > > > structures are
> > > > defined it accordingly.
> > > > > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> > > > >
> > > > > Any other transport will have transport specific interrupt
> > > > > configuration. So it
> > > > will be defined accordingly whenever that occurs.
> > > > > For example, IMS for VF or IMS for SF.
> > > >
> > > >
> > > > I don't think IMS is PCI specific stuffs, we had similar requests for MMIO.
> > > Sure, but even for that there will be SF specific command for IMS
> > configuration.
> > > This command will have main difference from VF will be the SF identifier vs
> > VF identifier.
> >
> > I think it's not hard to have a single identifier and just say it's transport specific?
> It is hard when SFs are not defined.
>
> > Or simply reserving IDs for VF.
> When SF are not defined, it doesn’t make sense to reserve any bytes for it.
> Linux has 4 bytes SF identifier.
> Community might go UUID way or some other way.
> We cannot define arbitrary bytes that may/may not be enough.
>
> When SF will be defined, it will anyway have sf identifier and it will be super easy to define new vector configuration structure for SF.
> Let's not overload VF MSI-X configuration proposal to be intermix with SF.

That's fine.

Thanks


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  6:06                                       ` Jason Wang
@ 2022-01-26  6:24                                         ` Parav Pandit
  2022-01-26  6:54                                           ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-26  6:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 26, 2022 11:36 AM
> 
> On Wed, Jan 26, 2022 at 1:58 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, January 26, 2022 11:15 AM
> > > > It does not matter if the SF is created over PCI PF or VF. Its on
> > > > top of PCI virtio
> > > device.
> > > > When/if someone creates SF over PCI VF, PCI VF is management
> > > > device, and
> > > PCI SF is managed device.
> > > >
> > > > When/if SF is created over PCI PF, PCI PF is management device, and
> > > > PCI SF is
> > > managed device.
> > > >
> > > > In either case the AQ on the PCI device is transporting SF
> > > > create/destroy
> > > commands.
> > >
> > > That's exactly what I meant.
> > Ok. cool. So we are in sync here. :)
> >
> > >
> > > Probably but it really depends on the magnitude of the objects that
> > > you want to manage via the admin virtqueue. 1K queue size may work
> > > for 1K objects but not for 10K or 100K.
> > >
> > We can have higher queue depth.
> > Not sure if all 10K will be active at same time, even though total 10K or 100K
> devices are there.
> > We don’t see the same in current Linux subfunctions users.
> 
> Not specific to this proposal but we see at least 10K+ requirement.
> 
> >
> > > The lock is not the only thing that needs to care, the (busy)
> > > waiting for the completion of the command may still take time.
> > There is no need for busy waiting for completion.
> 
> Yes, that's why I put busy in the brace.
> 
> > Its admin command issued from the process context, it should be like blk
> request.
> > When completion arrives, notifier will awake the caller.
> >
> > > > I am fine by defining virtio_mgmt_cmd that somehow can be issued
> > > > without
> > > the admin queue.
> > > > For example, struct virtio_fs_req is detached from the request
> > > > queue, but
> > > only way it can be issued today is with request queue.
> > > > So we can draft the specification this way.
> > > >
> > > > But I repeatedly miss to see an explanation why is that needed.
> > > > Where in the recent spec a new queue is added that has request
> > > > structure
> > > detached from queue.
> > > > I would like to see reference to the spec that indicates that a.
> > > > struct virtio_fs_req can be issued by other means other than request
> queue.
> > > > b. Currently the negotiation is done by so and so feature bit to
> > > > do so via a
> > > request queue.
> > > > c. "hence down the road something else can be used to carry struct
> > > virtio_fs_req instead of request queue".
> > > >
> > > > And that will give good explanation why admin queue should follow
> > > > some
> > > recently added queue which has structure detached from the queue.
> > > > (not just in form of structure name, but also in form on feature
> > > > negotiation
> > > plumbing etc).
> > > >
> > > > Otherwise detach mgmt. cmd from admin queue is vague requirement
> > > > to
> > > me, that doesn’t require detachment.
> > >
> > > So what I meant is not specific to any type of device. Device
> > > specific operations should be done via virtqueue.
> > >
> > > What I see is, we should not limit the interface for the device
> > > independent basic device facility to be admin virtqueue only:
> > Can you explain, why?
> 
> For
> 
> 1) the vendor and transport that doesn't want to use admin virtqueue
It is not the vendor's choice whether to use the admin virtqueue or not. It is the spec definition.

Why would a transport not want to use the admin queue?
I don't follow why a virtio fs device would want to use something other than the request queue to transport virtio_fs_req.

> 2) a more simple interface for L1
I don't see the virtqueue as a complicated object; it has existed for 10+ years now and is in use by 18 device types in L1 and also in L2.
And it is something to be used for multiple commands.

> 
> >
> > >
> > > E.g for IMS, we should allow it to be configured with various ways
> > >
> > IMS configuration is very abstract.
> > Lets talk specific.
> > I want to make sure when you say IMS configuration,
> >
> > Do you me HV is configuring IMS number of vectors for the VF/SF?
> > If it's this, than, it is similar to how HV provision MSIX for a VF.
> 
> It can be done by introducing a capability in the PF?
> 
> struct msix_provision {
> u32 device_select;
> u16 msix_vectors;
> u16 padding;
> };
> 
Shahaf and I already explained in this thread, in response to your question "I do not understand scale ..", that this capability doesn't scale.
So a device that wants to do the above can simply do it with an AQ of depth one and still achieve the same simplicity.

Michael also responded that device configuration will not end at MSI-X configuration.
So adding more and more tiny capabilities for each configuration doesn't scale either.

> >
> > Or you mean, a guest driver of VF or SF is configuring its IMS to later
> consume for the VQ?
> > If its this, than I explained that admin queue is not the vehicle to do so, and
> we discussed the other structure yday.
> 
> Yes, I guess that's the nesting case I mentioned above.
> 
> >
> > > 1) transport independent way: e.g admin virtqueue (which will be
> > > eventually became another transport)
> > >
> > IMS by guest driver cannot be configured by AQ.
> 
> Yes, that's one point.
> 
Ok. cool. We sync here too. :)

> >
> > > or
> > >
> > > 2) transport specific way, E.g a simple PCI(e) capability or MMIO registeres.
> > >
> > This is practical.
> 
> Right.
> 
> >
> > > >
> > > > > > Certainly. Admin queue is transport independent.
> > > > > > PCI MSI-X configuration is PCI transport specific command, so
> > > > > > structures are
> > > > > defined it accordingly.
> > > > > > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> > > > > >
> > > > > > Any other transport will have transport specific interrupt
> > > > > > configuration. So it
> > > > > will be defined accordingly whenever that occurs.
> > > > > > For example, IMS for VF or IMS for SF.
> > > > >
> > > > >
> > > > > I don't think IMS is PCI specific stuffs, we had similar requests for
> MMIO.
> > > > Sure, but even for that there will be SF specific command for IMS
> > > configuration.
> > > > This command will have main difference from VF will be the SF
> > > > identifier vs
> > > VF identifier.
> > >
> > > I think it's not hard to have a single identifier and just say it's transport
> specific?
> > It is hard when SFs are not defined.
> >
> > > Or simply reserving IDs for VF.
> > When SF are not defined, it doesn’t make sense to reserve any bytes for it.
> > Linux has 4 bytes SF identifier.
> > Community might go UUID way or some other way.
> > We cannot define arbitrary bytes that may/may not be enough.
> >
> > When SF will be defined, it will anyway have sf identifier and it will be super
> easy to define new vector configuration structure for SF.
> > Let's not overload VF MSI-X configuration proposal to be intermix with SF.
> 
> That's fine.

Ok. thanks.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  6:24                                         ` Parav Pandit
@ 2022-01-26  6:54                                           ` Jason Wang
  2022-01-26  8:09                                             ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  6:54 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 2:25 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, January 26, 2022 11:36 AM
> >
> > On Wed, Jan 26, 2022 at 1:58 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, January 26, 2022 11:15 AM
> > > > > It does not matter if the SF is created over PCI PF or VF. Its on
> > > > > top of PCI virtio
> > > > device.
> > > > > When/if someone creates SF over PCI VF, PCI VF is management
> > > > > device, and
> > > > PCI SF is managed device.
> > > > >
> > > > > When/if SF is created over PCI PF, PCI PF is management device, and
> > > > > PCI SF is
> > > > managed device.
> > > > >
> > > > > In either case the AQ on the PCI device is transporting SF
> > > > > create/destroy
> > > > commands.
> > > >
> > > > That's exactly what I meant.
> > > Ok. cool. So we are in sync here. :)
> > >
> > > >
> > > > Probably but it really depends on the magnitude of the objects that
> > > > you want to manage via the admin virtqueue. 1K queue size may work
> > > > for 1K objects but not for 10K or 100K.
> > > >
> > > We can have higher queue depth.
> > > Not sure if all 10K will be active at same time, even though total 10K or 100K
> > devices are there.
> > > We don’t see the same in current Linux subfunctions users.
> >
> > Not specific to this proposal but we see at least 10K+ requirement.
> >
> > >
> > > > The lock is not the only thing that needs to care, the (busy)
> > > > waiting for the completion of the command may still take time.
> > > There is no need for busy waiting for completion.
> >
> > Yes, that's why I put busy in the brace.
> >
> > > Its admin command issued from the process context, it should be like blk
> > request.
> > > When completion arrives, notifier will awake the caller.
> > >
> > > > > I am fine by defining virtio_mgmt_cmd that somehow can be issued
> > > > > without
> > > > the admin queue.
> > > > > For example, struct virtio_fs_req is detached from the request
> > > > > queue, but
> > > > only way it can be issued today is with request queue.
> > > > > So we can draft the specification this way.
> > > > >
> > > > > But I repeatedly miss to see an explanation why is that needed.
> > > > > Where in the recent spec a new queue is added that has request
> > > > > structure
> > > > detached from queue.
> > > > > I would like to see reference to the spec that indicates that a.
> > > > > struct virtio_fs_req can be issued by other means other than request
> > queue.
> > > > > b. Currently the negotiation is done by so and so feature bit to
> > > > > do so via a
> > > > request queue.
> > > > > c. "hence down the road something else can be used to carry struct
> > > > virtio_fs_req instead of request queue".
> > > > >
> > > > > And that will give good explanation why admin queue should follow
> > > > > some
> > > > recently added queue which has structure detached from the queue.
> > > > > (not just in form of structure name, but also in form on feature
> > > > > negotiation
> > > > plumbing etc).
> > > > >
> > > > > Otherwise detach mgmt. cmd from admin queue is vague requirement
> > > > > to
> > > > me, that doesn’t require detachment.
> > > >
> > > > So what I meant is not specific to any type of device. Device
> > > > specific operations should be done via virtqueue.
> > > >
> > > > What I see is, we should not limit the interface for the device
> > > > independent basic device facility to be admin virtqueue only:
> > > Can you explain, why?
> >
> > For
> >
> > 1) the vendor and transport that doesn't want to use admin virtqueue
> It is not the choice of vendor to use admin virtqueue or not. It is the spec definition.

We are discussing a proposal which isn't part of the spec yet. That
doesn't mean we can't do better.

>
> Why a transport doesn’t want to use admin queue?

The answer is simple and straightforward: each transport already
has its transport-specific way to configure the device.

> I don’t follow why virtio fs device wants to use something other than request queue to transport virtio_fs_req.

See my previous reply; I didn't mean we need to change any
device-specific operation like virtio_fs_req.

What I meant is, let's take MSI as an example:

1) PCI has an MSI table
2) MMIO doesn't support MSI

It looks to me like you want to mandate the admin virtqueue for MMIO in
order to configure MSI?

>
> > 2) a more simple interface for L1
> I don’t see virtqueue as complicated object which exists for 10+ years now and in use by 18 devices in L1 and also in L2.
> And it is something to be used for multiple commands.

So again, there's a misunderstanding. It's all about the "control plane",
not the "data plane". A simple example is configuring the vq address
(sketched below).

1) For PCI it is done via the common_cfg structure
2) For MMIO it is done via dedicated registers
3) For CCW it is done via dedicated commands

And we know we would need dedicated commands for the admin virtqueue.
Since we have a transport-specific interface, guest drivers can still
configure the virtqueue address in L1 via that interface, so the
hypervisor can hide the admin virtqueue. Let's say we introduce a new
feature X for the admin virtqueue only. How can a guest driver use that
feature? Do we want to hide the admin virtqueue from L1 or assign it
to L1?

I understand that things like provisioning might not be needed for L1,
but how about others?
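
For concreteness, a sketch of that first example (pointing a queue at its
descriptor area) through two of those transports; the register offsets and
common_cfg fields follow the virtio spec, while the helper functions
themselves are illustrative:

/* PCI transport: select the queue, then write the 64-bit descriptor
 * area address through the common configuration structure. */
static void pci_set_queue_desc(struct virtio_pci_common_cfg __iomem *cfg,
                               u16 qsel, u64 desc_pa)
{
        iowrite16(qsel, &cfg->queue_select);
        iowrite32((u32)desc_pa, &cfg->queue_desc_lo);
        iowrite32((u32)(desc_pa >> 32), &cfg->queue_desc_hi);
}

/* MMIO transport: the same operation via dedicated registers. */
static void mmio_set_queue_desc(void __iomem *base, u32 qsel, u64 desc_pa)
{
        iowrite32(qsel, base + 0x030);                  /* QueueSel */
        iowrite32((u32)desc_pa, base + 0x080);          /* QueueDescLow */
        iowrite32((u32)(desc_pa >> 32), base + 0x084);  /* QueueDescHigh */
}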

>
> >
> > >
> > > >
> > > > E.g for IMS, we should allow it to be configured with various ways
> > > >
> > > IMS configuration is very abstract.
> > > Lets talk specific.
> > > I want to make sure when you say IMS configuration,
> > >
> > > Do you me HV is configuring IMS number of vectors for the VF/SF?
> > > If it's this, than, it is similar to how HV provision MSIX for a VF.
> >
> > It can be done by introducing a capability in the PF?
> >
> > struct msix_provision {
> > u32 device_select;
> > u16 msix_vectors;
> > u16 padding;
> > };
> >
> Shahaf and I already explained in this thread that this capability doesn’t scale to your question about "I do not understand scale ..".
> So a device that wants to do above can simply do this with AQ with single VQ depth and still achieve the simplicity.

I thought that was for SF, not SR-IOV.

I think we all know SR-IOV doesn't scale in many ways, and if I
understand this series correctly, the main goal is not to address the
scalability issues.

It's good to consider scalability, but there could be cases that
don't need to scale.

>
> Michael also responded that device configuration will not end at msix configuration.
> So adding more and more tiny capabilities for each configuration doesn't scale either.
>
> > >
> > > Or you mean, a guest driver of VF or SF is configuring its IMS to later
> > consume for the VQ?
> > > If its this, than I explained that admin queue is not the vehicle to do so, and
> > we discussed the other structure yday.
> >
> > Yes, I guess that's the nesting case I mentioned above.
> >
> > >
> > > > 1) transport independent way: e.g admin virtqueue (which will be
> > > > eventually became another transport)
> > > >
> > > IMS by guest driver cannot be configured by AQ.
> >
> > Yes, that's one point.
> >
> Ok. cool. We sync here too. :)

So I guess you know what I meant by the L1 interface?

Thanks

>
> > >
> > > > or
> > > >
> > > > 2) transport specific way, E.g a simple PCI(e) capability or MMIO registeres.
> > > >
> > > This is practical.
> >
> > Right.
> >
> > >
> > > > >
> > > > > > > Certainly. Admin queue is transport independent.
> > > > > > > PCI MSI-X configuration is PCI transport specific command, so
> > > > > > > structures are
> > > > > > defined it accordingly.
> > > > > > > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> > > > > > >
> > > > > > > Any other transport will have transport specific interrupt
> > > > > > > configuration. So it
> > > > > > will be defined accordingly whenever that occurs.
> > > > > > > For example, IMS for VF or IMS for SF.
> > > > > >
> > > > > >
> > > > > > I don't think IMS is PCI specific stuffs, we had similar requests for
> > MMIO.
> > > > > Sure, but even for that there will be SF specific command for IMS
> > > > configuration.
> > > > > This command will have main difference from VF will be the SF
> > > > > identifier vs
> > > > VF identifier.
> > > >
> > > > I think it's not hard to have a single identifier and just say it's transport
> > specific?
> > > It is hard when SFs are not defined.
> > >
> > > > Or simply reserving IDs for VF.
> > > When SF are not defined, it doesn’t make sense to reserve any bytes for it.
> > > Linux has 4 bytes SF identifier.
> > > Community might go UUID way or some other way.
> > > We cannot define arbitrary bytes that may/may not be enough.
> > >
> > > When SF will be defined, it will anyway have sf identifier and it will be super
> > easy to define new vector configuration structure for SF.
> > > Let's not overload VF MSI-X configuration proposal to be intermix with SF.
> >
> > That's fine.
>
> Ok. thanks.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-26  5:49                       ` Jason Wang
@ 2022-01-26  7:02                         ` Michael S. Tsirkin
  2022-01-26  7:10                           ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-26  7:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 01:49:05PM +0800, Jason Wang wrote:
> On Tue, Jan 25, 2022 at 3:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> > > On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > > > We also need
> > > > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > > > - page fault / dirty page handling
> > > > > >
> > > > > > these two seem to call for a vq.
> > > > >
> > > > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > > > PASID we don't even need a dedicated new cvq.
> > > >
> > > > I don't think it's a good idea to mix transactions from
> > > > multiple PASIDs on the same vq.
> > >
> > > To be clear, I don't mean to let a single vq use multiple PASIDs.
> > >
> > > >
> > > > Attaching a PASID to a queue seems more reasonable.
> > > > cvq is under guest control, so yes I think a separate
> > > > vq is preferable.
> > >
> > > Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> > > sufficient to assign a dedicated PASID to cvq, any reason for yet
> > > another one?
> >
> > Well I'm not sure how cheap it is to have an extra PASID.
> > In theory you can share page tables making it not that
> > expensive.
> 
> I think it should not be expensive since PASID is per RID according to
> the PCIe spec.
> 
> > In practice is it hard for the MMU to do so?
> > If page tables are not shared extra PASIDs become expensive.
> 
> Why? For CVQ, we don't need sharing page tables, just maintaining one
> dedicated buffer for command forwarding is sufficient.

I am talking about the IOMMU page tables; these are not part of the PCIe
spec. You need to map all of guest memory to the device, and this needs a
set of PTEs. If two PASIDs map the same memory you might be able to share
PTEs, but I am guessing that this will need some kind of reference
counting to track their usage. I am not sure how complex/expensive that
will turn out to be. In the absence of that, we are doubling the number of
PTEs by using two PASIDs for the same device.
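
(For scale, a rough worked example: mapping 16 GiB of guest memory with
4 KiB pages needs 16 GiB / 4 KiB = 4M last-level PTEs, roughly 32 MiB of
page-table memory at 8 bytes per PTE; without sharing, a second PASID
doubles that to roughly 64 MiB per device.)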


> >
> >
> > > >
> > > > What is true is that with subfunctions you would have
> > > > PASID per subfunction and then one subfunction for control.
> > >
> > > Well, it's possible, but it's also possible to have everything
> > > self-contained in a single subfunction. Then cvq can be assigned to a PASID
> > > that is used only for the hypervisor.
> > >
> > > >
> > > > I think a sketch of how things will work with scalable iov can't hurt as
> > > > part of this proposal.  And, I'm not sure we should have so much
> > > > flexibility: if there's an interface that works for SRIOV and SIOV then
> > > > that seems preferable than having distinct transports for SRIOV and
> > > > SIOV.
> > >
> > > Some of my understanding of SR-IOV vs SIOV:
> > >
> > > 1) SR-IOV doesn't require a transport, VFs use PCI config space; but
> > > SIOV requires one
> > > 2) SR-IOV doesn't support dynamic on-demand provisioning whereas SIOV does
> > >
> > > So I'm not sure how hard it is if we want to unify the management
> > > plane of the above two.
> > >
> > > Thanks
> >
> > Interesting. So are you fine with a proposal which ignores the PASID
> > things completely then?
> 
> I'm fine, just a note that:
> 
> The main advantages of using admin virtqueue in another device (PF) is
> that the DMA is isolated,

Right

> but with the help of PASID, there's no need
> to do that

In that case you can make the AQ part of the VF itself?

> and we will have a better interface for nesting.
> 
> Thanks

In fact, nesting is an interesting use case. I have not
thought about this too much; it is worth thinking about
how this interface will virtualize.

> > If yes can we take that discussion to
> > a different thread then? This one is already too long ...
> >
> >
> > >
> > > >
> > > >
> > > > --
> > > > MST
> > > >
> >


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25 10:59                               ` Max Gurtovoy
  2022-01-25 12:09                                 ` Michael S. Tsirkin
@ 2022-01-26  7:03                                 ` Jason Wang
  2022-01-26  9:27                                   ` Max Gurtovoy
  1 sibling, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  7:03 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Parav Pandit, Michael S. Tsirkin, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Tue, Jan 25, 2022 at 6:59 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>
>
> On 1/25/2022 5:52 AM, Parav Pandit wrote:
> > Hi Jason,
> >
> >> From: Jason Wang <jasowang@redhat.com>
> >> Sent: Tuesday, January 25, 2022 8:59 AM
> >>
> >> 在 2022/1/19 下午12:48, Parav Pandit 写道:
> >>>> From: Jason Wang <jasowang@redhat.com>
> >>>> Sent: Wednesday, January 19, 2022 9:33 AM
> >>>>
> >>>>
> >>>>>> If it means IMS, there's already a proposal[1] that introduces MSI
> >>>>>> commands via the admin virtqueue. And we had a similar requirement for
> >>>>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
> >>>>>> introduce IMS (need a better name though) as a basic facility instead
> >>>>>> of tying it to any specific transport.
> >>>>
> >>>>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
> >>>> driving, which needs a queue.
> >>>>> So regardless of the device type (PCI PF/VF/SF/ADI), there is a desire to have a
> >>>> generic admin queue not attached to the device type.
> >>> And AQ in this proposal exactly serves this purpose.
> >>>
> >>> Device configuring its own IMS vector vs PCI PF configuring VF's MSI-X max
> >> vector count are two different functionality.
> >>> Both of these commands can ride on a generic queue.
> >>> However the queue is not same, because PF owns its own admin queue
> >>> (for vf msix config), VF or SF operates its own admin queue (for IMS
> >>> config).
> >>
> >> So I think in the next version we need to clarify:
> >>
> >> 1) is there a single admin virtqueue shared by all the VFs and PF
> >>
> >> or
> >>
> >> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
> >> corresponding admin virtqueue
> >>
> > Admin queue is not per VF.
> > Lets take concrete examples.
> > 1. So for example, PCI PF can have one AQ.
> > This AQ carries commands to query/configure the MSI-X vectors of VFs.
> >
> > 2. In second example, PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
> >
> > 3. A PCI VF has its own AQ to configure some of its own generic attributes; it is not clear today which those are.
> > Maybe something that is extremely hard to do over feature bits.
> > The currently proposed v2 doesn't restrict the admin queue to a PCI PF or VF and, for that matter, isn't limited to any particular transport.
> >
> >>> So a good example is,
> >>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> >>> 2. PCI VF, when using IMS, configures IMS data, vector, mask etc. using VF_AQ
> >> in GVM.
> >>> Both the functions will have AQ feature bit set.
> >>
> >> Where does the VF_AQ sit? I guess it belongs to the VF. But if this is
> >> true, don't we need some kind of address isolation like PASID?
> >>
> > Above one for IMS is not a good example. I replied the reasoning last week for it.
> >
> >>> Fair enough, so we have more users of admin queue than just MSI-X config.
> >>
> >> Well, what I really meant is that we actually have more users of IMS.
> >> That is exactly what virtio-mmio wants. In this case introducing an admin
> >> queue looks too heavyweight for that.
> >>
> > IMS config cannot be done over AQ as described in previous email in this thread.
> >
> >>>>> 2. AQ to follow IN_ORDER and INDIRECT_DESC negotiation like the rest of
> >>>>> the queues 3. Update commit log to describe why config space is not
> >>>>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
> >>>> I fail to understand the scale/registers issues. With one of my previous
> >>>> proposals (device selector), technically we don't even need any config space
> >> or
> >>>> BAR for VF or SF by multiplexing the registers for PF.
> >>>>
> >>> The scale issue is: when you want to create, query, or manipulate hundreds of
> >> objects, having a shared MMIO register or configuration register will be too
> >> slow.
> >>
> >>
> >> Ok, this needs to be clarified in the commit log. And we need to make sure
> >> it's not an issue that only happens for some specific vendor.
> > It is present in the v2 commit log cover letter.
> > Please let me know if you think it should be in the actual patch commit log.
> >
> >
> >>> And additionally such a register set doesn't scale to allow sharing a large
> >> number of bytes, as DMA cannot be done.
> >>
> >>
> >> That's true.
> >>
> >>
> >>>   From the physical device perspective, it doesn’t scale because the device needs to
> >> have those resources ready to answer MMIO reads, and for hundreds to
> >> thousands of devices it just cannot do it.
> >>> This is one of the reasons for the birth of IMS.
> >>
> >> IMS allows the table to be stored in the memory and cached by the device
> >> to have the best scalability. But I had other questions:
> >>
> >> 1) if we have a single admin virtqueue, there will still be contention
> >> in the driver side
> >>
> > AQ inherently allows out-of-order command execution.
> > It shouldn't face contention. For example, a 1K-depth AQ should be serving hundreds of commands in parallel for SF creation, VF MSI-X config and more.
> >
> > Which area/commands etc you think can lead to the contention?
> >
> >> 2) if we have per vf admin virtqueue, it still doesn't scale since it
> >> occupies more hardware resources
> >>
> > That is too heavy, and doesn’t scale. The proposal is not to have a per-VF admin queue.
> > The proposal is to have one admin queue in a virtio device.
>
> Right ? where did we mention something that can imply otherwise ?

Well, I don't know but probably this part,

" PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ ..."

>
>
> >
> >>>> I do see one advantage is that the admin virtqueue is transport
> >> independent
> >>>> (or it could be used as a transport).
> >>>>
> >>> I am yet to read the transport part from [1].
> >>
> >> Yes, the main goal is to be compatible with SIOV.
> >>
> > Admin queue is a command interface transport where higher layer services can be built.
> > This includes SR-IOV config, SIOV config.
> > And v2 enables SIOV commands implementation whenever they are ready.
> >
> >>>>> 4. Improve documentation around msix config to link to sriov section of
> >> virtio
> >>>> spec
> >>>>> 5. Describe error that if VF is bound to the device, admin commands
> >>>> targeting VF can fail, describe this error code
> >>>>> Did I miss anything?
> >>>>>
> >>>>> Yet to receive your feedback on group, if/why is it needed and, why/if it
> >> must
> >>>> be in this proposal, what pieces prevents it do as follow-on.
> >>>>> Cornelia, Jason,
> >>>>> Can you please review current proposal as well before we revise v2?
> >>>> If I understand correctly, most of the features (except for the admin
> >>>> virtqueue in_order stuffs) are not specific to the admin virtqueue. As
> >>>> discussed in the previous versions, I still think it's better:
> >>>>
> >>>> 1) adding sections in the basic device facility or data structure for
> >>>> provisioning and MSI
> >>>> 2) introduce admin virtqueue on top as a device interface for those
> >>>> features
> >>>>
> >>> I didn't follow your suggestion. Can you please explain?
> >>> Specifically "data structure for provisioning and MSI"..
> >>
> >> I meant:
> >>
> >> There's a chapter "Basic Facilities of a Virtio Device", we can
> >> introduce the concepts there like:
> >>
> >> 1) Managed device and Management device (terminology proposed by
> >> Michael), and can use PF and VF as a example
> >>
> >> 2) Managed device provisioning (the data structure to specify the
> >> attributes of a managed device (VF))
> >>
> >> 3) MSI
> >>
> > Above is good idea. Will revisit v2, if it is not arranged this way.
>
> Let me make sure I understand, you would like to see a new chapter under
> "Basic Facilities of a Virtio Device" that is
>
> called "Device management" and this chapter will explain in few words
> the concept

Yes.

> and it will point to another chapter under "Basic Facilities
> of a Virtio Device"
>
> that was introduced here "Admin Virtqueues" ?

So far as I see from the proposal, it needs to belong to the PCI transport
part or a new transport.

>
> So you do agree that managing a managed (create/destroy/setup/etc...)
> will be done using the AQ of the managing device ?

I agree.

Thanks

>
> >
> >> And then we can introduced admin virtqueue in either
> >>
> >> 1) transport part
> >>
> >> or
> >>
> >> 2) PCI transport
> >>
> > It is not specific to PCI transport, and currently it is not a transport either.
> > So the admin queue will be kept as a general entity for admin work.
> >
> >> In the admin virtqueue, there will be commands to provision and
> >> configure MSI.
> >>
> > Please review v2 if it is not arranged this way.
> >
> >>>> This leaves the chance for future extensions to allow those features to
> >>>> be used by a transport-specific interface which will benefit for
> >>>>
> >>> AQ allows communication (command, response) between driver and device
> >> in a transport-independent way.
> >>> Sometimes it queries/sets transport-specific fields like the MSI-X vectors of a VF.
> >>> Sometimes the device configures its own IMS interrupt.
> >>> Something else in the future.
> >>> So it is really a generic request-response queue.
> >>
> >> I agree, but I think we can't mandate new features to a specific transport.
> >>
> > Certainly. Admin queue is transport independent.
> > PCI MSI-X configuration is a PCI transport-specific command, so structures are defined accordingly.
> > It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> >
> > Any other transport will have transport specific interrupt configuration. So it will be defined accordingly whenever that occurs.
> > For example, IMS for VF or IMS for SF.
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 4/5] virtio-net: add support for VIRTIO_F_ADMIN_VQ
  2022-01-26  7:02                         ` Michael S. Tsirkin
@ 2022-01-26  7:10                           ` Jason Wang
  0 siblings, 0 replies; 110+ messages in thread
From: Jason Wang @ 2022-01-26  7:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Max Gurtovoy, virtio-comment, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 3:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jan 26, 2022 at 01:49:05PM +0800, Jason Wang wrote:
> > On Tue, Jan 25, 2022 at 3:20 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Tue, Jan 25, 2022 at 11:53:35AM +0800, Jason Wang wrote:
> > > > On Wed, Jan 19, 2022 at 5:26 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jan 19, 2022 at 12:16:47PM +0800, Jason Wang wrote:
> > > > > > > We also need
> > > > > > > - something like injecting cvq commands to control rx mode from the admin device
> > > > > > > - page fault / dirty page handling
> > > > > > >
> > > > > > > these two seem to call for a vq.
> > > > > >
> > > > > > Right, but vq is not necessarily for PF if we had PASID. And with
> > > > > > PASID we don't even need a dedicated new cvq.
> > > > >
> > > > > I don't think it's a good idea to mix transactions from
> > > > > multiple PASIDs on the same vq.
> > > >
> > > > To be clear, I don't mean to let a single vq use multiple PASIDs.
> > > >
> > > > >
> > > > > Attaching a PASID to a queue seems more reasonable.
> > > > > cvq is under guest control, so yes I think a separate
> > > > > vq is preferable.
> > > >
> > > > Sorry, I don't get here. E.g in the case of virtio-net, it's more than
> > > > sufficient to assign a dedicated PASID to cvq, any reason for yet
> > > > another one?
> > >
> > > Well I'm not sure how cheap it is to have an extra PASID.
> > > In theory you can share page tables making it not that
> > > expensive.
> >
> > I think it should not be expensive since PASID is per RID according to
> > the PCIe spec.
> >
> > > In practice is it hard for the MMU to do so?
> > > If page tables are not shared extra PASIDs become expensive.
> >
> > Why? For CVQ, we don't need sharing page tables, just maintaining one
> > dedicated buffer for command forwarding is sufficient.
>
> I am talking about the IOMMU page tables; these are not part of the PCIe
> spec. You need to map all of guest memory to the device, and this needs a
> set of PTEs. If two PASIDs map the same memory you might be able to share
> PTEs, but I am guessing that this will need some kind of reference
> counting to track their usage. I am not sure how complex/expensive that
> will turn out to be. In the absence of that, we are doubling the amount of
> PTEs by using two PASIDs for the same device.

So it depends on the migration model

1) save and restore

or

2) trap and emulate

Then:

- If the device provides the facility to sync the state, we don't need
a dedicated PASID for CVQ, and CVQ can be assigned to guests.
- If the device doesn't provide the facility to sync the state, we
need to trap CVQ and get the state (what QEMU currently does), so CVQ
needs to be trapped (an emulated CVQ will be presented to guests). And
we need a dedicated PASID for the hardware CVQ, but in this case we don't
need to map guest memory to the hardware CVQ, otherwise there will be
security implications. It's sufficient to map a small buffer.
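
A minimal sketch of that last point, assuming a trap-and-emulate setup;
struct hw_cvq, hw_cvq_submit() and the bounce fields are hypothetical
names of mine, not from any proposal:

#include <stddef.h>
#include <string.h>

struct hw_cvq {
        void *bounce;                   /* small hypervisor-owned buffer */
        unsigned long long bounce_iova; /* its IOVA under the CVQ PASID */
        size_t bounce_len;
};

/* assumed to exist in the hypervisor's device layer */
int hw_cvq_submit(struct hw_cvq *hw, unsigned long long iova, size_t len);

/* Only hw->bounce is mapped under the CVQ PASID, never guest memory,
 * so the hardware CVQ cannot touch the guest. */
static int forward_cvq_cmd(struct hw_cvq *hw, const void *cmd, size_t len)
{
        if (len > hw->bounce_len)
                return -1;
        memcpy(hw->bounce, cmd, len);   /* copy the trapped guest command */
        return hw_cvq_submit(hw, hw->bounce_iova, len);
}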


>
>
> > >
> > >
> > > > >
> > > > > What is true is that with subfunctions you would have
> > > > > PASID per subfunction and then one subfunction for control.
> > > >
> > > > > Well, it's possible, but it's also possible to have everything self-
> > > > > contained in a single subfunction. Then cvq can be assigned to a PASID
> > > > that is used only for the hypervisor.
> > > >
> > > > >
> > > > > I think a sketch of how things will work with scalable iov can't hurt as
> > > > > part of this proposal.  And, I'm not sure we should have so much
> > > > > flexibility: if there's an interface that works for SRIOV and SIOV then
> > > > > that seems preferable than having distinct transports for SRIOV and
> > > > > SIOV.
> > > >
> > > > Some of my understanding of SR-IOV vs SIOV:
> > > >
> > > > 1) SR-IOV doesn't require a transport, VFs use PCI config space; but
> > > > SIOV requires one
> > > > 2) SR-IOV doesn't support dynamic on-demand provisioning whereas SIOV does
> > > >
> > > > So I'm not sure how hard it is if we want to unify the management
> > > > plane of the above two.
> > > >
> > > > Thanks
> > >
> > > Interesting. So are you fine with a proposal which ignores the PASID
> > > things completely then?
> >
> > I'm fine, just a note that:
> >
> > The main advantages of using admin virtqueue in another device (PF) is
> > that the DMA is isolated,
>
> Right
>
> > but with the help of PASID, there's no need
> > to do that
>
> In that case you can make the AQ part of the VF itself?

Not sure, but I guess for nesting, a BAR/register interface is much
simpler/better for the case that doesn't need DMA.

>
> > and we will have a better interface for nesting.
> >
> > Thanks
>
> In fact, nesting is an interesting use case. I have not
> thought about this too much; it is worth thinking about
> how this interface will virtualize.

I totally agree.

Thanks

>
> > > If yes can we take that discussion to
> > > a different thread then? This one is already too long ...
> > >
> > >
> > > >
> > > > >
> > > > >
> > > > > --
> > > > > MST
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  6:54                                           ` Jason Wang
@ 2022-01-26  8:09                                             ` Parav Pandit
  2022-01-26  9:07                                               ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-26  8:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 26, 2022 12:24 PM
> 
> > >
> > > 1) the vendor and transport that doesn't want to use admin virtqueue
> > It is not the choice of vendor to use admin virtqueue or not. It is the spec
> definition.
> 
> We are discussing the proposal which hasn't been a part of the spec right now.
> It doesn't mean we can't do better.
Sure. It’s the proposal we are discussing, but I fail to understand why a vendor wouldn’t want to use an admin queue, such that it would need to be detached.

> >
> > Why a transport doesn’t want to use admin queue?
> 
> The answer is simple and straightforward: each transport already has its
> transport-specific way to configure the device.
Each transport uses a transport-agnostic way to communicate requests and responses to the device by means of virtqueue descriptors.

And that is done through a virtqueue.

A descriptor in a virtqueue represents a wide range of things:
1. sometimes sending a packet,
2. sometimes configuring a VLAN,
3. sometimes adding a MAC to a table,
4. sometimes adding a crypto session,
5. and now sometimes managing a managed VF or SF device (see the sketch below).

Item #5 is no different from #1 to #4.
Managed device configuration != "generic configure the device".
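
To make item 5 concrete, a minimal sketch of how such a management
command could be laid out over descriptors; every field and name below
is illustrative only, not taken from the proposal:

/* Hypothetical admin command, riding the AQ the same way typed
 * requests ride the request/control queues of existing devices. */
struct virtio_admin_cmd_hdr {    /* device-readable descriptor(s) */
        le16 opcode;             /* e.g. query/set VF MSI-X vector count */
        le16 target;             /* managed VF (or later SF) identifier */
        le32 reserved;
        u8   payload[];          /* opcode-specific data, DMA-able */
};

struct virtio_admin_cmd_result { /* device-writable descriptor */
        le16 status;             /* success or an error code */
        le16 reserved[3];
};

The driver queues the header (plus payload) as device-readable buffers
and the result as a device-writable one, so the command is carried by
plain descriptors exactly like items 1 to 4.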

> 
> > I don’t follow why virtio fs device wants to use something other than request
> queue to transport virtio_fs_req.
> 
> See my previous reply, I didn't mean we need to change any device specific
> operation like virtio_fs_req.
> 
You imply that virtio_fs_req must be carried on the request queue, but that some other request should use something other than a virtqueue, even though it has been explained several times why a virtqueue is used.
That doesn't make sense at all.

> What I meant is, let's take MSI as an example
> 
> 1) PCI has MSI table
> 2) MMIO doesn't support MSI
> 
> It looks to me you want to mandate admin virtqueue to MMIO in order to
> configure MSI?
> 
If you want to attribute it as "mandate" then yes; it is no different from how
virtio_fs_req is mandated on the request vq and
virtio_crypto_op_ctrl_req is mandated on the control vq.

What prevents both of the above examples from using the AQ in a generic manner?

> >
> > > 2) a more simple interface for L1
> > I don’t see the virtqueue as a complicated object; it has existed for 10+ years now
> and is in use by 18 devices in L1 and also in L2.
> > And it is something to be used for multiple commands.
> 
> So again, there's a misunderstanding. It's all about the "control plane",
> not the "data plane". A simple example is configuring the vq address.
> 
> 1) For PCI it was done via common_cfg structure
> 2) For MMIO it was done via dedicated registers
> 3) For CCW, it was done via dedicated commands

We already discussed this, and you ignored the scale, on-die and other comments.
And low-level self-device configuration is being compared with managed-device configuration.
This comparison simply isn't right.

data plane => vq
control plane => cq + transport specific cfg
mgmt plane => aq (this proposal)

mgmt plane != device's own vq configuration.
A device's own VQ configuration doesn't block another device's VQ configuration.

> 
> And we know we need dedicated commands for the admin virtqueue. 
Not sure I follow you. The AQ carries admin commands just as any other queue carries its commands.

> Since we
> had a transport specific interface, guest drivers can still configure the virtqueue
> address in L1 via the transport specific interface. So the hypervisor can hide the
> admin virtqueue. Let's say we introduce a new feature X for the admin
> virtqueue only. How can a guest driver to use that feature? 
As patch 1 and patch 4 clearly show, the AQ belongs to the virtio device located in the HV and is not for consumption by the guest driver.

> Do we want to hide
> or assign the admin virtqueue to L1?
A virtio device in a guest may have its own admin queue. But again, until now no concrete example has been discussed that demands an admin queue in the guest driver.
IMS vector enable/disable was the closest one we discussed, and we agreed that the AQ cannot be used for that.

> 
> I understand that for things like provisioning which might not be needed for L1,
> but how about others?
Others?

> > >
> > Shahaf and I already explained in this thread that this capability doesn’t scale,
> in answer to your question about "I do not understand scale ..".
> > So a device that wants to do the above can simply do this with an AQ of single VQ
> depth and still achieve the simplicity.
> 
> I thought it was for SF but not SR-IOV.
Ah, that was the misunderstanding. It applies to both SF and SR-IOV.

> 
> I think we all know SR-IOV doesn't scale in many ways and if I understand this
> series correctly, the main goal is not to address the scalability issues.
You are mixing many things here. Do not mix SR-IOV VF scaling with PF resource scaling.
Please keep it focused.

The goal, as constantly repeated, is to have an interface between driver and device that carries multiple commands in a scalable way without consuming on-chip resources.
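
As a sketch of what that looks like from the driver side; the helpers
submit_admin_cmd(), poll_admin_completion() and msix_count_for() are
hypothetical, purely to illustrate the flow:

struct admin_queue;

#define ADMIN_OP_VF_MSIX_SET 1  /* hypothetical opcode */

int submit_admin_cmd(struct admin_queue *aq, int op, int vf, int arg);
int poll_admin_completion(struct admin_queue *aq); /* # completions reaped */
int msix_count_for(int vf);

/* One deep AQ serving many VFs at once: commands are posted
 * back-to-back and completions are consumed in whatever order the
 * device finishes them, with no MMIO register in the loop. */
void configure_all_vfs(struct admin_queue *aq, int num_vfs)
{
        int vf, done = 0;

        for (vf = 1; vf <= num_vfs; vf++)   /* post without waiting */
                submit_admin_cmd(aq, ADMIN_OP_VF_MSIX_SET, vf,
                                 msix_count_for(vf));

        while (done < num_vfs)              /* reap out of order */
                done += poll_admin_completion(aq);
}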

> 
> It's good to consider the scalability but there could be cases that don't need to
> be scaled.
Sure, in that case AQ of single entry is just fine.

> 
> >
> > Michael also responded that device configuration will not end at msix
> configuration.
> > So adding more and more tiny capabilities for each configuration doesn't
> scale either.
> >
You need to ack the above point, otherwise we will keep discussing this even 2 weeks from now. :)

> > > >
> > > > Or you mean, a guest driver of VF or SF is configuring its IMS to
> > > > later
> > > consume for the VQ?
> > > > If its this, than I explained that admin queue is not the vehicle
> > > > to do so, and
> > > we discussed the other structure yday.
> > >
> > > Yes, I guess that's the nesting case I mentioned above.
> > >
> > > >
> > > > > 1) transport independent way: e.g admin virtqueue (which will be
> > > > > eventually became another transport)
> > > > >
> > > > IMS by guest driver cannot be configured by AQ.
> > >
> > > Yes, that's one point.
> > >
> > Ok. cool. We sync here too. :)
> 
> So I guess you know what I meant for the L1 interface?
With your last question, let me double check my understanding. :)

My understanding is that the L1 interface is the virtio device in the hypervisor. Is that right?


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  8:09                                             ` Parav Pandit
@ 2022-01-26  9:07                                               ` Jason Wang
  2022-01-26  9:47                                                 ` Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  9:07 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 4:09 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, January 26, 2022 12:24 PM
> >
> > > >
> > > > 1) the vendor and transport that doesn't want to use admin virtqueue
> > > It is not the choice of vendor to use admin virtqueue or not. It is the spec
> > definition.
> >
> > We are discussing the proposal which hasn't been a part of the spec right now.
> > It doesn't mean we can't do better.
> Sure. It’s the proposal we are discussing, but I fail to understand why a vendor wouldn’t want to use an admin queue, such that it would need to be detached.

What do you mean by "detached"?

>
> > >
> > > Why a transport doesn’t want to use admin queue?
> >
> > The answer is simple and straightforward: each transport already has its
> > transport-specific way to configure the device.
> Each transport uses a transport-agnostic way to communicate requests and responses to the device by means of virtqueue descriptors.
>
> And that is done through a virtqueue.
>
> A descriptor in a virtqueue represents a wide range of things:
> 1. sometimes sending a packet,
> 2. sometimes configuring a VLAN,
> 3. sometimes adding a MAC to a table,
> 4. sometimes adding a crypto session,
> 5. and now sometimes managing a managed VF or SF device
>
> Item #5 is no different from #1 to #4.
> Managed device configuration != "generic configure the device".

Well, I think then you need to explain this well in the patch.

>
> >
> > > I don’t follow why virtio fs device wants to use something other than request
> > queue to transport virtio_fs_req.
> >
> > See my previous reply, I didn't mean we need to change any device specific
> > operation like virtio_fs_req.
> >
> You imply that virtio_fs_req must be carried on the request queue, but that some other request should use something other than a virtqueue, even though it has been explained several times why a virtqueue is used.
> That doesn't make sense at all.

So a virtqueue is needed only when DMA is required. That is the case
for virtio_fs_req.

For other cases, where DMA is not a must, it may or may not be a queue interface.

>
> > What I meant is, let's take MSI as an example
> >
> > 1) PCI has MSI table
> > 2) MMIO doesn't support MSI
> >
> > It looks to me you want to mandate admin virtqueue to MMIO in order to
> > configure MSI?
> >
> If you want to attribute it as "mandate" then yes; it is no different from how
> virtio_fs_req is mandated on the request vq and
> virtio_crypto_op_ctrl_req is mandated on the control vq.
>
> What prevents both of the above examples from using the AQ in a generic manner?

Nothing, but for simple features like MSI I don't think we can exclude
the possibility of using dedicated registers. For a simple setup like
a guest that is using an MMIO device, using a new virtqueue just for MSI
seems an odd burden.
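
For comparison, dedicated registers could look like the sketch below,
in the style of the existing virtio-mmio register block; every offset
and name here is made up for illustration and is not in the spec:

/* Hypothetical per-vector MSI configuration registers for virtio-mmio.
 * The driver selects a vector, programs address/data, then unmasks. */
#define VIRTIO_MMIO_MSI_VEC_SEL    0x0d0 /* W: vector to configure */
#define VIRTIO_MMIO_MSI_ADDR_LOW   0x0d4 /* W: message address, low 32 bits */
#define VIRTIO_MMIO_MSI_ADDR_HIGH  0x0d8 /* W: message address, high 32 bits */
#define VIRTIO_MMIO_MSI_DATA       0x0dc /* W: message data */
#define VIRTIO_MMIO_MSI_VEC_CTRL   0x0e0 /* RW: bit 0 = mask */

The trade-off debated above is exactly this: such registers are trivial
for a guest to drive, but they are on-die resources the device has to
back for every function, which is where the scaling concern comes from.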

>
> > >
> > > > 2) a more simple interface for L1
> > > I don’t see virtqueue as complicated object which exists for 10+ years now
> > and in use by 18 devices in L1 and also in L2.
> > > And it is something to be used for multiple commands.
> >
> > So again, there's a misunderstanding. It's all about the "control plane",
> > not the "data plane". A simple example is configuring the vq address.
> >
> > 1) For PCI it was done via common_cfg structure
> > 2) For MMIO it was done via dedicated registers
> > 3) For CCW, it was done via dedicated commands
>
> We already discussed this, and you ignored the scale, on-die and other comments.

It seems not; it's probably the IMS discussion that confused the thread.

> And low-level self-device configuration is being compared with managed-device configuration.
> This comparison simply isn't right.
>
> data plane => vq
> control plane => cq + transport specific cfg
> mgmt plane => aq (this proposal)
>
> mgmt plane != device's own vq configuration.

To clarify, we need to define what exactly "management" means. E.g. is
IMS considered to be mgmt or control?

> A device's own VQ configuration doesn't block another device's VQ configuration.
>
> >
> > And we know we need dedicated commands for the admin virtqueue.
> Not sure I follow you. The AQ carries admin commands just as any other queue carries its commands.
>
> > Since we
> > had a transport specific interface, guest drivers can still configure the virtqueue
> > address in L1 via the transport specific interface. So the hypervisor can hide the
> > admin virtqueue. Let's say we introduce a new feature X for the admin
> > virtqueue only. How can a guest driver to use that feature?
> As patch 1 and patch 4 clearly show, the AQ belongs to the virtio device located in the HV and is not for consumption by the guest driver.
>
> > Do we want to hide
> > or assign the admin virtqueue to L1?
> A virtio device in a guest may have its own admin queue. But again, until now no concrete example has been discussed that demands an admin queue in the guest driver.
> IMS vector enable/disable was the closest one we discussed, and we agreed that the AQ cannot be used for that.

Ok.

>
> >
> > I understand that for things like provisioning which might not be needed for L1,
> > but how about others?
> Others?

Well, I'd say if you limit the admin virtqueue to provisioning, it
should be fine.

>
> > > >
> > > Shahaf and I already explained in this thread that this capability doesn’t scale
> > to your question about "I do not understand scale ..".
> > > So a device that wants to do above can simply do this with AQ with single VQ
> > depth and still achieve the simplicity.
> >
> > I thought it was for SF but not SR-IOV.
> Ah, that was the misunderstanding. It applies to both SF and SR-IOV.
>
> >
> > I think we all know SR-IOV doesn't scale in many ways and if I understand this
> > series correctly, the main goal is not to address the scalability issues.
> You are mixing many things here. Do not mix SR-IOV VF scaling with PF resource scaling.
> Please keep it focused.
>
> The goal, as constantly repeated, is to have an interface between driver and device that carries multiple commands in a scalable way without consuming on-chip resources.
>
> >
> > It's good to consider the scalability but there could be cases that don't need to
> > be scaled.
> Sure, in that case AQ of single entry is just fine.
>
> >
> > >
> > > Michael also responded that device configuration will not end at msix
> > configuration.
> > > So adding more and more tiny capabilities for each configuration doesn't
> > scale either.
> > >
> You need to ack the above point, otherwise we will keep discussing this even 2 weeks from now. :)

I think I've acknowledged the idea of using an admin virtqueue in
another thread.

Thanks

>
> > > > >
> > > > > Or you mean, a guest driver of VF or SF is configuring its IMS to
> > > > > later
> > > > consume for the VQ?
> > > > > If its this, than I explained that admin queue is not the vehicle
> > > > > to do so, and
> > > > we discussed the other structure yday.
> > > >
> > > > Yes, I guess that's the nesting case I mentioned above.
> > > >
> > > > >
> > > > > > 1) transport independent way: e.g admin virtqueue (which will be
> > > > > > eventually became another transport)
> > > > > >
> > > > > IMS by guest driver cannot be configured by AQ.
> > > >
> > > > Yes, that's one point.
> > > >
> > > Ok. cool. We sync here too. :)
> >
> > So I guess you know what I meant for the L1 interface?
> With your last question, let me double check my understanding. :)
>
> My understanding is that the L1 interface is the virtio device in the hypervisor. Is that right?
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  7:03                                 ` Jason Wang
@ 2022-01-26  9:27                                   ` Max Gurtovoy
  2022-01-26  9:34                                     ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-26  9:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Michael S. Tsirkin, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha


On 1/26/2022 9:03 AM, Jason Wang wrote:
> On Tue, Jan 25, 2022 at 6:59 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>>
>> On 1/25/2022 5:52 AM, Parav Pandit wrote:
>>> Hi Jason,
>>>
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Tuesday, January 25, 2022 8:59 AM
>>>>
>>>> 在 2022/1/19 下午12:48, Parav Pandit 写道:
>>>>>> From: Jason Wang <jasowang@redhat.com>
>>>>>> Sent: Wednesday, January 19, 2022 9:33 AM
>>>>>>
>>>>>>
>>>>>> If it means IMS, there's already a proposal[1] that introduces MSI
>>>>>> commands via the admin virtqueue. And we had a similar requirement for
>>>>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
>>>>>> introduce IMS (need a better name though) as a basic facility instead
>>>>>> of tying it to any specific transport.
>>>>>>
>>>>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
>>>> driving, which needs a queue.
>>>>> So regardless of the device type as PCI PF/VF/SF/ADI, there is desire to have a
>>>> generic admin queue not attached to device type.
>>>>> And AQ in this proposal exactly serves this purpose.
>>>>>
>>>>> Device configuring its own IMS vector vs PCI PF configuring VF's MSI-X max
>>>> vector count are two different functionality.
>>>>> Both of these commands can ride on a generic queue.
>>>>> However the queue is not same, because PF owns its own admin queue
>>>>> (for vf msix config), VF or SF operates its own admin queue (for IMS
>>>>> config).
>>>> So I think in the next version we need to clarify:
>>>>
>>>> 1) is there a single admin virtqueue shared by all the VFs and PF
>>>>
>>>> or
>>>>
>>>> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
>>>> corresponding admin virtqueue
>>>>
>>> Admin queue is not per VF.
>>> Lets take concrete examples.
>>> 1. So for example, PCI PF can have one AQ.
>>> This AQ carries command to query/config MSI-X vector of VFs.
>>>
>>> 2. In second example, PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
>>>
>>> 3. A PCI VF has its own AQ to configure some of its own generic attribute, don't know which is that today.
>>> May be something that is extremely hard to do over features bit.
>>> Currently proposed v2 doesn't restrict admin queue to be within PCI PF or VF or that matter not limited to other transports.
>>>
>>>>> So a good example is,
>>>>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
>>>>> 2. PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ
>>>> in GVM.
>>>>> Both the functions will have AQ feature bit set.
>>>> Where did the VF_AQ sit? I guess it belongs to the VF. But if this is
>>>> true, don't we need some kind of address isolation like PASID?
>>>>
>>> Above one for IMS is not a good example. I replied the reasoning last week for it.
>>>
>>>>> Fair enough, so we have more users of admin queue than just MSI-X config.
>>>> Well, what I really meant is that we actually have more users of IMS.
>>>> That is exactly what virtio-mmio wants. In this case introducing an admin
>>>> queue looks too heavyweight for that.
>>>>
>>> IMS config cannot be done over AQ as described in previous email in this thread.
>>>
>>>>>>> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
>>>>>>> the queues 3. Update commit log to describe why config space is not
>>>>>>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
>>>>>> I fail to understand the scale/registeres issues. With the one of my previous
>>>>>> proposal (device selector), technically we don't even need any config space
>>>> or
>>>>>> BAR for VF or SF by multiplexing the registers for PF.
>>>>>>
>>>>> Scale issue is: when you want to create, query, manipulate hundreds of
>>>> objects, having shared MMIO register or configuration register, will be too
>>>> slow.
>>>>
>>>>
>>>> Ok, this need to be clarified in the commit log. And we need make sure
>>>> it's not an issue that is only happen for some specific vendor.
>>> It is present in the v2 commit log cover letter.
>>> Please let me know if you think it should be in the actual patch commit log.
>>>
>>>
>>>>> And additionally such register set doesn't scale to allow sharing large
>>>> number of bytes as DMA cannot be done.
>>>>
>>>>
>>>> That's true.
>>>>
>>>>
>>>>>    From physical device perspective, it doesn’t scale because device needs to
>>>> have those resources ready to answer on MMIO reads and for hundreds to
>>>> thousand of devices it just cannot do it.
>>>>> This is one of the reason for birth of IMS.
>>>> IMS allows the table to be stored in the memory and cached by the device
>>>> to have the best scalability. But I had other questions:
>>>>
>>>> 1) if we have a single admin virtqueue, there will still be contention
>>>> in the driver side
>>>>
>>> AQ inherently allows out of order commands execution.
>>> It shouldn't face contention. For example 1K depth AQ should be serving hundreds of descriptors commands in parallel for SF creation, VF MSI-X config and more.
>>>
>>> Which area/commands etc you think can lead to the contention?
>>>
>>>> 2) if we have per vf admin virtqueue, it still doesn't scale since it
>>>> occupies more hardware resources
>>>>
>>> That is too heavy, and doesn’t scale. Proposal is to not have per vf admin queue.
>>> Proposal is to have one admin queue in a virtio device.
>> Right ? where did we mention something that can imply otherwise ?
> Well, I don't know but probably this part,
>
> " PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ ..."
>
>>
>>>>>> I do see one advantage is that the admin virtqueue is transport
>>>> independent
>>>>>> (or it could be used as a transport).
>>>>>>
>>>>> I am yet to read the transport part from [1].
>>>> Yes, the main goal is to be compatible with SIOV.
>>>>
>>> Admin queue is a command interface transport where higher layer services can be built.
>>> This includes SR-IOV config, SIOV config.
>>> And v2 enables SIOV commands implementation whenever they are ready.
>>>
>>>>>>> 4. Improve documentation around msix config to link to sriov section of
>>>> virtio
>>>>>> spec
>>>>>>> 5. Describe error that if VF is bound to the device, admin commands
>>>>>> targeting VF can fail, describe this error code
>>>>>>> Did I miss anything?
>>>>>>>
>>>>>>> Yet to receive your feedback on group, if/why is it needed and, why/if it
>>>> must
>>>>>> be in this proposal, what pieces prevents it do as follow-on.
>>>>>>> Cornelia, Jason,
>>>>>>> Can you please review current proposal as well before we revise v2?
>>>>>> If I understand correctly, most of the features (except for the admin
>>>>>> virtqueue in_order stuffs) are not specific to the admin virtqueue. As
>>>>>> discussed in the previous versions, I still think it's better:
>>>>>>
>>>>>> 1) adding sections in the basic device facility or data structure for
>>>>>> provisioning and MSI
>>>>>> 2) introduce admin virtqueue on top as an device interface for those
>>>>>> features
>>>>>>
>>>>> I didn't follow your suggestion. Can you please explain?
>>>>> Specifically "data structure for provisioning and MSI"..
>>>> I meant:
>>>>
>>>> There's a chapter "Basic Facilities of a Virtio Device", we can
>>>> introduce the concepts there like:
>>>>
>>>> 1) Managed device and Management device (terminology proposed by
>>>> Michael), and can use PF and VF as a example
>>>>
>>>> 2) Managed device provisioning (the data structure to specify the
>>>> attributes of a managed device (VF))
>>>>
>>>> 3) MSI
>>>>
>>> Above is good idea. Will revisit v2, if it is not arranged this way.
>> Let me make sure I understand, you would like to see a new chapter under
>> "Basic Facilities of a Virtio Device" that is
>>
>> called "Device management" and this chapter will explain in few words
>> the concept
> Yes.
>
>> and it will point to another chapter under "Basic Facilities
>> of a Virtio Device"
>>
>> that was introduced here "Admin Virtqueues" ?
> So far as I see from the proposal, it needs to belong to the PCI transport
> part or a new transport.

No, it's not.

It should stay in the basic/generic area, as we discussed in the past
and already agreed on.

Let's move forward, please.

>> So you do agree that managing a managed (create/destroy/setup/etc...)
>> will be done using the AQ of the managing device ?
> I agree.
>
> Thanks

Ok so I guess we agree on the concept of this patch set and the AQ.

Thanks.

>
>>>> And then we can introduced admin virtqueue in either
>>>>
>>>> 1) transport part
>>>>
>>>> or
>>>>
>>>> 2) PCI transport
>>>>
>>> It is not specific to PCI transport, and currently it is not a transport either.
>>> So admin queue will keep as general entity for admin work.
>>>
>>>> In the admin virtqueue, there will be commands to provision and
>>>> configure MSI.
>>>>
>>> Please review v2 if it is not arranged this way.
>>>
>>>>>> The leaves the chance for future extensions to allow those features to
>>>>>> be used by transport specific interface which will benefit for
>>>>>>
>>>>> AQ allows communication (command, response) between driver and device
>>>> in transport independent way.
>>>>> Sometimes it query/set transport specific fields like MSI-X vectors of VF.
>>>>> Sometimes device configure its on IMS interrupt.
>>>>> Something else in future.
>>>>> So it is really a generic request-response queue.
>>>> I agree, but I think we can't mandate new features to a specific transport.
>>>>
>>> Certainly. Admin queue is transport independent.
>>> PCI MSI-X configuration is PCI transport specific command, so structures are defined it accordingly.
>>> It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
>>>
>>> Any other transport will have transport specific interrupt configuration. So it will be defined accordingly whenever that occurs.
>>> For example, IMS for VF or IMS for SF.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  9:27                                   ` Max Gurtovoy
@ 2022-01-26  9:34                                     ` Jason Wang
  2022-01-26  9:45                                       ` Max Gurtovoy
  0 siblings, 1 reply; 110+ messages in thread
From: Jason Wang @ 2022-01-26  9:34 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Parav Pandit, Michael S. Tsirkin, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 5:27 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>
>
> On 1/26/2022 9:03 AM, Jason Wang wrote:
> > On Tue, Jan 25, 2022 at 6:59 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> >>
> >> On 1/25/2022 5:52 AM, Parav Pandit wrote:
> >>> Hi Jason,
> >>>
> >>>> From: Jason Wang <jasowang@redhat.com>
> >>>> Sent: Tuesday, January 25, 2022 8:59 AM
> >>>>
> >>>> 在 2022/1/19 下午12:48, Parav Pandit 写道:
> >>>>>> From: Jason Wang <jasowang@redhat.com>
> >>>>>> Sent: Wednesday, January 19, 2022 9:33 AM
> >>>>>>
> >>>>>>
> >>>>>> If it means IMS, there's already a proposal[1] that introduces MSI
> >>>>>> commands via the admin virtqueue. And we had a similar requirement for
> >>>>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
> >>>>>> introduce IMS (need a better name though) as a basic facility instead
> >>>>>> of tying it to any specific transport.
> >>>>>>
> >>>>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
> >>>> driving, which needs a queue.
> >>>>> So regardless of the device type as PCI PF/VF/SF/ADI, there is desire to have a
> >>>> generic admin queue not attached to device type.
> >>>>> And AQ in this proposal exactly serves this purpose.
> >>>>>
> >>>>> Device configuring its own IMS vector vs PCI PF configuring VF's MSI-X max
> >>>> vector count are two different functionality.
> >>>>> Both of these commands can ride on a generic queue.
> >>>>> However the queue is not same, because PF owns its own admin queue
> >>>>> (for vf msix config), VF or SF operates its own admin queue (for IMS
> >>>>> config).
> >>>> So I think in the next version we need to clarify:
> >>>>
> >>>> 1) is there a single admin virtqueue shared by all the VFs and PF
> >>>>
> >>>> or
> >>>>
> >>>> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
> >>>> corresponding admin virtqueue
> >>>>
> >>> Admin queue is not per VF.
> >>> Lets take concrete examples.
> >>> 1. So for example, PCI PF can have one AQ.
> >>> This AQ carries command to query/config MSI-X vector of VFs.
> >>>
> >>> 2. In second example, PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
> >>>
> >>> 3. A PCI VF has its own AQ to configure some of its own generic attribute, don't know which is that today.
> >>> May be something that is extremely hard to do over features bit.
> >>> Currently proposed v2 doesn't restrict admin queue to be within PCI PF or VF or that matter not limited to other transports.
> >>>
> >>>>> So a good example is,
> >>>>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
> >>>>> 2. PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ
> >>>> in GVM.
> >>>>> Both the functions will have AQ feature bit set.
> >>>> Where did the VF_AQ sit? I guess it belongs to the VF. But if this is
> >>>> true, don't we need some kind of address isolation like PASID?
> >>>>
> >>> Above one for IMS is not a good example. I replied the reasoning last week for it.
> >>>
> >>>>> Fair enough, so we have more users of admin queue than just MSI-X config.
> >>>> Well, what I really meant is that we actually have more users of IMS.
> >>>> That is exactly what virtio-mmio wants. In this case introducing an admin
> >>>> queue looks too heavyweight for that.
> >>>>
> >>> IMS config cannot be done over AQ as described in previous email in this thread.
> >>>
> >>>>>>> 2. AQ to follows IN_ORDER and INDIRECT_DESC negotiation like rest of
> >>>>>>> the queues 3. Update commit log to describe why config space is not
> >>>>>>> chosen (scale, on-die registers, uniform way to handle all aq cmds)
> >>>>>> I fail to understand the scale/registeres issues. With the one of my previous
> >>>>>> proposal (device selector), technically we don't even need any config space
> >>>> or
> >>>>>> BAR for VF or SF by multiplexing the registers for PF.
> >>>>>>
> >>>>> Scale issue is: when you want to create, query, manipulate hundreds of
> >>>> objects, having shared MMIO register or configuration register, will be too
> >>>> slow.
> >>>>
> >>>>
> >>>> Ok, this need to be clarified in the commit log. And we need make sure
> >>>> it's not an issue that is only happen for some specific vendor.
> >>> It is present in the v2 commit log cover letter.
> >>> Please let me know if you think it should be in the actual patch commit log.
> >>>
> >>>
> >>>>> And additionally such register set doesn't scale to allow sharing large
> >>>> number of bytes as DMA cannot be done.
> >>>>
> >>>>
> >>>> That's true.
> >>>>
> >>>>
> >>>>>    From physical device perspective, it doesn’t scale because device needs to
> >>>> have those resources ready to answer on MMIO reads and for hundreds to
> >>>> thousand of devices it just cannot do it.
> >>>>> This is one of the reason for birth of IMS.
> >>>> IMS allows the table to be stored in the memory and cached by the device
> >>>> to have the best scalability. But I had other questions:
> >>>>
> >>>> 1) if we have a single admin virtqueue, there will still be contention
> >>>> in the driver side
> >>>>
> >>> AQ inherently allows out of order commands execution.
> >>> It shouldn't face contention. For example 1K depth AQ should be serving hundreds of descriptors commands in parallel for SF creation, VF MSI-X config and more.
> >>>
> >>> Which area/commands etc you think can lead to the contention?
> >>>
> >>>> 2) if we have per vf admin virtqueue, it still doesn't scale since it
> >>>> occupies more hardware resources
> >>>>
> >>> That is too heavy, and doesn’t scale. Proposal is to not have per vf admin queue.
> >>> Proposal is to have one admin queue in a virtio device.
> >> Right ? where did we mention something that can imply otherwise ?
> > Well, I don't know but probably this part,
> >
> > " PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ ..."
> >
> >>
> >>>>>> I do see one advantage is that the admin virtqueue is transport
> >>>> independent
> >>>>>> (or it could be used as a transport).
> >>>>>>
> >>>>> I am yet to read the transport part from [1].
> >>>> Yes, the main goal is to be compatible with SIOV.
> >>>>
> >>> Admin queue is a command interface transport where higher layer services can be built.
> >>> This includes SR-IOV config, SIOV config.
> >>> And v2 enables SIOV commands implementation whenever they are ready.
> >>>
> >>>>>>> 4. Improve documentation around msix config to link to sriov section of
> >>>> virtio
> >>>>>> spec
> >>>>>>> 5. Describe error that if VF is bound to the device, admin commands
> >>>>>> targeting VF can fail, describe this error code
> >>>>>>> Did I miss anything?
> >>>>>>>
> >>>>>>> Yet to receive your feedback on group, if/why is it needed and, why/if it
> >>>> must
> >>>>>> be in this proposal, what pieces prevents it do as follow-on.
> >>>>>>> Cornelia, Jason,
> >>>>>>> Can you please review current proposal as well before we revise v2?
> >>>>>> If I understand correctly, most of the features (except for the admin
> >>>>>> virtqueue in_order stuffs) are not specific to the admin virtqueue. As
> >>>>>> discussed in the previous versions, I still think it's better:
> >>>>>>
> >>>>>> 1) adding sections in the basic device facility or data structure for
> >>>>>> provisioning and MSI
> >>>>>> 2) introduce admin virtqueue on top as an device interface for those
> >>>>>> features
> >>>>>>
> >>>>> I didn't follow your suggestion. Can you please explain?
> >>>>> Specifically "data structure for provisioning and MSI"..
> >>>> I meant:
> >>>>
> >>>> There's a chapter "Basic Facilities of a Virtio Device", we can
> >>>> introduce the concepts there like:
> >>>>
> >>>> 1) Managed device and Management device (terminology proposed by
> >>>> Michael), and can use PF and VF as a example
> >>>>
> >>>> 2) Managed device provisioning (the data structure to specify the
> >>>> attributes of a managed device (VF))
> >>>>
> >>>> 3) MSI
> >>>>
> >>> Above is good idea. Will revisit v2, if it is not arranged this way.
> >> Let me make sure I understand, you would like to see a new chapter under
> >> "Basic Facilities of a Virtio Device" that is
> >>
> >> called "Device management" and this chapter will explain in few words
> >> the concept
> > Yes.
> >
> >> and it will point to another chapter under "Basic Facilities
> >> of a Virtio Device"
> >>
> >> that was introduced here "Admin Virtqueues" ?
> > So far as I see from the proposal, it needs to belong to the PCI transport
> > part or a new transport.
>
> No, it's not.
>
> It should stay in the basic/generic area, as we discussed in the past
> and already agreed on.
>
> Let's move forward, please.

Yes, for the general admin virtqueue part it should be fine, but for
the SR-IOV ATTRS part, would it be better to move it to the PCI transport?

>
> >> So you do agree that managing a managed (create/destroy/setup/etc...)
> >> will be done using the AQ of the managing device ?
> > I agree.
> >
> > Thanks
>
> Ok so I guess we agree on the concept of this patch set and the AQ.

Yes.

Thanks

>
> Thanks.
>
> >
> >>>> And then we can introduced admin virtqueue in either
> >>>>
> >>>> 1) transport part
> >>>>
> >>>> or
> >>>>
> >>>> 2) PCI transport
> >>>>
> >>> It is not specific to PCI transport, and currently it is not a transport either.
> >>> So admin queue will keep as general entity for admin work.
> >>>
> >>>> In the admin virtqueue, there will be commands to provision and
> >>>> configure MSI.
> >>>>
> >>> Please review v2 if it is not arranged this way.
> >>>
> >>>>>> The leaves the chance for future extensions to allow those features to
> >>>>>> be used by transport specific interface which will benefit for
> >>>>>>
> >>>>> AQ allows communication (command, response) between driver and device
> >>>> in transport independent way.
> >>>>> Sometimes it query/set transport specific fields like MSI-X vectors of VF.
> >>>>> Sometimes device configure its on IMS interrupt.
> >>>>> Something else in future.
> >>>>> So it is really a generic request-response queue.
> >>>> I agree, but I think we can't mandate new features to a specific transport.
> >>>>
> >>> Certainly. Admin queue is transport independent.
> >>> PCI MSI-X configuration is PCI transport specific command, so structures are defined it accordingly.
> >>> It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
> >>>
> >>> Any other transport will have transport specific interrupt configuration. So it will be defined accordingly whenever that occurs.
> >>> For example, IMS for VF or IMS for SF.
>


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  9:34                                     ` Jason Wang
@ 2022-01-26  9:45                                       ` Max Gurtovoy
  2022-01-27  3:46                                         ` Jason Wang
  0 siblings, 1 reply; 110+ messages in thread
From: Max Gurtovoy @ 2022-01-26  9:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Michael S. Tsirkin, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha


On 1/26/2022 11:34 AM, Jason Wang wrote:
> On Wed, Jan 26, 2022 at 5:27 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>>
>> On 1/26/2022 9:03 AM, Jason Wang wrote:
>>> On Tue, Jan 25, 2022 at 6:59 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>>>> On 1/25/2022 5:52 AM, Parav Pandit wrote:
>>>>> Hi Jason,
>>>>>
>>>>>> From: Jason Wang <jasowang@redhat.com>
>>>>>> Sent: Tuesday, January 25, 2022 8:59 AM
>>>>>>
>>>>>> 在 2022/1/19 下午12:48, Parav Pandit 写道:
>>>>>>>> From: Jason Wang <jasowang@redhat.com>
>>>>>>>> Sent: Wednesday, January 19, 2022 9:33 AM
>>>>>>>>
>>>>>>>>
>>>>>>>> If it means IMS, there's already a proposal[1] that introduces MSI
>>>>>>>> commands via the admin virtqueue. And we had a similar requirement for
>>>>>>>> virtio-MMIO[2] and managed device or SF [3], so I would rather
>>>>>>>> introduce IMS (need a better name though) as a basic facility instead
>>>>>>>> of tying it to any specific transport.
>>>>>>>>
>>>>>>> IMS of [1] is an interrupt configuration by the virtio driver for the device it is
>>>>>> driving, which needs a queue.
>>>>>>> So regardless of the device type as PCI PF/VF/SF/ADI, there is desire to have a
>>>>>> generic admin queue not attached to device type.
>>>>>>> And AQ in this proposal exactly serves this purpose.
>>>>>>>
>>>>>>> Device configuring its own IMS vector vs PCI PF configuring VF's MSI-X max
>>>>>> vector count are two different functionality.
>>>>>>> Both of these commands can ride on a generic queue.
>>>>>>> However the queue is not same, because PF owns its own admin queue
>>>>>>> (for vf msix config), VF or SF operates its own admin queue (for IMS
>>>>>>> config).
>>>>>> So I think in the next version we need to clarify:
>>>>>>
>>>>>> 1) is there a single admin virtqueue shared by all the VFs and PF
>>>>>>
>>>>>> or
>>>>>>
>>>>>> 2) per VF/PF admin virtqueue, and how does the driver know how to find the
>>>>>> corresponding admin virtqueue
>>>>>>
>>>>> Admin queue is not per VF.
>>>>> Lets take concrete examples.
>>>>> 1. So for example, PCI PF can have one AQ.
>>>>> This AQ carries command to query/config MSI-X vector of VFs.
>>>>>
>>>>> 2. In second example, PCI PF is creating/destroying SFs. This is again done by using the AQ of the PCI PF.
>>>>>
>>>>> 3. A PCI VF has its own AQ to configure some of its own generic attribute, don't know which is that today.
>>>>> May be something that is extremely hard to do over features bit.
>>>>> Currently proposed v2 doesn't restrict admin queue to be within PCI PF or VF or that matter not limited to other transports.
>>>>>
>>>>>>> So a good example is,
>>>>>>> 1. PCI PF configures 8 MSI-X or 16 IMS vectors for the VF using PF_AQ in HV.
>>>>>>> 2. PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ
>>>>>> in GVM.
>>>>>>> Both the functions will have AQ feature bit set.
>>>>>> Where does the VF_AQ sit? I guess it belongs to the VF. But if this is
>>>>>> true, don't we need some kind of address isolation like PASID?
>>>>>>
>>>>> The IMS one above is not a good example. I replied with the reasoning for it last week.
>>>>>
>>>>>>> Fair enough, so we have more users of admin queue than just MSI-X config.
>>>>>> Well, what I really meant is that we actually have more users of IMS.
>>>>>> That is exactly what virtio-mmio wants. In this case introducing an admin
>>>>>> queue looks too heavyweight for that.
>>>>>>
>>>>> IMS config cannot be done over AQ as described in previous email in this thread.
>>>>>
>>>>>>>>> 2. AQ follows IN_ORDER and INDIRECT_DESC negotiation like the rest of
>>>>>>>>> the queues. 3. Update the commit log to describe why config space was not
>>>>>>>>> chosen (scale, on-die registers, uniform way to handle all AQ cmds).
>>>>>>>> I fail to understand the scale/registers issues. With one of my previous
>>>>>>>> proposals (device selector), technically we don't even need any config space
>>>>>> or
>>>>>>>> BAR for the VF or SF, by multiplexing the registers of the PF.
>>>>>>>>
>>>>>>> The scale issue is: when you want to create, query, or manipulate hundreds of
>>>>>> objects, having a shared MMIO register or configuration register will be too
>>>>>> slow.
>>>>>>
>>>>>>
>>>>>> Ok, this needs to be clarified in the commit log. And we need to make sure
>>>>>> it's not an issue that only happens for some specific vendor.
>>>>> It is present in the v2 commit log cover letter.
>>>>> Please let me know if you think it should be in the actual patch commit log.
>>>>>
>>>>>
>>>>>>> And additionally such a register set doesn't scale to allow sharing a large
>>>>>> number of bytes, as DMA cannot be done.
>>>>>>
>>>>>>
>>>>>> That's true.
>>>>>>
>>>>>>
>>>>>>>     From the physical device perspective, it doesn't scale because the device needs to
>>>>>> have those resources ready to answer MMIO reads, and for hundreds to
>>>>>> thousands of devices it just cannot do it.
>>>>>>> This is one of the reasons for the birth of IMS.
>>>>>> IMS allows the table to be stored in the memory and cached by the device
>>>>>> to have the best scalability. But I had other questions:
>>>>>>
>>>>>> 1) if we have a single admin virtqueue, there will still be contention
>>>>>> in the driver side
>>>>>>
>>>>> AQ inherently allows out-of-order command execution.
>>>>> It shouldn't face contention. For example, a 1K-depth AQ should be serving hundreds of commands in parallel for SF creation, VF MSI-X config and more.
>>>>>
>>>>> Which areas/commands etc. do you think can lead to the contention?
>>>>>
>>>>>> 2) if we have per vf admin virtqueue, it still doesn't scale since it
>>>>>> occupies more hardware resources
>>>>>>
>>>>> That is too heavy, and doesn't scale. The proposal is to not have a per-VF admin queue.
>>>>> The proposal is to have one admin queue in a virtio device.
>>>> Right? Where did we mention something that could imply otherwise?
>>> Well, I don't know but probably this part,
>>>
>>> " PCI VF when using IMS configures, IMS data, vector, mask etc using VF_AQ ..."
>>>
>>>>>>>> I do see one advantage, which is that the admin virtqueue is transport
>>>>>> independent
>>>>>>>> (or it could be used as a transport).
>>>>>>>>
>>>>>>> I am yet to read the transport part from [1].
>>>>>> Yes, the main goal is to be compatible with SIOV.
>>>>>>
>>>>> Admin queue is a command interface transport on which higher-layer services can be built.
>>>>> This includes SR-IOV config, SIOV config.
>>>>> And v2 enables SIOV command implementations whenever they are ready.
>>>>>
>>>>>>>>> 4. Improve documentation around msix config to link to sriov section of
>>>>>> virtio
>>>>>>>> spec
>>>>>>>>> 5. Describe the error that if the VF is bound to a driver, admin commands
>>>>>>>> targeting the VF can fail; describe this error code
>>>>>>>>> Did I miss anything?
>>>>>>>>>
>>>>>>>>> Yet to receive your feedback on the group concept: if/why it is needed and why/if it
>>>>>> must
>>>>>>>> be in this proposal, and what prevents doing it as a follow-on.
>>>>>>>>> Cornelia, Jason,
>>>>>>>>> Can you please review current proposal as well before we revise v2?
>>>>>>>> If I understand correctly, most of the features (except for the admin
>>>>>>>> virtqueue in_order stuff) are not specific to the admin virtqueue. As
>>>>>>>> discussed in the previous versions, I still think it's better:
>>>>>>>>
>>>>>>>> 1) adding sections in the basic device facility or data structure for
>>>>>>>> provisioning and MSI
>>>>>>>> 2) introducing the admin virtqueue on top as a device interface for those
>>>>>>>> features
>>>>>>>>
>>>>>>> I didn't follow your suggestion. Can you please explain?
>>>>>>> Specifically "data structure for provisioning and MSI"..
>>>>>> I meant:
>>>>>>
>>>>>> There's a chapter "Basic Facilities of a Virtio Device"; we can
>>>>>> introduce the concepts there like:
>>>>>>
>>>>>> 1) Managed device and Management device (terminology proposed by
>>>>>> Michael), and we can use PF and VF as an example
>>>>>>
>>>>>> 2) Managed device provisioning (the data structure to specify the
>>>>>> attributes of a managed device (VF))
>>>>>>
>>>>>> 3) MSI
>>>>>>
>>>>> The above is a good idea. Will revisit v2 if it is not arranged this way.
>>>> Let me make sure I understand: you would like to see a new chapter under
>>>> "Basic Facilities of a Virtio Device" that is
>>>>
>>>> called "Device management", and this chapter will explain the concept
>>>> in a few words
>>> Yes.
>>>
>>>> and it will point to another chapter under "Basic Facilities
>>>> of a Virtio Device"
>>>>
>>>> that was introduced here, "Admin Virtqueues"?
>>> So far as I see from the proposal, it needs to belong to the PCI transport
>>> part or a new transport.
>> No it's not.
>>
>> It should stay in the basic/generic area like we discussed in the past
>> and already agreed on.
>>
>> Let's move forward please.
> Yes, for the general admin virtqueue part, it should be fine, but for
> the SR-IOV ATTRS part, is it better to move it to the PCI transport?

Did you see V2?

I've added a chapter in the PCI transport part for PCI-specific admin capabilities.

And we have the "Admin Virtqueues" section to have a common place for 
all admin opcodes and cmd structure (optional and mandatory).
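
To make that concrete, here is a rough sketch of what a generic command
on the AQ could look like. All names and the exact layout below are
illustrative guesses, not taken from the v2 draft:

  #include <stdint.h>

  /* Sketch only: an admin command is a descriptor chain with a
   * device-readable part (header + request data) followed by a
   * device-writable part (response data + status). */
  struct admin_cmd_hdr {          /* device-readable */
          uint16_t opcode;        /* which admin command */
          uint16_t reserved;
          uint32_t target_id;     /* e.g. a VF number within the PF's group */
          /* opcode-specific request data follows */
  };

  struct admin_cmd_status {       /* device-writable, last in the chain */
          uint8_t status;         /* 0 = OK, otherwise an error code */
          uint8_t reserved[7];
  };

Mandatory opcodes would then be the ones every VIRTIO_F_ADMIN_VQ device
must parse, and the optional ones would be discoverable per device.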

>
>>>> So you do agree that managing a managed device (create/destroy/setup/etc...)
>>>> will be done using the AQ of the managing device ?
>>> I agree.
>>>
>>> Thanks
>> Ok so I guess we agree on the concept of this patch set and the AQ.
> Yes.
>
> Thanks
>
>> Thanks.
>>
>>>>>> And then we can introduce the admin virtqueue in either
>>>>>>
>>>>>> 1) transport part
>>>>>>
>>>>>> or
>>>>>>
>>>>>> 2) PCI transport
>>>>>>
>>>>> It is not specific to the PCI transport, and currently it is not a transport either.
>>>>> So the admin queue will be kept as a general entity for admin work.
>>>>>
>>>>>> In the admin virtqueue, there will be commands to provision and
>>>>>> configure MSI.
>>>>>>
>>>>> Please review v2 if it is not arranged this way.
>>>>>
>>>>>>>> That leaves the chance for future extensions to allow those features to
>>>>>>>> be used by a transport-specific interface, which will benefit
>>>>>>>>
>>>>>>> AQ allows communication (command, response) between driver and device
>>>>>> in a transport-independent way.
>>>>>>> Sometimes it queries/sets transport-specific fields like the MSI-X vectors of a VF.
>>>>>>> Sometimes the device configures its own IMS interrupt.
>>>>>>> Something else in the future.
>>>>>>> So it is really a generic request-response queue.
>>>>>> I agree, but I think we can't mandate new features to a specific transport.
>>>>>>
>>>>> Certainly. The admin queue is transport independent.
>>>>> PCI MSI-X configuration is a PCI-transport-specific command, so its structures are defined accordingly.
>>>>> It is similar to struct virtio_pci_cap, struct virtio_pci_common_cfg etc.
>>>>>
>>>>> Any other transport will have transport specific interrupt configuration. So it will be defined accordingly whenever that occurs.
>>>>> For example, IMS for VF or IMS for SF.


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  9:07                                               ` Jason Wang
@ 2022-01-26  9:47                                                 ` Parav Pandit
  0 siblings, 0 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-26  9:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Max Gurtovoy, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha



> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, January 26, 2022 2:37 PM
> 
> On Wed, Jan 26, 2022 at 4:09 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, January 26, 2022 12:24 PM
> > >
> > > > >
> > > > > 1) the vendor and transport that doesn't want to use admin
> > > > > virtqueue
> > > > It is not the choice of vendor to use admin virtqueue or not. It
> > > > is the spec
> > > definition.
> > >
> > > We are discussing the proposal which hasn't been a part of the spec right
> > > now.
> > > It doesn't mean we can't do better.
> > Sure. It's the proposal we are discussing, but I fail to understand why a vendor
> > wouldn't want to use an admin queue, such that it needs to be detached.
> 
> What do you mean by "detached"?
Detaching the mgmt cmd from the admin queue.

> >
> > A descriptor in a virtqueue represents a wide range of things:
> > 1. sometimes sending a packet,
> > 2. sometimes configuring a VLAN,
> > 3. sometimes adding a MAC to a table,
> > 4. sometimes adding a crypto session,
> > 5. and now sometimes managing a managed VF or SF device.
> >
> > Item #5 is no different from #1 to #4.
> > Managed-device configuration != "generic configure the device".
> 
> Well, I think then you need to explain this well in the patch.
> 
Did you get a chance to review the v2 commit log?
It describes the motivation for the AQ.

> So a virtqueue is needed only when DMA is required. That is the case for
> virtio_fs_req.
> 
> For other cases where DMA is not a must, it could be a queue interface or not.
It is not only the DMA, it is the ability to enqueue multiple requests to the device.

> > If you want to attribute it as a "mandate" then yes, it is no different
> > than how virtio_fs_req is mandated on the request vq and
> > virtio_crypto_op_ctrl_req is mandated on the control vq.
> >
> > What prevents both examples above from using the AQ in a generic manner?
> 
> Nothing, but for simple features like MSI I don't think we can exclude the
> possibility of using dedicated registers. For a simple setup like a guest that is
> using an MMIO device, using a new virtqueue just for MSI seems an odd burden.
You are mixing the MSI configuration done by the guest driver with the AQ.
The guest driver can never use the AQ to configure its own vectors, because that must happen before the AQ is created.
So the AQ is not useful for that purpose anyway.
And how many vectors an MMIO device gets can always be configured by the HV over the AQ.
Even for the simple setup.
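
To illustrate that split with a rough HV-side sketch (every identifier
below is hypothetical, invented for illustration; only the ordering
constraint comes from the discussion above):

  #include <stddef.h>
  #include <stdint.h>

  struct pf_aq;   /* opaque handle to the PF's admin queue (assumed) */
  /* Assumed helper: post one buffer on the AQ and wait for completion. */
  int aq_submit_and_wait(struct pf_aq *aq, const void *buf, size_t len);

  struct vf_msix_set_cmd {
          uint16_t opcode;        /* hypothetical "set VF MSI-X count" opcode */
          uint16_t vf_id;         /* which VF of this PF */
          uint16_t msix_count;    /* number of vectors to provision */
  };

  /* Runs in the HV, before the VF is bound to any driver. The guest
   * later programs the vectors it was given through the ordinary
   * transport registers, never through this AQ. */
  int provision_vf_msix(struct pf_aq *aq, uint16_t vf_id, uint16_t count)
  {
          struct vf_msix_set_cmd cmd = {
                  .opcode = 0x1,  /* placeholder opcode value */
                  .vf_id = vf_id,
                  .msix_count = count,
          };
          return aq_submit_and_wait(aq, &cmd, sizeof(cmd));
  }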

> To clarify, we need to define what exactly management means. E.g. is IMS
> considered to be mgmt or control?
Which IMS? Please be specific:
(a) IMS vector configuration before queue configuration by the guest driver?
Or 
(b) IMS vector provisioning for a VF or SF by its parent (management) device, the PF?

#a above falls in the configuration area and needs transport-specific registers.
#b above falls in the mgmt bucket.

> > > > Michael also responded that device configuration will not end at
> > > > msix
> > > configuration.
> > > > So adding more and more tiny capabilities for each configuration
> > > > doesn't
> > > scale either.
> > > >
> > You need to ack the above point, otherwise we will keep discussing this
> > even 2 weeks from now. :)
> 
> I think I've acknowledged the idea of using an admin
> virtqueue in another thread.
> 
Ok. I am now clear that we have your ack to use the AQ for MSI-X provisioning by the HV PF.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-25 12:09                                 ` Michael S. Tsirkin
@ 2022-01-26 13:29                                   ` Parav Pandit
  2022-01-26 14:11                                     ` Michael S. Tsirkin
  2022-01-28  4:35                                     ` Jason Wang
  0 siblings, 2 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-26 13:29 UTC (permalink / raw)
  To: Michael S. Tsirkin, Max Gurtovoy
  Cc: Jason Wang, cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Tuesday, January 25, 2022 5:39 PM

> >
> > So you do agree that managing a managed device (create/destroy/setup/etc...)
> > will be done using the AQ of the managing device ?
> 
> I think Jason asked that the management commands are split from the queue
> itself, such that they can be implemented in more ways down the road.

Admin commands involve DMA, and multiple commands can be enqueued to the device.
I don't see any construct in the spec other than a vq that enables the driver and device to achieve both.
Hence splitting the commands from the queue would be a very odd thing for the spec to do.

A recent addition like virtio_fs_req didn’t adopt this suggestion.

If we just want to split so that admin commands can be transported via a non-admin queue, we should remove the line below from the spec:

"The Admin command set defines the commands that may be issued only to the admin virtqueue"

And reword it as below,

The Admin command set defines the commands to manipulate various features/attributes of another device within the same group...
When the VIRTIO_F_ADMIN_VQ feature is negotiated, admin commands must be transported through the admin queue.

Looks ok?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26 13:29                                   ` Parav Pandit
@ 2022-01-26 14:11                                     ` Michael S. Tsirkin
  2022-01-27  3:49                                       ` Parav Pandit
  2022-01-28  4:35                                     ` Jason Wang
  1 sibling, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-26 14:11 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, Jason Wang, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha

On Wed, Jan 26, 2022 at 01:29:27PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Tuesday, January 25, 2022 5:39 PM
> 
> > >
> > > So you do agree that managing a managed device (create/destroy/setup/etc...)
> > > will be done using the AQ of the managing device ?
> > 
> > I think Jason asked that the management commands are split from the queue
> > itself, such that they can be implemented in more ways down the road.
> 
> Admin commands involve DMA, and multiple commands can be enqueued to the device.
> I don't see any construct in the spec other than a vq that enables the driver and device to achieve both.
> Hence splitting the commands from the queue would be a very odd thing for the spec to do.

 VIRTIO_PCI_CAP_SHARED_MEMORY_CFG can do this.
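
For reference, a shared memory region is advertised through the generic
vendor capability plus a 64-bit offset/length extension, roughly as laid
out in the spec's PCI transport chapter (the typedefs below stand in for
the spec's byte and little-endian types):

  #include <stdint.h>

  typedef uint8_t  u8;    /* spec's byte type */
  typedef uint32_t le32;  /* spec's little-endian 32-bit type */

  struct virtio_pci_cap {
          u8 cap_vndr;     /* generic PCI field: PCI_CAP_ID_VNDR */
          u8 cap_next;     /* generic PCI field: next capability */
          u8 cap_len;      /* capability length */
          u8 cfg_type;     /* VIRTIO_PCI_CAP_SHARED_MEMORY_CFG here */
          u8 bar;          /* which BAR holds the region */
          u8 id;           /* distinguishes multiple shared memory regions */
          u8 padding[2];
          le32 offset;     /* low 32 bits of offset within the BAR */
          le32 length;     /* low 32 bits of region length */
  };

  struct virtio_pci_cap64 {
          struct virtio_pci_cap cap;
          le32 offset_hi;  /* high 32 bits of offset */
          le32 length_hi;  /* high 32 bits of length */
  };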


> A recent addition like virtio_fs_req didn’t adopt this suggestion.
> 
> If we just want to split so that admin commands can be transported via a non-admin queue, we should remove the line below from the spec:
> 
> "The Admin command set defines the commands that may be issued only to the admin virtqueue"

sure

> And reword it as below,
> 
> The Admin command set defines the commands to manipulate various features/attributes of another device within the same group...
> When the VIRTIO_F_ADMIN_VQ feature is negotiated, admin commands must be transported through the admin queue.
> 
> Looks ok?

I see no problem with that.

-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26  9:45                                       ` Max Gurtovoy
@ 2022-01-27  3:46                                         ` Jason Wang
  0 siblings, 0 replies; 110+ messages in thread
From: Jason Wang @ 2022-01-27  3:46 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Parav Pandit, Michael S. Tsirkin, cohuck, virtio-dev,
	Shahaf Shuler, Oren Duer, stefanha

On Wed, Jan 26, 2022 at 5:46 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
>
>
> On 1/26/2022 11:34 AM, Jason Wang wrote:
> > On Wed, Jan 26, 2022 at 5:27 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> >>
> >> On 1/26/2022 9:03 AM, Jason Wang wrote:
> >>> On Tue, Jan 25, 2022 at 6:59 PM Max Gurtovoy <mgurtovoy@nvidia.com> wrote:
> >>>> On 1/25/2022 5:52 AM, Parav Pandit wrote:
> [...]
> >>> So far as I see from the proposal, it needs to belong to the PCI transport
> >>> part or a new transport.
> >> No it's not.
> >>
> >> It should stay in the basic/generic area like we discussed in the past
> >> and already agreed on.
> >>
> >> Let's move forward please.
> > Yes, for the general admin virtqueue part, it should be fine, but for
> > SR-IOV ATTRS part, is it better to move it to PCI transport?
>
> Did you see V2?

Not yet, as of the time of my last reply.

>
> I've added a chapter in the PCI transport part for PCI-specific admin capabilities.

But I still see stuff like PF/VF in the general chapter.

Do we need to

1) use more general terminology

or

2) move those to the PCI part?

Thanks

> [...]


^ permalink raw reply	[flat|nested] 110+ messages in thread

* RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26 14:11                                     ` Michael S. Tsirkin
@ 2022-01-27  3:49                                       ` Parav Pandit
  2022-01-27 13:05                                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 110+ messages in thread
From: Parav Pandit @ 2022-01-27  3:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, Jason Wang, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, January 26, 2022 7:42 PM
> 
> On Wed, Jan 26, 2022 at 01:29:27PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Tuesday, January 25, 2022 5:39 PM
> >
> > > >
> > > > So you do agree that managing a managed device
> > > > (create/destroy/setup/etc...) will be done using the AQ of the managing
> > > > device ?
> > >
> > > I think Jason asked that the management commands are split from the
> > > queue itself, such that they can be implemented in more ways down the
> > > road.
> >
> > Admin commands involve DMA, and multiple commands can be
> > enqueued to the device.
> > I don't see any construct in the spec other than a vq that enables the driver
> > and device to achieve both.
> > Hence splitting the commands from the queue would be a very odd thing for the spec to do.
> 
>  VIRTIO_PCI_CAP_SHARED_MEMORY_CFG can do this.
>
The device always needs to have the shared memory region available for the driver to read.
The driver still needs to copy data from the shared region to its own memory.
How can DMA be done to the host driver's PA using a shared memory region?

In the future when SIOV arrives, each new SIOV SF/ADI will require a new window in the shared regions, exposed statically at PCI discovery time.
So it doesn't scale.
  
> 
> > A recent addition like virtio_fs_req didn’t adopt this suggestion.
> >
> > If we just want to split so that admin commands can be transported via a
> > non-admin queue, we should remove the line below from the spec:
> >
> > "The Admin command set defines the commands that may be issued only to
> the admin virtqueue"
> 
> sure
> 
> > And reword it as below,
> >
> > The Admin command set defines the commands to manipulate various
> features/attributes of another device within the same group...
> > When the VIRTIO_F_ADMIN_VQ feature is negotiated, admin commands must
> > be transported through the admin queue.
> >
> > Looks ok?
> 
> I see no problem with that.

Since we agree on this, can I consider AQ vs. no-AQ a closed topic?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-27  3:49                                       ` Parav Pandit
@ 2022-01-27 13:05                                         ` Michael S. Tsirkin
  2022-01-27 13:25                                           ` [virtio-dev] " Parav Pandit
  0 siblings, 1 reply; 110+ messages in thread
From: Michael S. Tsirkin @ 2022-01-27 13:05 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Max Gurtovoy, Jason Wang, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha

On Thu, Jan 27, 2022 at 03:49:58AM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, January 26, 2022 7:42 PM
> > 
> > On Wed, Jan 26, 2022 at 01:29:27PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: Tuesday, January 25, 2022 5:39 PM
> > >
> > > > >
> > > > > So you do agree that managing a managed device
> > > > > (create/destroy/setup/etc...) will be done using the AQ of the managing
> > > > > device ?
> > > >
> > > > I think Jason asked that the management commands are split from the
> > > > queue itself, such that they can be implemented in more ways down the
> > > > road.
> > >
> > > Admin commands involve DMA, and multiple commands can be
> > > enqueued to the device.
> > > I don't see any construct in the spec other than a vq that enables the driver
> > > and device to achieve both.
> > > Hence splitting the commands from the queue would be a very odd thing for the spec to do.
> > 
> >  VIRTIO_PCI_CAP_SHARED_MEMORY_CFG can do this.
> >
> The device always needs to have the shared memory region available for the driver to read.
> The driver still needs to copy data from the shared region to its own memory.
> How can DMA be done to the host driver's PA using a shared memory region?

It can't, but you don't need DMA if the device is writing its own memory.

> In the future when SIOV arrives, each new SIOV SF/ADI will require a new
> window in the shared regions, exposed statically at PCI discovery
> time.

Dunno if it's static. But we are talking about the AQ thing, which
is just per group, not per SF, so it might not be too bad.

> So it doesn't scale.


Maybe. You said a vq is the only way in the spec to queue commands to the
device; I just responded with another way. No need to code that up right now,
but it is something to keep in mind; things are not absolute here.

> > 
> > > A recent addition like virtio_fs_req didn’t adopt this suggestion.
> > >
> > > If we just want to split so that admin commands can be transported via non
> > admin queue, we should remove below line from spec:
> > >
> > > "The Admin command set defines the commands that may be issued only to
> > the admin virtqueue"
> > 
> > sure
> > 
> > > And reword it as below,
> > >
> > > The Admin command set defines the commands to manipulate various
> > features/attributes of another device within the same group...
> > > When VIRTIO_F_ADMIN_VQ feature is negotiated, admin commands must
> > be transported through the admin queue.
> > >
> > > Looks ok?
> > 
> > I see no problem with that.
> 
> Since we agree for this, I considered AQ vs no_AQ as closed topic?

It's always hard to answer questions like this when all I have to go on
is a vague idea of what you mean.  Based on the discussion, I am guessing
that your v3 will use the VQ in a way that is not controversial, if that
is what you are asking about.
-- 
MST


^ permalink raw reply	[flat|nested] 110+ messages in thread

* [virtio-dev] RE: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-27 13:05                                         ` Michael S. Tsirkin
@ 2022-01-27 13:25                                           ` Parav Pandit
  0 siblings, 0 replies; 110+ messages in thread
From: Parav Pandit @ 2022-01-27 13:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Max Gurtovoy, Jason Wang, cohuck, virtio-dev, Shahaf Shuler,
	Oren Duer, stefanha


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, January 27, 2022 6:36 PM
> 
> On Thu, Jan 27, 2022 at 03:49:58AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, January 26, 2022 7:42 PM
> > >
> > > On Wed, Jan 26, 2022 at 01:29:27PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Tuesday, January 25, 2022 5:39 PM
> > > >
> > > > > >
> > > > > > So you do agree that managing a managed device
> > > > > > (create/destroy/setup/etc...) will be done using the AQ of the
> > > > > > managing device ?
> > > > >
> > > > > I think Jason asked that the management commands are split from
> > > > > the queue itself, such that they can be implemented in more ways
> > > > > down the
> > > > > road.
> > > >
> > > > Admin commands involve DMA, and multiple commands can be
> > > > enqueued to the device.
> > > > I don't see any construct in the spec other than a vq that
> > > > enables the driver and device to achieve both.
> > > > Hence splitting the commands from the queue would be a very odd thing for the spec to do.
> > >
> > >  VIRTIO_PCI_CAP_SHARED_MEMORY_CFG can do this.
> > >
> > The device always needs to have the shared memory region available
> > for the driver to read.
> > The driver still needs to copy data from the shared region to its own memory.
> > How can DMA be done to the host driver's PA using a shared memory region?
> 
> It can't, but you don't need DMA if the device is writing its own memory.
And sometimes the host also needs to write. So DMA is required.

> 
> > In the future when SIOV arrives, each new SIOV SF/ADI will require a new
> > window in the shared regions, exposed statically at PCI discovery
> > time.
> 
> Dunno if it's static. But we are talking about the AQ thing, which is just per group,
> not per SF, so it might not be too bad.
I must repeat that it requires memory to be always available at scale.
For example, if a virtio device's state requires 1 Kbyte of metadata, you are asking the device to implement 1 Mbyte of memory accessible within the PCI read latency window.
This is a huge waste of device resources for querying objects that are used only at mgmt time.
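
As a back-of-the-envelope (the ~1K device count is my assumption, chosen
to make the 1K/1M figures above line up):

  1 KiB of state per managed device x ~1024 managed devices = ~1 MiB

of on-device memory that must stay readable at MMIO-read latency at all
times, even though it is only touched during management operations.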

If you were thinking of using this shared memory region as a small region serving as a command queuing interface, then probably yes, but then it starts looking like the spec's well-defined VQ object.
And at that point, reinventing the VQ on top of shared memory just doesn't make much sense.

> > Since we agree for this, I considered AQ vs no_AQ as closed topic?
> 
> It's always hard to answer questions like this when all I have to go on is a vague
> idea of what you mean. 
Which area is still vague? Can you please be specific, so that it can be covered in v3?
We answered all, or almost all, of the queries.
We addressed the comments you requested in v2.
Did you get a chance to review v2?

Jason's comment about splitting the admin commands from the admin queue is the major item to fix in v3.
Documenting sections differently, some in the transport-specific area, is another fix to happen in v3.

In v3, if we are again going to discuss AQ vs. config MMIO region vs. shared region, then we should discuss that now, before posting v3.
So please let me know if this area is vague, or if something else is.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER
  2022-01-26 13:29                                   ` Parav Pandit
  2022-01-26 14:11                                     ` Michael S. Tsirkin
@ 2022-01-28  4:35                                     ` Jason Wang
  1 sibling, 0 replies; 110+ messages in thread
From: Jason Wang @ 2022-01-28  4:35 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin, Max Gurtovoy
  Cc: cohuck, virtio-dev, Shahaf Shuler, Oren Duer, stefanha


On 2022/1/26 9:29 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Tuesday, January 25, 2022 5:39 PM
>>> So you do agree that managing a managed device (create/destroy/setup/etc...)
>>> will be done using the AQ of the managing device ?
>> I think Jason asked that the management commands are split from the queue
>> itself, such that they can be implemented in more ways down the road.
> Admin commands involve DMA, and multiple commands can be enqueued to the device.
> I don't see any construct in the spec other than a vq that enables the driver and device to achieve both.
> Hence splitting the commands from the queue would be a very odd thing for the spec to do.


I'd say it really depends on the transport. Not all transports use
registers. There could be transports that:

1) use DMA (e.g. CCW)

2) use transport specific queue

3) don't need DMA at all (e.g. using shared memory like virtio-rproc)

Having another VQ seems redundant.

Thanks


>
> A recent addition like virtio_fs_req didn’t adopt this suggestion.
>
> If we just want to split so that admin commands can be transported via a non-admin queue, we should remove the line below from the spec:
>
> "The Admin command set defines the commands that may be issued only to the admin virtqueue"
>
> And reword it as below,
>
> The Admin command set defines the commands to manipulate various features/attributes of another device within the same group...
> When the VIRTIO_F_ADMIN_VQ feature is negotiated, admin commands must be transported through the admin queue.
>
> Looks ok?


^ permalink raw reply	[flat|nested] 110+ messages in thread

end of thread, other threads:[~2022-01-28  4:35 UTC | newest]

Thread overview: 110+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13 14:50 [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Max Gurtovoy
2022-01-13 14:50 ` [PATCH 1/5] Add virtio Admin Virtqueue specification Max Gurtovoy
2022-01-13 17:53   ` Michael S. Tsirkin
2022-01-17  9:56     ` Max Gurtovoy
2022-01-17 21:30       ` Michael S. Tsirkin
2022-01-18  3:22         ` Parav Pandit
2022-01-18  6:17           ` Michael S. Tsirkin
2022-01-18  7:57             ` Parav Pandit
2022-01-18  8:05               ` Michael S. Tsirkin
2022-01-18  8:23                 ` Parav Pandit
2022-01-18 10:26                   ` Michael S. Tsirkin
2022-01-18 10:30                     ` Parav Pandit
2022-01-18 10:41                       ` Michael S. Tsirkin
2022-01-19  3:04         ` Jason Wang
2022-01-19  8:11           ` Michael S. Tsirkin
2022-01-25  3:35             ` Jason Wang
2022-01-17 14:12     ` Parav Pandit
2022-01-17 22:03       ` Michael S. Tsirkin
2022-01-18  3:36         ` Parav Pandit
2022-01-18  7:07           ` Michael S. Tsirkin
2022-01-18  7:14             ` Parav Pandit
2022-01-18  7:20               ` Michael S. Tsirkin
2022-01-19 11:33                 ` Max Gurtovoy
2022-01-19 12:21                   ` Parav Pandit
2022-01-19 14:47                     ` Max Gurtovoy
2022-01-19 15:38                       ` Michael S. Tsirkin
2022-01-19 15:47                         ` Max Gurtovoy
2022-01-13 14:51 ` [PATCH 2/5] Introduce VIRTIO_F_ADMIN_VQ_INDIRECT_DESC/VIRTIO_F_ADMIN_VQ_IN_ORDER Max Gurtovoy
2022-01-13 15:33   ` Michael S. Tsirkin
2022-01-13 17:07     ` Max Gurtovoy
2022-01-13 17:25       ` Michael S. Tsirkin
2022-01-17 13:59         ` Parav Pandit
2022-01-17 22:14           ` Michael S. Tsirkin
2022-01-18  4:44             ` Parav Pandit
2022-01-18  6:23               ` Michael S. Tsirkin
2022-01-18  6:32                 ` Parav Pandit
2022-01-18  6:54                   ` Michael S. Tsirkin
2022-01-18  7:07                     ` Parav Pandit
2022-01-18  7:12                       ` Michael S. Tsirkin
2022-01-18  7:30                         ` Parav Pandit
2022-01-18  7:40                           ` Michael S. Tsirkin
2022-01-19  4:21                             ` Jason Wang
2022-01-19  9:30                               ` Michael S. Tsirkin
2022-01-25  3:39                                 ` Jason Wang
2022-01-18 10:38                           ` Michael S. Tsirkin
2022-01-18 10:50                             ` Parav Pandit
2022-01-18 15:09                               ` Michael S. Tsirkin
2022-01-18 17:17                                 ` Parav Pandit
2022-01-19  7:20                                   ` Michael S. Tsirkin
2022-01-19  8:15                                     ` [virtio-dev] " Parav Pandit
2022-01-19  8:21                                       ` Michael S. Tsirkin
2022-01-19 10:10                                         ` Parav Pandit
2022-01-19 16:40                                           ` Michael S. Tsirkin
2022-01-19 17:07                                             ` Parav Pandit
2022-01-18  7:13                       ` Michael S. Tsirkin
2022-01-18  7:21                         ` Parav Pandit
2022-01-18  7:37                           ` Michael S. Tsirkin
2022-01-19  4:03                       ` Jason Wang
2022-01-19  4:48                         ` Parav Pandit
2022-01-19 20:25                           ` Parav Pandit
2022-01-25  3:45                             ` Jason Wang
2022-01-25  4:07                               ` Parav Pandit
2022-01-25  3:29                           ` Jason Wang
2022-01-25  3:52                             ` Parav Pandit
2022-01-25 10:59                               ` Max Gurtovoy
2022-01-25 12:09                                 ` Michael S. Tsirkin
2022-01-26 13:29                                   ` Parav Pandit
2022-01-26 14:11                                     ` Michael S. Tsirkin
2022-01-27  3:49                                       ` Parav Pandit
2022-01-27 13:05                                         ` Michael S. Tsirkin
2022-01-27 13:25                                           ` [virtio-dev] " Parav Pandit
2022-01-28  4:35                                     ` Jason Wang
2022-01-26  7:03                                 ` Jason Wang
2022-01-26  9:27                                   ` Max Gurtovoy
2022-01-26  9:34                                     ` Jason Wang
2022-01-26  9:45                                       ` Max Gurtovoy
2022-01-27  3:46                                         ` Jason Wang
2022-01-26  5:04                               ` Jason Wang
2022-01-26  5:26                                 ` Parav Pandit
2022-01-26  5:45                                   ` Jason Wang
2022-01-26  5:58                                     ` Parav Pandit
2022-01-26  6:06                                       ` Jason Wang
2022-01-26  6:24                                         ` Parav Pandit
2022-01-26  6:54                                           ` Jason Wang
2022-01-26  8:09                                             ` Parav Pandit
2022-01-26  9:07                                               ` Jason Wang
2022-01-26  9:47                                                 ` Parav Pandit
2022-01-13 14:51 ` [PATCH 3/5] virtio-blk: add support for VIRTIO_F_ADMIN_VQ Max Gurtovoy
2022-01-13 18:24   ` Michael S. Tsirkin
2022-01-13 14:51 ` [PATCH 4/5] virtio-net: " Max Gurtovoy
2022-01-13 17:56   ` Michael S. Tsirkin
2022-01-16  9:47     ` Max Gurtovoy
2022-01-16 16:45       ` Michael S. Tsirkin
2022-01-17 14:07       ` Parav Pandit
2022-01-17 22:22         ` Michael S. Tsirkin
2022-01-18  2:18           ` Jason Wang
2022-01-18  5:25             ` Michael S. Tsirkin
2022-01-19  4:16               ` Jason Wang
2022-01-19  9:26                 ` Michael S. Tsirkin
2022-01-25  3:53                   ` Jason Wang
2022-01-25  7:19                     ` Michael S. Tsirkin
2022-01-26  5:49                       ` Jason Wang
2022-01-26  7:02                         ` Michael S. Tsirkin
2022-01-26  7:10                           ` Jason Wang
2022-01-13 14:51 ` [PATCH 5/5] Add support for dynamic MSI-X vector mgmt for VFs Max Gurtovoy
2022-01-13 18:20   ` Michael S. Tsirkin
2022-01-18 10:38   ` Michael S. Tsirkin
2022-01-13 18:32 ` [PATCH v1 0/5] VIRTIO: Provision maximum MSI-X vectors for a VF Michael S. Tsirkin
2022-01-17 10:00   ` Shahaf Shuler
2022-01-17 21:41     ` Michael S. Tsirkin
