* [virtio-dev] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-09-06  8:16 ` Zhu Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This series introduces:

1) a new SUSPEND bit in the device status, which is used to suspend
the device so that the device states and virtqueue states are
stabilized;

2) virtqueue state and its accessors, to get and set last_avail_idx
and last_used_idx of virtqueues.

The main use case of these new facilities is live migration.

Future work: dirty page tracking and in-flight descriptors.

This series addresses many comments on the RFC series from Jason,
Stefan and Eugenio.

Zhu Lingshan (5):
  virtio: introduce vq state as basic facility
  virtio: introduce SUSPEND bit in device status
  virtqueue: constraints for virtqueue state
  virtqueue: ignore resetting vqs when SUSPEND
  virtio-pci: implement VIRTIO_F_QUEUE_STATE

 content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
 transport-pci.tex |  18 +++++++
 2 files changed, 136 insertions(+)

-- 
2.35.3


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:16   ` Zhu Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This patch adds a new device facility to save and restore virtqueue
state. The virtqueue state is split into two parts:

- The available state: the state that is used to read the next
  available buffer.
- The used state: the state that is used to mark a buffer as used.

This simplifies the transport-specific method implementation
(e.g. two le16 fields can be used instead of a single le32). For a
split virtqueue, only the available state is needed, since the used
state is implemented in the virtqueue itself (the used index). For a
packed virtqueue, both the available state and the used state are
needed.

These states are required to implement live migration support for
virtio devices.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/content.tex b/content.tex
index 0a62dce..0e492cd 100644
--- a/content.tex
+++ b/content.tex
@@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
 types. It is RECOMMENDED that devices generate version 4
 UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
 
+\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
+
+When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
+get the device internal virtqueue state through the following
+fields. The implementation of the interfaces is transport specific.
+
+\subsection{\field{Available State} Field}
+
+The available state field is two bytes of virtqueue state that is used by
+the device to read the next available buffer.
+
+When VIRTIO_RING_F_PACKED is not negotiated, it contains:
+
+\begin{lstlisting}
+le16 last_avail_idx;
+\end{lstlisting}
+
+The \field{last_avail_idx} field is the free-running available ring
+index where the device will read the next available head of a
+descriptor chain.
+
+See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
+
+When VIRTIO_RING_F_PACKED is negotiated, it contains:
+
+\begin{lstlisting}
+le16 {
+  last_avail_idx : 15;
+  last_avail_wrap_counter : 1;
+};
+\end{lstlisting}
+
+The \field{last_avail_idx} field is the free-running location
+where the device reads the next descriptor from the virtqueue descriptor ring.
+
+The \field{last_avail_wrap_counter} field is the last driver ring wrap
+counter that was observed by the device.
+
+See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
+
+\subsection{\field{Used State} Field}
+
+The used state field is two bytes of virtqueue state that is used by
+the device when marking a buffer used.
+
+When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
+
+\begin{lstlisting}
+le16 {
+  used_idx : 15;
+  used_wrap_counter : 1;
+};
+\end{lstlisting}
+
+The \field{used_idx} field is the free-running location where the device writes the next
+used descriptor to the descriptor ring.
+
+The \field{used_wrap_counter} field is the wrap counter that is used
+by the device.
+
+See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
+
+When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
+is always 0.
+
 \input{admin.tex}
 
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
-- 
2.35.3




^ permalink raw reply related	[flat|nested] 445+ messages in thread
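The packed-virtqueue available state above squeezes a 15-bit index and a
1-bit wrap counter into one le16. A minimal C sketch of that encoding,
with illustrative helper names that are not part of the spec:

```c
#include <stdint.h>

/* Illustrative helpers (not spec text) for the packed-virtqueue
 * available state: bits 0..14 hold last_avail_idx, bit 15 holds
 * last_avail_wrap_counter, matching the le16 layout in the patch. */
static inline uint16_t vq_avail_state_pack(uint16_t last_avail_idx,
                                           unsigned wrap_counter)
{
    return (uint16_t)((last_avail_idx & 0x7fffu) |
                      ((wrap_counter & 1u) << 15));
}

static inline void vq_avail_state_unpack(uint16_t state,
                                         uint16_t *last_avail_idx,
                                         unsigned *wrap_counter)
{
    *last_avail_idx = state & 0x7fffu;   /* low 15 bits: ring index */
    *wrap_counter = state >> 15;         /* top bit: wrap counter */
}
```

The used state for packed virtqueues would pack identically; for split
virtqueues only the plain le16 last_avail_idx is needed.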


* [virtio-dev] [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:16   ` Zhu Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This patch introduces a new status bit in the device status: SUSPEND.

This SUSPEND bit can be used by the driver to suspend a device,
in order to stabilize the device states and virtqueue states.

Its main use case is live migration.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 content.tex | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/content.tex b/content.tex
index 0e492cd..0fab537 100644
--- a/content.tex
+++ b/content.tex
@@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
   drive the device.
 
+\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
+  device has been suspended by the driver.
+
 \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
   an error from which it can't recover.
 \end{description}
@@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 recover by issuing a reset.
 \end{note}
 
+The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
+
+After setting SUSPEND, the driver MUST re-read \field{device status} to verify that the SUSPEND bit is set.
+
 \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
 
 The device MUST NOT consume buffers or send any used buffer
@@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
 MUST send a device configuration change notification to the driver.
 
+The device MUST ignore SUSPEND if FEATURES_OK is not set.
+
+The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
+
+The device SHOULD allow writes to \field{device status} even when SUSPEND is set.
+
+If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
+and resume operation upon DRIVER_OK.
+
+If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
+the device SHOULD perform the following actions before presenting the SUSPEND bit in \field{device status}:
+
+\begin{itemize}
+\item Stop consuming buffers of any virtqueues and mark all finished descriptors as used.
+\item Wait for all descriptors being processed to finish and mark them as used.
+\item Flush all used buffers and send used buffer notifications to the driver.
+\item Record the Virtqueue State of each enabled virtqueue, see \ref{sec:Virtqueues / Virtqueue State}.
+\item Pause its operation except \field{device status} and preserve the configuration in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}.
+\end{itemize}
+
 \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
 
 Each virtio device offers all the features it understands.  During
@@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
 	handling features reserved for future use.
 
+  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
+   SUSPEND the device.
+   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
+
 \end{description}
 
 \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
-- 
2.35.3




^ permalink raw reply related	[flat|nested] 445+ messages in thread
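The driver-side sequence above (set SUSPEND only once FEATURES_OK is
set, then re-read the status until the bit is visible) can be sketched
against a toy device model. The mock latches SUSPEND immediately,
whereas a real device would first quiesce its virtqueues as listed in
the normative text; all names here are illustrative:

```c
#include <stdint.h>

/* Status bit values from the spec; SUSPEND (16) is the new bit. */
#define VIRTIO_STATUS_FEATURES_OK 8u
#define VIRTIO_STATUS_SUSPEND     16u

/* Toy device model standing in for transport register access. */
struct mock_dev { uint8_t status; };

static uint8_t read_status(struct mock_dev *d) { return d->status; }
static void write_status(struct mock_dev *d, uint8_t s) { d->status = s; }

/* Driver-side suspend per the normative text: do not set SUSPEND
 * before FEATURES_OK, then re-read until the device presents the bit. */
static int virtio_suspend(struct mock_dev *d)
{
    uint8_t s = read_status(d);

    if (!(s & VIRTIO_STATUS_FEATURES_OK))
        return -1;                  /* SHOULD NOT set SUSPEND yet */
    write_status(d, s | VIRTIO_STATUS_SUSPEND);
    while (!(read_status(d) & VIRTIO_STATUS_SUSPEND))
        ;                           /* re-read to confirm SUSPEND */
    return 0;
}
```

A real driver would bound the polling loop with a timeout, since the
device may take time to flush in-flight descriptors.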


* [virtio-dev] [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:16   ` Zhu Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This commit specifies the constraints of the virtqueue state,
and the actions to be taken by the device when SUSPEND
and DRIVER_OK are set.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 content.tex | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/content.tex b/content.tex
index 0fab537..9d727ce 100644
--- a/content.tex
+++ b/content.tex
@@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
 When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
 is always 0.
 
+\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
+
+If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED has not,
+the driver SHOULD NOT access \field{Used State} of any virtqueue; it SHOULD use the
+used index in the used ring instead.
+
+\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
+
+If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept writes to
+the Virtqueue State of a virtqueue when DRIVER_OK is not set in \field{device status},
+or when both DRIVER_OK and SUSPEND are set in \field{device status}.
+Otherwise the device MUST ignore any writes to the Virtqueue State of any virtqueue.
+
+If VIRTIO_F_QUEUE_STATE has been negotiated, when SUSPEND is set,
+the device MUST record the Virtqueue State of every enabled virtqueue
+in \field{Available State} and \field{Used State} respectively,
+and correspondingly restore the Virtqueue State of every enabled virtqueue
+from \field{Available State} and \field{Used State} when DRIVER_OK is set.
+
 \input{admin.tex}
 
 \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
-- 
2.35.3




^ permalink raw reply related	[flat|nested] 445+ messages in thread
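The acceptance rule above (state writes allowed before DRIVER_OK, or
while both DRIVER_OK and SUSPEND are set) reduces to a one-line
predicate. A device-side sketch, using the status bit values from the
spec; the function name is illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_DRIVER_OK 4u
#define VIRTIO_STATUS_SUSPEND   16u

/* Device-side check: a Virtqueue State write is accepted when
 * DRIVER_OK is not set, or when both DRIVER_OK and SUSPEND are set;
 * any other write is ignored per the normative text. */
static bool vq_state_write_allowed(uint8_t status)
{
    bool driver_ok = status & VIRTIO_STATUS_DRIVER_OK;
    bool suspended = status & VIRTIO_STATUS_SUSPEND;

    return !driver_ok || suspended;
}
```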


* [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:16   ` Zhu Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

When SUSPEND is set, the device should stabilize the device
states and virtqueue states, therefore the device should
ignore resetting vqs when SUSPEND is set in device status.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 content.tex | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/content.tex b/content.tex
index 9d727ce..cd2b426 100644
--- a/content.tex
+++ b/content.tex
@@ -443,6 +443,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
 The device MUST reset any state of a virtqueue to the default state,
 including the available state and the used state.
 
+If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in \field{device status},
+the device SHOULD ignore resetting any virtqueues.
+
 \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
 
 After the driver tells the device to reset a queue, the driver MUST verify that
-- 
2.35.3


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 445+ messages in thread
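A device-side sketch of the rule above, with an illustrative mock
virtqueue: a reset request arriving while SUSPEND is set is ignored, so
the recorded virtqueue state stays stable for migration. Names and the
return convention are assumptions, not spec text:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_STATUS_SUSPEND 16u

struct mock_vq {
    uint16_t last_avail_idx;
    bool enabled;
};

/* Handle a driver-requested virtqueue reset. Returns true if the
 * reset was performed, false if it was ignored because the device is
 * suspended. */
static bool handle_vq_reset(uint8_t status, struct mock_vq *vq)
{
    if (status & VIRTIO_STATUS_SUSPEND)
        return false;           /* SHOULD ignore the reset */
    vq->last_avail_idx = 0;     /* reset to the default state */
    vq->enabled = false;
    return true;
}
```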


* [virtio-dev] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:16   ` Zhu Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu Lingshan @ 2023-09-06  8:16 UTC (permalink / raw)
  To: jasowang, mst, eperezma, cohuck, stefanha
  Cc: virtio-comment, virtio-dev, Zhu Lingshan

This patch adds two new le16 fields to the common configuration
structure to support VIRTIO_F_QUEUE_STATE in the PCI transport layer.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
---
 transport-pci.tex | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..3161519 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
         /* About the administration virtqueue. */
         le16 admin_queue_index;         /* read-only for driver */
         le16 admin_queue_num;         /* read-only for driver */
+
+        /* Virtqueue state */
+        le16 queue_avail_state;         /* read-write */
+        le16 queue_used_state;          /* read-write */
 };
 \end{lstlisting}
 
@@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
 	The value 0 indicates no supported administration virtqueues.
 	This field is valid only if VIRTIO_F_ADMIN_VQ has been
 	negotiated.
+
+\item[\field{queue_avail_state}]
+        This field is valid only if VIRTIO_F_QUEUE_STATE has been
+        negotiated. The driver sets and gets the available state of
+        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
+
+\item[\field{queue_used_state}]
+        This field is valid only if VIRTIO_F_QUEUE_STATE has been
+        negotiated. The driver sets and gets the used state of the
+        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
+
 \end{description}
 
 \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
@@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
 present either a value of 0 or a power of 2 in
 \field{queue_size}.
 
+If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
+any accesses to \field{queue_avail_state} and \field{queue_used_state}.
+
 If VIRTIO_F_ADMIN_VQ has been negotiated, the value
 \field{admin_queue_index} MUST be equal to, or bigger than
 \field{num_queues}; also, \field{admin_queue_num} MUST be
-- 
2.35.3




^ permalink raw reply related	[flat|nested] 445+ messages in thread
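For illustration only, the two new fields proposed above can be mirrored on the driver side roughly as follows. This is a hypothetical C sketch, not part of the patch: the struct shows only the tail of the common configuration structure, the helper names are ours, and a real driver would first select the queue via queue_select and go through little-endian MMIO accessors rather than plain loads and stores.

```c
#include <stdint.h>

/* Hypothetical mirror of the tail of the proposed common configuration
 * structure; layout is illustrative, not normative. */
struct common_cfg_tail {
    uint16_t admin_queue_index;   /* read-only for driver */
    uint16_t admin_queue_num;     /* read-only for driver */
    uint16_t queue_avail_state;   /* read-write, valid iff VIRTIO_F_QUEUE_STATE */
    uint16_t queue_used_state;    /* read-write, valid iff VIRTIO_F_QUEUE_STATE */
};

/* Per-virtqueue state snapshot as the series defines it. */
struct vq_state {
    uint16_t avail_state;   /* last_avail_idx (+ wrap bit when packed) */
    uint16_t used_state;    /* used_idx (+ wrap bit when packed); 0 for split */
};

/* Illustrative save/restore helpers (names are ours). */
static void vq_state_save(const struct common_cfg_tail *cfg,
                          struct vq_state *s)
{
    s->avail_state = cfg->queue_avail_state;
    s->used_state  = cfg->queue_used_state;
}

static void vq_state_restore(struct common_cfg_tail *cfg,
                             const struct vq_state *s)
{
    cfg->queue_avail_state = s->avail_state;
    cfg->queue_used_state  = s->used_state;
}
```

In a live-migration flow, the source would call the save helper per queue after the device is quiesced, and the destination would call the restore helper before re-enabling the queues.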


* [virtio-dev] Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:28     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:28 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
> This patch adds new device facility to save and restore virtqueue
> state. The virtqueue state is split into two parts:
> 
> - The available state: The state that is used to read the next
>   available buffer.
> - The used state: The state that is used to mark a buffer as used.
> 
> This will simplify the transport-specific method implementation; e.g. two
> le16 fields can be used instead of a single le32. For split virtqueue, we
> only need the available state since the used state is implemented in
> the virtqueue itself (the used index). For packed virtqueue, we need
> both the available state and the used state.
> 
> Those states are required to implement live migration support for
> virtio device.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0a62dce..0e492cd 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  types. It is RECOMMENDED that devices generate version 4
>  UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>  
> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
> +
> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
> +get the device internal virtqueue state through the following
> +fields. The implementation of the interfaces is transport specific.


Virtqueue state cannot, in general, be described by two 16-bit
indices.

Consider an example: these buffers available: A B C D
After device used descriptors A and C, what is its state and
how do you describe it using a single index?


> +
> +\subsection{\field{Available State} Field}
> +
> +The available state field is two bytes of virtqueue state that is used by
> +the device to read the next available buffer.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 last_avail_idx;
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running available ring
> +index where the device will read the next available head of a
> +descriptor chain.

next after what?

> +
> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
> +
> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  last_avail_idx : 15;
> +  last_avail_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running location
> +where the device reads the next descriptor from the virtqueue descriptor ring.
> +
> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
> +counter that was observed by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
> +
> +\subsection{\field{Used State} Field}
> +
> +The used state field is two bytes of virtqueue state that is used by
> +the device when marking a buffer used.
> +
> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  used_idx : 15;
> +  used_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{used_idx} field is the free-running location where the device writes the next
> +used descriptor to the descriptor ring.

I don't see what good this used_idx does: used descriptors are written in
order, so why not just check which ones are valid?
And the driver does of course know the used_wrap_counter,
otherwise it can't work.
Or is this for some kind of split driver setup where looking at the ring
is impossible?


> +
> +The \field{used_wrap_counter} field is the wrap counter that is used
> +by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> +is always 0.
> +
>  \input{admin.tex}
>  
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> -- 
> 2.35.3
> 
> 




^ permalink raw reply	[flat|nested] 445+ messages in thread
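The 15-bit index plus 1-bit wrap counter layout quoted in the review above can be modeled with a few helpers. This is a sketch, not spec text: the helper and macro names are ours; the only assumption taken from the quoted le16 layout is that the wrap counter occupies bit 15 and the free-running index the low 15 bits.

```c
#include <stdint.h>

/* Packed-ring state word: low 15 bits are the free-running index,
 * bit 15 is the ring wrap counter (names are ours). */
#define VQ_STATE_IDX_MASK   0x7fffu
#define VQ_STATE_WRAP_BIT   (1u << 15)

/* Combine an index and a wrap counter into one state word. */
static inline uint16_t vq_state_pack(uint16_t idx, int wrap_counter)
{
    return (uint16_t)((idx & VQ_STATE_IDX_MASK) |
                      (wrap_counter ? VQ_STATE_WRAP_BIT : 0));
}

/* Extract the 15-bit free-running index. */
static inline uint16_t vq_state_idx(uint16_t state)
{
    return (uint16_t)(state & VQ_STATE_IDX_MASK);
}

/* Extract the 1-bit wrap counter. */
static inline int vq_state_wrap(uint16_t state)
{
    return !!(state & VQ_STATE_WRAP_BIT);
}
```

When VIRTIO_RING_F_PACKED is not negotiated, the wrap bit is unused and the state word degenerates to a plain 16-bit index, which matches the split-ring `last_avail_idx` case in the quoted patch.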


* [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:29   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:29 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> This series introduces
> 1)a new SUSPEND bit in the device status
> Which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> The main usecase of these new facilities is Live Migration.
> 
> Future work: dirty page tracking and in-flight descriptors.

Oh, that answers my question: it's not covered.
I don't think we can merge this without in-flight descriptor
support.



> This series addresses many comments from Jason, Stefan and Eugenio
> from RFC series.
> 
> Zhu Lingshan (5):
>   virtio: introduce vq state as basic facility
>   virtio: introduce SUSPEND bit in device status
>   virtqueue: constraints for virtqueue state
>   virtqueue: ignore resetting vqs when SUSPEND
>   virtio-pci: implement VIRTIO_F_QUEUE_STATE
> 
>  content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  transport-pci.tex |  18 +++++++
>  2 files changed, 136 insertions(+)
> 
> -- 
> 2.35.3
> 
> 




^ permalink raw reply	[flat|nested] 445+ messages in thread
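The in-flight descriptor objection can be made concrete with a toy model (purely illustrative, not part of the series): four buffers A..D are made available in order, and the device has completed A and C out of order. No single resume index then describes the device state, because any choice either replays an already-used buffer or drops a still-in-flight one.

```c
#include <stdbool.h>

/* completed[i] is true if the device already used buffer i (A=0 .. D=3). */
static const bool completed[4] = { true, false, true, false }; /* A and C used */

/* A device resuming from `idx` would re-process every already-completed
 * buffer at or after idx... */
static int replayed(int idx)
{
    int n = 0;
    for (int i = idx; i < 4; i++)
        if (completed[i])
            n++;
    return n;
}

/* ...and would never process any still-in-flight buffer before idx. */
static int dropped(int idx)
{
    int n = 0;
    for (int i = 0; i < idx; i++)
        if (!completed[i])
            n++;
    return n;
}
```

Every choice of idx in 0..4 yields replayed(idx) > 0 or dropped(idx) > 0 for this completion pattern, which is why the cover letter defers in-flight descriptor tracking to future work.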


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
@ 2023-09-06  8:32     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06  8:32 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> This patch adds two new le16 fields to common configuration structure
> to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>


I do not see why this would be PCI specific at all.

Besides, I thought work on live migration would use the
admin queue. This was explicitly one of the motivators.

Poking at the device from the driver to migrate it
is not going to work if the driver lives within the guest.



> ---
>  transport-pci.tex | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..3161519 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>          /* About the administration virtqueue. */
>          le16 admin_queue_index;         /* read-only for driver */
>          le16 admin_queue_num;         /* read-only for driver */
> +
> +	/* Virtqueue state */
> +        le16 queue_avail_state;         /* read-write */
> +        le16 queue_used_state;          /* read-write */
>  };
>  \end{lstlisting}
>  
> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  	The value 0 indicates no supported administration virtqueues.
>  	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>  	negotiated.
> +
> +\item[\field{queue_avail_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the available state of
> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> +
> +\item[\field{queue_used_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the used state of the
> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> +
>  \end{description}
>  
>  \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  present either a value of 0 or a power of 2 in
>  \field{queue_size}.
>  
> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
> +
>  If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>  \field{admin_queue_index} MUST be equal to, or bigger than
>  \field{num_queues}; also, \field{admin_queue_num} MUST be
> -- 
> 2.35.3
> 
> 




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:32     ` Michael S. Tsirkin
@ 2023-09-06  8:37       ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-06  8:37 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Michael S. Tsirkin
> Sent: Wednesday, September 6, 2023 2:03 PM
> To: Zhu Lingshan <lingshan.zhu@intel.com>
> Cc: jasowang@redhat.com; eperezma@redhat.com; cohuck@redhat.com;
> stefanha@redhat.com; virtio-comment@lists.oasis-open.org; virtio-
> dev@lists.oasis-open.org
> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
> VIRTIO_F_QUEUE_STATE
> 
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > This patch adds two new le16 fields to common configuration structure
> > to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> >
> > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> 
> 
> I do not see why this would be pci specific at all.
> 
> But besides I thought work on live migration will use admin queue. This was
> explicitly one of the motivators.
> 
> Poking at the device from the driver to migrate it is not going to work if the
> driver lives within guest.

Exactly.
I was not paying attention to this thread, as we have an AQ-based proposal for the passthrough device.
It leverages many ideas from what Si-Wei presented at KVM Forum 2022.
I will post the first draft in a few days.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:29   ` Michael S. Tsirkin
@ 2023-09-06  8:38     ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  8:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>> This series introduces:
>> 1) a new SUSPEND bit in the device status,
>> which is used to suspend the device so that the device state
>> and virtqueue states are stabilized;
>>
>> 2) virtqueue state and its accessors, to get and set the last_avail_idx
>> and last_used_idx of virtqueues.
>>
>> The main use case of these new facilities is Live Migration.
>>
>> Future work: dirty page tracking and in-flight descriptors.
> oh that answers my question - it's not covered.
> I don't think we can merge this without in-flight descriptor
> support.
When SUSPEND is set, we require the device to wait until all descriptors
being processed have finished, and to mark them as used (in patch 2).

At that point there are no in-flight descriptors, so this is still
self-consistent. The tracker for in-flight descriptors is excluded to
keep this series small and focused.
>
>
>
>> This series addresses many comments from Jason, Stefan and Eugenio
>> from RFC series.
>>
>> Zhu Lingshan (5):
>>    virtio: introduce vq state as basic facility
>>    virtio: introduce SUSPEND bit in device status
>>    virtqueue: constraints for virtqueue state
>>    virtqueue: ignore resetting vqs when SUSPEND
>>    virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>
>>   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>   transport-pci.tex |  18 +++++++
>>   2 files changed, 136 insertions(+)
>>
>> -- 
>> 2.35.3
>>
>>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:32     ` Michael S. Tsirkin
@ 2023-09-06  9:37       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  9:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:32 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>> This patch adds two new le16 fields to the common configuration structure
>> to support VIRTIO_F_QUEUE_STATE in the PCI transport layer.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>
> I do not see why this would be pci specific at all.
It is just that the implementation is transport-specific.
>
> But besides I thought work on live migration will use
> admin queue. This was explicitly one of the motivators.
I assume this straightforward solution can work.
>
> Poking at the device from the driver to migrate it
> is not going to work if the driver lives within guest.
The hypervisor can still set SUSPEND and do other work such as
collecting dirty pages.

The process should be: freeze the guest first, then suspend the device.
>
>
>
>
>> ---
>>   transport-pci.tex | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..3161519 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>           /* About the administration virtqueue. */
>>           le16 admin_queue_index;         /* read-only for driver */
>>           le16 admin_queue_num;         /* read-only for driver */
>> +
>> +	/* Virtqueue state */
>> +        le16 queue_avail_state;         /* read-write */
>> +        le16 queue_used_state;          /* read-write */
>>   };
>>   \end{lstlisting}
>>   
>> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   	The value 0 indicates no supported administration virtqueues.
>>   	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>>   	negotiated.
>> +
>> +\item[\field{queue_avail_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the available state of
>> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
>> +
>> +\item[\field{queue_used_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the used state of the
>> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
>> +
>>   \end{description}
>>   
>>   \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
>> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   present either a value of 0 or a power of 2 in
>>   \field{queue_size}.
>>   
>> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
>> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
>> +
>>   If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>>   \field{admin_queue_index} MUST be equal to, or bigger than
>>   \field{num_queues}; also, \field{admin_queue_num} MUST be
>> -- 
>> 2.35.3
>>
>>
>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:28     ` Michael S. Tsirkin
@ 2023-09-06  9:43       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:28 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
>> This patch adds a new device facility to save and restore virtqueue
>> state. The virtqueue state is split into two parts:
>>
>> - The available state: The state that is used to read the next
>>    available buffer.
>> - The used state: The state that is used to make a buffer used.
>>
>> This will simplify the transport-specific method implementation (e.g. two
>> le16 fields could be used instead of a single le32). For split virtqueue, we
>> only need the available state since the used state is implemented in
>> the virtqueue itself (the used index). For packed virtqueue, we need
>> both the available state and the used state.
>>
>> Those states are required to implement live migration support for
>> virtio device.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 65 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0a62dce..0e492cd 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
>> +
>> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
>> +get the device internal virtqueue state through the following
>> +fields. The implementation of the interfaces is transport specific.
>
> virtqueue state can not, generally, be described by two 16 bit
> indices.
>
> Consider an example: these buffers available: A B C D
> After device used descriptors A and C, what is its state and
> how do you describe it using a single index?
<discussed in the 0/5 cover letter>
>
>
>> +
>> +\subsection{\field{Available State} Field}
>> +
>> +The available state field is two bytes of virtqueue state that is used by
>> +the device to read the next available buffer.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 last_avail_idx;
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running available ring
>> +index where the device will read the next available head of a
>> +descriptor chain.
> next after what?
I am not sure I get it.
It's like a pointer, so I assume it goes without saying that it implies
the next "address".

Do you mean "next after the ones currently being processed", or something else?
>
>> +
>> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  last_avail_idx : 15;
>> +  last_avail_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running location
>> +where the device reads the next descriptor from the virtqueue descriptor ring.
>> +
>> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
>> +counter that was observed by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +\subsection{\field{Used State} Field}
>> +
>> +The used state field is two bytes of virtqueue state that is used by
>> +the device when marking a buffer used.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  used_idx : 15;
>> +  used_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{used_idx} field is the free-running location where the device writes the next
>> +used descriptor to the descriptor ring.
> I don't get what good this used_idx does - used descriptors are written in
> order, so just check which ones are valid?
I am a bit confused; please correct me if I misunderstood.

Valid to the device? avail_idx? How would we infer used_idx from avail_idx,
or would we walk through the ring?
> And driver does of course know what the used_wrap_counter is
> otherwise it can't work.
> Or is this for some kind of
> split driver setup where looking at the ring is impossible?
For a split virtqueue, we don't need to migrate used_idx, but for a
packed vq, isn't it easier if we record and restore it?
>
>
>> +
>> +The \field{used_wrap_counter} field is the wrap counter that is used
>> +by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>> +is always 0.
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3
>>
>>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
@ 2023-09-06  9:43       ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-06  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 4:28 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
>> This patch adds new device facility to save and restore virtqueue
>> state. The virtqueue state is split into two parts:
>>
>> - The available state: The state that is used for read the next
>>    available buffer.
>> - The used state: The state that is used for make buffer used.
>>
>> This will simply the transport specific method implementation. E.g two
>> le16 could be used instead of a single le32). For split virtqueue, we
>> only need the available state since the used state is implemented in
>> the virtqueue itself (the used index). For packed virtqueue, we need
>> both the available state and the used state.
>>
>> Those states are required to implement live migration support for
>> virtio device.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 65 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0a62dce..0e492cd 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
>> +
>> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
>> +get the device internal virtqueue state through the following
>> +fields. The implementation of the interfaces is transport specific.
>
> virtqueue state can not, generally, be described by two 16 bit
> indices.
>
> Consider an example: these buffers available: A B C D
> After device used descriptors A and C, what is its state and
> how do you describe it using a single index?
<discussed in the 0/5 cover letter>
>
>
>> +
>> +\subsection{\field{Available State} Field}
>> +
>> +The available state field is two bytes of virtqueue state that is used by
>> +the device to read the next available buffer.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 last_avail_idx;
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running available ring
>> +index where the device will read the next available head of a
>> +descriptor chain.
> next after what?
I am not sure I get it.
It's like a pointer, so I assume it goes without saying, implies the 
next "address".

Do you suggest "next after current being processed ones" or others?
>
>> +
>> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  last_avail_idx : 15;
>> +  last_avail_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running location
>> +where the device read the next descriptor from the virtqueue descriptor ring.
>> +
>> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
>> +counter that was observed by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +\subsection{\field{Used State} Field}
>> +
>> +The used state field is two bytes of virtqueue state that is used by
>> +the device when marking a buffer used.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  used_idx : 15;
>> +  used_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{used_idx} field is the free-running location where the device write the next
>> +used descriptor to the descriptor ring.
> I don't get what good does this used_idx do - used descriptors are written in
> order so just check which ones are valid?
a bit confused. please correct me if I misunderstood.

Valid to the device? avail_idx? How to speculate used_idx from avail_idx,
or walk through the ring?
> And driver does of course know what the used_wrap_counter is
> otherwise it can't work.
> Or is this for some kind of
> split driver setup where looking at the ring is impossible?
For splitted virtqueue, we don't need to migrate used_idx, but for
packed vq, is it easier if we record and restore it?
>
>
>> +
>> +The \field{used_wrap_counter} field is the wrap counter that is used
>> +by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>> +is always 0.
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3
>>
>>
>> This publicly archived list offers a means to provide input to the
>> OASIS Virtual I/O Device (VIRTIO) TC.
>>
>> In order to verify user consent to the Feedback License terms and
>> to minimize spam in the list archive, subscription is required
>> before posting.
>>
>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> List help: virtio-comment-help@lists.oasis-open.org
>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>> Committee: https://www.oasis-open.org/committees/virtio/
>> Join OASIS: https://www.oasis-open.org/join/
>>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:38     ` Zhu, Lingshan
@ 2023-09-06 13:49       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-06 13:49 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > This series introduces
> > > 1)a new SUSPEND bit in the device status
> > > Which is used to suspend the device, so that the device states
> > > and virtqueue states are stabilized.
> > > 
> > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > and last_used_idx of virtqueues.
> > > 
> > > The main usecase of these new facilities is Live Migration.
> > > 
> > > Future work: dirty page tracking and in-flight descriptors.
> > oh that answers my question - it's not covered.
> > I don't think we can merge this without in-flight descriptor
> > support.
> When SUSPEND is set, we require the device to wait until all descriptors
> being processed have finished, and to mark them as used (in patch 2).
> At this point there should be no in-flight descriptors, so this is still
> self-consistent. The tracker for in-flight descriptors is excluded to
> keep this series small and focused.

This does not work in general.
Imagine the RX ring of a network device, for example: you can wait
as long as you like, but if there is no incoming network traffic,
the buffers will never be used.

Also, please keep to the spec terminology: buffers are used, not
descriptors. Best to keep it straight so that errors do not leak into the
spec.


> > 
> > 
> > 
> > > This series addresses many comments from Jason, Stefan and Eugenio
> > > from RFC series.
> > > 
> > > Zhu Lingshan (5):
> > >    virtio: introduce vq state as basic facility
> > >    virtio: introduce SUSPEND bit in device status
> > >    virtqueue: constraints for virtqueue state
> > >    virtqueue: ignore resetting vqs when SUSPEND
> > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > 
> > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > >   transport-pci.tex |  18 +++++++
> > >   2 files changed, 136 insertions(+)
> > > 
> > > -- 
> > > 2.35.3
> > > 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06 13:49       ` Michael S. Tsirkin
@ 2023-09-07  1:51         ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-07  1:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/6/2023 9:49 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>>>> This series introduces
>>>> 1)a new SUSPEND bit in the device status
>>>> Which is used to suspend the device, so that the device states
>>>> and virtqueue states are stabilized.
>>>>
>>>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>>>> and last_used_idx of virtqueues.
>>>>
>>>> The main usecase of these new facilities is Live Migration.
>>>>
>>>> Future work: dirty page tracking and in-flight descriptors.
>>> oh that answers my question - it's not covered.
>>> I don't think we can merge this without in-flight descriptor
>>> support.
>> When SUSPEND is set, we require the device to wait until all descriptors
>> being processed have finished, and to mark them as used (in patch 2).
>> At this point there should be no in-flight descriptors, so this is still
>> self-consistent. The tracker for in-flight descriptors is excluded to
>> keep this series small and focused.
> This does not work in general.
> Imagine the RX ring of a network device, for example: you can wait
> as long as you like, but if there is no incoming network traffic,
> the buffers will never be used.
Yes, we will include a patch tracking in-flight descriptors in V2.
>
> Also, please keep to the spec terminology: buffers are used, not
> descriptors. Best to keep it straight so that errors do not leak into the
> spec.
OK
>
>
>>>
>>>
>>>> This series addresses many comments from Jason, Stefan and Eugenio
>>>> from RFC series.
>>>>
>>>> Zhu Lingshan (5):
>>>>     virtio: introduce vq state as basic facility
>>>>     virtio: introduce SUSPEND bit in device status
>>>>     virtqueue: constraints for virtqueue state
>>>>     virtqueue: ignore resetting vqs when SUSPEND
>>>>     virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>>>
>>>>    content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>>>    transport-pci.tex |  18 +++++++
>>>>    2 files changed, 136 insertions(+)
>>>>
>>>> -- 
>>>> 2.35.3
>>>>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06 13:49       ` Michael S. Tsirkin
@ 2023-09-07 10:57         ` Eugenio Perez Martin
  -1 siblings, 0 replies; 445+ messages in thread
From: Eugenio Perez Martin @ 2023-09-07 10:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu, Lingshan, jasowang, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 6, 2023 at 3:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> >
> >
> > On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > > This series introduces
> > > > 1)a new SUSPEND bit in the device status
> > > > Which is used to suspend the device, so that the device states
> > > > and virtqueue states are stabilized.
> > > >
> > > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > > and last_used_idx of virtqueues.
> > > >
> > > > The main usecase of these new facilities is Live Migration.
> > > >
> > > > Future work: dirty page tracking and in-flight descriptors.
> > > oh that answers my question - it's not covered.
> > > I don't think we can merge this without in-flight descriptor
> > > support.
> > When SUSPEND is set, we require the device to wait until all descriptors
> > being processed have finished, and to mark them as used (in patch 2).
> > At this point there should be no in-flight descriptors, so this is still
> > self-consistent. The tracker for in-flight descriptors is excluded to
> > keep this series small and focused.
>
> This does not work in general.
> Imagine the RX ring of a network device, for example: you can wait
> as long as you like, but if there is no incoming network traffic,
> the buffers will never be used.
>

The patch should word it differently, yes.

QEMU's vhost-kernel net handler currently assumes the device will use
the descriptors sequentially from avail_idx. In that case, it is
possible to simply finish receiving in-flight packets (not buffers)
and just stop receiving new packets. Once all packets have been received,
we have a valid used_idx, and the device at resume (or the destination
device at migration) can just fetch all buffers from there.

I may have a better wording of this in other mails.

Would it work to use this solution with in_order, and defer the
in-flight buffer handling for the future? It would allow us to keep this
series small.
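To illustrate the arithmetic this relies on (our own sketch, not anything in the patch): with in_order, buffers are used sequentially, so the number of buffers still in flight falls directly out of the two free-running 16-bit indices, even across wrap-around, because unsigned subtraction is modulo 2^16:

```c
#include <stdint.h>

/* Number of buffers the device has fetched but not yet marked used,
 * assuming in-order use: both indices are free-running 16-bit
 * counters, so unsigned subtraction handles wrap-around naturally. */
uint16_t vq_inflight(uint16_t avail_idx, uint16_t used_idx)
{
    return (uint16_t)(avail_idx - used_idx);
}
```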

Thanks!

> Also, please keep to the spec terminology: buffers are used, not
> descriptors. Best to keep it straight so that errors do not leak into the
> spec.
>
>
> > >
> > >
> > >
> > > > This series addresses many comments from Jason, Stefan and Eugenio
> > > > from RFC series.
> > > >
> > > > Zhu Lingshan (5):
> > > >    virtio: introduce vq state as basic facility
> > > >    virtio: introduce SUSPEND bit in device status
> > > >    virtqueue: constraints for virtqueue state
> > > >    virtqueue: ignore resetting vqs when SUSPEND
> > > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > >
> > > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >   transport-pci.tex |  18 +++++++
> > > >   2 files changed, 136 insertions(+)
> > > >
> > > > --
> > > > 2.35.3
> > > >
> > > >
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-07 10:57         ` Eugenio Perez Martin
@ 2023-09-07 19:55           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-07 19:55 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Zhu, Lingshan, jasowang, cohuck, stefanha, virtio-comment, virtio-dev

On Thu, Sep 07, 2023 at 12:57:58PM +0200, Eugenio Perez Martin wrote:
> On Wed, Sep 6, 2023 at 3:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> > >
> > >
> > > On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > > > This series introduces
> > > > > 1)a new SUSPEND bit in the device status
> > > > > Which is used to suspend the device, so that the device states
> > > > > and virtqueue states are stabilized.
> > > > >
> > > > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > > > and last_used_idx of virtqueues.
> > > > >
> > > > > The main usecase of these new facilities is Live Migration.
> > > > >
> > > > > Future work: dirty page tracking and in-flight descriptors.
> > > > oh that answers my question - it's not covered.
> > > > I don't think we can merge this without in-flight descriptor
> > > > support.
> > > When SUSPEND is set, we require the device to wait until all descriptors
> > > being processed have finished, and to mark them as used (in patch 2).
> > > At this point there should be no in-flight descriptors, so this is still
> > > self-consistent. The tracker for in-flight descriptors is excluded to
> > > keep this series small and focused.
> >
> > This does not work in general.
> > Imagine the RX ring of a network device, for example: you can wait
> > as long as you like, but if there is no incoming network traffic,
> > the buffers will never be used.
> >
> 
> The patch should word it differently, yes.
> 
> QEMU's vhost-kernel net handler currently assumes the device will use
> the descriptors sequentially from avail_idx. In that case, it is
> possible to simply finish receiving in-flight packets (not buffers)
> and just stop receiving new packets. Once all packets have been received,
> we have a valid used-idx, and the device at resume (or the destination
> device at migration) can just fetch all buffers from there.
> 
> I may have a better wording of this in other mails.
> 
> Would it work to use this solution with in_order, and defer the
> in-flight buffer handling for the future? It would allow us to keep this
> series small.
> 
> Thanks!

in_order isn't widely used, so I doubt depending on it is wise.
And again, the whole thing has to be rewritten with the admin queue,
and the group owner generally does not know whether in_order
was negotiated by the member or not.


> > Also, please keep to the spec terminology: buffers are used, not
> > descriptors. Best to keep it straight so that errors do not leak into the
> > spec.
> >
> >
> > > >
> > > >
> > > >
> > > > > This series addresses many comments from Jason, Stefan and Eugenio
> > > > > from RFC series.
> > > > >
> > > > > Zhu Lingshan (5):
> > > > >    virtio: introduce vq state as basic facility
> > > > >    virtio: introduce SUSPEND bit in device status
> > > > >    virtqueue: constraints for virtqueue state
> > > > >    virtqueue: ignore resetting vqs when SUSPEND
> > > > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > > >
> > > > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > >   transport-pci.tex |  18 +++++++
> > > > >   2 files changed, 136 insertions(+)
> > > > >
> > > > > --
> > > > > 2.35.3
> > > > >
> > > > >
> >




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-09-07 19:55           ` Michael S. Tsirkin
  0 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-07 19:55 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Zhu, Lingshan, jasowang, cohuck, stefanha, virtio-comment, virtio-dev

On Thu, Sep 07, 2023 at 12:57:58PM +0200, Eugenio Perez Martin wrote:
> On Wed, Sep 6, 2023 at 3:49 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 06, 2023 at 04:38:44PM +0800, Zhu, Lingshan wrote:
> > >
> > >
> > > On 9/6/2023 4:29 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> > > > > This series introduces
> > > > > 1)a new SUSPEND bit in the device status
> > > > > Which is used to suspend the device, so that the device states
> > > > > and virtqueue states are stabilized.
> > > > >
> > > > > 2)virtqueue state and its accessor, to get and set last_avail_idx
> > > > > and last_used_idx of virtqueues.
> > > > >
> > > > > The main usecase of these new facilities is Live Migration.
> > > > >
> > > > > Future work: dirty page tracking and in-flight descriptors.
> > > > oh that answers my question - it's not covered.
> > > > I don't think we can merge this without in-flight descriptor
> > > > support.
> > > When SUSPEND is set, we require the device to wait until all
> > > descriptors being processed have finished, and to mark them as
> > > used (see patch 2). At that point there may be no in-flight
> > > descriptors, so this is still self-consistent. The tracker for
> > > in-flight descriptors is excluded to keep this series small and
> > > focused.
> >
> > Does not work generally.
> > Imagine RX ring of a network device for example. You can wait
> > as long as you can but if there's no incoming network traffic
> > buffers will not be used.
> >
> 
> The patch should word it differently, yes.
> 
> QEMU's vhost-kernel net handler currently assumes the device will use
> the descriptors sequentially from avail_idx. In that case, it is
> possible to simply finish receiving in-flight packets (not buffers)
> and just stop receiving new packets. As all packets have been received,
> we have a valid used-idx, and the device at resume (or the destination
> device at migration) can just fetch all buffers from there.
> 
> I may have a better wording of this in other mails.
> 
> Would it work to use this solution with in_order, and defer the
> handling of in-flight buffers to the future? It would allow us to
> keep this series small.
> 
> Thanks!

in_order isn't widely used, so I doubt depending on it is wise.
And again, the whole thing has to be rewritten with the admin queue,
and the group owner generally does not know whether in_order
was negotiated by the member or not.
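To make the in_order argument above concrete, here is a hedged sketch (hypothetical model code, not from the spec or these patches) of the drain-then-suspend behavior being discussed: with in_order, the device uses buffers sequentially from last_avail_idx, so after every in-flight buffer has been marked used the two indices coincide and fully describe the queue state.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical device-side model of one virtqueue's migratable state. */
struct vq_state_model {
    uint16_t last_avail_idx;  /* next buffer the device would fetch */
    uint16_t last_used_idx;   /* next used-ring slot the device would fill */
    bool     suspended;
};

/* Drain every in-flight buffer (stand-in for completing real I/O),
 * then suspend; afterwards last_used_idx == last_avail_idx. */
static void suspend_in_order(struct vq_state_model *vq)
{
    while (vq->last_used_idx != vq->last_avail_idx)
        vq->last_used_idx++;  /* "complete one buffer" */
    vq->suspended = true;
}
```

This is only a sketch of the in_order assumption; as noted above, it does not hold for queues such as an RX ring where buffers may never be used.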


> > Also, please keep to the spec terminology: buffers are used, not
> > descriptors. Best to keep it straight so errors will not leak into
> > the spec.
> >
> >
> > > >
> > > >
> > > >
> > > > > This series addresses many comments from Jason, Stefan and Eugenio
> > > > > from RFC series.
> > > > >
> > > > > Zhu Lingshan (5):
> > > > >    virtio: introduce vq state as basic facility
> > > > >    virtio: introduce SUSPEND bit in device status
> > > > >    virtqueue: constraints for virtqueue state
> > > > >    virtqueue: ignore resetting vqs when SUSPEND
> > > > >    virtio-pci: implement VIRTIO_F_QUEUE_STATE
> > > > >
> > > > >   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
> > > > >   transport-pci.tex |  18 +++++++
> > > > >   2 files changed, 136 insertions(+)
> > > > >
> > > > > --
> > > > > 2.35.3
> > > > >
> > > > >

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:32     ` Michael S. Tsirkin
@ 2023-09-11  3:01       ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-11  3:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Zhu Lingshan, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > This patch adds two new le16 fields to common configuration structure
> > to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> >
> > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>
>
> I do not see why this would be pci specific at all.

This is the PCI interface for live migration. The facility is not
specific to PCI.

A transport can choose to reuse the common configuration or not, but
the semantics are general enough to be used by other transports. We
can certainly introduce one for MMIO.

>
> But besides I thought work on live migration will use
> admin queue. This was explicitly one of the motivators.

I think not. Using the admin virtqueue will end up with several problems:

1) the feature is not self-contained, so in the end we need a
transport-specific facility to migrate the admin virtqueue
2) it won't work in the nested environment, or we need complicated
SR-IOV emulation in order to work

>
> Poking at the device from the driver to migrate it
> is not going to work if the driver lives within guest.

This is by design, to allow live migration to work in the nested layer.
And it's the way we've handled the CPU and MMU. Is anything different
for virtio here?

Thanks
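As a concrete illustration of the facility being debated, a driver-side save/restore flow using the two new fields might look like the sketch below. The register offsets and the cfg_read16/cfg_write16 helpers are assumptions for illustration only; they are not from the spec or this patch, and a real driver would go through its transport's common-configuration mapping.

```c
#include <stdint.h>

/* Hypothetical register accessors over the common configuration
 * structure; definitions are transport-specific. */
extern uint16_t cfg_read16(uint16_t offset);
extern void cfg_write16(uint16_t offset, uint16_t val);

/* Placeholder offsets, NOT the spec's actual layout. */
enum {
    QUEUE_SELECT      = 0x16,
    QUEUE_AVAIL_STATE = 0x40,
    QUEUE_USED_STATE  = 0x42,
};

struct vq_saved_state {
    uint16_t avail;  /* last_avail_idx */
    uint16_t used;   /* last_used_idx */
};

/* Source side: after SUSPEND is set, read each queue's state. */
static struct vq_saved_state save_vq_state(uint16_t vq_index)
{
    struct vq_saved_state s;
    cfg_write16(QUEUE_SELECT, vq_index);
    s.avail = cfg_read16(QUEUE_AVAIL_STATE);
    s.used  = cfg_read16(QUEUE_USED_STATE);
    return s;
}

/* Destination side: restore before the queue is started again. */
static void restore_vq_state(uint16_t vq_index, struct vq_saved_state s)
{
    cfg_write16(QUEUE_SELECT, vq_index);
    cfg_write16(QUEUE_AVAIL_STATE, s.avail);
    cfg_write16(QUEUE_USED_STATE, s.used);
}
```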


>
>
>
>
> > ---
> >  transport-pci.tex | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/transport-pci.tex b/transport-pci.tex
> > index a5c6719..3161519 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >          /* About the administration virtqueue. */
> >          le16 admin_queue_index;         /* read-only for driver */
> >          le16 admin_queue_num;         /* read-only for driver */
> > +
> > +     /* Virtqueue state */
> > +        le16 queue_avail_state;         /* read-write */
> > +        le16 queue_used_state;          /* read-write */
> >  };
> >  \end{lstlisting}
> >
> > @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >       The value 0 indicates no supported administration virtqueues.
> >       This field is valid only if VIRTIO_F_ADMIN_VQ has been
> >       negotiated.
> > +
> > +\item[\field{queue_avail_state}]
> > +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> > +        negotiated. The driver sets and gets the available state of
> > +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> > +
> > +\item[\field{queue_used_state}]
> > +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> > +        negotiated. The driver sets and gets the used state of the
> > +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> > +
> >  \end{description}
> >
> >  \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> > @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >  present either a value of 0 or a power of 2 in
> >  \field{queue_size}.
> >
> > +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
> > +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
> > +
> >  If VIRTIO_F_ADMIN_VQ has been negotiated, the value
> >  \field{admin_queue_index} MUST be equal to, or bigger than
> >  \field{num_queues}; also, \field{admin_queue_num} MUST be
> > --
> > 2.35.3
> >
> >
>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  3:01       ` Jason Wang
@ 2023-09-11  4:11         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  4:11 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Zhu Lingshan, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

Hi Michael,

> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Jason Wang
> Sent: Monday, September 11, 2023 8:31 AM
> 
> On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > This patch adds two new le16 fields to common configuration
> > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > >
> > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> >
> >
> > I do not see why this would be pci specific at all.
> 
> This is the PCI interface for live migration. The facility is not specific to PCI.
> 
> It can choose to reuse the common configuration or not, but the semantic is
> general enough to be used by other transports. We can introduce one for
> MMIO for sure.
> 
> >
> > But besides I thought work on live migration will use admin queue.
> > This was explicitly one of the motivators.
>
Please find at [1] the proposal that uses administration commands for device migration of passthrough devices.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

 > I think not. Using admin virtqueue will end up with several problems:
> 
> 1) the feature is not self contained so at the end we need transport specific
> facility to migrate the admin virtqueue

You mixed things up.
The admin queue of the owner device is not migrated.
The admin queue of the member device is migrated like any other queue, as in [1] above.

> 2) won't work in the nested environment, or we need complicated SR-IOV
> emulation in order to work
> 
> >
> > Poking at the device from the driver to migrate it is not going to
> > work if the driver lives within guest.
> 
> This is by design to allow live migration to work in the nested layer.
> And it's the way we've used for CPU and MMU. Anything may virtio different
> here?

Nested and non-nested use cases likely cannot be addressed by a single solution/interface.
So the two are orthogonal requirements to me.

One can define some administration commands to issue on the AQ of the member device itself for the nested case.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  4:11         ` Parav Pandit
@ 2023-09-11  6:30           ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-11  6:30 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
>
> Hi Michael,
>
> > From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> > open.org> On Behalf Of Jason Wang
> > Sent: Monday, September 11, 2023 8:31 AM
> >
> > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > This patch adds two new le16 fields to common configuration
> > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > >
> > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > >
> > >
> > > I do not see why this would be pci specific at all.
> >
> > This is the PCI interface for live migration. The facility is not specific to PCI.
> >
> > It can choose to reuse the common configuration or not, but the semantic is
> > general enough to be used by other transports. We can introduce one for
> > MMIO for sure.
> >
> > >
> > > But besides I thought work on live migration will use admin queue.
> > > This was explicitly one of the motivators.
> >
> Please find the proposal that uses administration commands for device migration at [1] for passthrough devices.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

This proposal couples live migration with several requirements, and
suffers from the exact issues I've mentioned below.

In some cases, it's even worse (coupling with PCI/SR-IOV, a second
state machine besides the device status).

>
>  > I think not. Using admin virtqueue will end up with several problems:
> >
> > 1) the feature is not self contained so at the end we need transport specific
> > facility to migrate the admin virtqueue
>
> You mixed up.
> Admin queue of the owner device is not migrated.

Why not? Ling Shan's proposal makes everything work, including
migrating the owner, or the case where there's no owner at all.

In this proposal, the facility (suspending, queue state, in-flight
descriptors) is decoupled from the transport-specific API. Each
transport can implement one or more types of interfaces. An MMIO-based
interface is proposed, but it doesn't prevent you from adding admin
commands for those facilities on top.
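To illustrate that layering point, the same queue-state facility could, hypothetically, be carried over an administration virtqueue with an encoding like the one below. The opcodes and layout are invented for illustration and are not from the spec or either proposal.

```c
#include <stdint.h>

/* Made-up admin command opcodes (illustrative only). */
#define ADMIN_CMD_VQ_STATE_GET 0x100
#define ADMIN_CMD_VQ_STATE_SET 0x101

/* Hypothetical command payload: the same avail/used state the MMIO
 * interface exposes, addressed to one virtqueue of a member device. */
struct admin_cmd_vq_state {
    uint16_t opcode;       /* ADMIN_CMD_VQ_STATE_GET or _SET */
    uint16_t member_id;    /* group member device being migrated */
    uint16_t vq_index;     /* which virtqueue */
    uint16_t avail_state;  /* last_avail_idx */
    uint16_t used_state;   /* last_used_idx */
};
```

The point is only that the facility and its transport binding are separable; nothing here implies a particular command set.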

> Admin queue of the member device is migrated like any other queue using above [1].
>
> > 2) won't work in the nested environment, or we need complicated SR-IOV
> > emulation in order to work
> >
> > >
> > > Poking at the device from the driver to migrate it is not going to
> > > work if the driver lives within guest.
> >
> > This is by design to allow live migration to work in the nested layer.
> > And it's the way we've used for CPU and MMU. Anything may virtio different
> > here?
>
> Nested and non-nested use cases likely cannot be addressed by single solution/interface.

I think Ling Shan's proposal addressed them both.

> So both are orthogonal requirements to me.
>
> One can defined some administration commands to issue on the AQ of the member device itself for nested case.

This is not easy: DMA needs to be isolated, so this means you need to
either emulate SR-IOV and use an AQ on a virtual PF in the guest, or
use PASID.

Customers don't want to have admin stuff, SR-IOV or PASID in the guest
in order to migrate a single virtio device in the nested layer.

Thanks




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:30           ` Jason Wang
@ 2023-09-11  6:47             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  6:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, September 11, 2023 12:01 PM
> 
> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Hi Michael,
> >
> > > From: virtio-comment@lists.oasis-open.org
> > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > Sent: Monday, September 11, 2023 8:31 AM
> > >
> > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > >
> > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > This patch adds two new le16 fields to common configuration
> > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > >
> > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >
> > > >
> > > > I do not see why this would be pci specific at all.
> > >
> > > This is the PCI interface for live migration. The facility is not specific to PCI.
> > >
> > > It can choose to reuse the common configuration or not, but the
> > > semantic is general enough to be used by other transports. We can
> > > introduce one for MMIO for sure.
> > >
> > > >
> > > > But besides I thought work on live migration will use admin queue.
> > > > This was explicitly one of the motivators.
> > >
> > Please find the proposal that uses administration commands for device
> migration at [1] for passthrough devices.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> 
> This proposal couples live migration with several requirements, and suffers from
> the exact issues I've mentioned below.
>
It does not.
Can you please list which ones?
 
> In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
> other than the device status).
> 
There is no state machine in [1].
It is not coupled with PCI/SR-IOV either.
It supports the PCI/SR-IOV transport, and in the future other transports too as they evolve.

> >
> >  > I think not. Using admin virtqueue will end up with several problems:
> > >
> > > 1) the feature is not self contained so at the end we need transport
> > > specific facility to migrate the admin virtqueue
> >
> > You mixed up.
> > Admin queue of the owner device is not migrated.
>
If you read further, it is about member device migration, not the owner's.
Hence, the owner device's admin queue is not migrated.
 
> Why not? Ling Shan's proposal makes everything work including migrating the
> owner or in the case there's even no owner.
> 
I don't see how his proposal achieves all of the supported features and functionality.

> In this proposal, the facility (suspending, queue state, inflight
> descriptors) is decoupled from the transport specific API. Each transport can
> implement one or more types of interfaces. A MMIO based interface is
> proposed but It doesn't prevent you from adding admin commands for those
> facilities on top.
>
Even in proposal [1] most things are transport agonistic.
Member device proposal covers several aspects already of downtime, peer to peer, dirty page tracking, efficient querying VQ state and more.


> > Admin queue of the member device is migrated like any other queue using
> above [1].
> >
> > > 2) won't work in the nested environment, or we need complicated
> > > SR-IOV emulation in order to work
> > >
> > > >
> > > > Poking at the device from the driver to migrate it is not going to
> > > > work if the driver lives within guest.
> > >
> > > This is by design to allow live migration to work in the nested layer.
> > > And it's the way we've used for CPU and MMU. Anything may virtio
> > > different here?
> >
> > Nested and non-nested use cases likely cannot be addressed by single
> solution/interface.
> 
> I think Ling Shan's proposal addressed them both.
>
I don’t see how all above points are covered.

 
> > So both are orthogonal requirements to me.
> >
> > One can defined some administration commands to issue on the AQ of the
> member device itself for nested case.
> 
> This is not easy, DMA needs to be isolated so this means you need to either
> emulate SR-IOV and use AQ on virtual PF in the guest or using PASID.
>
This is why nested and non-nested cannot be treated equally and I don’t see this all covered in Ling proposal either.
For passthrough device use case [1] has covered the necessary pieces.

> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
> to migrate a single virtio device in the nest.

As proposed in [1] for pass through devices no customer needs to do SR-IOV or PASID in the guest for non-nest.

Nested is some special case and likely need mediated based scheme using administration commands.

In best case we can produce common commands, if that fits. 
Else both proposals are orthogonal addressing different use cases.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-11  6:47             ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  6:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, September 11, 2023 12:01 PM
> 
> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > Hi Michael,
> >
> > > From: virtio-comment@lists.oasis-open.org
> > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > Sent: Monday, September 11, 2023 8:31 AM
> > >
> > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> > > >
> > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > This patch adds two new le16 fields to common configuration
> > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > >
> > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > >
> > > >
> > > > I do not see why this would be pci specific at all.
> > >
> > > This is the PCI interface for live migration. The facility is not specific to PCI.
> > >
> > > It can choose to reuse the common configuration or not, but the
> > > semantic is general enough to be used by other transports. We can
> > > introduce one for MMIO for sure.
> > >
> > > >
> > > > But besides I thought work on live migration will use admin queue.
> > > > This was explicitly one of the motivators.
> > >
> > Please find the proposal that uses administration commands for device
> migration at [1] for passthrough devices.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> 
> This proposal couples live migration with several requirements, and suffers from
> the exact issues I've mentioned below.
>
It does not.
Can you please list which ones?
 
> In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
> other than the device status).
> 
There is no state machine in [1].
It is not coupled with PCI/SR-IOV either.
It supports PCI/SR-IOV transport and in future other transports too when they evolve.

> >
> >  > I think not. Using admin virtqueue will end up with several problems:
> > >
> > > 1) the feature is not self contained so at the end we need transport
> > > specific facility to migrate the admin virtqueue
> >
> > You mixed up.
> > Admin queue of the owner device is not migrated.
>
If you read further, [1] is about member device migration, not the owner.
Hence, the owner device's admin queue is not migrated.
 
> Why not? Ling Shan's proposal makes everything work including migrating the
> owner or in the case there's even no owner.
> 
I don't see how his proposal achieves all the supported features and functionality.

> In this proposal, the facility (suspending, queue state, inflight
> descriptors) is decoupled from the transport specific API. Each transport can
> implement one or more types of interfaces. A MMIO based interface is
> proposed but It doesn't prevent you from adding admin commands for those
> facilities on top.
>
Even in proposal [1] most things are transport agnostic.
The member device proposal already covers several aspects: downtime, peer-to-peer, dirty page tracking, efficient querying of VQ state, and more.


> > Admin queue of the member device is migrated like any other queue using
> above [1].
> >
> > > 2) won't work in the nested environment, or we need complicated
> > > SR-IOV emulation in order to work
> > >
> > > >
> > > > Poking at the device from the driver to migrate it is not going to
> > > > work if the driver lives within guest.
> > >
> > > This is by design to allow live migration to work in the nested layer.
> > > And it's the way we've used for CPU and MMU. Anything may virtio
> > > different here?
> >
> > Nested and non-nested use cases likely cannot be addressed by single
> solution/interface.
> 
> I think Ling Shan's proposal addressed them both.
>
I don’t see how all above points are covered.

 
> > So both are orthogonal requirements to me.
> >
> > One can defined some administration commands to issue on the AQ of the
> member device itself for nested case.
> 
> This is not easy, DMA needs to be isolated so this means you need to either
> emulate SR-IOV and use AQ on virtual PF in the guest or using PASID.
>
This is why nested and non-nested cannot be treated equally, and I don't see this all covered in Ling Shan's proposal either.
For the passthrough device use case, [1] has covered the necessary pieces.

> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
> to migrate a single virtio device in the nest.

As proposed in [1], for passthrough devices no customer needs to do SR-IOV or PASID in the guest in the non-nested case.

Nesting is a special case and likely needs a mediation-based scheme using administration commands.

In the best case we can produce common commands, if that fits.
Otherwise the two proposals are orthogonal, addressing different use cases.


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:47             ` Parav Pandit
@ 2023-09-11  6:58               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  6:58 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 2:47 PM, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Monday, September 11, 2023 12:01 PM
>>
>> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
>>> Hi Michael,
>>>
>>>> From: virtio-comment@lists.oasis-open.org
>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
>>>> Sent: Monday, September 11, 2023 8:31 AM
>>>>
>>>> On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com>
>> wrote:
>>>>> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>>>>>> This patch adds two new le16 fields to common configuration
>>>>>> structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
>>>>>>
>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>
>>>>> I do not see why this would be pci specific at all.
>>>> This is the PCI interface for live migration. The facility is not specific to PCI.
>>>>
>>>> It can choose to reuse the common configuration or not, but the
>>>> semantic is general enough to be used by other transports. We can
>>>> introduce one for MMIO for sure.
>>>>
>>>>> But besides I thought work on live migration will use admin queue.
>>>>> This was explicitly one of the motivators.
>>> Please find the proposal that uses administration commands for device
>> migration at [1] for passthrough devices.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> This proposal couples live migration with several requirements, and suffers from
>> the exact issues I've mentioned below.
>>
> It does not.
> Can you please list which one?
>   
>> In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
>> other than the device status).
>>
> There is no state machine in [1].
> It is not coupled with PCI/SR-IOV either.
> It supports PCI/SR-IOV transport and in future other transports too when they evolve.
>
>>>   > I think not. Using admin virtqueue will end up with several problems:
>>>> 1) the feature is not self contained so at the end we need transport
>>>> specific facility to migrate the admin virtqueue
>>> You mixed up.
>>> Admin queue of the owner device is not migrated.
> If you actually read more, it is for the member device migration and not the owner.
> Hence, owner device admin queue is not migrated.
Then how does this serve bare-metal migration? Does the device migrate itself?
>   
>> Why not? Ling Shan's proposal makes everything work including migrating the
>> owner or in the case there's even no owner.
>>
> I don’t see in his proposal how all the features and functionality supported is achieved.
I will include an in-flight descriptor tracker and dirty-page tracking in V2;
anything else missed?
It can migrate the device itself. Why don't you think so? Can you name
some issues we can work on for improvements?
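For illustration only, the dirty-page tracking mentioned here for V2 could be built around a per-page bitmap that the device marks on DMA writes and the hypervisor drains while copying pages during migration. All names below are hypothetical sketches, not taken from the series:

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Hypothetical dirty-page bitmap: one bit per guest page frame. */
struct dirty_bitmap {
    unsigned long *bits;
    uint64_t npages;
};

/* Device side: mark the page containing a DMA-written guest address. */
static void mark_dirty(struct dirty_bitmap *bm, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    if (pfn < bm->npages)
        bm->bits[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/* Hypervisor side: read and clear one page's dirty bit while copying it. */
static int test_and_clear_dirty(struct dirty_bitmap *bm, uint64_t pfn)
{
    unsigned long *word = &bm->bits[pfn / BITS_PER_WORD];
    unsigned long mask = 1UL << (pfn % BITS_PER_WORD);
    int was_dirty = !!(*word & mask);

    *word &= ~mask;
    return was_dirty;
}
```

The read-and-clear step is what lets iterative pre-copy converge: pages that stay clean after a pass need not be sent again.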
>
>> In this proposal, the facility (suspending, queue state, inflight
>> descriptors) is decoupled from the transport specific API. Each transport can
>> implement one or more types of interfaces. A MMIO based interface is
>> proposed but It doesn't prevent you from adding admin commands for those
>> facilities on top.
>>
> Even in proposal [1] most things are transport agonistic.
> Member device proposal covers several aspects already of downtime, peer to peer, dirty page tracking, efficient querying VQ state and more.
If you want to implement LM by admin vq, the facilities in my series can
be re-used, e.g., forward your suspend command to the SUSPEND bit.
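As a rough sketch of what forwarding a suspend to the SUSPEND bit means, the simulation below freezes virtqueue progress once the status bit is set, so the vq state read afterwards is stable. The bit value and all names are assumptions for illustration, not spec text:

```c
#include <stdint.h>

/* The SUSPEND bit value (bit 4 of the device status) is an assumption
 * for illustration; the real value is whatever the spec patch defines. */
#define VIRTIO_STATUS_SUSPEND 16u

struct sim_vq {
    uint16_t avail_idx;      /* driver-side producer index */
    uint16_t last_avail_idx; /* device progress; part of migrated vq state */
};

struct sim_dev {
    uint8_t status;
    struct sim_vq vq;
};

/* Device "runs": it consumes available buffers unless suspended, so the
 * vq state is stabilized while SUSPEND is set. */
static void sim_dev_run(struct sim_dev *d)
{
    if (d->status & VIRTIO_STATUS_SUSPEND)
        return;
    d->vq.last_avail_idx = d->vq.avail_idx;
}

/* Driver side: once SUSPEND is set, the vq state read back is stable
 * and safe to transfer to the destination device. */
static uint16_t suspend_and_read_state(struct sim_dev *d)
{
    d->status |= VIRTIO_STATUS_SUSPEND;
    sim_dev_run(d); /* makes no further progress */
    return d->vq.last_avail_idx;
}
```

An admin-command-based suspend could map onto the same device-status transition internally, which is the re-use being suggested.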
>
>
>>> Admin queue of the member device is migrated like any other queue using
>> above [1].
>>>> 2) won't work in the nested environment, or we need complicated
>>>> SR-IOV emulation in order to work
>>>>
>>>>> Poking at the device from the driver to migrate it is not going to
>>>>> work if the driver lives within guest.
>>>> This is by design to allow live migration to work in the nested layer.
>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>> different here?
>>> Nested and non-nested use cases likely cannot be addressed by single
>> solution/interface.
>>
>> I think Ling Shan's proposal addressed them both.
>>
> I don’t see how all above points are covered.
Why?


And how do you migrate nested VMs by admin vq?

How many admin vqs, and how much bandwidth, are reserved for migrating all VMs?

Remember that a CSP may migrate all VMs on a host for power saving or an upgrade.
>
>   
>>> So both are orthogonal requirements to me.
>>>
>>> One can defined some administration commands to issue on the AQ of the
>> member device itself for nested case.
>>
>> This is not easy, DMA needs to be isolated so this means you need to either
>> emulate SR-IOV and use AQ on virtual PF in the guest or using PASID.
>>
> This is why nested and non-nested cannot be treated equally and I don’t see this all covered in Ling proposal either.
> For passthrough device use case [1] has covered the necessary pieces.
>
>> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
>> to migrate a single virtio device in the nest.
> As proposed in [1] for pass through devices no customer needs to do SR-IOV or PASID in the guest for non-nest.
>
> Nested is some special case and likely need mediated based scheme using administration commands.
>
> In best case we can produce common commands, if that fits.
> Else both proposals are orthogonal addressing different use cases.




* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:30           ` Jason Wang
@ 2023-09-11  6:59             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  6:59 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, September 11, 2023 12:01 PM


> Customers don't want to have admin stuff, SR-IOV or PASID in the guest in order
> to migrate a single virtio device in the nest.
It is not a matter of what the customer wants or does not want.
The PCI transport simply does not allow one to bifurcate the PCI device, i.e., to reset the device while still letting some things run, such as certain admin commands or registers.
So one needs mediation tricks, and to build things on such a dependency, for the nested use case.

Anyway, the mediation approach of the AQ etc. does not address the basic passthrough requirements.
So both are still orthogonal proposals addressing different use cases.




* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:58               ` Zhu, Lingshan
@ 2023-09-11  7:07                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  7:07 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 12:28 PM

> > I don’t see in his proposal how all the features and functionality supported is
> achieved.
> I will include in-flight descriptor tracker and diry-page traking in V2, anything
> else missed?
> It can migrate the device itself, why don't you think so, can you name some
> issues we can work on for improvements?

I would like to see a proposal similar to [1] that can work without mediation, if you want to combine the two use cases under one.
Otherwise, I don't see a need to merge the two things.

Dirty page tracking, peer to peer, downtime, no mediation, FLRs: all are covered in [1] for the passthrough cases.

> If you want to implement LM by admin vq, the facilities in my series can be re-
> used. E.g., forward your suspend to SUSPEND bit.
Just VQ suspend is not enough...

> >
> >
> >>> Admin queue of the member device is migrated like any other queue
> >>> using
> >> above [1].
> >>>> 2) won't work in the nested environment, or we need complicated
> >>>> SR-IOV emulation in order to work
> >>>>
> >>>>> Poking at the device from the driver to migrate it is not going to
> >>>>> work if the driver lives within guest.
> >>>> This is by design to allow live migration to work in the nested layer.
> >>>> And it's the way we've used for CPU and MMU. Anything may virtio
> >>>> different here?
> >>> Nested and non-nested use cases likely cannot be addressed by single
> >> solution/interface.
> >>
> >> I think Ling Shan's proposal addressed them both.
> >>
> > I don’t see how all above points are covered.
> Why?
> 
> 
> And how do you migrate nested VMs by admin vq?
>
Hypervisor = level 1.
VM = level 2.
Nested VM = level 3.
The level-2 VM takes care of migrating the level-3 composed device using its software composition, or perhaps using some kind of mediation as you proposed.

> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>
It does not matter, because the number of AQs is configurable; the device and driver can decide how many to use.
I am not sure which bandwidth you are talking about.
There are many bandwidths one can regulate: at the network level, PCI level, VM level, etc.

> Remember CSP migrates all VMs on a host for powersaving or upgrade.
I am not sure why the reason for migrating has any influence on the design.

The CSPs we have discussed care more about performance, hence prefer passthrough over mediation, and do not seem to be doing any nesting.
The CPU does not support three levels of page table nesting either.
I agree that there could be other users who care about nested functionality.

Anyway, nesting and non-nesting are two different requirements.



* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:07                 ` Parav Pandit
@ 2023-09-11  7:18                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  7:18 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 3:07 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 12:28 PM
>>> I don’t see in his proposal how all the features and functionality supported is
>> achieved.
>> I will include in-flight descriptor tracker and diry-page traking in V2, anything
>> else missed?
>> It can migrate the device itself, why don't you think so, can you name some
>> issues we can work on for improvements?
> I would like to see a proposal similar to [1] that can work without mediation in case if you want to combine two use cases under one.
> Else, I don’t see a need to merge two things.
>
> Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered in [1] for passthrough cases.
We are introducing basic facilities, feel free to re-use them in the 
admin vq solution.
>
>> If you want to implement LM by admin vq, the facilities in my series can be re-
>> used. E.g., forward your suspend to SUSPEND bit.
> Just VQ suspend is not enough...
This series contains the device SUSPEND bit and the queue state accessors.
MST requested in-flight descriptor tracking, which will be included in
the next version.
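The queue state accessor being discussed amounts to two little-endian 16-bit values per virtqueue. A hypothetical C view of the new fields follows; the field names are guessed from the cover letter (last_avail_idx / last_used_idx), not copied from the patch:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical layout of the two new le16 fields the patch adds for
 * VIRTIO_F_QUEUE_STATE. On a real device these would be read/written
 * through the PCI common configuration capability for the currently
 * selected queue; names here are illustrative only. */
struct virtio_pci_queue_state {
    uint16_t queue_avail_state; /* le16: last_avail_idx of selected vq */
    uint16_t queue_used_state;  /* le16: last_used_idx of selected vq */
};

/* A helper a driver might use when saving vq state for migration. */
static void save_vq_state(const volatile struct virtio_pci_queue_state *qs,
                          uint16_t *last_avail, uint16_t *last_used)
{
    *last_avail = qs->queue_avail_state; /* le16-to-cpu omitted for brevity */
    *last_used = qs->queue_used_state;
}
```

On the destination, the same two fields would be written back before the device is resumed, so it continues from the saved indices.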
>
>>>
>>>>> Admin queue of the member device is migrated like any other queue
>>>>> using
>>>> above [1].
>>>>>> 2) won't work in the nested environment, or we need complicated
>>>>>> SR-IOV emulation in order to work
>>>>>>
>>>>>>> Poking at the device from the driver to migrate it is not going to
>>>>>>> work if the driver lives within guest.
>>>>>> This is by design to allow live migration to work in the nested layer.
>>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>>>> different here?
>>>>> Nested and non-nested use cases likely cannot be addressed by single
>>>> solution/interface.
>>>>
>>>> I think Ling Shan's proposal addressed them both.
>>>>
>>> I don’t see how all above points are covered.
>> Why?
>>
>>
>> And how do you migrate nested VMs by admin vq?
>>
> Hypervisor = level 1.
> VM = level 2.
> Nested VM = level 3.
> VM of level 2 to take care of migrating level 3 composed device using its sw composition or may be using some kind of mediation that you proposed.
So the nested VM is not aware of the admin vq, or does not have access to
the admin vq, right?
>
>> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>>
> It does not matter because number of AQs is configurable that device and driver can decide to use.
> I am not sure which BW are talking about.
> There are many BW in place that one can regulate, at network level, pci level, VM level etc.
It matters because of QoS, and the downtime must converge.

E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
number in the HW implementation, and how does the driver get informed?
>
>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> I am not sure why the migration reason has any influence on the design.
Because this design is for live migration.
>
> The CSPs that we had discussed, care for performance more and hence prefers passthrough instead or mediation and don’t seem to be doing any nesting.
> CPU doesnt have support for 3 level of page table nesting either.
> I agree that there could be other users who care for nested functionality.
>
> Any ways, nesting and non-nesting are two different requirements.
The LM facility should server both, or it is far from ready. And it does 
not serve bare-metal live migration either.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-11  7:18                   ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  7:18 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 3:07 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 12:28 PM
>>> I don’t see in his proposal how all the features and functionality supported is
>> achieved.
>> I will include in-flight descriptor tracker and dirty-page tracking in V2, anything
>> else missed?
>> It can migrate the device itself, why don't you think so, can you name some
>> issues we can work on for improvements?
> I would like to see a proposal similar to [1] that can work without mediation in case if you want to combine two use cases under one.
> Else, I don’t see a need to merge two things.
>
> Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered in [1] for passthrough cases.
We are introducing basic facilities, feel free to re-use them in the 
admin vq solution.
>
>> If you want to implement LM by admin vq, the facilities in my series can be re-
>> used. E.g., forward your suspend to SUSPEND bit.
> Just VQ suspend is not enough...
This series contains the device SUSPEND bit and the virtqueue state accessor.
MST requested in-flight descriptor tracking, which will be included in the
next version.
>
>>>
>>>>> Admin queue of the member device is migrated like any other queue
>>>>> using
>>>> above [1].
>>>>>> 2) won't work in the nested environment, or we need complicated
>>>>>> SR-IOV emulation in order to work
>>>>>>
>>>>>>> Poking at the device from the driver to migrate it is not going to
>>>>>>> work if the driver lives within guest.
>>>>>> This is by design to allow live migration to work in the nested layer.
>>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>>>> different here?
>>>>> Nested and non-nested use cases likely cannot be addressed by single
>>>> solution/interface.
>>>>
>>>> I think Ling Shan's proposal addressed them both.
>>>>
>>> I don’t see how all above points are covered.
>> Why?
>>
>>
>> And how do you migrate nested VMs by admin vq?
>>
> Hypervisor = level 1.
> VM = level 2.
> Nested VM = level 3.
> VM of level 2 to take care of migrating level 3 composed device using its sw composition or may be using some kind of mediation that you proposed.
So the nested VM is either not aware of the admin vq or has no access to
the admin vq, right?
>
>> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>>
> It does not matter because number of AQs is configurable that device and driver can decide to use.
> I am not sure which BW are talking about.
> There are many BW in place that one can regulate, at network level, pci level, VM level etc.
It matters because of QoS, and the downtime must converge.

E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
number in the HW implementation, and how does the driver get informed?
>
>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> I am not sure why the migration reason has any influence on the design.
Because this design is for live migration.
>
> The CSPs that we had discussed care more for performance and hence prefer passthrough instead of mediation, and don’t seem to be doing any nesting.
> CPU doesn't have support for 3 levels of page table nesting either.
> I agree that there could be other users who care for nested functionality.
>
> Anyway, nesting and non-nesting are two different requirements.
The LM facility should serve both, or it is far from ready. And it does
not serve bare-metal live migration either.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/



* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:18                   ` Zhu, Lingshan
@ 2023-09-11  7:30                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  7:30 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 12:48 PM
> 
> On 9/11/2023 3:07 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 12:28 PM
> >>> I don’t see in his proposal how all the features and functionality
> >>> supported is
> >> achieved.
> >> I will include in-flight descriptor tracker and dirty-page tracking in
> >> V2, anything else missed?
> >> It can migrate the device itself, why don't you think so, can you
> >> name some issues we can work on for improvements?
> > I would like to see a proposal similar to [1] that can work without mediation
> in case if you want to combine two use cases under one.
> > Else, I don’t see a need to merge two things.
> >
> > Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered
> in [1] for passthrough cases.
> We are introducing basic facilities, feel free to re-use them in the admin vq
> solution.
Basic facilities are added in [1] for passthrough devices.
You can leverage them in your v2 for supporting p2p devices, dirty page tracking, passthrough support, shorter downtime and more.

> >
> >> If you want to implement LM by admin vq, the facilities in my series
> >> can be re- used. E.g., forward your suspend to SUSPEND bit.
> > Just VQ suspend is not enough...
> In this series, it contains: device SUSPEND, queue state accessor.
> MST required in-flight descriptor tracking, which will be included in next
> version.
For passthrough, more than that is needed.
> >
> >>>
> >>>>> Admin queue of the member device is migrated like any other queue
> >>>>> using
> >>>> above [1].
> >>>>>> 2) won't work in the nested environment, or we need complicated
> >>>>>> SR-IOV emulation in order to work
> >>>>>>
> >>>>>>> Poking at the device from the driver to migrate it is not going
> >>>>>>> to work if the driver lives within guest.
> >>>>>> This is by design to allow live migration to work in the nested layer.
> >>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
> >>>>>> different here?
> >>>>> Nested and non-nested use cases likely cannot be addressed by
> >>>>> single
> >>>> solution/interface.
> >>>>
> >>>> I think Ling Shan's proposal addressed them both.
> >>>>
> >>> I don’t see how all above points are covered.
> >> Why?
> >>
> >>
> >> And how do you migrate nested VMs by admin vq?
> >>
> > Hypervisor = level 1.
> > VM = level 2.
> > Nested VM = level 3.
> > VM of level 2 to take care of migrating level 3 composed device using its sw
> composition or may be using some kind of mediation that you proposed.
> So, nested VM is not aware of the admin vq or does not have access to admin
> vq, right?
Right. It is not aware.

> >

> >> How many admin vqs and the bandwidth are reserved for migrate all VMs?
> >>
> > It does not matter because number of AQs is configurable that device and
> driver can decide to use.
> > I am not sure which BW are talking about.
> > There are many BW in place that one can regulate, at network level, pci level,
> VM level etc.
> It matters because of QOS and the downtime must converge.
QoS is such a broad term that it is hard to debate unless you get to a specific point.
> 
> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the number
> in HW implementation and how does the driver get informed?
Usually just one AQ is enough, as proposal [1] is built around inherent downtime reduction.
You can ask a similar question for RSS: how does a hw device know how many RSS queues are needed? 😊
The device exposes the number of supported AQs that the driver is free to use.

Most sane sysadmins do not migrate 1000 VMs at the same time, for obvious reasons.
But when such a requirement arises, a device may support it.
Just like how a net device can support from 1 to 32K txqueues at the spec level.

> >
> >> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> > I am not sure why the migration reason has any influence on the design.
> Because this design is for live migration.
> >
> > The CSPs that we had discussed care more for performance and hence
> prefer passthrough instead of mediation, and don’t seem to be doing any
> nesting.
> > CPU doesn't have support for 3 levels of page table nesting either.
> > I agree that there could be other users who care for nested functionality.
> >
> > Anyway, nesting and non-nesting are two different requirements.
> The LM facility should serve both,
I don’t see how the PCI spec lets you do it.
The PCI community already handed this over to the SR-PCIM interface, outside of the PCI spec domain.
Hence, it's done over the admin queue for passthrough devices.

If you can explain how your proposal addresses passthrough support without mediation and also does DMA, I am very interested to learn that.

> And it does not serve bare-metal live migration either.
A bare-metal migration seems a distant theory, as one needs a side CPU and a memory accessor apart from the device accessor.
But if that somehow exists, there will be a similar admin device to migrate it; maybe TDISP will own this whole piece one day.




* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:30                     ` Parav Pandit
@ 2023-09-11  7:58                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  7:58 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 3:30 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 12:48 PM
>>
>> On 9/11/2023 3:07 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 11, 2023 12:28 PM
>>>>> I don’t see in his proposal how all the features and functionality
>>>>> supported is
>>>> achieved.
>>>> I will include in-flight descriptor tracker and dirty-page tracking in
>>>> V2, anything else missed?
>>>> It can migrate the device itself, why don't you think so, can you
>>>> name some issues we can work on for improvements?
>>> I would like to see a proposal similar to [1] that can work without mediation
>> in case if you want to combine two use cases under one.
>>> Else, I don’t see a need to merge two things.
>>>
>>> Dirty page tracking, peer to peer, downtime, no-mediation, flrs all are covered
>> in [1] for passthrough cases.
>> We are introducing basic facilities, feel free to re-use them in the admin vq
>> solution.
> Basic facilities are added in [1] for passthrough devices.
> You can leverage them in your v2 for supporting p2p devices, dirty page tracking, passthrough support, shorter downtime and more.
Basic facilities had better not depend on others, but the admin vq can
re-use the basic facilities.

For P2P, what if the devices are placed in different IOMMU groups?
>
>>>> If you want to implement LM by admin vq, the facilities in my series
>>>> can be re- used. E.g., forward your suspend to SUSPEND bit.
>>> Just VQ suspend is not enough...
>> In this series, it contains: device SUSPEND, queue state accessor.
>> MST required in-flight descriptor tracking, which will be included in next
>> version.
> For passthrough more than that is needed.
Dirty page tracking will be addressed too; what else should we work on?
>>>>>>> Admin queue of the member device is migrated like any other queue
>>>>>>> using
>>>>>> above [1].
>>>>>>>> 2) won't work in the nested environment, or we need complicated
>>>>>>>> SR-IOV emulation in order to work
>>>>>>>>
>>>>>>>>> Poking at the device from the driver to migrate it is not going
>>>>>>>>> to work if the driver lives within guest.
>>>>>>>> This is by design to allow live migration to work in the nested layer.
>>>>>>>> And it's the way we've used for CPU and MMU. Anything may virtio
>>>>>>>> different here?
>>>>>>> Nested and non-nested use cases likely cannot be addressed by
>>>>>>> single
>>>>>> solution/interface.
>>>>>>
>>>>>> I think Ling Shan's proposal addressed them both.
>>>>>>
>>>>> I don’t see how all above points are covered.
>>>> Why?
>>>>
>>>>
>>>> And how do you migrate nested VMs by admin vq?
>>>>
>>> Hypervisor = level 1.
>>> VM = level 2.
>>> Nested VM = level 3.
>>> VM of level 2 to take care of migrating level 3 composed device using its sw
>> composition or may be using some kind of mediation that you proposed.
>> So, nested VM is not aware of the admin vq or does not have access to admin
>> vq, right?
> Right. It is not aware.
>
>>>> How many admin vqs and the bandwidth are reserved for migrate all VMs?
>>>>
>>> It does not matter because number of AQs is configurable that device and
>> driver can decide to use.
>>> I am not sure which BW are talking about.
>>> There are many BW in place that one can regulate, at network level, pci level,
>> VM level etc.
>> It matters because of QOS and the downtime must converge.
> QOS is such a broad term that is hard to debate unless you get to a specific point.
E.g., there can be hundreds or thousands of VMs; how many admin vqs are
required to serve them during LM? Migration must converge, with no timeout.
>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the number
>> in HW implementation and how does the driver get informed?
> Usually just one AQ is enough as proposal [1] is built around inherent downtime reduction.
> You can ask a similar question for RSS: how does a hw device know how many RSS queues are needed? 😊
> Device exposes number of supported AQs that driver is free to use.
RSS is not a must for the transition, though there may be performance overhead.
But if the host cannot finish live migration in the due time, then it is
a failed LM.
>
> Most sane sys admins do not migrate 1000 VMs at same time for obvious reasons.
> But when such requirements arise, a device may support it.
> Just like how a net device can support from 1 to 32K txqueues at spec level.
The orchestration layer may do that for host upgrade or power-saving.
And the VMs may be required to migrate together, for example:
a cluster of VMs in the same subnet.

Let's not introduce new fragility.
>
>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>> I am not sure why the migration reason has any influence on the design.
>> Because this design is for live migration.
>>> The CSPs that we had discussed care more for performance and hence
>> prefer passthrough instead of mediation, and don’t seem to be doing any
>> nesting.
>>> CPU doesn't have support for 3 levels of page table nesting either.
>>> I agree that there could be other users who care for nested functionality.
>>>
>>> Anyway, nesting and non-nesting are two different requirements.
>> The LM facility should serve both,
> I don’t see how PCI spec let you do it.
> PCI community already handed over this to SR-PCIM interface outside of the PCI spec domain.
> Hence, its done over admin queue for passthrough devices.
>
> If you can explain, how your proposal addresses passthrough support without mediation and also does DMA, I am very interested to learn that.
Do you mean nested? Why this series can not support nested?
>
>> And it does not serve bare-metal live migration either.
> A bare-metal migration seems a distance theory as one need side cpu, memory accessor apart from device accessor.
> But somehow if that exists, there will be similar admin device to migrate it may be TDDISP will own this whole piece one day.
Bare-metal live migration requires other components like a firmware OS and
partitioning; that's why device live migration should not
be a blocker.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  7:58                       ` Zhu, Lingshan
@ 2023-09-11  8:12                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  8:12 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 1:28 PM


> > Basic facilities are added in [1] for passthrough devices.
> > You can leverage them in your v2 for supporting p2p devices, dirty page
> tracking, passthrough support, shorter downtime and more.
> Basic facilities should be better not depend on others, but admin vq can re-use
> the basic facilities.
> 
> For P2P, what if the devices are placed in different IOMMU group?
IOMMU grouping is one OS-specific notion of an older API.
The hypervisor needs to do the right setup anyway for using PCI-spec-defined access control and other semantics, which is outside the scope of [1].
It is outside primarily because proposal [1] is not migrating the whole "PCI device".
It is migrating the virtio device, so that we can migrate from a PCI VF member to some software-based device too.
And vice versa.

> > QOS is such a broad term that is hard to debate unless you get to a specific
> point.
> E.g., there can be hundreds or thousands of VMs, how many admin vq are
> required to serve them when LM? To converge, no timeout.
How many RSS queues are required to reach 800Gb/s NIC performance, at what queue depth, at what interrupt moderation level?
Such details are outside the scope of the virtio specification.
Those are implementation details of the device.

Similarly here for the AQ.
The inherent ability of the AQ to queue commands and execute them out of order in the device is the fundamental reason the AQ was introduced.
And one can have more AQs to do unrelated work, mainly for the hypervisor owner device that wants to enqueue unrelated commands in parallel.
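The out-of-order argument above can be illustrated with a rough sketch of an admin command header. The field names follow the general shape of a virtio admin command (opcode, group type, group member id), but the widths and opcode value here are assumptions for illustration, not copied from the spec:

```python
import struct

# Illustrative admin-vq command header: little-endian opcode,
# group_type, and group_member_id. Field widths are assumptions
# for this sketch, not normative spec values.
ADMIN_CMD_HDR = struct.Struct("<HHQ")

def make_cmd(opcode, group_type, member_id):
    """Pack a command header targeting one member device."""
    return ADMIN_CMD_HDR.pack(opcode, group_type, member_id)

# Each queued command names its target member explicitly, so the
# device can complete them out of order: a command for member 7 need
# not wait behind an earlier command for member 3.
cmd_for_vf3 = make_cmd(0x100, 1, 3)  # 0x100 is a made-up opcode
cmd_for_vf7 = make_cmd(0x100, 1, 7)
assert ADMIN_CMD_HDR.unpack(cmd_for_vf3)[2] == 3
assert ADMIN_CMD_HDR.unpack(cmd_for_vf7)[2] == 7
```

Because each completion carries enough context to match it back to its command, nothing forces the device into FIFO execution.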

> >> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
> >> number in HW implementation and how does the driver get informed?
> > Usually just one AQ is enough as proposal [1] is built around inherent
> downtime reduction.
> > You can ask similar question for RSS, how does hw device how many RSS
> > queues are needed. 😊
> > Device exposes number of supported AQs that driver is free to use.
> RSS is not a must for the transition through maybe performance overhead.
> But if the host can not finish Live Migration in the due time, then it is a failed
> LM.
It can abort the LM and restore the device by resuming it.

> >
> > Most sane sys admins do not migrate 1000 VMs at same time for obvious
> reasons.
> > But when such requirements arise, a device may support it.
> > Just like how a net device can support from 1 to 32K txqueues at spec level.
> The orchestration layer may do that for host upgrade or power-saving.
> And the VMs may be required to migrate together, for example:
> a cluster of VMs in the same subnet.
> 
Sure. An AQ of depth 1K can support 1K outstanding commands at a time for 1000 member devices.

> Lets do not introduce new frangibility
I don’t see any frangibility added by [1].
If you see one, please let me know.

> >

> >>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> >>> I am not sure why the migration reason has any influence on the design.
> >> Because this design is for live migration.
> >>> The CSPs that we had discussed, care for performance more and hence
> >> prefers passthrough instead or mediation and don’t seem to be doing
> >> any nesting.
> >>> CPU doesnt have support for 3 level of page table nesting either.
> >>> I agree that there could be other users who care for nested functionality.
> >>>
> >>> Any ways, nesting and non-nesting are two different requirements.
> >> The LM facility should server both,
> > I don’t see how PCI spec let you do it.
> > PCI community already handed over this to SR-PCIM interface outside of the
> PCI spec domain.
> > Hence, its done over admin queue for passthrough devices.
> >
> > If you can explain, how your proposal addresses passthrough support without
> mediation and also does DMA, I am very interested to learn that.
> Do you mean nested? 
Before nesting, I would just like to see basic single-level passthrough that is functional and performant, like [1].

> Why this series can not support nested?
I don’t see all the aspects that I covered in series [1], ranging from FLR, device context migration, virtio-level reset, dirty page tracking, P2P support, etc., covered in a device/vq suspend-resume piece.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

> >> And it does not serve bare-metal live migration either.
> > A bare-metal migration seems a distance theory as one need side cpu,
> memory accessor apart from device accessor.
> > But somehow if that exists, there will be similar admin device to migrate it
> may be TDDISP will own this whole piece one day.
> Bare metal live migration require other components like firmware OS and
> partitioning, that's why the device live migration should not be a blocker.
Device migration is not a blocker.
In fact, it facilitates this future: if it happens, a side CPU such as a DPU, or a similar sideband virtio admin device, can migrate over its admin vq.

Long ago, when admin commands were discussed, it was also discussed that an admin device may not be the owner device.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  8:12                         ` Parav Pandit
@ 2023-09-11  8:46                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  8:46 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 4:12 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 1:28 PM
>
>>> Basic facilities are added in [1] for passthrough devices.
>>> You can leverage them in your v2 for supporting p2p devices, dirty page
>> tracking, passthrough support, shorter downtime and more.
>> Basic facilities should be better not depend on others, but admin vq can re-use
>> the basic facilities.
>>
>> For P2P, what if the devices are placed in different IOMMU group?
> IOMMU grouping is one OS specific notion of an older API.
> Hypervisor needs to do right setup anyway for using PCI spec define access control and other semantics which is outside the scope of [1].
> It is outside primarily because proposal [1] is not migrating the whole "PCI device".
> It is migrating the virtio device, so that we can migrate from PCI VF member to some software based device too.
> And vis-versa.
Since you talked about P2P: IOMMU is basically for address space
isolation. For security reasons, it is usually
suggested to pass through all devices in one IOMMU group to a single guest.

That means, if you want the VF to perform P2P with the PF where the AQ
resides, you have to place them in the same
IOMMU group and pass them all through to a guest. So how can this AQ serve
other purposes?
>
>>> QOS is such a broad term that is hard to debate unless you get to a specific
>> point.
>> E.g., there can be hundreds or thousands of VMs, how many admin vq are
>> required to serve them when LM? To converge, no timeout.
> How many RSS queues are required to reach 800Gbs NIC performance at what q depth at what interrupt moderation level?
> Such details are outside the scope of virtio specification.
> Those are implementation details of the device.
>
> Similarly here for AQ too.
> The inherent nature of AQ to queue commands and execute them out of order in the device is the fundamental reason, AQ is introduced.
> And one can have more AQs to do unrelated work, mainly from the hypervisor owner device who wants to enqueue unrelated commands in parallel.
As pointed out above, insufficient RSS capability may cause performance
overhead, but not a failure; the device stays functional.
But too few AQs serving too high a volume of VMs may be a problem.
Yes, the number of AQs is negotiable, but how many exactly should the HW
provide?

Naming a number, or an algorithm for the ratio of devices to AQs, is
beyond this topic, but I have made my point clear.
>
>>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
>>>> number in HW implementation and how does the driver get informed?
>>> Usually just one AQ is enough as proposal [1] is built around inherent
>> downtime reduction.
>>> You can ask similar question for RSS, how does hw device how many RSS
>>> queues are needed. 😊
>>> Device exposes number of supported AQs that driver is free to use.
>> RSS is not a must for the transition through maybe performance overhead.
>> But if the host can not finish Live Migration in the due time, then it is a failed
>> LM.
> It can aborts the LM and restore it back by resuming the device.
An abort means failure.
>
>>> Most sane sys admins do not migrate 1000 VMs at same time for obvious
>> reasons.
>>> But when such requirements arise, a device may support it.
>>> Just like how a net device can support from 1 to 32K txqueues at spec level.
>> The orchestration layer may do that for host upgrade or power-saving.
>> And the VMs may be required to migrate together, for example:
>> a cluster of VMs in the same subnet.
>>
> Sure. AQ of depth 1K can support 1K outstanding commands at a time for 1000 member devices.
PCI transition is FIFO; can depth = 1K introduce significant latency?
And a depth of 1K is
almost identical to 2 x 500 queue depths, so it is still the same problem: how
many resources
does the HW need to reserve to serve the worst case?

Let's forget the numbers; the point is clear.
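The arithmetic behind this concern can be made explicit. The per-command timing below is purely a placeholder for illustration, not a measured or spec value:

```python
# Worst case under the serial (FIFO) assumption being questioned in
# this thread: the last of `depth` outstanding commands waits
# depth * t_cmd before completing.
def worst_case_wait_us(depth, per_cmd_us):
    return depth * per_cmd_us

# With a placeholder 100 us per command, a strictly serial depth-1K
# queue would make the last command wait 100 ms...
assert worst_case_wait_us(1000, 100) == 100_000
# ...and splitting it into 2 x 500 changes nothing if execution
# stays serial, which is the "2 X 500" point above.
assert worst_case_wait_us(1000, 100) == 2 * worst_case_wait_us(500, 100)
```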
>
>> Lets do not introduce new frangibility
> I don’t see any frangibility added by [1].
> If you see one, please let me know.
The resource and latency issues explained above.
>
>>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>>>> I am not sure why the migration reason has any influence on the design.
>>>> Because this design is for live migration.
>>>>> The CSPs that we had discussed, care for performance more and hence
>>>> prefers passthrough instead or mediation and don’t seem to be doing
>>>> any nesting.
>>>>> CPU doesnt have support for 3 level of page table nesting either.
>>>>> I agree that there could be other users who care for nested functionality.
>>>>>
>>>>> Any ways, nesting and non-nesting are two different requirements.
>>>> The LM facility should server both,
>>> I don’t see how PCI spec let you do it.
>>> PCI community already handed over this to SR-PCIM interface outside of the
>> PCI spec domain.
>>> Hence, its done over admin queue for passthrough devices.
>>>
>>> If you can explain, how your proposal addresses passthrough support without
>> mediation and also does DMA, I am very interested to learn that.
>> Do you mean nested?
> Before nesting, just like to see basic single level passthrough to see functional and performant like [1].
I think we have discussed this: the nested guest is not aware of
the admin vq and cannot access it,
because the admin vq is a host facility.
>
>> Why this series can not support nested?
> I don’t see all the aspects that I covered in series [1] ranging from flr, device context migration, virtio level reset, dirty page tracking, p2p support, etc. covered in some device, vq suspend resume piece.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
We have discussed many other issues in this thread.
>
>>>> And it does not serve bare-metal live migration either.
>>> A bare-metal migration seems a distance theory as one need side cpu,
>> memory accessor apart from device accessor.
>>> But somehow if that exists, there will be similar admin device to migrate it
>> may be TDDISP will own this whole piece one day.
>> Bare metal live migration require other components like firmware OS and
>> partitioning, that's why the device live migration should not be a blocker.
> Device migration is not blocker.
> In-fact it facilitates for this future in case if that happens where side cpu like DPU or similar sideband virtio admin device can migrate over its admin vq.
>
> Long ago when admin commands were discussed, this was discussed too where a admin device may not be an owner device.
The admin vq cannot migrate itself; therefore bare metal cannot be
migrated by the admin vq.




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  8:46                           ` Zhu, Lingshan
@ 2023-09-11  9:05                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  9:05 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 2:17 PM

[..]
> > Hypervisor needs to do right setup anyway for using PCI spec define access
> control and other semantics which is outside the scope of [1].
> > It is outside primarily because proposal [1] is not migrating the whole "PCI
> device".
> > It is migrating the virtio device, so that we can migrate from PCI VF member
> to some software based device too.
> > And vis-versa.
> Since you talked about P2P, IOMMU is basically for address space isolation. For
> security reasons, it is usually suggest to passthrough all devices in one IOMMU
> group to a single guest.
> 
IOMMU group is an OS concept; no need to mix it in here.

> That means, if you want the VF to perform P2P with the PF there the AQ
> resides, you have to place them in the same IOMMU group and passthrough
> them all to a guest. So how this AQ serve other purposes?
> >
A PF resides on the hypervisor. One or more VFs are passed through to the VM.
When one wants to do nesting, maybe one of the VFs can take the role of admin for its peer VF.
Such an extension is only needed for nesting.

For non-nesting, the known common case to us, such an extension is not needed.

> >>> QOS is such a broad term that is hard to debate unless you get to a
> >>> specific
> >> point.
> >> E.g., there can be hundreds or thousands of VMs, how many admin vq
> >> are required to serve them when LM? To converge, no timeout.
> > How many RSS queues are required to reach 800Gbs NIC performance at what
> q depth at what interrupt moderation level?
> > Such details are outside the scope of virtio specification.
> > Those are implementation details of the device.
> >
> > Similarly here for AQ too.
> > The inherent nature of AQ to queue commands and execute them out of order
> in the device is the fundamental reason, AQ is introduced.
> > And one can have more AQs to do unrelated work, mainly from the hypervisor
> owner device who wants to enqueue unrelated commands in parallel.
> As pointed above insufficient RSS capabilities may cause performance
> overhead, not not a failure, the device still stay functional.
If UDP packets are dropped, even applications that do not retry can fail.

> But too few AQ to serve too high volume of VMs may be a problem.
It is left to the device to implement the needed scale requirement.

> Yes the number of AQs are negotiable, but how many exactly should the HW
> provide?
Again, it is outside the scope. It is left to the device implementation, like many other performance aspects.

> 
> Naming a number or an algorithm for the ratio of devices / num_of_AQs is
> beyond this topic, but I made my point clear.
Sure. It is beyond.
And it is not a concern either.

> >
> >>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
> >>>> number in HW implementation and how does the driver get informed?
> >>> Usually just one AQ is enough as proposal [1] is built around
> >>> inherent
> >> downtime reduction.
> >>> You can ask similar question for RSS, how does hw device how many
> >>> RSS queues are needed. 😊
> >>> Device exposes number of supported AQs that driver is free to use.
> >> RSS is not a must for the transition through maybe performance overhead.
> >> But if the host can not finish Live Migration in the due time, then
> >> it is a failed LM.
> > It can aborts the LM and restore it back by resuming the device.
> aborts means fail
> >
> >>> Most sane sys admins do not migrate 1000 VMs at same time for
> >>> obvious
> >> reasons.
> >>> But when such requirements arise, a device may support it.
> >>> Just like how a net device can support from 1 to 32K txqueues at spec level.
> >> The orchestration layer may do that for host upgrade or power-saving.
> >> And the VMs may be required to migrate together, for example:
> >> a cluster of VMs in the same subnet.
> >>
> > Sure. AQ of depth 1K can support 1K outstanding commands at a time for
> 1000 member devices.
> PCI transition is FIFO, 
I do not understand what "PCI transition" means.

> can depth = 1K introduce significant latency?
AQ command execution is not done serially. There is enough text in the AQ chapter, as I recall.

> And 1K depths is
> almost identical to 2 X 500 queue depths, so still the same problem, how many
> resource does the HW need to reserve to serve the worst case?
> 
You didn’t describe the problem.
The virtqueue is generic infrastructure to execute commands, be it an admin command, control command, flow filter command, or SCSI command.
How many to execute in parallel, and how many queues to have, are device-implementation specific.

> Let's forget the numbers, the point is clear.
Ok. I agree with you.
The number of AQs and their depth matter for this discussion, and their performance characterization is outside the spec.
Design-wise, the key thing is to have a queuing interface between driver and device for device migration commands.
This enables both entities to execute things in parallel.

This is fully covered in [1].
So let's improve [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
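As a toy model of that queuing interface: the sketch below (hypothetical, not taken from [1]) submits one migration command per member device through a bounded queue and lets the "device" complete in-flight commands in arbitrary order, which is the driver/device parallelism being argued for:

```python
import random
from collections import deque

def migrate_members(n_members, queue_depth):
    """Toy model: one migration command per member device, pushed
    through a queue of bounded depth; the device completes in-flight
    commands in arbitrary (non-FIFO) order."""
    pending = deque(range(n_members))  # member ids awaiting a command
    in_flight = []                     # commands queued on the AQ
    completed = set()
    while pending or in_flight:
        # Driver side: keep the queue full up to its depth.
        while pending and len(in_flight) < queue_depth:
            in_flight.append(pending.popleft())
        # Device side: complete any one in-flight command.
        done = in_flight.pop(random.randrange(len(in_flight)))
        completed.add(done)
    return completed

# All members migrate regardless of completion order.
assert migrate_members(1000, 1000) == set(range(1000))
```

With queue_depth equal to the member count, all 1000 commands can be outstanding at once, matching the depth-1K example earlier in the thread; a shallower queue still completes every member, just with less concurrency.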

> >
> >> Lets do not introduce new frangibility
> > I don’t see any frangibility added by [1].
> > If you see one, please let me know.
> The resource and latency explained above.
> >
> >>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> >>>>> I am not sure why the migration reason has any influence on the design.
> >>>> Because this design is for live migration.
> >>>>> The CSPs that we had discussed, care for performance more and
> >>>>> hence
> >>>> prefers passthrough instead or mediation and don’t seem to be doing
> >>>> any nesting.
> >>>>> CPU doesnt have support for 3 level of page table nesting either.
> >>>>> I agree that there could be other users who care for nested functionality.
> >>>>>
> >>>>> Any ways, nesting and non-nesting are two different requirements.
> >>>> The LM facility should server both,
> >>> I don’t see how PCI spec let you do it.
> >>> PCI community already handed over this to SR-PCIM interface outside
> >>> of the
> >> PCI spec domain.
> >>> Hence, its done over admin queue for passthrough devices.
> >>>
> >>> If you can explain, how your proposal addresses passthrough support
> >>> without
> >> mediation and also does DMA, I am very interested to learn that.
> >> Do you mean nested?
> > Before nesting, just like to see basic single level passthrough to see functional
> and performant like [1].
> I think we have discussed about this, the nested guest is not aware of the admin
> vq and can not access it, because the admin vq is a host facility.

A nested guest VM is not aware, and should not be.
The VM hosting the nested VM is aware of how to execute administration commands using the owner device.

At present, for the PCI transport, the owner device is the PF.

In future, for nesting, maybe another peer VF can be delegated this task and perform the administration commands.

For bare metal, maybe some other admin device like a DPU can take that role.

> >

> >> Why this series can not support nested?
> > I don’t see all the aspects that I covered in series [1] ranging from flr, device
> context migration, virtio level reset, dirty page tracking, p2p support, etc.
> covered in some device, vq suspend resume piece.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> We have discussed many other issues in this thread.
> >
> >>>> And it does not serve bare-metal live migration either.
> >>> A bare-metal migration seems a distance theory as one need side cpu,
> >> memory accessor apart from device accessor.
> >>> But somehow if that exists, there will be similar admin device to
> >>> migrate it
> >> may be TDDISP will own this whole piece one day.
> >> Bare metal live migration require other components like firmware OS
> >> and partitioning, that's why the device live migration should not be a
> blocker.
> > Device migration is not blocker.
> > In-fact it facilitates for this future in case if that happens where side cpu like
> DPU or similar sideband virtio admin device can migrate over its admin vq.
> >
> > Long ago when admin commands were discussed, this was discussed too
> where a admin device may not be an owner device.
> The admin vq can not migrate it self therefore baremetal can not be migrated
> by admin vq
May be I was not clear. The admin commands are executed by some other device than the PF.
In above I call it admin device, which can be a DPU may be some other dedicated admin device or something else.
Large part of non virtio infrastructure at platform, BIOS, cpu, memory level needs to evolve before virtio can utilize it.

We don’t need to cook all now, as long as we have administration commands its good.
The real credit owner for detaching the administration command from the admin vq is Michael. :)
We like to utilize this in future for DPU case where admin device is not the PCI PF.
Eswitch, PF migration etc may utilize it in future when needed.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-11  9:05                             ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11  9:05 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 2:17 PM

[..]
> > Hypervisor needs to do right setup anyway for using PCI spec define access
> control and other semantics which is outside the scope of [1].
> > It is outside primarily because proposal [1] is not migrating the whole "PCI
> device".
> > It is migrating the virtio device, so that we can migrate from PCI VF member
> to some software based device too.
> > And vis-versa.
> Since you talked about P2P, IOMMU is basically for address space isolation. For
> security reasons, it is usually suggest to passthrough all devices in one IOMMU
> group to a single guest.
> 
IOMMU group is an OS concept; there is no need to mix it in here.

> That means, if you want the VF to perform P2P with the PF there the AQ
> resides, you have to place them in the same IOMMU group and passthrough
> them all to a guest. So how this AQ serve other purposes?
> >
A PF resides on the hypervisor. One or more VFs are passed through to the VM.
When one wants to do nesting, maybe one of the VFs can take the role of admin for its peer VF.
Such an extension is only needed for nesting.

For non-nesting, which is the known common case to us, such an extension is not needed.

> >>> QOS is such a broad term that is hard to debate unless you get to a
> >>> specific
> >> point.
> >> E.g., there can be hundreds or thousands of VMs, how many admin vq
> >> are required to serve them when LM? To converge, no timeout.
> > How many RSS queues are required to reach 800Gbs NIC performance at what
> q depth at what interrupt moderation level?
> > Such details are outside the scope of virtio specification.
> > Those are implementation details of the device.
> >
> > Similarly here for AQ too.
> > The inherent nature of AQ to queue commands and execute them out of order
> in the device is the fundamental reason, AQ is introduced.
> > And one can have more AQs to do unrelated work, mainly from the hypervisor
> owner device who wants to enqueue unrelated commands in parallel.
> As pointed above insufficient RSS capabilities may cause performance
> overhead, not not a failure, the device still stay functional.
If UDP packets are dropped, even an application that does not retry can fail.

> But too few AQ to serve too high volume of VMs may be a problem.
It is left to the device to implement the needed scale requirement.

> Yes the number of AQs are negotiable, but how many exactly should the HW
> provide?
Again, it is outside the scope. It is left to the device implementation like many other performance aspects.

> 
> Naming a number or an algorithm for the ratio of devices / num_of_AQs is
> beyond this topic, but I made my point clear.
Sure. It is beyond.
And it is not a concern either.

> >
> >>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
> >>>> number in HW implementation and how does the driver get informed?
> >>> Usually just one AQ is enough as proposal [1] is built around
> >>> inherent
> >> downtime reduction.
> >>> You can ask similar question for RSS, how does hw device how many
> >>> RSS queues are needed. 😊
> >>> Device exposes number of supported AQs that driver is free to use.
> >> RSS is not a must for the transition through maybe performance overhead.
> >> But if the host can not finish Live Migration in the due time, then
> >> it is a failed LM.
> > It can aborts the LM and restore it back by resuming the device.
> aborts means fail
> >
> >>> Most sane sys admins do not migrate 1000 VMs at same time for
> >>> obvious
> >> reasons.
> >>> But when such requirements arise, a device may support it.
> >>> Just like how a net device can support from 1 to 32K txqueues at spec level.
> >> The orchestration layer may do that for host upgrade or power-saving.
> >> And the VMs may be required to migrate together, for example:
> >> a cluster of VMs in the same subnet.
> >>
> > Sure. AQ of depth 1K can support 1K outstanding commands at a time for
> 1000 member devices.
> PCI transition is FIFO, 
I do not understand what "PCI transition" means.

> can depth = 1K introduce significant latency?
AQ command execution is not done serially. There is enough text in the AQ chapter, as I recall.

> And 1K depths is
> almost identical to 2 X 500 queue depths, so still the same problem, how many
> resource does the HW need to reserve to serve the worst case?
> 
You didn’t describe the problem.
A virtqueue is generic infrastructure to execute commands, be it an admin command, control command, flow filter command, or SCSI command.
How many to execute in parallel and how many queues to have are device-implementation specific.

> Let's forget the numbers, the point is clear.
Ok. I agree with you.
The number of AQs and their depth matter for this discussion, but their performance characterization is outside the spec.
Design-wise, the key thing is to have a queuing interface between the driver and the device for device migration commands.
This enables both entities to execute things in parallel.

This is fully covered in [1].
So let's improve [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
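[Editor's note: the "execute in parallel" argument above can be illustrated with a toy scheduling model. Everything here is a hypothetical sketch — the engine count, unit command costs, and greedy packing are my assumptions, not anything defined by the virtio spec or by series [1].]

```python
# Toy model: a deep admin queue lets the device complete commands out of
# order across parallel execution engines, so total latency is bounded by
# the most loaded engine, not the sum of all commands as in strict FIFO.

def serial_cost(cmd_costs):
    # Strictly serial (FIFO) execution: costs simply add up.
    return sum(cmd_costs)

def parallel_cost(cmd_costs, engines):
    # Out-of-order execution on `engines` parallel units, packed greedily
    # (longest command first onto the least-loaded engine).
    load = [0] * engines
    for c in sorted(cmd_costs, reverse=True):
        load[load.index(min(load))] += c
    return max(load)

# 1000 member devices, one state-save command each, 1 time unit apiece.
costs = [1] * 1000
print(serial_cost(costs))        # 1000 units if executed one at a time
print(parallel_cost(costs, 16))  # 63 units with 16 hypothetical engines
```

The point of the sketch is only that queue depth plus out-of-order completion decouples command count from observed latency; real devices choose their own parallelism.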

> >
> >> Lets do not introduce new frangibility
> > I don’t see any frangibility added by [1].
> > If you see one, please let me know.
> The resource and latency explained above.
> >
> >>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
> >>>>> I am not sure why the migration reason has any influence on the design.
> >>>> Because this design is for live migration.
> >>>>> The CSPs that we had discussed, care for performance more and
> >>>>> hence
> >>>> prefers passthrough instead or mediation and don’t seem to be doing
> >>>> any nesting.
> >>>>> CPU doesnt have support for 3 level of page table nesting either.
> >>>>> I agree that there could be other users who care for nested functionality.
> >>>>>
> >>>>> Any ways, nesting and non-nesting are two different requirements.
> >>>> The LM facility should server both,
> >>> I don’t see how PCI spec let you do it.
> >>> PCI community already handed over this to SR-PCIM interface outside
> >>> of the
> >> PCI spec domain.
> >>> Hence, its done over admin queue for passthrough devices.
> >>>
> >>> If you can explain, how your proposal addresses passthrough support
> >>> without
> >> mediation and also does DMA, I am very interested to learn that.
> >> Do you mean nested?
> > Before nesting, just like to see basic single level passthrough to see functional
> and performant like [1].
> I think we have discussed about this, the nested guest is not aware of the admin
> vq and can not access it, because the admin vq is a host facility.

A nested guest VM is not aware and should not.
The VM hosting the nested VM, is aware on how to execute administrative commands using the owner device.

At present for PCI transport, owner device is PF.

In future for nesting, may be another peer VF can be delegated such task and it can perform administration command.

For bare metal may be some other admin device like DPU can do that role.

> >

> >> Why this series can not support nested?
> > I don’t see all the aspects that I covered in series [1] ranging from flr, device
> context migration, virtio level reset, dirty page tracking, p2p support, etc.
> covered in some device, vq suspend resume piece.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> We have discussed many other issues in this thread.
> >
> >>>> And it does not serve bare-metal live migration either.
> >>> A bare-metal migration seems a distance theory as one need side cpu,
> >> memory accessor apart from device accessor.
> >>> But somehow if that exists, there will be similar admin device to
> >>> migrate it
> >> may be TDDISP will own this whole piece one day.
> >> Bare metal live migration require other components like firmware OS
> >> and partitioning, that's why the device live migration should not be a
> blocker.
> > Device migration is not blocker.
> > In-fact it facilitates for this future in case if that happens where side cpu like
> DPU or similar sideband virtio admin device can migrate over its admin vq.
> >
> > Long ago when admin commands were discussed, this was discussed too
> where a admin device may not be an owner device.
> The admin vq can not migrate it self therefore baremetal can not be migrated
> by admin vq
Maybe I was not clear. The admin commands are executed by some device other than the PF.
Above I call it the admin device, which can be a DPU, some other dedicated admin device, or something else.
A large part of the non-virtio infrastructure at the platform, BIOS, CPU, and memory level needs to evolve before virtio can utilize it.

We don’t need to cook it all now; as long as we have administration commands, it’s good.
The real credit owner for detaching the administration commands from the admin vq is Michael. :)
We would like to utilize this in the future for the DPU case, where the admin device is not the PCI PF.
Eswitch, PF migration, etc. may utilize it in the future when needed.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  9:05                             ` Parav Pandit
@ 2023-09-11  9:32                               ` Zhu, Lingshan
  1 sibling, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  9:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 5:05 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 2:17 PM
> [..]
>>> Hypervisor needs to do right setup anyway for using PCI spec define access
>> control and other semantics which is outside the scope of [1].
>>> It is outside primarily because proposal [1] is not migrating the whole "PCI
>> device".
>>> It is migrating the virtio device, so that we can migrate from PCI VF member
>> to some software based device too.
>>> And vis-versa.
>> Since you talked about P2P, IOMMU is basically for address space isolation. For
>> security reasons, it is usually suggest to passthrough all devices in one IOMMU
>> group to a single guest.
>>
> IOMMU group is OS concept and no need to mix it here.
>
>> That means, if you want the VF to perform P2P with the PF there the AQ
>> resides, you have to place them in the same IOMMU group and passthrough
>> them all to a guest. So how this AQ serve other purposes?
> A PF resides on the hypervisor. One or more VFs are passthrough to the VM.
> When one wants to do nesting, may be one of the VF can do the role of admin for its peer VF.
> Such extension is only needed for nesting.
>
> For non-nesting being the known common case to us, such extension is not needed.
So implement the AQ on an "admin" VF? That requires the HW to reserve dedicated
resources for every VF?
Too expensive, overkill?

And would a VF then be managed by both the PF and its admin "VF"?
>
>>>>> QOS is such a broad term that is hard to debate unless you get to a
>>>>> specific
>>>> point.
>>>> E.g., there can be hundreds or thousands of VMs, how many admin vq
>>>> are required to serve them when LM? To converge, no timeout.
>>> How many RSS queues are required to reach 800Gbs NIC performance at what
>> q depth at what interrupt moderation level?
>>> Such details are outside the scope of virtio specification.
>>> Those are implementation details of the device.
>>>
>>> Similarly here for AQ too.
>>> The inherent nature of AQ to queue commands and execute them out of order
>> in the device is the fundamental reason, AQ is introduced.
>>> And one can have more AQs to do unrelated work, mainly from the hypervisor
>> owner device who wants to enqueue unrelated commands in parallel.
>> As pointed above insufficient RSS capabilities may cause performance
>> overhead, not not a failure, the device still stay functional.
> If UDP packets are dropped, even application can fail who do no retry.
UDP is not reliable, and performance overhead does not mean failure.
>
>> But too few AQ to serve too high volume of VMs may be a problem.
> It is left for the device to implement the needed scale requirement.
Yes, so how many HW resources should the implementation reserve
to serve the worst case? Half of the board's resources?
>
>> Yes the number of AQs are negotiable, but how many exactly should the HW
>> provide?
> Again, it is outside the scope. It is left to the device implementation like many other performance aspects.
I agree we can skip this issue, but the point is clear. And this is not
only a performance issue;
it can lead to a failed LM.
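[Editor's note: the sizing concern above can be made concrete with a back-of-envelope feasibility check. All numbers and the model itself are hypothetical illustrations, not figures from the spec or from either proposal.]

```python
import math

def queues_needed(num_vms, cmd_time_ms, deadline_ms, cmds_in_flight_per_queue):
    # Total sequential work to migrate all VMs, divided by what a single
    # admin queue can retire before the downtime deadline, assuming each
    # queue keeps `cmds_in_flight_per_queue` commands in flight.
    work_ms = num_vms * cmd_time_ms
    per_queue_capacity_ms = deadline_ms * cmds_in_flight_per_queue
    return math.ceil(work_ms / per_queue_capacity_ms)

# 1000 VMs, 5 ms per state-save command, 100 ms downtime budget,
# 8 commands in flight per queue:
print(queues_needed(1000, 5, 100, 8))  # 7 queues
```

Under these made-up numbers the HW would need 7 admin queues to meet the deadline; the dispute in the thread is precisely that the spec leaves such provisioning to the device implementation.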
>
>> Naming a number or an algorithm for the ratio of devices / num_of_AQs is
>> beyond this topic, but I made my point clear.
> Sure. It is beyond.
> And it is not a concern either.
It is; the user expects the LM process to succeed rather than fail.
>
>>>>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
>>>>>> number in HW implementation and how does the driver get informed?
>>>>> Usually just one AQ is enough as proposal [1] is built around
>>>>> inherent
>>>> downtime reduction.
>>>>> You can ask similar question for RSS, how does hw device how many
>>>>> RSS queues are needed. 😊
>>>>> Device exposes number of supported AQs that driver is free to use.
>>>> RSS is not a must for the transition through maybe performance overhead.
>>>> But if the host can not finish Live Migration in the due time, then
>>>> it is a failed LM.
>>> It can aborts the LM and restore it back by resuming the device.
>> aborts means fail
>>>>> Most sane sys admins do not migrate 1000 VMs at same time for
>>>>> obvious
>>>> reasons.
>>>>> But when such requirements arise, a device may support it.
>>>>> Just like how a net device can support from 1 to 32K txqueues at spec level.
>>>> The orchestration layer may do that for host upgrade or power-saving.
>>>> And the VMs may be required to migrate together, for example:
>>>> a cluster of VMs in the same subnet.
>>>>
>>> Sure. AQ of depth 1K can support 1K outstanding commands at a time for
>> 1000 member devices.
>> PCI transition is FIFO,
> I do not understand what is "PCI transition".
PCI data flow.
>
>> can depth = 1K introduce significant latency?
> AQ command execution is not done serially. There is enough text on the AQ chapter as I recall.
Then it requires more HW resources; I don't see the difference.
>
>> And 1K depths is
>> almost identical to 2 X 500 queue depths, so still the same problem, how many
>> resource does the HW need to reserve to serve the worst case?
>>
> You didn’t describe the problem.
> Virtqueue is generic infrastructure to execute commands, be it admin command, control command, flow filter command, scsi command.
> How many to execute in parallel, how many queues to have are device implementation specific.
So the question is how many are needed to serve the worst case? Does the HW vendor
need to reserve half of the board's resources?
>
>> Let's forget the numbers, the point is clear.
> Ok. I agree with you.
> Number of AQs and its depth matter for this discussion, and its performance characterization is outside the spec.
> Design wise, key thing to have the queuing interface between driver and device for device migration commands.
> This enables both entities to execute things in parallel.
>
> This is fully covered in [1].
> So let's improve [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I am not sure why [1] is a must; certain issues discussed in
this thread for [1] remain unsolved.

By the way, do you see anything we need to improve in this series?
>
>>>> Lets do not introduce new frangibility
>>> I don’t see any frangibility added by [1].
>>> If you see one, please let me know.
>> The resource and latency explained above.
>>>>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>>>>>> I am not sure why the migration reason has any influence on the design.
>>>>>> Because this design is for live migration.
>>>>>>> The CSPs that we had discussed, care for performance more and
>>>>>>> hence
>>>>>> prefers passthrough instead or mediation and don’t seem to be doing
>>>>>> any nesting.
>>>>>>> CPU doesnt have support for 3 level of page table nesting either.
>>>>>>> I agree that there could be other users who care for nested functionality.
>>>>>>>
>>>>>>> Any ways, nesting and non-nesting are two different requirements.
>>>>>> The LM facility should server both,
>>>>> I don’t see how PCI spec let you do it.
>>>>> PCI community already handed over this to SR-PCIM interface outside
>>>>> of the
>>>> PCI spec domain.
>>>>> Hence, its done over admin queue for passthrough devices.
>>>>>
>>>>> If you can explain, how your proposal addresses passthrough support
>>>>> without
>>>> mediation and also does DMA, I am very interested to learn that.
>>>> Do you mean nested?
>>> Before nesting, just like to see basic single level passthrough to see functional
>> and performant like [1].
>> I think we have discussed about this, the nested guest is not aware of the admin
>> vq and can not access it, because the admin vq is a host facility.
> A nested guest VM is not aware and should not.
> The VM hosting the nested VM, is aware on how to execute administrative commands using the owner device.
The VM does not talk to the admin vq either; the admin vq is a host
facility, and the host owns it.
>
> At present for PCI transport, owner device is PF.
>
> In future for nesting, may be another peer VF can be delegated such task and it can perform administration command.
Then it may run into the problems explained above.
>
> For bare metal may be some other admin device like DPU can do that role.
So [1] is not ready.
>
>>>> Why this series can not support nested?
>>> I don’t see all the aspects that I covered in series [1] ranging from flr, device
>> context migration, virtio level reset, dirty page tracking, p2p support, etc.
>> covered in some device, vq suspend resume piece.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> We have discussed many other issues in this thread.
>>>>>> And it does not serve bare-metal live migration either.
>>>>> A bare-metal migration seems a distance theory as one need side cpu,
>>>> memory accessor apart from device accessor.
>>>>> But somehow if that exists, there will be similar admin device to
>>>>> migrate it
>>>> may be TDDISP will own this whole piece one day.
>>>> Bare metal live migration require other components like firmware OS
>>>> and partitioning, that's why the device live migration should not be a
>> blocker.
>>> Device migration is not blocker.
>>> In-fact it facilitates for this future in case if that happens where side cpu like
>> DPU or similar sideband virtio admin device can migrate over its admin vq.
>>> Long ago when admin commands were discussed, this was discussed too
>> where a admin device may not be an owner device.
>> The admin vq can not migrate it self therefore baremetal can not be migrated
>> by admin vq
> May be I was not clear. The admin commands are executed by some other device than the PF.
From the SW perspective, it should be the admin vq and the device it resides on.
> In above I call it admin device, which can be a DPU may be some other dedicated admin device or something else.
> Large part of non virtio infrastructure at platform, BIOS, cpu, memory level needs to evolve before virtio can utilize it.
A virtio device should be self-contained, not dependent on other components.
>
> We don’t need to cook all now, as long as we have administration commands its good.
> The real credit owner for detaching the administration command from the admin vq is Michael. :)
> We like to utilize this in future for DPU case where admin device is not the PCI PF.
> Eswitch, PF migration etc may utilize it in future when needed.
Again, the design should not rely on other host components.

And it is not about the credit; this is about reliable work outcomes.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-11  9:32                               ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-11  9:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 5:05 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 2:17 PM
> [..]
>>> Hypervisor needs to do right setup anyway for using PCI spec define access
>> control and other semantics which is outside the scope of [1].
>>> It is outside primarily because proposal [1] is not migrating the whole "PCI
>> device".
>>> It is migrating the virtio device, so that we can migrate from PCI VF member
>> to some software based device too.
>>> And vis-versa.
>> Since you talked about P2P, IOMMU is basically for address space isolation. For
>> security reasons, it is usually suggest to passthrough all devices in one IOMMU
>> group to a single guest.
>>
> IOMMU group is OS concept and no need to mix it here.
>
>> That means, if you want the VF to perform P2P with the PF there the AQ
>> resides, you have to place them in the same IOMMU group and passthrough
>> them all to a guest. So how this AQ serve other purposes?
> A PF resides on the hypervisor. One or more VFs are passthrough to the VM.
> When one wants to do nesting, may be one of the VF can do the role of admin for its peer VF.
> Such extension is only needed for nesting.
>
> For non-nesting being the known common case to us, such extension is not needed.
So implement AQ on the "admin" VF? This require the HW reserve dedicated 
resource for every VF?
So expensive, Overkill?

And a VF may be managed by the PF and its admin "vf"?
>
>>>>> QOS is such a broad term that is hard to debate unless you get to a
>>>>> specific
>>>> point.
>>>> E.g., there can be hundreds or thousands of VMs, how many admin vq
>>>> are required to serve them when LM? To converge, no timeout.
>>> How many RSS queues are required to reach 800Gbs NIC performance at what
>> q depth at what interrupt moderation level?
>>> Such details are outside the scope of virtio specification.
>>> Those are implementation details of the device.
>>>
>>> Similarly here for AQ too.
>>> The inherent nature of AQ to queue commands and execute them out of order
>> in the device is the fundamental reason, AQ is introduced.
>>> And one can have more AQs to do unrelated work, mainly from the hypervisor
>> owner device who wants to enqueue unrelated commands in parallel.
>> As pointed above insufficient RSS capabilities may cause performance
>> overhead, not not a failure, the device still stay functional.
> If UDP packets are dropped, even application can fail who do no retry.
UDP is not reliable, and performance overhead does not mean fail.
>
>> But too few AQ to serve too high volume of VMs may be a problem.
> It is left for the device to implement the needed scale requirement.
Yes, so how many HW resource should the HW implementation reserved
to serve the worst case? Half of the board resource?
>
>> Yes the number of AQs are negotiable, but how many exactly should the HW
>> provide?
> Again, it is outside the scope. It is left to the device implementation like many other performance aspects.
I agree we can skip this issue, but the point is clear. and this is not 
only a performance issue,
this can lead to failed LM.
>
>> Naming a number or an algorithm for the ratio of devices / num_of_AQs is
>> beyond this topic, but I made my point clear.
> Sure. It is beyond.
> And it is not a concern either.
It is, the user expect the LM process success than fail.
>
>>>>>> E.g., do you need 100 admin vqs for 1000 VMs? How do you decide the
>>>>>> number in HW implementation and how does the driver get informed?
>>>>> Usually just one AQ is enough as proposal [1] is built around
>>>>> inherent
>>>> downtime reduction.
>>>>> You can ask similar question for RSS, how does hw device how many
>>>>> RSS queues are needed. 😊
>>>>> Device exposes number of supported AQs that driver is free to use.
>>>> RSS is not a must for the transition through maybe performance overhead.
>>>> But if the host can not finish Live Migration in the due time, then
>>>> it is a failed LM.
>>> It can aborts the LM and restore it back by resuming the device.
>> aborts means fail
>>>>> Most sane sys admins do not migrate 1000 VMs at same time for
>>>>> obvious
>>>> reasons.
>>>>> But when such requirements arise, a device may support it.
>>>>> Just like how a net device can support from 1 to 32K txqueues at spec level.
>>>> The orchestration layer may do that for host upgrade or power-saving.
>>>> And the VMs may be required to migrate together, for example:
>>>> a cluster of VMs in the same subnet.
>>>>
>>> Sure. AQ of depth 1K can support 1K outstanding commands at a time for
>> 1000 member devices.
>> PCI transition is FIFO,
> I do not understand what is "PCI transition".
PCI data flow.
>
>> can depth = 1K introduce significant latency?
> AQ command execution is not done serially. There is enough text on the AQ chapter as I recall.
Then require more HW resource, I don't see difference.
>
>> And 1K depths is
>> almost identical to 2 X 500 queue depths, so still the same problem, how many
>> resource does the HW need to reserve to serve the worst case?
>>
> You didn’t describe the problem.
> Virtqueue is generic infrastructure to execute commands, be it admin command, control command, flow filter command, scsi command.
> How many to execute in parallel, how many queues to have are device implementation specific.
So the question is how many to serve the worst case? Does the HW vendor 
need to reserve half of the board resource?
>
>> Let's forget the numbers, the point is clear.
> Ok. I agree with you.
> Number of AQs and its depth matter for this discussion, and its performance characterization is outside the spec.
> Design wise, key thing to have the queuing interface between driver and device for device migration commands.
> This enables both entities to execute things in parallel.
>
> This is fully covered in [1].
> So let's improve [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I am not sure, why [1] is a must? There are certain issues discussed in 
this thread for [1] stay unsolved.

By the way, do you see anything we need to improve in this series?
>
>>>> Lets do not introduce new frangibility
>>> I don’t see any frangibility added by [1].
>>> If you see one, please let me know.
>> The resource and latency explained above.
>>>>>>>> Remember CSP migrates all VMs on a host for powersaving or upgrade.
>>>>>>> I am not sure why the migration reason has any influence on the design.
>>>>>> Because this design is for live migration.
>>>>>>> The CSPs that we had discussed, care for performance more and
>>>>>>> hence
>>>>>> prefers passthrough instead or mediation and don’t seem to be doing
>>>>>> any nesting.
>>>>>>> CPU doesnt have support for 3 level of page table nesting either.
>>>>>>> I agree that there could be other users who care for nested functionality.
>>>>>>>
>>>>>>> Any ways, nesting and non-nesting are two different requirements.
>>>>>> The LM facility should server both,
>>>>> I don’t see how PCI spec let you do it.
>>>>> PCI community already handed over this to SR-PCIM interface outside
>>>>> of the
>>>> PCI spec domain.
>>>>> Hence, its done over admin queue for passthrough devices.
>>>>>
>>>>> If you can explain, how your proposal addresses passthrough support
>>>>> without
>>>> mediation and also does DMA, I am very interested to learn that.
>>>> Do you mean nested?
>>> Before nesting, just like to see basic single level passthrough to see functional
>> and performant like [1].
>> I think we have discussed about this, the nested guest is not aware of the admin
>> vq and can not access it, because the admin vq is a host facility.
> A nested guest VM is not aware and should not.
> The VM hosting the nested VM, is aware on how to execute administrative commands using the owner device.
The VM does not talk to admin vq either, the admin vq is a host 
facility, host owns it.
>
> At present for PCI transport, owner device is PF.
>
> In future for nesting, may be another peer VF can be delegated such task and it can perform administration command.
Then it may run into the problems explained above.
>
> For bare metal may be some other admin device like DPU can do that role.
So [1] is not ready
>
>>>> Why this series can not support nested?
>>> I don’t see all the aspects that I covered in series [1] ranging from flr, device
>> context migration, virtio level reset, dirty page tracking, p2p support, etc.
>> covered in some device, vq suspend resume piece.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> We have discussed many other issues in this thread.
>>>>>> And it does not serve bare-metal live migration either.
>>>>> A bare-metal migration seems a distance theory as one need side cpu,
>>>> memory accessor apart from device accessor.
>>>>> But somehow if that exists, there will be similar admin device to
>>>>> migrate it
>>>> may be TDDISP will own this whole piece one day.
>>>> Bare-metal live migration requires other components like firmware, OS
>>>> and partitioning; that's why device live migration should not be a
>> blocker.
>>> Device migration is not a blocker.
>>> In-fact it facilitates for this future in case if that happens where side cpu like
>> DPU or similar sideband virtio admin device can migrate over its admin vq.
>>> Long ago when admin commands were discussed, this was discussed too,
>> where an admin device may not be the owner device.
>> The admin vq cannot migrate itself; therefore bare metal cannot be migrated
>> by the admin vq.
> Maybe I was not clear. The admin commands are executed by some other device than the PF.
From the SW perspective, it should be the admin vq and the device it resides on.
> In the above I call it an admin device, which can be a DPU, or maybe some other dedicated admin device or something else.
> A large part of non-virtio infrastructure at the platform, BIOS, cpu and memory level needs to evolve before virtio can utilize it.
A virtio device should be self-contained, not dependent on other components.
>
> We don’t need to cook it all now; as long as we have administration commands it's good.
> The real credit owner for detaching the administration command from the admin vq is Michael. :)
> We like to utilize this in future for DPU case where admin device is not the PCI PF.
> Eswitch, PF migration etc may utilize it in future when needed.
Again, the design should not rely on other host components.

And it is not about the credit; this is a reliable work outcome.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:30           ` Jason Wang
@ 2023-09-11 10:15             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-11 10:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Mon, Sep 11, 2023 at 02:30:31PM +0800, Jason Wang wrote:
> Customers don't want to have admin stuff, SR-IOV or PASID in the guest
> in order to migrate a single virtio device in the nest.

Build an alternative facility to implement admin commands then.
The advantage of admin commands is they are nicely contained.
This proposal is way too intrusive.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  9:32                               ` Zhu, Lingshan
@ 2023-09-11 10:21                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11 10:21 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 3:03 PM

> So implement an AQ on the "admin" VF? This requires the HW to reserve dedicated
> resources for every VF?
> So expensive, overkill?
> 
> And a VF may be managed by the PF and its admin "vf"?
Yes.

> > If UDP packets are dropped, even applications that do not retry can fail.
> UDP is not reliable, and performance overhead does not mean failure.
It largely depends on application.
I have seen iperf UDP failing on packet drops and never recovering.
A retransmission over UDP can fail.

> >
> >> But too few AQs to serve too high a volume of VMs may be a problem.
> > It is left for the device to implement the needed scale requirement.
> Yes, so how many HW resources should the HW implementation reserve to
> serve the worst case? Half of the board resources?
The board designer can decide how to manage the resource.
Administration commands are explicit instructions to the device.
It knows for how many member devices dirty tracking is ongoing, and which device context is being read/written.

An admin command can even fail with an EAGAIN error code when the device is out of resources, and software can retry the command.

The key part is that all of this happens outside of the VM's downtime.
Majority of the work in proposal [1] is done when the VM is _live_.
Hence, the resource consumption or reservation is significantly less.
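The resubmit-on-EAGAIN flow described above could be sketched roughly as follows. All names and status codes here are hypothetical, not taken from the virtio spec; the toy device function stands in for queuing a command on the admin VQ and waiting for its completion.

```c
/* Illustrative sketch of the driver-side retry flow: an admin command
 * may complete with an EAGAIN-style status when the device is
 * temporarily out of resources, and software simply resubmits it.
 * All names and status codes are hypothetical. */

#define ADMIN_STATUS_OK     0
#define ADMIN_STATUS_EAGAIN 1  /* "device busy, retry" (hypothetical) */

struct admin_cmd {
    int opcode;    /* e.g. a dirty-page-tracking command */
    int attempts;  /* filled in by the submit loop below */
};

/* Stand-in for placing the command on the admin VQ and waiting for its
 * completion; this toy device is out of resources twice before it
 * succeeds, to exercise the retry path. */
static int submit_on_admin_vq(struct admin_cmd *cmd)
{
    static int busy_completions = 2;
    (void)cmd;
    if (busy_completions > 0) {
        busy_completions--;
        return ADMIN_STATUS_EAGAIN;
    }
    return ADMIN_STATUS_OK;
}

/* Resubmit on EAGAIN, up to a bounded number of attempts. */
int admin_cmd_submit(struct admin_cmd *cmd, int max_attempts)
{
    int status;

    cmd->attempts = 0;
    do {
        status = submit_on_admin_vq(cmd);
        cmd->attempts++;
    } while (status == ADMIN_STATUS_EAGAIN && cmd->attempts < max_attempts);
    return status;
}
```

Since the retries happen while the VM is still live, a bounded loop like this only affects total migration time, not downtime.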


> >> Naming a number or an algorithm for the ratio of devices / num_of_AQs
> >> is beyond this topic, but I made my point clear.
> > Sure. It is beyond.
> > And it is not a concern either.
> It is; the user expects the LM process to succeed rather than fail.
I still fail to understand why the LM process would fail.
The migration process is slow, but the downtime is not, in [1].

> >> can depth = 1K introduce significant latency?
> > AQ command execution is not done serially. There is enough text in the AQ
> > chapter, as I recall.
> Then it requires more HW resources; I don't see the difference.
Difference compared to what, multiple AQs?
If so, sure.
A device that prefers to do only one AQ command at a time can work with fewer resources and do one at a time.

> >
> >> And 1K depths is
> >> almost identical to 2 X 500 queue depths, so still the same problem,
>> how many resources does the HW need to reserve to serve the worst case?
> >>
> > You didn’t describe the problem.
> > Virtqueue is generic infrastructure to execute commands, be it admin
> command, control command, flow filter command, scsi command.
> > How many to execute in parallel, how many queues to have are device
> implementation specific.
> So the question is how many to serve the worst case? Does the HW vendor need
> to reserve half of the board resources?
No. It does not need to.

> >
> >> Let's forget the numbers, the point is clear.
> > Ok. I agree with you.
> > The number of AQs and their depth matter for this discussion, and their performance
> > characterization is outside the spec.
> > Design-wise, the key thing is to have a queuing interface between the driver and
> > the device for device migration commands.
> > This enables both entities to execute things in parallel.
> >
> > This is fully covered in [1].
> > So let's improve [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> I am not sure why [1] is a must? There are certain issues discussed in this
> thread for [1] that stay unsolved.
> 
> By the way, do you see anything we need to improve in this series?
In [1], the device context needs to be richer as we progress through the v1/v2 versions.

[..]

> > A nested guest VM is not aware and should not.
> > The VM hosting the nested VM, is aware on how to execute administrative
> commands using the owner device.
> The VM does not talk to admin vq either, the admin vq is a host facility, host
> owns it.
The admin VQ is owned by whichever device has it.
As I explained before, it is on the owner device.
If needed, one can have it on more than the owner device.
For a VM_A which is hosting another VM_B, VM_A can have a peer VF with an AQ act as the admin device or migration manager device.

> >
> > At present for PCI transport, owner device is PF.
> >
> > In future for nesting, may be another peer VF can be delegated such task and
> it can perform administration command.
> Then it may run into the problems explained above.
> >
> > For bare metal may be some other admin device like DPU can do that role.
> So [1] is not ready
> >
> >>>> Why this series can not support nested?
> >>> I don’t see all the aspects that I covered in series [1] ranging
> >>> from flr, device
> >> context migration, virtio level reset, dirty page tracking, p2p support, etc.
> >> covered in some device, vq suspend resume piece.
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> >> We have discussed many other issues in this thread.
> >>>>>> And it does not serve bare-metal live migration either.
> >>>>> A bare-metal migration seems a distance theory as one need side
> >>>>> cpu,
> >>>> memory accessor apart from device accessor.
> >>>>> But somehow if that exists, there will be similar admin device to
> >>>>> migrate it
> >>>> may be TDDISP will own this whole piece one day.
> >>>> Bare metal live migration require other components like firmware OS
> >>>> and partitioning, that's why the device live migration should not
> >>>> be a
> >> blocker.
> >>> Device migration is not blocker.
> >>> In-fact it facilitates for this future in case if that happens where
> >>> side cpu like
> >> DPU or similar sideband virtio admin device can migrate over its admin vq.
> >>> Long ago when admin commands were discussed, this was discussed too
> >> where a admin device may not be an owner device.
> >> The admin vq can not migrate it self therefore baremetal can not be
> >> migrated by admin vq
> > May be I was not clear. The admin commands are executed by some other
> device than the PF.
>  From SW perspective, it should be the admin vq and the device it resides.
> > In above I call it admin device, which can be a DPU may be some other
> dedicated admin device or something else.
> > Large part of non virtio infrastructure at platform, BIOS, cpu, memory level
> needs to evolve before virtio can utilize it.
> virito device should be self-contained. Not depend on other components.
> >
> > We don’t need to cook all now, as long as we have administration commands
> its good.
> > The real credit owner for detaching the administration command from
> > the admin vq is Michael. :) We like to utilize this in future for DPU case where
> admin device is not the PCI PF.
> > Eswitch, PF migration etc may utilize it in future when needed.
> Again, the design should not rely on other host components.
It does not. It relies on the administration commands.

> 
> And it is not about the credit, this is reliable work outcome
I didn’t follow the comment.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  9:32                               ` Zhu, Lingshan
@ 2023-09-11 11:50                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-11 11:50 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 11, 2023 3:03 PM

> By the way, do you see anything we need to improve in this series?

The admin commands for passthrough devices in [1] are a comprehensive proposal covering all the aspects.

To me, [1] is a superset work that covers all the needed functionality and downtime aspects.

I plan to improve [1] with a v1 this week by extending the device context and addressing other review comments.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 10:15             ` Michael S. Tsirkin
@ 2023-09-12  3:35               ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-12  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Mon, Sep 11, 2023 at 6:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 11, 2023 at 02:30:31PM +0800, Jason Wang wrote:
> > Customers don't want to have admin stuff, SR-IOV or PASID in the guest
> > in order to migrate a single virtio device in the nest.
>
> Built an alternative facility to implement admin commands then.

I wonder if it could be built in an efficient way. For example, the
length of admin commands is not fixed, and we don't want to grow the MMIO
area to fit an admin command; this would result in something like
VIRTIO_PCI_CAP_PCI_CFG, which is sub-optimal (many more register
accesses than simply introducing new fields in the common cfg).
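For illustration, the "new fields in common cfg" alternative could look roughly like this. Only queue_select mirrors the actual PCI common configuration structure; the two state fields and their names are hypothetical, shown to make the register-access argument concrete.

```c
#include <stdint.h>
#include <stddef.h>

/* Rough sketch: the selected queue's state is exposed as two
 * little-endian 16-bit fields appended to the common configuration
 * structure, so saving a queue's indices costs a couple of direct
 * MMIO reads instead of windowed VIRTIO_PCI_CAP_PCI_CFG accesses.
 * The *_state field names are hypothetical. */
struct common_cfg_queue_state {
    uint16_t queue_select;      /* selects which virtqueue is addressed */
    uint16_t queue_avail_state; /* last_avail_idx of the selected queue */
    uint16_t queue_used_state;  /* last_used_idx of the selected queue */
};
```

With a layout like this, the driver selects a queue and reads two adjacent registers, versus programming a capability window for each access.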

> The advantage of admin commands is they are nicely contained.

If it wants to be contained, it needs to duplicate the functionality of
the existing facilities like the common cfg and others (one example is
setting up the virtqueue after migration). Otherwise, during live migration
we will use both admin commands and the existing configuration structure,
which will end up with more issues.

As stated before, the best way is to decouple the basic facilities
(states like index, inflight, dirty page) from a specific
interface/transport and keep the flexibility at the transport layer.
Transport layer can choose to stick to the existing interfaces or
implement the admin commands. So we can have two ways in parallel:

1) live migration via the existing transport specific facilities, this
allows us to reuse the existing interfaces with minimal extensions or
take the advantages of the transport specific facilities like PASID
2) live migration via admin commands, but this needs to invent commands
to access the existing facilities, which is just a new transport interface
that Lingshan is working on (transport over admin commands)

Instead of focusing on a solution that only works for a specific setup
on a specific transport.
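The decoupling argued for here could be sketched as a transport-independent state definition plus per-transport accessors behind a small ops table. All names are illustrative, not from the spec; the toy backing array stands in for one transport's device registers.

```c
#include <stdint.h>

/* The migratable state is defined once, independent of any transport. */
struct vq_state {
    uint16_t last_avail_idx; /* next available-ring entry to process */
    uint16_t last_used_idx;  /* next used-ring entry to write */
};

/* Each transport (registers, admin commands, ...) supplies an accessor. */
struct vq_state_ops {
    void (*get)(uint16_t vq_index, struct vq_state *out);
};

/* Toy backing store standing in for one transport's device registers. */
static struct vq_state toy_regs[2] = {
    { 10, 8 },
    {  3, 3 },
};

static void toy_get(uint16_t vq_index, struct vq_state *out)
{
    *out = toy_regs[vq_index];
}

static const struct vq_state_ops toy_transport = { toy_get };

/* Migration code sees only the ops table, never the transport details. */
void save_vq_state(const struct vq_state_ops *ops, uint16_t vq_index,
                   struct vq_state *out)
{
    ops->get(vq_index, out);
}
```

Under this split, either an MMIO-register transport or an admin-command transport can back the same `vq_state_ops` without changing the migration logic.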

Thanks

> This proposal is way too intrusive.
>
> --
> MST
>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  3:35               ` Jason Wang
@ 2023-09-12  3:43                 ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  3:43 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Parav Pandit, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/12/2023 11:35 AM, Jason Wang wrote:
> On Mon, Sep 11, 2023 at 6:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Mon, Sep 11, 2023 at 02:30:31PM +0800, Jason Wang wrote:
>>> Customers don't want to have admin stuff, SR-IOV or PASID in the guest
>>> in order to migrate a single virtio device in the nest.
>> Built an alternative facility to implement admin commands then.
> I wonder if it could be built in an efficient way. For example the
> length of admin commands is not fixed and we don't want to grow MMIO
> areas as the admin command, this will result in something like
> VIRTIO_PCI_CAP_PCI_CFG which is sub-optimal (much more registers
> accesses than simply introducing new fields in common cfg).
>
>> The advantage of admin commands is they are nicely contained.
> If it wants to be contained, it needs to duplicate the functionality of
> the existing facilities like common cfg and others (one example is to
> setup the virtqueue after migration). Otherwise during live migration,
> we will use both admin commands and existing configuration structure
> which will end up with more issues.
>
> As stated before, the best way is to decouple the basic facilities
> (states like index, inflight, dirty page) from a specific
> interface/transport and keep the flexibility at the transport layer.
> Transport layer can choose to stick to the existing interfaces or
> implement the admin commands. So we can have two ways in parallel:
>
> 1) live migration via the existing transport specific facilities, this
> allows us to reuse the existing interfaces with minimal extensions or
> take the advantages of the transport specific facilities like PASID
> 2) live migration via admin commands, but it needs to invent commands
> to access existing facilities which is just a new transport interface
> that Lingshan is working on (transport over admin commands)
>
> Instead of focusing on a solution that only works for a specific setup
> on a specific transport.
I totally agree
>
> Thanks
>
>> This proposal is way too intrusive.
>>
>> --
>> MST
>>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 11:50                                 ` Parav Pandit
@ 2023-09-12  3:43                                   ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-12  3:43 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Mon, Sep 11, 2023 at 7:50 PM Parav Pandit <parav@nvidia.com> wrote:
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Monday, September 11, 2023 3:03 PM
>
> > By the way, do you see anything we need to improve in this series?
>
> Admin commands for passthrough devices of [1] is comprehensive proposal covering all the aspects.

What do you mean by "all the aspects"?

Of course it can't handle nesting well: passthrough doesn't work when
your hardware has N levels of abstraction but nesting has M levels. Trap
and emulation is a must.

And exposing the whole device to the guest drivers will have security
implications, your proposal has demonstrated that you need a
workaround for FLR at least.

For non-standard devices we have no choice other than passthrough,
but for standard devices we do have other choices.

Thanks

>
> To me [1] is superset work that covers all needed functionality and downtime aspects.
>
> I plan to improve [1] with v1 this week by extending device context and addressing other review comments.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 11:50                                 ` Parav Pandit
@ 2023-09-12  3:48                                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  3:48 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 7:50 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 3:03 PM
>> By the way, do you see anything we need to improve in this series?
> Admin commands for passthrough devices of [1] is comprehensive proposal covering all the aspects.
>
> To me [1] is superset work that covers all needed functionality and downtime aspects.
>
> I plan to improve [1] with v1 this week by extending device context and addressing other review comments.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I am not sure; we have discussed a lot of potential issues in these
threads. I guess we should resolve them first, e.g., the nested use cases.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11 10:21                                 ` Parav Pandit
@ 2023-09-12  4:06                                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  4:06 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/11/2023 6:21 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 11, 2023 3:03 PM
>> So implement an AQ on the "admin" VF? This requires the HW to reserve
>> dedicated resources for every VF?
>> So expensive; overkill?
>>
>> And a VF may be managed by the PF and its admin "vf"?
> Yes.
It's a bit chaotic: as you can see, if the nested (L2 guest) VF can be managed
by both the L1 guest VF and the host PF, that means two owners of the L2 VF.
>
>>> If UDP packets are dropped, even applications that do not retry can fail.
>> UDP is not reliable, and performance overhead does not mean fail.
> It largely depends on application.
> I have seen iperf UDP failing on packet drop and never recovered.
> A retransmission over UDP can fail.
That depends on the workload: if it chooses UDP, it is aware of the
possibility
of losing packets. But anyway, LM is expected to complete
successfully within the due time.
>
>>>> But too few AQ to serve too high volume of VMs may be a problem.
>>> It is left for the device to implement the needed scale requirement.
>> Yes, so how much HW resource should the HW implementation reserve to
>> serve the worst case? Half of the board's resources?
> The board designer can decide how to manage the resource.
> Administration commands are explicit instructions to the device.
> It knows how many members device's dirty tracking is ongoing, which device context is being read/written.
Still, does the board designer need to prepare for the worst case? How 
to meet that challenge?
>
> Admin command can even fail with EAGAIN error code when device is out of resource and software can retry the command.
As demonstrated, this series is as reliable as the config space
functionality, so maybe fewer possibilities of failure?
>
> The key part is that all of this happens outside of the VM's downtime.
> Majority of the work in proposal [1] is done when the VM is _live_.
> Hence, the resource consumption or reservation is significantly less.
It still depends on the volume of VMs and devices; the orchestration layer
needs to migrate the last round of dirty pages and states even after the VM
has been suspended.
>
>
>>>> Naming a number or an algorithm for the ratio of devices / num_of_AQs
>>>> is beyond this topic, but I made my point clear.
>>> Sure. It is beyond.
>>> And it is not a concern either.
>> It is; users expect the LM process to succeed rather than fail.
> I still fail to understand why LM process fails.
> The migration process is slow, but downtime is not in [1].
If I recall correctly, the downtime budget is around 300ms, so
don't let the bandwidth or the number of admin vqs become
a bottleneck, which may introduce more possibilities of failure.
>
>>>> can depth = 1K introduce significant latency?
>>> AQ command execution is not done serially. There is enough text on the AQ
>> chapter as I recall.
>> Then it requires more HW resource; I don't see the difference.
> Difference compared to what, multiple AQs?
> If so, sure.
> The device who prefers to do only one AQ command at a time, sure it can work with less resource and do one at a time.
I think we are discussing the same "resource for the worst case" issue
as above.
>
>>>> And 1K depths is
>>>> almost identical to 2 X 500 queue depths, so still the same problem,
>>>> how many resource does the HW need to reserve to serve the worst case?
>>>>
>>> You didn’t describe the problem.
>>> Virtqueue is generic infrastructure to execute commands, be it admin
>> command, control command, flow filter command, scsi command.
>>> How many to execute in parallel, how many queues to have are device
>> implementation specific.
>> So the question is how many to serve the worst case? Does the HW vendor need
>> to reserve half of the board resource?
> No. It does not need to.
same as above
>
>>>> Let's forget the numbers, the point is clear.
>>> Ok. I agree with you.
>>> Number of AQs and its depth matter for this discussion, and its performance
>> characterization is outside the spec.
>>> Design wise, key thing to have the queuing interface between driver and
>> device for device migration commands.
>>> This enables both entities to execute things in parallel.
>>>
>>> This is fully covered in [1].
>>> So let's improve [1].
>>>
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> I am not sure, why [1] is a must? There are certain issues discussed in this
>> thread for [1] stay unsolved.
>>
>> By the way, do you see anything we need to improve in this series?
> In [1], the device context needs to be richer as we progress through v1/v2 versions.
>
> [..]
>
>>> A nested guest VM is not aware and should not.
>>> The VM hosting the nested VM, is aware on how to execute administrative
>> commands using the owner device.
>> The VM does not talk to admin vq either, the admin vq is a host facility, host
>> owns it.
> Admin VQ is owned by the device whichever has it.
> As I explained before, it is on the owner device.
> If needed one can do on more than owner device.
> A VM_A which is hosting another VM_B, a VM_A can have peer VF with AQ to be the admin device or migration manager device.
so two or more owners own the same device; isn't that a conflict?
>
>>> At present for PCI transport, owner device is PF.
>>>
>>> In future for nesting, may be another peer VF can be delegated such task and
>> it can perform administration command.
>> Then it may run into the problems explained above.
>>> For bare metal may be some other admin device like DPU can do that role.
>> So [1] is not ready
>>>>>> Why this series can not support nested?
>>>>> I don’t see all the aspects that I covered in series [1] ranging
>>>>> from flr, device
>>>> context migration, virtio level reset, dirty page tracking, p2p support, etc.
>>>> covered in some device, vq suspend resume piece.
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061
>>>>> .h
>>>>> tml
>>>> We have discussed many other issues in this thread.
>>>>>>>> And it does not serve bare-metal live migration either.
>>>>>>> A bare-metal migration seems a distance theory as one need side
>>>>>>> cpu,
>>>>>> memory accessor apart from device accessor.
>>>>>>> But somehow if that exists, there will be similar admin device to
>>>>>>> migrate it
>>>>>> may be TDDISP will own this whole piece one day.
>>>>>> Bare metal live migration require other components like firmware OS
>>>>>> and partitioning, that's why the device live migration should not
>>>>>> be a
>>>> blocker.
>>>>> Device migration is not blocker.
>>>>> In-fact it facilitates for this future in case if that happens where
>>>>> side cpu like
>>>> DPU or similar sideband virtio admin device can migrate over its admin vq.
>>>>> Long ago when admin commands were discussed, this was discussed too
>>>> where a admin device may not be an owner device.
>>>> The admin vq cannot migrate itself, therefore bare metal cannot be
>>>> migrated by the admin vq
>>> May be I was not clear. The admin commands are executed by some other
>> device than the PF.
>>   From SW perspective, it should be the admin vq and the device it resides.
>>> In above I call it admin device, which can be a DPU may be some other
>> dedicated admin device or something else.
>>> Large part of non virtio infrastructure at platform, BIOS, cpu, memory level
>> needs to evolve before virtio can utilize it.
>> A virtio device should be self-contained, not depend on other components.
>>> We don’t need to cook all now, as long as we have administration commands
>> its good.
>>> The real credit owner for detaching the administration command from
>>> the admin vq is Michael. :) We like to utilize this in future for DPU case where
>> admin device is not the PCI PF.
>>> Eswitch, PF migration etc may utilize it in future when needed.
>> Again, the design should not rely on other host components.
> It does not. It relies on the administration commands.
I remember you mentioned using DPU infrastructure to
perform bare-metal live migration?
>
>> And it is not about credit; this is about a reliable work outcome
> I didn’t follow the comment.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  8:12                         ` Parav Pandit
@ 2023-09-12  4:10                           ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-12  4:10 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

>
> > Why this series can not support nested?
> I don’t see all the aspects that I covered in series [1] ranging from flr, device context migration, virtio level reset, dirty page tracking, p2p support, etc. covered in some device, vq suspend resume piece.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

The series works for stateless devices. Before we introduce device
states in the spec, we can't migrate stateful devices. So the device
context doesn't make much sense right now.

Dirty page tracking in virtio is not a must for live migration to
work. It can be done via platform facilities or even software. And to
make it more efficient, it needs to utilize transport facilities
instead of a general one.

The FLR and P2P cases demonstrate the fragility of a simple passthrough
method, how it conflicts with live migration, and how it complicates the
device implementation. It also means you need to audit all PCI features
and work around any possible issues (or use a whitelist).
This is tricky, and we are migrating virtio, not virtio-pci. If we don't
use simple passthrough, we don't need to care about any of this.

Since this series focuses on the minimal set of functionality needed
for migration, it is virtio-specific and self-contained, so nothing
special is required for it to work in a nested setup.

Thanks

>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-11  6:47             ` Parav Pandit
@ 2023-09-12  4:18               ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-12  4:18 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Monday, September 11, 2023 12:01 PM
> >
> > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > Hi Michael,
> > >
> > > > From: virtio-comment@lists.oasis-open.org
> > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > > Sent: Monday, September 11, 2023 8:31 AM
> > > >
> > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin <mst@redhat.com>
> > wrote:
> > > > >
> > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > This patch adds two new le16 fields to common configuration
> > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > >
> > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > >
> > > > >
> > > > > I do not see why this would be pci specific at all.
> > > >
> > > > This is the PCI interface for live migration. The facility is not specific to PCI.
> > > >
> > > > It can choose to reuse the common configuration or not, but the
> > > > semantic is general enough to be used by other transports. We can
> > > > introduce one for MMIO for sure.
> > > >
> > > > >
> > > > > But besides I thought work on live migration will use admin queue.
> > > > > This was explicitly one of the motivators.
> > > >
> > > Please find the proposal that uses administration commands for device
> > migration at [1] for passthrough devices.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > > tml
> >
> > This proposal couples live migration with several requirements, and suffers from
> > the exact issues I've mentioned below.
> >
> It does not.
> Can you please list which one?
>
> > In some cases, it's even worse (coupling with PCI/SR-IOV, second state machine
> > other than the device status).
> >
> There is no state machine in [1].

Isn't the set of migration modes "active, stop, freeze" a state machine?

> It is not coupled with PCI/SR-IOV either.
> It supports PCI/SR-IOV transport and in future other transports too when they evolve.
>

For example:

+struct virtio_dev_ctx_pci_vq_cfg {
+        le16 vq_index;
+        le16 queue_size;
+        le16 queue_msix_vector;
+        le64 queue_desc;
+        le64 queue_driver;
+        le64 queue_device;
+};
+\end{lstlisting}

And does this mean we will have commands for MMIO and other transports?
(Most of the fields except the msix one are general enough.) And it's just
a partial implementation of the queue-related functionality of the
common cfg, so I wonder how it can work.
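One way to make the objection concrete: the quoted context struct mixes generic virtqueue fields with a PCI-only one (queue_msix_vector). A hypothetical split into a transport-independent part plus per-transport extensions might look like the sketch below; all names here are illustrative, not from any proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic virtqueue configuration, usable by any transport. */
struct vq_cfg_common {
    uint16_t vq_index;
    uint16_t queue_size;
    uint64_t queue_desc;
    uint64_t queue_driver;
    uint64_t queue_device;
};

/* PCI-specific additions layered on top of the common part;
 * MSI-X is a PCI concept and does not belong in the generic struct. */
struct vq_cfg_pci {
    struct vq_cfg_common common;
    uint16_t queue_msix_vector;
};

/* An MMIO variant would add its own fields (or none at all). */
struct vq_cfg_mmio {
    struct vq_cfg_common common;
};
```

With this layout, transport-independent migration code operates on `vq_cfg_common` at offset 0 of every variant, and each transport only defines its delta.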

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  3:43                                   ` Jason Wang
@ 2023-09-12  5:50                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  5:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 12, 2023 9:14 AM
> 
> On Mon, Sep 11, 2023 at 7:50 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Monday, September 11, 2023 3:03 PM
> >
> > > By the way, do you see anything we need to improve in this series?
> >
> > The admin commands for passthrough devices in [1] are a comprehensive
> > proposal covering all the aspects.
> 
> What do you mean by "all the aspects"?
They are covered in the proposal cover letter:
state migration, P2P, dirty page tracking, lower downtime, FLR, device reset, and no extra mediation requirement.

> 
> Of course it can't handle nesting well, passthrough doesn't work when your
> hardware has N levels abstractions but nesting is M levels. Trap and emulation
> is a must.
One can delegate the work to another VF for the purpose of nesting.

One can build infinite levels of nesting instead of doing passthrough, but in the end user applications remain slow.
So for such cases where N and M are > 1, one can use software-based emulation anyway.

> 
> And exposing the whole device to the guest drivers will have security
> implications, your proposal has demonstrated that you need a workaround for
There are no security implications in passthrough.

> FLR at least.
It is actually the opposite.
FLR is supported with the proposal without any workarounds and mediation.

> 
> For non standard device we don't have choices other than passthrough, but for
> standard devices we have other choices.

Passthrough is the basic requirement that we will be fulfilling.
If one wants to do special nesting, maybe it belongs there.
If both sets of commands can converge, that's good; if not, they are orthogonal requirements.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  3:48                                   ` Zhu, Lingshan
@ 2023-09-12  5:51                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  5:51 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 9:19 AM
> 
> On 9/11/2023 7:50 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
> >> anything we need to improve in this series?
> > Admin commands for passthrough devices of [1] is comprehensive proposal
> covering all the aspects.
> >
> > To me [1] is superset work that covers all needed functionality and downtime
> aspects.
> >
> > I plan to improve [1] with v1 this week by extending device context and
> addressing other review comments.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> I am not sure; we have discussed a lot of potential issues in the threads. I
> guess we should resolve them first, e.g., the nested use cases.
You are using the nesting use case as the _only_ use case and attempting to steer the discussion with it.
That is not right.

If you want to discuss, then let's lay out both use cases and attempt to converge; if we can, that's really good.
If we cannot, the two sets of requirements should be handled separately.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  4:06                                   ` Zhu, Lingshan
@ 2023-09-12  5:58                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  5:58 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 9:37 AM
> 
> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
> >> "admin" VF? This require the HW reserve dedicated resource for every
> >> VF?
> >> So expensive, Overkill?
> >>
> >> And a VF may be managed by the PF and its admin "vf"?
> > Yes.
> it's a bit chaotic: as you can see, if the nested (L2 guest) VF can be managed by
> both the L1 guest VF and the host PF, that means two owners of the L2 VF.
This is nesting.
When you do M-level nesting, does any CPU in the world handle its own page tables in isolation from the next level and still perform equally well?

> >
> >>> If UDP packets are dropped, even applications that do not retry can fail.
> >> UDP is not reliable, and performance overhead does not mean fail.
> > It largely depends on application.
> > I have seen iperf UDP failing on packet drop and never recovered.
> > A retransmission over UDP can fail.
> That depends on the workload; if it chooses UDP, it is aware of the possibility of
> losing packets. But anyway, LM is expected to complete successfully within the due
> time.
And LM also depends on the workload. :)
It is pointless to discuss performance characteristics as an argument for or against using an AQ.

> >
> >>>> But too few AQ to serve too high volume of VMs may be a problem.
> >>> It is left for the device to implement the needed scale requirement.
> >> Yes, so how many HW resource should the HW implementation reserved to
> >> serve the worst case? Half of the board resource?
> > The board designer can decide how to manage the resource.
> > Administration commands are explicit instructions to the device.
> > It knows how many members device's dirty tracking is ongoing, which device
> context is being read/written.
> Still, does the board designer need to prepare for the worst case? How to meet
> that challenge?
No, the board designer does not need to.
As explained already, if a board wants to support only a single outstanding AQ command, that's fine.

> >
> > Admin command can even fail with EAGAIN error code when device is out of
> resource and software can retry the command.
> As demonstrated, this series is as reliable as the config space functionality, so
> maybe there are fewer possibilities of failure?
Huh. Config space has a far higher failure rate for the PCI transport due to the inherent nature of PCI timeouts, reads, and polling.
For any bulk data transfer, a virtqueue is the spec-defined approach.
This was debated for more than a year; you can check some of the 2021 emails.

You can see from the patches that data transfer done over registers, as in [1], is snail-slow.
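The EAGAIN behaviour mentioned above amounts to a bounded retry loop in the driver. A minimal sketch, with a made-up `submit_admin_cmd()` standing in for the real admin-queue submission path (the simulated busy-count is purely for illustration):

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for submitting one admin command: returns 0 on success,
 * -EAGAIN while the device is temporarily out of resources.
 * 'budget' simulates how many times the device reports being busy. */
static int submit_admin_cmd(int *budget)
{
    if (*budget > 0) {
        (*budget)--;
        return -EAGAIN;
    }
    return 0;
}

/* Retry on -EAGAIN up to max_retries times, then give up. */
static int submit_with_retry(int *budget, int max_retries)
{
    int rc;

    do {
        rc = submit_admin_cmd(budget);
    } while (rc == -EAGAIN && max_retries-- > 0);
    return rc;
}
```

A real driver would typically add a backoff delay between retries instead of spinning.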

> >
> > The key part is that all of this happens outside of the VM's downtime.
> > The majority of the work in proposal [1] is done when the VM is _live_.
> > Hence, the resource consumption or reservation is significantly lower.
> It still depends on the volume of VMs and devices; the orchestration layer needs
> to migrate the last round of dirty pages and state even after the VM has been
> suspended.
That has nothing to do with the admin virtqueue.
And the migration layer already does this; it is used by multiple devices.

> >
> >
> >>>> Naming a number or an algorithm for the ratio of devices /
> >>>> num_of_AQs is beyond this topic, but I made my point clear.
> >>> Sure. It is beyond.
> >>> And it is not a concern either.
> >> It is; the user expects the LM process to succeed rather than fail.
> > I still fail to understand why the LM process would fail.
> > The migration process is slow, but the downtime in [1] is not.
> If I recall correctly, the downtime is around 300ms, so don't let the bandwidth or
> the number of admin vqs become a bottleneck, which may introduce more possibilities
> of failure.
> >
> >>>> can depth = 1K introduce significant latency?
> >>> AQ command execution is not done serially. There is enough text on
> >>> the AQ
> >> chapter as I recall.
> >> Then it requires more HW resources; I don't see the difference.
> > Difference compared to what, multiple AQs?
> > If so, sure.
> > A device that prefers to execute only one AQ command at a time can
> > work with fewer resources and do one at a time.
> I think we are discussing the same issue as above "resource for the worst case"
> problem
Frankly, I am not seeing any issue.
The AQ is just another virtqueue, a basic construct in the spec used by 30+ device types.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  4:10                           ` Jason Wang
@ 2023-09-12  6:05                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  6:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 12, 2023 9:40 AM
> >
> > > Why can this series not support nesting?
> > I don't see all the aspects that I covered in series [1], ranging from FLR, device
> > context migration, virtio-level reset, dirty page tracking, and P2P support,
> > covered in the device/vq suspend-resume piece.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> 
> The series works for stateless devices. Before we introduce device states in the
> spec, we can't migrate stateful devices. So the device context doesn't make
> much sense right now.
The series works for stateful devices too. The device context covers it.

> 
> Dirty page tracking in virtio is not a must for live migration to work. It can be
> done via platform facilities or even software. And to make it more efficient, it
> needs to utilize transport facilities instead of a general one.
> 
It is also optional in the spec proposal.
Most of the platforms claimed are not able to do it efficiently either; hence the vfio subsystem added support for it.
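A software dirty-page tracker of the kind alluded to here is essentially a bitmap keyed by page frame number: the write path sets a bit, and each migration pass fetches and clears the log. A minimal sketch with a fixed tracking window (sizes and names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT    12
#define TRACKED_PAGES 1024           /* pages in the tracked window */

struct dirty_log {
    uint64_t bits[TRACKED_PAGES / 64];
};

/* Mark the page containing 'addr' dirty (device/hypervisor write path). */
static void dirty_log_mark(struct dirty_log *log, uint64_t addr)
{
    uint64_t pfn = addr >> PAGE_SHIFT;

    if (pfn < TRACKED_PAGES)
        log->bits[pfn / 64] |= 1ULL << (pfn % 64);
}

/* Fetch and clear one 64-page word, as a migration pass would do.
 * A real implementation would need this to be atomic w.r.t. markers. */
static uint64_t dirty_log_collect(struct dirty_log *log, unsigned word)
{
    uint64_t v = log->bits[word];

    log->bits[word] = 0;
    return v;
}
```

Hardware-assisted trackers (as in vfio) expose the same mark/collect model, just with the marking done by the device or IOMMU rather than software.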

> The FLR, P2P demonstrates the fragility of a simple passthrough method and
> how it conflicts with live migration and complicates the device implementation.
Huh, it shows the opposite.
It shows that both will seamlessly work.

> And it means you need to audit all PCI features and do workaround if there're
> any possible issues (or using a whitelist).
No need for any of this.

> This is tricky and we are migrating virtio not virtio-pci. If we don't use simple
> passthrough we don't need to care about this.
> 
Exactly, we are migrating a virtio device for the PCI transport.
As usual, if you have to keep arguing against doing passthrough, we are surely past that point.
Virtio does not need to stay under the weird umbrella of always mediating, etc.

Series [1] will be enhanced further to support virtio passthrough devices for device context and more.
We would like to extend the support even further.

> Since the functionality proposed in this series focus on the minimal set of the
> functionality for migration, it is virtio specific and self contained so nothing
> special is required to work in the nest.

Maybe it is.

Again, I repeat that I would like to converge the admin commands between the passthrough and non-passthrough cases.
If we can converge, that's good.
If not, both modes can expand.
It is not either-or, as the use cases are different.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  4:18               ` Jason Wang
@ 2023-09-12  6:11                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  6:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 12, 2023 9:48 AM
> 
> On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Monday, September 11, 2023 12:01 PM
> > >
> > > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > > From: virtio-comment@lists.oasis-open.org
> > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > > > Sent: Monday, September 11, 2023 8:31 AM
> > > > >
> > > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
> > > > > <mst@redhat.com>
> > > wrote:
> > > > > >
> > > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > > This patch adds two new le16 fields to common configuration
> > > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > > >
> > > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > >
> > > > > >
> > > > > > I do not see why this would be pci specific at all.
> > > > >
> > > > > This is the PCI interface for live migration. The facility is not specific to
> PCI.
> > > > >
> > > > > It can choose to reuse the common configuration or not, but the
> > > > > semantic is general enough to be used by other transports. We
> > > > > can introduce one for MMIO for sure.
> > > > >
> > > > > >
> > > > > > But besides I thought work on live migration will use admin queue.
> > > > > > This was explicitly one of the motivators.
> > > > >
> > > > Please find the proposal that uses administration commands for
> > > > device
> > > migration at [1] for passthrough devices.
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> > > > 61.h
> > > > tml
> > >
> > > This proposal couples live migration with several requirements, and
> > > suffers from the exact issues I've mentioned below.
> > >
> > It does not.
> > Can you please list which one?
> >
> > > In some cases, it's even worse (coupling with PCI/SR-IOV, second
> > > state machine other than the device status).
> > >
> > There is no state machine in [1].
> 
> Isn't the migration modes of "active, stop, freeze" a state machine?
> 
Huh, no. Each mode stops/starts a specific thing.
Just because one series is missing this and only did suspend/resume, while the other series covered P2P mode with modes, does not make it a state machine.
If you call suspend/resume states, it is still only a two-state state machine. :)

> > It is not coupled with PCI/SR-IOV either.
> > It supports PCI/SR-IOV transport and in future other transports too when they
> evolve.
> >
> 
> For example:
> 
> +struct virtio_dev_ctx_pci_vq_cfg {
> +        le16 vq_index;
> +        le16 queue_size;
> +        le16 queue_msix_vector;
> +        le64 queue_desc;
> +        le64 queue_driver;
> +        le64 queue_device;
> +};
> +\end{lstlisting}
> 
> And does this mean we will have commands for MMIO and other transport?

There are multiple transports, so yes: field structures in the device context will have PCI-specific items.

> (Most of the fields except the msix are general enough). And it's just a partial
> implementation of the queue related functionality of the common cfg, so I
> wonder how it can work.
> 
As I already explained in the cover letter, the device context will evolve from v0 to v1 to cover more.
True, most fields are general; it has some PCI-specific fields, which were not worth splitting out into a separate structure.
And replicating a small number of structs for MMIO is not a problem either, as it does not complicate the transport.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  5:58                                     ` Parav Pandit
@ 2023-09-12  6:33                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  6:33 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 1:58 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 9:37 AM
>>
>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>> "admin" VF? This require the HW reserve dedicated resource for every
>>>> VF?
>>>> So expensive, Overkill?
>>>>
>>>> And a VF may be managed by the PF and its admin "vf"?
>>> Yes.
>> It's a bit chaotic: as you can see, if the nested (L2 guest) VF can be managed by
>> both the L1 guest VF and the host PF, that means two owners of the L2 VF.
> This is the nesting.
> When you do M-level nesting, does any CPU in the world handle its own page tables in isolation from the next level and still perform equally well?
Not exactly. In nesting, the L1 guest is the host/infrastructure emulator
for L2, so L2 is expected to do nothing with the host directly;
wouldn't something like an L2 VF managed by both the L1 VF and the host PF lead to
operational and security issues?
>
>>>>> If UDP packets are dropped, even an application that does not retry can fail.
>>>> UDP is not reliable, and performance overhead does not mean fail.
>>> It largely depends on application.
>>> I have seen iperf UDP failing on packet drop and never recovered.
>>> A retransmission over UDP can fail.
>> That depends on the workload, if it choose UDP, it is aware of the possibilities of
>> losing packets. But anyway, LM are expected to perform successfully in the due
>> time
> And LM also depends on the workload. :)
Exactly! That's the point: how to meet the requirements!
> It is pointless to discuss performance characteristics as a point to use AQ or not.
How do we meet the QoS requirements during live migration?
>
>>>>>> But too few AQ to serve too high volume of VMs may be a problem.
>>>>> It is left for the device to implement the needed scale requirement.
>>>> Yes, so how many HW resource should the HW implementation reserved to
>>>> serve the worst case? Half of the board resource?
>>> The board designer can decide how to manage the resource.
>>> Administration commands are explicit instructions to the device.
>>> It knows how many members device's dirty tracking is ongoing, which device
>> context is being read/written.
>> Still, does the board designer need to prepare for the worst case? How to meet
>> that challenge?
> No, the board designer does not need to.
> As explained already, if a board wants to support only a single AQ command at a time, sure.
Same as above, the QoS question. For example, how do we avoid the situation
where half of the VMs migrate successfully and the others time out?
>
>>> Admin command can even fail with EAGAIN error code when device is out of
>> resource and software can retry the command.
>> As demonstrated, this series is as reliable as the config space functionality, so
>> perhaps there are fewer possibilities of failure?
> Huh. Config space has a far higher failure rate for the PCI transport due to the inherent nature of PCI timeouts, reads, and polling.
> For any bulk data transfer virtqueue is spec defined approach.
> For more than a year this was debated you can check some 2021 emails.
>
> You can see the patches that data transfer done in [1] over registers is snail slow.
Do you often observe virtio PCI config space failures? And doesn't the admin vq
also transfer data through PCI?
>
>>> The key part is that all of this happens outside of the VM's downtime.
>>> Majority of the work in proposal [1] is done when the VM is _live_.
>>> Hence, the resource consumption or reservation is significantly less.
>> It still depends on the volume of VMs and devices; the orchestration layer needs
>> to migrate the last round of dirty pages and states even after the VM has been
>> suspended.
> That has nothing to do with the admin virtqueue.
> And migration layer already does it and used by multiple devices.
Same as above: QoS.
>
>>>
>>>>>> Naming a number or an algorithm for the ratio of devices /
>>>>>> num_of_AQs is beyond this topic, but I made my point clear.
>>>>> Sure. It is beyond.
>>>>> And it is not a concern either.
>>>> It is, the user expect the LM process success than fail.
>>> I still fail to understand why LM process fails.
>>> The migration process is slow, but downtime is not in [1].
>> If I recall correctly, the downtime is around 300 ms, so don't let the bandwidth or
>> the number of admin vqs become a bottleneck, which may introduce more possibilities
>> to fail.
>>>>>> can depth = 1K introduce significant latency?
>>>>> AQ command execution is not done serially. There is enough text on
>>>>> the AQ
>>>> chapter as I recall.
>>>> Then require more HW resource, I don't see difference.
>>> Difference compared to what, multiple AQs?
>>> If so, sure.
>>> The device who prefers to do only one AQ command at a time, sure it can
>> work with less resource and do one at a time.
>> I think we are discussing the same issue as above "resource for the worst case"
>> problem
> Frankly I am not seeing any issue.
> AQ is just another virtqueue as basic construct in the spec used by 30+ device types.
As explained above, when migrating a VM, the time consumed has to
converge and the total
downtime has a deadline; I remember it is less than 300 ms. That is the QoS
requirement.




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  5:51                                     ` Parav Pandit
@ 2023-09-12  6:37                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  6:37 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 1:51 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 9:19 AM
>>
>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>> anything we need to improve in this series?
>>> Admin commands for passthrough devices of [1] is comprehensive proposal
>> covering all the aspects.
>>> To me [1] is superset work that covers all needed functionality and downtime
>> aspects.
>>> I plan to improve [1] with v1 this week by extending device context and
>> addressing other review comments.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> I am not sure, we have discussed a lot about the potential issues in the threads. I
>> guess we should resolve them first. E.g., nested use cases.
> You are using nesting use case as the _only_ use case and attempt to steer using that.
> Not right.
>
> If you want to discuss, then lets have both the use cases, attempt to converge and if we can its really good.
> If we cannot, both requirements should be handled differently.
Isn't nesting a clear use case that should be supported?




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:11                 ` Parav Pandit
@ 2023-09-12  6:43                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  6:43 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 2:11 PM, Parav Pandit wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Tuesday, September 12, 2023 9:48 AM
>>
>> On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
>>>
>>>
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Monday, September 11, 2023 12:01 PM
>>>>
>>>> On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
>> wrote:
>>>>> Hi Michael,
>>>>>
>>>>>> From: virtio-comment@lists.oasis-open.org
>>>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
>>>>>> Sent: Monday, September 11, 2023 8:31 AM
>>>>>>
>>>>>> On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
>>>>>> <mst@redhat.com>
>>>> wrote:
>>>>>>> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>>>>>>>> This patch adds two new le16 fields to common configuration
>>>>>>>> structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
>>>>>>>>
>>>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>>>
>>>>>>> I do not see why this would be pci specific at all.
>>>>>> This is the PCI interface for live migration. The facility is not specific to
>> PCI.
>>>>>> It can choose to reuse the common configuration or not, but the
>>>>>> semantic is general enough to be used by other transports. We
>>>>>> can introduce one for MMIO for sure.
>>>>>>
>>>>>>> But besides I thought work on live migration will use admin queue.
>>>>>>> This was explicitly one of the motivators.
>>>>> Please find the proposal that uses administration commands for
>>>>> device
>>>> migration at [1] for passthrough devices.
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
>>>>> 61.h
>>>>> tml
>>>> This proposal couples live migration with several requirements, and
>>>> suffers from the exact issues I've mentioned below.
>>>>
>>> It does not.
>>> Can you please list which one?
>>>
>>>> In some cases, it's even worse (coupling with PCI/SR-IOV, second
>>>> state machine other than the device status).
>>>>
>>> There is no state machine in [1].
>> Isn't the migration modes of "active, stop, freeze" a state machine?
>>
> Huh, no. Each mode stops/starts specific thing.
> Just because one series is missing this and did only suspend/resume and other series covered P2P mode with modes, it does not make it state machine.
> If you call suspend resume as states, it is still 2 state state machines. :)
Why is P2P needed for live migration?
>
>>> It is not coupled with PCI/SR-IOV either.
>>> It supports PCI/SR-IOV transport and in future other transports too when they
>> evolve.
>> For example:
>>
>> +struct virtio_dev_ctx_pci_vq_cfg {
>> +        le16 vq_index;
>> +        le16 queue_size;
>> +        le16 queue_msix_vector;
>> +        le64 queue_desc;
>> +        le64 queue_driver;
>> +        le64 queue_device;
>> +};
>> +\end{lstlisting}
>>
>> And does this mean we will have commands for MMIO and other transport?
> There are transports so yes, field structures from the device context will have PCI specific items.
>
>> (Most of the fields except the msix are general enough). And it's just a partial
>> implementation of the queue related functionality of the common cfg, so I
>> wonder how it can work.
>>
> As I already explained in cover letter, device context will evolve in v0->v1 to cover more.
> True, most fields are general, and it has some pci specific fields, which were not worth taking out to a different structure.
> And replicating small number of structs for MMIO is not a problem either as it is not complicating the transport either.




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:33                                       ` Zhu, Lingshan
@ 2023-09-12  6:47                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  6:47 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:04 PM
> 
> 
> On 9/12/2023 1:58 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 9:37 AM
> >>
> >> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
> >>>> "admin" VF? This require the HW reserve dedicated resource for
> >>>> every VF?
> >>>> So expensive, Overkill?
> >>>>
> >>>> And a VF may be managed by the PF and its admin "vf"?
> >>> Yes.
> >> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
> >> managed by both L1 guest VF and the host PF, that means two owners of the
> L2 VF.
> > This is the nesting.
> > When you do M level nesting, does any cpu in world handle its own page
> tables in isolation of next level and also perform equally well?
> Not exactly, in nesting, L1 guest is the host/infrastructure emulator for L2, so L2
> is expect to do nothing with the host, or something like L2 VF managed by both
> L1 VF and host PF can lead to operational and security issues?
> >
> >>>>> If UDP packets are dropped, even application can fail who do no retry.
> >>>> UDP is not reliable, and performance overhead does not mean fail.
> >>> It largely depends on application.
> >>> I have seen iperf UDP failing on packet drop and never recovered.
> >>> A retransmission over UDP can fail.
> >> That depends on the workload, if it choose UDP, it is aware of the
> >> possibilities of losing packets. But anyway, LM are expected to
> >> perform successfully in the due time
> > And LM also depends on the workload. :)
> Exactly! That's the point, how to meet the requirements!
> > It is pointless to discuss performance characteristics as a point to use AQ or
> not.
> How to meet QOS requirement when LM?
By following [1], where a large part of the device context transfer and dirty page tracking is done while the VM is running.

> > No. board designer does not need to.
> > As explained already, if board wants to supporting single command of AQ,
> sure.
> Same as above, the QOS question. For example, how to avoid the situation that
> half VMs can be migrated and others timeout?
Why would this happen?
A timeout is not related to the AQ if it does happen.
Timeouts can happen on config registers too. And it would be even far harder for board designers to handle 384 parallel PCI reads within a timeout.

I am still not able to follow your point in asking these unrelated QoS questions.

> >
> >>> Admin command can even fail with EAGAIN error code when device is
> >>> out of
> >> resource and software can retry the command.
> >> As demonstrated, this series is reliable as the config space
> >> functionalities, so maybe less possibilities to fail?
> > Huh. Config space has far higher failure rate for the PCI transport when due to
> inherent nature of PCI timeouts and reads and polling.
> > For any bulk data transfer virtqueue is spec defined approach.
> > For more than a year this was debated you can check some 2021 emails.
> >
> > You can see the patches that data transfer done in [1] over registers is snail
> slow.
> Do you often observe virtio PCI config space fail? Or does admin vq need to
> transfer data through PCI?
Admin commands need to transfer bulk data for thousands of VFs in parallel without baking registers into PCI.

> >
> >>> They key part is all of these happens outside of the VM's downtime.
> >>> Majority of the work in proposal [1] is done when the VM is _live_.
> >>> Hence, the resource consumption or reservation is significantly less.
> >> Still depends on the volume of VMs and devices, the orchestration
> >> layer needs to migrate the last round of dirty pages and states even
> >> when the VM has been suspended.
> > That has nothing do with admin virtqueue.
> > And migration layer already does it and used by multiple devices.
> same as above, QOS
> >
> >>>
> >>>>>> Naming a number or an algorithm for the ratio of devices /
> >>>>>> num_of_AQs is beyond this topic, but I made my point clear.
> >>>>> Sure. It is beyond.
> >>>>> And it is not a concern either.
> >>>> It is, the user expect the LM process success than fail.
> >>> I still fail to understand why LM process fails.
> >>> The migration process is slow, but downtime is not in [1].
> >> If I recall it clear, the downtime is around 300ms, so don't let the
> >> bandwidth or num of admin vqs become a bottle neck which may
> >> introduce more possibilities to fail.
> >>>>>> can depth = 1K introduce significant latency?
> >>>>> AQ command execution is not done serially. There is enough text on
> >>>>> the AQ
> >>>> chapter as I recall.
> >>>> Then require more HW resource, I don't see difference.
> >>> Difference compared to what, multiple AQs?
> >>> If so, sure.
> >>> The device who prefers to do only one AQ command at a time, sure it
> >>> can
> >> work with less resource and do one at a time.
> >> I think we are discussing the same issue as above "resource for the worst
> case"
> >> problem
> > Frankly I am not seeing any issue.
> > AQ is just another virtqueue as basic construct in the spec used by 30+ device
> types.
> explained above, when migrate a VM, the time consuming has to convergence
> and the total downtime has a due, I remember it is less than 300ms. That is the
> QOS requirement.
And admin commands can easily serve that, as the majority of the work in proposal [1] is done while the VM is running and the member device is in an active state.
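The split argued for here — bulk state moved while the guest is live, only the final delta inside the stop window — can be sketched as a generic pre-copy loop. This is an illustration only; `read_dirty_bytes`, `stop_vm`, and the bandwidth/budget numbers are made-up placeholders, not spec-defined interfaces:

```python
def precopy_migrate(read_dirty_bytes, stop_vm,
                    downtime_budget_ms=300, link_bytes_per_ms=10_000_000):
    """Copy state in rounds while the VM runs; stop the VM only once the
    remaining dirty set fits inside the downtime budget."""
    budget = downtime_budget_ms * link_bytes_per_ms
    live_copied = 0
    dirty = read_dirty_bytes()            # bytes dirtied since last pass
    while dirty > budget:
        live_copied += dirty              # transferred while the VM is live
        dirty = read_dirty_bytes()        # pages re-dirtied during that pass
    stop_vm()                             # downtime starts here
    return live_copied, dirty             # dirty = final delta, within budget
```

With a converging dirty set, only the last (small) round lands inside the downtime window; everything before it is transferred while the VM is running.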


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:37                                       ` Zhu, Lingshan
@ 2023-09-12  6:49                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  6:49 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:08 PM
> 
> On 9/12/2023 1:51 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 9:19 AM
> >>
> >> On 9/11/2023 7:50 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
> >>>> anything we need to improve in this series?
> >>> Admin commands for passthrough devices of [1] is comprehensive
> >>> proposal
> >> covering all the aspects.
> >>> To me [1] is superset work that covers all needed functionality and
> >>> downtime
> >> aspects.
> >>> I plan to improve [1] with v1 this week by extending device context
> >>> and
> >> addressing other review comments.
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061
> >>> .h
> >>> tml
> >> I am not sure, we have discussed a lot about the potential issues in
> >> the treads. I guess we should resolve them first. E.g., nested use cases.
> > You are using nesting use case as the _only_ use case and attempt to steer
> using that.
> > Not right.
> >
> > If you want to discuss, then lets have both the use cases, attempt to converge
> and if we can its really good.
> > If we cannot, both requirements should be handled differently.
> Isn't nested a clear use case that should be supported?

Most users who care about running real applications with real performance have not asked for nesting.
It is not a mandatory case; it may be required for some users.
I don't know who needs M-level nesting, or how the CPU would also support its acceleration well enough to run a reasonable workload.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:43                   ` Zhu, Lingshan
@ 2023-09-12  6:52                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  6:52 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:13 PM

> Why need P2P for Live Migration?
A peer device may be accessing the virtio device. Hence, all the devices are first stopped, as in [1], while still allowing them to accept driver notifications from the peer device.
Once all the devices are stopped, each device is frozen so that it makes no further device context updates. At this point the final device context can be read by the owner driver.
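The two-phase ordering described above — stop every device first, freeze only afterwards — could look like the following orchestration sketch. The `stop`/`freeze`/`read_context` hooks are hypothetical stand-ins for the proposal's device-mode commands, not spec-defined APIs:

```python
class MigratableDevice:
    """Toy device model for the two-phase quiesce described above."""
    def __init__(self, name):
        self.name, self.state = name, "running"

    def stop(self):
        # A stopped device no longer initiates transfers, but may still
        # accept driver notifications arriving from peer (P2P) devices.
        self.state = "stopped"

    def freeze(self):
        # A frozen device makes no further device-context updates.
        assert self.state == "stopped", "freeze only after stop"
        self.state = "frozen"

    def read_context(self):
        assert self.state == "frozen", "context is stable only when frozen"
        return {"device": self.name, "ctx": "..."}

def quiesce_and_read(devices):
    for d in devices:      # phase 1: stop ALL devices first, so no peer
        d.stop()           # keeps driving a device that is already frozen
    for d in devices:      # phase 2: only then freeze each one
        d.freeze()
    return [d.read_context() for d in devices]
```

Freezing only after every peer is stopped is what keeps a still-running peer from mutating the context of an already-frozen device.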

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:47                                         ` Parav Pandit
@ 2023-09-12  7:27                                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:27 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 2:47 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:04 PM
>>
>>
>> On 9/12/2023 1:58 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 9:37 AM
>>>>
>>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>>>> "admin" VF? This require the HW reserve dedicated resource for
>>>>>> every VF?
>>>>>> So expensive, Overkill?
>>>>>>
>>>>>> And a VF may be managed by the PF and its admin "vf"?
>>>>> Yes.
>>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
>>>> managed by both L1 guest VF and the host PF, that means two owners of the
>> L2 VF.
>>> This is the nesting.
>>> When you do M level nesting, does any cpu in world handle its own page
>> tables in isolation of next level and also perform equally well?
>> Not exactly, in nesting, L1 guest is the host/infrastructure emulator for L2, so L2
>> is expect to do nothing with the host, or something like L2 VF managed by both
>> L1 VF and host PF can lead to operational and security issues?
>>>>>>> If UDP packets are dropped, even application can fail who do no retry.
>>>>>> UDP is not reliable, and performance overhead does not mean fail.
>>>>> It largely depends on application.
>>>>> I have seen iperf UDP failing on packet drop and never recovered.
>>>>> A retransmission over UDP can fail.
>>>> That depends on the workload, if it choose UDP, it is aware of the
>>>> possibilities of losing packets. But anyway, LM are expected to
>>>> perform successfully in the due time
>>> And LM also depends on the workload. :)
>> Exactly! That's the point, how to meet the requirements!
>>> It is pointless to discuss performance characteristics as a point to use AQ or
>> not.
>> How to meet QOS requirement when LM?
> By following [1] where large part of device context and dirty page tracking is done when the VM is running.
It still needs to migrate the last round of dirty pages and device states
when the VM freezes. That can still be large when a big number of VMs is
taken into consideration, and that is where the ~300ms downtime budget rules.
>
>>> No. board designer does not need to.
>>> As explained already, if board wants to supporting single command of AQ,
>> sure.
>> Same as above, the QOS question. For example, how to avoid the situation that
>> half VMs can be migrated and others timeout?
> Why would this happen?
> Timeout is not related to AQ in case if that happens.
explained above
> Timeout can happen to config registers too. And it can be even far more harder for board designers to support PCI reads in a timeout to handle in 384 reads in parallel.
When the VM freezes, the virtio functionality, for example virtio-net
transactions, is suspended as well, so there are no TLPs for networking
traffic buffers.

The on-device Live Migration facility can use the full PCI device
bandwidth for migration.

That is the difference from the admin vq.
>
> I am still not able to follow your point for asking about unrelated QOS questions.
Explained above: it has to meet the downtime requirement, and many VMs
can be migrated simultaneously; in that situation they have to race for
the admin vq resources/bandwidth.
>
>>>>> Admin command can even fail with EAGAIN error code when device is
>>>>> out of
>>>> resource and software can retry the command.
>>>> As demonstrated, this series is reliable as the config space
>>>> functionalities, so maybe less possibilities to fail?
>>> Huh. Config space has far higher failure rate for the PCI transport when due to
>> inherent nature of PCI timeouts and reads and polling.
>>> For any bulk data transfer virtqueue is spec defined approach.
>>> For more than a year this was debated you can check some 2021 emails.
>>>
>>> You can see the patches that data transfer done in [1] over registers is snail
>> slow.
>> Do you often observe virtio PCI config space fail? Or does admin vq need to
>> transfer data through PCI?
> Admin commands needs to transfer bulk data across thousands of VFs in parallel for many VFs without baking registers in PCI.
So you agree that the PCI config space is actually very unlikely to fail?
It is reliable.

Please allow me to provide an extreme example: is one single admin vq
limitless, so that it can serve the migration of hundreds to thousands of
VMs? If not, two or three, or what number?
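The "what number?" question reduces to arithmetic: final-delta bytes per VM, times concurrently migrating VMs, divided by per-queue throughput, must fit inside the downtime budget. A rough model — every number below is a made-up placeholder, and it idealizes perfect load sharing across queues:

```python
def queues_needed(vms, final_delta_bytes, downtime_ms,
                  aq_bytes_per_ms_per_queue):
    """Minimum admin queues so that every concurrently migrating VM's
    final state transfer fits inside the shared downtime budget."""
    total_bytes = vms * final_delta_bytes
    per_queue_capacity = downtime_ms * aq_bytes_per_ms_per_queue
    # Ceiling division: any remainder needs one more queue.
    return -(-total_bytes // per_queue_capacity)
```

Under these placeholder numbers, 1000 VMs with a 1 MB final delta fit in one queue, while a 10 MB delta needs several — which is the sizing question being debated.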
>
>>>>> They key part is all of these happens outside of the VM's downtime.
>>>>> Majority of the work in proposal [1] is done when the VM is _live_.
>>>>> Hence, the resource consumption or reservation is significantly less.
>>>> Still depends on the volume of VMs and devices, the orchestration
>>>> layer needs to migrate the last round of dirty pages and states even
>>>> when the VM has been suspended.
>>> That has nothing do with admin virtqueue.
>>> And migration layer already does it and used by multiple devices.
>> same as above, QOS
>>>>>>>> Naming a number or an algorithm for the ratio of devices /
>>>>>>>> num_of_AQs is beyond this topic, but I made my point clear.
>>>>>>> Sure. It is beyond.
>>>>>>> And it is not a concern either.
>>>>>> It is, the user expect the LM process success than fail.
>>>>> I still fail to understand why LM process fails.
>>>>> The migration process is slow, but downtime is not in [1].
>>>> If I recall it clear, the downtime is around 300ms, so don't let the
>>>> bandwidth or num of admin vqs become a bottle neck which may
>>>> introduce more possibilities to fail.
>>>>>>>> can depth = 1K introduce significant latency?
>>>>>>> AQ command execution is not done serially. There is enough text on
>>>>>>> the AQ
>>>>>> chapter as I recall.
>>>>>> Then require more HW resource, I don't see difference.
>>>>> Difference compared to what, multiple AQs?
>>>>> If so, sure.
>>>>> The device who prefers to do only one AQ command at a time, sure it
>>>>> can
>>>> work with less resource and do one at a time.
>>>> I think we are discussing the same issue as above "resource for the worst
>> case"
>>>> problem
>>> Frankly I am not seeing any issue.
>>> AQ is just another virtqueue as basic construct in the spec used by 30+ device
>> types.
>> explained above, when migrate a VM, the time consuming has to convergence
>> and the total downtime has a due, I remember it is less than 300ms. That is the
>> QOS requirement.
> And admin commands can easily serve that as majority of the work is done when the VM is running and member device is in active state in proposal [1].
Explained above: it depends on the number of migrating VMs.
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-12  7:27                                           ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:27 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 2:47 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:04 PM
>>
>>
>> On 9/12/2023 1:58 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 9:37 AM
>>>>
>>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>>>> "admin" VF? This require the HW reserve dedicated resource for
>>>>>> every VF?
>>>>>> So expensive, Overkill?
>>>>>>
>>>>>> And a VF may be managed by the PF and its admin "vf"?
>>>>> Yes.
>>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
>>>> managed by both L1 guest VF and the host PF, that means two owners of the
>> L2 VF.
>>> This is the nesting.
>>> When you do M level nesting, does any cpu in world handle its own page
>> tables in isolation of next level and also perform equally well?
>> Not exactly, in nesting, L1 guest is the host/infrastructure emulator for L2, so L2
>> is expect to do nothing with the host, or something like L2 VF managed by both
>> L1 VF and host PF can lead to operational and security issues?
>>>>>>> If UDP packets are dropped, even application can fail who do no retry.
>>>>>> UDP is not reliable, and performance overhead does not mean fail.
>>>>> It largely depends on application.
>>>>> I have seen iperf UDP failing on packet drop and never recovered.
>>>>> A retransmission over UDP can fail.
>>>> That depends on the workload, if it choose UDP, it is aware of the
>>>> possibilities of losing packets. But anyway, LM are expected to
>>>> perform successfully in the due time
>>> And LM also depends on the workload. :)
>> Exactly! That's the point, how to meet the requirements!
>>> It is pointless to discuss performance characteristics as a point to use AQ or
>> not.
>> How to meet QOS requirement when LM?
> By following [1] where large part of device context and dirty page tracking is done when the VM is running.
Still needs to migrate the last round of dirty pages and device states 
when VM freeze. Still can be large if
take big amount of VMs into consideration, and that is where ~300ms due 
time rules.
>
>>> No. board designer does not need to.
>>> As explained already, if board wants to supporting single command of AQ,
>> sure.
>> Same as above, the QOS question. For example, how to avoid the situation that
>> half VMs can be migrated and others timeout?
> Why would this happen?
> Timeout is not related to AQ in case if that happens.
explained above
> Timeout can happen to config registers too. And it can be even far more harder for board designers to support PCI reads in a timeout to handle in 384 reads in parallel.
When the VM freeze, the virtio functionalities, for example virito-net 
transaction is suspended as well,
so no TLPs for networking traffic buffers.

The on-device Live Migration facility can use the full PCI device 
bandwidth for migration.

That is the difference with the admin vq.
>
> I am still not able to follow your point for asking about unrelated QOS questions.
explained above, it has to meet the due time requirement and many VMs 
can be migrated simultaneously,
in that situation, they have to race for the admin vq resource/bandwidth.
>
>>>>> Admin command can even fail with EAGAIN error code when device is
>>>>> out of
>>>> resource and software can retry the command.
>>>> As demonstrated, this series is reliable as the config space
>>>> functionalities, so maybe less possibilities to fail?
>>> Huh. Config space has far higher failure rate for the PCI transport when due to
>> inherent nature of PCI timeouts and reads and polling.
>>> For any bulk data transfer virtqueue is spec defined approach.
>>> For more than a year this was debated you can check some 2021 emails.
>>>
>>> You can see the patches that data transfer done in [1] over registers is snail
>> slow.
>> Do you often observe virtio PCI config space fail? Or does admin vq need to
>> transfer data through PCI?
> Admin commands needs to transfer bulk data across thousands of VFs in parallel for many VFs without baking registers in PCI.
So you agree actually PCI config space are very unlikely to fail? It is 
reliable.

Please allow me to provide an extreme example, is one single admin vq 
limitless, that can
serve hundreds to thousands of VMs migration? If not, two or three or 
what number?
>
>>>>> They key part is all of these happens outside of the VM's downtime.
>>>>> Majority of the work in proposal [1] is done when the VM is _live_.
>>>>> Hence, the resource consumption or reservation is significantly less.
>>>> Still depends on the volume of VMs and devices, the orchestration
>>>> layer needs to migrate the last round of dirty pages and states even
>>>> when the VM has been suspended.
>>> That has nothing do with admin virtqueue.
>>> And migration layer already does it and used by multiple devices.
>> same as above, QOS
>>>>>>>> Naming a number or an algorithm for the ratio of devices /
>>>>>>>> num_of_AQs is beyond this topic, but I made my point clear.
>>>>>>> Sure. It is beyond.
>>>>>>> And it is not a concern either.
>>>>>> It is, the user expect the LM process success than fail.
>>>>> I still fail to understand why LM process fails.
>>>>> The migration process is slow, but downtime is not in [1].
>>>> If I recall it clear, the downtime is around 300ms, so don't let the
>>>> bandwidth or num of admin vqs become a bottle neck which may
>>>> introduce more possibilities to fail.
>>>>>>>> can depth = 1K introduce significant latency?
>>>>>>> AQ command execution is not done serially. There is enough text on
>>>>>>> the AQ
>>>>>> chapter as I recall.
>>>>>> Then require more HW resource, I don't see difference.
>>>>> Difference compared to what, multiple AQs?
>>>>> If so, sure.
>>>>> The device who prefers to do only one AQ command at a time, sure it
>>>>> can
>>>> work with less resource and do one at a time.
>>>> I think we are discussing the same issue as above "resource for the worst
>> case"
>>>> problem
>>> Frankly I am not seeing any issue.
>>> AQ is just another virtqueue as basic construct in the spec used by 30+ device
>> types.
>> explained above, when migrate a VM, the time consuming has to convergence
>> and the total downtime has a due, I remember it is less than 300ms. That is the
>> QOS requirement.
> And admin commands can easily serve that as majority of the work is done when the VM is running and member device is in active state in proposal [1].
explained above: it depends on the number of migrating VMs.
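The ~300 ms downtime argument above can be made concrete with a back-of-envelope sketch. All numbers here are hypothetical, chosen only for illustration; neither proposal specifies them.

```python
# Back-of-envelope check (hypothetical numbers): can N VMs finish their
# final dirty-page/state transfer within the downtime budget when they
# share one migration channel, assuming perfectly serialized transfers?

def downtime_fits(num_vms, final_state_bytes, bandwidth_bytes_per_s,
                  budget_s=0.3):
    """Return (total_seconds, fits_budget) for the final transfer round."""
    total = num_vms * final_state_bytes / bandwidth_bytes_per_s
    return total, total <= budget_s

# Example: 100 VMs, 1 MiB of final state each, a 100 Gbit/s channel.
total, fits = downtime_fits(100, 1 << 20, 100e9 / 8)
```

Under these assumptions the final round fits easily; the contested question in the thread is what happens when the transfer channel (one admin queue vs. per-VF registers) is contended by many more member devices at once.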


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:49                                         ` Parav Pandit
@ 2023-09-12  7:29                                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:29 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 2:49 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:08 PM
>>
>> On 9/12/2023 1:51 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 9:19 AM
>>>>
>>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>>>> anything we need to improve in this series?
>>>>> Admin commands for passthrough devices of [1] is comprehensive
>>>>> proposal
>>>> covering all the aspects.
>>>>> To me [1] is superset work that covers all needed functionality and
>>>>> downtime
>>>> aspects.
>>>>> I plan to improve [1] with v1 this week by extending device context
>>>>> and
>>>> addressing other review comments.
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061
>>>>> .h
>>>>> tml
>>>> I am not sure, we have discussed a lot about the potential issues in
>>>> the treads. I guess we should resolve them first. E.g., nested use cases.
>>> You are using nesting use case as the _only_ use case and attempt to steer
>> using that.
>>> Not right.
>>>
>>> If you want to discuss, then lets have both the use cases, attempt to converge
>> and if we can its really good.
>>> If we cannot, both requirements should be handled differently.
>> Isn't nested a clear use case that should be supported?
> Most users who care for running real applications and real performance, have not asked for nesting.
> It is not mandatory case; it may be required for some users.
> I don’t know who needs M level nesting and how cpu also support its acceleration etc to run some reasonable workload.
Nesting is a common use case, and it is mandatory.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:52                     ` Parav Pandit
@ 2023-09-12  7:36                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  7:36 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 2:52 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:13 PM
>> Why need P2P for Live Migration?
> A peer device may be accessing the virtio device. Hence first all the devices to be stopped like [1] allowing them to accept driver notifications from the peer device.
> Once all the devices are stopped, than each device to be freeze to not do any device context updates. At this point the final device context can be read by the owner driver.
Is this beyond the spec? Is it an Nvidia-specific use case, unrelated to
virtio live migration?




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:27                                           ` Zhu, Lingshan
@ 2023-09-12  7:40                                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  7:40 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:58 PM
> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
> cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
> open.org; virtio-dev@lists.oasis-open.org
> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
> VIRTIO_F_QUEUE_STATE
> 
> 
> 
> On 9/12/2023 2:47 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 12:04 PM
> >>
> >>
> >> On 9/12/2023 1:58 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 9:37 AM
> >>>>
> >>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
> >>>>>> "admin" VF? This require the HW reserve dedicated resource for
> >>>>>> every VF?
> >>>>>> So expensive, Overkill?
> >>>>>>
> >>>>>> And a VF may be managed by the PF and its admin "vf"?
> >>>>> Yes.
> >>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
> >>>> managed by both L1 guest VF and the host PF, that means two owners
> >>>> of the
> >> L2 VF.
> >>> This is the nesting.
> >>> When you do M level nesting, does any cpu in world handle its own
> >>> page
> >> tables in isolation of next level and also perform equally well?
> >> Not exactly, in nesting, L1 guest is the host/infrastructure emulator
> >> for L2, so L2 is expect to do nothing with the host, or something
> >> like L2 VF managed by both
> >> L1 VF and host PF can lead to operational and security issues?
> >>>>>>> If UDP packets are dropped, even application can fail who do no retry.
> >>>>>> UDP is not reliable, and performance overhead does not mean fail.
> >>>>> It largely depends on application.
> >>>>> I have seen iperf UDP failing on packet drop and never recovered.
> >>>>> A retransmission over UDP can fail.
> >>>> That depends on the workload, if it choose UDP, it is aware of the
> >>>> possibilities of losing packets. But anyway, LM are expected to
> >>>> perform successfully in the due time
> >>> And LM also depends on the workload. :)
> >> Exactly! That's the point, how to meet the requirements!
> >>> It is pointless to discuss performance characteristics as a point to
> >>> use AQ or
> >> not.
> >> How to meet QOS requirement when LM?
> > By following [1] where large part of device context and dirty page tracking is
> done when the VM is running.
> Still needs to migrate the last round of dirty pages and device states when VM
> freeze. Still can be large if take big amount of VMs into consideration, and that
> is where ~300ms due time rules.
> >
> >>> No. board designer does not need to.
> >>> As explained already, if board wants to supporting single command of
> >>> AQ,
> >> sure.
> >> Same as above, the QOS question. For example, how to avoid the
> >> situation that half VMs can be migrated and others timeout?
> > Why would this happen?
> > Timeout is not related to AQ in case if that happens.
> explained above
> > Timeout can happen to config registers too. And it can be even far more
> harder for board designers to support PCI reads in a timeout to handle in 384
> reads in parallel.
> When the VM freeze, the virtio functionalities, for example virito-net
> transaction is suspended as well, so no TLPs for networking traffic buffers.
The config-register operations mediated by the host are themselves TLPs flowing for the several hundred VMs in your example.
In your example, 1000 VMs freeze simultaneously, for which you need to finish the config cycles within some 300 msec.

> 
> The on-device Live Migration facility can use the full PCI device bandwidth for
> migration.
So can admin commands.
However, the big difference is that registers do not scale with a large number of VFs.
Admin commands scale easily.

I probably should not repeat what is already captured in the admin commands commit log and cover letter.

> 
> That is the difference with the admin vq.
I don't know what difference you are talking about.
PCI device bandwidth for migration is available with both admin commands and config registers.
BW != timeout.

> >
> > I am still not able to follow your point for asking about unrelated QOS
> questions.
> explained above, it has to meet the due time requirement and many VMs can
> be migrated simultaneously, in that situation, they have to race for the admin
> vq resource/bandwidth.
> >
> >>>>> Admin command can even fail with EAGAIN error code when device is
> >>>>> out of
> >>>> resource and software can retry the command.
> >>>> As demonstrated, this series is reliable as the config space
> >>>> functionalities, so maybe less possibilities to fail?
> >>> Huh. Config space has far higher failure rate for the PCI transport
> >>> when due to
> >> inherent nature of PCI timeouts and reads and polling.
> >>> For any bulk data transfer virtqueue is spec defined approach.
> >>> For more than a year this was debated you can check some 2021 emails.
> >>>
> >>> You can see the patches that data transfer done in [1] over
> >>> registers is snail
> >> slow.
> >> Do you often observe virtio PCI config space fail? Or does admin vq
> >> need to transfer data through PCI?
> > Admin commands needs to transfer bulk data across thousands of VFs in
> parallel for many VFs without baking registers in PCI.
> So you agree actually PCI config space are very unlikely to fail? It is reliable.
> 
No, I do not agree. It can fail, and it is very hard for board designers.
AQs are a more reliable way to transport bulk data in a scalable manner across tens of member devices.

> Please allow me to provide an extreme example, is one single admin vq
> limitless, that can serve hundreds to thousands of VMs migration? 
It is left to the device implementation, just like RSS and multi-queue support.
Is one queue enough for anything from a 10 Mbps to an 800 Gbps link?
Answer: not the scope of the specification. The spec provides the framework to scale this way but does not impose it on the device.

> If not, two or
> three or what number?
It really does not matter; it is the wrong point to discuss here.
The number of queues and command execution depend on the device implementation.
A financial-transaction application can time out when a device's queuing delay for a virtio-net rx queue is long,
and we don't put details about such things in the specification.
The spec takes the requirements and provides a driver-device interface to implement and scale.

I still don't follow the motivation behind the question.
If your question is "how many admin queues are needed to migrate N member devices?", then it is implementation specific.
It is similar to how such things depend on the implementation for the 30 virtio device types.

And if you are implying that, because it is implementation specific, an administration queue should not be used but some configuration register should be,
then you should propose a config-register interface to post virtqueue descriptors that way for 30 device types!
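The RSS/multi-queue analogy above can be sketched from the driver side. This is a hedged illustration, not spec text: how many admin queues exist is the device's choice, and the driver simply shards per-member commands across whatever it finds (names here are hypothetical).

```python
# Hypothetical driver-side sketch: shard per-member-device migration
# commands round-robin across however many admin queues the device
# exposes, analogous to multi-queue/RSS for data-path traffic.

def shard_commands(member_ids, num_queues):
    """Assign each member device's command to an admin queue, round-robin."""
    queues = [[] for _ in range(num_queues)]
    for i, member in enumerate(member_ids):
        queues[i % num_queues].append(member)
    return queues

# Example: 10 member VFs spread over a device exposing 4 admin queues.
queues = shard_commands(list(range(10)), 4)
```

The same driver code works unchanged whether the device exposes 1 queue or 64, which is the sense in which the scaling question is implementation specific.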

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:36                       ` Zhu, Lingshan
@ 2023-09-12  7:43                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  7:43 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 1:06 PM
> 
> On 9/12/2023 2:52 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 12:13 PM Why need P2P for Live
> >> Migration?
> > A peer device may be accessing the virtio device. Hence first all the devices to
> be stopped like [1] allowing them to accept driver notifications from the peer
> device.
> > Once all the devices are stopped, than each device to be freeze to not do any
> device context updates. At this point the final device context can be read by the
> owner driver.
> Is it beyond the spec? Nvidia specific use case and not related to virtio live
> migration?
Not at all Nvidia specific.
And not at all beyond the specification.
PCI transport is probably by far the most common transport for virtio,
and hence the spec proposed in [1] covers it.

It is the baseline implementation in leading OSes such as the Linux kernel.

A decade-mature stack like VFIO recommends P2P support as a baseline; without it, migration of multiple devices can fail, as the hypervisor has no knowledge of whether two devices are interacting.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
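The stop-then-freeze ordering described earlier in this thread (stop every device first so each can still accept peer notifications, then freeze each and read its context) can be sketched as follows. The class and method names are hypothetical, purely to illustrate the two-phase sequencing, not an interface from [1]:

```python
# Two-phase P2P-safe migration sketch (all names hypothetical).

class Member:
    def __init__(self, name):
        self.name = name
        self.state = "running"

    def stop(self):
        # Stopped: no new device-initiated work, but the device still
        # accepts driver notifications arriving from peer devices.
        self.state = "stopped"

    def freeze(self):
        # Freeze only after every peer is stopped, so no peer can
        # trigger further device-context updates.
        assert self.state == "stopped"
        self.state = "frozen"

    def read_context(self):
        assert self.state == "frozen"
        return {"device": self.name, "context": "..."}

def migrate(devices):
    for d in devices:            # phase 1: stop all members
        d.stop()
    contexts = []
    for d in devices:            # phase 2: freeze each, read final context
        d.freeze()
        contexts.append(d.read_context())
    return contexts

ctx = migrate([Member("vf0"), Member("vf1")])
```

The point of the two phases is that freezing any device before all of its peers are stopped could lose in-flight peer-to-peer notifications, which is why the hypervisor cannot safely skip phase 1 without knowing the devices do not interact.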

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:29                                           ` Zhu, Lingshan
@ 2023-09-12  7:53                                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  7:53 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 12:59 PM
> 
> On 9/12/2023 2:49 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 12:08 PM
> >>
> >> On 9/12/2023 1:51 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 9:19 AM
> >>>>
> >>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
> >>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
> >>>>>> anything we need to improve in this series?
> >>>>> Admin commands for passthrough devices of [1] is comprehensive
> >>>>> proposal
> >>>> covering all the aspects.
> >>>>> To me [1] is superset work that covers all needed functionality
> >>>>> and downtime
> >>>> aspects.
> >>>>> I plan to improve [1] with v1 this week by extending device
> >>>>> context and
> >>>> addressing other review comments.
> >>>>> [1]
> >>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> >>>>> 61
> >>>>> .h
> >>>>> tml
> >>>> I am not sure, we have discussed a lot about the potential issues
> >>>> in the treads. I guess we should resolve them first. E.g., nested use cases.
> >>> You are using nesting use case as the _only_ use case and attempt to
> >>> steer
> >> using that.
> >>> Not right.
> >>>
> >>> If you want to discuss, then lets have both the use cases, attempt
> >>> to converge
> >> and if we can its really good.
> >>> If we cannot, both requirements should be handled differently.
> >> Isn't nested a clear use case that should be supported?
> > Most users who care for running real applications and real performance, have
> not asked for nesting.
> > It is not mandatory case; it may be required for some users.
> > I don’t know who needs M level nesting and how cpu also support its
> acceleration etc to run some reasonable workload.
> Nested is a common use case and it is mandatory.
Maybe it is a common case for the users you interact with; it is required for some complicated modes.
How many levels of nesting: 2, 10, 100?

I don't see a point in debating that "nesting is the only case and mediation is the only way" to do device migration.

As I repeatedly acknowledged,
we are open to converging on administration commands that can work for both the passthrough and nested ways.

I just don't see how a nested solution can work without any mediation, as everything you do touches the device reset and FLR flows, and it practically breaks the PCI specification with these side-band registers and with faking device reset and FLR when asked.
This is the primary reason I am less inclined to go the in-band route.
Until now, no one has technically explained how it can even work, per yesterday's question.

And if there is a way, please explain; I am very interested to learn how this is done without hacks, where a device reset by the guest _actually_ resets the underlying member device while dirty page tracking is also ongoing.

So my humble request is: try to work towards both methods coexisting if possible, rather than an either-or mode.



^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:40                                             ` [virtio-dev] " Parav Pandit
@ 2023-09-12  9:02                                               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  9:02 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 3:40 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:58 PM
>> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
>> cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
>> open.org; virtio-dev@lists.oasis-open.org
>> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
>> VIRTIO_F_QUEUE_STATE
>>
>>
>>
>> On 9/12/2023 2:47 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 12:04 PM
>>>>
>>>>
>>>> On 9/12/2023 1:58 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 9:37 AM
>>>>>>
>>>>>> On 9/11/2023 6:21 PM, Parav Pandit wrote:
>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>>> Sent: Monday, September 11, 2023 3:03 PM So implement AQ on the
>>>>>>>> "admin" VF? This require the HW reserve dedicated resource for
>>>>>>>> every VF?
>>>>>>>> So expensive, Overkill?
>>>>>>>>
>>>>>>>> And a VF may be managed by the PF and its admin "vf"?
>>>>>>> Yes.
>>>>>> it's a bit chaos, as you can see if the nested(L2 guest) VF can be
>>>>>> managed by both L1 guest VF and the host PF, that means two owners
>>>>>> of the
>>>> L2 VF.
>>>>> This is the nesting.
>>>>> When you do M level nesting, does any cpu in world handle its own
>>>>> page
>>>> tables in isolation of next level and also perform equally well?
>>>> Not exactly, in nesting, L1 guest is the host/infrastructure emulator
>>>> for L2, so L2 is expect to do nothing with the host, or something
>>>> like L2 VF managed by both
>>>> L1 VF and host PF can lead to operational and security issues?
>>>>>>>>> If UDP packets are dropped, even application can fail who do no retry.
>>>>>>>> UDP is not reliable, and performance overhead does not mean fail.
>>>>>>> It largely depends on application.
>>>>>>> I have seen iperf UDP failing on packet drop and never recovered.
>>>>>>> A retransmission over UDP can fail.
>>>>>> That depends on the workload, if it choose UDP, it is aware of the
>>>>>> possibilities of losing packets. But anyway, LM are expected to
>>>>>> perform successfully in the due time
>>>>> And LM also depends on the workload. :)
>>>> Exactly! That's the point, how to meet the requirements!
>>>>> It is pointless to discuss performance characteristics as a point to
>>>>> use AQ or
>>>> not.
>>>> How to meet QOS requirement when LM?
>>> By following [1] where large part of device context and dirty page tracking is
>> done when the VM is running.
>> Still needs to migrate the last round of dirty pages and device states when VM
>> freeze. Still can be large if take big amount of VMs into consideration, and that
>> is where ~300ms due time rules.
>>>>> No. board designer does not need to.
>>>>> As explained already, if board wants to supporting single command of
>>>>> AQ,
>>>> sure.
>>>> Same as above, the QOS question. For example, how to avoid the
>>>> situation that half VMs can be migrated and others timeout?
>>> Why would this happen?
>>> Timeout is not related to AQ in case if that happens.
>> explained above
>>> Timeout can happen to config registers too. And it can be even far more
>> harder for board designers to support PCI reads in a timeout to handle in 384
>> reads in parallel.
>> When the VM freezes, the virtio functionalities, for example virtio-net
>> transaction is suspended as well, so no TLPs for networking traffic buffers.
> The config registers mediated operation done by host itself are TLPs flowing for several hundreds of VM example you took.
> In your example you took 1000 VMs freezing simultaneously for which you need to finish the config cycles in some 300 msec.
These are per-device operations: they directly access the device config space and
consume the device's dedicated resources and bandwidth, like
other standard virtio operations.
>
>> The on-device Live Migration facility can use the full PCI device bandwidth for
>> migration.
> So does admin commands also.
> However the big difference is: registers do not scale with large number of VFs.
> Admin commands scale easily.
The admin vq requires fixed and dedicated resources to serve the VMs, so the question still
remains: does it scale to serve the migration of a large number of devices? How many admin
vqs do you need to serve 10 VMs, how many for 100, and so on? How does it scale?

If one admin vq can serve 100 VMs, can it migrate 1000 VMs in a reasonable
time? If not, how many exactly?


And a register does not need to scale: it resides on the VF and serves
only that VF.

It does not reside on the PF to migrate the VFs.
>
> I probably should not repeat what is already captured in the admin commands commit log and cover letter.
>
>> That is the difference with the admin vq.
> I don’t know what difference you are talking about.
> PCI device bandwidth for migration is available with admin commands and some config registers both.
> BW != timeout.
The VFs' config space can use the device's dedicated resources, like the bandwidth.

For the AQ, you still need to reserve resources; and how much?
>
>>> I am still not able to follow your point for asking about unrelated QOS
>> questions.
>> explained above, it has to meet the due time requirement and many VMs can
>> be migrated simultaneously, in that situation, they have to race for the admin
>> vq resource/bandwidth.
>>>>>>> Admin command can even fail with EAGAIN error code when device is
>>>>>>> out of
>>>>>> resource and software can retry the command.
>>>>>> As demonstrated, this series is reliable as the config space
>>>>>> functionalities, so maybe less possibilities to fail?
>>>>> Huh. Config space has far higher failure rate for the PCI transport
>>>>> when due to
>>>> inherent nature of PCI timeouts and reads and polling.
>>>>> For any bulk data transfer virtqueue is spec defined approach.
>>>>> For more than a year this was debated you can check some 2021 emails.
>>>>>
>>>>> You can see the patches that data transfer done in [1] over
>>>>> registers is snail
>>>> slow.
>>>> Do you often observe virtio PCI config space fail? Or does admin vq
>>>> need to transfer data through PCI?
>>> Admin commands needs to transfer bulk data across thousands of VFs in
>> parallel for many VFs without baking registers in PCI.
>> So you agree actually PCI config space are very unlikely to fail? It is reliable.
>>
> No. I do not agree. It can fail and very hard for board designers.
> AQs are more reliable way to transport bulk data in scalable manner for tens of member devices.
Really? How often do you observe virtio config space fail?
>
>> Please allow me to provide an extreme example, is one single admin vq
>> limitless, that can serve hundreds to thousands of VMs migration?
> It is left to the device implementation. Just like RSS and multi queue support?
> Is one Q enough for 800Gbps to 10Mbps link?
> Answer is: Not the scope of specification, spec provide the framework to scale this way, but not impose on the device.
Even without RSS or MQ support, the device can still work with some
performance overhead; it does not fail.

A live migration failure caused by insufficient bandwidth and resources is totally
different.
>
>> If not, two or
>> three or what number?
> It really does not matter. Its wrong point to discuss here.
> Number of queues and command execution depends on the device implementation.
> A financial transaction application can timeout when a device queuing delay for virtio net rx queue is long.
> And we don’t put details about such things in specification.
> Spec takes the requirements and provides driver device interface to implement and scale.
>
> I still don’t follow the motivation behind the question.
> Is your question: How many admin queues are needed to migrate N member devices? If so, it is implementation specific.
> It is similar to how such things depend on implementation for 30 virtio device types.
>
> And if are implying that because it is implementation specific, that is why administration queue should not be used, but some configuration register should be used.
> Than you should propose a config register interface to post virtqueue descriptors that way for 30 device types!
If so, do we leave it undefined? That is a potential risk for device implementations.
Then why must it be the admin vq?


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:53                                             ` [virtio-dev] " Parav Pandit
@ 2023-09-12  9:06                                               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  9:06 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 3:53 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 12:59 PM
>>
>> On 9/12/2023 2:49 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 12:08 PM
>>>>
>>>> On 9/12/2023 1:51 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 9:19 AM
>>>>>>
>>>>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>>>>>> anything we need to improve in this series?
>>>>>>> Admin commands for passthrough devices of [1] is comprehensive
>>>>>>> proposal
>>>>>> covering all the aspects.
>>>>>>> To me [1] is superset work that covers all needed functionality
>>>>>>> and downtime
>>>>>> aspects.
>>>>>>> I plan to improve [1] with v1 this week by extending device
>>>>>>> context and
>>>>>> addressing other review comments.
>>>>>>> [1]
>>>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
>>>>>>> 61
>>>>>>> .h
>>>>>>> tml
>>>>>> I am not sure, we have discussed a lot about the potential issues
>>>>>> in the treads. I guess we should resolve them first. E.g., nested use cases.
>>>>> You are using nesting use case as the _only_ use case and attempt to
>>>>> steer
>>>> using that.
>>>>> Not right.
>>>>>
>>>>> If you want to discuss, then lets have both the use cases, attempt
>>>>> to converge
>>>> and if we can its really good.
>>>>> If we cannot, both requirements should be handled differently.
>>>> Isn't nested a clear use case that should be supported?
>>> Most users who care for running real applications and real performance, have
>> not asked for nesting.
>>> It is not mandatory case; it may be required for some users.
>>> I don’t know who needs M level nesting and how cpu also support its
>> acceleration etc to run some reasonable workload.
>> Nested is a common use case and it is mandatory.
> Maybe it is common case for the users you interact with, it is required for some complicated mode.
> How many level of nesting 10, 2, 100?
>
> I don’t see a point of debating that "nesting is the only case and mediation is the only way" to do device migration.
>
> As I repeatedly acknowledged,
> We are open to converge on doing administration commands that can work for passthrough and nested way.
>
> I just don’t see how nested solution can work without any mediation, as everything you do touches device reset and FLR flow and it practically breaks the PCI specification with these side band registers and faking device reset and FLR when asked.
> This is the primary reason; I am less inclined to go the in-band method.
> Until now, no one technically explained how it can even work on question from yesterday.
>
> And if there is one, please explain, I am very interested to learn, how is this done without hacks where device reset by guest _actually_ reset the underlying member device while the dirty page tracking is also ongoing.
>
> So, my humble request is, try to work towards co-existing both the methods if possible, rather than doing either or mode.
If you want the AQ used for LM, it should support nesting anyway; don't break
user logic.


This (my) series can support nesting.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:06                                               ` Zhu, Lingshan
@ 2023-09-12  9:08                                                 ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12  9:08 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 5:06 PM, Zhu, Lingshan wrote:
>
>
> On 9/12/2023 3:53 PM, Parav Pandit wrote:
>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>> Sent: Tuesday, September 12, 2023 12:59 PM
>>>
>>> On 9/12/2023 2:49 PM, Parav Pandit wrote:
>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>> Sent: Tuesday, September 12, 2023 12:08 PM
>>>>>
>>>>> On 9/12/2023 1:51 PM, Parav Pandit wrote:
>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>> Sent: Tuesday, September 12, 2023 9:19 AM
>>>>>>>
>>>>>>> On 9/11/2023 7:50 PM, Parav Pandit wrote:
>>>>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>>>>> Sent: Monday, September 11, 2023 3:03 PM By the way, do you see
>>>>>>>>> anything we need to improve in this series?
>>>>>>>> Admin commands for passthrough devices of [1] is comprehensive
>>>>>>>> proposal
>>>>>>> covering all the aspects.
>>>>>>>> To me [1] is superset work that covers all needed functionality
>>>>>>>> and downtime
>>>>>>> aspects.
>>>>>>>> I plan to improve [1] with v1 this week by extending device
>>>>>>>> context and
>>>>>>> addressing other review comments.
>>>>>>>> [1]
>>>>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
>>>>>>>> 61
>>>>>>>> .h
>>>>>>>> tml
>>>>>>> I am not sure, we have discussed a lot about the potential issues
>>>>>>> in the treads. I guess we should resolve them first. E.g., 
>>>>>>> nested use cases.
>>>>>> You are using nesting use case as the _only_ use case and attempt to
>>>>>> steer
>>>>> using that.
>>>>>> Not right.
>>>>>>
>>>>>> If you want to discuss, then lets have both the use cases, attempt
>>>>>> to converge
>>>>> and if we can its really good.
>>>>>> If we cannot, both requirements should be handled differently.
>>>>> Isn't nested a clear use case that should be supported?
>>>> Most users who care for running real applications and real 
>>>> performance, have
>>> not asked for nesting.
>>>> It is not mandatory case; it may be required for some users.
>>>> I don’t know who needs M level nesting and how cpu also support its
>>> acceleration etc to run some reasonable workload.
>>> Nested is a common use case and it is mandatory.
>> Maybe it is common case for the users you interact with, it is 
>> required for some complicated mode.
>> How many level of nesting 10, 2, 100?
>>
>> I don’t see a point of debating that "nesting is the only case and 
>> mediation is the only way" to do device migration.
>>
>> As I repeatedly acknowledged,
>> We are open to converge on doing administration commands that can 
>> work for passthrough and nested way.
>>
>> I just don’t see how nested solution can work without any mediation, 
>> as everything you do touches device reset and FLR flow and it 
>> practically breaks the PCI specification with these side band 
>> registers and faking device reset and FLR when asked.
>> This is the primary reason; I am less inclined to go the in-band method.
>> Until now, no one technically explained how it can even work on 
>> question from yesterday.
>>
>> And if there is one, please explain, I am very interested to learn, 
>> how is this done without hacks where device reset by guest _actually_ 
>> reset the underlying member device while the dirty page tracking is 
>> also ongoing.
>>
>> So, my humble request is, try to work towards co-existing both the 
>> methods if possible, rather than doing either or mode.
> If you want AQ used for LM, it should support nested anyway, don't 
> break user logic.
>
>
> This (my)series can support nested.
Supplementary: as Jason once pointed out, the two solutions can certainly
co-exist. I am implementing basic facilities; an admin vq can feel free to
reuse them, e.g. by forwarding messages to them, and this can help support
the nested case.
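For illustration only, a toy C sketch of that forwarding idea: an admin command handler implemented on top of the same per-VF facilities (a SUSPEND status bit and the virtqueue state) that an in-band driver would use. Every name, the register layout, and the SUSPEND bit value are invented for this sketch; they are not taken from the spec or from either proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-VF registers exposed by the basic facility.
 * The bit value and struct layout are invented for illustration. */
#define VDEV_STATUS_SUSPEND 0x10

struct vf_regs {
    uint8_t  device_status;
    uint16_t last_avail_idx;   /* virtqueue state, as in this series */
    uint16_t last_used_idx;
};

/* Hypothetical admin commands an owner-device AQ might forward. */
enum admin_opcode { ADMIN_SUSPEND_VF, ADMIN_GET_VQ_STATE };

struct admin_cmd {
    enum admin_opcode opcode;
    uint16_t vf_id;
    uint16_t out_avail_idx;    /* filled for ADMIN_GET_VQ_STATE */
};

/* The forwarding layer: an admin command is implemented by poking the
 * same per-VF registers a nested (in-band) driver would use directly. */
static void admin_forward(struct vf_regs *vfs, struct admin_cmd *cmd)
{
    struct vf_regs *vf = &vfs[cmd->vf_id];

    switch (cmd->opcode) {
    case ADMIN_SUSPEND_VF:
        vf->device_status |= VDEV_STATUS_SUSPEND;  /* stabilize state */
        break;
    case ADMIN_GET_VQ_STATE:
        cmd->out_avail_idx = vf->last_avail_idx;   /* read vq state */
        break;
    }
}
```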


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:02                                               ` Zhu, Lingshan
@ 2023-09-12  9:21                                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  9:21 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 2:33 PM

> admin vq require fixed and dedicated resource to serve the VMs, the question
> still remains, does is scale to server big amount of devices migration? how many
> admin vqs do you need to serve 10 VMs, how many for 100? and so on? How to
> scale?
>
Yes, it scales within an AQ and across multiple AQs.
Please consult your board designers for such limits on your device.
 
> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable time?
> If not, how many exactly.
> 
Yes, it can serve both 100 and 1000 VMs in reasonable time.

> 
> And register does not need to scale, it resides on the VF and only serve
> the VF.
>
Since it is per VF, it is by nature a linearly growing entity whose reads and writes the board design needs to support with guaranteed timing.
It clearly scales more poorly than a queue.
 
> It does not reside on the PF to migrate the VFs.
Hence it does not scale and cannot do parallel operations within the VF, unless each register is replicated.

Using a register instead of a queue for bulk data transfer is a question that was settled when the virtio spec was born; I don't see a point in discussing it.
Snippet from the spec: "As a device can have zero or more virtqueues for bulk data transport".

> VFs config space can use the device dedicated resource like the bandwidth.
>
> for AQ, still you need to reserve resource and how much?
It depends on your board; please consult your board designer, as it depends on the implementation.
From the spec's point of view, it is not different from any other virtqueue.

> > No. I do not agree. It can fail and very hard for board designers.
> > AQs are more reliable way to transport bulk data in scalable manner for tens
> of member devices.
> Really? How often do you observe virtio config space fail?

On an Intel Ice Lake server we have seen it fail with 128 VFs.
And a device needs to do very odd things to support a forever-expanding config space for 1000+ VFs, which is not the topic of this discussion anyway.


> >
> >> Please allow me to provide an extreme example, is one single admin vq
> >> limitless, that can serve hundreds to thousands of VMs migration?
> > It is left to the device implementation. Just like RSS and multi queue support?
> > Is one Q enough for 800Gbps to 10Mbps link?
> > Answer is: Not the scope of specification, spec provide the framework to scale
> this way, but not impose on the device.
> Even if not support RSS or MQ, the device still can work with
> performance overhead, not fail.
>
_work_ is subjective.
The financial transaction (the application) failed; the packets worked.
The LM commands were successful, but they were not timely.

Same thing.
 
> Insufficient bandwidth & resource caused live migration fail is totally
> different.
Very abstract point and unrelated to administration commands.

> >
> >> If not, two or
> >> three or what number?
> > It really does not matter. Its wrong point to discuss here.
> > Number of queues and command execution depends on the device
> implementation.
> > A financial transaction application can timeout when a device queuing delay
> for virtio net rx queue is long.
> > And we don’t put details about such things in specification.
> > Spec takes the requirements and provides driver device interface to
> implement and scale.
> >
> > I still don’t follow the motivation behind the question.
> > Is your question: How many admin queues are needed to migrate N member
> devices? If so, it is implementation specific.
> > It is similar to how such things depend on implementation for 30 virtio device
> types.
> >
> > And if are implying that because it is implementation specific, that is why
> administration queue should not be used, but some configuration register
> should be used.
> > Than you should propose a config register interface to post virtqueue
> descriptors that way for 30 device types!
> if so, leave it as undefined? A potential risk for device implantation?


> Then why must the admin vq?

Because administration commands and the admin vq do not force devices to implement thousands of registers that must offer a time-bound completion guarantee.
A large part of the industry, including SIOV devices led by Intel and others, is moving away from register-access mode.

To summarize, administration commands and queues offer the following benefits.

1. Ability to do bulk data transfer between driver and device

2. Ability to parallelize the work within the driver and within the device, across single or multiple virtqueues

3. Eliminates PCI read/write MMIO registers, which demand a low-latency response interval

4. Better utilization of the host CPU, as no one needs to poll a device register for completion

5. Ability to handle variability in command completion by the device, and the ability to notify the driver

If this does not satisfy you, please refer to some of the past email discussions from the administration virtqueue work.
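To make benefits 1 and 4 concrete, here is a toy, self-contained C model of batched submission on an admin queue: the driver posts several commands, the device completes them, and the driver reaps all completions in one pass instead of polling one register per operation. The ring layout and all names are invented for illustration; this is not the virtio ring format.

```c
#include <assert.h>

/* Toy admin queue: a flat ring of command slots. */
#define AQ_SIZE 8

struct aq_cmd { int opcode; int status; };

struct admin_queue {
    struct aq_cmd ring[AQ_SIZE];
    int head;     /* next free slot (driver side) */
    int done;     /* completions produced (device side) */
};

/* Driver posts a command without waiting for it to finish (benefit 1). */
static int aq_post(struct admin_queue *q, int opcode)
{
    if (q->head == AQ_SIZE)
        return -1;                      /* ring full */
    q->ring[q->head++].opcode = opcode;
    return 0;
}

/* Simulated device: completes every posted command. A real device could
 * work on the batch in parallel, which is the point of benefit 2. */
static void aq_device_run(struct admin_queue *q)
{
    while (q->done < q->head)
        q->ring[q->done++].status = 0;  /* 0 = success */
}

/* Driver reaps all completions in one pass (benefit 4): no per-command
 * register polling is needed. Returns the number of successes. */
static int aq_reap(struct admin_queue *q)
{
    int ok = 0;
    for (int i = 0; i < q->done; i++)
        if (q->ring[i].status == 0)
            ok++;
    return ok;
}
```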

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:06                                               ` Zhu, Lingshan
@ 2023-09-12  9:28                                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  9:28 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 2:36 PM

> If you want AQ used for LM, it should support nested anyway, don't break user
> logic.
You ignored the other part of my question when you asked the above,
i.e. the PCI transport does not allow such weird bifurcation.
> 
> 
> This (my)series can support nested.
Maybe it does, by hacking the device reset and FLR sequence, but without dirty tracking, without P2P support, and without passthrough mode.
None of these requirements are addressed.

If you intend to cover both requirements, let's work towards that and see if the two can converge. If there are technical challenges,
then there is no point in pushing the claim that an in-band VF with mediation is the only way to move forward.


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:08                                                 ` Zhu, Lingshan
@ 2023-09-12  9:35                                                   ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12  9:35 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 2:38 PM

> supplementary: As Jason ever pointed out: the two solution can co-exist for
> sure, I am implementing basic facilities, admin vq can free feel to reuse them like
> forwarding messages to them, and this can help support nested.

Sure. Sounds good.

At least two device vendors, plus other industry bodies including ones led by Intel, are moving away from register-based implementations in the virtualization area.
And the registers that you expose do not support the device reset and FLR sequences. So please add some text about that violation in the PCI transport section,
and a guideline for the driver on how it should not touch them, to make this usable.
This will make the nested solution clearer.

Do you find the administration commands we proposed in [1] useful for nested case?
If not, both will likely diverge.

We would like to avoid suspending individual VQs in the passthrough case, as things are controlled at the device level.
It also reduces driver -> device interactions for large queue counts, ranging from 1 to 32K.

So at present I see very little overlap between the two. I will look again on 9/13 to see if the passthrough proposal can utilize anything from your series.



^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:35                                                   ` Parav Pandit
@ 2023-09-12 10:14                                                     ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:14 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 5:35 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 2:38 PM
>> supplementary: As Jason ever pointed out: the two solution can co-exist for
>> sure, I am implementing basic facilities, admin vq can free feel to reuse them like
>> forwarding messages to them, and this can help support nested.
> Sure. Sounds good.
>
> At lest two device vendors + other industry bodies including led by Intel are moving away from the register-based implementation in virtualization area.
This series is self-contained; it is a register-based solution. It
introduces basic facilities and doesn't depend on other mechanisms like the AQ.
> And registers that you expose are not supporting device reset and FLR sequence. So please add some text for that in PCI transport section about violation.
> And guideline for driver on how it should not touch them to make this usable.
> This will make the nested solution more clear.
PCI FLR is out of scope here; for virtio you can still reset the device
by writing 0 to the device status.
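A minimal sketch of that reset flow, assuming the usual virtio semantics of the device_status field (the driver writes 0, then waits until reading the field returns 0). The struct below simulates the register; it is not real driver or device code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the virtio device_status field. In a real virtio-pci
 * device this lives in the common configuration structure; here it is
 * just simulated memory. */
struct fake_vdev {
    uint8_t device_status;
};

/* Simulated register write: writing 0 asks the device to reset itself.
 * This toy device completes the reset immediately. */
static void vdev_write_status(struct fake_vdev *d, uint8_t v)
{
    if (v == 0)
        d->device_status = 0;   /* device performs the reset */
    else
        d->device_status |= v;  /* drivers set status bits additively */
}

static uint8_t vdev_read_status(struct fake_vdev *d)
{
    return d->device_status;
}

/* Driver-side reset: write 0, then poll until the device reads back 0.
 * A real driver would bound this loop with a timeout. */
static void vdev_reset(struct fake_vdev *d)
{
    vdev_write_status(d, 0);
    while (vdev_read_status(d) != 0)
        ;  /* spin until the device acknowledges the reset */
}
```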
>
> Do you find the administration commands we proposed in [1] useful for nested case?
> If not, both will likely diverge.
Not till now.
>
> We would like to avoid suspending individual VQs in the passthrough case, as things are controlled at the device level.
> It also reduces driver -> device interaction for large queue count ranging from 1 to 32K.
>
> So at present I see very little overlap between the two. I will look more again on 9/13 if passthrough proposal can utilize anything from your series.
It does not suspend individual VQs; on suspend, all VQs are STOPPED.
>
>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:14                                                     ` Zhu, Lingshan
@ 2023-09-12 10:16                                                       ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 10:16 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 3:45 PM
> 
> On 9/12/2023 5:35 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 2:38 PM
> >> supplementary: As Jason ever pointed out: the two solution can
> >> co-exist for sure, I am implementing basic facilities, admin vq can
> >> free feel to reuse them like forwarding messages to them, and this can help
> support nested.
> > Sure. Sounds good.
> >
> > At lest two device vendors + other industry bodies including led by Intel are
> moving away from the register-based implementation in virtualization area.
> This series is self-contained, it is an register based solution. It introduces basic
> facilities, doesn't depend on others like AQ.
> > And registers that you expose are not supporting device reset and FLR
> sequence. So please add some text for that in PCI transport section about
> violation.
> > And guideline for driver on how it should not touch them to make this usable.
> > This will make the nested solution more clear.
> PCI FLR is out of this scope, for virtio you can still reset the device by writing 0.
> >
> > Do you find the administration commands we proposed in [1] useful for
> nested case?
> > If not, both will likely diverge.
> Not till now.
> >
> > We would like to avoid suspending individual VQs in the passthrough case, as
> things are controlled at the device level.
> > It also reduces driver -> device interaction for large queue counts ranging from
> 1 to 32K.
> >
> > So at present I see very little overlap between the two. I will look again
> on 9/13 to see if the passthrough proposal can utilize anything from your series.
> It does not suspend individual VQs; when SUSPEND is set, all VQs are STOPPED.

We need to stop configuration notifications as well, and shared memory updates, etc.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:28                                                 ` Parav Pandit
@ 2023-09-12 10:17                                                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:17 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 5:28 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 2:36 PM
>> If you want the AQ used for LM, it should support nested anyway; don't break user
>> logic.
> You ignored the other part of my question when you asked above.
> i.e. a PCI transport does not allow such weird bifurcation.
I failed to process your comment. Do you mean the registers don't 
support nested?
>>
>> This (my) series can support nested.
> Maybe it does, by hacking the device reset and FLR sequences, without dirty tracking, without P2P support, and without passthrough mode.
> All these requirements are not addressed.
>
> If you intend to cover both requirements, let's work towards it to see if it can converge; if there are technical challenges,
> then there is no point in pushing the claim that an in-band VF with mediation is the only way to move forward.
Why do you need P2P? Why are FLR and reset a concern? Why do you think
dirty page tracking is not supported?
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:17                                                   ` Zhu, Lingshan
@ 2023-09-12 10:25                                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 10:25 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 3:47 PM
> 
> On 9/12/2023 5:28 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for LM,
> >> it should support nested anyway, don't break user logic.
> > You ignored the other part of my question when you asked above.
> > i.e. a PCI transport does not allow such weird bifurcation.
> I failed to process your comment. 

> Do you mean the registers don't support nested?
No. I mean register accesses should support the device reset flow and the FLR flow.

> >>

> >> This (my) series can support nested.
> > Maybe it does, by hacking the device reset and FLR sequences, without dirty
> tracking, without P2P support, and without passthrough mode.
> > All these requirements are not addressed.
> >
> > If you intend to cover both requirements, let's work towards it to see
> > if it can converge; if there are technical challenges, then there is no point in
> pushing the claim that an in-band VF with mediation is the only way to move
> forward.
> Why do you need P2P?
I answered this a few hours back.

> Why are FLR and reset a concern?
Because they must work.
> Why do you think dirty page tracking is not supported?
Because you wrote in the cover letter "Future work: dirty page tracking and in-flight descriptors."
And you have repeatedly resisted administration commands. I don’t see how the above two can be done efficiently without administration commands.
And if one is going to use administration commands in the future for the above two, there is no point in doing the current series over registers.
I don’t see how an administration queue can work after the device is suspended, nor how dirty page tracking can continue after FLR.

All these aspects are covered in [1], which can be extended for nesting if needed with a sidecar VF.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  7:43                         ` Parav Pandit
@ 2023-09-12 10:27                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:27 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 3:43 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 1:06 PM
>>
>> On 9/12/2023 2:52 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 12:13 PM Why need P2P for Live
>>>> Migration?
>>> A peer device may be accessing the virtio device. Hence, first, all the devices are
>> to be stopped, as in [1], while still allowing them to accept driver notifications from the peer
>> device.
>>> Once all the devices are stopped, then each device is to be frozen so it does not do any
>> device context updates. At that point the final device context can be read by the
>> owner driver.
>> Is it beyond the spec? Nvidia specific use case and not related to virtio live
>> migration?
> Not at all Nvidia specific.
> And not at all beyond the specification.
> PCI transport is probably by far the most common transport of virtio.
> And hence, the spec proposed in [1] covers it.
>
> It is the baseline implementation in leading OSes such as the Linux kernel.
>
> A decade-mature stack like vfio recommends p2p support as a baseline, without which multiple-device migration can fail, as the hypervisor has no knowledge of whether two devices are interacting or not.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
Still, why do you think P2P is a concern of live migration? Or why do
you think live migration should implement special support for P2P?




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:16                                                       ` Parav Pandit
@ 2023-09-12 10:28                                                         ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:28 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 6:16 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 3:45 PM
>>
>> On 9/12/2023 5:35 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:38 PM
>>>> supplementary: As Jason once pointed out, the two solutions can
>>>> co-exist for sure. I am implementing basic facilities; the admin vq can
>>>> feel free to reuse them, e.g. by forwarding messages to them, and this can help
>> support nested.
>>> Sure. Sounds good.
>>>
>>> At least two device vendors + other industry bodies, including one led by Intel, are
>> moving away from register-based implementations in the virtualization area.
>> This series is self-contained; it is a register-based solution. It introduces basic
>> facilities and doesn't depend on others like the AQ.
>>> And the registers that you expose do not support the device reset and FLR
>> sequences. So please add some text about that violation in the PCI transport
>> section.
>>> And a guideline for the driver on how it should not touch them, to make this usable.
>>> This will make the nested solution more clear.
>> PCI FLR is out of this scope; for virtio you can still reset the device by writing 0.
>>> Do you find the administration commands we proposed in [1] useful for
>> nested case?
>>> If not, both will likely diverge.
>> Not till now.
>>> We would like to avoid suspending individual VQs in the passthrough case, as
>> things are controlled at the device level.
>>> It also reduces driver -> device interaction for large queue counts ranging from
>> 1 to 32K.
>>> So at present I see very little overlap between the two. I will look again
>> on 9/13 to see if the passthrough proposal can utilize anything from your series.
>> It does not suspend individual VQs; when SUSPEND is set, all VQs are STOPPED.
> We need to stop configuration notifications as well, and shared memory updates, etc.
Already done, if you have read my series.




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:25                                                     ` Parav Pandit
@ 2023-09-12 10:32                                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:32 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 6:25 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 3:47 PM
>>
>> On 9/12/2023 5:28 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for LM,
>>>> it should support nested anyway, don't break user logic.
>>> You ignored the other part of my question when you asked above.
>>> i.e. a PCI transport does not allow such weird bifurcation.
>> I failed to process your comment.
>> Do you mean the registers don't support nested?
> No. I mean register accesses should support the device reset flow and the FLR flow.
Do you see this as a concern? Or why do you think there are problems?
>
>>>> This (my) series can support nested.
>>> Maybe it does, by hacking the device reset and FLR sequences, without dirty
>> tracking, without P2P support, and without passthrough mode.
>>> All these requirements are not addressed.
>>>
>>> If you intend to cover both requirements, let's work towards it to see
>>> if it can converge; if there are technical challenges, then there is no point in
>> pushing the claim that an in-band VF with mediation is the only way to move
>> forward.
>> Why do you need P2P?
> I answered this a few hours back.
Still, why is P2P a blocker for my series?
>
>> Why are FLR and reset a concern?
> Because they must work.
Why are FLR and reset affected? When SUSPEND is set, the device should stop
operation.
>> Why do you think dirty page tracking is not supported?
> Because you wrote in the cover letter "Future work: dirty page tracking and in-flight descriptors."
> And you have repeatedly resisted administration commands. I don’t see how the above two can be done efficiently without administration commands.
> And if one is going to use administration commands in the future for the above two, there is no point in doing the current series over registers.
> I don’t see how an administration queue can work after the device is suspended, nor how dirty page tracking can continue after FLR.
>
> All these aspects are covered in [1], which can be extended for nesting if needed with a sidecar VF.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
I will post a V2 with in-flight descriptor tracking and dirty-page
tracking. They are not in this series because
I want this series to stay focused and small.




^ permalink raw reply	[flat|nested] 445+ messages in thread


* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:27                           ` Zhu, Lingshan
@ 2023-09-12 10:33                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 10:33 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Zhu, Lingshan
> Sent: Tuesday, September 12, 2023 3:57 PM

> > It is the baseline implementation in leading OSes such as the Linux kernel.
> >
> > A decade-mature stack like vfio recommends p2p support as a baseline,
> without which multiple-device migration can fail, as the hypervisor has no
> knowledge of whether two devices are interacting or not.
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.h
> > tml
> Still, why do you think P2P is a concern of live migration? Or why do you think
> live migration should implement special support for P2P?

It is because p2p needs to work with live migration.
You probably missed the response.
I answered this before at [1].

[1] https://lore.kernel.org/virtio-comment/PH0PR12MB5481961B5EA6DEE900CCF068DCF1A@PH0PR12MB5481.namprd12.prod.outlook.com/


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:33                             ` [virtio-dev] " Parav Pandit
@ 2023-09-12 10:35                               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 10:35 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 6:33 PM, Parav Pandit wrote:
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>> open.org> On Behalf Of Zhu, Lingshan
>> Sent: Tuesday, September 12, 2023 3:57 PM
>>> It is the base line implementation of leading OS such as Linux kernel.
>>>
>>> Decade mature stack like vfio recommends support for p2p as base line
>> without which multiple devices migration can fail as hypervisor has no
>> knowledge if two devices are interacting or not.
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.h
>>> tml
>> still, why do you think P2P is a concern of live migration? Or why do you think
>> Live Migration should implement special support for P2P?
> It is because p2p needs to work with live migration.
> You probably missed the response.
> I answered before at [1].
I mean, why do you think my series cannot work with P2P?
>
> [1] https://lore.kernel.org/virtio-comment/PH0PR12MB5481961B5EA6DEE900CCF068DCF1A@PH0PR12MB5481.namprd12.prod.outlook.com/
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:32                                                       ` Zhu, Lingshan
@ 2023-09-12 10:40                                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 10:40 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 4:02 PM
> 
> On 9/12/2023 6:25 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 3:47 PM
> >>
> >> On 9/12/2023 5:28 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for
> >>>> LM, it should support nested anyway, don't break user logic.
> >>> You ignored the other part of my question when you asked above.
> >>> i.e. a PCI transport do not allow such weird bifurcation.
> >> I failed to process your comment.
> >> Do you mean the registers don't support nested?
> > No. I mean registers access should support device reset flow and FLR flow.
> DO you see this is a concern? Or why do you think there are problems?
Yes. The administration queue won't answer after SUSPEND is set in the device status.
So how do you plan to support the AQ, in-flight tracking, dirty tracking, and the SUSPEND device status together?

Device bifurcation is not supported in the PCI spec, and it is a hack for virtio to do so.

As I asked a few times before, if you have solved this, I am very interested to learn more about it.
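The conflict described above can be sketched as a toy model in Python (all names and bit values here are hypothetical illustrations, not taken from the spec or from either series): once SUSPEND freezes the whole device, an admin queue owned by that same device can no longer complete commands, while device-status accesses stay available.

```python
# Toy model: a SUSPEND-ed device freezes its control path, so commands on
# its own admin queue are never completed. Names/bits are hypothetical.

SUSPEND = 1 << 6  # hypothetical device-status bit


class VirtioDevice:
    def __init__(self):
        self.status = 0
        self.completions = []

    def set_status(self, status):
        # Device-status writes are always served, even when suspended.
        self.status = status

    def submit_admin_cmd(self, cmd):
        # A suspended device does not process its own admin queue, so the
        # command gets no completion.
        if self.status & SUSPEND:
            return None  # command is stuck; AQ won't answer
        self.completions.append(cmd)
        return cmd


dev = VirtioDevice()
assert dev.submit_admin_cmd("dirty_page_report") == "dirty_page_report"
dev.set_status(dev.status | SUSPEND)
assert dev.submit_admin_cmd("reset_vq") is None
```

This is exactly why the question above asks how an AQ-based facility and the SUSPEND status bit are meant to coexist on the same member device.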

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:35                               ` Zhu, Lingshan
@ 2023-09-12 10:41                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 10:41 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 4:05 PM

> I mean, why do you think my series can not work with P2P
Because it misses the intermediate STOP mode that we have in series [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  9:21                                                 ` Parav Pandit
@ 2023-09-12 13:03                                                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 13:03 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 5:21 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 2:33 PM
>> admin vq requires fixed and dedicated resources to serve the VMs; the question
>> still remains: does it scale to serve a large number of device migrations? How many
>> admin vqs do you need to serve 10 VMs, how many for 100, and so on? How does it
>> scale?
>>
> Yes, it scales within the AQ and across multiple AQs.
> Please consult your board designers to know such limits for your device.
If scaling requires multiple AQs, then how many should a vendor provide for the
worst case?

I am tired of repeating the same questions.
>   
>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable time?
>> If not, how many exactly.
>>
> Yes, it can serve both 100 and 1000 VMs in reasonable time.
I am not sure; is the AQ limitless? Can it serve thousands of VMs
in a reasonable time, like 300 ms?

If you say that requires multiple AQs, then how many should a vendor provide?

Don't say the board designer owns the risk.
>
>> And register does not need to scale, it resides on the VF and only serve
>> the VF.
>>
> Since its per VF, by nature it is linearly growing entity that the board design needs to support read and write with guaranteed timing.
> It clearly scaled poor than queue.
Please read my series. For example, we introduce a new SUSPEND bit in
the \field{device status}; are there any scalability issues here?
>   
>> It does not reside on the PF to migrate the VFs.
> Hence it does not scale and cannot do parallel operation within the VF, unless each register is replicated.
Why does it not scale? It is a per-device facility.
Why do you need parallel operations on the LM facility?
That doesn't make a lot of sense.
>
> Using register of a queue for bulk data transfer is solved question when the virtio spec was born.
> I don’t see a point to discuss it.
> Snippet from spec: " As a device can have zero or more virtqueues for bulk data transport"
Where do you see that the series intends to transfer bulk data through registers?
>
>> VFs config space can use the device dedicated resource like the bandwidth.
>>
>> for AQ, still you need to reserve resource and how much?
> It depends on your board, please consult your board designer to know depending on the implementation.
>  From spec point of view, it should not be same as any other virtqueue.
So the vendor owns the risk of implementing AQ-based LM? Why do they have to?
>>> No. I do not agree. It can fail and very hard for board designers.
>>> AQs are more reliable way to transport bulk data in scalable manner for tens
>> of member devices.
>> Really? How often do you observe virtio config space fail?
> On Intel Icelake server we have seen it failing with 128 VFs.
> And device needs to do very weird things to support 1000+ VFs forever expanding config space, which is not the topic of this discussion anyway.
That is your setup problem.
>
>
>>>> Please allow me to provide an extreme example, is one single admin vq
>>>> limitless, that can serve hundreds to thousands of VMs migration?
>>> It is left to the device implementation. Just like RSS and multi queue support?
>>> Is one Q enough for 800Gbps to 10Mbps link?
>>> Answer is: Not the scope of specification, spec provide the framework to scale
>> this way, but not impose on the device.
>> Even if not support RSS or MQ, the device still can work with
>> performance overhead, not fail.
>>
> _work_ is subjective.
> The financial transaction (application) failed. Packeted worked.
> LM commands were successful, but it was not timely.
>
> Same same..
>   
>> Insufficient bandwidth & resource caused live migration fail is totally
>> different.
> Very abstract point and unrelated to administration commands.
It is your design that faces this problem.
>
>>>> If not, two or
>>>> three or what number?
>>> It really does not matter. Its wrong point to discuss here.
>>> Number of queues and command execution depends on the device
>> implementation.
>>> A financial transaction application can timeout when a device queuing delay
>> for virtio net rx queue is long.
>>> And we don’t put details about such things in specification.
>>> Spec takes the requirements and provides driver device interface to
>> implement and scale.
>>> I still don’t follow the motivation behind the question.
>>> Is your question: How many admin queues are needed to migrate N member
>> devices? If so, it is implementation specific.
>>> It is similar to how such things depend on implementation for 30 virtio device
>> types.
>>> And if are implying that because it is implementation specific, that is why
>> administration queue should not be used, but some configuration register
>> should be used.
>>> Than you should propose a config register interface to post virtqueue
>> descriptors that way for 30 device types!
>> if so, leave it as undefined? A potential risk for device implementation?
>
>> Then why must the admin vq?
> Because administration commands and admin vq does not impose devices to implement thousands of registers which must have time bound completion guarantee.
> The large part of industry including SIOV devices led by Intel and others are moving away from register access mode.
>
> To summarize, administration commands and queue offer following benefits.
>
> 1. Ability to do bulk data transfer between driver and device
>
> 2. Ability to parallelize the work within driver and within device within single or multiple virtqueues
>
> 3. Eliminates implementing PCI read/write MMIO registers which demand low latency response interval
>
> 4. Better utilize host cpu as no one needs to poll on the device register for completion
>
> 5. Ability to handle variability in command completion by device and ability to notify the driver
>
> If this does not satisfy you, please refer to some of the past email discussions during administration virtuqueue time.
I think you mixed up the facility and the implementation in my series;
please read it.
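The scaling argument in the summary above can be illustrated with a toy model (the operation counts below are illustrative assumptions about the interaction pattern, not measurements of any real device): per-VF register migration costs the driver one MMIO write plus repeated status polls for every VF, while a single admin queue lets the owner driver batch one descriptor per member device behind one doorbell and wait for completions.

```python
# Toy comparison of the two interfaces debated above. The per-VF poll count
# is a hypothetical assumption; real costs depend on the device.

def migrate_via_registers(num_vfs, polls_per_vf=4):
    # Each VF exposes its own registers: write a control register, then
    # poll a status register until the operation completes.
    mmio_ops = 0
    for _ in range(num_vfs):
        mmio_ops += 1              # write the per-VF control register
        mmio_ops += polls_per_vf   # poll the per-VF status register
    return mmio_ops


def migrate_via_admin_queue(num_vfs):
    # The owner driver queues one descriptor per member device, rings the
    # doorbell once, and is interrupted on completion (no polling).
    descriptors = num_vfs
    doorbells = 1
    return descriptors, doorbells


assert migrate_via_registers(100) == 500
assert migrate_via_admin_queue(100) == (100, 1)
```

Under these assumptions the register interface grows as `num_vfs * (1 + polls_per_vf)` synchronous MMIO operations, which is the latency-bound access pattern the summary argues against; it does not, by itself, settle how many AQs a device should provision.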




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:40                                                         ` Parav Pandit
@ 2023-09-12 13:04                                                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 13:04 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 6:40 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 4:02 PM
>>
>> On 9/12/2023 6:25 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 3:47 PM
>>>>
>>>> On 9/12/2023 5:28 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 2:36 PM If you want AQ used for
>>>>>> LM, it should support nested anyway, don't break user logic.
>>>>> You ignored the other part of my question when you asked above.
>>>>> i.e. a PCI transport do not allow such weird bifurcation.
>>>> I failed to process your comment.
>>>> Do you mean the registers don't support nested?
>>> No. I mean registers access should support device reset flow and FLR flow.
>> DO you see this is a concern? Or why do you think there are problems?
> Yes. administration queue wont answer after SUSPEND is done in device status.
> So how do you plan to support AQ, inflight tracking dirty tracking and suspend device status?
>
> Device bifurcation is not supported in the pci spec and its hack in virtio to do so.
>
> As I asked few times before, if you have solved this, I am very interested to learn more about it.
Please read the series




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:41                                 ` Parav Pandit
@ 2023-09-12 13:09                                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-12 13:09 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 6:41 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 4:05 PM
>> I mean, why do you think my series can not work with P2P
> Because it misses the intermediate mode STOP that we have in series [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.html
Again, when SUSPEND is set:
1) the device freezes, meaning it stops operating on both the data path and
the control path, except for the device status
2) a new feature bit will be introduced in V2 to allow RESET_VQ after
SUSPEND
3) if there is another device doing P2P against this device, both should be
passed through to the same guest and should be suspended as well for LM,
or it is a security problem.
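The three rules above can be sketched as a toy state machine (the feature-bit and status-bit names and positions are hypothetical placeholders standing in for the V2 proposal, not values from the spec):

```python
# Toy state machine for the SUSPEND semantics listed above.
# Bit names/positions are hypothetical, not from the spec.

SUSPEND = 1 << 6                 # hypothetical device-status bit
F_RESET_VQ_IN_SUSPEND = 1 << 40  # hypothetical feature bit promised for V2


class SuspendableDevice:
    def __init__(self, features=0):
        self.features = features
        self.status = 0
        self.vqs_enabled = {0: True, 1: True}

    def suspended(self):
        return bool(self.status & SUSPEND)

    def write_status(self, value):
        # Rule 1: the device status stays accessible while suspended.
        self.status = value

    def process_rx(self, pkt):
        # Rule 1: the data path is frozen while suspended.
        if self.suspended():
            raise RuntimeError("data path frozen")
        return pkt

    def reset_vq(self, idx):
        # Rule 2: RESET_VQ during SUSPEND only with the new feature bit.
        if self.suspended() and not (self.features & F_RESET_VQ_IN_SUSPEND):
            raise RuntimeError("control path frozen")
        self.vqs_enabled[idx] = False


dev = SuspendableDevice(features=F_RESET_VQ_IN_SUSPEND)
dev.write_status(SUSPEND)
dev.reset_vq(0)  # allowed: the feature was negotiated
assert dev.vqs_enabled[0] is False
```

Rule 3 (suspending P2P peers passed through to the same guest) is a hypervisor policy rather than per-device behavior, so it is not modeled here.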




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:09                                   ` Zhu, Lingshan
@ 2023-09-12 13:35                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 13:35 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 6:39 PM
> 
> On 9/12/2023 6:41 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think my
> >> series can not work with P2P
> > Because it misses the intermediate mode STOP that we have in series [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.h
> > tml
> Again, when SUSPEND:
> 1) the device freezes, means stop operation in both data-path and control-path,
> except the device status
Exactly; even the RESET_VQ command cannot be served, because the device is frozen.

> 2) a new feature bit will be introduced in V2, to allow RESET_VQ after SUSPEND
RESET_VQ after suspend is simply wrong, because the device is already suspended and cannot respond to an extra RESET_VQ command.

> 3) if there is a device doing P2P against the device.
> They should be passed through to the same guest and should be suspended as
> well for LM, or it is a security problem.
There is no security problem. Multiple passthrough devices with P2P have been there in PCI, using ACS, for probably a decade now.
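The disagreement above turns on exactly what a suspended device may still serve. A minimal C sketch of the claimed semantics, using a mocked status register; the SUSPEND bit value and all helper names here are assumptions for illustration, not taken from the spec:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative status bits; the SUSPEND bit position is defined by the
 * proposed spec patch, the value here is only an assumption. */
#define STATUS_DRIVER_OK 0x04
#define STATUS_SUSPEND   0x40

/* Mock of the device-status register in PCI common config. */
static uint8_t dev_status = STATUS_DRIVER_OK;

static void write_status(uint8_t v) { dev_status = v; }
static uint8_t read_status(void)   { return dev_status; }

/* Driver-side suspend: set SUSPEND, then re-read device status until the
 * device reports the bit, i.e. until it has actually frozen. */
static bool suspend_device(int retries)
{
    write_status(read_status() | STATUS_SUSPEND);
    while (retries-- > 0)
        if (read_status() & STATUS_SUSPEND)
            return true;
    return false;
}

/* A frozen device serves only device-status accesses; any other request,
 * such as RESET_VQ, is rejected -- the point debated above. */
static bool device_handle_reset_vq(void)
{
    if (read_status() & STATUS_SUSPEND)
        return false; /* not served while suspended */
    return true;
}
```

Under the series as described, `device_handle_reset_vq()` would accept the command after SUSPEND only once the V2 feature bit mentioned above is negotiated.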


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:04                                                           ` Zhu, Lingshan
@ 2023-09-12 13:36                                                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 13:36 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 6:34 PM


> > As I asked few times before, if you have solved this, I am very interested to
> learn more about it.
> Please read the series
As I promised already I will read on 9/13.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:03                                                   ` Zhu, Lingshan
@ 2023-09-12 13:43                                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-12 13:43 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 6:33 PM
> 
> On 9/12/2023 5:21 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 2:33 PM admin vqs require fixed and
> >> dedicated resources to serve the VMs, the question still remains: does
> >> it scale to serve a big number of device migrations? how many admin
> >> vqs do you need to serve 10 VMs, how many for 100? and so on? How to
> >> scale?
> >>
> > Yes, it scales within the AQ and across multiple AQs.
> > Please consult your board designers to know such limits for your device.
> scales require multiple AQs, then how many should a vendor provide for the
> worst case?
> 
> I am bored of the same repeated questions.
I said it scales, within the AQ. (and across AQs).
I have answered enough times, so I will stop on same repeated question.
Your repeated question is not helping anyone as it is not in the scope of virtio.

If you think it is, please get it written first for RSS and MQ in net section and post for review.

> >
> >> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable
> time?
> >> If not, how many exactly.
> >>
> > Yes, it can serve both 100 and 1000 VMs in reasonable time.
> I am not sure, the aq is limitless? Can serve thousands of VMs in a reasonable
> time? Like in 300ms?
> 
Yes.

> If you say, that require multiple AQ, then how many should a vendor provide?
> 
I didn’t say multiple AQs must be used.
It is same as NIC RQs.

> Don't say the board designer own the risks.

> >
> >> And register does not need to scale, it resides on the VF and only
> >> serve the VF.
> >>
> > Since its per VF, by nature it is linearly growing entity that the board design
> needs to support read and write with guaranteed timing.
> > It clearly scales worse than a queue.
> Please read my series. For example, we introduce a new bit SUSPEND in the
> \field{device status}, any scalability issues here?
That must behave like queue_reset: it must get acknowledged by the device that it is suspended.
And that brings the scale issue.
On top of that once the device is SUSPENDED, it cannot accept some other RESET_VQ command.

> >
> >> It does not reside on the PF to migrate the VFs.
> > Hence it does not scale and cannot do parallel operation within the VF, unless
> each register is replicated.
> Why does it not scale? It is a per-device facility.
Because the device needs to answer per device through some large scale memory to fit in a response time.

> Why do you need parallel operation against the LM facility?
Because your downtime was 300msec for 1000 VMs.

> That doesn't make a lot of sense.
> >
> > Using register of a queue for bulk data transfer is solved question when the
> virtio spec was born.
> > I don’t see a point to discuss it.
> > Snippet from spec: " As a device can have zero or more virtqueues for bulk
> data transport"
> Where do you see the series intends to transfer bulk data through registers?
> >
> >> VFs config space can use the device dedicated resource like the bandwidth.
> >>
> >> for AQ, still you need to reserve resource and how much?
> > It depends on your board, please consult your board designer to know
> depending on the implementation.
> >  From spec point of view, it should not be same as any other virtqueue.
> so the vendor own the risk to implement AQ LM? Why they have to?
> >>> No. I do not agree. It can fail and very hard for board designers.
> >>> AQs are more reliable way to transport bulk data in scalable manner
> >>> for tens
> >> of member devices.
> >> Really? How often do you observe virtio config space fail?
> > On Intel Icelake server we have seen it failing with 128 VFs.
> > And device needs to do very weird things to support 1000+ VFs forever
> expanding config space, which is not the topic of this discussion anyway.
> That is your setup problem.
> >
> >
> >>>> Please allow me to provide an extreme example, is one single admin
> >>>> vq limitless, that can serve hundreds to thousands of VMs migration?
> >>> It is left to the device implementation. Just like RSS and multi queue
> support?
> >>> Is one Q enough for 800Gbps to 10Mbps link?
> >>> Answer is: Not the scope of specification, spec provide the
> >>> framework to scale
> >> this way, but not impose on the device.
> >> Even if not support RSS or MQ, the device still can work with
> >> performance overhead, not fail.
> >>
> > _work_ is subjective.
> > The financial transaction (application) failed. Packets worked.
> > LM commands were successful, but it was not timely.
> >
> > Same same..
> >
> >> Insufficient bandwidth & resource caused live migration fail is
> >> totally different.
> > Very abstract point and unrelated to administration commands.
> It is your design facing the problem.
> >
> >>>> If not, two or
> >>>> three or what number?
> >>> It really does not matter. Its wrong point to discuss here.
> >>> Number of queues and command execution depends on the device
> >> implementation.
> >>> A financial transaction application can timeout when a device
> >>> queuing delay
> >> for virtio net rx queue is long.
> >>> And we don’t put details about such things in specification.
> >>> Spec takes the requirements and provides driver device interface to
> >> implement and scale.
> >>> I still don’t follow the motivation behind the question.
> >>> Is your question: How many admin queues are needed to migrate N
> >>> member
> >> devices? If so, it is implementation specific.
> >>> It is similar to how such things depend on implementation for 30
> >>> virtio device
> >> types.
> >>> And if are implying that because it is implementation specific, that
> >>> is why
> >> administration queue should not be used, but some configuration
> >> register should be used.
> >>> Then you should propose a config register interface to post
> >>> virtqueue
> >> descriptors that way for 30 device types!
> >> if so, leave it as undefined? A potential risk for device implementation?
> >
> >> Then why must the admin vq?
> > Because administration commands and admin vq does not impose devices to
> implement thousands of registers which must have time bound completion
> guarantee.
> > The large part of industry including SIOV devices led by Intel and others are
> moving away from register access mode.
> >
> > To summarize, administration commands and queue offer following benefits.
> >
> > 1. Ability to do bulk data transfer between driver and device
> >
> > 2. Ability to parallelize the work within driver and within device
> > within single or multiple virtqueues
> >
> > 3. Eliminates implementing PCI read/write MMIO registers which demand
> > low latency response interval
> >
> > 4. Better utilize host cpu as no one needs to poll on the device
> > register for completion
> >
> > 5. Ability to handle variability in command completion by device and
> > ability to notify the driver
> >
> > If this does not satisfy you, please refer to some of the past email discussions
> during administration virtqueue time.
> I think you mixed up the facility and the implementation in my series, please
> read.
I don’t know what you refer to. You asked "why is the AQ a must?" I answered above what the AQ has to offer over a synchronous register.
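Benefit 2 in the summary above, parallelizing work across outstanding commands, is the crux of the scaling argument. A toy C model of that claim; the struct names and fields are invented for illustration and are not the spec's actual admin command layout. The driver posts one command per member VF on a single admin queue and the device completes them all independently, where a synchronous per-VF register would serialize the driver:

```c
#include <stddef.h>

#define MAX_CMDS 1024

/* Invented, simplified command: one migration operation per member VF. */
struct admin_cmd { int vf_id; int done; };

/* Toy admin queue: a flat ring of posted commands. */
struct admin_queue {
    struct admin_cmd ring[MAX_CMDS];
    size_t head;
};

/* Driver side: posting a command does not block; the driver is free to
 * post the next one immediately. */
static void aq_submit(struct admin_queue *aq, int vf_id)
{
    if (aq->head < MAX_CMDS)
        aq->ring[aq->head++] = (struct admin_cmd){ .vf_id = vf_id, .done = 0 };
}

/* Device side: works through every outstanding command in one pass,
 * in whatever order and parallelism the implementation chooses. */
static void device_process(struct admin_queue *aq)
{
    for (size_t i = 0; i < aq->head; i++)
        aq->ring[i].done = 1;
}

/* Driver harvests completions in a single sweep instead of polling a
 * register per VF. */
static size_t aq_completed(const struct admin_queue *aq)
{
    size_t n = 0;
    for (size_t i = 0; i < aq->head; i++)
        n += aq->ring[i].done;
    return n;
}
```

The design point being argued is only this submit/complete decoupling; how many commands one queue can absorb in a given downtime budget is what the thread disputes.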


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 10:14                                                     ` Zhu, Lingshan
@ 2023-09-13  2:23                                                       ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  2:23 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 12, 2023 3:45 PM

> > Do you find the administration commands we proposed in [1] useful for
> nested case?
> > If not, both will likely diverge.
> Not till now.

I don’t think you reviewed [1] enough.
Following functionality that you want to post in v1 is already covered.
Why cannot you use it from [1]?

a. Dirty page tracking (write recording in [1]), 
b. device suspend/resume (mode setting)
c. inflight descriptors (device context)

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html



^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:43                                                     ` Parav Pandit
@ 2023-09-13  4:01                                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:01 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 9:43 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 6:33 PM
>>
>> On 9/12/2023 5:21 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vqs require fixed and
>>>> dedicated resources to serve the VMs, the question still remains: does
>>>> it scale to serve a big number of device migrations? how many admin
>>>> vqs do you need to serve 10 VMs, how many for 100? and so on? How to
>>>> scale?
>>>>
>>> Yes, it scales within the AQ and across multiple AQs.
>>> Please consult your board designers to know such limits for your device.
>> scales require multiple AQs, then how many should a vendor provide for the
>> worst case?
>>
>> I am bored of the same repeated questions.
> I said it scales, within the AQ. (and across AQs).
> I have answered enough times, so I will stop on same repeated question.
> Your repeated question is not helping anyone as it is not in the scope of virtio.
>
> If you think it is, please get it written first for RSS and MQ in net section and post for review.
You missed the point of the question, and I agree there is no need to discuss
this anymore.
>
>>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable
>> time?
>>>> If not, how many exactly.
>>>>
>>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
>> I am not sure, the aq is limitless? Can serve thousands of VMs in a reasonable
>> time? Like in 300ms?
>>
> Yes.
really? limitless?
>
>> If you say, that require multiple AQ, then how many should a vendor provide?
>>
> I didn’t say multiple AQs must be used.
> It is same as NIC RQs.
don't you agree a single vq has its own performance limitations?
>
>> Don't say the board designer own the risks.
>>>> And register does not need to scale, it resides on the VF and only
>>>> serve the VF.
>>>>
>>> Since its per VF, by nature it is linearly growing entity that the board design
>> needs to support read and write with guaranteed timing.
>>> It clearly scales worse than a queue.
>> Please read my series. For example, we introduce a new bit SUSPEND in the
>> \field{device status}, any scalability issues here?
> That must behave like queue_reset: it must get acknowledged by the device that it is suspended.
> And that brings the scale issue.
In this series, it says:
+When setting SUSPEND, the driver MUST re-read \field{device status} to 
ensure the SUSPEND bit is set.

And this is nothing to do with scale.
> On top of that once the device is SUSPENDED, it cannot accept some other RESET_VQ command.
so as SiWei suggested, there will be a new feature bit introduced in V2 
for vq reset.
>
>>>> It does not reside on the PF to migrate the VFs.
>>> Hence it does not scale and cannot do parallel operation within the VF, unless
>> each register is replicated.
>> Why does it not scale? It is a per-device facility.
> Because the device needs to answer per device through some large scale memory to fit in a response time.
Again, it is a per-device facility; it is register based and serves only
the one device itself.
And we do not plan to log the dirty pages in a BAR.
>
>> Why do you need parallel operation against the LM facility?
> Because your downtime was 300msec for 1000 VMs.
the LM facility in this series is per-device; it only serves the device itself.
>
>> That doesn't make a lot of sense.
>>> Using register of a queue for bulk data transfer is solved question when the
>> virtio spec was born.
>>> I don’t see a point to discuss it.
>>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
>> data transport"
>> Where do you see the series intends to transfer bulk data through registers?
>>>> VFs config space can use the device dedicated resource like the bandwidth.
>>>>
>>>> for AQ, still you need to reserve resource and how much?
>>> It depends on your board, please consult your board designer to know
>> depending on the implementation.
>>>   From spec point of view, it should not be same as any other virtqueue.
>> so the vendor own the risk to implement AQ LM? Why they have to?
>>>>> No. I do not agree. It can fail and very hard for board designers.
>>>>> AQs are more reliable way to transport bulk data in scalable manner
>>>>> for tens
>>>> of member devices.
>>>> Really? How often do you observe virtio config space fail?
>>> On Intel Icelake server we have seen it failing with 128 VFs.
>>> And device needs to do very weird things to support 1000+ VFs forever
>> expanding config space, which is not the topic of this discussion anyway.
>> That is your setup problem.
>>>
>>>>>> Please allow me to provide an extreme example, is one single admin
>>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
>>>>> It is left to the device implementation. Just like RSS and multi queue
>> support?
>>>>> Is one Q enough for 800Gbps to 10Mbps link?
>>>>> Answer is: Not the scope of specification, spec provide the
>>>>> framework to scale
>>>> this way, but not impose on the device.
>>>> Even if not support RSS or MQ, the device still can work with
>>>> performance overhead, not fail.
>>>>
>>> _work_ is subjective.
>>> The financial transaction (application) failed. Packets worked.
>>> LM commands were successful, but it was not timely.
>>>
>>> Same same..
>>>
>>>> Insufficient bandwidth & resource caused live migration fail is
>>>> totally different.
>>> Very abstract point and unrelated to administration commands.
>> It is your design facing the problem.
>>>>>> If not, two or
>>>>>> three or what number?
>>>>> It really does not matter. Its wrong point to discuss here.
>>>>> Number of queues and command execution depends on the device
>>>> implementation.
>>>>> A financial transaction application can timeout when a device
>>>>> queuing delay
>>>> for virtio net rx queue is long.
>>>>> And we don’t put details about such things in specification.
>>>>> Spec takes the requirements and provides driver device interface to
>>>> implement and scale.
>>>>> I still don’t follow the motivation behind the question.
>>>>> Is your question: How many admin queues are needed to migrate N
>>>>> member
>>>> devices? If so, it is implementation specific.
>>>>> It is similar to how such things depend on implementation for 30
>>>>> virtio device
>>>> types.
>>>>> And if are implying that because it is implementation specific, that
>>>>> is why
>>>> administration queue should not be used, but some configuration
>>>> register should be used.
>>>>> Then you should propose a config register interface to post
>>>>> virtqueue
>>>> descriptors that way for 30 device types!
>>>> if so, leave it as undefined? A potential risk for device implementation?
>>>> Then why must the admin vq?
>>> Because administration commands and the admin vq do not force devices to
>> implement thousands of registers which must have a time-bound completion
>> guarantee.
>>> The large part of industry including SIOV devices led by Intel and others are
>> moving away from register access mode.
>>> To summarize, administration commands and queue offer following benefits.
>>>
>>> 1. Ability to do bulk data transfer between driver and device
>>>
>>> 2. Ability to parallelize the work within driver and within device
>>> within single or multiple virtqueues
>>>
>>> 3. Eliminates implementing PCI read/write MMIO registers which demand
>>> low latency response interval
>>>
>>> 4. Better utilize host cpu as no one needs to poll on the device
>>> register for completion
>>>
>>> 5. Ability to handle variability in command completion by device and
>>> ability to notify the driver
>>>
>>> If this does not satisfy you, please refer to some of the past email discussions
>> during administration virtqueue time.
>> I think you mixed up the facility and the implementation in my series, please
>> read.
> I don’t know what you refer to. You asked "why is the AQ a must?" I answered above what the AQ has to offer over a synchronous register.
Again, we are implementing facilities; V2 will include inflight
descriptors and dirty page tracking. That works for LM.

>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-13  4:01                                                       ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:01 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 9:43 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 6:33 PM
>>
>> On 9/12/2023 5:21 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed and
>>>> dedicated resource to serve the VMs, the question still remains, does
>>>> is scale to server big amount of devices migration? how many admin
>>>> vqs do you need to serve 10 VMs, how many for 100? and so on? How to
>>>> scale?
>>>>
>>> Yes, it scales within the AQ and across multiple AQs.
>>> Please consult your board designers to know such limits for your device.
>> scales require multiple AQs, then how many should a vendor provide for the
>> worst case?
>>
>> I am boring for the same repeating questions.
> I said it scales, within the AQ. (and across AQs).
> I have answered enough times, so I will stop on same repeated question.
> Your repeated question is not helping anyone as it is not in the scope of virtio.
>
> If you think it is, please get it written first for RSS and MQ in net section and post for review.
You missed the point of the question and I agree no need to discuss this 
anymore.
>
>>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in reasonable
>> time?
>>>> If not, how many exactly.
>>>>
>>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
>> I am not sure: is the AQ limitless? Can it serve thousands of VMs in a
>> reasonable time, like 300ms?
>>
> Yes.
really? limitless?
>
>> If you say, that require multiple AQ, then how many should a vendor provide?
>>
> I didn’t say multiple AQs must be used.
> It is same as NIC RQs.
Don't you agree that a single vq has its own performance limitations?
>
>> Don't say the board designer own the risks.
>>>> And register does not need to scale, it resides on the VF and only
>>>> serve the VF.
>>>>
>>> Since its per VF, by nature it is linearly growing entity that the board design
>> needs to support read and write with guaranteed timing.
>>> It clearly scaled poor than queue.
>> Please read my series. For example, we introduce a new bit SUSPEND in the
>> \field{device status}, any scalability issues here?
> That must behave like queue_reset (it must get acknowledged by the device that it is suspended).
> And that brings the scale issue.
In this series, it says:
+When setting SUSPEND, the driver MUST re-read \field{device status} to 
ensure the SUSPEND bit is set.

And this is nothing to do with scale.
> On top of that once the device is SUSPENDED, it cannot accept some other RESET_VQ command.
so as SiWei suggested, there will be a new feature bit introduced in V2 
for vq reset.
>
>>>> It does not reside on the PF to migrate the VFs.
>>> Hence it does not scale and cannot do parallel operation within the VF, unless
>> each register is replicated.
>> Why its not scale? It is a per device facility.
> Because the device needs to answer per device through some large-scale memory to fit within a response time.
Again, it is a per-device facility; it is register-based and serves only 
the one device itself.
And we do not plan to log the dirty pages in the BAR.
>
>> Why do you need parallel operation against the LM facility?
> Because your downtime was 300msec for 1000 VMs.
The LM facility in this series is per-device; it only serves the device itself.
>
>> That doesn't make a lot of sense.
>>> Using register of a queue for bulk data transfer is solved question when the
>> virtio spec was born.
>>> I don’t see a point to discuss it.
>>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
>> data transport"
>> Where do you see the series intends to transfer bulk data through registers?
>>>> VFs config space can use the device dedicated resource like the bandwidth.
>>>>
>>>> for AQ, still you need to reserve resource and how much?
>>> It depends on your board, please consult your board designer to know
>> depending on the implementation.
>>>   From spec point of view, it should not be same as any other virtqueue.
>> So the vendor owns the risk of implementing AQ-based LM? Why should they have to?
>>>>> No. I do not agree. It can fail and very hard for board designers.
>>>>> AQs are more reliable way to transport bulk data in scalable manner
>>>>> for tens
>>>> of member devices.
>>>> Really? How often do you observe virtio config space fail?
>>> On Intel Icelake server we have seen it failing with 128 VFs.
>>> And device needs to do very weird things to support 1000+ VFs forever
>> expanding config space, which is not the topic of this discussion anyway.
>> That is your setup problem.
>>>
>>>>>> Please allow me to provide an extreme example, is one single admin
>>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
>>>>> It is left to the device implementation. Just like RSS and multi queue
>> support?
>>>>> Is one Q enough for 800Gbps to 10Mbps link?
>>>>> Answer is: Not the scope of specification, spec provide the
>>>>> framework to scale
>>>> this way, but not impose on the device.
>>>> Even if not support RSS or MQ, the device still can work with
>>>> performance overhead, not fail.
>>>>
>>> _work_ is subjective.
>>> The financial transaction (application) failed. Packeted worked.
>>> LM commands were successful, but it was not timely.
>>>
>>> Same same..
>>>
>>>> Insufficient bandwidth & resource caused live migration fail is
>>>> totally different.
>>> Very abstract point and unrelated to administration commands.
>> It is your design facing the problem.
>>>>>> If not, two or
>>>>>> three or what number?
>>>>> It really does not matter. Its wrong point to discuss here.
>>>>> Number of queues and command execution depends on the device
>>>> implementation.
>>>>> A financial transaction application can timeout when a device
>>>>> queuing delay
>>>> for virtio net rx queue is long.
>>>>> And we don’t put details about such things in specification.
>>>>> Spec takes the requirements and provides driver device interface to
>>>> implement and scale.
>>>>> I still don’t follow the motivation behind the question.
>>>>> Is your question: How many admin queues are needed to migrate N
>>>>> member
>>>> devices? If so, it is implementation specific.
>>>>> It is similar to how such things depend on implementation for 30
>>>>> virtio device
>>>> types.
>>>>> And if are implying that because it is implementation specific, that
>>>>> is why
>>>> administration queue should not be used, but some configuration
>>>> register should be used.
>>>>> Than you should propose a config register interface to post
>>>>> virtqueue
>>>> descriptors that way for 30 device types!
>>>> if so, leave it as undefined? A potential risk for device implementation?
>>>> Then why must the admin vq?
>>> Because administration commands and admin vq does not impose devices to
>> implement thousands of registers which must have time bound completion
>> guarantee.
>>> The large part of industry including SIOV devices led by Intel and others are
>> moving away from register access mode.
>>> To summarize, administration commands and queue offer following benefits.
>>>
>>> 1. Ability to do bulk data transfer between driver and device
>>>
>>> 2. Ability to parallelize the work within driver and within device
>>> within single or multiple virtqueues
>>>
>>> 3. Eliminates implementing PCI read/write MMIO registers which demand
>>> low latency response interval
>>>
>>> 4. Better utilize host cpu as no one needs to poll on the device
>>> register for completion
>>>
>>> 5. Ability to handle variability in command completion by device and
>>> ability to notify the driver
>>>
>>> If this does not satisfy you, please refer to some of the past email discussions
>> from the administration virtqueue work.
>> I think you mixed up the facility and the implementation in my series, please
>> read.
> I don’t know what you refer to. You asked "why AQ is must?" I answered above what AQ has to offer than some synchronous register.
Again, we are implementing basic facilities; V2 will include in-flight 
descriptors and dirty page tracking. That works for LM.

>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  2:23                                                       ` Parav Pandit
@ 2023-09-13  4:03                                                         ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:03 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 10:23 AM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 3:45 PM
>>> Do you find the administration commands we proposed in [1] useful for
>> nested case?
>>> If not, both will likely diverge.
>> Not till now.
> I don’t think you reviewed [1] enough.
> Following functionality that you want to post in v1 is already covered.
> Why cannot you use it from [1]?
>
> a. Dirty page tracking (write recording in [1]),
> b. device suspend/resume (mode setting)
> c. inflight descriptors (device context)
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
You cut off the message, so I don't know which conversation you are 
replying to.

But anyway, as pointed out many times, we are implementing basic facilities.
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:01                                                       ` Zhu, Lingshan
@ 2023-09-13  4:12                                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:12 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:31 AM
> 
> On 9/12/2023 9:43 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 6:33 PM
> >>
> >> On 9/12/2023 5:21 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed
> >>>> and dedicated resource to serve the VMs, the question still
> >>>> remains, does is scale to server big amount of devices migration?
> >>>> how many admin vqs do you need to serve 10 VMs, how many for 100?
> >>>> and so on? How to scale?
> >>>>
> >>> Yes, it scales within the AQ and across multiple AQs.
> >>> Please consult your board designers to know such limits for your device.
> >> scales require multiple AQs, then how many should a vendor provide
> >> for the worst case?
> >>
> >> I am boring for the same repeating questions.
> > I said it scales, within the AQ. (and across AQs).
> > I have answered enough times, so I will stop on same repeated question.
> > Your repeated question is not helping anyone as it is not in the scope of virtio.
> >
> > If you think it is, please get it written first for RSS and MQ in net section and
> post for review.
> You missed the point of the question and I agree no need to discuss this
> anymore.
Ok. thanks.

> >
> >>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in
> >>>> reasonable
> >> time?
> >>>> If not, how many exactly.
> >>>>
> >>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
> >> I am not sure, the aq is limitless? Can serve thousands of VMs in a
> >> reasonable time? Like in 300ms?
> >>
> > Yes.
> really? limitless?
> >
I answered yes to "Can it serve thousands of VMs in reasonable time? Like in 300ms?"
The VQ depth defines the VQ's limit.

> >> If you say, that require multiple AQ, then how many should a vendor
> provide?
> >>
> > I didn’t say multiple AQs must be used.
> > It is same as NIC RQs.
> don't you agree a single vq has its own performance limitations?
For LM I don't see the limitation.
Whatever finite limit an AQ has is no worse than polling a register write with one entry at a time per device.

> In this series, it says:
> +When setting SUSPEND, the driver MUST re-read \field{device status} to
> ensure the SUSPEND bit is set.
> 
> And this is nothing to do with scale.
Hence, it brings the same scale/QoS limitation to the register interface that you claim may be present in the AQ.

And hence I responded earlier that, since most things are not done through the BAR, there is no need to do suspend/resume via the BAR either.
Hence the mode-setting command of [1] is just fine.

> > On top of that once the device is SUSPENDED, it cannot accept some other
> RESET_VQ command.
> so as SiWei suggested, there will be a new feature bit introduced in V2
> for vq reset.
VQ cannot be RESET after the device reset as you wrote.

> >
> >>>> It does not reside on the PF to migrate the VFs.
> >>> Hence it does not scale and cannot do parallel operation within the VF,
> unless
> >> each register is replicated.
> >> Why its not scale? It is a per device facility.
> > Because the device needs to answer per device through some large scale
> memory to fit in a response time.
> Again, it is a per-device facility, and it is register based serve the
> only one device itself.
> And we do not plan to log the dirty pages in bar.
Hence, there is no reason to wire suspend/resume through the BAR either.
The mode-setting admin command is just fine.

> >
> >> Why do you need parallel operation against the LM facility?
> > Because your downtime was 300msec for 1000 VMs.
> the LM facility in this series is per-device, it only serves the device itself.
And that single-threaded operation, with per-VQ reset via a single register, won't scale.

> >
> >> That doesn't make a lot of sense.
> >>> Using register of a queue for bulk data transfer is solved question when the
> >> virtio spec was born.
> >>> I don’t see a point to discuss it.
> >>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
> >> data transport"
> >> Where do you see the series intends to transfer bulk data through registers?
> >>>> VFs config space can use the device dedicated resource like the
> bandwidth.
> >>>>
> >>>> for AQ, still you need to reserve resource and how much?
> >>> It depends on your board, please consult your board designer to know
> >> depending on the implementation.
> >>>   From spec point of view, it should not be same as any other virtqueue.
> >> so the vendor own the risk to implement AQ LM? Why they have to?
> >>>>> No. I do not agree. It can fail and very hard for board designers.
> >>>>> AQs are more reliable way to transport bulk data in scalable manner
> >>>>> for tens
> >>>> of member devices.
> >>>> Really? How often do you observe virtio config space fail?
> >>> On Intel Icelake server we have seen it failing with 128 VFs.
> >>> And device needs to do very weird things to support 1000+ VFs forever
> >> expanding config space, which is not the topic of this discussion anyway.
> >> That is your setup problem.
> >>>
> >>>>>> Please allow me to provide an extreme example, is one single admin
> >>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
> >>>>> It is left to the device implementation. Just like RSS and multi queue
> >> support?
> >>>>> Is one Q enough for 800Gbps to 10Mbps link?
> >>>>> Answer is: Not the scope of specification, spec provide the
> >>>>> framework to scale
> >>>> this way, but not impose on the device.
> >>>> Even if not support RSS or MQ, the device still can work with
> >>>> performance overhead, not fail.
> >>>>
> >>> _work_ is subjective.
> >>> The financial transaction (application) failed. Packeted worked.
> >>> LM commands were successful, but it was not timely.
> >>>
> >>> Same same..
> >>>
> >>>> Insufficient bandwidth & resource caused live migration fail is
> >>>> totally different.
> >>> Very abstract point and unrelated to administration commands.
> >> It is your design facing the problem.
> >>>>>> If not, two or
> >>>>>> three or what number?
> >>>>> It really does not matter. Its wrong point to discuss here.
> >>>>> Number of queues and command execution depends on the device
> >>>> implementation.
> >>>>> A financial transaction application can timeout when a device
> >>>>> queuing delay
> >>>> for virtio net rx queue is long.
> >>>>> And we don’t put details about such things in specification.
> >>>>> Spec takes the requirements and provides driver device interface to
> >>>> implement and scale.
> >>>>> I still don’t follow the motivation behind the question.
> >>>>> Is your question: How many admin queues are needed to migrate N
> >>>>> member
> >>>> devices? If so, it is implementation specific.
> >>>>> It is similar to how such things depend on implementation for 30
> >>>>> virtio device
> >>>> types.
> >>>>> And if are implying that because it is implementation specific, that
> >>>>> is why
> >>>> administration queue should not be used, but some configuration
> >>>> register should be used.
> >>>>> Than you should propose a config register interface to post
> >>>>> virtqueue
> >>>> descriptors that way for 30 device types!
> >>>> if so, leave it as undefined? A potential risk for device implantation?
> >>>> Then why must the admin vq?
> >>> Because administration commands and admin vq does not impose devices
> to
> >> implement thousands of registers which must have time bound completion
> >> guarantee.
> >>> The large part of industry including SIOV devices led by Intel and others are
> >> moving away from register access mode.
> >>> To summarize, administration commands and queue offer following
> benefits.
> >>>
> >>> 1. Ability to do bulk data transfer between driver and device
> >>>
> >>> 2. Ability to parallelize the work within driver and within device
> >>> within single or multiple virtqueues
> >>>
> >>> 3. Eliminates implementing PCI read/write MMIO registers which demand
> >>> low latency response interval
> >>>
> >>> 4. Better utilize host cpu as no one needs to poll on the device
> >>> register for completion
> >>>
> >>> 5. Ability to handle variability in command completion by device and
> >>> ability to notify the driver
> >>>
> >>> If this does not satisfy you, please refer to some of the past email
> discussions
> >> during administration virtuqueue time.
> >> I think you mixed up the facility and the implementation in my series, please
> >> read.
> > I don’t know what you refer to. You asked "why AQ is must?" I answered
> above what AQ has to offer than some synchronous register.
> Again, we are implementing facilities, V2 will include inflgiht
> descriptors and dirty page tracking. That works for LM.

It can be named anything; what matters is how and where it is used.
So "facility" and "implementation" in your comment above are just abstract words.
I already answered your question "why is the AQ a must?".

^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12 13:35                                     ` Parav Pandit
@ 2023-09-13  4:13                                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:13 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/12/2023 9:35 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 12, 2023 6:39 PM
>>
>> On 9/12/2023 6:41 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think my
>>>> series can not work with P2P
>>> Because it misses the intermediate mode STOP that we have in series [1].
>>>
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071.h
>>> tml
>> Again, when SUSPEND:
>> 1) the device freezes, means stop operation in both data-path and control-path,
>> except the device status
> Exactly, including the RESET_VQ command also cannot be served because device is frozen.
see below
>
>> 2) a new feature bit will be introduced in V2, to allow RESET_VQ after SUSPEND
> RESET_VQ after suspend is simply wrong. Because device is already suspended to not respond to some  extra RESET_VQ command.
No. When the device presents SUSPEND, the device config space is 
stabilized at that moment; from the SW perspective,
the device will not make changes to the config space until !SUSPEND.

However, at that moment the driver can still make modifications to the 
config space, and the driver handles the synchronization (checks, 
re-reads, etc.),
so the driver is responsible for what it reads.

As you can see, this is not perfect, so SiWei suggested implementing a new 
feature bit to control this, and it will be implemented in V2.
>
>> 3) if there is a device doing P2P against the device.
>> They should be pass-through-ed to the same guest and should be suspended as
>> well for LM, or it is a security problem.
> There is no security problem. Multiple passthrough devices and P2P is already there in PCI using ACS for probably a decade now.
As you are aware of ACS, that means you have to trust them all; for example, 
P2P devices have to be placed in one IOMMU group, and all devices
in the group should be passed through to the same guest.
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:03                                                         ` Zhu, Lingshan
@ 2023-09-13  4:15                                                           ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:15 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:33 AM
> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
> cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
> open.org; virtio-dev@lists.oasis-open.org
> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
> VIRTIO_F_QUEUE_STATE
> 
> 
> 
> On 9/13/2023 10:23 AM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 3:45 PM
> >>> Do you find the administration commands we proposed in [1] useful
> >>> for
> >> nested case?
> >>> If not, both will likely diverge.
> >> Not till now.
> > I don’t think you reviewed [1] enough.
> > Following functionality that you want to post in v1 is already covered.
> > Why cannot you use it from [1]?
> >
> > a. Dirty page tracking (write recording in [1]), b. device
> > suspend/resume (mode setting) c. inflight descriptors (device context)
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
> > tml
> you cut off the message, I don't know which conversation you are replying to.
> 
> But anyway, as pointed out many times, we are implementing basic facilities.

I asked you which parts of the series [1] you could use for inflight tracking, dirty tracking, and suspend/resume.
You replied that none of it is useful.
And after that you said you plan to send a v2 that does dirty page tracking and inflight tracking.

So I asked why you cannot use [1], which covers the things you plan to send in the future.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

Hope this clarifies.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:13                                       ` Zhu, Lingshan
@ 2023-09-13  4:19                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:19 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:44 AM
> 
> 
> On 9/12/2023 9:35 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 12, 2023 6:39 PM
> >>
> >> On 9/12/2023 6:41 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think
> >>>> my series can not work with P2P
> >>> Because it misses the intermediate mode STOP that we have in series [1].
> >>>
> >>> [1]
> >>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071
> >>> .h
> >>> tml
> >> Again, when SUSPEND:
> >> 1) the device freezes, means stop operation in both data-path and
> >> control-path, except the device status
> > Exactly, including the RESET_VQ command also cannot be served because
> device is frozen.
> see below
> >
> >> 2) a new feature bit will be introduced in V2, to allow RESET_VQ
> >> after SUSPEND
> > RESET_VQ after suspend is simply wrong. Because device is already
> suspended to not respond to some  extra RESET_VQ command.
> No, when the device presents SUSPEND, that means the device config space is
> stabilized at that moment, from the SW perspective the device will not make
> changes to config space until !SUSPEND.
> 
> However at that moment, the driver can still make modification to the config
> space and the driver handles the synchronization(checks, re-read, etc), so the
> driver is responsible for what it reads.
>
It should be named SUSPEND_CFG_SPACE!
All of this frankly seems intrusive enough, as Michael pointed out.
Good luck.
 
> As you can see, this is not perfect, so SiWei suggest to implement a new feature
> bit to control this, and it will be implemented in V2.
> >
> >> 3) if there is a device doing P2P against the device.
> >> They should be pass-through-ed to the same guest and should be
> >> suspended as well for LM, or it is a security problem.
> > There is no security problem. Multiple passthrough devices and P2P is already
> there in PCI using ACS for probably a decade now.
> As you aware of ACS, that means you have to trust them all, for example P2P
> devices has to be placed in one IOMMU group, and all devices in the group
> should be pass-through-ed to a guest
> >
Such things are done by the hypervisor already. There is nothing virtio-specific here.
There is no security problem.
If there is one, please file a CVE for generic P2P with the PCI-SIG, and we will handle it in this Thursday's meeting.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:12                                                         ` Parav Pandit
@ 2023-09-13  4:20                                                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:20 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:12 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:31 AM
>>
>> On 9/12/2023 9:43 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 6:33 PM
>>>>
>>>> On 9/12/2023 5:21 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed
>>>>>> and dedicated resource to serve the VMs, the question still
>>>>>> remains, does is scale to server big amount of devices migration?
>>>>>> how many admin vqs do you need to serve 10 VMs, how many for 100?
>>>>>> and so on? How to scale?
>>>>>>
>>>>> Yes, it scales within the AQ and across multiple AQs.
>>>>> Please consult your board designers to know such limits for your device.
>>>> scales require multiple AQs, then how many should a vendor provide
>>>> for the worst case?
>>>>
>>>> I am boring for the same repeating questions.
>>> I said it scales, within the AQ. (and across AQs).
>>> I have answered enough times, so I will stop on same repeated question.
>>> Your repeated question is not helping anyone as it is not in the scope of virtio.
>>>
>>> If you think it is, please get it written first for RSS and MQ in net section and
>> post for review.
>> You missed the point of the question and I agree no need to discuss this
>> anymore.
> Ok. thanks.
>
>>>>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in
>>>>>> reasonable
>>>> time?
>>>>>> If not, how many exactly.
>>>>>>
>>>>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
>>>> I am not sure, the aq is limitless? Can serve thousands of VMs in a
>>>> reasonable time? Like in 300ms?
>>>>
>>> Yes.
>> really? limitless?
> I answered yes for " Can serve thousands of VMs in reasonable time? Like in 300ms?"?
> VQ depth defines the VQ's limit.
Still sounds like limitless, and I will stop arguing this, because as you can 
see, if a queue could REALLY
be limitless, we would not even need multi-queue or RSS.
>
>>>> If you say, that require multiple AQ, then how many should a vendor
>> provide?
>>> I didn’t say multiple AQs must be used.
>>> It is same as NIC RQs.
>> don't you agree a single vq has its own performance limitations?
> For LM I don’t see the limitation.
> The finite limit an AQ has, such limitation is no different than some register write poll with one entry at a time per device.
See above; we are implementing per-device facilities.
>
>> In this series, it says:
>> +When setting SUSPEND, the driver MUST re-read \field{device status} to
>> ensure the SUSPEND bit is set.
>>
>> And this is nothing to do with scale.
> Hence, it is bringing same scale QOS limitation on register too that you claim may be present in the AQ.
>
> And hence, I responded earlier that when most things are not done through BAR, so there is no need to do suspend/resume via BAR either.
> And hence the mode setting command of [1] is just fine.
The BAR registers are almost "triggers".
>
>>> On top of that once the device is SUSPENDED, it cannot accept some other
>> RESET_VQ command.
>> so as SiWei suggested, there will be a new feature bit introduced in V2
>> for vq reset.
> VQ cannot be RESET after the device reset as you wrote.
It is device SUSPEND, not reset.
>
>>>>>> It does not reside on the PF to migrate the VFs.
>>>>> Hence it does not scale and cannot do parallel operation within the VF,
>> unless
>>>> each register is replicated.
>>>> Why its not scale? It is a per device facility.
>>> Because the device needs to answer per device through some large scale
>> memory to fit in a response time.
>> Again, it is a per-device facility, and it is register based serve the
>> only one device itself.
>> And we do not plan to log the dirty pages in bar.
> Hence, there is no reason to wrap suspend resume on the BAR either.
> The mode setting admin command is just fine.
They are device status bits.
>
>>>> Why do you need parallel operation against the LM facility?
>>> Because your downtime was 300msec for 1000 VMs.
>> the LM facility in this series is per-device, it only severs itself.
> And that single threading and single threading per VQ reset via single register wont scale.
It is a per-device facility, for example on the VF, not on the owner PF.
>
>>>> That doesn't make a lot of sense.
>>>>> Using register of a queue for bulk data transfer is solved question when the
>>>> virtio spec was born.
>>>>> I don’t see a point to discuss it.
>>>>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
>>>> data transport"
>>>> Where do you see the series intends to transfer bulk data through registers?
>>>>>> VFs config space can use the device dedicated resource like the
>> bandwidth.
>>>>>> for AQ, still you need to reserve resource and how much?
>>>>> It depends on your board, please consult your board designer to know
>>>> depending on the implementation.
>>>>>    From spec point of view, it should not be same as any other virtqueue.
>>>> so the vendor own the risk to implement AQ LM? Why they have to?
>>>>>>> No. I do not agree. It can fail and very hard for board designers.
>>>>>>> AQs are more reliable way to transport bulk data in scalable manner
>>>>>>> for tens
>>>>>> of member devices.
>>>>>> Really? How often do you observe virtio config space fail?
>>>>> On Intel Icelake server we have seen it failing with 128 VFs.
>>>>> And device needs to do very weird things to support 1000+ VFs forever
>>>> expanding config space, which is not the topic of this discussion anyway.
>>>> That is your setup problem.
>>>>>>>> Please allow me to provide an extreme example, is one single admin
>>>>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
>>>>>>> It is left to the device implementation. Just like RSS and multi queue
>>>> support?
>>>>>>> Is one Q enough for 800Gbps to 10Mbps link?
>>>>>>> Answer is: Not the scope of specification, spec provide the
>>>>>>> framework to scale
>>>>>> this way, but not impose on the device.
>>>>>> Even if not support RSS or MQ, the device still can work with
>>>>>> performance overhead, not fail.
>>>>>>
>>>>> _work_ is subjective.
>>>>> The financial transaction (application) failed. Packeted worked.
>>>>> LM commands were successful, but it was not timely.
>>>>>
>>>>> Same same..
>>>>>
>>>>>> Insufficient bandwidth & resource caused live migration fail is
>>>>>> totally different.
>>>>> Very abstract point and unrelated to administration commands.
>>>> It is your design facing the problem.
>>>>>>>> If not, two or
>>>>>>>> three or what number?
>>>>>>> It really does not matter. Its wrong point to discuss here.
>>>>>>> Number of queues and command execution depends on the device
>>>>>> implementation.
>>>>>>> A financial transaction application can timeout when a device
>>>>>>> queuing delay
>>>>>> for virtio net rx queue is long.
>>>>>>> And we don’t put details about such things in specification.
>>>>>>> Spec takes the requirements and provides driver device interface to
>>>>>> implement and scale.
>>>>>>> I still don’t follow the motivation behind the question.
>>>>>>> Is your question: How many admin queues are needed to migrate N
>>>>>>> member
>>>>>> devices? If so, it is implementation specific.
>>>>>>> It is similar to how such things depend on implementation for 30
>>>>>>> virtio device
>>>>>> types.
>>>>>>> And if are implying that because it is implementation specific, that
>>>>>>> is why
>>>>>> administration queue should not be used, but some configuration
>>>>>> register should be used.
>>>>>>> Than you should propose a config register interface to post
>>>>>>> virtqueue
>>>>>> descriptors that way for 30 device types!
>>>>>> if so, leave it as undefined? A potential risk for device implantation?
>>>>>> Then why must the admin vq?
>>>>> Because administration commands and admin vq does not impose devices
>> to
>>>> implement thousands of registers which must have time bound completion
>>>> guarantee.
>>>>> The large part of industry including SIOV devices led by Intel and others are
>>>> moving away from register access mode.
>>>>> To summarize, administration commands and queue offer following
>> benefits.
>>>>> 1. Ability to do bulk data transfer between driver and device
>>>>>
>>>>> 2. Ability to parallelize the work within driver and within device
>>>>> within single or multiple virtqueues
>>>>>
>>>>> 3. Eliminates implementing PCI read/write MMIO registers which demand
>>>>> low latency response interval
>>>>>
>>>>> 4. Better utilize host cpu as no one needs to poll on the device
>>>>> register for completion
>>>>>
>>>>> 5. Ability to handle variability in command completion by device and
>>>>> ability to notify the driver
>>>>>
>>>>> If this does not satisfy you, please refer to some of the past email
>> discussions
>>>> during administration virtuqueue time.
>>>> I think you mixed up the facility and the implementation in my series, please
>>>> read.
>>> I don’t know what you refer to. You asked "why AQ is must?" I answered
>> above what AQ has to offer than some synchronous register.
>> Again, we are implementing facilities, V2 will include inflgiht
>> descriptors and dirty page tracking. That works for LM.
> It can be named under anything, what matters is how/where it is used?
> So "facility" and "implementation" in your above comment are just abstract word.
> I answered you "Why AQ is must"?
See above, and please feel free to reuse the basic facilities in your AQ LM 
if you like.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-13  4:20                                                           ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:20 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:12 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:31 AM
>>
>> On 9/12/2023 9:43 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 6:33 PM
>>>>
>>>> On 9/12/2023 5:21 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 2:33 PM admin vq require fixed
>>>>>> and dedicated resource to serve the VMs, the question still
>>>>>> remains, does is scale to server big amount of devices migration?
>>>>>> how many admin vqs do you need to serve 10 VMs, how many for 100?
>>>>>> and so on? How to scale?
>>>>>>
>>>>> Yes, it scales within the AQ and across multiple AQs.
>>>>> Please consult your board designers to know such limits for your device.
>>>> scales require multiple AQs, then how many should a vendor provide
>>>> for the worst case?
>>>>
>>>> I am boring for the same repeating questions.
>>> I said it scales, within the AQ. (and across AQs).
>>> I have answered enough times, so I will stop on same repeated question.
>>> Your repeated question is not helping anyone as it is not in the scope of virtio.
>>>
>>> If you think it is, please get it written first for RSS and MQ in net section and
>> post for review.
>> You missed the point of the question and I agree no need to discuss this
>> anymore.
> Ok. thanks.
>
>>>>>> If one admin vq can serve 100 VMs, can it migrate 1000VMs in
>>>>>> reasonable
>>>> time?
>>>>>> If not, how many exactly.
>>>>>>
>>>>> Yes, it can serve both 100 and 1000 VMs in reasonable time.
>>>> I am not sure, the aq is limitless? Can serve thousands of VMs in a
>>>> reasonable time? Like in 300ms?
>>>>
>>> Yes.
>> really? limitless?
> I answered yes for " Can serve thousands of VMs in reasonable time? Like in 300ms?"?
> VQ depth defines the VQ's limit.
still sounds like limitless and I will stop arguing this as you can see 
if there is REALLY
a queue can be limitless, we even don't need Multi-queue or RSS.
>
>>>> If you say, that require multiple AQ, then how many should a vendor
>> provide?
>>> I didn’t say multiple AQs must be used.
>>> It is same as NIC RQs.
>> don't you agree a single vq has its own performance limitations?
> For LM I don’t see the limitation.
> The finite limit an AQ has, such limitation is no different than some register write poll with one entry at a time per device.
see above, and we are implementing per device facilities.
>
>> In this series, it says:
>> +When setting SUSPEND, the driver MUST re-read \field{device status} to
>> ensure the SUSPEND bit is set.
>>
>> And this is nothing to do with scale.
> Hence, it is bringing same scale QOS limitation on register too that you claim may be present in the AQ.
>
> And hence, I responded earlier that when most things are not done through BAR, so there is no need to do suspend/resume via BAR either.
> And hence the mode setting command of [1] is just fine.
The bar registers are almost "triggers"
>
>>> On top of that once the device is SUSPENDED, it cannot accept some other
>> RESET_VQ command.
>> so as SiWei suggested, there will be a new feature bit introduced in V2
>> for vq reset.
> VQ cannot be RESET after the device reset as you wrote.
It is device SUSPEND, not reset.
>
>>>>>> It does not reside on the PF to migrate the VFs.
>>>>> Hence it does not scale and cannot do parallel operation within the VF,
>> unless
>>>> each register is replicated.
>>>> Why its not scale? It is a per device facility.
>>> Because the device needs to answer per device through some large scale
>> memory to fit in a response time.
>> Again, it is a per-device facility, and it is register based serve the
>> only one device itself.
>> And we do not plan to log the dirty pages in bar.
> Hence, there is no reason to wrap suspend resume on the BAR either.
> The mode setting admin command is just fine.
They are device status bits.
>
>>>> Why do you need parallel operation against the LM facility?
>>> Because your downtime was 300msec for 1000 VMs.
>> the LM facility in this series is per-device, it only severs itself.
> And that single threading and single threading per VQ reset via single register wont scale.
it is per-device facility, for example, on the VF, not the owner PF.
>
>>>> That doesn't make a lot of sense.
>>>>> Using register of a queue for bulk data transfer is solved question when the
>>>> virtio spec was born.
>>>>> I don’t see a point to discuss it.
>>>>> Snippet from spec: " As a device can have zero or more virtqueues for bulk
>>>> data transport"
>>>> Where do you see the series intends to transfer bulk data through registers?
>>>>>> VFs config space can use the device dedicated resource like the
>> bandwidth.
>>>>>> for AQ, still you need to reserve resource and how much?
>>>>> It depends on your board, please consult your board designer to know
>>>> depending on the implementation.
>>>>>    From spec point of view, it should not be same as any other virtqueue.
>>>> so the vendor own the risk to implement AQ LM? Why they have to?
>>>>>>> No. I do not agree. It can fail and very hard for board designers.
>>>>>>> AQs are more reliable way to transport bulk data in scalable manner
>>>>>>> for tens
>>>>>> of member devices.
>>>>>> Really? How often do you observe virtio config space fail?
>>>>> On Intel Icelake server we have seen it failing with 128 VFs.
>>>>> And device needs to do very weird things to support 1000+ VFs forever
>>>> expanding config space, which is not the topic of this discussion anyway.
>>>> That is your setup problem.
>>>>>>>> Please allow me to provide an extreme example, is one single admin
>>>>>>>> vq limitless, that can serve hundreds to thousands of VMs migration?
>>>>>>> It is left to the device implementation. Just like RSS and multi queue
>>>> support?
>>>>>>> Is one Q enough for 800Gbps to 10Mbps link?
>>>>>>> Answer is: Not the scope of specification, spec provide the
>>>>>>> framework to scale
>>>>>> this way, but not impose on the device.
>>>>>> Even if not support RSS or MQ, the device still can work with
>>>>>> performance overhead, not fail.
>>>>>>
>>>>> _work_ is subjective.
>>>>> The financial transaction (application) failed. Packeted worked.
>>>>> LM commands were successful, but it was not timely.
>>>>>
>>>>> Same same..
>>>>>
>>>>>> Insufficient bandwidth & resource caused live migration fail is
>>>>>> totally different.
>>>>> Very abstract point and unrelated to administration commands.
>>>> It is your design facing the problem.
>>>>>>>> If not, two or
>>>>>>>> three or what number?
>>>>>>> It really does not matter. Its wrong point to discuss here.
>>>>>>> The number of queues and command execution depends on the device
>>>>>>> implementation.
>>>>>>> A financial transaction application can time out when a device's
>>>>>>> queuing delay for a virtio net rx queue is long.
>>>>>>> And we don’t put details about such things in the specification.
>>>>>>> The spec takes the requirements and provides a driver-device interface
>>>>>>> to implement and scale.
>>>>>>> I still don’t follow the motivation behind the question.
>>>>>>> Is your question: how many admin queues are needed to migrate N
>>>>>>> member devices? If so, it is implementation specific.
>>>>>>> It is similar to how such things depend on the implementation for the
>>>>>>> 30 virtio device types.
>>>>>>> And if you are implying that, because it is implementation specific,
>>>>>>> the administration queue should not be used, but some configuration
>>>>>>> register should be used instead:
>>>>>>> Then you should propose a config register interface to post virtqueue
>>>>>>> descriptors that way for 30 device types!
>>>>>> If so, leave it as undefined? A potential risk for device implementation?
>>>>>> Then why must it be the admin vq?
>>>>> Because administration commands and the admin vq do not impose on
>>>>> devices implementing thousands of registers which must have a
>>>>> time-bound completion guarantee.
>>>>> A large part of the industry, including SIOV devices led by Intel and
>>>>> others, is moving away from the register access mode.
>>>>> To summarize, administration commands and queues offer the following
>>>>> benefits.
>>>>> 1. Ability to do bulk data transfer between driver and device
>>>>>
>>>>> 2. Ability to parallelize the work within driver and within device
>>>>> within single or multiple virtqueues
>>>>>
>>>>> 3. Eliminates implementing PCI read/write MMIO registers which demand
>>>>> low latency response interval
>>>>>
>>>>> 4. Better utilize host cpu as no one needs to poll on the device
>>>>> register for completion
>>>>>
>>>>> 5. Ability to handle variability in command completion by device and
>>>>> ability to notify the driver
>>>>>
>>>>> If this does not satisfy you, please refer to some of the past email
>>>>> discussions from the administration virtqueue work.
>>>> I think you mixed up the facility and the implementation in my series, please
>>>> read.
>>> I don’t know what you refer to. You asked "why is the AQ a must?" I
>>> answered above what the AQ has to offer over some synchronous register.
>> Again, we are implementing facilities; V2 will include in-flight
>> descriptors and dirty page tracking. That works for LM.
> It can be named anything; what matters is how/where it is used.
> So "facility" and "implementation" in your comment above are just abstract words.
> I answered you on "why is the AQ a must".
See above, and please feel free to reuse the basic facilities in your AQ LM
if you like.
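
The trade-off argued in this thread can be modeled with a small sketch. This
is a hypothetical cost model, not code from either proposal: a per-device
register interface needs one synchronous MMIO write plus a busy-wait poll per
member device, while an admin queue lets the owner driver post a batch of
commands as plain memory writes and ring a single doorbell. All names and
numbers are illustrative assumptions.

```c
#include <assert.h>

struct migrate_cost {
    int doorbells;   /* MMIO writes the driver must issue */
    int sync_polls;  /* register reads the driver busy-waits on */
};

/* Register model: suspend each VF through its own BAR register,
 * one blocking write + confirmation poll per device. */
static struct migrate_cost suspend_via_registers(int n_vfs)
{
    struct migrate_cost c = {0, 0};
    for (int i = 0; i < n_vfs; i++) {
        c.doorbells++;   /* write SUSPEND to VF i's status register */
        c.sync_polls++;  /* re-read until the device confirms */
    }
    return c;
}

/* Admin-queue model: enqueue one SUSPEND command per VF (memory
 * writes), then issue a single available-buffer notification;
 * completions arrive asynchronously, so no per-device busy-wait. */
static struct migrate_cost suspend_via_admin_queue(int n_vfs)
{
    (void)n_vfs;  /* descriptor setup is not an MMIO cost in this model */
    struct migrate_cost c = { .doorbells = 1, .sync_polls = 0 };
    return c;
}
```

Under this model, migrating 128 VFs costs 128 doorbells and 128 polls through
registers but a single doorbell through the admin queue, which is the scaling
argument being made; it deliberately ignores the per-device-facility
counter-argument also made above.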


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:15                                                           ` Parav Pandit
@ 2023-09-13  4:21                                                             ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:21 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:15 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:33 AM
>> To: Parav Pandit <parav@nvidia.com>; Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>; eperezma@redhat.com;
>> cohuck@redhat.com; stefanha@redhat.com; virtio-comment@lists.oasis-
>> open.org; virtio-dev@lists.oasis-open.org
>> Subject: Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement
>> VIRTIO_F_QUEUE_STATE
>>
>>
>>
>> On 9/13/2023 10:23 AM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 3:45 PM
>>>>> Do you find the administration commands we proposed in [1] useful
>>>>> for
>>>> nested case?
>>>>> If not, both will likely diverge.
>>>> Not till now.
>>> I don’t think you reviewed [1] enough.
>>> The following functionality that you want to post in v1 is already covered.
>>> Why can you not use it from [1]?
>>>
>>> a. Dirty page tracking (write recording in [1]), b. device
>>> suspend/resume (mode setting) c. inflight descriptors (device context)
>>>
>>> [1]
>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.h
>>> tml
>> you cut off the message, I don't know which conversation you are replying to.
>>
>> But anyway, as pointed out many times, we are implementing basic facilities.
> I asked you what parts of the series [1] can be used by you for inflight tracking, dirty tracking, suspend/resume.
> You replied, none is useful.
> And after that you said you plan to send v2 that does dirty page tracking, inflight tracking.
>
> So I asked why you cannot use [1] that covers things that you plan to send in future?
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
>
> Hope this clarifies.
we plan to implement a self-contained solution


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:19                                         ` Parav Pandit
@ 2023-09-13  4:22                                           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-13  4:22 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:19 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:44 AM
>>
>>
>> On 9/12/2023 9:35 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Tuesday, September 12, 2023 6:39 PM
>>>>
>>>> On 9/12/2023 6:41 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Tuesday, September 12, 2023 4:05 PM I mean, why do you think
>>>>>> my series can not work with P2P
>>>>> Because it misses the intermediate mode STOP that we have in series [1].
>>>>>
>>>>> [1]
>>>>> https://lists.oasis-open.org/archives/virtio-comment/202309/msg00071
>>>>> .h
>>>>> tml
>>>> Again, when SUSPEND:
>>>> 1) the device freezes, means stop operation in both data-path and
>>>> control-path, except the device status
>>> Exactly, including the RESET_VQ command, which also cannot be served
>>> because the device is frozen.
>> see below
>>>> 2) a new feature bit will be introduced in V2, to allow RESET_VQ
>>>> after SUSPEND
>>> RESET_VQ after suspend is simply wrong, because the device is already
>>> suspended and will not respond to some extra RESET_VQ command.
>> No, when the device presents SUSPEND, that means the device config space is
>> stabilized at that moment, from the SW perspective the device will not make
>> changes to config space until !SUSPEND.
>>
>> However at that moment, the driver can still make modifications to the
>> config space, and the driver handles the synchronization (checks, re-read,
>> etc.), so the driver is responsible for what it reads.
>>
> It should be named SUSPEND_CFG_SPACE!
> All of this frankly seems intrusive enough as Michael pointed out.
> Good luck.
it also SUSPENDs the data-path
>   
>> As you can see, this is not perfect, so SiWei suggested implementing a new
>> feature bit to control this, and it will be implemented in V2.
>>>> 3) if there is a device doing P2P against this device, they should be
>>>> passed through to the same guest and should be suspended as well for LM,
>>>> or it is a security problem.
>>> There is no security problem. Multiple passthrough devices and P2P have
>>> already been there in PCI, using ACS, for probably a decade now.
>> As you are aware of ACS, that means you have to trust them all; for example,
>> P2P devices have to be placed in one IOMMU group, and all devices in the
>> group should be passed through to one guest
> Such things are done by the hypervisor already. There is nothing virtio specific here.
> There is no security problem.
> If there is one, please file CVE for generic P2P in the pci-sig and we will handle them this Thu meeting.
>
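
The SUSPEND handshake debated above can be sketched as a minimal driver-side
loop: write the bit, then re-read device status until the device reports it
("When setting SUSPEND, the driver MUST re-read device status to ensure the
SUSPEND bit is set"). The bit encoding and the mock device below are
assumptions for illustration, not values from the published specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DRIVER_OK 0x04
#define STATUS_SUSPEND   0x40  /* assumed encoding of the proposed bit */

struct mock_dev {
    uint8_t status;
};

/* Device side: on a SUSPEND request this model freezes immediately;
 * per the proposal, the status field is the one thing that stays live. */
static void dev_status_write(struct mock_dev *d, uint8_t v)
{
    d->status = v;  /* latches SUSPEND right away in this model */
}

static uint8_t dev_status_read(const struct mock_dev *d)
{
    return d->status;
}

/* Driver side: set the bit, then poll the status until the device
 * confirms; only then are device and vq states considered stable. */
static bool driver_suspend(struct mock_dev *d)
{
    dev_status_write(d, dev_status_read(d) | STATUS_SUSPEND);
    for (int tries = 0; tries < 100; tries++)
        if (dev_status_read(d) & STATUS_SUSPEND)
            return true;
    return false;
}
```

A real device may take time to quiesce its data path before latching the bit,
which is exactly why the re-read requirement exists; the bounded retry loop
stands in for whatever timeout policy a real driver would use.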




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:20                                                           ` Zhu, Lingshan
@ 2023-09-13  4:36                                                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:36 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:51 AM


> > VQ depth defines the VQ's limit.
> It still sounds limitless, and I will stop arguing this. As you can see, if
> there REALLY were a queue that could be limitless, we would not even need
> multi-queue or RSS.

If you see some value in a limitless queue, please add one.
I have not seen such a construct until now and don’t see the need for it.

> >
> >>>> If you say, that require multiple AQ, then how many should a vendor
> >> provide?
> >>> I didn’t say multiple AQs must be used.
> >>> It is same as NIC RQs.
> >> don't you agree a single vq has its own performance limitations?
> > For LM I don’t see the limitation.
> > The finite limit an AQ has is no different from some register write poll
> > with one entry at a time per device.
> see above, and we are implementing per device facilities.
> >
> >> In this series, it says:
> >> +When setting SUSPEND, the driver MUST re-read \field{device status}
> >> +to
> >> ensure the SUSPEND bit is set.
> >>
> >> And this is nothing to do with scale.
> > Hence, it brings the same scale/QoS limitation on the register too that
> > you claim may be present in the AQ.
> >
> > And hence, I responded earlier that since most things are not done through
> > the BAR, there is no need to do suspend/resume via the BAR either.
> > And hence the mode setting command of [1] is just fine.
> The bar registers are almost "triggers"
> >
> >>> On top of that, once the device is SUSPENDED, it cannot accept some
> >>> other RESET_VQ command.
> >> so as SiWei suggested, there will be a new feature bit introduced in
> >> V2 for vq reset.
> > VQ cannot be RESET after the device reset as you wrote.
> It is device SUSPEND, not reset.
> >
Suspend means suspend, in the plain English sense.
It cannot accept more synchronous commands after that and is not supposed to respond.

> >>>>>> It does not reside on the PF to migrate the VFs.
> >>>>> Hence it does not scale and cannot do parallel operation within the VF,
> >>>>> unless each register is replicated.
> >>>> Why does it not scale? It is a per-device facility.
> >>> Because the device needs to answer per device through some large-scale
> >>> memory to fit in a response time.
> >> Again, it is a per-device facility; it is register based and serves only
> >> the one device itself.
> >> And we do not plan to log the dirty pages in the BAR.
> > Hence, there is no reason to wrap suspend resume on the BAR either.
> > The mode setting admin command is just fine.
> They are device status bits.
And it doesn't have to be.

> >
> >>>> Why do you need parallel operation against the LM facility?
> >>> Because your downtime was 300msec for 1000 VMs.
> >> the LM facility in this series is per-device; it only serves itself.
> > And that single threading, and single-threaded per-VQ reset via a single
> > register, won't scale.
> it is per-device facility, for example, on the VF, not the owner PF.
And what I repeatedly explained to you, and you never answered, is how such a
queue can work after device suspend.
A weird device bifurcation is not supported by PCI and is not to be done in virtio.

> see above and please feel free to reuse the basic facilities if you like in your AQ
> LM
The whole attitude of "we ..." versus "your" LM is simply wrong.
Please work towards a collaborative design in the technical committee.
What you want to repeat was already posted, so take some time to review and utilize it. If not, describe why it is not useful.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:21                                                             ` Zhu, Lingshan
@ 2023-09-13  4:37                                                               ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:37 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:51 AM

> we plan to implement a self-contained solution
Make sure that works with device reset and FLR.
And if not, explain that it is for mediation mode related tricks.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:22                                           ` Zhu, Lingshan
@ 2023-09-13  4:39                                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:39 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 13, 2023 9:52 AM

> > It should be named as SUSPEND_CFG_SPACE.!
> > All of this frankly seems intrusive enough as Michael pointed out.
> > Good luck.
> it also SUSPENDs the data-path
OK, so it works like "suspend" in the English dictionary sense; then after
that, any other VQ-related commands don’t progress.
Because it is suspended.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:11                 ` Parav Pandit
@ 2023-09-13  4:43                   ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-13  4:43 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Tue, Sep 12, 2023 at 2:11 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 12, 2023 9:48 AM
> >
> > On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Monday, September 11, 2023 12:01 PM
> > > >
> > > > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > >
> > > > > Hi Michael,
> > > > >
> > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > > > > Sent: Monday, September 11, 2023 8:31 AM
> > > > > >
> > > > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
> > > > > > <mst@redhat.com>
> > > > wrote:
> > > > > > >
> > > > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > > > This patch adds two new le16 fields to common configuration
> > > > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > > >
> > > > > > >
> > > > > > > I do not see why this would be pci specific at all.
> > > > > >
> > > > > > This is the PCI interface for live migration. The facility is not specific to
> > PCI.
> > > > > >
> > > > > > It can choose to reuse the common configuration or not, but the
> > > > > > semantic is general enough to be used by other transports. We
> > > > > > can introduce one for MMIO for sure.
> > > > > >
> > > > > > >
> > > > > > > But besides I thought work on live migration will use admin queue.
> > > > > > > This was explicitly one of the motivators.
> > > > > >
> > > > > Please find the proposal that uses administration commands for
> > > > > device
> > > > migration at [1] for passthrough devices.
> > > > >
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> > > > > 61.h
> > > > > tml
> > > >
> > > > This proposal couples live migration with several requirements, and
> > > > suffers from the exact issues I've mentioned below.
> > > >
> > > It does not.
> > > Can you please list which one?
> > >
> > > > In some cases, it's even worse (coupling with PCI/SR-IOV, second
> > > > state machine other than the device status).
> > > >
> > > There is no state machine in [1].
> >
> > Isn't the migration modes of "active, stop, freeze" a state machine?
> >
> Huh, no. Each mode stops/starts a specific thing.
> Just because one series is missing this and did only suspend/resume, while the other series covered the P2P mode with modes, does not make it a state machine.
> If you call suspend/resume states, it is still a 2-state state machine. :)

It's not about how many states are in a single state machine; it's about how
many state machines exist for device status. Having more than one creates big
obstacles and complexity in the device. You need to define the interaction of
each state, otherwise you leave undefined behaviours.

Thanks
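
The cost of a second state machine can be made concrete with a line of
arithmetic. This is purely illustrative: if device status is one machine with
S reachable states and a separate migration-mode machine has M states, the
device must define behaviour for every (status, mode) pair, i.e. S * M
combinations, rather than the S plus a few extra states you get by folding the
mode into the existing device status. The S and M values below are assumed
example numbers, not counts taken from either proposal.

```c
#include <assert.h>

/* Number of (status, mode) interactions a device must define when two
 * independent state machines coexist. */
static int interaction_pairs(int status_states, int mode_states)
{
    return status_states * mode_states;
}
```

For example, 5 device-status combinations against 3 migration modes
(active/stop/freeze, as in the admin-command proposal) gives 15 interactions
to specify, versus 8 states if the machines were merged, which is the
undefined-behaviour concern stated above.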




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-13  4:43                   ` Jason Wang
  0 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-13  4:43 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Tue, Sep 12, 2023 at 2:11 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 12, 2023 9:48 AM
> >
> > On Mon, Sep 11, 2023 at 2:47 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Monday, September 11, 2023 12:01 PM
> > > >
> > > > On Mon, Sep 11, 2023 at 12:12 PM Parav Pandit <parav@nvidia.com>
> > wrote:
> > > > >
> > > > > Hi Michael,
> > > > >
> > > > > > From: virtio-comment@lists.oasis-open.org
> > > > > > <virtio-comment@lists.oasis- open.org> On Behalf Of Jason Wang
> > > > > > Sent: Monday, September 11, 2023 8:31 AM
> > > > > >
> > > > > > On Wed, Sep 6, 2023 at 4:33 PM Michael S. Tsirkin
> > > > > > <mst@redhat.com>
> > > > wrote:
> > > > > > >
> > > > > > > On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> > > > > > > > This patch adds two new le16 fields to common configuration
> > > > > > > > structure to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> > > > > > > >
> > > > > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > > >
> > > > > > >
> > > > > > > I do not see why this would be pci specific at all.
> > > > > >
> > > > > > This is the PCI interface for live migration. The facility is not specific to
> > PCI.
> > > > > >
> > > > > > It can choose to reuse the common configuration or not, but the
> > > > > > semantic is general enough to be used by other transports. We
> > > > > > can introduce one for MMIO for sure.
> > > > > >
> > > > > > >
> > > > > > > But besides I thought work on live migration will use admin queue.
> > > > > > > This was explicitly one of the motivators.
> > > > > >
> > > > > Please find the proposal that uses administration commands for
> > > > > device
> > > > migration at [1] for passthrough devices.
> > > > >
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> > > >
> > > > This proposal couples live migration with several requirements, and
> > > > suffers from the exact issues I've mentioned below.
> > > >
> > > It does not.
> > > Can you please list which one?
> > >
> > > > In some cases, it's even worse (coupling with PCI/SR-IOV, second
> > > > state machine other than the device status).
> > > >
> > > There is no state machine in [1].
> >
> > Isn't the migration modes of "active, stop, freeze" a state machine?
> >
> Huh, no. Each mode stops/starts a specific thing.
> Just because one series is missing this and did only suspend/resume, while the other series covered the P2P mode with modes, does not make it a state machine.
> If you call suspend/resume states, it is still a 2-state state machine. :)

It's not about how many states a single state machine has, it's about
how many state machines exist for the device status. Having more than
one creates big obstacles and complexity in the device. You need to
define the interaction of each state, otherwise you leave undefined
behaviours.

Thanks


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  5:50                                     ` Parav Pandit
@ 2023-09-13  4:44                                       ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-13  4:44 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Tue, Sep 12, 2023 at 1:50 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> One can delegate the work to another VF for the purpose of nesting.
>
> One can build infinite levels of nesting to avoid doing passthrough; in the end, user applications remain slow.

We are talking about nested virtualization, not nested emulation. I
won't repeat the definition of virtualization, but no matter how many
levels of nesting there are, the hypervisor will try hard to let the
application run natively most of the time; otherwise it's not nested
virtualization at all.

Nested virtualization has been supported by all major cloud vendors;
please read the relevant documentation for the performance
implications. The virtio community is not the correct place to debate
whether nesting is useful. We need to make sure the datapath can be
assigned to any nesting layer without losing fundamental facilities
like migration.

> So for such N and M being > 1, one can use software-based emulation anyway.

No, only the control path is trapped, the datapath is still passthrough.

>
> >
> > And exposing the whole device to the guest drivers will have security
> > implications, your proposal has demonstrated that you need a workaround for
> There is no security implications in passthrough.

How can you prove this, and is it even possible for you to prove it?
You expose all device details to guests (especially the
transport-specific details); the attack surface is increased this way.

What's more, a simple passthrough may lose the chance to work around
hardware errata, and you will finally fall back to trap and
emulation.

>
> > FLR at least.
> It is actually the opposite.
> FLR is supported with the proposal without any workarounds and mediation.

It's an obvious drawback, not an advantage. And it's not a must for
live migration to work. You need to prove that FLR doesn't conflict
with live migration, and it's not only FLR but also all the other
PCI facilities. One other example is P2P, and what's next? As more
features are added to the PCI spec, you will have endless work
auditing possible conflicts with passthrough-based live migration.

>
> >
> > For non-standard devices we don't have choices other than passthrough, but for
> > standard devices we have other choices.
>
> Passthrough is a basic requirement that we will be fulfilling.

It has several drawbacks that I would not like to repeat. We all know
that even VFIO requires a trap instead of a complete passthrough.

> If one wants to do special nesting, maybe, there.

Nesting is not special. Go and see how it is supported by major cloud
vendors and you will get the answer. Introducing an interface in
virtio that is hard to virtualize is even worse than writing a
compiler that cannot do bootstrap compilation.

Thanks

> If both commands can converge its good, if not, they are orthogonal requirements.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-12  6:05                             ` Parav Pandit
@ 2023-09-13  4:45                               ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-13  4:45 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Tue, Sep 12, 2023 at 2:05 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 12, 2023 9:40 AM
> > >
> > > > Why this series can not support nested?
> > > I don’t see all the aspects that I covered in series [1] ranging from flr, device
> > context migration, virtio level reset, dirty page tracking, p2p support, etc.
> > covered in some device, vq suspend resume piece.
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
> >
> > The series works for stateless devices. Before we introduce device states in the
> > spec, we can't migrate stateful devices. So the device context doesn't make
> > much sense right now.
> The series works for stateful devices too. The device context covers it.

How? Can it be used for migrating any existing stateful device? Don't
we need to define what context means for a specific stateful device
before we can introduce things like a device context? Please go through
the archives for the relevant discussions (e.g. virtio-FS); it's not as
simple as introducing a device context API.

And what's more, how can it handle the migration compatibility?

>
> >
> > Dirty page tracking in virtio is not a must for live migration to work. It can be
> > done via platform facilities or even software. And to make it more efficient, it
> > needs to utilize transport facilities instead of a general one.
> >
> It is also optional in the spec proposal.
> Most platforms claimed are not able to do this efficiently either,

Most platforms are working towards an efficient way. But we are
talking about different things: hardware-based dirty page logging is
not a must, that is what I'm saying. For example, KVM doesn't use
hardware to log dirty pages.

> hence the vfio subsystem added the support for it.

As an open standard, if it is designed for a specific software
subsystem on a specific OS, it's a failure.

>
> > The FLR, P2P demonstrates the fragility of a simple passthrough method and
> > how it conflicts with live migration and complicates the device implementation.
> Huh, it shows the opposite.
> It shows that both will seamlessly work.

Have you even tried your proposal with a prototype device?

>
> > And it means you need to audit all PCI features and do workaround if there're
> > any possible issues (or using a whitelist).
> No need for any of this.

You need to prove this, otherwise it's fragile. It's the duty of the
author to justify it, not the reviewer's.

For example, FLR is required to complete within 100ms. How can you
achieve this during live migration? How does it affect the downtime
and FRS?

>
> > This is tricky and we are migrating virtio not virtio-pci. If we don't use simple
> > passthrough we don't need to care about this.
> >
> Exactly, we are migrating virtio device for the PCI transport.

No, the migration facility is a general requirement for all
transports. Starting from a PCI-specific solution (which actually does
not even cover everything for PCI) may easily end up with issues in
other transports.

Even if you only want to migrate virtio over PCI, please at least read
the QEMU migration code for virtio and PCI; you will soon realize that
a lot of things are missing from your proposal.

> As usual, if you have to keep arguing about not doing passthrough, we are surely past that point.

Who is "we"? Has something like what you said here passed a vote and
been written into the spec? We all know the current virtio spec is not
built upon passthrough.

> Virtio does not need to stay in the weird umbrella to always mediate etc.

It's not mediation, and we're not doing vDPA: the device model we have
in hardware and the one we present to guests are both virtio devices.
It's trap and emulation, which has been fundamental in the world of
virtualization for the past decades. It's the model we use to
virtualize standard devices. If you want to debate this methodology,
the virtio community is clearly the wrong forum.

>
> Series [1] will be enhanced further to support virtio passthrough device for device context and more.
> Even further we like to extend the support.
>
> > Since the functionality proposed in this series focus on the minimal set of the
> > functionality for migration, it is virtio specific and self contained so nothing
> > special is required to work in the nest.
>
> Maybe it is.
>
> Again, I repeat and like to converge the admin commands between passthrough and non-passthrough cases.

You need to prove at least that your proposal can work for the
passthrough before we can try to converge.

> If we can converge it is good.
> If not both modes can expand.
> It is not either or as use cases are different.

Admin commands are not a cure-all; I've stated the drawbacks in other
threads and won't repeat them here.

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:43                   ` Jason Wang
@ 2023-09-13  4:46                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  4:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:14 AM

> It's not about how many states in a single state machine, it's about how many
> state machines that exist for device status. Having more than one creates big
> obstacles and complexity in the device. You need to define the interaction of
> each state otherwise you leave undefined behaviours.
The device mode has zero relation to the device status; it does not mess with it at all.
In fact, the new bits in the device status make things more complex for the device to handle.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:19                                         ` Parav Pandit
@ 2023-09-13  4:56                                           ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-13  4:56 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 12:20 PM Parav Pandit <parav@nvidia.com> wrote:
>
> All of this frankly seems intrusive enough as Michael pointed out.
> Good luck.
>

How do you define "intrusive"?

To me it's much less intrusive than what you've proposed.

1) It gives sufficient flexibility to implement migration via any
transport-specific interface. It means your proposal could be built on
top of this, but not vice versa.
2) It doesn't need to reinvent the wheel to save and load all the
existing PCI capabilities, but your proposal needs to do that in order
to be self-contained, which turns out to be a new transport that
duplicates the work of Ling Shan.
3) It reuses the device status state machine instead of inventing
another one.

Only small extensions to the device implementation are required to
migrate, instead of coupling it with admin commands.

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:44                                       ` Jason Wang
@ 2023-09-13  6:05                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  6:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:14 AM
> To: Parav Pandit <parav@nvidia.com>

> > One can build infinite level of nesting to not do passthrough, at the end user
> applications remains slow.
> 
> We are talking about nested virtualization but nested emulation. I won't repeat
> the definition of virtualization but no matter how much level of nesting, the
> hypervisor will try hard to let the application run natively for most of the time,
> otherwise it's not the nested virtualization at all.
> 
> Nested virtualization has been supported by all major cloud vendors, please
> read the relevant documentation for the performance implications. Virtio
> community is not the correct place to debate whether a nest is useful. We need
> to make sure the datapath could be assigned to any nest layers without losing
> any fundamental facilities like migration.
> 
I am not debating. You or Lingshan claim or imply that mediation is the only way to progress.
Virtio certainly does not need to always live in the dark shadow of mediation.
For the nesting use case, sure, one can do a mediation-related mode.

So mediation-only is not the direction.

> > So for such N and M being > 1, one can use software base emulation anyway.
> 
> No, only the control path is trapped, the datapath is still passthrough.
> 
Again, it depends on the use case.

> >
> > >
> > > And exposing the whole device to the guest drivers will have
> > > security implications, your proposal has demonstrated that you need
> > > a workaround for
> > There is no security implications in passthrough.
> 
> How can you prove this or is it even possible for you to prove this?
Huh, when you claim that it is not secure, please point out exactly what is not secure.
Please take it up with PCI-SIG and file a CVE with PCI-SIG.

> You expose all device details to guests (especially the transport specific details),
> the attack surface is increased in this way.
One can say it is the opposite:
the attack surface is increased in the hypervisor due to mediation poking at everything controlled by the guest.


> 
> What's more, a simple passthrough may lose the chance to workaround
> hardware erratas and you will finally get back to the trap and emulation.
Hardware errata are not the starting point for building the software stack and spec.
What you imply is that one must never use the VFIO stack, must not use vCPU acceleration, and must emulate everything.

The same hardware-errata argument applies to the data path too:
one should not implement it in hardware...

I disagree with such an argument.

You can say nesting is a requirement for some use cases, so the spec should support it without blocking the passthrough mode.
Then it is a fair discussion.

I will not debate further on passthrough vs. control-path mediation as an either/or approach.

> 
> >
> > > FLR at least.
> > It is actually the opposite.
> > FLR is supported with the proposal without any workarounds and mediation.
> 
> It's an obvious drawback but not an advantage. And it's not a must for live
> migration to work. You need to prove the FLR doesn't conflict with the live
> migration, and it's not only FLR but also all the other PCI facilities. 
I don’t know what you mean by prove. It is already clear from the proposal FLR is not messing with rest of the device migration infrastructure.
You should read [1].

> one other
> example is P2P and what's the next? As more features were added to the PCI
> spec, you will have endless work in auditing the possible conflict with the
> passthrough based live migration.
> 
This drawback equally applies to mediation route where one need to do more than audit where the mediation layer to be extended.
So each method has its pros and cons. One suits one use case, other suits other use case.
Therefore, again attempting to claim that only mediation approach is the only way to progress is incorrect.

In fact audit is still better than mediation because most audits are read only work as opposed to endlessly extending trapping and adding support in core stack.
Again, it is a choice that user make with the tradeoff.

> >
> > >
> > > For non standard device we don't have choices other than
> > > passthrough, but for standard devices we have other choices.
> >
> > Passthrough is basic requirement that we will be fulfilling.
> 
> It has several drawbacks that I would not like to repeat. We all know even for
> VFIO, it requires a trap instead of a complete passthrough.
> 
Sure. Both has pros and cons.
And both can co-exist.

> > If one wants to do special nesting, may be, there.
> 
> Nesting is not special. Go and see how it is supported by major cloud vendors
> and you will get the answer. Introducing an interface in virtio that is hard to be
> virtualized is even worse than writing a compiler that can not do bootstrap
> compilation.
We checked with more than two major cloud vendors and passthrough suffice their use cases and they are not doing nesting.
And other virtio vendor would also like to support native devices. So again, please do not portray that nesting is the only thing and passthrough must not be done.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-13  6:05                                         ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  6:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:14 AM
> To: Parav Pandit <parav@nvidia.com>

> > One can build infinite level of nesting to not do passthrough, at the end user
> applications remains slow.
> 
> We are talking about nested virtualization but nested emulation. I won't repeat
> the definition of virtualization but no matter how much level of nesting, the
> hypervisor will try hard to let the application run natively for most of the time,
> otherwise it's not the nested virtualization at all.
> 
> Nested virtualization has been supported by all major cloud vendors, please
> read the relevant documentation for the performance implications. Virtio
> community is not the correct place to debate whether a nest is useful. We need
> to make sure the datapath could be assigned to any nest layers without losing
> any fundamental facilities like migration.
> 
I am not debating. You or Lingshan claim or imply that mediation is the only way to progress.
Virtio surely does not need to live forever in the shadow of mediation.
For the nesting use case, sure, one can use a mediation-based mode.

So mediation alone is not the direction.

> > So for such N and M being > 1, one can use software base emulation anyway.
> 
> No, only the control path is trapped, the datapath is still passthrough.
> 
Again, it depends on the use case.

> >
> > >
> > > And exposing the whole device to the guest drivers will have
> > > security implications, your proposal has demonstrated that you need
> > > a workaround for
> > There is no security implications in passthrough.
> 
> How can you prove this or is it even possible for you to prove this?
Huh, when you claim that it is not secure, please point out exactly what is not secure.
Please take it up with the PCI-SIG and file a CVE with them.

> You expose all device details to guests (especially the transport specific details),
> the attack surface is increased in this way.
One can say it is the opposite:
the attack surface is increased in the hypervisor, due to mediation poking at everything controlled by the guest.


> 
> What's more, a simple passthrough may lose the chance to workaround
> hardware erratas and you will finally get back to the trap and emulation.
Hardware erratas are not the starting point for building the software stack and the spec.
What you imply is that one must never use the vfio stack, one must not use vcpu acceleration, and everything must be emulated.

The same hardware-errata argument applies to the data path too:
one should not implement it in hw...

I disagree with such argument.

You can say nesting is a requirement for some use cases, so the spec should support it without blocking the passthrough mode.
That would be a fair discussion.

I will not debate further on passthrough vs. control-path mediation as an either/or choice.

> 
> >
> > > FLR at least.
> > It is actually the opposite.
> > FLR is supported with the proposal without any workarounds and mediation.
> 
> It's an obvious drawback but not an advantage. And it's not a must for live
> migration to work. You need to prove the FLR doesn't conflict with the live
> migration, and it's not only FLR but also all the other PCI facilities. 
I don't know what you mean by prove. It is already clear from the proposal that FLR does not interfere with the rest of the device migration infrastructure.
You should read [1].

> one other
> example is P2P and what's the next? As more features were added to the PCI
> spec, you will have endless work in auditing the possible conflict with the
> passthrough based live migration.
> 
This drawback applies equally to the mediation route, where one needs to do more than audit wherever the mediation layer is extended.
Each method has its pros and cons; one suits one use case, the other suits another.
Therefore, again, claiming that mediation is the only way to progress is incorrect.

In fact, auditing is still better than mediation, because most audits are read-only work, as opposed to endlessly extending trapping and adding support in the core stack.
Again, it is a choice the user makes, with tradeoffs.

> >
> > >
> > > For non standard device we don't have choices other than
> > > passthrough, but for standard devices we have other choices.
> >
> > Passthrough is basic requirement that we will be fulfilling.
> 
> It has several drawbacks that I would not like to repeat. We all know even for
> VFIO, it requires a trap instead of a complete passthrough.
> 
Sure. Both have pros and cons.
And both can co-exist.

> > If one wants to do special nesting, may be, there.
> 
> Nesting is not special. Go and see how it is supported by major cloud vendors
> and you will get the answer. Introducing an interface in virtio that is hard to be
> virtualized is even worse than writing a compiler that can not do bootstrap
> compilation.
We checked with more than two major cloud vendors; passthrough suffices for their use cases and they are not doing nesting.
Other virtio vendors would also like to support native devices. So again, please do not portray nesting as the only use case, with passthrough something that must not be done.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:45                               ` Jason Wang
@ 2023-09-13  6:39                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-13  6:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Wednesday, September 13, 2023 10:15 AM

[..]
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> > > > 61.h
> > > > tml
> > >
> > > The series works for stateless devices. Before we introduce device
> > > states in the spec, we can't migrate stateful devices. So the device
> > > context doesn't make much sense right now.
> > The series works for stateful devices too. The device context covers it.
> 
> How? Can it be used for migrating any existing stateful devices? Don't we need
> to define what context means for a specific stateful device before you can
> introduce things like device context? Please go through the archives for the
> relevant discussions (e.g virtio-FS), it's not as simple as introducing a device
> context API.
> 
A device will have its own context, for example the RSS definition or, tomorrow, flow filters.
The device context will be extended after the first series.
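To make "device context" concrete, one could imagine it as an extensible set of typed fields. The TLV-style encoding and field ids below are purely illustrative assumptions, not taken from the proposal:

```python
# Hypothetical TLV-style device context. Field ids and payloads are
# illustrative only; the actual encoding is defined by the spec series.
import struct

FIELD_NAMES = {1: "rss_config", 2: "flow_filters"}  # made-up ids

def pack_context(fields):
    # Each field: little-endian u16 id, u16 length, then the payload bytes.
    out = b""
    for fid, payload in fields:
        out += struct.pack("<HH", fid, len(payload)) + payload
    return out

def unpack_context(blob):
    # Walk the buffer; unknown field ids are still parsed and preserved.
    fields, off = [], 0
    while off < len(blob):
        fid, ln = struct.unpack_from("<HH", blob, off)
        off += 4
        fields.append((fid, blob[off:off + ln]))
        off += ln
    return fields

ctx = [(1, b"\x04\x00indir"), (2, b"rule0")]
assert unpack_context(pack_context(ctx)) == ctx
```

A length-prefixed layout like this lets a consumer skip field ids it does not know, which is one plausible way the context could grow later (e.g. adding flow filters) without breaking compatibility; whether the series uses such an encoding is not specified here.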

> And what's more, how can it handle the migration compatibility?
It will be taken care of in a follow-up; we all know this needs to be checked.
I will include notes on the future follow-up work items in v1, to be taken care of after this series.

> > > Dirty page tracking in virtio is not a must for live migration to
> > > work. It can be done via platform facilities or even software. And
> > > to make it more efficient, it needs to utilize transport facilities instead of a
> general one.
> > >
> > It is also optional in the spec proposal.
> > Most platforms claimed are not able to do efficiently either,
> 
> Most platforms are working towards an efficient way. But we are talking about
> different things, hardware based dirty page logging is not a must, that is what
> I'm saying. For example, KVM doesn't use hardware to log dirty pages.
> 
I said the same: hw-based dirty page logging is not a must. :)
One day the hw mmu will be able to track everything efficiently; I have not seen it happen yet.

> > hence the vfio subsystem added the support for it.
> 
> As an open standard, if it is designed for a specific software subsystem on a
> specific OS, it's a failure.
> 
It is not.
One needs to accept that in certain areas virtio is following trails of advancement already blazed in the sw stack,
so that the virtio spec advancement fits such use cases.
And blocking such advancement of the virtio spec to promote an only-mediation approach is not good either.

BTW: one can equally say the mediation approach is designed for a specific software subsystem and is hence a failure.
I will stay away from making that claim, as I don't see it that way.

> >
> > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > method and how it conflicts with live migration and complicates the device
> implementation.
> > Huh, it shows the opposite.
> > It shows that both will seamlessly work.
> 
> Have you even tried your proposal with a prototype device?
Of course; it was delivered to users 1.5 years ago, before being brought to the spec, with virtio-net and virtio-blk devices.

> 
> >
> > > And it means you need to audit all PCI features and do workaround if
> > > there're any possible issues (or using a whitelist).
> > No need for any of this.
> 
> You need to prove this otherwise it's fragile. It's the duty of the author to justify
> not the reviewer.
> 
One cannot post, nor review, a giant series in one go.
Hence the work is split on logical boundaries.
Feature provisioning, PCI layout, etc. are secondary tasks to take care of.

> For example FLR is required to be done in 100ms. How could you achieve this
> during the live migration? How does it affect the downtime and FRS?
> 
A good technical question to discuss, instead of passthrough vs. mediation. :)

Device administration work is separate from the device's operational part.
The device context records the current device state; when an FLR occurs, the device stops all operations.
On the next read of the device context, the post-FLR context is returned.
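As a toy illustration of that ordering (all names are hypothetical; nothing here is spec-defined), the claimed FLR/device-context interaction could be modeled as:

```python
# Toy model of the claimed FLR/device-context ordering.
# ToyDevice and its fields are illustrative, not part of the virtio spec.

class ToyDevice:
    def __init__(self):
        self.running = True
        # Hypothetical context fields; real contents are device-specific.
        self.context = {"last_avail_idx": 7, "features": 0x39}

    def flr(self):
        # FLR stops all operations and resets the operational state.
        self.running = False
        self.context = {"last_avail_idx": 0, "features": 0x39}

    def read_device_context(self):
        # Admin side: a read issued after FLR observes the reset state.
        return dict(self.context)

dev = ToyDevice()
before = dev.read_device_context()
assert before["last_avail_idx"] == 7
dev.flr()
after = dev.read_device_context()
assert after["last_avail_idx"] == 0  # the FLRed context is returned
```

The point of the sketch is only the ordering guarantee being argued for: a context read issued after FLR sees the post-FLR state, so migration never mixes pre- and post-FLR state.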

> >
> > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > don't use simple passthrough we don't need to care about this.
> > >
> > Exactly, we are migrating virtio device for the PCI transport.
> 
> No, the migration facility is a general requirement for all transport.
It is for all transports. One can extend it when doing the same for MMIO.

> Starting from a PCI specific (actually your proposal does not even cover all even
> for PCI) solution which may easily end up with issues in other transports.
> 
Like?

> Even if you want to migrate virtio for PCI,  please at least read Qemu migration
> codes for virtio and PCI, then you will soon realize that a lot of things are
> missing in your proposal.
> 
The device context is something that will be extended.
VIRTIO_PCI_CAP_PCI_CFG will also be added as an optional item for the PCI transport.

> > As usual, if you have to keep arguing about not doing passhthrough, we are
> surely past that point.
> 
> Who is "we"? 
> 
We = you and me.
Since 2021 you have kept objecting that passthrough must not be done.
And blocking work done by other technical committee members to improve the virtio spec to make that happen is simply wrong.

> Is something like what you said here passed the vote and written
> to the spec? 
Not only me.
The virtio technical committee has agreed to support _both_ nested and hardware-based implementations.

"Hardware-based implementations" are part of the virtio specification charter, per the ballot of [1].

[1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html

And passthrough hardware-based devices are in the charter that we strive to support.

> We all know the current virtio spec is not built upon passthrough.

This effort improves the passthrough hw-based implementation and should not be blocked.

> > Virtio does not need to stay in the weird umbrella to always mediate etc.
> 
> It's not the mediation, we're not doing vDPA, the device model we had in
> hardware and we present to guests are all virtio devices. It's the trap and
> emulation which is fundamental in the world of virtualization for the past
> decades. It's the model we used to virtualize standard devices. If you want to
> debate this methodology, virtio community is clearly the wrong forum.
> 
I am not debating it at all. You keep bringing up the point of mediation.

The proposal of [1] is clear: it wants hardware-based passthrough devices with the least amount of virtio-level mediation.

Where some mode of virtualization has been used, that's fine; it can continue with full virtualization and mediation,
and also with hardware-based passthrough devices.

> >
> > Series [1] will be enhanced further to support virtio passthrough device for
> device context and more.
> > Even further we like to extend the support.
> >
> > > Since the functionality proposed in this series focus on the minimal
> > > set of the functionality for migration, it is virtio specific and
> > > self contained so nothing special is required to work in the nest.
> >
> > Maybe it is.
> >
> > Again, I repeat and like to converge the admin commands between
> passthrough and non-passthrough cases.
> 
> You need to prove at least that your proposal can work for the passthrough
> before we can try to converge.
> 
What do you mean by "prove"? Virtio specification development is not a proof-based process.

If you want to participate, please review the patches and help community to improve.

> > If we can converge it is good.
> > If not both modes can expand.
> > It is not either or as use cases are different.
> 
> Admin commands are not the cure for all, I've stated drawbacks in other
> threads. Not repeating it again here.
Heh, sure, I am not attempting to cure everything.
One solution does not fit all cases.
Admin commands are used to solve the specific problem that the AQ is designed for.

One can equally argue: take the PCI fabric to a 10 km distance, don't bring in a new virtio TCP transport...

Drawing boundaries around the virtio spec in certain ways only makes it weaker. So please do not block the advancements brought in by [1].
We would really like to make it more robust with your rich experience and inputs, if you care to participate.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:45                               ` Jason Wang
@ 2023-09-13  8:27                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-13  8:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Parav Pandit, Zhu, Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 12:45:21PM +0800, Jason Wang wrote:
> For example, KVM doesn't use
> hardware to log dirty pages.

It uses a mix of PML, PTE bits, and EPT write protection.
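The write-protection part of that mix can be sketched in miniature (a software model, not KVM's actual code): pages start write-protected, the first write faults and is logged, and harvesting the log re-arms protection.

```python
# Minimal software dirty-page logger via write protection. A sketch only;
# KVM combines this technique with PML and accessed/dirty PTE bits.

class DirtyLog:
    def __init__(self, num_pages):
        self.write_protected = [True] * num_pages
        self.dirty = set()

    def on_write(self, page):
        # Emulates the write-fault path: log the page once, then lift
        # protection so subsequent writes don't fault again.
        if self.write_protected[page]:
            self.dirty.add(page)
            self.write_protected[page] = False

    def harvest(self):
        # Migration round: collect dirty pages and re-arm protection.
        pages, self.dirty = self.dirty, set()
        for p in pages:
            self.write_protected[p] = True
        return sorted(pages)

log = DirtyLog(8)
log.on_write(3)
log.on_write(3)   # second write to the same page doesn't fault
log.on_write(5)
assert log.harvest() == [3, 5]
assert log.harvest() == []    # protection re-armed, nothing dirty yet
```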

-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  6:39                                 ` Parav Pandit
@ 2023-09-14  3:08                                   ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:08 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:15 AM
>
> [..]
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> > > > > 61.h
> > > > > tml
> > > >
> > > > The series works for stateless devices. Before we introduce device
> > > > states in the spec, we can't migrate stateful devices. So the device
> > > > context doesn't make much sense right now.
> > > The series works for stateful devices too. The device context covers it.
> >
> > How? Can it be used for migrating any existing stateful devices? Don't we need
> > to define what context means for a specific stateful device before you can
> > introduce things like device context? Please go through the archives for the
> > relevant discussions (e.g virtio-FS), it's not as simple as introducing a device
> > context API.
> >
> A device will have its own context for example RSS definition, or flow filters tomorrow.

If you know there are things that are missing when posting the
patches, please use the RFC tag.

> The device context will be extended post the first series.
>
> > And what's more, how can it handle the migration compatibility?
> It will be taken care in follow on as we all know that this to be checked.

You don't even mention it anywhere in your series.

> I will include the notes of future follow up work items in v1, which will be taken care post this series.
>
> > > > Dirty page tracking in virtio is not a must for live migration to
> > > > work. It can be done via platform facilities or even software. And
> > > > to make it more efficient, it needs to utilize transport facilities instead of a
> > general one.
> > > >
> > > It is also optional in the spec proposal.
> > > Most platforms claimed are not able to do efficiently either,
> >
> > Most platforms are working towards an efficient way. But we are talking about
> > different things, hardware based dirty page logging is not a must, that is what
> > I'm saying. For example, KVM doesn't use hardware to log dirty pages.
> >
> I also said same, that hw based dirty page logging is not must. :)
> One day hw mmu will be able to track everything efficiently. I have not seen it happening yet.

How do you define efficiency? KVM uses page faults, and most modern
IOMMUs support PRI now.

>
> > > hence the vfio subsystem added the support for it.
> >
> > As an open standard, if it is designed for a specific software subsystem on a
> > specific OS, it's a failure.
> >
> It is not.
> One need accept that, in certain areas virtio is following the trails of advancement already done in sw stack.
> So that virtio spec advancement fits in to supply such use cases.
> And blocking such advancement of virtio spec to promote only_mediation approach is not good either.
>
> BTW: One can say the mediation approach is also designed for specific software subsystem and hence failure.
> I will stay away from quoting it, as I don’t see it this way.

The proposal is based on well-known technology dating from the birth of
virtualization. I have never known a mainstream hypervisor that doesn't do
trap and emulate; have you?

>
> > >
> > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > method and how it conflicts with live migration and complicates the device
> > implementation.
> > > Huh, it shows the opposite.
> > > It shows that both will seamlessly work.
> >
> > Have you even tried your proposal with a prototype device?
> Of course, it is delivered to user for 1.5 years ago before bringing it to the spec with virtio-net and virtio-blk devices.

I would hope this is a serious answer, but it looks like it is not. Your
proposal misses a lot of state, as I pointed out in another thread;
how can it work in practice?

>
> >
> > >
> > > > And it means you need to audit all PCI features and do workaround if
> > > > there're any possible issues (or using a whitelist).
> > > No need for any of this.
> >
> > You need to prove this otherwise it's fragile. It's the duty of the author to justify
> > not the reviewer.
> >
> One cannot post patches and nor review giant series in one go.
> Hence the work to be split on a logical boundary.
> Features provisioning, pci layout etc is secondary tasks to take care of.

Again, if you know something is missing, you need to explain it in the
series instead of waiting for some reviewers to point it out and say
it's well-known afterwards.

>
> > For example FLR is required to be done in 100ms. How could you achieve this
> > during the live migration? How does it affect the downtime and FRS?
> >
> Good technical question to discuss instead of passthrough vs mediation. :)
>
> Device administration work is separate from the device operational part.
> The device context records what is the current device context, when the FLR occurs, the device stops all the operations.
> And on next read of the device context the FLRed context is returned.

Firstly, you didn't explain how it affects the live migration, for
example, what happens if we try to migrate while FLR is ongoing.
Secondly, you ignore the other two questions.

Let's save the time of both.

>
> > >
> > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > don't use simple passthrough we don't need to care about this.
> > > >
> > > Exactly, we are migrating virtio device for the PCI transport.
> >
> > No, the migration facility is a general requirement for all transport.
> It is for all transport. One can extend when do for MMIO.

By using admin commands? It can not perform well for registered.

>
> > Starting from a PCI specific (actually your proposal does not even cover all even
> > for PCI) solution which may easily end up with issues in other transports.
> >
> Like?

The admin command/virtqueue itself may not work well for other
transport. That's the drawback of your proposal while this proposal
doesn't do any coupling.

>
> > Even if you want to migrate virtio for PCI,  please at least read Qemu migration
> > codes for virtio and PCI, then you will soon realize that a lot of things are
> > missing in your proposal.
> >
> Device context is something that will be extended.
> VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI transport.

This is just one mini stuff, how about PCI config space and others?

Again, please read Qemu codes, a lot of things are missing in your
proposal now. If everything is fine to do passthrough based live
migration, I'm pretty sure you need more than what Qemu has since it
can only do a small fraction of the whole PCI.

>
> > > As usual, if you have to keep arguing about not doing passhthrough, we are
> > surely past that point.
> >
> > Who is "we"?
> >
> We = You and me.
> From 2021, you keep objecting that passthrough must not be done.

This is a big misunderstanding, you need to justify it or at least
address the concerns from any reviewer.

> And blocking the work done by other technical committee members to improve the virtio spec to make that happen is simply wrong.

It's unrealistic to think that one will be 100% correct. Justify your
proposal or why I was wrong instead of ignoring my questions and
complaining. That is why we need a community. If it doesn't work,
virtio provides another process for convergence.

>
> > Is something like what you said here passed the vote and written
> > to the spec?
> Not only me.
> The virtio technical committee has agreed for nested and hardware-based implementation _both_.
>
> " hardware-based implementations" is part of the virtio specification charter with ballot of [1].
>
> [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html

Let's don't do conceptual shifts, I was asking the passthrough but you
give me the hardware implementation.

>
> And passthrough hardware-based device is in the charter that we strive to support.
>
> > We all know the current virtio spec is not built upon passthrough.
>
> This efforts improve the passthrough hw based implementation that should not be blocked.

Your proposal was posted only for several days and you think I would
block that just because I asked several questions and some of them are
not answered?

>
> > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> >
> > It's not the mediation, we're not doing vDPA, the device model we had in
> > hardware and we present to guests are all virtio devices. It's the trap and
> > emulation which is fundamental in the world of virtualization for the past
> > decades. It's the model we used to virtualize standard devices. If you want to
> > debate this methodology, virtio community is clearly the wrong forum.
> >
> I am not debating it at all. You keep bringing up the point of mediation.
>
> The proposal of [1] is clear that wants to do hardware based passthrough devices with least amount of virtio level mediation.
>
> So somewhere mode of virtualizing has been used, that’s fine, it can continue with full virtualization, mediation,
>
> And also hardware based passthrough device.
>
> > >
> > > Series [1] will be enhanced further to support virtio passthrough device for
> > device context and more.
> > > Even further we like to extend the support.
> > >
> > > > Since the functionality proposed in this series focus on the minimal
> > > > set of the functionality for migration, it is virtio specific and
> > > > self contained so nothing special is required to work in the nest.
> > >
> > > Maybe it is.
> > >
> > > Again, I repeat and like to converge the admin commands between
> > passthrough and non-passthrough cases.
> >
> > You need to prove at least that your proposal can work for the passthrough
> > before we can try to converge.
> >
> What do you mean by "prove"? virtio specification development is not proof based method.

For example, several of my questions were ignored.

>
> If you want to participate, please review the patches and help community to improve.

See above.

>
> > > If we can converge it is good.
> > > If not both modes can expand.
> > > It is not either or as use cases are different.
> >
> > Admin commands are not the cure for all, I've stated drawbacks in other
> > threads. Not repeating it again here.
> He he, sure, I am not attempting to cure all.
> One solution does not fit all cases.

Then why do you want to couple migration with admin commands?

> Admin commands are used to solve the specific problem for which the AQ is designed for.
>
> One can make argument saying take pci fabric to 10 km distance, don’t bring new virtio tcp transport...
>
> Drawing boundaries around virtio spec in certain way only makes it further inferior. So please do not block advancements bring in [1].

As a reviewer, I ask questions but some of them are ignored, do you
expect the reviewer to figure out by themselves?  If yes, then the
virtio community is not the only community that can block you.

> We really would like to make it more robust with your rich experience and inputs, if you care to participate.

We can collaborate for sure: as I pointed out in another threads, from
what I can see from the both proposals of the current version:

I see a good opportunity to build your admin commands proposal on top
of this proposal. Or it means, we can focus on what needs to be
migrated first:

1) queue state
2) inflight descriptors
3) dirty pages (optional)
4) device state(context) (optional)

I'd leave 3 or 4 since they are very complicated features. Then we can
invent an interface to access those facilities? This is how this
series is structured.

And what's more, admin commands or transport specific interfaces. And
when we invent admin commands, you may realize you are inventing a new
transport which is the idea of transport via admin commands.

Thanks


>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-14  3:08                                   ` Jason Wang
  0 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:08 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:15 AM
>
> [..]
> > > > > [1]
> > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg000
> > > > > 61.h
> > > > > tml
> > > >
> > > > The series works for stateless devices. Before we introduce device
> > > > states in the spec, we can't migrate stateful devices. So the device
> > > > context doesn't make much sense right now.
> > > The series works for stateful devices too. The device context covers it.
> >
> > How? Can it be used for migrating any existing stateful devices? Don't we need
> > to define what context means for a specific stateful device before you can
> > introduce things like device context? Please go through the archives for the
> > relevant discussions (e.g virtio-FS), it's not as simple as introducing a device
> > context API.
> >
> A device will have its own context, for example an RSS definition, or flow filters tomorrow.

If you know there are things that are missing when posting the
patches, please use the RFC tag.

> The device context will be extended post the first series.
>
> > And what's more, how can it handle the migration compatibility?
> It will be taken care of in a follow-on, as we all know that this needs to be checked.

You don't even mention it anywhere in your series.

> I will include notes on the future follow-up work items in v1; they will be taken care of after this series.
>
> > > > Dirty page tracking in virtio is not a must for live migration to
> > > > work. It can be done via platform facilities or even software. And
> > > > to make it more efficient, it needs to utilize transport facilities instead of a
> > general one.
> > > >
> > > It is also optional in the spec proposal.
> > > Most platforms claimed are not able to do it efficiently either,
> >
> > Most platforms are working towards an efficient way. But we are talking about
> > different things: hardware-based dirty page logging is not a must, that is what
> > I'm saying. For example, KVM doesn't use hardware to log dirty pages.
> >
> I also said the same, that hw based dirty page logging is not a must. :)
> One day the hw mmu will be able to track everything efficiently. I have not seen that happening yet.

How do you define efficiency? KVM uses page faults, and most modern
IOMMUs support PRI now.
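The software dirty-page logging mentioned here can be sketched as a toy model: pages are write-protected when logging starts, the first store to each page faults into a handler that records it in a bitmap, and later stores run natively. Everything below is illustrative only; the class and method names are invented for this sketch and correspond to no real KVM API.

```python
# Toy model of software dirty-page logging via write protection.
# Illustrative names only; this is not KVM's actual interface.

class DirtyLog:
    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.write_protected = [False] * num_pages
        self.dirty = [False] * num_pages

    def start_logging(self):
        # Write-protect everything so the first store to a page traps.
        self.write_protected = [True] * self.num_pages
        self.dirty = [False] * self.num_pages

    def guest_write(self, pfn):
        # A store to a protected page "faults"; the handler marks the
        # page dirty and drops protection so later writes run natively.
        if self.write_protected[pfn]:
            self.dirty[pfn] = True
            self.write_protected[pfn] = False

    def fetch_and_clear(self):
        # Migration code harvests the bitmap and re-arms protection.
        snapshot = [i for i, d in enumerate(self.dirty) if d]
        self.start_logging()
        return snapshot
```

Note the cost model this implies: only the first write to each page between harvests pays a fault, which is the trade-off the efficiency debate above is about.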

>
> > > hence the vfio subsystem added the support for it.
> >
> > As an open standard, if it is designed for a specific software subsystem on a
> > specific OS, it's a failure.
> >
> It is not.
> One needs to accept that, in certain areas, virtio is following the trail of advancements already made in the sw stack.
> So that virtio spec advancements fit in to support such use cases.
> And blocking such advancement of the virtio spec to promote an only_mediation approach is not good either.
>
> BTW: One can say the mediation approach is also designed for a specific software subsystem and hence a failure.
> I will stay away from quoting it, as I don’t see it this way.

The proposal is based on well-known technology dating back to the birth of
virtualization. I have never known a mainstream hypervisor that doesn't do
trap and emulate, have you?

>
> > >
> > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > method and how it conflicts with live migration and complicates the device
> > implementation.
> > > Huh, it shows the opposite.
> > > It shows that both will seamlessly work.
> >
> > Have you even tried your proposal with a prototype device?
> Of course, it was delivered to users 1.5 years ago, before bringing it to the spec, with virtio-net and virtio-blk devices.

I hope this is your serious answer, but it looks like it is not. Your
proposal misses a lot of states as I pointed out in another thread,
how can it work in fact?

>
> >
> > >
> > > > And it means you need to audit all PCI features and do workaround if
> > > > there're any possible issues (or using a whitelist).
> > > No need for any of this.
> >
> > You need to prove this otherwise it's fragile. It's the duty of the author to justify
> > not the reviewer.
> >
> One cannot post patches, nor review a giant series, in one go.
> Hence the work is to be split on a logical boundary.
> Feature provisioning, pci layout etc. are secondary tasks to take care of.

Again, if you know something is missing, you need to explain it in the
series instead of waiting for some reviewers to point it out and say
it's well-known afterwards.

>
> > For example FLR is required to be done in 100ms. How could you achieve this
> > during the live migration? How does it affect the downtime and FRS?
> >
> Good technical question to discuss instead of passthrough vs mediation. :)
>
> Device administration work is separate from the device operational part.
> The device context records what is the current device context, when the FLR occurs, the device stops all the operations.
> And on next read of the device context the FLRed context is returned.

Firstly, you didn't explain how it affects the live migration, for
example, what happens if we try to migrate while FLR is ongoing.
Secondly, you ignored the other two questions.

Let's save the time of both.
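The FLR interaction Parav describes above (on FLR the device stops all operations, and the next device context read returns the post-FLR context) can be sketched as a minimal model. All names below are illustrative assumptions, not spec-defined; the PCIe requirement that FLR complete within 100 ms is noted in a comment but not modeled.

```python
# Minimal sketch of FLR vs. device context, per the description above.
# Illustrative only: field and method names are assumptions.

DEFAULT_CONTEXT = {"last_avail_idx": 0, "rss": None}

class MigratableDevice:
    def __init__(self):
        self.running = True
        self.context = dict(DEFAULT_CONTEXT)

    def process(self, avail_entries):
        # Normal datapath operation mutates the device context.
        if self.running:
            self.context["last_avail_idx"] += avail_entries

    def flr(self):
        # Function Level Reset: stop all operations, reset the context.
        # (PCIe requires FLR to complete within 100 ms; that timing
        # constraint is not modeled here.)
        self.running = False
        self.context = dict(DEFAULT_CONTEXT)
        self.running = True

    def read_context(self):
        # Administration-path read used by the migration flow; after an
        # FLR it observes the reset (post-FLR) context.
        return dict(self.context)
```

The open question in the exchange above is exactly the window this model glosses over: what a migration flow observes while the FLR is still in flight.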

>
> > >
> > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > don't use simple passthrough we don't need to care about this.
> > > >
> > > Exactly, we are migrating virtio device for the PCI transport.
> >
> > No, the migration facility is a general requirement for all transport.
> It is for all transports. One can extend it when doing it for MMIO.

By using admin commands? They cannot perform well for register-based transports.

>
> > Starting from a PCI specific (actually your proposal does not even cover all even
> > for PCI) solution which may easily end up with issues in other transports.
> >
> Like?

The admin command/virtqueue itself may not work well for other
transports. That's a drawback of your proposal, while this proposal
doesn't do any coupling.

>
> > Even if you want to migrate virtio for PCI,  please at least read Qemu migration
> > codes for virtio and PCI, then you will soon realize that a lot of things are
> > missing in your proposal.
> >
> Device context is something that will be extended.
> VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI transport.

This is just one minor item; what about the PCI config space and the others?

Again, please read Qemu codes, a lot of things are missing in your
proposal now. If everything is fine to do passthrough based live
migration, I'm pretty sure you need more than what Qemu has since it
can only do a small fraction of the whole PCI.

>
> > > As usual, if you have to keep arguing about not doing passthrough, we are
> > surely past that point.
> >
> > Who is "we"?
> >
> We = You and me.
> Since 2021, you have kept objecting that passthrough must not be done.

This is a big misunderstanding, you need to justify it or at least
address the concerns from any reviewer.

> And blocking the work done by other technical committee members to improve the virtio spec to make that happen is simply wrong.

It's unrealistic to think that one will be 100% correct. Justify your
proposal or why I was wrong instead of ignoring my questions and
complaining. That is why we need a community. If it doesn't work,
virtio provides another process for convergence.

>
> > Is something like what you said here passed the vote and written
> > to the spec?
> Not only me.
> The virtio technical committee has agreed to _both_ nested and hardware-based implementations.
>
> " hardware-based implementations" is part of the virtio specification charter with ballot of [1].
>
> [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html

Let's not do conceptual shifts; I was asking about passthrough but you
gave me the hardware implementation.

>
> And passthrough hardware-based device is in the charter that we strive to support.
>
> > We all know the current virtio spec is not built upon passthrough.
>
> This effort improves the passthrough hw based implementation and should not be blocked.

Your proposal was posted only several days ago, and you think I would
block it just because I asked several questions, some of which were
not answered?

>
> > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> >
> > It's not the mediation, we're not doing vDPA, the device model we had in
> > hardware and we present to guests are all virtio devices. It's the trap and
> > emulation which is fundamental in the world of virtualization for the past
> > decades. It's the model we used to virtualize standard devices. If you want to
> > debate this methodology, virtio community is clearly the wrong forum.
> >
> I am not debating it at all. You keep bringing up the point of mediation.
>
> The proposal of [1] clearly wants to do hardware-based passthrough devices with the least amount of virtio-level mediation.
>
> So where some mode of virtualization has been used, that's fine; it can continue with full virtualization, mediation,
>
> And also hardware based passthrough device.
>
> > >
> > > Series [1] will be enhanced further to support virtio passthrough device for
> > device context and more.
> > > Even further we like to extend the support.
> > >
> > > > Since the functionality proposed in this series focuses on the minimal
> > > > set of the functionality for migration, it is virtio specific and
> > > > self contained, so nothing special is required to work in a nested setup.
> > >
> > > Maybe it is.
> > >
> > > Again, I repeat that I would like to converge the admin commands between
> > passthrough and non-passthrough cases.
> >
> > You need to prove at least that your proposal can work for the passthrough
> > before we can try to converge.
> >
> What do you mean by "prove"? virtio specification development is not a proof-based method.

For example, several of my questions were ignored.

>
> If you want to participate, please review the patches and help community to improve.

See above.

>
> > > If we can converge it is good.
> > > If not both modes can expand.
> > > It is not either or as use cases are different.
> >
> > Admin commands are not the cure for all, I've stated drawbacks in other
> > threads. Not repeating it again here.
> He he, sure, I am not attempting to cure all.
> One solution does not fit all cases.

Then why do you want to couple migration with admin commands?

> Admin commands are used to solve the specific problem for which the AQ is designed.
>
> One can make an argument saying: take the pci fabric to a 10 km distance, don’t bring in a new virtio tcp transport...
>
> Drawing boundaries around the virtio spec in a certain way only makes it inferior. So please do not block the advancements brought in [1].

As a reviewer, I ask questions but some of them are ignored; do you
expect the reviewer to figure them out by themselves? If yes, then the
virtio community is not the only community that can block you.

> We really would like to make it more robust with your rich experience and inputs, if you care to participate.

We can collaborate for sure: as I pointed out in other threads, from
what I can see of both proposals in their current versions:

I see a good opportunity to build your admin commands proposal on top
of this proposal. In other words, we can focus on what needs to be
migrated first:

1) queue state
2) inflight descriptors
3) dirty pages (optional)
4) device state(context) (optional)

I'd leave 3 and 4 for later since they are very complicated features. Then we can
invent an interface to access those facilities. This is how this
series is structured.
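The four facilities enumerated above can be sketched as a minimal save/restore layering, with 1) and 2) mandatory and 3) and 4) optional. Field names such as last_avail_idx follow the cover letter of this series; everything else below is an illustrative assumption, not spec-defined.

```python
# Hedged sketch of the migration facilities listed above.
# Structure and names beyond last_avail_idx/last_used_idx are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VirtqueueState:
    # 1) queue state: the indices a device needs to resume a virtqueue.
    last_avail_idx: int = 0
    last_used_idx: int = 0

@dataclass
class MigrationState:
    vq_states: List[VirtqueueState]                 # 1) per-virtqueue state
    inflight: List[int] = field(default_factory=list)  # 2) descriptor heads not yet used
    dirty_pages: Optional[List[int]] = None         # 3) optional
    device_context: Optional[dict] = None           # 4) optional, device-type specific

def save(vqs, inflight, dirty=None, ctx=None):
    # Source side: snapshot the facilities after the device is suspended.
    return MigrationState(vq_states=vqs, inflight=inflight,
                          dirty_pages=dirty, device_context=ctx)

def restore(dev_vqs, state):
    # Destination side: program the saved queue state before resuming.
    for vq, saved in zip(dev_vqs, state.vq_states):
        vq.last_avail_idx = saved.last_avail_idx
        vq.last_used_idx = saved.last_used_idx
```

The layering matters for the argument above: 1) and 2) are transport-agnostic data, and whether they are carried by admin commands or a transport-specific interface is a separate decision.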

And what's more, there is the choice between admin commands and
transport-specific interfaces. And when we invent admin commands, you may
realize you are inventing a new transport, which is the idea of the
transport via admin commands.

Thanks


>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  6:05                                         ` Parav Pandit
@ 2023-09-14  3:11                                           ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:14 AM
> > To: Parav Pandit <parav@nvidia.com>
>
> > > One can build infinite levels of nesting to avoid passthrough; in the end, user
> > applications remain slow.
> >
> > We are talking about nested virtualization, not nested emulation. I won't repeat
> > the definition of virtualization but no matter how much level of nesting, the
> > hypervisor will try hard to let the application run natively for most of the time,
> > otherwise it's not the nested virtualization at all.
> >
> > Nested virtualization has been supported by all major cloud vendors, please
> > read the relevant documentation for the performance implications. Virtio
> > community is not the correct place to debate whether a nest is useful. We need
> > to make sure the datapath could be assigned to any nest layers without losing
> > any fundamental facilities like migration.
> >
> I am not debating. You or Lingshan claim or imply that mediation is the only way to progress.

Let me correct your terminology again. It's "trap and emulation". It
means the workload runs mostly natively but is sometimes trapped by the
hypervisor.

And it's not the only way. It's the starting point, since all of the
current virtio spec is built upon this methodology.
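Trap and emulate, as defined here, can be shown with a toy model: accesses outside the emulated device's register range run "natively", while accesses inside it "exit" to a hypervisor handler that emulates the device model. This is purely pedagogical; no real hypervisor API is implied, and the address range and names are invented for the sketch.

```python
# Toy illustration of trap and emulate. Illustrative only.

TRAPPED_RANGE = range(0x1000, 0x1100)   # emulated device registers

class Hypervisor:
    def __init__(self):
        self.native_mem = {}   # fast path: runs without hypervisor help
        self.device_regs = {}  # slow path: emulated device model
        self.traps = 0         # count of "VM exits"

    def write(self, addr, val):
        if addr in TRAPPED_RANGE:          # exit: trap into the hypervisor
            self.traps += 1
            self.device_regs[addr] = val   # emulate the register write
        else:                              # native: no exit taken
            self.native_mem[addr] = val

    def read(self, addr):
        if addr in TRAPPED_RANGE:
            self.traps += 1
            return self.device_regs.get(addr, 0)
        return self.native_mem.get(addr, 0)
```

The point of the model is the ratio: only the (rare) control-path register accesses exit, which is why the datapath can still be assigned to hardware while the control path is trapped.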

> And for sure virtio do not need to live in the dark shadow of mediation always.

99% of virtio devices are implemented in this way (which is what you
call dark and shadow) now.

> For nesting use case sure one can do mediation related mode.
>
> So only mediation is not the direction.

CPU and MMU virtualization were all built in this way.

>
> > > So for such N and M being > 1, one can use software-based emulation anyway.
> >
> > No, only the control path is trapped, the datapath is still passthrough.
> >
> Again, it depends on the use case.

No matter what use case, the definition and methodology of
virtualization stands still.

>
> > >
> > > >
> > > > And exposing the whole device to the guest drivers will have
> > > > security implications, your proposal has demonstrated that you need
> > > > a workaround for
> > > There are no security implications in passthrough.
> >
> > How can you prove this or is it even possible for you to prove this?
> Huh, when you claim that it is not secure, please point out exactly what is not secure.
> Please take it up with the PCI SIG and file a CVE with the PCI SIG.

I am saying it has security implications. That is why you need to
explain why you think it doesn't. What's more, the implications are
obviously nothing related to PCI SIG but a vendor virtio hardware
implementation.

>
> > You expose all device details to guests (especially the transport specific details),
> > the attack surface is increased in this way.
> One can say it is the opposite.
> The attack surface is increased in the hypervisor due to mediation poking at everything controlled by the guest.
>

We all know such a stack has been widely used for decades. But you
want to say your new stack is much more secure than this?

>
> >
> > What's more, a simple passthrough may lose the chance to workaround
> > hardware erratas and you will finally get back to the trap and emulation.
> Hardware errata are not the starting point for building the software stack and spec.

It's not the starting point. But it's definitely something that needs
to be considered, go and see kernel codes (especially the KVM part)
and you will get the answer.

> What you imply is that one must never use the vfio stack, one must not use vcpu acceleration, and everything must be emulated.

Do I say so? Trap and emulation is the common methodology used in KVM
and VFIO. And if you want to replace it with a complete passthrough,
you need to prove your method can work.

>
> The same argument about hardware errata applies to the data path too.

What makes the datapath different? Xen used to fall back to shadow page
tables to work around hardware TDP errata in the past.

> One should not implement in hw...
>
> I disagree with such argument.

It's not my argument.

>
> You can say nesting is a requirement for some use cases, so the spec should support it without blocking the passthrough mode.
> Then it is fair discussion.
>
> I will not debate further on passthrough vs control path mediation as either_or approach.
>
> >
> > >
> > > > FLR at least.
> > > It is actually the opposite.
> > > FLR is supported with the proposal without any workarounds and mediation.
> >
> > It's an obvious drawback but not an advantage. And it's not a must for live
> > migration to work. You need to prove the FLR doesn't conflict with the live
> > migration, and it's not only FLR but also all the other PCI facilities.
> I don’t know what you mean by prove. It is already clear from the proposal that FLR does not mess with the rest of the device migration infrastructure.
> You should read [1].

I don't think you answered my question in that thread.

>
> > one other
> > example is P2P and what's the next? As more features were added to the PCI
> > spec, you will have endless work in auditing the possible conflict with the
> > passthrough based live migration.
> >
> This drawback equally applies to the mediation route, where one needs to do more than audit, since the mediation layer must be extended.

No, for trap and emulation we don't need to do that. We only do
datapath assignments.

> So each method has its pros and cons. One suits one use case, other suits other use case.
> Therefore, again, attempting to claim that the mediation approach is the only way to progress is incorrect.

I never said things like this; it is your proposal that mandates
migration with admin commands. Could you please read what is proposed
in this series carefully?

On top of this series, you can build your admin commands easily. But
there's nothing that can be done on top of your proposal.

>
> In fact, audit is still better than mediation because most audits are read-only work, as opposed to endlessly extending trapping and adding support in the core stack.

One reality that you constantly ignore is that such trapping and
device models have been widely used by a lot of cloud vendors for more
than a decade.

> Again, it is a choice that the user makes with the tradeoff.
>
> > >
> > > >
> > > > For non standard device we don't have choices other than
> > > > passthrough, but for standard devices we have other choices.
> > >
> > > Passthrough is basic requirement that we will be fulfilling.
> >
> > It has several drawbacks that I would not like to repeat. We all know even for
> > VFIO, it requires a trap instead of a complete passthrough.
> >
> Sure. Both has pros and cons.
> And both can co-exist.

I don't see how it can co-exist with your proposal. I can see how
admin commands can co-exist on top of this series.

>
> > > If one wants to do special nesting, maybe, do it there.
> >
> > Nesting is not special. Go and see how it is supported by major cloud vendors
> > and you will get the answer. Introducing an interface in virtio that is hard to
> > virtualize is even worse than writing a compiler that cannot do bootstrap
> > compilation.
> We checked with more than two major cloud vendors; passthrough suffices for their use cases and they are not doing nesting.
> And other virtio vendors would also like to support native devices. So again, please do not portray that nesting is the only thing and that passthrough must not be done.

Where do I say passthrough must not be done? I'm saying you need to
justify your proposal instead of simply saying "hey, you are wrong".

Again, nesting is not the only issue; the key point is that it's
partial and not self-contained.

Thanks

>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-14  3:11                                           ` Jason Wang
  0 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:14 AM
> > To: Parav Pandit <parav@nvidia.com>
>
> > > One can build infinite level of nesting to not do passthrough, at the end user
> > applications remains slow.
> >
> > We are talking about nested virtualization but nested emulation. I won't repeat
> > the definition of virtualization but no matter how much level of nesting, the
> > hypervisor will try hard to let the application run natively for most of the time,
> > otherwise it's not the nested virtualization at all.
> >
> > Nested virtualization has been supported by all major cloud vendors, please
> > read the relevant documentation for the performance implications. Virtio
> > community is not the correct place to debate whether a nest is useful. We need
> > to make sure the datapath could be assigned to any nest layers without losing
> > any fundamental facilities like migration.
> >
> I am not debating. You or Lingshan claim or imply that mediation is the only way to progress.

Let me correct your temiology again. It's "trap and emulation" . It
means the workload runs mostly native but sometimes is trapped by the
hypervisor.

And it's not the only way. It's the start point since all current
virtio spec is built upon this methodology.

> And for sure virtio do not need to live in the dark shadow of mediation always.

99% of virtio devices are implemented in this way (which is what you
call dark and shadow) now.

> For nesting use case sure one can do mediation related mode.
>
> So only mediation is not the direction.

CPU and MMU virtualization were all built in this way.

>
> > > So for such N and M being > 1, one can use software base emulation anyway.
> >
> > No, only the control path is trapped, the datapath is still passthrough.
> >
> Again, it depends on the use case.

No matter the use case, the definition and methodology of
virtualization remain the same.

>
> > >
> > > >
> > > > And exposing the whole device to the guest drivers will have
> > > > security implications, your proposal has demonstrated that you need
> > > > a workaround for
> > > There are no security implications in passthrough.
> >
> > How can you prove this or is it even possible for you to prove this?
> Huh, when you claim that it is not secure, please point out exactly what is not secure.
> Please take it up with the PCI SIG and file a CVE with the PCI SIG.

I am saying it has security implications. That is why you need to
explain why you think it doesn't. What's more, the implications are
obviously not related to the PCI SIG but to a vendor's virtio hardware
implementation.

>
> > You expose all device details to guests (especially the transport-specific
> > details); the attack surface is increased in this way.
> One can say it is the opposite.
> The attack surface is increased in the hypervisor due to mediation poking at everything controlled by the guest.
>

We all know such a stack has been widely used for decades. But do you
want to say your new stack is much more secure than this?

>
> >
> > What's more, a simple passthrough may lose the chance to workaround
> > hardware erratas and you will finally get back to the trap and emulation.
> Hardware errata are not the starting point for building the software stack and spec.

It's not the starting point. But it's definitely something that needs
to be considered, go and see kernel codes (especially the KVM part)
and you will get the answer.

> What you imply is that one must never use the VFIO stack, must not use vCPU acceleration, and that everything must be emulated.

Do I say so? Trap and emulation is the common methodology used in KVM
and VFIO. And if you want to replace it with a complete passthrough,
you need to prove your method can work.

>
> Same argument of hardware errata applied to data path too.

Does anything make the datapath different? Xen used to fall back to
shadow page tables to work around hardware TDP errata in the past.

> One should not implement in hw...
>
> I disagree with such argument.

It's not my argument.

>
> You can say nesting is a requirement for some use cases, so the spec should support it without blocking the passthrough mode.
> Then it is a fair discussion.
>
> I will not debate further on passthrough vs. control-path mediation as an either/or approach.
>
> >
> > >
> > > > FLR at least.
> > > It is actually the opposite.
> > > FLR is supported with the proposal without any workarounds and mediation.
> >
> > It's an obvious drawback but not an advantage. And it's not a must for live
> > migration to work. You need to prove the FLR doesn't conflict with the live
> > migration, and it's not only FLR but also all the other PCI facilities.
> I don’t know what you mean by prove. It is already clear from the proposal that FLR does not mess with the rest of the device migration infrastructure.
> You should read [1].

I don't think you answered my question in that thread.

>
> > One other
> > example is P2P, and what's next? As more features are added to the PCI
> > spec, you will have endless work auditing the possible conflicts with
> > passthrough-based live migration.
> >
> This drawback equally applies to the mediation route, where one needs to do more than audit wherever the mediation layer is to be extended.

No, for trap and emulation we don't need to do that. We only do
datapath assignments.

> So each method has its pros and cons. One suits one use case, the other suits another use case.
> Therefore, again, attempting to claim that the mediation approach is the only way to progress is incorrect.

I never say things like this, it is your proposal that mandates
migration with admin commands. Could you please read what is proposed
in this series carefully?

On top of this series, you can build your admin commands easily. But
there's nothing that can be built on top of your proposal.

>
> In fact, auditing is still better than mediation because most audits are read-only work, as opposed to endlessly extending trapping and adding support in the core stack.

One reality that you constantly ignore is that such trapping and
device models have been widely used by a lot of cloud vendors for more
than a decade.

> Again, it is a choice that user make with the tradeoff.
>
> > >
> > > >
> > > > For non standard device we don't have choices other than
> > > > passthrough, but for standard devices we have other choices.
> > >
> > > Passthrough is basic requirement that we will be fulfilling.
> >
> > It has several drawbacks that I would not like to repeat. We all know even for
> > VFIO, it requires a trap instead of a complete passthrough.
> >
> Sure. Both has pros and cons.
> And both can co-exist.

I don't see how it can co-exist with your proposal. I can see how
admin commands can co-exist on top of this series.

>
> > > If one wants to do special nesting, may be, there.
> >
> > Nesting is not special. Go and see how it is supported by major cloud vendors
> > and you will get the answer. Introducing an interface in virtio that is hard to
> > virtualize is even worse than writing a compiler that cannot do bootstrap
> > compilation.
> We checked with more than two major cloud vendors; passthrough suffices for their use cases and they are not doing nesting.
> And other virtio vendors would also like to support native devices. So again, please do not portray that nesting is the only thing and that passthrough must not be done.

Where do I say passthrough must not be done? I'm saying you need to
justify your proposal instead of simply saying "hey, you are wrong".

Again, nesting is not the only issue; the key point is that it's
partial and not self-contained.

Thanks

>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:37                                                               ` Parav Pandit
@ 2023-09-14  3:11                                                                 ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Wednesday, September 13, 2023 9:51 AM
>
> > we plan to implement a self-contained solution
> Make sure that works with device reset and FLR.

We don't need to do that. It's outside the spec.

> And if not, explain that it is for mediation mode related tricks.

It's not a trick, and again, it's not mediation but trap and
emulation. It's the fundamental methodology used in virtualization, and
the virtio spec is built on it.

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  8:27                                 ` Michael S. Tsirkin
@ 2023-09-14  3:11                                   ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Parav Pandit, Zhu, Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 4:28 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 13, 2023 at 12:45:21PM +0800, Jason Wang wrote:
> > For example, KVM doesn't use
> > hardware to log dirty pages.
>
> It uses a mix of PML, PTE bit and EPT write protection.

Well, EPT/PML is Intel-specific; the minimal requirement is a page fault.
The logging is done by software anyhow.

Virtio can choose to go with device page faults, for sure.
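As a minimal sketch of how software dirty-page logging via write-protection faults works (a toy Python model, not KVM's actual implementation; the class and method names are made up):

```python
# Toy model of software dirty-page logging: pages start write-protected;
# the first guest write "faults", the hypervisor logs the page in a
# dirty set and drops the protection so later writes run natively.
class DirtyLog:
    def __init__(self, n_pages):
        self.write_protected = [True] * n_pages
        self.dirty = set()

    def guest_write(self, page):
        if self.write_protected[page]:
            # page-fault path: log the page and unprotect it
            self.dirty.add(page)
            self.write_protected[page] = False

    def sync_and_rearm(self):
        # Hypervisor harvests the dirty set and re-protects all pages
        # for the next copy pass of the live migration.
        dirty, self.dirty = self.dirty, set()
        self.write_protected = [True] * len(self.write_protected)
        return dirty
```

Each migration pass only pays a fault per first-touched page; the harvesting and bookkeeping are entirely software, which is the point being made above.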

Thanks

>
> --
> MST
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:46                     ` Parav Pandit
@ 2023-09-14  3:12                       ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-14  3:12 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Wednesday, September 13, 2023 10:14 AM
>
> > It's not about how many states are in a single state machine, it's about how
> > many state machines exist for the device status. Having more than one creates
> > big obstacles and complexity in the device. You need to define the interaction
> > of each state, otherwise you leave undefined behaviours.
> The device mode has zero relation to the device status.

You will soon get this issue when you want to do nesting.

> It does not mess with it at all.
> In fact, the new bits in the device status are making it more complex for the device to handle.

Are you challenging the design of the device status? It's definitely
too late to do this.

This proposal adds just one bit and that worries you? Or do you think
one more state is much more complicated than a new state machine with
two states?
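For reference, the existing device status bits already form a single state machine, and the argument above is that one extra bit extends it rather than adding a second machine. A sketch (the named bit values are the ones the virtio spec defines; the SUSPEND position is a placeholder, not the draft's final choice):

```python
# Device status bits defined by the virtio spec.
ACKNOWLEDGE        = 1
DRIVER             = 2
DRIVER_OK          = 4
FEATURES_OK        = 8
DEVICE_NEEDS_RESET = 64
FAILED             = 128
# The proposed addition; 16 is a placeholder position for illustration.
SUSPEND            = 16

def is_running(status):
    # One predicate over one state machine: the driver is ready and the
    # device has not been suspended.
    return bool(status & DRIVER_OK) and not (status & SUSPEND)
```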

Thanks




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:36                                                             ` Parav Pandit
@ 2023-09-14  8:19                                                               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-14  8:19 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:36 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:51 AM
>
>>> VQ depth defines the VQ's limit.
>> it still sounds limitless, and I will stop arguing this; as you can see, if a
>> queue could REALLY be limitless, we wouldn't even need multi-queue or RSS.
> If you see some value in limitless queue, please add one.
> I have not seen such construct until now and don’t see the need for it.
It is you who consider the admin vq limitless, not me, and I don't agree 
with that.
And I will stop arguing this; the point is clear.
>
>>>>>> If you say, that require multiple AQ, then how many should a vendor
>>>> provide?
>>>>> I didn’t say multiple AQs must be used.
>>>>> It is same as NIC RQs.
>>>> don't you agree a single vq has its own performance limitations?
>>> For LM I don’t see the limitation.
>>> The finite limit an AQ has, such limitation is no different than some register
>> write poll with one entry at a time per device.
>> see above, and we are implementing per device facilities.
>>>> In this series, it says:
>>>> +When setting SUSPEND, the driver MUST re-read \field{device status}
>>>> +to
>>>> ensure the SUSPEND bit is set.
>>>>
>>>> And this is nothing to do with scale.
>>> Hence, it is bringing same scale QOS limitation on register too that you claim
>> may be present in the AQ.
>>> And hence, I responded earlier that when most things are not done through
>> BAR, so there is no need to do suspend/resume via BAR either.
>>> And hence the mode setting command of [1] is just fine.
>> The bar registers are almost "triggers"
>>>>> On top of that once the device is SUSPENDED, it cannot accept some
>>>>> other
>>>> RESET_VQ command.
>>>> so as SiWei suggested, there will be a new feature bit introduced in
>>>> V2 for vq reset.
>>> VQ cannot be RESET after the device reset as you wrote.
>> It is device SUSPEND, not reset.
> Suspend means suspend in the English-language sense.
> It cannot accept more synchronous commands after that and is not supposed to respond.
Please read the series; patch 2/5 describes the SUSPEND behaviors.
>
>>>>>>>> It does not reside on the PF to migrate the VFs.
>>>>>>> Hence it does not scale and cannot do parallel operation within
>>>>>>> the VF,
>>>> unless
>>>>>> each register is replicated.
>>>>>> Why its not scale? It is a per device facility.
>>>>> Because the device needs to answer per device through some large
>>>>> scale
>>>> memory to fit in a response time.
>>>> Again, it is a per-device facility, and it is register based serve
>>>> the only one device itself.
>>>> And we do not plan to log the dirty pages in bar.
>>> Hence, there is no reason to wrap suspend resume on the BAR either.
>>> The mode setting admin command is just fine.
>> They are device status bits.
> And it doesn't have to be.
I don't get your comment.
Do you mean there should not be device status bits?
Challenging even DRIVER_OK is unreasonable?
>
>>>>>> Why do you need parallel operation against the LM facility?
>>>>> Because your downtime was 300msec for 1000 VMs.
>>>> the LM facility in this series is per-device, it only serves itself.
>>> And that single threading and single threading per VQ reset via single register
>> wont scale.
>> it is per-device facility, for example, on the VF, not the owner PF.
> And as I repeatedly explained, what you never answered is how such a queue can work after device suspend.
> A weird device bifurcation is not supported by PCI and is not to be done in virtio.
Didn't you find the answer in my comments?

I have repeated it several times:
1) as described in this series, once SUSPEND is set, the device should
present a stabilized config space.
2) the device freezes both its control path and data path, so we don't
expect any queues to be functional from SUSPEND until !SUSPEND.
3) but the device status field stays operational, because we may need to
recover from a failed LM or cancel the LM process.
4) we will introduce a new feature bit to allow resetting vqs after SUSPEND.

Where do you see us expecting the queues to work after SUSPEND?

Clear now?
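The suspend semantics described in the numbered points can be sketched as a toy Python model (purely illustrative; the SUSPEND bit position, field names, and handshake are placeholders, not taken from the draft):

```python
SUSPEND = 1 << 6  # placeholder bit position, not the draft's value

class ToyDevice:
    """Toy model: SUSPEND freezes the data path and stabilizes the
    config space, but the status field stays operational so the driver
    can still cancel a failed live migration by clearing SUSPEND."""
    def __init__(self):
        self.status = 0
        self.config_stable = False
        self.datapath_running = True

    def write_status(self, value):
        self.status = value
        if value & SUSPEND:
            self.datapath_running = False  # freeze control and data paths
            self.config_stable = True      # present a stabilized config
        else:
            self.datapath_running = True   # e.g. LM cancelled or failed
            self.config_stable = False

def driver_suspend(dev):
    dev.write_status(dev.status | SUSPEND)
    # The driver re-reads device status to ensure SUSPEND took effect.
    assert dev.status & SUSPEND

def driver_cancel_lm(dev):
    dev.write_status(dev.status & ~SUSPEND)
```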
>
>> see above and please feel free to reuse the basic facilities if you like in your AQ
>> LM
> The whole attitude that "We .." and use in "your" LM is just simply wrong.
Why is it wrong?
> Please work towards collaborative design in technical committee.
This is what I am doing now, no? Otherwise, why am I talking to you?
> What you want to repeat was already posted, so take some time to review and utilize. If not, describe why it is not useful.
EOM
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:37                                                               ` Parav Pandit
@ 2023-09-14  8:22                                                                 ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-14  8:22 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:37 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:51 AM
>> we plan to implement a self-contain solution
> Make sure that works with device reset and FLR.
> And if not, explain that it is for mediation mode related tricks.
Also, as repeated many times, this is trap and emulate; I don't know why you
keep talking about mediation.

And why FLR? Is it related, or out of spec?

And again and again, we are implementing BASIC FACILITIES, which should
NOT introduce unnecessary dependencies.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-13  4:39                                             ` Parav Pandit
@ 2023-09-14  8:24                                               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-14  8:24 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang
  Cc: Michael S. Tsirkin, eperezma, cohuck, stefanha, virtio-comment,
	virtio-dev



On 9/13/2023 12:39 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 13, 2023 9:52 AM
>>> It should be named as SUSPEND_CFG_SPACE.!
>>> All of this frankly seems intrusive enough as Michael pointed out.
>>> Good luck.
>> it also SUSPEND the data-path
> Ok so it works like Suspend of English dictionary, then after that any other VQ related commands don’t progress.
> Because it is suspended.
After suspend, the vqs should not consume more buffers, and should either
1) track in-flight descriptors (this will be added in the next version), or
2) wait until all in-flight descriptors finish, mark them used, and flush.
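The second option, draining in-flight descriptors on suspend, can be sketched as a toy model (illustrative Python, not driver code; the queue representation is made up):

```python
# Toy model: on suspend the vq stops consuming new buffers and drains
# descriptors already in flight, marking them used before flushing.
class ToyVq:
    def __init__(self):
        self.avail = []       # descriptors posted by the driver
        self.in_flight = []   # descriptors the device has started on
        self.used = []        # completed descriptors
        self.suspended = False

    def device_fetch(self):
        # The device only consumes new buffers while not suspended.
        if not self.suspended and self.avail:
            self.in_flight.append(self.avail.pop(0))

    def suspend(self):
        self.suspended = True
        # Drain: finish in-flight work and mark it used, then flush.
        while self.in_flight:
            self.used.append(self.in_flight.pop(0))
```

After `suspend()`, the avail ring may still hold unconsumed buffers, but nothing is in flight, so the vq state (last_avail_idx analog) is stable for migration.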




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND
  2023-09-06  8:16   ` [virtio-dev] " Zhu Lingshan
@ 2023-09-14 11:09     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:09 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:36PM +0800, Zhu Lingshan wrote:
> When SUSPEND is set, the device should stabilize the device
> states and virtqueue states; therefore, the device should
> ignore requests to reset vqs while SUSPEND is set in the device status.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>

And do what? If you really feel it's important, we can prohibit the
driver from touching the state. But generally this seems
non-orthogonal.


> ---
>  content.tex | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 9d727ce..cd2b426 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -443,6 +443,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
>  The device MUST reset any state of a virtqueue to the default state,
>  including the available state and the used state.
>  
> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in \field{device status},
> +the device SHOULD ignore resetting any virtqueues.
> +
>  \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
>  
>  After the driver tells the device to reset a queue, the driver MUST verify that
> -- 
> 2.35.3
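The quoted SHOULD reduces to a device-side gate on virtqueue reset requests. The sketch below is a minimal illustration; the SUSPEND bit value is a placeholder (the series defines it elsewhere), and `vq_reset_accepted` is a hypothetical helper, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DRIVER_OK 0x04 /* spec-defined device status bit */
#define STATUS_SUSPEND   0x40 /* placeholder value for the proposed bit */

/* Returns true if the device acts on a driver request to reset a
 * virtqueue, false if the request is ignored per the quoted rule. */
static bool vq_reset_accepted(uint8_t device_status, bool suspend_negotiated)
{
    if (suspend_negotiated && (device_status & STATUS_SUSPEND))
        return false; /* SUSPEND set: ignore resetting any virtqueue */
    return true;
}
```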




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-14 11:14   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:14 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> This series introduces
> 1) a new SUSPEND bit in the device status,
> which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> The main use case for these new facilities is Live Migration.
> 
> Future work: dirty page tracking and in-flight descriptors.
> This series addresses many comments from Jason, Stefan and Eugenio
> from RFC series.

Compared to Parav's patchset this is much less functional.

Assuming that one goes in, can't we add the ability to submit
admin commands through MMIO on the device itself and be done with it?

> Zhu Lingshan (5):
>   virtio: introduce vq state as basic facility
>   virtio: introduce SUSPEND bit in device status
>   virtqueue: constraints for virtqueue state
>   virtqueue: ignore resetting vqs when SUSPEND
>   virtio-pci: implement VIRTIO_F_QUEUE_STATE
> 
>  content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  transport-pci.tex |  18 +++++++
>  2 files changed, 136 insertions(+)
> 
> -- 
> 2.35.3




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
@ 2023-09-14 11:25     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:25 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
> This patch adds new device facility to save and restore virtqueue
> state. The virtqueue state is split into two parts:
> 
> - The available state: the state that is used to read the next
>   available buffer.
> - The used state: the state that is used to mark a buffer as used.
> 
> This will simplify the transport-specific method implementation (e.g., two
> le16 fields could be used instead of a single le32). For split virtqueue, we
> only need the available state since the used state is implemented in
> the virtqueue itself (the used index).

Hmm, no: simply because when the ring is not running and
buffers are processed in order, last avail == used.

> For packed virtqueue, we need
> both the available state and the used state.
> 
> Those states are required to implement live migration support for
> virtio device.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>



> ---
>  content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0a62dce..0e492cd 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  types. It is RECOMMENDED that devices generate version 4
>  UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>  
> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
> +
> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
> +get the device internal virtqueue state through the following
> +fields. The implementation of the interfaces is transport specific.
> +
> +\subsection{\field{Available State} Field}
> +
> +The available state field is two bytes of virtqueue state that is used by
> +the device to read the next available buffer.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 last_avail_idx;
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running available ring
> +index where the device will read the next available head of a
> +descriptor chain.

I dislike how this pokes at split-ring.txt externally.
It will make it harder to add new formats down the road.



> +
> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>
> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  last_avail_idx : 15;
> +  last_avail_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{last_avail_idx} field is the free-running location
> +where the device reads the next descriptor from the virtqueue descriptor ring.
> +
> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
> +counter that was observed by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.


> +\subsection{\field{Used State} Field}
> +
> +The used state field is two bytes of virtqueue state that is used by
> +the device when marking a buffer used.
> +
> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
> +
> +\begin{lstlisting}
> +le16 {
> +  used_idx : 15;
> +  used_wrap_counter : 1;
> +};
> +\end{lstlisting}
> +
> +The \field{used_idx} field is the free-running location where the device writes the next
> +used descriptor to the descriptor ring.
> +
> +The \field{used_wrap_counter} field is the wrap counter that is used
> +by the device.
> +
> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
> +
> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> +is always 0.
> +
>  \input{admin.tex}
>  
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> -- 
> 2.35.3
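The le16 layouts quoted above pack a 15-bit ring position and a 1-bit wrap counter into one field for the packed virtqueue case. The helpers below exercise that bit layout; they are an illustration of the proposed encoding, not code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a packed-virtqueue position into the proposed le16 state format:
 * bits 0..14 hold the ring index, bit 15 holds the wrap counter. */
static uint16_t vq_state_pack(uint16_t idx, int wrap)
{
    return (uint16_t)((idx & 0x7fff) | ((wrap & 1) << 15));
}

/* Extract the 15-bit index from a state word. */
static uint16_t vq_state_idx(uint16_t state)
{
    return state & 0x7fff;
}

/* Extract the wrap counter from a state word. */
static int vq_state_wrap(uint16_t state)
{
    return state >> 15;
}
```

For the split-virtqueue case the whole le16 is just `last_avail_idx`, which is why the quoted text drops the wrap counter there.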
> 


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
@ 2023-09-14 11:27     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:27 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
> This patch adds two new le16 fields to common configuration structure
> to support VIRTIO_F_QUEUE_STATE in PCI transport layer.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> ---
>  transport-pci.tex | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..3161519 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>          /* About the administration virtqueue. */
>          le16 admin_queue_index;         /* read-only for driver */
>          le16 admin_queue_num;         /* read-only for driver */
> +
> +	/* Virtqueue state */
> +        le16 queue_avail_state;         /* read-write */
> +        le16 queue_used_state;          /* read-write */
>  };
>  \end{lstlisting}
>  
> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  	The value 0 indicates no supported administration virtqueues.
>  	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>  	negotiated.
> +
> +\item[\field{queue_avail_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the available state of
> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> +
> +\item[\field{queue_used_state}]
> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
> +        negotiated. The driver sets and gets the used state of the
> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).

I see no description either here or in the generic patch
of what it means to set or get the state.

> +
>  \end{description}
>  
>  \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  present either a value of 0 or a power of 2 in
>  \field{queue_size}.
>  
> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
> +
>  If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>  \field{admin_queue_index} MUST be equal to, or bigger than
>  \field{num_queues}; also, \field{admin_queue_num} MUST be
> -- 
> 2.35.3
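Assuming the layout in the quoted hunk, a driver-side save/restore of one selected queue's state might look like the sketch below. The struct mirrors only the tail of the common configuration structure; real code would access these fields through the PCI common config capability with proper MMIO accessors, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the tail of virtio_pci_common_cfg
 * after the quoted hunk is applied. */
struct common_cfg_tail {
    uint16_t admin_queue_index; /* read-only for driver */
    uint16_t admin_queue_num;   /* read-only for driver */
    uint16_t queue_avail_state; /* read-write, for the selected queue */
    uint16_t queue_used_state;  /* read-write, for the selected queue */
};

/* Get the state of the currently selected queue (source side). */
static void save_queue_state(const struct common_cfg_tail *cfg,
                             uint16_t *avail, uint16_t *used)
{
    *avail = cfg->queue_avail_state;
    *used  = cfg->queue_used_state;
}

/* Set the state of the currently selected queue (destination side). */
static void restore_queue_state(struct common_cfg_tail *cfg,
                                uint16_t avail, uint16_t used)
{
    cfg->queue_avail_state = avail;
    cfg->queue_used_state  = used;
}
```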
> 


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
@ 2023-09-14 11:30     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:30 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
> This commit specifies the constraints of the virtqueue state,
> and the actions that should be taken by the device when SUSPEND
> and DRIVER_OK are set.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  content.tex | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0fab537..9d727ce 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>  When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>  is always 0
>  
> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> +
> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED has not been negotiated,
> +the driver SHOULD NOT access \field{Used State} of any virtqueue; it SHOULD use the
> +used index in the used ring.
> +
> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> +
> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
> +
> +If VIRTIO_F_QUEUE_STATE has been negotiated, when SUSPEND is set,
> +the device MUST record the Virtqueue State of every enabled virtqueue
> +in \field{Available State} and \field{Used State} respectively,

record how?

> +and correspondingly restore the Virtqueue State of every enabled virtqueue
> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.

when is that?
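Concretely, the write-gating rule earlier in this hunk could be expressed as follows (a sketch; the helper name and bit macros are illustrative, not spec text):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DRIVER_OK  (1u << 2)   /* device status bit 4 */
#define SUSPEND    (1u << 4)   /* proposed device status bit 16 */

/* A write to Virtqueue State is accepted only when DRIVER_OK is clear,
 * or when both DRIVER_OK and SUSPEND are set; otherwise it is ignored. */
static bool vq_state_write_accepted(uint8_t status)
{
    if (!(status & DRIVER_OK))
        return true;
    return (status & SUSPEND) != 0;
}
```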


> +
>  \input{admin.tex}
>  
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> -- 
> 2.35.3


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
@ 2023-09-14 11:34     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:34 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
> This patch introduces a new status bit in the device status: SUSPEND.
> 
> This SUSPEND bit can be used by the driver to suspend a device,
> in order to stabilize the device states and virtqueue states.
> 
> Its main use case is live migration.
> 
> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  content.tex | 31 +++++++++++++++++++++++++++++++
>  1 file changed, 31 insertions(+)
> 
> diff --git a/content.tex b/content.tex
> index 0e492cd..0fab537 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>  
> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
> +  device has been suspended by the driver.
> +
>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  recover by issuing a reset.
>  \end{note}
>  
> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
> +
> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
> +
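For illustration, the driver-side sequence these two normative statements imply might look like the following (the status accessors and the toy device model are assumptions, purely for illustration; a real driver would use its transport's register accessors):

```c
#include <assert.h>
#include <stdint.h>

#define FEATURES_OK  (1u << 3)
#define SUSPEND      (1u << 4)

/* Toy stand-in for the transport's device-status register. This model
 * acknowledges SUSPEND immediately; a real device quiesces first. */
static uint8_t toy_status;
static uint8_t status_read(void)        { return toy_status; }
static void    status_write(uint8_t s)  { toy_status = s; }

/* Set SUSPEND, then re-read device status until the device presents it. */
static int driver_suspend(void)
{
    uint8_t s = status_read();

    if (!(s & FEATURES_OK))
        return -1;                 /* SHOULD NOT set SUSPEND yet */

    status_write(s | SUSPEND);
    while (!(status_read() & SUSPEND))
        ;                          /* device sets SUSPEND once quiesced */
    return 0;
}
```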
>  \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>  
>  The device MUST NOT consume buffers or send any used buffer
> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>  MUST send a device configuration change notification to the driver.
>  
> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
> +
> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.

why? let's just forbid driver from setting it.

> +
> +The device SHOULD allow writes to \field{device status} even when SUSPEND is set.
> +
> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
> +and resume operation upon DRIVER_OK.
> +

sorry what?

> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
> +the device SHOULD perform the following actions before presenting the SUSPEND bit in \field{device status}:
> +
> +\begin{itemize}
> +\item Stop consuming buffers of any virtqueues and mark all finished descriptors as used.
> +\item Wait for all descriptors that are being processed to finish, and mark them as used.
> +\item Flush all used buffers and send used buffer notifications to the driver.

flush how?

> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}


record where?

> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}

pause in what sense? completely?  this does not seem realistic.
e.g. pci express link has to stay active or device will die.


also, presumably here it is except a bunch of other fields.
e.g. what about queue select and all related queue fields?


> +\end{itemize}
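For what it's worth, the ordering of the steps above can be sketched as a toy device model (every name here is illustrative, not spec text, and step 3 is elided):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUSPEND (1u << 4)

struct toy_vq {
    bool     enabled;
    uint16_t in_flight;        /* descriptors still being processed */
    uint16_t last_avail_idx;
};

struct toy_dev {
    uint8_t       status;
    struct toy_vq vq[2];
    uint16_t      saved_avail[2];   /* recorded Available State */
};

static void toy_suspend(struct toy_dev *d)
{
    for (int i = 0; i < 2; i++) {
        struct toy_vq *q = &d->vq[i];
        if (!q->enabled)
            continue;
        q->in_flight = 0;                      /* steps 1-2: drain, mark used */
        /* step 3: flush used entries and notify the driver (elided) */
        d->saved_avail[i] = q->last_avail_idx; /* step 4: record vq state */
    }
    d->status |= SUSPEND;   /* present SUSPEND only after all of the above */
}
```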
> +
>  \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>  
>  Each virtio device offers all the features it understands.  During
> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>  	handling features reserved for future use.
>  
> +  \item[VIRTIO_F_SUSPEND (42)] This feature indicates that the driver can
> +   suspend the device.
> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
> +
>  \end{description}
>  
>  \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> -- 
> 2.35.3




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
@ 2023-09-14 11:37   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-14 11:37 UTC (permalink / raw)
  To: Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> This series introduces
> 1)a new SUSPEND bit in the device status
> Which is used to suspend the device, so that the device states
> and virtqueue states are stabilized.
> 
> 2)virtqueue state and its accessor, to get and set last_avail_idx
> and last_used_idx of virtqueues.
> 
> The main usecase of these new facilities is Live Migration.
> 
> Future work: dirty page tracking and in-flight descriptors.
> 
> This series addresses many comments from Jason, Stefan and Eugenio
> from RFC series.


after going over this in detail, it is as I worried: this
tries to do too much through a single register and
the ownership is muddied significantly.

I feel a separate capability for suspend/resume that would
be independent of device status would be preferable.

> Zhu Lingshan (5):
>   virtio: introduce vq state as basic facility
>   virtio: introduce SUSPEND bit in device status
>   virtqueue: constraints for virtqueue state
>   virtqueue: ignore resetting vqs when SUSPEND
>   virtio-pci: implement VIRTIO_F_QUEUE_STATE
> 
>  content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>  transport-pci.tex |  18 +++++++
>  2 files changed, 136 insertions(+)
> 
> -- 
> 2.35.3




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 1/5] virtio: introduce vq state as basic facility
  2023-09-14 11:25     ` Michael S. Tsirkin
@ 2023-09-15  2:46       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  2:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:25 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:33PM +0800, Zhu Lingshan wrote:
>> This patch adds a new device facility to save and restore virtqueue
>> state. The virtqueue state is split into two parts:
>>
>> - The available state: The state that is used to read the next
>>    available buffer.
>> - The used state: The state that is used to make a buffer used.
>>
>> This will simplify the transport-specific method implementation
>> (e.g. two le16 could be used instead of a single le32). For the split
>> virtqueue, we only need the available state, since the used state is
>> implemented in the virtqueue itself (the used index).
> hmm no, simply because when ring is not running and when
> buffers are processed in order, last avail == used.
in the ideal case, yes.

I will remove this since this may be ambiguous.
>
>> For packed virtqueue, we need
>> both the available state and the used state.
>>
>> Those states are required to implement live migration support for
>> virtio device.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>
>
>> ---
>>   content.tex | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 65 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0a62dce..0e492cd 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -502,6 +502,71 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>>   types. It is RECOMMENDED that devices generate version 4
>>   UUIDs as specified by \hyperref[intro:rfc4122]{[RFC4122]}.
>>   
>> +\section{Virtqueue State}\label{sec:Virtqueues / Virtqueue State}
>> +
>> +When VIRTIO_F_QUEUE_STATE has been negotiated, the driver can set and
>> +get the device internal virtqueue state through the following
>> +fields. The implementation of the interfaces is transport specific.
>> +
>> +\subsection{\field{Available State} Field}
>> +
>> +The available state field is two bytes of virtqueue state that is used by
>> +the device to read the next available buffer.
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 last_avail_idx;
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running available ring
>> +index where the device will read the next available head of a
>> +descriptor chain.
> I dislike how this pokes at split-ring.txt externally.
> Will make it harder to add new formats down the road.
I can move these contents to split-ring.tex and packed-ring.tex
respectively, however some contents have to be duplicated.

Thanks
>
>
>
>> +
>> +See also \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / The Virtqueue Available Ring}.
>>
>> +When VIRTIO_RING_F_PACKED is negotiated, it contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  last_avail_idx : 15;
>> +  last_avail_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{last_avail_idx} field is the free-running location
>> +where the device reads the next descriptor from the virtqueue descriptor ring.
>> +
>> +The \field{last_avail_wrap_counter} field is the last driver ring wrap
>> +counter that was observed by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
>
>> +\subsection{\field{Used State} Field}
>> +
>> +The used state field is two bytes of virtqueue state that is used by
>> +the device when marking a buffer used.
>> +
>> +When VIRTIO_RING_F_PACKED is negotiated, the used state contains:
>> +
>> +\begin{lstlisting}
>> +le16 {
>> +  used_idx : 15;
>> +  used_wrap_counter : 1;
>> +};
>> +\end{lstlisting}
>> +
>> +The \field{used_idx} field is the free-running location where the device writes the next
>> +used descriptor to the descriptor ring.
>> +
>> +The \field{used_wrap_counter} field is the wrap counter that is used
>> +by the device.
>> +
>> +See also \ref{sec:Packed Virtqueues / Driver and Device Ring Wrap Counters}.
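Since C bitfield layout is implementation-defined, an implementation would likely pack and unpack this le16 explicitly; a sketch (the helper names are mine, not from the spec):

```c
#include <assert.h>
#include <stdint.h>

/* bits 0..14: index; bit 15: wrap counter (the same layout is used for
 * both the available and the used state words of a packed virtqueue). */
static inline uint16_t vq_state_pack(uint16_t idx, unsigned wrap)
{
    return (uint16_t)((idx & 0x7fffu) | ((wrap & 1u) << 15));
}

static inline uint16_t vq_state_idx(uint16_t st)  { return st & 0x7fffu; }
static inline unsigned vq_state_wrap(uint16_t st) { return st >> 15; }
```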
>> +
>> +When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>> +is always 0.
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3
>>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-14 11:34     ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  2:57       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  2:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>> This patch introduces a new status bit in the device status: SUSPEND.
>>
>> This SUSPEND bit can be used by the driver to suspend a device,
>> in order to stabilize the device states and virtqueue states.
>>
>> Its main use case is live migration.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0e492cd..0fab537 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>   
>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
>> +  device has been suspended by the driver.
>> +
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   recover by issuing a reset.
>>   \end{note}
>>   
>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>> +
>> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
>> +
>>   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>   
>>   The device MUST NOT consume buffers or send any used buffer
>> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>   MUST send a device configuration change notification to the driver.
>>   
>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>> +
>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
> why? let's just forbid driver from setting it.
OK
>
>> +
>> +The device SHOULD allow writes to \field{device status} even when SUSPEND is set.
>> +
>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
>> +and resume operation upon DRIVER_OK.
>> +
> sorry what?
In case of a failed or cancelled Live Migration, the device needs to 
resume operation.
However, the spec forbids the driver from clearing a device status bit, so 
re-writing DRIVER_OK is expected to clear SUSPEND so that the device 
resumes operation.
>
>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>> +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
>> +
>> +\begin{itemize}
>> +\item Stop consuming buffers of any virtqueues and mark all finished descriptors as used.
>> +\item Wait for all descriptors that are being processed to finish and mark them as used.
>> +\item Flush all used buffers and send used buffer notifications to the driver.
> flush how?
This is device-type-specific, and we will include tracking of in-flight 
descriptors (buffers) in V2.
>
>> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
>
> record where?
This is transport-specific; for PCI, patch 5 introduces two new fields 
for the avail and used state.
>
>> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
> pause in what sense? completely?  this does not seem realistic.
> e.g. pci express link has to stay active or device will die.
Only pause virtio; I will rephrase the sentence as "pause its virtio 
operation".
Other things, like the PCI link in the example, are out of the scope of 
the spec and we don't need to migrate them.
>
>
> also, presumably here it is except a bunch of other fields.
> e.g. what about queue select and all related queue fields?
For now they are forbidden.

As SiWei suggested, we will introduce a new feature bit to control whether
resetting a VQ after SUSPEND is allowed. We can use more feature bits if
there are requirements to perform anything else after SUSPEND. But for now
they are forbidden.
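For clarity, the handshake this patch requires (the driver sets SUSPEND, then re-reads the device status until the device presents the bit) can be sketched in C. The register model, the FEATURES_OK gating and the function names here are illustrative only, not spec text:

```c
#include <assert.h>
#include <stdint.h>

/* Status bits as proposed in this patch. */
#define STATUS_DRIVER_OK    4
#define STATUS_FEATURES_OK  8
#define STATUS_SUSPEND      16

/* Simulated device status register; a real driver would use a
 * transport-specific access (e.g. PCI common configuration). */
static uint8_t device_status;

static void write_status(uint8_t val)
{
    /* Model of the device rule: do not present SUSPEND unless
     * FEATURES_OK was already set (features assumed negotiated). */
    if ((val & STATUS_SUSPEND) && !(device_status & STATUS_FEATURES_OK))
        val &= ~STATUS_SUSPEND;
    device_status = val;
}

static uint8_t read_status(void)
{
    return device_status;
}

/* Driver-side suspend: set the bit, then re-read until the device
 * presents SUSPEND, as the normative text requires.
 * Returns 0 on success, -1 if the device never presents the bit. */
static int driver_suspend(int max_polls)
{
    write_status(read_status() | STATUS_SUSPEND);
    for (int i = 0; i < max_polls; i++)
        if (read_status() & STATUS_SUSPEND)
            return 0;
    return -1;
}
```

If the device never presents SUSPEND (for example because the feature was not negotiated), the driver observes the failure simply by the bit staying clear on re-read.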
>
>> +\end{itemize}
>> +
>>   \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>>   
>>   Each virtio device offers all the features it understands.  During
>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>>   	handling features reserved for future use.
>>   
>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
>> +   SUSPEND the device.
>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>> +
>>   \end{description}
>>   
>>   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>> -- 
>> 2.35.3




* [virtio-dev] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-14 11:30     ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  2:59       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  2:59 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>> This commit specifies the constraints of the virtqueue state,
>> and the actions that should be taken by the device when SUSPEND
>> and DRIVER_OK are set
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>> ---
>>   content.tex | 19 +++++++++++++++++++
>>   1 file changed, 19 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 0fab537..9d727ce 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>   When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>   is always 0
>>   
>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>> +
>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED has not been negotiated,
>> +the driver SHOULD NOT access \field{Used State} of any virtqueue; it SHOULD use the
>> +used index in the used ring.
>> +
>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>> +
>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>> +
>> +If VIRTIO_F_QUEUE_STATE has been negotiated, when SUSPEND is set,
>> +the device MUST record the Virtqueue State of every enabled virtqueue
>> +in \field{Available State} and \field{Used State} respectively,
> record how?
This is transport-specific; for PCI they are recorded in the common 
config space.
The two new fields are introduced in patch 5.
>
>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
> when is that?
When the driver sets DRIVER_OK; the restore is done before the device 
presents DRIVER_OK.
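A minimal model of that record/restore cycle is sketched below; the struct and function names are hypothetical, and the real state fields live in the transport (for PCI, the common configuration fields added in patch 5):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue state as exposed to the driver
 * (queue_avail_state / queue_used_state in patch 5). */
struct vq_state {
    uint16_t avail;  /* last_avail_idx */
    uint16_t used;   /* used_idx for packed rings, always 0 for split */
};

struct vq {
    uint16_t last_avail_idx;  /* internal, advances during operation */
    uint16_t last_used_idx;
    struct vq_state cfg;      /* driver-visible state fields */
};

/* On SUSPEND: the device records the live indices into the
 * driver-visible Available State / Used State fields. */
static void device_suspend_vq(struct vq *q)
{
    q->cfg.avail = q->last_avail_idx;
    q->cfg.used  = q->last_used_idx;
}

/* On DRIVER_OK (after the destination driver wrote the fields):
 * the device restores its internal indices from them. */
static void device_resume_vq(struct vq *q)
{
    q->last_avail_idx = q->cfg.avail;
    q->last_used_idx  = q->cfg.used;
}
```

On migration, the source hypervisor reads `cfg` after SUSPEND is presented, and the destination writes it back before setting DRIVER_OK.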
>
>
>> +
>>   \input{admin.tex}
>>   
>>   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>> -- 
>> 2.35.3




* [virtio-dev] Re: [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND
  2023-09-14 11:09     ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  4:06       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:09 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:36PM +0800, Zhu Lingshan wrote:
>> When SUSPEND is set, the device should stabilize the device
>> states and virtqueue states, therefore the device should
>> ignore resetting vqs when SUSPEND is set in device status.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> And do what? If you really feel it's important we can prohibit
> driver from touching state. But generally this seems
> un-orthogonal.
As discussed in other threads, we will introduce a new feature bit
controlling this.
>
>
>> ---
>>   content.tex | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/content.tex b/content.tex
>> index 9d727ce..cd2b426 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -443,6 +443,9 @@ \subsubsection{Virtqueue Reset}\label{sec:Basic Facilities of a Virtio Device /
>>   The device MUST reset any state of a virtqueue to the default state,
>>   including the available state and the used state.
>>   
>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set in \field{device status},
>> +the device SHOULD ignore resetting any virtqueues.
>> +
>>   \drivernormative{\paragraph}{Virtqueue Reset}{Basic Facilities of a Virtio Device / Virtqueues / Virtqueue Reset / Virtqueue Reset}
>>   
>>   After the driver tells the device to reset a queue, the driver MUST verify that
>> -- 
>> 2.35.3




* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14 11:27     ` Michael S. Tsirkin
@ 2023-09-15  4:13       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:27 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:37PM +0800, Zhu Lingshan wrote:
>> This patch adds two new le16 fields to the common configuration structure
>> to support VIRTIO_F_QUEUE_STATE in the PCI transport layer.
>>
>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>> ---
>>   transport-pci.tex | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..3161519 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -325,6 +325,10 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>           /* About the administration virtqueue. */
>>           le16 admin_queue_index;         /* read-only for driver */
>>           le16 admin_queue_num;         /* read-only for driver */
>> +
>> +	/* Virtqueue state */
>> +        le16 queue_avail_state;         /* read-write */
>> +        le16 queue_used_state;          /* read-write */
>>   };
>>   \end{lstlisting}
>>   
>> @@ -428,6 +432,17 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   	The value 0 indicates no supported administration virtqueues.
>>   	This field is valid only if VIRTIO_F_ADMIN_VQ has been
>>   	negotiated.
>> +
>> +\item[\field{queue_avail_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the available state of
>> +        the virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
>> +
>> +\item[\field{queue_used_state}]
>> +        This field is valid only if VIRTIO_F_QUEUE_STATE has been
>> +        negotiated. The driver sets and gets the used state of the
>> +        virtqueue here (see \ref{sec:Virtqueues / Virtqueue State}).
> I see no description either here or in the generic patch
> of what does it mean to set or get the state.
When SUSPEND is set, the device stores the vq state here; this is then
migrated to the destination, and the destination hypervisor restores the
vq state from here.

I will add more description in V2
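As a sketch of that save/restore flow across migration (the accessor names and the fixed queue count are hypothetical stand-ins for the transport-specific common-config accesses):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue state as exposed via queue_avail_state /
 * queue_used_state in the PCI common configuration structure. */
struct vq_migration_state {
    uint16_t avail_state;  /* last_avail_idx */
    uint16_t used_state;   /* used_idx (packed ring), 0 for split */
};

/* Stand-in for the device registers, one entry per virtqueue. */
static struct vq_migration_state regs[4];

static struct vq_migration_state read_vq_state(int qidx)
{
    return regs[qidx];
}

static void write_vq_state(int qidx, struct vq_migration_state s)
{
    regs[qidx] = s;
}

/* Source side: after SUSPEND is presented, snapshot every queue. */
static void save_all(struct vq_migration_state *out, int nqueues)
{
    for (int i = 0; i < nqueues; i++)
        out[i] = read_vq_state(i);
}

/* Destination side: before setting DRIVER_OK, write the state back. */
static void restore_all(const struct vq_migration_state *in, int nqueues)
{
    for (int i = 0; i < nqueues; i++)
        write_vq_state(i, in[i]);
}
```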
>
>> +
>>   \end{description}
>>   
>>   \devicenormative{\paragraph}{Common configuration structure layout}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Common configuration structure layout}
>> @@ -488,6 +503,9 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   present either a value of 0 or a power of 2 in
>>   \field{queue_size}.
>>   
>> +If VIRTIO_F_QUEUE_STATE has not been negotiated, the device MUST ignore
>> +any accesses to \field{queue_avail_state} and \field{queue_used_state}.
>> +
>>   If VIRTIO_F_ADMIN_VQ has been negotiated, the value
>>   \field{admin_queue_index} MUST be equal to, or bigger than
>>   \field{num_queues}; also, \field{admin_queue_num} MUST be
>> -- 
>> 2.35.3
>>
>>




* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-14 11:14   ` [virtio-comment] " Michael S. Tsirkin
  (?)
@ 2023-09-15  4:28   ` Zhu, Lingshan
  2023-09-17  5:32     ` Parav Pandit
  -1 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:28 UTC (permalink / raw)
  To: virtio-dev



On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>> This series introduces
>> 1)a new SUSPEND bit in the device status
>> Which is used to suspend the device, so that the device states
>> and virtqueue states are stabilized.
>>
>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>> and last_used_idx of virtqueues.
>>
>> The main usecase of these new facilities is Live Migration.
>>
>> Future work: dirty page tracking and in-flight descriptors.
>> This series addresses many comments from Jason, Stefan and Eugenio
>> from RFC series.
> Compared to Parav's patchset this is much less functional.
We will add dirty page tracking and an in-flight IO tracker in V2; then it
will be a full-featured LM solution.

They are not in this series because we want this series to be small and
focused.
>
> Assuming that one goes in, can't we add ability to submit
> admin commands through MMIO on the device itself and be done with it?
I am not sure. IMHO, if we use the admin vq as the back-end for MMIO-based
live migration, then the issues in the admin vq still exist, for example:
1) nested virtualization
2) bare-metal live migration
3) QoS
4) more attack surfaces introduced.

What's more, if we want to implement a new capability on behalf of the
admin vq, does the capability need to store at least one descriptor buffer,
that is, should the capability length be at least max_length_of_buffer?

If that is not possible, do we need to implement extra fields like
length and remaining_length, with the device repeatedly updating the cap
data and the driver repeatedly reading it? That is way too complex and
introduces significant downtime.
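A toy Python model of that repeated-read pattern (the window size, field
names, and transfer mechanics here are all hypothetical, not from the spec
or the proposal): moving a large admin-command buffer through a fixed-size
capability window forces one device-update/driver-read round per chunk.

```python
WINDOW = 64  # hypothetical capability window size, in bytes

def transfer_via_window(buf: bytes):
    """Simulate a driver draining a buffer through a capability window
    driven by hypothetical length/remaining_length fields."""
    out = bytearray()
    remaining = len(buf)
    rounds = 0
    while remaining:
        # device updates the window with the next chunk
        chunk = buf[len(out):len(out) + WINDOW]
        out += chunk          # driver reads the window back
        remaining -= len(chunk)
        rounds += 1
    return bytes(out), rounds

data, rounds = transfer_via_window(b"x" * 1000)
# 1000 bytes through a 64-byte window takes 16 rounds, each one a
# device/driver round trip added to the migration downtime.
assert rounds == 16 and data == b"x" * 1000
```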


>
>> Zhu Lingshan (5):
>>    virtio: introduce vq state as basic facility
>>    virtio: introduce SUSPEND bit in device status
>>    virtqueue: constraints for virtqueue state
>>    virtqueue: ignore resetting vqs when SUSPEND
>>    virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>
>>   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>   transport-pci.tex |  18 +++++++
>>   2 files changed, 136 insertions(+)
>>
>> -- 
>> 2.35.3
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-14 11:37   ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-15  4:41     ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-15  4:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/14/2023 7:37 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>> This series introduces
>> 1)a new SUSPEND bit in the device status
>> Which is used to suspend the device, so that the device states
>> and virtqueue states are stabilized.
>>
>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>> and last_used_idx of virtqueues.
>>
>> The main usecase of these new facilities is Live Migration.
>>
>> Future work: dirty page tracking and in-flight descriptors.
>>
>> This series addresses many comments from Jason, Stefan and Eugenio
>> from RFC series.
>
> after going over this in detail, it is like I worried: this
> tries to do too much through a single register and
> the ownership is muddied significantly.
Not sure what ownership you mean; the device is usually STOPPED after the
guest freezes, so the hypervisor owns the device status
and LM facilities at that moment.
>
> I feel a separate capability for suspend/resume that would
> be independent of device status would be preferable.
The implementation of the live migration basic facilities is transport
specific; for PCI:
1) dirty page tracking will have its own capability
2) the in-flight descriptor tracker will have its own capability
3) vq states are stored in the common config space

Only SUSPEND is implemented in the device status, and it is a valid
device status bit. There are already 6 device status bits, and IMHO this
series implementing SUSPEND does not introduce more complexity.
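The six existing status bits plus the proposed SUSPEND(16) can be
sanity-checked with a small snippet; the values below are the bit masks
from the virtio spec's Device Status field, with SUSPEND as proposed by
this series (illustrative only, not normative):

```python
# Device status bit masks in the one-byte device status field.
ACKNOWLEDGE        = 1
DRIVER             = 2
DRIVER_OK          = 4
FEATURES_OK        = 8
SUSPEND            = 16   # proposed here; requires VIRTIO_F_SUSPEND
DEVICE_NEEDS_RESET = 64
FAILED             = 128

bits = [ACKNOWLEDGE, DRIVER, DRIVER_OK, FEATURES_OK,
        SUSPEND, DEVICE_NEEDS_RESET, FAILED]

# All seven bits are distinct powers of two and fit in one byte, so
# SUSPEND occupies the free bit between FEATURES_OK and DEVICE_NEEDS_RESET.
assert len(set(bits)) == 7
assert all(b & (b - 1) == 0 and b < 256 for b in bits)
```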
>
>> Zhu Lingshan (5):
>>    virtio: introduce vq state as basic facility
>>    virtio: introduce SUSPEND bit in device status
>>    virtqueue: constraints for virtqueue state
>>    virtqueue: ignore resetting vqs when SUSPEND
>>    virtio-pci: implement VIRTIO_F_QUEUE_STATE
>>
>>   content.tex       | 118 ++++++++++++++++++++++++++++++++++++++++++++++
>>   transport-pci.tex |  18 +++++++
>>   2 files changed, 136 insertions(+)
>>
>> -- 
>> 2.35.3
>


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-15  2:57       ` [virtio-dev] " Zhu, Lingshan
@ 2023-09-15 11:10         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-15 11:10 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
> > > This patch introduces a new status bit in the device status: SUSPEND.
> > > 
> > > This SUSPEND bit can be used by the driver to suspend a device,
> > > in order to stabilize the device states and virtqueue states.
> > > 
> > > Its main use case is live migration.
> > > 
> > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   content.tex | 31 +++++++++++++++++++++++++++++++
> > >   1 file changed, 31 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 0e492cd..0fab537 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> > >   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> > >     drive the device.
> > > +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
> > > +  device has been suspended by the driver.
> > > +
> > >   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
> > >     an error from which it can't recover.
> > >   \end{description}
> > > @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> > >   recover by issuing a reset.
> > >   \end{note}
> > > +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
> > > +
> > > +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
> > > +
> > >   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
> > >   The device MUST NOT consume buffers or send any used buffer
> > > @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> > >   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
> > >   MUST send a device configuration change notification to the driver.
> > > +The device MUST ignore SUSPEND if FEATURES_OK is not set.
> > > +
> > > +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
> > why? let's just forbid driver from setting it.
> OK
> > 
> > > +
> > > +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
> > > +
> > > +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
> > > +and resumes operation upon DRIVER_OK.
> > > +
> > sorry what?
> In case of a failed or cancelled Live Migration, the device needs to resume
> operation.
> However the spec forbids the driver to clear a device status bit, so
> re-writing
> DRIVER_OK is expected to clear SUSPEND and the device resume operation.

No, DRIVER_OK is already set. Setting a bit that is already set should
not have side effects. In fact auto-clearing suspend is problematic too.


> > 
> > > +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
> > > +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
> > > +
> > > +\begin{itemize}
> > > +\item Stop consuming buffers of any virtqueues and mark all finished descriptors as used.
> > > +\item Wait until all descriptors being processed have finished and mark them as used.
> > > +\item Flush all used buffers and send used buffer notifications to the driver.
> > flush how?
> This is device-type-specific, and we will include tracking of in-flight
> descriptors (buffers) in V2.
> > 
> > > +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
> > 
> > record where?
> This is transport specific, for PCI, patch 5 introduces two new fields for
> avail and used state

they clearly can't store state for all vqs, these are just two 16 bit fields.

> > 
> > > +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
> > pause in what sense? completely?  this does not seem realistic.
> > e.g. pci express link has to stay active or device will die.
> We only pause virtio; I will rephrase the sentence as "pause its virtio
> operation".

that is vague too. for example what happens to link state of
a networking device?

> Others, like the PCI link in the example, are out of the spec's scope and
> we don't need to migrate them.
> > 
> > 
> > also, presumably here it is except a bunch of other fields.
> > e.g. what about queue select and all related queue fields?
> For now they are forbidden.
> 
> As SiWei suggested, we will introduce a new feature bit to control whether
> allowing resetting a VQ after SUSPEND. We can use more feature bits if
> there are requirements to perform anything after SUSPEND. But for now
> they are forbidden.

I don't know what this means, but whatever. You need to make
all this explicit though.

> > 
> > > +\end{itemize}
> > > +
> > >   \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
> > >   Each virtio device offers all the features it understands.  During
> > > @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> > >   	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
> > >   	handling features reserved for future use.
> > > +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
> > > +   SUSPEND the device.
> > > +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
> > > +
> > >   \end{description}
> > >   \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
> > > -- 
> > > 2.35.3
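The suspend handshake debated in this message (the driver sets SUSPEND,
then re-reads the device status until the device, having quiesced its
virtqueues, presents the bit) can be sketched as a minimal Python model.
The class and method names are hypothetical, and the device quiesces
instantly here, which real hardware would not:

```python
FEATURES_OK, SUSPEND = 8, 16

class Device:
    def __init__(self):
        self.status = FEATURES_OK
        self.quiesced = False

    def write_status(self, val):
        # Device quiesces (stops consuming buffers, records vq state)
        # before presenting SUSPEND; modeled as immediate here.
        if val & SUSPEND and self.status & FEATURES_OK:
            self.quiesced = True
            self.status |= SUSPEND

    def read_status(self):
        return self.status

def driver_suspend(dev):
    dev.write_status(dev.read_status() | SUSPEND)
    # "the driver MUST re-read device status to ensure the SUSPEND bit is set"
    while not dev.read_status() & SUSPEND:
        pass
    return True

dev = Device()
assert driver_suspend(dev) and dev.read_status() & SUSPEND
```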




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-15  2:59       ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-15 11:16         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-15 11:16 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
> > > This commit specifies the constraints of the virtqueue state,
> > > and the actions should be taken by the device when SUSPEND
> > > and DRIVER_OK is set
> > > 
> > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   content.tex | 19 +++++++++++++++++++
> > >   1 file changed, 19 insertions(+)
> > > 
> > > diff --git a/content.tex b/content.tex
> > > index 0fab537..9d727ce 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
> > >   When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> > >   is always 0
> > > +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > +
> > > +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
> > > +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
> > > +used index in the used ring.
> > > +
> > > +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > +
> > > +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
> > > +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
> > > +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
> > > +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
> > > +
> > > +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
> > > +the device MUST record the Virtqueue State of every enabled virtqueue
> > > +in \field{Available State} and \field{Used State} respectively,
> > record how?
> This is transport specific; for PCI they are recorded in the common
> config space. Two new fields for them are introduced in patch 5.


that is not enough space to record state for every enabled vq.

> > 
> > > +and correspondingly restore the Virtqueue State of every enabled virtqueue
> > > +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
> > when is that?
> When the driver sets DRIVER_OK; the restore is done before the device
> presents DRIVER_OK.

I don't really understand the flow here. does SUSPEND clear DRIVER_OK
then?


> > 
> > 
> > > +
> > >   \input{admin.tex}
> > >   \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > > -- 
> > > 2.35.3
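The acceptance rule in the normative text quoted above (writes to a
virtqueue's state are honored only when DRIVER_OK is not set, or when both
DRIVER_OK and SUSPEND are set, and are ignored otherwise) can be modeled
as follows; the register class and its names are hypothetical:

```python
DRIVER_OK, SUSPEND = 4, 16

class VQStateReg:
    def __init__(self):
        self.status = 0          # device status byte
        self.last_avail_idx = 0  # recorded virtqueue state

    def write_state(self, idx):
        # Accept only if !DRIVER_OK, or DRIVER_OK together with SUSPEND.
        ok = (not self.status & DRIVER_OK) or \
             (self.status & DRIVER_OK and self.status & SUSPEND)
        if ok:
            self.last_avail_idx = idx

vq = VQStateReg()
vq.write_state(5)                     # before DRIVER_OK: accepted
assert vq.last_avail_idx == 5
vq.status = DRIVER_OK
vq.write_state(9)                     # running, not suspended: ignored
assert vq.last_avail_idx == 5
vq.status = DRIVER_OK | SUSPEND
vq.write_state(9)                     # suspended: accepted
assert vq.last_avail_idx == 9
```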




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:11                                           ` Jason Wang
@ 2023-09-17  5:22                                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:41 AM
> 
> On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > To: Parav Pandit <parav@nvidia.com>
> >
> > > > One can build infinite level of nesting to not do passthrough, at
> > > > the end user
> > > applications remains slow.
> > >
> > > We are talking about nested virtualization but nested emulation. I
> > > won't repeat the definition of virtualization but no matter how much
> > > level of nesting, the hypervisor will try hard to let the
> > > application run natively for most of the time, otherwise it's not the nested
> virtualization at all.
> > >
> > > Nested virtualization has been supported by all major cloud vendors,
> > > please read the relevant documentation for the performance
> > > implications. Virtio community is not the correct place to debate
> > > whether a nest is useful. We need to make sure the datapath could be
> > > assigned to any nest layers without losing any fundamental facilities like
> migration.
> > >
> > I am not debating. You or Lingshan claim or imply that mediation is the only
> way to progress.
> 
> Let me correct your terminology again. It's "trap and emulation". It means the
> workload runs mostly native but sometimes is trapped by the hypervisor.
>
 
> And it's not the only way. It's the start point since all current virtio spec is built
> upon this methodology.
The current spec is not the steering point for defining new methods.
So we will build the spec infra to support passthrough.

Mediation/trap-emulation, where the hypervisor is involved, is a second use case that you are addressing.

Hence, the two are not mutually exclusive,
and we should not debate that anymore.

> 
> > And for sure virtio do not need to live in the dark shadow of mediation always.
> 
> 99% of virtio devices are implemented in this way (which is what you call dark
> and shadow) now.
> 
What I am saying is one should not say mediation/trap-emulation is the only way for virtio.
So let passthrough device migration to progress.

> > For nesting use case sure one can do mediation related mode.
> >
> > So only mediation is not the direction.
> 
> CPU and MMU virtualization were all built in this way.
> 
Not anymore. Both of them have vcpus and viommu where may things are not trapped.
So as I said both has pros and cons and users will pick what fits their need and use case.

> >
> > > > So for such N and M being > 1, one can use software base emulation
> anyway.
> > >
> > > No, only the control path is trapped, the datapath is still passthrough.
> > >
> > Again, it depends on the use case.
> 
> No matter what use case, the definition and methodology of virtualization
> stands still.
> 
I will stop debating this because the core technical question is not answered.
I don’t see a technology available that virtio can utilize to it.
That is interface that can work without messing with device status and flr while device migration is ongoing.
Hence, methodology for passthrough and mediation/trap-emulation is fundamentally different.
And that is just fine.

> >
> > > >
> > > > >
> > > > > And exposing the whole device to the guest drivers will have
> > > > > security implications, your proposal has demonstrated that you
> > > > > need a workaround for
> > > > There is no security implications in passthrough.
> > >
> > > How can you prove this or is it even possible for you to prove this?
> > Huh, when you claim that it is not secure, please point out exactly what is not
> secure.
> > Please take with PCI SIG and file CVE to PCI sig.
> 
> I am saying it has security implications. That is why you need to explain why you
> think it doesn't. What's more, the implications are obviously nothing related to
> PCI SIG but a vendor virtio hardware implementation.
> 
PCI passthough for virtio member devices and non virtio devices with P2P, and their interaction is already there in the VM.
Device migration is not adding/removing anything, nor touching any security aspect of it.
Because it does not need to it either.
Device migration is making sure that it continue to exists.

> >
> > > You expose all device details to guests (especially the transport
> > > specific details), the attack surface is increased in this way.
> > One can say it is the opposite.
> > Attack surface is increased in hypervisor due to mediation poking at
> everything controlled by the guest.
> >
> 
> We all know such a stack has been widely used for decades. But you want to say
> your new stack is much more secure than this?
> 
It can be yes, because it exposes all necessary things defined in the virtio spec boundary today.
And not involving hypervisor in core device operation.

> >
> > >
> > > What's more, a simple passthrough may lose the chance to workaround
> > > hardware erratas and you will finally get back to the trap and emulation.
> > Hardware errata's is not the starting point to build the software stack and
> spec.
> 
> It's not the starting point. But it's definitely something that needs to be
> considered, go and see kernel codes (especially the KVM part) and you will get
> the answer.
> 
There are kernels which cannot be updated in field today in Nvidia cloud shipped by Redhat's OS variant.

So it is invalid assumption that somehow data path does not have bug, but large part of the control plane has bug, hence it should be done in software...

> > What you imply is, one must never use vfio stack, one must not use vcpu
> acceleration and everything must be emulated.
> 
> Do I say so? Trap and emulation is the common methodology used in KVM and
> VFIO. And if you want to replace it with a complete passthrough, you need to
> prove your method can work.
> 
Please review patches. I do not plan to _replace_ is either.
Those users who want to use passthrough, can use passthrough with major traps+emulation on FLR, device_status, cvq, avq and without implementing AQ on every single member device.
And those users who prefer trap+emualation can use that.

> >
> > Same argument of hardware errata applied to data path too.
> 
> Anything makes datapath different? Xen used to fallback to shadow page tables
> to workaround hardware TDP errata in the past.
> 
> > One should not implement in hw...
> >
> > I disagree with such argument.
> 
> It's not my argument.
> 
You claimed that to overcome hw errata, one should use trap_emulation, somehow only for portion of the functionality.
And rest portion of the functionality does not have hw errata, hence hw should be use (for example for data path). :)

> >
> > You can say nesting is requirement for some use cases, so spec should support
> it without blocking the passthrough mode.
> > Then it is fair discussion.
> >
> > I will not debate further on passthrough vs control path mediation as
> either_or approach.
> >
> > >
> > > >
> > > > > FLR at least.
> > > > It is actually the opposite.
> > > > FLR is supported with the proposal without any workarounds and
> mediation.
> > >
> > > It's an obvious drawback but not an advantage. And it's not a must
> > > for live migration to work. You need to prove the FLR doesn't
> > > conflict with the live migration, and it's not only FLR but also all the other
> PCI facilities.
> > I don’t know what you mean by prove. It is already clear from the proposal
> FLR is not messing with rest of the device migration infrastructure.
> > You should read [1].
> 
> I don't think you answered my question in that thread.
> 
Please ask the question in that series if any, because there is no FLR, device reset interaction in passthrough between owner and member device.

> >
> > > one other
> > > example is P2P and what's the next? As more features were added to
> > > the PCI spec, you will have endless work in auditing the possible
> > > conflict with the passthrough based live migration.
> > >
> > This drawback equally applies to mediation route where one need to do more
> than audit where the mediation layer to be extended.
> 
> No, for trap and emulation we don't need to do that. We only do datapath
> assignments.
> 
It is required, because also such paths to be audited and extended as without it the feature does not visible to the guest.

> > So each method has its pros and cons. One suits one use case, other suits
> other use case.
> > Therefore, again attempting to claim that only mediation approach is the only
> way to progress is incorrect.
> 
> I never say things like this, it is your proposal that mandates migration with
> admin commands. Could you please read what is proposed in this series
> carefully?
> 
Admin commands are split from the AQ so one can use the admin commands inband as well.
Though, I don’t see how it can functionality work without mediation.
This is the key technical difference between two approaches.

> On top of this series, you can build your amd commands easily. But there's
> nothing that can be done on top of your proposal.
> 
I don’t see what more to be done on top of our proposal.
If you hint nesting, than it can be done through a peer admin device to delete such admin role.

> >
> > In fact audit is still better than mediation because most audits are read only
> work as opposed to endlessly extending trapping and adding support in core
> stack.
> 
> One reality that you constantly ignore is that such trapping and device models
> have been widely used by a lot of cloud vendors for more than a decade.
> 
It may be but, it is not the only option.

> > Again, it is a choice that user make with the tradeoff.
> >
> > > >
> > > > >
> > > > > For non standard device we don't have choices other than
> > > > > passthrough, but for standard devices we have other choices.
> > > >
> > > > Passthrough is basic requirement that we will be fulfilling.
> > >
> > > It has several drawbacks that I would not like to repeat. We all
> > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > >
> > Sure. Both has pros and cons.
> > And both can co-exist.
> 
> I don't see how it can co-exist with your proposal. I can see how admin
> commands can co-exist on top of this series.
> 
The reason to me both has difficulty is because both are solving different problem.
And they can co-exist as two different methods to two different problems.

> >
> > > > If one wants to do special nesting, may be, there.
> > >
> > > Nesting is not special. Go and see how it is supported by major
> > > cloud vendors and you will get the answer. Introducing an interface
> > > in virtio that is hard to be virtualized is even worse than writing
> > > a compiler that can not do bootstrap compilation.
> > We checked with more than two major cloud vendors and passthrough suffice
> their use cases and they are not doing nesting.
> > And other virtio vendor would also like to support native devices. So again,
> please do not portray that nesting is the only thing and passthrough must not be
> done.
> 
> Where do I say passthrough must not be done? I'm saying you need to justify
> your proposal instead of simply saying "hey, you are wrong".
> 
I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended, is wrong, and it should not be done.

> Again, nesting is not the only issue, the key point is that it's partial and not self
> contained.

Admin commands are self-contained to the owner device.
They are not self contained in the member device, because it cannot be. Self containment cannot work with device reset, flr, dma flow.
Self containment requires mediation or renamed trap+emulation; which is the anti-goal of passtrough.
And I am very interested if you can show how admin commands can work with device reset, flr flow WITHOUT mediation approach.
Lingshan so far didn’t answer this.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-17  5:22                                             ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:41 AM
> 
> On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > To: Parav Pandit <parav@nvidia.com>
> >
> > > > One can build infinite level of nesting to not do passthrough, at
> > > > the end user
> > > applications remains slow.
> > >
> > > We are talking about nested virtualization, not nested emulation. I
> > > won't repeat the definition of virtualization, but no matter how many
> > > levels of nesting, the hypervisor will try hard to let the
> > > application run natively for most of the time, otherwise it's not the nested
> virtualization at all.
> > >
> > > Nested virtualization has been supported by all major cloud vendors,
> > > please read the relevant documentation for the performance
> > > implications. Virtio community is not the correct place to debate
> > > whether a nest is useful. We need to make sure the datapath could be
> > > assigned to any nest layers without losing any fundamental facilities like
> migration.
> > >
> > I am not debating. You or Lingshan claim or imply that mediation is the only
> way to progress.
> 
> Let me correct your terminology again. It's "trap and emulation". It means the
> workload runs mostly native but sometimes is trapped by the hypervisor.
>
 
> And it's not the only way. It's the start point since all current virtio spec is built
> upon this methodology.
The current spec is not the steering point for defining new methods.
So we will build the spec infrastructure to support passthrough.

Mediation/trap-emulation, where the hypervisor is involved, is the second use case, which you are addressing.

Hence, the two are not mutually exclusive, and we should not debate that anymore.

> 
> > And for sure virtio does not need to live in the dark shadow of mediation always.
> 
> 99% of virtio devices are implemented in this way (which is what you call dark
> and shadow) now.
> 
What I am saying is that one should not say mediation/trap-emulation is the only way for virtio.
So let passthrough device migration progress.

> > For nesting use case sure one can do mediation related mode.
> >
> > So only mediation is not the direction.
> 
> CPU and MMU virtualization were all built in this way.
> 
Not anymore. Both of them have vCPUs and a vIOMMU where many things are not trapped.
So, as I said, both have pros and cons, and users will pick what fits their needs and use case.

> >
> > > > So for such N and M being > 1, one can use software-based emulation
> anyway.
> > >
> > > No, only the control path is trapped, the datapath is still passthrough.
> > >
> > Again, it depends on the use case.
> 
> No matter what use case, the definition and methodology of virtualization
> stands still.
> 
I will stop debating this because the core technical question is not answered.
I don't see a technology available that virtio can utilize for this:
an interface that can work without messing with device status and FLR while device migration is ongoing.
Hence, the methodologies for passthrough and mediation/trap-emulation are fundamentally different.
And that is just fine.

> >
> > > >
> > > > >
> > > > > And exposing the whole device to the guest drivers will have
> > > > > security implications, your proposal has demonstrated that you
> > > > > need a workaround for
> > > > There are no security implications in passthrough.
> > >
> > > How can you prove this or is it even possible for you to prove this?
> > Huh, when you claim that it is not secure, please point out exactly what is not
> secure.
> > Please take it up with the PCI-SIG and file a CVE there.
> 
> I am saying it has security implications. That is why you need to explain why you
> think it doesn't. What's more, the implications are obviously nothing related to
> PCI SIG but a vendor virtio hardware implementation.
> 
PCI passthrough for virtio member devices and non-virtio devices with P2P, and their interaction, is already there in the VM.
Device migration is not adding/removing anything, nor touching any security aspect of it, because it does not need to.
Device migration is making sure that it continues to exist.

> >
> > > You expose all device details to guests (especially the transport
> > > specific details), the attack surface is increased in this way.
> > One can say it is the opposite.
> > Attack surface is increased in hypervisor due to mediation poking at
> everything controlled by the guest.
> >
> 
> We all know such a stack has been widely used for decades. But you want to say
> your new stack is much more secure than this?
> 
It can be, yes, because it exposes only the necessary things defined within the virtio spec boundary today,
and it does not involve the hypervisor in core device operation.

> >
> > >
> > > What's more, a simple passthrough may lose the chance to workaround
> > > hardware erratas and you will finally get back to the trap and emulation.
> > Hardware errata are not the starting point to build the software stack and
> spec.
> 
> It's not the starting point. But it's definitely something that needs to be
> considered, go and see kernel codes (especially the KVM part) and you will get
> the answer.
> 
There are kernels in the field today, in the Nvidia cloud, shipped as a Red Hat OS variant, which cannot be updated.

So it is an invalid assumption that somehow the data path has no bugs, but a large part of the control plane does, and hence it should be done in software...

> > What you imply is, one must never use vfio stack, one must not use vcpu
> acceleration and everything must be emulated.
> 
> Do I say so? Trap and emulation is the common methodology used in KVM and
> VFIO. And if you want to replace it with a complete passthrough, you need to
> prove your method can work.
> 
Please review the patches. I do not plan to _replace_ it either.
Those users who want passthrough can use passthrough, without major traps+emulation on FLR, device_status, cvq, avq, and without implementing an AQ on every single member device.
And those users who prefer trap+emulation can use that.

> >
> > Same argument of hardware errata applied to data path too.
> 
> Anything makes datapath different? Xen used to fallback to shadow page tables
> to workaround hardware TDP errata in the past.
> 
> > One should not implement in hw...
> >
> > I disagree with such argument.
> 
> It's not my argument.
> 
You claimed that to overcome hw errata, one should use trap+emulation, somehow only for a portion of the functionality.
And the rest of the functionality does not have hw errata, hence hw should be used (for example, for the data path). :)

> >
> > You can say nesting is a requirement for some use cases, so the spec should support
> it without blocking the passthrough mode.
> > Then it is fair discussion.
> >
> > I will not debate further on passthrough vs control path mediation as
> either_or approach.
> >
> > >
> > > >
> > > > > FLR at least.
> > > > It is actually the opposite.
> > > > FLR is supported with the proposal without any workarounds and
> mediation.
> > >
> > > It's an obvious drawback but not an advantage. And it's not a must
> > > for live migration to work. You need to prove the FLR doesn't
> > > conflict with the live migration, and it's not only FLR but also all the other
> PCI facilities.
> > I don’t know what you mean by prove. It is already clear from the proposal
> FLR is not messing with rest of the device migration infrastructure.
> > You should read [1].
> 
> I don't think you answered my question in that thread.
> 
Please ask the question in that series, if any, because there is no FLR or device reset interaction between the owner and member device in passthrough.

> >
> > > one other
> > > example is P2P and what's the next? As more features were added to
> > > the PCI spec, you will have endless work in auditing the possible
> > > conflict with the passthrough based live migration.
> > >
> > This drawback equally applies to the mediation route, where one needs to do more
> than audit wherever the mediation layer is to be extended.
> 
> No, for trap and emulation we don't need to do that. We only do datapath
> assignments.
> 
It is required, because such paths also need to be audited and extended; without that, the feature is not visible to the guest.

> > So each method has its pros and cons. One suits one use case, other suits
> other use case.
> > Therefore, again attempting to claim that only mediation approach is the only
> way to progress is incorrect.
> 
> I never say things like this, it is your proposal that mandates migration with
> admin commands. Could you please read what is proposed in this series
> carefully?
> 
Admin commands are split from the AQ, so one can use the admin commands in-band as well.
Though, I don't see how that can functionally work without mediation.
This is the key technical difference between the two approaches.

> On top of this series, you can build your admin commands easily. But there's
> nothing that can be done on top of your proposal.
> 
I don't see what more is to be done on top of our proposal.
If you are hinting at nesting, then it can be done through a peer admin device to delegate such an admin role.

> >
> > In fact audit is still better than mediation because most audits are read only
> work as opposed to endlessly extending trapping and adding support in core
> stack.
> 
> One reality that you constantly ignore is that such trapping and device models
> have been widely used by a lot of cloud vendors for more than a decade.
> 
It may be, but it is not the only option.

> > Again, it is a choice that user make with the tradeoff.
> >
> > > >
> > > > >
> > > > > For non standard device we don't have choices other than
> > > > > passthrough, but for standard devices we have other choices.
> > > >
> > > > Passthrough is basic requirement that we will be fulfilling.
> > >
> > > It has several drawbacks that I would not like to repeat. We all
> > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > >
> > Sure. Both have pros and cons.
> > And both can co-exist.
> 
> I don't see how it can co-exist with your proposal. I can see how admin
> commands can co-exist on top of this series.
> 
The reason to me both have difficulty is because they are solving different problems.
And they can co-exist as two different methods for two different problems.

> >
> > > > If one wants to do special nesting, may be, there.
> > >
> > > Nesting is not special. Go and see how it is supported by major
> > > cloud vendors and you will get the answer. Introducing an interface
> > > in virtio that is hard to be virtualized is even worse than writing
> > > a compiler that can not do bootstrap compilation.
> > We checked with more than two major cloud vendors and passthrough suffices
> for their use cases; they are not doing nesting.
> > And other virtio vendors would also like to support native devices. So again,
> please do not portray that nesting is the only thing and passthrough must not be
> done.
> 
> Where do I say passthrough must not be done? I'm saying you need to justify
> your proposal instead of simply saying "hey, you are wrong".
> 
I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended is wrong and should not be done.

> Again, nesting is not the only issue, the key point is that it's partial and not self
> contained.

Admin commands are self-contained to the owner device.
They are not self-contained in the member device, because they cannot be: self-containment cannot work with the device reset, FLR, and DMA flows.
Self-containment requires mediation, or, renamed, trap+emulation, which is the anti-goal of passthrough.
And I am very interested if you can show how admin commands can work with the device reset and FLR flows WITHOUT a mediation approach.
Lingshan so far didn't answer this.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:08                                   ` Jason Wang
@ 2023-09-17  5:22                                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:39 AM
> 
> On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:15 AM
> >
> > [..]
> > > > > > [1]
> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/ms
> > > > > > g000
> > > > > > 61.h
> > > > > > tml
> > > > >
> > > > > The series works for stateless devices. Before we introduce
> > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > the device context doesn't make much sense right now.
> > > > The series works for stateful devices too. The device context covers it.
> > >
> > > How? Can it be used for migrating any existing stateful devices?
> > > Don't we need to define what context means for a specific stateful
> > > device before you can introduce things like device context? Please
> > > go through the archives for the relevant discussions (e.g
> > > virtio-FS), it's not as simple as introducing a device context API.
> > >
> > A device will have its own context for example RSS definition, or flow filters
> tomorrow.
> 
> If you know there are things that are missing when posting the patches, please
> use the RFC tag.
> 
It is not missing. They are optional, which is why they are not needed in this series.

> > The device context will be extended post the first series.
> >
> > > And what's more, how can it handle the migration compatibility?
> > It will be taken care in follow on as we all know that this to be checked.
> 
> You don't even mention it anywhere in your series.
> 
Migration compatibility is a topic in itself, regardless of the device migration series.
It is part of the feature provisioning phase, which is needed regardless.
Just as you and Lingshan wanted to keep the suspend bit series small and logical, the device migration series is also logically split by functionality.
I don't see a need to mention long-known missing functionality that is common to both approaches.

> > I will include the notes of future follow up work items in v1, which will be
> taken care of after this series.
> >
> > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > to work. It can be done via platform facilities or even
> > > > > software. And to make it more efficient, it needs to utilize
> > > > > transport facilities instead of a
> > > general one.
> > > > >
> > > > It is also optional in the spec proposal.
> > > > Most platforms claimed are not able to do efficiently either,
> > >
> > > Most platforms are working towards an efficient way. But we are
> > > talking about different things, hardware based dirty page logging is
> > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> to log dirty pages.
> > >
> > I also said the same, that hw-based dirty page logging is not a must. :) One
> > day the hw MMU will be able to track everything efficiently. I have not seen it
> happening yet.
> 
> How do you define efficiency? KVM uses page fault and most modern IOMMU
> support PRI now.
>
One cannot define PRI as a mandatory feature. In our research and experiments we see that PRI is significantly slower at handling page faults.
But that is a different topic...

Efficiency is defined by the downtime of the multiple devices in a VM.
And leading OSes allowed device advancement by letting the device report dirty pages in a CPU- and platform-agnostic way...

One can use a post-copy approach as well; the current device migration is built around the established pre-copy approach.
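For readers less familiar with the pre-copy approach mentioned above, here is a minimal sketch of a pre-copy loop driven by a device-reported dirty-page log. `DirtyTracker`, `precopy_migrate`, and `STOP_THRESHOLD` are invented names for illustration only; they are not part of the virtio spec or the series under discussion:

```python
STOP_THRESHOLD = 4  # switch to stop-and-copy once this few pages stay dirty


class DirtyTracker:
    """Stand-in for a device's dirty-page log (hypothetical interface)."""

    def __init__(self):
        self._dirty = set()

    def mark(self, pfn):
        # device reports a write to guest page frame `pfn`
        self._dirty.add(pfn)

    def read_and_clear(self):
        # hypervisor drains the log once per pre-copy round
        drained, self._dirty = self._dirty, set()
        return drained


def precopy_migrate(pages, tracker, max_rounds=10):
    copied = dict(pages)  # round 0: copy everything once
    for _ in range(max_rounds):
        dirty = tracker.read_and_clear()
        for pfn in dirty:  # re-copy only what the device touched
            copied[pfn] = pages[pfn]
        if len(dirty) <= STOP_THRESHOLD:
            break  # dirty set small enough: suspend device and finish
    return copied
```

The downtime Parav refers to corresponds to the final round: the device is suspended only once the remaining dirty set is small.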

> >
> > > > hence the vfio subsystem added the support for it.
> > >
> > > As an open standard, if it is designed for a specific software
> > > subsystem on a specific OS, it's a failure.
> > >
> > It is not.
> > One needs to accept that, in certain areas, virtio is following the trail of
> advancement already made in the sw stack.
> > So the virtio spec advancement fits in to supply such use cases.
> > And blocking such advancement of the virtio spec to promote an only-mediation
> approach is not good either.
> >
> > BTW: One can say the mediation approach is also designed for specific
> software subsystem and hence failure.
> > I will stay away from quoting it, as I don’t see it this way.
> 
> The proposal is based on well known technology since the birth of virtualization.
Sure, but that does not change the fact that such a series is also targeted at a specific software subsystem...
And hence, by that logic, a failure.

I didn't say that; I said the opposite: yes, since virtio is in catch-up mode, it is defining the interface so that it can fit into these OS platforms.
Mostly multiple of them, all of which support passthrough devices.

> I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> you?
> 
It does trap and emulate the PCI config space, but not virtio interfaces like queues, the virtio config space and more, for passthrough devices.

> >
> > > >
> > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > method and how it conflicts with live migration and complicates
> > > > > the device
> > > implementation.
> > > > Huh, it shows the opposite.
> > > > It shows that both will seamlessly work.
> > >
> > > Have you even tried your proposal with a prototype device?
> > Of course, it was delivered to users 1.5 years ago, before bringing it to the
> spec with virtio-net and virtio-blk devices.
> 
> I hope this is your serious answer, but it looks like it is not. Your proposal misses
> a lot of states as I pointed out in another thread, how can it work in fact?
> 
Which states?
What is posted in series [1] is the minimal, basic required set of items; the optional ones are omitted, as they can be done incrementally.
Lingshan had a hard time digesting the basics of the P2P and dirty page tracking work in this short series.
So there is no point in pushing a large part of the device context and making the series blurry.
It will be done incrementally afterwards.

> > > >
> > > > > And it means you need to audit all PCI features and do
> > > > > workaround if there're any possible issues (or using a whitelist).
> > > > No need for any of this.
> > >
> > > You need to prove this otherwise it's fragile. It's the duty of the
> > > author to justify not the reviewer.
> > >
> > One cannot post, nor review, a giant series in one go.
> > Hence the work is to be split on a logical boundary.
> > Feature provisioning, PCI layout etc. are secondary tasks to take care of.
> 
> Again, if you know something is missing, you need to explain it in the series
> instead of waiting for some reviewers to point it out and say it's well-known
> afterwards.
> 
The patch set cannot be a laundry list of items missing in the virtio spec.
It is short and focused on device migration.

> >
> > > For example FLR is required to be done in 100ms. How could you
> > > achieve this during the live migration? How does it affect the downtime and
> FRS?
> > >
> > Good technical question to discuss instead of passthrough vs
> > mediation. :)
> >
> > Device administration work is separate from the device operational part.
> > The device context records the current device state; when an FLR
> occurs, the device stops all operations.
> > And on the next read of the device context, the post-FLR context is returned.
> 
> Firstly, you didn't explain how it affects the live migration, for example, what
> happens if we try to migrate while FLR is ongoing.
> Secondly, you ignore the other two questions.
> 
> Let's save the time of both.
> 
There is nothing to explain about device reset and live migration, because there are absolutely no touch points.
device_status is just another register like the rest of them.
One does not need to poke around registers when doing passthrough.
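The claimed semantics (FLR stops device operation; the next administrative context read simply returns the freshly reset context, with no interaction between the two flows) can be sketched as a toy model. `MemberDevice`, `RESET_CONTEXT`, and the context fields are hypothetical names for illustration, not anything defined by the virtio spec or the proposal:

```python
# post-reset state: status cleared, no virtqueue progress
RESET_CONTEXT = {"device_status": 0, "vq_states": {}}


class MemberDevice:
    def __init__(self):
        # a running device: some status bits set, one vq with progress
        self._ctx = {"device_status": 0x0F,
                     "vq_states": {0: {"last_avail_idx": 5}}}

    def flr(self):
        # operational event: stop everything, reset device-visible state
        self._ctx = {"device_status": RESET_CONTEXT["device_status"],
                     "vq_states": dict(RESET_CONTEXT["vq_states"])}

    def read_context(self):
        # administrative read: returns whatever the context is *now*
        return {"device_status": self._ctx["device_status"],
                "vq_states": dict(self._ctx["vq_states"])}
```

In this model a migration flow that reads the context after an FLR just observes the reset state; whether that is sufficient for the 100 ms FLR budget Jason raises is exactly the open question in this thread.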

> >
> > > >
> > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > don't use simple passthrough we don't need to care about this.
> > > > >
> > > > Exactly, we are migrating virtio device for the PCI transport.
> > >
> > > No, the migration facility is a general requirement for all transport.
> > It is for all transport. One can extend when do for MMIO.
> 
> By using admin commands? It cannot perform well for registers.
> 
Yes, admin commands using an AQ on an MMIO-based owner device will also be just fine.

> >
> > > Starting from a PCI specific (actually your proposal does not even
> > > cover all even for PCI) solution which may easily end up with issues in other
> transports.
> > >
> > Like?
> 
> The admin command/virtqueue itself may not work well for other transport.
> That's the drawback of your proposal while this proposal doesn't do any
> coupling.
> 
There is no coupling of admin commands with a virtqueue in the spec, as Michael has consistently insisted.
And in my proposal, too, there is no such coupling.

> >
> > > Even if you want to migrate virtio for PCI,  please at least read
> > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > that a lot of things are missing in your proposal.
> > >
> > Device context is something that will be extended.
> > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> transport.
> 
> This is just one mini stuff, how about PCI config space and others?
> 
No need to migrate the PCI config space, because migration is of the virtio device, and not the underlying transport.
Therefore, one can migrate from virtio member device to a fully software based device as well and vis versa.

> Again, please read Qemu codes, a lot of things are missing in your proposal
> now. If everything is fine to do passthrough based live migration, I'm pretty sure
> you need more than what Qemu has since it can only do a small fraction of the
> whole PCI.
> 
I will read.
Many of the pieces may be implemented by the device over time following the charter.

> >
> > > > As usual, if you have to keep arguing about not doing
> > > > passhthrough, we are
> > > surely past that point.
> > >
> > > Who is "we"?
> > >
> > We = You and me.
> > From 2021, you keep objecting that passthrough must not be done.
> 
> This is a big misunderstanding, you need to justify it or at least address the
> concerns from any reviewer.
> 
They are getting addressed, if you have comments, please post those comments in the actual series.
I wouldn’t diverge to discuss in different series here.

> > And blocking the work done by other technical committee members to
> improve the virtio spec to make that happen is simply wrong.
> 
> It's unrealistic to think that one will be 100% correct. Justify your proposal or
> why I was wrong instead of ignoring my questions and complaining. That is why
> we need a community. If it doesn't work, virtio provides another process for
> convergence.
> 
I am not expecting you to be correct at all. I totally agree that you may miss something, I may miss something.
And this is why I repeatedly, humbly ask to converge and jointly address the passthrough mode without trap+emulation method.
The way I understood from your comment is, passthrough for hw based device must not be done and multiple of hw vendors disagree to it.

> >
> > > Is something like what you said here passed the vote and written to
> > > the spec?
> > Not only me.
> > The virtio technical committee has agreed for nested and hardware-based
> implementation _both_.
> >
> > " hardware-based implementations" is part of the virtio specification charter
> with ballot of [1].
> >
> > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> 
> Let's don't do conceptual shifts, I was asking the passthrough but you give me
> the hardware implementation.
> 
Passthrough devices implemented by hw which does dirty tracking and following the spec.

> >
> > And passthrough hardware-based device is in the charter that we strive to
> support.
> >
> > > We all know the current virtio spec is not built upon passthrough.
> >
> > This efforts improve the passthrough hw based implementation that should
> not be blocked.
> 
> Your proposal was posted only for several days and you think I would block that
> just because I asked several questions and some of them are not answered?
> 
If I misunderstood, then I am sorry.
Lets progress and improve the passthrough use case without trap+emulation.
Trap+emulation=mediation is also a valid solution for nested case.
And I frankly see a need for both as both are solving a different problem.
Trap+emulation cannot achieve passthrough mode, hence my request was not to step on each other.

When both can use the common infra, it is good to do that, when they cannot, due to the technical challenges of underlying transport, they should evolve differently.

> >
> > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > >
> > > It's not the mediation, we're not doing vDPA, the device model we
> > > had in hardware and we present to guests are all virtio devices.
> > > It's the trap and emulation which is fundamental in the world of
> > > virtualization for the past decades. It's the model we used to
> > > virtualize standard devices. If you want to debate this methodology, virtio
> community is clearly the wrong forum.
> > >
> > I am not debating it at all. You keep bringing up the point of mediation.
> >
> > The proposal of [1] is clear that wants to do hardware based passthrough
> devices with least amount of virtio level mediation.
> >
> > So somewhere mode of virtualizing has been used, that’s fine, it can
> > continue with full virtualization, mediation,
> >
> > And also hardware based passthrough device.
> >
> > > >
> > > > Series [1] will be enhanced further to support virtio passthrough
> > > > device for
> > > device context and more.
> > > > Even further we like to extend the support.
> > > >
> > > > > Since the functionality proposed in this series focus on the
> > > > > minimal set of the functionality for migration, it is virtio
> > > > > specific and self contained so nothing special is required to work in the
> nest.
> > > >
> > > > Maybe it is.
> > > >
> > > > Again, I repeat and like to converge the admin commands between
> > > passthrough and non-passthrough cases.
> > >
> > > You need to prove at least that your proposal can work for the
> > > passthrough before we can try to converge.
> > >
> > What do you mean by "prove"? virtio specification development is not proof
> based method.
> 
> For example, several of my questions were ignored.
> 
I didn’t ignore, but if I miss, I will answer.

> >
> > If you want to participate, please review the patches and help community to
> improve.
> 
> See above.
> 
> >
> > > > If we can converge it is good.
> > > > If not both modes can expand.
> > > > It is not either or as use cases are different.
> > >
> > > Admin commands are not the cure for all, I've stated drawbacks in
> > > other threads. Not repeating it again here.
> > He he, sure, I am not attempting to cure all.
> > One solution does not fit all cases.
> 
> Then why do you want to couple migration with admin commands?
> 
Because of following.
1. A device migration needs to bulk data transfer, this is something cannot be done with tiny registers.
Cannot be done through registers, because
a. registers are slow for bidirectional communication
b. do not scale well with scale of VFs

> > Admin commands are used to solve the specific problem for which the AQ is
> designed for.
> >
> > One can make argument saying take pci fabric to 10 km distance, don’t bring
> new virtio tcp transport...
> >
> > Drawing boundaries around virtio spec in certain way only makes it further
> inferior. So please do not block advancements bring in [1].
> 
> As a reviewer, I ask questions but some of them are ignored, do you expect the
> reviewer to figure out by themselves?  
Sure, please review.

Many of them were not questions, but assertion and conclusions that it does not fit nested.. and sub-optional etc.

> 
> > We really would like to make it more robust with your rich experience and
> inputs, if you care to participate.
> 
> We can collaborate for sure: as I pointed out in another threads, from what I
> can see from the both proposals of the current version:
> 
> I see a good opportunity to build your admin commands proposal on top of this
> proposal. Or it means, we can focus on what needs to be migrated first:
> 
> 1) queue state
This is just one small part of the device context
So once a device context is read/written, it covers q.

> 2) inflight descriptors
Same a q state, it is part of the device context.

> 3) dirty pages (optional)
> 4) device state(context) (optional)
>
It is same as #1 and #2.
Splitting them from #1 and #2 is not needed.

We can extend the device context to be selectively queried for nested case..
 
> I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> interface to access those facilities? This is how this series is structured.
> 
> And what's more, admin commands or transport specific interfaces. And when
> we invent admin commands, you may realize you are inventing a new transport
> which is the idea of transport via admin commands.

Not really. it is not a new transport at all.
I explained you before when you quote is as transport, it must carry the driver notifications as well..
Otherwise it is just set of commands..

The new commands are self contained anyway of [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-17  5:22                                     ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:39 AM
> 
> On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:15 AM
> >
> > [..]
> > > > > > [1]
> > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/ms
> > > > > > g000
> > > > > > 61.h
> > > > > > tml
> > > > >
> > > > > The series works for stateless devices. Before we introduce
> > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > the device context doesn't make much sense right now.
> > > > The series works for stateful devices too. The device context covers it.
> > >
> > > How? Can it be used for migrating any existing stateful devices?
> > > Don't we need to define what context means for a specific stateful
> > > device before you can introduce things like device context? Please
> > > go through the archives for the relevant discussions (e.g
> > > virtio-FS), it's not as simple as introducing a device context API.
> > >
> > A device will have its own context for example RSS definition, or flow filters
> tomorrow.
> 
> If you know there are things that are missing when posting the patches, please
> use the RFC tag.
> 
It is not missing. They are optional, which is why they are not needed in this series.

> > The device context will be extended post the first series.
> >
> > > And what's more, how can it handle the migration compatibility?
> > It will be taken care in follow on as we all know that this to be checked.
> 
> You don't even mention it anywhere in your series.
> 
Migration compatibility is a topic in itself, regardless of the device migration series.
It is part of the feature provisioning phase, which is needed regardless.
Just as you and Lingshan wanted to keep the suspend bit series small and logical, the device migration series is also logically split by functionality.
I don’t see a need to mention functionality that is long known to be missing and is common to both approaches.

> > I will include the notes of future follow up work items in v1, which will be
> taken care post this series.
> >
> > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > to work. It can be done via platform facilities or even
> > > > > software. And to make it more efficient, it needs to utilize
> > > > > transport facilities instead of a
> > > general one.
> > > > >
> > > > It is also optional in the spec proposal.
> > > > Most platforms claimed are not able to do efficiently either,
> > >
> > > Most platforms are working towards an efficient way. But we are
> > > talking about different things, hardware based dirty page logging is
> > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> to log dirty pages.
> > >
> > I also said same, that hw based dirty page logging is not must. :) One
> > day hw mmu will be able to track everything efficiently. I have not seen it
> happening yet.
> 
> How do you define efficiency? KVM uses page fault and most modern IOMMU
> support PRI now.
>
One cannot define PRI as a mandatory feature. In our research and experiments we see that PRI is significantly slower at handling page faults.
Yet that is a different topic...

Efficiency is defined by the downtime of the multiple devices in a VM.
And leading OSes have allowed device advancements by letting the device report dirty pages in a cpu- and platform-agnostic way...

One can use a post-copy approach as well; the current device migration is built around the established pre-copy approach.
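
As a rough illustration of the device-reported, platform-agnostic dirty tracking discussed above, a bitmap with one bit per guest page is one common scheme; the names and the 4 KiB page size below are assumptions for this sketch, not part of any posted proposal:

```c
/* Sketch: device-side dirty page tracking via a per-page bitmap.
 * The hypervisor reads the bitmap to decide which pages to re-send
 * during pre-copy. Names and page size are illustrative only. */
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages assumed */

/* Mark the page containing guest physical address 'gpa' as dirty. */
static void mark_dirty(uint8_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    bitmap[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Query whether the page containing 'gpa' has been written. */
static int is_dirty(const uint8_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    return (bitmap[pfn / 8] >> (pfn % 8)) & 1;
}
```

Any write anywhere within a page marks the whole page, which is why both halves of the same 4 KiB page report dirty together.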

> >
> > > > hence the vfio subsystem added the support for it.
> > >
> > > As an open standard, if it is designed for a specific software
> > > subsystem on a specific OS, it's a failure.
> > >
> > It is not.
> > One need accept that, in certain areas virtio is following the trails of
> advancement already done in sw stack.
> > So that virtio spec advancement fits in to supply such use cases.
> > And blocking such advancement of virtio spec to promote only_mediation
> approach is not good either.
> >
> > BTW: One can say the mediation approach is also designed for specific
> software subsystem and hence failure.
> > I will stay away from quoting it, as I don’t see it this way.
> 
> The proposal is based on well known technology since the birth of virtualization.
Sure, but that does not change the fact that such a series is also targeted at a specific software subsystem.
And hence, by that logic, a failure.

I didn’t say that; I said the opposite: yes, since virtio is in catch-up mode, it is defining the interface so that it can fit into these OS platforms.
Mostly multiple of them, which all support passthrough devices.

> I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> you?
> 
It does trap and emulation for the PCI config space, not for virtio interfaces like queues, config space and more, for passthrough devices.

> >
> > > >
> > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > method and how it conflicts with live migration and complicates
> > > > > the device
> > > implementation.
> > > > Huh, it shows the opposite.
> > > > It shows that both will seamlessly work.
> > >
> > > Have you even tried your proposal with a prototype device?
> > Of course, it is delivered to user for 1.5 years ago before bringing it to the
> spec with virtio-net and virtio-blk devices.
> 
> I hope this is your serious answer, but it looks like it is not. Your proposal misses
> a lot of states as I pointed out in another thread, how can it work in fact?
> 
Which states?
What is posted in series [1] is the minimal, base set of required items; optional ones are omitted as they can be done incrementally.
Lingshan had a hard time digesting the basics of the P2P and dirty page tracking work in this short series.
So there is no point in pushing a large part of the device context and making the series blurry.
It will be done incrementally, subsequently.

> > > >
> > > > > And it means you need to audit all PCI features and do
> > > > > workaround if there're any possible issues (or using a whitelist).
> > > > No need for any of this.
> > >
> > > You need to prove this otherwise it's fragile. It's the duty of the
> > > author to justify not the reviewer.
> > >
> > One cannot post patches and nor review giant series in one go.
> > Hence the work to be split on a logical boundary.
> > Features provisioning, pci layout etc is secondary tasks to take care of.
> 
> Again, if you know something is missing, you need to explain it in the series
> instead of waiting for some reviewers to point it out and say it's well-known
> afterwards.
> 
The patch set cannot be a laundry list of items missing in the virtio spec.
It is short and focused on device migration.

> >
> > > For example FLR is required to be done in 100ms. How could you
> > > achieve this during the live migration? How does it affect the downtime and
> FRS?
> > >
> > Good technical question to discuss instead of passthrough vs
> > mediation. :)
> >
> > Device administration work is separate from the device operational part.
> > The device context records what is the current device context, when the FLR
> occurs, the device stops all the operations.
> > And on next read of the device context the FLRed context is returned.
> 
> Firstly, you didn't explain how it affects the live migration, for example, what
> happens if we try to migrate while FLR is ongoing.
> Secondly, you ignore the other two questions.
> 
> Let's save the time of both.
> 
There is nothing to explain about device reset and live migration, because there are absolutely no touch points.
device_status is just another register like the rest of them.
One does not need to poke around registers when doing passthrough.

> >
> > > >
> > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > don't use simple passthrough we don't need to care about this.
> > > > >
> > > > Exactly, we are migrating virtio device for the PCI transport.
> > >
> > > No, the migration facility is a general requirement for all transport.
> > It is for all transport. One can extend when do for MMIO.
> 
> By using admin commands? It can not perform well for registered.
> 
Yes, admin commands using AQ on MMIO based owner device will also be just fine.

> >
> > > Starting from a PCI specific (actually your proposal does not even
> > > cover all even for PCI) solution which may easily end up with issues in other
> transports.
> > >
> > Like?
> 
> The admin command/virtqueue itself may not work well for other transport.
> That's the drawback of your proposal while this proposal doesn't do any
> coupling.
> 
There is no coupling of admin commands with the virtqueue in the spec, as Michael has consistently insisted.
And in my proposal also there is no such coupling.

> >
> > > Even if you want to migrate virtio for PCI,  please at least read
> > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > that a lot of things are missing in your proposal.
> > >
> > Device context is something that will be extended.
> > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> transport.
> 
> This is just one mini stuff, how about PCI config space and others?
> 
No need to migrate the PCI config space, because migration is of the virtio device, and not the underlying transport.
Therefore, one can migrate from a virtio member device to a fully software-based device as well, and vice versa.

> Again, please read Qemu codes, a lot of things are missing in your proposal
> now. If everything is fine to do passthrough based live migration, I'm pretty sure
> you need more than what Qemu has since it can only do a small fraction of the
> whole PCI.
> 
I will read.
Many of the pieces may be implemented by the device over time following the charter.

> >
> > > > As usual, if you have to keep arguing about not doing
> > > > passhthrough, we are
> > > surely past that point.
> > >
> > > Who is "we"?
> > >
> > We = You and me.
> > From 2021, you keep objecting that passthrough must not be done.
> 
> This is a big misunderstanding, you need to justify it or at least address the
> concerns from any reviewer.
> 
They are getting addressed; if you have comments, please post them in the actual series.
I wouldn’t diverge into discussing a different series here.

> > And blocking the work done by other technical committee members to
> improve the virtio spec to make that happen is simply wrong.
> 
> It's unrealistic to think that one will be 100% correct. Justify your proposal or
> why I was wrong instead of ignoring my questions and complaining. That is why
> we need a community. If it doesn't work, virtio provides another process for
> convergence.
> 
I am not expecting you to be 100% correct at all. I totally agree that you may miss something, and I may miss something.
And this is why I repeatedly, humbly ask to converge and jointly address the passthrough mode without the trap+emulation method.
The way I understood your comment is that passthrough for hw-based devices must not be done, and multiple hw vendors disagree with that.

> >
> > > Is something like what you said here passed the vote and written to
> > > the spec?
> > Not only me.
> > The virtio technical committee has agreed for nested and hardware-based
> implementation _both_.
> >
> > " hardware-based implementations" is part of the virtio specification charter
> with ballot of [1].
> >
> > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> 
> Let's don't do conceptual shifts, I was asking the passthrough but you give me
> the hardware implementation.
> 
Passthrough devices implemented by hw which does dirty tracking and follows the spec.

> >
> > And passthrough hardware-based device is in the charter that we strive to
> support.
> >
> > > We all know the current virtio spec is not built upon passthrough.
> >
> > This efforts improve the passthrough hw based implementation that should
> not be blocked.
> 
> Your proposal was posted only for several days and you think I would block that
> just because I asked several questions and some of them are not answered?
> 
If I misunderstood, then I am sorry.
Let's progress and improve the passthrough use case without trap+emulation.
Trap+emulation (mediation) is also a valid solution for the nested case.
And I frankly see a need for both, as they are solving different problems.
Trap+emulation cannot achieve passthrough mode, hence my request not to step on each other.

When both can use common infra, it is good to do that; when they cannot, due to the technical challenges of the underlying transport, they should evolve differently.

> >
> > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > >
> > > It's not the mediation, we're not doing vDPA, the device model we
> > > had in hardware and we present to guests are all virtio devices.
> > > It's the trap and emulation which is fundamental in the world of
> > > virtualization for the past decades. It's the model we used to
> > > virtualize standard devices. If you want to debate this methodology, virtio
> community is clearly the wrong forum.
> > >
> > I am not debating it at all. You keep bringing up the point of mediation.
> >
> > The proposal of [1] is clear that wants to do hardware based passthrough
> devices with least amount of virtio level mediation.
> >
> > So somewhere mode of virtualizing has been used, that’s fine, it can
> > continue with full virtualization, mediation,
> >
> > And also hardware based passthrough device.
> >
> > > >
> > > > Series [1] will be enhanced further to support virtio passthrough
> > > > device for
> > > device context and more.
> > > > Even further we like to extend the support.
> > > >
> > > > > Since the functionality proposed in this series focus on the
> > > > > minimal set of the functionality for migration, it is virtio
> > > > > specific and self contained so nothing special is required to work in the
> nest.
> > > >
> > > > Maybe it is.
> > > >
> > > > Again, I repeat and like to converge the admin commands between
> > > passthrough and non-passthrough cases.
> > >
> > > You need to prove at least that your proposal can work for the
> > > passthrough before we can try to converge.
> > >
> > What do you mean by "prove"? virtio specification development is not proof
> based method.
> 
> For example, several of my questions were ignored.
> 
I didn’t ignore them, but if I missed any, I will answer.

> >
> > If you want to participate, please review the patches and help community to
> improve.
> 
> See above.
> 
> >
> > > > If we can converge it is good.
> > > > If not both modes can expand.
> > > > It is not either or as use cases are different.
> > >
> > > Admin commands are not the cure for all, I've stated drawbacks in
> > > other threads. Not repeating it again here.
> > He he, sure, I am not attempting to cure all.
> > One solution does not fit all cases.
> 
> Then why do you want to couple migration with admin commands?
> 
Because of the following:
1. Device migration needs bulk data transfer, which cannot be done with tiny registers, because:
a. registers are slow for bidirectional communication
b. registers do not scale well with the number of VFs
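
The scaling point can be made concrete with back-of-the-envelope arithmetic; the 4-byte register window below is an assumed figure for illustration only, not a spec value:

```c
/* Sketch: cost of moving one device context through a tiny MMIO
 * register window versus one admin queue (AQ) command whose
 * descriptor points at a DMA buffer. Illustrative numbers only. */
#include <stdint.h>

#define REG_WINDOW_BYTES 4 /* assumed register window size */

/* Number of MMIO accesses to move ctx_bytes through the window. */
static uint64_t mmio_accesses(uint64_t ctx_bytes)
{
    return (ctx_bytes + REG_WINDOW_BYTES - 1) / REG_WINDOW_BYTES;
}

/* With an AQ, one command transfers the whole context via DMA,
 * so the per-VF doorbell/completion cost stays constant. */
static uint64_t aq_commands(uint64_t ctx_bytes)
{
    (void)ctx_bytes;
    return 1;
}
```

A hypothetical 64 KiB context costs 16384 register accesses per VF through a 4-byte window, and that cost multiplies with the VF count, while the AQ path stays at one command per context transfer.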

> > Admin commands are used to solve the specific problem for which the AQ is
> designed for.
> >
> > One can make argument saying take pci fabric to 10 km distance, don’t bring
> new virtio tcp transport...
> >
> > Drawing boundaries around virtio spec in certain way only makes it further
> inferior. So please do not block advancements bring in [1].
> 
> As a reviewer, I ask questions but some of them are ignored, do you expect the
> reviewer to figure out by themselves?  
Sure, please review.

Many of them were not questions, but assertions and conclusions that it does not fit the nested case, is sub-optimal, etc.

> 
> > We really would like to make it more robust with your rich experience and
> inputs, if you care to participate.
> 
> We can collaborate for sure: as I pointed out in another threads, from what I
> can see from the both proposals of the current version:
> 
> I see a good opportunity to build your admin commands proposal on top of this
> proposal. Or it means, we can focus on what needs to be migrated first:
> 
> 1) queue state
This is just one small part of the device context.
So once a device context is read/written, it covers the q state.

> 2) inflight descriptors
Same as q state, it is part of the device context.

> 3) dirty pages (optional)
> 4) device state(context) (optional)
>
It is the same as #1 and #2.
Splitting them from #1 and #2 is not needed.

We can extend the device context to be selectively queried for the nested case.
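
One way to picture a device context that subsumes the queue state (#1) and inflight descriptors (#2) is a TLV container; the type and field names below are invented for illustration and are not taken from the posted series:

```c
/* Sketch: a hypothetical TLV-framed device context. Each entry
 * carries a type, a length, and a payload, so new state (RSS, flow
 * filters, ...) can be added later and queried selectively. */
#include <stdint.h>

struct ctx_tlv_hdr {
    uint16_t type;   /* hypothetical: VQ_STATE, INFLIGHT_DESC, ... */
    uint16_t rsvd;
    uint32_t length; /* bytes of payload that follow */
};

/* One possible VQ_STATE payload: the indices needed to resume a
 * split virtqueue after migration. */
struct ctx_vq_state {
    uint16_t vq_index;
    uint16_t last_avail_idx;
    uint16_t last_used_idx;
    uint16_t rsvd;
};
```

With such framing, a nested or mediating layer could read only the TLVs it cares about instead of the whole context.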
 
> I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> interface to access those facilities? This is how this series is structured.
> 
> And what's more, admin commands or transport specific interfaces. And when
> we invent admin commands, you may realize you are inventing a new transport
> which is the idea of transport via admin commands.

Not really, it is not a new transport at all.
I explained to you before that for something to qualify as a transport, it must carry the driver notifications as well.
Otherwise it is just a set of commands.

The new commands of [1] are self-contained anyway.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:11                                                                 ` Jason Wang
@ 2023-09-17  5:25                                                                   ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:41 AM
> 
> On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Wednesday, September 13, 2023 9:51 AM
> >
> > > we plan to implement a self-contain solution
> > Make sure that works with device reset and FLR.
> 
> We don't need to do that. It's out of the spec.
> 
It is not. For a PCI member device, it needs to work reliably.
Not doing so means it relies on trap+emulation, hence it just cannot be complete.
And that is ok with me.
I just won't claim that trap+emulation is a _complete_ method.

> > And if not, explain that it is for mediation mode related tricks.
> 
> It's not the tricks and again, it's not mediation but trap and emulation. It's the
> fundamental methodology used in virtualization, so does the virtio spec.

Not the virtio spec of 2023, and even more so for new features.
The base for virtio spec 1.x was 0.9.5, not QEMU or other mediation-based software, AFAIK.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-14  3:12                       ` Jason Wang
@ 2023-09-17  5:29                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:42 AM
> 
> On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> >
> > > It's not about how many states in a single state machine, it's about
> > > how many state machines that exist for device status. Having more
> > > than one creates big obstacles and complexity in the device. You
> > > need to define the interaction of each state otherwise you leave undefined
> behaviours.
> > The device mode has zero relation to the device status.
> 
> You will soon get this issue when you want to do nesting.
> 
I don’t think so. One needs to intercept it when one wants to do trap+emulation, which seems to fulfill the nesting use case.

> > It does not mess with it at all.
> > In fact the new bits in device status is making it more complex for the device
> to handle.
> 
> Are you challenging the design of the device status? It's definitely too late to do
> this.
> 
No. I am saying that extending device_status with yet another state is equally complex, and it is the core of the device.

> This proposal increases just one bit and that worries you? Or you think one
> more state is much more complicated than a new state machine with two
> states?

It is a mode and not a state. And two modes are needed for supporting a P2P device.
When one wants to do mediation, two states are also needed there.

The key is that the modes do not interact with device_status, because device_status is just another virtio register.
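
The distinction being argued, status bits versus an out-of-band mode, can be sketched as follows; the standard status bit values are from the virtio spec, while the SUSPEND bit position and the mode enum are assumptions for illustration:

```c
/* Sketch: device_status as a register of orthogonal bits, with a
 * hypothetical migration mode tracked outside of it. Standard bit
 * values are per the virtio spec; SUSPEND's position is assumed. */
#include <stdint.h>

enum {
    VIRTIO_STATUS_ACKNOWLEDGE = 1,
    VIRTIO_STATUS_DRIVER      = 2,
    VIRTIO_STATUS_DRIVER_OK   = 4,
    VIRTIO_STATUS_FEATURES_OK = 8,
    VIRTIO_STATUS_NEEDS_RESET = 64,
    VIRTIO_STATUS_FAILED      = 128,
    /* Proposed bit; position chosen for illustration only: */
    VIRTIO_STATUS_SUSPEND     = 16,
};

/* Hypothetical migration mode, deliberately not a status bit. */
enum mig_mode { MIG_MODE_ACTIVE = 0, MIG_MODE_STOPPED = 1 };

struct dev {
    uint8_t device_status;  /* the usual virtio register */
    enum mig_mode mode;     /* orthogonal administrative state */
};

static int is_suspended(const struct dev *d)
{
    return (d->device_status & VIRTIO_STATUS_SUSPEND) != 0;
}
```

In this sketch the mode changes never touch device_status, which is the separation the reply above is describing.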

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-17  5:29                         ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:29 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 14, 2023 8:42 AM
> 
> On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Wednesday, September 13, 2023 10:14 AM
> >
> > > It's not about how many states in a single state machine, it's about
> > > how many state machines that exist for device status. Having more
> > > than one creates big obstacles and complexity in the device. You
> > > need to define the interaction of each state otherwise you leave undefined
> behaviours.
> > The device mode has zero relation to the device status.
> 
> You will soon get this issue when you want to do nesting.
> 
I don’t think so. One needs to intercept it when one wants to do trap+emulation, which seems to fulfil the nesting use case.

> > It does not mess with it at all.
> > In fact the new bits in device status is making it more complex for the device
> to handle.
> 
> Are you challenging the design of the device status? It's definitely too late to do
> this.
> 
No. I am saying that extending device_status with yet another state is equally complex, and it is the core of the device.

> This proposal increases just one bit and that worries you? Or you think one
> more state is much more complicated than a new state machine with two
> states?

It is a mode, not a state. And two modes are needed to support P2P devices.
When one wants to do mediation, two states are also needed.

The key is that the modes do not interact with device_status, because device_status is just another virtio register.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-15  4:28   ` [virtio-dev] " Zhu, Lingshan
@ 2023-09-17  5:32     ` Parav Pandit
  2023-09-18  3:10       ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-17  5:32 UTC (permalink / raw)
  To: Zhu, Lingshan, virtio-dev


> From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
> Behalf Of Zhu, Lingshan
> Sent: Friday, September 15, 2023 9:59 AM
> 
> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> >> This series introduces
> >> 1)a new SUSPEND bit in the device status Which is used to suspend the
> >> device, so that the device states and virtqueue states are
> >> stabilized.
> >>
> >> 2)virtqueue state and its accessor, to get and set last_avail_idx and
> >> last_used_idx of virtqueues.
> >>
> >> The main usecase of these new facilities is Live Migration.
> >>
> >> Future work: dirty page tracking and in-flight descriptors.
> >> This series addresses many comments from Jason, Stefan and Eugenio
> >> from RFC series.
> > Compared to Parav's patchset this is much less functional.
> we will add dirty page tracking and an in-flight IO tracker in V2; then it will be a
> full-featured LM solution.
> 
> They are not in this series because we want this series to be small and focused.
> >
> > Assuming that one goes in, can't we add ability to submit admin
> > commands through MMIO on the device itself and be done with it?
> I am not sure. IMHO, if we use the admin vq as the back-end for MMIO-based live
> migration, then the issues in the admin vq still exist, for example:
> 1)nested virtualization
> 2)bare-metal live migration
> 3)QOS
> 4)introducing more attack surfaces.
> 
#4 is just random without.
#3 There is no QoS issue with admin commands and queues. If you claim that, then the whole virtio spec based on virtqueues is broken.
And that is certainly not the case.

> And what's more, if we want to implement a new capability on behalf of the
> admin vq, does the capability need to store at least one descriptor buffer, that is,
> should the capability length be at least the maximum buffer length?
> 
> If that is not possible, do we need to implement extra fields like length and
> remaining_length, with the device repeatedly updating the cap data and the
> driver repeatedly reading it? That is way too complex and introduces significant downtime.
> 
At least I do not understand. Maybe you can explain more?

We observe less than 300 msec of downtime using admin commands over the admin queue in internal tests in the virtio area.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-15 11:10         ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-18  2:56           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  2:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/15/2023 7:10 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>>>> This patch introduces a new status bit in the device status: SUSPEND.
>>>>
>>>> This SUSPEND bit can be used by the driver to suspend a device,
>>>> in order to stabilize the device states and virtqueue states.
>>>>
>>>> Its main use case is live migration.
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    content.tex | 31 +++++++++++++++++++++++++++++++
>>>>    1 file changed, 31 insertions(+)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 0e492cd..0fab537 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>      drive the device.
>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
>>>> +  device has been suspended by the driver.
>>>> +
>>>>    \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>>>      an error from which it can't recover.
>>>>    \end{description}
>>>> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    recover by issuing a reset.
>>>>    \end{note}
>>>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>>>> +
>>>> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
>>>> +
>>>>    \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>>>    The device MUST NOT consume buffers or send any used buffer
>>>> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>>>    MUST send a device configuration change notification to the driver.
>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>>>> +
>>>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
>>> why? let's just forbid driver from setting it.
>> OK
>>>> +
>>>> +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
>>>> +
>>>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
>>>> +and resumes operation upon DRIVER_OK.
>>>> +
>>> sorry what?
>> In case of a failed or cancelled Live Migration, the device needs to resume
>> operation.
>> However, the spec forbids the driver from clearing a device status bit, so
>> re-writing DRIVER_OK is expected to clear SUSPEND so that the device resumes
>> operation.
> No, DRIVER_OK is already set. Setting a bit that is already set should
> not have side effects. In fact auto-clearing suspend is problematic too.
The spec says: Set the DRIVER_OK status bit. At this point the device is 
“live”.

So semantically DRIVER_OK can bring the device to “live” even from SUSPEND.

In the implementation, the device can check whether SUSPEND is set, then
decide what to do. Just don't ignore DRIVER_OK if it is already
set, and the driver should not clear a device status bit.
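To make that concrete, here is a minimal device-side sketch of the status-write handling being described; the structure, bit macros, and function names are illustrative only, not taken from the spec:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits; SUSPEND (16) is the bit proposed in this series. */
#define STATUS_FEATURES_OK 8u
#define STATUS_DRIVER_OK   4u
#define STATUS_SUSPEND     16u

struct vdev {
    uint8_t status;
    int     suspend_negotiated;  /* VIRTIO_F_SUSPEND accepted */
};

/* Device-side handler for a driver write to device_status: SUSPEND is
 * ignored unless the feature was negotiated and FEATURES_OK is set, and
 * a (re-)write of DRIVER_OK while suspended clears SUSPEND so the
 * device resumes operation. */
static void write_status(struct vdev *d, uint8_t val)
{
    if ((val & STATUS_SUSPEND) && !(d->status & STATUS_SUSPEND) &&
        (!d->suspend_negotiated || !(d->status & STATUS_FEATURES_OK)))
        val &= ~STATUS_SUSPEND;      /* ignore the SUSPEND request */
    if ((val & STATUS_DRIVER_OK) && (d->status & STATUS_SUSPEND))
        val &= ~STATUS_SUSPEND;      /* DRIVER_OK resumes the device */
    d->status = val;
}
```

Note how a rewrite of an already-set DRIVER_OK is the only way the driver can leave SUSPEND without clearing any status bit itself.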

>
>
>>>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>>>> +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
>>>> +
>>>> +\begin{itemize}
>>>> +\item Stop consuming buffers of any virtqueues and mark all finished descriptors as used.
>>>> +\item Wait for all descriptors that are being processed to finish, and mark them as used.
>>>> +\item Flush all used buffers and send used buffer notifications to the driver.
>>> flush how?
>> This is device-type-specific, and we will include tracking inflight
>> descriptors(buffers) in V2.
>>>> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
>>> record where?
>> This is transport-specific; for PCI, patch 5 introduces two new fields for
>> the avail and used state
> they clearly can't store state for all vqs, these are just two 16 bit fields.
The vq state fields can work with queue_select like other vq fields.
I will document this in the comment.
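For illustration, here is a sketch of how two 16-bit fields can serve every virtqueue by being multiplexed through queue_select; the structure and names are hypothetical, not the actual virtio_pci_common_cfg layout:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VQS 4

/* Internal per-virtqueue state the device tracks. */
struct vq_state {
    uint16_t last_avail_idx;
    uint16_t last_used_idx;
};

/* A single pair of 16-bit registers in the common configuration can
 * cover every virtqueue when multiplexed through queue_select, just
 * like queue_size and the other per-queue fields. */
struct common_cfg {
    uint16_t queue_select;
    struct vq_state vq[MAX_VQS];
};

static uint16_t read_avail_state(struct common_cfg *c)
{
    return c->vq[c->queue_select].last_avail_idx;
}

static void write_avail_state(struct common_cfg *c, uint16_t v)
{
    c->vq[c->queue_select].last_avail_idx = v;
}
```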
>
>>>> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
>>> pause in what sense? completely?  this does not seem realistic.
>>> e.g. pci express link has to stay active or device will die.
>> only pause virtio, I will rephrase the sentence as "pause its virtio
>> operation".
> that is vague too. for example what happens to link state of
> a networking device?
Then how about we say: pause operation in both data-path and control-path?

Or do you have any suggestion?
>
>> Others, like the PCI link in the example, are out of the spec and we don't need
>> to migrate them.
>>>
>>> also, presumably here it is except a bunch of other fields.
>>> e.g. what about queue select and all related queue fields?
>> For now they are forbidden.
>>
>> As SiWei suggested, we will introduce a new feature bit to control whether
>> allowing resetting a VQ after SUSPEND. We can use more feature bits if
>> there are requirements to perform anything after SUSPEND. But for now
>> they are forbidden.
> I don't know what this means, but whatever. You need to make
> all this explicit though.
A new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature
bit has been negotiated, then the device allows resetting a vq after SUSPEND.
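As a sketch of the intended gating (the feature-bit name comes from the discussion above; the structure and everything else here are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUSPEND 16u
#define MAX_VQS 4

struct vdev {
    uint8_t status;
    bool    ring_suspend_reset;   /* VIRTIO_F_RING_SUSPEND_RESET negotiated */
    bool    vq_enabled[MAX_VQS];
};

/* A queue reset request arriving after SUSPEND is honoured only when
 * the new feature bit was negotiated; otherwise the device ignores it
 * while suspended. Returns whether the reset took effect. */
static bool vq_reset(struct vdev *d, unsigned q)
{
    if ((d->status & STATUS_SUSPEND) && !d->ring_suspend_reset)
        return false;             /* forbidden: device is suspended */
    d->vq_enabled[q] = false;     /* tear the queue down */
    return true;
}
```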
>
>>>> +\end{itemize}
>>>> +
>>>>    \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>>>>    Each virtio device offers all the features it understands.  During
>>>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>>    	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>>>>    	handling features reserved for future use.
>>>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
>>>> +   SUSPEND the device.
>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>>>> +
>>>>    \end{description}
>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>> -- 
>>>> 2.35.3


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
@ 2023-09-18  2:56           ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  2:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/15/2023 7:10 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>>>> This patch introduces a new status bit in the device status: SUSPEND.
>>>>
>>>> This SUSPEND bit can be used by the driver to suspend a device,
>>>> in order to stabilize the device states and virtqueue states.
>>>>
>>>> Its main use case is live migration.
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    content.tex | 31 +++++++++++++++++++++++++++++++
>>>>    1 file changed, 31 insertions(+)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 0e492cd..0fab537 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>      drive the device.
>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, indicates that the
>>>> +  device has been suspended by the driver.
>>>> +
>>>>    \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>>>      an error from which it can't recover.
>>>>    \end{description}
>>>> @@ -73,6 +76,10 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    recover by issuing a reset.
>>>>    \end{note}
>>>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>>>> +
>>>> +When setting SUSPEND, the driver MUST re-read \field{device status} to ensure the SUSPEND bit is set.
>>>> +
>>>>    \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>>>    The device MUST NOT consume buffers or send any used buffer
>>>> @@ -82,6 +89,26 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>>>    MUST send a device configuration change notification to the driver.
>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>>>> +
>>>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not negotiated.
>>> why? let's just forbid driver from setting it.
>> OK
>>>> +
>>>> +The device SHOULD allow settings to \field{device status} even when SUSPEND is set.
>>>> +
>>>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device SHOULD clear SUSPEND
>>>> +and resumes operation upon DRIVER_OK.
>>>> +
>>> sorry what?
>> In case of a failed or cancelled Live Migration, the device needs to resume
>> operation.
>> However, the spec forbids the driver from clearing a device status bit, so
>> re-writing DRIVER_OK is expected to clear SUSPEND so that the device resumes
>> operation.
> No, DRIVER_OK is already set. Setting a bit that is already set should
> not have side effects. In fact auto-clearing suspend is problematic too.
The spec says: Set the DRIVER_OK status bit. At this point the device is 
“live”.

So semantically DRIVER_OK can bring the device to “live” even from SUSPEND.

In the implementation, the device can check whether SUSPEND is set, then
decide what to do. Just don't ignore DRIVER_OK if it is already
set, and the driver should not clear a device status bit.

>
>
>>>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>>>> +the device SHOULD perform the following actions before presenting SUSPEND bit in the \field{device status}:
>>>> +
>>>> +\begin{itemize}
>>>> +\item Stop consuming buffers of any virtqueues and mark all finished descriptors as used.
>>>> +\item Wait for all descriptors that are being processed to finish, and mark them as used.
>>>> +\item Flush all used buffers and send used buffer notifications to the driver.
>>> flush how?
>> This is device-type-specific, and we will include tracking inflight
>> descriptors(buffers) in V2.
>>>> +\item Record Virtqueue State of each enabled virtqueue, see section \ref{sec:Virtqueues / Virtqueue State}
>>> record where?
>> This is transport-specific; for PCI, patch 5 introduces two new fields for
>> the avail and used state
> they clearly can't store state for all vqs, these are just two 16 bit fields.
The vq state fields can work with queue_select like other vq fields.
I will document this in the comment.
>
>>>> +\item Pause its operation except \field{device status} and preserve configurations in its Device Configuration Space, see \ref{sec:Basic Facilities of a Virtio Device / Device Configuration Space}
>>> pause in what sense? completely?  this does not seem realistic.
>>> e.g. pci express link has to stay active or device will die.
>> only pause virtio, I will rephrase the sentence as "pause its virtio
>> operation".
> that is vague too. for example what happens to link state of
> a networking device?
Then how about we say: pause operation in both data-path and control-path?

Or do you have any suggestion?
>
>> Others, like the PCI link in the example, are out of the spec and we don't need
>> to migrate them.
>>>
>>> also, presumably here it is except a bunch of other fields.
>>> e.g. what about queue select and all related queue fields?
>> For now they are forbidden.
>>
>> As SiWei suggested, we will introduce a new feature bit to control whether
>> allowing resetting a VQ after SUSPEND. We can use more feature bits if
>> there are requirements to perform anything after SUSPEND. But for now
>> they are forbidden.
> I don't know what this means, but whatever. You need to make
> all this explicit though.
A new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature
bit has been negotiated, then the device allows resetting a vq after SUSPEND.
>
>>>> +\end{itemize}
>>>> +
>>>>    \section{Feature Bits}\label{sec:Basic Facilities of a Virtio Device / Feature Bits}
>>>>    Each virtio device offers all the features it understands.  During
>>>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>>>    	\ref{devicenormative:Basic Facilities of a Virtio Device / Feature Bits} for
>>>>    	handling features reserved for future use.
>>>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the driver can
>>>> +   SUSPEND the device.
>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device Status Field}.
>>>> +
>>>>    \end{description}
>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved Feature Bits}
>>>> -- 
>>>> 2.35.3


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-15 11:16         ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-18  3:02           ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>>>> This commit specifies the constraints of the virtqueue state,
>>>> and the actions should be taken by the device when SUSPEND
>>>> and DRIVER_OK is set
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    content.tex | 19 +++++++++++++++++++
>>>>    1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 0fab537..9d727ce 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>>>    When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>>>    is always 0
>>>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
>>>> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
>>>> +used index in the used ring.
>>>> +
>>>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>>>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>>>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>>>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
>>>> +the device MUST record the Virtqueue State of every enabled virtqueue
>>>> +in \field{Available State} and \field{Used State} respectively,
>>> record how?
>> This is transport-specific; for PCI they are recorded in the common config
>> space, and two new fields for them are introduced in patch 5.
>
> that is not enough space to record state for every enabled vq.
They can work with queue_select like many other vq configurations.
I will mention this in the comment.
>
>>>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>>>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
>>> when is that?
>> When the driver sets DRIVER_OK, and it is done before the device presents
>> DRIVER_OK.
> I don't really understand the flow here. does SUSPEND clear DRIVER_OK
> then?
SUSPEND does not clear DRIVER_OK; I think this is not a must.
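A rough sketch of the record/restore flow being described, with hypothetical names and without the enabled-queue filtering the spec text requires:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STATUS_DRIVER_OK 4u
#define STATUS_SUSPEND   16u
#define MAX_VQS 2

struct vq_state { uint16_t last_avail_idx, last_used_idx; };

struct vdev {
    uint8_t status;
    struct vq_state live[MAX_VQS];   /* internal operational state */
    struct vq_state saved[MAX_VQS];  /* driver-visible Available/Used State */
};

/* On SUSPEND the device snapshots every queue's indices into the
 * driver-visible state fields; writing DRIVER_OK while suspended
 * restores them (the driver may have rewritten them on a destination
 * device) and clears SUSPEND, without SUSPEND ever clearing DRIVER_OK. */
static void set_suspend(struct vdev *d)
{
    memcpy(d->saved, d->live, sizeof d->saved);
    d->status |= STATUS_SUSPEND;
}

static void set_driver_ok(struct vdev *d)
{
    if (d->status & STATUS_SUSPEND) {
        memcpy(d->live, d->saved, sizeof d->live);
        d->status &= ~STATUS_SUSPEND;
    }
    d->status |= STATUS_DRIVER_OK;
}
```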
>
>
>>>
>>>> +
>>>>    \input{admin.tex}
>>>>    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>> -- 
>>>> 2.35.3
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
@ 2023-09-18  3:02           ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  3:02 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>>>> This commit specifies the constraints of the virtqueue state,
>>>> and the actions should be taken by the device when SUSPEND
>>>> and DRIVER_OK is set
>>>>
>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    content.tex | 19 +++++++++++++++++++
>>>>    1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 0fab537..9d727ce 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>>>    When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>>>    is always 0
>>>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
>>>> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
>>>> +used index in the used ring.
>>>> +
>>>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>>>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>>>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>>>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>>>> +
>>>> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
>>>> +the device MUST record the Virtqueue State of every enabled virtqueue
>>>> +in \field{Available State} and \field{Used State} respectively,
>>> record how?
>> This is transport-specific; for PCI they are recorded in the common config
>> space, and two new fields for them are introduced in patch 5.
>
> that is not enough space to record state for every enabled vq.
They can work with queue_select like many other vq configurations.
I will mention this in the comment.
>
>>>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>>>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
>>> when is that?
>> When the driver sets DRIVER_OK, and it is done before the device presents
>> DRIVER_OK.
> I don't really understand the flow here. does SUSPEND clear DRIVER_OK
> then?
SUSPEND does not clear DRIVER_OK; I think this is not a must.
>
>
>>>
>>>> +
>>>>    \input{admin.tex}
>>>>    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>> -- 
>>>> 2.35.3
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-17  5:32     ` Parav Pandit
@ 2023-09-18  3:10       ` Zhu, Lingshan
  2023-09-18  4:32         ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  3:10 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev



On 9/17/2023 1:32 PM, Parav Pandit wrote:
>> From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On
>> Behalf Of Zhu, Lingshan
>> Sent: Friday, September 15, 2023 9:59 AM
>>
>> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>>>> This series introduces
>>>> 1)a new SUSPEND bit in the device status Which is used to suspend the
>>>> device, so that the device states and virtqueue states are
>>>> stabilized.
>>>>
>>>> 2)virtqueue state and its accessor, to get and set last_avail_idx and
>>>> last_used_idx of virtqueues.
>>>>
>>>> The main usecase of these new facilities is Live Migration.
>>>>
>>>> Future work: dirty page tracking and in-flight descriptors.
>>>> This series addresses many comments from Jason, Stefan and Eugenio
>>>> from RFC series.
>>> Compared to Parav's patchset this is much less functional.
>> we will add dirty page tracking and an in-flight IO tracker in V2; then it will be a
>> full-featured LM solution.
>>
>> They are not in this series because we want this series to be small and focused.
>>> Assuming that one goes in, can't we add ability to submit admin
>>> commands through MMIO on the device itself and be done with it?
>> I am not sure. IMHO, if we use the admin vq as the back-end for MMIO-based live
>> migration, then the issues in the admin vq still exist, for example:
>> 1)nested virtualization
>> 2)bare-metal live migration
>> 3)QOS
>> 4)introducing more attack surfaces.
>>
> #4 is just random without.
I failed to process "random without".

If you expect the admin vq to perform live migration, it can certainly be a 
side-channel attack surface, for example:
a) a malicious SW can stop the device from running
b) a malicious SW can sniff guest memory by tracking guest dirty pages, 
then speculate about guest operations and steal secrets.
> #3 There is no QoS issue with admin commands and queues. If you claim that, then the whole virtio spec based on virtqueues is broken.
> And that is certainly not the case.
Please do not confuse the concepts and purposes of the data queues and 
the admin vq.

For data queues, they can be slow without mq or rss; that means 
performance overhead, but they can still work.
For the admin vq, if it doesn't meet QOS requirements, it fails to migrate guests.

I have replied to the same question so many times, and this is the last 
time.
>
>> And what's more, if we want to implement a new capability on behalf of the
>> admin vq, does the capability need to store at least one descriptor buffer, that is,
>> should the capability length be at least the maximum buffer length?
>>
>> If that is not possible, do we need to implement extra fields like length and
>> remaining_length, with the device repeatedly updating the cap data and the
>> driver repeatedly reading it? That is way too complex and introduces significant downtime.
>>
> At least I do not understand. Maybe you can explain more?
Just try to consider how an MMIO BAR cap can fit max_buffer_size.
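To illustrate the concern, here is a sketch of such a windowed capability with hypothetical offset/remaining fields; each loop iteration stands in for an extra config-space round trip:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAP_WINDOW 8  /* fixed-size capability data window (hypothetical) */

/* A PCI capability cannot be sized to hold an arbitrarily large buffer,
 * so the driver would have to page through it using offset/remaining
 * fields, one window per round trip. All names are illustrative. */
struct mig_cap {
    uint32_t offset;             /* driver-written read position */
    uint32_t remaining;          /* device-updated bytes still pending */
    uint8_t  window[CAP_WINDOW];
};

/* Device side: expose the window at the current offset. */
static uint32_t cap_refill(struct mig_cap *c, const uint8_t *state,
                           uint32_t total)
{
    uint32_t n = total - c->offset;
    if (n > CAP_WINDOW)
        n = CAP_WINDOW;
    memcpy(c->window, state + c->offset, n);
    c->remaining = total - c->offset - n;
    return n;
}

/* Driver side: repeated reads until the device reports nothing left;
 * every iteration is an additional round trip, which is where the
 * complexity and downtime concern comes from. */
static uint32_t cap_drain(struct mig_cap *c, const uint8_t *state,
                          uint32_t total, uint8_t *out)
{
    uint32_t got = 0, n;
    do {
        n = cap_refill(c, state, total);
        memcpy(out + got, c->window, n);
        got += n;
        c->offset += n;
    } while (c->remaining);
    return got;
}
```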
>
> We observe less than 300 msec of downtime using admin commands over the admin queue in internal tests in the virtio area.
still QOS




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  3:10       ` Zhu, Lingshan
@ 2023-09-18  4:32         ` Parav Pandit
  2023-09-18  5:21           ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-18  4:32 UTC (permalink / raw)
  To: Zhu, Lingshan, virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 8:40 AM
> 
> On 9/17/2023 1:32 PM, Parav Pandit wrote:
> >> From: virtio-dev@lists.oasis-open.org
> >> <virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
> >> Sent: Friday, September 15, 2023 9:59 AM
> >>
> >> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> >>>> This series introduces
> >>>> 1)a new SUSPEND bit in the device status Which is used to suspend
> >>>> the device, so that the device states and virtqueue states are
> >>>> stabilized.
> >>>>
> >>>> 2)virtqueue state and its accessor, to get and set last_avail_idx
> >>>> and last_used_idx of virtqueues.
> >>>>
> >>>> The main usecase of these new facilities is Live Migration.
> >>>>
> >>>> Future work: dirty page tracking and in-flight descriptors.
> >>>> This series addresses many comments from Jason, Stefan and Eugenio
> >>>> from RFC series.
> >>> Compared to Parav's patchset this is much less functional.
> >> we will add dirty page tracking and in-flight IO tracker in V2, then
> >> it will be a full featured LM solution.
> >>
> >> They are not in this series because we want this series to be small and focus.
> >>> Assuming that one goes in, can't we add ability to submit admin
> >>> commands through MMIO on the device itself and be done with it?
> >> I am not sure, IMHO, if we use admin vq as back-ends for MMIO based
> >> live migration, then the issues in admin vq still exist, for example:
> >> 1)nested virtualization
> >> 2)bare-metal live migration
> >> 3)QOS
> >> 4)introduce more attacking surfaces.
> >>
> > #4 is just random without.
> I failed to process "random without".
> 
> If you expect admin vq to perform live migration, it can certainly be a side
> channel attacking surface, for example:
> a) a malicious SW can stop the device running
> b) a malicious SW can sniff guest memory by tracking guest dirty pages, then
> speculate guest operations and stole secrets.
This is the mode where the hypervisor is trusted.
When the hypervisor is untrusted, with a CC-model TDISP-enabled device, the TSM will delegate the tasks to the DSM.

For an untrusted hypervisor, the same attack surface is present with trap+emulation.
So both methods score the same; hence it is not a relevant point for discussion.

> > #3 There is no QoS issue with admin commands and queues. If you claim that
> then whole virtio spec based on the virtqueues is broken.
> > And it is certainly not the case.
> Please do not confuse the concepts and purposes of the data queues and
> admin vq.
> 
I am not confused.
There is no guarantee that a register placed on the VF will be serviced by the device in the exact same time regardless of whether the VF count is 1 or 4000.
Yet again, not a relevant comparison.

> For data-queues, it can be slow without mq or rss, that means performance
> overhead, but can work.
No, it does not work. The application fails because of jitter in the video and audio due to missing the latency budget.
A financial application is terminated due to timeouts and packet loss.

Device migration is just a third such application.

It is the same.
This is my last reply on this vague argument.

> For admin vq, if it don't meet QOS requirements, it fails to migrate guests.
> 
> I have replied to the same question so many times, and this is the last time.
> >
I have also replied many times that the QoS argument is no longer valid. The same can happen with register writes.
Performance characteristics for 30+ device types are not in the virtio spec; they are implementation details.


* [virtio-dev] RE: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  2:56           ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-18  4:42             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-18  4:42 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev


> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
> open.org> On Behalf Of Zhu, Lingshan
> Sent: Monday, September 18, 2023 8:27 AM


> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit has been
> negotiated, then the device allows resetting a vq after SUSPEND.

This is simply wrong semantics: operating on an individual object after its parent object has been suspended.


* [virtio-dev] Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  4:42             ` Parav Pandit
@ 2023-09-18  5:14               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  5:14 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/18/2023 12:42 PM, Parav Pandit wrote:
>> From: virtio-comment@lists.oasis-open.org <virtio-comment@lists.oasis-
>> open.org> On Behalf Of Zhu, Lingshan
>> Sent: Monday, September 18, 2023 8:27 AM
>
>> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit has been
>> negotiated then the device allow reset a vq after SUSPEND.
> This is simply a wrong semantics to build to operate individual object after its parent object is suspended.
A device can choose to respond to a set of signals and ignore others, right?

And this is not your admin-vq-based LM solution, therefore there are NO 
PARENT objects.



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  4:32         ` Parav Pandit
@ 2023-09-18  5:21           ` Zhu, Lingshan
  2023-09-18  5:25             ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  5:21 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev



On 9/18/2023 12:32 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 8:40 AM
>>
>> On 9/17/2023 1:32 PM, Parav Pandit wrote:
>>>> From: virtio-dev@lists.oasis-open.org
>>>> <virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
>>>> Sent: Friday, September 15, 2023 9:59 AM
>>>>
>>>> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>>>>>> This series introduces
>>>>>> 1)a new SUSPEND bit in the device status Which is used to suspend
>>>>>> the device, so that the device states and virtqueue states are
>>>>>> stabilized.
>>>>>>
>>>>>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>>>>>> and last_used_idx of virtqueues.
>>>>>>
>>>>>> The main usecase of these new facilities is Live Migration.
>>>>>>
>>>>>> Future work: dirty page tracking and in-flight descriptors.
>>>>>> This series addresses many comments from Jason, Stefan and Eugenio
>>>>>> from RFC series.
>>>>> Compared to Parav's patchset this is much less functional.
>>>> we will add dirty page tracking and in-flight IO tracker in V2, then
>>>> it will be a full featured LM solution.
>>>>
>>>> They are not in this series because we want this series to be small and focus.
>>>>> Assuming that one goes in, can't we add ability to submit admin
>>>>> commands through MMIO on the device itself and be done with it?
>>>> I am not sure, IMHO, if we use admin vq as back-ends for MMIO based
>>>> live migration, then the issues in admin vq still exist, for example:
>>>> 1)nested virtualization
>>>> 2)bare-metal live migration
>>>> 3)QOS
>>>> 4)introduce more attacking surfaces.
>>>>
>>> #4 is just random without.
>> I failed to process "random without".
>>
>> If you expect admin vq to perform live migration, it can certainly be a side
>> channel attacking surface, for example:
>> a) a malicious SW can stop the device running
>> b) a malicious SW can sniff guest memory by tracking guest dirty pages, then
>> speculate guest operations and stole secrets.
> This is the mode when hypervisor is trusted.
PF is not always owned by the hypervisor, right?
And you don't pass-through the PF to any guests, right?
> When hypervisor is untrusted, the CC model TDISP enabled device, TSM will delegate the tasks to the DSM.
TDISP devices cannot be migrated for now.
>
> For untrusted hypervisor, same set of attack surface is present with trap+emulation.
> So both method score same. Hence its not relevant point for discussion.
This is not about the hypervisor. Do you see any modern hypervisor with these 
issues?

The point is that an admin vq for LM can be a side-channel attack surface.
>
>>> #3 There is no QoS issue with admin commands and queues. If you claim that
>> then whole virtio spec based on the virtqueues is broken.
>>> And it is certainly not the case.
>> Please do not confuse the concepts and purposes of the data queues and
>> admin vq.
>>
> I am not confused.
> There is no guarantee that a register placed on the VF will be serviced by the device in exact same time regardless of VF count = 1 or 4000.
> Yet again not relevant comparison.
please read my previous replies in other threads.
>
>> For data-queues, it can be slow without mq or rss, that means performance
>> overhead, but can work.
> No, it does not work. The application failed because of jitter in the video and audio due to missing the latency budget.
> A financial application is terminated due to timeouts and packet loss.
>
> Device migration is just another 3rd such applications.
>
> Its also same.
> My last reply on this vague argument.
I think the points are clear, and you already understand them, so 
there is no need to argue anymore.
>
>> For admin vq, if it don't meet QOS requirements, it fails to migrate guests.
>>
>> I have replied to the same question so many times, and this is the last time.
> I also replied many times that QoS argument is not valid anymore. Same can happen with registers writes.
> Perf characteristics for 30+ devices is not in the virtio spec. It is implementation details.
As replied many times, registers only serve the device itself and 
are not the DATA PATH, meaning the device does not transfer data through registers.



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  5:21           ` Zhu, Lingshan
@ 2023-09-18  5:25             ` Zhu, Lingshan
  2023-09-18  6:37               ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  5:25 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev, Michael S. Tsirkin, Jason Wang

CC MST and Jason

On 9/18/2023 1:21 PM, Zhu, Lingshan wrote:
>
>
> On 9/18/2023 12:32 PM, Parav Pandit wrote:
>>
>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>> Sent: Monday, September 18, 2023 8:40 AM
>>>
>>> On 9/17/2023 1:32 PM, Parav Pandit wrote:
>>>>> From: virtio-dev@lists.oasis-open.org
>>>>> <virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
>>>>> Sent: Friday, September 15, 2023 9:59 AM
>>>>>
>>>>> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
>>>>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>>>>>>> This series introduces
>>>>>>> 1)a new SUSPEND bit in the device status Which is used to suspend
>>>>>>> the device, so that the device states and virtqueue states are
>>>>>>> stabilized.
>>>>>>>
>>>>>>> 2)virtqueue state and its accessor, to get and set last_avail_idx
>>>>>>> and last_used_idx of virtqueues.
>>>>>>>
>>>>>>> The main usecase of these new facilities is Live Migration.
>>>>>>>
>>>>>>> Future work: dirty page tracking and in-flight descriptors.
>>>>>>> This series addresses many comments from Jason, Stefan and Eugenio
>>>>>>> from RFC series.
>>>>>> Compared to Parav's patchset this is much less functional.
>>>>> we will add dirty page tracking and in-flight IO tracker in V2, then
>>>>> it will be a full featured LM solution.
>>>>>
>>>>> They are not in this series because we want this series to be 
>>>>> small and focus.
>>>>>> Assuming that one goes in, can't we add ability to submit admin
>>>>>> commands through MMIO on the device itself and be done with it?
>>>>> I am not sure, IMHO, if we use admin vq as back-ends for MMIO based
>>>>> live migration, then the issues in admin vq still exist, for example:
>>>>> 1)nested virtualization
>>>>> 2)bare-metal live migration
>>>>> 3)QOS
>>>>> 4)introduce more attacking surfaces.
>>>>>
>>>> #4 is just random without.
>>> I failed to process "random without".
>>>
>>> If you expect admin vq to perform live migration, it can certainly 
>>> be a side
>>> channel attacking surface, for example:
>>> a) a malicious SW can stop the device running
>>> b) a malicious SW can sniff guest memory by tracking guest dirty 
>>> pages, then
>>> speculate guest operations and stole secrets.
>> This is the mode when hypervisor is trusted.
> PF is not always owned by the hypervisor, right?
> And you don't pass-through the PF to any guests, right?
>> When hypervisor is untrusted, the CC model TDISP enabled device, TSM 
>> will delegate the tasks to the DSM.
> TDISP devices can not be migrated for now.
>>
>> For untrusted hypervisor, same set of attack surface is present with 
>> trap+emulation.
>> So both method score same. Hence its not relevant point for discussion.
> this is not hypervisor, Do you see any modern hypervisor have these 
> issues?
>
> This is admin vq for LM can be a side channel attacking surface.
>>
>>>> #3 There is no QoS issue with admin commands and queues. If you 
>>>> claim that
>>> then whole virtio spec based on the virtqueues is broken.
>>>> And it is certainly not the case.
>>> Please do not confuse the concepts and purposes of the data queues and
>>> admin vq.
>>>
>> I am not confused.
>> There is no guarantee that a register placed on the VF will be 
>> serviced by the device in exact same time regardless of VF count = 1 
>> or 4000.
>> Yet again not relevant comparison.
> please read my previous replies in other threads.
>>
>>> For data-queues, it can be slow without mq or rss, that means 
>>> performance
>>> overhead, but can work.
>> No, it does not work. The application failed because of jitter in the 
>> video and audio due to missing the latency budget.
>> A financial application is terminated due to timeouts and packet loss.
>>
>> Device migration is just another 3rd such applications.
>>
>> Its also same.
>> My last reply on this vague argument.
> I think the points are clear, and you already understand the points, 
> so no need to argue anymore
>>
>>> For admin vq, if it don't meet QOS requirements, it fails to migrate 
>>> guests.
>>>
>>> I have replied to the same question so many times, and this is the 
>>> last time.
>> I also replied many times that QoS argument is not valid anymore. 
>> Same can happen with registers writes.
>> Perf characteristics for 30+ devices is not in the virtio spec. It is 
>> implementation details.
> as replied many times, registers only serve the device itself and 
> registers are not DATA PATH,
> means the device don't transfer data through registers.
>



* [virtio-dev] RE: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  5:14               ` Zhu, Lingshan
@ 2023-09-18  6:17                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-18  6:17 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 10:45 AM
> 
> On 9/18/2023 12:42 PM, Parav Pandit wrote:
> >> From: virtio-comment@lists.oasis-open.org
> >> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu, Lingshan
> >> Sent: Monday, September 18, 2023 8:27 AM
> >
> >> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit
> >> has been negotiated then the device allow reset a vq after SUSPEND.
> > This is simply a wrong semantics to build to operate individual object after its
> parent object is suspended.
> A device can choose to respond to a set of signals and ignore others, right?
> 
> And, This is not your admin vq based LM solution, therefore there is NO PARENT
> objects.
There is a parent object.
There is the VQ, which you propose to reset while its parent virtio device is already SUSPENDED.

Admin commands and the admin vq exist in the spec to do admin work.
The admin vq series is split from its users because it is hard to do everything in one go.


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  5:25             ` Zhu, Lingshan
@ 2023-09-18  6:37               ` Parav Pandit
  2023-09-18  6:49                 ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-18  6:37 UTC (permalink / raw)
  To: Zhu, Lingshan, virtio-dev, Michael S. Tsirkin, Jason Wang


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 10:55 AM
> To: Parav Pandit <parav@nvidia.com>; virtio-dev@lists.oasis-open.org; Michael
> S. Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>
> Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq
> state
> 
> CC MST and Jason
> 
> On 9/18/2023 1:21 PM, Zhu, Lingshan wrote:
> >
> >
> > On 9/18/2023 12:32 PM, Parav Pandit wrote:
> >>
> >>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>> Sent: Monday, September 18, 2023 8:40 AM
> >>>
> >>> On 9/17/2023 1:32 PM, Parav Pandit wrote:
> >>>>> From: virtio-dev@lists.oasis-open.org
> >>>>> <virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
> >>>>> Sent: Friday, September 15, 2023 9:59 AM
> >>>>>
> >>>>> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
> >>>>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
> >>>>>>> This series introduces
> >>>>>>> 1)a new SUSPEND bit in the device status Which is used to
> >>>>>>> suspend the device, so that the device states and virtqueue
> >>>>>>> states are stabilized.
> >>>>>>>
> >>>>>>> 2)virtqueue state and its accessor, to get and set
> >>>>>>> last_avail_idx and last_used_idx of virtqueues.
> >>>>>>>
> >>>>>>> The main usecase of these new facilities is Live Migration.
> >>>>>>>
> >>>>>>> Future work: dirty page tracking and in-flight descriptors.
> >>>>>>> This series addresses many comments from Jason, Stefan and
> >>>>>>> Eugenio from RFC series.
> >>>>>> Compared to Parav's patchset this is much less functional.
> >>>>> we will add dirty page tracking and in-flight IO tracker in V2,
> >>>>> then it will be a full featured LM solution.
> >>>>>
> >>>>> They are not in this series because we want this series to be
> >>>>> small and focus.
> >>>>>> Assuming that one goes in, can't we add ability to submit admin
> >>>>>> commands through MMIO on the device itself and be done with it?
> >>>>> I am not sure, IMHO, if we use admin vq as back-ends for MMIO
> >>>>> based live migration, then the issues in admin vq still exist, for example:
> >>>>> 1)nested virtualization
> >>>>> 2)bare-metal live migration
> >>>>> 3)QOS
> >>>>> 4)introduce more attacking surfaces.
> >>>>>
> >>>> #4 is just random without.
> >>> I failed to process "random without".
> >>>
> >>> If you expect admin vq to perform live migration, it can certainly
> >>> be a side channel attacking surface, for example:
> >>> a) a malicious SW can stop the device running
> >>> b) a malicious SW can sniff guest memory by tracking guest dirty
> >>> pages, then speculate guest operations and stole secrets.
> >> This is the mode when hypervisor is trusted.
> > PF is not always owned by the hypervisor, right?
> > And you don't pass-through the PF to any guests, right?
> >> When hypervisor is untrusted, the CC model TDISP enabled device, TSM
> >> will delegate the tasks to the DSM.
> > TDISP devices can not be migrated for now.
That is fine; the infra is built so that it can be migrated one day.
And at that point the proposed admin-command-based model also fits fine.

> >>
> >> For untrusted hypervisor, same set of attack surface is present with
> >> trap+emulation.
> >> So both method score same. Hence its not relevant point for discussion.
> > this is not hypervisor, Do you see any modern hypervisor have these
> > issues?
> >
> > This is admin vq for LM can be a side channel attacking surface.
It is not.
The hypervisor is a trusted entity.
For an untrusted hypervisor, TDISP is the unified solution built by various industry bodies, including DMTF and PCI-SIG, over the last few years.
We want to utilize that.

> >>
> >>>> #3 There is no QoS issue with admin commands and queues. If you
> >>>> claim that
> >>> then whole virtio spec based on the virtqueues is broken.
> >>>> And it is certainly not the case.
> >>> Please do not confuse the concepts and purposes of the data queues
> >>> and admin vq.
> >>>
> >> I am not confused.
> >> There is no guarantee that a register placed on the VF will be
> >> serviced by the device in exact same time regardless of VF count = 1
> >> or 4000.
> >> Yet again not relevant comparison.
> > please read my previous replies in other threads.
It does not answer the question.
The claim that polling a register somehow ensures a downtime guarantee at the scale of thousands of member devices describes a specific device implementation, without explanation.

> >>
> >>> For data-queues, it can be slow without mq or rss, that means
> >>> performance overhead, but can work.
> >> No, it does not work. The application failed because of jitter in the
> >> video and audio due to missing the latency budget.
> >> A financial application is terminated due to timeouts and packet loss.
> >>
> >> Device migration is just another 3rd such applications.
> >>
> >> Its also same.
> >> My last reply on this vague argument.
> > I think the points are clear, and you already understand the points,
> > so no need to argue anymore
Yes, I have been clear for a long time: neither an AQ, nor registers, nor RSS queues can guarantee any performance characteristics.
It is pretty clear to me.
Any performance guarantees are explicitly requested when desired.

> >>
> >>> For admin vq, if it don't meet QOS requirements, it fails to migrate
> >>> guests.
> >>>
> >>> I have replied to the same question so many times, and this is the
> >>> last time.
> >> I also replied many times that QoS argument is not valid anymore.
> >> Same can happen with registers writes.
> >> Perf characteristics for 30+ devices is not in the virtio spec. It is
> >> implementation details.
> > as replied many times, registers only serve the device itself and
> > registers are not DATA PATH, means the device don't transfer data
> > through registers.

It does not matter whether it is the data path or the control path; the fact is that downtime assurance cannot be guaranteed by the register interface design, it is an implementation detail.
The same holds for admin commands and/or the AQ.


* [virtio-dev] Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  6:17                 ` Parav Pandit
@ 2023-09-18  6:38                   ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:38 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/18/2023 2:17 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 10:45 AM
>>
>> On 9/18/2023 12:42 PM, Parav Pandit wrote:
>>>> From: virtio-comment@lists.oasis-open.org
>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu, Lingshan
>>>> Sent: Monday, September 18, 2023 8:27 AM
>>>> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit
>>>> has been negotiated then the device allow reset a vq after SUSPEND.
>>> This is simply a wrong semantics to build to operate individual object after its
>> parent object is suspended.
>> A device can choose to respond to a set of signals and ignore others, right?
>>
>> And, This is not your admin vq based LM solution, therefore there is NO PARENT
>> objects.
> There is parent object.
> There is VQ which you propose to do SUSPEND_RESET of the parent virtio device which is already SUSPENDED.
That is why we plan to implement a new feature bit to control this behavior.

However, in the next version, as MST suggested, I will forbid resetting vqs 
after SUSPEND.
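The rule being converged on here, ignoring vq reset while the device is suspended, can be sketched as device-side logic in C. The SUSPEND bit position and all names below are placeholders for illustration, not spec values, since the actual bit assignment was still under discussion:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_DRIVER_OK 0x04  /* mirrors the existing DRIVER_OK status bit */
#define STATUS_SUSPEND   0x40  /* placeholder: SUSPEND bit position TBD */

struct vq {
    bool enabled;
};

struct dev {
    uint8_t   status;
    struct vq vqs[4];
};

/* Device-side handling of a queue_reset request: once SUSPEND is set,
 * the write is ignored rather than acted upon. Returns whether the
 * reset actually happened. */
static bool try_reset_vq(struct dev *d, unsigned idx)
{
    if (d->status & STATUS_SUSPEND)
        return false;               /* vq reset forbidden after SUSPEND */
    d->vqs[idx].enabled = false;    /* perform the reset */
    return true;
}
```

Under this sketch the device's vq state stays frozen while suspended, which is what makes the saved last_avail_idx/last_used_idx values stable for migration.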
>
> Admin commands and vq exists in the spec because to admin work.
> The admin vq series is split from its users because it is hard to do everything in one go.
I failed to process this comment; how is it related to the question?


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
@ 2023-09-18  6:38                   ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:38 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/18/2023 2:17 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 10:45 AM
>>
>> On 9/18/2023 12:42 PM, Parav Pandit wrote:
>>>> From: virtio-comment@lists.oasis-open.org
>>>> <virtio-comment@lists.oasis- open.org> On Behalf Of Zhu, Lingshan
>>>> Sent: Monday, September 18, 2023 8:27 AM
>>>> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature bit
>>>> has been negotiated then the device allow reset a vq after SUSPEND.
>>> This is simply a wrong semantics to build to operate individual object after its
>> parent object is suspended.
>> A device can choose to respond to a set of signals and ignore others, right?
>>
>> And, This is not your admin vq based LM solution, therefore there is NO PARENT
>> objects.
> There is parent object.
> There is VQ which you propose to do SUSPEND_RESET of the parent virtio device which is already SUSPENDED.
that is why we plan to implement a new feature bit to control this behavior.

However, in next version, as MST suggested, I will forbid resetting vqs 
after SUSPEND.
>
> Admin commands and vq exists in the spec because to admin work.
> The admin vq series is split from its users because it is hard to do everything in one go.
I failed to process this comment, how is this related to the question?


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread
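The behavior the thread converges on above (a suspended device ignores a vq reset, matching patch 4, "virtqueue: ignore resetting vqs when SUSPEND") can be sketched as a minimal Python model of the status state machine. The bit values follow the quoted patch (SUSPEND = 16); the `queue_reset` interface and its return convention are illustrative assumptions, not spec text.

```python
# Toy model of the proposed semantics: SUSPEND is honored only after
# FEATURES_OK, and a vq reset request is ignored while SUSPEND is set.
ACKNOWLEDGE = 1
DRIVER = 2
DRIVER_OK = 4
FEATURES_OK = 8
SUSPEND = 16  # bit value from the quoted patch

class VirtioDeviceModel:
    def __init__(self):
        self.status = 0
        self.vq_enabled = {0: True, 1: True}

    def write_status(self, value):
        # Drivers only set bits; the device ignores SUSPEND
        # if FEATURES_OK is not yet set (per the proposed text).
        if value & SUSPEND and not (self.status & FEATURES_OK):
            value &= ~SUSPEND
        self.status |= value

    def queue_reset(self, vq_index):
        """Return True if the reset was performed, False if ignored."""
        if self.status & SUSPEND:
            return False  # resetting vqs is ignored while suspended
        self.vq_enabled[vq_index] = False
        return True

dev = VirtioDeviceModel()
dev.write_status(ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK)
assert dev.queue_reset(0) is True   # reset allowed before SUSPEND
dev.write_status(SUSPEND)
assert dev.status & SUSPEND
assert dev.queue_reset(1) is False  # ignored once SUSPEND is set
```

This is only a sketch of the direction agreed in this subthread; the eventual normative wording in V2 may differ.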

* [virtio-dev] RE: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  6:38                   ` Zhu, Lingshan
@ 2023-09-18  6:46                     ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-18  6:46 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 12:09 PM

> > There is parent object.
> > There is VQ which you propose to do SUSPEND_RESET of the parent virtio
> device which is already SUSPENDED.
> that is why we plan to implement a new feature bit to control this behavior.
> 
Adding a feature bit does not matter when the semantics itself is wrong.

> However, in next version, as MST suggested, I will forbid resetting vqs after
> SUSPEND.
Ok, that is good.
I was confused by the proposed bit above.

^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  6:37               ` Parav Pandit
@ 2023-09-18  6:49                 ` Zhu, Lingshan
  2023-09-18  6:54                   ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:49 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev, Michael S. Tsirkin, Jason Wang



On 9/18/2023 2:37 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 10:55 AM
>> To: Parav Pandit <parav@nvidia.com>; virtio-dev@lists.oasis-open.org; Michael
>> S. Tsirkin <mst@redhat.com>; Jason Wang <jasowang@redhat.com>
>> Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq
>> state
>>
>> CC MST and Jason
>>
>> On 9/18/2023 1:21 PM, Zhu, Lingshan wrote:
>>>
>>> On 9/18/2023 12:32 PM, Parav Pandit wrote:
>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>> Sent: Monday, September 18, 2023 8:40 AM
>>>>>
>>>>> On 9/17/2023 1:32 PM, Parav Pandit wrote:
>>>>>>> From: virtio-dev@lists.oasis-open.org
>>>>>>> <virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
>>>>>>> Sent: Friday, September 15, 2023 9:59 AM
>>>>>>>
>>>>>>> On 9/14/2023 7:14 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Sep 06, 2023 at 04:16:32PM +0800, Zhu Lingshan wrote:
>>>>>>>>> This series introduces
>>>>>>>>> 1)a new SUSPEND bit in the device status Which is used to
>>>>>>>>> suspend the device, so that the device states and virtqueue
>>>>>>>>> states are stabilized.
>>>>>>>>>
>>>>>>>>> 2)virtqueue state and its accessor, to get and set
>>>>>>>>> last_avail_idx and last_used_idx of virtqueues.
>>>>>>>>>
>>>>>>>>> The main usecase of these new facilities is Live Migration.
>>>>>>>>>
>>>>>>>>> Future work: dirty page tracking and in-flight descriptors.
>>>>>>>>> This series addresses many comments from Jason, Stefan and
>>>>>>>>> Eugenio from RFC series.
>>>>>>>> Compared to Parav's patchset this is much less functional.
>>>>>>> we will add dirty page tracking and in-flight IO tracker in V2,
>>>>>>> then it will be a full featured LM solution.
>>>>>>>
>>>>>>> They are not in this series because we want this series to be
>>>>>>> small and focus.
>>>>>>>> Assuming that one goes in, can't we add ability to submit admin
>>>>>>>> commands through MMIO on the device itself and be done with it?
>>>>>>> I am not sure, IMHO, if we use admin vq as back-ends for MMIO
>>>>>>> based live migration, then the issues in admin vq still exist, for example:
>>>>>>> 1)nested virtualization
>>>>>>> 2)bare-metal live migration
>>>>>>> 3)QOS
>>>>>>> 4)introduce more attacking surfaces.
>>>>>>>
>>>>>> #4 is just random without.
>>>>> I failed to process "random without".
>>>>>
>>>>> If you expect admin vq to perform live migration, it can certainly
>>>>> be a side channel attacking surface, for example:
>>>>> a) a malicious SW can stop the device running
>>>>> b) a malicious SW can sniff guest memory by tracking guest dirty
>>>>> pages, then speculate guest operations and stole secrets.
>>>> This is the mode when hypervisor is trusted.
>>> PF is not always owned by the hypervisor, right?
>>> And you don't pass-through the PF to any guests, right?
>>>> When hypervisor is untrusted, the CC model TDISP enabled device, TSM
>>>> will delegate the tasks to the DSM.
>>> TDISP devices can not be migrated for now.
> That is fine, the infra is build so that it can be migrated one day.
> And at that point the proposed admin command-based model also fits fine.
Since you are talking about TDISP, I suggest reading the TDISP spec.
It says:

Device Security Architecture - Administrative interfaces (e.g., a PF)
may be used to influence the security properties of the TDI used by the
TVM. The device's security architecture must provide isolation and
access control for TVM data in the device for protection against
entities that are not in the trust boundary of the TVM.

So an admin vq based LM solution can be a side channel attack surface.
>
>>>> For untrusted hypervisor, same set of attack surface is present with
>>>> trap+emulation.
>>>> So both method score same. Hence its not relevant point for discussion.
>>> this is not hypervisor, Do you see any modern hypervisor have these
>>> issues?
>>>
>>> This is admin vq for LM can be a side channel attacking surface.
> It is not.
> Hypervisor is trusted entity.
> For untrusted hypervisor the TDISP is unified solution build by the various industry bodies including DMTF, PCI for last few years.
> We want to utilize that.
First, TDISP is out of the scope of the virtio spec.
Second, TDISP devices cannot be migrated for now.
Third, the admin vq can be a side channel attack surface, as explained above.
>
>>>>>> #3 There is no QoS issue with admin commands and queues. If you
>>>>>> claim that
>>>>> then whole virtio spec based on the virtqueues is broken.
>>>>>> And it is certainly not the case.
>>>>> Please do not confuse the concepts and purposes of the data queues
>>>>> and admin vq.
>>>>>
>>>> I am not confused.
>>>> There is no guarantee that a register placed on the VF will be
>>>> serviced by the device in exact same time regardless of VF count = 1
>>>> or 4000.
>>>> Yet again not relevant comparison.
>>> please read my previous replies in other threads.
> It does not answer.
> The claim that somehow a polling register ensures downtime guarantee for scale of thousands of member devices is some specific device implementation without explanation.
the registers and the LM facilities are per-device.
>
>>>>> For data-queues, it can be slow without mq or rss, that means
>>>>> performance overhead, but can work.
>>>> No, it does not work. The application failed because of jitter in the
>>>> video and audio due to missing the latency budget.
>>>> A financial application is terminated due to timeouts and packet loss.
>>>>
>>>> Device migration is just another 3rd such applications.
>>>>
>>>> Its also same.
>>>> My last reply on this vague argument.
>>> I think the points are clear, and you already understand the points,
>>> so no need to argue anymore
> Yes, I am clear from long time, nor AQ nor no register, RSS queues, none cannot guarantee any performance characteristics.
> It is pretty clear to me.
> Any performance guarantees are explicitly requested when desired.
>
>>>>> For admin vq, if it don't meet QOS requirements, it fails to migrate
>>>>> guests.
>>>>>
>>>>> I have replied to the same question so many times, and this is the
>>>>> last time.
>>>> I also replied many times that QoS argument is not valid anymore.
>>>> Same can happen with registers writes.
>>>> Perf characteristics for 30+ devices is not in the virtio spec. It is
>>>> implementation details.
>>> as replied many times, registers only serve the device itself and
>>> registers are not DATA PATH, means the device don't transfer data
>>> through registers.
> It does not matter data path or control path, the fact is it downtime assurance cannot be guaranteed by register interface design, it is the implementation details.
> And so does for admin commands and/or AQ.
The registers do not perform any data transfers; e.g., we don't
migrate dirty pages through registers.
But you do so through the admin vq.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  6:46                     ` Parav Pandit
@ 2023-09-18  6:49                       ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:49 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/18/2023 2:46 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 12:09 PM
>>> There is parent object.
>>> There is VQ which you propose to do SUSPEND_RESET of the parent virtio
>> device which is already SUSPENDED.
>> that is why we plan to implement a new feature bit to control this behavior.
>>
> It does not matter adding a feature bit when the semantics itself is wrong.
>
>> However, in next version, as MST suggested, I will forbid resetting vqs after
>> SUSPEND.
> Ok. That is good.
> I got confused with above proposed bit.
OK, let's at least close this one.




^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [PATCH 2/5] virtio: introduce SUSPEND bit in device status
  2023-09-18  2:56           ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-18  6:50             ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  6:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/18/2023 10:56 AM, Zhu, Lingshan wrote:
>
>
> On 9/15/2023 7:10 PM, Michael S. Tsirkin wrote:
>> On Fri, Sep 15, 2023 at 10:57:33AM +0800, Zhu, Lingshan wrote:
>>>
>>> On 9/14/2023 7:34 PM, Michael S. Tsirkin wrote:
>>>> On Wed, Sep 06, 2023 at 04:16:34PM +0800, Zhu Lingshan wrote:
>>>>> This patch introduces a new status bit in the device status: SUSPEND.
>>>>>
>>>>> This SUSPEND bit can be used by the driver to suspend a device,
>>>>> in order to stabilize the device states and virtqueue states.
>>>>>
>>>>> Its main use case is live migration.
>>>>>
>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>    content.tex | 31 +++++++++++++++++++++++++++++++
>>>>>    1 file changed, 31 insertions(+)
>>>>>
>>>>> diff --git a/content.tex b/content.tex
>>>>> index 0e492cd..0fab537 100644
>>>>> --- a/content.tex
>>>>> +++ b/content.tex
>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} 
>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and 
>>>>> ready to
>>>>>      drive the device.
>>>>> +\item[SUSPEND (16)] When VIRTIO_F_SUSPEND is negotiated, 
>>>>> indicates that the
>>>>> +  device has been suspended by the driver.
>>>>> +
>>>>>    \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has 
>>>>> experienced
>>>>>      an error from which it can't recover.
>>>>>    \end{description}
>>>>> @@ -73,6 +76,10 @@ \section{\field{Device Status} 
>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>    recover by issuing a reset.
>>>>>    \end{note}
>>>>> +The driver SHOULD NOT set SUSPEND if FEATURES_OK is not set.
>>>>> +
>>>>> +When setting SUSPEND, the driver MUST re-read \field{device 
>>>>> status} to ensure the SUSPEND bit is set.
>>>>> +
>>>>>    \devicenormative{\subsection}{Device Status Field}{Basic 
>>>>> Facilities of a Virtio Device / Device Status Field}
>>>>>    The device MUST NOT consume buffers or send any used buffer
>>>>> @@ -82,6 +89,26 @@ \section{\field{Device Status} 
>>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>    that a reset is needed.  If DRIVER_OK is set, after it sets 
>>>>> DEVICE_NEEDS_RESET, the device
>>>>>    MUST send a device configuration change notification to the 
>>>>> driver.
>>>>> +The device MUST ignore SUSPEND if FEATURES_OK is not set.
>>>>> +
>>>>> +The device MUST ignore SUSPEND if VIRTIO_F_SUSPEND is not 
>>>>> negotiated.
>>>> why? let's just forbid driver from setting it.
>>> OK
>>>>> +
>>>>> +The device SHOULD allow settings to \field{device status} even 
>>>>> when SUSPEND is set.
>>>>> +
>>>>> +If VIRTIO_F_SUSPEND is negotiated and SUSPEND is set, the device 
>>>>> SHOULD clear SUSPEND
>>>>> +and resumes operation upon DRIVER_OK.
>>>>> +
>>>> sorry what?
>>> In case of a failed or cancelled Live Migration, the device needs to 
>>> resume
>>> operation.
>>> However the spec forbids the driver to clear a device status bit, so
>>> re-writing
>>> DRIVER_OK is expected to clear SUSPEND and the device resume operation.
>> No, DRIVER_OK is already set. Setting a bit that is already set should
>> not have side effects. In fact auto-clearing suspend is problematic too.
> The spec says: Set the DRIVER_OK status bit. At this point the device 
> is “live”.
>
> So semantically DRIVER_OK can bring the device to live even from SUSPEND.
>
> In the implementation, the device can check whether SUSPEND is set, then
> decide what to do. Just don't ignore DRIVER_OK if it is already
> set, and the driver should not clear a device status bit.
>
>>
>>
>>>>> +If VIRTIO_F_SUSPEND is negotiated, when the driver sets SUSPEND,
>>>>> +the device SHOULD perform the following actions before presenting 
>>>>> SUSPEND bit in the \field{device status}:
>>>>> +
>>>>> +\begin{itemize}
>>>>> +\item Stop consuming buffers of any virtqueues and mark all 
>>>>> finished descritors as used.
>>>>> +\item Wait until all descriptors that being processed to finish 
>>>>> and mark them as used.
>>>>> +\item Flush all used buffer and send used buffer notifications to 
>>>>> the driver.
>>>> flush how?
>>> This is device-type-specific, and we will include tracking inflight
>>> descriptors(buffers) in V2.
>>>>> +\item Record Virtqueue State of each enabled virtqueue, see 
>>>>> section \ref{sec:Virtqueues / Virtqueue State}
>>>> record where?
>>> This is transport specific, for PCI, patch 5 introduces two new 
>>> fields for
>>> avail and used state
>> they clearly can't store state for all vqs, these are just two 16 bit 
>> fields.
> vq states filed can work with queue_select like other vq fields.
> I will document this in the comment.
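The queue_select-based access pattern mentioned in the quoted reply can be sketched as a toy model: the two 16-bit state fields act as a window onto whichever virtqueue `queue_select` currently addresses, like the other per-queue fields in the virtio-pci common configuration. The field names (`queue_avail_state`, `queue_used_state`) are assumptions, since patch 5 is not quoted in this subthread.

```python
# Toy model of per-vq state multiplexed through queue_select.
# Field names are illustrative, not taken from the actual patch.
class CommonCfgModel:
    def __init__(self, num_queues):
        # backing store: [last_avail_idx, last_used_idx] per virtqueue
        self._state = [[0, 0] for _ in range(num_queues)]
        self.queue_select = 0

    @property
    def queue_avail_state(self):
        return self._state[self.queue_select][0]

    @queue_avail_state.setter
    def queue_avail_state(self, v):
        self._state[self.queue_select][0] = v & 0xFFFF  # 16-bit field

    @property
    def queue_used_state(self):
        return self._state[self.queue_select][1]

    @queue_used_state.setter
    def queue_used_state(self, v):
        self._state[self.queue_select][1] = v & 0xFFFF

cfg = CommonCfgModel(num_queues=4)
cfg.queue_select = 2        # select vq 2, then access its state
cfg.queue_avail_state = 100
cfg.queue_select = 0        # vq 0's state is independent
assert cfg.queue_avail_state == 0
cfg.queue_select = 2
assert cfg.queue_avail_state == 100
```

This is how two 16-bit fields can expose state for all vqs: one vq at a time, selected by the driver.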
>>
>>>>> +\item Pause its operation except \field{device status} and 
>>>>> preserve configurations in its Device Configuration Space, see 
>>>>> \ref{sec:Basic Facilities of a Virtio Device / Device 
>>>>> Configuration Space}
>>>> pause in what sense? completely? this does not seem realistic.
>>>> e.g. pci express link has to stay active or device will die.
>>> only pause virtio, I will rephrase the sentence as "pause its virtio
>>> operation".
>> that is vague too. for example what happens to link state of
>> a networking device?
> Then how about we say: pause operation in both data-path and 
> control-path?
>
> Or do you have any suggestion?
>>
>>> Others like PCI link in the example is out of the spec and we don't 
>>> need
>>> to migrate them.
>>>>
>>>> also, presumably here it is except a bunch of other fields.
>>>> e.g. what about queue select and all related queue fields?
>>> For now they are forbidden.
>>>
>>> As SiWei suggested, we will introduce a new feature bit to control 
>>> whether
>>> allowing resetting a VQ after SUSPEND. We can use more feature bits if
>>> there are requirements to perform anything after SUSPEND. But for now
>>> they are forbidden.
>> I don't know how this means, but whatever. you need to make
>> all this explicit though.
> a new feature bit: VIRTIO_F_RING_SUSPEND_RESET. If this feature
> bit has been negotiated then the device allow reset a vq after SUSPEND.
Hi Michael,

Rethinking this: as you suggested before, in V2 I will forbid resetting
VQs after SUSPEND.

Thanks
>>
>>>>> +\end{itemize}
>>>>> +
>>>>>    \section{Feature Bits}\label{sec:Basic Facilities of a Virtio 
>>>>> Device / Feature Bits}
>>>>>    Each virtio device offers all the features it understands.  During
>>>>> @@ -937,6 +964,10 @@ \chapter{Reserved Feature 
>>>>> Bits}\label{sec:Reserved Feature Bits}
>>>>>        \ref{devicenormative:Basic Facilities of a Virtio Device / 
>>>>> Feature Bits} for
>>>>>        handling features reserved for future use.
>>>>> +  \item[VIRTIO_F_SUSPEND(42)] This feature indicates that the 
>>>>> driver can
>>>>> +   SUSPEND the device.
>>>>> +   See \ref{sec:Basic Facilities of a Virtio Device / Device 
>>>>> Status Field}.
>>>>> +
>>>>>    \end{description}
>>>>>    \drivernormative{\section}{Reserved Feature Bits}{Reserved 
>>>>> Feature Bits}
>>>>> -- 
>>>>> 2.35.3
>




^ permalink raw reply	[flat|nested] 445+ messages in thread
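The driver-side flow debated in the message above (set SUSPEND, re-read the device status until the device presents the bit, and later re-write DRIVER_OK to bring the device back to "live" and clear SUSPEND) can be sketched as a toy in-memory model. A real device would quiesce its virtqueues asynchronously before presenting SUSPEND; the two-read delay below merely simulates that.

```python
# Toy model of the suspend/resume handshake under discussion.
FEATURES_OK = 8
DRIVER_OK = 4
SUSPEND = 16

class ToyDevice:
    def __init__(self):
        self.status = FEATURES_OK | DRIVER_OK
        self.suspended = False
        self._pending_suspend = 0  # reads left before SUSPEND shows up

    def write_status(self, value):
        if value & SUSPEND:
            self._pending_suspend = 2  # simulate quiescing taking time
        elif value & DRIVER_OK and self.status & SUSPEND:
            # Re-writing DRIVER_OK resumes operation and clears SUSPEND.
            self.status &= ~SUSPEND
            self.suspended = False
        self.status |= value & ~SUSPEND

    def read_status(self):
        if self._pending_suspend:
            self._pending_suspend -= 1
            if self._pending_suspend == 0:
                self.status |= SUSPEND  # device presents SUSPEND
                self.suspended = True
        return self.status

def driver_suspend(dev):
    dev.write_status(SUSPEND)
    while not (dev.read_status() & SUSPEND):  # MUST re-read until set
        pass

def driver_resume(dev):
    dev.write_status(DRIVER_OK)

dev = ToyDevice()
driver_suspend(dev)
assert dev.suspended
driver_resume(dev)
assert not dev.suspended and not (dev.status & SUSPEND)
```

Whether re-writing an already-set DRIVER_OK may have this side effect is exactly the open question MST raises above, so treat the resume path as one proposed interpretation, not settled semantics.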


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  6:49                 ` Zhu, Lingshan
@ 2023-09-18  6:54                   ` Parav Pandit
  2023-09-18  9:34                     ` Zhu, Lingshan
  2023-09-19  4:27                     ` Jason Wang
  0 siblings, 2 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-18  6:54 UTC (permalink / raw)
  To: Zhu, Lingshan, virtio-dev, Michael S. Tsirkin, Jason Wang

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 12:19 PM


> so admin vq based LM solution can be a side channel attacking surface
It will be part of the DSM whenever it is used in the future.
Hence, it is not an attack surface.

> >
> >>>> For untrusted hypervisor, same set of attack surface is present
> >>>> with
> >>>> trap+emulation.
> >>>> So both method score same. Hence its not relevant point for discussion.
> >>> this is not hypervisor, Do you see any modern hypervisor have these
> >>> issues?
> >>>
> >>> This is admin vq for LM can be a side channel attacking surface.
> > It is not.
> > Hypervisor is trusted entity.
> > For untrusted hypervisor the TDISP is unified solution build by the various
> industry bodies including DMTF, PCI for last few years.
> > We want to utilize that.
> first, TDISP is out of virtio spec.
Sure; hence, untrusted hypervisors are out of scope.
Otherwise trap+emulation, which relies on the hypervisor to do things, is equally dead.

> second, TDISP devices can not be migrated for now third, admin vq can be an
> side channel attacking surface as explained above.
When TDISP is not used, the hypervisor is a trusted entity, period.
Hence, it cannot be considered an attack surface.
A hypervisor can even disable SR-IOV.

> >
> >>>>>> #3 There is no QoS issue with admin commands and queues. If you
> >>>>>> claim that
> >>>>> then whole virtio spec based on the virtqueues is broken.
> >>>>>> And it is certainly not the case.
> >>>>> Please do not confuse the concepts and purposes of the data queues
> >>>>> and admin vq.
> >>>>>
> >>>> I am not confused.
> >>>> There is no guarantee that a register placed on the VF will be
> >>>> serviced by the device in exact same time regardless of VF count =
> >>>> 1 or 4000.
> >>>> Yet again not relevant comparison.
> >>> please read my previous replies in other threads.
> > It does not answer.
> > The claim that somehow a polling register ensures downtime guarantee for
> scale of thousands of member devices is some specific device implementation
> without explanation.
> the registers and the LM facilities are per-device.
> >
> >>>>> For data-queues, it can be slow without mq or rss, that means
> >>>>> performance overhead, but can work.
> >>>> No, it does not work. The application failed because of jitter in
> >>>> the video and audio due to missing the latency budget.
> >>>> A financial application is terminated due to timeouts and packet loss.
> >>>>
> >>>> Device migration is just another 3rd such applications.
> >>>>
> >>>> Its also same.
> >>>> My last reply on this vague argument.
> >>> I think the points are clear, and you already understand the points,
> >>> so no need to argue anymore
> > Yes, I am clear from long time, nor AQ nor no register, RSS queues, none
> cannot guarantee any performance characteristics.
> > It is pretty clear to me.
> > Any performance guarantees are explicitly requested when desired.
> >
> >>>>> For admin vq, if it don't meet QOS requirements, it fails to
> >>>>> migrate guests.
> >>>>>
> >>>>> I have replied to the same question so many times, and this is the
> >>>>> last time.
> >>>> I also replied many times that QoS argument is not valid anymore.
> >>>> Same can happen with registers writes.
> >>>> Perf characteristics for 30+ devices is not in the virtio spec. It
> >>>> is implementation details.
> >>> as replied many times, registers only serve the device itself and
> >>> registers are not DATA PATH, means the device don't transfer data
> >>> through registers.
> > It does not matter data path or control path, the fact is it downtime assurance
> cannot be guaranteed by register interface design, it is the implementation
> details.
> > And so does for admin commands and/or AQ.
> the registers do not perform any data transitions, e.g., we don't migrate dirty
> pages through registers.
> But you do these by admin vq
So what?
Just because data transfer is not done, it does not mean that thousands of polling register writes complete in the stipulated time.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  6:54                   ` Parav Pandit
@ 2023-09-18  9:34                     ` Zhu, Lingshan
  2023-09-18 18:41                       ` Parav Pandit
  2023-09-19  4:27                     ` Jason Wang
  1 sibling, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-18  9:34 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev, Michael S. Tsirkin, Jason Wang



On 9/18/2023 2:54 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 12:19 PM
>
>> so admin vq based LM solution can be a side channel attacking surface
> It will be part of the DSM whenever it will be used in future.
> Hence, it is not attack surface.
I am not sure why we have to trust the PF.
This is out of virtio scope anyway.

I have explained many times, with examples, how it can be an attack surface.

What happens if malicious SW dumps guest memory via the admin vq dirty page
tracking feature?
>
>>>>>> For untrusted hypervisor, same set of attack surface is present
>>>>>> with
>>>>>> trap+emulation.
>>>>>> So both method score same. Hence its not relevant point for discussion.
>>>>> this is not hypervisor, Do you see any modern hypervisor have these
>>>>> issues?
>>>>>
>>>>> This is admin vq for LM can be a side channel attacking surface.
>>> It is not.
>>> Hypervisor is trusted entity.
>>> For untrusted hypervisor the TDISP is unified solution build by the various
>> industry bodies including DMTF, PCI for last few years.
>>> We want to utilize that.
>> first, TDISP is out of virtio spec.
> Sure, hence, untrusted hypervisor are out of scope.
> Otherwise, trap+emulation is equally dead which relies on the hypervisor to do things.
So let's focus on the LM topic rather than confidential computing.
>
>> second, TDISP devices can not be migrated for now third, admin vq can be an
>> side channel attacking surface as explained above.
> When TDISP are not used, hypervisor is trusted entity, period.
> And hence, it cannot be considered attack surface.
> An hypervisor can even disable SR-IOV.
If SR-IOV is disabled, are you migrating the PF?
A PF certainly cannot migrate itself through its own admin vq.

Again, TDISP is out of the spec, and TDISP devices are not
migratable.
>
>>>>>>>> #3 There is no QoS issue with admin commands and queues. If you
>>>>>>>> claim that
>>>>>>> then whole virtio spec based on the virtqueues is broken.
>>>>>>>> And it is certainly not the case.
>>>>>>> Please do not confuse the concepts and purposes of the data queues
>>>>>>> and admin vq.
>>>>>>>
>>>>>> I am not confused.
>>>>>> There is no guarantee that a register placed on the VF will be
>>>>>> serviced by the device in exact same time regardless of VF count =
>>>>>> 1 or 4000.
>>>>>> Yet again not relevant comparison.
>>>>> please read my previous replies in other threads.
>>> It does not answer.
>>> The claim that somehow a polling register ensures downtime guarantee for
>> scale of thousands of member devices is some specific device implementation
>> without explanation.
>> the registers and the LM facilities are per-device.
>>>>>>> For data-queues, it can be slow without mq or rss, that means
>>>>>>> performance overhead, but can work.
>>>>>> No, it does not work. The application failed because of jitter in
>>>>>> the video and audio due to missing the latency budget.
>>>>>> A financial application is terminated due to timeouts and packet loss.
>>>>>>
>>>>>> Device migration is just another 3rd such applications.
>>>>>>
>>>>>> Its also same.
>>>>>> My last reply on this vague argument.
>>>>> I think the points are clear, and you already understand the points,
>>>>> so no need to argue anymore
>>> Yes, I am clear from long time, nor AQ nor no register, RSS queues, none
>> cannot guarantee any performance characteristics.
>>> It is pretty clear to me.
>>> Any performance guarantees are explicitly requested when desired.
>>>
>>>>>>> For admin vq, if it don't meet QOS requirements, it fails to
>>>>>>> migrate guests.
>>>>>>>
>>>>>>> I have replied to the same question so many times, and this is the
>>>>>>> last time.
>>>>>> I also replied many times that QoS argument is not valid anymore.
>>>>>> Same can happen with registers writes.
>>>>>> Perf characteristics for 30+ devices is not in the virtio spec. It
>>>>>> is implementation details.
>>>>> as replied many times, registers only serve the device itself and
>>>>> registers are not DATA PATH, means the device don't transfer data
>>>>> through registers.
>>> It does not matter data path or control path, the fact is it downtime assurance
>> cannot be guaranteed by register interface design, it is the implementation
>> details.
>>> And so does for admin commands and/or AQ.
>> the registers do not perform any data transitions, e.g., we don't migrate dirty
>> pages through registers.
>> But you do these by admin vq
> So what?
> Just because data transfer is not done, it does not mean that thousands of polling register writes complete in stipulated time.
1) Again, they are per-device facilities.
2) We use very few registers; even the status byte does not require polling,
just a re-read with a delay.

Please refer to the code for setting FEATURES_OK.
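[Editor's note: the "re-read with delay" idea referenced above can be sketched as a toy model. Nothing below is from the virtio spec or any driver source; the class and function names are illustrative stand-ins for a device status register and the driver-side check.]

```python
FEATURES_OK = 0x08  # FEATURES_OK bit of the device status field

class StatusRegister:
    """Mock of a device's status byte.  A device that rejects the
    negotiated feature set simply leaves FEATURES_OK clear, which the
    driver observes on its next read."""
    def __init__(self, accepts_features=True):
        self.value = 0
        self.accepts_features = accepts_features

    def write(self, value):
        if value & FEATURES_OK and not self.accepts_features:
            value &= ~FEATURES_OK  # device refuses the features
        self.value = value

def negotiate_features_ok(reg):
    # Driver sets FEATURES_OK ...
    reg.write(reg.value | FEATURES_OK)
    # ... a real driver would delay here, then re-read once.
    # No busy-polling loop is required.
    return bool(reg.value & FEATURES_OK)

print(negotiate_features_ok(StatusRegister(accepts_features=True)))   # True
print(negotiate_features_ok(StatusRegister(accepts_features=False)))  # False
```

The point of the pattern is that acceptance is signalled through the same status byte the driver just wrote, so a single delayed re-read suffices.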



---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* [virtio-dev] Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-18  3:02           ` Zhu, Lingshan
@ 2023-09-18 17:30             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-18 17:30 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev

On Mon, Sep 18, 2023 at 11:02:18AM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
> > On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
> > > > > This commit specifies the constraints of the virtqueue state,
> > > > > and the actions should be taken by the device when SUSPEND
> > > > > and DRIVER_OK is set
> > > > > 
> > > > > Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >    content.tex | 19 +++++++++++++++++++
> > > > >    1 file changed, 19 insertions(+)
> > > > > 
> > > > > diff --git a/content.tex b/content.tex
> > > > > index 0fab537..9d727ce 100644
> > > > > --- a/content.tex
> > > > > +++ b/content.tex
> > > > > @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
> > > > >    When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
> > > > >    is always 0
> > > > > +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > > > +
> > > > > +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
> > > > > +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
> > > > > +used index in the used ring.
> > > > > +
> > > > > +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
> > > > > +
> > > > > +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
> > > > > +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
> > > > > +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
> > > > > +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
> > > > > +
> > > > > +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
> > > > > +the device MUST record the Virtqueue State of every enabled virtqueue
> > > > > +in \field{Available State} and \field{Used State} respectively,
> > > > record how?
> > > This is transport specific, for PCI they are recorded in the common config
> > > space,
> > > two new fields of them are introduced in patch 5.
> > 
> > that is not enough space to record state for every enabled vq.
> They can work with queue_select like many other vq configurations.

queue select is under driver control.


> I will mention this in the comment.
> > 
> > > > > +and correspondingly restore the Virtqueue State of every enabled virtqueue
> > > > > +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
> > > > when is that?
> > > When the DRIVER sets DRIVER_OK and done before the device presents
> > > DRIVER_OK.
> > I don't really understand the flow here. does SUSPEND clear DRIVER_OK
> > then?
> SUSPEND does not clear DRIVER, I think this is not a must.

then I don't get what does "when DRIVER_OK is set" mean - it stays
set all the time.


> > 
> > 
> > > > 
> > > > > +
> > > > >    \input{admin.tex}
> > > > >    \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
> > > > > -- 
> > > > > 2.35.3
> > 
> > This publicly archived list offers a means to provide input to the
> > OASIS Virtual I/O Device (VIRTIO) TC.
> > 
> > In order to verify user consent to the Feedback License terms and
> > to minimize spam in the list archive, subscription is required
> > before posting.
> > 
> > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > List help: virtio-comment-help@lists.oasis-open.org
> > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > Committee: https://www.oasis-open.org/committees/virtio/
> > Join OASIS: https://www.oasis-open.org/join/
> > 
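[Editor's note: as a toy model of the constraints being debated in this message (not normative; the SUSPEND bit value, field names, and accessors are all illustrative), the record-on-SUSPEND and queue_select-based access flow could look like this. The write gating mirrors the devicenormative text quoted above: state writes are honored only before DRIVER_OK, or while DRIVER_OK and SUSPEND are both set.]

```python
DRIVER_OK, SUSPEND = 0x04, 0x40  # SUSPEND's bit value is an assumption

class MockDevice:
    def __init__(self, nqueues):
        self.status = 0
        self.queue_select = 0                           # driver-selected vq
        self.live = [[0, 0] for _ in range(nqueues)]    # internal indices
        self.saved = [[0, 0] for _ in range(nqueues)]   # driver-visible state

    def set_status(self, status):
        # On the DRIVER_OK -> DRIVER_OK|SUSPEND transition, snapshot each
        # vq's [last_avail_idx, last_used_idx] into the visible state.
        if status & SUSPEND and not self.status & SUSPEND:
            self.saved = [pair[:] for pair in self.live]
        self.status = status

    def read_state(self):
        return tuple(self.saved[self.queue_select])

    def write_state(self, avail, used):
        # Honored only before DRIVER_OK, or when suspended; else ignored.
        if (not self.status & DRIVER_OK) or (self.status & SUSPEND):
            self.saved[self.queue_select] = [avail, used]

src = MockDevice(2)
src.set_status(DRIVER_OK)
src.live[1] = [5, 3]                 # device made progress on vq 1
src.set_status(DRIVER_OK | SUSPEND)  # suspend: indices are stabilized
src.queue_select = 1
print(src.read_state())              # (5, 3)
```

This also illustrates MST's point: only one state pair is exposed at a time, multiplexed by queue_select, which is under driver control.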






* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  9:34                     ` Zhu, Lingshan
@ 2023-09-18 18:41                       ` Parav Pandit
  2023-09-18 18:49                         ` Michael S. Tsirkin
  2023-09-19  8:01                         ` Zhu, Lingshan
  0 siblings, 2 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-18 18:41 UTC (permalink / raw)
  To: Zhu, Lingshan, virtio-dev, Michael S. Tsirkin, Jason Wang


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, September 18, 2023 3:05 PM
> 
> On 9/18/2023 2:54 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 18, 2023 12:19 PM
> >
> >> so admin vq based LM solution can be a side channel attacking surface
> > It will be part of the DSM whenever it will be used in future.
> > Hence, it is not attack surface.
> I am not sure, why we have to trust the PF?
> This is out of virtio scope anyway.
> 
> I have explained many times how it can be a attack surface, and examples.
> 
And none of that makes any sense, as fundamentally the hypervisor is trusted regardless of the approach.

> What happen if malicious SW dump guest memory by admin vq dirty page
> tracking feature?
What??
Where is this malicious SW located, in the guest VM?

> >
> >>>>>> For untrusted hypervisor, same set of attack surface is present
> >>>>>> with
> >>>>>> trap+emulation.
> >>>>>> So both method score same. Hence its not relevant point for discussion.
> >>>>> this is not hypervisor, Do you see any modern hypervisor have
> >>>>> these issues?
> >>>>>
> >>>>> This is admin vq for LM can be a side channel attacking surface.
> >>> It is not.
> >>> Hypervisor is trusted entity.
> >>> For untrusted hypervisor the TDISP is unified solution build by the
> >>> various
> >> industry bodies including DMTF, PCI for last few years.
> >>> We want to utilize that.
> >> first, TDISP is out of virtio spec.
> > Sure, hence, untrusted hypervisor are out of scope.
> > Otherwise, trap+emulation is equally dead which relies on the hypervisor to
> do things.
> so lets focus on LM topic, other than confidential computing.
ok.

> > Just because data transfer is not done, it does not mean that thousands of
> polling register writes complete in stipulated time.
> 1) again, they are per-device facilities
That does not establish that it can somehow complete the work in < x usec.
> 2) we use very few registers, even status byte does not require polling, just re-
> read with delay.
> 
> Please refer to the code for setting FEATURES_OK.
It won't work when one needs to suspend the device.
There is no point in doing such work over registers, as the fundamental framework is over the AQ.


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18 18:41                       ` Parav Pandit
@ 2023-09-18 18:49                         ` Michael S. Tsirkin
  2023-09-20  6:06                           ` Zhu, Lingshan
  2023-09-19  8:01                         ` Zhu, Lingshan
  1 sibling, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-18 18:49 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > Please refer to the code for setting FEATURES_OK.
> It wont work when one needs to suspend the device.
> There is no point of doing such work over registers as fundamental framework is over the AQ.

Well, not really. It's over admin commands. When these were built, the
intent was always that admin commands could be used through
another interface than the admin queue. Is there a problem
implementing admin commands over a memory BAR? For example, I can see
an "admin command" capability pointing at a BAR where
commands are supplied, and using a new group type referring to
the device itself.

-- 
MST
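[Editor's note: one possible shape for the BAR-resident admin command region suggested above, purely hypothetical. The header fields, their widths, and the "SELF" group type are invented for illustration; none of this is in the spec.]

```python
import struct

# Hypothetical header for a BAR-resident admin command region:
#   u16 opcode, u16 group_type (e.g. a new SELF type naming the device
#   itself), u64 member_id, u32 command_status.
# "<" = little-endian, no padding, matching virtio's on-the-wire layout.
ADMIN_CMD_HDR = struct.Struct("<HHQI")

def encode_cmd(opcode, group_type, member_id):
    # The driver would write this header into the BAR region, then poll
    # the command_status field (initially 0) for completion.
    return ADMIN_CMD_HDR.pack(opcode, group_type, member_id, 0)

buf = encode_cmd(opcode=0x1, group_type=0x0, member_id=0)
print(ADMIN_CMD_HDR.size == len(buf))  # True
```

The attraction of such a layout is that the same command set could be carried over either transport, with the BAR path avoiding any dependency on a functioning admin queue.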





* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:29                         ` Parav Pandit
@ 2023-09-19  4:25                           ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:25 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Sun, Sep 17, 2023 at 1:29 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:42 AM
> >
> > On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > >
> > > > It's not about how many states in a single state machine, it's about
> > > > how many state machines that exist for device status. Having more
> > > > than one creates big obstacles and complexity in the device. You
> > > > need to define the interaction of each state otherwise you leave undefined
> > behaviours.
> > > The device mode has zero relation to the device status.
> >
> > You will soon get this issue when you want to do nesting.
> >
> I don’t think so. One needs to intercept it when one wants to do trap+emulation which seems to fullfil the nesting use case.

Well, how can you trap it? You have the admin vq in L0, which means the
suspend is never exposed to L1 unless you assign the owner to L1.
Is this what you want?

>
> > > It does not mess with it at all.
> > > In fact the new bits in device status is making it more complex for the device
> > to handle.
> >
> > Are you challenging the design of the device status? It's definitely too late to do
> > this.
> >
> No. I am saying the extending device_status with yet another state is equally complex and its core of the device.

You never explain why.

>
> > This proposal increases just one bit and that worries you? Or you think one
> > more state is much more complicated than a new state machine with two
> > states?
>
> It is mode and not state. And two modes are needed for supporting P2P device.

You keep saying you are migrating the core virtio devices but then you
are saying it is required for PCI. And you never explain why it can't
be done by reusing the device status bit.

> When one wants to do with mediation, there also two states are needed.
>
> The key is modes are not interacting

You need to explain why they are not interacting. It touches the
virtio facility which (partially) overlaps the function of the device
status for sure. You invent a new state machine, and leave the vendors
to guess how or why they are not interacting with the existing one.
There are just too many corner cases that need to be figured out.

For example:

How do you define stop? Is it a virtio-level stop, a transport-level one, or
a mix of both? Is the device allowed to stop in the middle of a reset,
feature negotiation, or even transport-specific operations like FLR?
If yes, how about other operations and who defines and maintains those
transitional states? If not, why and how long would a stop wait for an
operation? Can a stop fail? What happens if the driver wants to reset
but the device is stopped by the admin commands? Who suppresses who
and why?

This demonstrates the complexity of your proposal, and I don't see any
of the above clearly stated in your series. By reusing the existing
device status machine, everything would be simplified.

> with the device_status because device_status is just another register of the virtio.

Let's not commit a layer violation: device status is a basic facility of
the virtio device that is not coupled to any transport, so it is not
necessarily implemented via registers.

Thanks






* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18  6:54                   ` Parav Pandit
  2023-09-18  9:34                     ` Zhu, Lingshan
@ 2023-09-19  4:27                     ` Jason Wang
  2023-09-19  7:32                       ` Parav Pandit
  1 sibling, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:27 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Michael S. Tsirkin

On Mon, Sep 18, 2023 at 2:55 PM Parav Pandit <parav@nvidia.com> wrote:
>
> > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > Sent: Monday, September 18, 2023 12:19 PM
>
>
> > so admin vq based LM solution can be a side channel attacking surface
> It will be part of the DSM whenever it will be used in future.
> Hence, it is not attack surface.

DSM is not a part of the TVM. So it really depends on what kind of work
the admin virtqueue does. For commands that can't be self-contained,
like provisioning, it is fine, since they are done before the TDI
assignment. But that is not necessarily true for your migration proposal.
It seems you've found another case where self-containment is important:
allowing the owner to access the member after the TDI is attached to the
TVM is a side channel attack.

Thanks




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:22                                     ` Parav Pandit
@ 2023-09-19  4:32                                       ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Sun, Sep 17, 2023 at 1:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:39 AM
> >
> > On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:15 AM
> > >
> > > [..]
> > > > > > > [1]
> > > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/ms
> > > > > > > g000
> > > > > > > 61.h
> > > > > > > tml
> > > > > >
> > > > > > The series works for stateless devices. Before we introduce
> > > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > > the device context doesn't make much sense right now.
> > > > > The series works for stateful devices too. The device context covers it.
> > > >
> > > > How? Can it be used for migrating any existing stateful devices?
> > > > Don't we need to define what context means for a specific stateful
> > > > device before you can introduce things like device context? Please
> > > > go through the archives for the relevant discussions (e.g
> > > > virtio-FS), it's not as simple as introducing a device context API.
> > > >
> > > A device will have its own context for example RSS definition, or flow filters
> > tomorrow.
> >
> > If you know there are things that are missing when posting the patches, please
> > use the RFC tag.
> >
> It is not missing. They are optional, which is why they are not needed in this series.
>
> > > The device context will be extended post the first series.
> > >
> > > > And what's more, how can it handle the migration compatibility?
> > > It will be taken care of in a follow-on, as we all know this needs to be checked.
> >
> > You don't even mention it anywhere in your series.
> >
> Migration compatibility is a topic in itself, regardless of the device migration series.

Why? Without compatibility support, migration can't work in the
production environment.

> It is part of the feature provisioning phase needed regardless.

Definitely not, it is something that must be considered even without
any feature. It's about the robustness of the migration protocol.
Sometimes you need to do that since some states were lost in the
previous versions of protocols or formats.

> Like how you and Lingshan wanted to keep the suspend bit series small and logical, the device migration series is also logically split by functionality.
> I don’t see a need to mention the long-known missing functionality that is common to both approaches.

Again, your proposal needs to describe at least the plan for dealing
with migration compatibility since you want a passthrough based
solution. That's the point.

>
> > > I will include the notes of future follow up work items in v1, which will be
> > taken care post this series.
> > >
> > > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > > to work. It can be done via platform facilities or even
> > > > > > software. And to make it more efficient, it needs to utilize
> > > > > > transport facilities instead of a
> > > > general one.
> > > > > >
> > > > > It is also optional in the spec proposal.
> > > > > Most platforms claimed are not able to do efficiently either,
> > > >
> > > > Most platforms are working towards an efficient way. But we are
> > > > talking about different things, hardware based dirty page logging is
> > > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> > to log dirty pages.
> > > >
> > > I also said the same, that hw based dirty page logging is not a must. :) One
> > > day the hw mmu will be able to track everything efficiently. I have not seen it
> > > happening yet.
> >
> > How do you define efficiency? KVM uses page fault and most modern IOMMU
> > support PRI now.
> >
> One cannot define PRI as mandatory feature.

There's no way to mandate PRI, it's a PCI specific facility.

> In our research and experiments we see that PRI is significantly slower to handle page faults.
> Yet different topic...

PRI's performance is definitely another topic, it's just an example
that tracking dirty pages by device is optional and transport (PCI)
can evolve for sure. What's more important, it demonstrates the basic
design of virtio, which is trying to leverage the transport instead of
mandating a reinvention of everything.

>
> Efficiency is defined by the downtime of the multiple devices in a VM.

Ok, but you tend to ignore my question regarding the downtime.

> And a leading OS allowed device advancements by letting the device report dirty pages in a CPU- and platform-agnostic way...
>

It has many things that I don't see a good answer for. For example,
the QOS raised by Ling Shan.

> One can use post-copy approach as well, current device migration is around established pre-copy approach.

Another drawback of your proposal. With transport specific assistance
like PRI, you can do both pre and post. But the point is we need to
make sure pre-copy downtime can satisfy the requirement instead of
switching to another.

>
> > >
> > > > > hence the vfio subsystem added the support for it.
> > > >
> > > > As an open standard, if it is designed for a specific software
> > > > subsystem on a specific OS, it's a failure.
> > > >
> > > It is not.
> > > One needs to accept that, in certain areas, virtio is following the trail of
> > > advancements already made in the sw stack,
> > > so that virtio spec advancements fit in to supply such use cases.
> > > And blocking such advancement of the virtio spec to promote an only_mediation
> > > approach is not good either.
> > >
> > > BTW: One can say the mediation approach is also designed for specific
> > software subsystem and hence failure.
> > > I will stay away from quoting it, as I don’t see it this way.
> >
> > The proposal is based on well known technology since the birth of virtualization.
> Sure, but that does not change the fact that such a series is also targeted at a specific software subsystem.

How, this series reuses the existing capability by introducing just
two more registers on the existing common cfg structure and you think
it targets a specific software subsystem? If this is true, I think you
are actually challenging the design of the whole modern PCI transport.

> And hence failure.

Failure in what sense?

>
> I didn’t say that, I said the opposite: yes, since virtio is in catch-up mode, it is defining the interface so that it can fit into these OS platforms.
> Mostly multiple of them, who all support passthrough devices.

We are talking about different things again.

>
> > I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> > you?
> >
> It does trap and emulation for PCI config space, not for virtio interfaces like queues, config space and more for passthrough devices.

Well, we are in the context of live migration, no? We all know
passthrough just works fine with the existing virtio spec...

>
> > >
> > > > >
> > > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > > method and how it conflicts with live migration and complicates
> > > > > > the device
> > > > implementation.
> > > > > Huh, it shows the opposite.
> > > > > It shows that both will seamlessly work.
> > > >
> > > > Have you even tried your proposal with a prototype device?
> > > Of course, it was delivered to users 1.5 years ago, before bringing it to the
> > > spec with virtio-net and virtio-blk devices.
> >
> > I hope this is your serious answer, but it looks like it is not. Your proposal misses
> > a lot of states as I pointed out in another thread, how can it work in fact?
> >
> Which states?

Let me repeat it for the third time. You don't even cover all the
functionality of common cfg, how can guests see a consistent common
cfg state?

> What is posted in series [1] is the minimal, base set of required items;

You need to prove it is minimal, instead of ignoring my questions. For
example, dirty page tracking is definitely optional.

> optional ones are omitted as they can be done incrementally.
> Lingshan had a hard time digesting the basics of the P2P and dirty page tracking work in this short series.

You never explain why this series needs to deal with P2P and dirty
page tracking.

> So there is no point in pushing large part of the device context and making the series blurry.

I don't see a good definition of "device context" and most of the
device context has been covered by the existing PCI capabilities.

> It will be done incrementally subsequently.
>
> > > > >
> > > > > > And it means you need to audit all PCI features and do
> > > > > > workaround if there're any possible issues (or using a whitelist).
> > > > > No need for any of this.
> > > >
> > > > You need to prove this otherwise it's fragile. It's the duty of the
> > > > author to justify not the reviewer.
> > > >
> > > One cannot post, nor review, a giant series in one go.
> > > Hence the work is to be split on a logical boundary.
> > > Feature provisioning, PCI layout etc. are secondary tasks to take care of.
> >
> > Again, if you know something is missing, you need to explain it in the series
> > instead of waiting for some reviewers to point it out and say it's well-known
> > afterwards.
> >
> The patch set cannot be a laundry list of items missing in the virtio spec.
> It is short and focused on the device migration.

You need to at least mention it in the cover letter to give the big
picture. What's wrong with that? It helps to save time for everyone,
or people will keep asking similar questions. Is this too hard to
understand?

>
> > >
> > > > For example FLR is required to be done in 100ms. How could you
> > > > achieve this during the live migration? How does it affect the downtime and
> > FRS?
> > > >
> > > Good technical question to discuss instead of passthrough vs
> > > mediation. :)
> > >
> > > Device administration work is separate from the device operational part.
> > > The device context records the current device state; when an FLR
> > > occurs, the device stops all operations.
> > > And on the next read of the device context, the post-FLR context is returned.
> >
> > Firstly, you didn't explain how it affects the live migration, for example, what
> > happens if we try to migrate while FLR is ongoing.
> > Secondly, you ignore the other two questions.
> >
> > Let's save the time of both.
> >
> There is nothing to explain about device reset and live migration, because there are absolutely no touch points.

Do you think this is a valid answer to my above question? Let's not
exhaust the patience of any reviewer.

> device_status is just another register like the rest of them.

I don't see device status itself as anything related to FLR.

> One does not need to poke around registers when doing passthrough.
>
> > >
> > > > >
> > > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > > don't use simple passthrough we don't need to care about this.
> > > > > >
> > > > > Exactly, we are migrating virtio device for the PCI transport.
> > > >
> > > > No, the migration facility is a general requirement for all transport.
> > > It is for all transports. One can extend it when doing so for MMIO.
> >
> > By using admin commands? That cannot perform well via registers.
> >
> Yes, admin commands using an AQ on an MMIO-based owner device will also be just fine.

Can admin commands be implemented efficiently via registers? I would
like to see how that can work.

MMIO doesn't have the concept of a group owner etc. at all, or do you
know how to build one?

>
> > >
> > > > Starting from a PCI specific (actually your proposal does not even
> > > > cover all even for PCI) solution which may easily end up with issues in other
> > transports.
> > > >
> > > Like?
> >
> > The admin command/virtqueue itself may not work well for other transport.
> > That's the drawback of your proposal while this proposal doesn't do any
> > coupling.
> >
> There is no coupling in the spec of admin commands with a virtqueue, as Michael has consistently insisted.
> And in my proposal there is also no such coupling.

I hope so but I don't think so. We need to at least do this explicitly
by moving all the state definitions to the "basic facility" part.

>
> > >
> > > > Even if you want to migrate virtio for PCI,  please at least read
> > > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > > that a lot of things are missing in your proposal.
> > > >
> > > Device context is something that will be extended.
> > > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> > transport.
> >
> > This is just one mini stuff, how about PCI config space and others?
> >
> No need to migrate the PCI config space, because migration is of the virtio device, and not the underlying transport.

Let me ask you a simple question, if you don't migrate the PCI config
space, how can you guarantee that guests see the same config space
state after migration? What happens if a specific capability exists
only in the src but not the destination? Or do you want to provision
PCI capabilities?

> Therefore, one can migrate from a virtio member device to a fully software based device as well, and vice versa.

Please answer my question above.

>
> > Again, please read Qemu codes, a lot of things are missing in your proposal
> > now. If everything is fine to do passthrough based live migration, I'm pretty sure
> > you need more than what Qemu has since it can only do a small fraction of the
> > whole PCI.
> >
> I will read.
> Many of the pieces may be implemented by the device over time following the charter.
>
> > >
> > > > > As usual, if you have to keep arguing about not doing
> > > > > passthrough, we are
> > > > surely past that point.
> > > >
> > > > Who is "we"?
> > > >
> > > We = You and me.
> > > From 2021, you keep objecting that passthrough must not be done.
> >
> > This is a big misunderstanding, you need to justify it or at least address the
> > concerns from any reviewer.
> >
> They are getting addressed, if you have comments, please post those comments in the actual series.
> I wouldn’t diverge to discuss in different series here.

Well, Lingshan's series was posted before yours, and it's you who keeps
referring to your proposal here. What's more, I've asked some
questions but most of them don't have a good answer. So I need to
stop before I can ask more.

>
> > > And blocking the work done by other technical committee members to
> > improve the virtio spec to make that happen is simply wrong.
> >
> > It's unrealistic to think that one will be 100% correct. Justify your proposal or
> > why I was wrong instead of ignoring my questions and complaining. That is why
> > we need a community. If it doesn't work, virtio provides another process for
> > convergence.
> >
> I am not expecting you to be correct at all. I totally agree that you may miss something, I may miss something.
> And this is why I repeatedly, humbly ask to converge and jointly address the passthrough mode without trap+emulation method.
> The way I understood from your comment is, passthrough for hw based device must not be done and multiple of hw vendors disagree to it.

Again, this is a big misunderstanding. Passthrough can work doesn't
mean your proposal can work. I'm asking questions and want to figure
out if/how it can work correctly. But you keep ignoring them or
raising other unrelated issues.

>
> > >
> > > > Has something like what you said here passed the vote and been written
> > > > into the spec?
> > > Not only me.
> > > The virtio technical committee has agreed to _both_ nested and
> > > hardware-based implementations.
> > >
> > > " hardware-based implementations" is part of the virtio specification charter
> > with ballot of [1].
> > >
> > > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> >
> > Let's not do conceptual shifts; I was asking about passthrough but you give me
> > the hardware implementation.
> >
> Passthrough devices implemented by hw which do dirty tracking and follow the spec.

Why is passthrough coupled with dirty tracking?

>
> > >
> > > And passthrough hardware-based device is in the charter that we strive to
> > support.
> > >
> > > > We all know the current virtio spec is not built upon passthrough.
> > >
> > > This effort improves the passthrough hw based implementation and should
> > > not be blocked.
> >
> > Your proposal was posted only for several days and you think I would block that
> > just because I asked several questions and some of them are not answered?
> >
> If I misunderstood, then I am sorry.
> Lets progress and improve the passthrough use case without trap+emulation.

Unless any reviewer says no, the comments or concerns are a good
opportunity for you to justify your method. That's what I'm doing
right now and how the community works.

> Trap+emulation=mediation is also a valid solution for nested case.

Again. Not only for the nested case. This method has been used for
cloud vendors now.

> And I frankly see a need for both as both are solving a different problem.

Then, let's not couple state, suspending, or dirty page tracking with
admin commands.

> Trap+emulation cannot achieve passthrough mode, hence my request was not to step on each other.

It's easy to not step on others, but it would end up with duplications for sure.

>
> When both can use the common infra, it is good to do that, when they cannot, due to the technical challenges of underlying transport, they should evolve differently.
>
> > >
> > > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > > >
> > > > It's not the mediation, we're not doing vDPA, the device model we
> > > > had in hardware and we present to guests are all virtio devices.
> > > > It's the trap and emulation which is fundamental in the world of
> > > > virtualization for the past decades. It's the model we used to
> > > > virtualize standard devices. If you want to debate this methodology, virtio
> > community is clearly the wrong forum.
> > > >
> > > I am not debating it at all. You keep bringing up the point of mediation.
> > >
> > > The proposal of [1] is clear that wants to do hardware based passthrough
> > devices with least amount of virtio level mediation.
> > >
> > > So somewhere a mode of virtualization has been used; that's fine, it can
> > > continue with full virtualization, mediation,
> > >
> > > And also hardware based passthrough device.
> > >
> > > > >
> > > > > Series [1] will be enhanced further to support virtio passthrough
> > > > > device for
> > > > device context and more.
> > > > > Even further we like to extend the support.
> > > > >
> > > > > > Since the functionality proposed in this series focus on the
> > > > > > minimal set of the functionality for migration, it is virtio
> > > > > > specific and self contained so nothing special is required to work in the
> > nest.
> > > > >
> > > > > Maybe it is.
> > > > >
> > > > > Again, I repeat that I would like to converge the admin commands between
> > > > > passthrough and non-passthrough cases.
> > > >
> > > > You need to prove at least that your proposal can work for the
> > > > passthrough before we can try to converge.
> > > >
> > > What do you mean by "prove"? Virtio specification development is not a
> > > proof-based method.
> >
> > For example, several of my questions were ignored.
> >
> I didn’t ignore, but if I miss, I will answer.
>
> > >
> > > If you want to participate, please review the patches and help community to
> > improve.
> >
> > See above.
> >
> > >
> > > > > If we can converge it is good.
> > > > > If not both modes can expand.
> > > > > It is not either or as use cases are different.
> > > >
> > > > Admin commands are not the cure for all, I've stated drawbacks in
> > > > other threads. Not repeating it again here.
> > > He he, sure, I am not attempting to cure all.
> > > One solution does not fit all cases.
> >
> > Then why do you want to couple migration with admin commands?
> >
> Because of the following.
> 1. A device migration needs bulk data transfer; this is something that cannot be done with tiny registers, because
> a. registers are slow for bidirectional communication
> b. registers do not scale well with the number of VFs

That's pretty fine, but let's not limit it to a virtqueue. Virtqueue
may not work for all the cases:

I must repeat some of Ling Shan's questions since I don't see a good
answer for them now.

1) If you want to use virtqueue to do the migration with a downtime
requirement. Is the driver required to do some sort of software QOS?
For example what happens if one wants to migrate but the admin
virtqueue is out of space? And do we need a timeout for a specific
command and if yes what happens after the timeout?
2) Assuming one round of the migration requires several commands. Are
they allowed to be submitted in a batch? If yes, how is the ordering
guaranteed or we don't need it at all? If not, why do we even need a
queue?

If you're using an existing transport specific mechanism, you don't
need to care about the above. I'm not saying admin virtqueue can't
work but it definitely has more things to be considered.

>
> > > Admin commands are used to solve the specific problem for which the AQ is
> > designed for.
> > >
> > > One can make an argument saying: take the PCI fabric to a 10 km
> > > distance, don't bring a new virtio TCP transport...
> > >
> > > Drawing boundaries around the virtio spec in a certain way only makes it
> > > further inferior. So please do not block the advancements brought in [1].
> >
> > As a reviewer, I ask questions but some of them are ignored, do you expect the
> > reviewer to figure out by themselves?
> Sure, please review.
>
> Many of them were not questions, but assertions and conclusions that it does not fit nesting, is sub-optimal, etc.

I think we all agree that your proposal does not fit for nesting, no?
It demonstrates that work needs to be done in the basic facility
first.

What's more the conclusion is for coupling live migration with admin
command. This point has been clarified several times before.

>
> >
> > > We really would like to make it more robust with your rich experience and
> > inputs, if you care to participate.
> >
> > We can collaborate for sure: as I pointed out in another threads, from what I
> > can see from the both proposals of the current version:
> >
> > I see a good opportunity to build your admin commands proposal on top of this
> > proposal. Or it means, we can focus on what needs to be migrated first:
> >
> > 1) queue state
> This is just one small part of the device context.
> So once the device context is read/written, it covers the queue.

That's a layer violation. Virtqueue is the basic facility, states need
to be defined there.

>
> > 2) inflight descriptors
> Same as the q state, it is part of the device context.

Admin commands are not the only way to access device context. For
example, do you agree the virtqueue address is part of the device
context? If yes, it is available in the common configuration now.

>
> > 3) dirty pages (optional)
> > 4) device state(context) (optional)
> >
> It is same as #1 and #2.
> Splitting them from #1 and #2 is not needed.
>
> We can extend the device context to be selectively queried for the nested case.
>
> > I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> > interface to access those facilities? This is how this series is structured.
> >
> > And what's more, admin commands or transport specific interfaces. And when
> > we invent admin commands, you may realize you are inventing a new transport
> > which is the idea of transport via admin commands.
>
> Not really, it is not a new transport at all.
> I explained to you before: when you call it a transport, it must carry the driver notifications as well.
> Otherwise it is just a set of commands.

I've explained that you need admin commands to save and load all
existing virtio PCI capabilities. This means a driver can just use
those commands to work. If not, please explain why I was wrong.

Thanks





>
> The new commands are self contained anyway of [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-19  4:32                                       ` Jason Wang
  0 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Sun, Sep 17, 2023 at 1:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:39 AM
> >
> > On Wed, Sep 13, 2023 at 2:39 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:15 AM
> > >
> > > [..]
> > > > > > > [1]
> > > > > > > https://lists.oasis-open.org/archives/virtio-comment/202309/ms
> > > > > > > g000
> > > > > > > 61.h
> > > > > > > tml
> > > > > >
> > > > > > The series works for stateless devices. Before we introduce
> > > > > > device states in the spec, we can't migrate stateful devices. So
> > > > > > the device context doesn't make much sense right now.
> > > > > The series works for stateful devices too. The device context covers it.
> > > >
> > > > How? Can it be used for migrating any existing stateful devices?
> > > > Don't we need to define what context means for a specific stateful
> > > > device before you can introduce things like device context? Please
> > > > go through the archives for the relevant discussions (e.g
> > > > virtio-FS), it's not as simple as introducing a device context API.
> > > >
> > > A device will have its own context for example RSS definition, or flow filters
> > tomorrow.
> >
> > If you know there are things that are missing when posting the patches, please
> > use the RFC tag.
> >
> It is not missing. They are optional, which is why it is not needed in this series.
>
> > > The device context will be extended post the first series.
> > >
> > > > And what's more, how can it handle the migration compatibility?
> > > It will be taken care in follow on as we all know that this to be checked.
> >
> > You don't even mention it anywhere in your series.
> >
> Migration compatibility is topic in itself regardless of device migration series.

Why? Without compatibility support, migration can't work in the
production environment.

> It is part of the feature provisioning phase needed regardless.

Definitely not, it is something that must be considered even without
any feature. It's about the robustness of the migration protocol.
Sometimes you need to do that since some states were lost in the
previous version of protocols or formats .

> Like how you and Lingshan wanted to keep the suspend bit series small and logical, device migration series is also logically split for the functionality.
> I don’t see a need to mention the long known missing functionality and common to both approaches.

Again, your proposal needs to describe at least the plan for dealing
with migration compatibility since you want a passthrough based
solution. That's the point.

>
> > > I will include the notes of future follow up work items in v1, which will be
> > taken care post this series.
> > >
> > > > > > Dirty page tracking in virtio is not a must for live migration
> > > > > > to work. It can be done via platform facilities or even
> > > > > > software. And to make it more efficient, it needs to utilize
> > > > > > transport facilities instead of a
> > > > general one.
> > > > > >
> > > > > It is also optional in the spec proposal.
> > > > > Most platforms claimed are not able to do efficiently either,
> > > >
> > > > Most platforms are working towards an efficient way. But we are
> > > > talking about different things, hardware based dirty page logging is
> > > > not a must, that is what I'm saying. For example, KVM doesn't use hardware
> > to log dirty pages.
> > > >
> > > I also said same, that hw based dirty page logging is not must. :) One
> > > day hw mmu will be able to track everything efficiently. I have not seen it
> > happening yet.
> >
> > How do you define efficiency? KVM uses page fault and most modern IOMMU
> > support PRI now.
> >
> One cannot define PRI as mandatory feature.

There's no way to mandate PRI, it's a PCI specific facility.

> In our research and experiments we see that PRI is significantly slower to handle page faults.
> Yet different topic...

PRI's performance is definitely another topic, it's just an example
that tracking dirty pages by device is optional and transport (PCI)
can evolve for sure. What's more important, it demonstrates the basic
design of virtio, which is trying to leverage the transport instead of
a mandatory reveinting of everything.

>
> Efficiency is defined by the downtime of the multiple devices in a VM.

Ok, but you tend to ignore my question regarding the downtime.

> And leading OS allowed device advancements by allowing device to report dirty pages in cpu and platform agnostic way...
>

It has many things that I don't see a good answer for. For example,
the QOS raised by Ling Shan.

> One can use post-copy approach as well, current device migration is around established pre-copy approach.

Another drawback of your proposal. With transport specific assistance
like PRI, you can do both pre and post. But the point is we need to
make sure pre-copy downtime can satisfy the requirement instead of
switching to another.

>
> > >
> > > > > hence the vfio subsystem added the support for it.
> > > >
> > > > As an open standard, if it is designed for a specific software
> > > > subsystem on a specific OS, it's a failure.
> > > >
> > > It is not.
> > > One need accept that, in certain areas virtio is following the trails of
> > advancement already done in sw stack.
> > > So that virtio spec advancement fits in to supply such use cases.
> > > And blocking such advancement of virtio spec to promote only_mediation
> > approach is not good either.
> > >
> > > BTW: One can say the mediation approach is also designed for specific
> > software subsystem and hence failure.
> > > I will stay away from quoting it, as I don’t see it this way.
> >
> > The proposal is based on well known technology since the birth of virtualization.
> Sure, but that does not change the fact that such series is also targeted for a specific software subsystem..

How, this series reuses the existing capability by introducing just
two more registers on the existing common cfg structure and you think
it targets a specific software subsystem? If this is true, I think you
are actually challenging the design of the whole modern PCI transport.

> And hence failure.

Failure in what sense?

>
> I didn’t say that, I said the opposite that yes, since the virtio is in catch up mode, it is defining the interface so that it can fit into these OS platforms.
> Mostly multiple of them, who all support passthrough devices.

We are talking about different things again.

>
> > I never knew a mainstream hypervisor that doesn't do trap and emulate, did
> > you?
> >
> It does trap and emulation for PCI config space, not for virtio interfaces like queues, config space and more for passthrough devices.

Well, we are in the context of live migration, no? We all know
passthrough just works fine with the existing virtio spec...

>
> > >
> > > > >
> > > > > > The FLR, P2P demonstrates the fragility of a simple passthrough
> > > > > > method and how it conflicts with live migration and complicates
> > > > > > the device
> > > > implementation.
> > > > > Huh, it shows the opposite.
> > > > > It shows that both will seamlessly work.
> > > >
> > > > Have you even tried your proposal with a prototype device?
> > > Of course; it was delivered to users 1.5 years ago, before being brought to the
> > spec, with virtio-net and virtio-blk devices.
> >
> > I hope this is your serious answer, but it looks like it is not. Your proposal misses
> > a lot of states as I pointed out in another thread, how can it work in fact?
> >
> Which states?

Let me repeat it for the third time. You don't even cover all the
functionality of the common cfg; how can guests see a consistent
common cfg state?

> What is posted in series [1] is minimal and base required items,

You need to prove it is minimal, instead of ignoring my questions. For
example, dirty page tracking is definitely optional.

> optional one is omitted as it can be done incrementally.
> Lingshan had a hard time digesting the basics of the P2P and dirty page tracking work in this short series.

You never explain why this series needs to deal with P2P and dirty
page tracking.

> So there is no point in pushing large part of the device context and making the series blurry.

I don't see a good definition of "device context" and most of the
device context has been covered by the existing PCI capabilities.

> It will be done incrementally subsequently.
>
> > > > >
> > > > > > And it means you need to audit all PCI features and do
> > > > > > workaround if there're any possible issues (or using a whitelist).
> > > > > No need for any of this.
> > > >
> > > > You need to prove this otherwise it's fragile. It's the duty of the
> > > > author to justify not the reviewer.
> > > >
> > > One cannot post patches nor review a giant series in one go.
> > > Hence the work is split on a logical boundary.
> > > Feature provisioning, PCI layout, etc., are secondary tasks to take care of.
> >
> > Again, if you know something is missing, you need to explain it in the series
> > instead of waiting for some reviewers to point it out and say it's well-known
> > afterwards.
> >
> The patch set cannot be a laundry list of items missing in virtio spec.
> It is short and focused on the device migration.

You need to at least mention it in the cover letter for the big
picture. What's wrong with this? It helps save time for everyone;
otherwise people will keep asking similar questions. Is this too hard
to understand?

>
> > >
> > > > For example FLR is required to be done in 100ms. How could you
> > > > achieve this during the live migration? How does it affect the downtime and
> > FRS?
> > > >
> > > Good technical question to discuss instead of passthrough vs
> > > mediation. :)
> > >
> > > Device administration work is separate from the device operational part.
> > > The device context records the current device state; when FLR
> > occurs, the device stops all operations.
> > > And on the next read of the device context, the FLRed context is returned.
> >
> > Firstly, you didn't explain how it affects the live migration, for example, what
> > happens if we try to migrate while FLR is ongoing.
> > Secondly, you ignore the other two questions.
> >
> > Let's save the time of both.
> >
> There is nothing to explain about device reset and live migration, because there are absolutely no touch points.

Do you think this is a valid answer to my above question? Let's not
exhaust the patience of any reviewer.

> device_status is just another register like the rest of them.

I don't see device status itself as anything related to FLR.

> One does not need to poke around registers when doing passthrough.
>
> > >
> > > > >
> > > > > > This is tricky and we are migrating virtio not virtio-pci. If we
> > > > > > don't use simple passthrough we don't need to care about this.
> > > > > >
> > > > > Exactly, we are migrating virtio device for the PCI transport.
> > > >
> > > > No, the migration facility is a general requirement for all transport.
> > > It is for all transport. One can extend when do for MMIO.
> >
> > By using admin commands? That cannot perform well via registers.
> >
> Yes, admin commands using AQ on MMIO based owner device will also be just fine.

Can admin commands be implemented efficiently via registers? I would
like to see how that can work.

MMIO doesn't have the concept of a group owner at all, or do you
know how to build one?

>
> > >
> > > > Starting from a PCI specific (actually your proposal does not even
> > > > cover all even for PCI) solution which may easily end up with issues in other
> > transports.
> > > >
> > > Like?
> >
> > The admin command/virtqueue itself may not work well for other transport.
> > That's the drawback of your proposal while this proposal doesn't do any
> > coupling.
> >
> There is no coupling in the spec of admin commands with virtqueues, as Michael has consistently insisted.
> And in my proposal also there is no such coupling.

I hope so but I don't think so. We need to at least do this explicitly
by moving all the state definitions to the "basic facility" part.

>
> > >
> > > > Even if you want to migrate virtio for PCI,  please at least read
> > > > Qemu migration codes for virtio and PCI, then you will soon realize
> > > > that a lot of things are missing in your proposal.
> > > >
> > > Device context is something that will be extended.
> > > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> > transport.
> >
> > This is just one mini stuff, how about PCI config space and others?
> >
> There is no need to migrate the PCI config space, because the migration is of the virtio device, not the underlying transport.

Let me ask you a simple question, if you don't migrate the PCI config
space, how can you guarantee that guests see the same config space
state after migration? What happens if a specific capability exists
only in the src but not the destination? Or do you want to provision
PCI capabilities?

> Therefore, one can migrate from a virtio member device to a fully software-based device as well, and vice versa.

Please answer my question above.

>
> > Again, please read Qemu codes, a lot of things are missing in your proposal
> > now. If everything is fine to do passthrough based live migration, I'm pretty sure
> > you need more than what Qemu has since it can only do a small fraction of the
> > whole PCI.
> >
> I will read.
> Many of the pieces may be implemented by the device over time following the charter.
>
> > >
> > > > > As usual, if you have to keep arguing about not doing
> > > > > passhthrough, we are
> > > > surely past that point.
> > > >
> > > > Who is "we"?
> > > >
> > > We = You and me.
> > > From 2021, you keep objecting that passthrough must not be done.
> >
> > This is a big misunderstanding, you need to justify it or at least address the
> > concerns from any reviewer.
> >
> They are getting addressed; if you have comments, please post them in the actual series.
> I won't diverge into discussing a different series here.

Well, Lingshan's series was posted before yours, and it's you who
keeps referring to your proposal here. What's more, I've asked some
questions but most of them don't have good answers. So I need to
stop before I can ask more.

>
> > > And blocking the work done by other technical committee members to
> > improve the virtio spec to make that happen is simply wrong.
> >
> > It's unrealistic to think that one will be 100% correct. Justify your proposal or
> > why I was wrong instead of ignoring my questions and complaining. That is why
> > we need a community. If it doesn't work, virtio provides another process for
> > convergence.
> >
> I am not expecting you to be correct at all. I totally agree that you may miss something, I may miss something.
> And this is why I repeatedly, humbly ask to converge and jointly address the passthrough mode without trap+emulation method.
> The way I understood from your comment is, passthrough for hw based device must not be done and multiple of hw vendors disagree to it.

Again, this is a big misunderstanding. Passthrough can work doesn't
mean your proposal can work. I'm asking questions and want to figure
out if/how it can work correctly. But you keep ignoring them or
raising other unrelated issues.

>
> > >
> > > > Is something like what you said here passed the vote and written to
> > > > the spec?
> > > Not only me.
> > > The virtio technical committee has agreed for nested and hardware-based
> > implementation _both_.
> > >
> > > " hardware-based implementations" is part of the virtio specification charter
> > with ballot of [1].
> > >
> > > [1] https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> >
> > Let's not shift concepts: I was asking about passthrough, but you answer with
> > hardware implementation.
> >
> Passthrough devices implemented by hw that does dirty tracking and follows the spec.

Why is passthrough coupled with dirty tracking?

>
> > >
> > > And passthrough hardware-based device is in the charter that we strive to
> > support.
> > >
> > > > We all know the current virtio spec is not built upon passthrough.
> > >
> > > This efforts improve the passthrough hw based implementation that should
> > not be blocked.
> >
> > Your proposal was posted only for several days and you think I would block that
> > just because I asked several questions and some of them are not answered?
> >
> If I misunderstood, then I am sorry.
> Let's progress and improve the passthrough use case without trap+emulation.

Unless any reviewer says no, the comments or concerns are a good
opportunity for you to justify your method. That's what I'm doing
right now and how the community works.

> Trap+emulation=mediation is also a valid solution for nested case.

Again, it's not only for the nested case. This method is in use by
cloud vendors now.

> And I frankly see a need for both as both are solving a different problem.

Then let's not couple state, suspending, and dirty page tracking with
admin commands.

> Trap+emulation cannot achieve passthrough mode, hence my request was not to step on each other.

It's easy to not step on others, but it would end up with duplications for sure.

>
> When both can use common infra, it is good to do that; when they cannot, due to the technical challenges of the underlying transport, they should evolve differently.
>
> > >
> > > > > Virtio does not need to stay in the weird umbrella to always mediate etc.
> > > >
> > > > It's not the mediation, we're not doing vDPA, the device model we
> > > > had in hardware and we present to guests are all virtio devices.
> > > > It's the trap and emulation which is fundamental in the world of
> > > > virtualization for the past decades. It's the model we used to
> > > > virtualize standard devices. If you want to debate this methodology, virtio
> > community is clearly the wrong forum.
> > > >
> > > I am not debating it at all. You keep bringing up the point of mediation.
> > >
> > > The proposal of [1] is clear that wants to do hardware based passthrough
> > devices with least amount of virtio level mediation.
> > >
> > > So somewhere mode of virtualizing has been used, that’s fine, it can
> > > continue with full virtualization, mediation,
> > >
> > > And also hardware based passthrough device.
> > >
> > > > >
> > > > > Series [1] will be enhanced further to support virtio passthrough
> > > > > device for
> > > > device context and more.
> > > > > Even further we like to extend the support.
> > > > >
> > > > > > Since the functionality proposed in this series focus on the
> > > > > > minimal set of the functionality for migration, it is virtio
> > > > > > specific and self contained so nothing special is required to work in the
> > nest.
> > > > >
> > > > > Maybe it is.
> > > > >
> > > > > Again, I repeat and like to converge the admin commands between
> > > > passthrough and non-passthrough cases.
> > > >
> > > > You need to prove at least that your proposal can work for the
> > > > passthrough before we can try to converge.
> > > >
> > > What do you mean by "prove"? Virtio specification development is not a
> > proof-based method.
> >
> > For example, several of my questions were ignored.
> >
> I didn’t ignore, but if I miss, I will answer.
>
> > >
> > > If you want to participate, please review the patches and help community to
> > improve.
> >
> > See above.
> >
> > >
> > > > > If we can converge it is good.
> > > > > If not both modes can expand.
> > > > > It is not either or as use cases are different.
> > > >
> > > > Admin commands are not the cure for all, I've stated drawbacks in
> > > > other threads. Not repeating it again here.
> > > He he, sure, I am not attempting to cure all.
> > > One solution does not fit all cases.
> >
> > Then why do you want to couple migration with admin commands?
> >
> Because of following.
> 1. A device migration needs to bulk data transfer, this is something cannot be done with tiny registers.
> Cannot be done through registers, because
> a. registers are slow for bidirectional communication
> b. do not scale well with scale of VFs

That's pretty fine, but let's not limit it to a virtqueue. Virtqueue
may not work for all the cases:

I must repeat some of Ling Shan's questions since I don't see a good
answer for them now.

1) If you want to use virtqueue to do the migration with a downtime
requirement. Is the driver required to do some sort of software QOS?
For example what happens if one wants to migrate but the admin
virtqueue is out of space? And do we need a timeout for a specific
command and if yes what happens after the timeout?
2) Assuming one round of the migration requires several commands. Are
they allowed to be submitted in a batch? If yes, how is the ordering
guaranteed or we don't need it at all? If not, why do we even need a
queue?

If you're using an existing transport specific mechanism, you don't
need to care about the above. I'm not saying admin virtqueue can't
work but it definitely has more things to be considered.

>
> > > Admin commands are used to solve the specific problem for which the AQ is
> > designed for.
> > >
> > > One can make an argument saying: take the PCI fabric to a 10 km distance; don't bring
> > a new virtio TCP transport...
> > >
> > > Drawing boundaries around the virtio spec in a certain way only makes it further
> > inferior. So please do not block the advancements brought in [1].
> >
> > As a reviewer, I ask questions but some of them are ignored, do you expect the
> > reviewer to figure out by themselves?
> Sure, please review.
>
> Many of them were not questions, but assertions and conclusions that it does not fit nesting, is sub-optimal, etc.

I think we all agree that your proposal does not fit for nesting, no?
It demonstrates that work needs to be done in the basic facility
first.

What's more the conclusion is for coupling live migration with admin
command. This point has been clarified several times before.

>
> >
> > > We really would like to make it more robust with your rich experience and
> > inputs, if you care to participate.
> >
> > We can collaborate for sure: as I pointed out in another threads, from what I
> > can see from the both proposals of the current version:
> >
> > I see a good opportunity to build your admin commands proposal on top of this
> > proposal. Or it means, we can focus on what needs to be migrated first:
> >
> > 1) queue state
> This is just one small part of the device context.
> So once the device context is read/written, it covers the queue.

That's a layer violation. Virtqueue is the basic facility, states need
to be defined there.

>
> > 2) inflight descriptors
> Same a q state, it is part of the device context.

Admin commands are not the only way to access device context. For
example, do you agree the virtqueue address is part of the device
context? If yes, it is available in the common configuration now.

>
> > 3) dirty pages (optional)
> > 4) device state(context) (optional)
> >
> It is the same as #1 and #2.
> Splitting them from #1 and #2 is not needed.
>
> We can extend the device context to be selectively queried for nested case..
>
> > I'd leave 3 or 4 since they are very complicated features. Then we can invent an
> > interface to access those facilities? This is how this series is structured.
> >
> > And what's more, admin commands or transport specific interfaces. And when
> > we invent admin commands, you may realize you are inventing a new transport
> > which is the idea of transport via admin commands.
>
> Not really. it is not a new transport at all.
> I explained to you before: when you call it a transport, it must carry the driver notifications as well.
> Otherwise it is just a set of commands.

I've explained that you need admin commands to save and load all
existing virtio PCI capabilities. This means a driver can just use
those commands to work. If not, please explain why I was wrong.

Thanks

>
> The new commands are self contained anyway of [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-1-parav@nvidia.com/T/#t


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:25                                                                   ` Parav Pandit
@ 2023-09-19  4:34                                                                     ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:34 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Sun, Sep 17, 2023 at 1:25 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:41 AM
> >
> > On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > Sent: Wednesday, September 13, 2023 9:51 AM
> > >
> > > > we plan to implement a self-contain solution
> > > Make sure that works with device reset and FLR.
> >
> > We don't need to do that. It's out of the spec.
> >
> It is not. For the PCI member device, it needs to work reliably.

We never mentioned FLR in the PCI transport layer before and vendors
have produced tons of hardware PCI devices for several years.

If it's important, please describe it in detail in your series, which it currently doesn't.

> Not doing so means it relies on trap+emulation, hence it just cannot be complete.
> And that is OK with me.
> I just won't claim that trap+emulation is a _complete_ method.
>
> > > And if not, explain that it is for mediation mode related tricks.
> >
> > It's not the tricks and again, it's not mediation but trap and emulation. It's the
> > fundamental methodology used in virtualization, so does the virtio spec.
>
> Not the virtio spec of 2023, and even more so for new features.
> The base for virtio spec 1.x was 0.9.5, not the QEMU or other mediation-based software, AFAIK.

Are you saying those new features will not be suitable for software
devices? If yes, please explain why.

Or are you saying the virtio spec is not capable for hardware devices?

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-17  5:22                                             ` Parav Pandit
@ 2023-09-19  4:35                                               ` Jason Wang
  -1 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:35 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Sun, Sep 17, 2023 at 1:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:41 AM
> >
> > On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > > To: Parav Pandit <parav@nvidia.com>
> > >
> > > > > One can build infinite level of nesting to not do passthrough, at
> > > > > the end user
> > > > applications remains slow.
> > > >
> > > > We are talking about nested virtualization, not nested emulation. I
> > > > won't repeat the definition of virtualization, but no matter how many
> > > > levels of nesting, the hypervisor will try hard to let the
> > > > application run natively for most of the time, otherwise it's not the nested
> > virtualization at all.
> > > >
> > > > Nested virtualization has been supported by all major cloud vendors,
> > > > please read the relevant documentation for the performance
> > > > implications. Virtio community is not the correct place to debate
> > > > whether a nest is useful. We need to make sure the datapath could be
> > > > assigned to any nest layers without losing any fundamental facilities like
> > migration.
> > > >
> > > I am not debating. You or Lingshan claim or imply that mediation is the only
> > way to progress.
> >
> > Let me correct your terminology again. It's "trap and emulation". It means the
> > workload runs mostly natively but is sometimes trapped by the hypervisor.
> >
>
> > And it's not the only way. It's the starting point, since the entire current virtio spec is built
> > upon this methodology.
> The current spec is not the starting point for defining new methods.
> So we will build the spec infra to support passthrough.
>

Passthrough migration, actually; passthrough itself is already supported now.

> Mediation/trap-emulation where hypervisor is involved is also second use case that you are addressing.
>
> And hence, both are not mutually exclusive.
> Hence we should not debate that anymore.
>
> >
> > > And for sure virtio do not need to live in the dark shadow of mediation always.
> >
> > 99% of virtio devices are implemented in this way (which is what you call dark
> > and shadow) now.
> >
> What I am saying is one should not say mediation/trap-emulation is the only way for virtio.

Then using things like "dark shadow" is not fair.

> So let passthrough device migration progress.

Then you need to answer or address the concerns.

>
> > > For nesting use case sure one can do mediation related mode.
> > >
> > > So only mediation is not the direction.
> >
> > CPU and MMU virtualization were all built in this way.
> >
> Not anymore. Both of them have vcpus and a viommu, where many things are not trapped.

We are talking about different things. I'm saying trap is a must but
you say not all are trapped.

> So as I said both has pros and cons and users will pick what fits their need and use case.
>
> > >
> > > > > So for such N and M being > 1, one can use software base emulation
> > anyway.
> > > >
> > > > No, only the control path is trapped, the datapath is still passthrough.
> > > >
> > > Again, it depends on the use case.
> >
> > No matter what use case, the definition and methodology of virtualization
> > stands still.
> >
> I will stop debating this because the core technical question is not answered.
> I don't see a technology available that virtio can utilize for it:
> that is, an interface that can work without messing with device_status and FLR while device migration is ongoing.

Again, you need to justify it. For example, why does it mess up the
device status? Why is reset OK but not suspending?

At least so far, I don't see good answers for those.

> Hence, methodology for passthrough and mediation/trap-emulation is fundamentally different.
> And that is just fine.
>
> > >
> > > > >
> > > > > >
> > > > > > And exposing the whole device to the guest drivers will have
> > > > > > security implications, your proposal has demonstrated that you
> > > > > > need a workaround for
> > > > > There are no security implications in passthrough.
> > > >
> > > > How can you prove this or is it even possible for you to prove this?
> > > Huh, when you claim that it is not secure, please point out exactly what is not
> > secure.
> > > Please take with PCI SIG and file CVE to PCI sig.
> >
> > I am saying it has security implications. That is why you need to explain why you
> > think it doesn't. What's more, the implications are obviously nothing related to
> > PCI SIG but a vendor virtio hardware implementation.
> >
> PCI passthrough for virtio member devices and non-virtio devices with P2P, and their interaction, is already there in the VM.
> Device migration is not adding/removing anything, nor touching any security aspect of it,
> because it does not need to either.
> Device migration is making sure that it continues to exist.

Since we are discussing in the virtio community, what we care about is
the chance that the guest (driver) can exploit device security
vulnerabilities. In this context, exposing more means increasing
the attack surface, since we (cloud vendors) can't control guests,
only the hypervisor.

>
> > >
> > > > You expose all device details to guests (especially the transport
> > > > specific details), the attack surface is increased in this way.
> > > One can say it is the opposite.
> > > Attack surface is increased in hypervisor due to mediation poking at
> > everything controlled by the guest.
> > >
> >
> > We all know such a stack has been widely used for decades. But you want to say
> > your new stack is much more secure than this?
> >
> It can be, yes, because it exposes only the necessary things defined within the virtio spec boundary today,
> and does not involve the hypervisor in core device operation.

That's perfectly fine if we can do this. But you need to justify this.

>
> > >
> > > >
> > > > What's more, a simple passthrough may lose the chance to workaround
> > > > hardware erratas and you will finally get back to the trap and emulation.
> > > Hardware errata are not the starting point for building the software stack and
> > spec.
> >
> > It's not the starting point. But it's definitely something that needs to be
> > considered, go and see kernel codes (especially the KVM part) and you will get
> > the answer.
> >
> There are kernels which cannot be updated in field today in Nvidia cloud shipped by Redhat's OS variant.
>
> So it is an invalid assumption that somehow the data path has no bugs but a large part of the control plane does, hence it should be done in software...

Well, for sure there are cases that can't be worked around. But for
the case that it can, trap and emulation gives much more flexibility.

>
> > > What you imply is, one must never use vfio stack, one must not use vcpu
> > acceleration and everything must be emulated.
> >
> > Do I say so? Trap and emulation is the common methodology used in KVM and
> > VFIO. And if you want to replace it with a complete passthrough, you need to
> > prove your method can work.
> >
> Please review patches. I do not plan to _replace_ is either.

You define all the migration stuffs in the admin commands section,
isn't this an implicit coupling?

> Those users who want to use passthrough, can use passthrough with major traps+emulation on FLR, device_status, cvq, avq and without implementing AQ on every single member device.
> And those users who prefer trap+emualation can use that.
>
> > >
> > > Same argument of hardware errata applied to data path too.
> >
> > What makes the datapath different? Xen used to fall back to shadow page tables
> > to work around hardware TDP errata in the past.
> >
> > > One should not implement in hw...
> > >
> > > I disagree with such argument.
> >
> > It's not my argument.
> >
> You claimed that to overcome hw errata, one should use trap+emulation, somehow only for a portion of the functionality.
> And the rest of the functionality does not have hw errata, hence hw should be used (for example, for the data path). :)

I've explained before, we all know there're errata that can't be a
workaround in any way.

>
> > >
> > > You can say nesting is requirement for some use cases, so spec should support
> > it without blocking the passthrough mode.
> > > Then it is fair discussion.
> > >
> > > I will not debate further on passthrough vs control path mediation as
> > either_or approach.
> > >
> > > >
> > > > >
> > > > > > FLR at least.
> > > > > It is actually the opposite.
> > > > > FLR is supported with the proposal without any workarounds and
> > mediation.
> > > >
> > > > It's an obvious drawback but not an advantage. And it's not a must
> > > > for live migration to work. You need to prove the FLR doesn't
> > > > conflict with the live migration, and it's not only FLR but also all the other
> > PCI facilities.
> > > I don’t know what you mean by prove. It is already clear from the proposal
> > FLR is not messing with rest of the device migration infrastructure.
> > > You should read [1].
> >
> > I don't think you answered my question in that thread.
> >
> Please ask the question in that series if any, because there is no FLR, device reset interaction in passthrough between owner and member device.
>
> > >
> > > > one other
> > > > example is P2P and what's the next? As more features were added to
> > > > the PCI spec, you will have endless work in auditing the possible
> > > > conflict with the passthrough based live migration.
> > > >
> > > This drawback equally applies to mediation route where one need to do more
> > than audit where the mediation layer to be extended.
> >
> > No, for trap and emulation we don't need to do that. We only do datapath
> > assignments.
> >
> It is required, because also such paths to be audited and extended as without it the feature does not visible to the guest.

You need first answer the following questions:

1) Why FLR is a must for the guest
2) What's wrong with the current Qemu emulation of FLR for virtio-pci device

>
> > > So each method has its pros and cons. One suits one use case, other suits
> > other use case.
> > > Therefore, again attempting to claim that only mediation approach is the only
> > way to progress is incorrect.
> >
> > I never say things like this, it is your proposal that mandates migration with
> > admin commands. Could you please read what is proposed in this series
> > carefully?
> >
> Admin commands are split from the AQ so one can use the admin commands inband as well.

How can it? It couples a lot of concepts like group, owner and
members. All of these have only existed in SR-IOV so far.

I don't know how to define those for MMIO where the design wants to be
as simple as possible.

> Though, I don’t see how it can functionality work without mediation.
> This is the key technical difference between two approaches.
>
> > On top of this series, you can build your amd commands easily. But there's
> > nothing that can be done on top of your proposal.
> >
> I don’t see what more to be done on top of our proposal.

Actually it really has one, that is moving the description/definition
of those states to the basc facility part. But if we do this, why not
do it from the start? This is exactly what Lingshan's proposal did.

> If you hint nesting, than it can be done through a peer admin device to delete such admin role.
>
> > >
> > > In fact audit is still better than mediation because most audits are read only
> > work as opposed to endlessly extending trapping and adding support in core
> > stack.
> >
> > One reality that you constantly ignore is that such trapping and device models
> > have been widely used by a lot of cloud vendors for more than a decade.
> >
> It may be but, it is not the only option.

I don't say it's the only option. If most of the devices were built in
this way, we should first allow any new function to be available to
those devices and then consider other cases. Inventing a mechanism
that can't work for most of the existing devices is sub-optimal.

>
> > > Again, it is a choice that user make with the tradeoff.
> > >
> > > > >
> > > > > >
> > > > > > For non standard device we don't have choices other than
> > > > > > passthrough, but for standard devices we have other choices.
> > > > >
> > > > > Passthrough is basic requirement that we will be fulfilling.
> > > >
> > > > It has several drawbacks that I would not like to repeat. We all
> > > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > > >
> > > Sure. Both has pros and cons.
> > > And both can co-exist.
> >
> > I don't see how it can co-exist with your proposal. I can see how admin
> > commands can co-exist on top of this series.
> >
> The reason to me both has difficulty is because both are solving different problem.
> And they can co-exist as two different methods to two different problems.

It's not hard to demonstrate how admin commands can be built on top.

>
> > >
> > > > > If one wants to do special nesting, may be, there.
> > > >
> > > > Nesting is not special. Go and see how it is supported by major
> > > > cloud vendors and you will get the answer. Introducing an interface
> > > > in virtio that is hard to be virtualized is even worse than writing
> > > > a compiler that can not do bootstrap compilation.
> > > We checked with more than two major cloud vendors and passthrough suffice
> > their use cases and they are not doing nesting.
> > > And other virtio vendor would also like to support native devices. So again,
> > please do not portray that nesting is the only thing and passthrough must not be
> > done.
> >
> > Where do I say passthrough must not be done? I'm saying you need to justify
> > your proposal instead of simply saying "hey, you are wrong".
> >
> I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended, is wrong, and it should not be done.
>
> > Again, nesting is not the only issue, the key point is that it's partial and not self
> > contained.
>
> Admin commands are self-contained to the owner device.
> They are not self contained in the member device, because it cannot be.

There're cases that self contained is not required for example the
provisioning. Admin commands/queues fit perfectly there.

> Self containment cannot work with device reset, flr, dma flow.

How do you define self containment? We all know that virtio can't fly
without transporting specific things ...

For the context of "self contain" I mean the basic virtio facility
needs to be self contained.

> Self containment requires mediation or renamed trap+emulation; which is the anti-goal of passtrough.
> And I am very interested if you can show how admin commands can work with device reset, flr flow WITHOUT mediation approach.

Why is it the job for me? This proposal doesn't use admin commands at all.

Thanks





> Lingshan so far didn’t answer this.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-19  4:35                                               ` Jason Wang
  0 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-19  4:35 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev

On Sun, Sep 17, 2023 at 1:22 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 14, 2023 8:41 AM
> >
> > On Wed, Sep 13, 2023 at 2:06 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > > To: Parav Pandit <parav@nvidia.com>
> > >
> > > > > One can build infinite level of nesting to not do passthrough, at
> > > > > the end user
> > > > applications remains slow.
> > > >
> > > > We are talking about nested virtualization but nested emulation. I
> > > > won't repeat the definition of virtualization but no matter how much
> > > > level of nesting, the hypervisor will try hard to let the
> > > > application run natively for most of the time, otherwise it's not the nested
> > virtualization at all.
> > > >
> > > > Nested virtualization has been supported by all major cloud vendors,
> > > > please read the relevant documentation for the performance
> > > > implications. Virtio community is not the correct place to debate
> > > > whether a nest is useful. We need to make sure the datapath could be
> > > > assigned to any nest layers without losing any fundamental facilities like
> > migration.
> > > >
> > > I am not debating. You or Lingshan claim or imply that mediation is the only
> > way to progress.
> >
> > Let me correct your terminology again. It's "trap and emulation". It means the
> > workload runs mostly native but sometimes is trapped by the hypervisor.
> >
>
> > And it's not the only way. It's the start point since all current virtio spec is built
> > upon this methodology.
> The current spec is not the starting point to define new methods.
> So we will build the spec infra to support passthrough.
>

Passthrough migration, actually; passthrough is already supported now.

> Mediation/trap-emulation, where the hypervisor is involved, is also a second use case that you are addressing.
>
> And hence, both are not mutually exclusive.
> Hence we should not debate that anymore.
>
> >
> > > And for sure virtio do not need to live in the dark shadow of mediation always.
> >
> > 99% of virtio devices are implemented in this way (which is what you call dark
> > and shadow) now.
> >
> What I am saying is one should not say mediation/trap-emulation is the only way for virtio.

Then using things like "dark shadow" is not fair.

> So let passthrough device migration to progress.

Then you need to answer or address the concerns.

>
> > > For nesting use case sure one can do mediation related mode.
> > >
> > > So only mediation is not the direction.
> >
> > CPU and MMU virtualization were all built in this way.
> >
> Not anymore. Both of them have vcpus and viommu where many things are not trapped.

We are talking about different things. I'm saying trap is a must but
you say not all are trapped.

> So as I said, both have pros and cons, and users will pick what fits their need and use case.
>
> > >
> > > > > So for such N and M being > 1, one can use software base emulation
> > anyway.
> > > >
> > > > No, only the control path is trapped, the datapath is still passthrough.
> > > >
> > > Again, it depends on the use case.
> >
> > No matter what use case, the definition and methodology of virtualization
> > stands still.
> >
> I will stop debating this because the core technical question is not answered.
> I don’t see a technology available that virtio can utilize for it.
> That is, an interface that can work without messing with device status and FLR while device migration is ongoing.

Again, you need to justify it. For example, why does it mess up the device
status? Why is reset ok but not suspending?

At least so far, I don't see good answers for those.

> Hence, methodology for passthrough and mediation/trap-emulation is fundamentally different.
> And that is just fine.
>
> > >
> > > > >
> > > > > >
> > > > > > And exposing the whole device to the guest drivers will have
> > > > > > security implications, your proposal has demonstrated that you
> > > > > > need a workaround for
> > > > > There is no security implications in passthrough.
> > > >
> > > > How can you prove this or is it even possible for you to prove this?
> > > Huh, when you claim that it is not secure, please point out exactly what is not
> > secure.
> > > Please take it up with PCI-SIG and file a CVE with PCI-SIG.
> >
> > I am saying it has security implications. That is why you need to explain why you
> > think it doesn't. What's more, the implications are obviously nothing related to
> > PCI SIG but a vendor virtio hardware implementation.
> >
> PCI passthrough for virtio member devices and non-virtio devices with P2P, and their interaction, is already there in the VM.
> Device migration is not adding/removing anything, nor touching any security aspect of it.
> Because it does not need to either.
> Device migration is making sure that it continues to exist.

Since we are discussing in the virtio community, what we care about is
the chance that the guest (driver) can exploit device security
vulnerabilities. In this context, exposing more means increasing
the attack surface, since we (cloud vendors) can't control guests,
only the hypervisor.

>
> > >
> > > > You expose all device details to guests (especially the transport
> > > > specific details), the attack surface is increased in this way.
> > > One can say it is the opposite.
> > > Attack surface is increased in hypervisor due to mediation poking at
> > everything controlled by the guest.
> > >
> >
> > We all know such a stack has been widely used for decades. But you want to say
> > your new stack is much more secure than this?
> >
> It can be yes, because it exposes all necessary things defined in the virtio spec boundary today.
> And not involving hypervisor in core device operation.

That's perfectly fine if we can do this. But you need to justify this.

>
> > >
> > > >
> > > > What's more, a simple passthrough may lose the chance to workaround
> > > > hardware erratas and you will finally get back to the trap and emulation.
> > > Hardware errata's is not the starting point to build the software stack and
> > spec.
> >
> > It's not the starting point. But it's definitely something that needs to be
> > considered, go and see kernel codes (especially the KVM part) and you will get
> > the answer.
> >
> There are kernels which cannot be updated in the field today in the Nvidia cloud, shipped by Red Hat's OS variant.
>
> So it is an invalid assumption that somehow the data path does not have bugs, but a large part of the control plane has bugs, hence it should be done in software...

Well, for sure there are cases that can't be worked around. But for
the case that it can, trap and emulation gives much more flexibility.

>
> > > What you imply is, one must never use vfio stack, one must not use vcpu
> > acceleration and everything must be emulated.
> >
> > Do I say so? Trap and emulation is the common methodology used in KVM and
> > VFIO. And if you want to replace it with a complete passthrough, you need to
> > prove your method can work.
> >
> Please review the patches. I do not plan to _replace_ it either.

You define all the migration stuffs in the admin commands section,
isn't this an implicit coupling?

> Those users who want to use passthrough, can use passthrough with major traps+emulation on FLR, device_status, cvq, avq and without implementing AQ on every single member device.
> And those users who prefer trap+emulation can use that.
>
> > >
> > > Same argument of hardware errata applied to data path too.
> >
> > Anything makes datapath different? Xen used to fallback to shadow page tables
> > to workaround hardware TDP errata in the past.
> >
> > > One should not implement in hw...
> > >
> > > I disagree with such argument.
> >
> > It's not my argument.
> >
> You claimed that to overcome hw errata, one should use trap+emulation, somehow only for a portion of the functionality.
> And the rest of the functionality does not have hw errata, hence hw should be used (for example for the data path). :)

I've explained before, we all know there're errata that can't be worked
around in any way.

>
> > >
> > > You can say nesting is requirement for some use cases, so spec should support
> > it without blocking the passthrough mode.
> > > Then it is fair discussion.
> > >
> > > I will not debate further on passthrough vs control path mediation as
> > either_or approach.
> > >
> > > >
> > > > >
> > > > > > FLR at least.
> > > > > It is actually the opposite.
> > > > > FLR is supported with the proposal without any workarounds and
> > mediation.
> > > >
> > > > It's an obvious drawback but not an advantage. And it's not a must
> > > > for live migration to work. You need to prove the FLR doesn't
> > > > conflict with the live migration, and it's not only FLR but also all the other
> > PCI facilities.
> > > I don’t know what you mean by prove. It is already clear from the proposal
> > FLR is not messing with rest of the device migration infrastructure.
> > > You should read [1].
> >
> > I don't think you answered my question in that thread.
> >
> Please ask the question in that series if any, because there is no FLR, device reset interaction in passthrough between owner and member device.
>
> > >
> > > > one other
> > > > example is P2P and what's the next? As more features were added to
> > > > the PCI spec, you will have endless work in auditing the possible
> > > > conflict with the passthrough based live migration.
> > > >
> > > This drawback equally applies to mediation route where one need to do more
> > than audit where the mediation layer to be extended.
> >
> > No, for trap and emulation we don't need to do that. We only do datapath
> > assignments.
> >
> It is required, because such paths also need to be audited and extended; without that, the feature is not visible to the guest.

You need first answer the following questions:

1) Why FLR is a must for the guest
2) What's wrong with the current Qemu emulation of FLR for virtio-pci device
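
To make question 2) concrete, here is a minimal sketch, with invented
handler and struct names, of how a trap-and-emulate device model can
satisfy a guest-initiated FLR for a virtio-pci function by turning it
into a virtio device reset; only the PCI_EXP_DEVCTL_BCR_FLR bit value
follows the PCIe spec, the rest is illustrative:

```c
#include <stdint.h>

/* PCIe Device Control register, Initiate Function Level Reset bit. */
#define PCI_EXP_DEVCTL_BCR_FLR 0x8000

/* Hypothetical emulated device state; a real device model (e.g. in
 * QEMU) carries far more than this. */
struct emu_vdev {
    uint8_t  device_status; /* virtio device status byte */
    uint16_t devctl;        /* emulated PCIe Device Control register */
};

/* Trap handler for a guest write to Device Control: the FLR bit never
 * reaches hardware, it is emulated as a virtio reset (status -> 0),
 * and the bit reads back as zero as PCIe requires. */
static void emu_devctl_write(struct emu_vdev *d, uint16_t val)
{
    if (val & PCI_EXP_DEVCTL_BCR_FLR) {
        d->device_status = 0;
        val &= (uint16_t)~PCI_EXP_DEVCTL_BCR_FLR;
    }
    d->devctl = val;
}
```

The point being argued: with this kind of trap, FLR and migration never
race inside the physical device at all.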

>
> > > So each method has its pros and cons. One suits one use case, other suits
> > other use case.
> > > Therefore, again attempting to claim that only mediation approach is the only
> > way to progress is incorrect.
> >
> > I never say things like this, it is your proposal that mandates migration with
> > admin commands. Could you please read what is proposed in this series
> > carefully?
> >
> Admin commands are split from the AQ so one can use the admin commands inband as well.

How can it? It couples a lot of concepts like group, owner and
members. All of these have only existed in SR-IOV so far.

I don't know how to define those for MMIO where the design wants to be
as simple as possible.

> Though, I don’t see how it can functionally work without mediation.
> This is the key technical difference between two approaches.
>
> > On top of this series, you can build your admin commands easily. But there's
> > nothing that can be done on top of your proposal.
> >
> I don’t see what more to be done on top of our proposal.

Actually it really has one, that is moving the description/definition
of those states to the basic facility part. But if we do this, why not
do it from the start? This is exactly what Lingshan's proposal did.
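
For reference, the virtqueue state being argued about is tiny; a rough
sketch, with illustrative field names following the cover letter's
last_avail_idx/last_used_idx rather than normative spec text:

```c
#include <stdint.h>

/* Illustrative per-virtqueue state for migration: for a split ring this
 * is essentially the device-internal available index plus the used
 * index the device has published. Names are not normative. */
struct vq_state {
    uint16_t last_avail_idx; /* next avail ring entry the device will process */
    uint16_t last_used_idx;  /* next used ring entry the device will write */
};

/* Once the device is suspended and the indices are stable, migrating
 * this state is a plain copy from source to destination. */
static void vq_state_transfer(const struct vq_state *src, struct vq_state *dst)
{
    dst->last_avail_idx = src->last_avail_idx;
    dst->last_used_idx  = src->last_used_idx;
}
```

Whether this lives in the basic facility section or behind admin
commands is exactly the placement question above.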

> If you hint at nesting, then it can be done through a peer admin device to delegate such an admin role.
>
> > >
> > > In fact audit is still better than mediation because most audits are read only
> > work as opposed to endlessly extending trapping and adding support in core
> > stack.
> >
> > One reality that you constantly ignore is that such trapping and device models
> > have been widely used by a lot of cloud vendors for more than a decade.
> >
> It may be but, it is not the only option.

I don't say it's the only option. If most of the devices were built in
this way, we should first allow any new function to be available to
those devices and then consider other cases. Inventing a mechanism
that can't work for most of the existing devices is sub-optimal.

>
> > > Again, it is a choice that user make with the tradeoff.
> > >
> > > > >
> > > > > >
> > > > > > For non standard device we don't have choices other than
> > > > > > passthrough, but for standard devices we have other choices.
> > > > >
> > > > > Passthrough is basic requirement that we will be fulfilling.
> > > >
> > > > It has several drawbacks that I would not like to repeat. We all
> > > > know even for VFIO, it requires a trap instead of a complete passthrough.
> > > >
> > > Sure. Both has pros and cons.
> > > And both can co-exist.
> >
> > I don't see how it can co-exist with your proposal. I can see how admin
> > commands can co-exist on top of this series.
> >
> The reason to me both has difficulty is because both are solving different problem.
> And they can co-exist as two different methods to two different problems.

It's not hard to demonstrate how admin commands can be built on top.

>
> > >
> > > > > If one wants to do special nesting, may be, there.
> > > >
> > > > Nesting is not special. Go and see how it is supported by major
> > > > cloud vendors and you will get the answer. Introducing an interface
> > > > in virtio that is hard to be virtualized is even worse than writing
> > > > a compiler that can not do bootstrap compilation.
> > > We checked with more than two major cloud vendors and passthrough suffice
> > their use cases and they are not doing nesting.
> > > And other virtio vendor would also like to support native devices. So again,
> > please do not portray that nesting is the only thing and passthrough must not be
> > done.
> >
> > Where do I say passthrough must not be done? I'm saying you need to justify
> > your proposal instead of simply saying "hey, you are wrong".
> >
> I never said you are wrong. I replied to Lingshan that resuming/suspending queues after the device is suspended, is wrong, and it should not be done.
>
> > Again, nesting is not the only issue, the key point is that it's partial and not self
> > contained.
>
> Admin commands are self-contained to the owner device.
> They are not self contained in the member device, because it cannot be.

There're cases where self containment is not required, for example
provisioning. Admin commands/queues fit perfectly there.

> Self containment cannot work with device reset, flr, dma flow.

How do you define self containment? We all know that virtio can't fly
without transport-specific things ...

For the context of "self contain" I mean the basic virtio facility
needs to be self contained.

> Self containment requires mediation (or, renamed, trap+emulation), which is the anti-goal of passthrough.
> And I am very interested if you can show how admin commands can work with the device reset and FLR flows WITHOUT the mediation approach.

Why is it the job for me? This proposal doesn't use admin commands at all.

Thanks





> Lingshan so far didn’t answer this.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:25                           ` Jason Wang
@ 2023-09-19  7:32                             ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 9:56 AM

> On Sun, Sep 17, 2023 at 1:29 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 14, 2023 8:42 AM
> > >
> > > On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > >
> > > > > It's not about how many states in a single state machine, it's
> > > > > about how many state machines that exist for device status.
> > > > > Having more than one creates big obstacles and complexity in the
> > > > > device. You need to define the interaction of each state
> > > > > otherwise you leave undefined
> > > behaviours.
> > > > The device mode has zero relation to the device status.
> > >
> > > You will soon get this issue when you want to do nesting.
> > >
> > I don’t think so. One needs to intercept it when one wants to do
> > trap+emulation, which seems to fulfill the nesting use case.
> 
> Well, how can you trap it? You have admin vq in L0, it means the suspending is
> never exposed to L1 unless you assign the owner to L1.
> Is this what you want?
> 
When nesting is not done, it is not needed.
Only the nested cases need to trap it.
So when one wants to do the nesting use case, one should also place the admin peer PF in that guest.
Right, assign one VF and its peer admin VF to L1.

> >
> > > > It does not mess with it at all.
> > > > In fact the new bits in device status is making it more complex
> > > > for the device
> > > to handle.
> > >
> > > Are you challenging the design of the device status? It's definitely
> > > too late to do this.
> > >
> > No. I am saying that extending device_status with yet another state is equally
> > complex, and it is at the core of the device.
> 
> You never explain why.
If you are comparing the two methods, then a new feature adds complexity to both.
Hence, they both score equally in adding complexity for a new feature.
In the case of device_status, one needs to do things synchronously.
This adds complexity on the device side to answer those registers in the hot downtime path.
When done over admin commands, they happen in parallel.
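
As a toy illustration of that difference (all names invented, neither
series quoted): a status-register suspend is polled to completion per
device in the downtime path, while admin commands are merely queued and
their completions can be harvested for many member devices in parallel:

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_SUSPEND 0x40u /* hypothetical SUSPEND bit in device_status */

/* Toy device: a real device would report SUSPEND back only once it has
 * actually quiesced; here the "device" latches it immediately so the
 * flow is runnable. */
struct toy_dev {
    uint8_t device_status;
};

/* Synchronous flow: the driver writes SUSPEND and spins on the register
 * in the hot downtime path until the device reports it. */
static bool suspend_via_status(struct toy_dev *d)
{
    d->device_status |= STATUS_SUSPEND;
    for (int tries = 0; tries < 1000; tries++)
        if (d->device_status & STATUS_SUSPEND)
            return true;
    return false; /* timed out: device never quiesced */
}

/* Admin-command flow: the request is merely queued; suspends for many
 * devices can be outstanding at once. */
struct admin_cmd {
    struct toy_dev *target;
};

static void submit_suspend_cmd(struct admin_cmd *q, int *tail,
                               struct toy_dev *d)
{
    q[(*tail)++] = (struct admin_cmd){ .target = d };
}
```

The per-device spin is what the "hot downtime path" complexity refers
to; the queued form moves that wait off the register interface.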
 
> 
> >
> > > This proposal increases just one bit and that worries you? Or you
> > > think one more state is much more complicated than a new state
> > > machine with two states?
> >
> > It is a mode and not a state. And two modes are needed for supporting a P2P
> > device.
> 
> You keep saying you are migrating the core virtio devices but then you are
> saying it is required for PCI. And you never explain why it can't be done by
> reusing the device status bit.
It cannot be done using device status bits, because the hypervisor is not involved in trapping and parsing them.
We better discuss in the actual series where things are posted.

> 
> > When one wants to do with mediation, there also two states are needed.
> >
> > The key is modes are not interacting
> 
> You need to explain why they are not interacting. It touches the virtio facility
> which (partially) overlaps the function of the device status for sure. You invent a
> new state machine, and leave the vendors to guess how or why they are not
> interacting with the existing one.
Huh, something needs an explanation only when there is an interaction.
I explained that device_status is just another virtio register.

I missed adding the other vendors' Sign-offs. I will add them.

> There are just too many corner cases that need to be figured out.
> 
> For example:
> 
> How do you define stop? Is it a virtio-level stop, a transport-level one, or a mix of
> both?
It is defined in the series, in the device and driver requirements sections, and also in the theory of operation.

> Is the device allowed to stop in the middle of reset, feature
> negotiation or even transport specific things like FLR?
Yes.

> If yes, how about other operations and who defines and maintains those
> transitional states? If not, why and how long would a stop wait for an operation?
> Can a stop fail? What happens if the driver wants to reset but the device is
> stopped by the admin commands? Who suppresses who and why?
All are described in the normative sections of the series. If something is missing, please put a comment there and I will fix it in v1.

> 
> This demonstrates the complexity of your proposal and I don't see any of the
> above were clearly stated in your series. Reusing the existing device status
> machine, everything would be simplified.
You see, it is too early a conclusion to say things are missing.
I will fix missing items in v1; please put review comments there.

> 
> > with the device_status because device_status is just another register of the
> virtio.
> 
> Let's not do a layer violation; device status is the basic facility of the virtio
> device which is not coupled with any transport so it is not necessarily
> implemented via registers.

There is no violation. Device_status is already part of the transport specific context.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-19  7:32                             ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Zhu Lingshan, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 9:56 AM

> On Sun, Sep 17, 2023 at 1:29 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 14, 2023 8:42 AM
> > >
> > > On Wed, Sep 13, 2023 at 12:46 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Wednesday, September 13, 2023 10:14 AM
> > > >
> > > > > It's not about how many states in a single state machine, it's
> > > > > about how many state machines that exist for device status.
> > > > > Having more than one creates big obstacles and complexity in the
> > > > > device. You need to define the interaction of each state
> > > > > otherwise you leave undefined
> > > behaviours.
> > > > The device mode has zero relation to the device status.
> > >
> > > You will soon get this issue when you want to do nesting.
> > >
> > I don’t think so. One needs to intercept it when one wants to do
> trap+emulation which seems to fullfil the nesting use case.
> 
> Well, how can you trap it? You have admin vq in L0, it means the suspending is
> never exposed to L1 unless you assign the owner to L1.
> Is this what you want?
> 
When nesting is not done, it is not needed.
Only the nest cases need to trap it.
So when one want to do nesting use case, one should also place the admin peer PF in that guest.
Right assign one VF and its peer admin VF to L1.

> >
> > > > It does not mess with it at all.
> > > > In fact the new bits in device status is making it more complex
> > > > for the device
> > > to handle.
> > >
> > > Are you challenging the design of the device status? It's definitely
> > > too late to do this.
> > >
> > No. I am saying the extending device_status with yet another state is equally
> complex and its core of the device.
> 
> You never explain why.
If you are comparing two methods, then a new feature adds complexity.
Hence, they both score equal adding complexity for new feature.
In case of device_status one needs to things synchronously.
This adds complexity on the device side to answer those registers in hot downtime path.
When done over admin commands, they happen in parallel.
 
> 
> >
> > > This proposal increases just one bit and that worries you? Or you
> > > think one more state is much more complicated than a new state
> > > machine with two states?
> >
> > It is mode and not state. And two modes are needed for supporting P2P
> device.
> 
> You keep saying you are migrating the core virtio devices but then you are
> saying it is required for PCI. And you never explain why it can't be done by
> reusing the device status bit.
It cannot be done using device status bits, because hypervisor is not involved in trapping, and parsing it.
We better discuss in the actual series where things are posted.

> 
> > When one wants to do with mediation, there also two states are needed.
> >
> > The key is modes are not interacting
> 
> You need to explain why they are not interacting. It touches the virtio facility
> which (partially) overlaps the function of the device status for sure. You invent a
> new state machine, and leave the vendors to guess how or why they are not
> interacting with the existing one.
Huh, something needs explanation only when there is an interaction.
I explained that device_status is just another virtio register.

I missed adding the other vendors' Sign-off. I will add it.

> There are just too many corner cases that need to be figured out.
> 
> For example:
> 
> How do you define stop? Is it a virtio level stop, transport level or a mixing of
> them both? 
It is defined in the series, in the device and driver requirements sections, and also in the theory of operation.

> Is the device allowed to stop in the middle or reset, feature
> negotiation or even transport specific things like FLR?
Yes.

> If yes, how about other operations and who defines and maintains those
> transitional states? If not, why and how long would a stop wait for an operation?
> Can a stop fail? What happens if the driver wants to reset but the device is
> stopped by the admin commands? Who suppresses who and why?
All are described in the normative sections of the series. If something is missing, please put a comment there and I will fix it in v1.

> 
> This demonstrates the complexity of your proposal and I don't see any of the
> above were clearly stated in your series. Reusing the existing device status
> machine, everything would be simplified.
You see, that is too early a conclusion, saying things are missing.
I will fix the missing items in v1; please put review comments there.

> 
> > with the device_status because device_status is just another register of the
> virtio.
> 
> Let's don't do layer violation, device status is the basic facility of the virtio
> device which is not coupled with any transport so it is not necessarily
> implemented via registers.

There is no violation. Device_status is already part of the transport-specific context.


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  4:27                     ` Jason Wang
@ 2023-09-19  7:32                       ` Parav Pandit
  2023-09-19  7:46                         ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: Zhu, Lingshan, virtio-dev, Michael S. Tsirkin


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 9:58 AM
> 
> On Mon, Sep 18, 2023 at 2:55 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Monday, September 18, 2023 12:19 PM
> >
> >
> > > so admin vq based LM solution can be a side channel attacking
> > > surface
> > It will be part of the DSM whenever it is used in the future.
> > Hence, it is not an attack surface.
> 
> DSM is not a part of TVM. So it really depends on what kind of work did the
> admin virtqueue do. For commands that can't be self-contained like
> provisioning, it is fine, since it is done before the TDI assignment. But it not
> necessarily for your migration proposal. It seems you've found another case
> that self-containing is important:
> allowing the owner to access the member after TDI is attached to TVM is a side
> channel attack.

TVM and DSM specs will be extended in the future when we get there, so the core hypervisor will not be involved.
With trap+mediation, it is involved.

Lingshan wanted to take up this TDISP extension in the future.
So are you both aligned or not yet?


* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:32                                       ` Jason Wang
@ 2023-09-19  7:32                                         ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:02 AM

> > Migration compatibility is topic in itself regardless of device migration series.
> 
> Why? Without compatibility support, migration can't work in the production
> environment.

As I said, it is part of the future series. We don't cook all features at once.
Orchestration knows when not to migrate, in an out-of-band manner.

> 
> > It is part of the feature provisioning phase needed regardless.
> 
> Definitely not, it is something that must be considered even without any feature.
I disagree with making it a must.
It is not a must, but it is greatly useful for sure.

> It's about the robustness of the migration protocol.
> Sometimes you need to do that since some states were lost in the previous
> version of protocols or formats .
> 
> > Like how you and Lingshan wanted to keep the suspend bit series small and
> logical, device migration series is also logically split for the functionality.
> > I don’t see a need to mention the long known missing functionality and
> common to both approaches.
> 
> Again, your proposal needs to describe at least the plan for dealing with
> migration compatibility since you want a passthrough based solution. That's the
> point.
> 
Whether passthrough or not, migration compatibility cannot be achieved if the device does not provide a way to query and configure its features.
Migration will fail when features mismatch.

So, I will add a note in the commit log to address this in the future.

> >
> > > > I will include the notes of future follow up work items in v1,
> > > > which will be
> > > taken care post this series.
> > > >
> > > > > > > Dirty page tracking in virtio is not a must for live
> > > > > > > migration to work. It can be done via platform facilities or
> > > > > > > even software. And to make it more efficient, it needs to
> > > > > > > utilize transport facilities instead of a
> > > > > general one.
> > > > > > >
> > > > > > It is also optional in the spec proposal.
> > > > > > Most platforms claimed are not able to do efficiently either,
> > > > >
> > > > > Most platforms are working towards an efficient way. But we are
> > > > > talking about different things, hardware based dirty page
> > > > > logging is not a must, that is what I'm saying. For example, KVM
> > > > > doesn't use hardware
> > > to log dirty pages.
> > > > >
> > > > I also said same, that hw based dirty page logging is not must. :)
> > > > One day hw mmu will be able to track everything efficiently. I
> > > > have not seen it
> > > happening yet.
> > >
> > > How do you define efficiency? KVM uses page fault and most modern
> > > IOMMU support PRI now.
> > >
> > One cannot define PRI as a mandatory feature.
> 
> There's no way to mandate PRI, it's a PCI specific facility.
> 
You proposed doing PRI for migration; it becomes mandatory at that point.

> > In our research and experiments we see that PRI is significantly slower to
> handle page faults.
> > Yet different topic...
> 
> PRI's performance is definitely another topic, it's just an example that tracking
> dirty pages by device is optional and transport (PCI) can evolve for sure. What's
> more important, it demonstrates the basic design of virtio, which is trying to
> leverage the transport instead of a mandatory reinventing of everything.
> 
An example that does not work is not a dependable technology to rely on to achieve this now.
Anyway, not everyone will use PRI all the time.

> >
> > Efficiency is defined by the downtime of the multiple devices in a VM.
> 
> Ok, but you tend to ignore my question regarding the downtime.
> 
What is the question?
Admin commands can achieve the desired downtime, if that is what you are asking.

> > And leading OSes allowed device advancements by allowing the device to report
> dirty pages in a CPU- and platform-agnostic way...
> >
> 
> It has many things that I don't see a good answer for. For example, the QOS
> raised by Ling Shan.
> 
I am not going to repeat the QoS discussion anymore. :)
He is questioning virtqueue semantics itself; he had better rewrite the spec to not use virtqueues.

> > One can use the post-copy approach as well; the current device migration is built
> around the established pre-copy approach.
> 
> Another drawback of your proposal. With transport specific assistance like PRI,
The PRI page fault rate in our research is 20x slower than the CPU page fault rate.

> you can do both pre and post. But the point is we need to make sure pre-copy
> downtime can satisfy the requirement instead of switching to another.
> 
In our work we see it satisfying the downtime requirements.

Again, dirty page tracking is optional, so when PRI catches up in the next few years, the driver can stop relying on it.

> >
> > > >
> > > > > > hence the vfio subsystem added the support for it.
> > > > >
> > > > > As an open standard, if it is designed for a specific software
> > > > > subsystem on a specific OS, it's a failure.
> > > > >
> > > > It is not.
> > > > One need accept that, in certain areas virtio is following the
> > > > trails of
> > > advancement already done in sw stack.
> > > > So that virtio spec advancement fits in to supply such use cases.
> > > > And blocking such advancement of virtio spec to promote
> > > > only_mediation
> > > approach is not good either.
> > > >
> > > > BTW: One can say the mediation approach is also designed for
> > > > specific
> > > software subsystem and hence failure.
> > > > I will stay away from quoting it, as I don’t see it this way.
> > >
> > > The proposal is based on well known technology since the birth of
> virtualization.
> > Sure, but that does not change the fact that such a series is also targeted at a
> specific software subsystem.
> 
> How, this series reuses the existing capability by introducing just two more
> registers on the existing common cfg structure and you think it targets a specific
> software subsystem? If this is true, I think you are actually challenging the
> design of the whole modern PCI transport.
> 
No. The way I understood it, you are targeting the trap+emulation approach that you posted.
You need to show that your mechanism also works for passthrough, to prove that it is not targeted at a specific use case.

> > And hence failure.
> 
> Failure in what sense?
> 
You defined the failure first when you quoted passthrough. :)

> >
> > I didn't say that; I said the opposite: yes, since virtio is in catch-up
> mode, it is defining the interface so that it can fit into these OS platforms.
> > Mostly multiple of them, all of which support passthrough devices.
> 
> We are talking about different things again.
> 
:)

> >
> > > I never knew a mainstream hypervisor that doesn't do trap and
> > > emulate, did you?
> > >
> > It does trap and emulation for PCI config space, not for virtio interfaces like
> queues, config space and more for passthrough devices.
> 
> Well, we are in the context of live migration, no? We all know passthrough just
> works fine with the existing virtio spec...
> 
Right, and we want to continue to make passthrough work fine with device migration.
So we are in the passthrough context, where only PCI-specific things are trapped as before, without additional virtio traps.

> >
> > > >
> > > > > >
> > > > > > > The FLR, P2P demonstrates the fragility of a simple
> > > > > > > passthrough method and how it conflicts with live migration
> > > > > > > and complicates the device
> > > > > implementation.
> > > > > > Huh, it shows the opposite.
> > > > > > It shows that both will seamlessly work.
> > > > >
> > > > > Have you even tried your proposal with a prototype device?
> > > > Of course, it is delivered to user for 1.5 years ago before
> > > > bringing it to the
> > > spec with virtio-net and virtio-blk devices.
> > >
> > > I hope this is your serious answer, but it looks like it is not.
> > > Your proposal misses a lot of states as I pointed out in another thread, how
> can it work in fact?
> > >
> > Which states?
> 
> Let me repeat it for the third time. You don't even cover all the functionality of
> common cfg, how can guests see a consistent common cfg state?
> 
Please respond in that series with what is missing. I will fix it in v1.

> > What is posted in series [1] is minimal and base required items,
> 
> You need to prove it is minimal, instead of ignoring my questions. For example,
> dirty page tracking is definitely optional.
>
Again, reviews are not proof-based. Please review the series.

It is optional, and it significantly improves the VM downtime in the pre-copy approach.

> > optional one is omitted as it can be done incrementally.
> > Lingshan had hard time digesting the basics of P2P and dirty page tracking
> work in this short series.
> 
> You never explain why this series needs to deal with P2P and dirty page
> tracking.
> 
Please read my response to him; you likely missed it.

> > So there is no point in pushing large part of the device context and making the
> series blurry.
> 
> I don't see a good definition of "device context" and most of the device context
> has been covered by the existing PCI capabilities.
> 
Please respond in that patch.  Device context is well defined in the theory of operation [2] and also in the independent patch [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#u
[2] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#m12a5f675aaa95a1de8945772a3f5d1efb0c9e25e

> > It will be done incrementally subsequently.
> >
> > > > > >
> > > > > > > And it means you need to audit all PCI features and do
> > > > > > > workaround if there're any possible issues (or using a whitelist).
> > > > > > No need for any of this.
> > > > >
> > > > > You need to prove this otherwise it's fragile. It's the duty of
> > > > > the author to justify not the reviewer.
> > > > >
> > > > One cannot post patches and nor review giant series in one go.
> > > > Hence the work to be split on a logical boundary.
> > > > Features provisioning, pci layout etc is secondary tasks to take care of.
> > >
> > > Again, if you know something is missing, you need to explain it in
> > > the series instead of waiting for some reviewers to point it out and
> > > say it's well-known afterwards.
> > >
> > The patch set cannot be a laundry list of items missing in virtio spec.
> > It is short and focused on the device migration.
> 
> You need to mention it in the cover letter at least for a big picture at least,
> what's wrong with this? It helps to save time for everyone or people will keep
> asking similar questions. Is this too hard to be understood?
> 
No, it is not hard.
I will mention the adjacent features in the cover letter.
> >
> > > >
> > > > > For example FLR is required to be done in 100ms. How could you
> > > > > achieve this during the live migration? How does it affect the
> > > > > downtime and
> > > FRS?
> > > > >
> > > > Good technical question to discuss instead of passthrough vs
> > > > mediation. :)
> > > >
> > > > Device administration work is separate from the device operational part.
> > > > The device context records what is the current device context,
> > > > when the FLR
> > > occurs, the device stops all the operations.
> > > > And on next read of the device context the FLRed context is returned.
> > >
> > > Firstly, you didn't explain how it affects the live migration, for
> > > example, what happens if we try to migrate while FLR is ongoing.
> > > Secondly, you ignore the other two questions.
> > >
> > > Let's save the time of both.
> > >
> > There is nothing to explain about device reset and live migration, because
> there are absolutely no touch points.
> 
> Do you think this is a valid answer to my above question? Let's don't exhaust the
> patience from any reviewer.
> 
You asked follow-up related questions above.
A device status update does not affect the live migration.
Reading/writing other registers does not affect the live migration.

I am not sure such an explicit mention is worthwhile in the spec, but if you find it useful, I will add it in v1.

> > device_status is just another registers like rest of them.
> 
> I don't see device status itself as anything related to FLR.
> 
I don’t follow your above comment.

> > One does not need to poke around registers when doing passthrough.
> >
> > > >
> > > > > >
> > > > > > > This is tricky and we are migrating virtio not virtio-pci.
> > > > > > > If we don't use simple passthrough we don't need to care about this.
> > > > > > >
> > > > > > Exactly, we are migrating virtio device for the PCI transport.
> > > > >
> > > > > No, the migration facility is a general requirement for all transport.
> > > > It is for all transport. One can extend when do for MMIO.
> > >
> > > By using admin commands? It cannot perform well for registers.
> > >
> > Yes, admin commands using an AQ on an MMIO-based owner device will also be
> just fine.
> 
> Can admin commands be implemented efficiently via registers? I would like to
> see how it can work.
> 
Well, you have liked MMIO for a long time, wanting to do everything via MMIO registers, so you should define it.
I don't see any modern device implementing it. Maybe vendors who want to focus on the nested use case will.

> MMIO doesn't have the concepts of group owner etc at all or do you know how
> to build one?
I think Michael suggested having a new group type. That would work.

> 
> >
> > > >
> > > > > Starting from a PCI specific (actually your proposal does not
> > > > > even cover all even for PCI) solution which may easily end up
> > > > > with issues in other
> > > transports.
> > > > >
> > > > Like?
> > >
> > > The admin command/virtqueue itself may not work well for other transport.
> > > That's the drawback of your proposal while this proposal doesn't do
> > > any coupling.
> > >
> > There is no coupling in the spec between admin commands and the virtqueue, as
> Michael has consistently insisted.
> > And in my proposal also there is no such coupling.
> 
> I hope so but I don't think so. We need to at least do this explicitly by moving all
> the state definitions to the "basic facility" part.
I am not sure who will use it beyond the device migration use case.
Maybe it can be moved there at that point in the future.

> 
> >
> > > >
> > > > > Even if you want to migrate virtio for PCI,  please at least
> > > > > read Qemu migration codes for virtio and PCI, then you will soon
> > > > > realize that a lot of things are missing in your proposal.
> > > > >
> > > > Device context is something that will be extended.
> > > > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> > > transport.
> > >
> > > This is just one mini stuff, how about PCI config space and others?
> > >
> > No need to migrate the PCI config space, because the migration is of the virtio
> device, and not of the underlying transport.
> 
> Let me ask you a simple question, if you don't migrate the PCI config space,
> how can you guarantee that guests see the same config space state after
> migration? What happens if a specific capability exists only in the src but not
> the destination? Or do you want to provision PCI capabilities?
> 
PCI capabilities are to be provisioned only if needed.
It is optional.
One can check whether they match or not.

> > Therefore, one can migrate from a virtio member device to a fully software-based
> device as well, and vice versa.
> 
> Please answer my question above.
> 
> >
> > > Again, please read Qemu codes, a lot of things are missing in your
> > > proposal now. If everything is fine to do passthrough based live
> > > migration, I'm pretty sure you need more than what Qemu has since it
> > > can only do a small fraction of the whole PCI.
> > >
> > I will read.
> > Many of the pieces may be implemented by the device over time following
> the charter.
> >
> > > >
> > > > > > As usual, if you have to keep arguing about not doing
> > > > > > passhthrough, we are
> > > > > surely past that point.
> > > > >
> > > > > Who is "we"?
> > > > >
> > > > We = You and me.
> > > > From 2021, you keep objecting that passthrough must not be done.
> > >
> > > This is a big misunderstanding, you need to justify it or at least
> > > address the concerns from any reviewer.
> > >
> > They are getting addressed, if you have comments, please post those
> comments in the actual series.
> > I wouldn’t diverge to discuss in different series here.
> 
> Well, Lingshan's series was posted before you and it's you that keep referring to
> your proposal here. What's more, I've asked some questions but most of them
> don't have a good answer.  So I need to stop before I can ask more.
> 
If you really want to count the timing, you have to go back to 2021 or so to see which series was posted first. :)

Please ask your questions in the relevant series, not in Lingshan's series.

> >
> > > > And blocking the work done by other technical committee members to
> > > improve the virtio spec to make that happen is simply wrong.
> > >
> > > It's unrealistic to think that one will be 100% correct. Justify
> > > your proposal or why I was wrong instead of ignoring my questions
> > > and complaining. That is why we need a community. If it doesn't
> > > work, virtio provides another process for convergence.
> > >
> > I am not expecting you to be correct at all. I totally agree that you may miss
> something, and I may miss something.
> > And this is why I repeatedly, humbly ask to converge and jointly address the
> passthrough mode without the trap+emulation method.
> > The way I understood your comment is that passthrough for a hw-based device
> must not be done, and multiple hw vendors disagree with that.
> 
> Again, this is a big misunderstanding. Passthrough can work doesn't mean your
> proposal can work. I'm asking questions and want to figure out if/how it can
> work correctly. But you keep ignoring them or raising other unrelated issues.
> 
Since you agree that passthrough is an equally valid case,
let's review the passthrough series line by line.
This is not the right email thread to review passthrough.

> >
> > > >
> > > > > Is something like what you said here passed the vote and written
> > > > > to the spec?
> > > > Not only me.
> > > > The virtio technical committee has agreed for nested and
> > > > hardware-based
> > > implementation _both_.
> > > >
> > > > " hardware-based implementations" is part of the virtio
> > > > specification charter
> > > with ballot of [1].
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> > >
> > > Let's don't do conceptual shifts, I was asking the passthrough but
> > > you give me the hardware implementation.
> > >
> > Passthrough devices are implemented by hw which does dirty tracking and
> follows the spec.
> 
> Why is passthrough coupled with dirty tracking?
> 
You are free to use and extend it without passthrough too.
Can you please explain the use case you have in mind for dirty tracking without passthrough device migration, which makes you want to see it differently?

> >
> > > >
> > > > And passthrough hardware-based device is in the charter that we
> > > > strive to
> > > support.
> > > >
> > > > > We all know the current virtio spec is not built upon passthrough.
> > > >
> > > > This efforts improve the passthrough hw based implementation that
> > > > should
> > > not be blocked.
> > >
> > > Your proposal was posted only for several days and you think I would
> > > block that just because I asked several questions and some of them are not
> answered?
> > >
If I misunderstood, then I am sorry.
Let's progress and improve the passthrough use case without trap+emulation.
> 
> Unless any reviewer says no, the comments or concerns are a good opportunity
> for you to justify your method. That's what I'm doing right now and how the
> community works.
> 
So let's please continue the review of the passthrough work in that series. No need to do it here.

> > Trap+emulation=mediation is also a valid solution for nested case.
> 
> Again. Not only for the nested case. This method has been used for cloud
> vendors now.
> 
We are advancing the virtio spec for the future, and there is no reason for it to be limited to only the nested case.

> > And I frankly see a need for both as both are solving a different problem.
> 
> Then, let's don't couple state, suspending, dirty page tracking with admin
> commands.
> 
Please explain the use case for your proposal.
I think it is incorrect to say they are coupled.
It is the way to do it.
One can invent some other way when admin commands do not fit the requirements of the explained use case.

> > Trap+emulation cannot achieve passthrough mode, hence my request was not
> to step on each other.
> 
> It's easy to not step on others, but it would end up with duplications for sure.
> 
This is why I keep asking the author to review others' work to converge, but the author is not cooperative in doing the joint community work.

> >
> > When both can use the common infra, it is good to do that, when they cannot,
> due to the technical challenges of underlying transport, they should evolve
> differently.
> >
> > > >
> > > > > > Virtio does not need to stay in the weird umbrella to always mediate
> etc.
> > > > >
> > > > > It's not the mediation, we're not doing vDPA, the device model
> > > > > we had in hardware and we present to guests are all virtio devices.
> > > > > It's the trap and emulation which is fundamental in the world of
> > > > > virtualization for the past decades. It's the model we used to
> > > > > virtualize standard devices. If you want to debate this
> > > > > methodology, virtio
> > > community is clearly the wrong forum.
> > > > >
> > > > I am not debating it at all. You keep bringing up the point of mediation.
> > > >
> > > > The proposal of [1] is clear that wants to do hardware based
> > > > passthrough
> > > devices with least amount of virtio level mediation.
> > > >
> > > > So somewhere mode of virtualizing has been used, that’s fine, it
> > > > can continue with full virtualization, mediation,
> > > >
> > > > And also hardware based passthrough device.
> > > >
> > > > > >
> > > > > > Series [1] will be enhanced further to support virtio
> > > > > > passthrough device for
> > > > > device context and more.
> > > > > > Even further we like to extend the support.
> > > > > >
> > > > > > > Since the functionality proposed in this series focus on the
> > > > > > > minimal set of the functionality for migration, it is virtio
> > > > > > > specific and self contained so nothing special is required
> > > > > > > to work in the
> > > nest.
> > > > > >
> > > > > > Maybe it is.
> > > > > >
> > > > > > Again, I repeat and like to converge the admin commands
> > > > > > between
> > > > > passthrough and non-passthrough cases.
> > > > >
> > > > > You need to prove at least that your proposal can work for the
> > > > > passthrough before we can try to converge.
> > > > >
> > > > What do you mean by "prove"? virtio specification development is
> > > > not proof
> > > based method.
> > >
> > > For example, several of my questions were ignored.
> > >
> > I didn't ignore them, but if I missed any, I will answer.
> >
> > > >
> > > > If you want to participate, please review the patches and help
> > > > community to
> > > improve.
> > >
> > > See above.
> > >
> > > >
> > > > > > If we can converge it is good.
> > > > > > If not both modes can expand.
> > > > > > It is not either or as use cases are different.
> > > > >
> > > > > Admin commands are not the cure for all, I've stated drawbacks
> > > > > in other threads. Not repeating it again here.
> > > > He he, sure, I am not attempting to cure all.
> > > > One solution does not fit all cases.
> > >
> > > Then why do you want to couple migration with admin commands?
> > >
> > Because of the following:
> > 1. A device migration needs bulk data transfer; this is something that cannot be
> done with tiny registers.
> > It cannot be done through registers because: a. registers are slow for
> > bidirectional communication, and b. they do not scale well with the number of VFs.
> 
> That's pretty fine, but let's not limit it to a virtqueue. Virtqueue may not work
> for all the cases:
> 
> I must repeat some of Ling Shan's questions since I don't see a good answer for
> them now.
> 
> 1) If you want to use virtqueue to do the migration with a downtime
> requirement. Is the driver required to do some sort of software QOS?
It should not require software QoS.

> For example what happens if one wants to migrate but the admin virtqueue is
> out of space? 
When the error code EAGAIN is returned, the migration may be retried.
Alternatively, the driver can also wait and retry.
A device may be able to support multiple VQs as well.
Many options are possible.

> And do we need a timeout for a specific command and if yes what
> happens after the timeout?
If a timeout occurs, it is likely a failure of the device.
One can retry or mark the device as being in error.

> 2) Assuming one round of the migration requires several commands. Are they
> allowed to be submitted in a batch? 
Yes, they can be submitted in a batch when they are unrelated.

> If yes, how is the ordering guaranteed or
> we don't need it at all? If not, why do we even need a queue?
> 
Software can order them if needed.
The OS UAPI we explored does not require any ordering.
The queue exists to parallelize the work of multiple unrelated member device migrations.

> If you're using an existing transport specific mechanism, you don't need to care
> about the above. I'm not saying admin virtqueue can't work but it definitely has
> more things to be considered.
> 
Ok, yes, the admin virtqueue is considered a transport-agnostic method.

> >
> > > > Admin commands are used to solve the specific problem for which
> > > > the AQ is
> > > designed for.
> > > >
> > > > One can make argument saying take pci fabric to 10 km distance,
> > > > don’t bring
> > > new virtio tcp transport...
> > > >
> > > > Drawing boundaries around virtio spec in certain way only makes it
> > > > further
> > > inferior. So please do not block advancements bring in [1].
> > >
> > > As a reviewer, I ask questions but some of them are ignored, do you
> > > expect the reviewer to figure out by themselves?
> > Sure, please review.
> >
> > Many of them were not questions, but assertion and conclusions that it does
> not fit nested.. and sub-optional etc.
> 
> I think we all agree that your proposal does not fit for nesting, no?
Sure. I never claimed it works.

> It demonstrates that work needs to be done in the basic facility first.
You continue to hint that nesting with trap+emulation must be done first.
I disagree with what you define as first.
Both are valid use cases and both can progress.

> 
> What's more the conclusion is for coupling live migration with admin
> command. This point has been clarified several times before.
Well, it has to be connected to something.
One could equally say that live migration should not be connected with the device status.

> 
> >
> > >
> > > > We really would like to make it more robust with your rich
> > > > experience and
> > > inputs, if you care to participate.
> > >
> > > We can collaborate for sure: as I pointed out in another threads,
> > > from what I can see from the both proposals of the current version:
> > >
> > > I see a good opportunity to build your admin commands proposal on
> > > top of this proposal. Or it means, we can focus on what needs to be
> migrated first:
> > >
> > > 1) queue state
> > This is just one small part of the device context. So once the device
> > context is read/written, it covers the queue state.
> 
> That's a layer violation. Virtqueue is the basic facility, states need to be defined
> there.
> 
It is not a layer violation.
But anyway, the state is defined in the basic facility section in my series.

> >
> > > 2) inflight descriptors
> > Same a q state, it is part of the device context.
> 
> Admin commands are not the only way to access device context. For example,
> do you agree the virtqueue address is part of the device context? If yes, it is
> available in the common configuration now.
> 
Virtqueue address is accessible to common configuration for the guest, for the obvious reason.

Device context is accessible to basic migration facility to cover the case where migration facility is not trapping any of the virtio guest accesses.

> >
> > > 3) dirty pages (optional)
> > > 4) device state(context) (optional)
> > >
> > It is same as #1 and #2.
> > Splitting them from #1 and #2 is not needed.
> >
> > We can extend the device context to be selectively queried for nested case..
> >
> > > I'd leave 3 or 4 since they are very complicated features. Then we
> > > can invent an interface to access those facilities? This is how this series is
> structured.
> > >
> > > And what's more, admin commands or transport specific interfaces.
> > > And when we invent admin commands, you may realize you are inventing
> > > a new transport which is the idea of transport via admin commands.
> >
> > Not really. it is not a new transport at all.
> > I explained you before when you quote is as transport, it must carry the driver
> notifications as well..
> > Otherwise it is just set of commands..
> 
> I've explained that you need admin commands to save and load all existing
> virtio PCI capabilities. This means a driver can just use those commands to
> work. If not, please explain why I was wrong.

Virtio pci capabilities are read only except the one, which needs to migrate.
Those caps needs to match on src and dst side.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-19  7:32                                         ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:02 AM

> > Migration compatibility is topic in itself regardless of device migration series.
> 
> Why? Without compatibility support, migration can't work in the production
> environment.

As I said, it is part of a future series. We don’t cook all features at once.
Orchestration knows when not to migrate, in an out-of-band manner.

> 
> > It is part of the feature provisioning phase needed regardless.
> 
> Definitely not, it is something that must be considered even without any feature.
I disagree that it is a must.
It is not a must, but it is certainly useful.

> It's about the robustness of the migration protocol.
> Sometimes you need to do that since some states were lost in the previous
> version of protocols or formats .
> 
> > Like how you and Lingshan wanted to keep the suspend bit series small and
> logical, device migration series is also logically split for the functionality.
> > I don’t see a need to mention the long known missing functionality and
> common to both approaches.
> 
> Again, your proposal needs to describe at least the plan for dealing with
> migration compatibility since you want a passthrough based solution. That's the
> point.
> 
No matter passthrough or non-passthrough, migration compatibility cannot be achieved if the device does not provide a way to query and configure its features.
Migration will fail when features mismatch.

So, I will add a note in the commit log that this will be covered in the future.

> >
> > > > I will include the notes of future follow up work items in v1,
> > > > which will be
> > > taken care post this series.
> > > >
> > > > > > > Dirty page tracking in virtio is not a must for live
> > > > > > > migration to work. It can be done via platform facilities or
> > > > > > > even software. And to make it more efficient, it needs to
> > > > > > > utilize transport facilities instead of a
> > > > > general one.
> > > > > > >
> > > > > > It is also optional in the spec proposal.
> > > > > > Most platforms claimed are not able to do efficiently either,
> > > > >
> > > > > Most platforms are working towards an efficient way. But we are
> > > > > talking about different things, hardware based dirty page
> > > > > logging is not a must, that is what I'm saying. For example, KVM
> > > > > doesn't use hardware
> > > to log dirty pages.
> > > > >
> > > > I also said same, that hw based dirty page logging is not must. :)
> > > > One day hw mmu will be able to track everything efficiently. I
> > > > have not seen it
> > > happening yet.
> > >
> > > How do you define efficiency? KVM uses page fault and most modern
> > > IOMMU support PRI now.
> > >
> > One cannot define PRI as mandatory feature.
> 
> There's no way to mandate PRI, it's a PCI specific facility.
> 
You proposed using PRI for migration; it becomes mandatory at that point.

> > In our research and experiments we see that PRI is significantly slower to
> handle page faults.
> > Yet different topic...
> 
> PRI's performance is definitely another topic, it's just an example that tracking
> dirty pages by device is optional and transport (PCI) can evolve for sure. What's
> more important, it demonstrates the basic design of virtio, which is trying to
> leverage the transport instead of a mandatory reinventing of everything.
> 
An example that does not work is not a dependable technology to rely on to achieve this now.
Anyway, not everyone will use PRI in all cases.

> >
> > Efficiency is defined by the downtime of the multiple devices in a VM.
> 
> Ok, but you tend to ignore my question regarding the downtime.
> 
What is the question?
Admin commands can achieve the desired downtime, if that is what you are asking.

> > And leading OS allowed device advancements by allowing device to report
> dirty pages in cpu and platform agnostic way...
> >
> 
> It has many things that I don't see a good answer for. For example, the QOS
> raised by Ling Shan.
> 
I am not going to repeat the QoS discussion anymore. :)
He is questioning virtqueue semantics itself; he had better rewrite the spec to not use virtqueues.

> > One can use post-copy approach as well, current device migration is around
> established pre-copy approach.
> 
> Another drawback of your proposal. With transport specific assistance like PRI,
In our research, the PRI page fault rate is 20x slower than the CPU page fault rate.

> you can do both pre and post. But the point is we need to make sure pre-copy
> downtime can satisfy the requirement instead of switching to another.
> 
In our work we see it satisfies the downtime requirements.

Again, dirty page tracking is optional, so when PRI catches up in the next few years, the driver can stop relying on device-side tracking.

> >
> > > >
> > > > > > hence the vfio subsystem added the support for it.
> > > > >
> > > > > As an open standard, if it is designed for a specific software
> > > > > subsystem on a specific OS, it's a failure.
> > > > >
> > > > It is not.
> > > > One need accept that, in certain areas virtio is following the
> > > > trails of
> > > advancement already done in sw stack.
> > > > So that virtio spec advancement fits in to supply such use cases.
> > > > And blocking such advancement of virtio spec to promote
> > > > only_mediation
> > > approach is not good either.
> > > >
> > > > BTW: One can say the mediation approach is also designed for
> > > > specific
> > > software subsystem and hence failure.
> > > > I will stay away from quoting it, as I don’t see it this way.
> > >
> > > The proposal is based on well known technology since the birth of
> virtualization.
> > Sure, but that does not change the fact that such series is also targeted for a
> specific software subsystem..
> 
> How, this series reuses the existing capability by introducing just two more
> registers on the existing common cfg structure and you think it targets a specific
> software subsystem? If this is true, I think you are actually challenging the
> design of the whole modern PCI transport.
> 
No. The way I understood it, you are targeting the trap+emulation approach that you posted.
You need to show that your mechanism also works for passthrough, to prove that it is not targeted at a specific use case.

> > And hence failure.
> 
> Failure in what sense?
> 
You defined the failure first when you quoted passthrough. :)

> >
> > I didn’t say that, I said the opposite that yes, since the virtio is in catch up
> mode, it is defining the interface so that it can fit into these OS platforms.
> > Mostly multiple of them, who all support passthrough devices.
> 
> We are talking about different things again.
> 
:)

> >
> > > I never knew a mainstream hypervisor that doesn't do trap and
> > > emulate, did you?
> > >
> > It does trap and emulation for PCI config space, not for virtio interfaces like
> queues, config space and more for passthrough devices.
> 
> Well, we are in the context of live migration, no? We all know passthrough just
> works fine with the existing virtio spec...
> 
Right, and we want to continue to make passthrough work fine with device migration.
So we are in the passthrough context, where only PCI-specific things are trapped as before, without additional virtio traps.

> >
> > > >
> > > > > >
> > > > > > > The FLR, P2P demonstrates the fragility of a simple
> > > > > > > passthrough method and how it conflicts with live migration
> > > > > > > and complicates the device
> > > > > implementation.
> > > > > > Huh, it shows the opposite.
> > > > > > It shows that both will seamlessly work.
> > > > >
> > > > > Have you even tried your proposal with a prototype device?
> > > > Of course, it is delivered to user for 1.5 years ago before
> > > > bringing it to the
> > > spec with virtio-net and virtio-blk devices.
> > >
> > > I hope this is your serious answer, but it looks like it is not.
> > > Your proposal misses a lot of states as I pointed out in another thread, how
> can it work in fact?
> > >
> > Which states?
> 
> Let me repeat it for the third time. You don't even cover all the functionality of
> common cfg, how can guests see a consistent common cfg state?
> 
Please respond in that series with what is missing. I will fix it in v1.

> > What is posted in series [1] is minimal and base required items,
> 
> You need to prove it is minimal, instead of ignoring my questions. For example,
> dirty page tracking is definitely optional.
>
Again, reviews are not proof based. Please review the series.

It is an optional feature that significantly improves the VM downtime in the pre-copy approach.

> > optional one is omitted as it can be done incrementally.
> > Lingshan had hard time digesting the basics of P2P and dirty page tracking
> work in this short series.
> 
> You never explain why this series needs to deal with P2P and dirty page
> tracking.
> 
Please read my response to him, you likely missed it.

> > So there is no point in pushing large part of the device context and making the
> series blurry.
> 
> I don't see a good definition of "device context" and most of the device context
> has been covered by the existing PCI capabilities.
> 
Please respond in that patch. Device context is well defined in the theory of operation [2] and also in the independent patch [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#u
[2] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#m12a5f675aaa95a1de8945772a3f5d1efb0c9e25e

> > It will be done incrementally subsequently.
> >
> > > > > >
> > > > > > > And it means you need to audit all PCI features and do
> > > > > > > workaround if there're any possible issues (or using a whitelist).
> > > > > > No need for any of this.
> > > > >
> > > > > You need to prove this otherwise it's fragile. It's the duty of
> > > > > the author to justify not the reviewer.
> > > > >
> > > > One cannot post patches and nor review giant series in one go.
> > > > Hence the work to be split on a logical boundary.
> > > > Features provisioning, pci layout etc is secondary tasks to take care of.
> > >
> > > Again, if you know something is missing, you need to explain it in
> > > the series instead of waiting for some reviewers to point it out and
> > > say it's well-known afterwards.
> > >
> > The patch set cannot be a laundry list of items missing in virtio spec.
> > It is short and focused on the device migration.
> 
> You need to mention it in the cover letter at least for a big picture at least,
> what's wrong with this? It helps to save time for everyone or people will keep
> asking similar questions. Is this too hard to be understood?
> 
No, it is not hard.
I will mention about adjacent features in the cover letter.
> >
> > > >
> > > > > For example FLR is required to be done in 100ms. How could you
> > > > > achieve this during the live migration? How does it affect the
> > > > > downtime and
> > > FRS?
> > > > >
> > > > Good technical question to discuss instead of passthrough vs
> > > > mediation. :)
> > > >
> > > > Device administration work is separate from the device operational part.
> > > > The device context records what is the current device context,
> > > > when the FLR
> > > occurs, the device stops all the operations.
> > > > And on next read of the device context the FLRed context is returned.
> > >
> > > Firstly, you didn't explain how it affects the live migration, for
> > > example, what happens if we try to migrate while FLR is ongoing.
> > > Secondly, you ignore the other two questions.
> > >
> > > Let's save the time of both.
> > >
> > There is nothing to explain about device reset and live migration, because
> there is absolutely there is no touch points.
> 
> Do you think this is a valid answer to my above question? Let's don't exhaust the
> patience from any reviewer.
> 
You asked follow-up questions above.
A device status update does not affect live migration.
Reading/writing other registers does not affect live migration.

I am not sure such an explicit mention is worthwhile in the spec, but if you find it useful, I will add it in v1.

> > device_status is just another registers like rest of them.
> 
> I don't see device status itself as anything related to FLR.
> 
I don’t follow your above comment.

> > One does not need to poke around registers when doing passthrough.
> >
> > > >
> > > > > >
> > > > > > > This is tricky and we are migrating virtio not virtio-pci.
> > > > > > > If we don't use simple passthrough we don't need to care about this.
> > > > > > >
> > > > > > Exactly, we are migrating virtio device for the PCI transport.
> > > > >
> > > > > No, the migration facility is a general requirement for all transport.
> > > > It is for all transport. One can extend when do for MMIO.
> > >
> > > By using admin commands? It cannot perform well via registers.
> > >
> > Yes, admin commands using AQ on MMIO based owner device will also be
> just fine.
> 
> Can admin commands be implemented efficiently via registers? I would like to
> see how it can work.
> 
Well, you have long preferred doing everything via MMIO registers, so you should define it.
I don’t see any modern device implementing it. Maybe vendors who want to focus on the nested use case will.

> MMIO doesn't have the concepts of group owner etc at all or do you know how
> to build one?
I think Michael suggested having a new group type. That would work.

> 
> >
> > > >
> > > > > Starting from a PCI specific (actually your proposal does not
> > > > > even cover all even for PCI) solution which may easily end up
> > > > > with issues in other
> > > transports.
> > > > >
> > > > Like?
> > >
> > > The admin command/virtqueue itself may not work well for other transport.
> > > That's the drawback of your proposal while this proposal doesn't do
> > > any coupling.
> > >
> > There is no coupling in the spec of admin command with virtqueue as
> Michael consistently insisted.
> > And in my proposal also there is no such coupling.
> 
> I hope so but I don't think so. We need to at least do this explicitly by moving all
> the state definitions to the "basic facility" part.
I am not sure who will use it beyond the device migration use case.
Maybe it can be moved at that point in the future.

> 
> >
> > > >
> > > > > Even if you want to migrate virtio for PCI,  please at least
> > > > > read Qemu migration codes for virtio and PCI, then you will soon
> > > > > realize that a lot of things are missing in your proposal.
> > > > >
> > > > Device context is something that will be extended.
> > > > VIRTIO_PCI_CAP_PCI_CFG will also be added as optional item for PCI
> > > transport.
> > >
> > > This is just one mini stuff, how about PCI config space and others?
> > >
> > No need to migrate the PCI config space, because migration is of the virtio
> device, and not the underlying transport.
> 
> Let me ask you a simple question, if you don't migrate the PCI config space,
> how can you guarantee that guests see the same config space state after
> migration? What happens if a specific capability exists only in the src but not
> the destination? Or do you want to provision PCI capabilities?
> 
PCI capabilities are to be provisioned only if needed.
It is optional.
One can check whether they match.

> > Therefore, one can migrate from virtio member device to a fully software
> based device as well and vis versa.
> 
> Please answer my question above.
> 
> >
> > > Again, please read Qemu codes, a lot of things are missing in your
> > > proposal now. If everything is fine to do passthrough based live
> > > migration, I'm pretty sure you need more than what Qemu has since it
> > > can only do a small fraction of the whole PCI.
> > >
> > I will read.
> > Many of the pieces may be implemented by the device over time following
> the charter.
> >
> > > >
> > > > > > As usual, if you have to keep arguing about not doing
> > > > > > passhthrough, we are
> > > > > surely past that point.
> > > > >
> > > > > Who is "we"?
> > > > >
> > > > We = You and me.
> > > > From 2021, you keep objecting that passthrough must not be done.
> > >
> > > This is a big misunderstanding, you need to justify it or at least
> > > address the concerns from any reviewer.
> > >
> > They are getting addressed, if you have comments, please post those
> comments in the actual series.
> > I wouldn’t diverge to discuss in different series here.
> 
> Well, Lingshan's series was posted before you and it's you that keep referring to
> your proposal here. What's more, I've asked some questions but most of them
> don't have a good answer.  So I need to stop before I can ask more.
> 
If you really want to count timing, you would have to go back to 2021 or so to see which series was posted first. :)

Please ask your questions in the relevant series, not in Lingshan's series.

> >
> > > > And blocking the work done by other technical committee members to
> > > improve the virtio spec to make that happen is simply wrong.
> > >
> > > It's unrealistic to think that one will be 100% correct. Justify
> > > your proposal or why I was wrong instead of ignoring my questions
> > > and complaining. That is why we need a community. If it doesn't
> > > work, virtio provides another process for convergence.
> > >
> > I am not expecting you to be correct at all. I totally agree that you may miss
> something, I may miss something.
> > And this is why I repeatedly, humbly ask to converge and jointly address the
> passthrough mode without trap+emulation method.
> > The way I understood from your comment is, passthrough for hw based device
> must not be done and multiple of hw vendors disagree to it.
> 
> Again, this is a big misunderstanding. Passthrough can work doesn't mean your
> proposal can work. I'm asking questions and want to figure out if/how it can
> work correctly. But you keep ignoring them or raising other unrelated issues.
> 
Since you agree that passthrough is an equally valid case,
let's review the passthrough series line by line.
This is not the right email thread to review passthrough.

> >
> > > >
> > > > > Is something like what you said here passed the vote and written
> > > > > to the spec?
> > > > Not only me.
> > > > The virtio technical committee has agreed for nested and
> > > > hardware-based
> > > implementation _both_.
> > > >
> > > > " hardware-based implementations" is part of the virtio
> > > > specification charter
> > > with ballot of [1].
> > > >
> > > > [1]
> > > > https://lists.oasis-open.org/archives/virtio/202104/msg00038.html
> > >
> > > Let's don't do conceptual shifts, I was asking the passthrough but
> > > you give me the hardware implementation.
> > >
> > Passthrough devices implemented by hw which does dirty tracking and
> following the spec.
> 
> Why is passthrough coupled with dirty tracking?
> 
You are free to use and extend it without passthrough too.
Can you please explain the use case you have in mind for dirty tracking without passthrough device migration, which makes you want to see it differently?

> >
> > > >
> > > > And passthrough hardware-based device is in the charter that we
> > > > strive to
> > > support.
> > > >
> > > > > We all know the current virtio spec is not built upon passthrough.
> > > >
> > > > This efforts improve the passthrough hw based implementation that
> > > > should
> > > not be blocked.
> > >
> > > Your proposal was posted only for several days and you think I would
> > > block that just because I asked several questions and some of them are not
> answered?
> > >
If I misunderstood, then I am sorry.
Let's progress and improve the passthrough use case without trap+emulation.
> 
> Unless any reviewer says no, the comments or concerns are a good opportunity
> for you to justify your method. That's what I'm doing right now and how the
> community works.
> 
So let's please continue the review of the passthrough work in that series. There is no need to do it here.

> > Trap+emulation=mediation is also a valid solution for nested case.
> 
> Again. Not only for the nested case. This method has been used for cloud
> vendors now.
> 
We are advancing the virtio spec for the future, and there is no reason for it to be limited to only the nested case.

> > And I frankly see a need for both as both are solving a different problem.
> 
> Then, let's don't couple state, suspending, dirty page tracking with admin
> commands.
> 
Please explain the use case for your proposal.
I think it is incorrect to say they are coupled.
It is the way to do it.
One can invent some other way when admin commands do not fit the requirements of the explained use case.

> > Trap+emulation cannot achieve passthrough mode, hence my request was not
> to step on each other.
> 
> It's easy to not step on others, but it would end up with duplications for sure.
> 
This is why I keep asking the author to review others' work to converge, but the author is not cooperative in doing the joint community work.

> >
> > When both can use the common infra, it is good to do that, when they cannot,
> due to the technical challenges of underlying transport, they should evolve
> differently.
> >
> > > >
> > > > > > Virtio does not need to stay in the weird umbrella to always mediate
> etc.
> > > > >
> > > > > It's not the mediation, we're not doing vDPA, the device model
> > > > > we had in hardware and we present to guests are all virtio devices.
> > > > > It's the trap and emulation which is fundamental in the world of
> > > > > virtualization for the past decades. It's the model we used to
> > > > > virtualize standard devices. If you want to debate this
> > > > > methodology, virtio
> > > community is clearly the wrong forum.
> > > > >
> > > > I am not debating it at all. You keep bringing up the point of mediation.
> > > >
> > > > The proposal of [1] is clear that wants to do hardware based
> > > > passthrough
> > > devices with least amount of virtio level mediation.
> > > >
> > > > So somewhere mode of virtualizing has been used, that’s fine, it
> > > > can continue with full virtualization, mediation,
> > > >
> > > > And also hardware based passthrough device.
> > > >
> > > > > >
> > > > > > Series [1] will be enhanced further to support virtio
> > > > > > passthrough device for
> > > > > device context and more.
> > > > > > Even further we like to extend the support.
> > > > > >
> > > > > > > Since the functionality proposed in this series focus on the
> > > > > > > minimal set of the functionality for migration, it is virtio
> > > > > > > specific and self contained so nothing special is required
> > > > > > > to work in the
> > > nest.
> > > > > >
> > > > > > Maybe it is.
> > > > > >
> > > > > > Again, I repeat and like to converge the admin commands
> > > > > > between
> > > > > passthrough and non-passthrough cases.
> > > > >
> > > > > You need to prove at least that your proposal can work for the
> > > > > passthrough before we can try to converge.
> > > > >
> > > > What do you mean by "prove"? virtio specification development is
> > > > not proof
> > > based method.
> > >
> > > For example, several of my questions were ignored.
> > >
> > I didn’t ignore, but if I miss, I will answer.
> >
> > > >
> > > > If you want to participate, please review the patches and help
> > > > community to
> > > improve.
> > >
> > > See above.
> > >
> > > >
> > > > > > If we can converge it is good.
> > > > > > If not both modes can expand.
> > > > > > It is not either or as use cases are different.
> > > > >
> > > > > Admin commands are not the cure for all, I've stated drawbacks
> > > > > in other threads. Not repeating it again here.
> > > > He he, sure, I am not attempting to cure all.
> > > > One solution does not fit all cases.
> > >
> > > Then why do you want to couple migration with admin commands?
> > >
> > Because of following.
> > 1. A device migration needs to bulk data transfer, this is something cannot be
> done with tiny registers.
> > Cannot be done through registers, because a. registers are slow for
> > bidirectional communication b. do not scale well with scale of VFs
> 
> That's pretty fine, but let's not limit it to a virtqueue. Virtqueue may not work
> for all the cases:
> 
> I must repeat some of Ling Shan's questions since I don't see a good answer for
> them now.
> 
> 1) If you want to use virtqueue to do the migration with a downtime
> requirement. Is the driver required to do some sort of software QOS?
It should not require that.

> For example what happens if one wants to migrate but the admin virtqueue is
> out of space? 
When the EAGAIN error code is returned, migration may be retried.
Alternatively, the driver can also wait and retry.
A device may also be able to support multiple VQs.
Many options are possible.

> And do we need a timeout for a specific command and if yes what
> happens after the timeout?
If a timeout occurs, it is likely a failure of the device.
One can retry or mark the device as being in error.

> 2) Assuming one round of the migration requires several commands. Are they
> allowed to be submitted in a batch? 
Yes, they can be submitted in a batch when they are unrelated.

> If yes, how is the ordering guaranteed or
> we don't need it at all? If not, why do we even need a queue?
> 
Software can order them if needed.
The OS UAPI we explored does not require any ordering.
The queue exists to parallelize the work of multiple unrelated member device migrations.

> If you're using an existing transport specific mechanism, you don't need to care
> about the above. I'm not saying admin virtqueue can't work but it definitely has
> more things to be considered.
> 
OK. Yes, the admin virtqueue is considered a transport-agnostic method.

> >
> > > > Admin commands are used to solve the specific problem for which
> > > > the AQ is
> > > designed for.
> > > >
> > > > One can make argument saying take pci fabric to 10 km distance,
> > > > don’t bring
> > > new virtio tcp transport...
> > > >
> > > > Drawing boundaries around virtio spec in certain way only makes it
> > > > further
> > > inferior. So please do not block advancements bring in [1].
> > >
> > > As a reviewer, I ask questions but some of them are ignored, do you
> > > expect the reviewer to figure out by themselves?
> > Sure, please review.
> >
> > Many of them were not questions, but assertion and conclusions that it does
> not fit nested.. and sub-optional etc.
> 
> I think we all agree that your proposal does not fit for nesting, no?
Sure. I never claimed it works.

> It demonstrates that work needs to be done in the basic facility first.
You continue to hint that nesting + trap+emulation must be done first.
I disagree with what you define as first.
Both are valid use cases and both can progress.

> 
> What's more the conclusion is for coupling live migration with admin
> command. This point has been clarified several times before.
Well, it has to be connected to something.
One could equally say that live migration should not be connected with the device status.

> 
> >
> > >
> > > > We really would like to make it more robust with your rich
> > > > experience and
> > > inputs, if you care to participate.
> > >
> > > We can collaborate for sure: as I pointed out in another threads,
> > > from what I can see from the both proposals of the current version:
> > >
> > > I see a good opportunity to build your admin commands proposal on
> > > top of this proposal. Or it means, we can focus on what needs to be
> migrated first:
> > >
> > > 1) queue state
> > This is just one small part of the device context. So once a device
> > context is read/written, it covers the queue state.
> 
> That's a layer violation. Virtqueue is the basic facility, states need to be defined
> there.
> 
It is not a layer violation.
But anyway, the state is defined in the basic facility in my series.

> >
> > > 2) inflight descriptors
> > Same as queue state, it is part of the device context.
> 
> Admin commands are not the only way to access device context. For example,
> do you agree the virtqueue address is part of the device context? If yes, it is
> available in the common configuration now.
> 
The virtqueue address is accessible through the common configuration for the guest, for the obvious reason.

The device context is accessible to the basic migration facility to cover the case where the migration facility is not trapping any of the guest's virtio accesses.

> >
> > > 3) dirty pages (optional)
> > > 4) device state(context) (optional)
> > >
> > It is same as #1 and #2.
> > Splitting them from #1 and #2 is not needed.
> >
> > We can extend the device context to be selectively queried for nested case..
> >
> > > I'd leave 3 or 4 since they are very complicated features. Then we
> > > can invent an interface to access those facilities? This is how this series is
> structured.
> > >
> > > And what's more, admin commands or transport specific interfaces.
> > > And when we invent admin commands, you may realize you are inventing
> > > a new transport which is the idea of transport via admin commands.
> >
> > Not really. It is not a new transport at all.
> > I explained to you before: when you call it a transport, it must carry the driver
> notifications as well.
> > Otherwise it is just a set of commands.
> 
> I've explained that you need admin commands to save and load all existing
> virtio PCI capabilities. This means a driver can just use those commands to
> work. If not, please explain why I was wrong.

Virtio PCI capabilities are read-only except the one which needs to migrate.
Those caps need to match on the src and dst sides.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:34                                                                     ` Jason Wang
@ 2023-09-19  7:32                                                                       ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:04 AM
> 
> On Sun, Sep 17, 2023 at 1:25 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 14, 2023 8:41 AM
> > >
> > > On Wed, Sep 13, 2023 at 12:37 PM Parav Pandit <parav@nvidia.com>
> wrote:
> > > >
> > > >
> > > >
> > > > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > > > Sent: Wednesday, September 13, 2023 9:51 AM
> > > >
> > > > > we plan to implement a self-contain solution
> > > > Make sure that works with device reset and FLR.
> > >
> > > We don't need to do that. It's out of the spec.
> > >
> > It is not. For the PCI member device, it needs to work reliably.
> 
> We never mentioned FLR in the PCI transport layer before and vendors have
> produced tons of hardware PCI devices for several years.
It is not mentioned, like many other PCI things, because it's native.
What I was saying is: if you are claiming that suspend, resume, etc. are basic facilities that can also work with passthrough, please show how they work with FLR in place.

> 
> If it's important, please describe it in detail in your series but it doesn't.
> 
It is mentioned. Please review it there.

> > Not doing so means it relies on trap+emulation, hence it just cannot be
> complete.
> > And that is ok to me.
> > I just won't claim that trap+emulation is a _complete_ method.
> >
> > > > And if not, explain that it is for mediation mode related tricks.
> > >
> > > It's not the tricks and again, it's not mediation but trap and
> > > emulation. It's the fundamental methodology used in virtualization, as is
> the virtio spec.
> >
> > Not the virtio spec of 2023, and even more so for new features.
> > The base for virtio spec 1.x was 0.9.5, not QEMU or other mediation
> based software AFAIK.
> 
> Are you saying those new features will not be suitable for software devices? If
> yes, please explain why.
> 
> Or are you saying the virtio spec is not capable for hardware devices?

No, you were hinting that trap+emulation is the fundamental technology of virtualization and the virtio spec.

And I replied that for virtio spec 1.x the only baseline was spec 0.9.5; trap+emulation was not the baseline when the 1.x spec was drafted.

The virtio spec, with a few caveats, is capable of supporting hw and new sw, and it needs to continue to build new features that can work without a trap+emulation mode.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
  2023-09-19  4:35                                               ` Jason Wang
@ 2023-09-19  7:33                                                 ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:06 AM

> > Current spec is not the steering point to define new methods.
> > So we will build the spec infra to support passthrough.
> >
> 
> Passthrough migration actually, passthrough is already supported now.
Yes, a device migration basic facility for passthrough devices.
My series adds a basic facility section extension.

> 
> > Mediation/trap-emulation where hypervisor is involved is also second use
> case that you are addressing.
> >
> > And hence, both are not mutually exclusive.
> > Hence we should not debate that anymore.
> >
> > >
> > > > And for sure virtio does not need to live in the dark shadow of mediation
> always.
> > >
> > > 99% of virtio devices are implemented in this way (which is what you
> > > call dark and shadow) now.
> > >
> > What I am saying is one should not say mediation/trap-emulation is the only
> way for virtio.
> 
> Then using things like "dark shadow" is not fair.
I apologize. Let's work towards supporting device migration for passthrough as well.
The comments I hear from you hint that virtio must live its life through mediation.

> 
> > So let passthrough device migration to progress.
> 
> Then you need to answer or address the concerns.
> 
Sure. Will do once I receive the comments in the patches of [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > For nesting use case sure one can do mediation related mode.
> > > >
> > > > So only mediation is not the direction.
> > >
> > > CPU and MMU virtualization were all built in this way.
> > >
> > Not anymore. Both of them have vcpus and viommu, where many things are not
> trapped.
> 
> We are talking about different things. I'm saying trap is a must but you say not
> all are trapped.
> 
To be clear, all I am saying is that a virtio interface level trap is not a must to achieve device migration.

> > So as I said, both have pros and cons, and users will pick what fits their need and
> use case.
> >
> > > >
> > > > > > So for such N and M being > 1, one can use software-based
> > > > > > emulation
> > > anyway.
> > > > >
> > > > > No, only the control path is trapped, the datapath is still passthrough.
> > > > >
> > > > Again, it depends on the use case.
> > >
> > > No matter what use case, the definition and methodology of
> > > virtualization stands still.
> > >
> > I will stop debating this because the core technical question is not answered.
> > I don't see a technology available that virtio can utilize for it:
> > that is, an interface that can work without messing with device status and FLR
> while device migration is ongoing.
> 
> Again, you need to justify it. For example, why does it mess up device status?
With FLR and device reset in place, many things like queues and device registers do not work,
because they are reset while migration is going on, and in-flight descriptor info is lost.
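
The loss described above can be illustrated with a toy model. This is a hypothetical sketch, not spec-defined behavior; the field names and values are made up for illustration:

```python
# Toy model: a device reset (or FLR) during migration clears the virtqueue
# indices, losing in-flight descriptor information -- which is why state must
# be snapshotted while the device is quiesced, before any reset can occur.

class VqState:
    def __init__(self):
        self.last_avail_idx = 7
        self.inflight = [3, 4, 5]   # descriptors handed to the device

    def reset(self):                # models device reset / FLR
        self.last_avail_idx = 0
        self.inflight = []

vq = VqState()
snapshot = (vq.last_avail_idx, list(vq.inflight))  # taken before any reset
vq.reset()
# The live state is gone; only the snapshot preserves the in-flight info.
assert snapshot == (7, [3, 4, 5]) and vq.last_avail_idx == 0
```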

> Why is reset ok but not suspending?
> 
Reset is not ok to me either.
I am not suggesting trapping the CVQ, AQ, or other virtio registers either.

> At least so far, I don't see good answers for those.
> 
> > Hence, methodology for passthrough and mediation/trap-emulation is
> fundamentally different.
> > And that is just fine.
> >
> > > >
> > > > > >
> > > > > > >
> > > > > > > And exposing the whole device to the guest drivers will have
> > > > > > > security implications, your proposal has demonstrated that
> > > > > > > you need a workaround for
> > > > > > There is no security implications in passthrough.
> > > > >
> > > > > How can you prove this or is it even possible for you to prove this?
> > > > Huh, when you claim that it is not secure, please point out
> > > > exactly what is not
> > > secure.
> > > > Please take with PCI SIG and file CVE to PCI sig.
> > >
> > > I am saying it has security implications. That is why you need to
> > > explain why you think it doesn't. What's more, the implications are
> > > obviously nothing related to PCI SIG but a vendor virtio hardware
> implementation.
> > >
> > PCI passthrough for virtio member devices and non-virtio devices with P2P, and
> their interaction, is already there in the VM.
> > Device migration is not adding/removing anything, nor touching any security
> aspect of it.
> > Because it does not need to either.
> > Device migration is making sure that it continues to exist.
> 
> Since we are discussing in the virtio community, what we care about is the
> chance that guest(driver) can explore device security vulnerabilities. In this
> context, exposing more means the increasing of the attacking surfaces since we
> (cloud vendor) can't control guests but the hypervisor.
> 
A guest (driver) can probe for device security vulnerabilities in many areas, not just the device status and CVQ.
So it is not a good answer to me.

> >
> > > >
> > > > > You expose all device details to guests (especially the
> > > > > transport specific details), the attack surface is increased in this way.
> > > > One can say it is the opposite.
> > > > Attack surface is increased in hypervisor due to mediation poking
> > > > at
> > > everything controlled by the guest.
> > > >
> > >
> > > We all know such a stack has been widely used for decades. But you
> > > want to say your new stack is much more secure than this?
> > >
> > It can be, yes, because it exposes all the necessary things defined in the virtio spec
> boundary today.
> > And it does not involve the hypervisor in core device operation.
> 
> That's perfectly fine if we can do this. But you need to justify this.
> 
We are not inventing any new things here. As you acknowledged, passthrough devices are already there...
> >
> > > >
> > > > >
> > > > > What's more, a simple passthrough may lose the chance to
> > > > > workaround hardware erratas and you will finally get back to the trap
> and emulation.
> > > > Hardware errata is not the starting point for building the software
> > > > stack and
> > > spec.
> > >
> > > It's not the starting point. But it's definitely something that
> > > needs to be considered, go and see kernel codes (especially the KVM
> > > part) and you will get the answer.
> > >
> > There are kernels in Nvidia's cloud, shipped by Redhat's OS variant, which cannot be
> updated in the field today.
> >
> > So it is an invalid assumption that somehow the data path does not have bugs, but a
> large part of the control plane has bugs, hence it should be done in software...
> 
> Well, for sure there are cases that can't be worked around. But for the case that
> it can, trap and emulation gives much more flexibility.
> 
It can. So it is not your decision or mine.
The user will pick what they want to use.
So it is an invalid assumption and hence an invalid point to discuss.

> >
> > > > What you imply is, one must never use vfio stack, one must not use
> > > > vcpu
> > > acceleration and everything must be emulated.
> > >
> > > Do I say so? Trap and emulation is the common methodology used in
> > > KVM and VFIO. And if you want to replace it with a complete
> > > passthrough, you need to prove your method can work.
> > >
> > Please review the patches. I do not plan to _replace_ it either.
> 
> You define all the migration stuffs in the admin commands section, isn't this an
> implicit coupling?
> 
RSS is defined in the receive packet section. Is this coupling RSS with the receive queue? Yes, because it is meant for it.

You are effectively asking:
how can one receive packets and post descriptors without virtqueues? Oh, descriptors are implicitly tied to virtqueues.. too bad..

If the above mechanism looks like coupling, then it is one connection.
Maybe there is another use case and a more efficient way without admin commands; I didn't hear of it so far.

I don't see the point of writing a non-practical abstract spec.

> > Those users who want to use passthrough can use passthrough, with major
> traps+emulation on FLR, device_status, cvq, avq, and without implementing an AQ
> on every single member device.
> > And those users who prefer trap+emulation can use that.
> >
> > > >
> > > > Same argument of hardware errata applied to data path too.
> > >
> > > Anything makes datapath different? Xen used to fallback to shadow
> > > page tables to workaround hardware TDP errata in the past.
> > >
> > > > One should not implement in hw...
> > > >
> > > > I disagree with such argument.
> > >
> > > It's not my argument.
> > >
> > You claimed that to overcome hw errata, one should use trap+emulation,
> somehow only for a portion of the functionality.
> > And the rest of the functionality does not have hw errata, hence
> > hw should be used (for example, for the data path). :)
> 
> I've explained before, we all know there're errata that can't be a workaround in
> any way.
> 
No point in discussing hw errata, as it is not a goal or anti-goal for either trap+emulation or passthrough.

> >
> > > >
> > > > You can say nesting is a requirement for some use cases, so the spec
> > > > should support
> > > it without blocking the passthrough mode.
> > > > Then it is fair discussion.
> > > >
> > > > I will not debate further on passthrough vs. control path mediation
> > > > as an
> > > either_or approach.
> > > >
> > > > >
> > > > > >
> > > > > > > FLR at least.
> > > > > > It is actually the opposite.
> > > > > > FLR is supported with the proposal without any workarounds and
> > > mediation.
> > > > >
> > > > > It's an obvious drawback but not an advantage. And it's not a
> > > > > must for live migration to work. You need to prove the FLR
> > > > > doesn't conflict with the live migration, and it's not only FLR
> > > > > but also all the other
> > > PCI facilities.
> > > > I don't know what you mean by prove. It is already clear from the
> > > > proposal that
> > > FLR is not messing with the rest of the device migration infrastructure.
> > >
> > > I don't think you answered my question in that thread.
> > >
> > Please ask the question in that series if any, because there is no FLR or device
> reset interaction between the owner and member device in passthrough.
> >
> > > >
> > > > > one other
> > > > > example is P2P and what's the next? As more features were added
> > > > > to the PCI spec, you will have endless work in auditing the
> > > > > possible conflict with the passthrough based live migration.
> > > > >
> > > > This drawback equally applies to the mediation route, where one needs to
> > > > do more
> > > than audit where the mediation layer is to be extended.
> > >
> > > No, for trap and emulation we don't need to do that. We only do
> > > datapath assignments.
> > >
> > It is required, because such paths also need to be audited and extended; without
> that, the feature is not visible to the guest.
> 
> You need first answer the following questions:
> 
> 1) Why FLR is a must for the guest
Because the guest of a passthrough device does it.

> 2) What's wrong with the current Qemu emulation of FLR for virtio-pci device
> 
I am not sure the QEMU discussion is relevant here.

But in general, when a passthrough device is given to example software like QEMU, it will not handle the FLR differently from any other VF.

> >
> > > > So each method has its pros and cons. One suits one use case, the
> > > > other suits
> > > another use case.
> > > > Therefore, again, claiming that the mediation approach
> > > > is the only
> > > way to progress is incorrect.
> > >
> > > I never say things like this, it is your proposal that mandates
> > > migration with admin commands. Could you please read what is
> > > proposed in this series carefully?
> > >
> > Admin commands are split from the AQ so one can use the admin commands
> inband as well.
> 
> How can it? It couples a lot of concepts like group, owner and members. All of
> these have only existed in SR-IOV so far.
> 
It is drafted nicely by Michael that admin commands have the AQ as one transport.
It can be extended for MMIO when one needs it; I will leave it to the creativity of MMIO supporters to have an admin queue on an MMIO device like other virtqueues.

So far no one seems interested in extending MMIO beyond theoretical questions.

> I don't know how to define those for MMIO where the design wants to be as
> simple as possible.
> 
What prevents MMIO from having an AQ?

> > Though, I don't see how it can functionally work without mediation.
> > This is the key technical difference between the two approaches.
> >
> > > On top of this series, you can build your admin commands easily. But
> > > there's nothing that can be done on top of your proposal.
> > >
> > I don't see what more is to be done on top of our proposal.
> 
> Actually it really has one, that is moving the description/definition of those
> states to the basic facility part. But if we do this, why not do it from the start?
> This is exactly what Lingshan's proposal did.
> 
The device context is defined in the basic facility section. It is under admin commands because reading something large without a command doesn't seem possible.
Lingshan is not showing how a giant RSS context and many other fields of the device can be read/written during the device migration flow.
So it is incomplete work, which is covered in [1] using admin commands.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead.
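
The "reading something large" point above can be sketched as follows. This is a hypothetical illustration; the function names, chunk size, and offset/length scheme are made up and not taken from the spec or from [1]:

```python
# Hypothetical sketch: assembling a device context larger than any single
# register window -- e.g. a large RSS table -- by iterating an admin
# "read context" command, each call reading one (offset, length) chunk.

def read_device_context(read_chunk, total_len, chunk=64):
    """Assemble the full context from fixed-size admin-command reads.

    read_chunk(offset, length) stands in for one admin command round trip.
    """
    out = bytearray()
    off = 0
    while off < total_len:
        n = min(chunk, total_len - off)
        out += read_chunk(off, n)  # one admin command per iteration
        off += n
    return bytes(out)

ctx = bytes(range(256)) * 2                      # pretend 512-byte device context
read = lambda off, n: ctx[off:off + n]           # stub transport
assert read_device_context(read, len(ctx)) == ctx
```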

> > If you are hinting at nesting, then it can be done through a peer admin device to delegate
> such an admin role.
> >
> > > >
> > > > In fact, audit is still better than mediation because most audits
> > > > are read-only
> > > work, as opposed to endlessly extending trapping and adding support
> > > in the core stack.
> > >
> > > One reality that you constantly ignore is that such trapping and
> > > device models have been widely used by a lot of cloud vendors for more
> than a decade.
> > >
> > It may be, but it is not the only option.
> 
> I don't say it's the only option. If most of the devices were built in this way, we
> should first allow any new function to be available to those devices and then
> consider other cases. Inventing a mechanism that can't work for most of the
> existing devices is sub-optimal.
> 
I don't agree with "first", "can't work" and "sub-optimal".

It seems to work for more than one vendor.
Proposal [1] should work for the passthrough devices that users are using.
[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > Again, it is a choice that user make with the tradeoff.
> > > >
> > > > > >
> > > > > > >
> > > > > > > For non standard device we don't have choices other than
> > > > > > > passthrough, but for standard devices we have other choices.
> > > > > >
> > > > > > Passthrough is a basic requirement that we will be fulfilling.
> > > > >
> > > > > It has several drawbacks that I would not like to repeat. We all
> > > > > know even for VFIO, it requires a trap instead of a complete
> passthrough.
> > > > >
> > > > Sure. Both has pros and cons.
> > > > And both can co-exist.
> > >
> > > I don't see how it can co-exist with your proposal. I can see how
> > > admin commands can co-exist on top of this series.
> > >
> > The reason both have difficulty, to me, is that they are solving different
> problems.
> > And they can co-exist as two different methods for two different problems.
> 
> It's not hard to demonstrate how admin commands can be built on top.
> 
I don't see a reason why it should be on top.

Look, if you have _real_ interest in both use cases, let's work toward such a definition.
If you don't, I don't see the point of objecting and pointing fingers at using trap+emulation.

> >
> > > >
> > > > > > If one wants to do special nesting, may be, there.
> > > > >
> > > > > Nesting is not special. Go and see how it is supported by major
> > > > > cloud vendors and you will get the answer. Introducing an
> > > > > interface in virtio that is hard to be virtualized is even worse
> > > > > than writing a compiler that can not do bootstrap compilation.
> > > > We checked with more than two major cloud vendors, and passthrough
> > > > suffices for
> > > their use cases; they are not doing nesting.
> > > > And other virtio vendors would also like to support native devices.
> > > > So again,
> > > please do not portray that nesting is the only thing and that passthrough
> > > must not be done.
> > >
> > > Where do I say passthrough must not be done? I'm saying you need to
> > > justify your proposal instead of simply saying "hey, you are wrong".
> > >
> > I never said you are wrong. I replied to Lingshan that resuming/suspending
> queues after the device is suspended is wrong and should not be done.
> >
> > > Again, nesting is not the only issue, the key point is that it's
> > > partial and not self contained.
> >
> > Admin commands are self-contained to the owner device.
> > They are not self-contained in the member device, because they cannot be.
> 
> There are cases where self-containment is not required, for example provisioning.
> Admin commands/queues fit perfectly there.
And it is not limited to it.

> 
> > Self containment cannot work with the device reset, FLR, and DMA flows.
> 
> How do you define self containment? We all know that virtio can't fly without
> transport-specific things ...
> 
Self containment means the member device alone drives the following, without needing the owner or a peer device:
1. device reset
2. FLR
3. dirty page tracking
4. device context read + write
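
The list above can be sketched as a toy migration flow driven entirely through the member device, with no owner/PF or peer involved. Everything here is a hypothetical illustration: the class, the bit position of SUSPEND, and the field names are invented for the sketch and are not the spec-defined layout:

```python
# Toy model of a "self-contained" migration flow: suspend the member device,
# then read its stabilized virtqueue state and device context directly from
# it -- no owner or peer device in the loop.

class MemberDevice:
    SUSPEND = 1 << 6  # illustrative bit position, not the spec-defined one

    def __init__(self):
        self.status = 0
        self.vq_state = {0: {"last_avail_idx": 5, "last_used_idx": 5}}
        self.context = {"rss": [0] * 128, "mac": "52:54:00:00:00:01"}

    def suspend(self):
        # Setting SUSPEND stabilizes device and virtqueue state.
        self.status |= self.SUSPEND

    def read_vq_state(self, qid):
        assert self.status & self.SUSPEND, "state is only stable when suspended"
        return dict(self.vq_state[qid])

    def read_context(self):
        assert self.status & self.SUSPEND
        return dict(self.context)


def migrate(src: MemberDevice, dst: MemberDevice):
    src.suspend()
    dst.suspend()
    dst.vq_state[0] = src.read_vq_state(0)
    dst.context = src.read_context()
    return dst
```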

> For the context of "self contain" I mean the basic virtio facility needs to be self
> contained.
> 
> > Self containment requires mediation, or renamed, trap+emulation, which is the
> anti-goal of passthrough.
> > And I am very interested if you can show how admin commands can work
> with the device reset and FLR flows WITHOUT a mediation approach.
> 
> Why is it the job for me? This proposal doesn't use admin commands at all.
Because a few days back Lingshan claimed that he wants to see how both needs can be addressed,
and you claimed that the basic facility you build in this patch can work for _any_ software, not some specific software (otherwise you called it a failure).

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-comment] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE
@ 2023-09-19  7:33                                                 ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu, Lingshan, Michael S. Tsirkin, eperezma, cohuck, stefanha,
	virtio-comment, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 19, 2023 10:06 AM

> > Current spec is not the steering point to define new methods.
> > So we will build the spec infra to support passthrough.
> >
> 
> Passthrough migration actually, passthrough is already supported now.
Yes, device migration basic facility for passthrough devices.
My series adds basic facility section extension.

> 
> > Mediation/trap-emulation where hypervisor is involved is also second use
> case that you are addressing.
> >
> > And hence, both are not mutually exclusive.
> > Hence we should not debate that anymore.
> >
> > >
> > > > And for sure virtio do not need to live in the dark shadow of mediation
> always.
> > >
> > > 99% of virtio devices are implemented in this way (which is what you
> > > call dark and shadow) now.
> > >
> > What I am saying is one should not say mediation/trap-emulation is the only
> way for virtio.
> 
> Then using things like "dark shadow" is not fair.
I apologize. Lets work towards supporting device migration for passthrough as well.
The comments I hear from you hints that virtio must live its life through mediation.

> 
> > So let passthrough device migration to progress.
> 
> Then you need to answer or address the concerns.
> 
Sure. Will do once I receive the comments in the patches of [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > For nesting use case sure one can do mediation related mode.
> > > >
> > > > So only mediation is not the direction.
> > >
> > > CPU and MMU virtualization were all built in this way.
> > >
> > Not anymore. Both of them have vcpus and viommu where may things are not
> trapped.
> 
> We are talking about different things. I'm saying trap is a must but you say not
> all are trapped.
> 
To be clear, all I am saying is virtio interface level trap is not must to achieve device migration.

> > So as I said both has pros and cons and users will pick what fits their need and
> use case.
> >
> > > >
> > > > > > So for such N and M being > 1, one can use software base
> > > > > > emulation
> > > anyway.
> > > > >
> > > > > No, only the control path is trapped, the datapath is still passthrough.
> > > > >
> > > > Again, it depends on the use case.
> > >
> > > No matter what use case, the definition and methodology of
> > > virtualization stands still.
> > >
> > I will stop debating this because the core technical question is not answered.
> > I don’t see a technology available that virtio can utilize to it.
> > That is interface that can work without messing with device status and flr
> while device migration is ongoing.
> 
> Again, you need to justify it. For example, why does it mess up device status?
With FLR and device reset in place, many things like queues, device registers etc do not work.
Because they are reset while migration is going on, and inflight descriptors info is lost.

> Why is rest ok but not suspending?
> 
Rest is not ok either to me.
I am not suggesting trapping CVQ or AQ or other virtio registers either.

> At least so far, I don't see good answers for thoses.
> 
> > Hence, methodology for passthrough and mediation/trap-emulation is
> fundamentally different.
> > And that is just fine.
> >
> > > >
> > > > > >
> > > > > > >
> > > > > > > And exposing the whole device to the guest drivers will have
> > > > > > > security implications, your proposal has demonstrated that
> > > > > > > you need a workaround for
> > > > > > There is no security implications in passthrough.
> > > > >
> > > > > How can you prove this or is it even possible for you to prove this?
> > > > Huh, when you claim that it is not secure, please point out
> > > > exactly what is not
> > > secure.
> > > > Please take with PCI SIG and file CVE to PCI sig.
> > >
> > > I am saying it has security implications. That is why you need to
> > > explain why you think it doesn't. What's more, the implications are
> > > obviously nothing related to PCI SIG but a vendor virtio hardware
> implementation.
> > >
> > PCI passthough for virtio member devices and non virtio devices with P2P, and
> their interaction is already there in the VM.
> > Device migration is not adding/removing anything, nor touching any security
> aspect of it.
> > Because it does not need to it either.
> > Device migration is making sure that it continue to exists.
> 
> Since we are discussing in the virtio community, what we care about is the
> chance that guest(driver) can explore device security vulnerabilities. In this
> context, exposing more means the increasing of the attacking surfaces since we
> (cloud vendor) can't control guests but the hypervisor.
> 
Guest (driver) can explore device security vulnerabilities in many areas not just device status and cvq.
So it is not good answer to me.

> >
> > > >
> > > > > You expose all device details to guests (especially the
> > > > > transport specific details), the attack surface is increased in this way.
> > > > One can say it is the opposite.
> > > > Attack surface is increased in hypervisor due to mediation poking
> > > > at
> > > everything controlled by the guest.
> > > >
> > >
> > > We all know such a stack has been widely used for decades. But you
> > > want to say your new stack is much more secure than this?
> > >
> > It can be yes, because it exposes all necessary things defined in the virtio spec
> boundary today.
> > And not involving hypervisor in core device operation.
> 
> That's perfectly fine if we can do this. But you need to justify this.
> 
We are not inventing any new things here. As you acknowledged passthtrough devices are already there...
> >
> > > >
> > > > >
> > > > > What's more, a simple passthrough may lose the chance to
> > > > > workaround hardware erratas and you will finally get back to the trap
> and emulation.
> > > > Hardware errata's is not the starting point to build the software
> > > > stack and
> > > spec.
> > >
> > > It's not the starting point. But it's definitely something that
> > > needs to be considered, go and see kernel codes (especially the KVM
> > > part) and you will get the answer.
> > >
> > There are kernels which cannot be updated in field today in Nvidia cloud
> shipped by Redhat's OS variant.
> >
> > So it is invalid assumption that somehow data path does not have bug, but
> large part of the control plane has bug, hence it should be done in software...
> 
> Well, for sure there are cases that can't be worked around. But for the case that
> it can, trap and emulation gives much more flexibility.
> 
It can. So it is not your or mine decision.
The user will pick what they want to use.
So it is invalid assumption and hence invalid point to discuss.

> >
> > > > What you imply is, one must never use vfio stack, one must not use
> > > > vcpu
> > > acceleration and everything must be emulated.
> > >
> > > Do I say so? Trap and emulation is the common methodology used in
> > > KVM and VFIO. And if you want to replace it with a complete
> > > passthrough, you need to prove your method can work.
> > >
> > Please review patches. I do not plan to _replace_ is either.
> 
> You define all the migration stuffs in the admin commands section, isn't this an
> implicit coupling?
> 
RSS is done in receive packet section. Is this coupling RSS with receive q? Yes, because it is meant for it.

You are questioning:
How can one receive packets and post descriptors without virtqueues? Oh, descriptors are implicitly tied to virtqueues.. too bad..

If the above mechanism looks like coupling, then it is one connection.
Maybe there is another use case and a more efficient way without admin commands; I haven't heard of one so far.

I don't see the point of writing a non-practical, abstract spec.

> > Those users who want to use passthrough, can use passthrough with major
> traps+emulation on FLR, device_status, cvq, avq and without implementing AQ
> on every single member device.
> > And those users who prefer trap+emualation can use that.
> >
> > > >
> > > > Same argument of hardware errata applied to data path too.
> > >
> > > Anything makes datapath different? Xen used to fallback to shadow
> > > page tables to workaround hardware TDP errata in the past.
> > >
> > > > One should not implement in hw...
> > > >
> > > > I disagree with such argument.
> > >
> > > It's not my argument.
> > >
> > You claimed that to overcome hw errata, one should use trap_emulation,
> somehow only for portion of the functionality.
> > And rest portion of the functionality does not have hw errata, hence
> > hw should be use (for example for data path). :)
> 
> I've explained before, we all know there're errata that can't be a workaround in
> any way.
> 
There is no point in discussing hardware errata, as it is neither a goal nor an anti-goal for trap+emulation or passthrough.

> >
> > > >
> > > > You can say nesting is requirement for some use cases, so spec
> > > > should support
> > > it without blocking the passthrough mode.
> > > > Then it is fair discussion.
> > > >
> > > > I will not debate further on passthrough vs control path mediation
> > > > as
> > > either_or approach.
> > > >
> > > > >
> > > > > >
> > > > > > > FLR at least.
> > > > > > It is actually the opposite.
> > > > > > FLR is supported with the proposal without any workarounds and
> > > mediation.
> > > > >
> > > > > It's an obvious drawback but not an advantage. And it's not a
> > > > > must for live migration to work. You need to prove the FLR
> > > > > doesn't conflict with the live migration, and it's not only FLR
> > > > > but also all the other
> > > PCI facilities.
> > > > I don’t know what you mean by prove. It is already clear from the
> > > > proposal
> > > FLR is not messing with rest of the device migration infrastructure.
> > > > You should read [1].
> > >
> > > I don't think you answered my question in that thread.
> > >
> > Please ask the question in that series if any, because there is no FLR, device
> reset interaction in passthrough between owner and member device.
> >
> > > >
> > > > > one other
> > > > > example is P2P and what's the next? As more features were added
> > > > > to the PCI spec, you will have endless work in auditing the
> > > > > possible conflict with the passthrough based live migration.
> > > > >
> > > > This drawback equally applies to mediation route where one need to
> > > > do more
> > > than audit where the mediation layer to be extended.
> > >
> > > No, for trap and emulation we don't need to do that. We only do
> > > datapath assignments.
> > >
> > It is required, because also such paths to be audited and extended as without
> it the feature does not visible to the guest.
> 
> You need first answer the following questions:
> 
> 1) Why FLR is a must for the guest
Because the guest does it on a passthrough device.

> 2) What's wrong with the current Qemu emulation of FLR for virtio-pci device
> 
I am not sure the QEMU discussion is relevant here.

But in general, when a passthrough device is given to software such as QEMU, it will not treat that device's FLR differently from any other VF's.

> >
> > > > So each method has its pros and cons. One suits one use case,
> > > > other suits
> > > other use case.
> > > > Therefore, again attempting to claim that only mediation approach
> > > > is the only
> > > way to progress is incorrect.
> > >
> > > I never say things like this, it is your proposal that mandates
> > > migration with admin commands. Could you please read what is
> > > proposed in this series carefully?
> > >
> > Admin commands are split from the AQ so one can use the admin commands
> inband as well.
> 
> How can it? It couples a lot of concepts like group, owner and members. All of
> these have only existed in SR-IOV so far.
> 
Michael drafted it nicely: admin commands have the AQ as one transport.
It can be extended for MMIO when someone needs it; I will leave it to the creativity of MMIO supporters to add an admin queue on an MMIO device like the other virtqueues.

So far no one seems interested in extending MMIO beyond theoretical questions.

> I don't know how to define those for MMIO where the design wants to be as
> simple as possible.
> 
What prevents MMIO from having an AQ?

> > Though, I don’t see how it can functionality work without mediation.
> > This is the key technical difference between two approaches.
> >
> > > On top of this series, you can build your amd commands easily. But
> > > there's nothing that can be done on top of your proposal.
> > >
> > I don’t see what more to be done on top of our proposal.
> 
> Actually it really has one, that is moving the description/definition of those
> states to the basc facility part. But if we do this, why not do it from the start?
> This is exactly what Lingshan's proposal did.
> 
Device context is defined in the basic facility section. It sits under admin commands because reading something that large without a command does not seem possible.
Lingshan is not showing how a giant RSS context and the many other device fields can be read/written during the device migration flow.
So it is incomplete work, which is covered in [1] using admin commands.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead.

> > If you hint nesting, than it can be done through a peer admin device to delete
> such admin role.
> >
> > > >
> > > > In fact audit is still better than mediation because most audits
> > > > are read only
> > > work as opposed to endlessly extending trapping and adding support
> > > in core stack.
> > >
> > > One reality that you constantly ignore is that such trapping and
> > > device models have been widely used by a lot of cloud vendors for more
> than a decade.
> > >
> > It may be but, it is not the only option.
> 
> I don't say it's the only option. If most of the devices were built in this way, we
> should first allow any new function to be available to those devices and then
> consider other cases. Inventing a mechanism that can't work for most of the
> existing devices is sub-optimal.
> 
I don't agree with "first", "can't work", and "sub-optimal".

It seems to work for more than one vendor.
Proposal [1] should work for passthrough devices that users are using.
[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-4-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

> >
> > > > Again, it is a choice that user make with the tradeoff.
> > > >
> > > > > >
> > > > > > >
> > > > > > > For non standard device we don't have choices other than
> > > > > > > passthrough, but for standard devices we have other choices.
> > > > > >
> > > > > > Passthrough is basic requirement that we will be fulfilling.
> > > > >
> > > > > It has several drawbacks that I would not like to repeat. We all
> > > > > know even for VFIO, it requires a trap instead of a complete
> passthrough.
> > > > >
> > > > Sure. Both has pros and cons.
> > > > And both can co-exist.
> > >
> > > I don't see how it can co-exist with your proposal. I can see how
> > > admin commands can co-exist on top of this series.
> > >
> > The reason to me both has difficulty is because both are solving different
> problem.
> > And they can co-exist as two different methods to two different problems.
> 
> It's not hard to demonstrate how admin commands can be built on top.
> 
I don't see a reason why it should be on top.

Look, if you have _real_ interest in utilizing both use cases, let's work toward such a definition.
If you don't have that interest, then I don't see the point of objecting and pointing fingers at trap+emulation.

> >
> > > >
> > > > > > If one wants to do special nesting, may be, there.
> > > > >
> > > > > Nesting is not special. Go and see how it is supported by major
> > > > > cloud vendors and you will get the answer. Introducing an
> > > > > interface in virtio that is hard to be virtualized is even worse
> > > > > than writing a compiler that can not do bootstrap compilation.
> > > > We checked with more than two major cloud vendors and passthrough
> > > > suffice
> > > their use cases and they are not doing nesting.
> > > > And other virtio vendor would also like to support native devices.
> > > > So again,
> > > please do not portray that nesting is the only thing and passthrough
> > > must not be done.
> > >
> > > Where do I say passthrough must not be done? I'm saying you need to
> > > justify your proposal instead of simply saying "hey, you are wrong".
> > >
> > I never said you are wrong. I replied to Lingshan that resuming/suspending
> queues after the device is suspended, is wrong, and it should not be done.
> >
> > > Again, nesting is not the only issue, the key point is that it's
> > > partial and not self contained.
> >
> > Admin commands are self-contained to the owner device.
> > They are not self contained in the member device, because it cannot be.
> 
> There're cases that self contained is not required for example the provisioning.
> Admin commands/queues fit perfectly there.
And it is not limited to that.

> 
> > Self containment cannot work with device reset, flr, dma flow.
> 
> How do you define self containment? We all know that virtio can't fly without
> transporting specific things ...
> 
Self-containment means the member device alone drives the following, without needing the owner or a peer device:
1. device reset
2. FLR
3. dirty page tracking
4. device context read + write

> For the context of "self contain" I mean the basic virtio facility needs to be self
> contained.
> 
> > Self containment requires mediation or renamed trap+emulation; which is the
> anti-goal of passtrough.
> > And I am very interested if you can show how admin commands can work
> with device reset, flr flow WITHOUT mediation approach.
> 
> Why is it the job for me? This proposal doesn't use admin commands at all.
Because a few days back Lingshan claimed that he wants to see both needs addressed somehow,
and you claimed that the basic facility you build in this patch can work for _any_ software, not some specific software (otherwise you counted it as a failure).

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  7:32                       ` Parav Pandit
@ 2023-09-19  7:46                         ` Zhu, Lingshan
  2023-09-19  7:53                           ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-19  7:46 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin



On 9/19/2023 3:32 PM, Parav Pandit wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Tuesday, September 19, 2023 9:58 AM
>>
>> On Mon, Sep 18, 2023 at 2:55 PM Parav Pandit <parav@nvidia.com> wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 18, 2023 12:19 PM
>>>
>>>> so admin vq based LM solution can be a side channel attacking
>>>> surface
>>> It will be part of the DSM whenever it will be used in future.
>>> Hence, it is not attack surface.
>> DSM is not a part of TVM. So it really depends on what kind of work did the
>> admin virtqueue do. For commands that can't be self-contained like
>> provisioning, it is fine, since it is done before the TDI assignment. But it not
>> necessarily for your migration proposal. It seems you've found another case
>> that self-containing is important:
>> allowing the owner to access the member after TDI is attached to TVM is a side
>> channel attack.
> TVM and DSM specs will be extended in future when we get there, so core hypervisor will not be involved.
> With trap+mediation, it is involved.
>
> Lingshan wanted to take this TDISP extension in future.
> So are you both aligned or not yet?
I didn't say that, never ever.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  7:46                         ` Zhu, Lingshan
@ 2023-09-19  7:53                           ` Parav Pandit
  2023-09-19  8:03                             ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  7:53 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 19, 2023 1:16 PM
> 
> On 9/19/2023 3:32 PM, Parav Pandit wrote:
> >> From: Jason Wang <jasowang@redhat.com>
> >> Sent: Tuesday, September 19, 2023 9:58 AM
> >>
> >> On Mon, Sep 18, 2023 at 2:55 PM Parav Pandit <parav@nvidia.com> wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 18, 2023 12:19 PM
> >>>
> >>>> so admin vq based LM solution can be a side channel attacking
> >>>> surface
> >>> It will be part of the DSM whenever it will be used in future.
> >>> Hence, it is not attack surface.
> >> DSM is not a part of TVM. So it really depends on what kind of work
> >> did the admin virtqueue do. For commands that can't be self-contained
> >> like provisioning, it is fine, since it is done before the TDI
> >> assignment. But it not necessarily for your migration proposal. It
> >> seems you've found another case that self-containing is important:
> >> allowing the owner to access the member after TDI is attached to TVM
> >> is a side channel attack.
> > TVM and DSM specs will be extended in future when we get there, so core
> hypervisor will not be involved.
> > With trap+mediation, it is involved.
> >
> > Lingshan wanted to take this TDISP extension in future.
> > So are you both aligned or not yet?
> I didn't say that, never ever.

In your previous email you wrote,

1. "so lets focus on LM topic, other than confidential computing."
2. "again, TDISP is out of spec and TDISP devices are not migratable."

From the above two comments of yours, I understood it that way: you want to focus now on LM _other_than_ CC.
How should I read it differently?



^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [PATCH 3/5] virtqueue: constraints for virtqueue state
  2023-09-18 17:30             ` Michael S. Tsirkin
@ 2023-09-19  7:56               ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-19  7:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, eperezma, cohuck, stefanha, virtio-comment, virtio-dev



On 9/19/2023 1:30 AM, Michael S. Tsirkin wrote:
> On Mon, Sep 18, 2023 at 11:02:18AM +0800, Zhu, Lingshan wrote:
>>
>> On 9/15/2023 7:16 PM, Michael S. Tsirkin wrote:
>>> On Fri, Sep 15, 2023 at 10:59:29AM +0800, Zhu, Lingshan wrote:
>>>> On 9/14/2023 7:30 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 06, 2023 at 04:16:35PM +0800, Zhu Lingshan wrote:
>>>>>> This commit specifies the constraints of the virtqueue state,
>>>>>> and the actions should be taken by the device when SUSPEND
>>>>>> and DRIVER_OK is set
>>>>>>
>>>>>> Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>> ---
>>>>>>     content.tex | 19 +++++++++++++++++++
>>>>>>     1 file changed, 19 insertions(+)
>>>>>>
>>>>>> diff --git a/content.tex b/content.tex
>>>>>> index 0fab537..9d727ce 100644
>>>>>> --- a/content.tex
>>>>>> +++ b/content.tex
>>>>>> @@ -594,6 +594,25 @@ \subsection{\field{Used State} Field}
>>>>>>     When VIRTIO_RING_F_PACKED is not negotiated, the 16-bit value of \field{used_idx}
>>>>>>     is always 0
>>>>>> +\drivernormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>>>> +
>>>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated but VIRTIO_RING_F_PACKED not been negotiated,
>>>>>> +the driver SHOULD NOT access \field{Used State} of any virtqueues, it SHOULD use the
>>>>>> +used index in the used ring.
>>>>>> +
>>>>>> +\devicenormative{\subsection}{Virtqueue State}{Basic Facilities of a Virtio Device / Virtqueue State}
>>>>>> +
>>>>>> +If VIRTIO_F_QUEUE_STATE has been negotiated, the device SHOULD only accept setting
>>>>>> +Virtqueue State of any virtqueues when DRIVER_OK is not set in \field{device status},
>>>>>> +or both of DRIVER_OK and SUSPEND are set in \field{device status}.
>>>>>> +Otherwise the device MUST ignore any writes to Virtqueue State of any virtqueues.
>>>>>> +
>>>>>> +If VIRTIO_F_QUEUE_STATE have been negotiated, when SUSPEND is set,
>>>>>> +the device MUST record the Virtqueue State of every enabled virtqueue
>>>>>> +in \field{Available State} and \field{Used State} respectively,
>>>>> record how?
>>>> This is transport specific, for PCI they are recorded in the common config
>>>> space,
>>>> two new fields of them are introduced in patch 5.
>>> that is not enough space to record state for every enabled vq.
>> They can work with queue_select like many other vq configurations.
> queue select is under driver control.
queue_select works for other fields like queue_size, which is also RW.

I see no difference between queue_size and vq_state here.
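The access pattern I mean (per-VQ state multiplexed behind queue_select, exactly like queue_size) can be sketched as a small simulation. This is a hedged illustration only; all names and field layouts here are invented for the sketch, not taken from the patch:

```python
# Hypothetical model of a virtio-pci common config where per-VQ state is
# reached through queue_select, mirroring the existing queue_size pattern.

class CommonCfg:
    """Device-side model of the common configuration structure."""
    def __init__(self, num_queues):
        self.queue_select = 0
        self.queue_size = [256] * num_queues      # existing RW field
        self._avail_state = [0] * num_queues      # last_avail_idx per VQ
        self._used_state = [0] * num_queues       # last_used_idx per VQ

    # Accessors act on the VQ named by queue_select, like queue_size does.
    def read_avail_state(self):
        return self._avail_state[self.queue_select]

    def write_avail_state(self, val):
        self._avail_state[self.queue_select] = val


def save_vq_states(cfg, num_queues):
    """Driver-side loop: walk queue_select to snapshot every VQ's state."""
    states = []
    for i in range(num_queues):
        cfg.queue_select = i
        states.append(cfg.read_avail_state())
    return states
```

So one pair of RW registers suffices for any number of virtqueues, at the cost of iterating queue_select.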
>
>
>> I will mention this in the comment.
>>>>>> +and correspondingly restore the Virtqueue State of every enabled virtqueue
>>>>>> +from \field{Available State} and \field{Used State} when DRIVER_OK is set.
>>>>> when is that?
>>>> When the DRIVER sets DRIVER_OK and done before the device presents
>>>> DRIVER_OK.
>>> I don't really understand the flow here. does SUSPEND clear DRIVER_OK
>>> then?
>> SUSPEND does not clear DRIVER, I think this is not a must.
> then I don't get what does "when DRIVER_OK is set" mean - it stays
> set all the time.
That means the driver sets DRIVER_OK.

I am not a native speaker, but this wording can be found throughout the
spec, e.g.:

2.1.2 Device Requirements: Device Status Field

If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device MUST 
send a device configuration
change notification to the driver.
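The semantics being debated (SUSPEND leaving DRIVER_OK set, state recorded when SUSPEND is set and restored when DRIVER_OK is set) can be modeled in a few lines. This is only a sketch of my reading of the proposed normative text; the bit value chosen for SUSPEND is an assumption for illustration:

```python
# Status bits ACKNOWLEDGE/DRIVER/DRIVER_OK/FEATURES_OK are from the virtio
# spec; SUSPEND's encoding (an unused bit) is assumed here for the sketch.
ACKNOWLEDGE, DRIVER, DRIVER_OK, FEATURES_OK = 1, 2, 4, 8
SUSPEND = 16  # assumed value, illustration only

class Device:
    def __init__(self):
        self.status = 0
        self.vq_state_recorded = False

    def write_status(self, new_status):
        newly_set = new_status & ~self.status
        self.status = new_status
        # SUSPEND does not clear DRIVER_OK; both bits can be set together.
        if newly_set & SUSPEND:
            self.vq_state_recorded = True    # device snapshots VQ state
        if newly_set & DRIVER_OK:
            self.vq_state_recorded = False   # device restores state and runs

    def accepts_vq_state_write(self):
        # Per the proposed text: writes are accepted only before DRIVER_OK
        # is set, or while both DRIVER_OK and SUSPEND are set.
        if not (self.status & DRIVER_OK):
            return True
        return bool(self.status & SUSPEND)
```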
>
>
>>>
>>>>>> +
>>>>>>     \input{admin.tex}
>>>>>>     \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>>>>>> -- 
>>>>>> 2.35.3
>>> This publicly archived list offers a means to provide input to the
>>> OASIS Virtual I/O Device (VIRTIO) TC.
>>>
>>> In order to verify user consent to the Feedback License terms and
>>> to minimize spam in the list archive, subscription is required
>>> before posting.
>>>
>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>>> List help: virtio-comment-help@lists.oasis-open.org
>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>> Committee: https://www.oasis-open.org/committees/virtio/
>>> Join OASIS: https://www.oasis-open.org/join/
>>>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18 18:41                       ` Parav Pandit
  2023-09-18 18:49                         ` Michael S. Tsirkin
@ 2023-09-19  8:01                         ` Zhu, Lingshan
  2023-09-19  9:06                           ` Parav Pandit
  1 sibling, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-19  8:01 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev, Michael S. Tsirkin, Jason Wang



On 9/19/2023 2:41 AM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Monday, September 18, 2023 3:05 PM
>>
>> On 9/18/2023 2:54 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 18, 2023 12:19 PM
>>>> so admin vq based LM solution can be a side channel attacking surface
>>> It will be part of the DSM whenever it will be used in future.
>>> Hence, it is not attack surface.
>> I am not sure, why we have to trust the PF?
>> This is out of virtio scope anyway.
>>
>> I have explained many times how it can be a attack surface, and examples.
>>
> And none of that make any sense as fundamentally, hypervisor is trusted regardless of the approach.
This is not about hypervisors. I am saying an admin vq based LM solution
can be a side channel attack surface.
Please refer to my previously listed examples; the TDISP spec is FYI.
>
>> What happen if malicious SW dump guest memory by admin vq dirty page
>> tracking feature?
> What??
> Where is this malicious SW is located, in guest VM?
The host, in this attack model.
>
>>>>>>>> For untrusted hypervisor, same set of attack surface is present
>>>>>>>> with
>>>>>>>> trap+emulation.
>>>>>>>> So both method score same. Hence its not relevant point for discussion.
>>>>>>> this is not hypervisor, Do you see any modern hypervisor have
>>>>>>> these issues?
>>>>>>>
>>>>>>> This is admin vq for LM can be a side channel attacking surface.
>>>>> It is not.
>>>>> Hypervisor is trusted entity.
>>>>> For untrusted hypervisor the TDISP is unified solution build by the
>>>>> various
>>>> industry bodies including DMTF, PCI for last few years.
>>>>> We want to utilize that.
>>>> first, TDISP is out of virtio spec.
>>> Sure, hence, untrusted hypervisor are out of scope.
>>> Otherwise, trap+emulation is equally dead which relies on the hypervisor to
>> do things.
>> so lets focus on LM topic, other than confidential computing.
> ok.
>
>>> Just because data transfer is not done, it does not mean that thousands of
>> polling register writes complete in stipulated time.
>> 1) again, they are per-device facilities
> That does not satisfy that it can somehow do work in < x usec time.
Why? Do you mind taking the basic PCI virtio common config space
registers as an example?
>> 2) we use very few registers, even status byte does not require polling, just re-
>> read with delay.
>>
>> Please refer to the code for setting FEATURES_OK.
> It wont work when one needs to suspend the device.
> There is no point of doing such work over registers as fundamental framework is over the AQ.
Why doesn't it work?
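The pattern referred to earlier for FEATURES_OK ("re-read with delay" rather than busy-polling many registers) can be sketched as follows. The callback names, retry count, and delay are illustrative, not from any real driver:

```python
# Driver writes the status bit, then re-reads with a small delay until the
# device either reflects the bit (accepted) or the retries run out (rejected).
import time

FEATURES_OK = 8

def negotiate_features_ok(read_status, write_status, retries=10, delay_s=0.001):
    write_status(read_status() | FEATURES_OK)
    for _ in range(retries):
        if read_status() & FEATURES_OK:
            return True        # device accepted the negotiated features
        time.sleep(delay_s)    # back off instead of busy-polling
    return False               # device never acknowledged FEATURES_OK
```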




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  7:53                           ` Parav Pandit
@ 2023-09-19  8:03                             ` Zhu, Lingshan
  2023-09-19  8:31                               ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-19  8:03 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin



On 9/19/2023 3:53 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 19, 2023 1:16 PM
>>
>> On 9/19/2023 3:32 PM, Parav Pandit wrote:
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Tuesday, September 19, 2023 9:58 AM
>>>>
>>>> On Mon, Sep 18, 2023 at 2:55 PM Parav Pandit <parav@nvidia.com> wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 18, 2023 12:19 PM
>>>>>> so admin vq based LM solution can be a side channel attacking
>>>>>> surface
>>>>> It will be part of the DSM whenever it will be used in future.
>>>>> Hence, it is not attack surface.
>>>> DSM is not a part of TVM. So it really depends on what kind of work
>>>> did the admin virtqueue do. For commands that can't be self-contained
>>>> like provisioning, it is fine, since it is done before the TDI
>>>> assignment. But it not necessarily for your migration proposal. It
>>>> seems you've found another case that self-containing is important:
>>>> allowing the owner to access the member after TDI is attached to TVM
>>>> is a side channel attack.
>>> TVM and DSM specs will be extended in future when we get there, so core
>> hypervisor will not be involved.
>>> With trap+mediation, it is involved.
>>>
>>> Lingshan wanted to take this TDISP extension in future.
>>> So are you both aligned or not yet?
>> I didn't say that, never ever.
> In your previous email you wrote,
>
> 1. "so lets focus on LM topic, other than confidential computing."
> 2. "again, TDISP is out of spec and TDISP devices are not migratable."
>
>  From above two comments from you I understood it that way, you want to focus now on the LM _other_than_ CC.
> How to read it differently?
You said: "Lingshan wanted to take this TDISP extension in future."

How did you conclude that statement? Did I ever say that I want to take the TDISP
extension in the future?
>
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  8:03                             ` Zhu, Lingshan
@ 2023-09-19  8:31                               ` Parav Pandit
  2023-09-19  8:39                                 ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  8:31 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 19, 2023 1:34 PM

> > In your previous email you wrote,
> >
> > 1. "so lets focus on LM topic, other than confidential computing."
> > 2. "again, TDISP is out of spec and TDISP devices are not migratable."
> >
> >  From above two comments from you I understood it that way, you want to
> focus now on the LM _other_than_ CC.
> > How to read it differently?
> You said:"Lingshan wanted to take this TDISP extension in future."
> 
Based on the above two comments from you, this is what I understand.
> How do you conclude this statement? Did I ever said I want to take TDISP
> extension in future?
CC and TDISP go hand in hand.

So you want to consider the TDISP extension now (and not in the future)?

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  8:31                               ` Parav Pandit
@ 2023-09-19  8:39                                 ` Zhu, Lingshan
  2023-09-19  9:09                                   ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-19  8:39 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin



On 9/19/2023 4:31 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 19, 2023 1:34 PM
>>> In your previous email you wrote,
>>>
>>> 1. "so lets focus on LM topic, other than confidential computing."
>>> 2. "again, TDISP is out of spec and TDISP devices are not migratable."
>>>
>>>   From above two comments from you I understood it that way, you want to
>> focus now on the LM _other_than_ CC.
>>> How to read it differently?
>> You said:"Lingshan wanted to take this TDISP extension in future."
>>
> Based on above two comments from you this is what I understand.
I am not a native speaker, but did I ever mention that I want to take up
TDISP in the future? Which verbs did I use?
>> How do you conclude this statement? Did I ever said I want to take TDISP
>> extension in future?
> CC and TDISP goes hand in hand.
>
> So you want to consider TDISP extension now (and not in future)?
As I said before, CC and TDISP are out of the spec's scope, which means we 
should ignore them for now.
And I don't plan to take up TDISP in the future either; you should not 
conflate the two sentences.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  8:01                         ` Zhu, Lingshan
@ 2023-09-19  9:06                           ` Parav Pandit
  2023-09-19 10:03                             ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  9:06 UTC (permalink / raw)
  To: Zhu, Lingshan, virtio-dev, Michael S. Tsirkin, Jason Wang



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 19, 2023 1:32 PM
> 
> On 9/19/2023 2:41 AM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Monday, September 18, 2023 3:05 PM
> >>
> >> On 9/18/2023 2:54 PM, Parav Pandit wrote:
> >>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >>>> Sent: Monday, September 18, 2023 12:19 PM so admin vq based LM
> >>>> solution can be a side channel attacking surface
> >>> It will be part of the DSM whenever it will be used in future.
> >>> Hence, it is not attack surface.
> >> I am not sure, why we have to trust the PF?
> >> This is out of virtio scope anyway.
> >>
> >> I have explained many times how it can be a attack surface, and examples.
> >>
> > And none of that make any sense as fundamentally, hypervisor is trusted
> regardless of the approach.
> this is not about hypervisors, I am saying admin vq based LM solution can be a
> side channel attacking surface Please refer to my previously listed examples
> and the TDISP spec is FYI.
> >
In a previous email you wrote: "As I said before, CC and TDISP is out of spec, that means we should ignore them for now."
So I am ignoring it now, and hence I am ignoring the above comment.
Let's reach common ground for the simplified case and then consider the more complex cases.

> >> What happen if malicious SW dump guest memory by admin vq dirty page
> >> tracking feature?
> > What??
> > Where is this malicious SW is located, in guest VM?
> host, in this attacking model.
> >
> >>>>>>>> For untrusted hypervisor, same set of attack surface is present
> >>>>>>>> with
> >>>>>>>> trap+emulation.
> >>>>>>>> So both method score same. Hence its not relevant point for
> discussion.
> >>>>>>> this is not hypervisor, Do you see any modern hypervisor have
> >>>>>>> these issues?
> >>>>>>>
> >>>>>>> This is admin vq for LM can be a side channel attacking surface.
> >>>>> It is not.
> >>>>> Hypervisor is trusted entity.
> >>>>> For untrusted hypervisor the TDISP is unified solution build by
> >>>>> the various
> >>>> industry bodies including DMTF, PCI for last few years.
> >>>>> We want to utilize that.
> >>>> first, TDISP is out of virtio spec.
> >>> Sure, hence, untrusted hypervisor are out of scope.
> >>> Otherwise, trap+emulation is equally dead which relies on the
> >>> hypervisor to
> >> do things.
> >> so lets focus on LM topic, other than confidential computing.
> > ok.
> >
> >>> Just because data transfer is not done, it does not mean that
> >>> thousands of
> >> polling register writes complete in stipulated time.
> >> 1) again, they are per-device facilities
> > That does not satisfy that it can somehow do work in < x usec time.
> why? Do you mind take examples of basic PCI virtio common config space
> registers?
> >> 2) we use very few registers, even status byte does not require
> >> polling, just re- read with delay.
> >>
> >> Please refer to the code for setting FEATURES_OK.
> > It wont work when one needs to suspend the device.
> > There is no point of doing such work over registers as fundamental framework
> is over the AQ.
> why it doesn't work?

For the following two reasons:
1. Everything needed cannot be communicated over registers efficiently, such as (a) the device context and (b) dirty pages.
2. Synchronous registers on the VF cannot interoperate with FLR and the device reset flow.
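As a side note on the register handshake debated above (write a status bit, then re-read the status byte after a short delay rather than busy-polling), a minimal sketch is below. The simulated register and helper names are illustrative assumptions; FEATURES_OK (8) matches the virtio spec, while the SUSPEND bit value is assumed, since the series only proposes such a bit.

```c
#include <stdint.h>

/* Illustrative sketch only: a simulated status register demonstrating the
 * "write, then re-read with delay" pattern discussed in this thread. */

#define VIRTIO_STATUS_ACKNOWLEDGE  1
#define VIRTIO_STATUS_DRIVER       2
#define VIRTIO_STATUS_FEATURES_OK  8
#define VIRTIO_STATUS_SUSPEND      16  /* assumed value, illustration only */

static uint8_t device_status;  /* stands in for the MMIO device_status register */

static void    write_status(uint8_t v) { device_status = v; }
static uint8_t read_status(void)       { return device_status; }

/* Driver side: set FEATURES_OK, then re-read once to confirm the device
 * accepted the negotiated features (no polling loop required). */
static int negotiate_features_ok(void)
{
    write_status(read_status() | VIRTIO_STATUS_FEATURES_OK);
    /* a real driver would insert a small delay or scheduling point here */
    return (read_status() & VIRTIO_STATUS_FEATURES_OK) ? 0 : -1;
}

/* The same pattern applied to a hypothetical SUSPEND bit: write it, then
 * re-read with delay until the device reports its state has stabilized. */
static int suspend_device(int max_tries)
{
    write_status(read_status() | VIRTIO_STATUS_SUSPEND);
    for (int i = 0; i < max_tries; i++) {
        if (read_status() & VIRTIO_STATUS_SUSPEND)
            return 0;
        /* delay between re-reads would go here */
    }
    return -1;
}
```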

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  8:39                                 ` Zhu, Lingshan
@ 2023-09-19  9:09                                   ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-19  9:09 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang; +Cc: virtio-dev, Michael S. Tsirkin

> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 19, 2023 2:09 PM
> 
> On 9/19/2023 4:31 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Tuesday, September 19, 2023 1:34 PM
> >>> In your previous email you wrote,
> >>>
> >>> 1. "so lets focus on LM topic, other than confidential computing."
> >>> 2. "again, TDISP is out of spec and TDISP devices are not migratable."
> >>>
> >>>   From above two comments from you I understood it that way, you
> >>> want to
> >> focus now on the LM _other_than_ CC.
> >>> How to read it differently?
> >> You said:"Lingshan wanted to take this TDISP extension in future."
> >>
> > Based on above two comments from you this is what I understand.
> I am not a native speaker, but do you see I ever mentioned that I want to take
> TDISP in future? Any verbs I used?
> >> How do you conclude this statement? Did I ever said I want to take
> >> TDISP extension in future?
> > CC and TDISP goes hand in hand.
> >
> > So you want to consider TDISP extension now (and not in future)?
> As I said before, CC and TDISP is out of spec, that means we should ignore them
> for now.
> And I don't plan to take TDISP in future, you should not confuse the sentences.
Ah, I understand now that you do not plan to take up TDISP in the future.
Thanks.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-19  9:06                           ` Parav Pandit
@ 2023-09-19 10:03                             ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-19 10:03 UTC (permalink / raw)
  To: Parav Pandit, virtio-dev, Michael S. Tsirkin, Jason Wang



On 9/19/2023 5:06 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 19, 2023 1:32 PM
>>
>> On 9/19/2023 2:41 AM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Monday, September 18, 2023 3:05 PM
>>>>
>>>> On 9/18/2023 2:54 PM, Parav Pandit wrote:
>>>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>>>> Sent: Monday, September 18, 2023 12:19 PM so admin vq based LM
>>>>>> solution can be a side channel attacking surface
>>>>> It will be part of the DSM whenever it will be used in future.
>>>>> Hence, it is not attack surface.
>>>> I am not sure, why we have to trust the PF?
>>>> This is out of virtio scope anyway.
>>>>
>>>> I have explained many times how it can be a attack surface, and examples.
>>>>
>>> And none of that make any sense as fundamentally, hypervisor is trusted
>> regardless of the approach.
>> this is not about hypervisors, I am saying admin vq based LM solution can be a
>> side channel attacking surface Please refer to my previously listed examples
>> and the TDISP spec is FYI.
> In previous email you wrote " As I said before, CC and TDISP is out of spec, that means we should ignore them for now."
> So I am ignoring it now and hence, I am ignoring above comment.
> Lets reach to a common ground for simplified case and than consider more complex cases.
ok
>
>>>> What happen if malicious SW dump guest memory by admin vq dirty page
>>>> tracking feature?
>>> What??
>>> Where is this malicious SW is located, in guest VM?
>> host, in this attacking model.
>>>>>>>>>> For untrusted hypervisor, same set of attack surface is present
>>>>>>>>>> with
>>>>>>>>>> trap+emulation.
>>>>>>>>>> So both method score same. Hence its not relevant point for
>> discussion.
>>>>>>>>> this is not hypervisor, Do you see any modern hypervisor have
>>>>>>>>> these issues?
>>>>>>>>>
>>>>>>>>> This is admin vq for LM can be a side channel attacking surface.
>>>>>>> It is not.
>>>>>>> Hypervisor is trusted entity.
>>>>>>> For untrusted hypervisor the TDISP is unified solution build by
>>>>>>> the various
>>>>>> industry bodies including DMTF, PCI for last few years.
>>>>>>> We want to utilize that.
>>>>>> first, TDISP is out of virtio spec.
>>>>> Sure, hence, untrusted hypervisor are out of scope.
>>>>> Otherwise, trap+emulation is equally dead which relies on the
>>>>> hypervisor to
>>>> do things.
>>>> so lets focus on LM topic, other than confidential computing.
>>> ok.
>>>
>>>>> Just because data transfer is not done, it does not mean that
>>>>> thousands of
>>>> polling register writes complete in stipulated time.
>>>> 1) again, they are per-device facilities
>>> That does not satisfy that it can somehow do work in < x usec time.
>> why? Do you mind take examples of basic PCI virtio common config space
>> registers?
>>>> 2) we use very few registers, even status byte does not require
>>>> polling, just re- read with delay.
>>>>
>>>> Please refer to the code for setting FEATURES_OK.
>>> It wont work when one needs to suspend the device.
>>> There is no point of doing such work over registers as fundamental framework
>> is over the AQ.
>> why it doesn't work?
> For two following reasons.
> 1. All the things needed cannot be communicated over registers efficiently, such as (a) device context, (b) dirty pages.
For (a), please read the QEMU live migration code.
For (b), the registers in the config space are the control path; we don't 
store dirty pages in the registers.
You can review the next version.
> 2. synchronous registers on the VF cannot inter operate with FLR and device reset flow.
Why is FLR a concern for this series? Have you read the QEMU live migration 
code? Does it handle FLR explicitly?
Does it need to handle all PCI attributes?




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-18 18:49                         ` Michael S. Tsirkin
@ 2023-09-20  6:06                           ` Zhu, Lingshan
  2023-09-20  6:08                             ` Parav Pandit
  2023-09-20 10:36                             ` Michael S. Tsirkin
  0 siblings, 2 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  6:06 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang



On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>>> Please refer to the code for setting FEATURES_OK.
>> It wont work when one needs to suspend the device.
>> There is no point of doing such work over registers as fundamental framework is over the AQ.
> Well not really. It's over admin commands. When these were built the
> intent always was that it's possible to use admin commands through
> another interface, other than admin queue. Is there a problem
> implementing admin commands over a memory BAR? For example, I can see
> an "admin command" capability pointing at a BAR where
> commands are supplied, and using a new group type referring to
> device itself.
I am not sure. If a BAR capability were implemented as a proxy for the admin 
vq based live migration, then the problems of admin vq LM that we have 
discussed would still exist. The BAR is only a proxy and doesn't fix anything, 
and it creates an even larger side-channel attack surface: vf-->pf-->vf
>
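Michael's suggestion of an "admin command" capability pointing at a BAR could look roughly like the sketch below. The `struct virtio_pci_cap` layout follows the existing generic virtio PCI capability format from the spec; the `admin_cmd_window` layout, its field names, and the new `cfg_type`/group type are pure assumptions for discussion, not anything that exists in the spec.

```c
#include <stdint.h>

/* Existing generic virtio PCI capability layout (per the virtio spec). */
struct virtio_pci_cap {
    uint8_t  cap_vndr;    /* PCI_CAP_ID_VNDR */
    uint8_t  cap_next;
    uint8_t  cap_len;
    uint8_t  cfg_type;    /* a new VIRTIO_PCI_CAP_ADMIN_CFG value would be needed (assumed) */
    uint8_t  bar;         /* BAR holding the admin command window */
    uint8_t  id;
    uint8_t  padding[2];
    uint32_t offset;      /* offset of the window within the BAR */
    uint32_t length;      /* size of the window */
};

/* Hypothetical command window the capability would point at: admin commands
 * are supplied here instead of through an admin virtqueue. All fields are
 * assumptions for illustration. */
struct admin_cmd_window {
    uint16_t opcode;          /* admin command opcode */
    uint16_t group_type;      /* new group type referring to the device itself */
    uint64_t group_member_id; /* member addressed by the command */
    uint32_t cmd_status;      /* device reports completion status here */
    uint8_t  payload[];       /* command-specific data follows */
};
```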




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  6:06                           ` Zhu, Lingshan
@ 2023-09-20  6:08                             ` Parav Pandit
  2023-09-20  6:31                               ` Zhu, Lingshan
  2023-09-20 10:36                             ` Michael S. Tsirkin
  1 sibling, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-20  6:08 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 20, 2023 11:36 AM
> 
> On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> >>> Please refer to the code for setting FEATURES_OK.
> >> It wont work when one needs to suspend the device.
> >> There is no point of doing such work over registers as fundamental
> framework is over the AQ.
> > Well not really. It's over admin commands. When these were built the
> > intent always was that it's possible to use admin commands through
> > another interface, other than admin queue. Is there a problem
> > implementing admin commands over a memory BAR? For example, I can see
> > an "admin command" capability pointing at a BAR where commands are
> > supplied, and using a new group type referring to device itself.
> I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> based live migration. then the problems of admin vq LM that we have discussed
> still exist. the bar is only a proxy, doesn't fix anything. and even larger side
> channel attacking surface: vf-->pf-->vf

AQ LM using the PF has no side-channel attack surface, as the hypervisor and the owner device are trusted entities, as already discussed.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  6:08                             ` Parav Pandit
@ 2023-09-20  6:31                               ` Zhu, Lingshan
  2023-09-20  8:34                                 ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  6:31 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang




On 9/20/2023 2:08 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan<lingshan.zhu@intel.com>
>> Sent: Wednesday, September 20, 2023 11:36 AM
>>
>> On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
>>> On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>>>>> Please refer to the code for setting FEATURES_OK.
>>>> It wont work when one needs to suspend the device.
>>>> There is no point of doing such work over registers as fundamental
>> framework is over the AQ.
>>> Well not really. It's over admin commands. When these were built the
>>> intent always was that it's possible to use admin commands through
>>> another interface, other than admin queue. Is there a problem
>>> implementing admin commands over a memory BAR? For example, I can see
>>> an "admin command" capability pointing at a BAR where commands are
>>> supplied, and using a new group type referring to device itself.
>> I am not sure, if a bar cap would be implemented as a proxy for the admin vq
>> based live migration. then the problems of admin vq LM that we have discussed
>> still exist. the bar is only a proxy, doesn't fix anything. and even larger side
>> channel attacking surface: vf-->pf-->vf
> AQ LM using PF has no side channel attack as hypervisor and owner device is trusted entity as already discussed.
I believe we have discussed this many times, and I even provided you 
some examples.

Let me repeat for the last time.

There can be malicious SW on the host, and the host may be hacked and 
compromised.
For example:
1) SUSPEND a running guest via the admin vq
2) dump guest memory through the admin vq dirty page tracking feature.

The above can happen, right?

You cited TDISP as an example, but have you really read the TDISP spec?
In the spec:

Device Security Architecture - Administrative interfaces (e.g., a PF) may be
used to influence the security properties of the TDI used by the TVM.

TEE-I/O requires the device to organize its hardware/software interfaces 
such that the PF cannot
be used to affect the security of a TDI when it is in use by a TVM

Clear?


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  6:31                               ` Zhu, Lingshan
@ 2023-09-20  8:34                                 ` Parav Pandit
  2023-09-20  9:44                                   ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-20  8:34 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang


> There can be malicious SW on the host, and the host may be hacked and compromised.
> For example:
> 1) SUSPEND the a running guest by admin vq
> 2) dumping guest memory through admin vq dirty page tracking.

No. The hypervisor is the trusted entity hosting the VM.
The device migration is initiated by the hypervisor.

I am omitting the TDISP question for now, as discussed before.
The TDISP spec will evolve to cover hypercalls when we get there.






^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  8:34                                 ` Parav Pandit
@ 2023-09-20  9:44                                   ` Zhu, Lingshan
  2023-09-20  9:52                                     ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  9:44 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang




On 9/20/2023 4:34 PM, Parav Pandit wrote:
>
> > There can be malicious SW on the host, and the host may be hacked 
> and compromised.
> > For example:
> > 1) SUSPEND the a running guest by admin vq
> > 2) dumping guest memory through admin vq dirty page tracking.
>
> No. hypervisor is trusted entity who is hosting the VM.
>
The PF may not be owned by the hypervisor, and the host can be hacked and 
compromised.
>
> The device migration is initiated by the hypervisor.
>
> I am omitting the TDISP question for now as talked before.
>
> TDISP spec will evolve for hypercalls when we get there.
>
Confidential computing is out of the spec's scope, as we discussed and agreed.

This demonstrates why even using a BAR capability as a proxy for admin vq LM 
is still problematic.
TDISP gives examples of the attack models, and admin vq based LM
fits those models.
>
> *From:* virtio-dev@lists.oasis-open.org 
> <virtio-dev@lists.oasis-open.org> *On Behalf Of *Zhu, Lingshan
> *Sent:* Wednesday, September 20, 2023 12:01 PM
> *To:* Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
> *Cc:* virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
> *Subject:* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND 
> bit and vq state
>
> On 9/20/2023 2:08 PM, Parav Pandit wrote:
>
>         From: Zhu, Lingshan<lingshan.zhu@intel.com>  <mailto:lingshan.zhu@intel.com>
>
>         Sent: Wednesday, September 20, 2023 11:36 AM
>
>         On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
>
>             On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>
>                     Please refer to the code for setting FEATURES_OK.
>
>                 It wont work when one needs to suspend the device.
>
>                 There is no point of doing such work over registers as fundamental
>
>         framework is over the AQ.
>
>             Well not really. It's over admin commands. When these were built the
>
>             intent always was that it's possible to use admin commands through
>
>             another interface, other than admin queue. Is there a problem
>
>             implementing admin commands over a memory BAR? For example, I can see
>
>             an "admin command" capability pointing at a BAR where commands are
>
>             supplied, and using a new group type referring to device itself.
>
>         I am not sure, if a bar cap would be implemented as a proxy for the admin vq
>
>         based live migration. then the problems of admin vq LM that we have discussed
>
>         still exist. the bar is only a proxy, doesn't fix anything. and even larger side
>
>         channel attacking surface: vf-->pf-->vf
>
>     AQ LM using PF has no side channel attack as hypervisor and owner device is trusted entity as already discussed.
>
> I believe we have discussed this for many times, and I even provide 
> you some examples.
>
> Let me repeat for the last time.
>
> There can be malicious SW on the host, and the host may be hacked and 
> compromised.
> For example:
> 1) SUSPEND the a running guest by admin vq
> 2) dumping guest memory through admin vq dirty page tracking.
>
> These above can happen right?
>
> You made TDISP as an example, but have you really read the TDISP spec?
> In the spec:
>
> Device Security Architecture - Administrative interfaces (e.g., a PF) 
> may be
> used to influence the security properties of the TDI used by the TVM.
>
> TEE-I/O requires the device to organize its hardware/software 
> interfaces such that the PF cannot
> be used to affect the security of a TDI when it is in use by a TVM
>
> Clear?
>


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  9:44                                   ` Zhu, Lingshan
@ 2023-09-20  9:52                                     ` Parav Pandit
  2023-09-20 11:11                                       ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-20  9:52 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang


Hi Lingshan,

The last two email replies in non-text format are getting hard to follow.
Can you please revert to text-based emails?

When one wants to use the PF for live migration under a trusted hypervisor, the PF is in the trust zone.

In the future, when the hypervisor is not trusted, the task of LM will be delegated to another infrastructure TVM.
Ravi at Intel already explained this a year ago using a migration TD.
This fits very well without bifurcating the member device, which is extremely hard.

Parav

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, September 20, 2023 3:15 PM
To: Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state


On 9/20/2023 4:34 PM, Parav Pandit wrote:
@font-face {font-family:"Cambria Math"; panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face {font-family:Consolas; panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0in; font-size:11.0pt; font-family:"Calibri",sans-serif;}a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;}pre {mso-style-priority:99; mso-style-link:"HTML Preformatted Char"; margin:0in; margin-bottom:.0001pt; font-size:10.0pt; font-family:"Courier New";}span.HTMLPreformattedChar {mso-style-name:"HTML Preformatted Char"; mso-style-priority:99; mso-style-link:"HTML Preformatted"; font-family:Consolas;}span.fontstyle0 {mso-style-name:fontstyle0;}span.EmailStyle21 {mso-style-type:personal-reply; font-family:"Calibri",sans-serif; color:windowtext;}.MsoChpDefault {mso-style-type:export-only; font-size:10.0pt; mso-ligatures:none;}div.WordSection1 {page:WordSection1;}
> There can be malicious SW on the host, and the host may be hacked and compromised.
> For example:
> 1) SUSPEND the a running guest by admin vq
> 2) dumping guest memory through admin vq dirty page tracking.


No. hypervisor is trusted entity who is hosting the VM.
The PF may not owned by the hypervisor and the host can be hacked and computerized.

The device migration is initiated by the hypervisor.

I am omitting the TDISP question for now as talked before.
TDISP spec will evolve for hypercalls when we get there.
Confidential computing is out of the spec, as we discussed and agreed.

This is to demonstrate why even using a bar cap as proxy for admin vq LM is still problematic.
TDISP gives examples of the attacking models, and admin vq based LM
conforms to the models.


From: virtio-dev@lists.oasis-open.org<mailto:virtio-dev@lists.oasis-open.org> <virtio-dev@lists.oasis-open.org><mailto:virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Wednesday, September 20, 2023 12:01 PM
To: Parav Pandit <parav@nvidia.com><mailto:parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com><mailto:mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org<mailto:virtio-dev@lists.oasis-open.org>; Jason Wang <jasowang@redhat.com><mailto:jasowang@redhat.com>
Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state


On 9/20/2023 2:08 PM, Parav Pandit wrote:



From: Zhu, Lingshan <lingshan.zhu@intel.com><mailto:lingshan.zhu@intel.com>

Sent: Wednesday, September 20, 2023 11:36 AM



On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:

On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:

Please refer to the code for setting FEATURES_OK.

It wont work when one needs to suspend the device.

There is no point of doing such work over registers as fundamental

framework is over the AQ.

Well not really. It's over admin commands. When these were built the

intent always was that it's possible to use admin commands through

another interface, other than admin queue. Is there a problem

implementing admin commands over a memory BAR? For example, I can see

an "admin command" capability pointing at a BAR where commands are

supplied, and using a new group type referring to device itself.

I am not sure, if a bar cap would be implemented as a proxy for the admin vq

based live migration. then the problems of admin vq LM that we have discussed

still exist. the bar is only a proxy, doesn't fix anything. and even larger side

channel attacking surface: vf-->pf-->vf



AQ LM using PF has no side channel attack as hypervisor and owner device is trusted entity as already discussed.
I believe we have discussed this for many times, and I even provide you some examples.

Let me repeat for the last time.

There can be malicious SW on the host, and the host may be hacked and compromised.
For example:
1) SUSPEND a running guest via the admin vq
2) dump guest memory through admin vq dirty page tracking.

These can happen, right?

You used TDISP as an example, but have you really read the TDISP spec?
In the spec:

Device Security Architecture - Administrative interfaces (e.g., a PF) may be
used to influence the security properties of the TDI used by the TVM.

TEE-I/O requires the device to organize its hardware/software interfaces such that the PF cannot
be used to affect the security of a TDI when it is in use by a TVM

Clear?







[-- Attachment #2: Type: text/html, Size: 12689 bytes --]

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  6:06                           ` Zhu, Lingshan
  2023-09-20  6:08                             ` Parav Pandit
@ 2023-09-20 10:36                             ` Michael S. Tsirkin
  2023-09-20 10:55                               ` Parav Pandit
                                                 ` (2 more replies)
  1 sibling, 3 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 10:36 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > Please refer to the code for setting FEATURES_OK.
> > > It won't work when one needs to suspend the device.
> > > There is no point in doing such work over registers, as the fundamental framework is over the AQ.
> > Well not really. It's over admin commands. When these were built the
> > intent always was that it's possible to use admin commands through
> > another interface, other than admin queue. Is there a problem
> > implementing admin commands over a memory BAR? For example, I can see
> > an "admin command" capability pointing at a BAR where
> > commands are supplied, and using a new group type referring to
> > device itself.
> I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> based live migration.

Not a proxy for a vq in that there's no vq then.

> then the problems of admin vq LM that we have
> discussed
> still exist.

I freely admit the finer points of this extended flamewar have been lost
on me, and I wager I'm not the only one. I thought you wanted to migrate
the device just by accessing the device itself (e.g. the VF) without
accessing other devices (e.g. the PF), while Parav wants it in a
separate device so the whole of the device itself can be passed through to
the guest. Isn't this, fundamentally, the issue?

> the bar is only a proxy, doesn't fix anything. and even larger
> side channel attacking surface: vf-->pf-->vf

In this model there's no pf. BAR belongs to vf itself
and you submit commands for the VF through its BAR.
Just separate from the pci config space.

The whole attacking surface discussion is also puzzling.  We either are
or are not discussing confidential computing/TDI.  I couldn't figure
it out. This needs a separate thread I think.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 10:36                             ` Michael S. Tsirkin
@ 2023-09-20 10:55                               ` Parav Pandit
  2023-09-20 11:28                                 ` Zhu, Lingshan
  2023-09-20 11:22                               ` Zhu, Lingshan
  2023-09-21  3:17                               ` Jason Wang
  2 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-20 10:55 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu, Lingshan; +Cc: virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, September 20, 2023 4:06 PM

> 
> I freely admit the finer points of this extended flamewar have been lost on me,
> and I wager I'm not the only one. I thought you wanted to migrate the device
> just by accessing the device itself (e.g. the VF) without accessing other devices
> (e.g. the PF), while Parav wants it in a separate device so the whole of the
> device itself can be passed through to the guest. Isn't this, fundamentally, the issue?
Right. An admin device doing the work of device migration. Today it is the owner PF.
In the future it can be another admin device to whom this migration task is delegated, and that device can be the group owner.
All the admin commands that we plumb here work just as well in that CC/TDI future, because the only thing that changes is the admin device issuing the command.

> 
> > the bar is only a proxy, doesn't fix anything. and even larger side
> > channel attacking surface: vf-->pf-->vf
> 
> In this model there's no pf. BAR belongs to vf itself and you submit commands
> for the VF through its BAR.
> Just separate from the pci config space.
> 
> The whole attacking surface discussion is also puzzling.  We either are or are
> not discussing confidential computing/TDI.  I couldn't figure it out. This needs a
> separate thread I think.

True. Many of Lingshan's thoughts/comments get mixed together, I feel.
He proposes a trap+emulation/mediation-based solution in the hypervisor, and none of that is secure anyway in the CC/TDI concept.
He keeps attacking the AQ as some side-channel attack, while somehow trap+emulation, also done by the hypervisor, counts as secure, which obviously does not make sense in the CC/TDI concept.
Both score equally where hypervisor trust is the concern.

And the admin command approach [1] has a clear direction for CC: delegate those admin commands to a dedicated trusted entity instead of the hypervisor.

I have tried to explain this a few times, but..

Anyway, if there are comments on the AQ, it is better to reply in its thread at [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

I will post v1 of [1] this week with a more mature device context, along with a note on the future provisioning item.




* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20  9:52                                     ` Parav Pandit
@ 2023-09-20 11:11                                       ` Zhu, Lingshan
  2023-09-20 11:15                                         ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 11:11 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang


On 9/20/2023 5:52 PM, Parav Pandit wrote:
>
> Hi Lingshan,
>
> The last two email replies, in non-text format, are getting hard to follow.
>
> Can you please revert to text-based emails?
>
> When one wants to use PF for the live migration in trusted hypervisor, 
> PF is in the trust zone.
>
Even without live migration, it can be an attack surface while the guest
is running.
As repeated many times, it can be used by malicious SW to dump guest
memory.
>
> In future when hypervisor is not trusted, the task of LM will be 
> delegated to other infrastructure TVM.
>
> Ravi at Intel already explained this a year ago using migration TD.
>
> This fits very well without bifurcating the member device which is 
> extremely hard.
>
TD, TDX or TDX-IO are more complex topics, and we should
focus on our live migration solution, not CC.

My point is: using a BAR cap as a proxy for admin-vq-based LM is still
problematic.

Maybe we can close this.
>
> Parav
>
> *From:* Zhu, Lingshan <lingshan.zhu@intel.com>
> *Sent:* Wednesday, September 20, 2023 3:15 PM
> *To:* Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
> *Cc:* virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
> *Subject:* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND 
> bit and vq state
>
> On 9/20/2023 4:34 PM, Parav Pandit wrote:
>
>
>     > There can be malicious SW on the host, and the host may be hacked
>     and compromised.
>     > For example:
>     > 1) SUSPEND a running guest via the admin vq
>     > 2) dumping guest memory through admin vq dirty page tracking.
>
>
>     No. hypervisor is trusted entity who is hosting the VM.
>
The PF may not be owned by the hypervisor, and the host can be hacked and
compromised.
>
>     The device migration is initiated by the hypervisor.
>
>     I am omitting the TDISP question for now as talked before.
>
>     TDISP spec will evolve for hypercalls when we get there.
>
> Confidential computing is out of the spec, as we discussed and agreed.
>
> This is to demonstrate why even using a BAR cap as a proxy for admin vq
> LM is still problematic.
> TDISP gives examples of the attack models, and admin-vq-based LM
> fits those models.
>
>     *From:* virtio-dev@lists.oasis-open.org
>     <virtio-dev@lists.oasis-open.org> *On Behalf Of* Zhu, Lingshan
>     *Sent:* Wednesday, September 20, 2023 12:01 PM
>     *To:* Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
>     *Cc:* virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
>     *Subject:* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce
>     SUSPEND bit and vq state
>
>     On 9/20/2023 2:08 PM, Parav Pandit wrote:
>
>           
>
>             From: Zhu, Lingshan <lingshan.zhu@intel.com>
>
>             Sent: Wednesday, September 20, 2023 11:36 AM
>
>               
>
>             On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
>
>                 On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>
>                         Please refer to the code for setting FEATURES_OK.
>
>                     It won't work when one needs to suspend the device.
>
>                     There is no point of doing such work over registers as fundamental
>
>             framework is over the AQ.
>
>                 Well not really. It's over admin commands. When these were built the
>
>                 intent always was that it's possible to use admin commands through
>
>                 another interface, other than admin queue. Is there a problem
>
>                 implementing admin commands over a memory BAR? For example, I can see
>
>                 an "admin command" capability pointing at a BAR where commands are
>
>                 supplied, and using a new group type referring to device itself.
>
>             I am not sure, if a bar cap would be implemented as a proxy for the admin vq
>
>             based live migration. then the problems of admin vq LM that we have discussed
>
>             still exist. the bar is only a proxy, doesn't fix anything. and even larger side
>
>             channel attacking surface: vf-->pf-->vf
>
>           
>
>         AQ LM using the PF has no side-channel attack, as the hypervisor and owner device are trusted entities, as already discussed.
>
>     I believe we have discussed this many times, and I have even
>     provided you some examples.
>
>     Let me repeat for the last time.
>
>     There can be malicious SW on the host, and the host may be hacked
>     and compromised.
>     For example:
>     1) SUSPEND a running guest via the admin vq
>     2) dumping guest memory through admin vq dirty page tracking.
>
>     These can happen, right?
>
>     You used TDISP as an example, but have you really read the TDISP spec?
>     In the spec:
>
>     Device Security Architecture - Administrative interfaces (e.g., a
>     PF) may be
>     used to influence the security properties of the TDI used by the TVM.
>
>     TEE-I/O requires the device to organize its hardware/software
>     interfaces such that the PF cannot
>     be used to affect the security of a TDI when it is in use by a TVM
>
>     Clear?
>
>
>           
>



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 11:11                                       ` Zhu, Lingshan
@ 2023-09-20 11:15                                         ` Parav Pandit
  2023-09-20 11:27                                           ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-20 11:15 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang


Random words like "malicious SW" to describe an attack do not make sense.

Refer to the patches, the series, and its usage model to describe the SW attack, if any.
I disagree, and I will not repeat all the points anymore.
If you have comments on [1], please reply in [1].

Series [1] clearly describes the usage model, at least for one widely used OS: Linux.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, September 20, 2023 4:41 PM
To: Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state

On 9/20/2023 5:52 PM, Parav Pandit wrote:

Hi Lingshan,

The last two email replies, in non-text format, are getting hard to follow.
Can you please revert to text-based emails?

When one wants to use PF for the live migration in trusted hypervisor, PF is in the trust zone.
Even without live migration, it can be an attack surface while the guest is running.
As repeated many times, it can be used by malicious SW to dump guest memory.


In future when hypervisor is not trusted, the task of LM will be delegated to other infrastructure TVM.
Ravi at Intel already explained this a year ago using migration TD.
This fits very well without bifurcating the member device which is extremely hard.
TD, TDX or TDX-IO are more complex topics, and we should
focus on our live migration solution, not CC.

My point is: using a BAR cap as a proxy for admin-vq-based LM is still problematic.

Maybe we can close this.


Parav

From: Zhu, Lingshan <lingshan.zhu@intel.com>
Sent: Wednesday, September 20, 2023 3:15 PM
To: Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state


On 9/20/2023 4:34 PM, Parav Pandit wrote:
> There can be malicious SW on the host, and the host may be hacked and compromised.
> For example:
> 1) SUSPEND a running guest via the admin vq
> 2) dumping guest memory through admin vq dirty page tracking.



No. hypervisor is trusted entity who is hosting the VM.
The PF may not be owned by the hypervisor, and the host can be hacked and compromised.


The device migration is initiated by the hypervisor.

I am omitting the TDISP question for now as talked before.
TDISP spec will evolve for hypercalls when we get there.
Confidential computing is out of the spec, as we discussed and agreed.

This is to demonstrate why even using a BAR cap as a proxy for admin vq LM is still problematic.
TDISP gives examples of the attack models, and admin-vq-based LM
fits those models.



From: virtio-dev@lists.oasis-open.org <virtio-dev@lists.oasis-open.org> On Behalf Of Zhu, Lingshan
Sent: Wednesday, September 20, 2023 12:01 PM
To: Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
Cc: virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
Subject: Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state


On 9/20/2023 2:08 PM, Parav Pandit wrote:



From: Zhu, Lingshan <lingshan.zhu@intel.com>

Sent: Wednesday, September 20, 2023 11:36 AM



On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:

On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:

Please refer to the code for setting FEATURES_OK.

It won't work when one needs to suspend the device.

There is no point of doing such work over registers as fundamental

framework is over the AQ.

Well not really. It's over admin commands. When these were built the

intent always was that it's possible to use admin commands through

another interface, other than admin queue. Is there a problem

implementing admin commands over a memory BAR? For example, I can see

an "admin command" capability pointing at a BAR where commands are

supplied, and using a new group type referring to device itself.

I am not sure, if a bar cap would be implemented as a proxy for the admin vq

based live migration. then the problems of admin vq LM that we have discussed

still exist. the bar is only a proxy, doesn't fix anything. and even larger side

channel attacking surface: vf-->pf-->vf



AQ LM using the PF has no side-channel attack, as the hypervisor and the owner device are trusted entities, as already discussed.
I believe we have discussed this many times, and I have even provided you some examples.

Let me repeat for the last time.

There can be malicious SW on the host, and the host may be hacked and compromised.
For example:
1) SUSPEND a running guest via the admin vq
2) dump guest memory through admin vq dirty page tracking.

These can happen, right?

You used TDISP as an example, but have you really read the TDISP spec?
In the spec:

Device Security Architecture - Administrative interfaces (e.g., a PF) may be
used to influence the security properties of the TDI used by the TVM.

TEE-I/O requires the device to organize its hardware/software interfaces such that the PF cannot
be used to affect the security of a TDI when it is in use by a TVM

Clear?











* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 10:36                             ` Michael S. Tsirkin
  2023-09-20 10:55                               ` Parav Pandit
@ 2023-09-20 11:22                               ` Zhu, Lingshan
  2023-09-20 12:05                                 ` Michael S. Tsirkin
  2023-09-21  3:17                               ` Jason Wang
  2 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 11:22 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Parav Pandit, virtio-dev, Jason Wang



On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
>>> On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>>>>> Please refer to the code for setting FEATURES_OK.
>>>> It won't work when one needs to suspend the device.
>>>> There is no point in doing such work over registers, as the fundamental framework is over the AQ.
>>> Well not really. It's over admin commands. When these were built the
>>> intent always was that it's possible to use admin commands through
>>> another interface, other than admin queue. Is there a problem
>>> implementing admin commands over a memory BAR? For example, I can see
>>> an "admin command" capability pointing at a BAR where
>>> commands are supplied, and using a new group type referring to
>>> device itself.
>> I am not sure, if a bar cap would be implemented as a proxy for the admin vq
>> based live migration.
> Not a proxy for a vq in that there's no vq then.
I think if the driver sends admin commands through a VF's BAR and the
VF forwards them to the PF, the VF acts like a proxy
or an agent. Either way, it takes admin commands.

So the problems we have discussed still exist.
>
>> then the problems of admin vq LM that we have
>> discussed
>> still exist.
> I freely admit the finer points of this extended flamewar have been lost
> on me, and I wager I'm not the only one. I thought you wanted to migrate
> the device just by accessing the device itself (e.g. the VF) without
> accessing other devices (e.g. the PF), while Parav wants it in a
> separate device so the whole of the device itself can be passed through to
> the guest. Isn't this, fundamentally, the issue?
We are implementing basic facilities for live migration.

We have pointed out lots of issues; there have been many discussions with
Jason and Parav about the problems of migration via the admin vq, for example:
security, QoS, and nesting.
>
>> the bar is only a proxy, doesn't fix anything. and even larger
>> side channel attacking surface: vf-->pf-->vf
> In this model there's no pf. BAR belongs to vf itself
> and you submit commands for the VF through its BAR.
> Just separate from the pci config space.
If the BAR is used to process admin commands,
isn't this solution too heavy compared to my proposal in this series?
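The proposal in this series, per the cover letter, is just a SUSPEND bit in the device status plus per-virtqueue state accessors for last_avail_idx/last_used_idx. A minimal sketch of the driver-side flow follows; the SUSPEND bit position, struct names, and register model are assumptions for illustration, not the patch text:

```c
#include <stdint.h>

#define VIRTIO_STATUS_DRIVER_OK 0x04  /* real status bit */
#define VIRTIO_STATUS_SUSPEND   0x40  /* hypothetical bit position */

/* Per-virtqueue state, as described in the cover letter. */
struct vq_state {
    uint16_t last_avail_idx;
    uint16_t last_used_idx;
};

/* Stand-in for the device's registers; a real driver would use MMIO. */
struct virtio_dev {
    uint8_t status;
    struct vq_state vqs[8];
};

/* Suspend the device so vq state is stabilized, then snapshot one
 * queue's indices for migration. Returns 0 on success. */
static int snapshot_vq(struct virtio_dev *dev, unsigned qid,
                       struct vq_state *out)
{
    dev->status |= VIRTIO_STATUS_SUSPEND;    /* request suspend */
    if (!(dev->status & VIRTIO_STATUS_SUSPEND))
        return -1;                           /* device refused suspend */
    *out = dev->vqs[qid];                    /* indices are now stable */
    return 0;
}
```

On the destination, the same accessors would be written back before the device resumes, which is why no admin queue or PF is involved in this model.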
>
> The whole attacking surface discussion is also puzzling.  We either are
> or are not discussing confidential computing/TDI.  I couldn't figure
> it out. This needs a separate thread I think.
I agree confidential computing is out of the spec. Parav mentioned TDISP, and
even the TDISP spec explicitly defines some attack models, with the PF as an
example.

It is out of the spec anyway.
>





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 11:15                                         ` Parav Pandit
@ 2023-09-20 11:27                                           ` Zhu, Lingshan
  2023-09-21  5:13                                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 11:27 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang




On 9/20/2023 7:15 PM, Parav Pandit wrote:
>
> Random words like malicious SW to describe an attack do not make sense.
>
This is not random wording; "malicious" is used a lot in the papers.
You can search Google Scholar.
>
> Refer the patches and series and usage model to describe the sw attack 
> if any.
>
> I disagree and I will not repeat all the points anymore.
>
Don't only say you disagree; please explain why it is not an attack
surface.

The problem still exists:

What if malicious SW dumps guest memory through the admin vq LM facility?

I think this is the end of this discussion.
>
> If you have comments in [1], please reply in [1].
>
> Series [1] clearly describes the usage model at least for one widely 
> used OS = Linux.
>
> [1] 
> https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>
> *From:* Zhu, Lingshan <lingshan.zhu@intel.com>
> *Sent:* Wednesday, September 20, 2023 4:41 PM
> *To:* Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
> *Cc:* virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
> *Subject:* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND 
> bit and vq state
>
> On 9/20/2023 5:52 PM, Parav Pandit wrote:
>
>     Hi Lingshan,
>
>     The last two email replies, in non-text format, are getting hard to follow.
>
>     Can you please revert to text-based emails?
>
>     When one wants to use PF for the live migration in trusted
>     hypervisor, PF is in the trust zone.
>
> even without live migration, it can be an attack surface while the
> guest is running.
> As repeated many times, it can be used by malicious SW to dump
> guest memory.
>
>     In future when hypervisor is not trusted, the task of LM will be
>     delegated to other infrastructure TVM.
>
>     Ravi at Intel already explained this a year ago using migration TD.
>
>     This fits very well without bifurcating the member device which is
>     extremely hard.
>
> TD, TDX or TDX-IO are more complex topics, and we should
> focus on our live migration solution, not CC.
>
> My point is: using a BAR cap as a proxy for admin-vq-based LM is still
> problematic.
>
> Maybe we can close this.
>
>     Parav
>
>     *From:* Zhu, Lingshan <lingshan.zhu@intel.com>
>     *Sent:* Wednesday, September 20, 2023 3:15 PM
>     *To:* Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin <mst@redhat.com>
>     *Cc:* virtio-dev@lists.oasis-open.org; Jason Wang <jasowang@redhat.com>
>     *Subject:* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce
>     SUSPEND bit and vq state
>
>     On 9/20/2023 4:34 PM, Parav Pandit wrote:
>
>
>         > There can be malicious SW on the host, and the host may be
>         hacked and compromised.
>         > For example:
>         > 1) SUSPEND a running guest via the admin vq
>         > 2) dumping guest memory through admin vq dirty page tracking.
>
>
>
>         No. hypervisor is trusted entity who is hosting the VM.
>
>     The PF may not be owned by the hypervisor, and the host can be hacked
>     and compromised.
>
>
>         The device migration is initiated by the hypervisor.
>
>         I am omitting the TDISP question for now as talked before.
>
>         TDISP spec will evolve for hypercalls when we get there.
>
>     Confidential computing is out of the spec, as we discussed and agreed.
>
>     This is to demonstrate why even using a BAR cap as a proxy for admin
>     vq LM is still problematic.
>     TDISP gives examples of the attack models, and admin-vq-based LM
>     fits those models.
>
>
>         *From:* virtio-dev@lists.oasis-open.org
>         <virtio-dev@lists.oasis-open.org> *On Behalf Of* Zhu, Lingshan
>         *Sent:* Wednesday, September 20, 2023 12:01 PM
>         *To:* Parav Pandit <parav@nvidia.com>; Michael S. Tsirkin
>         <mst@redhat.com>
>         *Cc:* virtio-dev@lists.oasis-open.org; Jason Wang
>         <jasowang@redhat.com>
>         *Subject:* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce
>         SUSPEND bit and vq state
>
>         On 9/20/2023 2:08 PM, Parav Pandit wrote:
>
>               
>
>                 From: Zhu, Lingshan <lingshan.zhu@intel.com>
>
>                 Sent: Wednesday, September 20, 2023 11:36 AM
>
>                   
>
>                 On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
>
>                     On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>
>                             Please refer to the code for setting FEATURES_OK.
>
>                         It won't work when one needs to suspend the device.
>
>                         There is no point of doing such work over registers as fundamental
>
>                 framework is over the AQ.
>
>                     Well not really. It's over admin commands. When these were built the
>
>                     intent always was that it's possible to use admin commands through
>
>                     another interface, other than admin queue. Is there a problem
>
>                     implementing admin commands over a memory BAR? For example, I can see
>
>                     an "admin command" capability pointing at a BAR where commands are
>
>                     supplied, and using a new group type referring to device itself.
>
>                 I am not sure, if a bar cap would be implemented as a proxy for the admin vq
>
>                 based live migration. then the problems of admin vq LM that we have discussed
>
>                 still exist. the bar is only a proxy, doesn't fix anything. and even larger side
>
>                 channel attacking surface: vf-->pf-->vf
>
>               
>
>             AQ LM using the PF has no side-channel attack, as the hypervisor and owner device are trusted entities, as already discussed.
>
>         I believe we have discussed this many times, and I have even
>         provided you some examples.
>
>         Let me repeat for the last time.
>
>         There can be malicious SW on the host, and the host may be
>         hacked and compromised.
>         For example:
>         1) SUSPEND a running guest via the admin vq
>         2) dumping guest memory through admin vq dirty page tracking.
>
>         These above can happen right?
>
>         You used TDISP as an example, but have you really read the
>         TDISP spec?
>         In the spec:
>
>         Device Security Architecture - Administrative interfaces
>         (e.g., a PF) may be
>         used to influence the security properties of the TDI used by
>         the TVM.
>
>         TEE-I/O requires the device to organize its hardware/software
>         interfaces such that the PF cannot
>         be used to affect the security of a TDI when it is in use by a TVM
>
>         Clear?
>
>
>
>               
>

[-- Attachment #2: Type: text/html, Size: 23633 bytes --]

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 10:55                               ` Parav Pandit
@ 2023-09-20 11:28                                 ` Zhu, Lingshan
  2023-09-20 11:52                                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 11:28 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin; +Cc: virtio-dev, Jason Wang



On 9/20/2023 6:55 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Wednesday, September 20, 2023 4:06 PM
>> I freely admit the finer points of this extended flamewar have been lost on me,
>> and I wager I'm not the only one. I thought you wanted to migrate the device
>> just by accessing the device itself (e.g. the VF) without accessing other devices
>> (e.g. the PF), while Parav wants it in a separate device so the whole of the
>> device itself can passed through to guest. Isn't this, fundamentally, the issue?
> Right. An admin device doing the work of device migration. Today it is the owner PF.
> In future it can be another admin device which is delegated this task of migration, and which can be the group owner.
> All the admin commands that we plumb here just work great in that CC/TDI future, because the only thing that changes is the admin device issuing the command.
>
>>> the bar is only a proxy, doesn't fix anything. and even larger side
>>> channel attacking surface: vf-->pf-->vf
>> In this model there's no pf. BAR belongs to vf itself and you submit commands
>> for the VF through its BAR.
>> Just separate from the pci config space.
>>
>> The whole attacking surface discussion is also puzzling.  We either are or are
>> not discussing confidential computing/TDI.  I couldn't figure it out. This needs a
>> separate thread I think.
> True. Many of Lingshan thoughts/comments gets mixed I feel.
> Because he proposes trap+emulation/mediation-based solution by hypervisor and none of that is secure anyway in CC/TDI concept.
> He keeps attacking AQ as some side channel attack, while somehow trap+emulation also done by hypervisor is secure, which obviously does not make sense in CC/TDI concept.
> Both scores equal where hypervisor trust is of concern.
Please answer directly:

What if a malicious SW suspends the guest while it is running, through
the admin vq live migration facility?

What if a malicious SW dumps guest memory by tracking guest dirty pages
through the admin vq live migration facility?
>
> And admin command approach [1] has clear direction for CC to delete those admin commands to a dedicated trusted entity instead of hypervisor.
>
> I try to explain these few times, but..
>
> Anyways, if AQ has some comments better to reply in its thread at [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>
> I will post v1 for [1] with more mature device context this week along with future provisioning item note.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 11:28                                 ` Zhu, Lingshan
@ 2023-09-20 11:52                                   ` Michael S. Tsirkin
  2023-09-20 12:05                                     ` Zhu, Lingshan
  0 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 11:52 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 07:28:39PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 6:55 PM, Parav Pandit wrote:
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, September 20, 2023 4:06 PM
> > > I freely admit the finer points of this extended flamewar have been lost on me,
> > > and I wager I'm not the only one. I thought you wanted to migrate the device
> > > just by accessing the device itself (e.g. the VF) without accessing other devices
> > > (e.g. the PF), while Parav wants it in a separate device so the whole of the
> > > device itself can passed through to guest. Isn't this, fundamentally, the issue?
> > Right. An admin device doing the work of device migration. Today it is the owner PF.
> > In future it can be other admin device who is deleted this task of migration, who can be group owner.
> > All the admin commands that we plumb here just works great in that CC/TDI future, because only thing changes is the admin device issuing this command.
> > 
> > > > the bar is only a proxy, doesn't fix anything. and even larger side
> > > > channel attacking surface: vf-->pf-->vf
> > > In this model there's no pf. BAR belongs to vf itself and you submit commands
> > > for the VF through its BAR.
> > > Just separate from the pci config space.
> > > 
> > > The whole attacking surface discussion is also puzzling.  We either are or are
> > > not discussing confidential computing/TDI.  I couldn't figure it out. This needs a
> > > separate thread I think.
> > True. Many of Lingshan thoughts/comments gets mixed I feel.
> > Because he proposes trap+emulation/mediation-based solution by hypervisor and none of that is secure anyway in CC/TDI concept.
> > He keeps attacking AQ as some side channel attack, while somehow trap+emulation also done by hypervisor is secure, which obviously does not make sense in CC/TDI concept.
> > Both scores equal where hypervisor trust is of concern.
> Please answer directly:

And here you go discussing this in the same thread. I feel you guys are
wasting bytes copying the list with this; most people have lost track of
it, if not interest.

> What if a malicious SW suspend the guest when it is running through admin vq
> live migration facility

I doubt suspend is a problem - looks like a denial of service to me
and that is not considered part of the threat model at least going by
the documents confidential computing guys are posting on lkml.


> What if a malicious SW dump guest memory by tracking guest dirty pages by
> admin vq live migration faclity

All this does is tell you which pages the device accessed, though.
It looks like on many architectures this information is readily
available anyway, since host page tables are under the hypervisor's
control - that is how memory gets migrated. Problem? How is memory
migrated otherwise?

> > 
> > And admin command approach [1] has clear direction for CC to delete those admin commands to a dedicated trusted entity instead of hypervisor.
> > 
> > I try to explain these few times, but..
> > 
> > Anyways, if AQ has some comments better to reply in its thread at [1].
> > 
> > [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
> > 
> > I will post v1 for [1] with more mature device context this week along with future provisioning item note.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 11:52                                   ` Michael S. Tsirkin
@ 2023-09-20 12:05                                     ` Zhu, Lingshan
  2023-09-20 12:08                                       ` Zhu, Lingshan
  2023-09-20 12:22                                       ` Michael S. Tsirkin
  0 siblings, 2 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 12:05 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Parav Pandit, virtio-dev, Jason Wang



On 9/20/2023 7:52 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 20, 2023 at 07:28:39PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/20/2023 6:55 PM, Parav Pandit wrote:
>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>> Sent: Wednesday, September 20, 2023 4:06 PM
>>>> I freely admit the finer points of this extended flamewar have been lost on me,
>>>> and I wager I'm not the only one. I thought you wanted to migrate the device
>>>> just by accessing the device itself (e.g. the VF) without accessing other devices
>>>> (e.g. the PF), while Parav wants it in a separate device so the whole of the
>>>> device itself can passed through to guest. Isn't this, fundamentally, the issue?
>>> Right. An admin device doing the work of device migration. Today it is the owner PF.
>>> In future it can be other admin device who is deleted this task of migration, who can be group owner.
>>> All the admin commands that we plumb here just works great in that CC/TDI future, because only thing changes is the admin device issuing this command.
>>>
>>>>> the bar is only a proxy, doesn't fix anything. and even larger side
>>>>> channel attacking surface: vf-->pf-->vf
>>>> In this model there's no pf. BAR belongs to vf itself and you submit commands
>>>> for the VF through its BAR.
>>>> Just separate from the pci config space.
>>>>
>>>> The whole attacking surface discussion is also puzzling.  We either are or are
>>>> not discussing confidential computing/TDI.  I couldn't figure it out. This needs a
>>>> separate thread I think.
>>> True. Many of Lingshan thoughts/comments gets mixed I feel.
>>> Because he proposes trap+emulation/mediation-based solution by hypervisor and none of that is secure anyway in CC/TDI concept.
>>> He keeps attacking AQ as some side channel attack, while somehow trap+emulation also done by hypervisor is secure, which obviously does not make sense in CC/TDI concept.
>>> Both scores equal where hypervisor trust is of concern.
>> Please answer directly:
> And here you go discussing this in the same thread. I feel you guys are
> wasting bytes copying the list with this most people lost track
> if not interest.
I agree, although I have to reply because Parav said I am "attacking" the
AQ, which is not good wording.

And I need to show it is not attacking: these are discussions on LM topics,
there may be some debating, and I surely need to provide proof.
>
>> What if a malicious SW suspend the guest when it is running through admin vq
>> live migration facility
> I doubt suspend is a problem - looks like a denial of service to me
> and that is not considered part of the threat model at least going by
> the documents confidential computing guys are posting on lkml.
Yes, this is a denial of service, and it can be a problem if the guest runs
a critical service like a remote attestation server.

So suspending the VM through admin vq LM commands can attack the system.
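By contrast, the in-band flow this series proposes keeps suspend under the
guest driver's own control: the driver sets a SUSPEND bit in the device
status, waits until the device acknowledges it, and only then reads the
stabilized virtqueue state. A minimal mock in C - the bit value, register
layout, and instant acknowledgement are all assumptions for illustration,
not taken from the spec patches:

```c
#include <stdint.h>

/* Hypothetical values for illustration only - not from the spec patches. */
#define VIRTIO_STATUS_DRIVER_OK 0x04
#define VIRTIO_STATUS_SUSPEND   0x40  /* assumed bit position */

struct vq_state {
    uint16_t last_avail_idx;  /* next avail ring index to process */
    uint16_t last_used_idx;   /* next used ring index to write */
};

struct mock_device {
    uint8_t status;
    struct vq_state vq0;
};

/* Driver side: request suspend and check that the device acknowledged it.
 * A real driver would poll with a timeout; this mock acks instantly. */
static int suspend_device(struct mock_device *dev)
{
    dev->status |= VIRTIO_STATUS_SUSPEND;
    return (dev->status & VIRTIO_STATUS_SUSPEND) != 0;
}

/* Once suspended, the vq state is stable and safe to snapshot. */
static struct vq_state snapshot_vq(const struct mock_device *dev)
{
    return dev->vq0;
}
```

The point of the sketch is that nothing outside the guest-visible interface
is involved: no PF and no admin vq.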
>
>
>> What if a malicious SW dump guest memory by tracking guest dirty pages by
>> admin vq live migration faclity
> All this does is tell you which pages did device access though.
> It looks like on many architectures this information is readily
> available anyway due to host page tables being under the hypervisor
> control, since this is how it's migrated. Problem? How is memory
> migrated otherwise?
It tracks dirty pages and may record them in a bitmap.

Without CoCo, procdump or qemu dump-guest-memory can dump the guest
memory pages, but it does not know which part of memory holds guest
secrets.

For example, if a malicious SW wants to sniff the guest networking, the
bitmap of dirty pages can help locate the network DMA pages. The same
applies to disk I/O.

So this enlarges the attack surface.

The current live migration solution does not use any PF to track dirty
pages, so there is no such side channel.
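The bitmap being discussed is essentially one bit per guest page that the
device has DMA-written, which is why it pinpoints DMA targets so precisely.
A minimal sketch, assuming 4 KiB pages and a flat bitmap (both illustrative
choices; nothing here is defined by the spec):

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assume 4 KiB guest pages */

/* Mark the page containing guest physical address 'gpa' as dirty. */
static void mark_dirty(uint64_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    bitmap[pfn / 64] |= 1ULL << (pfn % 64);
}

/* Test whether the page containing 'gpa' was DMA-written. */
static int is_dirty(const uint64_t *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    return (int)((bitmap[pfn / 64] >> (pfn % 64)) & 1);
}
```

Any consumer of such a bitmap learns exactly which pages receive network or
disk DMA, which is the side channel being argued about.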
>
>>> And admin command approach [1] has clear direction for CC to delete those admin commands to a dedicated trusted entity instead of hypervisor.
>>>
>>> I try to explain these few times, but..
>>>
>>> Anyways, if AQ has some comments better to reply in its thread at [1].
>>>
>>> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>>>
>>> I will post v1 for [1] with more mature device context this week along with future provisioning item note.




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 11:22                               ` Zhu, Lingshan
@ 2023-09-20 12:05                                 ` Michael S. Tsirkin
  2023-09-20 12:13                                   ` Parav Pandit
                                                     ` (3 more replies)
  0 siblings, 4 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 12:05 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > > Please refer to the code for setting FEATURES_OK.
> > > > > It wont work when one needs to suspend the device.
> > > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > > Well not really. It's over admin commands. When these were built the
> > > > intent always was that it's possible to use admin commands through
> > > > another interface, other than admin queue. Is there a problem
> > > > implementing admin commands over a memory BAR? For example, I can see
> > > > an "admin command" capability pointing at a BAR where
> > > > commands are supplied, and using a new group type referring to
> > > > device itself.
> > > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > > based live migration.
> > Not a proxy for a vq in that there's no vq then.
> I think if the driver sends admin commands through a VF's BAR and the
> VF forwards them to the PF, it acts like a proxy or an agent. Either
> way, it still takes admin commands.

Why send them to the PF? They are controlling the VF anyway.

> So the problems we have discussed still exist.
> > 
> > > then the problems of admin vq LM that we have
> > > discussed
> > > still exist.
> > I freely admit the finer points of this extended flamewar have been lost
> > on me, and I wager I'm not the only one. I thought you wanted to migrate
> > the device just by accessing the device itself (e.g. the VF) without
> > accessing other devices (e.g. the PF), while Parav wants it in a
> > separate device so the whole of the device itself can passed through to
> > guest. Isn't this, fundamentally, the issue?
> we are implementing basic facilities for live migration.
> 
> We have pointed out lots of issues, there are many discussions with
> Jason and Parav about the problems in migration by admin vq, for example:
> security, QOS and nested.

/me shrugs
Thanks for the summary I guess. Same applies to almost any proposal.
What would help make progress is an explanation why this has grown into
a megathread.  Do you understand Parav's thoughts well enough to
summarize them?

> > 
> > > the bar is only a proxy, doesn't fix anything. and even larger
> > > side channel attacking surface: vf-->pf-->vf
> > In this model there's no pf. BAR belongs to vf itself
> > and you submit commands for the VF through its BAR.
> > Just separate from the pci config space.
> If using the bar to process admin commands,
> is this solution too heavy compared to my proposal in this series?

Somewhat - because it's more comprehensive - you can actually
migrate a device using it.
This series just begins to define how to poke at some of the vq state -
it's a subset of the necessary functionality.

And it will give you a bunch of side benefits, such as
support for legacy compat commands that were merged.
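An "admin command" capability of the kind suggested here could plausibly
follow the shape of the existing struct virtio_pci_cap: a vendor capability
naming a BAR plus an offset/length window where commands are written. The
structure below is purely hypothetical - the name, the cfg_type value, and
the field meanings are illustrative assumptions, not anything merged into
the spec:

```c
#include <stdint.h>

/* Hypothetical PCI vendor capability pointing at a command window in a
 * BAR of the VF itself - no PF involved.  Modelled loosely on the
 * existing struct virtio_pci_cap; nothing here is spec-defined. */
struct virtio_admin_cmd_cap {
    uint8_t  cap_vndr;    /* generic PCI field: PCI_CAP_ID_VNDR */
    uint8_t  cap_next;    /* generic PCI field: next capability */
    uint8_t  cap_len;     /* length of this capability */
    uint8_t  cfg_type;    /* assumed: identifies the admin-cmd window */
    uint8_t  bar;         /* which BAR holds the command window */
    uint8_t  padding[3];
    uint32_t offset;      /* offset of the window within that BAR */
    uint32_t length;      /* size of the window */
};
```

Under this sketch the driver would write a command descriptor into the
window at bar/offset and poll for completion there, entirely through the
VF's own memory space.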



> > 
> > The whole attacking surface discussion is also puzzling.  We either are
> > or are not discussing confidential computing/TDI.  I couldn't figure
> > it out. This needs a separate thread I think.
> I agree confidential computing is out of the spec. Parav mentioned TDISP,
> and even the TDISP spec explicitly defines an attack model, with the PF
> as an example.
> 
> It is out of the spec anyway.

OK so we are ignoring TDISP applications for now? Everyone agrees on
that?

-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:05                                     ` Zhu, Lingshan
@ 2023-09-20 12:08                                       ` Zhu, Lingshan
  2023-09-20 12:22                                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 12:08 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Parav Pandit, virtio-dev, Jason Wang



On 9/20/2023 8:05 PM, Zhu, Lingshan wrote:
>
>
> On 9/20/2023 7:52 PM, Michael S. Tsirkin wrote:
>> On Wed, Sep 20, 2023 at 07:28:39PM +0800, Zhu, Lingshan wrote:
>>>
>>> On 9/20/2023 6:55 PM, Parav Pandit wrote:
>>>>> From: Michael S. Tsirkin <mst@redhat.com>
>>>>> Sent: Wednesday, September 20, 2023 4:06 PM
>>>>> I freely admit the finer points of this extended flamewar have 
>>>>> been lost on me,
>>>>> and I wager I'm not the only one. I thought you wanted to migrate 
>>>>> the device
>>>>> just by accessing the device itself (e.g. the VF) without 
>>>>> accessing other devices
>>>>> (e.g. the PF), while Parav wants it in a separate device so the 
>>>>> whole of the
>>>>> device itself can passed through to guest. Isn't this, 
>>>>> fundamentally, the issue?
>>>> Right. An admin device doing the work of device migration. Today it 
>>>> is the owner PF.
>>>> In future it can be other admin device who is deleted this task of 
>>>> migration, who can be group owner.
>>>> All the admin commands that we plumb here just works great in that 
>>>> CC/TDI future, because only thing changes is the admin device 
>>>> issuing this command.
>>>>
>>>>>> the bar is only a proxy, doesn't fix anything. and even larger side
>>>>>> channel attacking surface: vf-->pf-->vf
>>>>> In this model there's no pf. BAR belongs to vf itself and you 
>>>>> submit commands
>>>>> for the VF through its BAR.
>>>>> Just separate from the pci config space.
>>>>>
>>>>> The whole attacking surface discussion is also puzzling.  We 
>>>>> either are or are
>>>>> not discussing confidential computing/TDI.  I couldn't figure it 
>>>>> out. This needs a
>>>>> separate thread I think.
>>>> True. Many of Lingshan thoughts/comments gets mixed I feel.
>>>> Because he proposes trap+emulation/mediation-based solution by 
>>>> hypervisor and none of that is secure anyway in CC/TDI concept.
>>>> He keeps attacking AQ as some side channel attack, while somehow 
>>>> trap+emulation also done by hypervisor is secure, which obviously 
>>>> does not make sense in CC/TDI concept.
>>>> Both scores equal where hypervisor trust is of concern.
>>> Please answer directly:
>> And here you go discussing this in the same thread. I feel you guys are
>> wasting bytes copying the list with this most people lost track
>> if not interest.
> I agree, although I have to reply because Parav said I am "attacking" 
> AQ which is
> not a good wording.
>
> And I need to show its not attacking, this is discussions on
> LM topics, there may be some debating, and I surely need to provide 
> proof.
>>
>>> What if a malicious SW suspend the guest when it is running through 
>>> admin vq
>>> live migration facility
>> I doubt suspend is a problem - looks like a denial of service to me
>> and that is not considered part of the threat model at least going by
>> the documents confidential computing guys are posting on lkml.
> Yes this is a denial of service and it can be a problem if the service 
> is a critical service
> like a remote attestation server.
>
> So suspending the VM by admin vq LM commands can attack the system.
>>
>>
>>> What if a malicious SW dump guest memory by tracking guest dirty 
>>> pages by
>>> admin vq live migration faclity
>> All this does is tell you which pages did device access though.
>> It looks like on many architectures this information is readily
>> available anyway due to host page tables being under the hypervisor
>> control, since this is how it's migrated. Problem? How is memory
>> migrated otherwise?
> It tracks dirty pages, may record them in a bitmap.
>
> Without CoCo, procdump or qemu dump-guest-memory can dump the guest 
> memory pages,
> but it does not know which part of memory is guest secrets.
>
> For example, if a malicious SW wants to sniff the guest networking, 
> the bitmap
> of the dirty pages can help to locate the network DMA pages. This also 
> apply
> to disk IOs
>
> So this enlarges the attacking surface.
>
> Current live migration solution does not use any PFs tracking dirty 
> pages,
> so no such side channel.
Supplementary comments to my own reply:

Confidential computing is still out of the spec, and I think we should
focus on the current solution.

>>
>>>> And admin command approach [1] has clear direction for CC to delete 
>>>> those admin commands to a dedicated trusted entity instead of 
>>>> hypervisor.
>>>>
>>>> I try to explain these few times, but..
>>>>
>>>> Anyways, if AQ has some comments better to reply in its thread at [1].
>>>>
>>>> [1] 
>>>> https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>>>>
>>>> I will post v1 for [1] with more mature device context this week 
>>>> along with future provisioning item note.
>
>
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:05                                 ` Michael S. Tsirkin
@ 2023-09-20 12:13                                   ` Parav Pandit
  2023-09-20 12:16                                   ` Zhu, Lingshan
                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-20 12:13 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu, Lingshan; +Cc: virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, September 20, 2023 5:36 PM
> 
> OK so we are ignoring TDISP applications for now? Everyone agrees on that?

We are actively considering supporting TDISP applications in the (unknown) future, in a way that the new spec additions we make for new features do not block future TDISP support.



^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:05                                 ` Michael S. Tsirkin
  2023-09-20 12:13                                   ` Parav Pandit
@ 2023-09-20 12:16                                   ` Zhu, Lingshan
  2023-09-20 12:40                                     ` Michael S. Tsirkin
  2023-09-20 12:41                                   ` Michael S. Tsirkin
  2023-09-21  3:18                                   ` Jason Wang
  3 siblings, 1 reply; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-20 12:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Parav Pandit, virtio-dev, Jason Wang



On 9/20/2023 8:05 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
>>>> On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
>>>>>>> Please refer to the code for setting FEATURES_OK.
>>>>>> It wont work when one needs to suspend the device.
>>>>>> There is no point of doing such work over registers as fundamental framework is over the AQ.
>>>>> Well not really. It's over admin commands. When these were built the
>>>>> intent always was that it's possible to use admin commands through
>>>>> another interface, other than admin queue. Is there a problem
>>>>> implementing admin commands over a memory BAR? For example, I can see
>>>>> an "admin command" capability pointing at a BAR where
>>>>> commands are supplied, and using a new group type referring to
>>>>> device itself.
>>>> I am not sure, if a bar cap would be implemented as a proxy for the admin vq
>>>> based live migration.
>>> Not a proxy for a vq in that there's no vq then.
>> I think if the driver sends admin commands through a VF's bar, then
>> VF forwards the admin commands to the PF, it acts like a proxy,
>> or an agent. Anyway it takes admin commands.
> Why send them to the PF? They are controlling the VF anyway.
I think it's still too heavy compared to this series' proposal
>
>> So the problems we have discussed still exist.
>>>> then the problems of admin vq LM that we have
>>>> discussed
>>>> still exist.
>>> I freely admit the finer points of this extended flamewar have been lost
>>> on me, and I wager I'm not the only one. I thought you wanted to migrate
>>> the device just by accessing the device itself (e.g. the VF) without
>>> accessing other devices (e.g. the PF), while Parav wants it in a
>>> separate device so the whole of the device itself can passed through to
>>> guest. Isn't this, fundamentally, the issue?
>> we are implementing basic facilities for live migration.
>>
>> We have pointed out lots of issues, there are many discussions with
>> Jason and Parav about the problems in migration by admin vq, for example:
>> security, QOS and nested.
> /me shrugs
> Thanks for the summary I guess. Same applies to almost any proposal.
> What would help make progress is an explanation why this has grown into
> a megathread.  Do you understand Parav's thoughts well enough to
> summarize them?
As far as I can see, the admin vq is not a must for live migration,
and it does not serve nested virtualization, for sure.
>
>>>> the bar is only a proxy, doesn't fix anything. and even larger
>>>> side channel attacking surface: vf-->pf-->vf
>>> In this model there's no pf. BAR belongs to vf itself
>>> and you submit commands for the VF through its BAR.
>>> Just separate from the pci config space.
>> If using the bar to process admin commands,
>> is this solution too heavy compared to my proposal in this series?
> somewhat - because it's more comprehensive - you can actually
> migrate a device using it.
> this series just begins to define how to poke at some
> of the vq state - it's a subset of the necessary functionality.
>
> And it will give you a bunch of side benefits, such as
> support for legacy compat commands that were merged.
The next version will include in-flight descriptors and dirty page tracking.
I failed to process the comments for legacy.
Legacy devices are defined by code rather than by the spec; if one wants to
migrate legacy devices, maybe start by working on QEMU first.
>
>
>
>>> The whole attacking surface discussion is also puzzling.  We either are
>>> or are not discussing confidential computing/TDI.  I couldn't figure
>>> it out. This needs a separate thread I think.
>> I agree confidential computing is out of spec. Parva mentioned TDISP and
>> even
>> in TDISP spec, it explicitly defined some attacking model, and PF is an
>> example.
>>
>> It is out of spec anyway.
> OK so we are ignoring TDISP applications for now? Everyone agrees on
> that?
sure
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:05                                     ` Zhu, Lingshan
  2023-09-20 12:08                                       ` Zhu, Lingshan
@ 2023-09-20 12:22                                       ` Michael S. Tsirkin
  1 sibling, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 12:22 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 08:05:24PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 7:52 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 20, 2023 at 07:28:39PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/20/2023 6:55 PM, Parav Pandit wrote:
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: Wednesday, September 20, 2023 4:06 PM
> > > > > I freely admit the finer points of this extended flamewar have been lost on me,
> > > > > and I wager I'm not the only one. I thought you wanted to migrate the device
> > > > > just by accessing the device itself (e.g. the VF) without accessing other devices
> > > > > (e.g. the PF), while Parav wants it in a separate device so the whole of the
> > > > > device itself can passed through to guest. Isn't this, fundamentally, the issue?
> > > > Right. An admin device doing the work of device migration. Today it is the owner PF.
> > > > In future it can be other admin device who is deleted this task of migration, who can be group owner.
> > > > All the admin commands that we plumb here just works great in that CC/TDI future, because only thing changes is the admin device issuing this command.
> > > > 
> > > > > > the bar is only a proxy, doesn't fix anything. and even larger side
> > > > > > channel attacking surface: vf-->pf-->vf
> > > > > In this model there's no pf. BAR belongs to vf itself and you submit commands
> > > > > for the VF through its BAR.
> > > > > Just separate from the pci config space.
> > > > > 
> > > > > The whole attacking surface discussion is also puzzling.  We either are or are
> > > > > not discussing confidential computing/TDI.  I couldn't figure it out. This needs a
> > > > > separate thread I think.
> > > > True. Many of Lingshan thoughts/comments gets mixed I feel.
> > > > Because he proposes trap+emulation/mediation-based solution by hypervisor and none of that is secure anyway in CC/TDI concept.
> > > > He keeps attacking AQ as some side channel attack, while somehow trap+emulation also done by hypervisor is secure, which obviously does not make sense in CC/TDI concept.
> > > > Both scores equal where hypervisor trust is of concern.
> > > Please answer directly:
> > And here you go discussing this in the same thread. I feel you guys are
> > wasting bytes copying the list with this most people lost track
> > if not interest.
> I agree, although I have to reply because Parav said I am "attacking" AQ
> which is
> not a good wording.
> And I need to show its not attacking, this is discussions on
> LM topics, there may be some debating, and I surely need to provide proof.

I will be very surprised if something constructive comes out of debates
in this style.

Some sure ways to detect flamewars:
- deep thread
- repeating the same claims
- ignoring questions/comments

This one passes with flying colors. I'm not going to point fingers,
but please really start seeing each other's point of view.


> > 
> > > What if a malicious SW suspend the guest when it is running through admin vq
> > > live migration facility
> > I doubt suspend is a problem - looks like a denial of service to me
> > and that is not considered part of the threat model at least going by
> > the documents confidential computing guys are posting on lkml.
> Yes this is a denial of service and it can be a problem if the service is a
> critical service
> like a remote attestation server.
> 
> So suspending the VM by admin vq LM commands can attack the system.

Um, did you read the CoCo threat-model posts? They are educational.
The ability to deny service to a VM is currently fundamental to
building PaaS platforms on top of it.


> > 
> > 
> > > What if a malicious SW dump guest memory by tracking guest dirty pages by
> > > admin vq live migration faclity
> > All this does is tell you which pages did device access though.
> > It looks like on many architectures this information is readily
> > available anyway due to host page tables being under the hypervisor
> > control, since this is how it's migrated. Problem? How is memory
> > migrated otherwise?
> It tracks dirty pages, may record them in a bitmap.
> 
> Without CoCo, procdump or qemu dump-guest-memory can dump the guest memory
> pages,
> but it does not know which part of memory is guest secrets.
> 
> For example, if a malicious SW wants to sniff the guest networking, the
> bitmap
> of the dirty pages can help to locate the network DMA pages. This also apply
> to disk IOs
> 
> So this enlarges the attacking surface.
> 
> Current live migration solution does not use any PFs tracking dirty pages,
> so no such side channel.


You must be joking. Look into the virtio ring: there are DMA addresses
right there.
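
For context, this is the split-virtqueue descriptor as the virtio spec
defines it (a sketch, not normative text): every ring entry already
carries a guest-physical DMA address in the clear, which is the point
being made above.

```c
#include <stdint.h>

/* Flag values as defined in the virtio spec. */
#define VIRTQ_DESC_F_NEXT     1  /* buffer continues via the next field */
#define VIRTQ_DESC_F_WRITE    2  /* buffer is device write-only */
#define VIRTQ_DESC_F_INDIRECT 4  /* buffer contains a descriptor table */

/* Split-ring descriptor; all fields are little-endian on the wire. */
struct virtq_desc {
    uint64_t addr;   /* guest-physical buffer address: a plain DMA address */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* VIRTQ_DESC_F_* */
    uint16_t next;   /* index of the chained descriptor, if F_NEXT is set */
};
```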


> > 
> > > > And admin command approach [1] has clear direction for CC to delete those admin commands to a dedicated trusted entity instead of hypervisor.
> > > > 
> > > > I try to explain these few times, but..
> > > > 
> > > > Anyways, if AQ has some comments better to reply in its thread at [1].
> > > > 
> > > > [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
> > > > 
> > > > I will post v1 for [1] with more mature device context this week along with future provisioning item note.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:16                                   ` Zhu, Lingshan
@ 2023-09-20 12:40                                     ` Michael S. Tsirkin
  2023-09-21  3:14                                       ` Jason Wang
  0 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 12:40 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 08:16:13PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 8:05 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> > > > > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > > > > Please refer to the code for setting FEATURES_OK.
> > > > > > > It wont work when one needs to suspend the device.
> > > > > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > > > > Well not really. It's over admin commands. When these were built the
> > > > > > intent always was that it's possible to use admin commands through
> > > > > > another interface, other than admin queue. Is there a problem
> > > > > > implementing admin commands over a memory BAR? For example, I can see
> > > > > > an "admin command" capability pointing at a BAR where
> > > > > > commands are supplied, and using a new group type referring to
> > > > > > device itself.
> > > > > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > > > > based live migration.
> > > > Not a proxy for a vq in that there's no vq then.
> > > I think if the driver sends admin commands through a VF's bar, then
> > > VF forwards the admin commands to the PF, it acts like a proxy,
> > > or an agent. Anyway it takes admin commands.
> > Why send them to the PF? They are controlling the VF anyway.
> I think its still too heavy compared to this series proposal

It will be on you to prove that all the complexity is unnecessary, though.

> > 
> > > So the problems we have discussed still exist.
> > > > > then the problems of admin vq LM that we have
> > > > > discussed
> > > > > still exist.
> > > > I freely admit the finer points of this extended flamewar have been lost
> > > > on me, and I wager I'm not the only one. I thought you wanted to migrate
> > > > the device just by accessing the device itself (e.g. the VF) without
> > > > accessing other devices (e.g. the PF), while Parav wants it in a
> > > > separate device so the whole of the device itself can passed through to
> > > > guest. Isn't this, fundamentally, the issue?
> > > we are implementing basic facilities for live migration.
> > > 
> > > We have pointed out lots of issues, there are many discussions with
> > > Jason and Parav about the problems in migration by admin vq, for example:
> > > security, QOS and nested.
> > /me shrugs
> > Thanks for the summary I guess. Same applies to almost any proposal.
> > What would help make progress is an explanation why this has grown into
> > a megathread.  Do you understand Parav's thoughts well enough to
> > summarize them?
> as far as I see, I don't see admin vq as must for live migration.
> and it does not serve nested for sure.
> > 
> > > > > the bar is only a proxy, doesn't fix anything. and even larger
> > > > > side channel attacking surface: vf-->pf-->vf
> > > > In this model there's no pf. BAR belongs to vf itself
> > > > and you submit commands for the VF through its BAR.
> > > > Just separate from the pci config space.
> > > If using the bar to process admin commands,
> > > is this solution too heavy compared to my proposal in this series?
> > somewhat - because it's more comprehensive - you can actually
> > migrate a device using it.
> > this series just begins to define how to poke at some
> > of the vq state - it's a subset of the necessary functionality.
> > 
> > And it will give you a bunch of side benefits, such as
> > support for legacy compat commands that were merged.
> next version will include in-flight descriptors and dirty page tracking.

What we don't need is another version of this megathread, which it
sounds like you intend to restart. Nor do I cherish maintaining two
independent mechanisms for doing the same thing in the spec.
All of the above is already in Parav's patchset, so you two should
find a way to work together rather than compete.

> I failed to process the comments for legacy.
> legacy devices are defined by code than the spec, if wants to migrate legacy
> devices, maybe working on QEMU first

That is not much in the way of addressing it; it just says "go pound
sand". The functionality has already been accepted by the TC, so I
don't know what you are trying to say here. That we should drop it
from the spec?

> > 
> > 
> > 
> > > > The whole attacking surface discussion is also puzzling.  We either are
> > > > or are not discussing confidential computing/TDI.  I couldn't figure
> > > > it out. This needs a separate thread I think.
> > > I agree confidential computing is out of spec. Parva mentioned TDISP and
> > > even
> > > in TDISP spec, it explicitly defined some attacking model, and PF is an
> > > example.
> > > 
> > > It is out of spec anyway.
> > OK so we are ignoring TDISP applications for now? Everyone agrees on
> > that?
> sure
> > 
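
The "admin commands over a memory BAR" idea floated earlier in this
thread could, purely as a hypothetical illustration, reuse the shape of
the existing virtio PCI capability structure: a vendor capability whose
cfg_type points at a BAR window through which the VF itself accepts
admin commands. Nothing below is in the spec; the ADMIN cfg_type value
is invented for the sketch.

```c
#include <stdint.h>

#define VIRTIO_PCI_CAP_ADMIN_CFG 10  /* hypothetical new cfg_type value */

/* Layout modeled on struct virtio_pci_cap from the virtio PCI transport. */
struct virtio_pci_cap {
    uint8_t  cap_vndr;    /* PCI_CAP_ID_VNDR */
    uint8_t  cap_next;    /* link to next PCI capability */
    uint8_t  cap_len;     /* length of this capability */
    uint8_t  cfg_type;    /* here: the hypothetical ADMIN_CFG */
    uint8_t  bar;         /* which BAR holds the command window */
    uint8_t  id;          /* capability id within cfg_type */
    uint8_t  padding[2];
    uint32_t offset;      /* offset of the window within the BAR */
    uint32_t length;      /* size of the window */
};
```

The driver would then write admin command structures into that window,
addressed to a group type referring to the device itself, with no PF
or admin virtqueue involved.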



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:05                                 ` Michael S. Tsirkin
  2023-09-20 12:13                                   ` Parav Pandit
  2023-09-20 12:16                                   ` Zhu, Lingshan
@ 2023-09-20 12:41                                   ` Michael S. Tsirkin
  2023-09-20 13:41                                     ` Parav Pandit
  2023-09-21  3:26                                     ` Jason Wang
  2023-09-21  3:18                                   ` Jason Wang
  3 siblings, 2 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 12:41 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 08:05:49AM -0400, Michael S. Tsirkin wrote:
> On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
> > 
> > 
> > On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> > > On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> > > > 
> > > > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > > > Please refer to the code for setting FEATURES_OK.
> > > > > > It wont work when one needs to suspend the device.
> > > > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > > > Well not really. It's over admin commands. When these were built the
> > > > > intent always was that it's possible to use admin commands through
> > > > > another interface, other than admin queue. Is there a problem
> > > > > implementing admin commands over a memory BAR? For example, I can see
> > > > > an "admin command" capability pointing at a BAR where
> > > > > commands are supplied, and using a new group type referring to
> > > > > device itself.
> > > > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > > > based live migration.
> > > Not a proxy for a vq in that there's no vq then.
> > I think if the driver sends admin commands through a VF's bar, then
> > VF forwards the admin commands to the PF, it acts like a proxy,
> > or an agent. Anyway it takes admin commands.
> 
> Why send them to the PF? They are controlling the VF anyway.
> 
> > So the problems we have discussed still exist.
> > > 
> > > > then the problems of admin vq LM that we have
> > > > discussed
> > > > still exist.
> > > I freely admit the finer points of this extended flamewar have been lost
> > > on me, and I wager I'm not the only one. I thought you wanted to migrate
> > > the device just by accessing the device itself (e.g. the VF) without
> > > accessing other devices (e.g. the PF), while Parav wants it in a
> > > separate device so the whole of the device itself can passed through to
> > > guest. Isn't this, fundamentally, the issue?
> > we are implementing basic facilities for live migration.
> > 
> > We have pointed out lots of issues, there are many discussions with
> > Jason and Parav about the problems in migration by admin vq, for example:
> > security, QOS and nested.
> 
> /me shrugs
> Thanks for the summary I guess. Same applies to almost any proposal.
> What would help make progress is an explanation why this has grown into
> a megathread.  Do you understand Parav's thoughts well enough to
> summarize them?


And Parav, the same goes for you: can you summarize Zhu Lingshan's position?

> > > 
> > > > the bar is only a proxy, doesn't fix anything. and even larger
> > > > side channel attacking surface: vf-->pf-->vf
> > > In this model there's no pf. BAR belongs to vf itself
> > > and you submit commands for the VF through its BAR.
> > > Just separate from the pci config space.
> > If using the bar to process admin commands,
> > is this solution too heavy compared to my proposal in this series?
> 
> somewhat - because it's more comprehensive - you can actually
> migrate a device using it.
> this series just begins to define how to poke at some
> of the vq state - it's a subset of the necessary functionality.
> 
> And it will give you a bunch of side benefits, such as
> support for legacy compat commands that were merged.
> 
> 
> 
> > > 
> > > The whole attacking surface discussion is also puzzling.  We either are
> > > or are not discussing confidential computing/TDI.  I couldn't figure
> > > it out. This needs a separate thread I think.
> > I agree confidential computing is out of spec. Parva mentioned TDISP and
> > even
> > in TDISP spec, it explicitly defined some attacking model, and PF is an
> > example.
> > 
> > It is out of spec anyway.
> 
> OK so we are ignoring TDISP applications for now? Everyone agrees on
> that?
> 
> -- 
> MST



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:41                                   ` Michael S. Tsirkin
@ 2023-09-20 13:41                                     ` Parav Pandit
  2023-09-20 14:13                                       ` Michael S. Tsirkin
                                                         ` (2 more replies)
  2023-09-21  3:26                                     ` Jason Wang
  1 sibling, 3 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-20 13:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, Zhu, Lingshan; +Cc: virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, September 20, 2023 6:12 PM

> And Parav same goes for you - can you summarize Zhu Lingshan's position?

Below is my summary of Zhu Lingshan's position:

One-line summary of his position, in my view:

0. Use inband device migration only, via mediation; mediation is secure, but the AQ is not.

Details of his position, in my view:

1. Device migration must be done through the VF itself, by suspending both specific vqs and the VF device.
2. When device migration is done as in #1, it must use a mediation approach in the hypervisor.

3. Migration using inband mediation is more secure than the AQ approach
(as opposed to the AQ of the owner device, which enables/disables SR-IOV).

4. The AQ is not secure.
But,
5. The AQ and admin commands can be built on top of his proposal #1, even though the AQ is less secure. These are opposing statements.

6. Dirty page tracking and in-flight descriptor tracking are to be done in his v1, but he does not want to review that coverage in [1].

8. Since his series does not cover any device context migration and does not say anything about it,
I deduce that he plans to set up RSS and other fields using the inband CVQ of the VF.
This further limits the solution to the net device only, ignoring the other 20+ device types, not all of which have a CVQ.

9. Trapping and emulation in the hypervisor of the following objects: AQ, CVQ, virtio config space, and the PCI FLR flow is secure, yet if the AQ of the PF does a far smaller part of that work, the AQ is not secure.

10. The traps proposed in #9 mostly do not work with future TDISP, as TDISP does not bifurcate the device, so ignore them for now to promote inband migration.

11. He does not show interest in collaboration (even after being asked a few times) to see whether we can produce common commands that work both for passthrough (without mediation) and with mediation for the nested case.

12. Somehow, register access on a single physical card for the PFs and VFs gives a better QoS guarantee than a virtqueue, as registers supposedly scale infinitely no matter how many VFs or VQs there are, because they are per VF.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
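
For readers who have lost the thread: the vq state being argued over in
point 1 is small. Per the cover letter, the series exposes an accessor
to get and set last_avail_idx and last_used_idx, roughly (field names
assumed from the cover letter, not final spec text):

```c
#include <stdint.h>

/* Sketch of the per-virtqueue state a destination device would need to
 * resume a suspended split virtqueue. */
struct virtq_state {
    uint16_t last_avail_idx;  /* next available-ring entry the device will read */
    uint16_t last_used_idx;   /* next used-ring entry the device will write */
};
```

The disagreement is not about this structure but about the transport
used to read and write it: VF registers versus owner-device admin
commands.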


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 13:41                                     ` Parav Pandit
@ 2023-09-20 14:13                                       ` Michael S. Tsirkin
  2023-09-20 14:16                                       ` Michael S. Tsirkin
  2023-09-21  9:18                                         ` [virtio-comment] " Zhu, Lingshan
  2 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 14:13 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 01:41:00PM +0000, Parav Pandit wrote:
> 12. Some how register access on single physical card for the PFs and VFs gives better QoS guarantee than virtqueue as registers can scale infinitely no matter how many VFs or for multiple VQs because it is per VF.
>

This makes some sense, as memory accesses to independent devices do not
need to be ordered. The AQ's answer is multiple queues and out-of-order
execution within a queue. I am not sure whether that is good enough.



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 13:41                                     ` Parav Pandit
  2023-09-20 14:13                                       ` Michael S. Tsirkin
@ 2023-09-20 14:16                                       ` Michael S. Tsirkin
  2023-09-20 17:21                                         ` Parav Pandit
  2023-09-21  9:18                                         ` [virtio-comment] " Zhu, Lingshan
  2 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 14:16 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 01:41:00PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, September 20, 2023 6:12 PM
> 
> > And Parav same goes for you - can you summarize Zhu Lingshan's position?
> 
> Below is my summary about Zhu Lingshan's position:
> 
> One line summary of his position in my view:
> 
> 0. Use inband device migration only, use mediation, mediation is secure, but AQ is not secure.
> 
> Details of his position in my view:
> 
> 1. Device migration must be done through VF itself by suspending specific vqs and the VF device both.
> 2. When device migration is done using #1, it must be done using mediation approach in hypervisor.
> 
> 3. When migration is done using inband mediation it is more secure than AQ approach.
> (as opposed to AQ of the owner device who enables/disables SR-IOV).
> 
> 4. AQ is not secure.
> But,
> 5. AQ and admin commands can be built on top of his proposal #1, even if AQ is less secure. Opposing statements...
> 
> 6. Dirty page tracking and inflight descriptors tracking to be done in his v1. but he does not want to review such coverage in [1].
> 
> 8. Since his series does not cover any device context migration and does not talk anything about it, 
> I deduce that he plans to use cvq for setting ups RSS and other fields using inband CVQ of the VF.
> This further limit the solution to only net device, ignoring rest of the other 20+ device types, where all may not have the CVQ.
> 
> 9. trapping and emulation of following objects: AQ, CVQ, virtio config space, PCI FLR flow in hypervisor is secure, but when if AQ of the PF do far small work of it, AQ is not secure.
> 
> 10. Any traps proposed in #9 mostly do not work with future TDISP as TDISP do not bifurcate the device, so ignore them for now to promote inband migration.
> 
> 11. He do not show interest in collaboration (even after requesting few times) to see if we can produce common commands that may work for both passthrough (without mediation) and using mediation for nested case.
> 
> 12. Some how register access on single physical card for the PFs and VFs gives better QoS guarantee than virtqueue as registers can scale infinitely no matter how many VFs or for multiple VQs because it is per VF.
> 
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead


OK, so with this summary in mind, can you find any real advantages to
inband+mediation, or do you just see disadvantages? It's a tricky
question, because I can see some advantages ;)


-- 
MST



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 14:16                                       ` Michael S. Tsirkin
@ 2023-09-20 17:21                                         ` Parav Pandit
  2023-09-20 20:03                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-20 17:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, September 20, 2023 7:46 PM
> 
> > Details of his position in my view:
> >
> > 1. Device migration must be done through VF itself by suspending specific vqs
> and the VF device both.
> > 2. When device migration is done using #1, it must be done using mediation
> approach in hypervisor.
> >
> > 3. When migration is done using inband mediation it is more secure than AQ
> approach.
> > (as opposed to AQ of the owner device who enables/disables SR-IOV).
> >
> > 4. AQ is not secure.
> > But,
> > 5. AQ and admin commands can be built on top of his proposal #1, even if AQ
> is less secure. Opposing statements...
> >
> > 6. Dirty page tracking and inflight descriptors tracking to be done in his v1. but
> he does not want to review such coverage in [1].
> >
> > 8. Since his series does not cover any device context migration and
> > does not talk anything about it, I deduce that he plans to use cvq for setting
> ups RSS and other fields using inband CVQ of the VF.
> > This further limit the solution to only net device, ignoring rest of the other
> 20+ device types, where all may not have the CVQ.
> >
> > 9. trapping and emulation of following objects: AQ, CVQ, virtio config space,
> PCI FLR flow in hypervisor is secure, but when if AQ of the PF do far small work
> of it, AQ is not secure.
> >
> > 10. Any traps proposed in #9 mostly do not work with future TDISP as TDISP do
> not bifurcate the device, so ignore them for now to promote inband migration.
> >
> > 11. He do not show interest in collaboration (even after requesting few times)
> to see if we can produce common commands that may work for both
> passthrough (without mediation) and using mediation for nested case.
> >
> > 12. Some how register access on single physical card for the PFs and VFs gives
> better QoS guarantee than virtqueue as registers can scale infinitely no matter
> how many VFs or for multiple VQs because it is per VF.
> >
> > [1]
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > vidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
> 
> 
> OK so with this summary in mind, can you find any advantages to
> inband+mediation that are real or do you just see disadvantages? And
> it's a tricky question because I can see some advantages ;)

Inband + mediation may be useful for the nested case.

In attempting inband + mediation, many critical pieces are let go.
That may be fine for some cases, but not for passthrough.

The fundamental advantages of the owner-based approach, as I see them, are:
1. The nesting use case usually involves a large number of VMs hosted in one VM.
For this purpose, it is better to hand over a PF to the level-0 VM that hosts the VFs and level-1 VMs, and avoid two levels of device nesting.

2. It supports P2P natively.

3. A single, non-replicated resource (the AQ) manages the infrequent work of device migration.
There is no need to replicate AQs to thousands of VFs that rarely do migration work.
Overall this gains system, device, and memory efficiency.

5. Passthrough simply does not work at all without the owner device.
This is because dirty page tracking, device context management, the CVQ, FLR, config space, device status, and MSI-X config would all have to be trapped.
Many systems do not want this hypervisor involvement even when the hypervisor is trusted (to avoid moving parts).
New-generation TEE and TPM devices are on the horizon, and they would not like these paths to be trapped either.
The security audit surface is very large for them.

6. Any new basic functionality added to the device would also require constant software updates at several layers in the mediation entities.

7. TDISP is inherently covered. Without the owner device, TDISP is broken, as the device cannot be bifurcated.

To me, #2, #5, and #7 are the critical pieces that device migration must support/work with.
The rest is of secondary importance.


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 17:21                                         ` Parav Pandit
@ 2023-09-20 20:03                                           ` Michael S. Tsirkin
  2023-09-21  3:43                                             ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-20 20:03 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 05:21:52PM +0000, Parav Pandit wrote:
> > OK so with this summary in mind, can you find any advantages to
> > inband+mediation that are real or do you just see disadvantages? And
> > it's a tricky question because I can see some advantages ;)
> 
> inband + mediation may be useful for nested case.

Hint: there's more.

-- 
MST



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:40                                     ` Michael S. Tsirkin
@ 2023-09-21  3:14                                       ` Jason Wang
  2023-09-21  3:51                                         ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  3:14 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, Parav Pandit, virtio-dev

On Wed, Sep 20, 2023 at 8:40 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 20, 2023 at 08:16:13PM +0800, Zhu, Lingshan wrote:
> >
> >
> > On 9/20/2023 8:05 PM, Michael S. Tsirkin wrote:
> > > On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
> > > >
> > > > On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> > > > > On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> > > > > > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > > > > > Please refer to the code for setting FEATURES_OK.
> > > > > > > > It wont work when one needs to suspend the device.
> > > > > > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > > > > > Well not really. It's over admin commands. When these were built the
> > > > > > > intent always was that it's possible to use admin commands through
> > > > > > > another interface, other than admin queue. Is there a problem
> > > > > > > implementing admin commands over a memory BAR? For example, I can see
> > > > > > > an "admin command" capability pointing at a BAR where
> > > > > > > commands are supplied, and using a new group type referring to
> > > > > > > device itself.
> > > > > > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > > > > > based live migration.
> > > > > Not a proxy for a vq in that there's no vq then.
> > > > I think if the driver sends admin commands through a VF's bar, then
> > > > VF forwards the admin commands to the PF, it acts like a proxy,
> > > > or an agent. Anyway it takes admin commands.
> > > Why send them to the PF? They are controlling the VF anyway.
> > I think its still too heavy compared to this series proposal
>
> it will be on you to prove all the complexity is unnecessary though.
>
> > >
> > > > So the problems we have discussed still exist.
> > > > > > then the problems of admin vq LM that we have
> > > > > > discussed
> > > > > > still exist.
> > > > > I freely admit the finer points of this extended flamewar have been lost
> > > > > on me, and I wager I'm not the only one. I thought you wanted to migrate
> > > > > the device just by accessing the device itself (e.g. the VF) without
> > > > > accessing other devices (e.g. the PF), while Parav wants it in a
> > > > > separate device so the whole of the device itself can passed through to
> > > > > guest. Isn't this, fundamentally, the issue?
> > > > we are implementing basic facilities for live migration.
> > > >
> > > > We have pointed out lots of issues, there are many discussions with
> > > > Jason and Parav about the problems in migration by admin vq, for example:
> > > > security, QOS and nested.
> > > /me shrugs
> > > Thanks for the summary I guess. Same applies to almost any proposal.
> > > What would help make progress is an explanation why this has grown into
> > > a megathread.  Do you understand Parav's thoughts well enough to
> > > summarize them?
> > as far as I see, I don't see admin vq as must for live migration.
> > and it does not serve nested for sure.
> > >
> > > > > > the bar is only a proxy, doesn't fix anything. and even larger
> > > > > > side channel attacking surface: vf-->pf-->vf
> > > > > In this model there's no pf. BAR belongs to vf itself
> > > > > and you submit commands for the VF through its BAR.
> > > > > Just separate from the pci config space.
> > > > If using the bar to process admin commands,
> > > > is this solution too heavy compared to my proposal in this series?
> > > somewhat - because it's more comprehensive - you can actually
> > > migrate a device using it.
> > > this series just begins to define how to poke at some
> > > of the vq state - it's a subset of the necessary functionality.
> > >
> > > And it will give you a bunch of side benefits, such as
> > > support for legacy compat commands that were merged.
> > next version will include in-flight descriptors and dirty page tracking.
>
> what we don't need is another version of this megathread.
> which it sounds like you intend to restart?
> nor do I cherish maintaining two independent mechanisms for doing
> the same thing in the spec.

I'm not sure how to define "the same thing", but we have had different
transports for accessing basic facilities like virtqueues, device
status, etc. I don't get why live migration is different, especially
considering migration is actually a combination of several independent
functions.

And we already have different ways to transport legacy devices as
well. It's really hard to say that one mechanism can work for all use
cases.

> all of the above is already in parav's patchset so you guys should find
> a way to work together rather than compete?

There are things missing from Parav's series for sure, namely the
migration of the owner, and nesting.

From my point of view there's no competition. The main issue I see so
far is that you want to couple migration with admin commands, but I
don't see much advantage in doing this.

And before collaborating, I need to first figure out whether Parav's
proposal can work, or what's wrong with this series. If it is just a
matter of a missing function, it could be added on top for sure; if it
belongs to the basic facility part, it can be reused (that's the
motivation for this series).

Thanks


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 10:36                             ` Michael S. Tsirkin
  2023-09-20 10:55                               ` Parav Pandit
  2023-09-20 11:22                               ` Zhu, Lingshan
@ 2023-09-21  3:17                               ` Jason Wang
  2023-09-21  4:01                                 ` Parav Pandit
  2 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  3:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, Parav Pandit, virtio-dev

On Wed, Sep 20, 2023 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> >
> >
> > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > Please refer to the code for setting FEATURES_OK.
> > > > It wont work when one needs to suspend the device.
> > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > Well not really. It's over admin commands. When these were built the
> > > intent always was that it's possible to use admin commands through
> > > another interface, other than admin queue. Is there a problem
> > > implementing admin commands over a memory BAR? For example, I can see
> > > an "admin command" capability pointing at a BAR where
> > > commands are supplied, and using a new group type referring to
> > > device itself.
> > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > based live migration.
>
> Not a proxy for a vq in that there's no vq then.

As replied in another thread, the issues for BAR are:

1) Not sure it can have an efficient interface; it would be something
like VIRTIO_PCI_CAP_PCI_CFG, which is very slow compared to single
register accesses.
2) There's no owner/group/member for MMIO; most of the time, we only
need a single MMIO device. If we want the owner to manage itself, it
seems redundant, as is implied in all the existing transports (without
admin commands). Even if we had one, it might still suffer from
bootstrap issues.
3) For live migration, it means the admin commands need to start by
duplicating every existing transport-specific interface. One example
is that we may end up with two interfaces to access virtqueue
addresses, etc. This results in extra complexity, and it actually
amounts to a full transport (the driver could just use admin commands
to drive the device).
4) Admin commands themselves may not be capable of doing things like
dirty page logging; that requires assistance from the transport.

>
> > then the problems of admin vq LM that we have
> > discussed
> > still exist.
>
> I freely admit the finer points of this extended flamewar have been lost
> on me, and I wager I'm not the only one. I thought you wanted to migrate
> the device just by accessing the device itself (e.g. the VF) without
> accessing other devices (e.g. the PF), while Parav wants it in a
> separate device so the whole of the device itself can passed through to
> guest.

If we access the device itself, does anything prevent us from passing
it through to the guest? That is how all the existing devices are built.

> Isn't this, fundamentally, the issue?

For me it's not. The fundamental issues are:

1) Parav's proposal does several couplings: it couples basic building
blocks (suspend, dirty page tracking) with live migration, and couples
live migration with admin commands. This proposal doesn't do such
coupling, and admin commands can be built on top.
2) It's still not clear that Parav's proposal can work; a lot of
corner cases need to be examined.

>
> > the bar is only a proxy, doesn't fix anything. and even larger
> > side channel attacking surface: vf-->pf-->vf
>
> In this model there's no pf. BAR belongs to vf itself
> and you submit commands for the VF through its BAR.
> Just separate from the pci config space.
>
> The whole attacking surface discussion is also puzzling.  We either are
> or are not discussing confidential computing/TDI.  I couldn't figure
> it out. This needs a separate thread I think.

Anyhow, it's not bad to take it into consideration. But we can do it
elsewhere for sure.

Thanks





>
> --
> MST
>





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:05                                 ` Michael S. Tsirkin
                                                     ` (2 preceding siblings ...)
  2023-09-20 12:41                                   ` Michael S. Tsirkin
@ 2023-09-21  3:18                                   ` Jason Wang
  2023-09-21  4:03                                     ` Parav Pandit
  3 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  3:18 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, Parav Pandit, virtio-dev

On Wed, Sep 20, 2023 at 8:05 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
> >
> >
> > On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> > > On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> > > >
> > > > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > > > Please refer to the code for setting FEATURES_OK.
> > > > > > It wont work when one needs to suspend the device.
> > > > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > > > Well not really. It's over admin commands. When these were built the
> > > > > intent always was that it's possible to use admin commands through
> > > > > another interface, other than admin queue. Is there a problem
> > > > > implementing admin commands over a memory BAR? For example, I can see
> > > > > an "admin command" capability pointing at a BAR where
> > > > > commands are supplied, and using a new group type referring to
> > > > > device itself.
> > > > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > > > based live migration.
> > > Not a proxy for a vq in that there's no vq then.
> > I think if the driver sends admin commands through a VF's bar, then
> > VF forwards the admin commands to the PF, it acts like a proxy,
> > or an agent. Anyway it takes admin commands.
>
> Why send them to the PF? They are controlling the VF anyway.
>
> > So the problems we have discussed still exist.
> > >
> > > > then the problems of admin vq LM that we have
> > > > discussed
> > > > still exist.
> > > I freely admit the finer points of this extended flamewar have been lost
> > > on me, and I wager I'm not the only one. I thought you wanted to migrate
> > > the device just by accessing the device itself (e.g. the VF) without
> > > accessing other devices (e.g. the PF), while Parav wants it in a
> > > separate device so the whole of the device itself can passed through to
> > > guest. Isn't this, fundamentally, the issue?
> > we are implementing basic facilities for live migration.
> >
> > We have pointed out lots of issues, there are many discussions with
> > Jason and Parav about the problems in migration by admin vq, for example:
> > security, QOS and nested.
>
> /me shrugs
> Thanks for the summary I guess. Same applies to almost any proposal.

So it's something we need to consider in virtio as well, and it was
raised by different people (correct me if I'm wrong):

Security: Parav
Nesting: me
QoS: Lingshan

> What would help make progress is an explanation why this has grown into
> a megathread.  Do you understand Parav's thoughts well enough to
> summarize them?
>
> > >
> > > > the bar is only a proxy, doesn't fix anything. and even larger
> > > > side channel attacking surface: vf-->pf-->vf
> > > In this model there's no pf. BAR belongs to vf itself
> > > and you submit commands for the VF through its BAR.
> > > Just separate from the pci config space.
> > If using the bar to process admin commands,
> > is this solution too heavy compared to my proposal in this series?
>
> somewhat - because it's more comprehensive - you can actually
> migrate a device using it.

But it's not a must. And there will be a lot of duplication, to the
point where it becomes a transport.

> this series just begins to define how to poke at some
> of the vq state - it's a subset of the necessary functionality.

It defines the minimal set of functionality. We can have more for sure.

>
> And it will give you a bunch of side benefits, such as
> support for legacy compat commands that were merged.

Legacy has too many corner cases, and why do we need such reinventing
of the wheel? We have had transitional devices for years.

>
>
>
> > >
> > > The whole attacking surface discussion is also puzzling.  We either are
> > > or are not discussing confidential computing/TDI.  I couldn't figure
> > > it out. This needs a separate thread I think.
> > I agree confidential computing is out of spec. Parva mentioned TDISP and
> > even
> > in TDISP spec, it explicitly defined some attacking model, and PF is an
> > example.
> >
> > It is out of spec anyway.
>
> OK so we are ignoring TDISP applications for now? Everyone agrees on
> that?

I'm fine. But TDISP is something that needs to be considered. The
earlier we realize the possible issues, the better.

Thanks




>
> --
> MST
>





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 12:41                                   ` Michael S. Tsirkin
  2023-09-20 13:41                                     ` Parav Pandit
@ 2023-09-21  3:26                                     ` Jason Wang
  2023-09-21  4:21                                       ` Parav Pandit
  1 sibling, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, Parav Pandit, virtio-dev

On Wed, Sep 20, 2023 at 8:41 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Sep 20, 2023 at 08:05:49AM -0400, Michael S. Tsirkin wrote:
> > On Wed, Sep 20, 2023 at 07:22:32PM +0800, Zhu, Lingshan wrote:
> > >
> > >
> > > On 9/20/2023 6:36 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 20, 2023 at 02:06:13PM +0800, Zhu, Lingshan wrote:
> > > > >
> > > > > On 9/19/2023 2:49 AM, Michael S. Tsirkin wrote:
> > > > > > On Mon, Sep 18, 2023 at 06:41:55PM +0000, Parav Pandit wrote:
> > > > > > > > Please refer to the code for setting FEATURES_OK.
> > > > > > > It wont work when one needs to suspend the device.
> > > > > > > There is no point of doing such work over registers as fundamental framework is over the AQ.
> > > > > > Well not really. It's over admin commands. When these were built the
> > > > > > intent always was that it's possible to use admin commands through
> > > > > > another interface, other than admin queue. Is there a problem
> > > > > > implementing admin commands over a memory BAR? For example, I can see
> > > > > > an "admin command" capability pointing at a BAR where
> > > > > > commands are supplied, and using a new group type referring to
> > > > > > device itself.
> > > > > I am not sure, if a bar cap would be implemented as a proxy for the admin vq
> > > > > based live migration.
> > > > Not a proxy for a vq in that there's no vq then.
> > > I think if the driver sends admin commands through a VF's bar, then
> > > VF forwards the admin commands to the PF, it acts like a proxy,
> > > or an agent. Anyway it takes admin commands.
> >
> > Why send them to the PF? They are controlling the VF anyway.
> >
> > > So the problems we have discussed still exist.
> > > >
> > > > > then the problems of admin vq LM that we have
> > > > > discussed
> > > > > still exist.
> > > > I freely admit the finer points of this extended flamewar have been lost
> > > > on me, and I wager I'm not the only one. I thought you wanted to migrate
> > > > the device just by accessing the device itself (e.g. the VF) without
> > > > accessing other devices (e.g. the PF), while Parav wants it in a
> > > > separate device so the whole of the device itself can passed through to
> > > > guest. Isn't this, fundamentally, the issue?
> > > we are implementing basic facilities for live migration.
> > >
> > > We have pointed out lots of issues, there are many discussions with
> > > Jason and Parav about the problems in migration by admin vq, for example:
> > > security, QOS and nested.
> >
> > /me shrugs
> > Thanks for the summary I guess. Same applies to almost any proposal.
> > What would help make progress is an explanation why this has grown into
> > a megathread.  Do you understand Parav's thoughts well enough to
> > summarize them?
>
>
> And Parav same goes for you - can you summarize Zhu Lingshan's position?

The root cause of the long debate is that there are a lot of
misunderstandings on both sides, as can be seen from Parav's reply.

My understanding is that it might be better for each side to write a
summary of both proposals.

Thanks





* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 20:03                                           ` Michael S. Tsirkin
@ 2023-09-21  3:43                                             ` Parav Pandit
  2023-09-21  5:41                                               ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  3:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 1:34 AM
> 
> On Wed, Sep 20, 2023 at 05:21:52PM +0000, Parav Pandit wrote:
> > > OK so with this summary in mind, can you find any advantages to
> > > inband+mediation that are real or do you just see disadvantages? And
> > > it's a tricky question because I can see some advantages ;)
> >
> > inband + mediation may be useful for nested case.
> 
> Hint: there's more.

Can you please list them?

The starting point of the discussion is a passthrough member device
without mediation in the virtio interface layers. How should device
migration work for it?




* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  3:14                                       ` Jason Wang
@ 2023-09-21  3:51                                         ` Parav Pandit
  2023-09-21  4:02                                           ` Jason Wang
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  3:51 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 8:45 AM
> The main issue I see so far is that
> you want to couple migration with admin commands but I don't see much
> advantages to doing this.
> 
The way I read the above comment is, to draw a parallel: descriptor
posting in the virtio spec is tied to virtqueues. What is the
advantage of that? Well, it is one way to achieve it.
There may be different ways to do all the bulk data transfer without
admin commands.
Whatever the advantages are, please list them in [1] for each command
where you can find an alternative.
[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  3:17                               ` Jason Wang
@ 2023-09-21  4:01                                 ` Parav Pandit
  2023-09-21  4:09                                   ` Jason Wang
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  4:01 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 8:48 AM

> As replied in another thread, the issues for BAR are:
> 
> 1) Not sure it can have an efficient interface, it would be something like
> VIRTIO_PCI_CAP_PCI_CFG which is very slow compared to single register
> accessing
> 2) There's no owner/group/member for MMIO, most of the time, we only need
> a single MMIO device. If we want the owner to manage itself, it seems
> redundant as is implied in all the existing transports (without admin commands).
> Even if we had, it might still suffer from bootstrap issues.
> 3) For live migration, it means the admin commands needs to start from
> duplicating every existing transport specific interface it can give us. One
> example is that we may end up with two interfaces to access virtqueue
> addresses etc. This results in extra complicity and it is actually a full transport
> (driver can just use admin commands to drive the device).
In [1] there is no duplication. The live migration driver never parses
the device context, either while reading or while writing.
Hence there is no code and no complexity in the driver, and no
duplicate work.
Therefore, those admin commands are not meant to drive the guest
device either.

> 4) Admin commands itself may not be capable of doing things like dirty page
> logging, it requires the assistance from the transport
>
The admin command in [1] is capable of dirty page logging.
> 1) Parav's proposal does several couplings: couple basic build blocks (suspend,
> dirty page tracking) with live migration, couple live migration with admin
> commands. 
In which use case do you find dirty page tracking useful without
migration, such that you would like to see it detached from the device
migration flow?
One can always use these commands as-is if they wish.

> 2) It's still not clear that Parav's proposal can work, a lot of corner cases needs
> to be examined
> 
Please let me know which part can be improved in [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead


> >
> > > the bar is only a proxy, doesn't fix anything. and even larger side
> > > channel attacking surface: vf-->pf-->vf
> >
> > In this model there's no pf. BAR belongs to vf itself and you submit
> > commands for the VF through its BAR.
> > Just separate from the pci config space.
> >
> > The whole attacking surface discussion is also puzzling.  We either
> > are or are not discussing confidential computing/TDI.  I couldn't
> > figure it out. This needs a separate thread I think.
> 
> Anyhow, it's not bad to take it into consideration. But we can do it elsewhere for
> sure.
Thanks.
Please have comments in [1].

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  3:51                                         ` Parav Pandit
@ 2023-09-21  4:02                                           ` Jason Wang
  2023-09-21  4:11                                             ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  4:02 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Thu, Sep 21, 2023 at 11:51 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 21, 2023 8:45 AM
> > The main issue I see so far is that
> > you want to couple migration with admin commands but I don't see much
> > advantages to doing this.
> >
> The way I read above comment is, to draw a parallel line: descriptor posting in virtio spec is tied to virtqueues. What is the advantage of it?

Are you saying virtio can't live without admin commands? Again, let's
not shift concepts.

> Well, it is one way to achieve it.
> There may be different way to do all bulk data transfer without admin commands.

Why is a virtqueue the only way to do bulk data transfers? Can't DMA
be initiated in other ways?

Thanks

> What it the advantage of it, please list down them in [1] for the command where you can find alternative.
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead





* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  3:18                                   ` Jason Wang
@ 2023-09-21  4:03                                     ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  4:03 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 8:48 AM


> I'm fine. But TDISP is something that needs to be considered. The earlier we
> realize the possible issue the better.

[1] has considered this in the design.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:01                                 ` Parav Pandit
@ 2023-09-21  4:09                                   ` Jason Wang
  2023-09-21  4:19                                     ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  4:09 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Thu, Sep 21, 2023 at 12:01 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 21, 2023 8:48 AM
>
> > As replied in another thread, the issues for BAR are:
> >
> > 1) Not sure it can have an efficient interface, it would be something like
> > VIRTIO_PCI_CAP_PCI_CFG which is very slow compared to single register
> > accessing
> > 2) There's no owner/group/member for MMIO, most of the time, we only need
> > a single MMIO device. If we want the owner to manage itself, it seems
> > redundant as is implied in all the existing transports (without admin commands).
> > Even if we had, it might still suffer from bootstrap issues.
> > 3) For live migration, it means the admin commands needs to start from
> > duplicating every existing transport specific interface it can give us. One
> > example is that we may end up with two interfaces to access virtqueue
> > addresses etc. This results in extra complicity and it is actually a full transport
> > (driver can just use admin commands to drive the device).
> In [1] there is no duplication. The live migration driver never parses the device context either while reading or write.
> Hence no code and no complexity in driver and no duplicate work.
> Therefore, those admin commands are not to drive the guest device either.

I'm not sure how this is related to the duplication issue.

>
> > 4) Admin commands itself may not be capable of doing things like dirty page
> > logging, it requires the assistance from the transport
> >
> Admin command in [1] is capable of dirty page logging.

In your design, the logging is done via DMA, not the virtqueue.

The only job of the virtqueue is to initiate the DMA. But if DMA can
be initiated via a virtqueue, it can be initiated in other ways too.

>
> > 1) Parav's proposal does several couplings: couple basic build blocks (suspend,
> > dirty page tracking) with live migration, couple live migration with admin
> > commands.
> In which use case you find dirty page tracking useful without migration for which you like to see it detached from device migration flow?

Is it only the dirty page tracking? It's the combination of

1) suspending
2) device states
3) dirty page tracking

Each of those has use cases other than live migration: VM stop,
power management in the VM, profiling and monitoring, failover, etc.

> One can always use these commands if they wish to as_is.
>
> > 2) It's still not clear that Parav's proposal can work, a lot of corner cases needs
> > to be examined
> >
> Please let me know which part can be improved in [1].

I will do that, but it may take time. It's close to the public holiday.

Thanks

>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>
>
> > >
> > > > the bar is only a proxy, doesn't fix anything. and even larger side
> > > > channel attacking surface: vf-->pf-->vf
> > >
> > > In this model there's no pf. BAR belongs to vf itself and you submit
> > > commands for the VF through its BAR.
> > > Just separate from the pci config space.
> > >
> > > The whole attacking surface discussion is also puzzling.  We either
> > > are or are not discussing confidential computing/TDI.  I couldn't
> > > figure it out. This needs a separate thread I think.
> >
> > Anyhow, it's not bad to take it into consideration. But we can do it elsewhere for
> > sure.
> Thanks.
> Please have comments in [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>





* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:02                                           ` Jason Wang
@ 2023-09-21  4:11                                             ` Parav Pandit
  2023-09-21  4:19                                               ` Jason Wang
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  4:11 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 9:32 AM
> 
> On Thu, Sep 21, 2023 at 11:51 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 21, 2023 8:45 AM The main issue I see so
> > > far is that you want to couple migration with admin commands but I
> > > don't see much advantages to doing this.
> > >
> > The way I read above comment is, to draw a parallel line: descriptor posting in
> virtio spec is tied to virtqueues. What is the advantage of it?
> 
> Are you saying virtio can't live without admin commands? Again, let's not shift
> concepts.
>
No, I did not say that.
I just don't see how the functionality proposed in [1] can be
implemented by the _device_ without admin commands, given the
member-device passthrough requirement.

You made the point that you "don't see much advantage with migration
done using admin commands".
What is the advantage of descriptor posting using a virtqueue? It is
simply the way of the virtio spec...
 
> > Well, it is one way to achieve it.
> > There may be different way to do all bulk data transfer without admin
> commands.
> 
> Why is virtqueue the only way to do bulk data transferring? Can't DMA be
> initiated by other-way?
>

Sure, but what is the disadvantage of the existing virtqueue
mechanism, which can do the following:
1. Ability to do DMA
2. Agnostic of DMA for devices that do not want to do DMA
3. Ability to execute multiple commands in parallel
4. A non-blocking interface for the driver that does not require any
   kind of polling

Why invent a new DMA scheme that does all four of these tasks?
First, please list the disadvantages of the admin queue and show that
all four things are achieved using the new DMA interface.
That will help us understand why a new DMA interface is needed.


 
> Thanks
> 
> > What it the advantage of it, please list down them in [1] for the command
> where you can find alternative.
> >
> > [1]
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > vidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:09                                   ` Jason Wang
@ 2023-09-21  4:19                                     ` Parav Pandit
  2023-09-22  3:08                                       ` Jason Wang
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  4:19 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 9:39 AM
> 
> On Thu, Sep 21, 2023 at 12:01 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Thursday, September 21, 2023 8:48 AM
> >
> > > As replied in another thread, the issues for BAR are:
> > >
> > > 1) Not sure it can have an efficient interface, it would be
> > > something like VIRTIO_PCI_CAP_PCI_CFG which is very slow compared to
> > > single register accessing
> > > 2) There's no owner/group/member for MMIO, most of the time, we only
> > > need a single MMIO device. If we want the owner to manage itself, it
> > > seems redundant as is implied in all the existing transports (without admin
> commands).
> > > Even if we had, it might still suffer from bootstrap issues.
> > > 3) For live migration, it means the admin commands needs to start
> > > from duplicating every existing transport specific interface it can
> > > give us. One example is that we may end up with two interfaces to
> > > access virtqueue addresses etc. This results in extra complicity and
> > > it is actually a full transport (driver can just use admin commands to drive
> the device).
> > In [1] there is no duplication. The live migration driver never parses the device
> context either while reading or write.
> > Hence no code and no complexity in driver and no duplicate work.
> > Therefore, those admin commands are not to drive the guest device either.
> 
> I'm not sure how this is related to the duplication issue.
> 
You commented that the admin virtqueue duplicates some things,
and I explained above that it does not.

> >
> > > 4) Admin commands itself may not be capable of doing things like
> > > dirty page logging, it requires the assistance from the transport
> > >
> > Admin command in [1] is capable of dirty page logging.
> 
> In your design, the logging is done via DMA not the virtqueue.
> 
No, it is done via an admin command, not DMA, in [2].

[2] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#m17b09acd8c73d374e98ad84764b315afa94f59c9
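For illustration only (the bitmap format sketched here is hypothetical and not taken from [2]): a driver-side helper decoding a one-bit-per-page dirty bitmap, of the kind a dirty-page-logging admin command could return, might look like:

```python
# Hypothetical helper: decode a dirty-page bitmap (one bit per guest page,
# LSB-first within each byte) into (page_number, guest_physical_address)
# pairs. The format is an assumption for the sketch, not a spec layout.
def dirty_pages(bitmap: bytes, page_size: int = 4096):
    """Yield (page_number, guest_physical_address) for each set bit."""
    for byte_idx, byte in enumerate(bitmap):
        for bit in range(8):
            if byte & (1 << bit):
                page = byte_idx * 8 + bit
                yield page, page * page_size

# Pages 0, 9 and 16 dirty -> bytes 0x01, 0x02, 0x01
result = list(dirty_pages(bytes([0x01, 0x02, 0x01])))
print(result)  # [(0, 0), (9, 36864), (16, 65536)]
```

The migration software would re-copy exactly these pages to the destination; everything else in the bitmap stays untouched.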

> The only job for virtqueue is to initiate the DMA. But if DMA can be initiated via
> virtqueue, it can be done in other ways.
> 
Let's first establish the 4 things above in the alternative way, plus a 5th point: doing so without needing giant registers in the device.

> >
> > > 1) Parav's proposal does several couplings: couple basic build
> > > blocks (suspend, dirty page tracking) with live migration, couple
> > > live migration with admin commands.
> > In which use case you find dirty page tracking useful without migration for
> which you like to see it detached from device migration flow?
> 
> Is it only the dirty page tracking? It's the combinations of
> 
> 1) suspending
> 2) device states
> 3) dirty page tracking
> 
> Each of those will have use cases other than live migration: VM stop, power
> management in VM, profiling and monitoring, failover etc.
> 
Suspend/resume with different power states is driven by the guest directly,
so it may have some overlap.

Device context has no overlap.

Dirty page tracking has no overlap. What do you want to profile and monitor? If you want to profile, it can be used without the migration commands anyway.
If you describe the use case, maybe I can split the "device migration" chapter into two pieces:
device management and device migration.

Device migration will use these basic facilities.
Would that help you?

Those can also be split later, once the actual use case can be described.


> > One can always use these commands if they wish to as_is.
> >
> > > 2) It's still not clear that Parav's proposal can work, a lot of
> > > corner cases needs to be examined
> > >
> > Please let me know which part can be improved in [1].
> 
> I will do that but it may take time. It's near to the public holiday.

I understand, no problem. Take your time.
I will proceed with the v1 enhancements for [1] regardless.


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:11                                             ` Parav Pandit
@ 2023-09-21  4:19                                               ` Jason Wang
  2023-09-21  4:29                                                 ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-21  4:19 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Thu, Sep 21, 2023 at 12:11 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 21, 2023 9:32 AM
> >
> > On Thu, Sep 21, 2023 at 11:51 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Thursday, September 21, 2023 8:45 AM The main issue I see so
> > > > far is that you want to couple migration with admin commands but I
> > > > don't see much advantages to doing this.
> > > >
> > > The way I read above comment is, to draw a parallel line: descriptor posting in
> > virtio spec is tied to virtqueues. What is the advantage of it?
> >
> > Are you saying virtio can't live without admin commands? Again, let's not shift
> > concepts.
> >
> No, I did not say that.
> I just don’t see how functionalities proposed in [1] can be done without admin commands by the _device_ for member device passthrough requirement.
>
> You made point as "don’t see much advantage with migration done using admin commands".

Parav, I think I've clarified several times:

migration using the admin command is probably fine in some use cases.

What's not fine, is:

Mandate the admin command to be the only way for migration.

Are we on the same page for my concern now?

> What is the advantage of descriptor posting using virtqueue. It is the way of virtio spec...
>
> > > Well, it is one way to achieve it.
> > > There may be different way to do all bulk data transfer without admin
> > commands.
> >
> > Why is virtqueue the only way to do bulk data transferring? Can't DMA be
> > initiated by other-way?
> >
>
> Sure, what is the disadvantage of existing mechanism of virtqueue that can do following.
> 1. Ability to do DMA
> 2. agnostic of the DMA who do not want to do DMA

I don't understand this.

> 3. Ability to multiple command executions in parallel

Each device has their self-contained interface, why can't the commands
be executed in parallel.

> 4. Non blocking interface for driver that does not require any kind of polling

Are you saying the interrupt can only work for virtqueue?

>
> Why to invent new DMA scheme which at does all the 4 tasks?

It's simply because admin virtqueue can not work for all the cases. I
think you've agreed on this, no?

> First please list down disadvantages of admin queue + show all 4 things are achieved using new DMA interface.
> That will help to understand why new dma interface is needed.

I can give you a simple example. For example, what happens if we want
to migrate the owner? Having another owner for this owner is not a
good answer.

Thanks


>
>
>
> > Thanks
> >
> > > What it the advantage of it, please list down them in [1] for the command
> > where you can find alternative.
> > >
> > > [1]
> > > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > > vidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  3:26                                     ` Jason Wang
@ 2023-09-21  4:21                                       ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  4:21 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev


> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 8:56 AM


> My understanding is it might be better that each side do a summary of the both
> proposals.

I will summarize it soon in reply to [1].
Thanks.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead

^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:19                                               ` Jason Wang
@ 2023-09-21  4:29                                                 ` Parav Pandit
  2023-09-22  3:13                                                   ` Jason Wang
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  4:29 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Thursday, September 21, 2023 9:50 AM

> Parav, I think I've clarified several times:
> 
> migration using the admin command is probably fine in some use cases.
>
This definitely was not clear to me.
I am 100% clear now.
 
> What's not fine, is:
> 
> Mandate the admin command to be the only way for migration.
>
For sure, my series did not mandate that either.
I kept asking whether we can converge; if we can, it will be really good to merge the two use cases, and we should.
If we cannot because of technical issues, then both methods exist to address the two different use cases.

> Are we on the same page for my concern now?
> 
Yes.

> > What is the advantage of descriptor posting using virtqueue. It is the way of
> virtio spec...
> >
> > > > Well, it is one way to achieve it.
> > > > There may be different way to do all bulk data transfer without
> > > > admin
> > > commands.
> > >
> > > Why is virtqueue the only way to do bulk data transferring? Can't
> > > DMA be initiated by other-way?
> > >
> >
> > Sure, what is the disadvantage of existing mechanism of virtqueue that can do
> following.
> > 1. Ability to do DMA
> > 2. agnostic of the DMA who do not want to do DMA
> 
> I don't understand this.
>
Admin commands can work without DMA, right? Because they are transported using the admin queue.
 
> > 3. Ability to multiple command executions in parallel
> 
> Each device has their self-contained interface, why can't the commands be
> executed in parallel.
> 
Within the device it cannot if the interface is synchronous.

> > 4. Non blocking interface for driver that does not require any kind of
> > polling
> 
> Are you saying the interrupt can only work for virtqueue?
>
No. I am saying that if one has to invent an interface that satisfies the above needs, it will end up becoming a virtqueue.
And if it does not, one should list the disadvantages and cost of the new interface, and explain its benefits.
Such an interface should be a generic one too.
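To make that concrete, here is a toy model (invented field names, not the real split-ring layout) showing how an interface with those properties converges on a ring with producer/consumer indices; the two device-side indices are exactly the last_avail_idx/last_used_idx that the vq-state facility in this series proposes to save and restore:

```python
# Toy ring with the three indices a split virtqueue conceptually tracks:
# the driver's producer index, and the device's fetch and used counters.
class ToyVirtqueue:
    def __init__(self, size=8):
        self.size = size
        self.ring = [None] * size
        self.avail_idx = 0       # driver side: descriptors made available
        self.last_avail_idx = 0  # device side: next available entry to fetch
        self.used_idx = 0        # device side: descriptors marked used

    def driver_post(self, desc):
        self.ring[self.avail_idx % self.size] = desc
        self.avail_idx += 1

    def device_fetch_one(self):
        if self.last_avail_idx == self.avail_idx:
            return None          # nothing available
        desc = self.ring[self.last_avail_idx % self.size]
        self.last_avail_idx += 1
        return desc

    def device_complete_one(self):
        self.used_idx += 1       # descriptor returned as "used"

    def save_state(self):
        # The state a VIRTIO_F_QUEUE_STATE-style accessor would expose.
        return {"last_avail_idx": self.last_avail_idx,
                "last_used_idx": self.used_idx}

    def restore_state(self, state):
        self.last_avail_idx = state["last_avail_idx"]
        self.used_idx = state["last_used_idx"]

vq = ToyVirtqueue()
vq.driver_post("buf-A")
vq.driver_post("buf-B")
vq.device_fetch_one()        # device picked up "buf-A"
vq.device_complete_one()     # ... and completed it
print(vq.save_state())       # {'last_avail_idx': 1, 'last_used_idx': 1}
```

Snapshotting those two indices at a suspend point is what lets a migrated device resume processing from "buf-B" instead of replaying or losing "buf-A".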
 
> >
> > Why to invent new DMA scheme which at does all the 4 tasks?
> 
> It's simply because admin virtqueue can not work for all the cases. I think you've
> agreed on this, no?
>
I think it may work for the nested case as well, at the cost of replicating it on each device and adding special plumbing to isolate it, so that the guest cannot issue driver notifications to it.

> > First please list down disadvantages of admin queue + show all 4 things are
> achieved using new DMA interface.
> > That will help to understand why new dma interface is needed.
> 
> I can give you a simple example. For example, what happens if we want to
> migrate the owner? Having another owner for this owner is not a good answer.

That is nesting; I don't see any difference from other nesting cases.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 11:27                                           ` Zhu, Lingshan
@ 2023-09-21  5:13                                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21  5:13 UTC (permalink / raw)
  To: Zhu, Lingshan; +Cc: Parav Pandit, virtio-dev, Jason Wang

On Wed, Sep 20, 2023 at 07:27:05PM +0800, Zhu, Lingshan wrote:
> What if a malicious SW dump guest memory through admin vq LM facility?

What if malicious SW misconfigures vq through the SUSPEND bit facility?

-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  3:43                                             ` Parav Pandit
@ 2023-09-21  5:41                                               ` Michael S. Tsirkin
  2023-09-21  5:54                                                 ` Parav Pandit
  2023-09-21  9:06                                                   ` [virtio-comment] " Zhu, Lingshan
  0 siblings, 2 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21  5:41 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 03:43:12AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 1:34 AM
> > 
> > On Wed, Sep 20, 2023 at 05:21:52PM +0000, Parav Pandit wrote:
> > > > OK so with this summary in mind, can you find any advantages to
> > > > inband+mediation that are real or do you just see disadvantages? And
> > > > it's a tricky question because I can see some advantages ;)
> > >
> > > inband + mediation may be useful for nested case.
> > 
> > Hint: there's more.
> 
> Can you please list down?
> 
> The starting point of discussion is, there is passthrough member device without mediation in virtio interface layers.
> How shall device migration should work for it?

I was attempting to have each of you see other's point of view.
It seems clear I was right, at least one way communication was
not getting through. Let me try to help.


First, clearly Zhu Lingshan cares about the mediation use-case, not the
un-mediated one.  Mediation is clearly heavier but also more powerful
in many use-cases - is that obvious or do I need to list the reasons?
To mention one example, it supports cross-vendor migration. Which the unmediated
variant maybe can in theory support too, and when it does maybe in a better and
more structured way - but that will require standardization effort that
didn't happen yet. With mediation it was already demonstrated more than
once.

1. For mediation something that works within existing mediation framework -
e.g. reusing as he does feature bits - will require less support
than a completely separate facility.
I think Zhu Lingshan also believes that since there will be less code ->
less security issues.

2. With or without mediation, the mapping of commands to VFs is simpler,
allowing more control - for example, let's say you want to reset a VF -
you do not need to flush the queue of existing commands, which might
potentially take a long time because some other VFs are very busy - you
just reset the VF which any unmap flow will already do.



But Zhu Lingshan, all this will be pointless if you also do not try to
do this and list what are reasonable points that Parav made. Please do
not mistake what I'm doing here for taking sides I just want the
communication to start working. And that means everyone tries to take
all use-cases into account even if working for a vendor that does not
care about this use-case. Otherwise we will just keep getting into these
flamewars.
-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  5:41                                               ` Michael S. Tsirkin
@ 2023-09-21  5:54                                                 ` Parav Pandit
  2023-09-21  6:06                                                   ` Michael S. Tsirkin
  2023-09-21  9:06                                                   ` [virtio-comment] " Zhu, Lingshan
  1 sibling, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  5:54 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

Hi Michael,

> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 11:12 AM

> I was attempting to have each of you see other's point of view.
> It seems clear I was right, at least one way communication was not getting
> through. Let me try to help.
> 
> 
> First, clearly Zhu Lingshan cares about the mediation use-case, not the un-
> mediated one.  Mediation is clearly heavier but also more powerful in many
> use-cases - is that obvious or do I need to list the reasons?

I agree with that.

> To mention one example, it supports cross-vendor migration. Which the
> unmediated variant maybe can in theory support too, and when it does maybe
> in a better and more structured way - but that will require standartization effort
> that didn't happen yet. With mediation it was already demonstrated more than
> once.
>
We should be enhancing the device context so that more and more items can be annotated.
I started small to get the design and idea through, and I will expand the device context so that cross-vendor migration is possible.

> 1. For mediation something that works within existing mediation framework -
> e.g. reusing as he does feature bits - will require less support than a completely
> separate facility.
> I think Zhu Lingshan also believes that since there will be less code -> less
> security issues.
>
With the approach of [1], there is less code in the core device migration flow because none of those fields etc. are parsed/read/written by the driver software.
 
> 2. With or without mediation, the mapping of commands to VFs is simpler,
> allowing more control - for example, let's say you want to reset a VF - you do not
> need to flush the queue of existing commands, which might potentially take a
> long time because some other VFs are very busy - you just reset the VF which
> any unmap flow will already do.
> 
If I understand you right, to reset a VF there is no need to flush the queues in the non-mediated case either.
Just do a VF FLR or a device reset; both will be fine.

> 
> 
> But Zhu Lingshan, all this will be pointless if you also do not try to do this and list
> what are reasonable points that Parav made. Please do not mistake what I'm
> doing here for taking sides I just want the communication to start working. And
> that means everyone tries to take all use-cases into account even if working for
> a vendor that does not care about this use-case. Otherwise we will just keep
> getting into these flamewars.

Right. I'd like to take this opportunity to ask again: let's sit together and see if we can utilize a common framework between the two methods.
For example,
1. the device context, with and without mediation, can provide common structures that both solutions can use
2. device provisioning (not in either of our series) can find common ways

Maybe more can be merged once we are open to collaborating.
If we face technical issues in unifying the methods, that will itself explain why both methods need to exist, or why something different must be created, for the two different use cases.



^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  5:54                                                 ` Parav Pandit
@ 2023-09-21  6:06                                                   ` Michael S. Tsirkin
  2023-09-21  6:31                                                     ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21  6:06 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 05:54:37AM +0000, Parav Pandit wrote:
> Hi Michael,
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 11:12 AM
> 
> > I was attempting to have each of you see other's point of view.
> > It seems clear I was right, at least one way communication was not getting
> > through. Let me try to help.
> > 
> > 
> > First, clearly Zhu Lingshan cares about the mediation use-case, not the un-
> > mediated one.  Mediation is clearly heavier but also more powerful in many
> > use-cases - is that obvious or do I need to list the reasons?
> 
> I agree to it.
> 
> > To mention one example, it supports cross-vendor migration. Which the
> > unmediated variant maybe can in theory support too, and when it does maybe
> > in a better and more structured way - but that will require standartization effort
> > that didn't happen yet. With mediation it was already demonstrated more than
> > once.
> >
> We should be enhancing the device context so that more and more items can be annotated.
> I started with small to get the design and idea through and will expand the device context so that cross vendors can migrate.

As I said, doable without mediation but already done with.


> > 1. For mediation something that works within existing mediation framework -
> > e.g. reusing as he does feature bits - will require less support than a completely
> > separate facility.
> > I think Zhu Lingshan also believes that since there will be less code -> less
> > security issues.
> >
> With approach of [1], there is less code in the core device migration flow because none of those fields etc are parsed/read/written by the driver software.

What is or is not executed in a specific flow is a separate question.
But the point is vdpa and any mediation have to talk virtio things
such as feature bits. So reusing e.g. feature bits needs less code
than operating the admin command machinery to check what is
supported. Yes, you can operate this machinery during setup
and not during migration itself. It's still less code to maintain.


> > 2. With or without mediation, the mapping of commands to VFs is simpler,
> > allowing more control - for example, let's say you want to reset a VF - you do not
> > need to flush the queue of existing commands, which might potentially take a
> > long time because some other VFs are very busy - you just reset the VF which
> > any unmap flow will already do.
> > 
> If I understand you right, to reset a VF, no need to flush the queues without mediation too.
> Just do VF FLR or do device reset, both will be fine.

Not to reset the VF - that's a narrow definition. To put it back in its
original state unrelated to any VMs.  Nope, FLR is not enough - there
could be commands in queue addressing the VF. If they take effect after
FLR VF state changed and it can't be cleanly assigned to a new VM.
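A purely illustrative sketch of the ordering hazard described above (nothing here models real device behavior): if a command still queued for the VF executes after the FLR, the freshly reset VF is no longer pristine when it is handed to the next VM.

```python
# Toy model: an in-flight command targeting a VF races with VF FLR.
# Draining the queue before the FLR is what "flush the queue" means here.
class ToyVF:
    def __init__(self):
        self.state = "pristine"

def run(flush_before_flr: bool) -> str:
    vf = ToyVF()
    # One command addressing this VF is still sitting in the queue.
    queue = [lambda: setattr(vf, "state", "migration-dirty")]
    if flush_before_flr:
        while queue:
            queue.pop(0)()       # drain pending commands first
    vf.state = "pristine"        # VF FLR: device-local reset of the VF
    while queue:                 # anything still queued lands after FLR
        queue.pop(0)()
    return vf.state

print(run(flush_before_flr=False))  # migration-dirty  (stale command won)
print(run(flush_before_flr=True))   # pristine
```

The same reasoning motivates an explicit discard/flush step for pending per-VF commands before reassigning the VF.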


> > 
> > 
> > But Zhu Lingshan, all this will be pointless if you also do not try to do this and list
> > what are reasonable points that Parav made. Please do not mistake what I'm
> > doing here for taking sides I just want the communication to start working. And
> > that means everyone tries to take all use-cases into account even if working for
> > a vendor that does not care about this use-case. Otherwise we will just keep
> > getting into these flamewars.
> 
> Right. I like to take this opportunity to ask again, lets sit together and see if we can utilize common framework between two methods.
> For example,
> 1. device context for mediation and without mediation can provide common structures that both solutions can use
> 2. device provisioning (not in any of our series, that can find common ways)
> 
> May be more can be merged once we are open to collaborate.
> If we face technical issues in unifying the methods, it will be self-explained why both methods to exist or create something different for two different use cases.


-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  6:06                                                   ` Michael S. Tsirkin
@ 2023-09-21  6:31                                                     ` Parav Pandit
  2023-09-21  7:20                                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  6:31 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 11:37 AM

> > We should be enhancing the device context so that more and more items can
> be annotated.
> > I started with small to get the design and idea through and will expand the
> device context so that cross vendors can migrate.
> 
> As I said, doable without mediation but already done with.
>
It is not done in the virtio spec that can work without mediation right?
 
> 
> > > 1. For mediation something that works within existing mediation
> > > framework - e.g. reusing as he does feature bits - will require less
> > > support than a completely separate facility.
> > > I think Zhu Lingshan also believes that since there will be less
> > > code -> less security issues.
> > >
> > With approach of [1], there is less code in the core device migration flow
> because none of those fields etc are parsed/read/written by the driver
> software.
> 
> What is or is not executed in a specific flow is a separate question.
> But the point is vdpa and any mediation have to talk virtio things such as feature
> bits. So reusing e.g. feature bits needs less code than operating the admin
> command machinery to check what is supported. Yes, you can operate this
> machinery during setup and not during migration itself. It's still less code to
> maintain.
>
I wouldn't go down the path of code comparison.
But if you want to: we can take a concrete example of what is done by a similar device that uses the admin command approach.
The admin-command-based migration driver is likely 10x smaller than the actual driver driving the feature bits and the rest of the config.
If one needs more precise numbers for lines of code, I can derive them.

As features and functionality grow, every line of code gets added in the mediation layer too.
I agree such mediation has value and a use case; as we know, it is not the only approach and does not fit all use cases.

> 
> > > 2. With or without mediation, the mapping of commands to VFs is
> > > simpler, allowing more control - for example, let's say you want to
> > > reset a VF - you do not need to flush the queue of existing
> > > commands, which might potentially take a long time because some
> > > other VFs are very busy - you just reset the VF which any unmap flow will
> already do.
> > >
> > If I understand you right, to reset a VF, no need to flush the queues without
> mediation too.
> > Just do VF FLR or do device reset, both will be fine.
> 
> Not to reset the VF - that's a narrow definition. To put it back in its original state
> unrelated to any VMs.  Nope, FLR is not enough - there could be commands in
> queue addressing the VF. If they take effect after FLR VF state changed and it
> can't be cleanly assigned to a new VM.
> 
If you mean the SR-PCIM commands, then it is up to the virtio spec to define them.
We recently ratified in the PCI-SIG what gets cleared on VF FLR and what remains for the SR-PCIM interface to handle.

So if other commands are in the pipe, such a VF will be assigned to a new VM only after they are done.

This aspect is also covered in the proposal [2], in the DISCARD command and in stopping dirty page tracking (aka stop write reporting).

[2] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa

To me, frankly, the two methods address two slightly different requirements, so the comparison may not be worthwhile.
It is not either/or.
And so far, my technical understanding is that each method has its pros, cons, and limitations, due to which the two requirements (passthrough and nested) cannot be addressed uniformly.
Each has its space for the solution it offers.

At best, the two methods can find some common ground of commands or plumbing to reuse, which would be great.
If my understanding is wrong, I would like to learn and discuss.



^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  6:31                                                     ` Parav Pandit
@ 2023-09-21  7:20                                                       ` Michael S. Tsirkin
  2023-09-21  7:53                                                         ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21  7:20 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 06:31:01AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 11:37 AM
> 
> > > We should be enhancing the device context so that more and more items can
> > be annotated.
> > > I started with small to get the design and idea through and will expand the
> > device context so that cross vendors can migrate.
> > 
> > As I said, doable without mediation but already done with.
> >
> It is not done in the virtio spec that can work without mediation right?

have trouble parsing this sentence

> > 
> > > > 1. For mediation something that works within existing mediation
> > > > framework - e.g. reusing as he does feature bits - will require less
> > > > support than a completely separate facility.
> > > > I think Zhu Lingshan also believes that since there will be less
> > > > code -> less security issues.
> > > >
> > > With approach of [1], there is less code in the core device migration flow
> > because none of those fields etc are parsed/read/written by the driver
> > software.
> > 
> > What is or is not executed in a specific flow is a separate question.
> > But the point is vdpa and any mediation have to talk virtio things such as feature
> > bits. So reusing e.g. feature bits needs less code than operating the admin
> > command machinery to check what is supported. Yes, you can operate this
> > machinery during setup and not during migration itself. It's still less code to
> > maintain.
> >
> I wouldn't go down the path of code comparison.
> But if you want to: we can take a concrete example of what is done by similar device who uses admin command approach.
> The admin command-based approach migration driver is likely 10x smaller than the actual driver driving the feature bits and rest of the config.

Yes, but a mediation driver already has to handle feature bits. So if you are doing
mediation, then the cost of adding this specific extension is low.

> If one needs more precise numbers of number of lines of code, I can derive it.
> As features and functionality grows, every line of code gets added there in mediation too.
> I agree such mediation has value and use case, as we know it is not the only approach fitting all use cases.

Do you see how this extension is easier for mediation than driving the
admin queue, though?

> > 
> > > > 2. With or without mediation, the mapping of commands to VFs is
> > > > simpler, allowing more control - for example, let's say you want to
> > > > reset a VF - you do not need to flush the queue of existing
> > > > commands, which might potentially take a long time because some
> > > > other VFs are very busy - you just reset the VF which any unmap flow will
> > already do.
> > > >
> > > If I understand you right, to reset a VF, no need to flush the queues without
> > mediation too.
> > > Just do VF FLR or do device reset, both will be fine.
> > 
> > Not to reset the VF - that's a narrow definition. To put it back in its original state
> > unrelated to any VMs.  Nope, FLR is not enough - there could be commands in
> > queue addressing the VF. If they take effect after FLR VF state changed and it
> > can't be cleanly assigned to a new VM.
> > 
> If you mean the SR-PCIM command, than it is virtio spec to define them.
> We ratified this recently in the PCI-SIG of what gets cleared on VF FLR and what stays that SR-PCIM interface to do.

Interesting. Which ECN exactly do you refer to?

> So if other commands are in pipe, only after they are done, such VF will be assigned to new VM.

Exactly. This is what I meant when I said "flush the queue":
you have to wait until these commands are done, then do the reset.

> This aspect is also covered in the proposal [2] in the DISCARD command and stop dirty page tracking (aka stop write reporting).
> 
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
> 
> To me, frankly, both methods are addressing two slightly different requirement due to which it may not be worth the comparison.
> It is not either or.
> And so far, technically what I understand is, both methods has its pros, cons and limitations due to which both requirements (passthrough and nest) cannot be addressed uniformly.
> Both has its space for the solution it offers.
> At best, two methods can find some common ground of commands or plumbing to reuse, it will be great.
> If I am understanding is wrong, I would like to learn and discuss.

Okay, now that's progress. What would be great next is a review of this
proposal that does not say "just use my patch instead, it does the same"
but instead says "here is how you can reuse plumbing from my patch and
it will address most of your use-case".

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  7:20                                                       ` Michael S. Tsirkin
@ 2023-09-21  7:53                                                         ` Parav Pandit
  2023-09-21  8:11                                                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  7:53 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 12:51 PM

> > > As I said, doable without mediation but already done with.
> > >
> > It is not done in the virtio spec that can work without mediation right?
>
> have trouble parsing this sentence

I mean to say:
the virtio spec has not yet achieved mediation-less device migration,
nor has it achieved device migration using mediation.
The two proposals are trying to do it.

> 
> > >
> > > > > 1. For mediation something that works within existing mediation
> > > > > framework - e.g. reusing as he does feature bits - will require
> > > > > less support than a completely separate facility.
> > > > > I think Zhu Lingshan also believes that since there will be less
> > > > > code -> less security issues.
> > > > >
> > > > With approach of [1], there is less code in the core device
> > > > migration flow
> > > because none of those fields etc are parsed/read/written by the
> > > driver software.
> > >
> > > What is or is not executed in a specific flow is a separate question.
> > > But the point is vdpa and any mediation have to talk virtio things
> > > such as feature bits. So reusing e.g. feature bits needs less code
> > > than operating the admin command machinery to check what is
> > > supported. Yes, you can operate this machinery during setup and not
> > > during migration itself. It's still less code to maintain.
> > >
> > I wouldn't go down the path of code comparison.
> > But if you want to: we can take a concrete example of what is done by similar
> device who uses admin command approach.
> > The admin command-based approach migration driver is likely 10x smaller
> than the actual driver driving the feature bits and rest of the config.
> 
> yes but mediation driver already has to do feature bits. so if doing mediation
> then the cost of adding this specific extension is low.
>
I thought you were first counting the cost of the code, not the spec, in your point that "feature bits need less code than operating" the admin machinery.
 
> > If one needs more precise numbers of number of lines of code, I can derive it.
> > As features and functionality grows, every line of code gets added there in
> mediation too.
> > I agree such mediation has value and use case, as we know it is not the only
> approach fitting all use cases.
> 
> Do you see how this extension is easier for mediation than driving admin queue
> though?
>
If we count the total cost of the code, including building the mediation framework plus the extensions, then it is not.
But as I said, I wouldn't compare the two solutions, as they address slightly different requirements.

What to compare is what can be reused between the two solutions.
 
> > >
> > > > > 2. With or without mediation, the mapping of commands to VFs is
> > > > > simpler, allowing more control - for example, let's say you want
> > > > > to reset a VF - you do not need to flush the queue of existing
> > > > > commands, which might potentially take a long time because some
> > > > > other VFs are very busy - you just reset the VF which any unmap
> > > > > flow will
> > > already do.
> > > > >
> > > > If I understand you right, to reset a VF, no need to flush the
> > > > queues without
> > > mediation too.
> > > > Just do VF FLR or do device reset, both will be fine.
> > >
> > > Not to reset the VF - that's a narrow definition. To put it back in
> > > its original state unrelated to any VMs.  Nope, FLR is not enough -
> > > there could be commands in queue addressing the VF. If they take
> > > effect after FLR VF state changed and it can't be cleanly assigned to a new
> VM.
> > >
> > If you mean the SR-PCIM command, than it is virtio spec to define them.
> > We ratified this recently in the PCI-SIG of what gets cleared on VF FLR and
> what stays that SR-PCIM interface to do.
> 
> Interesting. Which ECN exactly do you refer to?
> 
B405.

> > So if other commands are in pipe, only after they are done, such VF will be
> assigned to new VM.
> 
> Exactly. this is exactly what I meant when I said "flush the queue" - you have to
> wait until these commands are done, then do reset.
>
Not exactly. VF reset is fully controlled by the guest; hence it does not collide with the admin side commands.
Likewise, the admin commands for dirty page tracking and device context do not interfere with FLR.
This is critical because VF FLR cannot clear the page addresses already reported but not yet read by the driver.
It is covered in the admin proposal.
 
> > This aspect is also covered in the proposal [2] in the DISCARD command and
> stop dirty page tracking (aka stop write reporting).
> >
> > [1]
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > vidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
> >
> > To me, frankly, both methods are addressing two slightly different requirement
> due to which it may not be worth the comparison.
> > It is not either or.
> > And so far, technically what I understand is, both methods has its pros, cons
> and limitations due to which both requirements (passthrough and nest) cannot
> be addressed uniformly.
> > Both has its space for the solution it offers.
> > At best, two methods can find some common ground of commands or
> plumbing to reuse, it will be great.
> > If I am understanding is wrong, I would like to learn and discuss.
> 
> okay now that's progress. what it would be great next if there was a review of
> this proposal that does not say "just use my patch instead it does the same" but
> instead says "here is how you can reuse plumbing from my patch and it will
> address most of your use-case".

I really didn't say that, but it is easy to read it that way.
I assume your comment applies to both proposals.

This is what I proposed to Lingshan: review the dirty page tracking and device context commands from the admin proposal and see whether they are useful in his v1.
He didn't explain why he cannot use them.

Lingshan,
Can we please start fresh, review the requirements of both modes (passthrough, mediation), and see what we can converge on from both proposals (before sending vX of each series)?
This will help us unify the plumbing, where possible, in both areas.





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  7:53                                                         ` Parav Pandit
@ 2023-09-21  8:11                                                           ` Michael S. Tsirkin
  2023-09-21  9:17                                                             ` Parav Pandit
  0 siblings, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21  8:11 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 07:53:18AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 12:51 PM
> 
> > > > As I said, doable without mediation but already done with.
> > > >
> > > It is not done in the virtio spec that can work without mediation right?
> >
> > have trouble parsing this sentence
> 
> I mean to say,
> Virtio spec has not achieved mediation less, device migration.
> Virtio spec has not achieved device migration using mediation.

But yes it has - it was implemented with shadow vq.

> And two proposals are trying to do it.
> 
> > 
> > > >
> > > > > > 1. For mediation something that works within existing mediation
> > > > > > framework - e.g. reusing as he does feature bits - will require
> > > > > > less support than a completely separate facility.
> > > > > > I think Zhu Lingshan also believes that since there will be less
> > > > > > code -> less security issues.
> > > > > >
> > > > > With approach of [1], there is less code in the core device
> > > > > migration flow
> > > > because none of those fields etc are parsed/read/written by the
> > > > driver software.
> > > >
> > > > What is or is not executed in a specific flow is a separate question.
> > > > But the point is vdpa and any mediation have to talk virtio things
> > > > such as feature bits. So reusing e.g. feature bits needs less code
> > > > than operating the admin command machinery to check what is
> > > > supported. Yes, you can operate this machinery during setup and not
> > > > during migration itself. It's still less code to maintain.
> > > >
> > > I wouldn't go down the path of code comparison.
> > > But if you want to: we can take a concrete example of what is done by similar
> > device who uses admin command approach.
> > > The admin command-based approach migration driver is likely 10x smaller
> > than the actual driver driving the feature bits and rest of the config.
> > 
> > yes but mediation driver already has to do feature bits. so if doing mediation
> > then the cost of adding this specific extension is low.
> >
> I thought first you were counting the cost of the code and not the spec in your point "feature bits needs less code than operating".

yes - with vdpa it's mostly just
	vdev->status |= SUSPEND
	vdev->status &= ~SUSPEND
all over the place.

> > > If one needs more precise numbers of number of lines of code, I can derive it.
> > > As features and functionality grows, every line of code gets added there in
> > mediation too.
> > > I agree such mediation has value and use case, as we know it is not the only
> > approach fitting all use cases.
> > 
> > Do you see how this extension is easier for mediation than driving admin queue
> > though?
> >
> If we count the total cost of code than building the mediation framework + extensions, than it is not.
> But as I said, I wouldn't compare two solutions as they are addressing a slightly different requirement.

Yes, they are. The point of the comparison was explaining why people who
use mediation anyway might not want to also use the AQ. Can I assume
that's clear?

> What to compare is what can be reused between two solutions.
>  
> > > >
> > > > > > 2. With or without mediation, the mapping of commands to VFs is
> > > > > > simpler, allowing more control - for example, let's say you want
> > > > > > to reset a VF - you do not need to flush the queue of existing
> > > > > > commands, which might potentially take a long time because some
> > > > > > other VFs are very busy - you just reset the VF which any unmap
> > > > > > flow will
> > > > already do.
> > > > > >
> > > > > If I understand you right, to reset a VF, no need to flush the
> > > > > queues without
> > > > mediation too.
> > > > > Just do VF FLR or do device reset, both will be fine.
> > > >
> > > > Not to reset the VF - that's a narrow definition. To put it back in
> > > > its original state unrelated to any VMs.  Nope, FLR is not enough -
> > > > there could be commands in queue addressing the VF. If they take
> > > > effect after FLR VF state changed and it can't be cleanly assigned to a new
> > VM.
> > > >
> > > If you mean the SR-PCIM command, than it is virtio spec to define them.
> > > We ratified this recently in the PCI-SIG of what gets cleared on VF FLR and
> > what stays that SR-PCIM interface to do.
> > 
> > Interesting. Which ECN exactly do you refer to?
> > 
> B405.
> 
> > > So if other commands are in pipe, only after they are done, such VF will be
> > assigned to new VM.
> > 
> > Exactly. this is exactly what I meant when I said "flush the queue" - you have to
> > wait until these commands are done, then do reset.
> >
> Not exactly. VF reset is fully controlled by the guest.
> Hence, it does not collide with admin side of commands,

No, the host does VF reset in a number of situations, including
guest restart, guest shutdown, etc.

> Same admin command for dirty page tracking and device context do not mess with FLR.
> This is the critical because VF FLR cannot clear the page addresses already reported and not yet read by the driver.
> It is covered in admin proposal.

They might not mess with FLR in the sense that you can still do an FLR,
but they will mess with what hosts commonly try to achieve with FLR,
which is getting a clean VF that has nothing to do with a given guest's
state. For example:
- queue stop command on PF
- FLR on VF
- stop command is seen by PF

will get you a VF that is not running and cannot be given to another
guest. You have to do:
- queue stop command on PF
- stop command is seen by PF
- FLR on VF
in this order.

> > > This aspect is also covered in the proposal [2] in the DISCARD command and
> > stop dirty page tracking (aka stop write reporting).
> > >
> > > [1]
> > > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > > vidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
> > >
> > > To me, frankly, both methods are addressing two slightly different requirement
> > due to which it may not be worth the comparison.
> > > It is not either or.
> > > And so far, technically what I understand is, both methods has its pros, cons
> > and limitations due to which both requirements (passthrough and nest) cannot
> > be addressed uniformly.
> > > Both has its space for the solution it offers.
> > > At best, two methods can find some common ground of commands or
> > plumbing to reuse, it will be great.
> > > If I am understanding is wrong, I would like to learn and discuss.
> > 
> > okay now that's progress. what it would be great next if there was a review of
> > this proposal that does not say "just use my patch instead it does the same" but
> > instead says "here is how you can reuse plumbing from my patch and it will
> > address most of your use-case".
> 
> I really didn't say it, but it is easy to miss out or interpret that.
> I assume your comment applies to both the proposals.
> 
> This is what I proposed to Lingshan to review dirty page tracking and device context from admin proposal if we find it useful in his v1.
> He didn't explain why he cannot use it.
> 
> Lingshan,
> Can we please start fresh to review requirements of both the modes (passthrough, mediation) and see what all we can converge to from both the proposals?
> (before sending vX of each series).
> This will help us to unify the plumbing if possible, on both areas.





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  5:41                                               ` Michael S. Tsirkin
@ 2023-09-21  9:06                                                   ` Zhu, Lingshan
  2023-09-21  9:06                                                   ` [virtio-comment] " Zhu, Lingshan
  1 sibling, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-21  9:06 UTC (permalink / raw)
  To: Michael S. Tsirkin, Parav Pandit, eperezma, Cornelia Huck,
	Stefan Hajnoczi, Stefano Garzarella, Jason Wang
  Cc: virtio-dev, virtio-comment



On 9/21/2023 1:41 PM, Michael S. Tsirkin wrote:
> On Thu, Sep 21, 2023 at 03:43:12AM +0000, Parav Pandit wrote:
>>
>>> From: Michael S. Tsirkin <mst@redhat.com>
>>> Sent: Thursday, September 21, 2023 1:34 AM
>>>
>>> On Wed, Sep 20, 2023 at 05:21:52PM +0000, Parav Pandit wrote:
>>>>> OK so with this summary in mind, can you find any advantages to
>>>>> inband+mediation that are real or do you just see disadvantages? And
>>>>> it's a tricky question because I can see some advantages ;)
>>>> inband + mediation may be useful for nested case.
>>> Hint: there's more.
>> Can you please list down?
>>
>> The starting point of discussion is, there is passthrough member device without mediation in virtio interface layers.
>> How shall device migration should work for it?
> I was attempting to have each of you see other's point of view.
> It seems clear I was right, at least one way communication was
> not getting through. Let me try to help.
>
>
> First, clearly Zhu Lingshan cares about the mediation use-case, not the
> un-mediated one.  Mediation is clearly heavier but also more powerful
> in many use-cases - is that obvious or do I need to list the reasons?
> To mention one example, it supports cross-vendor migration. Which the unmediated
> variant maybe can in theory support too, and when it does maybe in a better and
> more structured way - but that will require standardization effort that
> didn't happen yet. With mediation it was already demonstrated more than
> once.
>
> 1. For mediation something that works within existing mediation framework -
> e.g. reusing as he does feature bits - will require less support
> than a completely separate facility.
> I think Zhu Lingshan also believes that since there will be less code ->
> less security issues.
>
> 2. With or without mediation, the mapping of commands to VFs is simpler,
> allowing more control - for example, let's say you want to reset a VF -
> you do not need to flush the queue of existing commands, which might
> potentially take a long time because some other VFs are very busy - you
> just reset the VF which any unmap flow will already do.
>
>
Thanks, I agree
> But Zhu Lingshan, all this will be pointless if you also do not try to
> do this and list what are reasonable points that Parav made. Please do
> not mistake what I'm doing here for taking sides I just want the
> communication to start working. And that means everyone tries to take
> all use-cases into account even if working for a vendor that does not
> care about this use-case. Otherwise we will just keep getting into these
> flamewars.
I think admin vq live migration surely works for some scenarios and can
meet specific customers' requirements; those use cases are reasonable
for sure.

Jason, Eugenio and me, et al. spent a lot of effort on this live
migration proposal over the past two years; this series is based on that
joint work, directly carrying on the previous series sent by Jason and
Eugenio.

This series introduces basic facilities for live migration, and the
implementation is transport specific.

I agree we should cooperate; at least the basic facilities can be used
by the admin vq, for example the dirty page tracking facility, and the
suspend command could even be forwarded to the device status.

Thanks,
Zhu Lingshan






* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  8:11                                                           ` Michael S. Tsirkin
@ 2023-09-21  9:17                                                             ` Parav Pandit
  2023-09-21 10:01                                                               ` Michael S. Tsirkin
  2023-09-21 10:09                                                               ` Michael S. Tsirkin
  0 siblings, 2 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  9:17 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang



> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 1:41 PM

> > I mean to say,
> > Virtio spec has not achieved mediation less, device migration.
> > Virtio spec has not achieved device migration using mediation.
> 
> But yes it has - it was implemented with shadow vq.
>
Shadow vq plus several other trap points on the config space, the cvq and more.

We cannot suspend the whole device and resume from where it left off;
those extensions are happening now.

> > And two proposals are trying to do it.
> >
> > >
> > > > >
> > > > > > > 1. For mediation something that works within existing
> > > > > > > mediation framework - e.g. reusing as he does feature bits -
> > > > > > > will require less support than a completely separate facility.
> > > > > > > I think Zhu Lingshan also believes that since there will be
> > > > > > > less code -> less security issues.
> > > > > > >
> > > > > > With approach of [1], there is less code in the core device
> > > > > > migration flow
> > > > > because none of those fields etc are parsed/read/written by the
> > > > > driver software.
> > > > >
> > > > > What is or is not executed in a specific flow is a separate question.
> > > > > But the point is vdpa and any mediation have to talk virtio
> > > > > things such as feature bits. So reusing e.g. feature bits needs
> > > > > less code than operating the admin command machinery to check
> > > > > what is supported. Yes, you can operate this machinery during
> > > > > setup and not during migration itself. It's still less code to maintain.
> > > > >
> > > > I wouldn't go down the path of code comparison.
> > > > But if you want to: we can take a concrete example of what is done
> > > > by similar
> > > device who uses admin command approach.
> > > > The admin command-based approach migration driver is likely 10x
> > > > smaller
> > > than the actual driver driving the feature bits and rest of the config.
> > >
> > > yes but mediation driver already has to do feature bits. so if doing
> > > mediation then the cost of adding this specific extension is low.
> > >
> > I thought first you were counting the cost of the code and not the spec in your
> point "feature bits needs less code than operating".
> 
> yes - with vdpa it's mostly just
> 	vdev->status |= SUSPEND
> 	vdev->status &= ~SUSPEND
> all over the place.
> 
+ inflight descriptors.

> > > > If one needs more precise numbers of number of lines of code, I can
> derive it.
> > > > As features and functionality grows, every line of code gets added
> > > > there in
> > > mediation too.
> > > > I agree such mediation has value and use case, as we know it is
> > > > not the only
> > > approach fitting all use cases.
> > >
> > > Do you see how this extension is easier for mediation than driving
> > > admin queue though?
> > >
> > If we count the total cost of code than building the mediation framework +
> extensions, than it is not.
> > But as I said, I wouldn't compare two solutions as they are addressing a slightly
> different requirement.
> 
> yes they are. the point of comparison was explaining why people who use
> mediation anyway might not want to also use aq. can i assume that's clear?
>
I am not fully sure.
Frankly, I don't find it right for the member virtio device itself to be mediated.
The vDPA stack makes total sense when the underlying device is not virtio and hence needs emulation.
But when there is a native virtio member device, further mediation is overkill for certain scenarios.
I understand that it helps to utilize the vDPA stack and thereby overcome some limitations, while introducing other limitations...

> > What to compare is what can be reused between two solutions.
> >
> > > > >
> > > > > > > 2. With or without mediation, the mapping of commands to VFs
> > > > > > > is simpler, allowing more control - for example, let's say
> > > > > > > you want to reset a VF - you do not need to flush the queue
> > > > > > > of existing commands, which might potentially take a long
> > > > > > > time because some other VFs are very busy - you just reset
> > > > > > > the VF which any unmap flow will
> > > > > already do.
> > > > > > >
> > > > > > If I understand you right, to reset a VF, no need to flush the
> > > > > > queues without
> > > > > mediation too.
> > > > > > Just do VF FLR or do device reset, both will be fine.
> > > > >
> > > > > Not to reset the VF - that's a narrow definition. To put it back
> > > > > in its original state unrelated to any VMs.  Nope, FLR is not
> > > > > enough - there could be commands in queue addressing the VF. If
> > > > > they take effect after FLR VF state changed and it can't be
> > > > > cleanly assigned to a new
> > > VM.
> > > > >
> > > > If you mean the SR-PCIM command, than it is virtio spec to define them.
> > > > We ratified this recently in the PCI-SIG of what gets cleared on
> > > > VF FLR and
> > > what stays that SR-PCIM interface to do.
> > >
> > > Interesting. Which ECN exactly do you refer to?
> > >
> > B405.
> >
> > > > So if other commands are in pipe, only after they are done, such
> > > > VF will be
> > > assigned to new VM.
> > >
> > > Exactly. this is exactly what I meant when I said "flush the queue"
> > > - you have to wait until these commands are done, then do reset.
> > >
> > Not exactly. VF reset is fully controlled by the guest.
> > Hence, it does not collide with admin side of commands,
> 
> No, host does VF reset in a number of situations, including guest restart, guest
> shutdown, etc.
> 
That is fine; it can do that.

> > Same admin command for dirty page tracking and device context do not mess
> with FLR.
> > This is the critical because VF FLR cannot clear the page addresses already
> reported and not yet read by the driver.
> > It is covered in admin proposal.
> 
> They might not mess with it in that you can still do FLR but they will mess with
> what hosts commonly try to achieve with FLR which is getting a clean VF that
> has nothing to do with a given guest state.
> For example
> - queue stop command on PF
> - flr on VF
> - stop command is seen by PF
>
This is just fine, because FLR does not reset what the PF has done.
The PF is not touching the FLR side of things either.
 
> will get you a VF that is not running and can not be given to another guest. You
> have to
> - queue stop command on PF
> - stop command is seen by PF
> - flr on VF
> in this order.
> 
Since you are discussing the admin patches of [1], it is better to discuss this there.
But even if we follow the sequence you described, it is also fine.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-20 13:41                                     ` Parav Pandit
@ 2023-09-21  9:18                                         ` Zhu, Lingshan
  2023-09-20 14:16                                       ` Michael S. Tsirkin
  2023-09-21  9:18                                         ` [virtio-comment] " Zhu, Lingshan
  2 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-21  9:18 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin, eperezma, Stefan Hajnoczi,
	Cornelia Huck, Jason Wang
  Cc: virtio-dev, virtio-comment



On 9/20/2023 9:41 PM, Parav Pandit wrote:
>> From: Michael S. Tsirkin <mst@redhat.com>
>> Sent: Wednesday, September 20, 2023 6:12 PM
>> And Parav same goes for you - can you summarize Zhu Lingshan's position?
> Below is my summary about Zhu Lingshan's position:
>
> One line summary of his position in my view:
>
> 0. Use inband device migration only, use mediation, mediation is secure, but AQ is not secure.
>
> Details of his position in my view:
>
> 1. Device migration must be done through VF itself by suspending specific vqs and the VF device both.
Not exactly: my series implements basic facilities for live migration, 
which the admin vq solution can reuse for sure. The admin vq solution 
can work for some use cases, but for others you still need to resolve 
the issues we discussed before.
> 2. When device migration is done using #1, it must be done using mediation approach in hypervisor.
For the fundamentals of virtualization, it is trap-and-emulate; I think 
Jason has told you this many times.
>
> 3. When migration is done using inband mediation it is more secure than AQ approach.
> (as opposed to AQ of the owner device who enables/disables SR-IOV).
The VF owns it and the hypervisor owns the VF, so there is no side channel.
>
> 4. AQ is not secure.
> But,
We have discussed this so many times....
> 5. AQ and admin commands can be built on top of his proposal #1, even if AQ is less secure. Opposing statements...
The security leaks and attack surface are introduced by the AQ, not by 
the basic facilities.
>
> 6. Dirty page tracking and inflight descriptor tracking are to be done in his v1, but he does not want to review such coverage in [1].
This will be done in V2, and they are still a config space solution, with 
the help of the hypervisor.
>
> 8. Since his series does not cover any device context migration and does not say anything about it,
> I deduce that he plans to use the cvq for setting up RSS and other fields using the inband CVQ of the VF.
> This further limits the solution to only the net device, ignoring the rest of the 20+ device types, which may not all have a CVQ.
Any difference from current vhost solution?
>
> 9. Trapping and emulation of the following objects in the hypervisor is secure: AQ, CVQ, virtio config space, PCI FLR flow; but if the AQ of the PF does a far smaller part of that work, the AQ is not secure.
For the cvq, you should read Eugenio's patchset; it is secure. For the 
others, we have discussed them many times; no need to repeat.
>
> 10. Any traps proposed in #9 mostly do not work with future TDISP, as TDISP does not bifurcate the device, so ignore them for now to promote inband migration.
TDISP devices cannot be migrated for now, and the TDISP spec gives clear 
examples of attack models; your admin vq LM on the PF exactly matches 
the model.

Sorry I have to repeat this again, this is the last time.
>
> 11. He does not show interest in collaboration (even after being asked a few times) to see if we can produce common commands that may work both for passthrough (without mediation) and with mediation for the nested case.
As repeated many times, we are implementing basic facilities, and 
you can reuse those basic facilities for live migration in the admin vq 
design; do you want to cooperate?
>
> 12. Somehow register access on a single physical card for the PFs and VFs gives a better QoS guarantee than a virtqueue, because registers can scale infinitely no matter how many VFs or VQs there are, since it is per VF.
those are per-device facilities.
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#md9fcfa1ba997463de8c7fb8c6d1786b224b0bead




^ permalink raw reply	[flat|nested] 445+ messages in thread


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  9:18                                         ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-21  9:26                                           ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21  9:26 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin, eperezma, Stefan Hajnoczi,
	Cornelia Huck, Jason Wang
  Cc: virtio-dev, virtio-comment



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Thursday, September 21, 2023 2:49 PM

> TDISP devices can not be migrated for now, and the TDISP spec make clear
> examples of attacking models, your admin vq LM on the PF exactly match the
> model.

I gave you a hint yesterday to consult Ravi at Intel, who showed TDISP migration using a dedicated TVM with a mechanism similar to an admin command.
But you sadly ignored it...

So let me make another attempt to explain,

When TDISP device migration is supported in the future, the admin commands will be done through a dedicated PF or a VF that resides in another trust domain, for example another TVM.
Such an admin virtio device will not be located in the hypervisor.
Thereby, it will be secure.
The admin commands pave the road to make this happen. The only thing that changes is the delegation of admin commands to another admin device instead of a PF.

Other solutions will arise too.
I have seen another one as well, maybe a DPU.

In both approaches, TDISP is migratable, and the spec will evolve as multiple vendors, including Intel, AMD and others, show the path towards it without mediation.
Virtio will be able to leverage that as well using admin commands.

I want to emphasize again, do not keep repeating AQ in your comments.
It is admin commands in proposal [1].

As Michael also requested, I kindly request that we cooperate on doing joint technical work, sharing ideas and knowledge, and improving the spec.

[1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa


^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  9:26                                           ` [virtio-comment] " Parav Pandit
@ 2023-09-21  9:55                                             ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-21  9:55 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin, eperezma, Stefan Hajnoczi,
	Cornelia Huck, Jason Wang
  Cc: virtio-dev, virtio-comment



On 9/21/2023 5:26 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Thursday, September 21, 2023 2:49 PM
>> TDISP devices can not be migrated for now, and the TDISP spec make clear
>> examples of attacking models, your admin vq LM on the PF exactly match the
>> model.
> I gave hint yesterday to you to consult Ravi at Intel who showed TDISP migration using a dedicated TVM using similar mechanism as admin command.
> But you sadly ignored...
>
> So let me make another attempt to explain,
>
> When in future TDISP device migration to be supported, the admin command will be done through a dedicated PF or a VF that resides in another trust domain, for example another TVM.
> Such admin virtio device will not be located in the hypervisor.
> Thereby, it will be secure.
> The admin commands pave the road to make this happen. Only thing changes is delegation of admin commands to another admin device instead of a PF.
If you plan to do it in the future, then let's discuss it in the future.

And that TDISP can be migrated in the future does not mean admin vq LM is 
secure; I have repeated the attack model so many times, and I will not 
repeat it again.
>
> There are other solutions too that will arise.
> I have seen another one too, may be DPU.
>
> In all the 2 approaches, TDISP is migratable and spec will evolve as multiple vendors including Intel, AMD and others showed the path towards it without mediation.
> Virtio will be able to leverage that as well using admin commands.
>
> I want to emphasize again, do not keep repeating AQ in your comments.
> It is admin commands in proposal [1].
We are discussing LM, right? Can TDISP help you here? The TDISP spec gives 
examples of attack models, and your admin vq matches them; I gave you a 
quote of the spec yesterday.

This thread is about live migration anyway, not TDISP.
>
> As Michael also requested, I kindly request to co-operate on doing join technical work, shared ideas, knowledge and improve the spec.
>
> [1] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
See the other threads; I propose to reuse the basic facilities of live 
migration in the admin vq.
>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  9:17                                                             ` Parav Pandit
@ 2023-09-21 10:01                                                               ` Michael S. Tsirkin
  2023-09-21 11:13                                                                 ` Parav Pandit
  2023-09-21 10:09                                                               ` Michael S. Tsirkin
  1 sibling, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 10:01 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 09:17:53AM +0000, Parav Pandit wrote:
> 
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 1:41 PM
> 
> > > I mean to say,
> > > Virtio spec has not achieved mediation less, device migration.
> > > Virtio spec has not achieved device migration using mediation.
> > 
> > But yes it has - it was implemented with shadow vq.
> >
> Shadow vq + several other trap points on config space, cvq and more.

exactly.

> we cannot suspend the whole device and resume from where it was left off.
> Those extensions are happening now.
> 
> > > And two proposals are trying to do it.
> > >
> > > >
> > > > > >
> > > > > > > > 1. For mediation something that works within existing
> > > > > > > > mediation framework - e.g. reusing as he does feature bits -
> > > > > > > > will require less support than a completely separate facility.
> > > > > > > > I think Zhu Lingshan also believes that since there will be
> > > > > > > > less code -> less security issues.
> > > > > > > >
> > > > > > > With approach of [1], there is less code in the core device
> > > > > > > migration flow
> > > > > > because none of those fields etc are parsed/read/written by the
> > > > > > driver software.
> > > > > >
> > > > > > What is or is not executed in a specific flow is a separate question.
> > > > > > But the point is vdpa and any mediation have to talk virtio
> > > > > > things such as feature bits. So reusing e.g. feature bits needs
> > > > > > less code than operating the admin command machinery to check
> > > > > > what is supported. Yes, you can operate this machinery during
> > > > > > setup and not during migration itself. It's still less code to maintain.
> > > > > >
> > > > > I wouldn't go down the path of code comparison.
> > > > > But if you want to: we can take a concrete example of what is done
> > > > > by similar
> > > > device who uses admin command approach.
> > > > > The admin command-based approach migration driver is likely 10x
> > > > > smaller
> > > > than the actual driver driving the feature bits and rest of the config.
> > > >
> > > > yes but mediation driver already has to do feature bits. so if doing
> > > > mediation then the cost of adding this specific extension is low.
> > > >
> > > I thought first you were counting the cost of the code and not the spec in your
> > point "feature bits needs less code than operating".
> > 
> > yes - with vdpa it's mostly just
> > 	vdev->status |= SUSPEND
> > 	vdev->status &= ~SUSPEND
> > all over the place.
> > 
> + inflight descriptors.

for sure, this is just stopping it.

> > > > > If one needs more precise numbers of number of lines of code, I can
> > derive it.
> > > > > As features and functionality grows, every line of code gets added
> > > > > there in
> > > > mediation too.
> > > > > I agree such mediation has value and use case, as we know it is
> > > > > not the only
> > > > approach fitting all use cases.
> > > >
> > > > Do you see how this extension is easier for mediation than driving
> > > > admin queue though?
> > > >
> > > If we count the total cost of code than building the mediation framework +
> > extensions, than it is not.
> > > But as I said, I wouldn't compare two solutions as they are addressing a slightly
> > different requirement.
> > 
> > yes they are. the point of comparison was explaining why people who use
> > mediation anyway might not want to also use aq. can i assume that's clear?
> >
> I am not fully sure.
> I frankly don't find it right for member virtio device itself to be mediated.
> Vdpa stack make total sense when the underlying device is not virtio and hence emulation.
> But when there is native virtio member device, further mediation is overkill for certain scenarios.
> But I understand that it helps to utilize a vdpa stack and thereby overcome some limitations, while it introduces other limitations...

yes. whether it makes sense depends on the use-case.


> > > What to compare is what can be reused between two solutions.
> > >
> > > > > >
> > > > > > > > 2. With or without mediation, the mapping of commands to VFs
> > > > > > > > is simpler, allowing more control - for example, let's say
> > > > > > > > you want to reset a VF - you do not need to flush the queue
> > > > > > > > of existing commands, which might potentially take a long
> > > > > > > > time because some other VFs are very busy - you just reset
> > > > > > > > the VF which any unmap flow will
> > > > > > already do.
> > > > > > > >
> > > > > > > If I understand you right, to reset a VF, no need to flush the
> > > > > > > queues without
> > > > > > mediation too.
> > > > > > > Just do VF FLR or do device reset, both will be fine.
> > > > > >
> > > > > > Not to reset the VF - that's a narrow definition. To put it back
> > > > > > in its original state unrelated to any VMs.  Nope, FLR is not
> > > > > > enough - there could be commands in queue addressing the VF. If
> > > > > > they take effect after FLR VF state changed and it can't be
> > > > > > cleanly assigned to a new
> > > > VM.
> > > > > >
> > > > > If you mean the SR-PCIM command, than it is virtio spec to define them.
> > > > > We ratified this recently in the PCI-SIG of what gets cleared on
> > > > > VF FLR and
> > > > what stays that SR-PCIM interface to do.
> > > >
> > > > Interesting. Which ECN exactly do you refer to?
> > > >
> > > B405.
> > >
> > > > > So if other commands are in pipe, only after they are done, such
> > > > > VF will be
> > > > assigned to new VM.
> > > >
> > > > Exactly. this is exactly what I meant when I said "flush the queue"
> > > > - you have to wait until these commands are done, then do reset.
> > > >
> > > Not exactly. VF reset is fully controlled by the guest.
> > > Hence, it does not collide with admin side of commands,
> > 
> > No, host does VF reset in a number of situations, including guest restart, guest
> > shutdown, etc.
> > 
> That is fine, it can do.
> 
> > > Same admin command for dirty page tracking and device context do not mess
> > with FLR.
> > > This is the critical because VF FLR cannot clear the page addresses already
> > reported and not yet read by the driver.
> > > It is covered in admin proposal.
> > 
> > They might not mess with it in that you can still do FLR but they will mess with
> > what hosts commonly try to achieve with FLR which is getting a clean VF that
> > has nothing to do with a given guest state.
> > For example
> > - queue stop command on PF
> > - flr on VF
> > - stop command is seen by PF
> >
> This just fine, because FLR is not resetting what PF has done.
> PF is not touching FLR side of things either.
>  
> > will get you a VF that is not running and can not be given to another guest. You
> > have to
> > - queue stop command on PF
> > - stop command is seen by PF
> > - flr on VF
> > in this order.
> > 
> Since you are discussing the admin patches of [1], better to discuss there.
> But even if we follow the sequence you described, it is also fine.

Well, then FLR itself is insufficient to reset the VF completely to
its original state.  I can't say whether you see why one has to make
sure the device is not stopped before assigning it to a guest.
A stopped device can't be used by the guest the way it expects.
And to make sure, one has to wait for some admin command to complete,
one way or another.

-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  9:17                                                             ` Parav Pandit
  2023-09-21 10:01                                                               ` Michael S. Tsirkin
@ 2023-09-21 10:09                                                               ` Michael S. Tsirkin
  2023-09-21 10:39                                                                 ` Parav Pandit
  1 sibling, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 10:09 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 09:17:53AM +0000, Parav Pandit wrote:
> Vdpa stack make total sense when the underlying device is not virtio and hence emulation.

Which Linux framework is used is kind of beside the point, but since you
bring this up: not necessarily.

E.g. I personally don't care much about the "stack", but clearly we need a
virtio driver on the host to be involved; teaching vfio about virtio is
probably a much worse idea than creating a mode in the vdpa driver which
mostly sets up the IOMMU, otherwise gets out of the way of using the
VF, and just drives the PF.

But for example, to migrate cross-vendor you need the
PCI config space to look the same, and for some devices this
might mean that the PCI config space will have to be mediated.
Or maybe not. vdpa is a good framework in that it gives us
flexibility; it is not opinionated.


-- 
MST



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 10:09                                                               ` Michael S. Tsirkin
@ 2023-09-21 10:39                                                                 ` Parav Pandit
  2023-09-21 12:22                                                                   ` Michael S. Tsirkin
  2023-09-22  3:31                                                                   ` Jason Wang
  0 siblings, 2 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21 10:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 3:40 PM
> 
> On Thu, Sep 21, 2023 at 09:17:53AM +0000, Parav Pandit wrote:
> > Vdpa stack make total sense when the underlying device is not virtio and
> hence emulation.
> 
> Which linux framework is used is kind of beside the point but since you bring
> this up - not necessarily.
> 
> E.g. I personally don't care much about "stack" but clearly we need a virtio
> driver on the host to be involved, teaching vfio about virtio is probably a much
> worse idea than creating a mode in the vdpa driver which mostly sets up the
> IOMMU and otherwise gets out of the way of using the VF and just drives the
> PF.
Well, vdpa has to drive a great many things, including the cvq, config space, MSI-X and more.
It can help to overcome some of the issues you listed below,
so in that way vdpa over virtio is useful.

In the vfio world, there is nothing significant to teach about virtio.
Vfio already has a model for extending the stack to do only the device-specific work and reuse the rest.

> 
> But for example, to migrate cross-vendor you need the pci config space to look
> the same and for some devices this might mean that pci config space will have
> to be mediated.
> Or maybe not. vdpa is a good framework in that it gives us flexibility, it is not
> opinionated.

Sure, as you list, both have their pros and cons,
so both solutions have their place and trade-offs.

Hence, if both can converge on the same set of commands => good.
Where the two stacks operate differently for those items, we will have spec extensions for both models, right?


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 10:01                                                               ` Michael S. Tsirkin
@ 2023-09-21 11:13                                                                 ` Parav Pandit
  0 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21 11:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 3:32 PM
> 
> On Thu, Sep 21, 2023 at 09:17:53AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Thursday, September 21, 2023 1:41 PM
> >
> > > > I mean to say,
> > > > Virtio spec has not achieved mediation less, device migration.
> > > > Virtio spec has not achieved device migration using mediation.
> > >
> > > But yes it has - it was implemented with shadow vq.
> > >
> > Shadow vq + several other trap points on config space, cvq and more.
> 
> exactly.
> 
Ok.

> > we cannot suspend the whole device and resume from where it was left off.
> > Those extensions are happening now.
> >
> > > > And two proposals are trying to do it.
> > > >
> > > > >
> > > > > > >
> > > > > > > > > 1. For mediation something that works within existing
> > > > > > > > > mediation framework - e.g. reusing as he does feature
> > > > > > > > > bits - will require less support than a completely separate facility.
> > > > > > > > > I think Zhu Lingshan also believes that since there will
> > > > > > > > > be less code -> less security issues.
> > > > > > > > >
> > > > > > > > With approach of [1], there is less code in the core
> > > > > > > > device migration flow
> > > > > > > because none of those fields etc are parsed/read/written by
> > > > > > > the driver software.
> > > > > > >
> > > > > > > What is or is not executed in a specific flow is a separate question.
> > > > > > > But the point is vdpa and any mediation have to talk virtio
> > > > > > > things such as feature bits. So reusing e.g. feature bits
> > > > > > > needs less code than operating the admin command machinery
> > > > > > > to check what is supported. Yes, you can operate this
> > > > > > > machinery during setup and not during migration itself. It's still less
> code to maintain.
> > > > > > >
> > > > > > I wouldn't go down the path of code comparison.
> > > > > > But if you want to: we can take a concrete example of what is
> > > > > > done by similar
> > > > > device who uses admin command approach.
> > > > > > The admin command-based approach migration driver is likely
> > > > > > 10x smaller
> > > > > than the actual driver driving the feature bits and rest of the config.
> > > > >
> > > > > yes but mediation driver already has to do feature bits. so if
> > > > > doing mediation then the cost of adding this specific extension is low.
> > > > >
> > > > I thought first you were counting the cost of the code and not the
> > > > spec in your
> > > point "feature bits needs less code than operating".
> > >
> > > yes - with vdpa it's mostly just
> > > 	vdev->status |= SUSPEND
> > > 	vdev->status &= ~SUSPEND
> > > all over the place.
> > >
> > + inflight descriptors.
> 
> for sure, this is just stopping it.
> 
> > > > > > If one needs more precise numbers of number of lines of code,
> > > > > > I can
> > > derive it.
> > > > > > As features and functionality grows, every line of code gets
> > > > > > added there in
> > > > > mediation too.
> > > > > > I agree such mediation has value and use case, as we know it
> > > > > > is not the only
> > > > > approach fitting all use cases.
> > > > >
> > > > > Do you see how this extension is easier for mediation than
> > > > > driving admin queue though?
> > > > >
> > > > If we count the total cost of code than building the mediation
> > > > framework +
> > > extensions, than it is not.
> > > > But as I said, I wouldn't compare two solutions as they are
> > > > addressing a slightly
> > > different requirement.
> > >
> > > yes they are. the point of comparison was explaining why people who
> > > use mediation anyway might not want to also use aq. can i assume that's
> clear?
> > >
> > I am not fully sure.
> > I frankly don't find it right for member virtio device itself to be mediated.
> > Vdpa stack make total sense when the underlying device is not virtio and
> hence emulation.
> > But when there is native virtio member device, further mediation is overkill
> for certain scenarios.
> > But I understand that it helps to utilize a vdpa stack and thereby overcome
> some limitations, while it introduces other limitations...
> 
> yes. whether it makes sense depends on the use-case.
> 
Both are valid use cases, so vdpa over virtio can find uses in overcoming some of the limitations.

When one wants to use a virtio member device as passthrough, and the needed attributes match, one can use that too.

> > This just fine, because FLR is not resetting what PF has done.
> > PF is not touching FLR side of things either.
> >
> > > will get you a VF that is not running and can not be given to
> > > another guest. You have to
> > > - queue stop command on PF
> > > - stop command is seen by PF
> > > - flr on VF
> > > in this order.
> > >
> > Since you are discussing the admin patches of [1], better to discuss there.
> > But even if we follow the sequence you described, it is also fine.
> 
> well then flr itself is now insufficient to reset the VF completely to its original
> state.  
That is how it is in the PCI spec, in other consortium-based specs, and in many vendor implementations too.

> stopped before assigning it to guest or not.
> a stopped device can't be used by guest the way it expects.
> And to make sure, one has to wait for some admin command to complete, one
> way or another.

Yes, software stacks which dynamically map VFs to different VMs already do this for virtio and non-virtio devices.
The migration driver will make sure that whatever it has set up is cleared before handing over the VF.
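A minimal sketch of the suspend flow being discussed (the `vdev->status |= SUSPEND` one-liner quoted earlier in this message), written against an in-memory mock rather than real MMIO. The bit value and all names here are hypothetical, not the spec's; the rule that the driver waits for the device to present SUSPEND before treating state as stable is taken from the general shape of the proposal:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the actual value is assigned by the spec patches. */
#define VIRTIO_STATUS_SUSPEND 0x40u

/* In-memory stand-in for the device status register (a real driver reads
 * and writes this through the transport, e.g. a PCI capability). */
struct mock_vdev {
    uint8_t status;
};

/* Driver requests suspend, then polls until the device presents the bit,
 * which the proposal uses to mean "device and vq state are now stable".
 * With this mock the write is visible immediately, so the poll is trivial. */
static bool vdev_suspend(struct mock_vdev *vdev)
{
    vdev->status |= VIRTIO_STATUS_SUSPEND;
    for (int i = 0; i < 1000; i++) {    /* bounded poll, no infinite wait */
        if (vdev->status & VIRTIO_STATUS_SUSPEND)
            return true;
    }
    return false;
}

/* Resume: clearing the bit lets the device run again. */
static void vdev_resume(struct mock_vdev *vdev)
{
    vdev->status &= (uint8_t)~VIRTIO_STATUS_SUSPEND;
}
```

On real hardware the poll matters because the device only latches SUSPEND once in-flight requests have drained; the mock acknowledges immediately.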



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  9:55                                             ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-21 11:28                                               ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-21 11:28 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin, eperezma, Stefan Hajnoczi,
	Cornelia Huck, Jason Wang
  Cc: virtio-dev, virtio-comment


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Thursday, September 21, 2023 3:25 PM
> 
> On 9/21/2023 5:26 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Thursday, September 21, 2023 2:49 PM TDISP devices can not be
> >> migrated for now, and the TDISP spec make clear examples of attacking
> >> models, your admin vq LM on the PF exactly match the model.
> > I gave hint yesterday to you to consult Ravi at Intel who showed TDISP
> migration using a dedicated TVM using similar mechanism as admin command.
> > But you sadly ignored...
> >
> > So let me make another attempt to explain,
> >
> > When in future TDISP device migration to be supported, the admin command
> will be done through a dedicated PF or a VF that resides in another trust
> domain, for example another TVM.
> > Such admin virtio device will not be located in the hypervisor.
> > Thereby, it will be secure.
> > The admin commands pave the road to make this happen. Only thing changes
> is delegation of admin commands to another admin device instead of a PF.
> if you plan to do it in future, then lets discuss in the future.
> 
> And TDISP can be migrated in future does not mean admin vq LM is secure, I
> have repeated for so many times of the attacking model. and I will not repeat
> again.

> > There are other solutions too that will arise.
> > I have seen another one too, may be DPU.
> >
> > In all the 2 approaches, TDISP is migratable and spec will evolve as multiple
> vendors including Intel, AMD and others showed the path towards it without
> mediation.
> > Virtio will be able to leverage that as well using admin commands.
> >
> > I want to emphasize again, do not keep repeating AQ in your comments.
> > It is admin commands in proposal [1].
> we are discussing LM, right? Can TDISP help you here? TDISP spec gives
> examples of attacking models, and your admin vq matches it, I gave you quote
> of the spec yesterday.
> 
> This thread is about live migration anyway, not TDISP.
> >
> > As Michael also requested, I kindly request to co-operate on doing join
> technical work, shared ideas, knowledge and improve the spec.
> >
> > [1]
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > vidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
> see other threads, I propose to reuse the basic facilities of live migration in
> admin vq.
> >

I don't see a point in repeating anything anymore, given your constant repetition and disregard for the ideas presented.

I am happy to collaborate to drive the virtio spec once you can give thought, with an open mind, to addressing and converging these two use cases:

1. virtio device migration using the mediation approach
2. virtio member passthrough device migration



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 10:39                                                                 ` Parav Pandit
@ 2023-09-21 12:22                                                                   ` Michael S. Tsirkin
  2023-09-21 12:39                                                                     ` Parav Pandit
  2023-09-22  3:31                                                                   ` Jason Wang
  1 sibling, 1 reply; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 12:22 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 10:39:23AM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 3:40 PM
> > 
> > On Thu, Sep 21, 2023 at 09:17:53AM +0000, Parav Pandit wrote:
> > > Vdpa stack make total sense when the underlying device is not virtio and
> > hence emulation.
> > 
> > Which linux framework is used is kind of beside the point but since you bring
> > this up - not necessarily.
> > 
> > E.g. I personally don't care much about "stack" but clearly we need a virtio
> > driver on the host to be involved, teaching vfio about virtio is probably a much
> > worse idea than creating a mode in the vdpa driver which mostly sets up the
> > IOMMU and otherwise gets out of the way of using the VF and just drives the
> > PF.
> Well, vdpa has to drive large many things including cvq, config space, msix and more.
> It can help to overcome some issues as you listed below.
> So that way vdpa over virtio is useful.
> 
> In vfio world, there is nothing significant to teach about virtio.
> Vfio is already has the model to extend the stack to do only specific work and reuse the rest.

Again the thread has been side-tracked; which Linux module does what is
not what it was talking about.  By the way, I wonder who decided to drop
virtio-comment from this and copy virtio-dev. Guys, please don't do this;
in general, send spec patches to virtio-comment and implementation
discussion to virtio-dev.


> > 
> > But for example, to migrate cross-vendor you need the pci config space to look
> > the same and for some devices this might mean that pci config space will have
> > to be mediated.
> > Or maybe not. vdpa is a good framework in that it gives us flexibility, it is not
> > opinionated.
> 
> Sure, as you list, both has its pros and cons.
> So both solutions has its space and trade off.

You can conceivably write a vfio-based driver for the PF and VF in userspace,
sure. But I think this just makes things unnecessarily complex
for users, who will have to know which device to use with which
driver. I think that, e.g., if we have two ways to submit admin
commands, vdpa can just support both of them and that is all.
When mediation is not needed, vdpa will just get out of the way.

> Hence, if both can converge to same set of commands, => good.
> When there is different way two stacks operates for those items, we will have spec extensions for both model, right?

My intuition says a modest amount of duplication isn't too bad.
E.g. I can see two ways to submit admin commands as being acceptable.
Are the SUSPEND bit and vq state as defined by these patches acceptable
in addition to the vq commands defined by your patches? For sure
it seems inelegant, to say the least.


-- 
MST
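The "vq state as defined by these patches" that Michael refers to is, per the cover letter, essentially the pair (last_avail_idx, last_used_idx) per virtqueue: read on the source once the device is suspended, written on the destination before the device is started. A rough sketch against an in-memory mock; the structure and function names here are illustrative, not the spec's:

```c
#include <stdint.h>

/* Per-virtqueue state from the proposal: where the device would resume. */
struct vq_state {
    uint16_t last_avail_idx;  /* next available-ring entry to process   */
    uint16_t last_used_idx;   /* next used-ring slot the device fills   */
};

/* Mock device exposing one queue's state (real access goes through the
 * transport, e.g. new fields in the PCI common configuration). */
struct mock_vdev {
    struct vq_state vq;
};

/* Source side: capture state after the device has been suspended. */
static struct vq_state vq_state_get(const struct mock_vdev *vdev)
{
    return vdev->vq;
}

/* Destination side: install state before setting DRIVER_OK, so the
 * device continues from exactly where the source left off. */
static void vq_state_set(struct mock_vdev *vdev, struct vq_state s)
{
    vdev->vq = s;
}
```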



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 12:22                                                                   ` Michael S. Tsirkin
@ 2023-09-21 12:39                                                                     ` Parav Pandit
  2023-09-21 13:04                                                                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-21 12:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang


> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Thursday, September 21, 2023 5:53 PM

> 
> Again the thread has been side-tracked, which linux module does what is not
> what it was talking about.  By the way I wonder who decided to drop virtio
> comment from this and copy virtio-dev. Guys please don't do this and generally
> just send spec patches to virtio-comment, implementation discussion on virtio-
> dev.

I have no idea who dropped virtio-comment.
It certainly was not me, as I am fully aware that virtio-dev is not the list for this discussion.

> 
> 
> > >
> > > But for example, to migrate cross-vendor you need the pci config
> > > space to look the same and for some devices this might mean that pci
> > > config space will have to be mediated.
> > > Or maybe not. vdpa is a good framework in that it gives us
> > > flexibility, it is not opinionated.
> >
> > Sure, as you list, both has its pros and cons.
> > So both solutions has its space and trade off.
> 
> You can thinkably write a vfio based driver for PF and VF in userspace, sure. But
> I think this is just making things unnecessarily complex for users who will have
> to know which device to use with which driver. I think that e.g. if we have two
> ways to submit admin commands, vdpa can just support both of them and that
> is all.
> When mediation is not needed then vdpa will just get out of the way.
> 
Well, enough documentation already exists to let users know when to use what.

> > Hence, if both can converge to same set of commands, => good.
> > When there is different way two stacks operates for those items, we will have
> spec extensions for both model, right?
> 
> My intiution says a modest amount of duplication isn't too bad.
> E.g. I can see two ways to submit admin commands as being acceptable.
> Are the SUSPEND bit and vq state as defined by these patches acceptable in
> addition to vq commands as defined by your patches? For sure it seems
> inelegant to say the least.

Right, my patches do not bifurcate the device during device migration.
They follow a generic device migration concept, be it config space, shared memory, an admin queue, config interrupts, some other DMA interface, or something else.
All of it is covered under device migration, which software does not need to bisect.

So maybe, instead of suspending the VQ, it can reset the VQ using existing functionality and, when enabling the VQ, start from a newly supplied avail index.
That way it can be elegant too.

These patches do not explain the motivation for suspending individual queues. Do you know it?
There is a device-level suspend bit as well, so it is unclear why to have both.
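Parav's alternative above - reset the queue with existing functionality and re-enable it starting from a supplied avail index rather than suspending it - could look roughly like this. Again an in-memory mock; the "enable at a supplied index" operation is the new capability being floated here, everything else mimics existing per-queue reset behavior:

```c
#include <stdbool.h>
#include <stdint.h>

struct mock_vq {
    bool enabled;
    uint16_t last_avail_idx;
};

/* Existing facility: queue reset disables the queue and clears its state. */
static void vq_reset(struct mock_vq *vq)
{
    vq->enabled = false;
    vq->last_avail_idx = 0;
}

/* Proposed twist: on re-enable, start from a caller-supplied avail index
 * (e.g. the index migrated from the source) instead of from zero. */
static void vq_enable_at(struct mock_vq *vq, uint16_t avail_idx)
{
    vq->last_avail_idx = avail_idx;
    vq->enabled = true;
}
```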


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 12:39                                                                     ` Parav Pandit
@ 2023-09-21 13:04                                                                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-21 13:04 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Zhu, Lingshan, virtio-dev, Jason Wang

On Thu, Sep 21, 2023 at 12:39:31PM +0000, Parav Pandit wrote:
> 
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 5:53 PM
> 
> > 
> > Again the thread has been side-tracked, which linux module does what is not
> > what it was talking about.  By the way I wonder who decided to drop virtio
> > comment from this and copy virtio-dev. Guys please don't do this and generally
> > just send spec patches to virtio-comment, implementation discussion on virtio-
> > dev.
> 
> I have no idea who dropped the virtio-comment.
> For sure it is not me, as I am fully aware that virtio-dev is not the one to discuss.
> 
> > 
> > 
> > > >
> > > > But for example, to migrate cross-vendor you need the pci config
> > > > space to look the same and for some devices this might mean that pci
> > > > config space will have to be mediated.
> > > > Or maybe not. vdpa is a good framework in that it gives us
> > > > flexibility, it is not opinionated.
> > >
> > > Sure, as you list, both has its pros and cons.
> > > So both solutions has its space and trade off.
> > 
> > You can thinkably write a vfio based driver for PF and VF in userspace, sure. But
> > I think this is just making things unnecessarily complex for users who will have
> > to know which device to use with which driver. I think that e.g. if we have two
> > ways to submit admin commands, vdpa can just support both of them and that
> > is all.
> > When mediation is not needed then vdpa will just get out of the way.
> > 
> Well there is enough documentation already exists to indicate users to know when to use what.
> 
> > > Hence, if both can converge to same set of commands, => good.
> > > When there is different way two stacks operates for those items, we will have
> > spec extensions for both model, right?
> > 
> > My intiution says a modest amount of duplication isn't too bad.
> > E.g. I can see two ways to submit admin commands as being acceptable.
> > Are the SUSPEND bit and vq state as defined by these patches acceptable in
> > addition to vq commands as defined by your patches? For sure it seems
> > inelegant to say the least.
> 
> Right, my patches are not bifurcating the device during the device migration.
> It has generic device migration concept, so be it config space, or shared memory or admin queue or config interrupts or some other dma interface or some other things.
> All are covered under device migration that software does not need to bisect.
> 
> So may be instead of suspending the VQ, it can be reseting the VQ by using existing functionality, and when enabling the VQ, it can start from newly supplied avail index.
> This way, it can be elegant too.
> 
> These patches do not explain the motivation of why to suspend individual queues. Do you know?
> There is suspend device bit as well, so it is unclear why to have both.

Modularity is good generally, but it's all guessing.

-- 
MST



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 11:28                                               ` [virtio-comment] " Parav Pandit
@ 2023-09-22  2:40                                                 ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-22  2:40 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin, eperezma, Stefan Hajnoczi,
	Cornelia Huck, Jason Wang
  Cc: virtio-dev, virtio-comment



On 9/21/2023 7:28 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Thursday, September 21, 2023 3:25 PM
>>
>> On 9/21/2023 5:26 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Thursday, September 21, 2023 2:49 PM TDISP devices can not be
>>>> migrated for now, and the TDISP spec make clear examples of attacking
>>>> models, your admin vq LM on the PF exactly match the model.
>>> I gave hint yesterday to you to consult Ravi at Intel who showed TDISP
>> migration using a dedicated TVM using similar mechanism as admin command.
>>> But you sadly ignored...
>>>
>>> So let me make another attempt to explain,
>>>
>>> When in future TDISP device migration to be supported, the admin command
>> will be done through a dedicated PF or a VF that resides in another trust
>> domain, for example another TVM.
>>> Such admin virtio device will not be located in the hypervisor.
>>> Thereby, it will be secure.
>>> The admin commands pave the road to make this happen. Only thing changes
>> is delegation of admin commands to another admin device instead of a PF.
>> if you plan to do it in future, then lets discuss in the future.
>>
>> And TDISP can be migrated in future does not mean admin vq LM is secure, I
>> have repeated for so many times of the attacking model. and I will not repeat
>> again.
>>> There are other solutions too that will arise.
>>> I have seen another one too, may be DPU.
>>>
>>> In all the 2 approaches, TDISP is migratable and spec will evolve as multiple
>> vendors including Intel, AMD and others showed the path towards it without
>> mediation.
>>> Virtio will be able to leverage that as well using admin commands.
>>>
>>> I want to emphasize again, do not keep repeating AQ in your comments.
>>> It is admin commands in proposal [1].
>> we are discussing LM, right? Can TDISP help you here? TDISP spec gives
>> examples of attacking models, and your admin vq matches it, I gave you quote
>> of the spec yesterday.
>>
>> This thread is about live migration anyway, not TDISP.
>>> As Michael also requested, I kindly request to co-operate on doing join
>> technical work, shared ideas, knowledge and improve the spec.
>>> [1]
>>> https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
>>> vidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
>> see other threads, I propose to reuse the basic facilities of live migration in
>> admin vq.
> I don’t see a point in repeating anything anymore with your constant repetitions and ignorance to ideas.
>
> I am happy to collaborate to driver virtio spec when you can give thoughts with an open mind to address two use cases to converge and discuss.
>
> 1. virtio device migration using mediation approach
As Jason and I have told you many times, the basis and foundation of
virtualization is trap-and-emulate,
and this series works for trap-and-emulate.

And for mediation, do you see any trouble?

Can't vDPA migrate devices with this solution?
> 2. virtio member passthrough device migration
If you want, you can build admin vq LM on these basic facilities. But
admin vq LM still will not work for nested virtualization.



* [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-09-22  2:40                                                 ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-22  2:40 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin, eperezma, Stefan Hajnoczi,
	Cornelia Huck, Jason Wang
  Cc: virtio-dev, virtio-comment



On 9/21/2023 7:28 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Thursday, September 21, 2023 3:25 PM
>>
>> On 9/21/2023 5:26 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Thursday, September 21, 2023 2:49 PM TDISP devices can not be
>>>> migrated for now, and the TDISP spec make clear examples of attacking
>>>> models, your admin vq LM on the PF exactly match the model.
>>> I gave hint yesterday to you to consult Ravi at Intel who showed TDISP
>> migration using a dedicated TVM using similar mechanism as admin command.
>>> But you sadly ignored...
>>>
>>> So let me make another attempt to explain,
>>>
>>> When in future TDISP device migration to be supported, the admin command
>> will be done through a dedicated PF or a VF that resides in another trust
>> domain, for example another TVM.
>>> Such admin virtio device will not be located in the hypervisor.
>>> Thereby, it will be secure.
>>> The admin commands pave the road to make this happen. Only thing changes
>> is delegation of admin commands to another admin device instead of a PF.
>> If you plan to do it in the future, then let's discuss it in the future.
>>
>> And that TDISP can be migrated in the future does not mean admin vq LM is
>> secure; I have repeated the attack model so many times, and I will not repeat
>> it again.
>>> There are other solutions too that will arise.
>>> I have seen another one too, may be DPU.
>>>
>>> In both approaches, TDISP is migratable, and the spec will evolve, as multiple
>>> vendors including Intel, AMD and others have shown the path towards it without
>>> mediation.
>>> Virtio will be able to leverage that as well using admin commands.
>>>
>>> I want to emphasize again, do not keep repeating AQ in your comments.
>>> It is admin commands in proposal [1].
>> We are discussing LM, right? Can TDISP help you here? The TDISP spec gives
>> examples of attack models, and your admin vq matches them; I gave you a quote
>> from the spec yesterday.
>>
>> This thread is about live migration anyway, not TDISP.
>>> As Michael also requested, I kindly request co-operation on doing joint
>>> technical work, sharing ideas and knowledge, and improving the spec.
>>> [1]
>>> https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
>>> vidia.com/T/#mf15b68617f772770c6bf79f70e8ddc6fea834cfa
>> see other threads, I propose to reuse the basic facilities of live migration in
>> admin vq.
> I don’t see a point in repeating anything anymore with your constant repetitions and ignorance to ideas.
>
> I am happy to collaborate to drive the virtio spec when you can give thought, with an open mind, to addressing the two use cases, to converge and discuss.
>
> 1. virtio device migration using mediation approach
As Jason and I have told you many times, the basic and fundamental
technique of virtualization is trap and emulate,
and this series works for trap and emulate.

And for mediation, do you see any troubles?

Can't vDPA migrate devices with this solution?
> 2. virtio member passthrough device migration
If you want, you can build admin vq LM on top of the basic facilities. But
admin vq LM still will not work for the nested case.
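The trap-and-emulate flow referred to above can be sketched in a few lines. This is a hypothetical mediation-layer handler, not code from the series; the SUSPEND bit value and all names here are assumptions for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits: DRIVER_OK matches the spec value, but the
 * SUSPEND bit value here is an assumption, not taken from the series. */
#define VIRTIO_STATUS_DRIVER_OK  0x04
#define VIRTIO_STATUS_SUSPEND    0x40

struct emulated_dev {
    uint8_t status;     /* emulated device status field */
    int     vq_frozen;  /* set once virtqueue state is stabilized */
};

/* Trap handler: the hypervisor intercepts the guest's write to the
 * status register and emulates the SUSPEND transition. */
static void trap_status_write(struct emulated_dev *dev, uint8_t val)
{
    if ((val & VIRTIO_STATUS_SUSPEND) && !(dev->status & VIRTIO_STATUS_SUSPEND)) {
        /* Stop consuming buffers so last_avail_idx/last_used_idx
         * become stable and can be read out for migration. */
        dev->vq_frozen = 1;
    }
    dev->status = val;
}
```

A mediation layer (e.g. a vDPA backend) would call such a handler from its MMIO exit path.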


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:19                                     ` Parav Pandit
@ 2023-09-22  3:08                                       ` Jason Wang
  2023-09-22  3:39                                           ` Zhu, Lingshan
  2023-09-25 10:41                                         ` Parav Pandit
  0 siblings, 2 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-22  3:08 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Thu, Sep 21, 2023 at 12:19 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 21, 2023 9:39 AM
> >
> > On Thu, Sep 21, 2023 at 12:01 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Thursday, September 21, 2023 8:48 AM
> > >
> > > > As replied in another thread, the issues for BAR are:
> > > >
> > > > 1) Not sure it can have an efficient interface, it would be
> > > > something like VIRTIO_PCI_CAP_PCI_CFG which is very slow compared to
> > > > single register accessing
> > > > 2) There's no owner/group/member for MMIO, most of the time, we only
> > > > need a single MMIO device. If we want the owner to manage itself, it
> > > > seems redundant as is implied in all the existing transports (without admin
> > commands).
> > > > Even if we had, it might still suffer from bootstrap issues.
> > > > 3) For live migration, it means the admin commands needs to start
> > > > from duplicating every existing transport specific interface it can
> > > > give us. One example is that we may end up with two interfaces to
> > > > access virtqueue addresses etc. This results in extra complexity and
> > > > it is actually a full transport (driver can just use admin commands to drive
> > the device).
> > > In [1] there is no duplication. The live migration driver never parses the device
> > > context either while reading or writing.
> > > Hence no code and no complexity in driver and no duplicate work.
> > > Therefore, those admin commands are not to drive the guest device either.
> >
> > I'm not sure how this is related to the duplication issue.
> >
> You commented that admin virtqueue duplicates something.
> And I explained above that it does not.
>
> > >
> > > > 4) Admin commands itself may not be capable of doing things like
> > > > dirty page logging, it requires the assistance from the transport
> > > >
> > > Admin command in [1] is capable of dirty page logging.
> >
> > In your design, the logging is done via DMA not the virtqueue.
> >
> No. it is done via admin command, not DMA in [2].
>
> [2] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#m17b09acd8c73d374e98ad84764b315afa94f59c9
>
> > The only job for virtqueue is to initiate the DMA. But if DMA can be initiated via
> > virtqueue, it can be done in other ways.
> >
> Lets first establish 4 things in alternative way, any motivation to do so with 5th point without giant registers need in device.
>
> > >
> > > > 1) Parav's proposal does several couplings: couple basic build
> > > > blocks (suspend, dirty page tracking) with live migration, couple
> > > > live migration with admin commands.
> > > In which use case you find dirty page tracking useful without migration for
> > which you like to see it detached from device migration flow?
> >
> > Is it only the dirty page tracking? It's the combinations of
> >
> > 1) suspending
> > 2) device states
> > 3) dirty page tracking
> >
> > Each of those will have use cases other than live migration: VM stop, power
> > management in VM, profiling and monitoring, failover etc.
> >
> Suspend/resume with different power state is driven by the guest directly.

And there's also hibernation, where device states might be useful.

> So it may find some overlap.
>
> Device context has no overlap.

I can give you one example, e.g. debugging.

>
> Dirty page tracking has no overlap. What do you want to profile and monitor? In case you want to profile, can it be used without the migration command anyway?

It works like the dirty bit of a PTE. We all know it has a broader use
case than logging. For example, tracking the working set and doing
optimization on the IOMMU/IOTLB or even the device IOTLB.
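To make the dirty-bit analogy concrete: however the bitmap is exposed, consuming it for working-set estimation is a plain read-and-reset bit scan, just like harvesting PTE dirty bits. A minimal sketch (the bitmap layout is an assumption, taken from neither proposal):

```c
#include <assert.h>
#include <stdint.h>

/* Count dirty pages in a bitmap (one bit per page), then clear it --
 * the same read-and-reset cycle used for PTE dirty bits. */
static unsigned long working_set_pages(uint64_t *bitmap, unsigned long nwords)
{
    unsigned long dirty = 0;
    for (unsigned long i = 0; i < nwords; i++) {
        uint64_t w = bitmap[i];
        while (w) {          /* Kernighan bit count */
            w &= w - 1;
            dirty++;
        }
        bitmap[i] = 0;       /* re-arm for the next tracking interval */
    }
    return dirty;
}
```

Sampling this periodically gives a working-set estimate usable for IOMMU/IOTLB tuning as well as for migration pre-copy.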

1) Try to prove your facility can only work for one specific case
2) Try to prove your facility can work for more than one case

Which one is easier and more beneficial to virtio?


> If you describe it, maybe I can split the "device migration" chapter into two pieces,
> Device management and device migration.
>
> Device migration will use these basic facility.
> Would that help you?

Definitely, but it needs to be done without placing it under the
subsection of admin commands; that's it.

Let me repeat once again here for the possible steps to collaboration:

1) define virtqueue state, inflight descriptors in the section of
basic facility but not under the admin commands
2) define the dirty page tracking, device context/states in the
section of basic facility but not under the admin commands
3) define transport specific interfaces or admin commands to access them

Does this work? It seems you refused such steps in the past.

Actually, I would like to leave 2) aside, as it's very complicated and
might not converge easily.

Thanks




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21  4:29                                                 ` Parav Pandit
@ 2023-09-22  3:13                                                   ` Jason Wang
  0 siblings, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-22  3:13 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Thu, Sep 21, 2023 at 12:29 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Thursday, September 21, 2023 9:50 AM
>
> > Parav, I think I've clarified several times:
> >
> > migration using the admin command is probably fine in some use cases.
> >
> This is definitely, was not clear to me.
> I am 100% clear now.

Great, thanks.

>
> > What's not fine, is:
> >
> > Mandate the admin command to be the only way for migration.
> >
> For sure, my series did not mandate that either.
> I kept asking because if we both can converge, it will be really good to merge the two use cases; we should.

I've replied in another thread.

> If we cannot because of technical issues, then both methods need to exist to address the two different use cases.
>
> > Are we on the same page for my concern now?
> >
> Yes.
>
> > > What is the advantage of descriptor posting using virtqueue. It is the way of
> > virtio spec...
> > >
> > > > > Well, it is one way to achieve it.
> > > > > There may be different way to do all bulk data transfer without
> > > > > admin
> > > > commands.
> > > >
> > > > Why is virtqueue the only way to do bulk data transferring? Can't
> > > > DMA be initiated by other-way?
> > > >
> > >
> > > Sure, what is the disadvantage of existing mechanism of virtqueue that can do
> > following.
> > > 1. Ability to do DMA
> > > 2. agnostic of the DMA who do not want to do DMA
> >
> > I don't understand this.
> >
> Admin commands can work without DMA, right, because they are transported using the admin queue.
>
> > > 3. Ability to multiple command executions in parallel
> >
> > Each device has their self-contained interface, why can't the commands be
> > executed in parallel.
> >
> Within the device it cannot if the interface is synchronous.
>
> > > 4. Non blocking interface for driver that does not require any kind of
> > > polling
> >
> > Are you saying the interrupt can only work for virtqueue?
> >
> No. I am saying that if one has to invent an interface that satisfies the above needs, it will become a virtqueue.
> And if it is not, one should list the disadvantages and cost of the new interface, and explain its benefits.
> Such an interface should be a generic one too.

See my question below.

>
> > >
> > > Why to invent new DMA scheme which at does all the 4 tasks?
> >
> > It's simply because admin virtqueue can not work for all the cases. I think you've
> > agreed on this, no?
> >
> I think it may work for the nested case as well, at the cost of replicating it on each device and adding special plumbing to isolate it, so that the guest cannot issue driver notifications to it.
>
> > > First please list down disadvantages of admin queue + show all 4 things are
> > achieved using new DMA interface.
> > > That will help to understand why new dma interface is needed.
> >
> > I can give you a simple example. For example, what happens if we want to
> > migrate the owner? Having another owner for this owner is not a good answer.
>
> That is the nesting, don’t see any difference with other nesting.

It's not necessarily nesting. For example, how do you migrate the PF?

Thanks




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-21 10:39                                                                 ` Parav Pandit
  2023-09-21 12:22                                                                   ` Michael S. Tsirkin
@ 2023-09-22  3:31                                                                   ` Jason Wang
  1 sibling, 0 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-22  3:31 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Thu, Sep 21, 2023 at 6:39 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Thursday, September 21, 2023 3:40 PM
> >
> > On Thu, Sep 21, 2023 at 09:17:53AM +0000, Parav Pandit wrote:
> > > Vdpa stack make total sense when the underlying device is not virtio and
> > hence emulation.
> >
> > Which linux framework is used is kind of beside the point but since you bring
> > this up - not necessarily.
> >
> > E.g. I personally don't care much about "stack" but clearly we need a virtio
> > driver on the host to be involved, teaching vfio about virtio is probably a much
> > worse idea than creating a mode in the vdpa driver which mostly sets up the
> > IOMMU and otherwise gets out of the way of using the VF and just drives the
> > PF.
> Well, vdpa has to drive a great many things including cvq, config space, MSI-X and more.

Just to clarify, vDPA has the flexibility to decide if it wants to
deal with the above or not (except MSI-X, which is hidden in the vDPA
layer).

That is to say, if vDPA wants, nothing prevents vDPA from assigning
CVQ and config space to guests.

> It can help to overcome some issues as you listed below.
> So that way vdpa over virtio is useful.
>
> In vfio world, there is nothing significant to teach about virtio.

It would be pretty fine if it's nothing, not nothing significant.

Thanks




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-22  3:08                                       ` Jason Wang
@ 2023-09-22  3:39                                           ` Zhu, Lingshan
  2023-09-25 10:41                                         ` Parav Pandit
  1 sibling, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-22  3:39 UTC (permalink / raw)
  To: Jason Wang, Parav Pandit, eperezma, Cornelia Huck,
	Michael S. Tsirkin, virtio-comment, Stefan Hajnoczi
  Cc: virtio-dev



On 9/22/2023 11:08 AM, Jason Wang wrote:
> On Thu, Sep 21, 2023 at 12:19 PM Parav Pandit <parav@nvidia.com> wrote:
>>
>>> From: Jason Wang <jasowang@redhat.com>
>>> Sent: Thursday, September 21, 2023 9:39 AM
>>>
>>> On Thu, Sep 21, 2023 at 12:01 PM Parav Pandit <parav@nvidia.com> wrote:
>>>>
>>>>
>>>>> From: Jason Wang <jasowang@redhat.com>
>>>>> Sent: Thursday, September 21, 2023 8:48 AM
>>>>> As replied in another thread, the issues for BAR are:
>>>>>
>>>>> 1) Not sure it can have an efficient interface, it would be
>>>>> something like VIRTIO_PCI_CAP_PCI_CFG which is very slow compared to
>>>>> single register accessing
>>>>> 2) There's no owner/group/member for MMIO, most of the time, we only
>>>>> need a single MMIO device. If we want the owner to manage itself, it
>>>>> seems redundant as is implied in all the existing transports (without admin
>>> commands).
>>>>> Even if we had, it might still suffer from bootstrap issues.
>>>>> 3) For live migration, it means the admin commands needs to start
>>>>> from duplicating every existing transport specific interface it can
>>>>> give us. One example is that we may end up with two interfaces to
>>>>> access virtqueue addresses etc. This results in extra complexity and
>>>>> it is actually a full transport (driver can just use admin commands to drive
>>> the device).
>>>> In [1] there is no duplication. The live migration driver never parses the device
>>>> context either while reading or writing.
>>>> Hence no code and no complexity in driver and no duplicate work.
>>>> Therefore, those admin commands are not to drive the guest device either.
>>> I'm not sure how this is related to the duplication issue.
>>>
>> You commented that admin virtqueue duplicates something.
>> And I explained above that it does not.
>>
>>>>> 4) Admin commands itself may not be capable of doing things like
>>>>> dirty page logging, it requires the assistance from the transport
>>>>>
>>>> Admin command in [1] is capable of dirty page logging.
>>> In your design, the logging is done via DMA not the virtqueue.
>>>
>> No. it is done via admin command, not DMA in [2].
>>
>> [2] https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#m17b09acd8c73d374e98ad84764b315afa94f59c9
>>
>>> The only job for virtqueue is to initiate the DMA. But if DMA can be initiated via
>>> virtqueue, it can be done in other ways.
>>>
>> Lets first establish 4 things in alternative way, any motivation to do so with 5th point without giant registers need in device.
>>
>>>>> 1) Parav's proposal does several couplings: couple basic build
>>>>> blocks (suspend, dirty page tracking) with live migration, couple
>>>>> live migration with admin commands.
>>>> In which use case you find dirty page tracking useful without migration for
>>> which you like to see it detached from device migration flow?
>>>
>>> Is it only the dirty page tracking? It's the combinations of
>>>
>>> 1) suspending
>>> 2) device states
>>> 3) dirty page tracking
>>>
>>> Each of those will have use cases other than live migration: VM stop, power
>>> management in VM, profiling and monitoring, failover etc.
>>>
>> Suspend/resume with different power state is driven by the guest directly.
> And there's hibernation actually where device states might be useful.
>
>> So it may find some overlap.
>>
>> Device context has no overlap.
> I can give you one example, e.g debugging.
>
>> Dirty page tracking has no overlap. What do you want to profile and monitor? In case you want to profile, can it be used without the migration command anyway?
> It works like the dirty bit of a PTE. We all know it has a broader use
> case than logging. For example, tracking the working set and doing
> optimization on the IOMMU/IOTLB or even the device IOTLB.
>
> 1) Try to prove your facility can only work for one specific case
> 2) Try to prove your facility can work for more than one case
>
> Which one is easier and more beneficial to virtio?
>
>
>> If you describe it, maybe I can split the "device migration" chapter into two pieces,
>> Device management and device migration.
>>
>> Device migration will use these basic facility.
>> Would that help you?
> Definitely, but it needs to be done without placing it under the
> subsection of admin commands; that's it.
>
> Let me repeat once again here for the possible steps to collaboration:
>
> 1) define virtqueue state, inflight descriptors in the section of
> basic facility but not under the admin commands
> 2) define the dirty page tracking, device context/states in the
> section of basic facility but not under the admin commands
> 3) define transport specific interfaces or admin commands to access them
I totally agree with this proposal.
>
> Does this work? It seems you refused such steps in the past.
>
> Actually, I would like to leave 2) aside, as it's very complicated and
> might not converge easily.
>
> Thanks
>




^ permalink raw reply	[flat|nested] 445+ messages in thread


* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-22  3:08                                       ` Jason Wang
  2023-09-22  3:39                                           ` Zhu, Lingshan
@ 2023-09-25 10:41                                         ` Parav Pandit
  2023-09-26  2:45                                           ` Jason Wang
  2023-09-27 21:43                                           ` Michael S. Tsirkin
  1 sibling, 2 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-25 10:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Friday, September 22, 2023 8:38 AM

> > Device context has no overlap.
> 
> I can give you one example, e.g debugging.
>
Almost every feature needs debugging. :)
So I am omitting it for the time being.

 
> >
> > Dirty page tracking has no overlap. What do you want to profile and monitor?
> > In case you want to profile, can it be used without the migration command
> > anyway?
> 
> It works like the dirty bit of a PTE. We all know it has a broader use case than
> logging. For example, tracking the working set and doing optimization on
> the IOMMU/IOTLB or even the device IOTLB.
> 
> 1) Try to prove your facility can only work for one specific case
> 2) Try to prove your facility can work for more than one case
> 
> Which one is easier and more beneficial to virtio?
> 
> 
> > If you describe it, maybe I can split the "device migration" chapter into
> > two pieces, device management and device migration.
> >
> > Device migration will use these basic facility.
> > Would that help you?
> 
> Definitely, but it needs to be done without placing it under the subsection of
> admin commands; that's it.
> 
> Let me repeat once again here for the possible steps to collaboration:
> 
> 1) define virtqueue state, inflight descriptors in the section of basic facility but
> not under the admin commands
It will be part of the device context, structured in such a way that one can read only the vq state instead of the full device context.
This will work.

> 2) define the dirty page tracking, device context/states in the section of basic
> facility but not under the admin commands
Great.
The device context is already defined in the basic facilities, outside of the admin commands, in [1].
The current text is centered on device migration; as the spec evolves, it can adopt more generic text not tied to device migration.

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html

> 3) define transport specific interfaces or admin commands to access them
>
As you also envisioned, it is done using admin commands to access it.

> Does this work? It seems you refused such steps in the past.
> 
I didn’t.
There must be some confusion in the many emails we both exchanged, because the already-posted v0 has it split out, such as the device context.

For dirty page tracking I couldn’t find a solid use case without device migration, so I asked; you already replied above.

> Actually, I would like to leave 2) aside, as it's very complicated and might
> not converge easily.
>
I will split the current "device migration" section into two:
1. device management
2. device migration

Device management covers device mode, device context and dirty page tracking.
Device migration refers to the device management section.

We can omit the dirty page tracking commands for the moment and first close on the device mode and device context as the first step.
Since it is already part of v0 and is needed, I will keep it in the subsequent v1, but moved to the device management section.

Michael,
Are you ok with this approach to step forward?

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-25 10:41                                         ` Parav Pandit
@ 2023-09-26  2:45                                           ` Jason Wang
  2023-09-26  3:40                                             ` Parav Pandit
  2023-09-27 21:43                                           ` Michael S. Tsirkin
  1 sibling, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-09-26  2:45 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Friday, September 22, 2023 8:38 AM
>
> > > Device context has no overlap.
> >
> > I can give you one example, e.g debugging.
> >
> Almost every feature needs debugging. :)
> So I am omitting it for the time being.

Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).

>
>
> > >
> > > Dirty page tracking has no overlap. What do you want to profile and monitor?
> > In case if you want to profile, it can be used without migration command
> > anyway?
> >
> > It works like a dirty bit of PTE. We all know it has a broader use case than
> > logging. For example, tracking working set and do optimization on
> > IOMMU/IOTLB or even device IOTLB.
> >
> > 1) Try to prove your facility can only work for one specific cases
> > 2) Try to prove your facility can work for more than one cases
> >
> > Which one is easier and more beneficial to virtio?
> >
> >
> > > If you describe, may be we I can split "device migration" chapter to
> > > two pieces, Device management and device migration.
> > >
> > > Device migration will use these basic facility.
> > > Would that help you?
> >
> > Definitely, but it needs to be done by not making it under the subsection of
> > admin commands, that's it.
> >
> > Let me repeat once again here for the possible steps to collaboration:
> >
> > 1) define virtqueue state, inflight descriptors in the section of basic facility but
> > not under the admin commands
> It will be part of the device context, in such a way that one can read only the vq state instead of the full device context.
> This will work.

I'm not sure what it looks like, but I think they are well decoupled
in this series. E.g driver can choose to just read e.g last_avail_idx
and report to ethtool or watchdog.

>
> > 2) define the dirty page tracking, device context/states in the section of basic
> > facility but not under the admin commands
> Great.
> Device context is already defined in the basic facility outside of the admin commands in [1].
> Current text is around device migration and spec evolves it can adopt more generic text without device migration.
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html

As I replied in other thread, I see several problems:

1) layer violation, PCI specific state were mixed into the basic facilities
2) I don't see a good definition on "device context"
3) TLV works fine for queue but not registers

What needs to be done first is to describe what device context means
and what it contains. Not the actual data structure since it may vary.

>
> > 3) define transport specific interfaces or admin commands to access them
> >
> As you also envisioned, it is done using admin commands to access it.

That's fine, but it should allow other ways.

>
> > Does this work? It seems you refused such steps in the past.
> >
> I didn’t.
> There must be some confusion in many emails we both exchanged, because already posted v0 has it split such as device context.
>
> For dirty page tracking I couldn’t find a solid use case without device migration, so I asked which you already replied above.

First, you never explain why such coupling gives us any benefit.
Second, I've given you sufficient examples but you tend to ignore
them. Why not go through Qemu codes then you will see the answer.

>
> > Actually, I would like to leave 2) as it's very complicated which might not
> > converge easily.
> >
> I will split current "device migration" section to two.
> 1. device management
> 2. device migration
>
> Device management covers device mode, device context and dirty page tracking.

I don't see any connection between "management" and "device context".

> Device migration refers to device management section.
>
> We can omit the dirty page tracking commands for a moment and first close on the device mode and device context as first step.
> Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but moved to device management section.

Could you please answer what's wrong with the first 4 patches in this series?

Thanks

>
> Michael,
> Are you ok with this approach to step forward?


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  2:45                                           ` Jason Wang
@ 2023-09-26  3:40                                             ` Parav Pandit
  2023-09-26  4:37                                               ` Jason Wang
  2023-09-26  5:36                                                 ` [virtio-comment] " Zhu, Lingshan
  0 siblings, 2 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-26  3:40 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 26, 2023 8:16 AM
> 
> On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Friday, September 22, 2023 8:38 AM
> >
> > > > Device context has no overlap.
> > >
> > > I can give you one example, e.g debugging.
> > >
> > Almost every feature needs debugging. :) So I am omitting it for time
> > being.
> 
> Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
Sure, add something specific for debug and explicitly mention that it is for debug, like -d.
Every feature and functionality needs debug, not specifically device context.
So add the infra for debug. The device migration series is not the vehicle to piggyback on.

> > > 1) define virtqueue state, inflight descriptors in the section of
> > > basic facility but not under the admin commands
> > It will be part of the device context such a way that so that one can only read
> the vq state only instead of full device context.
> > This will work.
> 
> I'm not sure what it looks like, but I think they are well decoupled in this series.
> E.g driver can choose to just read e.g last_avail_idx and report to ethtool or
> watchdog.
>
Once it is done, it will be visible what it looks like.
The key is that it needs to cover BOTH use cases.

> As I replied in other thread, I see several problems:
> 
> 1) layer violation, PCI specific state were mixed into the basic facilities

After agreeing to see a merged device context, now you are hinting that you don’t agree to merge the two.
I disagree if you are leaning in that direction.
I hope my deduction from your above comment is incorrect.

There is no violation. PCI specific device context will be captured in PCI specific section.
Device type contexts will be captured in those device type sections.

TLVs will cover many of the device context information.

> 2) I don't see a good definition on "device context"
> 3) TLV works fine for queue but not registers
> 
Please see the definition of device context in [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html

> What needs to be done first is to describe what device context means and what
> it contains. Not the actual data structure since it may vary.
> 
Sure, it is already defined in the device migration theory of operation section in [2].
I will try to take it out and put it in the device management section, so that device migration can refer to it and
some other basic facility can also refer to it (which must explain a use case beyond the silly debug point).

[2] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.html

> >
> > > 3) define transport specific interfaces or admin commands to access
> > > them
> > >
> > As you also envisioned, it is done using admin commands to access it.
> 
> That's fine, but it should allow other ways.
>
For passthrough, admin commands fit the design.
Sure, you need to draft how to do it the other ways.
> >
> > > Does this work? It seems you refused such steps in the past.
> > >
> > I didn’t.
> > There must be some confusion in many emails we both exchanged, because
> already posted v0 has it split such as device context.
> >
> > For dirty page tracking I couldn’t find a solid use case without device
> migration, so I asked which you already replied above.
> 
> First, you never explain why such coupling gives us any benefit.
It is not coupled. Device migration uses this facility, so it is a matter of text organization in the spec,
not the design.

> Second, I've given you sufficient examples but you tend to ignore them. Why
> not go through Qemu codes then you will see the answer.
You gave the examples of debug and profiling. You didn’t explain the use case: how to actually connect, how to profile, etc.

> > > Actually, I would like to leave 2) as it's very complicated which
> > > might not converge easily.
> > >
> > I will split current "device migration" section to two.
> > 1. device management
> > 2. device migration
> >
> > Device management covers device mode, device context and dirty page
> tracking.
> 
> I don't see any connection between "management" and "device context".
>
Do you have a better name than device management?
Maybe device operation.
Maybe it is just better to have the device context just the way it is in [1], under basic facility.
 
> > Device migration refers to device management section.
> >
> > We can omit the dirty page tracking commands for a moment and first close
> on the device mode and device context as first step.
> > Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but
> moved to device management section.
> 
> Could you please answer what's wrong with the first 4 patches in this series?
> 
1. The cover letter is missing the problem statement and use case.
2. Why are both queue suspend and device suspend introduced? Only one should be there. The design description is missing.
3. Even though it claims to be some random basic facility, the cover letter clearly states the main use case is "live migration".
4. Patch 4 is not needed at all. When the device is suspended, it is _suspended_. It does not do any bifurcation.
5. The suspend bit of patch 2 alone is not enough to cover P2P. One needs both suspend and freeze, as covered in series [1].
6. Finally, the whole description of 1 to 4 needs to be split into the device operation section, so that both passthrough and mediation can utilize it, using admin commands and otherwise.
Since Zhu told us that dirty tracking and inflight descriptors will be done, I presume he will propose to do them over the admin queue or command interface.
And since all of it can run over the admin commands, the plumbing done in 1 to 4 can be made using admin commands.

Until now we could not establish that creating yet another DMA interface is better than the queue interface.
So ...
To me, both methods will start looking more converged over admin commands and queues.

Passthrough will use them over the owner device.
Mediation somehow needs to do it over the member device.
Mediation will not use any device suspend command because it needs to keep bisecting everything.


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  3:40                                             ` Parav Pandit
@ 2023-09-26  4:37                                               ` Jason Wang
  2023-09-26  5:21                                                 ` Parav Pandit
  2023-09-27 15:31                                                 ` Michael S. Tsirkin
  2023-09-26  5:36                                                 ` [virtio-comment] " Zhu, Lingshan
  1 sibling, 2 replies; 445+ messages in thread
From: Jason Wang @ 2023-09-26  4:37 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Tue, Sep 26, 2023 at 11:40 AM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 26, 2023 8:16 AM
> >
> > On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Friday, September 22, 2023 8:38 AM
> > >
> > > > > Device context has no overlap.
> > > >
> > > > I can give you one example, e.g debugging.
> > > >
> > > Almost every feature needs debugging. :) So I am omitting it for time
> > > being.
> >
> > Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
> Sure add something specific for debug and explicitly mention that it is for debug like -d.
> Every feature and functionality needs debug, not specifically device context.
> So add infra for debug.

Why do you think it's an infra? All you need to do is a simple decoupling.

> Device migration series is not the vehicle to piggy back on.
>
> > > > 1) define virtqueue state, inflight descriptors in the section of
> > > > basic facility but not under the admin commands
> > > It will be part of the device context such a way that so that one can only read
> > the vq state only instead of full device context.
> > > This will work.
> >
> > I'm not sure what it looks like, but I think they are well decoupled in this series.
> > E.g driver can choose to just read e.g last_avail_idx and report to ethtool or
> > watchdog.
> >
> Once its done it will be visible how it looks like.
> The key is it needs to cover BOTH use cases.
>
> > As I replied in other thread, I see several problems:
> >
> > 1) layer violation, PCI specific state were mixed into the basic facilities
>
> After agreeing to see a merged device context, now you are hinting that you don’t agree to merge the two.

It's not a merging, it's about decoupling. I'm fine if you don't do
coupling and layer violation.

> I disagree if you are leaning towards that direction.
> I hope my deduction from your above comment is incorrect.

I agree to seek a way to unify but it doesn't mean everything in your
current proposal is correct. Basic facility part should be transport
independent.

>
> There is no violation. PCI specific device context will be captured in PCI specific section.

Is that what you've done in your series now?

> Device type contexts will be captured in those device type sections.
>
> TLVs will cover many of the device context information.
>
> > 2) I don't see a good definition on "device context"
> > 3) TLV works fine for queue but not registers
> >
> Please definition of device context in [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html
>
> > What needs to be done first is to describe what device context means and what
> > it contains. Not the actual data structure since it may vary.
> >
> Sure, it is already defined in device migration theory of operation section in [2].

It's too vague; for example, it's not easy to infer whether the following
belong to the device context:

1) dirty pages
2) virtqueue addresses

etc.

> I will try to take it out and put in device management section, so that device migration can refer to it and
> Some other basic facility also can refer to it (which must need to explain a use case beyond silly debug point).

It's silly to ship a product without any debugging facilities.

And I've given you other examples like hibernation but you ignore them
again. And what's more important, you ignore my question, so let me ask
you again here:

1) Try to prove your facility can only work for one specific case
2) Try to prove your facility can work for more than one cases

Which one is easier and more beneficial to virtio?

The decoupling is just a matter of relocating the text; does it block
any of your proposals? Why do you choose to refuse such a simple
change with such a huge advantage?

>
> [2] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.html
>
> > >
> > > > 3) define transport specific interfaces or admin commands to access
> > > > them
> > > >
> > > As you also envisioned, it is done using admin commands to access it.
> >
> > That's fine, but it should allow other ways.
> >
> For passthrough admin commands fits the design.
> Sure, you need to draft it how to do other ways..

So, it looks to me like you never examined this series carefully.

>  > >
> > > > Does this work? It seems you refused such steps in the past.
> > > >
> > > I didn’t.
> > > There must be some confusion in many emails we both exchanged, because
> > already posted v0 has it split such as device context.
> > >
> > > For dirty page tracking I couldn’t find a solid use case without device
> > migration, so I asked which you already replied above.
> >
> > First, you never explain why such coupling gives us any benefit.
> It is not coupled. Device migration uses this facility. So it is matter of text organization in the spec.
> Not the design.
>
> > Second, I've given you sufficient examples but you tend to ignore them. Why
> > not go through Qemu codes then you will see the answer.
> You gave example of debug, and profiling. You didn’t explain the use case of how to actually connect and how to profile etc.

How much evidence do you need me to provide? Do I need to give you the
manpages of ethtool -d? For profiling, is it too hard to think of, for
example, giving non-retire hints to the IOTLB via a specific IOPTE if we
find a page is dirty?

Compared with the evidence or proof I've provided, how much have you
provided so far? Or could you please prove the device context can only
work for migration?

>
> > > > Actually, I would like to leave 2) as it's very complicated which
> > > > might not converge easily.
> > > >
> > > I will split current "device migration" section to two.
> > > 1. device management
> > > 2. device migration
> > >
> > > Device management covers device mode, device context and dirty page
> > tracking.
> >
> > I don't see any connection between "management" and "device context".
> >
> You have better name than device management?
> May be device operation.
> May be it is just better to have the device context just the way it is in [1] under basic facility.

I don't understand this.

>
> > > Device migration refers to device management section.
> > >
> > > We can omit the dirty page tracking commands for a moment and first close
> > on the device mode and device context as first step.
> > > Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but
> > moved to device management section.
> >
> > Could you please answer what's wrong with the first 4 patches in this series?
> >

I'd leave Ling Shan to comment on this.

> 1. cover letter is missing the problem statement and use case
> 2. why queue suspend and both device suspend are introduced, only one should be there. The design description is missing.
> 3. Even though it claims under some random basic facility, cover letter clearly states the main use case is "live migration".
> 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It does not do any bifurcation.
> 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend and freeze both covered in series [1].
> 6. Finally the whole description of 1 to 4 need to be split in the device operation, so that both passthrough and medication can utilize it using admin cmd and otherwise.
> Since Zhu, told that dirty tracking and inflight descriptors will be done, I presume he will propose to do over admin q or command interface.
> And since all can run over the admin commands, the plumbing done in 1 to 4 can be made using admin commands.
>
> Until now we could not establish creating yet another DMA interface that is better than q interface.
> So ...
> To me both the methods will start looking more converged to me over admin command and queues.

This is self-contradictory with what you've said before.

We're in an endless circle now. The main reason is that you keep
ignoring comments.

Another question you keep ignoring is: what prevents this proposal
from being used in the passthrough setups? I'm pretty sure if you can
get the right answer, you will understand the big picture.

Thanks




>
> Passthrough will use them over owner device.
> Mediation somehow need to do over member device.
> Mediation will not use any device suspend command because it needs to keep bisecting everything.



* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  4:37                                               ` Jason Wang
@ 2023-09-26  5:21                                                 ` Parav Pandit
  2023-10-09  8:49                                                   ` Jason Wang
  2023-09-27 15:31                                                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 445+ messages in thread
From: Parav Pandit @ 2023-09-26  5:21 UTC (permalink / raw)
  To: Jason Wang; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev



> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, September 26, 2023 10:07 AM
> 
> On Tue, Sep 26, 2023 at 11:40 AM Parav Pandit <parav@nvidia.com> wrote:
> >
> >
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, September 26, 2023 8:16 AM
> > >
> > > On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <parav@nvidia.com> wrote:
> > > >
> > > >
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Friday, September 22, 2023 8:38 AM
> > > >
> > > > > > Device context has no overlap.
> > > > >
> > > > > I can give you one example, e.g debugging.
> > > > >
> > > > Almost every feature needs debugging. :) So I am omitting it for
> > > > time being.
> > >
> > > Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
> > Sure add something specific for debug and explicitly mention that it is for
> debug like -d.
> > Every feature and functionality needs debug, not specifically device context.
> > So add infra for debug.
> 
> Why do you think it's an infra? All you need to do is a simple decoupling.
>
That is too vague a comment for me.
Everything you need to debug can be queried from the device if needed.
So please proceed to add the debug infrastructure for it.

It may be useful outside of debug too.

Practically, any query interface added can be used by the member driver for debug purposes.

> > Device migration series is not the vehicle to piggy back on.
> >
> > > > > 1) define virtqueue state, inflight descriptors in the section
> > > > > of basic facility but not under the admin commands
> > > > It will be part of the device context such a way that so that one
> > > > can only read
> > > the vq state only instead of full device context.
> > > > This will work.
> > >
> > > I'm not sure what it looks like, but I think they are well decoupled in this
> series.
> > > E.g driver can choose to just read e.g last_avail_idx and report to
> > > ethtool or watchdog.
> > >
> > Once its done it will be visible how it looks like.
> > The key is it needs to cover BOTH use cases.
> >
> > > As I replied in other thread, I see several problems:
> > >
> > > 1) layer violation, PCI specific state were mixed into the basic
> > > facilities
> >
> > After agreeing to see merged donctext, now you are hinting that you don’t
> agree to merge two.
> 
> It's not a merging, it's about decoupling. I'm fine if you don't do coupling and
> layer violation.
>
The admin command is decoupled from the admin virtqueue already.
The device context is decoupled from the admin command already.
Dirty page tracking is decoupled from the admin command already.
The device mode is decoupled from the admin command already.

 
> > I disagree if you are leaning towards that direction.
> > I hope my deduction from your above comment is incorrect.
> 
> I agree to seek a way to unify but it doesn't mean everything in your current
> proposal is correct. Basic facility part should be transport independent.
>
Can you please comment in my series on what is incorrect? I would like to discuss it there, similar to your ask here.

 
> >
> > There is no violation. PCI specific device context will be captured in PCI
> specific section.
> 
> Is that what you've done in your series now?
>
It will be added in the v1 of my series.
 
> > Device type contexts will be captured in those device type sections.
> >
> > TLVs will cover many of the device context information.
> >
> > > 2) I don't see a good definition on "device context"
> > > 3) TLV works fine for queue but not registers
> > >
> > Please definition of device context in [1].
> >
> > [1]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.h
> > tml
> >
> > > What needs to be done first is to describe what device context means
> > > and what it contains. Not the actual data structure since it may vary.
> > >
> > Sure, it is already defined in device migration theory of operation section in
> [2].
> 
> It's too vague, for example it's not easy to infer if the following belong to device
> context:
> 
I am 100% confident, and challenge you, that the theory of operation explained in my series is significantly better than Lingshan's commit log saying the "main use case is live migration".

> 1) dirty pages

The above does not. Once you read each patch it should be clear, because dirty pages are not part of the device context.

> 2) virtqueue adresses
>
It is part of the device context, as listed in the patch.
 
> etc.
> 
> > I will try to take it out and put in device management section, so
> > that device migration can refer to it and Some other basic facility also can
> refer to it (which must need to explain a use case beyond silly debug point).
> 
> It's silly to ship a product without any debugging facilities.
> 
:)
How come, until now, no one has debugged the virtio last_avail_index in the device?
Please really stop such arguments.

If something is needed for debug, please proceed to add such a debug interface.

> And I've given you other examples like hibernation but you ignore them again.
No, I didn’t ignore them. In fact I asked the AMD expert to extend his proposal beyond GPUs and more.

> And what's more important, you ignores my question, let me ask you again
> here:
> 
> 1) Try to prove your facility can only work for one specific case
> 2) Try to prove your facility can work for more than one cases
> 
> Which one is easier and more beneficial to virtio?
>
#1 is easier than #2 as it only needs to solve a specific case.
#2 may be more useful once one establishes that two cases _really_ exist.

> The decoupling is just a matter of relocating the text, does it block any of your
> proposals? Why do you choose to refuse such a simple change with such a huge
> advantage?

I repeat, I didn’t refuse. I just don’t see the point of writing something without a solid use case defined.
If debug is a use case, one must write "debug" there.

> 
> >
> > [2]
> > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.h
> > tml
> >
> > > >
> > > > > 3) define transport specific interfaces or admin commands to
> > > > > access them
> > > > >
> > > > As you also envisioned, it is done using admin commands to access it.
> > >
> > > That's fine, but it should allow other ways.
> > >
> > For passthrough admin commands fits the design.
> > Sure, you need to draft it how to do other ways..
> 
> So, it looks to me you never examine this series carefully.
> 
Please stick to technical discussions.
I examined it, and there is zero description beyond "main use case is live migration".

I don’t see how your proposal fits the passthrough case.

> >  > >
> > > > > Does this work? It seems you refused such steps in the past.
> > > > >
> > > > I didn’t.
> > > > There must be some confusion in many emails we both exchanged,
> > > > because
> > > already posted v0 has it split such as device context.
> > > >
> > > > For dirty page tracking I couldn’t find a solid use case without
> > > > device
> > > migration, so I asked which you already replied above.
> > >
> > > First, you never explain why such coupling gives us any benefit.
> > It is not coupled. Device migration uses this facility. So it is matter of text
> organization in the spec.
> > Not the design.
> >
> > > Second, I've given you sufficient examples but you tend to ignore
> > > them. Why not go through Qemu codes then you will see the answer.
> > You gave example of debug, and profiling. You didn’t explain the use case of
> how to actually connect and how to profile etc.
> 
> How much evidence do you need me to provide? Do I need to give you the
> manpages of ethtool -d? For profiling, is it too hard to think, for example, to give
> non retire hints to IOTLB via a specific IOPTE if we found a page is dirty?
>
Sounds useful, but this is a very different use case than live migration.
Thinking further, this interface may be useful on the data path side, when mapping/unmapping is done.
So it can actually use the existing interface.
A side interface may not be as useful, as its hints would arrive slightly later than those of the existing DMA interface.
 
> Compare the evidence or proof you've provided, how much you have provided
> so far? Or could you please prove the device context can only work for
> migration?
>
Virtio spec development is not a proof-based method. I won’t go down this route.
Please review the patches and comment.

Device context works well for the device migration use case.
If you don’t understand specific text in the patches, please comment there to proceed.
 
> >
> > > > > Actually, I would like to leave 2) as it's very complicated
> > > > > which might not converge easily.
> > > > >
> > > > I will split current "device migration" section to two.
> > > > 1. device management
> > > > 2. device migration
> > > >
> > > > Device management covers device mode, device context and dirty
> > > > page
> > > tracking.
> > >
> > > I don't see any connection between "management" and "device context".
> > >
> > You have better name than device management?
> > May be device operation.
> > May be it is just better to have the device context just the way it is in [1] under
> basic facility.
> 
> I don't understand this.
> 
Device context is defined in "basic facility" section.
I will just keep it there.

> >
> > > > Device migration refers to device management section.
> > > >
> > > > We can omit the dirty page tracking commands for a moment and
> > > > first close
> > > on the device mode and device context as first step.
> > > > Since it is already part of v0 and it is needed, I will keep it in
> > > > subsequent v1 but
> > > moved to device management section.
> > >
> > > Could you please answer what's wrong with the first 4 patches in this series?
> > >
> 
> I'd leave Ling Shan to comment on this.
> 
> > 1. cover letter is missing the problem statement and use case 2. why
> > queue suspend and both device suspend are introduced, only one should be
> there. The design description is missing.
> > 3. Even though it claims under some random basic facility, cover letter clearly
> states the main use case is "live migration".
> > 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It
> does not do any bifurcation.
> > 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend
> and freeze both covered in series [1].
> > 6. Finally the whole description of 1 to 4 need to be split in the device
> operation, so that both passthrough and medication can utilize it using admin
> cmd and otherwise.
> > Since Zhu, told that dirty tracking and inflight descriptors will be done, I
> presume he will propose to do over admin q or command interface.
> > And since all can run over the admin commands, the plumbing done in 1 to 4
> can be made using admin commands.
> >
> > Until now we could not establish creating yet another DMA interface that is
> better than q interface.
> > So ...
> > To me both the methods will start looking more converged to me over admin
> command and queues.
> 
> This is self-contradictory with what you've said before.
> 
> We're in an endless circle now. The main reason is that you keep ignoring
> comments.
>
You didn’t comment on the series that should be reviewed.

We can restart as follows.
1. We have two use cases; first agree that both use cases exist.
If disagree, discuss...
2. Agree to improve the device migration framework for both.
If disagree, discuss...
3. How to implement such functionality; discuss whether both can use a common framework.
a. Discuss what the technical touch points are and which differ between the two use cases. What different framework is needed for each case.
b. Discuss which are the common points that both can leverage.

> Another question you keep ignoring is: what prevents this proposal from being
> used in the passthrough setups? I'm pretty sure if you can get the right answer,
> you will understand the big picture.
I answered this already.
In passthrough mode, the passthrough device does not have access to any of the following facilities.
These facilities must reside on the owner device:
a. dirty page tracking
b. incremental device context read/write
c. device mode setting

Your comment is that some of this is useful for debug and IOTLB optimization, so one can also avail of this facility on the member device.
This is fine; both can have it.
In such a use case one will have two AQs, one on the owner device and one on the member device.
Both operate in their own domain.

For debug purposes you wanted to use a non-incremental device context.
This is a different use case, and one should build it when it is _really_ needed.
Putting that under the debug umbrella to use it for the mediation use case is not the right way to proceed.
One should say that mediation requires a non-incremental device context and the interface may look different; this is also fine.

Please don’t try to pit the two series against each other, and it will be really easy to move forward.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  3:40                                             ` Parav Pandit
@ 2023-09-26  5:36                                                 ` Zhu, Lingshan
  2023-09-26  5:36                                                 ` [virtio-comment] " Zhu, Lingshan
  1 sibling, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-26  5:36 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Michael S. Tsirkin, eperezma,
	Stefan Hajnoczi, virtio-comment
  Cc: virtio-dev



On 9/26/2023 11:40 AM, Parav Pandit wrote:
>
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Tuesday, September 26, 2023 8:16 AM
>>
>> On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <parav@nvidia.com> wrote:
>>>
>>>
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Sent: Friday, September 22, 2023 8:38 AM
>>>>> Device context has no overlap.
>>>> I can give you one example, e.g debugging.
>>>>
>>> Almost every feature needs debugging. :) So I am omitting it for the time
>>> being.
>> Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
> Sure, add something specific for debug and explicitly mention that it is for debug, like -d.
> Every feature and functionality needs debug, not specifically the device context.
> So add infra for debug. The device migration series is not the vehicle to piggyback on.
>
>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>> basic facility but not under the admin commands
>>> It will be part of the device context in such a way that one can read only
>> the vq state instead of the full device context.
>>> This will work.
>> I'm not sure what it looks like, but I think they are well decoupled in this series.
>> E.g driver can choose to just read e.g last_avail_idx and report to ethtool or
>> watchdog.
>>
> Once it's done, it will be visible what it looks like.
> The key is it needs to cover BOTH use cases.
>
>> As I replied in other thread, I see several problems:
>>
>> 1) layer violation, PCI specific state were mixed into the basic facilities
> After agreeing to see a merged context, now you are hinting that you don’t agree to merge the two.
> I disagree if you are leaning towards that direction.
> I hope my deduction from your above comment is incorrect.
>
> There is no violation. PCI specific device context will be captured in PCI specific section.
> Device type contexts will be captured in those device type sections.
>
> TLVs will cover much of the device context information.
>
>> 2) I don't see a good definition on "device context"
>> 3) TLV works fine for queue but not registers
>>
> Please see the definition of device context in [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html
>
>> What needs to be done first is to describe what device context means and what
>> it contains. Not the actual data structure since it may vary.
>>
> Sure, it is already defined in the device migration theory of operation section in [2].
> I will try to take it out and put it in the device management section, so that device migration can refer to it and
> some other basic facility can also refer to it (which must explain a use case beyond the silly debug point).
>
> [2] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.html
>
>>>> 3) define transport specific interfaces or admin commands to access
>>>> them
>>>>
>>> As you also envisioned, it is done using admin commands to access it.
>> That's fine, but it should allow other ways.
>>
> For passthrough, admin commands fit the design.
> Sure, you need to draft how to do it in other ways.
>
>>>> Does this work? It seems you refused such steps in the past.
>>>>
>>> I didn’t.
>>> There must be some confusion in the many emails we both exchanged, because the
>> already posted v0 has it split out, such as the device context.
>>> For dirty page tracking I couldn’t find a solid use case without device
>> migration, so I asked which you already replied above.
>>
>> First, you never explain why such coupling gives us any benefit.
> It is not coupled. Device migration uses this facility. So it is a matter of text organization in the spec.
> Not the design.
>
>> Second, I've given you sufficient examples but you tend to ignore them. Why
>> not go through Qemu codes then you will see the answer.
> You gave the examples of debug and profiling. You didn’t explain the use case of how to actually connect, how to profile, etc.
>
>>>> Actually, I would like to leave 2) as it's very complicated which
>>>> might not converge easily.
>>>>
>>> I will split current "device migration" section to two.
>>> 1. device management
>>> 2. device migration
>>>
>>> Device management covers device mode, device context and dirty page
>> tracking.
>>
>> I don't see any connection between "management" and "device context".
>>
> Do you have a better name than device management?
> Maybe device operation.
> Maybe it is just better to have the device context the way it is in [1], under basic facilities.
>   
>>> Device migration refers to device management section.
>>>
>>> We can omit the dirty page tracking commands for a moment and first close
>> on the device mode and device context as a first step.
>>> Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but
>> moved to device management section.
>>
>> Could you please answer what's wrong with the first 4 patches in this series?
>>
> 1. cover letter is missing the problem statement and use case
I only reply to this section of comments; this does not mean I agree 
with you on your other statements. Instead I agree with
Jason in his replies to you.

In my cover letter:
"The main usecase of these new facilities is Live Migration."

Did you miss it?
> 2. why both queue suspend and device suspend are introduced; only one should be there. The design description is missing.
There is no queue suspend; there are device suspend and vq state 
accessors. Please read the patch if you want to comment.
> 3. Even though it claims under some random basic facility, cover letter clearly states the main use case is "live migration".
It is not random; they are precisely defined virtio basic facilities.
> 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It does not do any bifurcation.
The device should not accept vq reset and the driver should not reset vqs; 
please read the previous discussions with MST,
and please don't ignore the conclusions.
> 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend and freeze both covered in series [1].
We have discussed this many times: P2P is outside the virtio spec. Do you 
want to mediate every PCI state/functionality?
> 6. Finally the whole description of 1 to 4 needs to be split into the device operation, so that both passthrough and mediation can utilize it using admin cmd and otherwise.
Do you see any reason this solution cannot be used for passthrough and 
mediation?
Or does features_OK work for passthrough or mediation? Any difference?
> Since Zhu, told that dirty tracking and inflight descriptors will be done, I presume he will propose to do over admin q or command interface.
> And since all can run over the admin commands, the plumbing done in 1 to 4 can be made using admin commands.
No
>
> Until now we could not establish that creating yet another DMA interface is better than the q interface.
> So ...
> To me, both methods will start looking more converged over admin commands and queues.
I don't think so. Again, we are introducing basic facilities, and these 
facilities don't
depend on or rely on the admin vq.
>
> Passthrough will use them over owner device.
> Mediation somehow need to do over member device.
> Mediation will not use any device suspend command because it needs to keep bisecting everything.
please read QEMU vhost live migration solution




^ permalink raw reply	[flat|nested] 445+ messages in thread

* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  5:36                                                 ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-26  6:03                                                   ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-26  6:03 UTC (permalink / raw)
  To: Zhu, Lingshan, Jason Wang, Michael S. Tsirkin, eperezma,
	Stefan Hajnoczi, virtio-comment
  Cc: virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Tuesday, September 26, 2023 11:07 AM

> > 1. cover letter is missing the problem statement and use case
> I only reply to this section of comments, this does not mean I agree with you on
> your other statements, Instead I agree with Jason on his replies to you.
> 
> In my cover letter:
> "The main usecase of these new facilities is Live Migration."
>
:)
A two-word phrase does not explain the use case, or why it asks to mediate a native virtio device.

And yet you call these the basic facilities.
Anyway, you know the mismatch between the email responses and the cover letter is evident.

> Did you miss it?
No.
It misses the detail I described in the theory of operation in [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html

> > 2. why both queue suspend and device suspend are introduced, only one
> should be there. The design description is missing.
> there are no queue suspend, they are device suspend and vq state accessors.
> please read the patch if you want to comment.
> > 3. Even though it claims under some random basic facility, cover letter clearly
> states the main use case is "live migration".
> it is not random, they are precisely defined virtio basic facilities.
> > 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It
> does not do any bifurcation.
> The device should not accept vq reset and the driver should reset vqs, please
> read previous discussions with MST and please don't ignore the conclusions
> > 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend
> and freeze both covered in series [1].
> we have discussed this for many times, P2P is out of virtio spec, do you want to
> mediate every PCI state/functionality?
> > 6. Finally the whole description of 1 to 4 needs to be split into the device
> operation, so that both passthrough and mediation can utilize it using admin
> cmd and otherwise.
> Do you see any reasons this solution can not be used for passthrough and
> mediation?
Right. The proposed solution does not meet the following requirements, which are addressed in [1].

[1] https://lore.kernel.org/virtio-comment/20230906081637.32185-1-lingshan.zhu@intel.com/T/#m7efbaadbc73f033c2793d9eb1eb0afa210aae4be

I replied to Jason in a previous email.
I will repeat here: they are covered in [1].

1. Missing P2P support
2. Missing dirty page tracking
3. Incremental device context framework for short downtime
3.a Ability to do inflight descriptor tracking
4. Ability to do the work for multiple member devices in parallel.


> Or does features_OK work for passthrough or mediation? Any difference?
It does not work. Passthrough devices are not trapped by the hypervisor.

> > Since Zhu, told that dirty tracking and inflight descriptors will be done, I
> presume he will propose to do over admin q or command interface.
> > And since all can run over the admin commands, the plumbing done in 1 to 4
> can be made using admin commands.
> No
Such a bare negative assertion does not help.
Explain the why, as I explained above.

> >
> > Until now we could not establish creating yet another DMA interface that is
> better than q interface.
> > So ...
> > To me both the methods will start looking more converged to me over admin
> command and queues.
> I don't think so, again, we are introducing basic facilities and these facilities
> don't depend on or rely on admin vq.
If so, drop the words "live migration" from the cover letter.
It relies on admin commands (again, not the vq; be careful what you constantly claim).
Reliance or non-reliance on the admin queue or admin commands does not make something a basic facility.

Admin commands and queues are already in the basic facilities section today.
So claiming that because something uses admin commands it is a non-basic facility is not correct.

> >
> > Passthrough will use them over owner device.
> > Mediation somehow need to do over member device.
> > Mediation will not use any device suspend command because it needs to
> keep bisecting everything.
> please read QEMU vhost live migration solution
Can you please share the pointer to it?

I am familiar with [2] and it does not require a device suspend flow, as things are bisected.

[2] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  6:03                                                   ` [virtio-comment] " Parav Pandit
@ 2023-09-26  9:25                                                     ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-26  9:25 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Michael S. Tsirkin, eperezma,
	Stefan Hajnoczi, virtio-comment
  Cc: virtio-dev



On 9/26/2023 2:03 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 26, 2023 11:07 AM
>>> 1. cover letter is missing the problem statement and use case
>> I only reply to this section of comments, this does not mean I agree with you on
>> your other statements, Instead I agree with Jason on his replies to you.
>>
>> In my cover letter:
>> "The main usecase of these new facilities is Live Migration."
>>
> :)
> A two-word phrase does not explain the use case, or why it asks to mediate a native virtio device.
This solution works for fundamental virtualization, trap and emulate, 
just like other virtio config space
fields.

Do you know how device status or vq_enable works? I suggest reading the QEMU 
code.
>
> And yet you call these the basic facilities.
> Anyway, you know the mismatch between the email responses and the cover letter is evident.
No, I don't. They are basic facilities, as you can see; this is loud and 
clear.
>
>> Did you miss it?
> No.
> It misses the detail I described in the theory of operation in [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
Details are in the following patches in this series.
>
>>> 2. why both queue suspend and device suspend are introduced, only one
>> should be there. The design description is missing.
>> there are no queue suspend, they are device suspend and vq state accessors.
>> please read the patch if you want to comment.
>>> 3. Even though it claims under some random basic facility, cover letter clearly
>> states the main use case is "live migration".
>> it is not random, they are precisely defined virtio basic facilities.
>>> 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It
>> does not do any bifurcation.
>> The device should not accept vq reset and the driver should reset vqs, please
>> read previous discussions with MST and please don't ignore the conclusions
>>> 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend
>> and freeze both covered in series [1].
>> we have discussed this for many times, P2P is out of virtio spec, do you want to
>> mediate every PCI state/functionality?
>>> 6. Finally the whole description of 1 to 4 needs to be split into the device
>> operation, so that both passthrough and mediation can utilize it using admin
>> cmd and otherwise.
>> Do you see any reasons this solution can not be used for passthrough and
>> mediation?
> Right. The proposed solution does not meet the following requirements, which are addressed in [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230906081637.32185-1-lingshan.zhu@intel.com/T/#m7efbaadbc73f033c2793d9eb1eb0afa210aae4be
>
> [1] I replied to Jason in previous email.
> I will repeat here. They are covered in [1].
>
> 1. Missing P2P support
As I asked before, please don't ignore it, please answer:
"We have discussed this many times: P2P is outside the virtio spec,
do you want to mediate every PCI state/functionality?"
> 2. Missing dirty page tracking
This will be included in V2. As we have repeated many times,
we want this series to be small and focused; that is why dirty page tracking
and in-flight descriptors are not here, but they will be in V2.
> 3. Incremental device context framework for short downtime
Do you observe significant downtime in QEMU/vhost?

Why do you think this series would introduce more downtime?

Do you know this series can re-use the QEMU/vhost code?

Have you really read the QEMU live migration code?

Jason has suggested you read it.
> 3.a Ability to do inflight descriptor tracking
As told many times: in V2.
> 4. Ability to do the work for multiple member devices in parallel.
As told many times, they are per-device facilities, for example
per-VF: that means each VF is migrated by its own facilities.

Is that clear enough for you?
>
>
>> Or does features_OK work for passthrough or mediation? Any difference?
> It does not work. Passthrough devices are not trapped by the hypervisor.
Really? Features_ok does not work for passthrough? Seriously?
>
>>> Since Zhu, told that dirty tracking and inflight descriptors will be done, I
>> presume he will propose to do over admin q or command interface.
>>> And since all can run over the admin commands, the plumbing done in 1 to 4
>> can be made using admin commands.
>> No
> Such negative assertion does not help.
> Explain why part, like how I explained above.
OK, we can repeat:

Again: they are self-contained basic facilities; they had better not 
depend on others like the admin vq.

And please refer to the previous discussions, where Jason and I pointed out 
that the admin vq is not a qualified
solution for live migration because of: 1) nesting, 2) bare-metal LM, 3) QoS, 
4) security.

We don't want to repeat the discussions, it looks like endless circle 
with no direction.
>
>>> Until now we could not establish creating yet another DMA interface that is
>> better than q interface.
>>> So ...
>>> To me both the methods will start looking more converged to me over admin
>> command and queues.
>> I don't think so, again, we are introducing basic facilities and these facilities
>> don't depend on or rely on admin vq.
> If so, stop the work "live migration" from the cover letter.
> Reliance of admin command (again not vq, be careful what you constantly claim).
> Not reliance on admin queue or admin command does/does not make it basic facility.
why admin commands are must? These facilities are self contained, right?
>
> Admin commands and queues are already in basic facilities section today.
> So claiming that hey one is using admin commands that means it is non_basic facility is not correct.
Still why you think admin command is a must? It is clear that this 
proposal can work without admin vq,
and even better!
>
>>> Passthrough will use them over owner device.
>>> Mediation somehow need to do over member device.
>>> Mediation will not use any device suspend command because it needs to
>> keep bisecting everything.
>> please read QEMU vhost live migration solution
> Can you please share the pointer to it?
>
> I am familiar with [2] and it does not require device suspend flow as things are bisected.
>
> [2] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
If so, I believe you may find out that this solution can work perfect 
with vhost, right?


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-09-26  9:25                                                     ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-26  9:25 UTC (permalink / raw)
  To: Parav Pandit, Jason Wang, Michael S. Tsirkin, eperezma,
	Stefan Hajnoczi, virtio-comment
  Cc: virtio-dev



On 9/26/2023 2:03 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Tuesday, September 26, 2023 11:07 AM
>>> 1. cover letter is missing the problem statement and use case
>> I only reply to this section of comments, this does not mean I agree with you on
>> your other statements, Instead I agree with Jason on his replies to you.
>>
>> In my cover letter:
>> "The main usecase of these new facilities is Live Migration."
>>
> :)
> A two-word phrase does not explain the use case, or why you are asking to mediate a native virtio device.
This solution works through fundamental virtualization: trap and emulate, 
just like other virtio config space fields.

Do you know how device status or vq_enable works? I suggest reading the 
QEMU code.
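The trap-and-emulate model mentioned here can be sketched in a few lines. This is an illustrative hypervisor-side fragment, not QEMU code; every name, including the SUSPEND bit value, is a placeholder:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder status bits; the real SUSPEND value is assigned by
 * the spec patch, 0x10 here is only illustrative. */
#define EMU_STATUS_DRIVER_OK 0x04
#define EMU_STATUS_SUSPEND   0x10

struct emu_dev {
    uint8_t status;
    int vq_state_stable; /* vq indices are safe to read out */
};

/* Every guest write to the status field is trapped, so the
 * emulation layer observes SUSPEND the moment it is set --
 * exactly like it already observes DRIVER_OK or vq_enable. */
static void emu_write_status(struct emu_dev *d, uint8_t val)
{
    d->status = val;
    d->vq_state_stable = (val & EMU_STATUS_SUSPEND) ? 1 : 0;
}
```

Nothing transport-specific is needed for the trap itself; the status field is already an intercepted register in a mediated setup.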
>
> And yet you call these the basic facilities.
> Anyway, you know the mismatch between the email response and the cover letter is evident.
No, I don't. They are basic facilities, as you can see; that is loud and 
clear.
>
>> Did you miss it?
> No.
> It misses the details I described in the theory of operation in [1].
>
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00061.html
Details are in the following patches in this series.
>
>>> 2. why queue suspend and both device suspend are introduced, only one
>> should be there. The design description is missing.
>> There is no queue suspend; there are device suspend and vq state accessors.
>> Please read the patch if you want to comment.
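For orientation, the virtqueue state being accessed is small. Here is a sketch of what the accessors read and write for a split virtqueue; field names are illustrative, and the series' first patch has the normative definition:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split-virtqueue state discussed in this series: the two indices a
 * destination device needs to resume exactly where the source
 * stopped. Names are illustrative, not spec text. */
struct vq_split_state {
    uint16_t last_avail_idx; /* next available-ring entry to process */
    uint16_t last_used_idx;  /* next used-ring entry to publish      */
};

/* Accessor pair a transport would expose (e.g. through registers);
 * plain functions here for illustration only. */
static void vq_state_get(const struct vq_split_state *dev,
                         struct vq_split_state *out)
{
    memcpy(out, dev, sizeof(*out));
}

static void vq_state_set(struct vq_split_state *dev,
                         const struct vq_split_state *in)
{
    memcpy(dev, in, sizeof(*dev));
}
```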
>>> 3. Even though it claims under some random basic facility, cover letter clearly
>> states the main use case is "live migration".
>> it is not random, they are precisely defined virtio basic facilities.
>>> 4. Patch 4 is not needed at all. When device is suspended, it is _suspended_. It
>> does not do any bifurcation.
>> The device should not accept vq reset and the driver should reset vqs, please
>> read previous discussions with MST and please don't ignore the conclusions
>>> 5. only suspend bit of patch2 is not enough to cover P2P. One needs suspend
>> and freeze both covered in series [1].
>> we have discussed this for many times, P2P is out of virtio spec, do you want to
>> mediate every PCI state/functionality?
>>> 6. Finally the whole description of 1 to 4 need to be split in the device
>> operation, so that both passthrough and medication can utilize it using admin
>> cmd and otherwise.
>> Do you see any reasons this solution can not be used for passthrough and
>> mediation?
> Right. The proposed solution does not meet the following requirements, addressed in [1].
>
> [1] https://lore.kernel.org/virtio-comment/20230906081637.32185-1-lingshan.zhu@intel.com/T/#m7efbaadbc73f033c2793d9eb1eb0afa210aae4be
>
> [1] I replied to Jason in previous email.
> I will repeat here. They are covered in [1].
>
> 1. Missing P2P support
As I asked before, please don't ignore this; please answer:
"We have discussed this many times: P2P is out of the virtio spec's scope;
do you want to mediate every PCI state/functionality?"
> 2. Missing dirty page tracking
This will be included in V2. As we have repeated many times, we want this
series to be small and focused; that is why dirty page tracking and
in-flight descriptors are not here, but they will be in V2.
> 3. Incremental device context framework for short downtime
Do you observe significant downtime in QEMU/vhost?

Why do you think this series would introduce more downtime?

Do you know this series can re-use QEMU/vhost?

Have you really read the QEMU live migration code?

Jason has already suggested you read it.
> 3.a Ability to do inflight descriptor tracking
As I told you many times: in V2.
> 4. Ability to do the work for multiple member devices in parallel.
As I told you many times, these are per-device facilities, for example
per-VF; that means a VF is migrated by its own facilities.

Is that clear enough for you?
>
>
>> Or does features_OK work for passthrough or mediation? Any difference?
> It does not work. Passthrough devices are not trapped by the hypervisor.
Really? Features_ok does not work for passthrough? Seriously?
>
>>> Since Zhu told us that dirty tracking and inflight descriptors will be done, I
>> presume he will propose to do them over the admin q or a command interface.
>>> And since all can run over the admin commands, the plumbing done in 1 to 4
>> can be made using admin commands.
>> No
> Such a negative assertion does not help.
> Explain the "why" part, like how I explained above.
OK, we can repeat:

Again: they are self-contained basic facilities, and they had better not 
depend on others such as the admin vq.

And please refer to previous discussions, where Jason and I pointed out 
that the admin vq is not a qualified
solution for live migration because of: 1) nesting, 2) bare-metal LM, 
3) QoS, 4) security.

We don't want to repeat those discussions; it looks like an endless circle 
with no direction.
>
>>> Until now we could not establish that creating yet another DMA interface is
>> better than the q interface.
>>> So ...
>>> To me, both methods will start looking more converged over admin
>> commands and queues.
>> I don't think so, again, we are introducing basic facilities and these facilities
>> don't depend on or rely on admin vq.
> If so, drop the words "live migration" from the cover letter.
> Reliance is on admin commands (again, not the vq; be careful what you constantly claim).
> Reliance or non-reliance on the admin queue or admin commands does not make something a basic facility.
Why are admin commands a must? These facilities are self-contained, right?
>
> Admin commands and queues are already in the basic facilities section today.
> So claiming that because one is using admin commands it is a non-basic facility is not correct.
Still, why do you think an admin command is a must? It is clear that this 
proposal can work without the admin vq,
and even better.
>
>>> Passthrough will use them over owner device.
>>> Mediation somehow need to do over member device.
>>> Mediation will not use any device suspend command because it needs to
>> keep bisecting everything.
>> please read QEMU vhost live migration solution
> Can you please share the pointer to it?
>
> I am familiar with [2] and it does not require device suspend flow as things are bisected.
>
> [2] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
If so, I believe you will find that this solution can work perfectly 
with vhost, right?
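To make the claimed vhost compatibility concrete, here is a condensed sketch of a source-side save path using the proposed facilities: suspend first, then read the per-vq indices. All names are illustrative; nothing here is spec or QEMU code:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative per-vq record a migration agent would ship. */
struct vq_record {
    uint16_t last_avail_idx;
    uint16_t last_used_idx;
};

/* Toy device model standing in for the transport accessors. */
struct toy_dev {
    int suspended;
    struct vq_record vqs[2];
};

/* Step 1: set SUSPEND; once the device acknowledges it, the vq
 * state is stable by definition. */
static void dev_suspend(struct toy_dev *d)
{
    d->suspended = 1;
}

/* Step 2: read each vq's state; only meaningful once suspended,
 * which is what the series' constraints patch is about. */
static struct vq_record dev_read_vq(const struct toy_dev *d, int i)
{
    assert(d->suspended);
    return d->vqs[i];
}
```

The same two steps map onto the existing vhost flow, where stopping a ring and fetching its position happen in one round trip.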


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  9:25                                                     ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-26 10:48                                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-26 10:48 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Parav Pandit, Jason Wang, eperezma, Stefan Hajnoczi,
	virtio-comment, virtio-dev

On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> We don't want to repeat the discussions, it looks like endless circle with
> no direction.

OK let me try to direct this discussion.
You guys were speaking past each other, no dialog is happening.
And as long as it goes on no progress will be made and you
will keep going in circles.

Parav here made an effort and attempted to summarize
use-cases addressed by your proposal but not his.
He couldn't resist adding a "yes, but" in there, oh well.
But now I hope you know he knows about your use-cases?

So please do the same. Do you see any advantages to Parav's
proposal as compared to yours? Try to list them and
if possible try not to accompany the list with "yes but"
(put it in a separate mail if you must ;) ).
If you won't be able to see any, let me know and I'll try to help.

Once each of you and Parav have finally heard the other and
the other also knows he's been heard, that's when we can
try to make progress by looking for something that addresses
all use-cases as opposed to endlessly repeating same arguments.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org




* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26 10:48                                                       ` [virtio-comment] " Michael S. Tsirkin
@ 2023-09-27  8:20                                                         ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-09-27  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin, Cornelia Huck
  Cc: Parav Pandit, Jason Wang, eperezma, Stefan Hajnoczi,
	virtio-comment, virtio-dev



On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>> We don't want to repeat the discussions, it looks like endless circle with
>> no direction.
> OK let me try to direct this discussion.
> You guys were speaking past each other, no dialog is happening.
> And as long as it goes on no progress will be made and you
> will keep going in circles.
>
> Parav here made an effort and attempted to summarize
> use-cases addressed by your proposal but not his.
> He couldn't resist adding "a yes but" in there oh well.
> But now I hope you know he knows about your use-cases?
>
> So please do the same. Do you see any advantages to Parav's
> proposal as compared to yours? Try to list them and
> if possible try not to accompany the list with "yes but"
> (put it in a separate mail if you must ;) ).
> If you won't be able to see any, let me know and I'll try to help.
>
> Once each of you and Parav have finally heard the other and
> the other also knows he's been heard, that's when we can
> try to make progress by looking for something that addresses
> all use-cases as opposed to endlessly repeating same arguments.
Sure Michael, I will not say "yes but" here.

In Parav's proposal, he intends to migrate a member device through its 
owner device via the admin vq;
thus the necessary admin vq commands are introduced in his series.


I see his proposal can:
1) meet some customers' requirements when nesting and bare metal are not involved
2) align with Nvidia's products
3) be easier to emulate by an onboard SoC

The general purposes of his proposal and mine are aligned: migrating virtio 
devices.

Jason once proposed a way to collaborate; please allow me to quote his proposal:

"
Let me repeat once again here for the possible steps to collaboration:

1) define virtqueue state, inflight descriptors in the section of
basic facility but not under the admin commands
2) define the dirty page tracking, device context/states in the
section of basic facility but not under the admin commands
3) define transport specific interfaces or admin commands to access them
"

I totally agree with his proposal.

Does this work for you Michael?

Thanks
Zhu Lingshan





* RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-27  8:20                                                         ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-27 10:39                                                           ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-09-27 10:39 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin, Cornelia Huck
  Cc: Jason Wang, eperezma, Stefan Hajnoczi, virtio-comment, virtio-dev



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 27, 2023 1:50 PM

> I see his proposal can:
> 1) meet some customers requirements without nested and bare-metal
> 2) align with Nvidia production
Slightly inaccurate.
The work produced is for the virtio spec update for the users.

I missed adding the Sign-offs of other contributors who also share similar use cases; I will add them in v1.

> 3) easier to emulate by onboard SOC
> 
> The general purpose of his proposal and mine are aligned: migrate virtio	
> devices.
> 
Great.

> Jason has ever proposed to collaborate, please allow me quote his proposal:
> 
> "
> Let me repeat once again here for the possible steps to collaboration:
> 
> 1) define virtqueue state, inflight descriptors in the section of
> basic facility but not under the admin commands
> 2) define the dirty page tracking, device context/states in the
> section of basic facility but not under the admin commands
> 3) define transport specific interfaces or admin commands to access them
> "
> 
> I totally agree with his proposal.

We started discussing some of it.
If I draw parallels, one should not say "detach descriptors from the virtqueue" for infrastructure that already exists in the basic facilities.
If one does, one should explain the technical design reason; then it would make sense.

So let's discuss it.
I would like to better understand the _real_ technical reason for detaching it.



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  4:37                                               ` Jason Wang
  2023-09-26  5:21                                                 ` Parav Pandit
@ 2023-09-27 15:31                                                 ` Michael S. Tsirkin
  1 sibling, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 15:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: Parav Pandit, Zhu, Lingshan, virtio-dev

On Tue, Sep 26, 2023 at 12:37:19PM +0800, Jason Wang wrote:
> It's silly to ship a product without any debugging facilities.

The way I see it, debugging facilities would ideally be somewhat
separate from the rest of the driver. For example, imagine an on-device
gateway that lets you query the internal device state.
Sounds better than trying to tweak registers that other driver
parts might be tweaking at the same time, while trying to
take locks that other driver parts might be holding at the
same time.
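The "on-device gateway" idea can be pictured as a side-band snapshot interface that never enters the driver's data path, and therefore takes none of the driver's locks. A toy sketch, where every name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Internal state owned by the device/firmware. */
struct dev_internal {
    uint16_t last_avail_idx;
    uint8_t  status;
};

/* What the debug gateway reports: deliberately a copy, so the
 * querier never touches live, driver-owned registers. */
struct debug_snapshot {
    uint16_t last_avail_idx;
    uint8_t  status;
};

/* Hypothetical gateway entry point: the device fills a buffer on
 * request, independent of the register path the driver uses. */
static void debug_gateway_query(const struct dev_internal *d,
                                struct debug_snapshot *out)
{
    out->last_avail_idx = d->last_avail_idx;
    out->status = d->status;
}
```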

-- 
MST





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-27  8:20                                                         ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-27 15:40                                                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 15:40 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> > On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> > > We don't want to repeat the discussions, it looks like endless circle with
> > > no direction.
> > OK let me try to direct this discussion.
> > You guys were speaking past each other, no dialog is happening.
> > And as long as it goes on no progress will be made and you
> > will keep going in circles.
> > 
> > Parav here made an effort and attempted to summarize
> > use-cases addressed by your proposal but not his.
> > He couldn't resist adding "a yes but" in there oh well.
> > But now I hope you know he knows about your use-cases?
> > 
> > So please do the same. Do you see any advantages to Parav's
> > proposal as compared to yours? Try to list them and
> > if possible try not to accompany the list with "yes but"
> > (put it in a separate mail if you must ;) ).
> > If you won't be able to see any, let me know and I'll try to help.
> > 
> > Once each of you and Parav have finally heard the other and
> > the other also knows he's been heard, that's when we can
> > try to make progress by looking for something that addresses
> > all use-cases as opposed to endlessly repeating same arguments.
> Sure Michael, I will not say "yes but" here.
> 
> From Parav's proposal, he intends to migrate a member device by its owner
> device through the admin vq,
> thus necessary admin vq commands are introduced in his series.
> 
> 
> I see his proposal can:
> 1) meet some customers requirements without nested and bare-metal
> 2) align with Nvidia production
> 3) easier to emulate by onboard SOC

Is that all you can see?

Hint: there's more.





> The general purpose of his proposal and mine are aligned: migrate virtio
> devices.
> 
> Jason has ever proposed to collaborate, please allow me quote his proposal:
> 
> "
> Let me repeat once again here for the possible steps to collaboration:
> 
> 1) define virtqueue state, inflight descriptors in the section of
> basic facility but not under the admin commands
> 2) define the dirty page tracking, device context/states in the
> section of basic facility but not under the admin commands
> 3) define transport specific interfaces or admin commands to access them
> "
> 
> I totally agree with his proposal.
> 
> Does this work for you Michael?
> 
> Thanks
> Zhu Lingshan

I just doubt very much this will work.  What will "define" mean then -
not an interface, just a description in English? I think you
underestimate the difficulty of creating such definitions that
are robust and precise.


Instead I suggest you define a way to submit admin commands that works
for nested and bare-metal (i.e. not the admin vq, and not with the SR-IOV
group type), and work with Parav to make the live migration admin commands
work reasonably well through this interface and with this type.
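For illustration only, a self-contained submission interface of the kind suggested here could look like the sketch below: an admin command carried by the member device itself, so neither an owner device nor SR-IOV grouping is required. Every field and name is hypothetical, not spec text:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical admin command descriptor a member device could
 * accept directly (e.g. through a transport-specific mailbox),
 * making the same commands usable on bare metal and in nested
 * guests. Nothing here is from the virtio spec. */
struct adm_cmd {
    uint16_t opcode;     /* e.g. save vq state, read dirty bitmap */
    uint16_t status;     /* completion status written by device   */
    uint64_t data_addr;  /* guest-physical data buffer            */
    uint32_t data_len;
};

/* Toy submit path: the device consumes the command and reports
 * completion in place. */
static void adm_submit(struct adm_cmd *cmd)
{
    /* pretend the device handled it successfully */
    cmd->status = 0; /* 0 = success in this sketch */
}
```

The point of the sketch is only the carrier: the same command set could then ride over an owner device's admin vq or over the member device's own interface.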

-- 
MST





* [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-09-27 15:40                                                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 15:40 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> > On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> > > We don't want to repeat the discussions, it looks like endless circle with
> > > no direction.
> > OK let me try to direct this discussion.
> > You guys were speaking past each other, no dialog is happening.
> > And as long as it goes on no progress will be made and you
> > will keep going in circles.
> > 
> > Parav here made an effort and attempted to summarize
> > use-cases addressed by your proposal but not his.
> > He couldn't resist adding "a yes but" in there oh well.
> > But now I hope you know he knows about your use-cases?
> > 
> > So please do the same. Do you see any advantages to Parav's
> > proposal as compared to yours? Try to list them and
> > if possible try not to accompany the list with "yes but"
> > (put it in a separate mail if you must ;) ).
> > If you won't be able to see any, let me know and I'll try to help.
> > 
> > Once each of you and Parav have finally heard the other and
> > the other also knows he's been heard, that's when we can
> > try to make progress by looking for something that addresses
> > all use-cases as opposed to endlessly repeating same arguments.
> Sure Michael, I will not say "yes but" here.
> 
> From Parav's proposal, he intends to migrate a member device by its owner
> device through the admin vq,
> thus necessary admin vq commands are introduced in his series.
> 
> 
> I see his proposal can:
> 1) meet some customers requirements without nested and bare-metal
> 2) align with Nvidia production
> 3) easier to emulate by onboard SOC

Is that all you can see?

Hint: there's more.





> The general purpose of his proposal and mine are aligned: migrate virtio
> devices.
> 
> Jason has ever proposed to collaborate, please allow me quote his proposal:
> 
> "
> Let me repeat once again here for the possible steps to collaboration:
> 
> 1) define virtqueue state, inflight descriptors in the section of
> basic facility but not under the admin commands
> 2) define the dirty page tracking, device context/states in the
> section of basic facility but not under the admin commands
> 3) define transport specific interfaces or admin commands to access them
> "
> 
> I totally agree with his proposal.
> 
> Does this work for you Michael?
> 
> Thanks
> Zhu Lingshan

I just doubt very much this will work.  What will "define" mean then -
not an interface, just a description in english? I think you
underestimate the difficulty of creating such definitions that
are robust and precise.


Instead I suggest you define a way to submit admin commands that works
for nested and bare-metal (i.e. not admin vq, and not with sriov group
type). And work with Parav to make live migration admin commands work
reasonably well through this interface and with this type.

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-25 10:41                                         ` Parav Pandit
  2023-09-26  2:45                                           ` Jason Wang
@ 2023-09-27 21:43                                           ` Michael S. Tsirkin
  1 sibling, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-09-27 21:43 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Jason Wang, Zhu, Lingshan, virtio-dev

On Mon, Sep 25, 2023 at 10:41:12AM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Friday, September 22, 2023 8:38 AM
> 
> > > Device context has no overlap.
> > 
> > I can give you one example, e.g debugging.
> >
> Almost every feature needs debugging. :)
> So I am omitting it for the time being.
> 
>  
> > >
> > > Dirty page tracking has no overlap. What do you want to profile and monitor?
> > In case if you want to profile, it can be used without migration command
> > anyway?
> > 
> > It works like a dirty bit of PTE. We all know it has a broader use case than
> > logging. For example, tracking working set and do optimization on
> > IOMMU/IOTLB or even device IOTLB.
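Jason's PTE-dirty-bit analogy above can be made concrete with a minimal sketch: one bit per guest page, set by the device's write path and consumed test-and-clear by whichever client wants it (a migration pass, a working-set profiler, an IOTLB optimizer). The structure and function names below are invented for illustration and are not taken from either proposal:

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical per-device dirty bitmap: one bit per guest page. */
struct dirty_bitmap {
    uint64_t *bits;
    uint64_t npages;
};

/* Device-side write path marks the page; any consumer may read it. */
static inline void dirty_set(struct dirty_bitmap *db, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;

    if (pfn < db->npages)
        db->bits[pfn / 64] |= 1ULL << (pfn % 64);
}

/* Test-and-clear, as a migration pass or profiler would do. */
static inline int dirty_test_and_clear(struct dirty_bitmap *db, uint64_t pfn)
{
    uint64_t mask = 1ULL << (pfn % 64);
    int was = !!(db->bits[pfn / 64] & mask);

    db->bits[pfn / 64] &= ~mask;
    return was;
}
```

The point of the analogy is that nothing in this facility is migration-specific: the producer side is the same regardless of which consumer drains the bitmap.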
> > 
> > 1) Try to prove your facility can only work for one specific cases
> > 2) Try to prove your facility can work for more than one cases
> > 
> > Which one is easier and more beneficial to virtio?
> > 
> > 
> > > If you describe, may be we I can split "device migration" chapter to
> > > two pieces, Device management and device migration.
> > >
> > > Device migration will use these basic facility.
> > > Would that help you?
> > 
> > Definitely, but it needs to be done by not making it under the subsection of
> > admin commands, that's it.
> > 
> > Let me repeat once again here for the possible steps to collaboration:
> > 
> > 1) define virtqueue state, inflight descriptors in the section of basic facility but
> > not under the admin commands
> It will be part of the device context, in such a way that one can read just the vq state instead of the full device context.
> This will work.
> 
> > 2) define the dirty page tracking, device context/states in the section of basic
> > facility but not under the admin commands
> Great.
> Device context is already defined in the basic facility outside of the admin commands in [1].
> Current text is around device migration and spec evolves it can adopt more generic text without device migration.
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.html
> 
> > 3) define transport specific interfaces or admin commands to access them
> >
> As you also envisioned, it is done using admin commands to access it.
> 
> > Does this work? It seems you refused such steps in the past.
> > 
> I didn’t.
> There must be some confusion in the many emails we both exchanged, because the already posted v0 has this split, such as the device context.
> 
> For dirty page tracking I couldn’t find a solid use case without device migration, so I asked which you already replied above.
> 
> > Actually, I would like to leave 2) as it's very complicated which might not
> > converge easily.
> >
> I will split current "device migration" section to two.
> 1. device management
> 2. device migration
> 
> Device management covers device mode, device context and dirty page tracking.
> Device migration refers to device management section.
> 
> We can omit the dirty page tracking commands for a moment and first close on the device mode and device context as first step.
> Since it is already part of v0 and it is needed, I will keep it in subsequent v1 but moved to device management section.
> 
> Michael,
> Are you ok with this approach to step forward?

I actually like it that your current series is more or less complete.
I would rather you did not drop parts of functionality, for me at least
it's hard to see how things will work otherwise.

-- 
MST


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org



* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-26  5:21                                                 ` Parav Pandit
@ 2023-10-09  8:49                                                   ` Jason Wang
  2023-10-12 10:03                                                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 445+ messages in thread
From: Jason Wang @ 2023-10-09  8:49 UTC (permalink / raw)
  To: Parav Pandit; +Cc: Michael S. Tsirkin, Zhu, Lingshan, virtio-dev

On Tue, Sep 26, 2023 at 1:21 PM Parav Pandit <parav@nvidia.com> wrote:
>
>
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, September 26, 2023 10:07 AM
> >
> > On Tue, Sep 26, 2023 at 11:40 AM Parav Pandit <parav@nvidia.com> wrote:
> > >
> > >
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, September 26, 2023 8:16 AM
> > > >
> > > > On Mon, Sep 25, 2023 at 6:41 PM Parav Pandit <parav@nvidia.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Friday, September 22, 2023 8:38 AM
> > > > >
> > > > > > > Device context has no overlap.
> > > > > >
> > > > > > I can give you one example, e.g debugging.
> > > > > >
> > > > > Almost every feature needs debugging. :) So I am omitting it for
> > > > > the time being.
> > > >
> > > > Well, I don't think so. We have a lot of handy tools for that (ethtool -d?).
> > > Sure add something specific for debug and explicitly mention that it is for
> > debug like -d.
> > > Every feature and functionality needs debug, not specifically device context.
> > > So add infra for debug.
> >
> > Why do you think it's an infra? All you need to do is a simple decoupling.
> >
> It is too vague a comment for me.
> Everything you need to debug can be queried from the device if needed.
> So please add the debug infrastructure for it.
>
> It may be useful outside of debug too.
>
> Practically any query interface added can be used by the member driver for debug purposes.
>
> > > Device migration series is not the vehicle to piggy back on.
> > >
> > > > > > 1) define virtqueue state, inflight descriptors in the section
> > > > > > of basic facility but not under the admin commands
> > > > > It will be part of the device context, in such a way that one can
> > > > > read just the vq state instead of the full device context.
> > > > > This will work.
> > > >
> > > > I'm not sure what it looks like, but I think they are well decoupled in this
> > series.
> > > > E.g driver can choose to just read e.g last_avail_idx and report to
> > > > ethtool or watchdog.
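For a split virtqueue, the state being discussed is essentially the available index. A hedged sketch of what a driver-side accessor could look like, with a hypothetical register layout (the actual series defines the transport-specific access method):

```c
#include <stdint.h>

/* Hypothetical per-virtqueue state: for a split virtqueue the
 * available index is the essential state to save and restore. */
struct virtio_vq_state {
    uint16_t last_avail_idx; /* next available-ring entry the device will process */
};

/* Hypothetical transport accessor: read the state of queue `qidx` from a
 * register array while the device is SUSPENDed, so the value is stable. */
static struct virtio_vq_state vq_state_read(const volatile uint16_t *state_regs,
                                            uint16_t qidx)
{
    struct virtio_vq_state s;

    s.last_avail_idx = state_regs[qidx];
    return s;
}
```

Decoupled this way, the same accessor can serve a migration save pass or a one-off diagnostic dump equally well.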
> > > >
> > > Once it's done, it will be visible what it looks like.
> > > The key is it needs to cover BOTH use cases.
> > >
> > > > As I replied in other thread, I see several problems:
> > > >
> > > > 1) layer violation, PCI specific state were mixed into the basic
> > > > facilities
> > >
> > > After agreeing to see a merged context, now you are hinting that you
> > > don't agree to merge the two.
> >
> > It's not a merging, it's about decoupling. I'm fine if you don't do coupling and
> > layer violation.
> >
> Admin command is decoupled from admin virtqueue already.
> Device context is decoupled from admin command already.
> Dirty page tracking is decoupled from admin command already.
> Device mode is decoupled from admin command already.

Again, this is not what I read from your series.

>
>
> > > I disagree if you are leaning towards that direction.
> > > I hope my deduction from your above comment is incorrect.
> >
> > I agree to seek a way to unify but it doesn't mean everything in your current
> > proposal is correct. Basic facility part should be transport independent.
> >
> Can you please comment in my series on what is incorrect? I would like to discuss it there, similar to your ask here.
>
>
> > >
> > > There is no violation. PCI specific device context will be captured in PCI
> > specific section.
> >
> > Is that what you've done in your series now?
> >
> It will be added in the v1 of my series.
>
> > > Device type contexts will be captured in those device type sections.
> > >
> > > TLVs will cover much of the device context information.
> > >
> > > > 2) I don't see a good definition on "device context"
> > > > 3) TLV works fine for queue but not registers
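The TLV encoding of device context being debated here could be sketched as a generic record; the layout and the type values below are invented for illustration and are not from Parav's series:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header framing one piece of device context.
 * A value of `len` bytes follows the header in the buffer. */
struct ctx_tlv {
    uint16_t type;   /* e.g. 1 = vq state, 2 = config space (invented values) */
    uint16_t len;    /* length of the value in bytes */
};

/* Append one TLV record to buf; returns bytes written, or 0 on overflow. */
static size_t tlv_put(uint8_t *buf, size_t cap,
                      uint16_t type, const void *val, uint16_t len)
{
    struct ctx_tlv hdr = { type, len };
    size_t need = sizeof(hdr) + len;

    if (need > cap)
        return 0;
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), val, len);
    return need;
}
```

The "works for queues, not registers" objection is visible in the sketch: variable-length queue state packs naturally into records like this, while fixed-offset register state has to be re-framed to fit.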
> > > >
> > > Please see the definition of device context in [1].
> > >
> > > [1]
> > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00064.h
> > > tml
> > >
> > > > What needs to be done first is to describe what device context means
> > > > and what it contains. Not the actual data structure since it may vary.
> > > >
> > > Sure, it is already defined in the device migration theory of operation
> > > section in [2].
> >
> > It's too vague, for example it's not easy to infer if the following belong to device
> > context:
> >
> I am 100% confident, and challenge you, that the theory of operation explained in my series is significantly better than Lingshan's commit log saying "main use case is live migration".
>
> > 1) dirty pages
>
> Above does not. Once you read each patch it should be clear, because dirty pages are not part of the device context.

This is not what I read from your series.

>
> > 2) virtqueue addresses
> >
> It is part of the device context, as listed in the patch.
>
> > etc.
> >
> > > I will try to take it out and put it in the device management section, so
> > > that device migration can refer to it and some other basic facility can also
> > > refer to it (which would need to explain a use case beyond the silly debug point).
> >
> > It's silly to ship a product without any debugging facilities.
> >
> :)
> How come until now no one has debugged virtio last_avail_index in the device?

ethtool -d is more than just indices, no?

People can easily instrument QEMU or the kernel via a lot of mature
debugging facilities. Or are you saying those facilities can be used
across the PCI interfaces? If not, how?

> Please really stop such arguments.
>
> If something is needed for debug please proceed to add such debug interface.

Most debugging is simply the dumping of device states so I don't know
what you want to say here. We all know we can't easily add things like
dynamic debugging to a hardware.

>
> > And I've given you other examples like hibernation but you ignore them again.
> No, I didn't ignore them. In fact I asked the AMD expert to extend his proposal beyond GPU and more.
>
> > And what's more important, you ignores my question, let me ask you again
> > here:
> >
> > 1) Try to prove your facility can only work for one specific case
> > 2) Try to prove your facility can work for more than one cases
> >
> > Which one is easier and more beneficial to virtio?
> >
> #1 is easier than #2 as it only needs to solve a specific case.
> #2 may be more useful once one establishes that two cases _really_ exist.
>
> > The decoupling is just a matter of relocating the text, does it block any of your
> > proposals? Why do you choose to refuse such a simple change with such a huge
> > advantage?
>
> I repeat, I didn't refuse. I just don't see the point of writing something without a solid use case defined.
> If debug is a use case, one must write "debug" there.

Why? For example, ethtool -d just dumps registers. But from the view of the
device, those registers are not necessarily named with "debug".

Again, the debugging is just a dump of device states. We are not
inventing debugging facilities for e.g SoC.

>
> >
> > >
> > > [2]
> > > https://lists.oasis-open.org/archives/virtio-comment/202309/msg00062.h
> > > tml
> > >
> > > > >
> > > > > > 3) define transport specific interfaces or admin commands to
> > > > > > access them
> > > > > >
> > > > > As you also envisioned, it is done using admin commands to access it.
> > > >
> > > > That's fine, but it should allow other ways.
> > > >
> > > For passthrough, admin commands fit the design.
> > > Sure, you need to draft how to do it in other ways.
> >
> > So, it looks to me you never examine this series carefully.
> >
> Please stick to technical discussions.
> I examined it, and there is zero description beyond "main use case is live migration".

Virtio spec is not a suitable place to repeat the definition of live
migration since it's a well-known technology.

>
> I don't see how your proposal fits the passthrough case.

Please explain why.

>
> > >  > >
> > > > > > Does this work? It seems you refused such steps in the past.
> > > > > >
> > > > > I didn’t.
> > > > > There must be some confusion in many emails we both exchanged,
> > > > > because
> > > > already posted v0 has it split such as device context.
> > > > >
> > > > > For dirty page tracking I couldn’t find a solid use case without
> > > > > device
> > > > migration, so I asked which you already replied above.
> > > >
> > > > First, you never explain why such coupling gives us any benefit.
> > > It is not coupled. Device migration uses this facility. So it is a matter
> > > of text organization in the spec.
> > > Not the design.
> > >
> > > > Second, I've given you sufficient examples but you tend to ignore
> > > > them. Why not go through Qemu codes then you will see the answer.
> > > You gave examples of debug and profiling. You didn't explain the use case:
> > > how to actually connect, how to profile, etc.
> >
> > How much evidence do you need me to provide? Do I need to give you the
> > manpages of ethtool -d? For profiling, is it too hard to think, for example, of giving
> > non-retire hints to the IOTLB via a specific IOPTE if we find a page is dirty?
> >
> Sounds useful, but this is a very different use case than live migration.
> If I think further, this interface may be useful on the data path side, when mapping/unmapping is done.
> So it can actually use the existing interface.
> A side interface may not be useful, as its hint may arrive slightly later than the existing DMA interface hints.
>
> > Compare the evidence or proof you've provided, how much you have provided
> > so far? Or could you please prove the device context can only work for
> > migration?
> >
> Virtio spec development is not a proof-based method.

It's you that wants the reviewers to use a proof based method. For
example, I'm saying logging has more use cases than migration and then
you want me to give proofs.

I'm saying it has more use cases, and you want me to prove it. I do, and
then when I want you to prove it can't be used outside live migration,
you say you don't want to be proof-based. Self-contradictory, no?

> I won't go this route.

Then let's don't use double standard.

> Please review the patches and comment.
>
> Device context works well for the device migration use case.

> You're sending patches for comments, not just to be parsed.

> If you don’t understand specific text in the patches, please comment there to proceed.

I will, but you never gave any concrete comments on this series.

>
> > >
> > > > > > Actually, I would like to leave 2) as it's very complicated
> > > > > > which might not converge easily.
> > > > > >
> > > > > I will split current "device migration" section to two.
> > > > > 1. device management
> > > > > 2. device migration
> > > > >
> > > > > Device management covers device mode, device context and dirty
> > > > > page
> > > > tracking.
> > > >
> > > > I don't see any connection between "management" and "device context".
> > > >
> > > Do you have a better name than device management?
> > > Maybe device operation.
> > > Maybe it is just better to keep the device context the way it is in [1]
> > > under basic facility.
> >
> > I don't understand this.
> >
> Device context is defined in "basic facility" section.
> I will just keep it there.
>
> > >
> > > > > Device migration refers to device management section.
> > > > >
> > > > > We can omit the dirty page tracking commands for a moment and
> > > > > first close
> > > > on the device mode and device context as first step.
> > > > > Since it is already part of v0 and it is needed, I will keep it in
> > > > > subsequent v1 but
> > > > moved to device management section.
> > > >
> > > > Could you please answer what's wrong with the first 4 patches in this series?
> > > >
> >
> > I'd leave Ling Shan to comment on this.
> >
> > > 1. The cover letter is missing the problem statement and use case.
> > > 2. Why are both queue suspend and device suspend introduced? Only one
> > > should be there. The design description is missing.
> > > 3. Even though it is claimed under some random basic facility, the cover
> > > letter clearly states the main use case is "live migration".
> > > 4. Patch 4 is not needed at all. When the device is suspended, it is
> > > _suspended_. It does not do any bifurcation.
> > > 5. The suspend bit of patch 2 alone is not enough to cover P2P. One needs
> > > both suspend and freeze, covered in series [1].
> > > 6. Finally, the whole description of 1 to 4 needs to be split into the
> > > device operation section, so that both passthrough and mediation can
> > > utilize it using admin commands and otherwise.
> > > Since Zhu said that dirty tracking and inflight descriptors will be done, I
> > > presume he will propose to do them over an admin q or command interface.
> > > And since all can run over the admin commands, the plumbing done in 1 to 4
> > > can be made using admin commands.
> > >
> > > Until now we could not establish that creating yet another DMA interface is
> > > better than the q interface.
> > > So ...
> > > To me, both methods will start looking more converged over admin
> > > commands and queues.
> >
> > This is self-contradictory with what you've said before.
> >
> > We're in an endless circle now. The main reason is that you keep ignoring
> > comments.
> >
> You didn't comment on the series that should be reviewed.
>
> We can start back as following.
> 1. We have two use cases, first agree that both use cases exist.
> If disagree, discuss...
> 2. Agree to improve device migration framework for both
> If disagree, discuss...
> 3. How to implement such functionality; discuss whether both can use a common framework.
> a. Discuss the technical touch points that differ between the two use cases, and what different framework is needed for each case.
> b. Discuss the common points that both can leverage.
>
> > Another question you keep ignoring is: what prevents this proposal from being
> > used in the passthrough setups? I'm pretty sure if you can get the right answer,
> > you will understand the big picture.
> I answered this already.
> In passthrough mode, the passthrough device does not have access to any of the following facilities.

What's wrong if they have? I had given sufficient proofs to
demonstrate that they are useful for guests.

> These facilities must reside on the owner device.
> a. dirty page tracking
> b. incremental device context read/write
> c. device mode setting
>
> Your comment is that some of this is useful for debug and IOTLB optimization, so one can also expose this facility on the member device too.

Not only the member device but the use cases beyond migration, that's the point.

> This is fine, both can have it.
> In such a use case one will have two AQs: one on the owner device, one on the member device.

AQ on the member device for live migration requires PASID. And once we
have PASID why do we still bother with owner?

> Both operates at their own domain.
>
> For debug purposes you wanted to use the non-incremental device context.
> This is a different use case, and one should build it when it is _really_ needed.

Debugging is just piggybacking if you don't couple context with
migration, isn't it?

> Putting that under the debug umbrella to use it for the mediation use case is not the right way to proceed.

I never said debugging has any relationship with mediation. Let me
repeat once again: try not to couple dirty pages and device context with
migration. We can benefit from that decoupling.

> One should say that mediation requires a non-incremental device context and the interface may look different; this is also fine.

This is not my point.

Thanks


>
> Please don't try to set the two series against each other, and it will be really easy to move forward.





* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-27 15:40                                                           ` [virtio-comment] " Michael S. Tsirkin
@ 2023-10-09 10:01                                                             ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-09 10:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>> no direction.
>>> OK let me try to direct this discussion.
>>> You guys were speaking past each other, no dialog is happening.
>>> And as long as it goes on no progress will be made and you
>>> will keep going in circles.
>>>
>>> Parav here made an effort and attempted to summarize
>>> use-cases addressed by your proposal but not his.
>>> He couldn't resist adding "a yes but" in there oh well.
>>> But now I hope you know he knows about your use-cases?
>>>
>>> So please do the same. Do you see any advantages to Parav's
>>> proposal as compared to yours? Try to list them and
>>> if possible try not to accompany the list with "yes but"
>>> (put it in a separate mail if you must ;) ).
>>> If you won't be able to see any, let me know and I'll try to help.
>>>
>>> Once each of you and Parav have finally heard the other and
>>> the other also knows he's been heard, that's when we can
>>> try to make progress by looking for something that addresses
>>> all use-cases as opposed to endlessly repeating same arguments.
>> Sure Michael, I will not say "yes but" here.
>>
>>  From Parav's proposal, he intends to migrate a member device by its owner
>> device through the admin vq,
>> thus necessary admin vq commands are introduced in his series.
>>
>>
>> I see his proposal can:
>> 1) meet some customers requirements without nested and bare-metal
>> 2) align with Nvidia production
>> 3) easier to emulate by onboard SOC
> Is that all you can see?
>
> Hint: there's more.
please help provide more.
>
>
>
>
>
>> The general purpose of his proposal and mine are aligned: migrate virtio
>> devices.
>>
>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>
>> "
>> Let me repeat once again here for the possible steps to collaboration:
>>
>> 1) define virtqueue state, inflight descriptors in the section of
>> basic facility but not under the admin commands
>> 2) define the dirty page tracking, device context/states in the
>> section of basic facility but not under the admin commands
>> 3) define transport specific interfaces or admin commands to access them
>> "
>>
>> I totally agree with his proposal.
>>
>> Does this work for you Michael?
>>
>> Thanks
>> Zhu Lingshan
> I just doubt very much this will work.  What will "define" mean then -
> not an interface, just a description in english? I think you
> underestimate the difficulty of creating such definitions that
> are robust and precise.
I think we can review the patch to correct the words.
>
>
> Instead I suggest you define a way to submit admin commands that works
> for nested and bare-metal (i.e. not admin vq, and not with sriov group
> type). And work with Parav to make live migration admin commands work
> reasonably well through this interface and with this type.
Why are admin commands better than registers?
>






* [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-09-27 10:39                                                           ` [virtio-comment] " Parav Pandit
@ 2023-10-09 10:05                                                             ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-09 10:05 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin, Cornelia Huck
  Cc: Jason Wang, eperezma, Stefan Hajnoczi, virtio-comment, virtio-dev



On 9/27/2023 6:39 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 27, 2023 1:50 PM
>> I see his proposal can:
>> 1) meet some customers requirements without nested and bare-metal
>> 2) align with Nvidia production
> Slightly inaccurate.
> The work produced is for the virtio spec update for the users.
>
> I missed adding other contributors' Sign-offs, who also share similar use cases; I will add them in v1.
>
>> 3) easier to emulate by onboard SOC
>>
>> The general purpose of his proposal and mine are aligned: migrate virtio
>> devices.
>>
> Great.
>
>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>
>> "
>> Let me repeat once again here for the possible steps to collaboration:
>>
>> 1) define virtqueue state, inflight descriptors in the section of
>> basic facility but not under the admin commands
>> 2) define the dirty page tracking, device context/states in the
>> section of basic facility but not under the admin commands
>> 3) define transport specific interfaces or admin commands to access them
>> "
>>
>> I totally agree with his proposal.
> We started discussing some of the it.
> If I draw parallels, one should not say "detach descriptors from virtqueue" for the infrastructure that exists in the basic facilities.
> If so, one should explain the technical design reason and it would make sense.
Not sure what "detach descriptors from virtqueue" means, but the admin vq
carries commands anyway.
>
> So let's discuss it.
> I like to better understand the _real_ technical reason for detaching it.
So please To or Cc me in your series.
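For context, the virtqueue state facility this thread keeps returning to (item 1 of the quoted collaboration steps) can be sketched roughly as below. This is an illustrative sketch only: the struct layout and the accessor names are assumptions made here, not the wording of the proposed spec patches, which define the state as a basic facility with transport-specific accessors.

```c
/* Illustrative sketch: the per-virtqueue state a driver would save on the
 * migration source and restore on the destination. Names are assumed. */
#include <stdint.h>

struct virtq_state {
    uint16_t last_avail_idx; /* next available-ring entry the device will read */
    uint16_t last_used_idx;  /* next used-ring entry the device will write */
};

/* Source side: read the state out of a (suspended) device. Here the device
 * registers are simulated by a plain struct. */
static struct virtq_state vq_state_get(const struct virtq_state *dev)
{
    return *dev;
}

/* Destination side: program the saved state before setting DRIVER_OK. */
static void vq_state_set(struct virtq_state *dev, struct virtq_state s)
{
    *dev = s;
}
```

The point of defining this in the basic-facilities section is exactly that `vq_state_get`/`vq_state_set` could then be backed either by transport registers or by admin commands.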


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-comment] RE: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-09 10:05                                                             ` Zhu, Lingshan
@ 2023-10-09 10:07                                                               ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-10-09 10:07 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin, Cornelia Huck
  Cc: Jason Wang, eperezma, Stefan Hajnoczi, virtio-comment, virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Monday, October 9, 2023 3:36 PM
> 
> On 9/27/2023 6:39 PM, Parav Pandit wrote:
> >
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Wednesday, September 27, 2023 1:50 PM I see his proposal can:
> >> 1) meet some customers requirements without nested and bare-metal
> >> 2) align with Nvidia production
> > Slightly inaccurate.
> > The work produced is for the virtio spec update for the users.
> >
> > I have missed adding other contributors Sign-off who also share similar use
> cases, which I will add in v1.
> >
> >> 3) easier to emulate by onboard SOC
> >>
> >> The general purpose of his proposal and mine are aligned: migrate virtio
> 
> >> devices.
> >>
> > Great.
> >
> >> Jason has ever proposed to collaborate, please allow me quote his proposal:
> >>
> >> "
> >> Let me repeat once again here for the possible steps to collaboration:
> >>
> >> 1) define virtqueue state, inflight descriptors in the section of
> >> basic facility but not under the admin commands
> >> 2) define the dirty page tracking, device context/states in the
> >> section of basic facility but not under the admin commands
> >> 3) define transport specific interfaces or admin commands to access
> >> them "
> >>
> >> I totally agree with his proposal.
> > We started discussing some of the it.
> > If I draw parallels, one should not say "detach descriptors from virtqueue" for
> the infrastructure that exists in the basic facilities.
> > If so, one should explain the technical design reason and it would make sense.
> not sure what is  "detach descriptors from virtqueue", but admin vq carries
> commands anyway.
> >
> > So let's discuss it.
> > I like to better understand the _real_ technical reason for detaching it.
> so please to or cc me in your series.
Sure, I will do it in v2.
Jason already added you in v1.
It is already on the virtio-comment mailing list, so for now you can respond to v1.

^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-09 10:01                                                             ` [virtio-comment] " Zhu, Lingshan
@ 2023-10-11 10:20                                                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-11 10:20 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
> > On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> > > > > We don't want to repeat the discussions, it looks like endless circle with
> > > > > no direction.
> > > > OK let me try to direct this discussion.
> > > > You guys were speaking past each other, no dialog is happening.
> > > > And as long as it goes on no progress will be made and you
> > > > will keep going in circles.
> > > > 
> > > > Parav here made an effort and attempted to summarize
> > > > use-cases addressed by your proposal but not his.
> > > > He couldn't resist adding "a yes but" in there oh well.
> > > > But now I hope you know he knows about your use-cases?
> > > > 
> > > > So please do the same. Do you see any advantages to Parav's
> > > > proposal as compared to yours? Try to list them and
> > > > if possible try not to accompany the list with "yes but"
> > > > (put it in a separate mail if you must ;) ).
> > > > If you won't be able to see any, let me know and I'll try to help.
> > > > 
> > > > Once each of you and Parav have finally heard the other and
> > > > the other also knows he's been heard, that's when we can
> > > > try to make progress by looking for something that addresses
> > > > all use-cases as opposed to endlessly repeating same arguments.
> > > Sure Michael, I will not say "yes but" here.
> > > 
> > >  From Parav's proposal, he intends to migrate a member device by its owner
> > > device through the admin vq,
> > > thus necessary admin vq commands are introduced in his series.
> > > 
> > > 
> > > I see his proposal can:
> > > 1) meet some customers requirements without nested and bare-metal
> > > 2) align with Nvidia production
> > > 3) easier to emulate by onboard SOC
> > Is that all you can see?
> > 
> > Hint: there's more.
> please help provide more.

Just a small subset off the top of my head:
Error handling.
Extendable to other group types such as SIOV.
Batching of commands.
Fewer PCI transactions.
Support for keeping some data off-device.

Which does not mean it's better unconditionally.
Are the above points clear?

As long as you guys keep not hearing each other we will keep
seeing these flame wars. If you expect everyone on virtio-comment
to follow a 300-message thread you are IMO very much mistaken.

> > 
> > 
> > 
> > 
> > 
> > > The general purpose of his proposal and mine are aligned: migrate virtio
> > > devices.
> > > 
> > > Jason has ever proposed to collaborate, please allow me quote his proposal:
> > > 
> > > "
> > > Let me repeat once again here for the possible steps to collaboration:
> > > 
> > > 1) define virtqueue state, inflight descriptors in the section of
> > > basic facility but not under the admin commands
> > > 2) define the dirty page tracking, device context/states in the
> > > section of basic facility but not under the admin commands
> > > 3) define transport specific interfaces or admin commands to access them
> > > "
> > > 
> > > I totally agree with his proposal.
> > > 
> > > Does this work for you Michael?
> > > 
> > > Thanks
> > > Zhu Lingshan
> > I just doubt very much this will work.  What will "define" mean then -
> > not an interface, just a description in english? I think you
> > underestimate the difficulty of creating such definitions that
> > are robust and precise.
> I think we can review the patch to correct the words.
> > 
> > 
> > Instead I suggest you define a way to submit admin commands that works
> > for nested and bare-metal (i.e. not admin vq, and not with sriov group
> > type). And work with Parav to make live migration admin commands work
> > reasonably will through this interface and with this type.
> why admin commands are better than registers?
> > 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-11 10:20                                                               ` Michael S. Tsirkin
@ 2023-10-11 10:38                                                                 ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-11 10:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>>>> no direction.
>>>>> OK let me try to direct this discussion.
>>>>> You guys were speaking past each other, no dialog is happening.
>>>>> And as long as it goes on no progress will be made and you
>>>>> will keep going in circles.
>>>>>
>>>>> Parav here made an effort and attempted to summarize
>>>>> use-cases addressed by your proposal but not his.
>>>>> He couldn't resist adding "a yes but" in there oh well.
>>>>> But now I hope you know he knows about your use-cases?
>>>>>
>>>>> So please do the same. Do you see any advantages to Parav's
>>>>> proposal as compared to yours? Try to list them and
>>>>> if possible try not to accompany the list with "yes but"
>>>>> (put it in a separate mail if you must ;) ).
>>>>> If you won't be able to see any, let me know and I'll try to help.
>>>>>
>>>>> Once each of you and Parav have finally heard the other and
>>>>> the other also knows he's been heard, that's when we can
>>>>> try to make progress by looking for something that addresses
>>>>> all use-cases as opposed to endlessly repeating same arguments.
>>>> Sure Michael, I will not say "yes but" here.
>>>>
>>>>   From Parav's proposal, he intends to migrate a member device by its owner
>>>> device through the admin vq,
>>>> thus necessary admin vq commands are introduced in his series.
>>>>
>>>>
>>>> I see his proposal can:
>>>> 1) meet some customers requirements without nested and bare-metal
>>>> 2) align with Nvidia production
>>>> 3) easier to emulate by onboard SOC
>>> Is that all you can see?
>>>
>>> Hint: there's more.
>> please help provide more.
> Just a small subset off the top of my head:
> Error handling.
Handle failed live migration? How?

For other errors, virtio has had mature error-handling solutions
for years, like re-reads and NEEDS_RESET.

If that is not good enough, then the corollary is that
the admin vq is better than the config space,
and the further corollary could be that
we should refactor the virtio-pci interfaces into admin vq commands,
like how we handle features.

Is that true?
> Extendable to other group types such as SIOV.
For SIOV, the admin vq is a transport, but for SR-IOV
the admin vq is a control channel; that is different,
and the admin vq can be a side channel.

For example, for SIOV we configure and migrate MSI-X through
the admin vq. For SR-IOV, they are in the config space.
> Batching of commands
> less pci transactioons
So this can still be a QoS issue.
If commands are batched, do other members starve?
> Support for keeping some data off-device
I don't get it; what is off-device?
The live migration facilities need to fetch data from the device anyway.
>
> which does not mean it's better unconditionally.
> are above points clear?
The thing is, what blocks the config space solution?
Why is the admin vq a must for live migration?
What is wrong with the config space solution?
Shall we refactor everything in virtio-pci to use the admin vq?
>
> as long as you guys keep not hearing each other we will keep
> seeing these flame wars. if you expect everyone on virtio-comment
> to follow a 300 message thread you are imo very much mistaken.
I am sure I have not ignored any questions.
I am saying the admin vq is problematic for live migration;
at least it does not work for nested, so why is the admin vq a must
for live migration?
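The register-based alternative being argued for here is driven through the SUSPEND bit in the device status. A rough driver-side sketch follows; the bit position, the status helpers, and the instantly-acknowledging toy device model are all assumptions for illustration, not the proposed spec text.

```c
/* Illustrative sketch of a register-based suspend flow: the driver sets
 * SUSPEND in the device status and waits until the device presents it,
 * meaning device and virtqueue state are stabilized. Bit position and
 * helper names are hypothetical. */
#include <stdint.h>

#define VIRTIO_STATUS_DRIVER_OK (1u << 2)
#define VIRTIO_STATUS_SUSPEND   (1u << 4) /* assumed bit position */

static uint8_t device_status; /* stands in for the device status register */

/* Toy device model: the device stabilizes immediately on suspend. */
static void write_status(uint8_t s) { device_status = s; }
static uint8_t read_status(void)   { return device_status; }

/* Driver side: request suspend, then poll for the device to present it. */
static int suspend_device(void)
{
    write_status(read_status() | VIRTIO_STATUS_SUSPEND);
    while (!(read_status() & VIRTIO_STATUS_SUSPEND))
        ; /* a real driver would bound this wait and handle timeout */
    return 0;
}
```

Only after this returns would the driver read out per-virtqueue state for migration; nothing here requires an owner device or an admin vq, which is the crux of the nested/bare-metal argument.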
>
>>>
>>>
>>>
>>>
>>>> The general purpose of his proposal and mine are aligned: migrate virtio
>>>> devices.
>>>>
>>>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>>>
>>>> "
>>>> Let me repeat once again here for the possible steps to collaboration:
>>>>
>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>> basic facility but not under the admin commands
>>>> 2) define the dirty page tracking, device context/states in the
>>>> section of basic facility but not under the admin commands
>>>> 3) define transport specific interfaces or admin commands to access them
>>>> "
>>>>
>>>> I totally agree with his proposal.
>>>>
>>>> Does this work for you Michael?
>>>>
>>>> Thanks
>>>> Zhu Lingshan
>>> I just doubt very much this will work.  What will "define" mean then -
>>> not an interface, just a description in english? I think you
>>> underestimate the difficulty of creating such definitions that
>>> are robust and precise.
>> I think we can review the patch to correct the words.
>>>
>>> Instead I suggest you define a way to submit admin commands that works
>>> for nested and bare-metal (i.e. not admin vq, and not with sriov group
>>> type). And work with Parav to make live migration admin commands work
>>> reasonably will through this interface and with this type.
>> why admin commands are better than registers?
>>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-10-11 10:38                                                                 ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-11 10:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
> On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
>>
>> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>>>> no direction.
>>>>> OK let me try to direct this discussion.
>>>>> You guys were speaking past each other, no dialog is happening.
>>>>> And as long as it goes on no progress will be made and you
>>>>> will keep going in circles.
>>>>>
>>>>> Parav here made an effort and attempted to summarize
>>>>> use-cases addressed by your proposal but not his.
>>>>> He couldn't resist adding "a yes but" in there oh well.
>>>>> But now I hope you know he knows about your use-cases?
>>>>>
>>>>> So please do the same. Do you see any advantages to Parav's
>>>>> proposal as compared to yours? Try to list them and
>>>>> if possible try not to accompany the list with "yes but"
>>>>> (put it in a separate mail if you must ;) ).
>>>>> If you won't be able to see any, let me know and I'll try to help.
>>>>>
>>>>> Once each of you and Parav have finally heard the other and
>>>>> the other also knows he's been heard, that's when we can
>>>>> try to make progress by looking for something that addresses
>>>>> all use-cases as opposed to endlessly repeating same arguments.
>>>> Sure Michael, I will not say "yes but" here.
>>>>
>>>>   From Parav's proposal, he intends to migrate a member device by its owner
>>>> device through the admin vq,
>>>> thus necessary admin vq commands are introduced in his series.
>>>>
>>>>
>>>> I see his proposal can:
>>>> 1) meet some customers requirements without nested and bare-metal
>>>> 2) align with Nvidia production
>>>> 3) easier to emulate by onboard SOC
>>> Is that all you can see?
>>>
>>> Hint: there's more.
>> please help provide more.
> Just a small subset off the top of my head:
> Error handling.
handle failed live migration? how?

and for other errors, we have mature error handling solutions
in virtio for years, like re-read, NEEDS_RESET.

If that is not good enough, then the corollary is:
admin vq is better than config space,
then the further corollary could be:
we should refactor virito-pci interfaces to admin vq commands,
like how we handle features

Is that true?
> Extendable to other group types such as SIOV.
For SIOV, the admin vq is a transport, but for SR-IOV
the admin vq is a control channel, that is different,
and admin vq can be a side channel.

For example, for SIOV, we config and migrate MSIX through
admin vq. For SRIOV, they are in config space.
> Batching of commands
> less pci transactioons
so this can still be a QOS issue.
If batching, others to starve?
> Support for keeping some data off-device
I don't get it, what is off-device?
The live migration facilities need to fetch data from the device anyway
>
> which does not mean it's better unconditionally.
> are above points clear?
The thing is, what blocks the config space solution?
Why admin vq is a must for live migration?
What's wrong in config space solution?
Shall we refactor everything in virtio-pci to use admin vq?
>
> as long as you guys keep not hearing each other we will keep
> seeing these flame wars. if you expect everyone on virtio-comment
> to follow a 300 message thread you are imo very much mistaken.
I am sure I have not ignored any questions.
I am saying admin vq is problematic for live migration,
at least it doesn't work for nested, so why admin vq is a must for live 
migration?
>
>>>
>>>
>>>
>>>
>>>> The general purpose of his proposal and mine are aligned: migrate virtio
>>>> devices.
>>>>
>>>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>>>
>>>> "
>>>> Let me repeat once again here for the possible steps to collaboration:
>>>>
>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>> basic facility but not under the admin commands
>>>> 2) define the dirty page tracking, device context/states in the
>>>> section of basic facility but not under the admin commands
>>>> 3) define transport specific interfaces or admin commands to access them
>>>> "
>>>>
>>>> I totally agree with his proposal.
>>>>
>>>> Does this work for you Michael?
>>>>
>>>> Thanks
>>>> Zhu Lingshan
>>> I just doubt very much this will work.  What will "define" mean then -
>>> not an interface, just a description in english? I think you
>>> underestimate the difficulty of creating such definitions that
>>> are robust and precise.
>> I think we can review the patch to correct the words.
>>>
>>> Instead I suggest you define a way to submit admin commands that works
>>> for nested and bare-metal (i.e. not admin vq, and not with sriov group
>>> type). And work with Parav to make live migration admin commands work
>>> reasonably will through this interface and with this type.
>> why are admin commands better than registers?


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] RE: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-11 10:38                                                                 ` Zhu, Lingshan
@ 2023-10-11 11:52                                                                   ` Parav Pandit
  -1 siblings, 0 replies; 445+ messages in thread
From: Parav Pandit @ 2023-10-11 11:52 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: Cornelia Huck, Jason Wang, eperezma, Stefan Hajnoczi,
	virtio-comment, virtio-dev


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, October 11, 2023 4:09 PM

> I am sure I have not ignored any questions.
What about the one below?

https://lore.kernel.org/virtio-dev/20230921011221-mutt-send-email-mst@kernel.org/


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-11 10:38                                                                 ` Zhu, Lingshan
@ 2023-10-12  9:59                                                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12  9:59 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
> > On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
> > > > > On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> > > > > > > We don't want to repeat the discussions, it looks like endless circle with
> > > > > > > no direction.
> > > > > > OK let me try to direct this discussion.
> > > > > > You guys were speaking past each other, no dialog is happening.
> > > > > > And as long as it goes on no progress will be made and you
> > > > > > will keep going in circles.
> > > > > > 
> > > > > > Parav here made an effort and attempted to summarize
> > > > > > use-cases addressed by your proposal but not his.
> > > > > > He couldn't resist adding "a yes but" in there oh well.
> > > > > > But now I hope you know he knows about your use-cases?
> > > > > > 
> > > > > > So please do the same. Do you see any advantages to Parav's
> > > > > > proposal as compared to yours? Try to list them and
> > > > > > if possible try not to accompany the list with "yes but"
> > > > > > (put it in a separate mail if you must ;) ).
> > > > > > If you won't be able to see any, let me know and I'll try to help.
> > > > > > 
> > > > > > Once each of you and Parav have finally heard the other and
> > > > > > the other also knows he's been heard, that's when we can
> > > > > > try to make progress by looking for something that addresses
> > > > > > all use-cases as opposed to endlessly repeating same arguments.
> > > > > Sure Michael, I will not say "yes but" here.
> > > > > 
> > > > >   From Parav's proposal, he intends to migrate a member device by its owner
> > > > > device through the admin vq,
> > > > > thus necessary admin vq commands are introduced in his series.
> > > > > 
> > > > > 
> > > > > I see his proposal can:
> > > > > 1) meet some customers' requirements without nested and bare-metal
> > > > > 2) align with Nvidia production
> > > > > 3) easier to emulate by onboard SOC
> > > > Is that all you can see?
> > > > 
> > > > Hint: there's more.
> > > please help provide more.
> > Just a small subset off the top of my head:
> > Error handling.
> handle failed live migration? how?

For example, you can try restarting the VM on the source.
Or at least report an error to the hypervisor.


> and for other errors, we have mature error handling solutions
> in virtio for years, like re-read, NEEDS_RESET.

facepalm

Are you aware of the fact that Linux still doesn't support
it since it turned out to be an extremely awkward interface
to use?

> If that is not good enough, then the corollary is:
> admin vq is better than config space,


You keep confusing admin vq with admin commands.


> then the further corollary could be:
> > we should refactor virtio-pci interfaces to admin vq commands,
> like how we handle features
> 
> Is that true?
> > Extendable to other group types such as SIOV.
> For SIOV, the admin vq is a transport, but for SR-IOV
> the admin vq is a control channel; that is different,
> and the admin vq can be a side channel.
> 
> For example, for SIOV, we configure and migrate MSI-X through
> the admin vq. For SR-IOV, they are in config space.

And that's a mess. FYI we already got feedback from Linux devs
who are wondering why we can't come up with a consistent
interface that does everything.


> > Batching of commands
> > fewer pci transactions
> so this can still be a QOS issue.
> If batching, do others starve?

And if you block the CPU because you are not accepting
a posted write, is this better?

> > Support for keeping some data off-device
> I don't get it, what is off-device?
> The live migration facilities need to fetch data from the device anyway

Heh this is what was driving nvidia to use DMA so heavily all this time.
no - if data is not in registers, the device can fetch the data from
across the pci express link, presumably with a local cache.


> > 
> > which does not mean it's better unconditionally.
> > Are the above points clear?
> The thing is, what blocks the config space solution?
> Why is the admin vq a must for live migration?
> What's wrong with the config space solution?

When you say what's wrong, do you mean you still see no
advantages to doing DMA at all? Config space is just better
with no drawbacks?

> Shall we refactor everything in virtio-pci to use the admin vq?

> > 
> > as long as you guys keep not hearing each other we will keep
> > seeing these flame wars. if you expect everyone on virtio-comment
> > to follow a 300 message thread you are imo very much mistaken.
> I am sure I have not ignored any questions.
> I am saying the admin vq is problematic for live migration;
> at least it doesn't work for nested setups, so why is the admin vq a must
> for live migration?


My suggestion for you was to add admin command support to
VF memory, as an alternative to admin vq. It looks like that
will address the nested virt usecase.

> > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > The general purpose of his proposal and mine are aligned: migrate virtio
> > > > > devices.
> > > > > 
> > > > > Jason has ever proposed to collaborate, please allow me quote his proposal:
> > > > > 
> > > > > "
> > > > > Let me repeat once again here for the possible steps to collaboration:
> > > > > 
> > > > > 1) define virtqueue state, inflight descriptors in the section of
> > > > > basic facility but not under the admin commands
> > > > > 2) define the dirty page tracking, device context/states in the
> > > > > section of basic facility but not under the admin commands
> > > > > 3) define transport specific interfaces or admin commands to access them
> > > > > "
> > > > > 
> > > > > I totally agree with his proposal.
> > > > > 
> > > > > Does this work for you Michael?
> > > > > 
> > > > > Thanks
> > > > > Zhu Lingshan
> > > > I just doubt very much this will work.  What will "define" mean then -
> > > > not an interface, just a description in english? I think you
> > > > underestimate the difficulty of creating such definitions that
> > > > are robust and precise.
> > > I think we can review the patch to correct the words.
> > > > 
> > > > Instead I suggest you define a way to submit admin commands that works
> > > > for nested and bare-metal (i.e. not admin vq, and not with sriov group
> > > > type). And work with Parav to make live migration admin commands work
> > > > reasonably well through this interface and with this type.
> > > why are admin commands better than registers?


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-09  8:49                                                   ` Jason Wang
@ 2023-10-12 10:03                                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 10:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: Parav Pandit, Zhu, Lingshan, virtio-dev

On Mon, Oct 09, 2023 at 04:49:33PM +0800, Jason Wang wrote:
> > If something is needed for debug please proceed to add such debug interface.
> 
> Most debugging is simply the dumping of device states so I don't know
> what you want to say here.

Dunno about Parav, but I want to say that I'd like to see driver
support for the debugging interface before we add it to
spec. General "may be useful" doesn't cut it - too easy to
justify anything by "debugging".




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-12  9:59                                                                   ` Michael S. Tsirkin
@ 2023-10-12 10:49                                                                     ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/12/2023 5:59 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
>>
>> On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
>>> On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
>>>> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>>>>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>>>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>>>>>> no direction.
>>>>>>> OK let me try to direct this discussion.
>>>>>>> You guys were speaking past each other, no dialog is happening.
>>>>>>> And as long as it goes on no progress will be made and you
>>>>>>> will keep going in circles.
>>>>>>>
>>>>>>> Parav here made an effort and attempted to summarize
>>>>>>> use-cases addressed by your proposal but not his.
>>>>>>> He couldn't resist adding "a yes but" in there oh well.
>>>>>>> But now I hope you know he knows about your use-cases?
>>>>>>>
>>>>>>> So please do the same. Do you see any advantages to Parav's
>>>>>>> proposal as compared to yours? Try to list them and
>>>>>>> if possible try not to accompany the list with "yes but"
>>>>>>> (put it in a separate mail if you must ;) ).
>>>>>>> If you won't be able to see any, let me know and I'll try to help.
>>>>>>>
>>>>>>> Once each of you and Parav have finally heard the other and
>>>>>>> the other also knows he's been heard, that's when we can
>>>>>>> try to make progress by looking for something that addresses
>>>>>>> all use-cases as opposed to endlessly repeating same arguments.
>>>>>> Sure Michael, I will not say "yes but" here.
>>>>>>
>>>>>>    From Parav's proposal, he intends to migrate a member device by its owner
>>>>>> device through the admin vq,
>>>>>> thus necessary admin vq commands are introduced in his series.
>>>>>>
>>>>>>
>>>>>> I see his proposal can:
>>>>>> 1) meet some customers requirements without nested and bare-metal
>>>>>> 2) align with Nvidia production
>>>>>> 3) easier to emulate by onboard SOC
>>>>> Is that all you can see?
>>>>>
>>>>> Hint: there's more.
>>>> please help provide more.
>>> Just a small subset off the top of my head:
>>> Error handling.
>> handle failed live migration? how?
> For example you can try restarting VM on source.
> Or at least report an error to hypervisor.
I am not sure resetting the VM after a failed live migration is
a good idea; should we resume the VM instead, then try another
convergence algorithm?

And I think the current live migration solution already implements error
detection, like seeing a timeout?
>
>
>> and for other errors, we have mature error handling solutions
>> in virtio for years, like re-read, NEEDS_RESET.
> facepalm
>
> Are you aware of the fact that Linux still doesn't support
> it since it turned out to be an extremely awkward interface
> to use?
I think we have implemented this in the virtio driver,
like re-reading the status to check FEATURES.
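The re-read pattern mentioned here is presumably the FEATURES_OK handshake: the driver sets the bit, then reads the status back to see whether the device kept it. A rough sketch against a toy device model (the struct and its accessors are stand-ins, not a real transport):

```c
#include <stdint.h>

#define VIRTIO_CONFIG_S_FEATURES_OK 8 /* status bit value from the virtio spec */

/* Toy device model: a status byte plus whether the device can accept
 * the negotiated feature set. Stand-in for a real transport. */
struct vdev {
    uint8_t status;
    int features_acceptable;
};

static void vdev_write_status(struct vdev *d, uint8_t s)
{
    /* A device that rejects the negotiated features clears FEATURES_OK. */
    if ((s & VIRTIO_CONFIG_S_FEATURES_OK) && !d->features_acceptable)
        s &= ~VIRTIO_CONFIG_S_FEATURES_OK;
    d->status = s;
}

/* Driver side: set FEATURES_OK, then re-read to verify the device kept it. */
static int negotiate_features_ok(struct vdev *d)
{
    vdev_write_status(d, d->status | VIRTIO_CONFIG_S_FEATURES_OK);
    return (d->status & VIRTIO_CONFIG_S_FEATURES_OK) ? 0 : -1;
}
```

A real driver would reset the device and fail probe when the re-read shows the bit cleared.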
>
>> If that is not good enough, then the corollary is:
>> admin vq is better than config space,
>
> You keep confusing admin vq with admin commands.
OK, so are admin commands better than registers?
>
>
>> then the further corollary could be:
>> we should refactor virtio-pci interfaces to admin vq commands,
>> like how we handle features
>>
>> Is that true?
>>> Extendable to other group types such as SIOV.
>> For SIOV, the admin vq is a transport, but for SR-IOV
>> the admin vq is a control channel, that is different,
>> and admin vq can be a side channel.
>>
>> For example, for SIOV, we config and migrate MSIX through
>> admin vq. For SRIOV, they are in config space.
> And that's a mess. FYI we already got feedback from Linux devs
> who are wondering why we can't come up with a consistent
> interface that does everything.
I believe config space is a consistent interface for PCI.
For SIOV, we need a new transport layer anyway.
>
>
>>> Batching of commands
>>> fewer pci transactions
>> so this can still be a QOS issue.
>> If batching, others to starve?
> And if you block CPU since you are not accepting
> a posted write this is better?
I don't get it, block guest CPU?
>
>>> Support for keeping some data off-device
>> I don't get it, what is off-device?
>> The live migration facilities need to fetch data from the device anyway
> Heh this is what was driving nvidia to use DMA so heavily all this time.
> no - if data is not in registers, device can fetch the data from
> across pci express link, presumably with a local cache.
For PCI-based configuration, like MSI, we need to fetch it from config
space anyway.
For others, like dirty pages, we can store the bitmap in host memory and use
PASID for isolation.
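A host-memory dirty bitmap of the kind described here would be the usual pfn-indexed structure that the device (or a PASID-isolated DMA context) sets and the hypervisor drains. A minimal sketch assuming 4 KiB pages (names and layout are illustrative, not taken from the series):

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Mark the page containing guest-physical address gpa as dirty. */
static inline void dirty_bitmap_set(unsigned long *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    bitmap[pfn / BITS_PER_LONG] |= 1UL << (pfn % BITS_PER_LONG);
}

/* Test and clear one page's dirty bit; returns 1 if it was dirty. */
static inline int dirty_bitmap_test_and_clear(unsigned long *bitmap, uint64_t gpa)
{
    uint64_t pfn = gpa >> PAGE_SHIFT;
    unsigned long mask = 1UL << (pfn % BITS_PER_LONG);
    int was_dirty = (bitmap[pfn / BITS_PER_LONG] & mask) != 0;
    bitmap[pfn / BITS_PER_LONG] &= ~mask;
    return was_dirty;
}
```

The hypervisor would walk and clear the bitmap on each migration iteration, resending pages whose bits were set.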
>
>
>>> which does not mean it's better unconditionally.
>>> are above points clear?
>> The thing is, what blocks the config space solution?
>> Why admin vq is a must for live migration?
>> What's wrong in config space solution?
> When you say what's wrong, do you mean you still see no
> advantages to doing DMA at all? Config space is just better
> with no drawbacks?
Still, if the admin vq or admin commands are better than config space,
we should refactor the whole set of virtio-pci interfaces onto the admin vq.

And Jason once proposed to build admin vq live migration on our basic
facilities, but I see this has been rejected.
>
>> Shall we refactor everything in virtio-pci to use admin vq?
>>> as long as you guys keep not hearing each other we will keep
>>> seeing these flame wars. if you expect everyone on virtio-comment
>>> to follow a 300 message thread you are imo very much mistaken.
>> I am sure I have not ignored any questions.
>> I am saying admin vq is problematic for live migration,
>> at least it doesn't work for nested, so why admin vq is a must for live
>> migration?
>
> My suggestion for you was to add admin command support to
> VF memory, as an alternative to admin vq. It looks like that
> will address the nested virt usecase.
If you mean carrying bulk data such as dirty page information,
we implemented a facility in host memory that is isolated by PASID.

I should send a new series soon, so we can work on the patch.

Thanks for your suggestions and efforts anyway.
>
>>>>>
>>>>>
>>>>>
>>>>>> The general purpose of his proposal and mine are aligned: migrate virtio
>>>>>> devices.
>>>>>>
>>>>>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>>>>>
>>>>>> "
>>>>>> Let me repeat once again here for the possible steps to collaboration:
>>>>>>
>>>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>>>> basic facility but not under the admin commands
>>>>>> 2) define the dirty page tracking, device context/states in the
>>>>>> section of basic facility but not under the admin commands
>>>>>> 3) define transport specific interfaces or admin commands to access them
>>>>>> "
>>>>>>
>>>>>> I totally agree with his proposal.
>>>>>>
>>>>>> Does this work for you Michael?
>>>>>>
>>>>>> Thanks
>>>>>> Zhu Lingshan
>>>>> I just doubt very much this will work.  What will "define" mean then -
>>>>> not an interface, just a description in english? I think you
>>>>> underestimate the difficulty of creating such definitions that
>>>>> are robust and precise.
>>>> I think we can review the patch to correct the words.
>>>>> Instead I suggest you define a way to submit admin commands that works
>>>>> for nested and bare-metal (i.e. not admin vq, and not with sriov group
>>>>> type). And work with Parav to make live migration admin commands work
>>>>> reasonably well through this interface and with this type.
>>>> why admin commands are better than registers?




^ permalink raw reply	[flat|nested] 445+ messages in thread

* Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-10-12 10:49                                                                     ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/12/2023 5:59 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
>>
>> On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
>>> On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
>>>> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>>>>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>>>>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>>>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>>>>>> no direction.
>>>>>>> OK let me try to direct this discussion.
>>>>>>> You guys were speaking past each other, no dialog is happening.
>>>>>>> And as long as it goes on no progress will be made and you
>>>>>>> will keep going in circles.
>>>>>>>
>>>>>>> Parav here made an effort and attempted to summarize
>>>>>>> use-cases addressed by your proposal but not his.
>>>>>>> He couldn't resist adding "a yes but" in there oh well.
>>>>>>> But now I hope you know he knows about your use-cases?
>>>>>>>
>>>>>>> So please do the same. Do you see any advantages to Parav's
>>>>>>> proposal as compared to yours? Try to list them and
>>>>>>> if possible try not to accompany the list with "yes but"
>>>>>>> (put it in a separate mail if you must ;) ).
>>>>>>> If you won't be able to see any, let me know and I'll try to help.
>>>>>>>
>>>>>>> Once each of you and Parav have finally heard the other and
>>>>>>> the other also knows he's been heard, that's when we can
>>>>>>> try to make progress by looking for something that addresses
>>>>>>> all use-cases as opposed to endlessly repeating same arguments.
>>>>>> Sure Michael, I will not say "yes but" here.
>>>>>>
>>>>>>    From Parav's proposal, he intends to migrate a member device by its owner
>>>>>> device through the admin vq,
>>>>>> thus necessary admin vq commands are introduced in his series.
>>>>>>
>>>>>>
>>>>>> I see his proposal can:
>>>>>> 1) meet some customers requirements without nested and bare-metal
>>>>>> 2) align with Nvidia production
>>>>>> 3) easier to emulate by onboard SOC
>>>>> Is that all you can see?
>>>>>
>>>>> Hint: there's more.
>>>> please help provide more.
>>> Just a small subset off the top of my head:
>>> Error handling.
>> handle failed live migration? how?
> For example you can try restarting VM on source.
> Or at least report an error to hypervisor.
I am not sure resetting a VM due to a failed live migration is
a good idea, should we resume the VM instead? Then try another
convergence algorithm?

And I think current live migration solutions already implement error
detection, e.g., by seeing a timeout?
>
>
>> and for other errors, we have mature error handling solutions
>> in virtio for years, like re-read, NEEDS_RESET.
> facepalm
>
> Are you aware of the fact that Linux still doesn't support
> it since it turned out to be an extremely awkward interface
> to use?
I think we have implemented this in the virtio driver,
e.g., re-reading the status to check FEATURES_OK.
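For reference, that re-read pattern looks roughly like this (a minimal sketch, not spec text; DeviceModel and all helper names here stand in for a real transport-level status register):

```python
# Toy model of the FEATURES_OK re-read check: the driver sets the bit,
# then re-reads the status to see whether the device accepted it.
FEATURES_OK = 0x08  # bit 3 of the device status field


class DeviceModel:
    """Stand-in device: clears FEATURES_OK if it rejects the features."""

    def __init__(self, accepts_features=True):
        self.status = 0
        self.accepts_features = accepts_features

    def write_status(self, value):
        if value & FEATURES_OK and not self.accepts_features:
            value &= ~FEATURES_OK  # device refuses the negotiated features
        self.status = value


def negotiate(dev):
    """Driver side: write FEATURES_OK, then re-read to verify."""
    dev.write_status(dev.status | FEATURES_OK)
    return bool(dev.status & FEATURES_OK)


assert negotiate(DeviceModel(accepts_features=True)) is True
assert negotiate(DeviceModel(accepts_features=False)) is False
```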
>
>> If that is not good enough, then the corollary is:
>> admin vq is better than config space,
>
> You keep confusing admin vq with admin commands.
OK, so are admin commands better than registers?
>
>
>> then the further corollary could be:
>> we should refactor virtio-pci interfaces to admin vq commands,
>> like how we handle features
>>
>> Is that true?
>>> Extendable to other group types such as SIOV.
>> For SIOV, the admin vq is a transport, but for SR-IOV
>> the admin vq is a control channel, that is different,
>> and admin vq can be a side channel.
>>
>> For example, for SIOV, we config and migrate MSIX through
>> admin vq. For SRIOV, they are in config space.
> And that's a mess. FYI we already got feedback from Linux devs
> who are wondering why we can't come up with a consistent
> interface that does everything.
I believe config space is a consistent interface for PCI.
For SIOV, we need a new transport layer anyway.
>
>
>>> Batching of commands
>>> fewer pci transactions
>> so this can still be a QoS issue.
>> If batching, will others starve?
> And if you block CPU since you are not accepting
> a posted write this is better?
I don't get it, block guest CPU?
>
>>> Support for keeping some data off-device
>> I don't get it, what is off-device?
>> The live migration facilities need to fetch data from the device anyway
> Heh this is what was driving nvidia to use DMA so heavily all this time.
> no - if data is not in registers, device can fetch the data from
> across pci express link, presumably with a local cache.
For PCI-based configuration, like MSI, we need to fetch from config
space anyway.
For other data, like dirty pages, we can store the bitmap in host memory
and use PASID for isolation.
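To sketch what such a host-memory dirty bitmap could look like (one bit per guest page; the 4 KiB page size and all names here are assumptions of the illustration, not part of the proposal):

```python
# Toy dirty-page bitmap: the tracking side sets one bit per dirtied
# guest page; the hypervisor drains (reads and clears) it each round.
PAGE_SHIFT = 12  # assume 4 KiB pages


class DirtyBitmap:
    def __init__(self, mem_bytes):
        npages = mem_bytes >> PAGE_SHIFT
        self.bits = bytearray((npages + 7) // 8)

    def mark(self, gpa):
        """Tracking side: record a write to guest physical address gpa."""
        pfn = gpa >> PAGE_SHIFT
        self.bits[pfn >> 3] |= 1 << (pfn & 7)

    def drain(self):
        """Hypervisor side: return dirty page frame numbers and clear."""
        dirty = [i * 8 + b
                 for i, byte in enumerate(self.bits) if byte
                 for b in range(8) if byte >> b & 1]
        self.bits = bytearray(len(self.bits))
        return dirty


bm = DirtyBitmap(1 << 20)   # 1 MiB of guest memory -> 256 pages
bm.mark(0x0000)
bm.mark(0x5000)
assert bm.drain() == [0, 5]
assert bm.drain() == []     # bitmap is clean after a drain
```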
>
>
>>> which does not mean it's better unconditionally.
>>> are above points clear?
>> The thing is, what blocks the config space solution?
>> Why is the admin vq a must for live migration?
>> What's wrong in config space solution?
> When you say what's wrong do you mean you still see no
> advantages to doing DMA at all? config space is just better
> with no drawbacks?
still, if the admin vq or admin commands are better than config space,
we should refactor the whole virtio-pci interface to use admin vq.

And Jason has previously proposed to build admin vq LM on our basic
facilities, but I see this has been rejected.
>
>> Shall we refactor everything in virtio-pci to use admin vq?
>>> as long as you guys keep not hearing each other we will keep
>>> seeing these flame wars. if you expect everyone on virtio-comment
>>> to follow a 300 message thread you are imo very much mistaken.
>> I am sure I have not ignored any questions.
>> I am saying the admin vq is problematic for live migration;
>> at least it doesn't work for nested, so why is the admin vq a must for live
>> migration?
>
> My suggestion for you was to add admin command support to
> VF memory, as an alternative to admin vq. It looks like that
> will address the nested virt usecase.
If you mean carrying bulk data such as dirty page information,
we have implemented a facility in host memory which is isolated by PASID.

I should send a new series soon, so we can work on the patch.

Thanks for your suggestions and efforts anyway.
>
>>>>>
>>>>>
>>>>>
>>>>>> The general purpose of his proposal and mine are aligned: migrate virtio
>>>>>> devices.
>>>>>>
>>>>>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>>>>>
>>>>>> "
>>>>>> Let me repeat once again here for the possible steps to collaboration:
>>>>>>
>>>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>>>> basic facility but not under the admin commands
>>>>>> 2) define the dirty page tracking, device context/states in the
>>>>>> section of basic facility but not under the admin commands
>>>>>> 3) define transport specific interfaces or admin commands to access them
>>>>>> "
>>>>>>
>>>>>> I totally agree with his proposal.
>>>>>>
>>>>>> Does this work for you Michael?
>>>>>>
>>>>>> Thanks
>>>>>> Zhu Lingshan
>>>>> I just doubt very much this will work.  What will "define" mean then -
>>>>> not an interface, just a description in english? I think you
>>>>> underestimate the difficulty of creating such definitions that
>>>>> are robust and precise.
>>>> I think we can review the patch to correct the words.
>>>>> Instead I suggest you define a way to submit admin commands that works
>>>>> for nested and bare-metal (i.e. not admin vq, and not with sriov group
>>>>> type). And work with Parav to make live migration admin commands work
>>>>> reasonably well through this interface and with this type.
>>>> why admin commands are better than registers?
>>>>





* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-11 11:52                                                                   ` Parav Pandit
@ 2023-10-12 10:57                                                                     ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-12 10:57 UTC (permalink / raw)
  To: Parav Pandit, Michael S. Tsirkin
  Cc: Cornelia Huck, Jason Wang, eperezma, Stefan Hajnoczi,
	virtio-comment, virtio-dev



On 10/11/2023 7:52 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, October 11, 2023 4:09 PM
>> I am sure I have not ignored any questions.
> What about below one?
>
> https://lore.kernel.org/virtio-dev/20230921011221-mutt-send-email-mst@kernel.org/
This is to discuss an attack model; I have given the answer in another
thread. I have even provided an example of how malicious SW can
compromise guest security through the admin vq.
>






* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-12 10:49                                                                     ` Zhu, Lingshan
@ 2023-10-12 11:12                                                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 11:12 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Thu, Oct 12, 2023 at 06:49:51PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 10/12/2023 5:59 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
> > > > > On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
> > > > > > > On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> > > > > > > > > We don't want to repeat the discussions, it looks like endless circle with
> > > > > > > > > no direction.
> > > > > > > > OK let me try to direct this discussion.
> > > > > > > > You guys were speaking past each other, no dialog is happening.
> > > > > > > > And as long as it goes on no progress will be made and you
> > > > > > > > will keep going in circles.
> > > > > > > > 
> > > > > > > > Parav here made an effort and attempted to summarize
> > > > > > > > use-cases addressed by your proposal but not his.
> > > > > > > > He couldn't resist adding "a yes but" in there oh well.
> > > > > > > > But now I hope you know he knows about your use-cases?
> > > > > > > > 
> > > > > > > > So please do the same. Do you see any advantages to Parav's
> > > > > > > > proposal as compared to yours? Try to list them and
> > > > > > > > if possible try not to accompany the list with "yes but"
> > > > > > > > (put it in a separate mail if you must ;) ).
> > > > > > > > If you won't be able to see any, let me know and I'll try to help.
> > > > > > > > 
> > > > > > > > Once each of you and Parav have finally heard the other and
> > > > > > > > the other also knows he's been heard, that's when we can
> > > > > > > > try to make progress by looking for something that addresses
> > > > > > > > all use-cases as opposed to endlessly repeating same arguments.
> > > > > > > Sure Michael, I will not say "yes but" here.
> > > > > > > 
> > > > > > >    From Parav's proposal, he intends to migrate a member device by its owner
> > > > > > > device through the admin vq,
> > > > > > > thus necessary admin vq commands are introduced in his series.
> > > > > > > 
> > > > > > > 
> > > > > > > I see his proposal can:
> > > > > > > 1) meet some customers requirements without nested and bare-metal
> > > > > > > 2) align with Nvidia production
> > > > > > > 3) easier to emulate by onboard SOC
> > > > > > Is that all you can see?
> > > > > > 
> > > > > > Hint: there's more.
> > > > > please help provide more.
> > > > Just a small subset off the top of my head:
> > > > Error handling.
> > > handle failed live migration? how?
> > For example you can try restarting VM on source.
> > Or at least report an error to hypervisor.
> I am not sure resetting a VM due to failed live migration is
> a good idea, should we resume the VM instead?

Yes - when I said restarting I meant resuming not resetting.

> Then try other
> convergence algorithm?

Talking about device failures here, nothing to do with convergence.
But yes, can e.g. try a different destination.

> 
> And I think current live migration solution already implements error
> detector, like sees a time out?

it is extremely hard to predict how
long it will take a random piece of hardware from a random
vendor to respond. even if you do, timeouts break nested
setups, don't they ;) and finally, they provide no indication
of what went wrong whatsoever.
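The fragility is easy to see in a toy wait loop (all names illustrative; the fixed poll budget stands in for a driver timeout):

```python
import itertools


def wait_for_ready(poll, timeout_polls):
    """Poll until poll() is truthy or the fixed budget runs out."""
    for _ in range(timeout_polls):
        if poll():
            return True
    return False  # timed out -- real failure, or just a slow device?


def slow_device(ready_after):
    """Device that only becomes ready on the ready_after-th poll."""
    counter = itertools.count(1)
    return lambda: next(counter) >= ready_after


# A device within budget succeeds; a slower one (e.g. behind a nested
# emulation layer) hits the timeout with no hint of the actual cause.
assert wait_for_ready(slow_device(3), timeout_polls=10) is True
assert wait_for_ready(slow_device(50), timeout_polls=10) is False
```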

> > 
> > 
> > > and for other errors, we have mature error handling solutions
> > > in virtio for years, like re-read, NEEDS_RESET.
> > facepalm
> > 
> > Are you aware of the fact that Linux still doesn't support
> > it since it turned out to be an extremely awkward interface
> > to use?
> I think we have implemented this in virtio driver,
> like re-read to check FEATURES.

grep for NEEDS_RESET in drivers/virtio and weep.

> > 
> > > If that is not good enough, then the corollary is:
> > > admin vq is better than config space,
> > 
> > You keep confusing admin vq with admin commands.
> OK, so are admin commands better than registers?

They have more functionality for sure.

> > 
> > 
> > > then the further corollary could be:
> > > we should refactor virtio-pci interfaces to admin vq commands,
> > > like how we handle features
> > > 
> > > Is that true?
> > > > Extendable to other group types such as SIOV.
> > > For SIOV, the admin vq is a transport, but for SR-IOV
> > > the admin vq is a control channel, that is different,
> > > and admin vq can be a side channel.
> > > 
> > > For example, for SIOV, we config and migrate MSIX through
> > > admin vq. For SRIOV, they are in config space.
> > And that's a mess. FYI we already got feedback from Linux devs
> > who are wondering why we can't come up with a consistent
> > interface that does everything.
> I believe config space is a consistent interface for PCI.
> For SIOV, we need a new transport layer anyway.
> > 
> > 
> > > > Batching of commands
> > > > fewer pci transactions
> > > so this can still be a QoS issue.
> > > If batching, will others starve?
> > And if you block CPU since you are not accepting
> > a posted write this is better?
> I don't get it, block guest CPU?

host cpu in fact. if you flood pci express with transactions
this is exactly what happens.

> > 
> > > > Support for keeping some data off-device
> > > I don't get it, what is off-device?
> > > The live migration facilities need to fetch data from the device anyway
> > Heh this is what was driving nvidia to use DMA so heavily all this time.
> > no - if data is not in registers, device can fetch the data from
> > across pci express link, presumably with a local cache.
> For PCI based configuration, like MSI, we need to fetch from config space
> anyway.
> For others like dirty page, we can store the bitmap in host memory, and use
> PASID for isolation.

Oh really?  What do we get by not using the same mechanism for
device state then? This begins to look exactly like admin vq.

> > 
> > 
> > > > which does not mean it's better unconditionally.
> > > > are above points clear?
> > > The thing is, what blocks the config space solution?
> > > Why is the admin vq a must for live migration?
> > > What's wrong in config space solution?
> > When you say what's wrong do you mean you still see no
> > advantages to doing DMA at all? config space is just better
> > with no drawbacks?
> still, if admin vq or admin commands are better than config space,
> we should refactor the whole virtio-pci interfaces to admin vq.

mixing admin vq and admin commands up again apparently.
We want to support virtio over admin commands for SIOV, yes.
And once that's supported nothing should prevent using that
for SRIOV too.

> And Jason has previously proposed to build admin vq LM on our basic
> facilities, but I see this has been rejected.

Please do not conclude that you just need to resubmit.

> > 
> > > Shall we refactor everything in virtio-pci to use admin vq?
> > > > as long as you guys keep not hearing each other we will keep
> > > > seeing these flame wars. if you expect everyone on virtio-comment
> > > > to follow a 300 message thread you are imo very much mistaken.
> > > I am sure I have not ignored any questions.
> > > I am saying admin vq is problematic for live migration,
> > > at least it doesn't work for nested, so why is the admin vq a must for live
> > > migration?
> > 
> > My suggestion for you was to add admin command support to
> > VF memory, as an alternative to admin vq. It looks like that
> > will address the nested virt usecase.
> If you mean carrying some big bulk of data like dirty page information,
> we implemented a facility in host memory which is isolated by PASID.
> 
> I should send a new series soon, so we can work on the patch.

I hope that one does not just restart the same flame war.
As it will if people keep talking past each other and
not listening.

> Thanks for your suggestions and efforts anyway.
> > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > The general purpose of his proposal and mine are aligned: migrate virtio
> > > > > > > devices.
> > > > > > > 
> > > > > > > Jason has ever proposed to collaborate, please allow me quote his proposal:
> > > > > > > 
> > > > > > > "
> > > > > > > Let me repeat once again here for the possible steps to collaboration:
> > > > > > > 
> > > > > > > 1) define virtqueue state, inflight descriptors in the section of
> > > > > > > basic facility but not under the admin commands
> > > > > > > 2) define the dirty page tracking, device context/states in the
> > > > > > > section of basic facility but not under the admin commands
> > > > > > > 3) define transport specific interfaces or admin commands to access them
> > > > > > > "
> > > > > > > 
> > > > > > > I totally agree with his proposal.
> > > > > > > 
> > > > > > > Does this work for you Michael?
> > > > > > > 
> > > > > > > Thanks
> > > > > > > Zhu Lingshan
> > > > > > I just doubt very much this will work.  What will "define" mean then -
> > > > > > not an interface, just a description in english? I think you
> > > > > > underestimate the difficulty of creating such definitions that
> > > > > > are robust and precise.
> > > > > I think we can review the patch to correct the words.
> > > > > > Instead I suggest you define a way to submit admin commands that works
> > > > > > for nested and bare-metal (i.e. not admin vq, and not with sriov group
> > > > > > type). And work with Parav to make live migration admin commands work
> > > > > > reasonably well through this interface and with this type.
> > > > > why admin commands are better than registers?
> > > > > 





* Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-10-12 11:12                                                                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 11:12 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Thu, Oct 12, 2023 at 06:49:51PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 10/12/2023 5:59 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
> > > 
> > > On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
> > > > > On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
> > > > > > > On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
> > > > > > > > > We don't want to repeat the discussions, it looks like endless circle with
> > > > > > > > > no direction.
> > > > > > > > OK let me try to direct this discussion.
> > > > > > > > You guys were speaking past each other, no dialog is happening.
> > > > > > > > And as long as it goes on no progress will be made and you
> > > > > > > > will keep going in circles.
> > > > > > > > 
> > > > > > > > Parav here made an effort and attempted to summarize
> > > > > > > > use-cases addressed by your proposal but not his.
> > > > > > > > He couldn't resist adding "a yes but" in there oh well.
> > > > > > > > But now I hope you know he knows about your use-cases?
> > > > > > > > 
> > > > > > > > So please do the same. Do you see any advantages to Parav's
> > > > > > > > proposal as compared to yours? Try to list them and
> > > > > > > > if possible try not to accompany the list with "yes but"
> > > > > > > > (put it in a separate mail if you must ;) ).
> > > > > > > > If you won't be able to see any, let me know and I'll try to help.
> > > > > > > > 
> > > > > > > > Once each of you and Parav have finally heard the other and
> > > > > > > > the other also knows he's been heard, that's when we can
> > > > > > > > try to make progress by looking for something that addresses
> > > > > > > > all use-cases as opposed to endlessly repeating same arguments.
> > > > > > > Sure Michael, I will not say "yes but" here.
> > > > > > > 
> > > > > > >    From Parav's proposal, he intends to migrate a member device by its owner
> > > > > > > device through the admin vq,
> > > > > > > thus necessary admin vq commands are introduced in his series.
> > > > > > > 
> > > > > > > 
> > > > > > > I see his proposal can:
> > > > > > > 1) meet some customers requirements without nested and bare-metal
> > > > > > > 2) align with Nvidia production
> > > > > > > 3) easier to emulate by onboard SOC
> > > > > > Is that all you can see?
> > > > > > 
> > > > > > Hint: there's more.
> > > > > please help provide more.
> > > > Just a small subset off the top of my head:
> > > > Error handling.
> > > handle failed live migration? how?
> > For example you can try restarting VM on source.
> > Or at least report an error to hypervisor.
> I am not sure resetting a VM due to failed live migration is
> a good idea, should we resume the VM instead?

Yes - when I said restarting I meant resuming not resetting.

> Then try other
> convergence algorithm?

Talking about device failures here nothing to do with convergence.
But yes, can e.g. try a different destination.

> 
> And I think current live migration solution already implements error
> detector, like sees a time out?

it is extremely hard to predict how
long will it take a random piece of hardware from a random
vendor to respond. even if you do timeouts break nested
don't they ;) and finally, they provide no indication
of what went wrong whatsoever.

> > 
> > 
> > > and for other errors, we have mature error handling solutions
> > > in virtio for years, like re-read, NEEDS_RESET.
> > facepalm
> > 
> > Are you aware of the fact that Linux still doesn't support
> > it since it turned out to be an extremely awkward interface
> > to use?
> I think we have implemented this in virtio driver,
> like re-read to check FEATURES.

grep for NEEDS_RESET in drivers/virtio and weep.

> > 
> > > If that is not good enough, then the corollary is:
> > > admin vq is better than config space,
> > 
> > You keep confusing admin vq with admin commands.
> OK, so are admin commands better than registers?

They have more functionality for sure.

> > 
> > 
> > > then the further corollary could be:
> > > we should refactor virito-pci interfaces to admin vq commands,
> > > like how we handle features
> > > 
> > > Is that true?
> > > > Extendable to other group types such as SIOV.
> > > For SIOV, the admin vq is a transport, but for SR-IOV
> > > the admin vq is a control channel, that is different,
> > > and admin vq can be a side channel.
> > > 
> > > For example, for SIOV, we config and migrate MSIX through
> > > admin vq. For SRIOV, they are in config space.
> > And that's a mess. FYI we already got feedback from Linux devs
> > who are wondering why we can't come up with a consistent
> > interface that does everything.
> I believe config space is a consistent interface for PCI.
> For SIOV, we need a new transport layer anyway.
> > 
> > 
> > > > Batching of commands
> > > > less pci transactioons
> > > so this can still be a QOS issue.
> > > If batching, others to starve?
> > And if you block CPU since you are not accepting
> > a posted write this is better?
> I don't get it, block guest CPU?

host cpu in fact. if you flood pci expess with transactions
this is exactly what happens.

> > 
> > > > Support for keeping some data off-device
> > > I don't get it, what is off-device?
> > > The live migration facilities need to fetch data from the device anyway
> > Heh this is what was driving nvidia to use DMA so heavily all this time.
> > no - if data is not in registers, device can fetch the data from
> > across pci express link, presumably with a local cache.
> For PCI based configuration, like MSI, we need to fetch from config space
> anyway.
> For others like dirty page, we can store the bitmap in host memory, and use
> PASID for isolation.

Oh really?  What do we get by not using same mechanism for
device state then? This begins to look exactly like admin vq.

> > 
> > 
> > > > which does not mean it's better unconditionally.
> > > > are above points clear?
> > > The thing is, what blocks the config space solution?
> > > Why admin vq is a must for live migration?
> > > What's wrong in config space solution?
> > Whan you say what's wrong do you mean you still see no
> > advantages to doing DMA at all? config space is just better
> > with no drawbacks?
> still, if admin vq or admin commands are better than config space,
> we should refactor the whole virtio-pci interfaces to admin vq.

mixing admin vq and command up again apparently.
We want to support virtio over admin commands for SIOV, yes.
And once that's supported nothing should prevent using that
for SRIOV too.

> And Jason has ever proposed to build admin vq LM on our basic
> facilities, but I see this has been rejected.

Please do not conclude that you just need to resubmit.

> > 
> > > Shall we refactor everything in virtio-pci to use admin vq?
> > > > as long as you guys keep not hearing each other we will keep
> > > > seeing these flame wars. if you expect everyone on virtio-comment
> > > > to follow a 300 message thread you are imo very much mistaken.
> > > I am sure I have not ignored any questions.
> > > I am saying admin vq is problematic for live migration,
> > > at least it doesn't work for nested, so why admin vq is a must for live
> > > migration?
> > 
> > My suggestion for you was to add admin command support to
> > VF memory, as an alternative to admin vq. It looks like that
> > will address the nested virt usecase.
> If you mean carrying some big bulk of data like dirty page information,
> we implemented a facility in host memory which is isolated by PASID.
> 
> I should send a new series soon, so we can work on the patch.

I hope that one does not just restart the same flame war.
As it will if people keep talking past each other and
not listening.

> Thanks for your suggestions and efforts anyway.
> > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > The general purpose of his proposal and mine are aligned: migrate virtio
> > > > > > > devices.
> > > > > > > 
> > > > > > > Jason has ever proposed to collaborate, please allow me quote his proposal:
> > > > > > > 
> > > > > > > "
> > > > > > > Let me repeat once again here for the possible steps to collaboration:
> > > > > > > 
> > > > > > > 1) define virtqueue state, inflight descriptors in the section of
> > > > > > > basic facility but not under the admin commands
> > > > > > > 2) define the dirty page tracking, device context/states in the
> > > > > > > section of basic facility but not under the admin commands
> > > > > > > 3) define transport specific interfaces or admin commands to access them
> > > > > > > "
> > > > > > > 
> > > > > > > I totally agree with his proposal.
> > > > > > > 
> > > > > > > Does this work for you Michael?
> > > > > > > 
> > > > > > > Thanks
> > > > > > > Zhu Lingshan
> > > > > > I just doubt very much this will work.  What will "define" mean then -
> > > > > > not an interface, just a description in english? I think you
> > > > > > underestimate the difficulty of creating such definitions that
> > > > > > are robust and precise.
> > > > > I think we can review the patch to correct the words.
> > > > > > Instead I suggest you define a way to submit admin commands that works
> > > > > > for nested and bare-metal (i.e. not admin vq, and not with sriov group
> > > > > > type). And work with Parav to make live migration admin commands work
> > > > > > reasonably will through this interface and with this type.
> > > > > why admin commands are better than registers?
> > > > > 
> > > > > This publicly archived list offers a means to provide input to the
> > > > > OASIS Virtual I/O Device (VIRTIO) TC.
> > > > > 
> > > > > In order to verify user consent to the Feedback License terms and
> > > > > to minimize spam in the list archive, subscription is required
> > > > > before posting.
> > > > > 
> > > > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> > > > > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> > > > > List help: virtio-comment-help@lists.oasis-open.org
> > > > > List archive: https://lists.oasis-open.org/archives/virtio-comment/
> > > > > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> > > > > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> > > > > Committee: https://www.oasis-open.org/committees/virtio/
> > > > > Join OASIS: https://www.oasis-open.org/join/
> > > > > 



^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-12 10:57                                                                     ` Zhu, Lingshan
@ 2023-10-12 11:13                                                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 11:13 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Parav Pandit, Cornelia Huck, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Thu, Oct 12, 2023 at 06:57:35PM +0800, Zhu, Lingshan wrote:
> 
> 
> On 10/11/2023 7:52 PM, Parav Pandit wrote:
> > > From: Zhu, Lingshan <lingshan.zhu@intel.com>
> > > Sent: Wednesday, October 11, 2023 4:09 PM
> > > I am sure I have not ignored any questions.
> > What about below one?
> > 
> > https://lore.kernel.org/virtio-dev/20230921011221-mutt-send-email-mst@kernel.org/
> This is to discuss an attack model; I have given the answer in another
> thread.
> I have even provided an example of how malicious SW can dump guest secrets
> through the admin vq

No one cares; without encryption the hypervisor is in control anyway.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 445+ messages in thread


* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-12 10:49                                                                     ` Zhu, Lingshan
@ 2023-10-12 14:38                                                                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 445+ messages in thread
From: Michael S. Tsirkin @ 2023-10-12 14:38 UTC (permalink / raw)
  To: Zhu, Lingshan
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev

On Thu, Oct 12, 2023 at 06:49:51PM +0800, Zhu, Lingshan wrote:
> For PCI based configuration, like MSI, we need to fetch from config space
> anyway.
> For others like dirty page, we can store the bitmap in host memory, and use
> PASID for isolation.

Ok. So how about a simple interface along the lines of:
u64 cmd_address
u8 ready_flags:1

For this kind of stuff?
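To make that concrete, here is a toy model of such a two-register interface: the driver writes the DMA address of an admin command buffer into cmd_address, sets the ready bit, and the device clears the bit when it has consumed the command. All names and semantics here are illustrative assumptions, not spec text:

```python
# Toy model of a minimal admin-command register interface:
# u64 cmd_address + a 1-bit ready flag doorbell.

class AdminCmdRegs:
    READY = 0x1  # bit 0 of ready_flags

    def __init__(self, device):
        self.cmd_address = 0   # u64: guest-physical address of command buffer
        self.ready_flags = 0   # u8 bit 0: driver sets it, device clears on done
        self.device = device   # callback standing in for device-side processing

    def submit(self, buf_pa):
        """Driver side: publish the buffer address, then ring the doorbell."""
        self.cmd_address = buf_pa
        self.ready_flags |= self.READY
        # In this toy model the device processes synchronously; real hardware
        # would complete asynchronously and the driver would poll or sleep.
        self.device(self.cmd_address)
        self.ready_flags &= ~self.READY


processed = []
regs = AdminCmdRegs(device=processed.append)
regs.submit(0x1000)
print(processed, hex(regs.cmd_address), regs.ready_flags)
```

A real definition would also need to settle error reporting and whether completion is signaled by interrupt rather than polling.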

-- 
MST




^ permalink raw reply	[flat|nested] 445+ messages in thread


* Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-12 11:12                                                                       ` Michael S. Tsirkin
@ 2023-10-13 10:18                                                                         ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-13 10:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/12/2023 7:12 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 12, 2023 at 06:49:51PM +0800, Zhu, Lingshan wrote:
>>
>> On 10/12/2023 5:59 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
>>>> On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
>>>>>> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>>>>>>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>>>>>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>>>>>>>> no direction.
>>>>>>>>> OK let me try to direct this discussion.
>>>>>>>>> You guys were speaking past each other, no dialog is happening.
>>>>>>>>> And as long as it goes on no progress will be made and you
>>>>>>>>> will keep going in circles.
>>>>>>>>>
>>>>>>>>> Parav here made an effort and attempted to summarize
>>>>>>>>> use-cases addressed by your proposal but not his.
>>>>>>>>> He couldn't resist adding "a yes but" in there oh well.
>>>>>>>>> But now I hope you know he knows about your use-cases?
>>>>>>>>>
>>>>>>>>> So please do the same. Do you see any advantages to Parav's
>>>>>>>>> proposal as compared to yours? Try to list them and
>>>>>>>>> if possible try not to accompany the list with "yes but"
>>>>>>>>> (put it in a separate mail if you must ;) ).
>>>>>>>>> If you won't be able to see any, let me know and I'll try to help.
>>>>>>>>>
>>>>>>>>> Once each of you and Parav have finally heard the other and
>>>>>>>>> the other also knows he's been heard, that's when we can
>>>>>>>>> try to make progress by looking for something that addresses
>>>>>>>>> all use-cases as opposed to endlessly repeating same arguments.
>>>>>>>> Sure Michael, I will not say "yes but" here.
>>>>>>>>
>>>>>>>>     From Parav's proposal, he intends to migrate a member device by its owner
>>>>>>>> device through the admin vq,
>>>>>>>> thus necessary admin vq commands are introduced in his series.
>>>>>>>>
>>>>>>>>
>>>>>>>> I see his proposal can:
>>>>>>>> 1) meet some customers requirements without nested and bare-metal
>>>>>>>> 2) align with Nvidia production
>>>>>>>> 3) easier to emulate by onboard SOC
>>>>>>> Is that all you can see?
>>>>>>>
>>>>>>> Hint: there's more.
>>>>>> please help provide more.
>>>>> Just a small subset off the top of my head:
>>>>> Error handling.
>>>> handle failed live migration? how?
>>> For example you can try restarting VM on source.
>>> Or at least report an error to hypervisor.
>> I am not sure resetting a VM due to failed live migration is
>> a good idea, should we resume the VM instead?
> Yes - when I said restarting I meant resuming not resetting.
OK, we have implemented the interface to resume the device, i.e. to
clear SUSPEND.
>
>> Then try other
>> convergence algorithm?
> Talking about device failures here nothing to do with convergence.
> But yes, can e.g. try a different destination.
OK
>
>> And I think current live migration solution already implements error
>> detector, like sees a time out?
> it is extremely hard to predict how
> long will it take a random piece of hardware from a random
> vendor to respond. even if you do timeouts break nested
> don't they ;) and finally, they provide no indication
> of what went wrong whatsoever.
the hypervisor would not complete the live migration
process before device migration is done.

I think the hypervisor or the orchestration layer
knows the LM status anyway.
>
>>>
>>>> and for other errors, we have mature error handling solutions
>>>> in virtio for years, like re-read, NEEDS_RESET.
>>> facepalm
>>>
>>> Are you aware of the fact that Linux still doesn't support
>>> it since it turned out to be an extremely awkward interface
>>> to use?
>> I think we have implemented this in virtio driver,
>> like re-read to check FEATURES.
> grep for NEEDS_RESET in drivers/virtio and weep.
that is interesting: the virtio driver has lived for so many years
without handling NEEDS_RESET, thanks to good device quality and
layers of error handlers.

What prevents implementing NEEDS_RESET? Is it because of how to reinitialize?
It looks like we should do that.

For now, re-reading works well at least.
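For reference, driver-side NEEDS_RESET handling could look roughly like the sketch below: on a config change notification, re-read the status field and, if DEVICE_NEEDS_RESET (64 in the virtio spec) is set, reset and reinitialize the device. The device model is a stand-in, not Linux code:

```python
# Sketch of NEEDS_RESET recovery; only the status bit value is from the
# spec, the surrounding structure is an illustrative assumption.

NEEDS_RESET = 0x40  # DEVICE_NEEDS_RESET in the device status field

class ToyDevice:
    def __init__(self):
        self.status = 0
        self.resets = 0

    def read_status(self):
        return self.status

    def reset(self):
        self.status = 0
        self.resets += 1

def handle_config_change(dev, reinit):
    """Re-read status; recover by reset + reinit if the device asks for it."""
    if dev.read_status() & NEEDS_RESET:
        dev.reset()
        reinit(dev)   # re-negotiate features, re-create virtqueues, ...
        return True
    return False

dev = ToyDevice()
dev.status = NEEDS_RESET            # device reports an internal error
recovered = handle_config_change(dev, reinit=lambda d: None)
print(recovered, dev.resets)
```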
>
>>>> If that is not good enough, then the corollary is:
>>>> admin vq is better than config space,
>>> You keep confusing admin vq with admin commands.
>> OK, so are admin commands better than registers?
> They have more functionality for sure.
yes, they are more powerful than registers.

However, to suspend, resume, and configure the dirty page facility,
registers are low-hanging fruit.
>
>>>
>>>> then the further corollary could be:
>>>> we should refactor virtio-pci interfaces to admin vq commands,
>>>> like how we handle features
>>>>
>>>> Is that true?
>>>>> Extendable to other group types such as SIOV.
>>>> For SIOV, the admin vq is a transport, but for SR-IOV
>>>> the admin vq is a control channel, that is different,
>>>> and admin vq can be a side channel.
>>>>
>>>> For example, for SIOV, we config and migrate MSIX through
>>>> admin vq. For SRIOV, they are in config space.
>>> And that's a mess. FYI we already got feedback from Linux devs
>>> who are wondering why we can't come up with a consistent
>>> interface that does everything.
>> I believe config space is a consistent interface for PCI.
>> For SIOV, we need a new transport layer anyway.
>>>
>>>>> Batching of commands
>>>>> less pci transactioons
>>>> so this can still be a QOS issue.
>>>> If batching, others to starve?
>>> And if you block CPU since you are not accepting
>>> a posted write this is better?
>> I don't get it, block guest CPU?
> host cpu in fact. if you flood pci express with transactions
> this is exactly what happens.
Not sure the hypervisor will implement this just to adapt to admin
vq live migration.
>
>>>>> Support for keeping some data off-device
>>>> I don't get it, what is off-device?
>>>> The live migration facilities need to fetch data from the device anyway
>>> Heh this is what was driving nvidia to use DMA so heavily all this time.
>>> no - if data is not in registers, device can fetch the data from
>>> across pci express link, presumably with a local cache.
>> For PCI based configuration, like MSI, we need to fetch from config space
>> anyway.
>> For others like dirty page, we can store the bitmap in host memory, and use
>> PASID for isolation.
> Oh really?  What do we get by not using same mechanism for
> device state then? This begins to look exactly like admin vq.
we implement a register to configure a logging address in host memory,
isolated by PASID.
There are also a few other registers to control the facility, like
enable/disable.
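As a toy model of that dirty-page logging facility: the driver programs a base address (here a plain bytearray standing in for the PASID-isolated host memory) plus an enable register, the device sets one bit per dirtied page, and the hypervisor harvests and clears the bitmap each migration round. Register names and the bitmap layout are illustrative assumptions:

```python
# Toy dirty-page bitmap: one bit per guest page, stored "in host memory".

PAGE_SHIFT = 12  # 4 KiB pages

class DirtyLog:
    def __init__(self, num_pages):
        self.bitmap = bytearray((num_pages + 7) // 8)  # host-memory bitmap
        self.enabled = False                           # enable/disable register

    def log_write(self, gpa):
        """Device side: mark the page containing guest-physical address gpa."""
        if not self.enabled:
            return
        pfn = gpa >> PAGE_SHIFT
        self.bitmap[pfn // 8] |= 1 << (pfn % 8)

    def read_and_clear(self):
        """Hypervisor side: harvest dirty page frame numbers for one round."""
        dirty = [i * 8 + b for i, byte in enumerate(self.bitmap)
                 for b in range(8) if byte & (1 << b)]
        self.bitmap = bytearray(len(self.bitmap))
        return dirty

log = DirtyLog(num_pages=64)
log.enabled = True
log.log_write(0x3000)   # page 3
log.log_write(0x3fff)   # still page 3
log.log_write(0x8000)   # page 8
print(log.read_and_clear())
```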
>
>>>
>>>>> which does not mean it's better unconditionally.
>>>>> are above points clear?
>>>> The thing is, what blocks the config space solution?
>>>> Why admin vq is a must for live migration?
>>>> What's wrong in config space solution?
>>> When you say what's wrong, do you mean you still see no
>>> advantages to doing DMA at all? config space is just better
>>> with no drawbacks?
>> still, if admin vq or admin commands are better than config space,
>> we should refactor the whole virtio-pci interfaces to admin vq.
> mixing admin vq and command up again apparently.
> We want to support virtio over admin commands for SIOV, yes.
> And once that's supported nothing should prevent using that
> for SRIOV too.
admin commands work for SR-IOV, but are overkill for live migration.

For example, to suspend a device, what is the benefit of using an
admin command over just a register?

And if we want a BAR to process admin commands, we would need
to implement fields like data_length, total_length and
so on, much more complex than a register.
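To illustrate the register alternative, here is a toy model of a SUSPEND handshake through the device status field: the driver sets a SUSPEND bit, then polls until the device acknowledges that virtqueue state (last_avail_idx/last_used_idx) has stabilized. The bit value and the polling protocol are illustrative assumptions, not the proposed spec text:

```python
# Toy SUSPEND handshake via the device status register.

DRIVER_OK = 0x4
SUSPEND   = 0x10  # assumed bit value, for illustration only

class SuspendableDevice:
    def __init__(self):
        self.status = DRIVER_OK
        self.last_avail_idx = 7
        self.last_used_idx = 5
        self._suspended = False

    def write_status(self, val):
        if val & SUSPEND and not self._suspended:
            self._quiesce()          # stop DMA, stabilize vq state
            self._suspended = True
        self.status = val

    def _quiesce(self):
        pass                         # toy model: nothing in flight

    def read_status(self):
        # The device exposes SUSPEND only once it has actually suspended.
        return self.status if self._suspended else self.status & ~SUSPEND

def suspend(dev):
    """Driver side: request suspend, then poll for the acknowledgement."""
    dev.write_status(dev.read_status() | SUSPEND)
    while not dev.read_status() & SUSPEND:
        pass                         # real code would pause and time out
    return dev.last_avail_idx, dev.last_used_idx

sdev = SuspendableDevice()
state = suspend(sdev)
print(state, bool(sdev.read_status() & SUSPEND))
```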
>
>> And Jason has ever proposed to build admin vq LM on our basic
>> facilities, but I see this has been rejected.
> Please do not conclude that you just need to resubmit.
>
>>>> Shall we refactor everything in virtio-pci to use admin vq?
>>>>> as long as you guys keep not hearing each other we will keep
>>>>> seeing these flame wars. if you expect everyone on virtio-comment
>>>>> to follow a 300 message thread you are imo very much mistaken.
>>>> I am sure I have not ignored any questions.
>>>> I am saying admin vq is problematic for live migration,
>>>> at least it doesn't work for nested, so why admin vq is a must for live
>>>> migration?
>>> My suggestion for you was to add admin command support to
>>> VF memory, as an alternative to admin vq. It looks like that
>>> will address the nested virt usecase.
>> If you mean carrying some big bulk of data like dirty page information,
>> we implemented a facility in host memory which is isolated by PASID.
>>
>> I should send a new series soon, so we can work on the patch.
> I hope that one does not just restart the same flame war.
> As it will if people keep talking past each other and
> not listening.
V2 will include dirty page tracking, so we can review the design.

Yes, I hope there will be no flame wars.
>
>> Thanks for your suggestions and efforts anyway.
>>>>>>>
>>>>>>>
>>>>>>>> The general purpose of his proposal and mine are aligned: migrate virtio
>>>>>>>> devices.
>>>>>>>>
>>>>>>>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>>>>>>>
>>>>>>>> "
>>>>>>>> Let me repeat once again here for the possible steps to collaboration:
>>>>>>>>
>>>>>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>>>>>> basic facility but not under the admin commands
>>>>>>>> 2) define the dirty page tracking, device context/states in the
>>>>>>>> section of basic facility but not under the admin commands
>>>>>>>> 3) define transport specific interfaces or admin commands to access them
>>>>>>>> "
>>>>>>>>
>>>>>>>> I totally agree with his proposal.
>>>>>>>>
>>>>>>>> Does this work for you Michael?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Zhu Lingshan
>>>>>>> I just doubt very much this will work.  What will "define" mean then -
>>>>>>> not an interface, just a description in english? I think you
>>>>>>> underestimate the difficulty of creating such definitions that
>>>>>>> are robust and precise.
>>>>>> I think we can review the patch to correct the words.
>>>>>>> Instead I suggest you define a way to submit admin commands that works
>>>>>>> for nested and bare-metal (i.e. not admin vq, and not with sriov group
>>>>>>> type). And work with Parav to make live migration admin commands work
>>>>>>> reasonably will through this interface and with this type.
>>>>>> why admin commands are better than registers?
>>>>>>
>




^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
@ 2023-10-13 10:18                                                                         ` Zhu, Lingshan
  0 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-13 10:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/12/2023 7:12 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 12, 2023 at 06:49:51PM +0800, Zhu, Lingshan wrote:
>>
>> On 10/12/2023 5:59 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 11, 2023 at 06:38:32PM +0800, Zhu, Lingshan wrote:
>>>> On 10/11/2023 6:20 PM, Michael S. Tsirkin wrote:
>>>>> On Mon, Oct 09, 2023 at 06:01:42PM +0800, Zhu, Lingshan wrote:
>>>>>> On 9/27/2023 11:40 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Sep 27, 2023 at 04:20:01PM +0800, Zhu, Lingshan wrote:
>>>>>>>> On 9/26/2023 6:48 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Tue, Sep 26, 2023 at 05:25:42PM +0800, Zhu, Lingshan wrote:
>>>>>>>>>> We don't want to repeat the discussions, it looks like endless circle with
>>>>>>>>>> no direction.
>>>>>>>>> OK let me try to direct this discussion.
>>>>>>>>> You guys were speaking past each other, no dialog is happening.
>>>>>>>>> And as long as it goes on no progress will be made and you
>>>>>>>>> will keep going in circles.
>>>>>>>>>
>>>>>>>>> Parav here made an effort and attempted to summarize
>>>>>>>>> use-cases addressed by your proposal but not his.
>>>>>>>>> He couldn't resist adding "a yes but" in there oh well.
>>>>>>>>> But now I hope you know he knows about your use-cases?
>>>>>>>>>
>>>>>>>>> So please do the same. Do you see any advantages to Parav's
>>>>>>>>> proposal as compared to yours? Try to list them and
>>>>>>>>> if possible try not to accompany the list with "yes but"
>>>>>>>>> (put it in a separate mail if you must ;) ).
>>>>>>>>> If you won't be able to see any, let me know and I'll try to help.
>>>>>>>>>
>>>>>>>>> Once each of you and Parav have finally heard the other and
>>>>>>>>> the other also knows he's been heard, that's when we can
>>>>>>>>> try to make progress by looking for something that addresses
>>>>>>>>> all use-cases as opposed to endlessly repeating same arguments.
>>>>>>>> Sure Michael, I will not say "yes but" here.
>>>>>>>>
>>>>>>>>     From Parav's proposal, he intends to migrate a member device by its owner
>>>>>>>> device through the admin vq,
>>>>>>>> thus necessary admin vq commands are introduced in his series.
>>>>>>>>
>>>>>>>>
>>>>>>>> I see his proposal can:
>>>>>>>> 1) meet some customers requirements without nested and bare-metal
>>>>>>>> 2) align with Nvidia production
>>>>>>>> 3) easier to emulate by onboard SOC
>>>>>>> Is that all you can see?
>>>>>>>
>>>>>>> Hint: there's more.
>>>>>> please help provide more.
>>>>> Just a small subset off the top of my head:
>>>>> Error handling.
>>>> handle failed live migration? how?
>>> For example you can try restarting VM on source.
>>> Or at least report an error to hypervisor.
>> I am not sure resetting a VM due to failed live migration is
>> a good idea, should we resume the VM instead?
> Yes - when I said restarting I meant resuming not resetting.
OK, we have implemented the interface to resume the device, to clear 
suspend.
>
>> Then try other
>> convergence algorithm?
> Talking about device failures here nothing to do with convergence.
> But yes, can e.g. try a different destination.
OK
>
>> And I think current live migration solution already implements error
>> detector, like sees a time out?
> it is extremely hard to predict how
> long will it take a random piece of hardware from a random
> vendor to respond. even if you do timeouts break nested
> don't they ;) and finally, they provide no indication
> of what went wrong whatsoever.
the hypervisor would not complete the live migration
process before device migration done.

I think the hypervisor or the orchestration layer
know the LM status anyway.
>
>>>
>>>> and for other errors, we have mature error handling solutions
>>>> in virtio for years, like re-read, NEEDS_RESET.
>>> facepalm
>>>
>>> Are you aware of the fact that Linux still doesn't support
>>> it since it turned out to be an extremely awkward interface
>>> to use?
>> I think we have implemented this in virtio driver,
>> like re-read to check FEATURES.
> grep for NEEDS_RESET in drivers/virtio and weep.
that is interesting, virito driver lives so many years
without handling NEEDS_RESET, so good device quality and
layers of error handlers.

what prevent implementing NEEDS_RESET? Is it because of how to reinitialize?
It looks like we should do that.

For now, re-read working well at least.
>
>>>> If that is not good enough, then the corollary is:
>>>> admin vq is better than config space,
>>> You keep confusing admin vq with admin commands.
>> OK, so are admin commands better than registers?
> They have more functionality for sure.
yes they are powerful than registers.

However, to suspend, resume, config dirty page facility,
registers are low hanging fruits.
>
>>>
>>>> then the further corollary could be:
>>>> we should refactor virito-pci interfaces to admin vq commands,
>>>> like how we handle features
>>>>
>>>> Is that true?
>>>>> Extendable to other group types such as SIOV.
>>>> For SIOV, the admin vq is a transport, but for SR-IOV
>>>> the admin vq is a control channel, that is different,
>>>> and admin vq can be a side channel.
>>>>
>>>> For example, for SIOV, we config and migrate MSIX through
>>>> admin vq. For SRIOV, they are in config space.
>>> And that's a mess. FYI we already got feedback from Linux devs
>>> who are wondering why we can't come up with a consistent
>>> interface that does everything.
>> I believe config space is a consistent interface for PCI.
>> For SIOV, we need a new transport layer anyway.
>>>
>>>>> Batching of commands
>>>>> less pci transactioons
>>>> so this can still be a QOS issue.
>>>> If batching, others to starve?
>>> And if you block CPU since you are not accepting
>>> a posted write this is better?
>> I don't get it, block guest CPU?
> host cpu in fact. if you flood pci expess with transactions
> this is exactly what happens.
Not sure hypervisor will implement this just because adapting to admin 
vq live migration.
>
>>>>> Support for keeping some data off-device
>>>> I don't get it, what is off-device?
>>>> The live migration facilities need to fetch data from the device anyway
>>> Heh this is what was driving nvidia to use DMA so heavily all this time.
>>> no - if data is not in registers, device can fetch the data from
>>> across pci express link, presumably with a local cache.
>> For PCI based configuration, like MSI, we need to fetch from config space
>> anyway.
>> For others like dirty page, we can store the bitmap in host memory, and use
>> PASID for isolation.
> Oh really?  What do we get by not using same mechanism for
> device state then? This begins to look exactly like admin vq.
implementing a register to config a logging address in host memory and 
isolated by PASID.
Also there are other few registers to control the facility, like 
enable/disable.
>
>>>
>>>>> which does not mean it's better unconditionally.
>>>>> are above points clear?
>>>> The thing is, what blocks the config space solution?
>>>> Why admin vq is a must for live migration?
>>>> What's wrong in config space solution?
>>> Whan you say what's wrong do you mean you still see no
>>> advantages to doing DMA at all? config space is just better
>>> with no drawbacks?
>> still, if admin vq or admin commands are better than config space,
>> we should refactor the whole virtio-pci interfaces to admin vq.
> mixing admin vq and command up again apparently.
> We want to support virtio over admin commands for SIOV, yes.
> And once that's supported nothing should prevent using that
> for SRIOV too.
admin commands work for SRIOV, but overkill for live migration.

For example, to suspend a device, what is the benefit using a
admin command than just a register?

And if we want a bar to process admin commands, do we need
to implement some fields like data_length, total_length and
etc, much more complex than a register.
>
>> And Jason has ever proposed to build admin vq LM on our basic
>> facilities, but I see this has been rejected.
> Please do not conclude that you just need to resubmit.
>
>>>> Shall we refactor everything in virtio-pci to use admin vq?
>>>>> as long as you guys keep not hearing each other we will keep
>>>>> seeing these flame wars. if you expect everyone on virtio-comment
>>>>> to follow a 300 message thread you are imo very much mistaken.
>>>> I am sure I have not ignored any questions.
>>>> I am saying admin vq is problematic for live migration,
>>>> at least it doesn't work for nested, so why admin vq is a must for live
>>>> migration?
>>> My suggestion for you was to add admin command support to
>>> VF memory, as an alternative to admin vq. It looks like that
>>> will address the nested virt usecase.
>> If you mean carrying some big bulk of data like dirty page information,
>> we implemented a facility in host memory which is isolated by PASID.
>>
>> I should send a new series soon, so we can work on the patch.
> I hope that one does not just restart the same flame war.
> As it will if people keep talking past each other and
> not listening.
V2 will include dirty page tracking, so we can review the design.

Yes I hope no flame wars.
>
>> Thanks for your suggestions and efforts anyway.
>>>>>>>
>>>>>>>
>>>>>>>> The general purpose of his proposal and mine are aligned: migrate virtio
>>>>>>>> devices.
>>>>>>>>
>>>>>>>> Jason has ever proposed to collaborate, please allow me quote his proposal:
>>>>>>>>
>>>>>>>> "
>>>>>>>> Let me repeat once again here for the possible steps to collaboration:
>>>>>>>>
>>>>>>>> 1) define virtqueue state, inflight descriptors in the section of
>>>>>>>> basic facility but not under the admin commands
>>>>>>>> 2) define the dirty page tracking, device context/states in the
>>>>>>>> section of basic facility but not under the admin commands
>>>>>>>> 3) define transport specific interfaces or admin commands to access them
>>>>>>>> "
>>>>>>>>
>>>>>>>> I totally agree with his proposal.
>>>>>>>>
>>>>>>>> Does this work for you Michael?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Zhu Lingshan
>>>>>>> I just doubt very much this will work.  What will "define" mean then -
>>>>>>> not an interface, just a description in English? I think you
>>>>>>> underestimate the difficulty of creating such definitions that
>>>>>>> are robust and precise.
>>>>>> I think we can review the patch to correct the wording.
>>>>>>> Instead I suggest you define a way to submit admin commands that works
>>>>>>> for nested and bare-metal (i.e. not admin vq, and not with sriov group
>>>>>>> type). And work with Parav to make live migration admin commands work
>>>>>>> reasonably well through this interface and with this type.
>>>>>> Why are admin commands better than registers?
>>>>>>
>>>>>> This publicly archived list offers a means to provide input to the
>>>>>> OASIS Virtual I/O Device (VIRTIO) TC.
>>>>>>
>>>>>> In order to verify user consent to the Feedback License terms and
>>>>>> to minimize spam in the list archive, subscription is required
>>>>>> before posting.
>>>>>>
>>>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>>>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>>>>>> List help: virtio-comment-help@lists.oasis-open.org
>>>>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>>>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>>>>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>>>>> Committee: https://www.oasis-open.org/committees/virtio/
>>>>>> Join OASIS: https://www.oasis-open.org/join/
>>>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>



^ permalink raw reply	[flat|nested] 445+ messages in thread

* [virtio-dev] Re: [virtio-comment] Re: [virtio-dev] Re: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
  2023-10-12 14:38                                                                       ` Michael S. Tsirkin
@ 2023-10-13 10:23                                                                         ` Zhu, Lingshan
  -1 siblings, 0 replies; 445+ messages in thread
From: Zhu, Lingshan @ 2023-10-13 10:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Cornelia Huck, Parav Pandit, Jason Wang, eperezma,
	Stefan Hajnoczi, virtio-comment, virtio-dev



On 10/12/2023 10:38 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 12, 2023 at 06:49:51PM +0800, Zhu, Lingshan wrote:
>> For PCI-based configuration, like MSI, we need to fetch from the config space
>> anyway.
>> For others, like dirty pages, we can store the bitmap in host memory and use
>> PASID for isolation.
> Ok. So how about a simple interface along the lines of
> u64 cmd_address
> u8 ready_flags:1
>
> For this kind of stuff?
Yes, something like this: log dirty pages in host memory.
This is the draft; it is not finished yet.

\begin{lstlisting}
struct virtio_pci_dirty_page_track {
        u8 enable;               /* Read-Write */
        u8 gra_power;            /* Read-Write */
        u8 reserved[2];
        le32 {
            pasid: 20;           /* Read-Write */
            reserved: 12;
        };
        le64 bitmap_addr;        /* Read-Write */
        le64 bitmap_length;      /* Read-Write */
};
\end{lstlisting}


>




^ permalink raw reply	[flat|nested] 445+ messages in thread


end of thread, other threads:[~2023-10-13 10:24 UTC | newest]

Thread overview: 445+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-06  8:16 [virtio-dev] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Zhu Lingshan
2023-09-06  8:16 ` [virtio-comment] " Zhu Lingshan
2023-09-06  8:16 ` [virtio-dev] [PATCH 1/5] virtio: introduce vq state as basic facility Zhu Lingshan
2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
2023-09-06  8:28   ` [virtio-dev] " Michael S. Tsirkin
2023-09-06  8:28     ` Michael S. Tsirkin
2023-09-06  9:43     ` [virtio-dev] " Zhu, Lingshan
2023-09-06  9:43       ` Zhu, Lingshan
2023-09-14 11:25   ` [virtio-dev] " Michael S. Tsirkin
2023-09-14 11:25     ` Michael S. Tsirkin
2023-09-15  2:46     ` [virtio-dev] " Zhu, Lingshan
2023-09-15  2:46       ` Zhu, Lingshan
2023-09-06  8:16 ` [virtio-dev] [PATCH 2/5] virtio: introduce SUSPEND bit in device status Zhu Lingshan
2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
2023-09-14 11:34   ` [virtio-dev] " Michael S. Tsirkin
2023-09-14 11:34     ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  2:57     ` Zhu, Lingshan
2023-09-15  2:57       ` [virtio-dev] " Zhu, Lingshan
2023-09-15 11:10       ` Michael S. Tsirkin
2023-09-15 11:10         ` [virtio-comment] " Michael S. Tsirkin
2023-09-18  2:56         ` [virtio-dev] " Zhu, Lingshan
2023-09-18  2:56           ` [virtio-comment] " Zhu, Lingshan
2023-09-18  4:42           ` [virtio-dev] " Parav Pandit
2023-09-18  4:42             ` Parav Pandit
2023-09-18  5:14             ` [virtio-dev] " Zhu, Lingshan
2023-09-18  5:14               ` Zhu, Lingshan
2023-09-18  6:17               ` [virtio-dev] " Parav Pandit
2023-09-18  6:17                 ` Parav Pandit
2023-09-18  6:38                 ` [virtio-dev] " Zhu, Lingshan
2023-09-18  6:38                   ` Zhu, Lingshan
2023-09-18  6:46                   ` [virtio-dev] " Parav Pandit
2023-09-18  6:46                     ` Parav Pandit
2023-09-18  6:49                     ` [virtio-dev] " Zhu, Lingshan
2023-09-18  6:49                       ` Zhu, Lingshan
2023-09-18  6:50           ` [virtio-dev] " Zhu, Lingshan
2023-09-18  6:50             ` [virtio-comment] " Zhu, Lingshan
2023-09-06  8:16 ` [virtio-dev] [PATCH 3/5] virtqueue: constraints for virtqueue state Zhu Lingshan
2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
2023-09-14 11:30   ` [virtio-dev] " Michael S. Tsirkin
2023-09-14 11:30     ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  2:59     ` [virtio-dev] " Zhu, Lingshan
2023-09-15  2:59       ` [virtio-comment] " Zhu, Lingshan
2023-09-15 11:16       ` [virtio-dev] " Michael S. Tsirkin
2023-09-15 11:16         ` [virtio-comment] " Michael S. Tsirkin
2023-09-18  3:02         ` [virtio-dev] " Zhu, Lingshan
2023-09-18  3:02           ` Zhu, Lingshan
2023-09-18 17:30           ` [virtio-dev] " Michael S. Tsirkin
2023-09-18 17:30             ` Michael S. Tsirkin
2023-09-19  7:56             ` [virtio-dev] " Zhu, Lingshan
2023-09-19  7:56               ` Zhu, Lingshan
2023-09-06  8:16 ` [virtio-comment] [PATCH 4/5] virtqueue: ignore resetting vqs when SUSPEND Zhu Lingshan
2023-09-06  8:16   ` [virtio-dev] " Zhu Lingshan
2023-09-14 11:09   ` [virtio-dev] " Michael S. Tsirkin
2023-09-14 11:09     ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  4:06     ` [virtio-dev] " Zhu, Lingshan
2023-09-15  4:06       ` [virtio-comment] " Zhu, Lingshan
2023-09-06  8:16 ` [virtio-dev] [PATCH 5/5] virtio-pci: implement VIRTIO_F_QUEUE_STATE Zhu Lingshan
2023-09-06  8:16   ` [virtio-comment] " Zhu Lingshan
2023-09-06  8:32   ` [virtio-dev] " Michael S. Tsirkin
2023-09-06  8:32     ` Michael S. Tsirkin
2023-09-06  8:37     ` [virtio-dev] " Parav Pandit
2023-09-06  8:37       ` Parav Pandit
2023-09-06  9:37     ` [virtio-dev] " Zhu, Lingshan
2023-09-06  9:37       ` Zhu, Lingshan
2023-09-11  3:01     ` [virtio-dev] " Jason Wang
2023-09-11  3:01       ` Jason Wang
2023-09-11  4:11       ` [virtio-dev] " Parav Pandit
2023-09-11  4:11         ` Parav Pandit
2023-09-11  6:30         ` [virtio-dev] " Jason Wang
2023-09-11  6:30           ` Jason Wang
2023-09-11  6:47           ` [virtio-dev] " Parav Pandit
2023-09-11  6:47             ` Parav Pandit
2023-09-11  6:58             ` [virtio-dev] " Zhu, Lingshan
2023-09-11  6:58               ` Zhu, Lingshan
2023-09-11  7:07               ` [virtio-dev] " Parav Pandit
2023-09-11  7:07                 ` Parav Pandit
2023-09-11  7:18                 ` [virtio-dev] " Zhu, Lingshan
2023-09-11  7:18                   ` Zhu, Lingshan
2023-09-11  7:30                   ` [virtio-dev] " Parav Pandit
2023-09-11  7:30                     ` Parav Pandit
2023-09-11  7:58                     ` [virtio-dev] " Zhu, Lingshan
2023-09-11  7:58                       ` Zhu, Lingshan
2023-09-11  8:12                       ` [virtio-dev] " Parav Pandit
2023-09-11  8:12                         ` Parav Pandit
2023-09-11  8:46                         ` [virtio-dev] " Zhu, Lingshan
2023-09-11  8:46                           ` Zhu, Lingshan
2023-09-11  9:05                           ` [virtio-dev] " Parav Pandit
2023-09-11  9:05                             ` Parav Pandit
2023-09-11  9:32                             ` [virtio-dev] " Zhu, Lingshan
2023-09-11  9:32                               ` Zhu, Lingshan
2023-09-11 10:21                               ` [virtio-dev] " Parav Pandit
2023-09-11 10:21                                 ` Parav Pandit
2023-09-12  4:06                                 ` [virtio-dev] " Zhu, Lingshan
2023-09-12  4:06                                   ` Zhu, Lingshan
2023-09-12  5:58                                   ` [virtio-dev] " Parav Pandit
2023-09-12  5:58                                     ` Parav Pandit
2023-09-12  6:33                                     ` [virtio-dev] " Zhu, Lingshan
2023-09-12  6:33                                       ` Zhu, Lingshan
2023-09-12  6:47                                       ` [virtio-dev] " Parav Pandit
2023-09-12  6:47                                         ` Parav Pandit
2023-09-12  7:27                                         ` [virtio-dev] " Zhu, Lingshan
2023-09-12  7:27                                           ` Zhu, Lingshan
2023-09-12  7:40                                           ` Parav Pandit
2023-09-12  7:40                                             ` [virtio-dev] " Parav Pandit
2023-09-12  9:02                                             ` [virtio-dev] " Zhu, Lingshan
2023-09-12  9:02                                               ` Zhu, Lingshan
2023-09-12  9:21                                               ` [virtio-dev] " Parav Pandit
2023-09-12  9:21                                                 ` Parav Pandit
2023-09-12 13:03                                                 ` [virtio-dev] " Zhu, Lingshan
2023-09-12 13:03                                                   ` Zhu, Lingshan
2023-09-12 13:43                                                   ` [virtio-dev] " Parav Pandit
2023-09-12 13:43                                                     ` Parav Pandit
2023-09-13  4:01                                                     ` [virtio-dev] " Zhu, Lingshan
2023-09-13  4:01                                                       ` Zhu, Lingshan
2023-09-13  4:12                                                       ` [virtio-dev] " Parav Pandit
2023-09-13  4:12                                                         ` Parav Pandit
2023-09-13  4:20                                                         ` [virtio-dev] " Zhu, Lingshan
2023-09-13  4:20                                                           ` Zhu, Lingshan
2023-09-13  4:36                                                           ` [virtio-dev] " Parav Pandit
2023-09-13  4:36                                                             ` Parav Pandit
2023-09-14  8:19                                                             ` [virtio-dev] " Zhu, Lingshan
2023-09-14  8:19                                                               ` Zhu, Lingshan
2023-09-11 11:50                               ` [virtio-dev] " Parav Pandit
2023-09-11 11:50                                 ` Parav Pandit
2023-09-12  3:43                                 ` [virtio-dev] " Jason Wang
2023-09-12  3:43                                   ` Jason Wang
2023-09-12  5:50                                   ` [virtio-dev] " Parav Pandit
2023-09-12  5:50                                     ` Parav Pandit
2023-09-13  4:44                                     ` [virtio-dev] " Jason Wang
2023-09-13  4:44                                       ` Jason Wang
2023-09-13  6:05                                       ` [virtio-dev] " Parav Pandit
2023-09-13  6:05                                         ` Parav Pandit
2023-09-14  3:11                                         ` [virtio-dev] " Jason Wang
2023-09-14  3:11                                           ` Jason Wang
2023-09-17  5:22                                           ` [virtio-dev] " Parav Pandit
2023-09-17  5:22                                             ` Parav Pandit
2023-09-19  4:35                                             ` [virtio-dev] " Jason Wang
2023-09-19  4:35                                               ` Jason Wang
2023-09-19  7:33                                               ` [virtio-dev] " Parav Pandit
2023-09-19  7:33                                                 ` Parav Pandit
2023-09-12  3:48                                 ` [virtio-dev] " Zhu, Lingshan
2023-09-12  3:48                                   ` Zhu, Lingshan
2023-09-12  5:51                                   ` [virtio-dev] " Parav Pandit
2023-09-12  5:51                                     ` Parav Pandit
2023-09-12  6:37                                     ` [virtio-dev] " Zhu, Lingshan
2023-09-12  6:37                                       ` Zhu, Lingshan
2023-09-12  6:49                                       ` [virtio-dev] " Parav Pandit
2023-09-12  6:49                                         ` Parav Pandit
2023-09-12  7:29                                         ` [virtio-dev] " Zhu, Lingshan
2023-09-12  7:29                                           ` Zhu, Lingshan
2023-09-12  7:53                                           ` Parav Pandit
2023-09-12  7:53                                             ` [virtio-dev] " Parav Pandit
2023-09-12  9:06                                             ` [virtio-dev] " Zhu, Lingshan
2023-09-12  9:06                                               ` Zhu, Lingshan
2023-09-12  9:08                                               ` [virtio-dev] " Zhu, Lingshan
2023-09-12  9:08                                                 ` Zhu, Lingshan
2023-09-12  9:35                                                 ` [virtio-dev] " Parav Pandit
2023-09-12  9:35                                                   ` Parav Pandit
2023-09-12 10:14                                                   ` [virtio-dev] " Zhu, Lingshan
2023-09-12 10:14                                                     ` Zhu, Lingshan
2023-09-12 10:16                                                     ` [virtio-dev] " Parav Pandit
2023-09-12 10:16                                                       ` Parav Pandit
2023-09-12 10:28                                                       ` [virtio-dev] " Zhu, Lingshan
2023-09-12 10:28                                                         ` Zhu, Lingshan
2023-09-13  2:23                                                     ` [virtio-dev] " Parav Pandit
2023-09-13  2:23                                                       ` Parav Pandit
2023-09-13  4:03                                                       ` [virtio-dev] " Zhu, Lingshan
2023-09-13  4:03                                                         ` Zhu, Lingshan
2023-09-13  4:15                                                         ` [virtio-dev] " Parav Pandit
2023-09-13  4:15                                                           ` Parav Pandit
2023-09-13  4:21                                                           ` [virtio-dev] " Zhu, Lingshan
2023-09-13  4:21                                                             ` Zhu, Lingshan
2023-09-13  4:37                                                             ` [virtio-dev] " Parav Pandit
2023-09-13  4:37                                                               ` Parav Pandit
2023-09-14  3:11                                                               ` [virtio-dev] " Jason Wang
2023-09-14  3:11                                                                 ` Jason Wang
2023-09-17  5:25                                                                 ` [virtio-dev] " Parav Pandit
2023-09-17  5:25                                                                   ` Parav Pandit
2023-09-19  4:34                                                                   ` [virtio-dev] " Jason Wang
2023-09-19  4:34                                                                     ` Jason Wang
2023-09-19  7:32                                                                     ` [virtio-dev] " Parav Pandit
2023-09-19  7:32                                                                       ` Parav Pandit
2023-09-14  8:22                                                               ` [virtio-dev] " Zhu, Lingshan
2023-09-14  8:22                                                                 ` Zhu, Lingshan
2023-09-12  9:28                                               ` [virtio-dev] " Parav Pandit
2023-09-12  9:28                                                 ` Parav Pandit
2023-09-12 10:17                                                 ` [virtio-dev] " Zhu, Lingshan
2023-09-12 10:17                                                   ` Zhu, Lingshan
2023-09-12 10:25                                                   ` [virtio-dev] " Parav Pandit
2023-09-12 10:25                                                     ` Parav Pandit
2023-09-12 10:32                                                     ` [virtio-dev] " Zhu, Lingshan
2023-09-12 10:32                                                       ` Zhu, Lingshan
2023-09-12 10:40                                                       ` [virtio-dev] " Parav Pandit
2023-09-12 10:40                                                         ` Parav Pandit
2023-09-12 13:04                                                         ` [virtio-dev] " Zhu, Lingshan
2023-09-12 13:04                                                           ` Zhu, Lingshan
2023-09-12 13:36                                                           ` [virtio-dev] " Parav Pandit
2023-09-12 13:36                                                             ` Parav Pandit
2023-09-12  4:10                         ` [virtio-dev] " Jason Wang
2023-09-12  4:10                           ` Jason Wang
2023-09-12  6:05                           ` [virtio-dev] " Parav Pandit
2023-09-12  6:05                             ` Parav Pandit
2023-09-13  4:45                             ` [virtio-dev] " Jason Wang
2023-09-13  4:45                               ` Jason Wang
2023-09-13  6:39                               ` [virtio-dev] " Parav Pandit
2023-09-13  6:39                                 ` Parav Pandit
2023-09-14  3:08                                 ` [virtio-dev] " Jason Wang
2023-09-14  3:08                                   ` Jason Wang
2023-09-17  5:22                                   ` [virtio-dev] " Parav Pandit
2023-09-17  5:22                                     ` Parav Pandit
2023-09-19  4:32                                     ` [virtio-dev] " Jason Wang
2023-09-19  4:32                                       ` Jason Wang
2023-09-19  7:32                                       ` [virtio-dev] " Parav Pandit
2023-09-19  7:32                                         ` Parav Pandit
2023-09-13  8:27                               ` [virtio-dev] " Michael S. Tsirkin
2023-09-13  8:27                                 ` Michael S. Tsirkin
2023-09-14  3:11                                 ` [virtio-dev] " Jason Wang
2023-09-14  3:11                                   ` Jason Wang
2023-09-12  4:18             ` [virtio-dev] " Jason Wang
2023-09-12  4:18               ` Jason Wang
2023-09-12  6:11               ` [virtio-dev] " Parav Pandit
2023-09-12  6:11                 ` Parav Pandit
2023-09-12  6:43                 ` [virtio-dev] " Zhu, Lingshan
2023-09-12  6:43                   ` Zhu, Lingshan
2023-09-12  6:52                   ` [virtio-dev] " Parav Pandit
2023-09-12  6:52                     ` Parav Pandit
2023-09-12  7:36                     ` [virtio-dev] " Zhu, Lingshan
2023-09-12  7:36                       ` Zhu, Lingshan
2023-09-12  7:43                       ` [virtio-dev] " Parav Pandit
2023-09-12  7:43                         ` Parav Pandit
2023-09-12 10:27                         ` [virtio-dev] " Zhu, Lingshan
2023-09-12 10:27                           ` Zhu, Lingshan
2023-09-12 10:33                           ` Parav Pandit
2023-09-12 10:33                             ` [virtio-dev] " Parav Pandit
2023-09-12 10:35                             ` [virtio-dev] " Zhu, Lingshan
2023-09-12 10:35                               ` Zhu, Lingshan
2023-09-12 10:41                               ` [virtio-dev] " Parav Pandit
2023-09-12 10:41                                 ` Parav Pandit
2023-09-12 13:09                                 ` [virtio-dev] " Zhu, Lingshan
2023-09-12 13:09                                   ` Zhu, Lingshan
2023-09-12 13:35                                   ` [virtio-dev] " Parav Pandit
2023-09-12 13:35                                     ` Parav Pandit
2023-09-13  4:13                                     ` [virtio-dev] " Zhu, Lingshan
2023-09-13  4:13                                       ` Zhu, Lingshan
2023-09-13  4:19                                       ` [virtio-dev] " Parav Pandit
2023-09-13  4:19                                         ` Parav Pandit
2023-09-13  4:22                                         ` [virtio-dev] " Zhu, Lingshan
2023-09-13  4:22                                           ` Zhu, Lingshan
2023-09-13  4:39                                           ` [virtio-dev] " Parav Pandit
2023-09-13  4:39                                             ` Parav Pandit
2023-09-14  8:24                                             ` [virtio-dev] " Zhu, Lingshan
2023-09-14  8:24                                               ` Zhu, Lingshan
2023-09-13  4:56                                         ` [virtio-dev] " Jason Wang
2023-09-13  4:56                                           ` Jason Wang
2023-09-13  4:43                 ` [virtio-dev] " Jason Wang
2023-09-13  4:43                   ` Jason Wang
2023-09-13  4:46                   ` [virtio-dev] " Parav Pandit
2023-09-13  4:46                     ` Parav Pandit
2023-09-14  3:12                     ` [virtio-dev] " Jason Wang
2023-09-14  3:12                       ` Jason Wang
2023-09-17  5:29                       ` [virtio-dev] " Parav Pandit
2023-09-17  5:29                         ` Parav Pandit
2023-09-19  4:25                         ` [virtio-dev] " Jason Wang
2023-09-19  4:25                           ` Jason Wang
2023-09-19  7:32                           ` [virtio-dev] " Parav Pandit
2023-09-19  7:32                             ` Parav Pandit
2023-09-11  6:59           ` [virtio-dev] " Parav Pandit
2023-09-11  6:59             ` Parav Pandit
2023-09-11 10:15           ` [virtio-dev] " Michael S. Tsirkin
2023-09-11 10:15             ` Michael S. Tsirkin
2023-09-12  3:35             ` [virtio-dev] " Jason Wang
2023-09-12  3:35               ` Jason Wang
2023-09-12  3:43               ` [virtio-dev] " Zhu, Lingshan
2023-09-12  3:43                 ` Zhu, Lingshan
2023-09-14 11:27   ` [virtio-dev] " Michael S. Tsirkin
2023-09-14 11:27     ` Michael S. Tsirkin
2023-09-15  4:13     ` [virtio-dev] " Zhu, Lingshan
2023-09-15  4:13       ` Zhu, Lingshan
2023-09-06  8:29 ` [virtio-dev] Re: [virtio-comment] [PATCH 0/5] virtio: introduce SUSPEND bit and vq state Michael S. Tsirkin
2023-09-06  8:29   ` Michael S. Tsirkin
2023-09-06  8:38   ` [virtio-dev] " Zhu, Lingshan
2023-09-06  8:38     ` Zhu, Lingshan
2023-09-06 13:49     ` [virtio-dev] " Michael S. Tsirkin
2023-09-06 13:49       ` Michael S. Tsirkin
2023-09-07  1:51       ` [virtio-dev] " Zhu, Lingshan
2023-09-07  1:51         ` Zhu, Lingshan
2023-09-07 10:57       ` [virtio-dev] " Eugenio Perez Martin
2023-09-07 10:57         ` Eugenio Perez Martin
2023-09-07 19:55         ` [virtio-dev] " Michael S. Tsirkin
2023-09-07 19:55           ` Michael S. Tsirkin
2023-09-14 11:14 ` [virtio-dev] " Michael S. Tsirkin
2023-09-14 11:14   ` [virtio-comment] " Michael S. Tsirkin
2023-09-15  4:28   ` [virtio-dev] " Zhu, Lingshan
2023-09-17  5:32     ` Parav Pandit
2023-09-18  3:10       ` Zhu, Lingshan
2023-09-18  4:32         ` Parav Pandit
2023-09-18  5:21           ` Zhu, Lingshan
2023-09-18  5:25             ` Zhu, Lingshan
2023-09-18  6:37               ` Parav Pandit
2023-09-18  6:49                 ` Zhu, Lingshan
2023-09-18  6:54                   ` Parav Pandit
2023-09-18  9:34                     ` Zhu, Lingshan
2023-09-18 18:41                       ` Parav Pandit
2023-09-18 18:49                         ` Michael S. Tsirkin
2023-09-20  6:06                           ` Zhu, Lingshan
2023-09-20  6:08                             ` Parav Pandit
2023-09-20  6:31                               ` Zhu, Lingshan
2023-09-20  8:34                                 ` Parav Pandit
2023-09-20  9:44                                   ` Zhu, Lingshan
2023-09-20  9:52                                     ` Parav Pandit
2023-09-20 11:11                                       ` Zhu, Lingshan
2023-09-20 11:15                                         ` Parav Pandit
2023-09-20 11:27                                           ` Zhu, Lingshan
2023-09-21  5:13                                             ` Michael S. Tsirkin
2023-09-20 10:36                             ` Michael S. Tsirkin
2023-09-20 10:55                               ` Parav Pandit
2023-09-20 11:28                                 ` Zhu, Lingshan
2023-09-20 11:52                                   ` Michael S. Tsirkin
2023-09-20 12:05                                     ` Zhu, Lingshan
2023-09-20 12:08                                       ` Zhu, Lingshan
2023-09-20 12:22                                       ` Michael S. Tsirkin
2023-09-20 11:22                               ` Zhu, Lingshan
2023-09-20 12:05                                 ` Michael S. Tsirkin
2023-09-20 12:13                                   ` Parav Pandit
2023-09-20 12:16                                   ` Zhu, Lingshan
2023-09-20 12:40                                     ` Michael S. Tsirkin
2023-09-21  3:14                                       ` Jason Wang
2023-09-21  3:51                                         ` Parav Pandit
2023-09-21  4:02                                           ` Jason Wang
2023-09-21  4:11                                             ` Parav Pandit
2023-09-21  4:19                                               ` Jason Wang
2023-09-21  4:29                                                 ` Parav Pandit
2023-09-22  3:13                                                   ` Jason Wang
2023-09-20 12:41                                   ` Michael S. Tsirkin
2023-09-20 13:41                                     ` Parav Pandit
2023-09-20 14:13                                       ` Michael S. Tsirkin
2023-09-20 14:16                                       ` Michael S. Tsirkin
2023-09-20 17:21                                         ` Parav Pandit
2023-09-20 20:03                                           ` Michael S. Tsirkin
2023-09-21  3:43                                             ` Parav Pandit
2023-09-21  5:41                                               ` Michael S. Tsirkin
2023-09-21  5:54                                                 ` Parav Pandit
2023-09-21  6:06                                                   ` Michael S. Tsirkin
2023-09-21  6:31                                                     ` Parav Pandit
2023-09-21  7:20                                                       ` Michael S. Tsirkin
2023-09-21  7:53                                                         ` Parav Pandit
2023-09-21  8:11                                                           ` Michael S. Tsirkin
2023-09-21  9:17                                                             ` Parav Pandit
2023-09-21 10:01                                                               ` Michael S. Tsirkin
2023-09-21 11:13                                                                 ` Parav Pandit
2023-09-21 10:09                                                               ` Michael S. Tsirkin
2023-09-21 10:39                                                                 ` Parav Pandit
2023-09-21 12:22                                                                   ` Michael S. Tsirkin
2023-09-21 12:39                                                                     ` Parav Pandit
2023-09-21 13:04                                                                       ` Michael S. Tsirkin
2023-09-22  3:31                                                                   ` Jason Wang
2023-09-21  9:06                                                 ` Zhu, Lingshan
2023-09-21  9:18                                       ` Zhu, Lingshan
2023-09-21  9:26                                         ` Parav Pandit
2023-09-21  9:55                                           ` Zhu, Lingshan
2023-09-21 11:28                                             ` Parav Pandit
2023-09-22  2:40                                               ` Zhu, Lingshan
2023-09-21  3:26                                     ` Jason Wang
2023-09-21  4:21                                       ` Parav Pandit
2023-09-21  3:18                                   ` Jason Wang
2023-09-21  4:03                                     ` Parav Pandit
2023-09-21  3:17                               ` Jason Wang
2023-09-21  4:01                                 ` Parav Pandit
2023-09-21  4:09                                   ` Jason Wang
2023-09-21  4:19                                     ` Parav Pandit
2023-09-22  3:08                                       ` Jason Wang
2023-09-22  3:39                                         ` [virtio-comment] " Zhu, Lingshan
2023-09-25 10:41                                         ` Parav Pandit
2023-09-26  2:45                                           ` Jason Wang
2023-09-26  3:40                                             ` Parav Pandit
2023-09-26  4:37                                               ` Jason Wang
2023-09-26  5:21                                                 ` Parav Pandit
2023-10-09  8:49                                                   ` Jason Wang
2023-10-12 10:03                                                     ` Michael S. Tsirkin
2023-09-27 15:31                                                 ` Michael S. Tsirkin
2023-09-26  5:36                                               ` Zhu, Lingshan
2023-09-26  6:03                                                 ` Parav Pandit
2023-09-26  9:25                                                   ` Zhu, Lingshan
2023-09-26 10:48                                                     ` Michael S. Tsirkin
2023-09-27  8:20                                                       ` Zhu, Lingshan
2023-09-27 10:39                                                         ` Parav Pandit
2023-10-09 10:05                                                           ` [virtio-comment] " Zhu, Lingshan
2023-10-09 10:07                                                             ` [virtio-comment] " Parav Pandit
2023-09-27 15:40                                                         ` Michael S. Tsirkin
2023-10-09 10:01                                                           ` Zhu, Lingshan
2023-10-11 10:20                                                             ` [virtio-dev] " Michael S. Tsirkin
2023-10-11 10:38                                                               ` [virtio-dev] " Zhu, Lingshan
2023-10-11 11:52                                                                 ` [virtio-dev] " Parav Pandit
2023-10-12 10:57                                                                   ` [virtio-dev] " Zhu, Lingshan
2023-10-12 11:13                                                                     ` [virtio-dev] " Michael S. Tsirkin
2023-10-12  9:59                                                                 ` [virtio-dev] " Michael S. Tsirkin
2023-10-12 10:49                                                                   ` [virtio-dev] " Zhu, Lingshan
2023-10-12 11:12                                                                     ` [virtio-dev] " Michael S. Tsirkin
2023-10-13 10:18                                                                       ` [virtio-dev] " Zhu, Lingshan
2023-10-12 14:38                                                                     ` Michael S. Tsirkin
2023-10-13 10:23                                                                       ` [virtio-dev] " Zhu, Lingshan
2023-09-27 21:43                                           ` Michael S. Tsirkin
2023-09-19  8:01                         ` Zhu, Lingshan
2023-09-19  9:06                           ` Parav Pandit
2023-09-19 10:03                             ` Zhu, Lingshan
2023-09-19  4:27                     ` Jason Wang
2023-09-19  7:32                       ` Parav Pandit
2023-09-19  7:46                         ` Zhu, Lingshan
2023-09-19  7:53                           ` Parav Pandit
2023-09-19  8:03                             ` Zhu, Lingshan
2023-09-19  8:31                               ` Parav Pandit
2023-09-19  8:39                                 ` Zhu, Lingshan
2023-09-19  9:09                                   ` Parav Pandit
2023-09-14 11:37 ` Michael S. Tsirkin
2023-09-15  4:41   ` [virtio-dev] " Zhu, Lingshan
