All of lore.kernel.org
 help / color / mirror / Atom feed
* [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability
@ 2020-12-04  5:46 Jason Wang
  2020-12-17 15:28 ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2020-12-04  5:46 UTC (permalink / raw)
  To: virtio-comment, mst; +Cc: eperezma, Jason Wang

This patch tries to introduce device state capability in order to be
self virtualized. One possible user is the support of live migration.

The minimal requirements for achieving are:

- The ability to pause and resume the the datapath. This patch choose
  to do this per virtqueue via a dedicated rw field queue_status to do
  this. Driver may choose to write 0 to stop a queue and device can
  only advertise the stop when it completes all the inflight
  descriptors.
- The ability to set and get the device internal ring state. As an RFC
  this patch only introduce the last_avail_idx field for split
  ring. We could extend it to support packed ring (e.g with the wrap
  counters).

We probably need a new feature bit for advertising this new feature
but this draft should be sufficient for starting the discussion.

Several other choices:

1) allow writing 0 to queue_enable when the new feature is negotiated
2) reuse the common structure for queue index save and restore
3) introduce new virtio device status to stop/resume the whole device

Please share your thoughts.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 content.tex | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/content.tex b/content.tex
index 61eab41..7cdb66a 100644
--- a/content.tex
+++ b/content.tex
@@ -705,6 +705,8 @@ \subsection{Virtio Structure PCI Capabilities}\label{sec:Virtio Transport Option
 #define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8
 /* Vendor-specific data */
 #define VIRTIO_PCI_CAP_VENDOR_CFG        9
+/* Device state */
+#define VIRTIO_PCI_CAP_DEVICE_STATE_CFG        10
 \end{lstlisting}
 
         Any other value is reserved for future use.
@@ -1247,6 +1249,55 @@ \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport O
 specified by some other Virtio Structure PCI Capability
 of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
 
+\subsubsection{Device State capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device state capability}
+
+The device state structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_DEVICE_STATE_CFG capability; its layout is below.
+
+\begin{lstlisting}
+struct virtio_pci_device_state {
+        /* About a specific virtqueue. */
+        le16 queue_select;              /* read-write */
+        le16 last_avail_idx;            /* read-write */
+        u8 queue_status;                /* read-write */
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{queue_select}]
+        Queue Select. The driver selects which virtqueue the following
+        fields refer to.
+
+\item[\field{last_avail_idx}]
+        Last avail index for this virtqueue that is used by the device.
+
+\item[\field{queue_status}]
+        The driver uses this to stop the executing request for this virtqueue.
+        1 - running; 0 - stopped.
+\end{description}
+
+\devicenormative{\paragraph}{PCI Device State capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI Device State capability}
+
+The device MUST present a 0 in \field{queue_status} on reset.
+
+The device MUST present a 1 in \field{queue_status} upon writing 1
+to \field{queue_enable} from common configuration structure by driver.
+
+The device MUST complete all inflight requests before presenting a 0
+in \field{queue_status}.
+
+The device MUST ignore the write to \field{queue_status} if the
+virtqueue is unavailable when the value \field{queue_enable} is zero
+from common configuration structure.
+
+\drivernormative{\paragraph}{PCI Device State capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI Device State capability}
+
+To stop using the queue the driver MUST write zero (0x0) to this
+\field{queue_status} and MUST read the value back to ensure
+synchronization.
+
+The driver MUST NOT access \field{last_avail_idx} when
+\field{queue_status} is not zero.
+
 \subsubsection{Legacy Interfaces: A Note on PCI Device Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
 
 Transitional devices MUST present part of configuration
-- 
2.25.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability
  2020-12-04  5:46 [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability Jason Wang
@ 2020-12-17 15:28 ` Stefan Hajnoczi
  2020-12-18  3:16   ` Jason Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2020-12-17 15:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, eperezma, sgarzare

[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]

On Fri, Dec 04, 2020 at 01:46:15PM +0800, Jason Wang wrote:
> @@ -1247,6 +1249,55 @@ \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport O
>  specified by some other Virtio Structure PCI Capability
>  of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
>  
> +\subsubsection{Device State capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device state capability}
> +
> +The device state structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_DEVICE_STATE_CFG capability; its layout is below.
> +
> +\begin{lstlisting}
> +struct virtio_pci_device_state {
> +        /* About a specific virtqueue. */
> +        le16 queue_select;              /* read-write */
> +        le16 last_avail_idx;            /* read-write */
> +        u8 queue_status;                /* read-write */

Starting/stopping queues is useful for managing multiqueue across CPU
hotplug. Perhaps the queue_status functionality should be generic and
not related to device state save/load?

> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{queue_select}]
> +        Queue Select. The driver selects which virtqueue the following
> +        fields refer to.
> +
> +\item[\field{last_avail_idx}]
> +        Last avail index for this virtqueue that is used by the device.

This is only sufficient to save/load devices where all device state is
visible to the driver (I would call them "stateless"). virtio-gpu,
virtio-crypto, and virtio-fs are "stateful" devices in the sense that
devices contain internal state that is not directly visible to the
driver but is needed to save/load the device.

If we want to save/load stateful devices then an additional interface is
needed for this data. It can be large in size (e.g. imagine graphics
buffers or many thousands of inodes) and should therefore probably
support iterative saving.

> +\item[\field{queue_status}]
> +        The driver uses this to stop the executing request for this virtqueue.
> +        1 - running; 0 - stopped.
> +\end{description}

A per-virtqueue stopped state is not enough to stop a device because
some devices have global behavior like config change events. It must be
possible to completely stop the device to prevent further activity
(VIRTIO config space changes, interrupts, etc).

Maybe a Device Status register bit can be added for that?

> +
> +\devicenormative{\paragraph}{PCI Device State capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI Device State capability}
> +
> +The device MUST present a 0 in \field{queue_status} on reset.
> +
> +The device MUST present a 1 in \field{queue_status} upon writing 1
> +to \field{queue_enable} from common configuration structure by driver.
> +
> +The device MUST complete all inflight requests before presenting a 0
> +in \field{queue_status}.

Is there a definition of "inflight requests"? I think the precise
meaning here is avail requests that the device is processing. It does
not include avail requests that the device has not yet started
processing.

Is there any notification when the queue_status bit transitions from 1
to 0? If not, then drivers must poll the register to detect completion.

Is it possible to resume a virtqueue that is in the process of stopping
(for example, if the guest driver decides a timeout has been reached and
it no longer wishes to stop the virtqueue)?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability
  2020-12-17 15:28 ` Stefan Hajnoczi
@ 2020-12-18  3:16   ` Jason Wang
  2020-12-18  4:58     ` Jason Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2020-12-18  3:16 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, eperezma, sgarzare


On 2020/12/17 下午11:28, Stefan Hajnoczi wrote:
> On Fri, Dec 04, 2020 at 01:46:15PM +0800, Jason Wang wrote:
>> @@ -1247,6 +1249,55 @@ \subsubsection{PCI configuration access capability}\label{sec:Virtio Transport O
>>   specified by some other Virtio Structure PCI Capability
>>   of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
>>   
>> +\subsubsection{Device State capability}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / Device state capability}
>> +
>> +The device state structure is found at the \field{bar} and \field{offset} within the VIRTIO_PCI_CAP_DEVICE_STATE_CFG capability; its layout is below.
>> +
>> +\begin{lstlisting}
>> +struct virtio_pci_device_state {
>> +        /* About a specific virtqueue. */
>> +        le16 queue_select;              /* read-write */
>> +        le16 last_avail_idx;            /* read-write */
>> +        u8 queue_status;                /* read-write */
> Starting/stopping queues is useful for managing multiqueue across CPU
> hotplug. Perhaps the queue_status functionality should be generic and
> not related to device state save/load?


That's fine, actually, I was considering reusing queue_enable which 
forbids to be write with 0 now. My idea is to allow zero to be wrote if 
some features is negotiated.


>
>> +};
>> +\end{lstlisting}
>> +
>> +\begin{description}
>> +\item[\field{queue_select}]
>> +        Queue Select. The driver selects which virtqueue the following
>> +        fields refer to.
>> +
>> +\item[\field{last_avail_idx}]
>> +        Last avail index for this virtqueue that is used by the device.
> This is only sufficient to save/load devices where all device state is
> visible to the driver (I would call them "stateless"). virtio-gpu,
> virtio-crypto, and virtio-fs are "stateful" devices in the sense that
> devices contain internal state that is not directly visible to the
> driver but is needed to save/load the device.


Right, this is the support at generic device level.

For stateful device, it needs device specific extension which I think is 
hard to be generalized.


>
> If we want to save/load stateful devices then an additional interface is
> needed for this data. It can be large in size (e.g. imagine graphics
> buffers or many thousands of inodes) and should therefore probably
> support iterative saving.


Yes, this needs more consideration. But we can solve the generic 
virtqueue level first.


>
>> +\item[\field{queue_status}]
>> +        The driver uses this to stop the executing request for this virtqueue.
>> +        1 - running; 0 - stopped.
>> +\end{description}
> A per-virtqueue stopped state is not enough to stop a device because
> some devices have global behavior like config change events. It must be
> possible to completely stop the device to prevent further activity
> (VIRTIO config space changes, interrupts, etc).
>
> Maybe a Device Status register bit can be added for that?


I've considered such case, one of the idea is to introduce a new device 
status to achieve which seems more generic. Then we need to define a 
state machine carefully.


>
>> +
>> +\devicenormative{\paragraph}{PCI Device State capability}{Virtio Transport Options / Virtio Over PCI Bus / PCI Device Layout / PCI Device State capability}
>> +
>> +The device MUST present a 0 in \field{queue_status} on reset.
>> +
>> +The device MUST present a 1 in \field{queue_status} upon writing 1
>> +to \field{queue_enable} from common configuration structure by driver.
>> +
>> +The device MUST complete all inflight requests before presenting a 0
>> +in \field{queue_status}.
> Is there a definition of "inflight requests"? I think the precise
> meaning here is avail requests that the device is processing. It does
> not include avail requests that the device has not yet started
> processing.


Right.


>
> Is there any notification when the queue_status bit transitions from 1
> to 0? If not, then drivers must poll the register to detect completion.


To be simple, this draft requires the driver to poll for the status.


>
> Is it possible to resume a virtqueue that is in the process of stopping
> (for example, if the guest driver decides a timeout has been reached and
> it no longer wishes to stop the virtqueue)?


I think the answer is yes, we can clarify this in the spec.

Thanks



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability
  2020-12-18  3:16   ` Jason Wang
@ 2020-12-18  4:58     ` Jason Wang
  2020-12-18 17:28       ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Jason Wang @ 2020-12-18  4:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, eperezma, sgarzare


On 2020/12/18 上午11:16, Jason Wang wrote:
>>>
>> A per-virtqueue stopped state is not enough to stop a device because
>> some devices have global behavior like config change events. It must be
>> possible to completely stop the device to prevent further activity
>> (VIRTIO config space changes, interrupts, etc).
>>
>> Maybe a Device Status register bit can be added for that?
>
>
> I've considered such case, one of the idea is to introduce a new 
> device status to achieve which seems more generic. Then we need to 
> define a state machine carefully.


So I post another RFC to introduce a new device status bit. Please review.

Thanks


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability
  2020-12-18  4:58     ` Jason Wang
@ 2020-12-18 17:28       ` Stefan Hajnoczi
  2020-12-21  3:43         ` Jason Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2020-12-18 17:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtio-comment, mst, eperezma, sgarzare

[-- Attachment #1: Type: text/plain, Size: 866 bytes --]

On Fri, Dec 18, 2020 at 12:58:21PM +0800, Jason Wang wrote:
> 
> On 2020/12/18 上午11:16, Jason Wang wrote:
> > > > 
> > > A per-virtqueue stopped state is not enough to stop a device because
> > > some devices have global behavior like config change events. It must be
> > > possible to completely stop the device to prevent further activity
> > > (VIRTIO config space changes, interrupts, etc).
> > > 
> > > Maybe a Device Status register bit can be added for that?
> > 
> > 
> > I've considered such case, one of the idea is to introduce a new device
> > status to achieve which seems more generic. Then we need to define a
> > state machine carefully.
> 
> 
> So I post another RFC to introduce a new device status bit. Please review.

Thanks, that sounds good. I will be offline until January 4th and will
try to catch up then.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability
  2020-12-18 17:28       ` Stefan Hajnoczi
@ 2020-12-21  3:43         ` Jason Wang
  0 siblings, 0 replies; 6+ messages in thread
From: Jason Wang @ 2020-12-21  3:43 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-comment, mst, eperezma, sgarzare


On 2020/12/19 上午1:28, Stefan Hajnoczi wrote:
> On Fri, Dec 18, 2020 at 12:58:21PM +0800, Jason Wang wrote:
>> On 2020/12/18 上午11:16, Jason Wang wrote:
>>>> A per-virtqueue stopped state is not enough to stop a device because
>>>> some devices have global behavior like config change events. It must be
>>>> possible to completely stop the device to prevent further activity
>>>> (VIRTIO config space changes, interrupts, etc).
>>>>
>>>> Maybe a Device Status register bit can be added for that?
>>>
>>> I've considered such case, one of the idea is to introduce a new device
>>> status to achieve which seems more generic. Then we need to define a
>>> state machine carefully.
>>
>> So I post another RFC to introduce a new device status bit. Please review.
> Thanks, that sounds good. I will be offline until January 4th and will
> try to catch up then.
>
> Stefan


Great.

Thanks



This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-12-21  3:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04  5:46 [virtio-comment] [RFC PATCH] virtio-pci: introduce device state capability Jason Wang
2020-12-17 15:28 ` Stefan Hajnoczi
2020-12-18  3:16   ` Jason Wang
2020-12-18  4:58     ` Jason Wang
2020-12-18 17:28       ` Stefan Hajnoczi
2020-12-21  3:43         ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.