All of lore.kernel.org
 help / color / mirror / Atom feed
* [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
@ 2020-12-18  4:23 Jason Wang
  2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Jason Wang @ 2020-12-18  4:23 UTC (permalink / raw)
  To: stefanha, virtio-comment, mst; +Cc: eperezma, sgarzare, Jason Wang

This patch introduces a new status bit DEVICE_STOPPED. This will be
used by the driver to stop and resume a device. The main user will be
live migration support for virtio device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 content.tex | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/content.tex b/content.tex
index 61eab41..4392b60 100644
--- a/content.tex
+++ b/content.tex
@@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
   drive the device.
 
+\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
+  indicates that the device has been stopped by the driver.
+
 \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
   an error from which it can't recover.
 \end{description}
@@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 \ref{sec:General Initialization And Device Operation / Device
 Initialization}.
 The driver MUST NOT clear a
-\field{device status} bit.  If the driver sets the FAILED bit,
-the driver MUST later reset the device before attempting to re-initialize.
+\field{device status} bit other than DEVICE_STOPPED.  If the
+driver sets the FAILED bit, the driver MUST later reset the device
+before attempting to re-initialize.
 
 The driver SHOULD NOT rely on completion of operations of a
 device if DEVICE_NEEDS_RESET is set.
@@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
 recover by issuing a reset.
 \end{note}
 
+The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
+set. In order to stop the device, the driver MUST set DEVICE_STOPPED
+first and re-read status to check whether DEVICE_STOPPED is set by the
+device. In order to resume the device, the driver MUST clear
+DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
+is cleared by the device.
+
 \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
 The device MUST initialize \field{device status} to 0 upon reset.
 
 The device MUST NOT consume buffers or send any used buffer
 notifications to the driver before DRIVER_OK.
 
+The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
+
+When driver is trying to set DEVICE_STOPPED, the device MUST not
+process new avail requests and MUST complete all requests that is
+currently processing before setting DEVICE_STOPPED.
+
+The device MUST keep the config space unchanged when DEVICE_STOPPED is
+set.
+
 \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
 that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
 MUST send a device configuration change notification to the driver.
@@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
   \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
   that the driver passes extra data (besides identifying the virtqueue)
   in its device notifications.
+  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
+  driver can stop and resume the device.
   See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
 \end{description}
 
-- 
2.25.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-18  4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang
@ 2020-12-18 10:15 ` Stefano Garzarella
  2020-12-21  3:08   ` Jason Wang
  2020-12-21 21:33 ` [virtio-comment] " Halil Pasic
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Stefano Garzarella @ 2020-12-18 10:15 UTC (permalink / raw)
  To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma

On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote:
>This patch introduces a new status bit DEVICE_STOPPED. This will be
>used by the driver to stop and resume a device. The main user will be
>live migration support for virtio device.
>
>Signed-off-by: Jason Wang <jasowang@redhat.com>
>---
> content.tex | 26 ++++++++++++++++++++++++--
> 1 file changed, 24 insertions(+), 2 deletions(-)
>
>diff --git a/content.tex b/content.tex
>index 61eab41..4392b60 100644
>--- a/content.tex
>+++ b/content.tex
>@@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>   drive the device.
>
>+\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>+  indicates that the device has been stopped by the driver.
>+

Just for curiosity, why 32 and not 16?
Is there any rule for high bits and low bits of Device Status field?

> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has 
> experienced
>   an error from which it can't recover.
> \end{description}
>@@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> \ref{sec:General Initialization And Device Operation / Device
> Initialization}.
> The driver MUST NOT clear a
>-\field{device status} bit.  If the driver sets the FAILED bit,
>-the driver MUST later reset the device before attempting to re-initialize.
>+\field{device status} bit other than DEVICE_STOPPED.  If the
>+driver sets the FAILED bit, the driver MUST later reset the device
>+before attempting to re-initialize.
>
> The driver SHOULD NOT rely on completion of operations of a
> device if DEVICE_NEEDS_RESET is set.
>@@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> recover by issuing a reset.
> \end{note}
>
>+The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>+set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>+first and re-read status to check whether DEVICE_STOPPED is set by the
>+device. In order to resume the device, the driver MUST clear
>+DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>+is cleared by the device.
>+
> \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
> The device MUST initialize \field{device status} to 0 upon reset.
>
> The device MUST NOT consume buffers or send any used buffer
> notifications to the driver before DRIVER_OK.
>
>+The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>+
>+When driver is trying to set DEVICE_STOPPED, the device MUST not
>+process new avail requests and MUST complete all requests that is
>+currently processing before setting DEVICE_STOPPED.
>+
>+The device MUST keep the config space unchanged when DEVICE_STOPPED is
>+set.

Maybe here we can specify if "DEVICE_STOPPED is set by the driver" or 
when the device set it after completing all the requests.

>+
> \label{sec:Basic Facilities of a Virtio Device / Device Status Field / 
> DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it 
> enters an error state
> that a reset is needed.  If DRIVER_OK is set, after it sets 
> DEVICE_NEEDS_RESET, the device
> MUST send a device configuration change notification to the driver.
>@@ -6553,6 +6573,8 @@ \chapter{Reserved Feature 
>Bits}\label{sec:Reserved Feature Bits}
>   \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>   that the driver passes extra data (besides identifying the 
>   virtqueue)
>   in its device notifications.
>+  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>+  driver can stop and resume the device.
>   See \ref{sec:Virtqueues / Driver 
>   notifications}~\nameref{sec:Virtqueues / Driver notifications}.
> \end{description}
>

Since the completion of in-flight requests can take time (I'm thinking 
of block devices), should we consider a notification when the device 
completes all the pending requests and set DEVICE_STOPPED.

Something similar to the notification when DEVICE_NEEDS_RESET is set.

Thanks,
Stefano


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella
@ 2020-12-21  3:08   ` Jason Wang
  2020-12-21 11:06     ` Stefano Garzarella
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-21  3:08 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: stefanha, virtio-comment, mst, eperezma


On 2020/12/18 下午6:15, Stefano Garzarella wrote:
> On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote:
>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>> used by the driver to stop and resume a device. The main user will be
>> live migration support for virtio device.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> content.tex | 26 ++++++++++++++++++++++++--
>> 1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/content.tex b/content.tex
>> index 61eab41..4392b60 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} 
>> Field}\label{sec:Basic Facilities of a Virtio Dev
>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>   drive the device.
>>
>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>> +  indicates that the device has been stopped by the driver.
>> +
>
> Just for curiosity, why 32 and not 16?
> Is there any rule for high bits and low bits of Device Status field?


I'm not sure. I just spot that DEVICE_NEEDS_RESET (with DEVICE prefix) 
is using 64.


>
>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>   an error from which it can't recover.
>> \end{description}
>> @@ -58,8 +61,9 @@ \section{\field{Device Status} 
>> Field}\label{sec:Basic Facilities of a Virtio Dev
>> \ref{sec:General Initialization And Device Operation / Device
>> Initialization}.
>> The driver MUST NOT clear a
>> -\field{device status} bit.  If the driver sets the FAILED bit,
>> -the driver MUST later reset the device before attempting to 
>> re-initialize.
>> +\field{device status} bit other than DEVICE_STOPPED.  If the
>> +driver sets the FAILED bit, the driver MUST later reset the device
>> +before attempting to re-initialize.
>>
>> The driver SHOULD NOT rely on completion of operations of a
>> device if DEVICE_NEEDS_RESET is set.
>> @@ -70,12 +74,28 @@ \section{\field{Device Status} 
>> Field}\label{sec:Basic Facilities of a Virtio Dev
>> recover by issuing a reset.
>> \end{note}
>>
>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>> +first and re-read status to check whether DEVICE_STOPPED is set by the
>> +device. In order to resume the device, the driver MUST clear
>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>> +is cleared by the device.
>> +
>> \devicenormative{\subsection}{Device Status Field}{Basic Facilities 
>> of a Virtio Device / Device Status Field}
>> The device MUST initialize \field{device status} to 0 upon reset.
>>
>> The device MUST NOT consume buffers or send any used buffer
>> notifications to the driver before DRIVER_OK.
>>
>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>> +
>> +When driver is trying to set DEVICE_STOPPED, the device MUST not
>> +process new avail requests and MUST complete all requests that is
>> +currently processing before setting DEVICE_STOPPED.
>> +
>> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
>> +set.
>
> Maybe here we can specify if "DEVICE_STOPPED is set by the driver" or 
> when the device set it after completing all the requests.


Good point. Since the bit is offered by device, so "set by the device " 
might be better.


>
>> +
>> \label{sec:Basic Facilities of a Virtio Device / Device Status Field 
>> / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it 
>> enters an error state
>> that a reset is needed.  If DRIVER_OK is set, after it sets 
>> DEVICE_NEEDS_RESET, the device
>> MUST send a device configuration change notification to the driver.
>> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature 
>> Bits}\label{sec:Reserved Feature Bits}
>>   \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>>   that the driver passes extra data (besides identifying the virtqueue)
>>   in its device notifications.
>> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>> +  driver can stop and resume the device.
>>   See \ref{sec:Virtqueues / Driver 
>> notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>> \end{description}
>>
>
> Since the completion of in-flight requests can take time (I'm thinking 
> of block devices), should we consider a notification when the device 
> completes all the pending requests and set DEVICE_STOPPED.
>
> Something similar to the notification when DEVICE_NEEDS_RESET is set.


I'm not sure, but it looks even more complicated. Driver may do a 
periodic polling or fail the operation (like migration) when the request 
is not completed soon.

Thanks


>
> Thanks,
> Stefano
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-21  3:08   ` Jason Wang
@ 2020-12-21 11:06     ` Stefano Garzarella
  2020-12-22  2:38       ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Stefano Garzarella @ 2020-12-21 11:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma

On Mon, Dec 21, 2020 at 11:08:37AM +0800, Jason Wang wrote:
>
>On 2020/12/18 下午6:15, Stefano Garzarella wrote:
>>On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote:
>>>This patch introduces a new status bit DEVICE_STOPPED. This will be
>>>used by the driver to stop and resume a device. The main user will be
>>>live migration support for virtio device.
>>>
>>>Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>---
>>>content.tex | 26 ++++++++++++++++++++++++--
>>>1 file changed, 24 insertions(+), 2 deletions(-)
>>>
>>>diff --git a/content.tex b/content.tex
>>>index 61eab41..4392b60 100644
>>>--- a/content.tex
>>>+++ b/content.tex
>>>@@ -47,6 +47,9 @@ \section{\field{Device Status} 
>>>Field}\label{sec:Basic Facilities of a Virtio Dev
>>>\item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>  drive the device.
>>>
>>>+\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>>>+  indicates that the device has been stopped by the driver.
>>>+
>>
>>Just for curiosity, why 32 and not 16?
>>Is there any rule for high bits and low bits of Device Status field?
>
>
>I'm not sure. I just spot that DEVICE_NEEDS_RESET (with DEVICE prefix) 
>is using 64.
>
>
>>
>>>\item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>>  an error from which it can't recover.
>>>\end{description}
>>>@@ -58,8 +61,9 @@ \section{\field{Device Status} 
>>>Field}\label{sec:Basic Facilities of a Virtio Dev
>>>\ref{sec:General Initialization And Device Operation / Device
>>>Initialization}.
>>>The driver MUST NOT clear a
>>>-\field{device status} bit.  If the driver sets the FAILED bit,
>>>-the driver MUST later reset the device before attempting to 
>>>re-initialize.
>>>+\field{device status} bit other than DEVICE_STOPPED.  If the
>>>+driver sets the FAILED bit, the driver MUST later reset the device
>>>+before attempting to re-initialize.
>>>
>>>The driver SHOULD NOT rely on completion of operations of a
>>>device if DEVICE_NEEDS_RESET is set.
>>>@@ -70,12 +74,28 @@ \section{\field{Device Status} 
>>>Field}\label{sec:Basic Facilities of a Virtio Dev
>>>recover by issuing a reset.
>>>\end{note}
>>>
>>>+The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>>>+set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>>>+first and re-read status to check whether DEVICE_STOPPED is set by the
>>>+device. In order to resume the device, the driver MUST clear
>>>+DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>>>+is cleared by the device.
>>>+
>>>\devicenormative{\subsection}{Device Status Field}{Basic 
>>>Facilities of a Virtio Device / Device Status Field}
>>>The device MUST initialize \field{device status} to 0 upon reset.
>>>
>>>The device MUST NOT consume buffers or send any used buffer
>>>notifications to the driver before DRIVER_OK.
>>>
>>>+The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>>>+
>>>+When driver is trying to set DEVICE_STOPPED, the device MUST not
>>>+process new avail requests and MUST complete all requests that is
>>>+currently processing before setting DEVICE_STOPPED.
>>>+
>>>+The device MUST keep the config space unchanged when DEVICE_STOPPED is
>>>+set.
>>
>>Maybe here we can specify if "DEVICE_STOPPED is set by the driver" 
>>or when the device set it after completing all the requests.
>
>
>Good point. Since the bit is offered by device, so "set by the device 
>" might be better.

Yeah, I agree.

>
>
>>
>>>+
>>>\label{sec:Basic Facilities of a Virtio Device / Device Status 
>>>Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET 
>>>when it enters an error state
>>>that a reset is needed.  If DRIVER_OK is set, after it sets 
>>>DEVICE_NEEDS_RESET, the device
>>>MUST send a device configuration change notification to the driver.
>>>@@ -6553,6 +6573,8 @@ \chapter{Reserved Feature 
>>>Bits}\label{sec:Reserved Feature Bits}
>>>  \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>>>  that the driver passes extra data (besides identifying the virtqueue)
>>>  in its device notifications.
>>>+  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>>>+  driver can stop and resume the device.
>>>  See \ref{sec:Virtqueues / Driver 
>>>notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>>>\end{description}
>>>
>>
>>Since the completion of in-flight requests can take time (I'm 
>>thinking of block devices), should we consider a notification when 
>>the device completes all the pending requests and set 
>>DEVICE_STOPPED.
>>
>>Something similar to the notification when DEVICE_NEEDS_RESET is set.
>
>
>I'm not sure, but it looks even more complicated. Driver may do a 
>periodic polling or fail the operation (like migration) when the 
>request is not completed soon.
>

I agree that colud be more complicated for the device point of view.
Since the driver is asking the device to stop, perhaps it makes sense 
for it to periodically check the status.

In case the driver decides to abort the operation (e.g. migration), can 
it clear the bit if the device hasn't set it yet or should it wait until 
the device has stopped?
Maybe we should clarify this as well.

Thanks,
Stefano


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-18  4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang
  2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella
@ 2020-12-21 21:33 ` Halil Pasic
  2020-12-22  2:36   ` Jason Wang
  2021-05-03  9:02 ` [virtio-comment] " Eugenio Perez Martin
  2021-05-05 13:16 ` Michael S. Tsirkin
  3 siblings, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-21 21:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare

On Fri, 18 Dec 2020 12:23:02 +0800
Jason Wang <jasowang@redhat.com> wrote:

> This patch introduces a new status bit DEVICE_STOPPED. This will be
> used by the driver to stop and resume a device. The main user will be
> live migration support for virtio device.
> 

Can you please provide some more background information, or point
me to the appropriate discussion?

I mean AFAIK migration already works without this driver initiated
drain. What is the exact motivation? What about the big picture? I
guess some agent in the guest would have to make the driver issue
the DEVICE_STOP.

> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  content.tex | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index 61eab41..4392b60 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>  
> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> +  indicates that the device has been stopped by the driver.
> +

AFAIU it is not only about indicating stopped, but also requesting to be
stopped.

More importantly, that must not be set immediately, in a sense that the
one side initiates some action by requesting the bit to be set, and the
other side must not set the bit before the action is performed. We also
seem to assume that every device implementation is capable of performing
this trick. Is it for hardware devices (e.g. PCI) standard to request an
operation by writing some value into a register, and get feedback bout
a non-completion by reading different value that written, and about the
completion, by reading the same value as written?


>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \ref{sec:General Initialization And Device Operation / Device
>  Initialization}.
>  The driver MUST NOT clear a
> -\field{device status} bit.  If the driver sets the FAILED bit,
> -the driver MUST later reset the device before attempting to re-initialize.
> +\field{device status} bit other than DEVICE_STOPPED.  If the
> +driver sets the FAILED bit, the driver MUST later reset the device
> +before attempting to re-initialize.
>  
>  The driver SHOULD NOT rely on completion of operations of a
>  device if DEVICE_NEEDS_RESET is set.
> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  recover by issuing a reset.
>  \end{note}
>  
> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
> +first and re-read status to check whether DEVICE_STOPPED is set by the
> +device. In order to resume the device, the driver MUST clear
> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
> +is cleared by the device.
> +
>  \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>  The device MUST initialize \field{device status} to 0 upon reset.
>  
>  The device MUST NOT consume buffers or send any used buffer
>  notifications to the driver before DRIVER_OK.
>  
> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
> +
> +When driver is trying to set DEVICE_STOPPED, the device MUST not

The when driver trying to set DEVICE_STOPPED is a bit soft as a
duration. For example consider virtio-ccw, at the moment when the driver
issues the ssch to set status, the device still does not know about it.

> +process new avail requests and MUST complete all requests that is
> +currently processing before setting DEVICE_STOPPED.

I would like to have a more precise definition of 'new avail requests'
and 'requests that is currently processing'.

> +
> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
> +set.

Here you have the set by driver which is actually requesting the stop
operation, and set by device which indicted that the stop operation
was successfully performed by the device.

> +
>  \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>  that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>  MUST send a device configuration change notification to the driver.
> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>    that the driver passes extra data (besides identifying the virtqueue)
>    in its device notifications.
> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
> +  driver can stop and resume the device.
>    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>  \end{description}
>  


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-21 21:33 ` [virtio-comment] " Halil Pasic
@ 2020-12-22  2:36   ` Jason Wang
  2020-12-22  6:50     ` Halil Pasic
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-22  2:36 UTC (permalink / raw)
  To: Halil Pasic; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/22 上午5:33, Halil Pasic wrote:
> On Fri, 18 Dec 2020 12:23:02 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>> used by the driver to stop and resume a device. The main user will be
>> live migration support for virtio device.
>>
> Can you please provide some more background information, or point
> me to the appropriate discussion?
>
> I mean AFAIK migration already works without this driver initiated
> drain. What is the exact motivation? What about the big picture? I
> guess some agent in the guest would have to make the driver issue
> the DEVICE_STOP.


This is not necessary if the datapath is done inside qemu and when 
migration is initiated by qemu itself.

But it's a must for using virtio-device as a backend for emulated virtio 
devices (e.g vhost-vDPA). In this case, qemu needs to stop the device 
then it can safely synchronize the state from them.


>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   content.tex | 26 ++++++++++++++++++++++++--
>>   1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/content.tex b/content.tex
>> index 61eab41..4392b60 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>   
>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>> +  indicates that the device has been stopped by the driver.
>> +
> AFAIU it is not only about indicating stopped, but also requesting to be
> stopped.
>
> More importantly, that must not be set immediately, in a sense that the
> one side initiates some action by requesting the bit to be set, and the
> other side must not set the bit before the action is performed.


Yes.


> We also
> seem to assume that every device implementation is capable of performing
> this trick.


A dedicated feature bit is introduced for this.


> Is it for hardware devices (e.g. PCI) standard to request an
> operation by writing some value into a register, and get feedback bout
> a non-completion by reading different value that written,


This is not ununsal in other devices. And in fact, the FEATURES_OK works 
like this:

"""

The device MUST NOT offer a feature which requires another feature which 
was not offered. The device SHOULD accept any valid subset of features 
the driver accepts, otherwise it MUST fail to set the FEATURES_OK device 
status bit when the driver writes it.

"""

We've already had several hardware implementation of virtio-pci devices 
from different vendors. I didn't hear any complain about such kind of 
design.


>   and about the
> completion, by reading the same value as written?


After after DEVICE_STOPPED is read from device the driver can assume the 
device is stopped.


>
>
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \ref{sec:General Initialization And Device Operation / Device
>>   Initialization}.
>>   The driver MUST NOT clear a
>> -\field{device status} bit.  If the driver sets the FAILED bit,
>> -the driver MUST later reset the device before attempting to re-initialize.
>> +\field{device status} bit other than DEVICE_STOPPED.  If the
>> +driver sets the FAILED bit, the driver MUST later reset the device
>> +before attempting to re-initialize.
>>   
>>   The driver SHOULD NOT rely on completion of operations of a
>>   device if DEVICE_NEEDS_RESET is set.
>> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   recover by issuing a reset.
>>   \end{note}
>>   
>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>> +first and re-read status to check whether DEVICE_STOPPED is set by the
>> +device. In order to resume the device, the driver MUST clear
>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>> +is cleared by the device.
>> +
>>   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>   The device MUST initialize \field{device status} to 0 upon reset.
>>   
>>   The device MUST NOT consume buffers or send any used buffer
>>   notifications to the driver before DRIVER_OK.
>>   
>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>> +
>> +When driver is trying to set DEVICE_STOPPED, the device MUST not
> The when driver trying to set DEVICE_STOPPED is a bit soft as a
> duration. For example consider virtio-ccw, at the moment when the driver
> issues the ssch to set status, the device still does not know about it.


I need more context on this, if it works like this, when or how can 
device know the status has been changed? (E.g how reset or other status 
bit is supposed to work?)

It looks like a transport limitation if we can't guarantee this. Similar 
issue were met in the PCIE Endpoint device, but it can be workaround by 
designing a new transport.


>
>> +process new avail requests and MUST complete all requests that is
>> +currently processing before setting DEVICE_STOPPED.
> I would like to have a more precise definition of 'new avail requests'
> and 'requests that is currently processing'.


Good point. How about something like:

The device MUST stop reading requests from descriptor area or driver 
area and MUST complete all in flight requests before setting DEVICE_STOPPED.

To be 100% accurate, it looks to me we need to mention device 
implementation internals or pseudo code. I start with "in flight" but 
Stefan wants a more accurate one. Reading the spec I found "in flight" 
has been used in:

"""

The driver SHOULD NOT rely on completion of operations of a device if 
DEVICE_NEEDS_RESET is set. Note: For example, the driver can’t assume 
requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor 
can it assume that they have not been completed. A good implementation 
will try to recover by issuing a reset.

"""

So we are probably fine.  Any idea or suggestion are more than welcomed.


>
>> +
>> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
>> +set.
> Here you have the set by driver which is actually requesting the stop
> operation, and set by device which indicted that the stop operation
> was successfully performed by the device.


Exactly.

Thanks


>
>> +
>>   \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>>   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>   MUST send a device configuration change notification to the driver.
>> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>>     that the driver passes extra data (besides identifying the virtqueue)
>>     in its device notifications.
>> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>> +  driver can stop and resume the device.
>>     See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>>   \end{description}
>>   


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-21 11:06     ` Stefano Garzarella
@ 2020-12-22  2:38       ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-12-22  2:38 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: stefanha, virtio-comment, mst, eperezma


On 2020/12/21 下午7:06, Stefano Garzarella wrote:
> On Mon, Dec 21, 2020 at 11:08:37AM +0800, Jason Wang wrote:
>>
>> On 2020/12/18 下午6:15, Stefano Garzarella wrote:
>>> On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote:
>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>>>> used by the driver to stop and resume a device. The main user will be
>>>> live migration support for virtio device.
>>>>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>> content.tex | 26 ++++++++++++++++++++++++--
>>>> 1 file changed, 24 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 61eab41..4392b60 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} 
>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>   drive the device.
>>>>
>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is 
>>>> negotiated,
>>>> +  indicates that the device has been stopped by the driver.
>>>> +
>>>
>>> Just for curiosity, why 32 and not 16?
>>> Is there any rule for high bits and low bits of Device Status field?
>>
>>
>> I'm not sure. I just spot that DEVICE_NEEDS_RESET (with DEVICE 
>> prefix) is using 64.
>>
>>
>>>
>>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has 
>>>> experienced
>>>>   an error from which it can't recover.
>>>> \end{description}
>>>> @@ -58,8 +61,9 @@ \section{\field{Device Status} 
>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>> \ref{sec:General Initialization And Device Operation / Device
>>>> Initialization}.
>>>> The driver MUST NOT clear a
>>>> -\field{device status} bit.  If the driver sets the FAILED bit,
>>>> -the driver MUST later reset the device before attempting to 
>>>> re-initialize.
>>>> +\field{device status} bit other than DEVICE_STOPPED.  If the
>>>> +driver sets the FAILED bit, the driver MUST later reset the device
>>>> +before attempting to re-initialize.
>>>>
>>>> The driver SHOULD NOT rely on completion of operations of a
>>>> device if DEVICE_NEEDS_RESET is set.
>>>> @@ -70,12 +74,28 @@ \section{\field{Device Status} 
>>>> Field}\label{sec:Basic Facilities of a Virtio Dev
>>>> recover by issuing a reset.
>>>> \end{note}
>>>>
>>>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>>>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>>>> +first and re-read status to check whether DEVICE_STOPPED is set by 
>>>> the
>>>> +device. In order to resume the device, the driver MUST clear
>>>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>>>> +is cleared by the device.
>>>> +
>>>> \devicenormative{\subsection}{Device Status Field}{Basic Facilities 
>>>> of a Virtio Device / Device Status Field}
>>>> The device MUST initialize \field{device status} to 0 upon reset.
>>>>
>>>> The device MUST NOT consume buffers or send any used buffer
>>>> notifications to the driver before DRIVER_OK.
>>>>
>>>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>>>> +
>>>> +When driver is trying to set DEVICE_STOPPED, the device MUST not
>>>> +process new avail requests and MUST complete all requests that is
>>>> +currently processing before setting DEVICE_STOPPED.
>>>> +
>>>> +The device MUST keep the config space unchanged when 
>>>> DEVICE_STOPPED is
>>>> +set.
>>>
>>> Maybe here we can specify if "DEVICE_STOPPED is set by the driver" 
>>> or when the device set it after completing all the requests.
>>
>>
>> Good point. Since the bit is offered by device, so "set by the device 
>> " might be better.
>
> Yeah, I agree.
>
>>
>>
>>>
>>>> +
>>>> \label{sec:Basic Facilities of a Virtio Device / Device Status 
>>>> Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET 
>>>> when it enters an error state
>>>> that a reset is needed.  If DRIVER_OK is set, after it sets 
>>>> DEVICE_NEEDS_RESET, the device
>>>> MUST send a device configuration change notification to the driver.
>>>> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature 
>>>> Bits}\label{sec:Reserved Feature Bits}
>>>>   \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>>>>   that the driver passes extra data (besides identifying the 
>>>> virtqueue)
>>>>   in its device notifications.
>>>> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>>>> +  driver can stop and resume the device.
>>>>   See \ref{sec:Virtqueues / Driver 
>>>> notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>>>> \end{description}
>>>>
>>>
>>> Since the completion of in-flight requests can take time (I'm 
>>> thinking of block devices), should we consider a notification when 
>>> the device completes all the pending requests and set DEVICE_STOPPED.
>>>
>>> Something similar to the notification when DEVICE_NEEDS_RESET is set.
>>
>>
>> I'm not sure, but it looks even more complicated. Driver may do a 
>> periodic polling or fail the operation (like migration) when the 
>> request is not completed soon.
>>
>
> I agree that colud be more complicated for the device point of view.
> Since the driver is asking the device to stop, perhaps it makes sense 
> for it to periodically check the status.
>
> In case the driver decides to abort the operation (e.g. migration), 
> can it clear the bit if the device hasn't set it yet or should it wait 
> until the device has stopped?
> Maybe we should clarify this as well.


Right, will add this.

Thanks


>
> Thanks,
> Stefano
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22  2:36   ` Jason Wang
@ 2020-12-22  6:50     ` Halil Pasic
  2020-12-22  7:30       ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-22  6:50 UTC (permalink / raw)
  To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare

On Tue, 22 Dec 2020 10:36:41 +0800
Jason Wang <jasowang@redhat.com> wrote:

> 
> On 2020/12/22 上午5:33, Halil Pasic wrote:
> > On Fri, 18 Dec 2020 12:23:02 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
> >> This patch introduces a new status bit DEVICE_STOPPED. This will be
> >> used by the driver to stop and resume a device. The main user will be
> >> live migration support for virtio device.
> >>
> > Can you please provide some more background information, or point
> > me to the appropriate discussion?
> >
> > I mean AFAIK migration already works without this driver initiated
> > drain. What is the exact motivation? What about the big picture? I
> > guess some agent in the guest would have to make the driver issue
> > the DEVICE_STOP.
> 
> 
> This is not necessary if the datapath is done inside qemu and when 
> migration is initiated by qemu itself.
> 
> But it's a must for using virtio-device as a backend for emulated virtio 
> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device 
> then it can safely synchronize the state from them.
> 

You say, in this case qemu needs to stop the device, which makes sense
(it also has to do this when the datapath is implemented in qemu), but
AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
confused.

I'm still curious about how the different components in the stack
(guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
supposed to interact.


> 
> >
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>   content.tex | 26 ++++++++++++++++++++++++--
> >>   1 file changed, 24 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/content.tex b/content.tex
> >> index 61eab41..4392b60 100644
> >> --- a/content.tex
> >> +++ b/content.tex
> >> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> >>     drive the device.
> >>   
> >> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> >> +  indicates that the device has been stopped by the driver.
> >> +
> > AFAIU it is not only about indicating stopped, but also requesting to be
> > stopped.
> >
> > More importantly, that must not be set immediately, in a sense that the
> > one side initiates some action by requesting the bit to be set, and the
> > other side must not set the bit before the action is performed.
> 
> 
> Yes.
> 
> 
> > We also
> > seem to assume that every device implementation is capable of performing
> > this trick.
> 
> 
> A dedicated feature bit is introduced for this.
>

This is not about the feature bit, but about the mechanism. But your
subsequent answers explain, that this is nothing unusual, and then
we should be fine.
 
> 
> > Is it for hardware devices (e.g. PCI) standard to request an
> > operation by writing some value into a register, and get feedback bout
> > a non-completion by reading different value that written,
> 
> 
> This is not ununsal in other devices. And in fact, the FEATURES_OK works 
> like this:
> 
> """
> 
> The device MUST NOT offer a feature which requires another feature which 
> was not offered. The device SHOULD accept any valid subset of features 
> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device 
> status bit when the driver writes it.
> 
> """
> 

Thanks for the pointer. I intend to have another look at how FEATURES_OK
works, and how similar this is to DEVICE_STOPPED.

> We've already had several hardware implementation of virtio-pci devices 
> from different vendors. I didn't hear any complain about such kind of 
> design.
> 
> 
> >   and about the
> > completion, by reading the same value as written?
> 
> 
> After after DEVICE_STOPPED is read from device the driver can assume the 
> device is stopped.
> 

How is whether the device has already stopped or is still in the middle
of stopping supposed/expected to affect the drivers behavior? Does the
driver actually care?

> 
> >
> >
> >>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
> >>     an error from which it can't recover.
> >>   \end{description}
> >> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>   \ref{sec:General Initialization And Device Operation / Device
> >>   Initialization}.
> >>   The driver MUST NOT clear a
> >> -\field{device status} bit.  If the driver sets the FAILED bit,
> >> -the driver MUST later reset the device before attempting to re-initialize.
> >> +\field{device status} bit other than DEVICE_STOPPED.  If the
> >> +driver sets the FAILED bit, the driver MUST later reset the device
> >> +before attempting to re-initialize.
> >>   
> >>   The driver SHOULD NOT rely on completion of operations of a
> >>   device if DEVICE_NEEDS_RESET is set.
> >> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>   recover by issuing a reset.
> >>   \end{note}
> >>   
> >> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
> >> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
> >> +first and re-read status to check whether DEVICE_STOPPED is set by the
> >> +device. In order to resume the device, the driver MUST clear
> >> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
> >> +is cleared by the device.
> >> +
> >>   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
> >>   The device MUST initialize \field{device status} to 0 upon reset.
> >>   
> >>   The device MUST NOT consume buffers or send any used buffer
> >>   notifications to the driver before DRIVER_OK.
> >>   
> >> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
> >> +
> >> +When driver is trying to set DEVICE_STOPPED, the device MUST not
> > The when driver trying to set DEVICE_STOPPED is a bit soft as a
> > duration. For example consider virtio-ccw, at the moment when the driver
> > issues the ssch to set status, the device still does not know about it.
> 
> 
> I need more context on this, if it works like this, when or how can 
> device know the status has been changed? (E.g how reset or other status 
> bit is supposed to work?)

I think we have a misunderstanding here. The device  will eventually
learn about the drivers request to write the status, it is just not
instantaneous. It is like when you turn the light switch on, it takes
some amount of time for the current to reach the lightbulb. 

> 
> It looks like a transport limitation if we can't guarantee this. Similar 
> issue were met in the PCIE Endpoint device, but it can be workaround by 
> designing a new transport.
> 

I don't think we have a problem with virtio-ccw here.

> 
> >
> >> +process new avail requests and MUST complete all requests that is
> >> +currently processing before setting DEVICE_STOPPED.
> > I would like to have a more precise definition of 'new avail requests'
> > and 'requests that is currently processing'.
> 
> 
> Good point. How about something like:
> 
> The device MUST stop reading requests from descriptor area or driver 
> area and MUST complete all in flight requests before setting DEVICE_STOPPED.
> 
> To be 100% accurate, it looks to me we need to mention device 
> implementation internals or pseudo code. I start with "in flight" but 
> Stefan wants a more accurate one. Reading the spec I found "in flight" 
> has been used in:
> 
> """
> 
> The driver SHOULD NOT rely on completion of operations of a device if 
> DEVICE_NEEDS_RESET is set. Note: For example, the driver can’t assume 
> requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor 
> can it assume that they have not been completed. A good implementation 
> will try to recover by issuing a reset.
> 
> """
>

My problem is, that one can look at this from the drivers or from the
devices perspective. From the drivers perspective, every buffer that has
been made available but haven't been marked as used, is essentially a
request in flight.

From the device perspective, and probably from migration perspective, the
in-flight requests are those that were submitted to some kind of a
backend. I.e. available requests that are just sitting on the queue, but
weren't  submitted to the backend, can sit around, and get submitted
after the migration. It seems that what we want is the following: while
the DEVICE_STOPPED is set, there are no requests in the backend (i.e.
all requests submitted prior have completed, and no more request are
submitted to it).

BTW how about s/stop/drain? I think drain has more of this 'finish
what you are doing' while stop can be 'stop in the middle of it, whatever
you are doing'.

> 
> So we are probably fine.  Any idea or suggestion are more than welcomed.
> 
> 

I will think about it some more.

> >
> >> +
> >> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
> >> +set.
> > Here you have the set by driver which is actually requesting the stop
> > operation, and set by device which indicted that the stop operation
> > was successfully performed by the device.
> 
> 
> Exactly.
> 
> Thanks
> 

Thank you for all the exlanations!

Regards,
Halil

[..]

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22  6:50     ` Halil Pasic
@ 2020-12-22  7:30       ` Jason Wang
  2020-12-22 12:14         ` Cornelia Huck
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-22  7:30 UTC (permalink / raw)
  To: Halil Pasic; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/22 下午2:50, Halil Pasic wrote:
> On Tue, 22 Dec 2020 10:36:41 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/12/22 上午5:33, Halil Pasic wrote:
>>> On Fri, 18 Dec 2020 12:23:02 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>
>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>>>> used by the driver to stop and resume a device. The main user will be
>>>> live migration support for virtio device.
>>>>
>>> Can you please provide some more background information, or point
>>> me to the appropriate discussion?
>>>
>>> I mean AFAIK migration already works without this driver initiated
>>> drain. What is the exact motivation? What about the big picture? I
>>> guess some agent in the guest would have to make the driver issue
>>> the DEVICE_STOP.
>>
>> This is not necessary if the datapath is done inside qemu and when
>> migration is initiated by qemu itself.
>>
>> But it's a must for using virtio-device as a backend for emulated virtio
>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
>> then it can safely synchronize the state from them.
>>
> You say, in this case qemu needs to stop the device, which makes sense
> (it also has to do this when the datapath is implemented in qemu), but
> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
> confused.


It's initiated by Qemu. Guest is unware of live migration.


>
> I'm still curious about how the different components in the stack
> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
> supposed to interact.


It works like:

 From Qemu point of view, vhost-vDPA is just another type of vhost 
backend. Qemu needs to stop virtio (vhost) before it can do migration. 
So we require vDPA devices to have the ability of stopping or pausing 
its datapath. If the vDPA device is by chance the virtio-PCI device, it 
needs an interface for receiving stop/resume command from the driver.

So the devce stop/resume command was sent from Qemu to vhost-VDPA, then 
to vDPA parent which could be a virtio-PCI device in this case.


>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>>    content.tex | 26 ++++++++++++++++++++++++--
>>>>    1 file changed, 24 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/content.tex b/content.tex
>>>> index 61eab41..4392b60 100644
>>>> --- a/content.tex
>>>> +++ b/content.tex
>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>      drive the device.
>>>>    
>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>>>> +  indicates that the device has been stopped by the driver.
>>>> +
>>> AFAIU it is not only about indicating stopped, but also requesting to be
>>> stopped.
>>>
>>> More importantly, that must not be set immediately, in a sense that the
>>> one side initiates some action by requesting the bit to be set, and the
>>> other side must not set the bit before the action is performed.
>>
>> Yes.
>>
>>
>>> We also
>>> seem to assume that every device implementation is capable of performing
>>> this trick.
>>
>> A dedicated feature bit is introduced for this.
>>
> This is not about the feature bit, but about the mechanism. But your
> subsequent answers explain, that this is nothing unusual, and then
> we should be fine.
>   
>>> Is it for hardware devices (e.g. PCI) standard to request an
>>> operation by writing some value into a register, and get feedback bout
>>> a non-completion by reading different value that written,
>>
>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
>> like this:
>>
>> """
>>
>> The device MUST NOT offer a feature which requires another feature which
>> was not offered. The device SHOULD accept any valid subset of features
>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
>> status bit when the driver writes it.
>>
>> """
>>
> Thanks for the pointer. I intend to have another look at how FEATURES_OK
> works, and how similar this is to DEVICE_STOPPED.


My understanding is that for both of them, driver can try to set the bit 
by writing to the status register but it's the device that decide when 
to set the bit.


>
>> We've already had several hardware implementation of virtio-pci devices
>> from different vendors. I didn't hear any complain about such kind of
>> design.
>>
>>
>>>    and about the
>>> completion, by reading the same value as written?
>>
>> After after DEVICE_STOPPED is read from device the driver can assume the
>> device is stopped.
>>
> How is whether the device has already stopped or is still in the middle
> of stopping supposed/expected to affect the drivers behavior? Does the
> driver actually care?


Yes, in the case of live migration. If device takes too longs time to be 
stopped. Qemu may simple fail the migration.


>
>>>
>>>>    \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>>>      an error from which it can't recover.
>>>>    \end{description}
>>>> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    \ref{sec:General Initialization And Device Operation / Device
>>>>    Initialization}.
>>>>    The driver MUST NOT clear a
>>>> -\field{device status} bit.  If the driver sets the FAILED bit,
>>>> -the driver MUST later reset the device before attempting to re-initialize.
>>>> +\field{device status} bit other than DEVICE_STOPPED.  If the
>>>> +driver sets the FAILED bit, the driver MUST later reset the device
>>>> +before attempting to re-initialize.
>>>>    
>>>>    The driver SHOULD NOT rely on completion of operations of a
>>>>    device if DEVICE_NEEDS_RESET is set.
>>>> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>    recover by issuing a reset.
>>>>    \end{note}
>>>>    
>>>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>>>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>>>> +first and re-read status to check whether DEVICE_STOPPED is set by the
>>>> +device. In order to resume the device, the driver MUST clear
>>>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>>>> +is cleared by the device.
>>>> +
>>>>    \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>>>    The device MUST initialize \field{device status} to 0 upon reset.
>>>>    
>>>>    The device MUST NOT consume buffers or send any used buffer
>>>>    notifications to the driver before DRIVER_OK.
>>>>    
>>>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>>>> +
>>>> +When driver is trying to set DEVICE_STOPPED, the device MUST not
>>> The when driver trying to set DEVICE_STOPPED is a bit soft as a
>>> duration. For example consider virtio-ccw, at the moment when the driver
>>> issues the ssch to set status, the device still does not know about it.
>>
>> I need more context on this, if it works like this, when or how can
>> device know the status has been changed? (E.g how reset or other status
>> bit is supposed to work?)
> I think we have a misunderstanding here. The device  will eventually
> learn about the drivers request to write the status, it is just not
> instantaneous. It is like when you turn the light switch on, it takes
> some amount of time for the current to reach the lightbulb.


I see, so it looks not a problem since device is not required to be 
stopped immediately if driver write to that bit.


>
>> It looks like a transport limitation if we can't guarantee this. Similar
>> issue were met in the PCIE Endpoint device, but it can be workaround by
>> designing a new transport.
>>
> I don't think we have a problem with virtio-ccw here.


Agree.


>
>>>> +process new avail requests and MUST complete all requests that is
>>>> +currently processing before setting DEVICE_STOPPED.
>>> I would like to have a more precise definition of 'new avail requests'
>>> and 'requests that is currently processing'.
>>
>> Good point. How about something like:
>>
>> The device MUST stop reading requests from descriptor area or driver
>> area and MUST complete all in flight requests before setting DEVICE_STOPPED.
>>
>> To be 100% accurate, it looks to me we need to mention device
>> implementation internals or pseudo code. I start with "in flight" but
>> Stefan wants a more accurate one. Reading the spec I found "in flight"
>> has been used in:
>>
>> """
>>
>> The driver SHOULD NOT rely on completion of operations of a device if
>> DEVICE_NEEDS_RESET is set. Note: For example, the driver can’t assume
>> requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor
>> can it assume that they have not been completed. A good implementation
>> will try to recover by issuing a reset.
>>
>> """
>>
> My problem is, that one can look at this from the drivers or from the
> devices perspective. From the drivers perspective, every buffer that has
> been made available but haven't been marked as used, is essentially a
> request in flight.


Not a native speaker, but the sentence starts from "The device" so it 
should be the perspective of device.


>
>  From the device perspective, and probably from migration perspective, the
> in-flight requests are those that were submitted to some kind of a
> backend. I.e. available requests that are just sitting on the queue, but
> weren't  submitted to the backend, can sit around, and get submitted
> after the migration. It seems that what we want is the following: while
> the DEVICE_STOPPED is set, there are no requests in the backend (i.e.
> all requests submitted prior have completed, and no more request are
> submitted to it).


Yes. but do we need to mention something like "backend" in the spec? It 
sounds like a implementation detail.


>
> BTW how about s/stop/drain? I think drain has more of this 'finish
> what you are doing' while stop can be 'stop in the middle of it, whatever
> you are doing'.


It might be better. But drain may fit more for a queue not a device. We 
had stuffs other than queue e.g as Stefan pointed out, we should keep 
config space unchanged (and no config interrupt) after the device is 
stopped.


>
>> So we are probably fine.  Any idea or suggestion are more than welcomed.
>>
>>
> I will think about it some more.


Thanks


>
>>>> +
>>>> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
>>>> +set.
>>> Here you have the set by driver which is actually requesting the stop
>>> operation, and set by device which indicted that the stop operation
>>> was successfully performed by the device.
>>
>> Exactly.
>>
>> Thanks
>>
> Thank you for all the exlanations!
>
> Regards,
> Halil
>
> [..]
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22  7:30       ` Jason Wang
@ 2020-12-22 12:14         ` Cornelia Huck
  2020-12-22 12:51           ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Cornelia Huck @ 2020-12-22 12:14 UTC (permalink / raw)
  To: Jason Wang; +Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare

On Tue, 22 Dec 2020 15:30:31 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/12/22 下午2:50, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 10:36:41 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2020/12/22 上午5:33, Halil Pasic wrote:  
> >>> On Fri, 18 Dec 2020 12:23:02 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>  
> >>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
> >>>> used by the driver to stop and resume a device. The main user will be
> >>>> live migration support for virtio device.
> >>>>  
> >>> Can you please provide some more background information, or point
> >>> me to the appropriate discussion?
> >>>
> >>> I mean AFAIK migration already works without this driver initiated
> >>> drain. What is the exact motivation? What about the big picture? I
> >>> guess some agent in the guest would have to make the driver issue
> >>> the DEVICE_STOP.  
> >>
> >> This is not necessary if the datapath is done inside qemu and when
> >> migration is initiated by qemu itself.
> >>
> >> But it's a must for using virtio-device as a backend for emulated virtio
> >> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
> >> then it can safely synchronize the state from them.
> >>  
> > You say, in this case qemu needs to stop the device, which makes sense
> > (it also has to do this when the datapath is implemented in qemu), but
> > AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
> > confused.  
> 
> 
> It's initiated by Qemu. Guest is unware of live migration.

But isn't setting DEVICE_STOPPED a _driver_ initiated process? That
sounds like "guest" to me.

> 
> 
> >
> > I'm still curious about how the different components in the stack
> > (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
> > supposed to interact.  
> 
> 
> It works like:
> 
>  From Qemu point of view, vhost-vDPA is just another type of vhost 
> backend. Qemu needs to stop virtio (vhost) before it can do migration. 
> So we require vDPA devices to have the ability of stopping or pausing 
> its datapath. If the vDPA device is by chance the virtio-PCI device, it 
> needs an interface for receiving stop/resume command from the driver.
> 
> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then 
> to vDPA parent which could be a virtio-PCI device in this case.

But QEMU implements the _device_, not the driver, doesn't it? And IIUC,
vhost-VDPA and the vDPA parent are also on the device side. I feel like
I'm missing something essential here.

> 
> 
> >  
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>> ---
> >>>>    content.tex | 26 ++++++++++++++++++++++++--
> >>>>    1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/content.tex b/content.tex
> >>>> index 61eab41..4392b60 100644
> >>>> --- a/content.tex
> >>>> +++ b/content.tex
> >>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>>>    \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> >>>>      drive the device.
> >>>>    
> >>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> >>>> +  indicates that the device has been stopped by the driver.
> >>>> +  
> >>> AFAIU it is not only about indicating stopped, but also requesting to be
> >>> stopped.
> >>>
> >>> More importantly, that must not be set immediately, in a sense that the
> >>> one side initiates some action by requesting the bit to be set, and the
> >>> other side must not set the bit before the action is performed.  
> >>
> >> Yes.
> >>
> >>  
> >>> We also
> >>> seem to assume that every device implementation is capable of performing
> >>> this trick.  
> >>
> >> A dedicated feature bit is introduced for this.
> >>  
> > This is not about the feature bit, but about the mechanism. But your
> > subsequent answers explain, that this is nothing unusual, and then
> > we should be fine.
> >     
> >>> Is it for hardware devices (e.g. PCI) standard to request an
> >>> operation by writing some value into a register, and get feedback bout
> >>> a non-completion by reading different value that written,  
> >>
> >> This is not ununsal in other devices. And in fact, the FEATURES_OK works
> >> like this:
> >>
> >> """
> >>
> >> The device MUST NOT offer a feature which requires another feature which
> >> was not offered. The device SHOULD accept any valid subset of features
> >> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
> >> status bit when the driver writes it.
> >>
> >> """
> >>  
> > Thanks for the pointer. I intend to have another look at how FEATURES_OK
> > works, and how similar this is to DEVICE_STOPPED.  
> 
> 
> My understanding is that for both of them, driver can try to set the bit 
> by writing to the status register but it's the device that decide when 
> to set the bit.

I think there's a difference: For FEATURES_OK, the driver can read back
the status and knows that the feature combination is not accepted if it
is not set. For DEVICE_STOPPED, it does not really know whether the
device has started to stop yet. It only knows that the device is
actually done stopping when DEVICE_STOPPED is set.

Do we need a different mechanism for the device to signal the driver
that the device has actually been stopped? If the driver sees the
STOPPED status after reading back, it knows that the device has
acknowledged the request and is now stopping down. If it is not set, it
means that the device has not honoured the request. Same for clearing
it to resume.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22 12:14         ` Cornelia Huck
@ 2020-12-22 12:51           ` Jason Wang
  2020-12-22 15:54             ` Cornelia Huck
  2020-12-24  4:52             ` Halil Pasic
  0 siblings, 2 replies; 43+ messages in thread
From: Jason Wang @ 2020-12-22 12:51 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/22 下午8:14, Cornelia Huck wrote:
> On Tue, 22 Dec 2020 15:30:31 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/12/22 下午2:50, Halil Pasic wrote:
>>> On Tue, 22 Dec 2020 10:36:41 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> On 2020/12/22 上午5:33, Halil Pasic wrote:
>>>>> On Fri, 18 Dec 2020 12:23:02 +0800
>>>>> Jason Wang <jasowang@redhat.com> wrote:
>>>>>   
>>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>>>>>> used by the driver to stop and resume a device. The main user will be
>>>>>> live migration support for virtio device.
>>>>>>   
>>>>> Can you please provide some more background information, or point
>>>>> me to the appropriate discussion?
>>>>>
>>>>> I mean AFAIK migration already works without this driver initiated
>>>>> drain. What is the exact motivation? What about the big picture? I
>>>>> guess some agent in the guest would have to make the driver issue
>>>>> the DEVICE_STOP.
>>>> This is not necessary if the datapath is done inside qemu and when
>>>> migration is initiated by qemu itself.
>>>>
>>>> But it's a must for using virtio-device as a backend for emulated virtio
>>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
>>>> then it can safely synchronize the state from them.
>>>>   
>>> You say, in this case qemu needs to stop the device, which makes sense
>>> (it also has to do this when the datapath is implemented in qemu), but
>>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
>>> confused.
>>
>> It's initiated by Qemu. Guest is unware of live migration.
> But isn't setting DEVICE_STOPPED a _driver_ initiated process? That
> sounds like "guest" to me.


I think we need clarification on the "driver". Think the following setup:

Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA 
driver)

It's the Qemu that initiates the stop, and the request is forwarded to 
virtio-pci vDPA driver. So the process is transparent to the guest (driver).


>
>>
>>> I'm still curious about how the different components in the stack
>>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
>>> supposed to interact.
>>
>> It works like:
>>
>>   From Qemu point of view, vhost-vDPA is just another type of vhost
>> backend. Qemu needs to stop virtio (vhost) before it can do migration.
>> So we require vDPA devices to have the ability of stopping or pausing
>> its datapath. If the vDPA device is by chance the virtio-PCI device, it
>> needs an interface for receiving stop/resume command from the driver.
>>
>> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
>> to vDPA parent which could be a virtio-PCI device in this case.
> But QEMU implements the _device_, not the driver, doesn't it?


The device is implemented with the co-operation between Qemu and 
vhost-vDPA. During migration qemu need to stop the virtio-net device, 
then vhost must be stopped as well.


> And IIUC,
> vhost-VDPA and the vDPA parent are also on the device side. I feel like
> I'm missing something essential here.


Virtio-PCI driver could be a vDPA parent as well in this case. So we 
need stop the virtio-pci device.


>
>>
>>>   
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>> ---
>>>>>>     content.tex | 26 ++++++++++++++++++++++++--
>>>>>>     1 file changed, 24 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/content.tex b/content.tex
>>>>>> index 61eab41..4392b60 100644
>>>>>> --- a/content.tex
>>>>>> +++ b/content.tex
>>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>>     \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>>>       drive the device.
>>>>>>     
>>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>>>>>> +  indicates that the device has been stopped by the driver.
>>>>>> +
>>>>> AFAIU it is not only about indicating stopped, but also requesting to be
>>>>> stopped.
>>>>>
>>>>> More importantly, that must not be set immediately, in a sense that the
>>>>> one side initiates some action by requesting the bit to be set, and the
>>>>> other side must not set the bit before the action is performed.
>>>> Yes.
>>>>
>>>>   
>>>>> We also
>>>>> seem to assume that every device implementation is capable of performing
>>>>> this trick.
>>>> A dedicated feature bit is introduced for this.
>>>>   
>>> This is not about the feature bit, but about the mechanism. But your
>>> subsequent answers explain, that this is nothing unusual, and then
>>> we should be fine.
>>>      
>>>>> Is it for hardware devices (e.g. PCI) standard to request an
>>>>> operation by writing some value into a register, and get feedback bout
>>>>> a non-completion by reading different value that written,
>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
>>>> like this:
>>>>
>>>> """
>>>>
>>>> The device MUST NOT offer a feature which requires another feature which
>>>> was not offered. The device SHOULD accept any valid subset of features
>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
>>>> status bit when the driver writes it.
>>>>
>>>> """
>>>>   
>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
>>> works, and how similar this is to DEVICE_STOPPED.
>>
>> My understanding is that for both of them, driver can try to set the bit
>> by writing to the status register but it's the device that decide when
>> to set the bit.
> I think there's a difference: For FEATURES_OK, the driver can read back
> the status and knows that the feature combination is not accepted if it
> is not set. For DEVICE_STOPPED, it does not really know whether the
> device has started to stop yet. It only knows that the device is
> actually done stopping when DEVICE_STOPPED is set.


The only difference is that device may need sometime to be stopped. 
FEATURES_OK might not be the best example. Let's see how reset work 
which is more similar to DEVICE_STOPPED:

"""

After writing 0 to device_status, the driver MUST wait for a read of 
device_status to return 0 before reinitializing the device.

"""


>
> Do we need a different mechanism for the device to signal the driver
> that the device has actually been stopped? If the driver sees the
> STOPPED status after reading back, it knows that the device has
> acknowledged the request and is now stopping down. If it is not set, it
> means that the device has not honoured the request. Same for clearing
> it to resume.


I don't see much difference. Device reset doesn't have such notification 
and driver may simply poll or use a timer.

Thanks


>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22 12:51           ` Jason Wang
@ 2020-12-22 15:54             ` Cornelia Huck
  2020-12-23  2:48               ` Jason Wang
  2020-12-24  4:52             ` Halil Pasic
  1 sibling, 1 reply; 43+ messages in thread
From: Cornelia Huck @ 2020-12-22 15:54 UTC (permalink / raw)
  To: Jason Wang; +Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare

On Tue, 22 Dec 2020 20:51:04 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/12/22 下午8:14, Cornelia Huck wrote:
> > On Tue, 22 Dec 2020 15:30:31 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2020/12/22 下午2:50, Halil Pasic wrote:  
> >>> On Tue, 22 Dec 2020 10:36:41 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> On 2020/12/22 上午5:33, Halil Pasic wrote:  
> >>>>> On Fri, 18 Dec 2020 12:23:02 +0800
> >>>>> Jason Wang <jasowang@redhat.com> wrote:
> >>>>>     
> >>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
> >>>>>> used by the driver to stop and resume a device. The main user will be
> >>>>>> live migration support for virtio device.
> >>>>>>     
> >>>>> Can you please provide some more background information, or point
> >>>>> me to the appropriate discussion?
> >>>>>
> >>>>> I mean AFAIK migration already works without this driver initiated
> >>>>> drain. What is the exact motivation? What about the big picture? I
> >>>>> guess some agent in the guest would have to make the driver issue
> >>>>> the DEVICE_STOP.  
> >>>> This is not necessary if the datapath is done inside qemu and when
> >>>> migration is initiated by qemu itself.
> >>>>
> >>>> But it's a must for using virtio-device as a backend for emulated virtio
> >>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
> >>>> then it can safely synchronize the state from them.
> >>>>     
> >>> You say, in this case qemu needs to stop the device, which makes sense
> >>> (it also has to do this when the datapath is implemented in qemu), but
> >>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
> >>> confused.  
> >>
> >> It's initiated by Qemu. Guest is unware of live migration.  
> > But isn't setting DEVICE_STOPPED a _driver_ initiated process? That
> > sounds like "guest" to me.  
> 
> 
> I think we need clarification on the "driver".

Looking through the spec, I think we have never properly defined what
we mean by "device" or "driver". Not sure what a good definition would
look like. I would consider a "device" an entity that is present and is
configured/used by a "driver". The "driver" is the means to actually
configure/use a "device", i.e. the "initiating" part (even though some
actions are initiated by the device once the driver has started
interacting with it).

> Think the following setup:
> 
> Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA 
> driver)
> 
> It's the Qemu that initiates the stop, and the request is forwarded to 
> virtio-pci vDPA driver. So the process is transparent to the guest (driver).

So it's

Guest driver <-> QEMU <-> vhost-vdpa <-> vDPA parent
|<-- guest -->|<-------------- host -------------->|

but the vDPA parent is itself acting as a "driver" if you take my
attempt at a definition above?

(Wouldn't that mean that the device in QEMU is configured/used from two
entities?)

> 
> 
> >  
> >>  
> >>> I'm still curious about how the different components in the stack
> >>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
> >>> supposed to interact.  
> >>
> >> It works like:
> >>
> >>   From Qemu point of view, vhost-vDPA is just another type of vhost
> >> backend. Qemu needs to stop virtio (vhost) before it can do migration.
> >> So we require vDPA devices to have the ability of stopping or pausing
> >> its datapath. If the vDPA device is by chance the virtio-PCI device, it
> >> needs an interface for receiving stop/resume command from the driver.
> >>
> >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
> >> to vDPA parent which could be a virtio-PCI device in this case.  
> > But QEMU implements the _device_, not the driver, doesn't it?  
> 
> 
> The device is implemented with the co-operation between Qemu and 
> vhost-vDPA. During migration qemu need to stop the virtio-net device, 
> then vhost must be stopped as well.

Yes, it makes sense that any vhost parts need to be stopped as well.

> 
> 
> > And IIUC,
> > vhost-VDPA and the vDPA parent are also on the device side. I feel like
> > I'm missing something essential here.  
> 
> 
> Virtio-PCI driver could be a vDPA parent as well in this case. So we 
> need stop the virtio-pci device.

Who is the "driver" in that case?

> 
> 
> >  
> >>  
> >>>     
> >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>>>> ---
> >>>>>>     content.tex | 26 ++++++++++++++++++++++++--
> >>>>>>     1 file changed, 24 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/content.tex b/content.tex
> >>>>>> index 61eab41..4392b60 100644
> >>>>>> --- a/content.tex
> >>>>>> +++ b/content.tex
> >>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
> >>>>>>     \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
> >>>>>>       drive the device.
> >>>>>>     
> >>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> >>>>>> +  indicates that the device has been stopped by the driver.
> >>>>>> +  
> >>>>> AFAIU it is not only about indicating stopped, but also requesting to be
> >>>>> stopped.
> >>>>>
> >>>>> More importantly, that must not be set immediately, in a sense that the
> >>>>> one side initiates some action by requesting the bit to be set, and the
> >>>>> other side must not set the bit before the action is performed.  
> >>>> Yes.
> >>>>
> >>>>     
> >>>>> We also
> >>>>> seem to assume that every device implementation is capable of performing
> >>>>> this trick.  
> >>>> A dedicated feature bit is introduced for this.
> >>>>     
> >>> This is not about the feature bit, but about the mechanism. But your
> >>> subsequent answers explain, that this is nothing unusual, and then
> >>> we should be fine.
> >>>        
> >>>>> Is it for hardware devices (e.g. PCI) standard to request an
> >>>>> operation by writing some value into a register, and get feedback bout
> >>>>> a non-completion by reading different value that written,  
> >>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
> >>>> like this:
> >>>>
> >>>> """
> >>>>
> >>>> The device MUST NOT offer a feature which requires another feature which
> >>>> was not offered. The device SHOULD accept any valid subset of features
> >>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
> >>>> status bit when the driver writes it.
> >>>>
> >>>> """
> >>>>     
> >>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
> >>> works, and how similar this is to DEVICE_STOPPED.  
> >>
> >> My understanding is that for both of them, driver can try to set the bit
> >> by writing to the status register but it's the device that decide when
> >> to set the bit.  
> > I think there's a difference: For FEATURES_OK, the driver can read back
> > the status and knows that the feature combination is not accepted if it
> > is not set. For DEVICE_STOPPED, it does not really know whether the
> > device has started to stop yet. It only knows that the device is
> > actually done stopping when DEVICE_STOPPED is set.  
> 
> 
> The only difference is that device may need sometime to be stopped. 
> FEATURES_OK might not be the best example. Let's see how reset work 
> which is more similar to DEVICE_STOPPED:
> 
> """
> 
> After writing 0 to device_status, the driver MUST wait for a read of 
> device_status to return 0 before reinitializing the device.
> 
> """

Hm, but that is PCI-specific writing of fields in the configuration
structure, right? CCW uses a different method for resetting (an
asynchronous channel command; channel commands create an interrupt when
done); "write 0 to status for reset" is not universal.

Which makes me think: is using the device status a good universal
method for stopping the device? If I look at it only from the CCW
perspective, I'd have used a new channel command with a payload of
stop/resume and only reflected the stopped state in the device status
(i.e. read-only from the driver, and only changed by the device when it
transitions to/from the stopped state.) Could PCI use a new field for
that?

> 
> >
> > Do we need a different mechanism for the device to signal the driver
> > that the device has actually been stopped? If the driver sees the
> > STOPPED status after reading back, it knows that the device has
> > acknowledged the request and is now stopping down. If it is not set, it
> > means that the device has not honoured the request. Same for clearing
> > it to resume.  
> 
> 
> I don't see much difference. Device reset doesn't have such notification 
> and driver may simply poll or use a timer.

See above; CCW does not really use the device status in the same way. I
think a notification would be useful.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22 15:54             ` Cornelia Huck
@ 2020-12-23  2:48               ` Jason Wang
  2020-12-25  7:38                 ` Halil Pasic
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-23  2:48 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/22 下午11:54, Cornelia Huck wrote:
> On Tue, 22 Dec 2020 20:51:04 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/12/22 下午8:14, Cornelia Huck wrote:
>>> On Tue, 22 Dec 2020 15:30:31 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> On 2020/12/22 下午2:50, Halil Pasic wrote:
>>>>> On Tue, 22 Dec 2020 10:36:41 +0800
>>>>> Jason Wang <jasowang@redhat.com> wrote:
>>>>>      
>>>>>> On 2020/12/22 上午5:33, Halil Pasic wrote:
>>>>>>> On Fri, 18 Dec 2020 12:23:02 +0800
>>>>>>> Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>      
>>>>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>>>>>>>> used by the driver to stop and resume a device. The main user will be
>>>>>>>> live migration support for virtio device.
>>>>>>>>      
>>>>>>> Can you please provide some more background information, or point
>>>>>>> me to the appropriate discussion?
>>>>>>>
>>>>>>> I mean AFAIK migration already works without this driver initiated
>>>>>>> drain. What is the exact motivation? What about the big picture? I
>>>>>>> guess some agent in the guest would have to make the driver issue
>>>>>>> the DEVICE_STOP.
>>>>>> This is not necessary if the datapath is done inside qemu and when
>>>>>> migration is initiated by qemu itself.
>>>>>>
>>>>>> But it's a must for using virtio-device as a backend for emulated virtio
>>>>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device
>>>>>> then it can safely synchronize the state from them.
>>>>>>      
>>>>> You say, in this case qemu needs to stop the device, which makes sense
>>>>> (it also has to do this when the datapath is implemented in qemu), but
>>>>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm
>>>>> confused.
>>>> It's initiated by Qemu. Guest is unware of live migration.
>>> But isn't setting DEVICE_STOPPED a _driver_ initiated process? That
>>> sounds like "guest" to me.
>>
>> I think we need clarification on the "driver".
> Looking through the spec, I think we have never properly defined what
> we mean by "device" or "driver". Not sure what a good definition would
> look like. I would consider a "device" an entity that is present and is
> configured/used by a "driver". The "driver" is the means to actually
> configure/use a "device", i.e. the "initiating" part (even though some
> actions are initiated by the device once the driver has started
> interacting with it).


Sorry for being unclear. I think the spec is fine since it's not limit 
to a host/guest interface, the clarification is only needed for the 
context of the use cases of DEVICE_STOP.


>
>> Think the following setup:
>>
>> Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA
>> driver)
>>
>> It's the Qemu that initiates the stop, and the request is forwarded to
>> virtio-pci vDPA driver. So the process is transparent to the guest (driver).
> So it's
>
> Guest driver <-> QEMU <-> vhost-vdpa <-> vDPA parent
> |<-- guest -->|<-------------- host -------------->|
>
> but the vDPA parent is itself acting as a "driver" if you take my
> attempt at a definition above?
>
> (Wouldn't that mean that the device in QEMU is configured/used from two
> entities?)


A more accurate version

Guest virtio-pci driver <->  virtio-pci  device QEMU <-> vhost-vdpa <-> 
virtio pci vDPA driver <-> virtio-pci device

So we had two drivers:

1) The first that is running in the guest (it could be either L1 or L2)
2) The second that is running on the host (or L1 if the first is running 
in L2)

And we had two devices
1) The first that is emulated by QEMU which is running in either L0 or L1
2) The second could be either a physical device on L0 or emulated one in 
L1 (Qemu is running in L1)

The main use case for DEVICE_STOP the second device and driver. This 
driver is not the virtio-pci driver but a vDPA driver [1].

[1] https://lkml.org/lkml/2020/12/3/1589


>
>>
>>>   
>>>>   
>>>>> I'm still curious about how the different components in the stack
>>>>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are
>>>>> supposed to interact.
>>>> It works like:
>>>>
>>>>    From Qemu point of view, vhost-vDPA is just another type of vhost
>>>> backend. Qemu needs to stop virtio (vhost) before it can do migration.
>>>> So we require vDPA devices to have the ability of stopping or pausing
>>>> its datapath. If the vDPA device is by chance the virtio-PCI device, it
>>>> needs an interface for receiving stop/resume command from the driver.
>>>>
>>>> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
>>>> to vDPA parent which could be a virtio-PCI device in this case.
>>> But QEMU implements the _device_, not the driver, doesn't it?
>>
>> The device is implemented with the co-operation between Qemu and
>> vhost-vDPA. During migration qemu need to stop the virtio-net device,
>> then vhost must be stopped as well.
> Yes, it makes sense that any vhost parts need to be stopped as well.
>
>>
>>> And IIUC,
>>> vhost-VDPA and the vDPA parent are also on the device side. I feel like
>>> I'm missing something essential here.
>>
>> Virtio-PCI driver could be a vDPA parent as well in this case. So we
>> need stop the virtio-pci device.
> Who is the "driver" in that case?


It's the virtio-PCI vDPA driver.


>
>>
>>>   
>>>>   
>>>>>      
>>>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>>>> ---
>>>>>>>>      content.tex | 26 ++++++++++++++++++++++++--
>>>>>>>>      1 file changed, 24 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/content.tex b/content.tex
>>>>>>>> index 61eab41..4392b60 100644
>>>>>>>> --- a/content.tex
>>>>>>>> +++ b/content.tex
>>>>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>>>>>>>      \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>>>>>>>        drive the device.
>>>>>>>>      
>>>>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>>>>>>>> +  indicates that the device has been stopped by the driver.
>>>>>>>> +
>>>>>>> AFAIU it is not only about indicating stopped, but also requesting to be
>>>>>>> stopped.
>>>>>>>
>>>>>>> More importantly, that must not be set immediately, in a sense that the
>>>>>>> one side initiates some action by requesting the bit to be set, and the
>>>>>>> other side must not set the bit before the action is performed.
>>>>>> Yes.
>>>>>>
>>>>>>      
>>>>>>> We also
>>>>>>> seem to assume that every device implementation is capable of performing
>>>>>>> this trick.
>>>>>> A dedicated feature bit is introduced for this.
>>>>>>      
>>>>> This is not about the feature bit, but about the mechanism. But your
>>>>> subsequent answers explain, that this is nothing unusual, and then
>>>>> we should be fine.
>>>>>         
>>>>>>> Is it for hardware devices (e.g. PCI) standard to request an
>>>>>>> operation by writing some value into a register, and get feedback bout
>>>>>>> a non-completion by reading different value that written,
>>>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
>>>>>> like this:
>>>>>>
>>>>>> """
>>>>>>
>>>>>> The device MUST NOT offer a feature which requires another feature which
>>>>>> was not offered. The device SHOULD accept any valid subset of features
>>>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
>>>>>> status bit when the driver writes it.
>>>>>>
>>>>>> """
>>>>>>      
>>>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
>>>>> works, and how similar this is to DEVICE_STOPPED.
>>>> My understanding is that for both of them, driver can try to set the bit
>>>> by writing to the status register but it's the device that decide when
>>>> to set the bit.
>>> I think there's a difference: For FEATURES_OK, the driver can read back
>>> the status and knows that the feature combination is not accepted if it
>>> is not set. For DEVICE_STOPPED, it does not really know whether the
>>> device has started to stop yet. It only knows that the device is
>>> actually done stopping when DEVICE_STOPPED is set.
>>
>> The only difference is that device may need sometime to be stopped.
>> FEATURES_OK might not be the best example. Let's see how reset work
>> which is more similar to DEVICE_STOPPED:
>>
>> """
>>
>> After writing 0 to device_status, the driver MUST wait for a read of
>> device_status to return 0 before reinitializing the device.
>>
>> """
> Hm, but that is PCI-specific writing of fields in the configuration
> structure, right? CCW uses a different method for resetting (an
> asynchronous channel command; channel commands create an interrupt when
> done); "write 0 to status for reset" is not universal.


I may miss something. Even if CCW is using something asynchronously (e.g 
interrupt). It still present a synchronous API to the upper layer. E.g 
ccw_io_helper() is using wait_event()?


>
> Which makes me think: is using the device status a good universal
> method for stopping the device?


This part I don't understand. Device stop is kind of similar to reset in 
this case. If reset can work what prevent stop work?

E.g:

ccw_io_helper(VIRTIO_CCW_DOING_WRITE_STATUS);
while (1) {
     ccw_io_helper(VIRTIO_CCW_DOING_READ_STATUS);
     if (status & DEVICE_STOPPED)
         break;
}


>   If I look at it only from the CCW
> perspective, I'd have used a new channel command with a payload of
> stop/resume and only reflected the stopped state in the device status
> (i.e. read-only from the driver, and only changed by the device when it
> transitions to/from the stopped state.) Could PCI use a new field for
> that?


PCI can use new filed. But I wonder how can that help for CCW.

Thanks


>
>>> Do we need a different mechanism for the device to signal the driver
>>> that the device has actually been stopped? If the driver sees the
>>> STOPPED status after reading back, it knows that the device has
>>> acknowledged the request and is now stopping down. If it is not set, it
>>> means that the device has not honoured the request. Same for clearing
>>> it to resume.
>>
>> I don't see much difference. Device reset doesn't have such notification
>> and driver may simply poll or use a timer.
> See above; CCW does not really use the device status in the same way. I
> think a notification would be useful.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-22 12:51           ` Jason Wang
  2020-12-22 15:54             ` Cornelia Huck
@ 2020-12-24  4:52             ` Halil Pasic
  2020-12-24  5:51               ` Jason Wang
  1 sibling, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-24  4:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare

On Tue, 22 Dec 2020 20:51:04 +0800
Jason Wang <jasowang@redhat.com> wrote:

> >>   From Qemu point of view, vhost-vDPA is just another type of vhost
> >> backend. Qemu needs to stop virtio (vhost) before it can do migration.
> >> So we require vDPA devices to have the ability of stopping or pausing
> >> its datapath. If the vDPA device is by chance the virtio-PCI device, it
> >> needs an interface for receiving stop/resume command from the driver.
> >>
> >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
> >> to vDPA parent which could be a virtio-PCI device in this case.  
> > But QEMU implements the _device_, not the driver, doesn't it?  
> 
> 
> The device is implemented with the co-operation between Qemu and 
> vhost-vDPA. During migration qemu need to stop the virtio-net device, 
> then vhost must be stopped as well.
> 

Sorry for the lazy/stupid question (it's been a while since I last
understood that code), how does this work with vhost based virtio-net?

I don't see ioctl that stops the old classic vhost (no vdpa). I have
a fuzzy remembrance that we 'stop' the notifiers (ioeventfd & irqfd) but
I'm not sure.

And since we are at virtio-net, I ask myself how are 'new requests' and
'requests that is (the device) currently processing' defined for the
receive functionality and the rx queue(s).  Remember the latter ('in
flight requests') need to be all completed. I mean 'in-flight' is pretty
straight-forward for something like virtio-blk, or even for tx but I have
a hard time with rx.

> 
> > And IIUC,
> > vhost-VDPA and the vDPA parent are also on the device side. I feel like
> > I'm missing something essential here.  
> 
> 
> Virtio-PCI driver could be a vDPA parent as well in this case. So we 
> need stop the virtio-pci device.

I used to think about vdpa like a vehicle to make partial virtio support
in HW viable and easy. I.e. when I hear vdpa I have something like this
in mind:
https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg

Of course the 'physical nic' from the linked picture can be a nic that
supports both the virtio data plane and the virtio control plane (i.e.
what you are referring to as a virtio-pci device). But do we still expect
that device to be connected via vdpa?

The second question is not strictly on topic, but I'm still curious, what
do we plan to do with a lower device that does not support virtio control
plane? In that case we can't go via DEVICE_STOP. Does that mean, there
has to be a vendor specific mechanism equivalent to the DEVICE_STOP
mechanism, or otherwise we are non-migratable?

Regards,
Halil

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-24  4:52             ` Halil Pasic
@ 2020-12-24  5:51               ` Jason Wang
  2020-12-25  3:18                 ` Halil Pasic
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-24  5:51 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/24 下午12:52, Halil Pasic wrote:
> On Tue, 22 Dec 2020 20:51:04 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>>>>    From Qemu point of view, vhost-vDPA is just another type of vhost
>>>> backend. Qemu needs to stop virtio (vhost) before it can do migration.
>>>> So we require vDPA devices to have the ability of stopping or pausing
>>>> its datapath. If the vDPA device is by chance the virtio-PCI device, it
>>>> needs an interface for receiving stop/resume command from the driver.
>>>>
>>>> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then
>>>> to vDPA parent which could be a virtio-PCI device in this case.
>>> But QEMU implements the _device_, not the driver, doesn't it?
>>
>> The device is implemented with the co-operation between Qemu and
>> vhost-vDPA. During migration qemu need to stop the virtio-net device,
>> then vhost must be stopped as well.
>>
> Sorry for the lazy/stupid question (it's been a while since I last
> understood that code), how does this work with vhost based virtio-net?
>
> I don't see ioctl that stops the old classic vhost (no vdpa). I have
> a fuzzy remembrance that we 'stop' the notifiers (ioeventfd & irqfd) but
> I'm not sure.


Each vhost devices implements its own ioctl:

For vhost-net, it's VHOST_NET_SET_BACKEND, passing -1 as socket fd will 
stop the device.
For vhost-vsock, it's VHOST_VSOCK_SET_RUNNING.

I don't check SCSI but it should have something similar.


>
> And since we are at virtio-net,


The feature is not net specific.


> I ask myself how are 'new requests' and
> 'requests that is (the device) currently processing' defined for the
> receive functionality and the rx queue(s).  Remember the latter ('in
> flight requests') need to be all completed. I mean 'in-flight' is pretty
> straight-forward for something like virtio-blk, or even for tx but I have
> a hard time with rx.


I think it should be no difference from the view of virtqueue. It's the 
requests that have been read from avail ring but not added to the used ring.

Actually this ties to the visibility of virtqueue internal state. 
Usually device maintains a tail pointer (last_avail_idx).  So in the 
simplest case when no indices wrap and no out-of-order, requests belongs 
to [last_avail_idx, used_idx) are considered as in-flight.


>
>>> And IIUC,
>>> vhost-VDPA and the vDPA parent are also on the device side. I feel like
>>> I'm missing something essential here.
>>
>> Virtio-PCI driver could be a vDPA parent as well in this case. So we
>> need stop the virtio-pci device.
> I used to think about vdpa like a vehicle to make partial virtio support
> in HW viable and easy. I.e. when I hear vdpa I have something like this
> in mind:
> https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg
>
> Of course the 'physical nic' from the linked picture can be a nic that
> supports both the virtio data plane and the virtio control plane (i.e.
> what you are referring to as a virtio-pci device). But do we still expect
> that device to be connected via vdpa?


Why not?[1] With the help of vhost-vDPA, we get the wire speed and live 
migration support for virtio-pci device.


>
> The second question is not strictly on topic, but I'm still curious, what
> do we plan to do with a lower device that does not support virtio control
> plane?


Yes, the vDPA device just need to implement the same semantic not the 
same interface. You may refer mlx5e vDPA driver in the kernel source.


> In that case we can't go via DEVICE_STOP. Does that mean, there
> has to be a vendor specific mechanism equivalent to the DEVICE_STOP
> mechanism, or otherwise we are non-migratable?


Exactly.

Thanks

[1] https://lwn.net/Articles/838978/


>
> Regards,
> Halil
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-24  5:51               ` Jason Wang
@ 2020-12-25  3:18                 ` Halil Pasic
  2020-12-25  6:45                   ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-25  3:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare

On Thu, 24 Dec 2020 13:51:31 +0800
Jason Wang <jasowang@redhat.com> wrote:

> 
> On 2020/12/24 下午12:52, Halil Pasic wrote:
> > On Tue, 22 Dec 2020 20:51:04 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >
[..]
> 
> Each vhost devices implements its own ioctl:
> 
> For vhost-net, it's VHOST_NET_SET_BACKEND, passing -1 as socket fd will 
> stop the device.
> For vhost-vsock, it's VHOST_VSOCK_SET_RUNNING.
> 
> I don't check SCSI but it should have something similar.
> 

Thanks!

> 
> >
> > And since we are at virtio-net,
> 
> 
> The feature is not net specific.
> 

Nod.

> 
> > I ask myself how are 'new requests' and
> > 'requests that is (the device) currently processing' defined for the
> > receive functionality and the rx queue(s).  Remember the latter ('in
> > flight requests') need to be all completed. I mean 'in-flight' is pretty
> > straight-forward for something like virtio-blk, or even for tx but I have
> > a hard time with rx.
> 
> 
> I think it should be no difference from the view of virtqueue. It's the 
> requests that have been read from avail ring but not added to the used ring.
> 
> Actually this ties to the visibility of virtqueue internal state. 
> Usually device maintains a tail pointer (last_avail_idx).  So in the 
> simplest case when no indices wrap and no out-of-order, requests belongs 
> to [last_avail_idx, used_idx) are considered as in-flight.
> 

I wouldn't call it virtqueue internal state, but I get what you mean. The
virtio specification is concerned with the device/driver interface. The
driver does not and can not differentiate between buffers that are
available but "haven't been read from avail ring", and those that have
"been read from the avail ring". As you said last_avail_idx is a device
private thing -- a device internal state. 

I tend to say, that from a perspective of the driver, all requests that
are available, and not yet used, are in-flight. So we have to be very
careful when wording this requirement, to avoid misunderstandings. I
don't think the first RFC is good enough. I will think some more about
this. I'm under the impression that considering virtio console (serial)
could be useful, as it provides a reliable duplex data transfer. I.e.
some I/O is not driver initiated, but dropping packets ain't OK. So, the
data may be in flight without the virtqueue buffer being in flight
according to our latest definition ([last_avail_idx, used_idx)).

Sorry, I have the feeling I'm spouting out half baked thoughts. Thank you
for bearing with me.

> 
> >
> >>> And IIUC,
> >>> vhost-VDPA and the vDPA parent are also on the device side. I feel like
> >>> I'm missing something essential here.
> >>
> >> Virtio-PCI driver could be a vDPA parent as well in this case. So we
> >> need stop the virtio-pci device.
> > I used to think about vdpa like a vehicle to make partial virtio support
> > in HW viable and easy. I.e. when I hear vdpa I have something like this
> > in mind:
> > https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg
> >
> > Of course the 'physical nic' from the linked picture can be a nic that
> > supports both the virtio data plane and the virtio control plane (i.e.
> > what you are referring to as a virtio-pci device). But do we still expect
> > that device to be connected via vdpa?
> 
> 
> Why not?[1] With the help of vhost-vDPA, we get the wire speed and live 
> migration support for virtio-pci device.

Indeed.

> 
> 
> >
> > The second question is not strictly on topic, but I'm still curious, what
> > do we plan to do with a lower device that does not support virtio control
> > plane?
> 
> 
> Yes, the vDPA device just need to implement the same semantic not the 
> same interface. You may refer mlx5e vDPA driver in the kernel source.

Thanks! I will have a look.

Regards,
Halil

[..]

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-25  3:18                 ` Halil Pasic
@ 2020-12-25  6:45                   ` Jason Wang
  2020-12-27 11:12                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-25  6:45 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/25 上午11:18, Halil Pasic wrote:
> On Thu, 24 Dec 2020 13:51:31 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/12/24 下午12:52, Halil Pasic wrote:
>>> On Tue, 22 Dec 2020 20:51:04 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>
> [..]
>> Each vhost devices implements its own ioctl:
>>
>> For vhost-net, it's VHOST_NET_SET_BACKEND, passing -1 as socket fd will
>> stop the device.
>> For vhost-vsock, it's VHOST_VSOCK_SET_RUNNING.
>>
>> I don't check SCSI but it should have something similar.
>>
> Thanks!
>
>>> And since we are at virtio-net,
>>
>> The feature is not net specific.
>>
> Nod.
>
>>> I ask myself how are 'new requests' and
>>> 'requests that is (the device) currently processing' defined for the
>>> receive functionality and the rx queue(s).  Remember the latter ('in
>>> flight requests') need to be all completed. I mean 'in-flight' is pretty
>>> straight-forward for something like virtio-blk, or even for tx but I have
>>> a hard time with rx.
>>
>> I think it should be no difference from the view of virtqueue. It's the
>> requests that have been read from avail ring but not added to the used ring.
>>
>> Actually this ties to the visibility of virtqueue internal state.
>> Usually device maintains a tail pointer (last_avail_idx).  So in the
>> simplest case when no indices wrap and no out-of-order, requests belongs
>> to [last_avail_idx, used_idx) are considered as in-flight.
>>
> I wouldn't call it virtqueue internal state, but I get what you mean. The
> virtio specification is concerned with the device/driver interface. The
> driver does not and can not differentiate between buffers that are
> available but "haven't been read from avail ring", and those that have
> "been read from the avail ring". As you said last_avail_idx is a device
> private thing -- a device internal state.


Exactly.


>
> I tend to say, that from a perspective of the driver, all requests that
> are available, and not yet used, are in-flight. So we have to be very
> careful when wording this requirement, to avoid misunderstandings. I
> don't think the first RFC is good enough. I will think some more about
> this.


Yes, I agree. The problem is that the spec doesn't describe how device 
work, so if we want to be more accurate, it might require some work not 
only for stop but also for e.g reset (something like in flight has been 
used by the spec in that case).


> I'm under the impression that considering virtio console (serial)
> could be useful, as it provides a reliable duplex data transfer. I.e.
> some I/O is not driver initiated, but dropping packets ain't OK. So, the
> data may be in flight without the virtqueue buffer being in flight
> according to our latest definition ([last_avail_idx, used_idx)).


Yes.


> Sorry, I have the feeling I'm spouting out half baked thoughts. Thank you
> for bearing with me.


You're welcome and you comments are very helpful and appreciated.

Thanks


>
>>>>> And IIUC,
>>>>> vhost-VDPA and the vDPA parent are also on the device side. I feel like
>>>>> I'm missing something essential here.
>>>> Virtio-PCI driver could be a vDPA parent as well in this case. So we
>>>> need stop the virtio-pci device.
>>> I used to think about vdpa like a vehicle to make partial virtio support
>>> in HW viable and easy. I.e. when I hear vdpa I have something like this
>>> in mind:
>>> https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg
>>>
>>> Of course the 'physical nic' from the linked picture can be a nic that
>>> supports both the virtio data plane and the virtio control plane (i.e.
>>> what you are referring to as a virtio-pci device). But do we still expect
>>> that device to be connected via vdpa?
>>
>> Why not?[1] With the help of vhost-vDPA, we get the wire speed and live
>> migration support for virtio-pci device.
> Indeed.
>
>>
>>> The second question is not strictly on topic, but I'm still curious, what
>>> do we plan to do with a lower device that does not support virtio control
>>> plane?
>>
>> Yes, the vDPA device just need to implement the same semantic not the
>> same interface. You may refer mlx5e vDPA driver in the kernel source.
> Thanks! I will have a look.
>
> Regards,
> Halil
>
> [..]
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-23  2:48               ` Jason Wang
@ 2020-12-25  7:38                 ` Halil Pasic
  2020-12-27 10:00                   ` Michael S. Tsirkin
  2020-12-28  6:47                   ` Jason Wang
  0 siblings, 2 replies; 43+ messages in thread
From: Halil Pasic @ 2020-12-25  7:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare

On Wed, 23 Dec 2020 10:48:45 +0800
Jason Wang <jasowang@redhat.com> wrote:
[..]
> >>>>>>> Is it for hardware devices (e.g. PCI) standard to request an
> >>>>>>> operation by writing some value into a register, and get feedback bout
> >>>>>>> a non-completion by reading different value that written,  
> >>>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
> >>>>>> like this:
> >>>>>>
> >>>>>> """
> >>>>>>
> >>>>>> The device MUST NOT offer a feature which requires another feature which
> >>>>>> was not offered. The device SHOULD accept any valid subset of features
> >>>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
> >>>>>> status bit when the driver writes it.
> >>>>>>
> >>>>>> """
> >>>>>>        
> >>>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
> >>>>> works, and how similar this is to DEVICE_STOPPED.  
> >>>> My understanding is that for both of them, driver can try to set the bit
> >>>> by writing to the status register but it's the device that decide when
> >>>> to set the bit.  
> >>> I think there's a difference: For FEATURES_OK, the driver can read back
> >>> the status and knows that the feature combination is not accepted if it
> >>> is not set. For DEVICE_STOPPED, it does not really know whether the
> >>> device has started to stop yet. It only knows that the device is
> >>> actually done stopping when DEVICE_STOPPED is set.  
> >>

I share Connie's concern. The config->set_status() is a synchronous API,
that is right after the iowrite8 for pci comes back, or the channel
program finishes executing for ccw, we can go, do a config->get_status()
and know if the status was set succesfully.

> >> The only difference is that device may need sometime to be stopped.
> >> FEATURES_OK might not be the best example. Let's see how reset work
> >> which is more similar to DEVICE_STOPPED:
> >>
> >> """
> >>
> >> After writing 0 to device_status, the driver MUST wait for a read of
> >> device_status to return 0 before reinitializing the device.
> >>
> >> """  
> > Hm, but that is PCI-specific writing of fields in the configuration
> > structure, right? CCW uses a different method for resetting (an
> > asynchronous channel command; channel commands create an interrupt when
> > done); "write 0 to status for reset" is not universal.  
> 
> 

> I may miss something. Even if CCW is using something asynchronously (e.g 
> interrupt). It still present a synchronous API to the upper layer. E.g 
> ccw_io_helper() is using wait_event()?

Right. The reason why we have to wait for the completion of the channel
program is exactly that. Furthermore if we have to fail setting
FEATURES_OK we fail the channel program, that was supposed to do the
operation with an unit check. So at the point where the set_status() is
done, we already know if it worked or not. Of course the device can
change it's status any time.

Since we have a synchronous API setting DEVICE_STOPPED would also have to
block until all in-flight requests are completed. But this does not see
to be what you want. I understood that
virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would
a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set,
if there are still outstanding requests.

I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by
a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED)
(followed by a readback) as asynchronous is a good idea. 

BTW regarding 'The device SHOULD accept any valid subset of features
the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
status bit when the driver writes it.' I'm a bit puzzled what 'must fail
to set' actually means. From CPU perspective the 'set the FEATURE_OK
status bit' boils down to an instruction generated for iowrite8. As far
as I understand this instruction is not supposed to fail (on s390 there
are multiple ways in which a pci store can fail). AFAIK x86 uses normal
memory access instructions. I suppose when a store instruction does not
fail, then the store must happen. Or is that only true for RAM and for
PCI memory not?


> 
> 
> >
> > Which makes me think: is using the device status a good universal
> > method for stopping the device?  
> 
> 
> This part I don't understand. Device stop is kind of similar to reset in 
> this case. If reset can work what prevent stop work?
> 
> E.g:
> 
> ccw_io_helper(VIRTIO_CCW_DOING_WRITE_STATUS);
> while (1) {
>      ccw_io_helper(VIRTIO_CCW_DOING_READ_STATUS);
>      if (status & DEVICE_STOPPED)
>          break;
> }
> 

You are right I think. It can work for virtio-ccw, if setting
DEVICE_STOPPED is synchronous. For reset we use the CCW_CMD_VDEV_RESET
ccw command (see in the spec), but that is a design choice. I think
setting the status to 0 via CCW_CMD_WRITE_STATUS should also work. The
code below is from qemu and you should look for the invocation of
virtio_ccw_reset_virtio(). I guess, that is an undocumented feature. And
we can get away with setting the status first, and then doing the reset
here, because the implementation guarantees that the driver won't see
the new status value until the reset is actually done. (This is not
a property of the architecture, but a property of the implementation).

    case CCW_CMD_WRITE_STATUS:
        if (check_len) {
            if (ccw.count != sizeof(status)) {
                ret = -EINVAL;
                break;
            }
        } else if (ccw.count < sizeof(status)) {
            /* Can't execute command. */
            ret = -EINVAL; 
            break;
        }
        if (!ccw.cda) {
            ret = -EFAULT;
        } else {
            ccw_dstream_read(&sch->cds, status);
            if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
                virtio_ccw_stop_ioeventfd(dev);
            }
            if (virtio_set_status(vdev, status) == 0) {
                if (vdev->status == 0) {
                    virtio_ccw_reset_virtio(dev, vdev);
                }
                if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
                    virtio_ccw_start_ioeventfd(dev);
                }
                sch->curr_status.scsw.count = ccw.count - sizeof(status);
                ret = 0;
            } else {
                /* Trigger a command reject. */
                ret = -ENOSYS;
            }
        }


> 
> >   If I look at it only from the CCW
> > perspective, I'd have used a new channel command with a payload of
> > stop/resume and only reflected the stopped state in the device status
> > (i.e. read-only from the driver, and only changed by the device when it
> > transitions to/from the stopped state.) Could PCI use a new field for
> > that?  
> 
> 
> PCI can use new filed. But I wonder how can that help for CCW.

Using a new mechanism to request the device to stop would help with
the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK
synchronous, and make REQUEST_STOP asynchronous.

A new PCI field would have the downside, of being PCI specific. This
feature makes sense to me regardless of the transport. But I don't think
it is likely to become relevant for the ccw any time soon.

Regards,
Halil


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-25  7:38                 ` Halil Pasic
@ 2020-12-27 10:00                   ` Michael S. Tsirkin
  2020-12-28  6:21                     ` Halil Pasic
  2020-12-28  6:47                   ` Jason Wang
  1 sibling, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-12-27 10:00 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Jason Wang, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare

On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote:
>When driver is trying to set DEVICE_STOPPED, the device MUST not
>process new avail requests and MUST complete all requests that is
>currently processing before setting DEVICE_STOPPED.

...

> Since we have a synchronous API setting DEVICE_STOPPED would also have to
> block until all in-flight requests are completed.

Judging from the surrounding discussion,
when you say complete you seem to mean "use", and
I'm not sure how you define in flight, but
it seems there could be many of these (up to a full queue?)
so waiting for all available buffers to be
used might indeed require an asynchronous interface,
which gets complex very quickly.

However wouldn't it be possible for device to just cancel
processing available buffers even if it started processing
them? Some buffers could be in indeterminate state
(e.g. we might not have a way to know how much data did
device have time to write into a buffer).

Making it clearer what does "complete" mean here might help.

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-25  6:45                   ` Jason Wang
@ 2020-12-27 11:12                     ` Michael S. Tsirkin
  2020-12-28  7:05                       ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-12-27 11:12 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare

On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote:
> > I tend to say, that from a perspective of the driver, all requests that
> > are available, and not yet used, are in-flight. So we have to be very
> > careful when wording this requirement, to avoid misunderstandings. I
> > don't think the first RFC is good enough. I will think some more about
> > this.
> 
> 
> Yes, I agree. The problem is that the spec doesn't describe how device work,
> so if we want to be more accurate, it might require some work not only for
> stop but also for e.g reset (something like in flight has been used by the
> spec in that case).

You probably mean the DEVICE_NEEDS_RESET description, right?

	For example, the driver can't assume requests in flight will be
	completed if DEVICE_NEEDS_RESET is set, nor can it assume that
	they have not been completed.  A good implementation will try to
	recover by issuing a reset.

yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely
is related to the fact it is not widely implemented.

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-27 10:00                   ` Michael S. Tsirkin
@ 2020-12-28  6:21                     ` Halil Pasic
  2020-12-28  7:01                       ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-28  6:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare

On Sun, 27 Dec 2020 05:00:05 -0500
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote:
> >When driver is trying to set DEVICE_STOPPED, the device MUST not
> >process new avail requests and MUST complete all requests that is
> >currently processing before setting DEVICE_STOPPED.
> 
> ...
> 
> > Since we have a synchronous API setting DEVICE_STOPPED would also have to
> > block until all in-flight requests are completed.
> 
> Judging from the surrounding discussion,
> when you say complete you seem to mean "use", and

My intention was merely to paraphrase Jason's proposal which says:

+When driver is trying to set DEVICE_STOPPED, the device MUST not
+process new avail requests and MUST complete all requests that is
+currently processing before setting DEVICE_STOPPED.

> I'm not sure how you define in flight, but
> it seems there could be many of these (up to a full queue?)

In the follow-on discussion 'in-flight' emerged as an alternative to
'all requests that is currently processing' form the proposed text.

A large part of the discussion is, IMHO, about finding precise
definitions, for what the quoted paragraph is trying to express.

> so waiting for all available buffers to be
> used might indeed require an asynchronous interface,
> which gets complex very quickly.
> 
> However wouldn't it be possible for device to just cancel
> processing available buffers even if it started processing
> them? Some buffers could be in indeterminate state
> (e.g. we might not have a way to know how much data did
> device have time to write into a buffer).
> 

Maybe Jason can answer this one.

> Making it clearer what does "complete" mean here might help.
> 

I think what we are trying to accomplish here, is avoiding, having
non-standardised state in device (that can not be dropped) when
migrating. 

I'm still struggling with wrapping my mind around the problem. AFAIU
migration and migratability is not really a feature of the virtio
standard, but can be a feature of it's implementation
(e.g. QEMU & KVM), where the standard does help a great deal by having
certain aspects of the operation and interaction nailed down.

Regards,
Halil 

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-25  7:38                 ` Halil Pasic
  2020-12-27 10:00                   ` Michael S. Tsirkin
@ 2020-12-28  6:47                   ` Jason Wang
  2020-12-29 13:20                     ` Halil Pasic
  1 sibling, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-28  6:47 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/25 下午3:38, Halil Pasic wrote:
> On Wed, 23 Dec 2020 10:48:45 +0800
> Jason Wang <jasowang@redhat.com> wrote:
> [..]
>>>>>>>>> Is it for hardware devices (e.g. PCI) standard to request an
>>>>>>>>> operation by writing some value into a register, and get feedback bout
>>>>>>>>> a non-completion by reading different value that written,
>>>>>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works
>>>>>>>> like this:
>>>>>>>>
>>>>>>>> """
>>>>>>>>
>>>>>>>> The device MUST NOT offer a feature which requires another feature which
>>>>>>>> was not offered. The device SHOULD accept any valid subset of features
>>>>>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
>>>>>>>> status bit when the driver writes it.
>>>>>>>>
>>>>>>>> """
>>>>>>>>         
>>>>>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK
>>>>>>> works, and how similar this is to DEVICE_STOPPED.
>>>>>> My understanding is that for both of them, driver can try to set the bit
>>>>>> by writing to the status register but it's the device that decide when
>>>>>> to set the bit.
>>>>> I think there's a difference: For FEATURES_OK, the driver can read back
>>>>> the status and knows that the feature combination is not accepted if it
>>>>> is not set. For DEVICE_STOPPED, it does not really know whether the
>>>>> device has started to stop yet. It only knows that the device is
>>>>> actually done stopping when DEVICE_STOPPED is set.
> I share Connie's concern. The config->set_status() is a synchronous API,
> that is right after the iowrite8 for pci comes back, or the channel
> program finishes executing for ccw, we can go, do a config->get_status()
> and know if the status was set succesfully.
>
>>>> The only difference is that device may need sometime to be stopped.
>>>> FEATURES_OK might not be the best example. Let's see how reset work
>>>> which is more similar to DEVICE_STOPPED:
>>>>
>>>> """
>>>>
>>>> After writing 0 to device_status, the driver MUST wait for a read of
>>>> device_status to return 0 before reinitializing the device.
>>>>
>>>> """
>>> Hm, but that is PCI-specific writing of fields in the configuration
>>> structure, right? CCW uses a different method for resetting (an
>>> asynchronous channel command; channel commands create an interrupt when
>>> done); "write 0 to status for reset" is not universal.
>>
>> I may miss something. Even if CCW is using something asynchronously (e.g
>> interrupt). It still present a synchronous API to the upper layer. E.g
>> ccw_io_helper() is using wait_event()?
> Right. The reason why we have to wait for the completion of the channel
> program is exactly that. Furthermore if we have to fail setting
> FEATURES_OK we fail the channel program, that was supposed to do the
> operation with an unit check. So at the point where the set_status() is
> done, we already know if it worked or not.


The only difference is, for FEATURES_OK, driver know it won't work any 
more. But for DEVICE_STOPPED, driver know it might work sometime in the 
future.


> Of course the device can
> change it's status any time.
>
> Since we have a synchronous API setting DEVICE_STOPPED would also have to
> block until all in-flight requests are completed. But this does not see
> to be what you want. I understood that
> virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would
> a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set,
> if there are still outstanding requests.


Yes.


>
> I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by
> a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED)
> (followed by a readback) as asynchronous is a good idea.


The main reason is that DEVICE_STOPPED is a one of the possible state. 
We'd better do this by extending the current device status state 
machine. Otherwise it would be more complicated (e.g using another 
interface).

It looks to me what you don't like is the driver API design? If yes, how 
about introduce something like virtio_try_add_status()? The difference 
is that it doesn't  require a immediate result.


>
> BTW regarding 'The device SHOULD accept any valid subset of features
> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device
> status bit when the driver writes it.' I'm a bit puzzled what 'must fail
> to set' actually means.


Good point, my understanding for "set XX bit" means the whether or not 
the bit could be read by the driver. We need some clarification here.


>   From CPU perspective the 'set the FEATURE_OK
> status bit' boils down to an instruction generated for iowrite8. As far
> as I understand this instruction is not supposed to fail (on s390 there
> are multiple ways in which a pci store can fail).


Yes, my understanding is that it doesn't mean fail the instruction. As 
you said it's sometime impossible a specific transport.


>   AFAIK x86 uses normal
> memory access instructions. I suppose when a store instruction does not
> fail, then the store must happen. Or is that only true for RAM and for
> PCI memory not?


AFAIK, PCI have non-posted write which can fail but the spec doesn't 
requires that.


>
>
>>
>>> Which makes me think: is using the device status a good universal
>>> method for stopping the device?
>>
>> This part I don't understand. Device stop is kind of similar to reset in
>> this case. If reset can work what prevent stop work?
>>
>> E.g:
>>
>> ccw_io_helper(VIRTIO_CCW_DOING_WRITE_STATUS);
>> while (1) {
>>       ccw_io_helper(VIRTIO_CCW_DOING_READ_STATUS);
>>       if (status & DEVICE_STOPPED)
>>           break;
>> }
>>
> You are right I think. It can work for virtio-ccw, if setting
> DEVICE_STOPPED is synchronous. For reset we use the CCW_CMD_VDEV_RESET
> ccw command (see in the spec), but that is a design choice. I think
> setting the status to 0 via CCW_CMD_WRITE_STATUS should also work. The
> code below is from qemu and you should look for the invocation of
> virtio_ccw_reset_virtio(). I guess, that is an undocumented feature.


Then I think it needs to be clarified in the spec.


>   And
> we can get away with setting the status first, and then doing the reset
> here, because the implementation guarantees that the driver won't see
> the new status value until the reset is actually done. (This is not
> a property of the architecture, but a property of the implementation).


So did PCI:

     case VIRTIO_PCI_STATUS:
         if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
             virtio_pci_stop_ioeventfd(proxy);
         }

         virtio_set_status(vdev, val & 0xFF);

         if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
             virtio_pci_start_ioeventfd(proxy);
         }

         if (vdev->status == 0) {
             virtio_pci_reset(DEVICE(proxy));
         }

As you said, it's a implementation property since qemu implements a 
synchronous virtio_set_status(). But it might not be true for real 
hardware which may require more time or even hard to drain a queue in 
short time. That's why PCI spec mandate the driver to poll for the 
status after writing zero to device status.


>
>      case CCW_CMD_WRITE_STATUS:
>          if (check_len) {
>              if (ccw.count != sizeof(status)) {
>                  ret = -EINVAL;
>                  break;
>              }
>          } else if (ccw.count < sizeof(status)) {
>              /* Can't execute command. */
>              ret = -EINVAL;
>              break;
>          }
>          if (!ccw.cda) {
>              ret = -EFAULT;
>          } else {
>              ccw_dstream_read(&sch->cds, status);
>              if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>                  virtio_ccw_stop_ioeventfd(dev);
>              }
>              if (virtio_set_status(vdev, status) == 0) {
>                  if (vdev->status == 0) {
>                      virtio_ccw_reset_virtio(dev, vdev);
>                  }
>                  if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>                      virtio_ccw_start_ioeventfd(dev);
>                  }
>                  sch->curr_status.scsw.count = ccw.count - sizeof(status);
>                  ret = 0;
>              } else {
>                  /* Trigger a command reject. */
>                  ret = -ENOSYS;
>              }
>          }
>
>
>>>    If I look at it only from the CCW
>>> perspective, I'd have used a new channel command with a payload of
>>> stop/resume and only reflected the stopped state in the device status
>>> (i.e. read-only from the driver, and only changed by the device when it
>>> transitions to/from the stopped state.) Could PCI use a new field for
>>> that?
>>
>> PCI can use new filed. But I wonder how can that help for CCW.
> Using a new mechanism to request the device to stop would help with
> the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK
> synchronous, and make REQUEST_STOP asynchronous.


Right, but as said before, it looks more like a part of device status 
state machine, using new mechanism might bring extra complexity and 
confusion.

Looking at CCW implementation of device status, I don't see any blocker 
for implementing it synchronously if we just wait for the stop of the 
device and then return?


>
> A new PCI field would have the downside, of being PCI specific. This
> feature makes sense to me regardless of the transport. But I don't think
> it is likely to become relevant for the ccw any time soon.


Yes, it's better not limit the feature to a specific transport.

Thanks


>
> Regards,
> Halil
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28  6:21                     ` Halil Pasic
@ 2020-12-28  7:01                       ` Jason Wang
  2020-12-28 12:30                         ` Michael S. Tsirkin
  2020-12-29 13:35                         ` Halil Pasic
  0 siblings, 2 replies; 43+ messages in thread
From: Jason Wang @ 2020-12-28  7:01 UTC (permalink / raw)
  To: Halil Pasic, Michael S. Tsirkin
  Cc: Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare


On 2020/12/28 下午2:21, Halil Pasic wrote:
> On Sun, 27 Dec 2020 05:00:05 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
>> On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote:
>>> When driver is trying to set DEVICE_STOPPED, the device MUST not
>>> process new avail requests and MUST complete all requests that is
>>> currently processing before setting DEVICE_STOPPED.
>> ...
>>
>>> Since we have a synchronous API setting DEVICE_STOPPED would also have to
>>> block until all in-flight requests are completed.
>> Judging from the surrounding discussion,
>> when you say complete you seem to mean "use", and
> My intention was merely to paraphrase Jason's proposal which says:
>
> +When driver is trying to set DEVICE_STOPPED, the device MUST not
> +process new avail requests and MUST complete all requests that is
> +currently processing before setting DEVICE_STOPPED.
>
>> I'm not sure how you define in flight, but
>> it seems there could be many of these (up to a full queue?)
> In the follow-on discussion 'in-flight' emerged as an alternative to
> 'all requests that is currently processing' form the proposed text.
>
> A large part of the discussion is, IMHO, about finding precise
> definitions, for what the quoted paragraph is trying to express.
>
>> so waiting for all available buffers to be
>> used might indeed require an asynchronous interface,
>> which gets complex very quickly.


Some part of the virtio has enforced an asynchronous interface during reset:

For MMIO the spec said:

"""

To stop using the queue the driver MUST write zero (0x0) to this 
QueueReady and MUST read the value back to ensure synchronization.

"""

For PCI it said:

"""

After writing 0 to device_status, the driver MUST wait for a read of 
device_status to return 0 before reinitializing the device.

"""


>>
>> However wouldn't it be possible for device to just cancel
>> processing available buffers even if it started processing
>> them? Some buffers could be in indeterminate state
>> (e.g. we might not have a way to know how much data did
>> device have time to write into a buffer).
>>
> Maybe Jason can answer this one.


For networking device, it should be possible (packet could be dropped or 
duplicated). But I'm not sure it works for other device. E.g for the 
block devices that needs to communicate with a remote backend. Waat's 
more important, my understadning is that the intermediate state is 
something that we need to avoid (hard to be migrated anyhow).


>
>> Making it clearer what does "complete" mean here might help.
>>
> I think what we are trying to accomplish here, is avoiding, having
> non-standardised state in device (that can not be dropped) when
> migrating.
>
> I'm still struggling with wrapping my mind around the problem. AFAIU
> migration and migratability is not really a feature of the virtio
> standard, but can be a feature of it's implementation
> (e.g. QEMU & KVM), where the standard does help a great deal by having
> certain aspects of the operation and interaction nailed down.


It looks like a must (at least from the level of semantics) for 
designing software API for vDPA. Using a standard spec may help to avoid 
subtle misunderstanding of different vendors.

Thanks


>
> Regards,
> Halil
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-27 11:12                     ` Michael S. Tsirkin
@ 2020-12-28  7:05                       ` Jason Wang
  2020-12-28 12:27                         ` Michael S. Tsirkin
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-28  7:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare


On 2020/12/27 下午7:12, Michael S. Tsirkin wrote:
> On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote:
>>> I tend to say, that from a perspective of the driver, all requests that
>>> are available, and not yet used, are in-flight. So we have to be very
>>> careful when wording this requirement, to avoid misunderstandings. I
>>> don't think the first RFC is good enough. I will think some more about
>>> this.
>>
>> Yes, I agree. The problem is that the spec doesn't describe how device work,
>> so if we want to be more accurate, it might require some work not only for
>> stop but also for e.g reset (something like in flight has been used by the
>> spec in that case).
> You probably mean the DEVICE_NEEDS_RESET description, right?
>
> 	For example, the driver can't assume requests in flight will be
> 	completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> 	they have not been completed.  A good implementation will try to
> 	recover by issuing a reset.
>
> yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely
> is related to the fact it is not widely implemented.


For both NEEDS_RESET and device reset.

For NEEDS_RESET, we use "in flight" and "completed" without an actual 
definition.

For device reset, we don't even define what is the device expected (e.g 
how are "in flight" requests expected to be processed) to behave.

Thanks


>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28  7:05                       ` Jason Wang
@ 2020-12-28 12:27                         ` Michael S. Tsirkin
  2020-12-29  8:57                           ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 12:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare

On Mon, Dec 28, 2020 at 03:05:43PM +0800, Jason Wang wrote:
> 
> On 2020/12/27 下午7:12, Michael S. Tsirkin wrote:
> > On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote:
> > > > I tend to say, that from a perspective of the driver, all requests that
> > > > are available, and not yet used, are in-flight. So we have to be very
> > > > careful when wording this requirement, to avoid misunderstandings. I
> > > > don't think the first RFC is good enough. I will think some more about
> > > > this.
> > > 
> > > Yes, I agree. The problem is that the spec doesn't describe how device work,
> > > so if we want to be more accurate, it might require some work not only for
> > > stop but also for e.g reset (something like in flight has been used by the
> > > spec in that case).
> > You probably mean the DEVICE_NEEDS_RESET description, right?
> > 
> > 	For example, the driver can't assume requests in flight will be
> > 	completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> > 	they have not been completed.  A good implementation will try to
> > 	recover by issuing a reset.
> > 
> > yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely
> > is related to the fact it is not widely implemented.
> 
> 
> For both NEEDS_RESET and device reset.
> 
> For NEEDS_RESET, we use "in flight" and "completed" without an actual
> definition.
> 
> For device reset, we don't even define what is the device expected (e.g how
> are "in flight" requests expected to be processed) to behave.
> 
> Thanks

Because device is expected to just stop:

	None of the virtqueues
	of a device are live once the device has been reset.

and it's driver's job to recover buffers:

	Thus a driver MUST ensure a virtqueue isn't live (by device reset) before removing exposed buffers.

what happened to buffers which were not used?

I think it's a quality of implementation/device specific issue, e.g. for net:
for transmit, if device sends a packet without using the buffer,
then driver will resend the packet leading to duplicates.
for receive, it's a packet drop, slightly less of a problem.


my question is whether such behaviour is sufficient for migration?
if not can we really describe it generally? it's possible we need to
describe it per device type.

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28  7:01                       ` Jason Wang
@ 2020-12-28 12:30                         ` Michael S. Tsirkin
  2020-12-29  9:04                           ` Jason Wang
  2020-12-29 13:35                         ` Halil Pasic
  1 sibling, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2020-12-28 12:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare

On Mon, Dec 28, 2020 at 03:01:57PM +0800, Jason Wang wrote:
> 
> On 2020/12/28 下午2:21, Halil Pasic wrote:
> > On Sun, 27 Dec 2020 05:00:05 -0500
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote:
> > > > When driver is trying to set DEVICE_STOPPED, the device MUST not
> > > > process new avail requests and MUST complete all requests that is
> > > > currently processing before setting DEVICE_STOPPED.
> > > ...
> > > 
> > > > Since we have a synchronous API setting DEVICE_STOPPED would also have to
> > > > block until all in-flight requests are completed.
> > > Judging from the surrounding discussion,
> > > when you say complete you seem to mean "use", and
> > My intention was merely to paraphrase Jason's proposal which says:
> > 
> > +When driver is trying to set DEVICE_STOPPED, the device MUST not
> > +process new avail requests and MUST complete all requests that is
> > +currently processing before setting DEVICE_STOPPED.
> > 
> > > I'm not sure how you define in flight, but
> > > it seems there could be many of these (up to a full queue?)
> > In the follow-on discussion 'in-flight' emerged as an alternative to
> > 'all requests that is currently processing' form the proposed text.
> > 
> > A large part of the discussion is, IMHO, about finding precise
> > definitions, for what the quoted paragraph is trying to express.
> > 
> > > so waiting for all available buffers to be
> > > used might indeed require an asynchronous interface,
> > > which gets complex very quickly.
> 
> 
> Some part of the virtio has enforced an asynchronous interface during reset:
> 
> For MMIO the spec said:
> 
> """
> 
> To stop using the queue the driver MUST write zero (0x0) to this QueueReady
> and MUST read the value back to ensure synchronization.
> 
> """
> 
> For PCI it said:
> 
> """
> 
> After writing 0 to device_status, the driver MUST wait for a read of
> device_status to return 0 before reinitializing the device.
> 
> """

Absolutely. It's ok for simple things.
However doing anything major like waiting for IO
in a tight loop is a problem.



> 
> > > 
> > > However wouldn't it be possible for device to just cancel
> > > processing available buffers even if it started processing
> > > them? Some buffers could be in indeterminate state
> > > (e.g. we might not have a way to know how much data did
> > > device have time to write into a buffer).
> > > 
> > Maybe Jason can answer this one.
> 
> 
> For networking device, it should be possible (packet could be dropped or
> duplicated). But I'm not sure it works for other device. E.g for the block
> devices that needs to communicate with a remote backend. Waat's more
> important, my understadning is that the intermediate state is something that
> we need to avoid (hard to be migrated anyhow).

The difficulty is on the device side though, so why not say: if it wants to
flush IO that's up to it.

> 
> > 
> > > Making it clearer what does "complete" mean here might help.
> > > 
> > I think what we are trying to accomplish here, is avoiding, having
> > non-standardised state in device (that can not be dropped) when
> > migrating.
> > 
> > I'm still struggling with wrapping my mind around the problem. AFAIU
> > migration and migratability is not really a feature of the virtio
> > standard, but can be a feature of it's implementation
> > (e.g. QEMU & KVM), where the standard does help a great deal by having
> > certain aspects of the operation and interaction nailed down.
> 
> 
> It looks like a must (at least from the level of semantics) for designing
> software API for vDPA. Using a standard spec may help to avoid subtle
> misunderstanding of different vendors.
> 
> Thanks
> 
> 
> > 
> > Regards,
> > Halil
> > 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28 12:27                         ` Michael S. Tsirkin
@ 2020-12-29  8:57                           ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-12-29  8:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare



----- Original Message -----
> On Mon, Dec 28, 2020 at 03:05:43PM +0800, Jason Wang wrote:
> > 
> > On 2020/12/27 下午7:12, Michael S. Tsirkin wrote:
> > > On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote:
> > > > > I tend to say, that from a perspective of the driver, all requests
> > > > > that
> > > > > are available, and not yet used, are in-flight. So we have to be very
> > > > > careful when wording this requirement, to avoid misunderstandings. I
> > > > > don't think the first RFC is good enough. I will think some more
> > > > > about
> > > > > this.
> > > > 
> > > > Yes, I agree. The problem is that the spec doesn't describe how device
> > > > work,
> > > > so if we want to be more accurate, it might require some work not only
> > > > for
> > > > stop but also for e.g reset (something like in flight has been used by
> > > > the
> > > > spec in that case).
> > > You probably mean the DEVICE_NEEDS_RESET description, right?
> > > 
> > > 	For example, the driver can't assume requests in flight will be
> > > 	completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> > > 	they have not been completed.  A good implementation will try to
> > > 	recover by issuing a reset.
> > > 
> > > yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely
> > > is related to the fact it is not widely implemented.
> > 
> > 
> > For both NEEDS_RESET and device reset.
> > 
> > For NEEDS_RESET, we use "in flight" and "completed" without an actual
> > definition.
> > 
> > For device reset, we don't even define what is the device expected (e.g how
> > are "in flight" requests expected to be processed) to behave.
> > 
> > Thanks
> 
> Because device is expected to just stop:
> 
> 	None of the virtqueues
> 	of a device are live once the device has been reset.

So it's something similar to what DEVICE_STOPPED wants.

> 
> and it's driver's job to recover buffers:
> 
> 	Thus a driver MUST ensure a virtqueue isn't live (by device reset) before
> 	removing exposed buffers.
> 
> what happened to buffers which were not used?
> 
> I think it's a quality of implementation/device specific issue, e.g. for net:
> for transmit, if device sends a packet without using the buffer,

This sounds like an intermediate state we need to avoid in the
migration. Can we mandate the device to make the buffer used in this
case? For networking device, it should be not slow, just wait for the
DMA to be completed is sufficient.

> then driver will resend the packet leading to duplicates.
> for receive, it's a packet drop, slightly less of a problem.

For net yes.

> 
> 
> my question is whether such behaviour is sufficient for migration?

I suspect it's not, we need to either

1) wait for the buffer to be used

or 

2) fail the migration if the device is not stopped in a short time

> if not can we really describe it generally?

It looks like a virtqueue level issue, so we need to try.

> it's possible we need to
> describe it per device type.

Maybe.

Thanks

> 
> --
> MST
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28 12:30                         ` Michael S. Tsirkin
@ 2020-12-29  9:04                           ` Jason Wang
  2021-01-12 10:54                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-29  9:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare



----- Original Message -----
> On Mon, Dec 28, 2020 at 03:01:57PM +0800, Jason Wang wrote:
> > 
> > On 2020/12/28 下午2:21, Halil Pasic wrote:
> > > On Sun, 27 Dec 2020 05:00:05 -0500
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > > > On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote:
> > > > > When driver is trying to set DEVICE_STOPPED, the device MUST not
> > > > > process new avail requests and MUST complete all requests that is
> > > > > currently processing before setting DEVICE_STOPPED.
> > > > ...
> > > > 
> > > > > Since we have a synchronous API setting DEVICE_STOPPED would also
> > > > > have to
> > > > > block until all in-flight requests are completed.
> > > > Judging from the surrounding discussion,
> > > > when you say complete you seem to mean "use", and
> > > My intention was merely to paraphrase Jason's proposal which says:
> > > 
> > > +When driver is trying to set DEVICE_STOPPED, the device MUST not
> > > +process new avail requests and MUST complete all requests that is
> > > +currently processing before setting DEVICE_STOPPED.
> > > 
> > > > I'm not sure how you define in flight, but
> > > > it seems there could be many of these (up to a full queue?)
> > > In the follow-on discussion 'in-flight' emerged as an alternative to
> > > 'all requests that is currently processing' form the proposed text.
> > > 
> > > A large part of the discussion is, IMHO, about finding precise
> > > definitions, for what the quoted paragraph is trying to express.
> > > 
> > > > so waiting for all available buffers to be
> > > > used might indeed require an asynchronous interface,
> > > > which gets complex very quickly.
> > 
> > 
> > Some part of the virtio has enforced an asynchronous interface during
> > reset:
> > 
> > For MMIO the spec said:
> > 
> > """
> > 
> > To stop using the queue the driver MUST write zero (0x0) to this QueueReady
> > and MUST read the value back to ensure synchronization.
> > 
> > """
> > 
> > For PCI it said:
> > 
> > """
> > 
> > After writing 0 to device_status, the driver MUST wait for a read of
> > device_status to return 0 before reinitializing the device.
> > 
> > """
> 
> Absolutely. It's ok for simple things.
> However doing anything major like waiting for IO
> in a tight loop is a problem.

It's mainly used for Qemu. And driver can choose to cancel the stop by
trying to clear the bit. Does this work?


> 
> 
> 
> > 
> > > > 
> > > > However wouldn't it be possible for device to just cancel
> > > > processing available buffers even if it started processing
> > > > them? Some buffers could be in indeterminate state
> > > > (e.g. we might not have a way to know how much data did
> > > > device have time to write into a buffer).
> > > > 
> > > Maybe Jason can answer this one.
> > 
> > 
> > For networking device, it should be possible (packet could be dropped or
> > duplicated). But I'm not sure it works for other device. E.g for the block
> > devices that needs to communicate with a remote backend. Waat's more
> > important, my understadning is that the intermediate state is something
> > that
> > we need to avoid (hard to be migrated anyhow).
> 
> The difficulty is on the device side though, so why not say: if it wants to
> flush IO that's up to it.

Do you mean to introduce a device specific way to flush IO? If yes, it
actually introduces a new intermediate state implicitly:

1) device is stopped
2) device is stopped and IO is flushed

This looks more complicated than a single new state:

1) device is stopped and IO is flushed

Thanks

> 
> > 
> > > 
> > > > Making it clearer what does "complete" mean here might help.
> > > > 
> > > I think what we are trying to accomplish here, is avoiding, having
> > > non-standardised state in device (that can not be dropped) when
> > > migrating.
> > > 
> > > I'm still struggling with wrapping my mind around the problem. AFAIU
> > > migration and migratability is not really a feature of the virtio
> > > standard, but can be a feature of it's implementation
> > > (e.g. QEMU & KVM), where the standard does help a great deal by having
> > > certain aspects of the operation and interaction nailed down.
> > 
> > 
> > It looks like a must (at least from the level of semantics) for designing
> > software API for vDPA. Using a standard spec may help to avoid subtle
> > misunderstanding of different vendors.
> > 
> > Thanks
> > 
> > 
> > > 
> > > Regards,
> > > Halil
> > > 
> 
> 


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28  6:47                   ` Jason Wang
@ 2020-12-29 13:20                     ` Halil Pasic
  2020-12-30  8:03                       ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-29 13:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare

On Mon, 28 Dec 2020 14:47:21 +0800
Jason Wang <jasowang@redhat.com> wrote:

> 
[..]
> > Right. The reason why we have to wait for the completion of the channel
> > program is exactly that. Furthermore if we have to fail setting
> > FEATURES_OK we fail the channel program, that was supposed to do the
> > operation with an unit check. So at the point where the set_status() is
> > done, we already know if it worked or not.
> 
> 
> The only difference is, for FEATURES_OK, driver know it won't work any 
> more. But for DEVICE_STOPPED, driver know it might work sometime in the 
> future.
> 

If the channel program fails with a command reject, that that ccw is not
allowed to cause further side effects. I.e. we can not fail the channel
program, but then make the operation succeed later.

> 
> > Of course the device can
> > change it's status any time.
> >
> > Since we have a synchronous API setting DEVICE_STOPPED would also have to
> > block until all in-flight requests are completed. But this does not see
> > to be what you want. I understood that
> > virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would
> > a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set,
> > if there are still outstanding requests.
> 
> 
> Yes.
> 
> 
> >
> > I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by
> > a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED)
> > (followed by a readback) as asynchronous is a good idea.
> 
> 
> The main reason is that DEVICE_STOPPED is a one of the possible state. 
> We'd better do this by extending the current device status state 
> machine. Otherwise it would be more complicated (e.g using another 
> interface).
> 
> It looks to me what you don't like is the driver API design? If yes, how 
> about introduce something like virtio_try_add_status()? The difference 
> is that it doesn't  require a immediate result.
>

Yes a new (Linux) driver API operation sounds good -- at least on the
surface. I would rather call it virtio_add_status_async() than
virtio_try_add_status() because for me 'try' tells that something may
or may not work, but the outcome is decided when the function
returns. E.g. try_lock(l) may or may not succeed the lock l, but when it
returns you know if it did take l or not.

But how would you implement that new operation? I guess it would still
have to boil down to config->set_status(dev), and to set_status(s),
being asynchronous when s & DEVICE_STOPPED and synchronous otherwise (and
in particular FEATURES_OK is added to the status).

Ultimately a set_status(b) boils down to a iowrite8(b) for the PCI
transport, and to a channel program that references a single byte buffer
with the value of b for Channel IO transport. Making that synchroneaous
or asynchronous depending on the value of b does not sound right.

Maybe Conny can chime in on this later as well.

[..]
> >   AFAIK x86 uses normal
> > memory access instructions. I suppose when a store instruction does not
> > fail, then the store must happen. Or is that only true for RAM and for
> > PCI memory not?
> 
> 
> AFAIK, PCI have non-posted write which can fail but the spec doesn't 
> requires that.
> 

So for a posted write a PCI device can just silently discard writes it
does not 'like' (i.e. pretend there was no such transaction in the first
place)?

[..]
> 
> So did PCI:
> 
>      case VIRTIO_PCI_STATUS:
>          if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>              virtio_pci_stop_ioeventfd(proxy);
>          }
> 
>          virtio_set_status(vdev, val & 0xFF);
> 
>          if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>              virtio_pci_start_ioeventfd(proxy);
>          }
> 
>          if (vdev->status == 0) {
>              virtio_pci_reset(DEVICE(proxy));
>          }
> 
> As you said, it's a implementation property since qemu implements a 
> synchronous virtio_set_status(). But it might not be true for real 
> hardware which may require more time or even hard to drain a queue in 
> short time. That's why PCI spec mandate the driver to poll for the 
> status after writing zero to device status.
> 

That is a PCI specific thing, and unfortunately my PCI skills are very
limited.

Please compare 

static void vp_reset(struct virtio_device *vdev)                                
{                                                                               
	vp_iowrite8(0, &vp_dev->common->device_status);
[..]
        while (vp_ioread8(&vp_dev->common->device_status))                      
                msleep(1);

With 
int virtio_finalize_features(struct virtio_device *dev)                         
{                                                                               
[..]
        virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);                    
        status = dev->config->get_status(dev);                                  
        if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {  

The latter clearly assumes that the first get_status() will
yield the final outcome, where the former clearly assumes that
the effect of set_status() may be delayed beyond the first
subsequent get_status().

Btw, if the device were to to fail resetting itself, what would
you expect to happen? I would expect the device to present a
status DEVICE_NEEDS_RESET, but that would effectively make
vp_reset() getting stuck in that loop. Or does the spec guarantee
that a reset must work eventually?

[..]

> >> PCI can use new filed. But I wonder how can that help for CCW.
> > Using a new mechanism to request the device to stop would help with
> > the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK
> > synchronous, and make REQUEST_STOP asynchronous.
> 
> 
> Right, but as said before, it looks more like a part of device status 
> state machine, using new mechanism might bring extra complexity and 
> confusion.
> 
> Looking at CCW implementation of device status, I don't see any blocker 
> for implementing it synchronously if we just wait for the stop of the 
> device and then return?
> 

I don't see a problem on the qemu side, but I do see a problem on
the guest side. There is a reason why you want set DEVICE_STOPPED to
be asynchronous.

Regards,
Halil

[..]

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-28  7:01                       ` Jason Wang
  2020-12-28 12:30                         ` Michael S. Tsirkin
@ 2020-12-29 13:35                         ` Halil Pasic
  2020-12-30  8:15                           ` Jason Wang
  1 sibling, 1 reply; 43+ messages in thread
From: Halil Pasic @ 2020-12-29 13:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Cornelia Huck, stefanha, virtio-comment,
	eperezma, sgarzare

On Mon, 28 Dec 2020 15:01:57 +0800
Jason Wang <jasowang@redhat.com> wrote:

> Some part of the virtio has enforced an asynchronous interface during reset:
> 
> For MMIO the spec said:
> 
> """
> 
> To stop using the queue the driver MUST write zero (0x0) to this 
> QueueReady and MUST read the value back to ensure synchronization.
> 
> """

I read the MMIO quote like a single read is sufficient to ensure
synchronization. I.e. it does not require a loop which waits for
the read to yield the expected value.

> 
> For PCI it said:
> 
> """
> 
> After writing 0 to device_status, the driver MUST wait for a read of 
> device_status to return 0 before reinitializing the device.
> 
> """

Yes, this does sound like a loop. And that's what Linux does. But this
is transport (PCI) specific. On a spec level, a reset is a distinct
operation from setting device status (to 0). It just happens to be
mapped to the PCI transport as setting the status to 0.

For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while
setting the status is mapped via CCW_CMD_WRITE_STATUS.

Regards,
Halil

This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-29 13:20                     ` Halil Pasic
@ 2020-12-30  8:03                       ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2020-12-30  8:03 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare


On 2020/12/29 下午9:20, Halil Pasic wrote:
> On Mon, 28 Dec 2020 14:47:21 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> [..]
>>> Right. The reason why we have to wait for the completion of the channel
>>> program is exactly that. Furthermore if we have to fail setting
>>> FEATURES_OK we fail the channel program, that was supposed to do the
>>> operation with an unit check. So at the point where the set_status() is
>>> done, we already know if it worked or not.
>>
>> The only difference is, for FEATURES_OK, driver know it won't work any
>> more. But for DEVICE_STOPPED, driver know it might work sometime in the
>> future.
>>
> If the channel program fails with a command reject, that that ccw is not
> allowed to cause further side effects. I.e. we can not fail the channel
> program, but then make the operation succeed later.
>
>>> Of course the device can
>>> change it's status any time.
>>>
>>> Since we have a synchronous API setting DEVICE_STOPPED would also have to
>>> block until all in-flight requests are completed. But this does not see
>>> to be what you want. I understood that
>>> virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would
>>> a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set,
>>> if there are still outstanding requests.
>>
>> Yes.
>>
>>
>>> I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by
>>> a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED)
>>> (followed by a readback) as asynchronous is a good idea.
>>
>> The main reason is that DEVICE_STOPPED is a one of the possible state.
>> We'd better do this by extending the current device status state
>> machine. Otherwise it would be more complicated (e.g using another
>> interface).
>>
>> It looks to me what you don't like is the driver API design? If yes, how
>> about introduce something like virtio_try_add_status()? The difference
>> is that it doesn't  require a immediate result.
>>
> Yes a new (Linux) driver API operation sounds good -- at least on the
> surface. I would rather call it virtio_add_status_async() than
> virtio_try_add_status() because for me 'try' tells that something may
> or may not work, but the outcome is decided when the function
> returns. E.g. try_lock(l) may or may not succeed the lock l, but when it
> returns you know if it did take l or not.


I agree.


>
> But how would you implement that new operation? I guess it would still
> have to boil down to config->set_status(dev), and to set_status(s),
> being asynchronous when s & DEVICE_STOPPED and synchronous otherwise (and
> in particular FEATURES_OK is added to the status).


Or it could be done via a dedicated config ops like how vp_reset() did.


>
> Ultimately a set_status(b) boils down to a iowrite8(b) for the PCI
> transport, and to a channel program that references a single byte buffer
> with the value of b for Channel IO transport. Making that synchroneaous
> or asynchronous depending on the value of b does not sound right.


Just to make sure we're in the same page. I meant e.g in 
CCW_CMD_WRITE_STATUS, qemu just propagate the request of device stop to 
device and return success. And it's the device that decide when it can 
set the DEVICE_STOPPED bit and then the next CCW_CMD_READ_STATUS could 
read the bit.


>
> Maybe Conny can chime in on this later as well.
>
> [..]
>>>    AFAIK x86 uses normal
>>> memory access instructions. I suppose when a store instruction does not
>>> fail, then the store must happen. Or is that only true for RAM and for
>>> PCI memory not?
>>
>> AFAIK, PCI have non-posted write which can fail but the spec doesn't
>> requires that.
>>
> So for a posted write a PCI device can just silently discard writes it
> does not 'like' (i.e. pretend there was no such transaction in the first
> place)?


I might be wrong, but it's my understanding as well.


>
> [..]
>> So did PCI:
>>
>>       case VIRTIO_PCI_STATUS:
>>           if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
>>               virtio_pci_stop_ioeventfd(proxy);
>>           }
>>
>>           virtio_set_status(vdev, val & 0xFF);
>>
>>           if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
>>               virtio_pci_start_ioeventfd(proxy);
>>           }
>>
>>           if (vdev->status == 0) {
>>               virtio_pci_reset(DEVICE(proxy));
>>           }
>>
>> As you said, it's a implementation property since qemu implements a
>> synchronous virtio_set_status(). But it might not be true for real
>> hardware which may require more time or even hard to drain a queue in
>> short time. That's why PCI spec mandate the driver to poll for the
>> status after writing zero to device status.
>>
> That is a PCI specific thing, and unfortunately my PCI skills are very
> limited.
>
> Please compare
>
> static void vp_reset(struct virtio_device *vdev)
> {
> 	vp_iowrite8(0, &vp_dev->common->device_status);
> [..]
>          while (vp_ioread8(&vp_dev->common->device_status))
>                  msleep(1);
>
> With
> int virtio_finalize_features(struct virtio_device *dev)
> {
> [..]
>          virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>          status = dev->config->get_status(dev);
>          if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>
> The latter clearly assumes that the first get_status() will
> yield the final outcome, where the former clearly assumes that
> the effect of set_status() may be delayed beyond the first
> subsequent get_status().


I think the reason is that. For PCI, the spec forces the device to 
either set or clear the FEATURE_OK during write. But for reset, device 
may choose to delay setting status to 0 after the write.


>
> Btw, if the device were to to fail resetting itself, what would
> you expect to happen? I would expect the device to present a
> status DEVICE_NEEDS_RESET, but that would effectively make
> vp_reset() getting stuck in that loop.


Yes, then it looks like a recursion here.


>   Or does the spec guarantee
> that a reset must work eventually?


My understanding is that the spec implies it must succeed eventually.


>
> [..]
>
>>>> PCI can use new filed. But I wonder how can that help for CCW.
>>> Using a new mechanism to request the device to stop would help with
>>> the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK
>>> synchronous, and make REQUEST_STOP asynchronous.
>>
>> Right, but as said before, it looks more like a part of device status
>> state machine, using new mechanism might bring extra complexity and
>> confusion.
>>
>> Looking at CCW implementation of device status, I don't see any blocker
>> for implementing it synchronously if we just wait for the stop of the
>> device and then return?
>>
> I don't see a problem on the qemu side, but I do see a problem on
> the guest side. There is a reason why you want set DEVICE_STOPPED to
> be asynchronous.


So my understanding is that. at least for current kernel virtio drivers, 
it doesn't need DEVICE_STOPPED since there's no actual use cases. So no 
need for inventing new helpers.

It would be only need for the vDPA drivers that is used by userspace via 
vhost-vDPA. In this case, we just provide the mechanism to write and 
read status, vDPA driver don't need to enforce any policy in the middle 
(e.g polling or not).

Thanks


>
> Regards,
> Halil
>
> [..]
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-29 13:35                         ` Halil Pasic
@ 2020-12-30  8:15                           ` Jason Wang
  2021-01-11 18:16                             ` Cornelia Huck
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2020-12-30  8:15 UTC (permalink / raw)
  To: Halil Pasic
  Cc: Michael S. Tsirkin, Cornelia Huck, stefanha, virtio-comment,
	eperezma, sgarzare


On 2020/12/29 下午9:35, Halil Pasic wrote:
> On Mon, 28 Dec 2020 15:01:57 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> Some part of the virtio has enforced an asynchronous interface during reset:
>>
>> For MMIO the spec said:
>>
>> """
>>
>> To stop using the queue the driver MUST write zero (0x0) to this
>> QueueReady and MUST read the value back to ensure synchronization.
>>
>> """
> I read the MMIO quote like a single read is sufficient to ensure
> synchronization. I.e. it does not require a loop which waits for
> the read to yield the expected value.
>
>> For PCI it said:
>>
>> """
>>
>> After writing 0 to device_status, the driver MUST wait for a read of
>> device_status to return 0 before reinitializing the device.
>>
>> """
> Yes, this does sound like a loop. And that's what Linux does. But this
> is transport (PCI) specific. On a spec level, a reset is a distinct
> operation from setting device status (to 0).


I'm not sure or it looks unclear to me for this point.

E.g the device reset is mentioned in "Device Status Field" belongs to 
"Basic Facilities of a Virtio Device". But there's no "Device Reset" in 
"Basic Facilities of a Virtio Device".


> It just happens to be
> mapped to the PCI transport as setting the status to 0.


MMIO did the same. And it makes sense since using a single API to 
configure/change device status looks simpler.


>
> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while
> setting the status is mapped via CCW_CMD_WRITE_STATUS.


Yes, but I think actually there's no limitation if we want to tread 0 as 
a reset command for CCW_CMD_WRITE_STATUS.

Thanks


>
> Regards,
> Halil
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-30  8:15                           ` Jason Wang
@ 2021-01-11 18:16                             ` Cornelia Huck
  2021-01-12  3:27                               ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Cornelia Huck @ 2021-01-11 18:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment,
	eperezma, sgarzare

On Wed, 30 Dec 2020 16:15:47 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/12/29 下午9:35, Halil Pasic wrote:
> > On Mon, 28 Dec 2020 15:01:57 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> Some part of the virtio has enforced an asynchronous interface during reset:
> >>
> >> For MMIO the spec said:
> >>
> >> """
> >>
> >> To stop using the queue the driver MUST write zero (0x0) to this
> >> QueueReady and MUST read the value back to ensure synchronization.
> >>
> >> """  
> > I read the MMIO quote like a single read is sufficient to ensure
> > synchronization. I.e. it does not require a loop which waits for
> > the read to yield the expected value.
> >  
> >> For PCI it said:
> >>
> >> """
> >>
> >> After writing 0 to device_status, the driver MUST wait for a read of
> >> device_status to return 0 before reinitializing the device.
> >>
> >> """  
> > Yes, this does sound like a loop. And that's what Linux does. But this
> > is transport (PCI) specific. On a spec level, a reset is a distinct
> > operation from setting device status (to 0).  
> 
> 
> I'm not sure or it looks unclear to me for this point.
> 
> E.g the device reset is mentioned in "Device Status Field" belongs to 
> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in 
> "Basic Facilities of a Virtio Device".

I think it should be, just to make clear what a driver-initiated reset
of a device actually resets (and that the method for doing so is
transport-specific.)

> 
> 
> > It just happens to be
> > mapped to the PCI transport as setting the status to 0.  
> 
> 
> MMIO did the same. And it makes sense since using a single API to 
> configure/change device status looks simpler.
> 
> 
> >
> > For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while
> > setting the status is mapped via CCW_CMD_WRITE_STATUS.  
> 
> 
> Yes, but I think actually there's no limitation if we want to tread 0 as 
> a reset command for CCW_CMD_WRITE_STATUS.

I'm not a fan of the "driver writes 0 to status to initiate a device
reset" method, but we are stuck with it. I think it's actually not
working well with two other requirements in the spec:

- "The driver MUST NOT clear a device status bit."
- "The device MUST initialize device status to 0 upon reset." (This
  suggests to me that a zero status is something set by the device as
  the result of a reset request by the driver, and _not_ set by the
  driver.)

Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be
incompatible with older devices, wouldn't it?


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-01-11 18:16                             ` Cornelia Huck
@ 2021-01-12  3:27                               ` Jason Wang
  2021-01-12 12:13                                 ` Cornelia Huck
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2021-01-12  3:27 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment,
	eperezma, sgarzare


On 2021/1/12 上午2:16, Cornelia Huck wrote:
> On Wed, 30 Dec 2020 16:15:47 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/12/29 下午9:35, Halil Pasic wrote:
>>> On Mon, 28 Dec 2020 15:01:57 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> Some part of the virtio has enforced an asynchronous interface during reset:
>>>>
>>>> For MMIO the spec said:
>>>>
>>>> """
>>>>
>>>> To stop using the queue the driver MUST write zero (0x0) to this
>>>> QueueReady and MUST read the value back to ensure synchronization.
>>>>
>>>> """
>>> I read the MMIO quote like a single read is sufficient to ensure
>>> synchronization. I.e. it does not require a loop which waits for
>>> the read to yield the expected value.
>>>   
>>>> For PCI it said:
>>>>
>>>> """
>>>>
>>>> After writing 0 to device_status, the driver MUST wait for a read of
>>>> device_status to return 0 before reinitializing the device.
>>>>
>>>> """
>>> Yes, this does sound like a loop. And that's what Linux does. But this
>>> is transport (PCI) specific. On a spec level, a reset is a distinct
>>> operation from setting device status (to 0).
>>
>> I'm not sure or it looks unclear to me for this point.
>>
>> E.g the device reset is mentioned in "Device Status Field" belongs to
>> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in
>> "Basic Facilities of a Virtio Device".
> I think it should be, just to make clear what a driver-initiated reset
> of a device actually resets (and that the method for doing so is
> transport-specific.)


Then we need some clarifications in the spec. It would be easily imply 
that reset is part of device status after reading "Basic Facilities of a 
Virtio Device".


>
>>
>>> It just happens to be
>>> mapped to the PCI transport as setting the status to 0.
>>
>> MMIO did the same. And it makes sense since using a single API to
>> configure/change device status looks simpler.
>>
>>
>>> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while
>>> setting the status is mapped via CCW_CMD_WRITE_STATUS.
>>
>> Yes, but I think actually there's no limitation if we want to tread 0 as
>> a reset command for CCW_CMD_WRITE_STATUS.
> I'm not a fan of the "driver writes 0 to status to initiate a device
> reset" method, but we are stuck with it. I think it's actually not
> working well with two other requirements in the spec:
>
> - "The driver MUST NOT clear a device status bit."


Yes, that's kind of conflict but it was how PCI works now ...


> - "The device MUST initialize device status to 0 upon reset." (This
>    suggests to me that a zero status is something set by the device as
>    the result of a reset request by the driver, and _not_ set by the
>    driver.)


Not a native speaker but we probably need to define "set" and "clear" 
mean, E.g:

Clear a bit from driver means write with that bit cleared. Set a bit 
from driver means write with that bit set.
Clear a bit from device means read with that be cleared. Set a bit from 
device means read with that bit set.

So a question here is how to trigger the device set as we discussed before.


>
> Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be
> incompatible with older devices, wouldn't it?


Probably, but we can make both methods work. Note that I'm not 
suggesting doing something like this. Just to point out it may work. And 
there's something not clear in the spec.

Thanks


>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-29  9:04                           ` Jason Wang
@ 2021-01-12 10:54                             ` Michael S. Tsirkin
  2021-01-13  3:35                               ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2021-01-12 10:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare

On Tue, Dec 29, 2020 at 04:04:27AM -0500, Jason Wang wrote:
> > > > > However wouldn't it be possible for device to just cancel
> > > > > processing available buffers even if it started processing
> > > > > them? Some buffers could be in indeterminate state
> > > > > (e.g. we might not have a way to know how much data did
> > > > > device have time to write into a buffer).
> > > > > 
> > > > Maybe Jason can answer this one.
> > > 
> > > 
> > > For networking device, it should be possible (packet could be dropped or
> > > duplicated). But I'm not sure it works for other device. E.g for the block
> > > devices that needs to communicate with a remote backend. Waat's more
> > > important, my understadning is that the intermediate state is something
> > > that
> > > we need to avoid (hard to be migrated anyhow).
> > 
> > The difficulty is on the device side though, so why not say: if it wants to
> > flush IO that's up to it.
> 
> Do you mean to introduce a device specific way to flush IO? If yes, it
> actually introduces a new intermediate state implicitly:
> 
> 1) device is stopped
> 2) device is stopped and IO is flushed
> 
> This looks more complicated than a single new state:
> 
> 1) device is stopped and IO is flushed
> 
> Thanks

I thought some more about it, if we call a buffer which was
made available by driver but not yet used by device,
a cancelled buffer, then maybe we can specify
the behaviour by explaining that it should be possible to
retry any cancelled buffer, that is make it available again,
and achieve the same effect as if it was used.
For example, any reads without side effects can be just
ignored even if they might have modified guest memory.

-- 
MST


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-01-12  3:27                               ` Jason Wang
@ 2021-01-12 12:13                                 ` Cornelia Huck
  2021-01-13  2:52                                   ` Jason Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Cornelia Huck @ 2021-01-12 12:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment,
	eperezma, sgarzare

On Tue, 12 Jan 2021 11:27:20 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2021/1/12 上午2:16, Cornelia Huck wrote:
> > On Wed, 30 Dec 2020 16:15:47 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2020/12/29 下午9:35, Halil Pasic wrote:  
> >>> On Mon, 28 Dec 2020 15:01:57 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> Some part of the virtio has enforced an asynchronous interface during reset:
> >>>>
> >>>> For MMIO the spec said:
> >>>>
> >>>> """
> >>>>
> >>>> To stop using the queue the driver MUST write zero (0x0) to this
> >>>> QueueReady and MUST read the value back to ensure synchronization.
> >>>>
> >>>> """  
> >>> I read the MMIO quote like a single read is sufficient to ensure
> >>> synchronization. I.e. it does not require a loop which waits for
> >>> the read to yield the expected value.
> >>>     
> >>>> For PCI it said:
> >>>>
> >>>> """
> >>>>
> >>>> After writing 0 to device_status, the driver MUST wait for a read of
> >>>> device_status to return 0 before reinitializing the device.
> >>>>
> >>>> """  
> >>> Yes, this does sound like a loop. And that's what Linux does. But this
> >>> is transport (PCI) specific. On a spec level, a reset is a distinct
> >>> operation from setting device status (to 0).  
> >>
> >> I'm not sure or it looks unclear to me for this point.
> >>
> >> E.g the device reset is mentioned in "Device Status Field" belongs to
> >> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in
> >> "Basic Facilities of a Virtio Device".  
> > I think it should be, just to make clear what a driver-initiated reset
> > of a device actually resets (and that the method for doing so is
> > transport-specific.)  
> 
> 
> Then we need some clarifications in the spec. It would be easily imply 
> that reset is part of device status after reading "Basic Facilities of a 
> Virtio Device".

I can try to come up with a patch.

> 
> 
> >  
> >>  
> >>> It just happens to be
> >>> mapped to the PCI transport as setting the status to 0.  
> >>
> >> MMIO did the same. And it makes sense since using a single API to
> >> configure/change device status looks simpler.
> >>
> >>  
> >>> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while
> >>> setting the status is mapped via CCW_CMD_WRITE_STATUS.  
> >>
> >> Yes, but I think actually there's no limitation if we want to tread 0 as
> >> a reset command for CCW_CMD_WRITE_STATUS.  
> > I'm not a fan of the "driver writes 0 to status to initiate a device
> > reset" method, but we are stuck with it. I think it's actually not
> > working well with two other requirements in the spec:
> >
> > - "The driver MUST NOT clear a device status bit."  
> 
> 
> Yes, that's kind of conflict but it was how PCI works now ...

Nod.

> 
> 
> > - "The device MUST initialize device status to 0 upon reset." (This
> >    suggests to me that a zero status is something set by the device as
> >    the result of a reset request by the driver, and _not_ set by the
> >    driver.)  
> 
> 
> Not a native speaker but we probably need to define "set" and "clear" 
> mean, E.g:
> 
> Clear a bit from driver means write with that bit cleared. Set a bit 
> from driver means write with that bit set.

Agreed.

> Clear a bit from device means read with that be cleared. Set a bit from 
> device means read with that bit set.

I always saw the status as an actual part of the device. I.e., the
device setting or clearing a bit means that the device is modifying the
status in the device. The driver doing a read will get the status as it
currently is within the device. IOW, the device "controls" the status,
and the driver can submit changes to the status.

[Maybe that's closer to how ccw operates with its commands for reading
or writing the status. Are pci/mmio envisioning the status more like
some kind of shared area?]

> 
> So a question here is how to trigger the device set as we discussed before.
> 
> 
> >
> > Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be
> > incompatible with older devices, wouldn't it?  
> 
> 
> Probably, but we can make both methods work. Note that I'm not 
> suggesting doing something like this. Just to point out it may work. And 
> there's something not clear in the spec.

I'm still not convinced that's the way to go. But first, we should be
clear on how the status really is supposed to work.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-01-12 12:13                                 ` Cornelia Huck
@ 2021-01-13  2:52                                   ` Jason Wang
  2021-01-14 12:00                                     ` Cornelia Huck
  0 siblings, 1 reply; 43+ messages in thread
From: Jason Wang @ 2021-01-13  2:52 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment,
	eperezma, sgarzare


On 2021/1/12 下午8:13, Cornelia Huck wrote:
> On Tue, 12 Jan 2021 11:27:20 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2021/1/12 上午2:16, Cornelia Huck wrote:
>>> On Wed, 30 Dec 2020 16:15:47 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> On 2020/12/29 下午9:35, Halil Pasic wrote:
>>>>> On Mon, 28 Dec 2020 15:01:57 +0800
>>>>> Jason Wang <jasowang@redhat.com> wrote:
>>>>>      
>>>>>> Some part of the virtio has enforced an asynchronous interface during reset:
>>>>>>
>>>>>> For MMIO the spec said:
>>>>>>
>>>>>> """
>>>>>>
>>>>>> To stop using the queue the driver MUST write zero (0x0) to this
>>>>>> QueueReady and MUST read the value back to ensure synchronization.
>>>>>>
>>>>>> """
>>>>> I read the MMIO quote like a single read is sufficient to ensure
>>>>> synchronization. I.e. it does not require a loop which waits for
>>>>> the read to yield the expected value.
>>>>>      
>>>>>> For PCI it said:
>>>>>>
>>>>>> """
>>>>>>
>>>>>> After writing 0 to device_status, the driver MUST wait for a read of
>>>>>> device_status to return 0 before reinitializing the device.
>>>>>>
>>>>>> """
>>>>> Yes, this does sound like a loop. And that's what Linux does. But this
>>>>> is transport (PCI) specific. On a spec level, a reset is a distinct
>>>>> operation from setting device status (to 0).
>>>> I'm not sure or it looks unclear to me for this point.
>>>>
>>>> E.g the device reset is mentioned in "Device Status Field" belongs to
>>>> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in
>>>> "Basic Facilities of a Virtio Device".
>>> I think it should be, just to make clear what a driver-initiated reset
>>> of a device actually resets (and that the method for doing so is
>>> transport-specific.)
>>
>> Then we need some clarifications in the spec. It would be easily imply
>> that reset is part of device status after reading "Basic Facilities of a
>> Virtio Device".
> I can try to come up with a patch.


Thanks


>
>>
>>>   
>>>>   
>>>>> It just happens to be
>>>>> mapped to the PCI transport as setting the status to 0.
>>>> MMIO did the same. And it makes sense since using a single API to
>>>> configure/change device status looks simpler.
>>>>
>>>>   
>>>>> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while
>>>>> setting the status is mapped via CCW_CMD_WRITE_STATUS.
>>>> Yes, but I think actually there's no limitation if we want to tread 0 as
>>>> a reset command for CCW_CMD_WRITE_STATUS.
>>> I'm not a fan of the "driver writes 0 to status to initiate a device
>>> reset" method, but we are stuck with it. I think it's actually not
>>> working well with two other requirements in the spec:
>>>
>>> - "The driver MUST NOT clear a device status bit."
>>
>> Yes, that's kind of conflict but it was how PCI works now ...
> Nod.
>
>>
>>> - "The device MUST initialize device status to 0 upon reset." (This
>>>     suggests to me that a zero status is something set by the device as
>>>     the result of a reset request by the driver, and _not_ set by the
>>>     driver.)
>>
>> Not a native speaker but we probably need to define "set" and "clear"
>> mean, E.g:
>>
>> Clear a bit from driver means write with that bit cleared. Set a bit
>> from driver means write with that bit set.
> Agreed.
>
>> Clear a bit from device means read with that be cleared. Set a bit from
>> device means read with that bit set.
> I always saw the status as an actual part of the device. I.e., the
> device setting or clearing a bit means that the device is modifying the
> status in the device. The driver doing a read will get the status as it
> currently is within the device. IOW, the device "controls" the status,
> and the driver can submit changes to the status.
>
> [Maybe that's closer to how ccw operates with its commands for reading
> or writing the status. Are pci/mmio envisioning the status more like
> some kind of shared area?]


My understanding is it's not a shared area, each MMIO read and write 
will generate a PCI transaction on the bus.

Some developers upstream told me that in the PCIE Endpoint device, the 
MMIO area could be a shared memory from the view of endpoint device 
which makes it very hard to make current virtio-pci transport work[1].

[1] 
https://lore.kernel.org/linux-pci/20190827180114.5979-1-haotian.wang@sifive.com/T/ 



>
>> So a question here is how to trigger the device set as we discussed before.
>>
>>
>>> Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be
>>> incompatible with older devices, wouldn't it?
>>
>> Probably, but we can make both methods work. Note that I'm not
>> suggesting doing something like this. Just to point out it may work. And
>> there's something not clear in the spec.
> I'm still not convinced that's the way to go. But first, we should be
> clear on how the status really is supposed to work.


Right, I meant we had two ways to make the implement aligned with the spec:

1) implement zero write as reset (and maybe keep the old way), so we 
don't need to modify the spec
2) clarify that transport specific device reset is part of the basic 
facilities

Either is fine I think.

Thanks


>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-01-12 10:54                             ` Michael S. Tsirkin
@ 2021-01-13  3:35                               ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2021-01-13  3:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare


On 2021/1/12 下午6:54, Michael S. Tsirkin wrote:
> On Tue, Dec 29, 2020 at 04:04:27AM -0500, Jason Wang wrote:
>>>>>> However wouldn't it be possible for device to just cancel
>>>>>> processing available buffers even if it started processing
>>>>>> them? Some buffers could be in indeterminate state
>>>>>> (e.g. we might not have a way to know how much data did
>>>>>> device have time to write into a buffer).
>>>>>>
>>>>> Maybe Jason can answer this one.
>>>>
>>>> For networking device, it should be possible (packet could be dropped or
>>>> duplicated). But I'm not sure it works for other device. E.g for the block
>>>> devices that needs to communicate with a remote backend. Waat's more
>>>> important, my understadning is that the intermediate state is something
>>>> that
>>>> we need to avoid (hard to be migrated anyhow).
>>> The difficulty is on the device side though, so why not say: if it wants to
>>> flush IO that's up to it.
>> Do you mean to introduce a device specific way to flush IO? If yes, it
>> actually introduces a new intermediate state implicitly:
>>
>> 1) device is stopped
>> 2) device is stopped and IO is flushed
>>
>> This looks more complicated than a single new state:
>>
>> 1) device is stopped and IO is flushed
>>
>> Thanks
> I thought some more about it, if we call a buffer which was
> made available by driver but not yet used by device,


For "used by device" you meant posting entries to used ring?


> a cancelled buffer, then maybe we can specify
> the behaviour by explaining that it should be possible to
> retry any cancelled buffer, that is make it available again,
> and achieve the same effect as if it was used.
> For example, any reads without side effects can be just
> ignored even if they might have modified guest memory.


Right, but how about the buffer that can't be retried? Or you meant we 
leave that to a specific device to handle?

THanks


>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-01-13  2:52                                   ` Jason Wang
@ 2021-01-14 12:00                                     ` Cornelia Huck
  0 siblings, 0 replies; 43+ messages in thread
From: Cornelia Huck @ 2021-01-14 12:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment,
	eperezma, sgarzare

On Wed, 13 Jan 2021 10:52:32 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2021/1/12 下午8:13, Cornelia Huck wrote:
> > On Tue, 12 Jan 2021 11:27:20 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2021/1/12 上午2:16, Cornelia Huck wrote:  
> >>> On Wed, 30 Dec 2020 16:15:47 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> On 2020/12/29 下午9:35, Halil Pasic wrote:  
> >>>>> On Mon, 28 Dec 2020 15:01:57 +0800
> >>>>> Jason Wang <jasowang@redhat.com> wrote:
> >>>>>        
> >>>>>> Some part of the virtio has enforced an asynchronous interface during reset:
> >>>>>>
> >>>>>> For MMIO the spec said:
> >>>>>>
> >>>>>> """
> >>>>>>
> >>>>>> To stop using the queue the driver MUST write zero (0x0) to this
> >>>>>> QueueReady and MUST read the value back to ensure synchronization.
> >>>>>>
> >>>>>> """  
> >>>>> I read the MMIO quote like a single read is sufficient to ensure
> >>>>> synchronization. I.e. it does not require a loop which waits for
> >>>>> the read to yield the expected value.
> >>>>>        
> >>>>>> For PCI it said:
> >>>>>>
> >>>>>> """
> >>>>>>
> >>>>>> After writing 0 to device_status, the driver MUST wait for a read of
> >>>>>> device_status to return 0 before reinitializing the device.
> >>>>>>
> >>>>>> """  
> >>>>> Yes, this does sound like a loop. And that's what Linux does. But this
> >>>>> is transport (PCI) specific. On a spec level, a reset is a distinct
> >>>>> operation from setting device status (to 0).  
> >>>> I'm not sure or it looks unclear to me for this point.
> >>>>
> >>>> E.g the device reset is mentioned in "Device Status Field" belongs to
> >>>> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in
> >>>> "Basic Facilities of a Virtio Device".  
> >>> I think it should be, just to make clear what a driver-initiated reset
> >>> of a device actually resets (and that the method for doing so is
> >>> transport-specific.)  
> >>
> >> Then we need some clarifications in the spec. It would be easily imply
> >> that reset is part of device status after reading "Basic Facilities of a
> >> Virtio Device".  
> > I can try to come up with a patch.  
> 
> 
> Thanks

Attempt here:

https://lists.oasis-open.org/archives/virtio-comment/202101/msg00020.html


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-18  4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang
  2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella
  2020-12-21 21:33 ` [virtio-comment] " Halil Pasic
@ 2021-05-03  9:02 ` Eugenio Perez Martin
  2021-05-06  2:51   ` Jason Wang
  2021-05-05 13:16 ` Michael S. Tsirkin
  3 siblings, 1 reply; 43+ messages in thread
From: Eugenio Perez Martin @ 2021-05-03  9:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Stefan Hajnoczi, virtio-comment, Michael Tsirkin, Stefano Garzarella

On Fri, Dec 18, 2020 at 5:23 AM Jason Wang <jasowang@redhat.com> wrote:
>
> This patch introduces a new status bit DEVICE_STOPPED. This will be
> used by the driver to stop and resume a device. The main user will be
> live migration support for virtio device.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  content.tex | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/content.tex b/content.tex
> index 61eab41..4392b60 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>
> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> +  indicates that the device has been stopped by the driver.
> +
>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \ref{sec:General Initialization And Device Operation / Device
>  Initialization}.
>  The driver MUST NOT clear a
> -\field{device status} bit.  If the driver sets the FAILED bit,
> -the driver MUST later reset the device before attempting to re-initialize.
> +\field{device status} bit other than DEVICE_STOPPED.  If the
> +driver sets the FAILED bit, the driver MUST later reset the device
> +before attempting to re-initialize.
>
>  The driver SHOULD NOT rely on completion of operations of a
>  device if DEVICE_NEEDS_RESET is set.
> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  recover by issuing a reset.
>  \end{note}
>
> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
> +first and re-read status to check whether DEVICE_STOPPED is set by the
> +device. In order to resume the device, the driver MUST clear
> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
> +is cleared by the device.
> +
>  \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>  The device MUST initialize \field{device status} to 0 upon reset.
>
>  The device MUST NOT consume buffers or send any used buffer
>  notifications to the driver before DRIVER_OK.
>
> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
> +
> +When driver is trying to set DEVICE_STOPPED, the device MUST not
> +process new avail requests and MUST complete all requests that is
> +currently processing before setting DEVICE_STOPPED.
> +
> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
> +set.
> +
>  \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>  that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>  MUST send a device configuration change notification to the driver.
> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>    that the driver passes extra data (besides identifying the virtqueue)
>    in its device notifications.
> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
> +  driver can stop and resume the device.
>    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>  \end{description}
>
> --
> 2.25.1
>

Hi!

After implementing a small POC for this in QEMU/virtio-pci vdpa
device, I think that the standard should specify what state the driver
[1] is allowed to change, if any, and what fields are valid for the
driver to query (like device status or virtqueue state).

As a proposal, I think all the device config status should freeze (as
"the device is not allowed to change") when paused, and the driver
must be able to read all of it.

Regarding writing to the device config, the straightforward solution
is to not allow them, and mandate the driver to go through a full
reset cycle in my opinion. The only exception should be the queue
selector: Writing to it by the driver has the obvious consequences.
This way, the device does not need to check for changes on the resume,
it could be tricky. However maybe there are some use cases where to
change just a field through this method could be faster than the whole
device reset.

I can think in another approach: The device supports to write some
fields (for example, just virtqueue addresses and virtqueue size, or
some feature flags), but any other change would lead to the device to
report DEVICE_NEEDS_RESET. This has the mentioned advantages of
reducing latencies (as transactions) in case of device re-usage, but
at the cost of increasing complexity. Any opinion about if it is worth
it?

Thanks!

[1] From the device's point of view, so I mean QEMU here.


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2020-12-18  4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang
                   ` (2 preceding siblings ...)
  2021-05-03  9:02 ` [virtio-comment] " Eugenio Perez Martin
@ 2021-05-05 13:16 ` Michael S. Tsirkin
  2021-05-06  7:26   ` Jason Wang
  3 siblings, 1 reply; 43+ messages in thread
From: Michael S. Tsirkin @ 2021-05-05 13:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: stefanha, virtio-comment, eperezma, sgarzare

On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote:
> This patch introduces a new status bit DEVICE_STOPPED. This will be
> used by the driver to stop and resume a device. The main user will be
> live migration support for virtio device.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I have been thinking about this whole approach.

While yes it's possible to have the hypervisor poke
at the device while device is not reset, e.g. by
blocking guest access to device, while device can
still DMA into guest memory, this is at least unusual
and seems somewhat fragile.

I think normally live migration would belong in a PF as opposed to the VF.

As such, how about for starters we prototype this
as part of a vendor specific capability of a PF?
This will also show-case how to add vendor specific capabilities
to devices.


Longer term it would be nice to come up with all kind of PF
related functionality in the spec too, but I think that should
be part of a more comprehensive/extensible approach
since it's not limited to live migration concerns.
For example the way qemu handles control vq of a nic
in software creates problems, couldn't we inject
the relevant commands through PF somehow?


> ---
>  content.tex | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/content.tex b/content.tex
> index 61eab41..4392b60 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>    drive the device.
>  
> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
> +  indicates that the device has been stopped by the driver.
> +
>  \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>    an error from which it can't recover.
>  \end{description}
> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  \ref{sec:General Initialization And Device Operation / Device
>  Initialization}.
>  The driver MUST NOT clear a
> -\field{device status} bit.  If the driver sets the FAILED bit,
> -the driver MUST later reset the device before attempting to re-initialize.
> +\field{device status} bit other than DEVICE_STOPPED.  If the
> +driver sets the FAILED bit, the driver MUST later reset the device
> +before attempting to re-initialize.
>  
>  The driver SHOULD NOT rely on completion of operations of a
>  device if DEVICE_NEEDS_RESET is set.
> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>  recover by issuing a reset.
>  \end{note}
>  
> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
> +first and re-read status to check whether DEVICE_STOPPED is set by the
> +device. In order to resume the device, the driver MUST clear
> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
> +is cleared by the device.
> +
>  \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>  The device MUST initialize \field{device status} to 0 upon reset.
>  
>  The device MUST NOT consume buffers or send any used buffer
>  notifications to the driver before DRIVER_OK.
>  
> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
> +
> +When driver is trying to set DEVICE_STOPPED, the device MUST not
> +process new avail requests and MUST complete all requests that is
> +currently processing before setting DEVICE_STOPPED.
> +
> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
> +set.
> +
>  \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>  that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>  MUST send a device configuration change notification to the driver.
> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>    \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>    that the driver passes extra data (besides identifying the virtqueue)
>    in its device notifications.
> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
> +  driver can stop and resume the device.
>    See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>  \end{description}
>  
> -- 
> 2.25.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-05-03  9:02 ` [virtio-comment] " Eugenio Perez Martin
@ 2021-05-06  2:51   ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2021-05-06  2:51 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Stefan Hajnoczi, virtio-comment, Michael Tsirkin, Stefano Garzarella


在 2021/5/3 下午5:02, Eugenio Perez Martin 写道:
> On Fri, Dec 18, 2020 at 5:23 AM Jason Wang <jasowang@redhat.com> wrote:
>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>> used by the driver to stop and resume a device. The main user will be
>> live migration support for virtio device.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   content.tex | 26 ++++++++++++++++++++++++--
>>   1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/content.tex b/content.tex
>> index 61eab41..4392b60 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>
>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>> +  indicates that the device has been stopped by the driver.
>> +
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \ref{sec:General Initialization And Device Operation / Device
>>   Initialization}.
>>   The driver MUST NOT clear a
>> -\field{device status} bit.  If the driver sets the FAILED bit,
>> -the driver MUST later reset the device before attempting to re-initialize.
>> +\field{device status} bit other than DEVICE_STOPPED.  If the
>> +driver sets the FAILED bit, the driver MUST later reset the device
>> +before attempting to re-initialize.
>>
>>   The driver SHOULD NOT rely on completion of operations of a
>>   device if DEVICE_NEEDS_RESET is set.
>> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   recover by issuing a reset.
>>   \end{note}
>>
>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>> +first and re-read status to check whether DEVICE_STOPPED is set by the
>> +device. In order to resume the device, the driver MUST clear
>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>> +is cleared by the device.
>> +
>>   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>   The device MUST initialize \field{device status} to 0 upon reset.
>>
>>   The device MUST NOT consume buffers or send any used buffer
>>   notifications to the driver before DRIVER_OK.
>>
>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>> +
>> +When driver is trying to set DEVICE_STOPPED, the device MUST not
>> +process new avail requests and MUST complete all requests that is
>> +currently processing before setting DEVICE_STOPPED.
>> +
>> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
>> +set.
>> +
>>   \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>>   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>   MUST send a device configuration change notification to the driver.
>> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>>     that the driver passes extra data (besides identifying the virtqueue)
>>     in its device notifications.
>> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>> +  driver can stop and resume the device.
>>     See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>>   \end{description}
>>
>> --
>> 2.25.1
>>
> Hi!
>
> After implementing a small POC for this in QEMU/virtio-pci vdpa
> device, I think that the standard should specify what state the driver
> [1] is allowed to change, if any, and what fields are valid for the
> driver to query (like device status or virtqueue state).
>
> As a proposal, I think all the device config status should freeze (as
> "the device is not allowed to change") when paused, and the driver
> must be able to read all of it.


Yes.


>
> Regarding writing to the device config, the straightforward solution
> is to not allow them, and mandate the driver to go through a full
> reset cycle in my opinion. The only exception should be the queue
> selector: Writing to it by the driver has the obvious consequences.
> This way, the device does not need to check for changes on the resume,
> it could be tricky. However maybe there are some use cases where to
> change just a field through this method could be faster than the whole
> device reset.
>
> I can think in another approach: The device supports to write some
> fields (for example, just virtqueue addresses and virtqueue size, or
> some feature flags), but any other change would lead to the device to
> report DEVICE_NEEDS_RESET. This has the mentioned advantages of
> reducing latencies (as transactions) in case of device re-usage, but
> at the cost of increasing complexity. Any opinion about if it is worth
> it?


I'm not sure we need to do this, and actually it's not specific to the 
stop state.

A more generic question is: do we allow the driver to change 
device/virtqueue status after DRIVER_OK is set? If yes, it's probably 
less interesting in limiting the behavior when stop is set. If not, we 
should clarify it and there's no need to teat stop as a specific case to 
deal with.

Thanks


>
> Thanks!
>
> [1] From the device's point of view, so I mean QEMU here.
>
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP
  2021-05-05 13:16 ` Michael S. Tsirkin
@ 2021-05-06  7:26   ` Jason Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Jason Wang @ 2021-05-06  7:26 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: stefanha, virtio-comment, eperezma, sgarzare


在 2021/5/5 下午9:16, Michael S. Tsirkin 写道:
> On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote:
>> This patch introduces a new status bit DEVICE_STOPPED. This will be
>> used by the driver to stop and resume a device. The main user will be
>> live migration support for virtio device.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> I have been thinking about this whole approach.
>
> While yes it's possible to have the hypervisor poke
> at the device while device is not reset, e.g. by
> blocking guest access to device, while device can
> still DMA into guest memory, this is at least unusual
> and seems somewhat fragile.


So in the context of stop, I wonder if we can just forbid the DMA in 
this case?


>
> I think normally live migration would belong in a PF as opposed to the VF.
>
> As such, how about for starters we prototype this
> as part of a vendor specific capability of a PF?


Just to make sure I understand the proposal. Except for the differnt 
control path, I don't see any other differences:

Consider we use vendor specific capability, we still need to define the 
semantics like the VIRTIO_F_DEVICE_STOP in this case. E.g as you said, 
quiescence the DMA and other stuffs.


> This will also show-case how to add vendor specific capabilities
> to devices.
>
>
> Longer term it would be nice to come up with all kind of PF
> related functionality in the spec too, but I think that should
> be part of a more comprehensive/extensible approach
> since it's not limited to live migration concerns.
> For example the way qemu handles control vq of a nic
> in software creates problems, couldn't we inject
> the relevant commands through PF somehow?


Yes, as we discussed in the past, I plan to send a RFC to introduce the 
admin virtqueue for PF. It could be used for such extensions.

Thanks


>
>
>> ---
>>   content.tex | 26 ++++++++++++++++++++++++--
>>   1 file changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/content.tex b/content.tex
>> index 61eab41..4392b60 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to
>>     drive the device.
>>   
>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated,
>> +  indicates that the device has been stopped by the driver.
>> +
>>   \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced
>>     an error from which it can't recover.
>>   \end{description}
>> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   \ref{sec:General Initialization And Device Operation / Device
>>   Initialization}.
>>   The driver MUST NOT clear a
>> -\field{device status} bit.  If the driver sets the FAILED bit,
>> -the driver MUST later reset the device before attempting to re-initialize.
>> +\field{device status} bit other than DEVICE_STOPPED.  If the
>> +driver sets the FAILED bit, the driver MUST later reset the device
>> +before attempting to re-initialize.
>>   
>>   The driver SHOULD NOT rely on completion of operations of a
>>   device if DEVICE_NEEDS_RESET is set.
>> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev
>>   recover by issuing a reset.
>>   \end{note}
>>   
>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not
>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED
>> +first and re-read status to check whether DEVICE_STOPPED is set by the
>> +device. In order to resume the device, the driver MUST clear
>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED
>> +is cleared by the device.
>> +
>>   \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field}
>>   The device MUST initialize \field{device status} to 0 upon reset.
>>   
>>   The device MUST NOT consume buffers or send any used buffer
>>   notifications to the driver before DRIVER_OK.
>>   
>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set.
>> +
>> +When driver is trying to set DEVICE_STOPPED, the device MUST not
>> +process new avail requests and MUST complete all requests that is
>> +currently processing before setting DEVICE_STOPPED.
>> +
>> +The device MUST keep the config space unchanged when DEVICE_STOPPED is
>> +set.
>> +
>>   \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state
>>   that a reset is needed.  If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device
>>   MUST send a device configuration change notification to the driver.
>> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>     \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates
>>     that the driver passes extra data (besides identifying the virtqueue)
>>     in its device notifications.
>> +  \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the
>> +  driver can stop and resume the device.
>>     See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}.
>>   \end{description}
>>   
>> -- 
>> 2.25.1


This publicly archived list offers a means to provide input to the
OASIS Virtual I/O Device (VIRTIO) TC.

In order to verify user consent to the Feedback License terms and
to minimize spam in the list archive, subscription is required
before posting.

Subscribe: virtio-comment-subscribe@lists.oasis-open.org
Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
List help: virtio-comment-help@lists.oasis-open.org
List archive: https://lists.oasis-open.org/archives/virtio-comment/
Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
Committee: https://www.oasis-open.org/committees/virtio/
Join OASIS: https://www.oasis-open.org/join/


^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2021-05-06  7:26 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-18  4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang
2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella
2020-12-21  3:08   ` Jason Wang
2020-12-21 11:06     ` Stefano Garzarella
2020-12-22  2:38       ` Jason Wang
2020-12-21 21:33 ` [virtio-comment] " Halil Pasic
2020-12-22  2:36   ` Jason Wang
2020-12-22  6:50     ` Halil Pasic
2020-12-22  7:30       ` Jason Wang
2020-12-22 12:14         ` Cornelia Huck
2020-12-22 12:51           ` Jason Wang
2020-12-22 15:54             ` Cornelia Huck
2020-12-23  2:48               ` Jason Wang
2020-12-25  7:38                 ` Halil Pasic
2020-12-27 10:00                   ` Michael S. Tsirkin
2020-12-28  6:21                     ` Halil Pasic
2020-12-28  7:01                       ` Jason Wang
2020-12-28 12:30                         ` Michael S. Tsirkin
2020-12-29  9:04                           ` Jason Wang
2021-01-12 10:54                             ` Michael S. Tsirkin
2021-01-13  3:35                               ` Jason Wang
2020-12-29 13:35                         ` Halil Pasic
2020-12-30  8:15                           ` Jason Wang
2021-01-11 18:16                             ` Cornelia Huck
2021-01-12  3:27                               ` Jason Wang
2021-01-12 12:13                                 ` Cornelia Huck
2021-01-13  2:52                                   ` Jason Wang
2021-01-14 12:00                                     ` Cornelia Huck
2020-12-28  6:47                   ` Jason Wang
2020-12-29 13:20                     ` Halil Pasic
2020-12-30  8:03                       ` Jason Wang
2020-12-24  4:52             ` Halil Pasic
2020-12-24  5:51               ` Jason Wang
2020-12-25  3:18                 ` Halil Pasic
2020-12-25  6:45                   ` Jason Wang
2020-12-27 11:12                     ` Michael S. Tsirkin
2020-12-28  7:05                       ` Jason Wang
2020-12-28 12:27                         ` Michael S. Tsirkin
2020-12-29  8:57                           ` Jason Wang
2021-05-03  9:02 ` [virtio-comment] " Eugenio Perez Martin
2021-05-06  2:51   ` Jason Wang
2021-05-05 13:16 ` Michael S. Tsirkin
2021-05-06  7:26   ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.