* [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP @ 2020-12-18 4:23 Jason Wang 2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella ` (3 more replies) 0 siblings, 4 replies; 43+ messages in thread From: Jason Wang @ 2020-12-18 4:23 UTC (permalink / raw) To: stefanha, virtio-comment, mst; +Cc: eperezma, sgarzare, Jason Wang This patch introduces a new status bit DEVICE_STOPPED. This will be used by the driver to stop and resume a device. The main user will be live migration support for virtio device. Signed-off-by: Jason Wang <jasowang@redhat.com> --- content.tex | 26 ++++++++++++++++++++++++-- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/content.tex b/content.tex index 61eab41..4392b60 100644 --- a/content.tex +++ b/content.tex @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to drive the device. +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, + indicates that the device has been stopped by the driver. + \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced an error from which it can't recover. \end{description} @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev \ref{sec:General Initialization And Device Operation / Device Initialization}. The driver MUST NOT clear a -\field{device status} bit. If the driver sets the FAILED bit, -the driver MUST later reset the device before attempting to re-initialize. +\field{device status} bit other than DEVICE_STOPPED. If the +driver sets the FAILED bit, the driver MUST later reset the device +before attempting to re-initialize. The driver SHOULD NOT rely on completion of operations of a device if DEVICE_NEEDS_RESET is set. @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev recover by issuing a reset. \end{note} +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not +set. In order to stop the device, the driver MUST set DEVICE_STOPPED +first and re-read status to check whether DEVICE_STOPPED is set by the +device. In order to resume the device, the driver MUST clear +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED +is cleared by the device. + \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} The device MUST initialize \field{device status} to 0 upon reset. The device MUST NOT consume buffers or send any used buffer notifications to the driver before DRIVER_OK. +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. + +When driver is trying to set DEVICE_STOPPED, the device MUST not +process new avail requests and MUST complete all requests that is +currently processing before setting DEVICE_STOPPED. + +The device MUST keep the config space unchanged when DEVICE_STOPPED is +set. + \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device MUST send a device configuration change notification to the driver. @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates that the driver passes extra data (besides identifying the virtqueue) in its device notifications. + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the + driver can stop and resume the device. See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. \end{description} -- 2.25.1 This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-18 4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang @ 2020-12-18 10:15 ` Stefano Garzarella 2020-12-21 3:08 ` Jason Wang 2020-12-21 21:33 ` [virtio-comment] " Halil Pasic ` (2 subsequent siblings) 3 siblings, 1 reply; 43+ messages in thread From: Stefano Garzarella @ 2020-12-18 10:15 UTC (permalink / raw) To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote: >This patch introduces a new status bit DEVICE_STOPPED. This will be >used by the driver to stop and resume a device. The main user will be >live migration support for virtio device. > >Signed-off-by: Jason Wang <jasowang@redhat.com> >--- > content.tex | 26 ++++++++++++++++++++++++-- > 1 file changed, 24 insertions(+), 2 deletions(-) > >diff --git a/content.tex b/content.tex >index 61eab41..4392b60 100644 >--- a/content.tex >+++ b/content.tex >@@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > drive the device. > >+\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >+ indicates that the device has been stopped by the driver. >+ Just for curiosity, why 32 and not 16? Is there any rule for high bits and low bits of Device Status field? > \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has > experienced > an error from which it can't recover. > \end{description} >@@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \ref{sec:General Initialization And Device Operation / Device > Initialization}. > The driver MUST NOT clear a >-\field{device status} bit. If the driver sets the FAILED bit, >-the driver MUST later reset the device before attempting to re-initialize. >+\field{device status} bit other than DEVICE_STOPPED. If the >+driver sets the FAILED bit, the driver MUST later reset the device >+before attempting to re-initialize. > > The driver SHOULD NOT rely on completion of operations of a > device if DEVICE_NEEDS_RESET is set. >@@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > recover by issuing a reset. > \end{note} > >+The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >+set. In order to stop the device, the driver MUST set DEVICE_STOPPED >+first and re-read status to check whether DEVICE_STOPPED is set by the >+device. In order to resume the device, the driver MUST clear >+DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >+is cleared by the device. >+ > \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} > The device MUST initialize \field{device status} to 0 upon reset. > > The device MUST NOT consume buffers or send any used buffer > notifications to the driver before DRIVER_OK. > >+The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >+ >+When driver is trying to set DEVICE_STOPPED, the device MUST not >+process new avail requests and MUST complete all requests that is >+currently processing before setting DEVICE_STOPPED. >+ >+The device MUST keep the config space unchanged when DEVICE_STOPPED is >+set. Maybe here we can specify if "DEVICE_STOPPED is set by the driver" or when the device set it after completing all the requests. >+ > \label{sec:Basic Facilities of a Virtio Device / Device Status Field / > DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it > enters an error state > that a reset is needed. If DRIVER_OK is set, after it sets > DEVICE_NEEDS_RESET, the device > MUST send a device configuration change notification to the driver. >@@ -6553,6 +6573,8 @@ \chapter{Reserved Feature >Bits}\label{sec:Reserved Feature Bits} > \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates > that the driver passes extra data (besides identifying the > virtqueue) > in its device notifications. >+ \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >+ driver can stop and resume the device. > See \ref{sec:Virtqueues / Driver > notifications}~\nameref{sec:Virtqueues / Driver notifications}. > \end{description} > Since the completion of in-flight requests can take time (I'm thinking of block devices), should we consider a notification when the device completes all the pending requests and set DEVICE_STOPPED. Something similar to the notification when DEVICE_NEEDS_RESET is set. Thanks, Stefano This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella @ 2020-12-21 3:08 ` Jason Wang 2020-12-21 11:06 ` Stefano Garzarella 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-21 3:08 UTC (permalink / raw) To: Stefano Garzarella; +Cc: stefanha, virtio-comment, mst, eperezma On 2020/12/18 下午6:15, Stefano Garzarella wrote: > On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote: >> This patch introduces a new status bit DEVICE_STOPPED. This will be >> used by the driver to stop and resume a device. The main user will be >> live migration support for virtio device. >> >> Signed-off-by: Jason Wang <jasowang@redhat.com> >> --- >> content.tex | 26 ++++++++++++++++++++++++-- >> 1 file changed, 24 insertions(+), 2 deletions(-) >> >> diff --git a/content.tex b/content.tex >> index 61eab41..4392b60 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -47,6 +47,9 @@ \section{\field{Device Status} >> Field}\label{sec:Basic Facilities of a Virtio Dev >> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >> drive the device. >> >> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >> + indicates that the device has been stopped by the driver. >> + > > Just for curiosity, why 32 and not 16? > Is there any rule for high bits and low bits of Device Status field? I'm not sure. I just spot that DEVICE_NEEDS_RESET (with DEVICE prefix) is using 64. > >> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced >> an error from which it can't recover. >> \end{description} >> @@ -58,8 +61,9 @@ \section{\field{Device Status} >> Field}\label{sec:Basic Facilities of a Virtio Dev >> \ref{sec:General Initialization And Device Operation / Device >> Initialization}. >> The driver MUST NOT clear a >> -\field{device status} bit. If the driver sets the FAILED bit, >> -the driver MUST later reset the device before attempting to >> re-initialize. >> +\field{device status} bit other than DEVICE_STOPPED. If the >> +driver sets the FAILED bit, the driver MUST later reset the device >> +before attempting to re-initialize. >> >> The driver SHOULD NOT rely on completion of operations of a >> device if DEVICE_NEEDS_RESET is set. >> @@ -70,12 +74,28 @@ \section{\field{Device Status} >> Field}\label{sec:Basic Facilities of a Virtio Dev >> recover by issuing a reset. >> \end{note} >> >> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED >> +first and re-read status to check whether DEVICE_STOPPED is set by the >> +device. In order to resume the device, the driver MUST clear >> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >> +is cleared by the device. >> + >> \devicenormative{\subsection}{Device Status Field}{Basic Facilities >> of a Virtio Device / Device Status Field} >> The device MUST initialize \field{device status} to 0 upon reset. >> >> The device MUST NOT consume buffers or send any used buffer >> notifications to the driver before DRIVER_OK. >> >> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >> + >> +When driver is trying to set DEVICE_STOPPED, the device MUST not >> +process new avail requests and MUST complete all requests that is >> +currently processing before setting DEVICE_STOPPED. >> + >> +The device MUST keep the config space unchanged when DEVICE_STOPPED is >> +set. > > Maybe here we can specify if "DEVICE_STOPPED is set by the driver" or > when the device set it after completing all the requests. Good point. Since the bit is offered by device, so "set by the device " might be better. > >> + >> \label{sec:Basic Facilities of a Virtio Device / Device Status Field >> / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it >> enters an error state >> that a reset is needed. If DRIVER_OK is set, after it sets >> DEVICE_NEEDS_RESET, the device >> MUST send a device configuration change notification to the driver. >> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature >> Bits}\label{sec:Reserved Feature Bits} >> \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates >> that the driver passes extra data (besides identifying the virtqueue) >> in its device notifications. >> + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >> + driver can stop and resume the device. >> See \ref{sec:Virtqueues / Driver >> notifications}~\nameref{sec:Virtqueues / Driver notifications}. >> \end{description} >> > > Since the completion of in-flight requests can take time (I'm thinking > of block devices), should we consider a notification when the device > completes all the pending requests and set DEVICE_STOPPED. > > Something similar to the notification when DEVICE_NEEDS_RESET is set. I'm not sure, but it looks even more complicated. Driver may do a periodic polling or fail the operation (like migration) when the request is not completed soon. Thanks > > Thanks, > Stefano > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-21 3:08 ` Jason Wang @ 2020-12-21 11:06 ` Stefano Garzarella 2020-12-22 2:38 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Stefano Garzarella @ 2020-12-21 11:06 UTC (permalink / raw) To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma On Mon, Dec 21, 2020 at 11:08:37AM +0800, Jason Wang wrote: > >On 2020/12/18 下午6:15, Stefano Garzarella wrote: >>On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote: >>>This patch introduces a new status bit DEVICE_STOPPED. This will be >>>used by the driver to stop and resume a device. The main user will be >>>live migration support for virtio device. >>> >>>Signed-off-by: Jason Wang <jasowang@redhat.com> >>>--- >>>content.tex | 26 ++++++++++++++++++++++++-- >>>1 file changed, 24 insertions(+), 2 deletions(-) >>> >>>diff --git a/content.tex b/content.tex >>>index 61eab41..4392b60 100644 >>>--- a/content.tex >>>+++ b/content.tex >>>@@ -47,6 +47,9 @@ \section{\field{Device Status} >>>Field}\label{sec:Basic Facilities of a Virtio Dev >>>\item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >>> drive the device. >>> >>>+\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >>>+ indicates that the device has been stopped by the driver. >>>+ >> >>Just for curiosity, why 32 and not 16? >>Is there any rule for high bits and low bits of Device Status field? > > >I'm not sure. I just spot that DEVICE_NEEDS_RESET (with DEVICE prefix) >is using 64. > > >> >>>\item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced >>> an error from which it can't recover. >>>\end{description} >>>@@ -58,8 +61,9 @@ \section{\field{Device Status} >>>Field}\label{sec:Basic Facilities of a Virtio Dev >>>\ref{sec:General Initialization And Device Operation / Device >>>Initialization}. >>>The driver MUST NOT clear a >>>-\field{device status} bit. If the driver sets the FAILED bit, >>>-the driver MUST later reset the device before attempting to >>>re-initialize. >>>+\field{device status} bit other than DEVICE_STOPPED. If the >>>+driver sets the FAILED bit, the driver MUST later reset the device >>>+before attempting to re-initialize. >>> >>>The driver SHOULD NOT rely on completion of operations of a >>>device if DEVICE_NEEDS_RESET is set. >>>@@ -70,12 +74,28 @@ \section{\field{Device Status} >>>Field}\label{sec:Basic Facilities of a Virtio Dev >>>recover by issuing a reset. >>>\end{note} >>> >>>+The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >>>+set. In order to stop the device, the driver MUST set DEVICE_STOPPED >>>+first and re-read status to check whether DEVICE_STOPPED is set by the >>>+device. In order to resume the device, the driver MUST clear >>>+DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >>>+is cleared by the device. >>>+ >>>\devicenormative{\subsection}{Device Status Field}{Basic >>>Facilities of a Virtio Device / Device Status Field} >>>The device MUST initialize \field{device status} to 0 upon reset. >>> >>>The device MUST NOT consume buffers or send any used buffer >>>notifications to the driver before DRIVER_OK. >>> >>>+The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >>>+ >>>+When driver is trying to set DEVICE_STOPPED, the device MUST not >>>+process new avail requests and MUST complete all requests that is >>>+currently processing before setting DEVICE_STOPPED. >>>+ >>>+The device MUST keep the config space unchanged when DEVICE_STOPPED is >>>+set. >> >>Maybe here we can specify if "DEVICE_STOPPED is set by the driver" >>or when the device set it after completing all the requests. > > >Good point. Since the bit is offered by device, so "set by the device >" might be better. Yeah, I agree. > > >> >>>+ >>>\label{sec:Basic Facilities of a Virtio Device / Device Status >>>Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET >>>when it enters an error state >>>that a reset is needed. If DRIVER_OK is set, after it sets >>>DEVICE_NEEDS_RESET, the device >>>MUST send a device configuration change notification to the driver. >>>@@ -6553,6 +6573,8 @@ \chapter{Reserved Feature >>>Bits}\label{sec:Reserved Feature Bits} >>> \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates >>> that the driver passes extra data (besides identifying the virtqueue) >>> in its device notifications. >>>+ \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >>>+ driver can stop and resume the device. >>> See \ref{sec:Virtqueues / Driver >>>notifications}~\nameref{sec:Virtqueues / Driver notifications}. >>>\end{description} >>> >> >>Since the completion of in-flight requests can take time (I'm >>thinking of block devices), should we consider a notification when >>the device completes all the pending requests and set >>DEVICE_STOPPED. >> >>Something similar to the notification when DEVICE_NEEDS_RESET is set. > > >I'm not sure, but it looks even more complicated. Driver may do a >periodic polling or fail the operation (like migration) when the >request is not completed soon. > I agree that colud be more complicated for the device point of view. Since the driver is asking the device to stop, perhaps it makes sense for it to periodically check the status. In case the driver decides to abort the operation (e.g. migration), can it clear the bit if the device hasn't set it yet or should it wait until the device has stopped? Maybe we should clarify this as well. Thanks, Stefano This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-21 11:06 ` Stefano Garzarella @ 2020-12-22 2:38 ` Jason Wang 0 siblings, 0 replies; 43+ messages in thread From: Jason Wang @ 2020-12-22 2:38 UTC (permalink / raw) To: Stefano Garzarella; +Cc: stefanha, virtio-comment, mst, eperezma On 2020/12/21 下午7:06, Stefano Garzarella wrote: > On Mon, Dec 21, 2020 at 11:08:37AM +0800, Jason Wang wrote: >> >> On 2020/12/18 下午6:15, Stefano Garzarella wrote: >>> On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote: >>>> This patch introduces a new status bit DEVICE_STOPPED. This will be >>>> used by the driver to stop and resume a device. The main user will be >>>> live migration support for virtio device. >>>> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com> >>>> --- >>>> content.tex | 26 ++++++++++++++++++++++++-- >>>> 1 file changed, 24 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/content.tex b/content.tex >>>> index 61eab41..4392b60 100644 >>>> --- a/content.tex >>>> +++ b/content.tex >>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} >>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >>>> drive the device. >>>> >>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is >>>> negotiated, >>>> + indicates that the device has been stopped by the driver. >>>> + >>> >>> Just for curiosity, why 32 and not 16? >>> Is there any rule for high bits and low bits of Device Status field? >> >> >> I'm not sure. I just spot that DEVICE_NEEDS_RESET (with DEVICE >> prefix) is using 64. >> >> >>> >>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has >>>> experienced >>>> an error from which it can't recover. >>>> \end{description} >>>> @@ -58,8 +61,9 @@ \section{\field{Device Status} >>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>> \ref{sec:General Initialization And Device Operation / Device >>>> Initialization}. >>>> The driver MUST NOT clear a >>>> -\field{device status} bit. If the driver sets the FAILED bit, >>>> -the driver MUST later reset the device before attempting to >>>> re-initialize. >>>> +\field{device status} bit other than DEVICE_STOPPED. If the >>>> +driver sets the FAILED bit, the driver MUST later reset the device >>>> +before attempting to re-initialize. >>>> >>>> The driver SHOULD NOT rely on completion of operations of a >>>> device if DEVICE_NEEDS_RESET is set. >>>> @@ -70,12 +74,28 @@ \section{\field{Device Status} >>>> Field}\label{sec:Basic Facilities of a Virtio Dev >>>> recover by issuing a reset. >>>> \end{note} >>>> >>>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >>>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED >>>> +first and re-read status to check whether DEVICE_STOPPED is set by >>>> the >>>> +device. In order to resume the device, the driver MUST clear >>>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >>>> +is cleared by the device. >>>> + >>>> \devicenormative{\subsection}{Device Status Field}{Basic Facilities >>>> of a Virtio Device / Device Status Field} >>>> The device MUST initialize \field{device status} to 0 upon reset. >>>> >>>> The device MUST NOT consume buffers or send any used buffer >>>> notifications to the driver before DRIVER_OK. >>>> >>>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >>>> + >>>> +When driver is trying to set DEVICE_STOPPED, the device MUST not >>>> +process new avail requests and MUST complete all requests that is >>>> +currently processing before setting DEVICE_STOPPED. >>>> + >>>> +The device MUST keep the config space unchanged when >>>> DEVICE_STOPPED is >>>> +set. >>> >>> Maybe here we can specify if "DEVICE_STOPPED is set by the driver" >>> or when the device set it after completing all the requests. >> >> >> Good point. Since the bit is offered by device, so "set by the device >> " might be better. > > Yeah, I agree. > >> >> >>> >>>> + >>>> \label{sec:Basic Facilities of a Virtio Device / Device Status >>>> Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET >>>> when it enters an error state >>>> that a reset is needed. If DRIVER_OK is set, after it sets >>>> DEVICE_NEEDS_RESET, the device >>>> MUST send a device configuration change notification to the driver. >>>> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature >>>> Bits}\label{sec:Reserved Feature Bits} >>>> \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates >>>> that the driver passes extra data (besides identifying the >>>> virtqueue) >>>> in its device notifications. >>>> + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >>>> + driver can stop and resume the device. >>>> See \ref{sec:Virtqueues / Driver >>>> notifications}~\nameref{sec:Virtqueues / Driver notifications}. >>>> \end{description} >>>> >>> >>> Since the completion of in-flight requests can take time (I'm >>> thinking of block devices), should we consider a notification when >>> the device completes all the pending requests and set DEVICE_STOPPED. >>> >>> Something similar to the notification when DEVICE_NEEDS_RESET is set. >> >> >> I'm not sure, but it looks even more complicated. Driver may do a >> periodic polling or fail the operation (like migration) when the >> request is not completed soon. >> > > I agree that colud be more complicated for the device point of view. > Since the driver is asking the device to stop, perhaps it makes sense > for it to periodically check the status. > > In case the driver decides to abort the operation (e.g. migration), > can it clear the bit if the device hasn't set it yet or should it wait > until the device has stopped? > Maybe we should clarify this as well. Right, will add this. Thanks > > Thanks, > Stefano > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-18 4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang 2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella @ 2020-12-21 21:33 ` Halil Pasic 2020-12-22 2:36 ` Jason Wang 2021-05-03 9:02 ` [virtio-comment] " Eugenio Perez Martin 2021-05-05 13:16 ` Michael S. Tsirkin 3 siblings, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-21 21:33 UTC (permalink / raw) To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare On Fri, 18 Dec 2020 12:23:02 +0800 Jason Wang <jasowang@redhat.com> wrote: > This patch introduces a new status bit DEVICE_STOPPED. This will be > used by the driver to stop and resume a device. The main user will be > live migration support for virtio device. > Can you please provide some more background information, or point me to the appropriate discussion? I mean AFAIK migration already works without this driver initiated drain. What is the exact motivation? What about the big picture? I guess some agent in the guest would have to make the driver issue the DEVICE_STOP. > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > content.tex | 26 ++++++++++++++++++++++++-- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/content.tex b/content.tex > index 61eab41..4392b60 100644 > --- a/content.tex > +++ b/content.tex > @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > drive the device. > > +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, > + indicates that the device has been stopped by the driver. > + AFAIU it is not only about indicating stopped, but also requesting to be stopped. More importantly, that must not be set immediately, in a sense that the one side initiates some action by requesting the bit to be set, and the other side must not set the bit before the action is performed. We also seem to assume that every device implementation is capable of performing this trick. Is it for hardware devices (e.g. PCI) standard to request an operation by writing some value into a register, and get feedback bout a non-completion by reading different value that written, and about the completion, by reading the same value as written? > \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced > an error from which it can't recover. > \end{description} > @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \ref{sec:General Initialization And Device Operation / Device > Initialization}. > The driver MUST NOT clear a > -\field{device status} bit. If the driver sets the FAILED bit, > -the driver MUST later reset the device before attempting to re-initialize. > +\field{device status} bit other than DEVICE_STOPPED. If the > +driver sets the FAILED bit, the driver MUST later reset the device > +before attempting to re-initialize. > > The driver SHOULD NOT rely on completion of operations of a > device if DEVICE_NEEDS_RESET is set. > @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > recover by issuing a reset. > \end{note} > > +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not > +set. In order to stop the device, the driver MUST set DEVICE_STOPPED > +first and re-read status to check whether DEVICE_STOPPED is set by the > +device. In order to resume the device, the driver MUST clear > +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED > +is cleared by the device. > + > \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} > The device MUST initialize \field{device status} to 0 upon reset. > > The device MUST NOT consume buffers or send any used buffer > notifications to the driver before DRIVER_OK. > > +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. > + > +When driver is trying to set DEVICE_STOPPED, the device MUST not The when driver trying to set DEVICE_STOPPED is a bit soft as a duration. For example consider virtio-ccw, at the moment when the driver issues the ssch to set status, the device still does not know about it. > +process new avail requests and MUST complete all requests that is > +currently processing before setting DEVICE_STOPPED. I would like to have a more precise definition of 'new avail requests' and 'requests that is currently processing'. > + > +The device MUST keep the config space unchanged when DEVICE_STOPPED is > +set. Here you have the set by driver which is actually requesting the stop operation, and set by device which indicted that the stop operation was successfully performed by the device. > + > \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state > that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device > MUST send a device configuration change notification to the driver. > @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates > that the driver passes extra data (besides identifying the virtqueue) > in its device notifications. > + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the > + driver can stop and resume the device. > See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. > \end{description} > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-21 21:33 ` [virtio-comment] " Halil Pasic @ 2020-12-22 2:36 ` Jason Wang 2020-12-22 6:50 ` Halil Pasic 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-22 2:36 UTC (permalink / raw) To: Halil Pasic; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/22 上午5:33, Halil Pasic wrote: > On Fri, 18 Dec 2020 12:23:02 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> This patch introduces a new status bit DEVICE_STOPPED. This will be >> used by the driver to stop and resume a device. The main user will be >> live migration support for virtio device. >> > Can you please provide some more background information, or point > me to the appropriate discussion? > > I mean AFAIK migration already works without this driver initiated > drain. What is the exact motivation? What about the big picture? I > guess some agent in the guest would have to make the driver issue > the DEVICE_STOP. This is not necessary if the datapath is done inside qemu and when migration is initiated by qemu itself. But it's a must for using virtio-device as a backend for emulated virtio devices (e.g vhost-vDPA). In this case, qemu needs to stop the device then it can safely synchronize the state from them. > >> Signed-off-by: Jason Wang <jasowang@redhat.com> >> --- >> content.tex | 26 ++++++++++++++++++++++++-- >> 1 file changed, 24 insertions(+), 2 deletions(-) >> >> diff --git a/content.tex b/content.tex >> index 61eab41..4392b60 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >> drive the device. >> >> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >> + indicates that the device has been stopped by the driver. >> + > AFAIU it is not only about indicating stopped, but also requesting to be > stopped. > > More importantly, that must not be set immediately, in a sense that the > one side initiates some action by requesting the bit to be set, and the > other side must not set the bit before the action is performed. Yes. > We also > seem to assume that every device implementation is capable of performing > this trick. A dedicated feature bit is introduced for this. > Is it for hardware devices (e.g. PCI) standard to request an > operation by writing some value into a register, and get feedback bout > a non-completion by reading different value that written, This is not ununsal in other devices. And in fact, the FEATURES_OK works like this: """ The device MUST NOT offer a feature which requires another feature which was not offered. The device SHOULD accept any valid subset of features the driver accepts, otherwise it MUST fail to set the FEATURES_OK device status bit when the driver writes it. """ We've already had several hardware implementation of virtio-pci devices from different vendors. I didn't hear any complain about such kind of design. > and about the > completion, by reading the same value as written? After after DEVICE_STOPPED is read from device the driver can assume the device is stopped. > > >> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced >> an error from which it can't recover. >> \end{description} >> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> \ref{sec:General Initialization And Device Operation / Device >> Initialization}. >> The driver MUST NOT clear a >> -\field{device status} bit. If the driver sets the FAILED bit, >> -the driver MUST later reset the device before attempting to re-initialize. >> +\field{device status} bit other than DEVICE_STOPPED. If the >> +driver sets the FAILED bit, the driver MUST later reset the device >> +before attempting to re-initialize. >> >> The driver SHOULD NOT rely on completion of operations of a >> device if DEVICE_NEEDS_RESET is set. >> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> recover by issuing a reset. >> \end{note} >> >> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED >> +first and re-read status to check whether DEVICE_STOPPED is set by the >> +device. In order to resume the device, the driver MUST clear >> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >> +is cleared by the device. >> + >> \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} >> The device MUST initialize \field{device status} to 0 upon reset. >> >> The device MUST NOT consume buffers or send any used buffer >> notifications to the driver before DRIVER_OK. >> >> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >> + >> +When driver is trying to set DEVICE_STOPPED, the device MUST not > The when driver trying to set DEVICE_STOPPED is a bit soft as a > duration. For example consider virtio-ccw, at the moment when the driver > issues the ssch to set status, the device still does not know about it. I need more context on this, if it works like this, when or how can device know the status has been changed? (E.g how reset or other status bit is supposed to work?) It looks like a transport limitation if we can't guarantee this. Similar issue were met in the PCIE Endpoint device, but it can be workaround by designing a new transport. > >> +process new avail requests and MUST complete all requests that is >> +currently processing before setting DEVICE_STOPPED. > I would like to have a more precise definition of 'new avail requests' > and 'requests that is currently processing'. Good point. How about something like: The device MUST stop reading requests from descriptor area or driver area and MUST complete all in flight requests before setting DEVICE_STOPPED. To be 100% accurate, it looks to me we need to mention device implementation internals or pseudo code. I start with "in flight" but Stefan wants a more accurate one. Reading the spec I found "in flight" has been used in: """ The driver SHOULD NOT rely on completion of operations of a device if DEVICE_NEEDS_RESET is set. Note: For example, the driver can’t assume requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor can it assume that they have not been completed. A good implementation will try to recover by issuing a reset. """ So we are probably fine. Any idea or suggestion are more than welcomed. > >> + >> +The device MUST keep the config space unchanged when DEVICE_STOPPED is >> +set. > Here you have the set by driver which is actually requesting the stop > operation, and set by device which indicted that the stop operation > was successfully performed by the device. Exactly. Thanks > >> + >> \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state >> that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device >> MUST send a device configuration change notification to the driver. >> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} >> \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates >> that the driver passes extra data (besides identifying the virtqueue) >> in its device notifications. >> + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >> + driver can stop and resume the device. >> See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. >> \end{description} >> This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 2:36 ` Jason Wang @ 2020-12-22 6:50 ` Halil Pasic 2020-12-22 7:30 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-22 6:50 UTC (permalink / raw) To: Jason Wang; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare On Tue, 22 Dec 2020 10:36:41 +0800 Jason Wang <jasowang@redhat.com> wrote: > > On 2020/12/22 上午5:33, Halil Pasic wrote: > > On Fri, 18 Dec 2020 12:23:02 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > >> This patch introduces a new status bit DEVICE_STOPPED. This will be > >> used by the driver to stop and resume a device. The main user will be > >> live migration support for virtio device. > >> > > Can you please provide some more background information, or point > > me to the appropriate discussion? > > > > I mean AFAIK migration already works without this driver initiated > > drain. What is the exact motivation? What about the big picture? I > > guess some agent in the guest would have to make the driver issue > > the DEVICE_STOP. > > > This is not necessary if the datapath is done inside qemu and when > migration is initiated by qemu itself. > > But it's a must for using virtio-device as a backend for emulated virtio > devices (e.g vhost-vDPA). In this case, qemu needs to stop the device > then it can safely synchronize the state from them. > You say, in this case qemu needs to stop the device, which makes sense (it also has to do this when the datapath is implemented in qemu), but AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm confused. I'm still curious about how the different components in the stack (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are supposed to interact. > > > > >> Signed-off-by: Jason Wang <jasowang@redhat.com> > >> --- > >> content.tex | 26 ++++++++++++++++++++++++-- > >> 1 file changed, 24 insertions(+), 2 deletions(-) > >> > >> diff --git a/content.tex b/content.tex > >> index 61eab41..4392b60 100644 > >> --- a/content.tex > >> +++ b/content.tex > >> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > >> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > >> drive the device. > >> > >> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, > >> + indicates that the device has been stopped by the driver. > >> + > > AFAIU it is not only about indicating stopped, but also requesting to be > > stopped. > > > > More importantly, that must not be set immediately, in a sense that the > > one side initiates some action by requesting the bit to be set, and the > > other side must not set the bit before the action is performed. > > > Yes. > > > > We also > > seem to assume that every device implementation is capable of performing > > this trick. > > > A dedicated feature bit is introduced for this. > This is not about the feature bit, but about the mechanism. But your subsequent answers explain, that this is nothing unusual, and then we should be fine. > > > Is it for hardware devices (e.g. PCI) standard to request an > > operation by writing some value into a register, and get feedback bout > > a non-completion by reading different value that written, > > > This is not ununsal in other devices. And in fact, the FEATURES_OK works > like this: > > """ > > The device MUST NOT offer a feature which requires another feature which > was not offered. The device SHOULD accept any valid subset of features > the driver accepts, otherwise it MUST fail to set the FEATURES_OK device > status bit when the driver writes it. > > """ > Thanks for the pointer. I intend to have another look at how FEATURES_OK works, and how similar this is to DEVICE_STOPPED. > We've already had several hardware implementation of virtio-pci devices > from different vendors. I didn't hear any complain about such kind of > design. > > > > and about the > > completion, by reading the same value as written? > > > After after DEVICE_STOPPED is read from device the driver can assume the > device is stopped. > How is whether the device has already stopped or is still in the middle of stopping supposed/expected to affect the drivers behavior? Does the driver actually care? > > > > > > >> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced > >> an error from which it can't recover. > >> \end{description} > >> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > >> \ref{sec:General Initialization And Device Operation / Device > >> Initialization}. > >> The driver MUST NOT clear a > >> -\field{device status} bit. If the driver sets the FAILED bit, > >> -the driver MUST later reset the device before attempting to re-initialize. > >> +\field{device status} bit other than DEVICE_STOPPED. If the > >> +driver sets the FAILED bit, the driver MUST later reset the device > >> +before attempting to re-initialize. > >> > >> The driver SHOULD NOT rely on completion of operations of a > >> device if DEVICE_NEEDS_RESET is set. > >> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > >> recover by issuing a reset. > >> \end{note} > >> > >> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not > >> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED > >> +first and re-read status to check whether DEVICE_STOPPED is set by the > >> +device. In order to resume the device, the driver MUST clear > >> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED > >> +is cleared by the device. > >> + > >> \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} > >> The device MUST initialize \field{device status} to 0 upon reset. > >> > >> The device MUST NOT consume buffers or send any used buffer > >> notifications to the driver before DRIVER_OK. > >> > >> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. > >> + > >> +When driver is trying to set DEVICE_STOPPED, the device MUST not > > The when driver trying to set DEVICE_STOPPED is a bit soft as a > > duration. For example consider virtio-ccw, at the moment when the driver > > issues the ssch to set status, the device still does not know about it. > > > I need more context on this, if it works like this, when or how can > device know the status has been changed? (E.g how reset or other status > bit is supposed to work?) I think we have a misunderstanding here. The device will eventually learn about the drivers request to write the status, it is just not instantaneous. It is like when you turn the light switch on, it takes some amount of time for the current to reach the lightbulb. > > It looks like a transport limitation if we can't guarantee this. Similar > issue were met in the PCIE Endpoint device, but it can be workaround by > designing a new transport. > I don't think we have a problem with virtio-ccw here. > > > > >> +process new avail requests and MUST complete all requests that is > >> +currently processing before setting DEVICE_STOPPED. > > I would like to have a more precise definition of 'new avail requests' > > and 'requests that is currently processing'. > > > Good point. How about something like: > > The device MUST stop reading requests from descriptor area or driver > area and MUST complete all in flight requests before setting DEVICE_STOPPED. > > To be 100% accurate, it looks to me we need to mention device > implementation internals or pseudo code. I start with "in flight" but > Stefan wants a more accurate one. Reading the spec I found "in flight" > has been used in: > > """ > > The driver SHOULD NOT rely on completion of operations of a device if > DEVICE_NEEDS_RESET is set. Note: For example, the driver can’t assume > requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor > can it assume that they have not been completed. A good implementation > will try to recover by issuing a reset. > > """ > My problem is, that one can look at this from the drivers or from the devices perspective. From the drivers perspective, every buffer that has been made available but haven't been marked as used, is essentially a request in flight. From the device perspective, and probably from migration perspective, the in-flight requests are those that were submitted to some kind of a backend. I.e. available requests that are just sitting on the queue, but weren't submitted to the backend, can sit around, and get submitted after the migration. It seems that what we want is the following: while the DEVICE_STOPPED is set, there are no requests in the backend (i.e. all requests submitted prior have completed, and no more request are submitted to it). BTW how about s/stop/drain? I think drain has more of this 'finish what you are doing' while stop can be 'stop in the middle of it, whatever you are doing'. > > So we are probably fine. Any idea or suggestion are more than welcomed. > > I will think about it some more. > > > >> + > >> +The device MUST keep the config space unchanged when DEVICE_STOPPED is > >> +set. > > Here you have the set by driver which is actually requesting the stop > > operation, and set by device which indicted that the stop operation > > was successfully performed by the device. > > > Exactly. > > Thanks > Thank you for all the exlanations! Regards, Halil [..] This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 6:50 ` Halil Pasic @ 2020-12-22 7:30 ` Jason Wang 2020-12-22 12:14 ` Cornelia Huck 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-22 7:30 UTC (permalink / raw) To: Halil Pasic; +Cc: stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/22 下午2:50, Halil Pasic wrote: > On Tue, 22 Dec 2020 10:36:41 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> On 2020/12/22 上午5:33, Halil Pasic wrote: >>> On Fri, 18 Dec 2020 12:23:02 +0800 >>> Jason Wang <jasowang@redhat.com> wrote: >>> >>>> This patch introduces a new status bit DEVICE_STOPPED. This will be >>>> used by the driver to stop and resume a device. The main user will be >>>> live migration support for virtio device. >>>> >>> Can you please provide some more background information, or point >>> me to the appropriate discussion? >>> >>> I mean AFAIK migration already works without this driver initiated >>> drain. What is the exact motivation? What about the big picture? I >>> guess some agent in the guest would have to make the driver issue >>> the DEVICE_STOP. >> >> This is not necessary if the datapath is done inside qemu and when >> migration is initiated by qemu itself. >> >> But it's a must for using virtio-device as a backend for emulated virtio >> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device >> then it can safely synchronize the state from them. >> > You say, in this case qemu needs to stop the device, which makes sense > (it also has to do this when the datapath is implemented in qemu), but > AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm > confused. It's initiated by Qemu. Guest is unware of live migration. > > I'm still curious about how the different components in the stack > (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are > supposed to interact. It works like: From Qemu point of view, vhost-vDPA is just another type of vhost backend. Qemu needs to stop virtio (vhost) before it can do migration. So we require vDPA devices to have the ability of stopping or pausing its datapath. If the vDPA device is by chance the virtio-PCI device, it needs an interface for receiving stop/resume command from the driver. So the devce stop/resume command was sent from Qemu to vhost-VDPA, then to vDPA parent which could be a virtio-PCI device in this case. > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com> >>>> --- >>>> content.tex | 26 ++++++++++++++++++++++++-- >>>> 1 file changed, 24 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/content.tex b/content.tex >>>> index 61eab41..4392b60 100644 >>>> --- a/content.tex >>>> +++ b/content.tex >>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >>>> drive the device. >>>> >>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >>>> + indicates that the device has been stopped by the driver. >>>> + >>> AFAIU it is not only about indicating stopped, but also requesting to be >>> stopped. >>> >>> More importantly, that must not be set immediately, in a sense that the >>> one side initiates some action by requesting the bit to be set, and the >>> other side must not set the bit before the action is performed. >> >> Yes. >> >> >>> We also >>> seem to assume that every device implementation is capable of performing >>> this trick. >> >> A dedicated feature bit is introduced for this. >> > This is not about the feature bit, but about the mechanism. But your > subsequent answers explain, that this is nothing unusual, and then > we should be fine. > >>> Is it for hardware devices (e.g. PCI) standard to request an >>> operation by writing some value into a register, and get feedback bout >>> a non-completion by reading different value that written, >> >> This is not ununsal in other devices. And in fact, the FEATURES_OK works >> like this: >> >> """ >> >> The device MUST NOT offer a feature which requires another feature which >> was not offered. The device SHOULD accept any valid subset of features >> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device >> status bit when the driver writes it. >> >> """ >> > Thanks for the pointer. I intend to have another look at how FEATURES_OK > works, and how similar this is to DEVICE_STOPPED. My understanding is that for both of them, driver can try to set the bit by writing to the status register but it's the device that decide when to set the bit. > >> We've already had several hardware implementation of virtio-pci devices >> from different vendors. I didn't hear any complain about such kind of >> design. >> >> >>> and about the >>> completion, by reading the same value as written? >> >> After after DEVICE_STOPPED is read from device the driver can assume the >> device is stopped. >> > How is whether the device has already stopped or is still in the middle > of stopping supposed/expected to affect the drivers behavior? Does the > driver actually care? Yes, in the case of live migration. If device takes too longs time to be stopped. Qemu may simple fail the migration. > >>> >>>> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced >>>> an error from which it can't recover. >>>> \end{description} >>>> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >>>> \ref{sec:General Initialization And Device Operation / Device >>>> Initialization}. >>>> The driver MUST NOT clear a >>>> -\field{device status} bit. If the driver sets the FAILED bit, >>>> -the driver MUST later reset the device before attempting to re-initialize. >>>> +\field{device status} bit other than DEVICE_STOPPED. If the >>>> +driver sets the FAILED bit, the driver MUST later reset the device >>>> +before attempting to re-initialize. >>>> >>>> The driver SHOULD NOT rely on completion of operations of a >>>> device if DEVICE_NEEDS_RESET is set. >>>> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >>>> recover by issuing a reset. >>>> \end{note} >>>> >>>> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >>>> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED >>>> +first and re-read status to check whether DEVICE_STOPPED is set by the >>>> +device. In order to resume the device, the driver MUST clear >>>> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >>>> +is cleared by the device. >>>> + >>>> \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} >>>> The device MUST initialize \field{device status} to 0 upon reset. >>>> >>>> The device MUST NOT consume buffers or send any used buffer >>>> notifications to the driver before DRIVER_OK. >>>> >>>> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >>>> + >>>> +When driver is trying to set DEVICE_STOPPED, the device MUST not >>> The when driver trying to set DEVICE_STOPPED is a bit soft as a >>> duration. For example consider virtio-ccw, at the moment when the driver >>> issues the ssch to set status, the device still does not know about it. >> >> I need more context on this, if it works like this, when or how can >> device know the status has been changed? (E.g how reset or other status >> bit is supposed to work?) > I think we have a misunderstanding here. The device will eventually > learn about the drivers request to write the status, it is just not > instantaneous. It is like when you turn the light switch on, it takes > some amount of time for the current to reach the lightbulb. I see, so it looks not a problem since device is not required to be stopped immediately if driver write to that bit. > >> It looks like a transport limitation if we can't guarantee this. Similar >> issue were met in the PCIE Endpoint device, but it can be workaround by >> designing a new transport. >> > I don't think we have a problem with virtio-ccw here. Agree. > >>>> +process new avail requests and MUST complete all requests that is >>>> +currently processing before setting DEVICE_STOPPED. >>> I would like to have a more precise definition of 'new avail requests' >>> and 'requests that is currently processing'. >> >> Good point. How about something like: >> >> The device MUST stop reading requests from descriptor area or driver >> area and MUST complete all in flight requests before setting DEVICE_STOPPED. >> >> To be 100% accurate, it looks to me we need to mention device >> implementation internals or pseudo code. I start with "in flight" but >> Stefan wants a more accurate one. Reading the spec I found "in flight" >> has been used in: >> >> """ >> >> The driver SHOULD NOT rely on completion of operations of a device if >> DEVICE_NEEDS_RESET is set. Note: For example, the driver can’t assume >> requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor >> can it assume that they have not been completed. A good implementation >> will try to recover by issuing a reset. >> >> """ >> > My problem is, that one can look at this from the drivers or from the > devices perspective. From the drivers perspective, every buffer that has > been made available but haven't been marked as used, is essentially a > request in flight. Not a native speaker, but the sentence starts from "The device" so it should be the perspective of device. > > From the device perspective, and probably from migration perspective, the > in-flight requests are those that were submitted to some kind of a > backend. I.e. available requests that are just sitting on the queue, but > weren't submitted to the backend, can sit around, and get submitted > after the migration. It seems that what we want is the following: while > the DEVICE_STOPPED is set, there are no requests in the backend (i.e. > all requests submitted prior have completed, and no more request are > submitted to it). Yes. but do we need to mention something like "backend" in the spec? It sounds like a implementation detail. > > BTW how about s/stop/drain? I think drain has more of this 'finish > what you are doing' while stop can be 'stop in the middle of it, whatever > you are doing'. It might be better. But drain may fit more for a queue not a device. We had stuffs other than queue e.g as Stefan pointed out, we should keep config space unchanged (and no config interrupt) after the device is stopped. > >> So we are probably fine. Any idea or suggestion are more than welcomed. >> >> > I will think about it some more. Thanks > >>>> + >>>> +The device MUST keep the config space unchanged when DEVICE_STOPPED is >>>> +set. >>> Here you have the set by driver which is actually requesting the stop >>> operation, and set by device which indicted that the stop operation >>> was successfully performed by the device. >> >> Exactly. >> >> Thanks >> > Thank you for all the exlanations! > > Regards, > Halil > > [..] > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 7:30 ` Jason Wang @ 2020-12-22 12:14 ` Cornelia Huck 2020-12-22 12:51 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Cornelia Huck @ 2020-12-22 12:14 UTC (permalink / raw) To: Jason Wang; +Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare On Tue, 22 Dec 2020 15:30:31 +0800 Jason Wang <jasowang@redhat.com> wrote: > On 2020/12/22 下午2:50, Halil Pasic wrote: > > On Tue, 22 Dec 2020 10:36:41 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > >> On 2020/12/22 上午5:33, Halil Pasic wrote: > >>> On Fri, 18 Dec 2020 12:23:02 +0800 > >>> Jason Wang <jasowang@redhat.com> wrote: > >>> > >>>> This patch introduces a new status bit DEVICE_STOPPED. This will be > >>>> used by the driver to stop and resume a device. The main user will be > >>>> live migration support for virtio device. > >>>> > >>> Can you please provide some more background information, or point > >>> me to the appropriate discussion? > >>> > >>> I mean AFAIK migration already works without this driver initiated > >>> drain. What is the exact motivation? What about the big picture? I > >>> guess some agent in the guest would have to make the driver issue > >>> the DEVICE_STOP. > >> > >> This is not necessary if the datapath is done inside qemu and when > >> migration is initiated by qemu itself. > >> > >> But it's a must for using virtio-device as a backend for emulated virtio > >> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device > >> then it can safely synchronize the state from them. > >> > > You say, in this case qemu needs to stop the device, which makes sense > > (it also has to do this when the datapath is implemented in qemu), but > > AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm > > confused. > > > It's initiated by Qemu. Guest is unware of live migration. But isn't setting DEVICE_STOPPED a _driver_ initiated process? That sounds like "guest" to me. > > > > > > I'm still curious about how the different components in the stack > > (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are > > supposed to interact. > > > It works like: > > From Qemu point of view, vhost-vDPA is just another type of vhost > backend. Qemu needs to stop virtio (vhost) before it can do migration. > So we require vDPA devices to have the ability of stopping or pausing > its datapath. If the vDPA device is by chance the virtio-PCI device, it > needs an interface for receiving stop/resume command from the driver. > > So the devce stop/resume command was sent from Qemu to vhost-VDPA, then > to vDPA parent which could be a virtio-PCI device in this case. But QEMU implements the _device_, not the driver, doesn't it? And IIUC, vhost-VDPA and the vDPA parent are also on the device side. I feel like I'm missing something essential here. > > > > > >>>> Signed-off-by: Jason Wang <jasowang@redhat.com> > >>>> --- > >>>> content.tex | 26 ++++++++++++++++++++++++-- > >>>> 1 file changed, 24 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/content.tex b/content.tex > >>>> index 61eab41..4392b60 100644 > >>>> --- a/content.tex > >>>> +++ b/content.tex > >>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > >>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > >>>> drive the device. > >>>> > >>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, > >>>> + indicates that the device has been stopped by the driver. > >>>> + > >>> AFAIU it is not only about indicating stopped, but also requesting to be > >>> stopped. > >>> > >>> More importantly, that must not be set immediately, in a sense that the > >>> one side initiates some action by requesting the bit to be set, and the > >>> other side must not set the bit before the action is performed. > >> > >> Yes. > >> > >> > >>> We also > >>> seem to assume that every device implementation is capable of performing > >>> this trick. > >> > >> A dedicated feature bit is introduced for this. > >> > > This is not about the feature bit, but about the mechanism. But your > > subsequent answers explain, that this is nothing unusual, and then > > we should be fine. > > > >>> Is it for hardware devices (e.g. PCI) standard to request an > >>> operation by writing some value into a register, and get feedback bout > >>> a non-completion by reading different value that written, > >> > >> This is not ununsal in other devices. And in fact, the FEATURES_OK works > >> like this: > >> > >> """ > >> > >> The device MUST NOT offer a feature which requires another feature which > >> was not offered. The device SHOULD accept any valid subset of features > >> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device > >> status bit when the driver writes it. > >> > >> """ > >> > > Thanks for the pointer. I intend to have another look at how FEATURES_OK > > works, and how similar this is to DEVICE_STOPPED. > > > My understanding is that for both of them, driver can try to set the bit > by writing to the status register but it's the device that decide when > to set the bit. I think there's a difference: For FEATURES_OK, the driver can read back the status and knows that the feature combination is not accepted if it is not set. For DEVICE_STOPPED, it does not really know whether the device has started to stop yet. It only knows that the device is actually done stopping when DEVICE_STOPPED is set. Do we need a different mechanism for the device to signal the driver that the device has actually been stopped? If the driver sees the STOPPED status after reading back, it knows that the device has acknowledged the request and is now stopping down. If it is not set, it means that the device has not honoured the request. Same for clearing it to resume. This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 12:14 ` Cornelia Huck @ 2020-12-22 12:51 ` Jason Wang 2020-12-22 15:54 ` Cornelia Huck 2020-12-24 4:52 ` Halil Pasic 0 siblings, 2 replies; 43+ messages in thread From: Jason Wang @ 2020-12-22 12:51 UTC (permalink / raw) To: Cornelia Huck Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/22 下午8:14, Cornelia Huck wrote: > On Tue, 22 Dec 2020 15:30:31 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> On 2020/12/22 下午2:50, Halil Pasic wrote: >>> On Tue, 22 Dec 2020 10:36:41 +0800 >>> Jason Wang <jasowang@redhat.com> wrote: >>> >>>> On 2020/12/22 上午5:33, Halil Pasic wrote: >>>>> On Fri, 18 Dec 2020 12:23:02 +0800 >>>>> Jason Wang <jasowang@redhat.com> wrote: >>>>> >>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be >>>>>> used by the driver to stop and resume a device. The main user will be >>>>>> live migration support for virtio device. >>>>>> >>>>> Can you please provide some more background information, or point >>>>> me to the appropriate discussion? >>>>> >>>>> I mean AFAIK migration already works without this driver initiated >>>>> drain. What is the exact motivation? What about the big picture? I >>>>> guess some agent in the guest would have to make the driver issue >>>>> the DEVICE_STOP. >>>> This is not necessary if the datapath is done inside qemu and when >>>> migration is initiated by qemu itself. >>>> >>>> But it's a must for using virtio-device as a backend for emulated virtio >>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device >>>> then it can safely synchronize the state from them. >>>> >>> You say, in this case qemu needs to stop the device, which makes sense >>> (it also has to do this when the datapath is implemented in qemu), but >>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm >>> confused. >> >> It's initiated by Qemu. Guest is unware of live migration. > But isn't setting DEVICE_STOPPED a _driver_ initiated process? That > sounds like "guest" to me. I think we need clarification on the "driver". Think the following setup: Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA driver) It's the Qemu that initiates the stop, and the request is forwarded to virtio-pci vDPA driver. So the process is transparent to the guest (driver). > >> >>> I'm still curious about how the different components in the stack >>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are >>> supposed to interact. >> >> It works like: >> >> From Qemu point of view, vhost-vDPA is just another type of vhost >> backend. Qemu needs to stop virtio (vhost) before it can do migration. >> So we require vDPA devices to have the ability of stopping or pausing >> its datapath. If the vDPA device is by chance the virtio-PCI device, it >> needs an interface for receiving stop/resume command from the driver. >> >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then >> to vDPA parent which could be a virtio-PCI device in this case. > But QEMU implements the _device_, not the driver, doesn't it? The device is implemented with the co-operation between Qemu and vhost-vDPA. During migration qemu need to stop the virtio-net device, then vhost must be stopped as well. > And IIUC, > vhost-VDPA and the vDPA parent are also on the device side. I feel like > I'm missing something essential here. Virtio-PCI driver could be a vDPA parent as well in this case. So we need stop the virtio-pci device. > >> >>> >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com> >>>>>> --- >>>>>> content.tex | 26 ++++++++++++++++++++++++-- >>>>>> 1 file changed, 24 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/content.tex b/content.tex >>>>>> index 61eab41..4392b60 100644 >>>>>> --- a/content.tex >>>>>> +++ b/content.tex >>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >>>>>> drive the device. >>>>>> >>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >>>>>> + indicates that the device has been stopped by the driver. >>>>>> + >>>>> AFAIU it is not only about indicating stopped, but also requesting to be >>>>> stopped. >>>>> >>>>> More importantly, that must not be set immediately, in a sense that the >>>>> one side initiates some action by requesting the bit to be set, and the >>>>> other side must not set the bit before the action is performed. >>>> Yes. >>>> >>>> >>>>> We also >>>>> seem to assume that every device implementation is capable of performing >>>>> this trick. >>>> A dedicated feature bit is introduced for this. >>>> >>> This is not about the feature bit, but about the mechanism. But your >>> subsequent answers explain, that this is nothing unusual, and then >>> we should be fine. >>> >>>>> Is it for hardware devices (e.g. PCI) standard to request an >>>>> operation by writing some value into a register, and get feedback bout >>>>> a non-completion by reading different value that written, >>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works >>>> like this: >>>> >>>> """ >>>> >>>> The device MUST NOT offer a feature which requires another feature which >>>> was not offered. The device SHOULD accept any valid subset of features >>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device >>>> status bit when the driver writes it. >>>> >>>> """ >>>> >>> Thanks for the pointer. I intend to have another look at how FEATURES_OK >>> works, and how similar this is to DEVICE_STOPPED. >> >> My understanding is that for both of them, driver can try to set the bit >> by writing to the status register but it's the device that decide when >> to set the bit. > I think there's a difference: For FEATURES_OK, the driver can read back > the status and knows that the feature combination is not accepted if it > is not set. For DEVICE_STOPPED, it does not really know whether the > device has started to stop yet. It only knows that the device is > actually done stopping when DEVICE_STOPPED is set. The only difference is that device may need sometime to be stopped. FEATURES_OK might not be the best example. Let's see how reset work which is more similar to DEVICE_STOPPED: """ After writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing the device. """ > > Do we need a different mechanism for the device to signal the driver > that the device has actually been stopped? If the driver sees the > STOPPED status after reading back, it knows that the device has > acknowledged the request and is now stopping down. If it is not set, it > means that the device has not honoured the request. Same for clearing > it to resume. I don't see much difference. Device reset doesn't have such notification and driver may simply poll or use a timer. Thanks > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 12:51 ` Jason Wang @ 2020-12-22 15:54 ` Cornelia Huck 2020-12-23 2:48 ` Jason Wang 2020-12-24 4:52 ` Halil Pasic 1 sibling, 1 reply; 43+ messages in thread From: Cornelia Huck @ 2020-12-22 15:54 UTC (permalink / raw) To: Jason Wang; +Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare On Tue, 22 Dec 2020 20:51:04 +0800 Jason Wang <jasowang@redhat.com> wrote: > On 2020/12/22 下午8:14, Cornelia Huck wrote: > > On Tue, 22 Dec 2020 15:30:31 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > >> On 2020/12/22 下午2:50, Halil Pasic wrote: > >>> On Tue, 22 Dec 2020 10:36:41 +0800 > >>> Jason Wang <jasowang@redhat.com> wrote: > >>> > >>>> On 2020/12/22 上午5:33, Halil Pasic wrote: > >>>>> On Fri, 18 Dec 2020 12:23:02 +0800 > >>>>> Jason Wang <jasowang@redhat.com> wrote: > >>>>> > >>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be > >>>>>> used by the driver to stop and resume a device. The main user will be > >>>>>> live migration support for virtio device. > >>>>>> > >>>>> Can you please provide some more background information, or point > >>>>> me to the appropriate discussion? > >>>>> > >>>>> I mean AFAIK migration already works without this driver initiated > >>>>> drain. What is the exact motivation? What about the big picture? I > >>>>> guess some agent in the guest would have to make the driver issue > >>>>> the DEVICE_STOP. > >>>> This is not necessary if the datapath is done inside qemu and when > >>>> migration is initiated by qemu itself. > >>>> > >>>> But it's a must for using virtio-device as a backend for emulated virtio > >>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device > >>>> then it can safely synchronize the state from them. > >>>> > >>> You say, in this case qemu needs to stop the device, which makes sense > >>> (it also has to do this when the datapath is implemented in qemu), but > >>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm > >>> confused. > >> > >> It's initiated by Qemu. Guest is unware of live migration. > > But isn't setting DEVICE_STOPPED a _driver_ initiated process? That > > sounds like "guest" to me. > > > I think we need clarification on the "driver". Looking through the spec, I think we have never properly defined what we mean by "device" or "driver". Not sure what a good definition would look like. I would consider a "device" an entity that is present and is configured/used by a "driver". The "driver" is the means to actually configure/use a "device", i.e. the "initiating" part (even though some actions are initiated by the device once the driver has started interacting with it). > Think the following setup: > > Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA > driver) > > It's the Qemu that initiates the stop, and the request is forwarded to > virtio-pci vDPA driver. So the process is transparent to the guest (driver). So it's Guest driver <-> QEMU <-> vhost-vdpa <-> vDPA parent |<-- guest -->|<-------------- host -------------->| but the vDPA parent is itself acting as a "driver" if you take my attempt at a definition above? (Wouldn't that mean that the device in QEMU is configured/used from two entities?) > > > > > >> > >>> I'm still curious about how the different components in the stack > >>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are > >>> supposed to interact. > >> > >> It works like: > >> > >> From Qemu point of view, vhost-vDPA is just another type of vhost > >> backend. Qemu needs to stop virtio (vhost) before it can do migration. > >> So we require vDPA devices to have the ability of stopping or pausing > >> its datapath. If the vDPA device is by chance the virtio-PCI device, it > >> needs an interface for receiving stop/resume command from the driver. > >> > >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then > >> to vDPA parent which could be a virtio-PCI device in this case. > > But QEMU implements the _device_, not the driver, doesn't it? > > > The device is implemented with the co-operation between Qemu and > vhost-vDPA. During migration qemu need to stop the virtio-net device, > then vhost must be stopped as well. Yes, it makes sense that any vhost parts need to be stopped as well. > > > > And IIUC, > > vhost-VDPA and the vDPA parent are also on the device side. I feel like > > I'm missing something essential here. > > > Virtio-PCI driver could be a vDPA parent as well in this case. So we > need stop the virtio-pci device. Who is the "driver" in that case? > > > > > >> > >>> > >>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com> > >>>>>> --- > >>>>>> content.tex | 26 ++++++++++++++++++++++++-- > >>>>>> 1 file changed, 24 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/content.tex b/content.tex > >>>>>> index 61eab41..4392b60 100644 > >>>>>> --- a/content.tex > >>>>>> +++ b/content.tex > >>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > >>>>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > >>>>>> drive the device. > >>>>>> > >>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, > >>>>>> + indicates that the device has been stopped by the driver. > >>>>>> + > >>>>> AFAIU it is not only about indicating stopped, but also requesting to be > >>>>> stopped. > >>>>> > >>>>> More importantly, that must not be set immediately, in a sense that the > >>>>> one side initiates some action by requesting the bit to be set, and the > >>>>> other side must not set the bit before the action is performed. > >>>> Yes. > >>>> > >>>> > >>>>> We also > >>>>> seem to assume that every device implementation is capable of performing > >>>>> this trick. > >>>> A dedicated feature bit is introduced for this. > >>>> > >>> This is not about the feature bit, but about the mechanism. But your > >>> subsequent answers explain, that this is nothing unusual, and then > >>> we should be fine. > >>> > >>>>> Is it for hardware devices (e.g. PCI) standard to request an > >>>>> operation by writing some value into a register, and get feedback bout > >>>>> a non-completion by reading different value that written, > >>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works > >>>> like this: > >>>> > >>>> """ > >>>> > >>>> The device MUST NOT offer a feature which requires another feature which > >>>> was not offered. The device SHOULD accept any valid subset of features > >>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device > >>>> status bit when the driver writes it. > >>>> > >>>> """ > >>>> > >>> Thanks for the pointer. I intend to have another look at how FEATURES_OK > >>> works, and how similar this is to DEVICE_STOPPED. > >> > >> My understanding is that for both of them, driver can try to set the bit > >> by writing to the status register but it's the device that decide when > >> to set the bit. > > I think there's a difference: For FEATURES_OK, the driver can read back > > the status and knows that the feature combination is not accepted if it > > is not set. For DEVICE_STOPPED, it does not really know whether the > > device has started to stop yet. It only knows that the device is > > actually done stopping when DEVICE_STOPPED is set. > > > The only difference is that device may need sometime to be stopped. > FEATURES_OK might not be the best example. Let's see how reset work > which is more similar to DEVICE_STOPPED: > > """ > > After writing 0 to device_status, the driver MUST wait for a read of > device_status to return 0 before reinitializing the device. > > """ Hm, but that is PCI-specific writing of fields in the configuration structure, right? CCW uses a different method for resetting (an asynchronous channel command; channel commands create an interrupt when done); "write 0 to status for reset" is not universal. Which makes me think: is using the device status a good universal method for stopping the device? If I look at it only from the CCW perspective, I'd have used a new channel command with a payload of stop/resume and only reflected the stopped state in the device status (i.e. read-only from the driver, and only changed by the device when it transitions to/from the stopped state.) Could PCI use a new field for that? > > > > > Do we need a different mechanism for the device to signal the driver > > that the device has actually been stopped? If the driver sees the > > STOPPED status after reading back, it knows that the device has > > acknowledged the request and is now stopping down. If it is not set, it > > means that the device has not honoured the request. Same for clearing > > it to resume. > > > I don't see much difference. Device reset doesn't have such notification > and driver may simply poll or use a timer. See above; CCW does not really use the device status in the same way. I think a notification would be useful. This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 15:54 ` Cornelia Huck @ 2020-12-23 2:48 ` Jason Wang 2020-12-25 7:38 ` Halil Pasic 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-23 2:48 UTC (permalink / raw) To: Cornelia Huck Cc: Halil Pasic, stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/22 下午11:54, Cornelia Huck wrote: > On Tue, 22 Dec 2020 20:51:04 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> On 2020/12/22 下午8:14, Cornelia Huck wrote: >>> On Tue, 22 Dec 2020 15:30:31 +0800 >>> Jason Wang <jasowang@redhat.com> wrote: >>> >>>> On 2020/12/22 下午2:50, Halil Pasic wrote: >>>>> On Tue, 22 Dec 2020 10:36:41 +0800 >>>>> Jason Wang <jasowang@redhat.com> wrote: >>>>> >>>>>> On 2020/12/22 上午5:33, Halil Pasic wrote: >>>>>>> On Fri, 18 Dec 2020 12:23:02 +0800 >>>>>>> Jason Wang <jasowang@redhat.com> wrote: >>>>>>> >>>>>>>> This patch introduces a new status bit DEVICE_STOPPED. This will be >>>>>>>> used by the driver to stop and resume a device. The main user will be >>>>>>>> live migration support for virtio device. >>>>>>>> >>>>>>> Can you please provide some more background information, or point >>>>>>> me to the appropriate discussion? >>>>>>> >>>>>>> I mean AFAIK migration already works without this driver initiated >>>>>>> drain. What is the exact motivation? What about the big picture? I >>>>>>> guess some agent in the guest would have to make the driver issue >>>>>>> the DEVICE_STOP. >>>>>> This is not necessary if the datapath is done inside qemu and when >>>>>> migration is initiated by qemu itself. >>>>>> >>>>>> But it's a must for using virtio-device as a backend for emulated virtio >>>>>> devices (e.g vhost-vDPA). In this case, qemu needs to stop the device >>>>>> then it can safely synchronize the state from them. >>>>>> >>>>> You say, in this case qemu needs to stop the device, which makes sense >>>>> (it also has to do this when the datapath is implemented in qemu), but >>>>> AFAIU DEVICE_STOPPED is initiated by the guest and not by qemu. I'm >>>>> confused. >>>> It's initiated by Qemu. Guest is unware of live migration. >>> But isn't setting DEVICE_STOPPED a _driver_ initiated process? That >>> sounds like "guest" to me. >> >> I think we need clarification on the "driver". > Looking through the spec, I think we have never properly defined what > we mean by "device" or "driver". Not sure what a good definition would > look like. I would consider a "device" an entity that is present and is > configured/used by a "driver". The "driver" is the means to actually > configure/use a "device", i.e. the "initiating" part (even though some > actions are initiated by the device once the driver has started > interacting with it). Sorry for being unclear. I think the spec is fine since it's not limit to a host/guest interface, the clarification is only needed for the context of the use cases of DEVICE_STOP. > >> Think the following setup: >> >> Guest driver <-> Qemu <-> vhost-vdpa <-> vDPA parent (virtio-pci vDPA >> driver) >> >> It's the Qemu that initiates the stop, and the request is forwarded to >> virtio-pci vDPA driver. So the process is transparent to the guest (driver). > So it's > > Guest driver <-> QEMU <-> vhost-vdpa <-> vDPA parent > |<-- guest -->|<-------------- host -------------->| > > but the vDPA parent is itself acting as a "driver" if you take my > attempt at a definition above? > > (Wouldn't that mean that the device in QEMU is configured/used from two > entities?) A more accurate version Guest virtio-pci driver <-> virtio-pci device QEMU <-> vhost-vdpa <-> virtio pci vDPA driver <-> virtio-pci device So we had two drivers: 1) The first that is running in the guest (it could be either L1 or L2) 2) The second that is running on the host (or L1 if the first is running in L2) And we had two devices 1) The first that is emulated by QEMU which is running in either L0 or L1 2) The second could be either a physical device on L0 or emulated one in L1 (Qemu is running in L1) The main use case for DEVICE_STOP the second device and driver. This driver is not the virtio-pci driver but a vDPA driver [1]. [1] https://lkml.org/lkml/2020/12/3/1589 > >> >>> >>>> >>>>> I'm still curious about how the different components in the stack >>>>> (guest OS, qemu, vdpa-vhost in host kernel, the PCI function) are >>>>> supposed to interact. >>>> It works like: >>>> >>>> From Qemu point of view, vhost-vDPA is just another type of vhost >>>> backend. Qemu needs to stop virtio (vhost) before it can do migration. >>>> So we require vDPA devices to have the ability of stopping or pausing >>>> its datapath. If the vDPA device is by chance the virtio-PCI device, it >>>> needs an interface for receiving stop/resume command from the driver. >>>> >>>> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then >>>> to vDPA parent which could be a virtio-PCI device in this case. >>> But QEMU implements the _device_, not the driver, doesn't it? >> >> The device is implemented with the co-operation between Qemu and >> vhost-vDPA. During migration qemu need to stop the virtio-net device, >> then vhost must be stopped as well. > Yes, it makes sense that any vhost parts need to be stopped as well. > >> >>> And IIUC, >>> vhost-VDPA and the vDPA parent are also on the device side. I feel like >>> I'm missing something essential here. >> >> Virtio-PCI driver could be a vDPA parent as well in this case. So we >> need stop the virtio-pci device. > Who is the "driver" in that case? It's the virtio-PCI vDPA driver. > >> >>> >>>> >>>>> >>>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com> >>>>>>>> --- >>>>>>>> content.tex | 26 ++++++++++++++++++++++++-- >>>>>>>> 1 file changed, 24 insertions(+), 2 deletions(-) >>>>>>>> >>>>>>>> diff --git a/content.tex b/content.tex >>>>>>>> index 61eab41..4392b60 100644 >>>>>>>> --- a/content.tex >>>>>>>> +++ b/content.tex >>>>>>>> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >>>>>>>> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >>>>>>>> drive the device. >>>>>>>> >>>>>>>> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >>>>>>>> + indicates that the device has been stopped by the driver. >>>>>>>> + >>>>>>> AFAIU it is not only about indicating stopped, but also requesting to be >>>>>>> stopped. >>>>>>> >>>>>>> More importantly, that must not be set immediately, in a sense that the >>>>>>> one side initiates some action by requesting the bit to be set, and the >>>>>>> other side must not set the bit before the action is performed. >>>>>> Yes. >>>>>> >>>>>> >>>>>>> We also >>>>>>> seem to assume that every device implementation is capable of performing >>>>>>> this trick. >>>>>> A dedicated feature bit is introduced for this. >>>>>> >>>>> This is not about the feature bit, but about the mechanism. But your >>>>> subsequent answers explain, that this is nothing unusual, and then >>>>> we should be fine. >>>>> >>>>>>> Is it for hardware devices (e.g. PCI) standard to request an >>>>>>> operation by writing some value into a register, and get feedback bout >>>>>>> a non-completion by reading different value that written, >>>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works >>>>>> like this: >>>>>> >>>>>> """ >>>>>> >>>>>> The device MUST NOT offer a feature which requires another feature which >>>>>> was not offered. The device SHOULD accept any valid subset of features >>>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device >>>>>> status bit when the driver writes it. >>>>>> >>>>>> """ >>>>>> >>>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK >>>>> works, and how similar this is to DEVICE_STOPPED. >>>> My understanding is that for both of them, driver can try to set the bit >>>> by writing to the status register but it's the device that decide when >>>> to set the bit. >>> I think there's a difference: For FEATURES_OK, the driver can read back >>> the status and knows that the feature combination is not accepted if it >>> is not set. For DEVICE_STOPPED, it does not really know whether the >>> device has started to stop yet. It only knows that the device is >>> actually done stopping when DEVICE_STOPPED is set. >> >> The only difference is that device may need sometime to be stopped. >> FEATURES_OK might not be the best example. Let's see how reset work >> which is more similar to DEVICE_STOPPED: >> >> """ >> >> After writing 0 to device_status, the driver MUST wait for a read of >> device_status to return 0 before reinitializing the device. >> >> """ > Hm, but that is PCI-specific writing of fields in the configuration > structure, right? CCW uses a different method for resetting (an > asynchronous channel command; channel commands create an interrupt when > done); "write 0 to status for reset" is not universal. I may miss something. Even if CCW is using something asynchronously (e.g interrupt). It still present a synchronous API to the upper layer. E.g ccw_io_helper() is using wait_event()? > > Which makes me think: is using the device status a good universal > method for stopping the device? This part I don't understand. Device stop is kind of similar to reset in this case. If reset can work what prevent stop work? E.g: ccw_io_helper(VIRTIO_CCW_DOING_WRITE_STATUS); while (1) { ccw_io_helper(VIRTIO_CCW_DOING_READ_STATUS); if (status & DEVICE_STOPPED) break; } > If I look at it only from the CCW > perspective, I'd have used a new channel command with a payload of > stop/resume and only reflected the stopped state in the device status > (i.e. read-only from the driver, and only changed by the device when it > transitions to/from the stopped state.) Could PCI use a new field for > that? PCI can use new filed. But I wonder how can that help for CCW. Thanks > >>> Do we need a different mechanism for the device to signal the driver >>> that the device has actually been stopped? If the driver sees the >>> STOPPED status after reading back, it knows that the device has >>> acknowledged the request and is now stopping down. If it is not set, it >>> means that the device has not honoured the request. Same for clearing >>> it to resume. >> >> I don't see much difference. Device reset doesn't have such notification >> and driver may simply poll or use a timer. > See above; CCW does not really use the device status in the same way. I > think a notification would be useful. This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-23 2:48 ` Jason Wang @ 2020-12-25 7:38 ` Halil Pasic 2020-12-27 10:00 ` Michael S. Tsirkin 2020-12-28 6:47 ` Jason Wang 0 siblings, 2 replies; 43+ messages in thread From: Halil Pasic @ 2020-12-25 7:38 UTC (permalink / raw) To: Jason Wang Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On Wed, 23 Dec 2020 10:48:45 +0800 Jason Wang <jasowang@redhat.com> wrote: [..] > >>>>>>> Is it for hardware devices (e.g. PCI) standard to request an > >>>>>>> operation by writing some value into a register, and get feedback bout > >>>>>>> a non-completion by reading different value that written, > >>>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works > >>>>>> like this: > >>>>>> > >>>>>> """ > >>>>>> > >>>>>> The device MUST NOT offer a feature which requires another feature which > >>>>>> was not offered. The device SHOULD accept any valid subset of features > >>>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device > >>>>>> status bit when the driver writes it. > >>>>>> > >>>>>> """ > >>>>>> > >>>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK > >>>>> works, and how similar this is to DEVICE_STOPPED. > >>>> My understanding is that for both of them, driver can try to set the bit > >>>> by writing to the status register but it's the device that decide when > >>>> to set the bit. > >>> I think there's a difference: For FEATURES_OK, the driver can read back > >>> the status and knows that the feature combination is not accepted if it > >>> is not set. For DEVICE_STOPPED, it does not really know whether the > >>> device has started to stop yet. It only knows that the device is > >>> actually done stopping when DEVICE_STOPPED is set. > >> I share Connie's concern. The config->set_status() is a synchronous API, that is right after the iowrite8 for pci comes back, or the channel program finishes executing for ccw, we can go, do a config->get_status() and know if the status was set succesfully. > >> The only difference is that device may need sometime to be stopped. > >> FEATURES_OK might not be the best example. Let's see how reset work > >> which is more similar to DEVICE_STOPPED: > >> > >> """ > >> > >> After writing 0 to device_status, the driver MUST wait for a read of > >> device_status to return 0 before reinitializing the device. > >> > >> """ > > Hm, but that is PCI-specific writing of fields in the configuration > > structure, right? CCW uses a different method for resetting (an > > asynchronous channel command; channel commands create an interrupt when > > done); "write 0 to status for reset" is not universal. > > > I may miss something. Even if CCW is using something asynchronously (e.g > interrupt). It still present a synchronous API to the upper layer. E.g > ccw_io_helper() is using wait_event()? Right. The reason why we have to wait for the completion of the channel program is exactly that. Furthermore if we have to fail setting FEATURES_OK we fail the channel program, that was supposed to do the operation with an unit check. So at the point where the set_status() is done, we already know if it worked or not. Of course the device can change it's status any time. Since we have a synchronous API setting DEVICE_STOPPED would also have to block until all in-flight requests are completed. But this does not see to be what you want. I understood that virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set, if there are still outstanding requests. I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED) (followed by a readback) as asynchronous is a good idea. BTW regarding 'The device SHOULD accept any valid subset of features the driver accepts, otherwise it MUST fail to set the FEATURES_OK device status bit when the driver writes it.' I'm a bit puzzled what 'must fail to set' actually means. From CPU perspective the 'set the FEATURE_OK status bit' boils down to an instruction generated for iowrite8. As far as I understand this instruction is not supposed to fail (on s390 there are multiple ways in which a pci store can fail). AFAIK x86 uses normal memory access instructions. I suppose when a store instruction does not fail, then the store must happen. Or is that only true for RAM and for PCI memory not? > > > > > > Which makes me think: is using the device status a good universal > > method for stopping the device? > > > This part I don't understand. Device stop is kind of similar to reset in > this case. If reset can work what prevent stop work? > > E.g: > > ccw_io_helper(VIRTIO_CCW_DOING_WRITE_STATUS); > while (1) { > ccw_io_helper(VIRTIO_CCW_DOING_READ_STATUS); > if (status & DEVICE_STOPPED) > break; > } > You are right I think. It can work for virtio-ccw, if setting DEVICE_STOPPED is synchronous. For reset we use the CCW_CMD_VDEV_RESET ccw command (see in the spec), but that is a design choice. I think setting the status to 0 via CCW_CMD_WRITE_STATUS should also work. The code below is from qemu and you should look for the invocation of virtio_ccw_reset_virtio(). I guess, that is an undocumented feature. And we can get away with setting the status first, and then doing the reset here, because the implementation guarantees that the driver won't see the new status value until the reset is actually done. (This is not a property of the architecture, but a property of the implementation). case CCW_CMD_WRITE_STATUS: if (check_len) { if (ccw.count != sizeof(status)) { ret = -EINVAL; break; } } else if (ccw.count < sizeof(status)) { /* Can't execute command. */ ret = -EINVAL; break; } if (!ccw.cda) { ret = -EFAULT; } else { ccw_dstream_read(&sch->cds, status); if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) { virtio_ccw_stop_ioeventfd(dev); } if (virtio_set_status(vdev, status) == 0) { if (vdev->status == 0) { virtio_ccw_reset_virtio(dev, vdev); } if (status & VIRTIO_CONFIG_S_DRIVER_OK) { virtio_ccw_start_ioeventfd(dev); } sch->curr_status.scsw.count = ccw.count - sizeof(status); ret = 0; } else { /* Trigger a command reject. */ ret = -ENOSYS; } } > > > If I look at it only from the CCW > > perspective, I'd have used a new channel command with a payload of > > stop/resume and only reflected the stopped state in the device status > > (i.e. read-only from the driver, and only changed by the device when it > > transitions to/from the stopped state.) Could PCI use a new field for > > that? > > > PCI can use new filed. But I wonder how can that help for CCW. Using a new mechanism to request the device to stop would help with the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK synchronous, and make REQUEST_STOP asynchronous. A new PCI field would have the downside, of being PCI specific. This feature makes sense to me regardless of the transport. But I don't think it is likely to become relevant for the ccw any time soon. Regards, Halil This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-25 7:38 ` Halil Pasic @ 2020-12-27 10:00 ` Michael S. Tsirkin 2020-12-28 6:21 ` Halil Pasic 2020-12-28 6:47 ` Jason Wang 1 sibling, 1 reply; 43+ messages in thread From: Michael S. Tsirkin @ 2020-12-27 10:00 UTC (permalink / raw) To: Halil Pasic Cc: Jason Wang, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote: >When driver is trying to set DEVICE_STOPPED, the device MUST not >process new avail requests and MUST complete all requests that is >currently processing before setting DEVICE_STOPPED. ... > Since we have a synchronous API setting DEVICE_STOPPED would also have to > block until all in-flight requests are completed. Judging from the surrounding discussion, when you say complete you seem to mean "use", and I'm not sure how you define in flight, but it seems there could be many of these (up to a full queue?) so waiting for all available buffers to be used might indeed require an asynchronous interface, which gets complex very quickly. However wouldn't it be possible for device to just cancel processing available buffers even if it started processing them? Some buffers could be in indeterminate state (e.g. we might not have a way to know how much data did device have time to write into a buffer). Making it clearer what does "complete" mean here might help. -- MST This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-27 10:00 ` Michael S. Tsirkin @ 2020-12-28 6:21 ` Halil Pasic 2020-12-28 7:01 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-28 6:21 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Sun, 27 Dec 2020 05:00:05 -0500 "Michael S. Tsirkin" <mst@redhat.com> wrote: > On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote: > >When driver is trying to set DEVICE_STOPPED, the device MUST not > >process new avail requests and MUST complete all requests that is > >currently processing before setting DEVICE_STOPPED. > > ... > > > Since we have a synchronous API setting DEVICE_STOPPED would also have to > > block until all in-flight requests are completed. > > Judging from the surrounding discussion, > when you say complete you seem to mean "use", and My intention was merely to paraphrase Jason's proposal which says: +When driver is trying to set DEVICE_STOPPED, the device MUST not +process new avail requests and MUST complete all requests that is +currently processing before setting DEVICE_STOPPED. > I'm not sure how you define in flight, but > it seems there could be many of these (up to a full queue?) In the follow-on discussion 'in-flight' emerged as an alternative to 'all requests that is currently processing' form the proposed text. A large part of the discussion is, IMHO, about finding precise definitions, for what the quoted paragraph is trying to express. > so waiting for all available buffers to be > used might indeed require an asynchronous interface, > which gets complex very quickly. > > However wouldn't it be possible for device to just cancel > processing available buffers even if it started processing > them? Some buffers could be in indeterminate state > (e.g. we might not have a way to know how much data did > device have time to write into a buffer). > Maybe Jason can answer this one. > Making it clearer what does "complete" mean here might help. > I think what we are trying to accomplish here, is avoiding, having non-standardised state in device (that can not be dropped) when migrating. I'm still struggling with wrapping my mind around the problem. AFAIU migration and migratability is not really a feature of the virtio standard, but can be a feature of it's implementation (e.g. QEMU & KVM), where the standard does help a great deal by having certain aspects of the operation and interaction nailed down. Regards, Halil This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 6:21 ` Halil Pasic @ 2020-12-28 7:01 ` Jason Wang 2020-12-28 12:30 ` Michael S. Tsirkin 2020-12-29 13:35 ` Halil Pasic 0 siblings, 2 replies; 43+ messages in thread From: Jason Wang @ 2020-12-28 7:01 UTC (permalink / raw) To: Halil Pasic, Michael S. Tsirkin Cc: Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On 2020/12/28 下午2:21, Halil Pasic wrote: > On Sun, 27 Dec 2020 05:00:05 -0500 > "Michael S. Tsirkin" <mst@redhat.com> wrote: > >> On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote: >>> When driver is trying to set DEVICE_STOPPED, the device MUST not >>> process new avail requests and MUST complete all requests that is >>> currently processing before setting DEVICE_STOPPED. >> ... >> >>> Since we have a synchronous API setting DEVICE_STOPPED would also have to >>> block until all in-flight requests are completed. >> Judging from the surrounding discussion, >> when you say complete you seem to mean "use", and > My intention was merely to paraphrase Jason's proposal which says: > > +When driver is trying to set DEVICE_STOPPED, the device MUST not > +process new avail requests and MUST complete all requests that is > +currently processing before setting DEVICE_STOPPED. > >> I'm not sure how you define in flight, but >> it seems there could be many of these (up to a full queue?) > In the follow-on discussion 'in-flight' emerged as an alternative to > 'all requests that is currently processing' form the proposed text. > > A large part of the discussion is, IMHO, about finding precise > definitions, for what the quoted paragraph is trying to express. > >> so waiting for all available buffers to be >> used might indeed require an asynchronous interface, >> which gets complex very quickly. Some part of the virtio has enforced an asynchronous interface during reset: For MMIO the spec said: """ To stop using the queue the driver MUST write zero (0x0) to this QueueReady and MUST read the value back to ensure synchronization. """ For PCI it said: """ After writing 0 to device_status, the driver MUST wait for a read of device_status to return 0 before reinitializing the device. """ >> >> However wouldn't it be possible for device to just cancel >> processing available buffers even if it started processing >> them? Some buffers could be in indeterminate state >> (e.g. we might not have a way to know how much data did >> device have time to write into a buffer). >> > Maybe Jason can answer this one. For networking device, it should be possible (packet could be dropped or duplicated). But I'm not sure it works for other device. E.g for the block devices that needs to communicate with a remote backend. Waat's more important, my understadning is that the intermediate state is something that we need to avoid (hard to be migrated anyhow). > >> Making it clearer what does "complete" mean here might help. >> > I think what we are trying to accomplish here, is avoiding, having > non-standardised state in device (that can not be dropped) when > migrating. > > I'm still struggling with wrapping my mind around the problem. AFAIU > migration and migratability is not really a feature of the virtio > standard, but can be a feature of it's implementation > (e.g. QEMU & KVM), where the standard does help a great deal by having > certain aspects of the operation and interaction nailed down. It looks like a must (at least from the level of semantics) for designing software API for vDPA. Using a standard spec may help to avoid subtle misunderstanding of different vendors. Thanks > > Regards, > Halil > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 7:01 ` Jason Wang @ 2020-12-28 12:30 ` Michael S. Tsirkin 2020-12-29 9:04 ` Jason Wang 2020-12-29 13:35 ` Halil Pasic 1 sibling, 1 reply; 43+ messages in thread From: Michael S. Tsirkin @ 2020-12-28 12:30 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Mon, Dec 28, 2020 at 03:01:57PM +0800, Jason Wang wrote: > > On 2020/12/28 下午2:21, Halil Pasic wrote: > > On Sun, 27 Dec 2020 05:00:05 -0500 > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote: > > > > When driver is trying to set DEVICE_STOPPED, the device MUST not > > > > process new avail requests and MUST complete all requests that is > > > > currently processing before setting DEVICE_STOPPED. > > > ... > > > > > > > Since we have a synchronous API setting DEVICE_STOPPED would also have to > > > > block until all in-flight requests are completed. > > > Judging from the surrounding discussion, > > > when you say complete you seem to mean "use", and > > My intention was merely to paraphrase Jason's proposal which says: > > > > +When driver is trying to set DEVICE_STOPPED, the device MUST not > > +process new avail requests and MUST complete all requests that is > > +currently processing before setting DEVICE_STOPPED. > > > > > I'm not sure how you define in flight, but > > > it seems there could be many of these (up to a full queue?) > > In the follow-on discussion 'in-flight' emerged as an alternative to > > 'all requests that is currently processing' form the proposed text. > > > > A large part of the discussion is, IMHO, about finding precise > > definitions, for what the quoted paragraph is trying to express. > > > > > so waiting for all available buffers to be > > > used might indeed require an asynchronous interface, > > > which gets complex very quickly. > > > Some part of the virtio has enforced an asynchronous interface during reset: > > For MMIO the spec said: > > """ > > To stop using the queue the driver MUST write zero (0x0) to this QueueReady > and MUST read the value back to ensure synchronization. > > """ > > For PCI it said: > > """ > > After writing 0 to device_status, the driver MUST wait for a read of > device_status to return 0 before reinitializing the device. > > """ Absolutely. It's ok for simple things. However doing anything major like waiting for IO in a tight loop is a problem. > > > > > > > However wouldn't it be possible for device to just cancel > > > processing available buffers even if it started processing > > > them? Some buffers could be in indeterminate state > > > (e.g. we might not have a way to know how much data did > > > device have time to write into a buffer). > > > > > Maybe Jason can answer this one. > > > For networking device, it should be possible (packet could be dropped or > duplicated). But I'm not sure it works for other device. E.g for the block > devices that needs to communicate with a remote backend. Waat's more > important, my understadning is that the intermediate state is something that > we need to avoid (hard to be migrated anyhow). The difficulty is on the device side though, so why not say: if it wants to flush IO that's up to it. > > > > > > Making it clearer what does "complete" mean here might help. > > > > > I think what we are trying to accomplish here, is avoiding, having > > non-standardised state in device (that can not be dropped) when > > migrating. > > > > I'm still struggling with wrapping my mind around the problem. AFAIU > > migration and migratability is not really a feature of the virtio > > standard, but can be a feature of it's implementation > > (e.g. QEMU & KVM), where the standard does help a great deal by having > > certain aspects of the operation and interaction nailed down. > > > It looks like a must (at least from the level of semantics) for designing > software API for vDPA. Using a standard spec may help to avoid subtle > misunderstanding of different vendors. > > Thanks > > > > > > Regards, > > Halil > > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 12:30 ` Michael S. Tsirkin @ 2020-12-29 9:04 ` Jason Wang 2021-01-12 10:54 ` Michael S. Tsirkin 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-29 9:04 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare ----- Original Message ----- > On Mon, Dec 28, 2020 at 03:01:57PM +0800, Jason Wang wrote: > > > > On 2020/12/28 下午2:21, Halil Pasic wrote: > > > On Sun, 27 Dec 2020 05:00:05 -0500 > > > "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > > > > > On Fri, Dec 25, 2020 at 08:38:35AM +0100, Halil Pasic wrote: > > > > > When driver is trying to set DEVICE_STOPPED, the device MUST not > > > > > process new avail requests and MUST complete all requests that is > > > > > currently processing before setting DEVICE_STOPPED. > > > > ... > > > > > > > > > Since we have a synchronous API setting DEVICE_STOPPED would also > > > > > have to > > > > > block until all in-flight requests are completed. > > > > Judging from the surrounding discussion, > > > > when you say complete you seem to mean "use", and > > > My intention was merely to paraphrase Jason's proposal which says: > > > > > > +When driver is trying to set DEVICE_STOPPED, the device MUST not > > > +process new avail requests and MUST complete all requests that is > > > +currently processing before setting DEVICE_STOPPED. > > > > > > > I'm not sure how you define in flight, but > > > > it seems there could be many of these (up to a full queue?) > > > In the follow-on discussion 'in-flight' emerged as an alternative to > > > 'all requests that is currently processing' form the proposed text. > > > > > > A large part of the discussion is, IMHO, about finding precise > > > definitions, for what the quoted paragraph is trying to express. > > > > > > > so waiting for all available buffers to be > > > > used might indeed require an asynchronous interface, > > > > which gets complex very quickly. > > > > > > Some part of the virtio has enforced an asynchronous interface during > > reset: > > > > For MMIO the spec said: > > > > """ > > > > To stop using the queue the driver MUST write zero (0x0) to this QueueReady > > and MUST read the value back to ensure synchronization. > > > > """ > > > > For PCI it said: > > > > """ > > > > After writing 0 to device_status, the driver MUST wait for a read of > > device_status to return 0 before reinitializing the device. > > > > """ > > Absolutely. It's ok for simple things. > However doing anything major like waiting for IO > in a tight loop is a problem. It's mainly used for Qemu. And driver can choose to cancel the stop by trying to clear the bit. Does this work? > > > > > > > > > > > > > However wouldn't it be possible for device to just cancel > > > > processing available buffers even if it started processing > > > > them? Some buffers could be in indeterminate state > > > > (e.g. we might not have a way to know how much data did > > > > device have time to write into a buffer). > > > > > > > Maybe Jason can answer this one. > > > > > > For networking device, it should be possible (packet could be dropped or > > duplicated). But I'm not sure it works for other device. E.g for the block > > devices that needs to communicate with a remote backend. Waat's more > > important, my understadning is that the intermediate state is something > > that > > we need to avoid (hard to be migrated anyhow). > > The difficulty is on the device side though, so why not say: if it wants to > flush IO that's up to it. Do you mean to introduce a device specific way to flush IO? If yes, it actually introduces a new intermediate state implicitly: 1) device is stopped 2) device is stopped and IO is flushed This looks more complicated than a single new state: 1) device is stopped and IO is flushed Thanks > > > > > > > > > > Making it clearer what does "complete" mean here might help. > > > > > > > I think what we are trying to accomplish here, is avoiding, having > > > non-standardised state in device (that can not be dropped) when > > > migrating. > > > > > > I'm still struggling with wrapping my mind around the problem. AFAIU > > > migration and migratability is not really a feature of the virtio > > > standard, but can be a feature of it's implementation > > > (e.g. QEMU & KVM), where the standard does help a great deal by having > > > certain aspects of the operation and interaction nailed down. > > > > > > It looks like a must (at least from the level of semantics) for designing > > software API for vDPA. Using a standard spec may help to avoid subtle > > misunderstanding of different vendors. > > > > Thanks > > > > > > > > > > Regards, > > > Halil > > > > > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-29 9:04 ` Jason Wang @ 2021-01-12 10:54 ` Michael S. Tsirkin 2021-01-13 3:35 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Michael S. Tsirkin @ 2021-01-12 10:54 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Tue, Dec 29, 2020 at 04:04:27AM -0500, Jason Wang wrote: > > > > > However wouldn't it be possible for device to just cancel > > > > > processing available buffers even if it started processing > > > > > them? Some buffers could be in indeterminate state > > > > > (e.g. we might not have a way to know how much data did > > > > > device have time to write into a buffer). > > > > > > > > > Maybe Jason can answer this one. > > > > > > > > > For networking device, it should be possible (packet could be dropped or > > > duplicated). But I'm not sure it works for other device. E.g for the block > > > devices that needs to communicate with a remote backend. Waat's more > > > important, my understadning is that the intermediate state is something > > > that > > > we need to avoid (hard to be migrated anyhow). > > > > The difficulty is on the device side though, so why not say: if it wants to > > flush IO that's up to it. > > Do you mean to introduce a device specific way to flush IO? If yes, it > actually introduces a new intermediate state implicitly: > > 1) device is stopped > 2) device is stopped and IO is flushed > > This looks more complicated than a single new state: > > 1) device is stopped and IO is flushed > > Thanks I thought some more about it, if we call a buffer which was made available by driver but not yet used by device, a cancelled buffer, then maybe we can specify the behaviour by explaining that it should be possible to retry any cancelled buffer, that is make it available again, and achieve the same effect as if it was used. For example, any reads without side effects can be just ignored even if they might have modified guest memory. -- MST This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-01-12 10:54 ` Michael S. Tsirkin @ 2021-01-13 3:35 ` Jason Wang 0 siblings, 0 replies; 43+ messages in thread From: Jason Wang @ 2021-01-13 3:35 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On 2021/1/12 下午6:54, Michael S. Tsirkin wrote: > On Tue, Dec 29, 2020 at 04:04:27AM -0500, Jason Wang wrote: >>>>>> However wouldn't it be possible for device to just cancel >>>>>> processing available buffers even if it started processing >>>>>> them? Some buffers could be in indeterminate state >>>>>> (e.g. we might not have a way to know how much data did >>>>>> device have time to write into a buffer). >>>>>> >>>>> Maybe Jason can answer this one. >>>> >>>> For networking device, it should be possible (packet could be dropped or >>>> duplicated). But I'm not sure it works for other device. E.g for the block >>>> devices that needs to communicate with a remote backend. Waat's more >>>> important, my understadning is that the intermediate state is something >>>> that >>>> we need to avoid (hard to be migrated anyhow). >>> The difficulty is on the device side though, so why not say: if it wants to >>> flush IO that's up to it. >> Do you mean to introduce a device specific way to flush IO? If yes, it >> actually introduces a new intermediate state implicitly: >> >> 1) device is stopped >> 2) device is stopped and IO is flushed >> >> This looks more complicated than a single new state: >> >> 1) device is stopped and IO is flushed >> >> Thanks > I thought some more about it, if we call a buffer which was > made available by driver but not yet used by device, For "used by device" you meant posting entries to used ring? > a cancelled buffer, then maybe we can specify > the behaviour by explaining that it should be possible to > retry any cancelled buffer, that is make it available again, > and achieve the same effect as if it was used. > For example, any reads without side effects can be just > ignored even if they might have modified guest memory. Right, but how about the buffer that can't be retried? Or you meant we leave that to a specific device to handle? THanks > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 7:01 ` Jason Wang 2020-12-28 12:30 ` Michael S. Tsirkin @ 2020-12-29 13:35 ` Halil Pasic 2020-12-30 8:15 ` Jason Wang 1 sibling, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-29 13:35 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Mon, 28 Dec 2020 15:01:57 +0800 Jason Wang <jasowang@redhat.com> wrote: > Some part of the virtio has enforced an asynchronous interface during reset: > > For MMIO the spec said: > > """ > > To stop using the queue the driver MUST write zero (0x0) to this > QueueReady and MUST read the value back to ensure synchronization. > > """ I read the MMIO quote like a single read is sufficient to ensure synchronization. I.e. it does not require a loop which waits for the read to yield the expected value. > > For PCI it said: > > """ > > After writing 0 to device_status, the driver MUST wait for a read of > device_status to return 0 before reinitializing the device. > > """ Yes, this does sound like a loop. And that's what Linux does. But this is transport (PCI) specific. On a spec level, a reset is a distinct operation from setting device status (to 0). It just happens to be mapped to the PCI transport as setting the status to 0. For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while setting the status is mapped via CCW_CMD_WRITE_STATUS. Regards, Halil This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-29 13:35 ` Halil Pasic @ 2020-12-30 8:15 ` Jason Wang 2021-01-11 18:16 ` Cornelia Huck 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-30 8:15 UTC (permalink / raw) To: Halil Pasic Cc: Michael S. Tsirkin, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On 2020/12/29 下午9:35, Halil Pasic wrote: > On Mon, 28 Dec 2020 15:01:57 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> Some part of the virtio has enforced an asynchronous interface during reset: >> >> For MMIO the spec said: >> >> """ >> >> To stop using the queue the driver MUST write zero (0x0) to this >> QueueReady and MUST read the value back to ensure synchronization. >> >> """ > I read the MMIO quote like a single read is sufficient to ensure > synchronization. I.e. it does not require a loop which waits for > the read to yield the expected value. > >> For PCI it said: >> >> """ >> >> After writing 0 to device_status, the driver MUST wait for a read of >> device_status to return 0 before reinitializing the device. >> >> """ > Yes, this does sound like a loop. And that's what Linux does. But this > is transport (PCI) specific. On a spec level, a reset is a distinct > operation from setting device status (to 0). I'm not sure or it looks unclear to me for this point. E.g the device reset is mentioned in "Device Status Field" belongs to "Basic Facilities of a Virtio Device". But there's no "Device Reset" in "Basic Facilities of a Virtio Device". > It just happens to be > mapped to the PCI transport as setting the status to 0. MMIO did the same. And it makes sense since using a single API to configure/change device status looks simpler. > > For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while > setting the status is mapped via CCW_CMD_WRITE_STATUS. Yes, but I think actually there's no limitation if we want to tread 0 as a reset command for CCW_CMD_WRITE_STATUS. Thanks > > Regards, > Halil > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-30 8:15 ` Jason Wang @ 2021-01-11 18:16 ` Cornelia Huck 2021-01-12 3:27 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Cornelia Huck @ 2021-01-11 18:16 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment, eperezma, sgarzare On Wed, 30 Dec 2020 16:15:47 +0800 Jason Wang <jasowang@redhat.com> wrote: > On 2020/12/29 下午9:35, Halil Pasic wrote: > > On Mon, 28 Dec 2020 15:01:57 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > >> Some part of the virtio has enforced an asynchronous interface during reset: > >> > >> For MMIO the spec said: > >> > >> """ > >> > >> To stop using the queue the driver MUST write zero (0x0) to this > >> QueueReady and MUST read the value back to ensure synchronization. > >> > >> """ > > I read the MMIO quote like a single read is sufficient to ensure > > synchronization. I.e. it does not require a loop which waits for > > the read to yield the expected value. > > > >> For PCI it said: > >> > >> """ > >> > >> After writing 0 to device_status, the driver MUST wait for a read of > >> device_status to return 0 before reinitializing the device. > >> > >> """ > > Yes, this does sound like a loop. And that's what Linux does. But this > > is transport (PCI) specific. On a spec level, a reset is a distinct > > operation from setting device status (to 0). > > > I'm not sure or it looks unclear to me for this point. > > E.g the device reset is mentioned in "Device Status Field" belongs to > "Basic Facilities of a Virtio Device". But there's no "Device Reset" in > "Basic Facilities of a Virtio Device". I think it should be, just to make clear what a driver-initiated reset of a device actually resets (and that the method for doing so is transport-specific.) > > > > It just happens to be > > mapped to the PCI transport as setting the status to 0. > > > MMIO did the same. And it makes sense since using a single API to > configure/change device status looks simpler. > > > > > > For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while > > setting the status is mapped via CCW_CMD_WRITE_STATUS. > > > Yes, but I think actually there's no limitation if we want to tread 0 as > a reset command for CCW_CMD_WRITE_STATUS. I'm not a fan of the "driver writes 0 to status to initiate a device reset" method, but we are stuck with it. I think it's actually not working well with two other requirements in the spec: - "The driver MUST NOT clear a device status bit." - "The device MUST initialize device status to 0 upon reset." (This suggests to me that a zero status is something set by the device as the result of a reset request by the driver, and _not_ set by the driver.) Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be incompatible with older devices, wouldn't it? This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-01-11 18:16 ` Cornelia Huck @ 2021-01-12 3:27 ` Jason Wang 2021-01-12 12:13 ` Cornelia Huck 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2021-01-12 3:27 UTC (permalink / raw) To: Cornelia Huck Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment, eperezma, sgarzare On 2021/1/12 上午2:16, Cornelia Huck wrote: > On Wed, 30 Dec 2020 16:15:47 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> On 2020/12/29 下午9:35, Halil Pasic wrote: >>> On Mon, 28 Dec 2020 15:01:57 +0800 >>> Jason Wang <jasowang@redhat.com> wrote: >>> >>>> Some part of the virtio has enforced an asynchronous interface during reset: >>>> >>>> For MMIO the spec said: >>>> >>>> """ >>>> >>>> To stop using the queue the driver MUST write zero (0x0) to this >>>> QueueReady and MUST read the value back to ensure synchronization. >>>> >>>> """ >>> I read the MMIO quote like a single read is sufficient to ensure >>> synchronization. I.e. it does not require a loop which waits for >>> the read to yield the expected value. >>> >>>> For PCI it said: >>>> >>>> """ >>>> >>>> After writing 0 to device_status, the driver MUST wait for a read of >>>> device_status to return 0 before reinitializing the device. >>>> >>>> """ >>> Yes, this does sound like a loop. And that's what Linux does. But this >>> is transport (PCI) specific. On a spec level, a reset is a distinct >>> operation from setting device status (to 0). >> >> I'm not sure or it looks unclear to me for this point. >> >> E.g the device reset is mentioned in "Device Status Field" belongs to >> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in >> "Basic Facilities of a Virtio Device". > I think it should be, just to make clear what a driver-initiated reset > of a device actually resets (and that the method for doing so is > transport-specific.) Then we need some clarifications in the spec. It would be easily imply that reset is part of device status after reading "Basic Facilities of a Virtio Device". > >> >>> It just happens to be >>> mapped to the PCI transport as setting the status to 0. >> >> MMIO did the same. And it makes sense since using a single API to >> configure/change device status looks simpler. >> >> >>> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while >>> setting the status is mapped via CCW_CMD_WRITE_STATUS. >> >> Yes, but I think actually there's no limitation if we want to tread 0 as >> a reset command for CCW_CMD_WRITE_STATUS. > I'm not a fan of the "driver writes 0 to status to initiate a device > reset" method, but we are stuck with it. I think it's actually not > working well with two other requirements in the spec: > > - "The driver MUST NOT clear a device status bit." Yes, that's kind of conflict but it was how PCI works now ... > - "The device MUST initialize device status to 0 upon reset." (This > suggests to me that a zero status is something set by the device as > the result of a reset request by the driver, and _not_ set by the > driver.) Not a native speaker but we probably need to define "set" and "clear" mean, E.g: Clear a bit from driver means write with that bit cleared. Set a bit from driver means write with that bit set. Clear a bit from device means read with that be cleared. Set a bit from device means read with that bit set. So a question here is how to trigger the device set as we discussed before. > > Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be > incompatible with older devices, wouldn't it? Probably, but we can make both methods work. Note that I'm not suggesting doing something like this. Just to point out it may work. And there's something not clear in the spec. Thanks > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-01-12 3:27 ` Jason Wang @ 2021-01-12 12:13 ` Cornelia Huck 2021-01-13 2:52 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Cornelia Huck @ 2021-01-12 12:13 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment, eperezma, sgarzare On Tue, 12 Jan 2021 11:27:20 +0800 Jason Wang <jasowang@redhat.com> wrote: > On 2021/1/12 上午2:16, Cornelia Huck wrote: > > On Wed, 30 Dec 2020 16:15:47 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > >> On 2020/12/29 下午9:35, Halil Pasic wrote: > >>> On Mon, 28 Dec 2020 15:01:57 +0800 > >>> Jason Wang <jasowang@redhat.com> wrote: > >>> > >>>> Some part of the virtio has enforced an asynchronous interface during reset: > >>>> > >>>> For MMIO the spec said: > >>>> > >>>> """ > >>>> > >>>> To stop using the queue the driver MUST write zero (0x0) to this > >>>> QueueReady and MUST read the value back to ensure synchronization. > >>>> > >>>> """ > >>> I read the MMIO quote like a single read is sufficient to ensure > >>> synchronization. I.e. it does not require a loop which waits for > >>> the read to yield the expected value. > >>> > >>>> For PCI it said: > >>>> > >>>> """ > >>>> > >>>> After writing 0 to device_status, the driver MUST wait for a read of > >>>> device_status to return 0 before reinitializing the device. > >>>> > >>>> """ > >>> Yes, this does sound like a loop. And that's what Linux does. But this > >>> is transport (PCI) specific. On a spec level, a reset is a distinct > >>> operation from setting device status (to 0). > >> > >> I'm not sure or it looks unclear to me for this point. > >> > >> E.g the device reset is mentioned in "Device Status Field" belongs to > >> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in > >> "Basic Facilities of a Virtio Device". > > I think it should be, just to make clear what a driver-initiated reset > > of a device actually resets (and that the method for doing so is > > transport-specific.) > > > Then we need some clarifications in the spec. It would be easily imply > that reset is part of device status after reading "Basic Facilities of a > Virtio Device". I can try to come up with a patch. > > > > > >> > >>> It just happens to be > >>> mapped to the PCI transport as setting the status to 0. > >> > >> MMIO did the same. And it makes sense since using a single API to > >> configure/change device status looks simpler. > >> > >> > >>> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while > >>> setting the status is mapped via CCW_CMD_WRITE_STATUS. > >> > >> Yes, but I think actually there's no limitation if we want to tread 0 as > >> a reset command for CCW_CMD_WRITE_STATUS. > > I'm not a fan of the "driver writes 0 to status to initiate a device > > reset" method, but we are stuck with it. I think it's actually not > > working well with two other requirements in the spec: > > > > - "The driver MUST NOT clear a device status bit." > > > Yes, that's kind of conflict but it was how PCI works now ... Nod. > > > > - "The device MUST initialize device status to 0 upon reset." (This > > suggests to me that a zero status is something set by the device as > > the result of a reset request by the driver, and _not_ set by the > > driver.) > > > Not a native speaker but we probably need to define "set" and "clear" > mean, E.g: > > Clear a bit from driver means write with that bit cleared. Set a bit > from driver means write with that bit set. Agreed. > Clear a bit from device means read with that be cleared. Set a bit from > device means read with that bit set. I always saw the status as an actual part of the device. I.e., the device setting or clearing a bit means that the device is modifying the status in the device. The driver doing a read will get the status as it currently is within the device. IOW, the device "controls" the status, and the driver can submit changes to the status. [Maybe that's closer to how ccw operates with its commands for reading or writing the status. Are pci/mmio envisioning the status more like some kind of shared area?] > > So a question here is how to trigger the device set as we discussed before. > > > > > > Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be > > incompatible with older devices, wouldn't it? > > > Probably, but we can make both methods work. Note that I'm not > suggesting doing something like this. Just to point out it may work. And > there's something not clear in the spec. I'm still not convinced that's the way to go. But first, we should be clear on how the status really is supposed to work. This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-01-12 12:13 ` Cornelia Huck @ 2021-01-13 2:52 ` Jason Wang 2021-01-14 12:00 ` Cornelia Huck 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2021-01-13 2:52 UTC (permalink / raw) To: Cornelia Huck Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment, eperezma, sgarzare On 2021/1/12 下午8:13, Cornelia Huck wrote: > On Tue, 12 Jan 2021 11:27:20 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> On 2021/1/12 上午2:16, Cornelia Huck wrote: >>> On Wed, 30 Dec 2020 16:15:47 +0800 >>> Jason Wang <jasowang@redhat.com> wrote: >>> >>>> On 2020/12/29 下午9:35, Halil Pasic wrote: >>>>> On Mon, 28 Dec 2020 15:01:57 +0800 >>>>> Jason Wang <jasowang@redhat.com> wrote: >>>>> >>>>>> Some part of the virtio has enforced an asynchronous interface during reset: >>>>>> >>>>>> For MMIO the spec said: >>>>>> >>>>>> """ >>>>>> >>>>>> To stop using the queue the driver MUST write zero (0x0) to this >>>>>> QueueReady and MUST read the value back to ensure synchronization. >>>>>> >>>>>> """ >>>>> I read the MMIO quote like a single read is sufficient to ensure >>>>> synchronization. I.e. it does not require a loop which waits for >>>>> the read to yield the expected value. >>>>> >>>>>> For PCI it said: >>>>>> >>>>>> """ >>>>>> >>>>>> After writing 0 to device_status, the driver MUST wait for a read of >>>>>> device_status to return 0 before reinitializing the device. >>>>>> >>>>>> """ >>>>> Yes, this does sound like a loop. And that's what Linux does. But this >>>>> is transport (PCI) specific. On a spec level, a reset is a distinct >>>>> operation from setting device status (to 0). >>>> I'm not sure or it looks unclear to me for this point. >>>> >>>> E.g the device reset is mentioned in "Device Status Field" belongs to >>>> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in >>>> "Basic Facilities of a Virtio Device". >>> I think it should be, just to make clear what a driver-initiated reset >>> of a device actually resets (and that the method for doing so is >>> transport-specific.) >> >> Then we need some clarifications in the spec. It would be easily imply >> that reset is part of device status after reading "Basic Facilities of a >> Virtio Device". > I can try to come up with a patch. Thanks > >> >>> >>>> >>>>> It just happens to be >>>>> mapped to the PCI transport as setting the status to 0. >>>> MMIO did the same. And it makes sense since using a single API to >>>> configure/change device status looks simpler. >>>> >>>> >>>>> For the channel IO transport it is mapped via CCW_CMD_VDEV_RESET while >>>>> setting the status is mapped via CCW_CMD_WRITE_STATUS. >>>> Yes, but I think actually there's no limitation if we want to tread 0 as >>>> a reset command for CCW_CMD_WRITE_STATUS. >>> I'm not a fan of the "driver writes 0 to status to initiate a device >>> reset" method, but we are stuck with it. I think it's actually not >>> working well with two other requirements in the spec: >>> >>> - "The driver MUST NOT clear a device status bit." >> >> Yes, that's kind of conflict but it was how PCI works now ... > Nod. > >> >>> - "The device MUST initialize device status to 0 upon reset." (This >>> suggests to me that a zero status is something set by the device as >>> the result of a reset request by the driver, and _not_ set by the >>> driver.) >> >> Not a native speaker but we probably need to define "set" and "clear" >> mean, E.g: >> >> Clear a bit from driver means write with that bit cleared. Set a bit >> from driver means write with that bit set. > Agreed. > >> Clear a bit from device means read with that be cleared. Set a bit from >> device means read with that bit set. > I always saw the status as an actual part of the device. I.e., the > device setting or clearing a bit means that the device is modifying the > status in the device. The driver doing a read will get the status as it > currently is within the device. IOW, the device "controls" the status, > and the driver can submit changes to the status. > > [Maybe that's closer to how ccw operates with its commands for reading > or writing the status. Are pci/mmio envisioning the status more like > some kind of shared area?] My understanding is it's not a shared area, each MMIO read and write will generate a PCI transaction on the bus. Some developers upstream told me that in the PCIE Endpoint device, the MMIO area could be a shared memory from the view of endpoint device which makes it very hard to make current virtio-pci transport work[1]. [1] https://lore.kernel.org/linux-pci/20190827180114.5979-1-haotian.wang@sifive.com/T/ > >> So a question here is how to trigger the device set as we discussed before. >> >> >>> Also, treating CCW_CMD_WRITE_STATUS with 0 as a reset request would be >>> incompatible with older devices, wouldn't it? >> >> Probably, but we can make both methods work. Note that I'm not >> suggesting doing something like this. Just to point out it may work. And >> there's something not clear in the spec. > I'm still not convinced that's the way to go. But first, we should be > clear on how the status really is supposed to work. Right, I meant we had two ways to make the implement aligned with the spec: 1) implement zero write as reset (and maybe keep the old way), so we don't need to modify the spec 2) clarify that transport specific device reset is part of the basic facilities Either is fine I think. Thanks > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-01-13 2:52 ` Jason Wang @ 2021-01-14 12:00 ` Cornelia Huck 0 siblings, 0 replies; 43+ messages in thread From: Cornelia Huck @ 2021-01-14 12:00 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Michael S. Tsirkin, stefanha, virtio-comment, eperezma, sgarzare On Wed, 13 Jan 2021 10:52:32 +0800 Jason Wang <jasowang@redhat.com> wrote: > On 2021/1/12 下午8:13, Cornelia Huck wrote: > > On Tue, 12 Jan 2021 11:27:20 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > > >> On 2021/1/12 上午2:16, Cornelia Huck wrote: > >>> On Wed, 30 Dec 2020 16:15:47 +0800 > >>> Jason Wang <jasowang@redhat.com> wrote: > >>> > >>>> On 2020/12/29 下午9:35, Halil Pasic wrote: > >>>>> On Mon, 28 Dec 2020 15:01:57 +0800 > >>>>> Jason Wang <jasowang@redhat.com> wrote: > >>>>> > >>>>>> Some part of the virtio has enforced an asynchronous interface during reset: > >>>>>> > >>>>>> For MMIO the spec said: > >>>>>> > >>>>>> """ > >>>>>> > >>>>>> To stop using the queue the driver MUST write zero (0x0) to this > >>>>>> QueueReady and MUST read the value back to ensure synchronization. > >>>>>> > >>>>>> """ > >>>>> I read the MMIO quote like a single read is sufficient to ensure > >>>>> synchronization. I.e. it does not require a loop which waits for > >>>>> the read to yield the expected value. > >>>>> > >>>>>> For PCI it said: > >>>>>> > >>>>>> """ > >>>>>> > >>>>>> After writing 0 to device_status, the driver MUST wait for a read of > >>>>>> device_status to return 0 before reinitializing the device. > >>>>>> > >>>>>> """ > >>>>> Yes, this does sound like a loop. And that's what Linux does. But this > >>>>> is transport (PCI) specific. On a spec level, a reset is a distinct > >>>>> operation from setting device status (to 0). > >>>> I'm not sure or it looks unclear to me for this point. > >>>> > >>>> E.g the device reset is mentioned in "Device Status Field" belongs to > >>>> "Basic Facilities of a Virtio Device". But there's no "Device Reset" in > >>>> "Basic Facilities of a Virtio Device". > >>> I think it should be, just to make clear what a driver-initiated reset > >>> of a device actually resets (and that the method for doing so is > >>> transport-specific.) > >> > >> Then we need some clarifications in the spec. It would be easily imply > >> that reset is part of device status after reading "Basic Facilities of a > >> Virtio Device". > > I can try to come up with a patch. > > > Thanks Attempt here: https://lists.oasis-open.org/archives/virtio-comment/202101/msg00020.html This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-25 7:38 ` Halil Pasic 2020-12-27 10:00 ` Michael S. Tsirkin @ 2020-12-28 6:47 ` Jason Wang 2020-12-29 13:20 ` Halil Pasic 1 sibling, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-28 6:47 UTC (permalink / raw) To: Halil Pasic Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/25 下午3:38, Halil Pasic wrote: > On Wed, 23 Dec 2020 10:48:45 +0800 > Jason Wang <jasowang@redhat.com> wrote: > [..] >>>>>>>>> Is it for hardware devices (e.g. PCI) standard to request an >>>>>>>>> operation by writing some value into a register, and get feedback bout >>>>>>>>> a non-completion by reading different value that written, >>>>>>>> This is not ununsal in other devices. And in fact, the FEATURES_OK works >>>>>>>> like this: >>>>>>>> >>>>>>>> """ >>>>>>>> >>>>>>>> The device MUST NOT offer a feature which requires another feature which >>>>>>>> was not offered. The device SHOULD accept any valid subset of features >>>>>>>> the driver accepts, otherwise it MUST fail to set the FEATURES_OK device >>>>>>>> status bit when the driver writes it. >>>>>>>> >>>>>>>> """ >>>>>>>> >>>>>>> Thanks for the pointer. I intend to have another look at how FEATURES_OK >>>>>>> works, and how similar this is to DEVICE_STOPPED. >>>>>> My understanding is that for both of them, driver can try to set the bit >>>>>> by writing to the status register but it's the device that decide when >>>>>> to set the bit. >>>>> I think there's a difference: For FEATURES_OK, the driver can read back >>>>> the status and knows that the feature combination is not accepted if it >>>>> is not set. For DEVICE_STOPPED, it does not really know whether the >>>>> device has started to stop yet. It only knows that the device is >>>>> actually done stopping when DEVICE_STOPPED is set. > I share Connie's concern. The config->set_status() is a synchronous API, > that is right after the iowrite8 for pci comes back, or the channel > program finishes executing for ccw, we can go, do a config->get_status() > and know if the status was set succesfully. > >>>> The only difference is that device may need sometime to be stopped. >>>> FEATURES_OK might not be the best example. Let's see how reset work >>>> which is more similar to DEVICE_STOPPED: >>>> >>>> """ >>>> >>>> After writing 0 to device_status, the driver MUST wait for a read of >>>> device_status to return 0 before reinitializing the device. >>>> >>>> """ >>> Hm, but that is PCI-specific writing of fields in the configuration >>> structure, right? CCW uses a different method for resetting (an >>> asynchronous channel command; channel commands create an interrupt when >>> done); "write 0 to status for reset" is not universal. >> >> I may miss something. Even if CCW is using something asynchronously (e.g >> interrupt). It still present a synchronous API to the upper layer. E.g >> ccw_io_helper() is using wait_event()? > Right. The reason why we have to wait for the completion of the channel > program is exactly that. Furthermore if we have to fail setting > FEATURES_OK we fail the channel program, that was supposed to do the > operation with an unit check. So at the point where the set_status() is > done, we already know if it worked or not. The only difference is, for FEATURES_OK, driver know it won't work any more. But for DEVICE_STOPPED, driver know it might work sometime in the future. > Of course the device can > change it's status any time. > > Since we have a synchronous API setting DEVICE_STOPPED would also have to > block until all in-flight requests are completed. But this does not see > to be what you want. I understood that > virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would > a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set, > if there are still outstanding requests. Yes. > > I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by > a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED) > (followed by a readback) as asynchronous is a good idea. The main reason is that DEVICE_STOPPED is a one of the possible state. We'd better do this by extending the current device status state machine. Otherwise it would be more complicated (e.g using another interface). It looks to me what you don't like is the driver API design? If yes, how about introduce something like virtio_try_add_status()? The difference is that it doesn't require a immediate result. > > BTW regarding 'The device SHOULD accept any valid subset of features > the driver accepts, otherwise it MUST fail to set the FEATURES_OK device > status bit when the driver writes it.' I'm a bit puzzled what 'must fail > to set' actually means. Good point, my understanding for "set XX bit" means the whether or not the bit could be read by the driver. We need some clarification here. > From CPU perspective the 'set the FEATURE_OK > status bit' boils down to an instruction generated for iowrite8. As far > as I understand this instruction is not supposed to fail (on s390 there > are multiple ways in which a pci store can fail). Yes, my understanding is that it doesn't mean fail the instruction. As you said it's sometime impossible a specific transport. > AFAIK x86 uses normal > memory access instructions. I suppose when a store instruction does not > fail, then the store must happen. Or is that only true for RAM and for > PCI memory not? AFAIK, PCI have non-posted write which can fail but the spec doesn't requires that. > > >> >>> Which makes me think: is using the device status a good universal >>> method for stopping the device? >> >> This part I don't understand. Device stop is kind of similar to reset in >> this case. If reset can work what prevent stop work? >> >> E.g: >> >> ccw_io_helper(VIRTIO_CCW_DOING_WRITE_STATUS); >> while (1) { >> ccw_io_helper(VIRTIO_CCW_DOING_READ_STATUS); >> if (status & DEVICE_STOPPED) >> break; >> } >> > You are right I think. It can work for virtio-ccw, if setting > DEVICE_STOPPED is synchronous. For reset we use the CCW_CMD_VDEV_RESET > ccw command (see in the spec), but that is a design choice. I think > setting the status to 0 via CCW_CMD_WRITE_STATUS should also work. The > code below is from qemu and you should look for the invocation of > virtio_ccw_reset_virtio(). I guess, that is an undocumented feature. Then I think it needs to be clarified in the spec. > And > we can get away with setting the status first, and then doing the reset > here, because the implementation guarantees that the driver won't see > the new status value until the reset is actually done. (This is not > a property of the architecture, but a property of the implementation). So did PCI: case VIRTIO_PCI_STATUS: if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { virtio_pci_stop_ioeventfd(proxy); } virtio_set_status(vdev, val & 0xFF); if (val & VIRTIO_CONFIG_S_DRIVER_OK) { virtio_pci_start_ioeventfd(proxy); } if (vdev->status == 0) { virtio_pci_reset(DEVICE(proxy)); } As you said, it's a implementation property since qemu implements a synchronous virtio_set_status(). But it might not be true for real hardware which may require more time or even hard to drain a queue in short time. That's why PCI spec mandate the driver to poll for the status after writing zero to device status. > > case CCW_CMD_WRITE_STATUS: > if (check_len) { > if (ccw.count != sizeof(status)) { > ret = -EINVAL; > break; > } > } else if (ccw.count < sizeof(status)) { > /* Can't execute command. */ > ret = -EINVAL; > break; > } > if (!ccw.cda) { > ret = -EFAULT; > } else { > ccw_dstream_read(&sch->cds, status); > if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) { > virtio_ccw_stop_ioeventfd(dev); > } > if (virtio_set_status(vdev, status) == 0) { > if (vdev->status == 0) { > virtio_ccw_reset_virtio(dev, vdev); > } > if (status & VIRTIO_CONFIG_S_DRIVER_OK) { > virtio_ccw_start_ioeventfd(dev); > } > sch->curr_status.scsw.count = ccw.count - sizeof(status); > ret = 0; > } else { > /* Trigger a command reject. */ > ret = -ENOSYS; > } > } > > >>> If I look at it only from the CCW >>> perspective, I'd have used a new channel command with a payload of >>> stop/resume and only reflected the stopped state in the device status >>> (i.e. read-only from the driver, and only changed by the device when it >>> transitions to/from the stopped state.) Could PCI use a new field for >>> that? >> >> PCI can use new filed. But I wonder how can that help for CCW. > Using a new mechanism to request the device to stop would help with > the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK > synchronous, and make REQUEST_STOP asynchronous. Right, but as said before, it looks more like a part of device status state machine, using new mechanism might bring extra complexity and confusion. Looking at CCW implementation of device status, I don't see any blocker for implementing it synchronously if we just wait for the stop of the device and then return? > > A new PCI field would have the downside, of being PCI specific. This > feature makes sense to me regardless of the transport. But I don't think > it is likely to become relevant for the ccw any time soon. Yes, it's better not limit the feature to a specific transport. Thanks > > Regards, > Halil > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 6:47 ` Jason Wang @ 2020-12-29 13:20 ` Halil Pasic 2020-12-30 8:03 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-29 13:20 UTC (permalink / raw) To: Jason Wang Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On Mon, 28 Dec 2020 14:47:21 +0800 Jason Wang <jasowang@redhat.com> wrote: > [..] > > Right. The reason why we have to wait for the completion of the channel > > program is exactly that. Furthermore if we have to fail setting > > FEATURES_OK we fail the channel program, that was supposed to do the > > operation with an unit check. So at the point where the set_status() is > > done, we already know if it worked or not. > > > The only difference is, for FEATURES_OK, driver know it won't work any > more. But for DEVICE_STOPPED, driver know it might work sometime in the > future. > If the channel program fails with a command reject, that that ccw is not allowed to cause further side effects. I.e. we can not fail the channel program, but then make the operation succeed later. > > > Of course the device can > > change it's status any time. > > > > Since we have a synchronous API setting DEVICE_STOPPED would also have to > > block until all in-flight requests are completed. But this does not see > > to be what you want. I understood that > > virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would > > a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set, > > if there are still outstanding requests. > > > Yes. > > > > > > I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by > > a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED) > > (followed by a readback) as asynchronous is a good idea. > > > The main reason is that DEVICE_STOPPED is a one of the possible state. > We'd better do this by extending the current device status state > machine. Otherwise it would be more complicated (e.g using another > interface). > > It looks to me what you don't like is the driver API design? If yes, how > about introduce something like virtio_try_add_status()? The difference > is that it doesn't require a immediate result. > Yes a new (Linux) driver API operation sounds good -- at least on the surface. I would rather call it virtio_add_status_async() than virtio_try_add_status() because for me 'try' tells that something may or may not work, but the outcome is decided when the function returns. E.g. try_lock(l) may or may not succeed the lock l, but when it returns you know if it did take l or not. But how would you implement that new operation? I guess it would still have to boil down to config->set_status(dev), and to set_status(s), being asynchronous when s & DEVICE_STOPPED and synchronous otherwise (and in particular FEATURES_OK is added to the status). Ultimately a set_status(b) boils down to a iowrite8(b) for the PCI transport, and to a channel program that references a single byte buffer with the value of b for Channel IO transport. Making that synchroneaous or asynchronous depending on the value of b does not sound right. Maybe Conny can chime in on this later as well. [..] > > AFAIK x86 uses normal > > memory access instructions. I suppose when a store instruction does not > > fail, then the store must happen. Or is that only true for RAM and for > > PCI memory not? > > > AFAIK, PCI have non-posted write which can fail but the spec doesn't > requires that. > So for a posted write a PCI device can just silently discard writes it does not 'like' (i.e. pretend there was no such transaction in the first place)? [..] > > So did PCI: > > case VIRTIO_PCI_STATUS: > if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > virtio_pci_stop_ioeventfd(proxy); > } > > virtio_set_status(vdev, val & 0xFF); > > if (val & VIRTIO_CONFIG_S_DRIVER_OK) { > virtio_pci_start_ioeventfd(proxy); > } > > if (vdev->status == 0) { > virtio_pci_reset(DEVICE(proxy)); > } > > As you said, it's a implementation property since qemu implements a > synchronous virtio_set_status(). But it might not be true for real > hardware which may require more time or even hard to drain a queue in > short time. That's why PCI spec mandate the driver to poll for the > status after writing zero to device status. > That is a PCI specific thing, and unfortunately my PCI skills are very limited. Please compare static void vp_reset(struct virtio_device *vdev) { vp_iowrite8(0, &vp_dev->common->device_status); [..] while (vp_ioread8(&vp_dev->common->device_status)) msleep(1); With int virtio_finalize_features(struct virtio_device *dev) { [..] virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); status = dev->config->get_status(dev); if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { The latter clearly assumes that the first get_status() will yield the final outcome, where the former clearly assumes that the effect of set_status() may be delayed beyond the first subsequent get_status(). Btw, if the device were to to fail resetting itself, what would you expect to happen? I would expect the device to present a status DEVICE_NEEDS_RESET, but that would effectively make vp_reset() getting stuck in that loop. Or does the spec guarantee that a reset must work eventually? [..] > >> PCI can use new filed. But I wonder how can that help for CCW. > > Using a new mechanism to request the device to stop would help with > > the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK > > synchronous, and make REQUEST_STOP asynchronous. > > > Right, but as said before, it looks more like a part of device status > state machine, using new mechanism might bring extra complexity and > confusion. > > Looking at CCW implementation of device status, I don't see any blocker > for implementing it synchronously if we just wait for the stop of the > device and then return? > I don't see a problem on the qemu side, but I do see a problem on the guest side. There is a reason why you want set DEVICE_STOPPED to be asynchronous. Regards, Halil [..] This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-29 13:20 ` Halil Pasic @ 2020-12-30 8:03 ` Jason Wang 0 siblings, 0 replies; 43+ messages in thread From: Jason Wang @ 2020-12-30 8:03 UTC (permalink / raw) To: Halil Pasic Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/29 下午9:20, Halil Pasic wrote: > On Mon, 28 Dec 2020 14:47:21 +0800 > Jason Wang <jasowang@redhat.com> wrote: > > [..] >>> Right. The reason why we have to wait for the completion of the channel >>> program is exactly that. Furthermore if we have to fail setting >>> FEATURES_OK we fail the channel program, that was supposed to do the >>> operation with an unit check. So at the point where the set_status() is >>> done, we already know if it worked or not. >> >> The only difference is, for FEATURES_OK, driver know it won't work any >> more. But for DEVICE_STOPPED, driver know it might work sometime in the >> future. >> > If the channel program fails with a command reject, that that ccw is not > allowed to cause further side effects. I.e. we can not fail the channel > program, but then make the operation succeed later. > >>> Of course the device can >>> change it's status any time. >>> >>> Since we have a synchronous API setting DEVICE_STOPPED would also have to >>> block until all in-flight requests are completed. But this does not see >>> to be what you want. I understood that >>> virtio_add_status(dev, DEVICE_STOPPED) would return immediately, and so would >>> a subsequent config->get_status(dev) without the DEVICE_STOPPED bit set, >>> if there are still outstanding requests. >> >> Yes. >> >> >>> I don't think defining virtio_add_status(dev, FEATURES_OK) (followed by >>> a readback) as synchronous and virtio_add_status(dev, DEVICE_STOPPED) >>> (followed by a readback) as asynchronous is a good idea. >> >> The main reason is that DEVICE_STOPPED is a one of the possible state. >> We'd better do this by extending the current device status state >> machine. Otherwise it would be more complicated (e.g using another >> interface). >> >> It looks to me what you don't like is the driver API design? If yes, how >> about introduce something like virtio_try_add_status()? The difference >> is that it doesn't require a immediate result. >> > Yes a new (Linux) driver API operation sounds good -- at least on the > surface. I would rather call it virtio_add_status_async() than > virtio_try_add_status() because for me 'try' tells that something may > or may not work, but the outcome is decided when the function > returns. E.g. try_lock(l) may or may not succeed the lock l, but when it > returns you know if it did take l or not. I agree. > > But how would you implement that new operation? I guess it would still > have to boil down to config->set_status(dev), and to set_status(s), > being asynchronous when s & DEVICE_STOPPED and synchronous otherwise (and > in particular FEATURES_OK is added to the status). Or it could be done via a dedicated config ops like how vp_reset() did. > > Ultimately a set_status(b) boils down to a iowrite8(b) for the PCI > transport, and to a channel program that references a single byte buffer > with the value of b for Channel IO transport. Making that synchroneaous > or asynchronous depending on the value of b does not sound right. Just to make sure we're in the same page. I meant e.g in CCW_CMD_WRITE_STATUS, qemu just propagate the request of device stop to device and return success. And it's the device that decide when it can set the DEVICE_STOPPED bit and then the next CCW_CMD_READ_STATUS could read the bit. > > Maybe Conny can chime in on this later as well. > > [..] >>> AFAIK x86 uses normal >>> memory access instructions. I suppose when a store instruction does not >>> fail, then the store must happen. Or is that only true for RAM and for >>> PCI memory not? >> >> AFAIK, PCI have non-posted write which can fail but the spec doesn't >> requires that. >> > So for a posted write a PCI device can just silently discard writes it > does not 'like' (i.e. pretend there was no such transaction in the first > place)? I might be wrong, but it's my understanding as well. > > [..] >> So did PCI: >> >> case VIRTIO_PCI_STATUS: >> if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { >> virtio_pci_stop_ioeventfd(proxy); >> } >> >> virtio_set_status(vdev, val & 0xFF); >> >> if (val & VIRTIO_CONFIG_S_DRIVER_OK) { >> virtio_pci_start_ioeventfd(proxy); >> } >> >> if (vdev->status == 0) { >> virtio_pci_reset(DEVICE(proxy)); >> } >> >> As you said, it's a implementation property since qemu implements a >> synchronous virtio_set_status(). But it might not be true for real >> hardware which may require more time or even hard to drain a queue in >> short time. That's why PCI spec mandate the driver to poll for the >> status after writing zero to device status. >> > That is a PCI specific thing, and unfortunately my PCI skills are very > limited. > > Please compare > > static void vp_reset(struct virtio_device *vdev) > { > vp_iowrite8(0, &vp_dev->common->device_status); > [..] > while (vp_ioread8(&vp_dev->common->device_status)) > msleep(1); > > With > int virtio_finalize_features(struct virtio_device *dev) > { > [..] > virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK); > status = dev->config->get_status(dev); > if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) { > > The latter clearly assumes that the first get_status() will > yield the final outcome, where the former clearly assumes that > the effect of set_status() may be delayed beyond the first > subsequent get_status(). I think the reason is that. For PCI, the spec forces the device to either set or clear the FEATURE_OK during write. But for reset, device may choose to delay setting status to 0 after the write. > > Btw, if the device were to to fail resetting itself, what would > you expect to happen? I would expect the device to present a > status DEVICE_NEEDS_RESET, but that would effectively make > vp_reset() getting stuck in that loop. Yes, then it looks like a recursion here. > Or does the spec guarantee > that a reset must work eventually? My understanding is that the spec implies it must succeed eventually. > > [..] > >>>> PCI can use new filed. But I wonder how can that help for CCW. >>> Using a new mechanism to request the device to stop would help with >>> the synchronous vs asynchronous stuff. I.e. we could keep the FEATURES_OK >>> synchronous, and make REQUEST_STOP asynchronous. >> >> Right, but as said before, it looks more like a part of device status >> state machine, using new mechanism might bring extra complexity and >> confusion. >> >> Looking at CCW implementation of device status, I don't see any blocker >> for implementing it synchronously if we just wait for the stop of the >> device and then return? >> > I don't see a problem on the qemu side, but I do see a problem on > the guest side. There is a reason why you want set DEVICE_STOPPED to > be asynchronous. So my understanding is that. at least for current kernel virtio drivers, it doesn't need DEVICE_STOPPED since there's no actual use cases. So no need for inventing new helpers. It would be only need for the vDPA drivers that is used by userspace via vhost-vDPA. In this case, we just provide the mechanism to write and read status, vDPA driver don't need to enforce any policy in the middle (e.g polling or not). Thanks > > Regards, > Halil > > [..] > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-22 12:51 ` Jason Wang 2020-12-22 15:54 ` Cornelia Huck @ 2020-12-24 4:52 ` Halil Pasic 2020-12-24 5:51 ` Jason Wang 1 sibling, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-24 4:52 UTC (permalink / raw) To: Jason Wang Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On Tue, 22 Dec 2020 20:51:04 +0800 Jason Wang <jasowang@redhat.com> wrote: > >> From Qemu point of view, vhost-vDPA is just another type of vhost > >> backend. Qemu needs to stop virtio (vhost) before it can do migration. > >> So we require vDPA devices to have the ability of stopping or pausing > >> its datapath. If the vDPA device is by chance the virtio-PCI device, it > >> needs an interface for receiving stop/resume command from the driver. > >> > >> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then > >> to vDPA parent which could be a virtio-PCI device in this case. > > But QEMU implements the _device_, not the driver, doesn't it? > > > The device is implemented with the co-operation between Qemu and > vhost-vDPA. During migration qemu need to stop the virtio-net device, > then vhost must be stopped as well. > Sorry for the lazy/stupid question (it's been a while since I last understood that code), how does this work with vhost based virtio-net? I don't see ioctl that stops the old classic vhost (no vdpa). I have a fuzzy remembrance that we 'stop' the notifiers (ioeventfd & irqfd) but I'm not sure. And since we are at virtio-net, I ask myself how are 'new requests' and 'requests that is (the device) currently processing' defined for the receive functionality and the rx queue(s). Remember the latter ('in flight requests') need to be all completed. I mean 'in-flight' is pretty straight-forward for something like virtio-blk, or even for tx but I have a hard time with rx. > > > And IIUC, > > vhost-VDPA and the vDPA parent are also on the device side. I feel like > > I'm missing something essential here. > > > Virtio-PCI driver could be a vDPA parent as well in this case. So we > need stop the virtio-pci device. I used to think about vdpa like a vehicle to make partial virtio support in HW viable and easy. I.e. when I hear vdpa I have something like this in mind: https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg Of course the 'physical nic' from the linked picture can be a nic that supports both the virtio data plane and the virtio control plane (i.e. what you are referring to as a virtio-pci device). But do we still expect that device to be connected via vdpa? The second question is not strictly on topic, but I'm still curious, what do we plan to do with a lower device that does not support virtio control plane? In that case we can't go via DEVICE_STOP. Does that mean, there has to be a vendor specific mechanism equivalent to the DEVICE_STOP mechanism, or otherwise we are non-migratable? Regards, Halil This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-24 4:52 ` Halil Pasic @ 2020-12-24 5:51 ` Jason Wang 2020-12-25 3:18 ` Halil Pasic 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-24 5:51 UTC (permalink / raw) To: Halil Pasic Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/24 下午12:52, Halil Pasic wrote: > On Tue, 22 Dec 2020 20:51:04 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >>>> From Qemu point of view, vhost-vDPA is just another type of vhost >>>> backend. Qemu needs to stop virtio (vhost) before it can do migration. >>>> So we require vDPA devices to have the ability of stopping or pausing >>>> its datapath. If the vDPA device is by chance the virtio-PCI device, it >>>> needs an interface for receiving stop/resume command from the driver. >>>> >>>> So the devce stop/resume command was sent from Qemu to vhost-VDPA, then >>>> to vDPA parent which could be a virtio-PCI device in this case. >>> But QEMU implements the _device_, not the driver, doesn't it? >> >> The device is implemented with the co-operation between Qemu and >> vhost-vDPA. During migration qemu need to stop the virtio-net device, >> then vhost must be stopped as well. >> > Sorry for the lazy/stupid question (it's been a while since I last > understood that code), how does this work with vhost based virtio-net? > > I don't see ioctl that stops the old classic vhost (no vdpa). I have > a fuzzy remembrance that we 'stop' the notifiers (ioeventfd & irqfd) but > I'm not sure. Each vhost devices implements its own ioctl: For vhost-net, it's VHOST_NET_SET_BACKEND, passing -1 as socket fd will stop the device. For vhost-vsock, it's VHOST_VSOCK_SET_RUNNING. I don't check SCSI but it should have something similar. > > And since we are at virtio-net, The feature is not net specific. > I ask myself how are 'new requests' and > 'requests that is (the device) currently processing' defined for the > receive functionality and the rx queue(s). Remember the latter ('in > flight requests') need to be all completed. I mean 'in-flight' is pretty > straight-forward for something like virtio-blk, or even for tx but I have > a hard time with rx. I think it should be no difference from the view of virtqueue. It's the requests that have been read from avail ring but not added to the used ring. Actually this ties to the visibility of virtqueue internal state. Usually device maintains a tail pointer (last_avail_idx). So in the simplest case when no indices wrap and no out-of-order, requests belongs to [last_avail_idx, used_idx) are considered as in-flight. > >>> And IIUC, >>> vhost-VDPA and the vDPA parent are also on the device side. I feel like >>> I'm missing something essential here. >> >> Virtio-PCI driver could be a vDPA parent as well in this case. So we >> need stop the virtio-pci device. > I used to think about vdpa like a vehicle to make partial virtio support > in HW viable and easy. I.e. when I hear vdpa I have something like this > in mind: > https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg > > Of course the 'physical nic' from the linked picture can be a nic that > supports both the virtio data plane and the virtio control plane (i.e. > what you are referring to as a virtio-pci device). But do we still expect > that device to be connected via vdpa? Why not?[1] With the help of vhost-vDPA, we get the wire speed and live migration support for virtio-pci device. > > The second question is not strictly on topic, but I'm still curious, what > do we plan to do with a lower device that does not support virtio control > plane? Yes, the vDPA device just need to implement the same semantic not the same interface. You may refer mlx5e vDPA driver in the kernel source. > In that case we can't go via DEVICE_STOP. Does that mean, there > has to be a vendor specific mechanism equivalent to the DEVICE_STOP > mechanism, or otherwise we are non-migratable? Exactly. Thanks [1] https://lwn.net/Articles/838978/ > > Regards, > Halil > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-24 5:51 ` Jason Wang @ 2020-12-25 3:18 ` Halil Pasic 2020-12-25 6:45 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Halil Pasic @ 2020-12-25 3:18 UTC (permalink / raw) To: Jason Wang Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On Thu, 24 Dec 2020 13:51:31 +0800 Jason Wang <jasowang@redhat.com> wrote: > > On 2020/12/24 下午12:52, Halil Pasic wrote: > > On Tue, 22 Dec 2020 20:51:04 +0800 > > Jason Wang <jasowang@redhat.com> wrote: > > [..] > > Each vhost devices implements its own ioctl: > > For vhost-net, it's VHOST_NET_SET_BACKEND, passing -1 as socket fd will > stop the device. > For vhost-vsock, it's VHOST_VSOCK_SET_RUNNING. > > I don't check SCSI but it should have something similar. > Thanks! > > > > > And since we are at virtio-net, > > > The feature is not net specific. > Nod. > > > I ask myself how are 'new requests' and > > 'requests that is (the device) currently processing' defined for the > > receive functionality and the rx queue(s). Remember the latter ('in > > flight requests') need to be all completed. I mean 'in-flight' is pretty > > straight-forward for something like virtio-blk, or even for tx but I have > > a hard time with rx. > > > I think it should be no difference from the view of virtqueue. It's the > requests that have been read from avail ring but not added to the used ring. > > Actually this ties to the visibility of virtqueue internal state. > Usually device maintains a tail pointer (last_avail_idx). So in the > simplest case when no indices wrap and no out-of-order, requests belongs > to [last_avail_idx, used_idx) are considered as in-flight. > I wouldn't call it virtqueue internal state, but I get what you mean. The virtio specification is concerned with the device/driver interface. The driver does not and can not differentiate between buffers that are available but "haven't been read from avail ring", and those that have "been read from the avail ring". As you said last_avail_idx is a device private thing -- a device internal state. I tend to say, that from a perspective of the driver, all requests that are available, and not yet used, are in-flight. So we have to be very careful when wording this requirement, to avoid misunderstandings. I don't think the first RFC is good enough. I will think some more about this. I'm under the impression that considering virtio console (serial) could be useful, as it provides a reliable duplex data transfer. I.e. some I/O is not driver initiated, but dropping packets ain't OK. So, the data may be in flight without the virtqueue buffer being in flight according to our latest definition ([last_avail_idx, used_idx)). Sorry, I have the feeling I'm spouting out half baked thoughts. Thank you for bearing with me. > > > > >>> And IIUC, > >>> vhost-VDPA and the vDPA parent are also on the device side. I feel like > >>> I'm missing something essential here. > >> > >> Virtio-PCI driver could be a vDPA parent as well in this case. So we > >> need stop the virtio-pci device. > > I used to think about vdpa like a vehicle to make partial virtio support > > in HW viable and easy. I.e. when I hear vdpa I have something like this > > in mind: > > https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg > > > > Of course the 'physical nic' from the linked picture can be a nic that > > supports both the virtio data plane and the virtio control plane (i.e. > > what you are referring to as a virtio-pci device). But do we still expect > > that device to be connected via vdpa? > > > Why not?[1] With the help of vhost-vDPA, we get the wire speed and live > migration support for virtio-pci device. Indeed. > > > > > > The second question is not strictly on topic, but I'm still curious, what > > do we plan to do with a lower device that does not support virtio control > > plane? > > > Yes, the vDPA device just need to implement the same semantic not the > same interface. You may refer mlx5e vDPA driver in the kernel source. Thanks! I will have a look. Regards, Halil [..] This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-25 3:18 ` Halil Pasic @ 2020-12-25 6:45 ` Jason Wang 2020-12-27 11:12 ` Michael S. Tsirkin 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-25 6:45 UTC (permalink / raw) To: Halil Pasic Cc: Cornelia Huck, stefanha, virtio-comment, mst, eperezma, sgarzare On 2020/12/25 上午11:18, Halil Pasic wrote: > On Thu, 24 Dec 2020 13:51:31 +0800 > Jason Wang <jasowang@redhat.com> wrote: > >> On 2020/12/24 下午12:52, Halil Pasic wrote: >>> On Tue, 22 Dec 2020 20:51:04 +0800 >>> Jason Wang <jasowang@redhat.com> wrote: >>> > [..] >> Each vhost devices implements its own ioctl: >> >> For vhost-net, it's VHOST_NET_SET_BACKEND, passing -1 as socket fd will >> stop the device. >> For vhost-vsock, it's VHOST_VSOCK_SET_RUNNING. >> >> I don't check SCSI but it should have something similar. >> > Thanks! > >>> And since we are at virtio-net, >> >> The feature is not net specific. >> > Nod. > >>> I ask myself how are 'new requests' and >>> 'requests that is (the device) currently processing' defined for the >>> receive functionality and the rx queue(s). Remember the latter ('in >>> flight requests') need to be all completed. I mean 'in-flight' is pretty >>> straight-forward for something like virtio-blk, or even for tx but I have >>> a hard time with rx. >> >> I think it should be no difference from the view of virtqueue. It's the >> requests that have been read from avail ring but not added to the used ring. >> >> Actually this ties to the visibility of virtqueue internal state. >> Usually device maintains a tail pointer (last_avail_idx). So in the >> simplest case when no indices wrap and no out-of-order, requests belongs >> to [last_avail_idx, used_idx) are considered as in-flight. >> > I wouldn't call it virtqueue internal state, but I get what you mean. The > virtio specification is concerned with the device/driver interface. The > driver does not and can not differentiate between buffers that are > available but "haven't been read from avail ring", and those that have > "been read from the avail ring". As you said last_avail_idx is a device > private thing -- a device internal state. Exactly. > > I tend to say, that from a perspective of the driver, all requests that > are available, and not yet used, are in-flight. So we have to be very > careful when wording this requirement, to avoid misunderstandings. I > don't think the first RFC is good enough. I will think some more about > this. Yes, I agree. The problem is that the spec doesn't describe how device work, so if we want to be more accurate, it might require some work not only for stop but also for e.g reset (something like in flight has been used by the spec in that case). > I'm under the impression that considering virtio console (serial) > could be useful, as it provides a reliable duplex data transfer. I.e. > some I/O is not driver initiated, but dropping packets ain't OK. So, the > data may be in flight without the virtqueue buffer being in flight > according to our latest definition ([last_avail_idx, used_idx)). Yes. > Sorry, I have the feeling I'm spouting out half baked thoughts. Thank you > for bearing with me. You're welcome and you comments are very helpful and appreciated. Thanks > >>>>> And IIUC, >>>>> vhost-VDPA and the vDPA parent are also on the device side. I feel like >>>>> I'm missing something essential here. >>>> Virtio-PCI driver could be a vDPA parent as well in this case. So we >>>> need stop the virtio-pci device. >>> I used to think about vdpa like a vehicle to make partial virtio support >>> in HW viable and easy. I.e. when I hear vdpa I have something like this >>> in mind: >>> https://www.redhat.com/cms/managed-files/2019-10-02-vdpa-figure5.jpg >>> >>> Of course the 'physical nic' from the linked picture can be a nic that >>> supports both the virtio data plane and the virtio control plane (i.e. >>> what you are referring to as a virtio-pci device). But do we still expect >>> that device to be connected via vdpa? >> >> Why not?[1] With the help of vhost-vDPA, we get the wire speed and live >> migration support for virtio-pci device. > Indeed. > >> >>> The second question is not strictly on topic, but I'm still curious, what >>> do we plan to do with a lower device that does not support virtio control >>> plane? >> >> Yes, the vDPA device just need to implement the same semantic not the >> same interface. You may refer mlx5e vDPA driver in the kernel source. > Thanks! I will have a look. > > Regards, > Halil > > [..] > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-25 6:45 ` Jason Wang @ 2020-12-27 11:12 ` Michael S. Tsirkin 2020-12-28 7:05 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Michael S. Tsirkin @ 2020-12-27 11:12 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote: > > I tend to say, that from a perspective of the driver, all requests that > > are available, and not yet used, are in-flight. So we have to be very > > careful when wording this requirement, to avoid misunderstandings. I > > don't think the first RFC is good enough. I will think some more about > > this. > > > Yes, I agree. The problem is that the spec doesn't describe how device work, > so if we want to be more accurate, it might require some work not only for > stop but also for e.g reset (something like in flight has been used by the > spec in that case). You probably mean the DEVICE_NEEDS_RESET description, right? For example, the driver can't assume requests in flight will be completed if DEVICE_NEEDS_RESET is set, nor can it assume that they have not been completed. A good implementation will try to recover by issuing a reset. yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely is related to the fact it is not widely implemented. -- MST This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-27 11:12 ` Michael S. Tsirkin @ 2020-12-28 7:05 ` Jason Wang 2020-12-28 12:27 ` Michael S. Tsirkin 0 siblings, 1 reply; 43+ messages in thread From: Jason Wang @ 2020-12-28 7:05 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On 2020/12/27 下午7:12, Michael S. Tsirkin wrote: > On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote: >>> I tend to say, that from a perspective of the driver, all requests that >>> are available, and not yet used, are in-flight. So we have to be very >>> careful when wording this requirement, to avoid misunderstandings. I >>> don't think the first RFC is good enough. I will think some more about >>> this. >> >> Yes, I agree. The problem is that the spec doesn't describe how device work, >> so if we want to be more accurate, it might require some work not only for >> stop but also for e.g reset (something like in flight has been used by the >> spec in that case). > You probably mean the DEVICE_NEEDS_RESET description, right? > > For example, the driver can't assume requests in flight will be > completed if DEVICE_NEEDS_RESET is set, nor can it assume that > they have not been completed. A good implementation will try to > recover by issuing a reset. > > yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely > is related to the fact it is not widely implemented. For both NEEDS_RESET and device reset. For NEEDS_RESET, we use "in flight" and "completed" without an actual definition. For device reset, we don't even define what is the device expected (e.g how are "in flight" requests expected to be processed) to behave. Thanks > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 7:05 ` Jason Wang @ 2020-12-28 12:27 ` Michael S. Tsirkin 2020-12-29 8:57 ` Jason Wang 0 siblings, 1 reply; 43+ messages in thread From: Michael S. Tsirkin @ 2020-12-28 12:27 UTC (permalink / raw) To: Jason Wang Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare On Mon, Dec 28, 2020 at 03:05:43PM +0800, Jason Wang wrote: > > On 2020/12/27 下午7:12, Michael S. Tsirkin wrote: > > On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote: > > > > I tend to say, that from a perspective of the driver, all requests that > > > > are available, and not yet used, are in-flight. So we have to be very > > > > careful when wording this requirement, to avoid misunderstandings. I > > > > don't think the first RFC is good enough. I will think some more about > > > > this. > > > > > > Yes, I agree. The problem is that the spec doesn't describe how device work, > > > so if we want to be more accurate, it might require some work not only for > > > stop but also for e.g reset (something like in flight has been used by the > > > spec in that case). > > You probably mean the DEVICE_NEEDS_RESET description, right? > > > > For example, the driver can't assume requests in flight will be > > completed if DEVICE_NEEDS_RESET is set, nor can it assume that > > they have not been completed. A good implementation will try to > > recover by issuing a reset. > > > > yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely > > is related to the fact it is not widely implemented. > > > For both NEEDS_RESET and device reset. > > For NEEDS_RESET, we use "in flight" and "completed" without an actual > definition. > > For device reset, we don't even define what is the device expected (e.g how > are "in flight" requests expected to be processed) to behave. > > Thanks Because device is expected to just stop: None of the virtqueues of a device are live once the device has been reset. and it's driver's job to recover buffers: Thus a driver MUST ensure a virtqueue isn't live (by device reset) before removing exposed buffers. what happened to buffers which were not used? I think it's a quality of implementation/device specific issue, e.g. for net: for transmit, if device sends a packet without using the buffer, then driver will resend the packet leading to duplicates. for receive, it's a packet drop, slightly less of a problem. my question is whether such behaviour is sufficient for migration? if not can we really describe it generally? it's possible we need to describe it per device type. -- MST This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-28 12:27 ` Michael S. Tsirkin @ 2020-12-29 8:57 ` Jason Wang 0 siblings, 0 replies; 43+ messages in thread From: Jason Wang @ 2020-12-29 8:57 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Halil Pasic, Cornelia Huck, stefanha, virtio-comment, eperezma, sgarzare ----- Original Message ----- > On Mon, Dec 28, 2020 at 03:05:43PM +0800, Jason Wang wrote: > > > > On 2020/12/27 下午7:12, Michael S. Tsirkin wrote: > > > On Fri, Dec 25, 2020 at 02:45:28PM +0800, Jason Wang wrote: > > > > > I tend to say, that from a perspective of the driver, all requests > > > > > that > > > > > are available, and not yet used, are in-flight. So we have to be very > > > > > careful when wording this requirement, to avoid misunderstandings. I > > > > > don't think the first RFC is good enough. I will think some more > > > > > about > > > > > this. > > > > > > > > Yes, I agree. The problem is that the spec doesn't describe how device > > > > work, > > > > so if we want to be more accurate, it might require some work not only > > > > for > > > > stop but also for e.g reset (something like in flight has been used by > > > > the > > > > spec in that case). > > > You probably mean the DEVICE_NEEDS_RESET description, right? > > > > > > For example, the driver can't assume requests in flight will be > > > completed if DEVICE_NEEDS_RESET is set, nor can it assume that > > > they have not been completed. A good implementation will try to > > > recover by issuing a reset. > > > > > > yes, DEVICE_NEEDS_RESET is unfortunately underspecified which likely > > > is related to the fact it is not widely implemented. > > > > > > For both NEEDS_RESET and device reset. > > > > For NEEDS_RESET, we use "in flight" and "completed" without an actual > > definition. > > > > For device reset, we don't even define what is the device expected (e.g how > > are "in flight" requests expected to be processed) to behave. > > > > Thanks > > Because device is expected to just stop: > > None of the virtqueues > of a device are live once the device has been reset. So it's something similar to what DEVICE_STOPPED wants. > > and it's driver's job to recover buffers: > > Thus a driver MUST ensure a virtqueue isn't live (by device reset) before > removing exposed buffers. > > what happened to buffers which were not used? > > I think it's a quality of implementation/device specific issue, e.g. for net: > for transmit, if device sends a packet without using the buffer, This sounds like an intermediate state we need to avoid in the migration. Can we mandate the device to make the buffer used in this case? For networking device, it should be not slow, just wait for the DMA to be completed is sufficient. > then driver will resend the packet leading to duplicates. > for receive, it's a packet drop, slightly less of a problem. For net yes. > > > my question is whether such behaviour is sufficient for migration? I suspect it's not, we need to either 1) wait for the buffer to be used or 2) fail the migration if the device is not stopped in a short time > if not can we really describe it generally? It looks like a virtqueue level issue, so we need to try. > it's possible we need to > describe it per device type. Maybe. Thanks > > -- > MST > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-18 4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang 2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella 2020-12-21 21:33 ` [virtio-comment] " Halil Pasic @ 2021-05-03 9:02 ` Eugenio Perez Martin 2021-05-06 2:51 ` Jason Wang 2021-05-05 13:16 ` Michael S. Tsirkin 3 siblings, 1 reply; 43+ messages in thread From: Eugenio Perez Martin @ 2021-05-03 9:02 UTC (permalink / raw) To: Jason Wang Cc: Stefan Hajnoczi, virtio-comment, Michael Tsirkin, Stefano Garzarella On Fri, Dec 18, 2020 at 5:23 AM Jason Wang <jasowang@redhat.com> wrote: > > This patch introduces a new status bit DEVICE_STOPPED. This will be > used by the driver to stop and resume a device. The main user will be > live migration support for virtio device. > > Signed-off-by: Jason Wang <jasowang@redhat.com> > --- > content.tex | 26 ++++++++++++++++++++++++-- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/content.tex b/content.tex > index 61eab41..4392b60 100644 > --- a/content.tex > +++ b/content.tex > @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > drive the device. > > +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, > + indicates that the device has been stopped by the driver. > + > \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced > an error from which it can't recover. > \end{description} > @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \ref{sec:General Initialization And Device Operation / Device > Initialization}. > The driver MUST NOT clear a > -\field{device status} bit. If the driver sets the FAILED bit, > -the driver MUST later reset the device before attempting to re-initialize. > +\field{device status} bit other than DEVICE_STOPPED. If the > +driver sets the FAILED bit, the driver MUST later reset the device > +before attempting to re-initialize. > > The driver SHOULD NOT rely on completion of operations of a > device if DEVICE_NEEDS_RESET is set. > @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > recover by issuing a reset. > \end{note} > > +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not > +set. In order to stop the device, the driver MUST set DEVICE_STOPPED > +first and re-read status to check whether DEVICE_STOPPED is set by the > +device. In order to resume the device, the driver MUST clear > +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED > +is cleared by the device. > + > \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} > The device MUST initialize \field{device status} to 0 upon reset. > > The device MUST NOT consume buffers or send any used buffer > notifications to the driver before DRIVER_OK. > > +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. > + > +When driver is trying to set DEVICE_STOPPED, the device MUST not > +process new avail requests and MUST complete all requests that is > +currently processing before setting DEVICE_STOPPED. > + > +The device MUST keep the config space unchanged when DEVICE_STOPPED is > +set. > + > \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state > that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device > MUST send a device configuration change notification to the driver. > @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates > that the driver passes extra data (besides identifying the virtqueue) > in its device notifications. > + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the > + driver can stop and resume the device. > See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. > \end{description} > > -- > 2.25.1 > Hi! After implementing a small POC for this in QEMU/virtio-pci vdpa device, I think that the standard should specify what state the driver [1] is allowed to change, if any, and what fields are valid for the driver to query (like device status or virtqueue state). As a proposal, I think all the device config status should freeze (as "the device is not allowed to change") when paused, and the driver must be able to read all of it. Regarding writing to the device config, the straightforward solution is to not allow them, and mandate the driver to go through a full reset cycle in my opinion. The only exception should be the queue selector: Writing to it by the driver has the obvious consequences. This way, the device does not need to check for changes on the resume, it could be tricky. However maybe there are some use cases where to change just a field through this method could be faster than the whole device reset. I can think in another approach: The device supports to write some fields (for example, just virtqueue addresses and virtqueue size, or some feature flags), but any other change would lead to the device to report DEVICE_NEEDS_RESET. This has the mentioned advantages of reducing latencies (as transactions) in case of device re-usage, but at the cost of increasing complexity. Any opinion about if it is worth it? Thanks! [1] From the device's point of view, so I mean QEMU here. This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-05-03 9:02 ` [virtio-comment] " Eugenio Perez Martin @ 2021-05-06 2:51 ` Jason Wang 0 siblings, 0 replies; 43+ messages in thread From: Jason Wang @ 2021-05-06 2:51 UTC (permalink / raw) To: Eugenio Perez Martin Cc: Stefan Hajnoczi, virtio-comment, Michael Tsirkin, Stefano Garzarella 在 2021/5/3 下午5:02, Eugenio Perez Martin 写道: > On Fri, Dec 18, 2020 at 5:23 AM Jason Wang <jasowang@redhat.com> wrote: >> This patch introduces a new status bit DEVICE_STOPPED. This will be >> used by the driver to stop and resume a device. The main user will be >> live migration support for virtio device. >> >> Signed-off-by: Jason Wang <jasowang@redhat.com> >> --- >> content.tex | 26 ++++++++++++++++++++++++-- >> 1 file changed, 24 insertions(+), 2 deletions(-) >> >> diff --git a/content.tex b/content.tex >> index 61eab41..4392b60 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >> drive the device. >> >> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >> + indicates that the device has been stopped by the driver. >> + >> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced >> an error from which it can't recover. >> \end{description} >> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> \ref{sec:General Initialization And Device Operation / Device >> Initialization}. >> The driver MUST NOT clear a >> -\field{device status} bit. If the driver sets the FAILED bit, >> -the driver MUST later reset the device before attempting to re-initialize. >> +\field{device status} bit other than DEVICE_STOPPED. If the >> +driver sets the FAILED bit, the driver MUST later reset the device >> +before attempting to re-initialize. >> >> The driver SHOULD NOT rely on completion of operations of a >> device if DEVICE_NEEDS_RESET is set. >> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> recover by issuing a reset. >> \end{note} >> >> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED >> +first and re-read status to check whether DEVICE_STOPPED is set by the >> +device. In order to resume the device, the driver MUST clear >> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >> +is cleared by the device. >> + >> \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} >> The device MUST initialize \field{device status} to 0 upon reset. >> >> The device MUST NOT consume buffers or send any used buffer >> notifications to the driver before DRIVER_OK. >> >> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >> + >> +When driver is trying to set DEVICE_STOPPED, the device MUST not >> +process new avail requests and MUST complete all requests that is >> +currently processing before setting DEVICE_STOPPED. >> + >> +The device MUST keep the config space unchanged when DEVICE_STOPPED is >> +set. >> + >> \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state >> that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device >> MUST send a device configuration change notification to the driver. >> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} >> \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates >> that the driver passes extra data (besides identifying the virtqueue) >> in its device notifications. >> + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >> + driver can stop and resume the device. >> See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. >> \end{description} >> >> -- >> 2.25.1 >> > Hi! > > After implementing a small POC for this in QEMU/virtio-pci vdpa > device, I think that the standard should specify what state the driver > [1] is allowed to change, if any, and what fields are valid for the > driver to query (like device status or virtqueue state). > > As a proposal, I think all the device config status should freeze (as > "the device is not allowed to change") when paused, and the driver > must be able to read all of it. Yes. > > Regarding writing to the device config, the straightforward solution > is to not allow them, and mandate the driver to go through a full > reset cycle in my opinion. The only exception should be the queue > selector: Writing to it by the driver has the obvious consequences. > This way, the device does not need to check for changes on the resume, > it could be tricky. However maybe there are some use cases where to > change just a field through this method could be faster than the whole > device reset. > > I can think in another approach: The device supports to write some > fields (for example, just virtqueue addresses and virtqueue size, or > some feature flags), but any other change would lead to the device to > report DEVICE_NEEDS_RESET. This has the mentioned advantages of > reducing latencies (as transactions) in case of device re-usage, but > at the cost of increasing complexity. Any opinion about if it is worth > it? I'm not sure we need to do this, and actually it's not specific to the stop state. A more generic question is: do we allow the driver to change device/virtqueue status after DRIVER_OK is set? If yes, it's probably less interesting in limiting the behavior when stop is set. If not, we should clarify it and there's no need to teat stop as a specific case to deal with. Thanks > > Thanks! > > [1] From the device's point of view, so I mean QEMU here. > > > This publicly archived list offers a means to provide input to the > OASIS Virtual I/O Device (VIRTIO) TC. > > In order to verify user consent to the Feedback License terms and > to minimize spam in the list archive, subscription is required > before posting. > > Subscribe: virtio-comment-subscribe@lists.oasis-open.org > Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org > List help: virtio-comment-help@lists.oasis-open.org > List archive: https://lists.oasis-open.org/archives/virtio-comment/ > Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf > List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists > Committee: https://www.oasis-open.org/committees/virtio/ > Join OASIS: https://www.oasis-open.org/join/ > This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2020-12-18 4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang ` (2 preceding siblings ...) 2021-05-03 9:02 ` [virtio-comment] " Eugenio Perez Martin @ 2021-05-05 13:16 ` Michael S. Tsirkin 2021-05-06 7:26 ` Jason Wang 3 siblings, 1 reply; 43+ messages in thread From: Michael S. Tsirkin @ 2021-05-05 13:16 UTC (permalink / raw) To: Jason Wang; +Cc: stefanha, virtio-comment, eperezma, sgarzare On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote: > This patch introduces a new status bit DEVICE_STOPPED. This will be > used by the driver to stop and resume a device. The main user will be > live migration support for virtio device. > > Signed-off-by: Jason Wang <jasowang@redhat.com> I have been thinking about this whole approach. While yes it's possible to have the hypervisor poke at the device while device is not reset, e.g. by blocking guest access to device, while device can still DMA into guest memory, this is at least unusual and seems somewhat fragile. I think normally live migration would belong in a PF as opposed to the VF. As such, how about for starters we prototype this as part of a vendor specific capability of a PF? This will also show-case how to add vendor specific capabilities to devices. Longer term it would be nice to come up with all kind of PF related functionality in the spec too, but I think that should be part of a more comprehensive/extensible approach since it's not limited to live migration concerns. For example the way qemu handles control vq of a nic in software creates problems, couldn't we inject the relevant commands through PF somehow? > --- > content.tex | 26 ++++++++++++++++++++++++-- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/content.tex b/content.tex > index 61eab41..4392b60 100644 > --- a/content.tex > +++ b/content.tex > @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to > drive the device. > > +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, > + indicates that the device has been stopped by the driver. > + > \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced > an error from which it can't recover. > \end{description} > @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > \ref{sec:General Initialization And Device Operation / Device > Initialization}. > The driver MUST NOT clear a > -\field{device status} bit. If the driver sets the FAILED bit, > -the driver MUST later reset the device before attempting to re-initialize. > +\field{device status} bit other than DEVICE_STOPPED. If the > +driver sets the FAILED bit, the driver MUST later reset the device > +before attempting to re-initialize. > > The driver SHOULD NOT rely on completion of operations of a > device if DEVICE_NEEDS_RESET is set. > @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev > recover by issuing a reset. > \end{note} > > +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not > +set. In order to stop the device, the driver MUST set DEVICE_STOPPED > +first and re-read status to check whether DEVICE_STOPPED is set by the > +device. In order to resume the device, the driver MUST clear > +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED > +is cleared by the device. > + > \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} > The device MUST initialize \field{device status} to 0 upon reset. > > The device MUST NOT consume buffers or send any used buffer > notifications to the driver before DRIVER_OK. > > +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. > + > +When driver is trying to set DEVICE_STOPPED, the device MUST not > +process new avail requests and MUST complete all requests that is > +currently processing before setting DEVICE_STOPPED. > + > +The device MUST keep the config space unchanged when DEVICE_STOPPED is > +set. > + > \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state > that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device > MUST send a device configuration change notification to the driver. > @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} > \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates > that the driver passes extra data (besides identifying the virtqueue) > in its device notifications. > + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the > + driver can stop and resume the device. > See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. > \end{description} > > -- > 2.25.1 This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
* [virtio-comment] Re: [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP 2021-05-05 13:16 ` Michael S. Tsirkin @ 2021-05-06 7:26 ` Jason Wang 0 siblings, 0 replies; 43+ messages in thread From: Jason Wang @ 2021-05-06 7:26 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: stefanha, virtio-comment, eperezma, sgarzare 在 2021/5/5 下午9:16, Michael S. Tsirkin 写道: > On Fri, Dec 18, 2020 at 12:23:02PM +0800, Jason Wang wrote: >> This patch introduces a new status bit DEVICE_STOPPED. This will be >> used by the driver to stop and resume a device. The main user will be >> live migration support for virtio device. >> >> Signed-off-by: Jason Wang <jasowang@redhat.com> > I have been thinking about this whole approach. > > While yes it's possible to have the hypervisor poke > at the device while device is not reset, e.g. by > blocking guest access to device, while device can > still DMA into guest memory, this is at least unusual > and seems somewhat fragile. So in the context of stop, I wonder if we can just forbid the DMA in this case? > > I think normally live migration would belong in a PF as opposed to the VF. > > As such, how about for starters we prototype this > as part of a vendor specific capability of a PF? Just to make sure I understand the proposal. Except for the differnt control path, I don't see any other differences: Consider we use vendor specific capability, we still need to define the semantics like the VIRTIO_F_DEVICE_STOP in this case. E.g as you said, quiescence the DMA and other stuffs. > This will also show-case how to add vendor specific capabilities > to devices. > > > Longer term it would be nice to come up with all kind of PF > related functionality in the spec too, but I think that should > be part of a more comprehensive/extensible approach > since it's not limited to live migration concerns. > For example the way qemu handles control vq of a nic > in software creates problems, couldn't we inject > the relevant commands through PF somehow? Yes, as we discussed in the past, I plan to send a RFC to introduce the admin virtqueue for PF. It could be used for such extensions. Thanks > > >> --- >> content.tex | 26 ++++++++++++++++++++++++-- >> 1 file changed, 24 insertions(+), 2 deletions(-) >> >> diff --git a/content.tex b/content.tex >> index 61eab41..4392b60 100644 >> --- a/content.tex >> +++ b/content.tex >> @@ -47,6 +47,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> \item[DRIVER_OK (4)] Indicates that the driver is set up and ready to >> drive the device. >> >> +\item[DEVICE_STOPPED (32)] When VIRTIO_F_DEVICE_STOPPED is negotiated, >> + indicates that the device has been stopped by the driver. >> + >> \item[DEVICE_NEEDS_RESET (64)] Indicates that the device has experienced >> an error from which it can't recover. >> \end{description} >> @@ -58,8 +61,9 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> \ref{sec:General Initialization And Device Operation / Device >> Initialization}. >> The driver MUST NOT clear a >> -\field{device status} bit. If the driver sets the FAILED bit, >> -the driver MUST later reset the device before attempting to re-initialize. >> +\field{device status} bit other than DEVICE_STOPPED. If the >> +driver sets the FAILED bit, the driver MUST later reset the device >> +before attempting to re-initialize. >> >> The driver SHOULD NOT rely on completion of operations of a >> device if DEVICE_NEEDS_RESET is set. >> @@ -70,12 +74,28 @@ \section{\field{Device Status} Field}\label{sec:Basic Facilities of a Virtio Dev >> recover by issuing a reset. >> \end{note} >> >> +The driver MUST NOT set or clear DEVICE_STOPPED when DRIVER_OK is not >> +set. In order to stop the device, the driver MUST set DEVICE_STOPPED >> +first and re-read status to check whether DEVICE_STOPPED is set by the >> +device. In order to resume the device, the driver MUST clear >> +DEVICE_STOPPED first and read status to ensure whether DEVICE_STOPPED >> +is cleared by the device. >> + >> \devicenormative{\subsection}{Device Status Field}{Basic Facilities of a Virtio Device / Device Status Field} >> The device MUST initialize \field{device status} to 0 upon reset. >> >> The device MUST NOT consume buffers or send any used buffer >> notifications to the driver before DRIVER_OK. >> >> +The device MUST ignore DEVICE_STOPPED when DRIVER_OK is not set. >> + >> +When driver is trying to set DEVICE_STOPPED, the device MUST not >> +process new avail requests and MUST complete all requests that is >> +currently processing before setting DEVICE_STOPPED. >> + >> +The device MUST keep the config space unchanged when DEVICE_STOPPED is >> +set. >> + >> \label{sec:Basic Facilities of a Virtio Device / Device Status Field / DEVICENEEDSRESET}The device SHOULD set DEVICE_NEEDS_RESET when it enters an error state >> that a reset is needed. If DRIVER_OK is set, after it sets DEVICE_NEEDS_RESET, the device >> MUST send a device configuration change notification to the driver. >> @@ -6553,6 +6573,8 @@ \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits} >> \item[VIRTIO_F_NOTIFICATION_DATA(38)] This feature indicates >> that the driver passes extra data (besides identifying the virtqueue) >> in its device notifications. >> + \item[VIRTIO_F_DEVICE_STOP(39)] This feature indicates that the >> + driver can stop and resume the device. >> See \ref{sec:Virtqueues / Driver notifications}~\nameref{sec:Virtqueues / Driver notifications}. >> \end{description} >> >> -- >> 2.25.1 This publicly archived list offers a means to provide input to the OASIS Virtual I/O Device (VIRTIO) TC. In order to verify user consent to the Feedback License terms and to minimize spam in the list archive, subscription is required before posting. Subscribe: virtio-comment-subscribe@lists.oasis-open.org Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org List help: virtio-comment-help@lists.oasis-open.org List archive: https://lists.oasis-open.org/archives/virtio-comment/ Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists Committee: https://www.oasis-open.org/committees/virtio/ Join OASIS: https://www.oasis-open.org/join/ ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2021-05-06 7:26 UTC | newest] Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-18 4:23 [virtio-comment] [PATCH RFC] virtio: introduce VIRTIO_F_DEVICE_STOP Jason Wang 2020-12-18 10:15 ` [virtio-comment] " Stefano Garzarella 2020-12-21 3:08 ` Jason Wang 2020-12-21 11:06 ` Stefano Garzarella 2020-12-22 2:38 ` Jason Wang 2020-12-21 21:33 ` [virtio-comment] " Halil Pasic 2020-12-22 2:36 ` Jason Wang 2020-12-22 6:50 ` Halil Pasic 2020-12-22 7:30 ` Jason Wang 2020-12-22 12:14 ` Cornelia Huck 2020-12-22 12:51 ` Jason Wang 2020-12-22 15:54 ` Cornelia Huck 2020-12-23 2:48 ` Jason Wang 2020-12-25 7:38 ` Halil Pasic 2020-12-27 10:00 ` Michael S. Tsirkin 2020-12-28 6:21 ` Halil Pasic 2020-12-28 7:01 ` Jason Wang 2020-12-28 12:30 ` Michael S. Tsirkin 2020-12-29 9:04 ` Jason Wang 2021-01-12 10:54 ` Michael S. Tsirkin 2021-01-13 3:35 ` Jason Wang 2020-12-29 13:35 ` Halil Pasic 2020-12-30 8:15 ` Jason Wang 2021-01-11 18:16 ` Cornelia Huck 2021-01-12 3:27 ` Jason Wang 2021-01-12 12:13 ` Cornelia Huck 2021-01-13 2:52 ` Jason Wang 2021-01-14 12:00 ` Cornelia Huck 2020-12-28 6:47 ` Jason Wang 2020-12-29 13:20 ` Halil Pasic 2020-12-30 8:03 ` Jason Wang 2020-12-24 4:52 ` Halil Pasic 2020-12-24 5:51 ` Jason Wang 2020-12-25 3:18 ` Halil Pasic 2020-12-25 6:45 ` Jason Wang 2020-12-27 11:12 ` Michael S. Tsirkin 2020-12-28 7:05 ` Jason Wang 2020-12-28 12:27 ` Michael S. Tsirkin 2020-12-29 8:57 ` Jason Wang 2021-05-03 9:02 ` [virtio-comment] " Eugenio Perez Martin 2021-05-06 2:51 ` Jason Wang 2021-05-05 13:16 ` Michael S. Tsirkin 2021-05-06 7:26 ` Jason Wang
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.