qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [VIRTIO PCI PATCH v5 0/1] Add freeze_mode to virtio_pci_common_cfg
@ 2023-09-19 11:42 Jiqian Chen
  2023-09-19 11:42 ` [VIRTIO PCI PATCH v5 1/1] transport-pci: " Jiqian Chen
  0 siblings, 1 reply; 32+ messages in thread
From: Jiqian Chen @ 2023-09-19 11:42 UTC (permalink / raw)
  To: Gerd Hoffmann, Jason Wang, Michael S . Tsirkin, Xuan Zhuo,
	David Airlie, Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev
  Cc: qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui,
	Jiqian Chen

Hi all,
This is the v5 patches and makes below changes:
* Since this series patches add a new mechanism that let virtgpu and Qemu can negotiate
  their reset behavior, and other guys hope me can improve this mechanism to virtio pci
  level, so that other virtio devices can also benefit from it. So instead of adding
  new feature flag VIRTIO_GPU_F_FREEZE_S3 only serves for virtgpu, v5 add a new parameter
  named freeze_mode to struct virtio_pci_common_cfg, when guest begin suspending, set
  freeze_mode to VIRTIO_PCI_FREEZE_MODE_FREEZE_S3, and then all virtio devices can get
  this status, and notice that guest is suspending, then they can change their reset
  behavior.

V5 of Qemu patch:
https://lore.kernel.org/qemu-devel/20230919110225.2282914-1-Jiqian.Chen@amd.com/T/#t
V5 of kernel patch:
https://lore.kernel.org/lkml/20230919104607.2282248-1-Jiqian.Chen@amd.com/T/#t

The link to trace this issue:
https://gitlab.com/qemu-project/qemu/-/issues/1860

Best regards,
Jiqian Chen

v4:
no v4 patches.
V4 of Qemu patch:
https://lore.kernel.org/qemu-devel/20230719074726.1613088-1-Jiqian.Chen@amd.com/T/#t
No v4 of kernel patch


v3:
makes below changes:
* Use enum for freeze mode, so this can be extended with more
  modes in the future.
* Rename functions and paratemers with "_S3" postfix.
* Explain in more detail
Link:
https://lists.oasis-open.org/archives/virtio-comment/202307/msg00209.html
V3 of Qemu patch:
https://lore.kernel.org/qemu-devel/20230720120816.8751-1-Jiqian.Chen@amd.com
V3 of Kernel patch: https://lore.kernel.org/lkml/20230720115805.8206-1-Jiqian.Chen@amd.com/T/#t


v2:
makes below changes:
* Elaborate on the types of resources.
* Add some descriptions for S3 and S4.
Link:
https://lists.oasis-open.org/archives/virtio-comment/202307/msg00160.html
V2 of Qemu patch:
https://lore.kernel.org/qemu-devel/20230630070016.841459-1-Jiqian.Chen@amd.com/T/#t
V2 of Kernel patch:
https://lore.kernel.org/lkml/20230630073448.842767-1-Jiqian.Chen@amd.com/T/#t


v1:
Hi all,
I am working to implement virtgpu S3 function on Xen.

Currently on Xen, if we start a guest through Qemu with enabling virtgpu, and then suspend and
s3resume guest. We can find that the guest kernel comes back, but the display doesn't. It just
shown a black screen.

That is because when guest was during suspending, it called into Qemu and Qemu destroyed all
resources and reset renderer. This made the display gone after guest resumed.

So, I add a mechanism that when guest is suspending, it will notify Qemu, and then Qemu will
not destroy resources. That can help guest's display come back.

As discussed and suggested by Robert Beckett and Gerd Hoffmann on v1 qemu's mailing list.
Due to that mechanism needs cooperation between guest and host. What's more, as virtio drivers
by design paravirt drivers, it is reasonable for guest to accept some cooperation with host to
manage suspend/resume. So I request to add a new feature flag, so that guest and host can
negotiate whenever freezing is supported or not.
Link:
https://lists.oasis-open.org/archives/virtio-comment/202306/msg00595.html
V1 of Qemu patch:
https://lore.kernel.org/qemu-devel/20230608025655.1674357-2-Jiqian.Chen@amd.com/
V1 of Kernel patch:
https://lore.kernel.org/lkml/20230608063857.1677973-1-Jiqian.Chen@amd.com/


Jiqian Chen (1):
  transport-pci: Add freeze_mode to virtio_pci_common_cfg

 transport-pci.tex | 7 +++++++
 1 file changed, 7 insertions(+)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 11:42 [VIRTIO PCI PATCH v5 0/1] Add freeze_mode to virtio_pci_common_cfg Jiqian Chen
@ 2023-09-19 11:42 ` Jiqian Chen
  2023-09-19 12:10   ` Parav Pandit
                     ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jiqian Chen @ 2023-09-19 11:42 UTC (permalink / raw)
  To: Gerd Hoffmann, Jason Wang, Michael S . Tsirkin, Xuan Zhuo,
	David Airlie, Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev
  Cc: qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui,
	Jiqian Chen

When guest vm does S3, Qemu will reset and clear some things of virtio
devices, but guest can't aware that, so that may cause some problems.
For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
resume, that function will destroy render resources of virtio-gpu. As
a result, after guest resume, the display can't come back and we only
saw a black screen. Due to guest can't re-create all the resources, so
we need to let Qemu not to destroy them when S3.

For above purpose, we need a mechanism that allows guests and QEMU to
negotiate their reset behavior. So this patch add a new parameter
named freeze_mode to struct virtio_pci_common_cfg. And when guest
suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
devices can change their reset behavior on Qemu side according to
freeze_mode. What's more, freeze_mode can be used for all virtio
devices to affect the behavior of Qemu, not just virtio gpu device.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 transport-pci.tex | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/transport-pci.tex b/transport-pci.tex
index a5c6719..2543536 100644
--- a/transport-pci.tex
+++ b/transport-pci.tex
@@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
         le64 queue_desc;                /* read-write */
         le64 queue_driver;              /* read-write */
         le64 queue_device;              /* read-write */
+        le16 freeze_mode;               /* read-write */
         le16 queue_notif_config_data;   /* read-only for driver */
         le16 queue_reset;               /* read-write */
 
@@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
 \item[\field{queue_device}]
         The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
 
+\item[\field{freeze_mode}]
+        The driver writes this to set the freeze mode of virtio pci.
+        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
+        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
+        Other values are reserved for future use, like S4, etc.
+
 \item[\field{queue_notif_config_data}]
         This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
         The driver will use this value when driver sends available buffer
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 11:42 ` [VIRTIO PCI PATCH v5 1/1] transport-pci: " Jiqian Chen
@ 2023-09-19 12:10   ` Parav Pandit
  2023-09-19 14:09     ` Michael S. Tsirkin
  2023-09-20  3:57     ` [virtio-dev] " Chen, Jiqian
  2023-09-19 12:31   ` Michael S. Tsirkin
  2023-09-21  4:22   ` Jason Wang
  2 siblings, 2 replies; 32+ messages in thread
From: Parav Pandit @ 2023-09-19 12:10 UTC (permalink / raw)
  To: Jiqian Chen, Gerd Hoffmann, Jason Wang, Michael S . Tsirkin,
	Xuan Zhuo, David Airlie, Gurchetan Singh, Chia-I Wu,
	Marc-André Lureau, Robert Beckett, Mikhail Golubev-Ciuchea,
	virtio-comment, virtio-dev
  Cc: qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui

Hi Jiqian,

> From: Jiqian Chen <Jiqian.Chen@amd.com>
> Sent: Tuesday, September 19, 2023 5:13 PM
> 
> When guest vm does S3, Qemu will reset and clear some things of virtio
> devices, but guest can't aware that, so that may cause some problems.
It is not true that guest VM is not aware of it.
As you show in your kernel patch, it is freeze/unfreeze in the guest VM PCI PM driver callback. So please update the commit log.

> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
s/excample/example

> resume, that function will destroy render resources of virtio-gpu. As a result,
> after guest resume, the display can't come back and we only saw a black
> screen. Due to guest can't re-create all the resources, so we need to let Qemu
> not to destroy them when S3.
Above QEMU specific details to go in cover letter, instead of commit log, but no strong opinion.
Explaining the use case is good.

> 
> For above purpose, we need a mechanism that allows guests and QEMU to
> negotiate their reset behavior. So this patch add a new parameter named
Freeze != reset. :)
Please fix it to say freeze or suspend.

> freeze_mode to struct virtio_pci_common_cfg. And when guest suspends, it can
> write freeze_mode to be FREEZE_S3, and then virtio devices can change their
> reset behavior on Qemu side according to freeze_mode. What's more,
Not reset, but suspend behavior.

> freeze_mode can be used for all virtio devices to affect the behavior of Qemu,
> not just virtio gpu device.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  transport-pci.tex | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex index a5c6719..2543536 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport
>          le64 queue_desc;                /* read-write */
>          le64 queue_driver;              /* read-write */
>          le64 queue_device;              /* read-write */
> +        le16 freeze_mode;               /* read-write */
>          le16 queue_notif_config_data;   /* read-only for driver */
>          le16 queue_reset;               /* read-write */
> 
The new field cannot be in the middle of the structure.
Otherwise, the location of the queue_notif_config_data depends on completely unrelated feature bit, breaking the backward compatibility.
So please move it at the end.

> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure
> layout}\label{sec:Virtio Transport  \item[\field{queue_device}]
>          The driver writes the physical address of Device Area here.  See section
> \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
> 
> +\item[\field{freeze_mode}]
> +        The driver writes this to set the freeze mode of virtio pci.
> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-
For above names, please define the actual values in the spec.

> pci enters S3 suspension;
> +        Other values are reserved for future use, like S4, etc.
> +
It cannot be just one way communication from driver to device as freezing the device of few hundred MB to GB of gpu memory or other device memory can take several msec.
Hence driver must poll to get the acknowledgement from the device that freeze functionality is completed.

Please refer to queue_reset register definition for achieving such scheme and reframe the wording for it.

Also kindly add the device and driver normative on how/when this register is accessed.

Also please fix the description to not talk about guest VM. Largely it only exists in theory of operation etc text.

You need to describe what exactly should happen in the device when its freeze.
Please refer to my series where infrastructure is added for device migration where the FREEZE mode behavior is defined.
It is similar to what you define, but its management plane operation controlled outside of the guest VM.
But it is good direction in terms of what to define in spec language.
https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#u

you are missing the feature bit to indicate to the driver that device supports this functionality.
Please add one.

>  \item[\field{queue_notif_config_data}]
>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been
> negotiated.
>          The driver will use this value when driver sends available buffer
> --
> 2.34.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 11:42 ` [VIRTIO PCI PATCH v5 1/1] transport-pci: " Jiqian Chen
  2023-09-19 12:10   ` Parav Pandit
@ 2023-09-19 12:31   ` Michael S. Tsirkin
  2023-09-20  4:56     ` Chen, Jiqian
  2023-09-20  5:59     ` [virtio-comment] " Zhu, Lingshan
  2023-09-21  4:22   ` Jason Wang
  2 siblings, 2 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2023-09-19 12:31 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui

On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
> When guest vm does S3, Qemu will reset and clear some things of virtio
> devices, but guest can't aware that, so that may cause some problems.
> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
> resume, that function will destroy render resources of virtio-gpu. As
> a result, after guest resume, the display can't come back and we only
> saw a black screen. Due to guest can't re-create all the resources, so
> we need to let Qemu not to destroy them when S3.
> 
> For above purpose, we need a mechanism that allows guests and QEMU to
> negotiate their reset behavior. So this patch add a new parameter
> named freeze_mode to struct virtio_pci_common_cfg. And when guest
> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
> devices can change their reset behavior on Qemu side according to
> freeze_mode. What's more, freeze_mode can be used for all virtio
> devices to affect the behavior of Qemu, not just virtio gpu device.
> 
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  transport-pci.tex | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..2543536 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>          le64 queue_desc;                /* read-write */
>          le64 queue_driver;              /* read-write */
>          le64 queue_device;              /* read-write */
> +        le16 freeze_mode;               /* read-write */
>          le16 queue_notif_config_data;   /* read-only for driver */
>          le16 queue_reset;               /* read-write */
>

we can't add fields in the middle of the structure like this -
offset of queue_notif_config_data and queue_reset changes.

  
> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  \item[\field{queue_device}]
>          The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>  
> +\item[\field{freeze_mode}]
> +        The driver writes this to set the freeze mode of virtio pci.
> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
> +        Other values are reserved for future use, like S4, etc.
> +

we need to specify these values then.

we also need
- feature bit to detect support for S3
- conformance statements documenting behavious under S3


>  \item[\field{queue_notif_config_data}]
>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>          The driver will use this value when driver sends available buffer
> -- 
> 2.34.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 12:10   ` Parav Pandit
@ 2023-09-19 14:09     ` Michael S. Tsirkin
  2023-09-20  3:57     ` [virtio-dev] " Chen, Jiqian
  1 sibling, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2023-09-19 14:09 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jiqian Chen, Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui

On Tue, Sep 19, 2023 at 12:10:29PM +0000, Parav Pandit wrote:
> Hi Jiqian,
> 
> > From: Jiqian Chen <Jiqian.Chen@amd.com>
> > Sent: Tuesday, September 19, 2023 5:13 PM
> > 
> > When guest vm does S3, Qemu will reset and clear some things of virtio
> > devices, but guest can't aware that, so that may cause some problems.
> It is not true that guest VM is not aware of it.
> As you show in your kernel patch, it is freeze/unfreeze in the guest VM PCI PM driver callback. So please update the commit log.
> 
> > For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
> s/excample/example
> 
> > resume, that function will destroy render resources of virtio-gpu. As a result,
> > after guest resume, the display can't come back and we only saw a black
> > screen. Due to guest can't re-create all the resources, so we need to let Qemu
> > not to destroy them when S3.
> Above QEMU specific details to go in cover letter, instead of commit log, but no strong opinion.

i feel it does not matter much.

> Explaining the use case is good.
> 
> > 
> > For above purpose, we need a mechanism that allows guests and QEMU to
> > negotiate their reset behavior. So this patch add a new parameter named
> Freeze != reset. :)
> Please fix it to say freeze or suspend.
> 
> > freeze_mode to struct virtio_pci_common_cfg. And when guest suspends, it can
> > write freeze_mode to be FREEZE_S3, and then virtio devices can change their
> > reset behavior on Qemu side according to freeze_mode. What's more,
> Not reset, but suspend behavior.
> 
> > freeze_mode can be used for all virtio devices to affect the behavior of Qemu,
> > not just virtio gpu device.
> > 
> > Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> > ---
> >  transport-pci.tex | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/transport-pci.tex b/transport-pci.tex index a5c6719..2543536 100644
> > --- a/transport-pci.tex
> > +++ b/transport-pci.tex
> > @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure
> > layout}\label{sec:Virtio Transport
> >          le64 queue_desc;                /* read-write */
> >          le64 queue_driver;              /* read-write */
> >          le64 queue_device;              /* read-write */
> > +        le16 freeze_mode;               /* read-write */
> >          le16 queue_notif_config_data;   /* read-only for driver */
> >          le16 queue_reset;               /* read-write */
> > 
> The new field cannot be in the middle of the structure.
> Otherwise, the location of the queue_notif_config_data depends on completely unrelated feature bit, breaking the backward compatibility.
> So please move it at the end.
> 
> > @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure
> > layout}\label{sec:Virtio Transport  \item[\field{queue_device}]
> >          The driver writes the physical address of Device Area here.  See section
> > \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
> > 
> > +\item[\field{freeze_mode}]
> > +        The driver writes this to set the freeze mode of virtio pci.
> > +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> > +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-
> For above names, please define the actual values in the spec.
> 
> > pci enters S3 suspension;
> > +        Other values are reserved for future use, like S4, etc.
> > +
> It cannot be just one way communication from driver to device as freezing the device of few hundred MB to GB of gpu memory or other device memory can take several msec.
> Hence driver must poll to get the acknowledgement from the device that freeze functionality is completed.
> 
> Please refer to queue_reset register definition for achieving such scheme and reframe the wording for it.
> 
> Also kindly add the device and driver normative on how/when this register is accessed.
> 
> Also please fix the description to not talk about guest VM. Largely it only exists in theory of operation etc text.
> 
> You need to describe what exactly should happen in the device when its freeze.
> Please refer to my series where infrastructure is added for device migration where the FREEZE mode behavior is defined.
> It is similar to what you define, but its management plane operation controlled outside of the guest VM.
> But it is good direction in terms of what to define in spec language.
> https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#u
> 
> you are missing the feature bit to indicate to the driver that device supports this functionality.
> Please add one.
> 
> >  \item[\field{queue_notif_config_data}]
> >          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been
> > negotiated.
> >          The driver will use this value when driver sends available buffer
> > --
> > 2.34.1



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] RE: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 12:10   ` Parav Pandit
  2023-09-19 14:09     ` Michael S. Tsirkin
@ 2023-09-20  3:57     ` Chen, Jiqian
  2023-09-20  4:10       ` Parav Pandit
  1 sibling, 1 reply; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  3:57 UTC (permalink / raw)
  To: Parav Pandit
  Cc: virtio-dev, virtio-comment, Mikhail Golubev-Ciuchea,
	Robert Beckett, Marc-André Lureau, Chia-I Wu,
	Gurchetan Singh, David Airlie, Xuan Zhuo, Michael S . Tsirkin,
	Jason Wang, Gerd Hoffmann, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray,
	Chen, Jiqian

Hi Parav,

On 2023/9/19 20:10, Parav Pandit wrote:
> Hi Jiqian,
> 
>> From: Jiqian Chen <Jiqian.Chen@amd.com>
>> Sent: Tuesday, September 19, 2023 5:13 PM
>>
>> When guest vm does S3, Qemu will reset and clear some things of virtio
>> devices, but guest can't aware that, so that may cause some problems.
> It is not true that guest VM is not aware of it.
> As you show in your kernel patch, it is freeze/unfreeze in the guest VM PCI PM driver callback. So please update the commit log.
Thanks, I will update it.

> 
>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
> s/excample/example
> 
>> resume, that function will destroy render resources of virtio-gpu. As a result,
>> after guest resume, the display can't come back and we only saw a black
>> screen. Due to guest can't re-create all the resources, so we need to let Qemu
>> not to destroy them when S3.
> Above QEMU specific details to go in cover letter, instead of commit log, but no strong opinion.
> Explaining the use case is good.
Thanks, I will also add it to cover letter.

> 
>>
>> For above purpose, we need a mechanism that allows guests and QEMU to
>> negotiate their reset behavior. So this patch add a new parameter named
> Freeze != reset. :)
> Please fix it to say freeze or suspend.
But in my virtio-gpu scene, I want to prevent Qemu destroying resources when Guest do resuming(pci_pm_resume-> virtio_pci_restore-> virtio_device_restore-> virtio_reset_device-> vp_modern_set_status->Qemu virtio_pci_reset->virtio_gpu_gl_reset-> virtio_gpu_reset). And I add check in virtio_gpu_gl_reset and virtio_gpu_reset, if freeze_mode was set to FREEZE_S3 during Guest suspending, Qemu will not destroy resources. So the reason why I add this mechanism is to affect the reset behavior. And I think this also can help other virtio devices to affect their behavior, like the issue of virtio-video which Mikhail Golubev-Ciuchea encountered.

> 
>> freeze_mode to struct virtio_pci_common_cfg. And when guest suspends, it can
>> write freeze_mode to be FREEZE_S3, and then virtio devices can change their
>> reset behavior on Qemu side according to freeze_mode. What's more,
> Not reset, but suspend behavior.
The same reason as above.

> 
>> freeze_mode can be used for all virtio devices to affect the behavior of Qemu,
>> not just virtio gpu device.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  transport-pci.tex | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex index a5c6719..2543536 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure
>> layout}\label{sec:Virtio Transport
>>          le64 queue_desc;                /* read-write */
>>          le64 queue_driver;              /* read-write */
>>          le64 queue_device;              /* read-write */
>> +        le16 freeze_mode;               /* read-write */
>>          le16 queue_notif_config_data;   /* read-only for driver */
>>          le16 queue_reset;               /* read-write */
>>
> The new field cannot be in the middle of the structure.
> Otherwise, the location of the queue_notif_config_data depends on completely unrelated feature bit, breaking the backward compatibility.
> So please move it at the end.
I have confused about this. I found in latest kernel code(master branch):
struct virtio_pci_common_cfg {
	/* About the whole device. */
	__le32 device_feature_select;	/* read-write */
	__le32 device_feature;		/* read-only */
	__le32 guest_feature_select;	/* read-write */
	__le32 guest_feature;		/* read-write */
	__le16 msix_config;		/* read-write */
	__le16 num_queues;		/* read-only */
	__u8 device_status;		/* read-write */
	__u8 config_generation;		/* read-only */

	/* About a specific virtqueue. */
	__le16 queue_select;		/* read-write */
	__le16 queue_size;		/* read-write, power of 2. */
	__le16 queue_msix_vector;	/* read-write */
	__le16 queue_enable;		/* read-write */
	__le16 queue_notify_off;	/* read-only */
	__le32 queue_desc_lo;		/* read-write */
	__le32 queue_desc_hi;		/* read-write */
	__le32 queue_avail_lo;		/* read-write */
	__le32 queue_avail_hi;		/* read-write */
	__le32 queue_used_lo;		/* read-write */
	__le32 queue_used_hi;		/* read-write */

	__le16 freeze_mode;		/* read-write */
};
There is no queue_notif_config_data or queue_reset, and freeze_mode I added is at the end. Why is it different from virtio-spec?

> 
>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure
>> layout}\label{sec:Virtio Transport  \item[\field{queue_device}]
>>          The driver writes the physical address of Device Area here.  See section
>> \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>
>> +\item[\field{freeze_mode}]
>> +        The driver writes this to set the freeze mode of virtio pci.
>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-
> For above names, please define the actual values in the spec.
Ok, I will add them.

> 
>> pci enters S3 suspension;
>> +        Other values are reserved for future use, like S4, etc.
>> +
> It cannot be just one way communication from driver to device as freezing the device of few hundred MB to GB of gpu memory or other device memory can take several msec.
> Hence driver must poll to get the acknowledgement from the device that freeze functionality is completed.
I think the freeze functionality itself has not many problems. My patches just want to tell Qemu that the reset request is from the process of guest resuming not other scene, and write a status into freeze_mode, then we can change the reset behavior during guest resuming.

> 
> Please refer to queue_reset register definition for achieving such scheme and reframe the wording for it.
> 
> Also kindly add the device and driver normative on how/when this register is accessed.
Thanks, I will add them.

> 
> Also please fix the description to not talk about guest VM. Largely it only exists in theory of operation etc text.
Thanks, I will change it.

> 
> You need to describe what exactly should happen in the device when its freeze.
> Please refer to my series where infrastructure is added for device migration where the FREEZE mode behavior is defined.
> It is similar to what you define, but its management plane operation controlled outside of the guest VM.
> But it is good direction in terms of what to define in spec language.
> https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@nvidia.com/T/#u
Thank you very much for your suggestion. I will refer to your link and then modify my description.

> 
> you are missing the feature bit to indicate to the driver that device supports this functionality.
> Please add one.
Do I need to add feature bit to DEFINE_VIRTIO_COMMON_FEATURES? And each time when I write freeze_mode filed on kernel driver side, I need to check this bit?

> 
>>  \item[\field{queue_notif_config_data}]
>>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been
>> negotiated.
>>          The driver will use this value when driver sends available buffer
>> --
>> 2.34.1
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] RE: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  3:57     ` [virtio-dev] " Chen, Jiqian
@ 2023-09-20  4:10       ` Parav Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  4:10 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: virtio-dev, virtio-comment, Mikhail Golubev-Ciuchea,
	Robert Beckett, Marc-André Lureau, Chia-I Wu,
	Gurchetan Singh, David Airlie, Xuan Zhuo, Michael S . Tsirkin,
	Jason Wang, Gerd Hoffmann, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> Sent: Wednesday, September 20, 2023 9:28 AM

> >> For above purpose, we need a mechanism that allows guests and QEMU to
> >> negotiate their reset behavior. So this patch add a new parameter
> >> named
> > Freeze != reset. :)
> > Please fix it to say freeze or suspend.
> But in my virtio-gpu scene, I want to prevent Qemu destroying resources when
> Guest do resuming(pci_pm_resume-> virtio_pci_restore->
> virtio_device_restore-> virtio_reset_device-> vp_modern_set_status->Qemu
> virtio_pci_reset->virtio_gpu_gl_reset-> virtio_gpu_reset). And I add check in
> virtio_gpu_gl_reset and virtio_gpu_reset, if freeze_mode was set to FREEZE_S3
> during Guest suspending, Qemu will not destroy resources. So the reason why I
> add this mechanism is to affect the reset behavior. And I think this also can help
> other virtio devices to affect their behavior, like the issue of virtio-video which
> Mikhail Golubev-Ciuchea encountered.
>
The point is when driver tells to freeze, it is freeze command and not reset.
So resume() should not invoke device_reset() when FREEZE+RESUME supported.
 
> >
> >> freeze_mode to struct virtio_pci_common_cfg. And when guest suspends,
> >> it can write freeze_mode to be FREEZE_S3, and then virtio devices can
> >> change their reset behavior on Qemu side according to freeze_mode.
> >> What's more,
> > Not reset, but suspend behavior.
> The same reason as above.
>
Reset should not be done by the guest driver when the device supports unfreeze.
 
> >
> >> freeze_mode can be used for all virtio devices to affect the behavior
> >> of Qemu, not just virtio gpu device.
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >>  transport-pci.tex | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/transport-pci.tex b/transport-pci.tex index
> >> a5c6719..2543536 100644
> >> --- a/transport-pci.tex
> >> +++ b/transport-pci.tex
> >> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure
> >> layout}\label{sec:Virtio Transport
> >>          le64 queue_desc;                /* read-write */
> >>          le64 queue_driver;              /* read-write */
> >>          le64 queue_device;              /* read-write */
> >> +        le16 freeze_mode;               /* read-write */
> >>          le16 queue_notif_config_data;   /* read-only for driver */
> >>          le16 queue_reset;               /* read-write */
> >>
> > The new field cannot be in the middle of the structure.
> > Otherwise, the location of the queue_notif_config_data depends on
> completely unrelated feature bit, breaking the backward compatibility.
> > So please move it at the end.
> I have confused about this. I found in latest kernel code(master branch):
> struct virtio_pci_common_cfg {
> 	/* About the whole device. */
> 	__le32 device_feature_select;	/* read-write */
> 	__le32 device_feature;		/* read-only */
> 	__le32 guest_feature_select;	/* read-write */
> 	__le32 guest_feature;		/* read-write */
> 	__le16 msix_config;		/* read-write */
> 	__le16 num_queues;		/* read-only */
> 	__u8 device_status;		/* read-write */
> 	__u8 config_generation;		/* read-only */
> 
> 	/* About a specific virtqueue. */
> 	__le16 queue_select;		/* read-write */
> 	__le16 queue_size;		/* read-write, power of 2. */
> 	__le16 queue_msix_vector;	/* read-write */
> 	__le16 queue_enable;		/* read-write */
> 	__le16 queue_notify_off;	/* read-only */
> 	__le32 queue_desc_lo;		/* read-write */
> 	__le32 queue_desc_hi;		/* read-write */
> 	__le32 queue_avail_lo;		/* read-write */
> 	__le32 queue_avail_hi;		/* read-write */
> 	__le32 queue_used_lo;		/* read-write */
> 	__le32 queue_used_hi;		/* read-write */
> 
> 	__le16 freeze_mode;		/* read-write */
> };
> There is no queue_notif_config_data or queue_reset, and freeze_mode I added
> is at the end. Why is it different from virtio-spec?
>
Because notify data may not be used by Linux driver so it may be shorter.
I didn’t dig code yet.
 
> >
> >> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure
> >> layout}\label{sec:Virtio Transport  \item[\field{queue_device}]
> >>          The driver writes the physical address of Device Area here.
> >> See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
> >>
> >> +\item[\field{freeze_mode}]
> >> +        The driver writes this to set the freeze mode of virtio pci.
> >> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> >> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and
> >> +virtio-
> > For above names, please define the actual values in the spec.
> Ok, I will add them.
> 
> >
> >> pci enters S3 suspension;
> >> +        Other values are reserved for future use, like S4, etc.
> >> +
> > It cannot be just one way communication from driver to device as freezing the
> device of few hundred MB to GB of gpu memory or other device memory can
> take several msec.
> > Hence driver must poll to get the acknowledgement from the device that
> freeze functionality is completed.
> I think the freeze functionality itself has not many problems. My patches just
> want to tell Qemu that the reset request is from the process of guest resuming
> not other scene, and write a status into freeze_mode, then we can change the
> reset behavior during guest resuming.
> 
Either guest should do freeze or reset, not both.
With that each functionality has clear semantics of what it exactly does.
Freeze to not change reset behavior.

I am not saying freeze functionality has problem. Freeze functionality is request, response mechanism.
Driver requests its, device takes it sweet time more than 50nsec, to freeze large device. Responds back in a 1msec or some finite time that freeze done.
And driver can progress to freeze the VM.

Same on unfreeze, it can bring back large amount of memory from some slow media.
So unfreeze to be request->response as well.

> >
> >
> > You need to describe what exactly should happen in the device when its
> freeze.
> > Please refer to my series where infrastructure is added for device migration
> where the FREEZE mode behavior is defined.
> > It is similar to what you define, but its management plane operation controlled
> outside of the guest VM.
> > But it is good direction in terms of what to define in spec language.
> > https://lore.kernel.org/virtio-comment/20230909142911.524407-7-parav@n
> > vidia.com/T/#u
> Thank you very much for your suggestion. I will refer to your link and then
> modify my description.
> 
> >
> > you are missing the feature bit to indicate to the driver that device supports
> this functionality.
> > Please add one.
> Do I need to add feature bit to DEFINE_VIRTIO_COMMON_FEATURES? 
Explore VIRTIO_F_RING_RESET touch points.
You can also explore new patch [1] which adds generic feature bit to understand the spec touch points where to add etc.

[1] https://lore.kernel.org/virtio-comment/20230918173518.15900-1-parav@nvidia.com/T/#m9dd18d352e3ac38e0e7c82ad9a634db43dfc8b3b

> And
> each time when I write freeze_mode filed on kernel driver side, I need to check
> this bit?
> 
Yes. 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 12:31   ` Michael S. Tsirkin
@ 2023-09-20  4:56     ` Chen, Jiqian
  2023-09-20  5:59     ` [virtio-comment] " Zhu, Lingshan
  1 sibling, 0 replies; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  4:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray,
	Chen, Jiqian

Hi Michael S. Tsirkin,

On 2023/9/19 20:31, Michael S. Tsirkin wrote:
> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>> When guest vm does S3, Qemu will reset and clear some things of virtio
>> devices, but guest can't aware that, so that may cause some problems.
>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>> resume, that function will destroy render resources of virtio-gpu. As
>> a result, after guest resume, the display can't come back and we only
>> saw a black screen. Due to guest can't re-create all the resources, so
>> we need to let Qemu not to destroy them when S3.
>>
>> For above purpose, we need a mechanism that allows guests and QEMU to
>> negotiate their reset behavior. So this patch add a new parameter
>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>> devices can change their reset behavior on Qemu side according to
>> freeze_mode. What's more, freeze_mode can be used for all virtio
>> devices to affect the behavior of Qemu, not just virtio gpu device.
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  transport-pci.tex | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..2543536 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>          le64 queue_desc;                /* read-write */
>>          le64 queue_driver;              /* read-write */
>>          le64 queue_device;              /* read-write */
>> +        le16 freeze_mode;               /* read-write */
>>          le16 queue_notif_config_data;   /* read-only for driver */
>>          le16 queue_reset;               /* read-write */
>>
> 
> we can't add fields in the middle of the structure like this -
> offset of queue_notif_config_data and queue_reset changes.
I have confused about this. I found in latest kernel code(master branch):
struct virtio_pci_common_cfg {
	/* About the whole device. */
	__le32 device_feature_select;	/* read-write */
	__le32 device_feature;		/* read-only */
	__le32 guest_feature_select;	/* read-write */
	__le32 guest_feature;		/* read-write */
	__le16 msix_config;		/* read-write */
	__le16 num_queues;		/* read-only */
	__u8 device_status;		/* read-write */
	__u8 config_generation;		/* read-only */

	/* About a specific virtqueue. */
	__le16 queue_select;		/* read-write */
	__le16 queue_size;		/* read-write, power of 2. */
	__le16 queue_msix_vector;	/* read-write */
	__le16 queue_enable;		/* read-write */
	__le16 queue_notify_off;	/* read-only */
	__le32 queue_desc_lo;		/* read-write */
	__le32 queue_desc_hi;		/* read-write */
	__le32 queue_avail_lo;		/* read-write */
	__le32 queue_avail_hi;		/* read-write */
	__le32 queue_used_lo;		/* read-write */
	__le32 queue_used_hi;		/* read-write */

	__le16 freeze_mode;		/* read-write */
};
There is no queue_notif_config_data or queue_reset, and freeze_mode I added is at the end. Why is it different from virtio-spec?
I add the offset for freeze_mode(VIRTIO_PCI_COMMON_F_MODE), and change the offset of Q_NDATA and Q_RESET
-#define VIRTIO_PCI_COMMON_Q_NDATA      56
-#define VIRTIO_PCI_COMMON_Q_RESET      58
+#define VIRTIO_PCI_COMMON_F_MODE       56
+#define VIRTIO_PCI_COMMON_Q_NDATA      58
+#define VIRTIO_PCI_COMMON_Q_RESET      60

> 
>   
>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>  \item[\field{queue_device}]
>>          The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>  
>> +\item[\field{freeze_mode}]
>> +        The driver writes this to set the freeze mode of virtio pci.
>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>> +        Other values are reserved for future use, like S4, etc.
>> +
> 
> we need to specify these values then.
Thanks, I will add the values.

> 
> we also need
> - feature bit to detect support for S3
Do I need to add feature bit to DEFINE_VIRTIO_COMMON_FEATURES? And each time when I write freeze_mode filed on kernel driver side, I need to check this bit?

> - conformance statements documenting behavious under S3
Sorry, I am not very sure. Do you mean when freeze_mode is set FREEZE_S3, what operations should driver and device to do? Can you elaborate on it, or give an example?

> 
> 
>>  \item[\field{queue_notif_config_data}]
>>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>          The driver will use this value when driver sends available buffer
>> -- 
>> 2.34.1
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 12:31   ` Michael S. Tsirkin
  2023-09-20  4:56     ` Chen, Jiqian
@ 2023-09-20  5:59     ` Zhu, Lingshan
  2023-09-20  6:33       ` [virtio-dev] " Chen, Jiqian
  1 sibling, 1 reply; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  5:59 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jiqian Chen
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui



On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>> When guest vm does S3, Qemu will reset and clear some things of virtio
>> devices, but guest can't aware that, so that may cause some problems.
>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>> resume, that function will destroy render resources of virtio-gpu. As
>> a result, after guest resume, the display can't come back and we only
>> saw a black screen. Due to guest can't re-create all the resources, so
>> we need to let Qemu not to destroy them when S3.
>>
>> For above purpose, we need a mechanism that allows guests and QEMU to
>> negotiate their reset behavior. So this patch add a new parameter
>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>> devices can change their reset behavior on Qemu side according to
>> freeze_mode. What's more, freeze_mode can be used for all virtio
>> devices to affect the behavior of Qemu, not just virtio gpu device.
Hi Jiqian,

Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and 
vq state
https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b5561b@intel.com/T/

We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
negotiated, the driver can set SUSPEND in the device status to suspend the
device.

When SUSPEND, the device should pause its operations and preserve its 
configurations
in its configuration space.

The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes 
running.

This is originally to serve live migration, but I think it can also meet 
your needs.

Thanks,
Zhu Lingshan
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>   transport-pci.tex | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..2543536 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>           le64 queue_desc;                /* read-write */
>>           le64 queue_driver;              /* read-write */
>>           le64 queue_device;              /* read-write */
>> +        le16 freeze_mode;               /* read-write */
>>           le16 queue_notif_config_data;   /* read-only for driver */
>>           le16 queue_reset;               /* read-write */
>>
> we can't add fields in the middle of the structure like this -
> offset of queue_notif_config_data and queue_reset changes.
>
>    
>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>   \item[\field{queue_device}]
>>           The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>   
>> +\item[\field{freeze_mode}]
>> +        The driver writes this to set the freeze mode of virtio pci.
>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>> +        Other values are reserved for future use, like S4, etc.
>> +
> we need to specify these values then.
>
> we also need
> - feature bit to detect support for S3
> - conformance statements documenting behavious under S3
>
>
>>   \item[\field{queue_notif_config_data}]
>>           This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>           The driver will use this value when driver sends available buffer
>> -- 
>> 2.34.1
>
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
>
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
>
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  5:59     ` [virtio-comment] " Zhu, Lingshan
@ 2023-09-20  6:33       ` Chen, Jiqian
  2023-09-20  6:58         ` Parav Pandit
  2023-09-20  6:58         ` Zhu, Lingshan
  0 siblings, 2 replies; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  6:33 UTC (permalink / raw)
  To: Zhu, Lingshan, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray,
	Chen, Jiqian

Hi Lingshan,

On 2023/9/20 13:59, Zhu, Lingshan wrote:
> 
> 
> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
>> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>>> When guest vm does S3, Qemu will reset and clear some things of virtio
>>> devices, but guest can't aware that, so that may cause some problems.
>>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>>> resume, that function will destroy render resources of virtio-gpu. As
>>> a result, after guest resume, the display can't come back and we only
>>> saw a black screen. Due to guest can't re-create all the resources, so
>>> we need to let Qemu not to destroy them when S3.
>>>
>>> For above purpose, we need a mechanism that allows guests and QEMU to
>>> negotiate their reset behavior. So this patch add a new parameter
>>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>>> devices can change their reset behavior on Qemu side according to
>>> freeze_mode. What's more, freeze_mode can be used for all virtio
>>> devices to affect the behavior of Qemu, not just virtio gpu device.
> Hi Jiqian,
> 
> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
> https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b5561b@intel.com/T/
> 
> We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
> negotiated, the driver can set SUSPEND in the device status to suspend the
> device.
> 
> When SUSPEND, the device should pause its operations and preserve its configurations
> in its configuration space.
> 
> The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.
> 
> This is originally to serve live migration, but I think it can also meet your needs.
I noticed your series, but I am not sure they are also meet my needs.
If driver write 0 to reset device, can the SUSPEND bit be cleared? (pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if the reset request is from guest restore process or not, and then I can't change the reset behavior.
Can you send me your patch link on kernel and qemu side? I will take a deep look.

> 
> Thanks,
> Zhu Lingshan
>>>
>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>> ---
>>>   transport-pci.tex | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/transport-pci.tex b/transport-pci.tex
>>> index a5c6719..2543536 100644
>>> --- a/transport-pci.tex
>>> +++ b/transport-pci.tex
>>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>           le64 queue_desc;                /* read-write */
>>>           le64 queue_driver;              /* read-write */
>>>           le64 queue_device;              /* read-write */
>>> +        le16 freeze_mode;               /* read-write */
>>>           le16 queue_notif_config_data;   /* read-only for driver */
>>>           le16 queue_reset;               /* read-write */
>>>
>> we can't add fields in the middle of the structure like this -
>> offset of queue_notif_config_data and queue_reset changes.
>>
>>   
>>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>   \item[\field{queue_device}]
>>>           The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>>   +\item[\field{freeze_mode}]
>>> +        The driver writes this to set the freeze mode of virtio pci.
>>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>>> +        Other values are reserved for future use, like S4, etc.
>>> +
>> we need to specify these values then.
>>
>> we also need
>> - feature bit to detect support for S3
>> - conformance statements documenting behavious under S3
>>
>>
>>>   \item[\field{queue_notif_config_data}]
>>>           This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>>           The driver will use this value when driver sends available buffer
>>> -- 
>>> 2.34.1
>>
>> This publicly archived list offers a means to provide input to the
>> OASIS Virtual I/O Device (VIRTIO) TC.
>>
>> In order to verify user consent to the Feedback License terms and
>> to minimize spam in the list archive, subscription is required
>> before posting.
>>
>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> List help: virtio-comment-help@lists.oasis-open.org
>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>> Committee: https://www.oasis-open.org/committees/virtio/
>> Join OASIS: https://www.oasis-open.org/join/
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  6:33       ` [virtio-dev] " Chen, Jiqian
@ 2023-09-20  6:58         ` Parav Pandit
  2023-09-20  7:06           ` Zhu, Lingshan
  2023-09-20  6:58         ` Zhu, Lingshan
  1 sibling, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  6:58 UTC (permalink / raw)
  To: Chen, Jiqian, Zhu, Lingshan, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray


> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> Sent: Wednesday, September 20, 2023 12:03 PM

> If driver write 0 to reset device, can the SUSPEND bit be cleared?
It must as reset operation, resets everything else and so the suspend too.

> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
> >virtio_reset_device)
> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
> the reset request is from guest restore process or not, and then I can't change
> the reset behavior.
Reset should not be influenced by suspend.
Suspend should do the work of suspend and reset to do the reset.

The problem to overcome in [1] is, resume operation needs to be synchronous as it involves large part of context to resume back, and hence just asynchronously setting DRIVER_OK is not enough.
The sw must verify back that device has resumed the operation and ready to answer requests.

This is slightly different flow than setting the DRIVER_OK for the first time device initialization sequence as it does not involve large restoration.

So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that it is out of SUSPEND.

Because driver is still in _OK_ driving the device flipping the SUSPEND bit.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  6:33       ` [virtio-dev] " Chen, Jiqian
  2023-09-20  6:58         ` Parav Pandit
@ 2023-09-20  6:58         ` Zhu, Lingshan
  2023-09-20  7:17           ` [virtio-comment] " Chen, Jiqian
  1 sibling, 1 reply; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  6:58 UTC (permalink / raw)
  To: Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 2:33 PM, Chen, Jiqian wrote:
> Hi Lingshan,
>
> On 2023/9/20 13:59, Zhu, Lingshan wrote:
>>
>> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
>>> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>>>> When guest vm does S3, Qemu will reset and clear some things of virtio
>>>> devices, but guest can't aware that, so that may cause some problems.
>>>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>>>> resume, that function will destroy render resources of virtio-gpu. As
>>>> a result, after guest resume, the display can't come back and we only
>>>> saw a black screen. Due to guest can't re-create all the resources, so
>>>> we need to let Qemu not to destroy them when S3.
>>>>
>>>> For above purpose, we need a mechanism that allows guests and QEMU to
>>>> negotiate their reset behavior. So this patch add a new parameter
>>>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>>>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>>>> devices can change their reset behavior on Qemu side according to
>>>> freeze_mode. What's more, freeze_mode can be used for all virtio
>>>> devices to affect the behavior of Qemu, not just virtio gpu device.
>> Hi Jiqian,
>>
>> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
>> https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b5561b@intel.com/T/
>>
>> We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
>> negotiated, the driver can set SUSPEND in the device status to suspend the
>> device.
>>
>> When SUSPEND, the device should pause its operations and preserve its configurations
>> in its configuration space.
>>
>> The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.
>>
>> This is originally to serve live migration, but I think it can also meet your needs.
> I noticed your series, but I am not sure they are also meet my needs.
> If driver write 0 to reset device, can the SUSPEND bit be cleared? (pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
if the driver writes 0, it resets all virtio functionalities. So SUSPEND 
is cleared.
device reset can also be used to recover the device from fatal errors, 
so it should reset everything in virtio.
> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if the reset request is from guest restore process or not, and then I can't change the reset behavior.
I think when enter S3, the hypervisor/driver should set SUSPEND to the 
device. And when resume from S3, the hypervisor/driver should
re-write DRIVER_OK to clear SUSPEND, then the device resume running.
> Can you send me your patch link on kernel and qemu side? I will take a deep look.
There are no patches for qemu/kernel yet, spec first.
>
>> Thanks,
>> Zhu Lingshan
>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>> ---
>>>>    transport-pci.tex | 7 +++++++
>>>>    1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/transport-pci.tex b/transport-pci.tex
>>>> index a5c6719..2543536 100644
>>>> --- a/transport-pci.tex
>>>> +++ b/transport-pci.tex
>>>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>            le64 queue_desc;                /* read-write */
>>>>            le64 queue_driver;              /* read-write */
>>>>            le64 queue_device;              /* read-write */
>>>> +        le16 freeze_mode;               /* read-write */
>>>>            le16 queue_notif_config_data;   /* read-only for driver */
>>>>            le16 queue_reset;               /* read-write */
>>>>
>>> we can't add fields in the middle of the structure like this -
>>> offset of queue_notif_config_data and queue_reset changes.
>>>
>>>    
>>>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>    \item[\field{queue_device}]
>>>>            The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>>>    +\item[\field{freeze_mode}]
>>>> +        The driver writes this to set the freeze mode of virtio pci.
>>>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>>>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>>>> +        Other values are reserved for future use, like S4, etc.
>>>> +
>>> we need to specify these values then.
>>>
>>> we also need
>>> - feature bit to detect support for S3
>>> - conformance statements documenting behavious under S3
>>>
>>>
>>>>    \item[\field{queue_notif_config_data}]
>>>>            This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>>>            The driver will use this value when driver sends available buffer
>>>> -- 
>>>> 2.34.1
>>> This publicly archived list offers a means to provide input to the
>>> OASIS Virtual I/O Device (VIRTIO) TC.
>>>
>>> In order to verify user consent to the Feedback License terms and
>>> to minimize spam in the list archive, subscription is required
>>> before posting.
>>>
>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>>> List help: virtio-comment-help@lists.oasis-open.org
>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>> Committee: https://www.oasis-open.org/committees/virtio/
>>> Join OASIS: https://www.oasis-open.org/join/
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  6:58         ` Parav Pandit
@ 2023-09-20  7:06           ` Zhu, Lingshan
  2023-09-20  7:10             ` Parav Pandit
  2023-09-20  7:24             ` Chen, Jiqian
  0 siblings, 2 replies; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:06 UTC (permalink / raw)
  To: Parav Pandit, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 2:58 PM, Parav Pandit wrote:
>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>> Sent: Wednesday, September 20, 2023 12:03 PM
>> If driver write 0 to reset device, can the SUSPEND bit be cleared?
> It must as reset operation, resets everything else and so the suspend too.
>
>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
>>> virtio_reset_device)
>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
>> the reset request is from guest restore process or not, and then I can't change
>> the reset behavior.
> Reset should not be influenced by suspend.
> Suspend should do the work of suspend and reset to do the reset.
>
> The problem to overcome in [1] is, resume operation needs to be synchronous as it involves large part of context to resume back, and hence just asynchronously setting DRIVER_OK is not enough.
> The sw must verify back that device has resumed the operation and ready to answer requests.
this is not live migration, all device status and other information 
still stay in the device, no need to "resume" context, just resume running.

Like resume from a failed LM.
>
> This is slightly different flow than setting the DRIVER_OK for the first time device initialization sequence as it does not involve large restoration.
>
> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that it is out of SUSPEND.
>
> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
Please read the spec, it says:
The driver MUST NOT clear a device status bit




^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:06           ` Zhu, Lingshan
@ 2023-09-20  7:10             ` Parav Pandit
  2023-09-20  7:27               ` Zhu, Lingshan
  2023-09-20  7:24             ` Chen, Jiqian
  1 sibling, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  7:10 UTC (permalink / raw)
  To: Zhu, Lingshan, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 20, 2023 12:37 PM

> > The problem to overcome in [1] is, resume operation needs to be synchronous
> as it involves large part of context to resume back, and hence just
> asynchronously setting DRIVER_OK is not enough.
> > The sw must verify back that device has resumed the operation and ready to
> answer requests.
> this is not live migration, all device status and other information still stay in the
> device, no need to "resume" context, just resume running.
> 
I am aware that it is not live migration. :)

"Just resuming" involves lot of device setup task. The device implementation does not know for how long a device is suspended.
So for example, a VM is suspended for 6 hours, hence the device context could be saved in a slow disk.
Hence, when the resume is done, it needs to setup things again and driver got to verify before accessing more from the device.
 
> Like resume from a failed LM.
> >
> > This is slightly different flow than setting the DRIVER_OK for the first time
> device initialization sequence as it does not involve large restoration.
> >
> > So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver
> should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >
> > Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
> 
Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's synchronous new register..


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  6:58         ` Zhu, Lingshan
@ 2023-09-20  7:17           ` Chen, Jiqian
  2023-09-20  7:42             ` Zhu, Lingshan
  0 siblings, 1 reply; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  7:17 UTC (permalink / raw)
  To: Zhu, Lingshan, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray

Hi Lingshan,

On 2023/9/20 14:58, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 2:33 PM, Chen, Jiqian wrote:
>> Hi Lingshan,
>>
>> On 2023/9/20 13:59, Zhu, Lingshan wrote:
>>>
>>> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
>>>> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>>>>> When guest vm does S3, Qemu will reset and clear some things of virtio
>>>>> devices, but guest can't aware that, so that may cause some problems.
>>>>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>>>>> resume, that function will destroy render resources of virtio-gpu. As
>>>>> a result, after guest resume, the display can't come back and we only
>>>>> saw a black screen. Due to guest can't re-create all the resources, so
>>>>> we need to let Qemu not to destroy them when S3.
>>>>>
>>>>> For above purpose, we need a mechanism that allows guests and QEMU to
>>>>> negotiate their reset behavior. So this patch add a new parameter
>>>>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>>>>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>>>>> devices can change their reset behavior on Qemu side according to
>>>>> freeze_mode. What's more, freeze_mode can be used for all virtio
>>>>> devices to affect the behavior of Qemu, not just virtio gpu device.
>>> Hi Jiqian,
>>>
>>> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
>>> https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b5561b@intel.com/T/
>>>
>>> We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
>>> negotiated, the driver can set SUSPEND in the device status to suspend the
>>> device.
>>>
>>> When SUSPEND, the device should pause its operations and preserve its configurations
>>> in its configuration space.
>>>
>>> The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.
>>>
>>> This is originally to serve live migration, but I think it can also meet your needs.
>> I noticed your series, but I am not sure they are also meet my needs.
>> If driver write 0 to reset device, can the SUSPEND bit be cleared? (pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
> if the driver writes 0, it resets all virtio functionalities. So SUSPEND is cleared.
Then your patches are not meet my needs. In my scene, it needs to keep the SUSPEND bit util the resume process is complete.
Because in my virtio-gpu scene, when guest resume, it call virtio_reset_device to clear all device status bits, and then reset virtio-gpu in Qemu, and then destroy render resources, I don't want the resources are destroyed during the resume process. So, I add freeze_mode to tell Qemu that guest is doing S3 and resources need to be kept.

> device reset can also be used to recover the device from fatal errors, so it should reset everything in virtio.
>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if the reset request is from guest restore process or not, and then I can't change the reset behavior.
> I think when enter S3, the hypervisor/driver should set SUSPEND to the device. And when resume from S3, the hypervisor/driver should
> re-write DRIVER_OK to clear SUSPEND, then the device resume running.
>> Can you send me your patch link on kernel and qemu side? I will take a deep look.
> There are no patches for qemu/kernel yet, spec first.
>>
>>> Thanks,
>>> Zhu Lingshan
>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>> ---
>>>>>    transport-pci.tex | 7 +++++++
>>>>>    1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
>>>>> index a5c6719..2543536 100644
>>>>> --- a/transport-pci.tex
>>>>> +++ b/transport-pci.tex
>>>>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>            le64 queue_desc;                /* read-write */
>>>>>            le64 queue_driver;              /* read-write */
>>>>>            le64 queue_device;              /* read-write */
>>>>> +        le16 freeze_mode;               /* read-write */
>>>>>            le16 queue_notif_config_data;   /* read-only for driver */
>>>>>            le16 queue_reset;               /* read-write */
>>>>>
>>>> we can't add fields in the middle of the structure like this -
>>>> offset of queue_notif_config_data and queue_reset changes.
>>>>
>>>>   
>>>>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>    \item[\field{queue_device}]
>>>>>            The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>>>>    +\item[\field{freeze_mode}]
>>>>> +        The driver writes this to set the freeze mode of virtio pci.
>>>>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>>>>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>>>>> +        Other values are reserved for future use, like S4, etc.
>>>>> +
>>>> we need to specify these values then.
>>>>
>>>> we also need
>>>> - feature bit to detect support for S3
>>>> - conformance statements documenting behavious under S3
>>>>
>>>>
>>>>>    \item[\field{queue_notif_config_data}]
>>>>>            This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>>>>            The driver will use this value when driver sends available buffer
>>>>> -- 
>>>>> 2.34.1
>>>> This publicly archived list offers a means to provide input to the
>>>> OASIS Virtual I/O Device (VIRTIO) TC.
>>>>
>>>> In order to verify user consent to the Feedback License terms and
>>>> to minimize spam in the list archive, subscription is required
>>>> before posting.
>>>>
>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>>>> List help: virtio-comment-help@lists.oasis-open.org
>>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>>> Committee: https://www.oasis-open.org/committees/virtio/
>>>> Join OASIS: https://www.oasis-open.org/join/
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>
> 
> 
> This publicly archived list offers a means to provide input to the
> OASIS Virtual I/O Device (VIRTIO) TC.
> 
> In order to verify user consent to the Feedback License terms and
> to minimize spam in the list archive, subscription is required
> before posting.
> 
> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
> List help: virtio-comment-help@lists.oasis-open.org
> List archive: https://lists.oasis-open.org/archives/virtio-comment/
> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
> Committee: https://www.oasis-open.org/committees/virtio/
> Join OASIS: https://www.oasis-open.org/join/
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:06           ` Zhu, Lingshan
  2023-09-20  7:10             ` Parav Pandit
@ 2023-09-20  7:24             ` Chen, Jiqian
  2023-09-20  7:30               ` Zhu, Lingshan
  1 sibling, 1 reply; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  7:24 UTC (permalink / raw)
  To: Zhu, Lingshan, Parav Pandit, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray,
	Chen, Jiqian

Hi Lingshan,
It seems you reply to the wrong email thread. They are not related to my patch.

On 2023/9/20 15:06, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 2:58 PM, Parav Pandit wrote:
>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>> Sent: Wednesday, September 20, 2023 12:03 PM
>>> If driver write 0 to reset device, can the SUSPEND bit be cleared?
>> It must as reset operation, resets everything else and so the suspend too.
>>
>>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
>>>> virtio_reset_device)
>>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
>>> the reset request is from guest restore process or not, and then I can't change
>>> the reset behavior.
>> Reset should not be influenced by suspend.
>> Suspend should do the work of suspend and reset to do the reset.
>>
>> The problem to overcome in [1] is, resume operation needs to be synchronous as it involves large part of context to resume back, and hence just asynchronously setting DRIVER_OK is not enough.
>> The sw must verify back that device has resumed the operation and ready to answer requests.
> this is not live migration, all device status and other information still stay in the device, no need to "resume" context, just resume running.
> 
> Like resume from a failed LM.
>>
>> This is slightly different flow than setting the DRIVER_OK for the first time device initialization sequence as it does not involve large restoration.
>>
>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that it is out of SUSPEND.
>>
>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> Please read the spec, it says:
> The driver MUST NOT clear a device status bit
> 
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:10             ` Parav Pandit
@ 2023-09-20  7:27               ` Zhu, Lingshan
  2023-09-20  7:32                 ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:27 UTC (permalink / raw)
  To: Parav Pandit, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 3:10 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 20, 2023 12:37 PM
>>> The problem to overcome in [1] is, resume operation needs to be synchronous
>> as it involves large part of context to resume back, and hence just
>> asynchronously setting DRIVER_OK is not enough.
>>> The sw must verify back that device has resumed the operation and ready to
>> answer requests.
>> this is not live migration, all device status and other information still stay in the
>> device, no need to "resume" context, just resume running.
>>
> I am aware that it is not live migration. :)
>
> "Just resuming" involves lot of device setup task. The device implementation does not know for how long a device is suspended.
> So for example, a VM is suspended for 6 hours, hence the device context could be saved in a slow disk.
> Hence, when the resume is done, it needs to setup things again and driver got to verify before accessing more from the device.
The restore procedures should perform by the hypervisor and done before 
set DRIVER_OK and wake up the guest.
And the hypervisor/driver needs to check the device status by re-reading.
>   
>> Like resume from a failed LM.
>>> This is slightly different flow than setting the DRIVER_OK for the first time
>> device initialization sequence as it does not involve large restoration.
>>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver
>> should clear the SUSPEND bit and verify that it is out of SUSPEND.
>>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
>> Please read the spec, it says:
>> The driver MUST NOT clear a device status bit
>>
> Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's synchronous new register..
so re-read
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:24             ` Chen, Jiqian
@ 2023-09-20  7:30               ` Zhu, Lingshan
  2023-09-20  7:35                 ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:30 UTC (permalink / raw)
  To: Chen, Jiqian, Parav Pandit, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 3:24 PM, Chen, Jiqian wrote:
> Hi Lingshan,
> It seems you reply to the wrong email thread. They are not related to my patch.
These reply to Parva's comments.
@Parva, if you want to discuss more about live migration, please reply 
in my thread, lets don't flood here.
>
> On 2023/9/20 15:06, Zhu, Lingshan wrote:
>>
>> On 9/20/2023 2:58 PM, Parav Pandit wrote:
>>>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>>>> Sent: Wednesday, September 20, 2023 12:03 PM
>>>> If driver write 0 to reset device, can the SUSPEND bit be cleared?
>>> It must as reset operation, resets everything else and so the suspend too.
>>>
>>>> (pci_pm_resume->virtio_pci_restore->virtio_device_restore-
>>>>> virtio_reset_device)
>>>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if
>>>> the reset request is from guest restore process or not, and then I can't change
>>>> the reset behavior.
>>> Reset should not be influenced by suspend.
>>> Suspend should do the work of suspend and reset to do the reset.
>>>
>>> The problem to overcome in [1] is, resume operation needs to be synchronous as it involves large part of context to resume back, and hence just asynchronously setting DRIVER_OK is not enough.
>>> The sw must verify back that device has resumed the operation and ready to answer requests.
>> this is not live migration, all device status and other information still stay in the device, no need to "resume" context, just resume running.
>>
>> Like resume from a failed LM.
>>> This is slightly different flow than setting the DRIVER_OK for the first time device initialization sequence as it does not involve large restoration.
>>>
>>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the driver should clear the SUSPEND bit and verify that it is out of SUSPEND.
>>>
>>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
>> Please read the spec, it says:
>> The driver MUST NOT clear a device status bit
>>
>>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:27               ` Zhu, Lingshan
@ 2023-09-20  7:32                 ` Parav Pandit
  2023-09-20  7:45                   ` Zhu, Lingshan
  0 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  7:32 UTC (permalink / raw)
  To: Zhu, Lingshan, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 20, 2023 12:58 PM
> 
> On 9/20/2023 3:10 PM, Parav Pandit wrote:
> >> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> >> Sent: Wednesday, September 20, 2023 12:37 PM
> >>> The problem to overcome in [1] is, resume operation needs to be
> >>> synchronous
> >> as it involves large part of context to resume back, and hence just
> >> asynchronously setting DRIVER_OK is not enough.
> >>> The sw must verify back that device has resumed the operation and
> >>> ready to
> >> answer requests.
> >> this is not live migration, all device status and other information
> >> still stay in the device, no need to "resume" context, just resume running.
> >>
> > I am aware that it is not live migration. :)
> >
> > "Just resuming" involves lot of device setup task. The device implementation
> does not know for how long a device is suspended.
> > So for example, a VM is suspended for 6 hours, hence the device context
> could be saved in a slow disk.
> > Hence, when the resume is done, it needs to setup things again and driver got
> to verify before accessing more from the device.
> The restore procedures should perform by the hypervisor and done before set
> DRIVER_OK and wake up the guest.
Which is the signal to trigger the restore? Which is the trigger in physical device when there is no hypervisor?

In my view, setting the DRIVER_OK is the signal regardless of hypervisor or physical device.
Hence the re-read is must.

> And the hypervisor/driver needs to check the device status by re-reading.
> >
> >> Like resume from a failed LM.
> >>> This is slightly different flow than setting the DRIVER_OK for the
> >>> first time
> >> device initialization sequence as it does not involve large restoration.
> >>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the
> >>> driver
> >> should clear the SUSPEND bit and verify that it is out of SUSPEND.
> >>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
> >> Please read the spec, it says:
> >> The driver MUST NOT clear a device status bit
> >>
> > Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's
> synchronous new register..
> so re-read
Yes. re-read until set, Thanks.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:30               ` Zhu, Lingshan
@ 2023-09-20  7:35                 ` Parav Pandit
  2023-09-20  7:47                   ` Zhu, Lingshan
  0 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  7:35 UTC (permalink / raw)
  To: Zhu, Lingshan, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 20, 2023 1:00 PM
> 
> On 9/20/2023 3:24 PM, Chen, Jiqian wrote:
> > Hi Lingshan,
> > It seems you reply to the wrong email thread. They are not related to my
> patch.
> These reply to Parva's comments.
> @Parva, if you want to discuss more about live migration, please reply in my
> thread, lets don't flood here.
You made the point that "this is not live migration".
I am not discussing live migration in this thread.

I replied for the point that device restoration from suspend for physical and hypevisor based device is not a 40nsec worth of work to restore by just doing a register write.
This is not live or device migration. This is restoring the device context initiated by the driver owning the device.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-comment] Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:17           ` [virtio-comment] " Chen, Jiqian
@ 2023-09-20  7:42             ` Zhu, Lingshan
  0 siblings, 0 replies; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:42 UTC (permalink / raw)
  To: Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 3:17 PM, Chen, Jiqian wrote:
> Hi Lingshan,
>
> On 2023/9/20 14:58, Zhu, Lingshan wrote:
>>
>> On 9/20/2023 2:33 PM, Chen, Jiqian wrote:
>>> Hi Lingshan,
>>>
>>> On 2023/9/20 13:59, Zhu, Lingshan wrote:
>>>> On 9/19/2023 8:31 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Sep 19, 2023 at 07:42:42PM +0800, Jiqian Chen wrote:
>>>>>> When guest vm does S3, Qemu will reset and clear some things of virtio
>>>>>> devices, but guest can't aware that, so that may cause some problems.
>>>>>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>>>>>> resume, that function will destroy render resources of virtio-gpu. As
>>>>>> a result, after guest resume, the display can't come back and we only
>>>>>> saw a black screen. Due to guest can't re-create all the resources, so
>>>>>> we need to let Qemu not to destroy them when S3.
>>>>>>
>>>>>> For above purpose, we need a mechanism that allows guests and QEMU to
>>>>>> negotiate their reset behavior. So this patch add a new parameter
>>>>>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>>>>>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>>>>>> devices can change their reset behavior on Qemu side according to
>>>>>> freeze_mode. What's more, freeze_mode can be used for all virtio
>>>>>> devices to affect the behavior of Qemu, not just virtio gpu device.
>>>> Hi Jiqian,
>>>>
>>>> Have you seen this series: [PATCH 0/5] virtio: introduce SUSPEND bit and vq state
>>>> https://lore.kernel.org/all/3f4cbf84-010c-cffa-0b70-33c449b5561b@intel.com/T/
>>>>
>>>> We introduced a bit in the device status SUSPEND, when VIRTIO_F_SUSPEND is
>>>> negotiated, the driver can set SUSPEND in the device status to suspend the
>>>> device.
>>>>
>>>> When SUSPEND, the device should pause its operations and preserve its configurations
>>>> in its configuration space.
>>>>
>>>> The driver re-write DRIVER_OK to clear SUSPEND, so the device resumes running.
>>>>
>>>> This is originally to serve live migration, but I think it can also meet your needs.
>>> I noticed your series, but I am not sure they are also meet my needs.
>>> If driver write 0 to reset device, can the SUSPEND bit be cleared? (pci_pm_resume->virtio_pci_restore->virtio_device_restore->virtio_reset_device)
>> if the driver writes 0, it resets all virtio functionalities. So SUSPEND is cleared.
> Then your patches are not meet my needs. In my scene, it needs to keep the SUSPEND bit util the resume process is complete.
> Because in my virtio-gpu scene, when guest resume, it call virtio_reset_device to clear all device status bits, and then reset virtio-gpu in Qemu, and then destroy render resources, I don't want the resources are destroyed during the resume process. So, I add freeze_mode to tell Qemu that guest is doing S3 and resources need to be kept.
When a guest set to S3, the hypervisor suspend the guest to RAM, and the 
passthrough-ed device are in low power state.
I am not sure the device can keep its status/context/data, maybe need to 
recover from RAM anyway.

So I suggest not reset the device, there need some code changes in QEMU
When set to S3, SUSPEND the device, then suspend to RAM
When resume from S3, restore from RAM and set DRIVER_OK to resume running.
>
>> device reset can also be used to recover the device from fatal errors, so it should reset everything in virtio.
>>> If SUSPEND is cleared, then during the reset process in Qemu, I can't judge if the reset request is from guest restore process or not, and then I can't change the reset behavior.
>> I think when enter S3, the hypervisor/driver should set SUSPEND to the device. And when resume from S3, the hypervisor/driver should
>> re-write DRIVER_OK to clear SUSPEND, then the device resume running.
>>> Can you send me your patch link on kernel and qemu side? I will take a deep look.
>> There are no patches for qemu/kernel yet, spec first.
>>>> Thanks,
>>>> Zhu Lingshan
>>>>>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>>>>>> ---
>>>>>>     transport-pci.tex | 7 +++++++
>>>>>>     1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/transport-pci.tex b/transport-pci.tex
>>>>>> index a5c6719..2543536 100644
>>>>>> --- a/transport-pci.tex
>>>>>> +++ b/transport-pci.tex
>>>>>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>>             le64 queue_desc;                /* read-write */
>>>>>>             le64 queue_driver;              /* read-write */
>>>>>>             le64 queue_device;              /* read-write */
>>>>>> +        le16 freeze_mode;               /* read-write */
>>>>>>             le16 queue_notif_config_data;   /* read-only for driver */
>>>>>>             le16 queue_reset;               /* read-write */
>>>>>>
>>>>> we can't add fields in the middle of the structure like this -
>>>>> offset of queue_notif_config_data and queue_reset changes.
>>>>>
>>>>>    
>>>>>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>>>>>     \item[\field{queue_device}]
>>>>>>             The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>>>>>     +\item[\field{freeze_mode}]
>>>>>> +        The driver writes this to set the freeze mode of virtio pci.
>>>>>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>>>>>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>>>>>> +        Other values are reserved for future use, like S4, etc.
>>>>>> +
>>>>> we need to specify these values then.
>>>>>
>>>>> we also need
>>>>> - feature bit to detect support for S3
>>>>> - conformance statements documenting behavious under S3
>>>>>
>>>>>
>>>>>>     \item[\field{queue_notif_config_data}]
>>>>>>             This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>>>>>             The driver will use this value when driver sends available buffer
>>>>>> -- 
>>>>>> 2.34.1
>>>>> This publicly archived list offers a means to provide input to the
>>>>> OASIS Virtual I/O Device (VIRTIO) TC.
>>>>>
>>>>> In order to verify user consent to the Feedback License terms and
>>>>> to minimize spam in the list archive, subscription is required
>>>>> before posting.
>>>>>
>>>>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>>>>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>>>>> List help: virtio-comment-help@lists.oasis-open.org
>>>>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>>>>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>>>>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>>>>> Committee: https://www.oasis-open.org/committees/virtio/
>>>>> Join OASIS: https://www.oasis-open.org/join/
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
>>>> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
>>>>
>>
>> This publicly archived list offers a means to provide input to the
>> OASIS Virtual I/O Device (VIRTIO) TC.
>>
>> In order to verify user consent to the Feedback License terms and
>> to minimize spam in the list archive, subscription is required
>> before posting.
>>
>> Subscribe: virtio-comment-subscribe@lists.oasis-open.org
>> Unsubscribe: virtio-comment-unsubscribe@lists.oasis-open.org
>> List help: virtio-comment-help@lists.oasis-open.org
>> List archive: https://lists.oasis-open.org/archives/virtio-comment/
>> Feedback License: https://www.oasis-open.org/who/ipr/feedback_license.pdf
>> List Guidelines: https://www.oasis-open.org/policies-guidelines/mailing-lists
>> Committee: https://www.oasis-open.org/committees/virtio/
>> Join OASIS: https://www.oasis-open.org/join/
>>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:32                 ` Parav Pandit
@ 2023-09-20  7:45                   ` Zhu, Lingshan
  2023-09-20  7:47                     ` Parav Pandit
  0 siblings, 1 reply; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:45 UTC (permalink / raw)
  To: Parav Pandit, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 3:32 PM, Parav Pandit wrote:
>
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 20, 2023 12:58 PM
>>
>> On 9/20/2023 3:10 PM, Parav Pandit wrote:
>>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>>> Sent: Wednesday, September 20, 2023 12:37 PM
>>>>> The problem to overcome in [1] is, resume operation needs to be
>>>>> synchronous
>>>> as it involves large part of context to resume back, and hence just
>>>> asynchronously setting DRIVER_OK is not enough.
>>>>> The sw must verify back that device has resumed the operation and
>>>>> ready to
>>>> answer requests.
>>>> this is not live migration, all device status and other information
>>>> still stay in the device, no need to "resume" context, just resume running.
>>>>
>>> I am aware that it is not live migration. :)
>>>
>>> "Just resuming" involves lot of device setup task. The device implementation
>> does not know for how long a device is suspended.
>>> So for example, a VM is suspended for 6 hours, hence the device context
>> could be saved in a slow disk.
>>> Hence, when the resume is done, it needs to setup things again and driver got
>> to verify before accessing more from the device.
>> The restore procedures should perform by the hypervisor and done before set
>> DRIVER_OK and wake up the guest.
> Which is the signal to trigger the restore? Which is the trigger in physical device when there is no hypervisor?
>
> In my view, setting the DRIVER_OK is the signal regardless of hypervisor or physical device.
> Hence the re-read is must.
Yes, as I said below, should verify by re-read.
>
>> And the hypervisor/driver needs to check the device status by re-reading.
>>>> Like resume from a failed LM.
>>>>> This is slightly different flow than setting the DRIVER_OK for the
>>>>> first time
>>>> device initialization sequence as it does not involve large restoration.
>>>>> So, to merge two ideas, instead of doing DRIVER_OK to resume, the
>>>>> driver
>>>> should clear the SUSPEND bit and verify that it is out of SUSPEND.
>>>>> Because driver is still in _OK_ driving the device flipping the SUSPEND bit.
>>>> Please read the spec, it says:
>>>> The driver MUST NOT clear a device status bit
>>>>
>>> Yes, this is why either DRIER_OK validation by the driver is needed or Jiqian's
>> synchronous new register..
>> so re-read
> Yes. re-read until set, Thanks.
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:35                 ` Parav Pandit
@ 2023-09-20  7:47                   ` Zhu, Lingshan
  2023-09-20  7:51                     ` Parav Pandit
  2023-09-20  7:53                     ` Chen, Jiqian
  0 siblings, 2 replies; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:47 UTC (permalink / raw)
  To: Parav Pandit, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 3:35 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 20, 2023 1:00 PM
>>
>> On 9/20/2023 3:24 PM, Chen, Jiqian wrote:
>>> Hi Lingshan,
>>> It seems you reply to the wrong email thread. They are not related to my
>> patch.
>> These reply to Parva's comments.
>> @Parva, if you want to discuss more about live migration, please reply in my
>> thread, lets don't flood here.
> You made the point that "this is not live migration".
> I am not discussing live migration in this thread.
>
> I replied for the point that device restoration from suspend for physical and hypevisor based device is not a 40nsec worth of work to restore by just doing a register write.
> This is not live or device migration. This is restoring the device context initiated by the driver owning the device.
restore the device context should be done by the hypervisor before 
setting DRIVER_OK and waking up the guest, not a concern here, out of spec



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:45                   ` Zhu, Lingshan
@ 2023-09-20  7:47                     ` Parav Pandit
  0 siblings, 0 replies; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  7:47 UTC (permalink / raw)
  To: Zhu, Lingshan, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 20, 2023 1:16 PM

[..]
> > In my view, setting the DRIVER_OK is the signal regardless of hypervisor or
> physical device.
> > Hence the re-read is must.
> Yes, as I said below, should verify by re-read.
> >
Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:47                   ` Zhu, Lingshan
@ 2023-09-20  7:51                     ` Parav Pandit
  2023-09-20  7:55                       ` Zhu, Lingshan
  2023-09-20  7:53                     ` Chen, Jiqian
  1 sibling, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  7:51 UTC (permalink / raw)
  To: Zhu, Lingshan, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray


> From: Zhu, Lingshan <lingshan.zhu@intel.com>
> Sent: Wednesday, September 20, 2023 1:17 PM

> > This is not live or device migration. This is restoring the device context
> initiated by the driver owning the device.
> restore the device context should be done by the hypervisor before setting
> DRIVER_OK and waking up the guest, not a concern here, out of spec

The framework is generic for the PCI devices hence, there may not be any hypervisor at all. Hence restore operation to trigger on DRIVER_OK setting, when previously suspended.

Since we already agree in previous email that re-read until device sets the DRIVER_OK, its fine to me. It covers the above restore condition.

Thanks.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:47                   ` Zhu, Lingshan
  2023-09-20  7:51                     ` Parav Pandit
@ 2023-09-20  7:53                     ` Chen, Jiqian
  2023-09-20  7:56                       ` Parav Pandit
  1 sibling, 1 reply; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  7:53 UTC (permalink / raw)
  To: Zhu, Lingshan, Parav Pandit, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray

Hi Lingshan,
Please reply to your own email thread, below are not related to my patches. Thanks a lot.

On 2023/9/20 15:47, Zhu, Lingshan wrote:
> 
> 
> On 9/20/2023 3:35 PM, Parav Pandit wrote:
>>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>>> Sent: Wednesday, September 20, 2023 1:00 PM
>>>
>>> On 9/20/2023 3:24 PM, Chen, Jiqian wrote:
>>>> Hi Lingshan,
>>>> It seems you reply to the wrong email thread. They are not related to my
>>> patch.
>>> These reply to Parva's comments.
>>> @Parva, if you want to discuss more about live migration, please reply in my
>>> thread, lets don't flood here.
>> You made the point that "this is not live migration".
>> I am not discussing live migration in this thread.
>>
>> I replied for the point that device restoration from suspend for physical and hypevisor based device is not a 40nsec worth of work to restore by just doing a register write.
>> This is not live or device migration. This is restoring the device context initiated by the driver owning the device.
> restore the device context should be done by the hypervisor before setting DRIVER_OK and waking up the guest, not a concern here, out of spec
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:51                     ` Parav Pandit
@ 2023-09-20  7:55                       ` Zhu, Lingshan
  0 siblings, 0 replies; 32+ messages in thread
From: Zhu, Lingshan @ 2023-09-20  7:55 UTC (permalink / raw)
  To: Parav Pandit, Chen, Jiqian, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 9/20/2023 3:51 PM, Parav Pandit wrote:
>> From: Zhu, Lingshan <lingshan.zhu@intel.com>
>> Sent: Wednesday, September 20, 2023 1:17 PM
>>> This is not live or device migration. This is restoring the device context
>> initiated by the driver owning the device.
>> restore the device context should be done by the hypervisor before setting
>> DRIVER_OK and waking up the guest, not a concern here, out of spec
> The framework is generic for the PCI devices hence, there may not be any hypervisor at all. Hence restore operation to trigger on DRIVER_OK setting, when previously suspended.
>
> Since we already agree in previous email that re-read until device sets the DRIVER_OK, its fine to me. It covers the above restore condition.
OK
>
> Thanks.
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:53                     ` Chen, Jiqian
@ 2023-09-20  7:56                       ` Parav Pandit
  2023-09-20  8:18                         ` [virtio-comment] " Chen, Jiqian
  0 siblings, 1 reply; 32+ messages in thread
From: Parav Pandit @ 2023-09-20  7:56 UTC (permalink / raw)
  To: Chen, Jiqian, Zhu, Lingshan, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray

Hi Jiquian,

> From: Chen, Jiqian <Jiqian.Chen@amd.com>
> Sent: Wednesday, September 20, 2023 1:24 PM
> 
> Hi Lingshan,
> Please reply to your own email thread, below are not related to my patches.
> Thanks a lot.

They are related to your patch.
 Both the patches have overlapping functionalities.

You probably missed my previous response explaining the same at [1].

[1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00255.html


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-comment] RE: [virtio-dev] Re: [virtio-comment] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-20  7:56                       ` Parav Pandit
@ 2023-09-20  8:18                         ` Chen, Jiqian
  0 siblings, 0 replies; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-20  8:18 UTC (permalink / raw)
  To: Parav Pandit, Chen, Jiqian, Zhu, Lingshan, Michael S. Tsirkin
  Cc: Gerd Hoffmann, Jason Wang, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, virtio-comment,
	virtio-dev, qemu-devel, linux-kernel, Stefano Stabellini,
	Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray



On 2023/9/20 15:56, Parav Pandit wrote:
> Hi Jiquian,
> 
>> From: Chen, Jiqian <Jiqian.Chen@amd.com>
>> Sent: Wednesday, September 20, 2023 1:24 PM
>>
>> Hi Lingshan,
>> Please reply to your own email thread, below are not related to my patches.
>> Thanks a lot.
> 
> They are related to your patch.
>  Both the patches have overlapping functionalities.
But Lingshan's patch is not meet my need. It clears the SUSPEND bit during reset.

> 
> You probably missed my previous response explaining the same at [1].
> 
> [1] https://lists.oasis-open.org/archives/virtio-comment/202309/msg00255.html
Yes, I didn't receive this response. 
After reading your responses, I think your concerns are below:
> The point is when driver tells to freeze, it is freeze command and not reset.
> So resume() should not invoke device_reset() when FREEZE+RESUME supported.
The modifications in my Qemu and kernel patches, I just set freeze_mode to be S3, and then when guest resume I can change the reset behavior according the S3 mode, see below callstack:
pci_pm_resume
	virtio_pci_restore
		virtio_device_restore
			virtio_reset_device(set 0 the device status and then call virtio_reset to reset device)
			drv->restore(virtio_gpu_restore)
So, reset will be done during the restore process(resume invoke the reset), and I want to affect the reset behavior during the restore process.

> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-19 11:42 ` [VIRTIO PCI PATCH v5 1/1] transport-pci: " Jiqian Chen
  2023-09-19 12:10   ` Parav Pandit
  2023-09-19 12:31   ` Michael S. Tsirkin
@ 2023-09-21  4:22   ` Jason Wang
  2023-09-21  6:28     ` [virtio-dev] " Chen, Jiqian
  2 siblings, 1 reply; 32+ messages in thread
From: Jason Wang @ 2023-09-21  4:22 UTC (permalink / raw)
  To: Jiqian Chen
  Cc: Gerd Hoffmann, Michael S . Tsirkin, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Alex Deucher, Christian Koenig, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Huang Rui

On Tue, Sep 19, 2023 at 7:43 PM Jiqian Chen <Jiqian.Chen@amd.com> wrote:
>
> When guest vm does S3, Qemu will reset and clear some things of virtio
> devices, but guest can't aware that, so that may cause some problems.
> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
> resume, that function will destroy render resources of virtio-gpu. As
> a result, after guest resume, the display can't come back and we only
> saw a black screen. Due to guest can't re-create all the resources, so
> we need to let Qemu not to destroy them when S3.
>
> For above purpose, we need a mechanism that allows guests and QEMU to
> negotiate their reset behavior. So this patch add a new parameter
> named freeze_mode to struct virtio_pci_common_cfg. And when guest
> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
> devices can change their reset behavior on Qemu side according to
> freeze_mode. What's more, freeze_mode can be used for all virtio
> devices to affect the behavior of Qemu, not just virtio gpu device.

A simple question, why is this issue specific to pci?

Thanks


>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  transport-pci.tex | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/transport-pci.tex b/transport-pci.tex
> index a5c6719..2543536 100644
> --- a/transport-pci.tex
> +++ b/transport-pci.tex
> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>          le64 queue_desc;                /* read-write */
>          le64 queue_driver;              /* read-write */
>          le64 queue_device;              /* read-write */
> +        le16 freeze_mode;               /* read-write */
>          le16 queue_notif_config_data;   /* read-only for driver */
>          le16 queue_reset;               /* read-write */
>
> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>  \item[\field{queue_device}]
>          The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>
> +\item[\field{freeze_mode}]
> +        The driver writes this to set the freeze mode of virtio pci.
> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
> +        Other values are reserved for future use, like S4, etc.
> +
>  \item[\field{queue_notif_config_data}]
>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>          The driver will use this value when driver sends available buffer
> --
> 2.34.1
>



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-21  4:22   ` Jason Wang
@ 2023-09-21  6:28     ` Chen, Jiqian
  2023-09-22  3:17       ` Jason Wang
  0 siblings, 1 reply; 32+ messages in thread
From: Chen, Jiqian @ 2023-09-21  6:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: Gerd Hoffmann, Michael S . Tsirkin, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray,
	Chen, Jiqian

Hi Jason,

On 2023/9/21 12:22, Jason Wang wrote:
> On Tue, Sep 19, 2023 at 7:43 PM Jiqian Chen <Jiqian.Chen@amd.com> wrote:
>>
>> When guest vm does S3, Qemu will reset and clear some things of virtio
>> devices, but guest can't aware that, so that may cause some problems.
>> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
>> resume, that function will destroy render resources of virtio-gpu. As
>> a result, after guest resume, the display can't come back and we only
>> saw a black screen. Due to guest can't re-create all the resources, so
>> we need to let Qemu not to destroy them when S3.
>>
>> For above purpose, we need a mechanism that allows guests and QEMU to
>> negotiate their reset behavior. So this patch add a new parameter
>> named freeze_mode to struct virtio_pci_common_cfg. And when guest
>> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
>> devices can change their reset behavior on Qemu side according to
>> freeze_mode. What's more, freeze_mode can be used for all virtio
>> devices to affect the behavior of Qemu, not just virtio gpu device.
> 
> A simple question, why is this issue specific to pci?
I thought you possibly missed the previous version patches. At the beginning, I just wanted to add a new feature flag VIRTIO_GPU_F_FREEZE_S3 for virtio-gpu since I encountered virtio-gpu issue during guest S3, so that the guest and qemu can negotiate and change the reset behavior during S3. But Parav and Mikhail hoped me can improve the feature to a pci level, then other virtio devices can also benefit from it. Although I am not sure if expanding its influence is appropriate, I have not received any feedback from others, so I change it to the pci level and made this version.
If you are interested, please see the previous version: https://lists.oasis-open.org/archives/virtio-comment/202307/msg00209.html, thank you.

> 
> Thanks
> 
> 
>>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  transport-pci.tex | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/transport-pci.tex b/transport-pci.tex
>> index a5c6719..2543536 100644
>> --- a/transport-pci.tex
>> +++ b/transport-pci.tex
>> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>          le64 queue_desc;                /* read-write */
>>          le64 queue_driver;              /* read-write */
>>          le64 queue_device;              /* read-write */
>> +        le16 freeze_mode;               /* read-write */
>>          le16 queue_notif_config_data;   /* read-only for driver */
>>          le16 queue_reset;               /* read-write */
>>
>> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
>>  \item[\field{queue_device}]
>>          The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
>>
>> +\item[\field{freeze_mode}]
>> +        The driver writes this to set the freeze mode of virtio pci.
>> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
>> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
>> +        Other values are reserved for future use, like S4, etc.
>> +
>>  \item[\field{queue_notif_config_data}]
>>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
>>          The driver will use this value when driver sends available buffer
>> --
>> 2.34.1
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [virtio-dev] Re: [VIRTIO PCI PATCH v5 1/1] transport-pci: Add freeze_mode to virtio_pci_common_cfg
  2023-09-21  6:28     ` [virtio-dev] " Chen, Jiqian
@ 2023-09-22  3:17       ` Jason Wang
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Wang @ 2023-09-22  3:17 UTC (permalink / raw)
  To: Chen, Jiqian
  Cc: Gerd Hoffmann, Michael S . Tsirkin, Xuan Zhuo, David Airlie,
	Gurchetan Singh, Chia-I Wu, Marc-André Lureau,
	Robert Beckett, Mikhail Golubev-Ciuchea, Parav Pandit,
	virtio-comment, virtio-dev, qemu-devel, linux-kernel,
	Stefano Stabellini, Roger Pau Monné,
	Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart,
	Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Huang, Ray

On Thu, Sep 21, 2023 at 2:28 PM Chen, Jiqian <Jiqian.Chen@amd.com> wrote:
>
> Hi Jason,
>
> On 2023/9/21 12:22, Jason Wang wrote:
> > On Tue, Sep 19, 2023 at 7:43 PM Jiqian Chen <Jiqian.Chen@amd.com> wrote:
> >>
> >> When guest vm does S3, Qemu will reset and clear some things of virtio
> >> devices, but guest can't aware that, so that may cause some problems.
> >> For excample, Qemu calls virtio_reset->virtio_gpu_gl_reset when guest
> >> resume, that function will destroy render resources of virtio-gpu. As
> >> a result, after guest resume, the display can't come back and we only
> >> saw a black screen. Due to guest can't re-create all the resources, so
> >> we need to let Qemu not to destroy them when S3.
> >>
> >> For above purpose, we need a mechanism that allows guests and QEMU to
> >> negotiate their reset behavior. So this patch add a new parameter
> >> named freeze_mode to struct virtio_pci_common_cfg. And when guest
> >> suspends, it can write freeze_mode to be FREEZE_S3, and then virtio
> >> devices can change their reset behavior on Qemu side according to
> >> freeze_mode. What's more, freeze_mode can be used for all virtio
> >> devices to affect the behavior of Qemu, not just virtio gpu device.
> >
> > A simple question, why is this issue specific to pci?
> I thought you possibly missed the previous version patches. At the beginning, I just wanted to add a new feature flag VIRTIO_GPU_F_FREEZE_S3 for virtio-gpu since I encountered virtio-gpu issue during guest S3, so that the guest and qemu can negotiate and change the reset behavior during S3. But Parav and Mikhail hoped me can improve the feature to a pci level, then other virtio devices can also benefit from it. Although I am not sure if expanding its influence is appropriate, I have not received any feedback from others, so I change it to the pci level and made this version.
> If you are interested, please see the previous version: https://lists.oasis-open.org/archives/virtio-comment/202307/msg00209.html, thank you.

This is not a good answer. Let me ask you differently, why don't you
see it in other forms of transport like virtio-gpu-mmio?

Thanks

>
> >
> > Thanks
> >
> >
> >>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >>  transport-pci.tex | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/transport-pci.tex b/transport-pci.tex
> >> index a5c6719..2543536 100644
> >> --- a/transport-pci.tex
> >> +++ b/transport-pci.tex
> >> @@ -319,6 +319,7 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >>          le64 queue_desc;                /* read-write */
> >>          le64 queue_driver;              /* read-write */
> >>          le64 queue_device;              /* read-write */
> >> +        le16 freeze_mode;               /* read-write */
> >>          le16 queue_notif_config_data;   /* read-only for driver */
> >>          le16 queue_reset;               /* read-write */
> >>
> >> @@ -393,6 +394,12 @@ \subsubsection{Common configuration structure layout}\label{sec:Virtio Transport
> >>  \item[\field{queue_device}]
> >>          The driver writes the physical address of Device Area here.  See section \ref{sec:Basic Facilities of a Virtio Device / Virtqueues}.
> >>
> >> +\item[\field{freeze_mode}]
> >> +        The driver writes this to set the freeze mode of virtio pci.
> >> +        VIRTIO_PCI_FREEZE_MODE_UNFREEZE - virtio-pci is running;
> >> +        VIRTIO_PCI_FREEZE_MODE_FREEZE_S3 - guest vm is doing S3, and virtio-pci enters S3 suspension;
> >> +        Other values are reserved for future use, like S4, etc.
> >> +
> >>  \item[\field{queue_notif_config_data}]
> >>          This field exists only if VIRTIO_F_NOTIF_CONFIG_DATA has been negotiated.
> >>          The driver will use this value when driver sends available buffer
> >> --
> >> 2.34.1
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >
>
> --
> Best regards,
> Jiqian Chen.



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-09-22  3:18 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-19 11:42 [VIRTIO PCI PATCH v5 0/1] Add freeze_mode to virtio_pci_common_cfg Jiqian Chen
2023-09-19 11:42 ` [VIRTIO PCI PATCH v5 1/1] transport-pci: " Jiqian Chen
2023-09-19 12:10   ` Parav Pandit
2023-09-19 14:09     ` Michael S. Tsirkin
2023-09-20  3:57     ` [virtio-dev] " Chen, Jiqian
2023-09-20  4:10       ` Parav Pandit
2023-09-19 12:31   ` Michael S. Tsirkin
2023-09-20  4:56     ` Chen, Jiqian
2023-09-20  5:59     ` [virtio-comment] " Zhu, Lingshan
2023-09-20  6:33       ` [virtio-dev] " Chen, Jiqian
2023-09-20  6:58         ` Parav Pandit
2023-09-20  7:06           ` Zhu, Lingshan
2023-09-20  7:10             ` Parav Pandit
2023-09-20  7:27               ` Zhu, Lingshan
2023-09-20  7:32                 ` Parav Pandit
2023-09-20  7:45                   ` Zhu, Lingshan
2023-09-20  7:47                     ` Parav Pandit
2023-09-20  7:24             ` Chen, Jiqian
2023-09-20  7:30               ` Zhu, Lingshan
2023-09-20  7:35                 ` Parav Pandit
2023-09-20  7:47                   ` Zhu, Lingshan
2023-09-20  7:51                     ` Parav Pandit
2023-09-20  7:55                       ` Zhu, Lingshan
2023-09-20  7:53                     ` Chen, Jiqian
2023-09-20  7:56                       ` Parav Pandit
2023-09-20  8:18                         ` [virtio-comment] " Chen, Jiqian
2023-09-20  6:58         ` Zhu, Lingshan
2023-09-20  7:17           ` [virtio-comment] " Chen, Jiqian
2023-09-20  7:42             ` Zhu, Lingshan
2023-09-21  4:22   ` Jason Wang
2023-09-21  6:28     ` [virtio-dev] " Chen, Jiqian
2023-09-22  3:17       ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).