All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
@ 2019-03-14 13:17 Oleksandr Andrushchenko
  2019-03-14 13:50 ` [PATCH] " Oleksandr Andrushchenko
                   ` (3 more replies)
  0 siblings, 4 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 13:17 UTC (permalink / raw)
  To: netdev, xen-devel, linux-kernel, jgross, boris.ostrovsky,
	sstabellini, davem
  Cc: andr2000, Oleksandr Andrushchenko

From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>

Currently on driver resume we remove all the network queues and
destroy shared Tx/Rx rings leaving the driver in its current state
and never signaling the backend of this frontend's state change.
This leads to the number of consequences:
- when frontend withdraws granted references to the rings etc. it cannot
  be cleanly done as the backend still holds those (it was not told to
  free the resources)
- it is not possible to resume driver operation as all the communication
  means with the backned were destroyed by the frontend, thus
  making the frontend appear to the guest OS as functional, but
  not really.

Fix this by not destroying communication channels/rings on driver
resume.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
---
 drivers/net/xen-netfront.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c914c24f880b..2ca162048da4 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 	}
 }
 
-/**
- * We are reconnecting to the backend, due to a suspend/resume, or a backend
- * driver restart.  We tear down our netif structure and recreate it, but
- * leave the device-layer structures intact so that this is transparent to the
- * rest of the kernel.
- */
-static int netfront_resume(struct xenbus_device *dev)
-{
-	struct netfront_info *info = dev_get_drvdata(&dev->dev);
-
-	dev_dbg(&dev->dev, "%s\n", dev->nodename);
-
-	xennet_disconnect_backend(info);
-	return 0;
-}
-
 static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
 {
 	char *s, *e, *macstr;
@@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
 	.ids = netfront_ids,
 	.probe = netfront_probe,
 	.remove = xennet_remove,
-	.resume = netfront_resume,
 	.otherend_changed = netback_changed,
 };
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 13:17 [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback Oleksandr Andrushchenko
  2019-03-14 13:50 ` [PATCH] " Oleksandr Andrushchenko
@ 2019-03-14 13:50 ` Oleksandr Andrushchenko
  2019-03-14 14:47 ` Boris Ostrovsky
  2019-03-14 14:47 ` Boris Ostrovsky
  3 siblings, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 13:50 UTC (permalink / raw)
  To: netdev, xen-devel, linux-kernel, jgross, boris.ostrovsky,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko

I am still confused about suspend/resume for the front drivers,
for instance, block front [1] does implement full/proper reconnect
on .resume, but net front only does this partially.

Could anyone please shed some light on suspend/resume design?

Thank you,
Oleksandr

On 3/14/19 3:17 PM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Currently on driver resume we remove all the network queues and
> destroy shared Tx/Rx rings leaving the driver in its current state
> and never signaling the backend of this frontend's state change.
> This leads to the number of consequences:
> - when frontend withdraws granted references to the rings etc. it cannot
>    be cleanly done as the backend still holds those (it was not told to
>    free the resources)
> - it is not possible to resume driver operation as all the communication
>    means with the backned were destroyed by the frontend, thus
>    making the frontend appear to the guest OS as functional, but
>    not really.
>
> Fix this by not destroying communication channels/rings on driver
> resume.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>   drivers/net/xen-netfront.c | 17 -----------------
>   1 file changed, 17 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c914c24f880b..2ca162048da4 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>   	}
>   }
>   
> -/**
> - * We are reconnecting to the backend, due to a suspend/resume, or a backend
> - * driver restart.  We tear down our netif structure and recreate it, but
> - * leave the device-layer structures intact so that this is transparent to the
> - * rest of the kernel.
> - */
> -static int netfront_resume(struct xenbus_device *dev)
> -{
> -	struct netfront_info *info = dev_get_drvdata(&dev->dev);
> -
> -	dev_dbg(&dev->dev, "%s\n", dev->nodename);
> -
> -	xennet_disconnect_backend(info);
> -	return 0;
> -}
> -
>   static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>   {
>   	char *s, *e, *macstr;
> @@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
>   	.ids = netfront_ids,
>   	.probe = netfront_probe,
>   	.remove = xennet_remove,
> -	.resume = netfront_resume,
>   	.otherend_changed = netback_changed,
>   };
>   

[1] 
https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 13:17 [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback Oleksandr Andrushchenko
@ 2019-03-14 13:50 ` Oleksandr Andrushchenko
  2019-03-14 13:50 ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 13:50 UTC (permalink / raw)
  To: netdev, xen-devel, linux-kernel, jgross, boris.ostrovsky,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko

I am still confused about suspend/resume for the front drivers,
for instance, block front [1] does implement full/proper reconnect
on .resume, but net front only does this partially.

Could anyone please shed some light on suspend/resume design?

Thank you,
Oleksandr

On 3/14/19 3:17 PM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Currently on driver resume we remove all the network queues and
> destroy shared Tx/Rx rings leaving the driver in its current state
> and never signaling the backend of this frontend's state change.
> This leads to the number of consequences:
> - when frontend withdraws granted references to the rings etc. it cannot
>    be cleanly done as the backend still holds those (it was not told to
>    free the resources)
> - it is not possible to resume driver operation as all the communication
>    means with the backned were destroyed by the frontend, thus
>    making the frontend appear to the guest OS as functional, but
>    not really.
>
> Fix this by not destroying communication channels/rings on driver
> resume.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>   drivers/net/xen-netfront.c | 17 -----------------
>   1 file changed, 17 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c914c24f880b..2ca162048da4 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>   	}
>   }
>   
> -/**
> - * We are reconnecting to the backend, due to a suspend/resume, or a backend
> - * driver restart.  We tear down our netif structure and recreate it, but
> - * leave the device-layer structures intact so that this is transparent to the
> - * rest of the kernel.
> - */
> -static int netfront_resume(struct xenbus_device *dev)
> -{
> -	struct netfront_info *info = dev_get_drvdata(&dev->dev);
> -
> -	dev_dbg(&dev->dev, "%s\n", dev->nodename);
> -
> -	xennet_disconnect_backend(info);
> -	return 0;
> -}
> -
>   static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>   {
>   	char *s, *e, *macstr;
> @@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
>   	.ids = netfront_ids,
>   	.probe = netfront_probe,
>   	.remove = xennet_remove,
> -	.resume = netfront_resume,
>   	.otherend_changed = netback_changed,
>   };
>   

[1] 
https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 13:17 [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback Oleksandr Andrushchenko
  2019-03-14 13:50 ` [PATCH] " Oleksandr Andrushchenko
  2019-03-14 13:50 ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
@ 2019-03-14 14:47 ` Boris Ostrovsky
  2019-03-14 14:52   ` [PATCH] " Oleksandr Andrushchenko
  2019-03-14 14:52   ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
  2019-03-14 14:47 ` Boris Ostrovsky
  3 siblings, 2 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 14:47 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko

On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Currently on driver resume we remove all the network queues and
> destroy shared Tx/Rx rings leaving the driver in its current state
> and never signaling the backend of this frontend's state change.
> This leads to the number of consequences:
> - when frontend withdraws granted references to the rings etc. it cannot
>   be cleanly done as the backend still holds those (it was not told to
>   free the resources)
> - it is not possible to resume driver operation as all the communication
>   means with the backned were destroyed by the frontend, thus
>   making the frontend appear to the guest OS as functional, but
>   not really.


What do you mean? Are you saying that after resume you lose connectivity?

-boris


>
> Fix this by not destroying communication channels/rings on driver
> resume.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  drivers/net/xen-netfront.c | 17 -----------------
>  1 file changed, 17 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c914c24f880b..2ca162048da4 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>  	}
>  }
>  
> -/**
> - * We are reconnecting to the backend, due to a suspend/resume, or a backend
> - * driver restart.  We tear down our netif structure and recreate it, but
> - * leave the device-layer structures intact so that this is transparent to the
> - * rest of the kernel.
> - */
> -static int netfront_resume(struct xenbus_device *dev)
> -{
> -	struct netfront_info *info = dev_get_drvdata(&dev->dev);
> -
> -	dev_dbg(&dev->dev, "%s\n", dev->nodename);
> -
> -	xennet_disconnect_backend(info);
> -	return 0;
> -}
> -
>  static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>  {
>  	char *s, *e, *macstr;
> @@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
>  	.ids = netfront_ids,
>  	.probe = netfront_probe,
>  	.remove = xennet_remove,
> -	.resume = netfront_resume,
>  	.otherend_changed = netback_changed,
>  };
>  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 13:17 [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback Oleksandr Andrushchenko
                   ` (2 preceding siblings ...)
  2019-03-14 14:47 ` Boris Ostrovsky
@ 2019-03-14 14:47 ` Boris Ostrovsky
  3 siblings, 0 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 14:47 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko

On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Currently on driver resume we remove all the network queues and
> destroy shared Tx/Rx rings leaving the driver in its current state
> and never signaling the backend of this frontend's state change.
> This leads to the number of consequences:
> - when frontend withdraws granted references to the rings etc. it cannot
>   be cleanly done as the backend still holds those (it was not told to
>   free the resources)
> - it is not possible to resume driver operation as all the communication
>   means with the backned were destroyed by the frontend, thus
>   making the frontend appear to the guest OS as functional, but
>   not really.


What do you mean? Are you saying that after resume you lose connectivity?

-boris


>
> Fix this by not destroying communication channels/rings on driver
> resume.
>
> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> ---
>  drivers/net/xen-netfront.c | 17 -----------------
>  1 file changed, 17 deletions(-)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index c914c24f880b..2ca162048da4 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>  	}
>  }
>  
> -/**
> - * We are reconnecting to the backend, due to a suspend/resume, or a backend
> - * driver restart.  We tear down our netif structure and recreate it, but
> - * leave the device-layer structures intact so that this is transparent to the
> - * rest of the kernel.
> - */
> -static int netfront_resume(struct xenbus_device *dev)
> -{
> -	struct netfront_info *info = dev_get_drvdata(&dev->dev);
> -
> -	dev_dbg(&dev->dev, "%s\n", dev->nodename);
> -
> -	xennet_disconnect_backend(info);
> -	return 0;
> -}
> -
>  static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>  {
>  	char *s, *e, *macstr;
> @@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
>  	.ids = netfront_ids,
>  	.probe = netfront_probe,
>  	.remove = xennet_remove,
> -	.resume = netfront_resume,
>  	.otherend_changed = netback_changed,
>  };
>  


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 14:47 ` Boris Ostrovsky
  2019-03-14 14:52   ` [PATCH] " Oleksandr Andrushchenko
@ 2019-03-14 14:52   ` Oleksandr Andrushchenko
  2019-03-14 15:02     ` [PATCH] " Boris Ostrovsky
  2019-03-14 15:02     ` [Xen-devel][PATCH] " Boris Ostrovsky
  1 sibling, 2 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 14:52 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Currently on driver resume we remove all the network queues and
>> destroy shared Tx/Rx rings leaving the driver in its current state
>> and never signaling the backend of this frontend's state change.
>> This leads to the number of consequences:
>> - when frontend withdraws granted references to the rings etc. it cannot
>>    be cleanly done as the backend still holds those (it was not told to
>>    free the resources)
>> - it is not possible to resume driver operation as all the communication
>>    means with the backned were destroyed by the frontend, thus
>>    making the frontend appear to the guest OS as functional, but
>>    not really.
>
> What do you mean? Are you saying that after resume you lose connectivity?
Exactly, if you take a look at the .resume callback as it is now
what it does it destroys the rings etc. and never notifies the backend
of that, e.g. it stays in, say, connected state with communication
channels destroyed. It never goes into any other Xen bus state, so there is
no way its state machine can help recovering.
>
> -boris
>
>
>> Fix this by not destroying communication channels/rings on driver
>> resume.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   drivers/net/xen-netfront.c | 17 -----------------
>>   1 file changed, 17 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c914c24f880b..2ca162048da4 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>>   	}
>>   }
>>   
>> -/**
>> - * We are reconnecting to the backend, due to a suspend/resume, or a backend
>> - * driver restart.  We tear down our netif structure and recreate it, but
>> - * leave the device-layer structures intact so that this is transparent to the
>> - * rest of the kernel.
>> - */
>> -static int netfront_resume(struct xenbus_device *dev)
>> -{
>> -	struct netfront_info *info = dev_get_drvdata(&dev->dev);
>> -
>> -	dev_dbg(&dev->dev, "%s\n", dev->nodename);
>> -
>> -	xennet_disconnect_backend(info);
>> -	return 0;
>> -}
>> -
>>   static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>>   {
>>   	char *s, *e, *macstr;
>> @@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
>>   	.ids = netfront_ids,
>>   	.probe = netfront_probe,
>>   	.remove = xennet_remove,
>> -	.resume = netfront_resume,
>>   	.otherend_changed = netback_changed,
>>   };
>>   


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 14:47 ` Boris Ostrovsky
@ 2019-03-14 14:52   ` Oleksandr Andrushchenko
  2019-03-14 14:52   ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 14:52 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>
>> Currently on driver resume we remove all the network queues and
>> destroy shared Tx/Rx rings leaving the driver in its current state
>> and never signaling the backend of this frontend's state change.
>> This leads to the number of consequences:
>> - when frontend withdraws granted references to the rings etc. it cannot
>>    be cleanly done as the backend still holds those (it was not told to
>>    free the resources)
>> - it is not possible to resume driver operation as all the communication
>>    means with the backned were destroyed by the frontend, thus
>>    making the frontend appear to the guest OS as functional, but
>>    not really.
>
> What do you mean? Are you saying that after resume you lose connectivity?
Exactly, if you take a look at the .resume callback as it is now
what it does it destroys the rings etc. and never notifies the backend
of that, e.g. it stays in, say, connected state with communication
channels destroyed. It never goes into any other Xen bus state, so there is
no way its state machine can help recovering.
>
> -boris
>
>
>> Fix this by not destroying communication channels/rings on driver
>> resume.
>>
>> Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> ---
>>   drivers/net/xen-netfront.c | 17 -----------------
>>   1 file changed, 17 deletions(-)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index c914c24f880b..2ca162048da4 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -1422,22 +1422,6 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>>   	}
>>   }
>>   
>> -/**
>> - * We are reconnecting to the backend, due to a suspend/resume, or a backend
>> - * driver restart.  We tear down our netif structure and recreate it, but
>> - * leave the device-layer structures intact so that this is transparent to the
>> - * rest of the kernel.
>> - */
>> -static int netfront_resume(struct xenbus_device *dev)
>> -{
>> -	struct netfront_info *info = dev_get_drvdata(&dev->dev);
>> -
>> -	dev_dbg(&dev->dev, "%s\n", dev->nodename);
>> -
>> -	xennet_disconnect_backend(info);
>> -	return 0;
>> -}
>> -
>>   static int xen_net_read_mac(struct xenbus_device *dev, u8 mac[])
>>   {
>>   	char *s, *e, *macstr;
>> @@ -2185,7 +2169,6 @@ static struct xenbus_driver netfront_driver = {
>>   	.ids = netfront_ids,
>>   	.probe = netfront_probe,
>>   	.remove = xennet_remove,
>> -	.resume = netfront_resume,
>>   	.otherend_changed = netback_changed,
>>   };
>>   


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 14:52   ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
  2019-03-14 15:02     ` [PATCH] " Boris Ostrovsky
@ 2019-03-14 15:02     ` Boris Ostrovsky
  2019-03-14 15:10       ` [PATCH] " Oleksandr Andrushchenko
  2019-03-14 15:10       ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
  1 sibling, 2 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 15:02 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Currently on driver resume we remove all the network queues and
>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>> and never signaling the backend of this frontend's state change.
>>> This leads to the number of consequences:
>>> - when frontend withdraws granted references to the rings etc. it
>>> cannot
>>>    be cleanly done as the backend still holds those (it was not told to
>>>    free the resources)
>>> - it is not possible to resume driver operation as all the
>>> communication
>>>    means with the backned were destroyed by the frontend, thus
>>>    making the frontend appear to the guest OS as functional, but
>>>    not really.
>>
>> What do you mean? Are you saying that after resume you lose
>> connectivity?
> Exactly, if you take a look at the .resume callback as it is now
> what it does it destroys the rings etc. and never notifies the backend
> of that, e.g. it stays in, say, connected state with communication
> channels destroyed. It never goes into any other Xen bus state, so
> there is
> no way its state machine can help recovering.


My tree is about a month old so perhaps there is some sort of regression
but this certainly works for me. After resume netfront gets
XenbusStateInitWait from backend which causes xennet_connect().

-boris



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 14:52   ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
@ 2019-03-14 15:02     ` Boris Ostrovsky
  2019-03-14 15:02     ` [Xen-devel][PATCH] " Boris Ostrovsky
  1 sibling, 0 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 15:02 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>
>>> Currently on driver resume we remove all the network queues and
>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>> and never signaling the backend of this frontend's state change.
>>> This leads to the number of consequences:
>>> - when frontend withdraws granted references to the rings etc. it
>>> cannot
>>>    be cleanly done as the backend still holds those (it was not told to
>>>    free the resources)
>>> - it is not possible to resume driver operation as all the
>>> communication
>>>    means with the backned were destroyed by the frontend, thus
>>>    making the frontend appear to the guest OS as functional, but
>>>    not really.
>>
>> What do you mean? Are you saying that after resume you lose
>> connectivity?
> Exactly, if you take a look at the .resume callback as it is now
> what it does it destroys the rings etc. and never notifies the backend
> of that, e.g. it stays in, say, connected state with communication
> channels destroyed. It never goes into any other Xen bus state, so
> there is
> no way its state machine can help recovering.


My tree is about a month old so perhaps there is some sort of regression
but this certainly works for me. After resume netfront gets
XenbusStateInitWait from backend which causes xennet_connect().

-boris



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:02     ` [Xen-devel][PATCH] " Boris Ostrovsky
  2019-03-14 15:10       ` [PATCH] " Oleksandr Andrushchenko
@ 2019-03-14 15:10       ` Oleksandr Andrushchenko
  2019-03-14 15:40         ` Boris Ostrovsky
  2019-03-14 15:40         ` Boris Ostrovsky
  1 sibling, 2 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 15:10 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Currently on driver resume we remove all the network queues and
>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>> and never signaling the backend of this frontend's state change.
>>>> This leads to the number of consequences:
>>>> - when frontend withdraws granted references to the rings etc. it
>>>> cannot
>>>>     be cleanly done as the backend still holds those (it was not told to
>>>>     free the resources)
>>>> - it is not possible to resume driver operation as all the
>>>> communication
>>>>     means with the backned were destroyed by the frontend, thus
>>>>     making the frontend appear to the guest OS as functional, but
>>>>     not really.
>>> What do you mean? Are you saying that after resume you lose
>>> connectivity?
>> Exactly, if you take a look at the .resume callback as it is now
>> what it does it destroys the rings etc. and never notifies the backend
>> of that, e.g. it stays in, say, connected state with communication
>> channels destroyed. It never goes into any other Xen bus state, so
>> there is
>> no way its state machine can help recovering.
>
> My tree is about a month old so perhaps there is some sort of regression
> but this certainly works for me. After resume netfront gets
> XenbusStateInitWait from backend which causes xennet_connect().
Ah, the difference can be of the way we get the guest enter
the suspend state. I am making my guest to suspend with:
echo mem > /sys/power/state
And then I use an interrupt to the guest (this is a test code)
to wake it up.
Could you please share your exact use-case when the guest enters suspend
and what you do to resume it?
I can see no way backend may want enter XenbusStateInitWait in my use-case
as it simply doesn't know we want him to.
>
> -boris
>
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:02     ` [Xen-devel][PATCH] " Boris Ostrovsky
@ 2019-03-14 15:10       ` Oleksandr Andrushchenko
  2019-03-14 15:10       ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 15:10 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>
>>>> Currently on driver resume we remove all the network queues and
>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>> and never signaling the backend of this frontend's state change.
>>>> This leads to the number of consequences:
>>>> - when frontend withdraws granted references to the rings etc. it
>>>> cannot
>>>>     be cleanly done as the backend still holds those (it was not told to
>>>>     free the resources)
>>>> - it is not possible to resume driver operation as all the
>>>> communication
>>>>     means with the backned were destroyed by the frontend, thus
>>>>     making the frontend appear to the guest OS as functional, but
>>>>     not really.
>>> What do you mean? Are you saying that after resume you lose
>>> connectivity?
>> Exactly, if you take a look at the .resume callback as it is now
>> what it does it destroys the rings etc. and never notifies the backend
>> of that, e.g. it stays in, say, connected state with communication
>> channels destroyed. It never goes into any other Xen bus state, so
>> there is
>> no way its state machine can help recovering.
>
> My tree is about a month old so perhaps there is some sort of regression
> but this certainly works for me. After resume netfront gets
> XenbusStateInitWait from backend which causes xennet_connect().
Ah, the difference can be of the way we get the guest enter
the suspend state. I am making my guest to suspend with:
echo mem > /sys/power/state
And then I use an interrupt to the guest (this is a test code)
to wake it up.
Could you please share your exact use-case when the guest enters suspend
and what you do to resume it?
I can see no way backend may want enter XenbusStateInitWait in my use-case
as it simply doesn't know we want him to.
>
> -boris
>
>
Thank you,
Oleksandr

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:10       ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
@ 2019-03-14 15:40         ` Boris Ostrovsky
  2019-03-14 16:33           ` Oleksandr Andrushchenko
                             ` (3 more replies)
  2019-03-14 15:40         ` Boris Ostrovsky
  1 sibling, 4 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 15:40 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> Currently on driver resume we remove all the network queues and
>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>> and never signaling the backend of this frontend's state change.
>>>>> This leads to the number of consequences:
>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>> cannot
>>>>>     be cleanly done as the backend still holds those (it was not
>>>>> told to
>>>>>     free the resources)
>>>>> - it is not possible to resume driver operation as all the
>>>>> communication
>>>>>     means with the backned were destroyed by the frontend, thus
>>>>>     making the frontend appear to the guest OS as functional, but
>>>>>     not really.
>>>> What do you mean? Are you saying that after resume you lose
>>>> connectivity?
>>> Exactly, if you take a look at the .resume callback as it is now
>>> what it does it destroys the rings etc. and never notifies the backend
>>> of that, e.g. it stays in, say, connected state with communication
>>> channels destroyed. It never goes into any other Xen bus state, so
>>> there is
>>> no way its state machine can help recovering.
>>
>> My tree is about a month old so perhaps there is some sort of regression
>> but this certainly works for me. After resume netfront gets
>> XenbusStateInitWait from backend which causes xennet_connect().
> Ah, the difference can be of the way we get the guest enter
> the suspend state. I am making my guest to suspend with:
> echo mem > /sys/power/state
> And then I use an interrupt to the guest (this is a test code)
> to wake it up.
> Could you please share your exact use-case when the guest enters suspend
> and what you do to resume it?


xl save / xl restore

> I can see no way backend may want enter XenbusStateInitWait in my
> use-case
> as it simply doesn't know we want him to.


Yours looks like ACPI path, I don't know how well it was tested TBH.


-boris

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:10       ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
  2019-03-14 15:40         ` Boris Ostrovsky
@ 2019-03-14 15:40         ` Boris Ostrovsky
  1 sibling, 0 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 15:40 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>
>>>>> Currently on driver resume we remove all the network queues and
>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>> and never signaling the backend of this frontend's state change.
>>>>> This leads to the number of consequences:
>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>> cannot
>>>>>     be cleanly done as the backend still holds those (it was not
>>>>> told to
>>>>>     free the resources)
>>>>> - it is not possible to resume driver operation as all the
>>>>> communication
>>>>>     means with the backned were destroyed by the frontend, thus
>>>>>     making the frontend appear to the guest OS as functional, but
>>>>>     not really.
>>>> What do you mean? Are you saying that after resume you lose
>>>> connectivity?
>>> Exactly, if you take a look at the .resume callback as it is now
>>> what it does it destroys the rings etc. and never notifies the backend
>>> of that, e.g. it stays in, say, connected state with communication
>>> channels destroyed. It never goes into any other Xen bus state, so
>>> there is
>>> no way its state machine can help recovering.
>>
>> My tree is about a month old so perhaps there is some sort of regression
>> but this certainly works for me. After resume netfront gets
>> XenbusStateInitWait from backend which causes xennet_connect().
> Ah, the difference can be of the way we get the guest enter
> the suspend state. I am making my guest to suspend with:
> echo mem > /sys/power/state
> And then I use an interrupt to the guest (this is a test code)
> to wake it up.
> Could you please share your exact use-case when the guest enters suspend
> and what you do to resume it?


xl save / xl restore

> I can see no way backend may want enter XenbusStateInitWait in my
> use-case
> as it simply doesn't know we want him to.


Yours looks like ACPI path, I don't know how well it was tested TBH.


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:40         ` Boris Ostrovsky
@ 2019-03-14 16:33           ` Oleksandr Andrushchenko
  2019-03-14 18:16             ` Boris Ostrovsky
  2019-03-14 18:16             ` Boris Ostrovsky
  2019-03-14 16:33           ` Oleksandr Andrushchenko
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 16:33 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 17:40, Boris Ostrovsky wrote:
> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Currently on driver resume we remove all the network queues and
>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>> and never signaling the backend of this frontend's state change.
>>>>>> This leads to the number of consequences:
>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>> cannot
>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>> told to
>>>>>>      free the resources)
>>>>>> - it is not possible to resume driver operation as all the
>>>>>> communication
>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>      not really.
>>>>> What do you mean? Are you saying that after resume you lose
>>>>> connectivity?
>>>> Exactly, if you take a look at the .resume callback as it is now
>>>> what it does it destroys the rings etc. and never notifies the backend
>>>> of that, e.g. it stays in, say, connected state with communication
>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>> there is
>>>> no way its state machine can help recovering.
>>> My tree is about a month old so perhaps there is some sort of regression
>>> but this certainly works for me. After resume netfront gets
>>> XenbusStateInitWait from backend which causes xennet_connect().
>> Ah, the difference can be of the way we get the guest enter
>> the suspend state. I am making my guest to suspend with:
>> echo mem > /sys/power/state
>> And then I use an interrupt to the guest (this is a test code)
>> to wake it up.
>> Could you please share your exact use-case when the guest enters suspend
>> and what you do to resume it?
>
> xl save / xl restore
>
>> I can see no way backend may want enter XenbusStateInitWait in my
>> use-case
>> as it simply doesn't know we want him to.
>
> Yours looks like ACPI path, I don't know how well it was tested TBH.

Hm, so it does work for your use-case, but doesn't for mine.

What would be the best way forward?

1. Implement .resume properly as, for example, block front does [1]

2. Remove .resume completely: this does work as long as backend doesn't 
change anything

I am still a bit unsure if we really need to re-initialize rings, 
re-read front's config from

Xenstore etc - what changes on backend side are expected when we resume 
the front driver?

>
>
> -boris

Thank you,

Oleksandr


[1] 
https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:40         ` Boris Ostrovsky
  2019-03-14 16:33           ` Oleksandr Andrushchenko
@ 2019-03-14 16:33           ` Oleksandr Andrushchenko
  2019-03-14 19:00           ` [Xen-devel] " Julien Grall
  2019-03-14 19:00           ` Julien Grall
  3 siblings, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 16:33 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 17:40, Boris Ostrovsky wrote:
> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Currently on driver resume we remove all the network queues and
>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>> and never signaling the backend of this frontend's state change.
>>>>>> This leads to the number of consequences:
>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>> cannot
>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>> told to
>>>>>>      free the resources)
>>>>>> - it is not possible to resume driver operation as all the
>>>>>> communication
>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>      not really.
>>>>> What do you mean? Are you saying that after resume you lose
>>>>> connectivity?
>>>> Exactly, if you take a look at the .resume callback as it is now
>>>> what it does it destroys the rings etc. and never notifies the backend
>>>> of that, e.g. it stays in, say, connected state with communication
>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>> there is
>>>> no way its state machine can help recovering.
>>> My tree is about a month old so perhaps there is some sort of regression
>>> but this certainly works for me. After resume netfront gets
>>> XenbusStateInitWait from backend which causes xennet_connect().
>> Ah, the difference can be of the way we get the guest enter
>> the suspend state. I am making my guest to suspend with:
>> echo mem > /sys/power/state
>> And then I use an interrupt to the guest (this is a test code)
>> to wake it up.
>> Could you please share your exact use-case when the guest enters suspend
>> and what you do to resume it?
>
> xl save / xl restore
>
>> I can see no way backend may want enter XenbusStateInitWait in my
>> use-case
>> as it simply doesn't know we want him to.
>
> Yours looks like ACPI path, I don't know how well it was tested TBH.

Hm, so it does work for your use-case, but doesn't for mine.

What would be the best way forward?

1. Implement .resume properly as, for example, block front does [1]

2. Remove .resume completely: this does work as long as backend doesn't 
change anything

I am still a bit unsure if we really need to re-initialize rings, 
re-read front's config from

Xenstore etc - what changes on backend side are expected when we resume 
the front driver?

>
>
> -boris

Thank you,

Oleksandr


[1] 
https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 16:33           ` Oleksandr Andrushchenko
@ 2019-03-14 18:16             ` Boris Ostrovsky
  2019-03-14 18:20               ` Oleksandr Andrushchenko
  2019-03-14 18:20               ` [PATCH] " Oleksandr Andrushchenko
  2019-03-14 18:16             ` Boris Ostrovsky
  1 sibling, 2 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 18:16 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 12:33 PM, Oleksandr Andrushchenko wrote:
> On 3/14/19 17:40, Boris Ostrovsky wrote:
>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>> This leads to the number of consequences:
>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>> cannot
>>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>>> told to
>>>>>>>      free the resources)
>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>> communication
>>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>>      not really.
>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>> connectivity?
>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>> what it does it destroys the rings etc. and never notifies the
>>>>> backend
>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>> there is
>>>>> no way its state machine can help recovering.
>>>> My tree is about a month old so perhaps there is some sort of
>>>> regression
>>>> but this certainly works for me. After resume netfront gets
>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>> Ah, the difference can be of the way we get the guest enter
>>> the suspend state. I am making my guest to suspend with:
>>> echo mem > /sys/power/state
>>> And then I use an interrupt to the guest (this is a test code)
>>> to wake it up.
>>> Could you please share your exact use-case when the guest enters
>>> suspend
>>> and what you do to resume it?
>>
>> xl save / xl restore
>>
>>> I can see no way backend may want enter XenbusStateInitWait in my
>>> use-case
>>> as it simply doesn't know we want him to.
>>
>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>
> Hm, so it does work for your use-case, but doesn't for mine.
>
> What would be the best way forward?
>
> 1. Implement .resume properly as, for example, block front does [1]
>
> 2. Remove .resume completely: this does work as long as backend
> doesn't change anything

For save/restore (migration) there is no guarantee that the new backend
has the same set of features.

>
> I am still a bit unsure if we really need to re-initialize rings,
> re-read front's config from
>
> Xenstore etc - what changes on backend side are expected when we
> resume the front driver?


Number of queues, for example. Or things in xennet_fix_features().

-boris

>
>>
>>
>> -boris
>
> Thank you,
>
> Oleksandr
>
>
> [1]
> https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 16:33           ` Oleksandr Andrushchenko
  2019-03-14 18:16             ` Boris Ostrovsky
@ 2019-03-14 18:16             ` Boris Ostrovsky
  1 sibling, 0 replies; 38+ messages in thread
From: Boris Ostrovsky @ 2019-03-14 18:16 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 12:33 PM, Oleksandr Andrushchenko wrote:
> On 3/14/19 17:40, Boris Ostrovsky wrote:
>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>> This leads to the number of consequences:
>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>> cannot
>>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>>> told to
>>>>>>>      free the resources)
>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>> communication
>>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>>      not really.
>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>> connectivity?
>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>> what it does it destroys the rings etc. and never notifies the
>>>>> backend
>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>> there is
>>>>> no way its state machine can help recovering.
>>>> My tree is about a month old so perhaps there is some sort of
>>>> regression
>>>> but this certainly works for me. After resume netfront gets
>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>> Ah, the difference can be of the way we get the guest enter
>>> the suspend state. I am making my guest to suspend with:
>>> echo mem > /sys/power/state
>>> And then I use an interrupt to the guest (this is a test code)
>>> to wake it up.
>>> Could you please share your exact use-case when the guest enters
>>> suspend
>>> and what you do to resume it?
>>
>> xl save / xl restore
>>
>>> I can see no way backend may want enter XenbusStateInitWait in my
>>> use-case
>>> as it simply doesn't know we want him to.
>>
>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>
> Hm, so it does work for your use-case, but doesn't for mine.
>
> What would be the best way forward?
>
> 1. Implement .resume properly as, for example, block front does [1]
>
> 2. Remove .resume completely: this does work as long as backend
> doesn't change anything

For save/restore (migration) there is no guarantee that the new backend
has the same set of features.

>
> I am still a bit unsure if we really need to re-initialize rings,
> re-read front's config from
>
> Xenstore etc - what changes on backend side are expected when we
> resume the front driver?


Number of queues, for example. Or things in xennet_fix_features().

-boris

>
>>
>>
>> -boris
>
> Thank you,
>
> Oleksandr
>
>
> [1]
> https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 18:16             ` Boris Ostrovsky
@ 2019-03-14 18:20               ` Oleksandr Andrushchenko
  2019-03-14 18:20               ` [PATCH] " Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 18:20 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Oleksandr Andrushchenko, Volodymyr Babchuk

On 3/14/19 20:16, Boris Ostrovsky wrote:
> On 3/14/19 12:33 PM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 17:40, Boris Ostrovsky wrote:
>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>
>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>> This leads to the number of consequences:
>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>> cannot
>>>>>>>>       be cleanly done as the backend still holds those (it was not
>>>>>>>> told to
>>>>>>>>       free the resources)
>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>> communication
>>>>>>>>       means with the backned were destroyed by the frontend, thus
>>>>>>>>       making the frontend appear to the guest OS as functional, but
>>>>>>>>       not really.
>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>> connectivity?
>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>> what it does it destroys the rings etc. and never notifies the
>>>>>> backend
>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>> there is
>>>>>> no way its state machine can help recovering.
>>>>> My tree is about a month old so perhaps there is some sort of
>>>>> regression
>>>>> but this certainly works for me. After resume netfront gets
>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>> Ah, the difference can be of the way we get the guest enter
>>>> the suspend state. I am making my guest to suspend with:
>>>> echo mem > /sys/power/state
>>>> And then I use an interrupt to the guest (this is a test code)
>>>> to wake it up.
>>>> Could you please share your exact use-case when the guest enters
>>>> suspend
>>>> and what you do to resume it?
>>> xl save / xl restore
>>>
>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>> use-case
>>>> as it simply doesn't know we want him to.
>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>> Hm, so it does work for your use-case, but doesn't for mine.
>>
>> What would be the best way forward?
>>
>> 1. Implement .resume properly as, for example, block front does [1]
>>
>> 2. Remove .resume completely: this does work as long as backend
>> doesn't change anything
> For save/restore (migration) there is no guarantee that the new backend
> has the same set of features.
>
>> I am still a bit unsure if we really need to re-initialize rings,
>> re-read front's config from
>>
>> Xenstore etc - what changes on backend side are expected when we
>> resume the front driver?
>
> Number of queues, for example. Or things in xennet_fix_features().
Ok, so it seems I have no choice, but implement proper .resume then )
>
> -boris
Thank you!
>>>
>>> -boris
>> Thank you,
>>
>> Oleksandr
>>
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072
>>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 18:16             ` Boris Ostrovsky
  2019-03-14 18:20               ` Oleksandr Andrushchenko
@ 2019-03-14 18:20               ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-14 18:20 UTC (permalink / raw)
  To: Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

On 3/14/19 20:16, Boris Ostrovsky wrote:
> On 3/14/19 12:33 PM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 17:40, Boris Ostrovsky wrote:
>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>
>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>> This leads to the number of consequences:
>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>> cannot
>>>>>>>>       be cleanly done as the backend still holds those (it was not
>>>>>>>> told to
>>>>>>>>       free the resources)
>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>> communication
>>>>>>>>       means with the backned were destroyed by the frontend, thus
>>>>>>>>       making the frontend appear to the guest OS as functional, but
>>>>>>>>       not really.
>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>> connectivity?
>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>> what it does it destroys the rings etc. and never notifies the
>>>>>> backend
>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>> there is
>>>>>> no way its state machine can help recovering.
>>>>> My tree is about a month old so perhaps there is some sort of
>>>>> regression
>>>>> but this certainly works for me. After resume netfront gets
>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>> Ah, the difference can be of the way we get the guest enter
>>>> the suspend state. I am making my guest to suspend with:
>>>> echo mem > /sys/power/state
>>>> And then I use an interrupt to the guest (this is a test code)
>>>> to wake it up.
>>>> Could you please share your exact use-case when the guest enters
>>>> suspend
>>>> and what you do to resume it?
>>> xl save / xl restore
>>>
>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>> use-case
>>>> as it simply doesn't know we want him to.
>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>> Hm, so it does work for your use-case, but doesn't for mine.
>>
>> What would be the best way forward?
>>
>> 1. Implement .resume properly as, for example, block front does [1]
>>
>> 2. Remove .resume completely: this does work as long as backend
>> doesn't change anything
> For save/restore (migration) there is no guarantee that the new backend
> has the same set of features.
>
>> I am still a bit unsure if we really need to re-initialize rings,
>> re-read front's config from
>>
>> Xenstore etc - what changes on backend side are expected when we
>> resume the front driver?
>
> Number of queues, for example. Or things in xennet_fix_features().
Ok, so it seems I have no choice, but implement proper .resume then )
>
> -boris
Thank you!
>>>
>>> -boris
>> Thank you,
>>
>> Oleksandr
>>
>>
>> [1]
>> https://elixir.bootlin.com/linux/v5.0.2/source/drivers/block/xen-blkfront.c#L2072
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:40         ` Boris Ostrovsky
  2019-03-14 16:33           ` Oleksandr Andrushchenko
  2019-03-14 16:33           ` Oleksandr Andrushchenko
@ 2019-03-14 19:00           ` Julien Grall
  2019-03-18 10:02             ` Oleksandr Andrushchenko
  2019-03-18 10:02             ` [Xen-devel] " Oleksandr Andrushchenko
  2019-03-14 19:00           ` Julien Grall
  3 siblings, 2 replies; 38+ messages in thread
From: Julien Grall @ 2019-03-14 19:00 UTC (permalink / raw)
  To: Boris Ostrovsky, Oleksandr Andrushchenko, netdev, xen-devel,
	linux-kernel, jgross, sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

Hi,

On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Currently on driver resume we remove all the network queues and
>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>> and never signaling the backend of this frontend's state change.
>>>>>> This leads to the number of consequences:
>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>> cannot
>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>> told to
>>>>>>      free the resources)
>>>>>> - it is not possible to resume driver operation as all the
>>>>>> communication
>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>      not really.
>>>>> What do you mean? Are you saying that after resume you lose
>>>>> connectivity?
>>>> Exactly, if you take a look at the .resume callback as it is now
>>>> what it does it destroys the rings etc. and never notifies the backend
>>>> of that, e.g. it stays in, say, connected state with communication
>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>> there is
>>>> no way its state machine can help recovering.
>>>
>>> My tree is about a month old so perhaps there is some sort of regression
>>> but this certainly works for me. After resume netfront gets
>>> XenbusStateInitWait from backend which causes xennet_connect().
>> Ah, the difference can be of the way we get the guest enter
>> the suspend state. I am making my guest to suspend with:
>> echo mem > /sys/power/state
>> And then I use an interrupt to the guest (this is a test code)
>> to wake it up.
>> Could you please share your exact use-case when the guest enters suspend
>> and what you do to resume it?
> 
> 
> xl save / xl restore
> 
>> I can see no way backend may want enter XenbusStateInitWait in my
>> use-case
>> as it simply doesn't know we want him to.
> 
> 
> Yours looks like ACPI path, I don't know how well it was tested TBH.

I remember a series from amazon [1] that plays around suspend and 
hibernation. The patch [2] leads me to think that guest triggered 
suspend/resume does not work properly. It looks like the series has 
never been fully reviewed. Not sure why...

Anyway, from my understanding this series may solve Oleksandr issue. 
However, this would only address the common code side. AFAIK Oleksandr 
is targeting Arm platform. If so, I think this would require more work 
than this series. Arm code still miss few bits properly suspend/resume 
arch specific code (see [2]).

I have a branch on my git to track the series. However, they never have 
been resent after Ian Campbell left Citrix. I would be happy to review 
them if someone wants to pick them up and repost them.

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html

[2] 
http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2

> 
> 
> -boris
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
> 

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 15:40         ` Boris Ostrovsky
                             ` (2 preceding siblings ...)
  2019-03-14 19:00           ` [Xen-devel] " Julien Grall
@ 2019-03-14 19:00           ` Julien Grall
  3 siblings, 0 replies; 38+ messages in thread
From: Julien Grall @ 2019-03-14 19:00 UTC (permalink / raw)
  To: Boris Ostrovsky, Oleksandr Andrushchenko, netdev, xen-devel,
	linux-kernel, jgross, sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko

Hi,

On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>
>>>>>> Currently on driver resume we remove all the network queues and
>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>> and never signaling the backend of this frontend's state change.
>>>>>> This leads to the number of consequences:
>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>> cannot
>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>> told to
>>>>>>      free the resources)
>>>>>> - it is not possible to resume driver operation as all the
>>>>>> communication
>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>      not really.
>>>>> What do you mean? Are you saying that after resume you lose
>>>>> connectivity?
>>>> Exactly, if you take a look at the .resume callback as it is now
>>>> what it does it destroys the rings etc. and never notifies the backend
>>>> of that, e.g. it stays in, say, connected state with communication
>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>> there is
>>>> no way its state machine can help recovering.
>>>
>>> My tree is about a month old so perhaps there is some sort of regression
>>> but this certainly works for me. After resume netfront gets
>>> XenbusStateInitWait from backend which causes xennet_connect().
>> Ah, the difference can be of the way we get the guest enter
>> the suspend state. I am making my guest to suspend with:
>> echo mem > /sys/power/state
>> And then I use an interrupt to the guest (this is a test code)
>> to wake it up.
>> Could you please share your exact use-case when the guest enters suspend
>> and what you do to resume it?
> 
> 
> xl save / xl restore
> 
>> I can see no way backend may want enter XenbusStateInitWait in my
>> use-case
>> as it simply doesn't know we want him to.
> 
> 
> Yours looks like ACPI path, I don't know how well it was tested TBH.

I remember a series from amazon [1] that plays around suspend and 
hibernation. The patch [2] leads me to think that guest triggered 
suspend/resume does not work properly. It looks like the series has 
never been fully reviewed. Not sure why...

Anyway, from my understanding this series may solve Oleksandr issue. 
However, this would only address the common code side. AFAIK Oleksandr 
is targeting Arm platform. If so, I think this would require more work 
than this series. Arm code still miss few bits properly suspend/resume 
arch specific code (see [2]).

I have a branch on my git to track the series. However, they never have 
been resent after Ian Campbell left Citrix. I would be happy to review 
them if someone wants to pick them up and repost them.

Cheers,

[1] 
https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html

[2] 
http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2

> 
> 
> -boris
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
> 

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 19:00           ` [Xen-devel] " Julien Grall
  2019-03-18 10:02             ` Oleksandr Andrushchenko
@ 2019-03-18 10:02             ` Oleksandr Andrushchenko
  2019-03-20  3:50               ` Munehisa Kamata
  2019-03-20  3:50               ` [Xen-devel] " Munehisa Kamata
  1 sibling, 2 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-18 10:02 UTC (permalink / raw)
  To: Julien Grall, Boris Ostrovsky, netdev, xen-devel, linux-kernel,
	jgross, sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko, kamatam, anchalag, eduval

+Amazon
pls see inline

On 3/14/19 9:00 PM, Julien Grall wrote:
> Hi,
>
> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>> This leads to the number of consequences:
>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>> cannot
>>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>>> told to
>>>>>>>      free the resources)
>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>> communication
>>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>>      not really.
>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>> connectivity?
>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>> what it does it destroys the rings etc. and never notifies the 
>>>>> backend
>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>> there is
>>>>> no way its state machine can help recovering.
>>>>
>>>> My tree is about a month old so perhaps there is some sort of 
>>>> regression
>>>> but this certainly works for me. After resume netfront gets
>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>> Ah, the difference can be of the way we get the guest enter
>>> the suspend state. I am making my guest to suspend with:
>>> echo mem > /sys/power/state
>>> And then I use an interrupt to the guest (this is a test code)
>>> to wake it up.
>>> Could you please share your exact use-case when the guest enters 
>>> suspend
>>> and what you do to resume it?
>>
>>
>> xl save / xl restore
>>
>>> I can see no way backend may want enter XenbusStateInitWait in my
>>> use-case
>>> as it simply doesn't know we want him to.
>>
>>
>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>
> I remember a series from amazon [1] that plays around suspend and 
> hibernation. The patch [2] leads me to think that guest triggered 
> suspend/resume does not work properly. It looks like the series has 
> never been fully reviewed. Not sure why...
Julien, thanks a lot for bringing these patches to our attention which 
we obviously missed.
>
> Anyway, from my understanding this series may solve Oleksandr issue. 
> However, this would only address the common code side. AFAIK Oleksandr 
> is targeting Arm platform. If so, I think this would require more work 
> than this series. Arm code still miss few bits properly suspend/resume 
> arch specific code (see [2]).
>
> I have a branch on my git to track the series. However, they never 
> have been resent after Ian Campbell left Citrix. I would be happy to 
> review them if someone wants to pick them up and repost them.
>
First of all, let me make it clear that we are interested in hibernation 
long term, so it would be
desirable to re-use as much work form resume/suspend as we can. But, we 
see it as a step by
step work, e.g. first S2RAM and later on hibernation.
Let me clarify the immediate use-case that we have, so it is easier to 
understand what we want
and what we don't at the moment. We are about to continue work started 
by Mirela/Xilinx on
Suspend-to-RAM for ARM [3] and we made number of assumptions:
1. We are talking about *system* suspend, e.g. the goal is to suspend 
all the components
of the system and Xen itself at once. Think about this as fast-boot 
and/or energy saving
feature if you will.
2. With suspend/resume there is no intention to migrate VMs to any other 
host.
3. Most probably configuration of the back/front won't change between 
suspend/resume.
But long term we are also thinking for supporting suspend/resume in its 
broader meaning,
e.g. what is probably what you mean by suspend/resume.
Given that, we think that we don't need Xen support to save grants, page 
tables and other
VM's context on suspend at least at the first stage as we are 
implementing not a fully
blown suspend/resume, but only S2RAM part of it which is much more 
simpler than a generic
suspend implementation. We only need changes to Linux kernel frontend 
drivers from [1] - the
piece that we miss is suspend/resume implementation in the netfront 
driver. What is more, as
we are not changing back/front configuration, we can even live with 
empty .resume/.suspend
frontend's callbacks because event channels, rings etc. are "statically" 
allocated in our
use-case at the first system start (cold boot). And indeed, tests show 
that waking domains
in the right order do allow that.
So, frankly, from [3] we are immediately interested in implementing 
.resume/.suspend, not
even freeze/thaw/restore callbacks: if Amazon has will and capacity to 
continue working on [3]
then once that gets into the upstream it also solves our S2RAM use-case, 
but if not then we
can probably re-work netfront patch and only provide .resume/.suspend 
callbacks which we need
for now (remember our very specific use-case which can survive suspend 
without callbacks
implemented).
IMO, patches at [2] seem to be useful while implementing generic 
suspend/resume and can
be postponed for S2RAM.

Julien/Juergen/Boris/Amazon - could you please express your view on the 
above?
Is it acceptable that for now we only take re-worked netfront patch from 
[3] with full
implementation in mind for later (we reuse code for .resume/.suspend)?

> Cheers,
>
> [1] 
> https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>
> [2] 
> http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>
[3] 
https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
>>
>>
>> -boris
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xenproject.org
>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>
>
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-14 19:00           ` [Xen-devel] " Julien Grall
@ 2019-03-18 10:02             ` Oleksandr Andrushchenko
  2019-03-18 10:02             ` [Xen-devel] " Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-18 10:02 UTC (permalink / raw)
  To: Julien Grall, Boris Ostrovsky, netdev, xen-devel, linux-kernel,
	jgross, sstabellini, davem
  Cc: kamatam, Volodymyr Babchuk, eduval, anchalag, Oleksandr Andrushchenko

+Amazon
pls see inline

On 3/14/19 9:00 PM, Julien Grall wrote:
> Hi,
>
> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>
>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>> This leads to the number of consequences:
>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>> cannot
>>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>>> told to
>>>>>>>      free the resources)
>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>> communication
>>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>>      not really.
>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>> connectivity?
>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>> what it does it destroys the rings etc. and never notifies the 
>>>>> backend
>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>> there is
>>>>> no way its state machine can help recovering.
>>>>
>>>> My tree is about a month old so perhaps there is some sort of 
>>>> regression
>>>> but this certainly works for me. After resume netfront gets
>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>> Ah, the difference can be of the way we get the guest enter
>>> the suspend state. I am making my guest to suspend with:
>>> echo mem > /sys/power/state
>>> And then I use an interrupt to the guest (this is a test code)
>>> to wake it up.
>>> Could you please share your exact use-case when the guest enters 
>>> suspend
>>> and what you do to resume it?
>>
>>
>> xl save / xl restore
>>
>>> I can see no way backend may want enter XenbusStateInitWait in my
>>> use-case
>>> as it simply doesn't know we want him to.
>>
>>
>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>
> I remember a series from amazon [1] that plays around suspend and 
> hibernation. The patch [2] leads me to think that guest triggered 
> suspend/resume does not work properly. It looks like the series has 
> never been fully reviewed. Not sure why...
Julien, thanks a lot for bringing these patches to our attention which 
we obviously missed.
>
> Anyway, from my understanding this series may solve Oleksandr issue. 
> However, this would only address the common code side. AFAIK Oleksandr 
> is targeting Arm platform. If so, I think this would require more work 
> than this series. Arm code still miss few bits properly suspend/resume 
> arch specific code (see [2]).
>
> I have a branch on my git to track the series. However, they never 
> have been resent after Ian Campbell left Citrix. I would be happy to 
> review them if someone wants to pick them up and repost them.
>
First of all, let me make it clear that we are interested in hibernation 
long term, so it would be
desirable to re-use as much work form resume/suspend as we can. But, we 
see it as a step by
step work, e.g. first S2RAM and later on hibernation.
Let me clarify the immediate use-case that we have, so it is easier to 
understand what we want
and what we don't at the moment. We are about to continue work started 
by Mirela/Xilinx on
Suspend-to-RAM for ARM [3] and we made number of assumptions:
1. We are talking about *system* suspend, e.g. the goal is to suspend 
all the components
of the system and Xen itself at once. Think about this as fast-boot 
and/or energy saving
feature if you will.
2. With suspend/resume there is no intention to migrate VMs to any other 
host.
3. Most probably configuration of the back/front won't change between 
suspend/resume.
But long term we are also thinking for supporting suspend/resume in its 
broader meaning,
e.g. what is probably what you mean by suspend/resume.
Given that, we think that we don't need Xen support to save grants, page 
tables and other
VM's context on suspend at least at the first stage as we are 
implementing not a fully
blown suspend/resume, but only S2RAM part of it which is much more 
simpler than a generic
suspend implementation. We only need changes to Linux kernel frontend 
drivers from [1] - the
piece that we miss is suspend/resume implementation in the netfront 
driver. What is more, as
we are not changing back/front configuration, we can even live with 
empty .resume/.suspend
frontend's callbacks because event channels, rings etc. are "statically" 
allocated in our
use-case at the first system start (cold boot). And indeed, tests show 
that waking domains
in the right order do allow that.
So, frankly, from [3] we are immediately interested in implementing 
.resume/.suspend, not
even freeze/thaw/restore callbacks: if Amazon has will and capacity to 
continue working on [3]
then once that gets into the upstream it also solves our S2RAM use-case, 
but if not then we
can probably re-work netfront patch and only provide .resume/.suspend 
callbacks which we need
for now (remember our very specific use-case which can survive suspend 
without callbacks
implemented).
IMO, patches at [2] seem to be useful while implementing generic 
suspend/resume and can
be postponed for S2RAM.

Julien/Juergen/Boris/Amazon - could you please express your view on the 
above?
Is it acceptable that for now we only take re-worked netfront patch from 
[3] with full
implementation in mind for later (we reuse code for .resume/.suspend)?

> Cheers,
>
> [1] 
> https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>
> [2] 
> http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>
[3] 
https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
>>
>>
>> -boris
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xenproject.org
>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>
>
Thank you,
Oleksandr

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-18 10:02             ` [Xen-devel] " Oleksandr Andrushchenko
  2019-03-20  3:50               ` Munehisa Kamata
@ 2019-03-20  3:50               ` Munehisa Kamata
  2019-03-21 19:25                 ` Anchal Agarwal
                                   ` (2 more replies)
  1 sibling, 3 replies; 38+ messages in thread
From: Munehisa Kamata @ 2019-03-20  3:50 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Julien Grall, Boris Ostrovsky, netdev,
	xen-devel, linux-kernel, jgross, sstabellini, davem
  Cc: Volodymyr Babchuk, Oleksandr Andrushchenko, anchalag, eduval

On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> +Amazon
> pls see inline
Hi Oleksandr,

Let me add some comments as the original author of the series.

> 
> On 3/14/19 9:00 PM, Julien Grall wrote:
>> Hi,
>>
>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>
>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>> This leads to the number of consequences:
>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>> cannot
>>>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>>>> told to
>>>>>>>>      free the resources)
>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>> communication
>>>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>>>      not really.
>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>> connectivity?
>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>> there is
>>>>>> no way its state machine can help recovering.
>>>>>
>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>> but this certainly works for me. After resume netfront gets
>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>> Ah, the difference can be of the way we get the guest enter
>>>> the suspend state. I am making my guest to suspend with:
>>>> echo mem > /sys/power/state
>>>> And then I use an interrupt to the guest (this is a test code)
>>>> to wake it up.
>>>> Could you please share your exact use-case when the guest enters suspend
>>>> and what you do to resume it?
>>>
>>>
>>> xl save / xl restore
>>>
>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>> use-case
>>>> as it simply doesn't know we want him to.
>>>
>>>
>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>
>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>
>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>
>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>
> First of all, let me make it clear that we are interested in hibernation long term, so it would be
> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
> step work, e.g. first S2RAM and later on hibernation.
> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
> Suspend-to-RAM for ARM [3] and we made number of assumptions:
> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
> feature if you will.
> 2. With suspend/resume there is no intention to migrate VMs to any other host.
> 3. Most probably configuration of the back/front won't change between suspend/resume.
> But long term we are also thinking for supporting suspend/resume in its broader meaning,
> e.g. what is probably what you mean by suspend/resume.
AFAIK .suspend and .resume callbacks in frontend drivers are
specifically for xl save/restore case rather than the normal "system"
suspend. i.e. The former is Boris' case and something I called "Xen
suspend" in the patch series, the latter should be your interest and
called "ACPI path" here, and I referred to as "PM suspend". They are
very different code paths, see drivers/xen/manage.c for details of
Xen suspend.

> Given that, we think that we don't need Xen support to save grants, page tables and other
> VM's context on suspend at least at the first stage as we are implementing not a fully
> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
> we are not changing back/front configuration, we can even live with empty .resume/.suspend
> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
> use-case at the first system start (cold boot). And indeed, tests show that waking domains
> in the right order do allow that.
> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
If you just (re)implement .suspend and .resume so without taking care
of Xen suspend, you can easily break the existing functionality. The
patch series introduced .freeze and .restore callbacks for both PM
suspend and hibernation, and kept .suspend (not implemented in most
frontend though) and .resume with no changes for Xen suspend.

Note that xenbus has mapped freeze/thaw/restore events to suspend,
resume and cancel callbacks to handle "checkpoint" case[4]. This was a
bit tricky and led me to the design to have the separate set of
callbacks at each frontend driver level[5]. You might need to consider
a similar approach even if your immediate interest at the moment is PM
suspend.

> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
> for now (remember our very specific use-case which can survive suspend without callbacks
> implemented).
> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
> be postponed for S2RAM.
> 
> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
> implementation in mind for later (we reuse code for .resume/.suspend)?
In fact, Anchal has taken over my initial work and she may want to chime
in here.

That said, I'd be very happy to review patches if you come up with your
own ones, so feel free to add me in that case.

>> Cheers,
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>
>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>
> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html

[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
[5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html

>>>
>>>
>>> -boris
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xenproject.org
>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>
>>
> Thank you,
> Oleksandr

Thanks,
Munehisa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-18 10:02             ` [Xen-devel] " Oleksandr Andrushchenko
@ 2019-03-20  3:50               ` Munehisa Kamata
  2019-03-20  3:50               ` [Xen-devel] " Munehisa Kamata
  1 sibling, 0 replies; 38+ messages in thread
From: Munehisa Kamata @ 2019-03-20  3:50 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Julien Grall, Boris Ostrovsky, netdev,
	xen-devel, linux-kernel, jgross, sstabellini, davem
  Cc: anchalag, Volodymyr Babchuk, eduval, Oleksandr Andrushchenko

On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> +Amazon
> pls see inline
Hi Oleksandr,

Let me add some comments as the original author of the series.

> 
> On 3/14/19 9:00 PM, Julien Grall wrote:
>> Hi,
>>
>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>
>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>> This leads to the number of consequences:
>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>> cannot
>>>>>>>>      be cleanly done as the backend still holds those (it was not
>>>>>>>> told to
>>>>>>>>      free the resources)
>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>> communication
>>>>>>>>      means with the backned were destroyed by the frontend, thus
>>>>>>>>      making the frontend appear to the guest OS as functional, but
>>>>>>>>      not really.
>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>> connectivity?
>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>> there is
>>>>>> no way its state machine can help recovering.
>>>>>
>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>> but this certainly works for me. After resume netfront gets
>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>> Ah, the difference can be of the way we get the guest enter
>>>> the suspend state. I am making my guest to suspend with:
>>>> echo mem > /sys/power/state
>>>> And then I use an interrupt to the guest (this is a test code)
>>>> to wake it up.
>>>> Could you please share your exact use-case when the guest enters suspend
>>>> and what you do to resume it?
>>>
>>>
>>> xl save / xl restore
>>>
>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>> use-case
>>>> as it simply doesn't know we want him to.
>>>
>>>
>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>
>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>
>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>
>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>
> First of all, let me make it clear that we are interested in hibernation long term, so it would be
> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
> step work, e.g. first S2RAM and later on hibernation.
> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
> Suspend-to-RAM for ARM [3] and we made number of assumptions:
> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
> feature if you will.
> 2. With suspend/resume there is no intention to migrate VMs to any other host.
> 3. Most probably configuration of the back/front won't change between suspend/resume.
> But long term we are also thinking for supporting suspend/resume in its broader meaning,
> e.g. what is probably what you mean by suspend/resume.
AFAIK .suspend and .resume callbacks in frontend drivers are
specifically for xl save/restore case rather than the normal "system"
suspend. i.e. The former is Boris' case and something I called "Xen
suspend" in the patch series, the latter should be your interest and
called "ACPI path" here, and I referred to as "PM suspend". They are
very different code paths, see drivers/xen/manage.c for details of
Xen suspend.

> Given that, we think that we don't need Xen support to save grants, page tables and other
> VM's context on suspend at least at the first stage as we are implementing not a fully
> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
> we are not changing back/front configuration, we can even live with empty .resume/.suspend
> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
> use-case at the first system start (cold boot). And indeed, tests show that waking domains
> in the right order do allow that.
> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
If you just (re)implement .suspend and .resume so without taking care
of Xen suspend, you can easily break the existing functionality. The
patch series introduced .freeze and .restore callbacks for both PM
suspend and hibernation, and kept .suspend (not implemented in most
frontend though) and .resume with no changes for Xen suspend.

Note that xenbus has mapped freeze/thaw/restore events to suspend,
resume and cancel callbacks to handle "checkpoint" case[4]. This was a
bit tricky and led me to the design to have the separate set of
callbacks at each frontend driver level[5]. You might need to consider
a similar approach even if your immediate interest at the moment is PM
suspend.

> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
> for now (remember our very specific use-case which can survive suspend without callbacks
> implemented).
> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
> be postponed for S2RAM.
> 
> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
> implementation in mind for later (we reuse code for .resume/.suspend)?
In fact, Anchal has taken over my initial work and she may want to chime
in here.

That said, I'd be very happy to review patches if you come up with your
own ones, so feel free to add me in that case.

>> Cheers,
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>
>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>
> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html

[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
[5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html

>>>
>>>
>>> -boris
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xenproject.org
>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>
>>
> Thank you,
> Oleksandr

Thanks,
Munehisa

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-20  3:50               ` [Xen-devel] " Munehisa Kamata
@ 2019-03-21 19:25                 ` Anchal Agarwal
  2019-03-22 10:44                 ` Oleksandr Andrushchenko
  2019-03-22 10:44                 ` [Xen-devel] " Oleksandr Andrushchenko
  2 siblings, 0 replies; 38+ messages in thread
From: Anchal Agarwal @ 2019-03-21 19:25 UTC (permalink / raw)
  To: Munehisa Kamata, Oleksandr Andrushchenko, Julien Grall,
	Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem
  Cc: eduval, Volodymyr Babchuk, David Woodhouse, anchalag

On Tue, Mar 19, 2019 at 08:50:05PM -0700, Munehisa Kamata wrote:
> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> > +Amazon
> > pls see inline
> Hi Oleksandr,
> 
> Let me add some comments as the original author of the series.
> 
> > 
> > On 3/14/19 9:00 PM, Julien Grall wrote:
> >> Hi,
> >>
> >> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> >>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> >>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> >>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> >>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> >>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>>>>>>>
> >>>>>>>> Currently on driver resume we remove all the network queues and
> >>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
> >>>>>>>> and never signaling the backend of this frontend's state change.
> >>>>>>>> This leads to the number of consequences:
> >>>>>>>> - when frontend withdraws granted references to the rings etc. it
> >>>>>>>> cannot
> >>>>>>>> ???????? be cleanly done as the backend still holds those (it was not
> >>>>>>>> told to
> >>>>>>>> ???????? free the resources)
> >>>>>>>> - it is not possible to resume driver operation as all the
> >>>>>>>> communication
> >>>>>>>> ???????? means with the backned were destroyed by the frontend, thus
> >>>>>>>> ???????? making the frontend appear to the guest OS as functional, but
> >>>>>>>> ???????? not really.
> >>>>>>> What do you mean? Are you saying that after resume you lose
> >>>>>>> connectivity?
> >>>>>> Exactly, if you take a look at the .resume callback as it is now
> >>>>>> what it does it destroys the rings etc. and never notifies the backend
> >>>>>> of that, e.g. it stays in, say, connected state with communication
> >>>>>> channels destroyed. It never goes into any other Xen bus state, so
> >>>>>> there is
> >>>>>> no way its state machine can help recovering.
> >>>>>
> >>>>> My tree is about a month old so perhaps there is some sort of regression
> >>>>> but this certainly works for me. After resume netfront gets
> >>>>> XenbusStateInitWait from backend which causes xennet_connect().
> >>>> Ah, the difference can be of the way we get the guest enter
> >>>> the suspend state. I am making my guest to suspend with:
> >>>> echo mem > /sys/power/state
> >>>> And then I use an interrupt to the guest (this is a test code)
> >>>> to wake it up.
> >>>> Could you please share your exact use-case when the guest enters suspend
> >>>> and what you do to resume it?
> >>>
> >>>
> >>> xl save / xl restore
> >>>
> >>>> I can see no way backend may want enter XenbusStateInitWait in my
> >>>> use-case
> >>>> as it simply doesn't know we want him to.
> >>>
> >>>
> >>> Yours looks like ACPI path, I don't know how well it was tested TBH.
> >>
> >> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
> > Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
> >>
> >> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
> >>
> >> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
> >>
> > First of all, let me make it clear that we are interested in hibernation long term, so it would be
> > desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
> > step work, e.g. first S2RAM and later on hibernation.
> > Let me clarify the immediate use-case that we have, so it is easier to understand what we want
> > and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
> > Suspend-to-RAM for ARM [3] and we made number of assumptions:
> > 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
> > of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
> > feature if you will.
> > 2. With suspend/resume there is no intention to migrate VMs to any other host.
> > 3. Most probably configuration of the back/front won't change between suspend/resume.
> > But long term we are also thinking for supporting suspend/resume in its broader meaning,
> > e.g. what is probably what you mean by suspend/resume.
> AFAIK .suspend and .resume callbacks in frontend drivers are
> specifically for xl save/restore case rather than the normal "system"
> suspend. i.e. The former is Boris' case and something I called "Xen
> suspend" in the patch series, the latter should be your interest and
> called "ACPI path" here, and I referred to as "PM suspend". They are
> very different code paths, see drivers/xen/manage.c for details of
> Xen suspend.
> 
> > Given that, we think that we don't need Xen support to save grants, page tables and other
> > VM's context on suspend at least at the first stage as we are implementing not a fully
> > blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
> > suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
> > piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
> > we are not changing back/front configuration, we can even live with empty .resume/.suspend
> > frontend's callbacks because event channels, rings etc. are "statically" allocated in our
> > use-case at the first system start (cold boot). And indeed, tests show that waking domains
> > in the right order do allow that.
> > So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
> If you just (re)implement .suspend and .resume so without taking care
> of Xen suspend, you can easily break the existing functionality. The
> patch series introduced .freeze and .restore callbacks for both PM
> suspend and hibernation, and kept .suspend (not implemented in most
> frontend though) and .resume with no changes for Xen suspend.
> 
> Note that xenbus has mapped freeze/thaw/restore events to suspend,
> resume and cancel callbacks to handle "checkpoint" case[4]. This was a
> bit tricky and led me to the design to have the separate set of
> callbacks at each frontend driver level[5]. You might need to consider
> a similar approach even if your immediate interest at the moment is PM
> suspend.
> 
> > even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
> > then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
> > can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
> > for now (remember our very specific use-case which can survive suspend without callbacks
> > implemented).
> > IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
> > be postponed for S2RAM.
> > 
> > Julien/Juergen/Boris/Amazon - could you please express your view on the above?
> > Is it acceptable that for now we only take re-worked netfront patch from [3] with full
> > implementation in mind for later (we reuse code for .resume/.suspend)?
> In fact, Anchal has taken over my initial work and she may want to chime
> in here.
> 
> That said, I'd be very happy to review patches if you come up with your
> own ones, so feel free to add me in that case.
Yes I am working on those patches and plan to re-post them in an effort to upstream the
the patches. I agree with Munehisa here on considering the patches that are already out
there as I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION 
separate from xen suspend and resume. 
> 
> >> Cheers,
> >>
> >> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
> >>
> >> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
> >>
> > [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> 
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
> 
> >>>
> >>>
> >>> -boris
> >>>
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@lists.xenproject.org
> >>> https://lists.xenproject.org/mailman/listinfo/xen-devel
> >>>
> >>
> > Thank you,
> > Oleksandr
> 
> Thanks,
> Munehisa

Thanks,
Anchal

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-20  3:50               ` [Xen-devel] " Munehisa Kamata
  2019-03-21 19:25                 ` Anchal Agarwal
  2019-03-22 10:44                 ` Oleksandr Andrushchenko
@ 2019-03-22 10:44                 ` Oleksandr Andrushchenko
  2019-03-25 17:30                   ` Anchal Agarwal
  2 siblings, 1 reply; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-22 10:44 UTC (permalink / raw)
  To: Munehisa Kamata, Oleksandr Andrushchenko, Julien Grall,
	Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem, anchalag
  Cc: Volodymyr Babchuk, eduval


On 3/20/19 5:50 AM, Munehisa Kamata wrote:
> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
>> +Amazon
>> pls see inline
> Hi Oleksandr,
>
> Let me add some comments as the original author of the series.
Thank you for your work!
>
>> On 3/14/19 9:00 PM, Julien Grall wrote:
>>> Hi,
>>>
>>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>>
>>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>>> This leads to the number of consequences:
>>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>>> cannot
>>>>>>>>>       be cleanly done as the backend still holds those (it was not
>>>>>>>>> told to
>>>>>>>>>       free the resources)
>>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>>> communication
>>>>>>>>>       means with the backned were destroyed by the frontend, thus
>>>>>>>>>       making the frontend appear to the guest OS as functional, but
>>>>>>>>>       not really.
>>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>>> connectivity?
>>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>>> there is
>>>>>>> no way its state machine can help recovering.
>>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>>> but this certainly works for me. After resume netfront gets
>>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>>> Ah, the difference can be of the way we get the guest enter
>>>>> the suspend state. I am making my guest to suspend with:
>>>>> echo mem > /sys/power/state
>>>>> And then I use an interrupt to the guest (this is a test code)
>>>>> to wake it up.
>>>>> Could you please share your exact use-case when the guest enters suspend
>>>>> and what you do to resume it?
>>>>
>>>> xl save / xl restore
>>>>
>>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>>> use-case
>>>>> as it simply doesn't know we want him to.
>>>>
>>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
>> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>>
>>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>>
>> First of all, let me make it clear that we are interested in hibernation long term, so it would be
>> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
>> step work, e.g. first S2RAM and later on hibernation.
>> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
>> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
>> Suspend-to-RAM for ARM [3] and we made number of assumptions:
>> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
>> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
>> feature if you will.
>> 2. With suspend/resume there is no intention to migrate VMs to any other host.
>> 3. Most probably configuration of the back/front won't change between suspend/resume.
>> But long term we are also thinking for supporting suspend/resume in its broader meaning,
>> e.g. what is probably what you mean by suspend/resume.
> AFAIK .suspend and .resume callbacks in frontend drivers are
> specifically for xl save/restore case rather than the normal "system"
> suspend. i.e. The former is Boris' case and something I called "Xen
> suspend" in the patch series, the latter should be your interest and
> called "ACPI path" here, and I referred to as "PM suspend". They are
> very different code paths, see drivers/xen/manage.c for details of
> Xen suspend.
Yes, I saw that code, thank you
>
>> Given that, we think that we don't need Xen support to save grants, page tables and other
>> VM's context on suspend at least at the first stage as we are implementing not a fully
>> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
>> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
>> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
>> we are not changing back/front configuration, we can even live with empty .resume/.suspend
>> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
>> use-case at the first system start (cold boot). And indeed, tests show that waking domains
>> in the right order do allow that.
>> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
> If you just (re)implement .suspend and .resume so without taking care
> of Xen suspend, you can easily break the existing functionality. The
> patch series introduced .freeze and .restore callbacks for both PM
> suspend and hibernation, and kept .suspend (not implemented in most
> frontend though) and .resume with no changes for Xen suspend.
>
> Note that xenbus has mapped freeze/thaw/restore events to suspend,
> resume and cancel callbacks to handle "checkpoint" case[4]. This was a
> bit tricky and led me to the design to have the separate set of
> callbacks at each frontend driver level[5]. You might need to consider
> a similar approach even if your immediate interest at the moment is PM
> suspend.
For the immediate task we have at the moment we think we can re-use
your work and implement .suspend/.resume based on it (we are targeting
S2RAM as the first stage).
But long term - we do support the idea of fully implemented
suspend and *hibernate* functionality as you describe it.
So, yes, we are also thinking about that.
>
>> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
>> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
>> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
>> for now (remember our very specific use-case which can survive suspend without callbacks
>> implemented).
>> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
>> be postponed for S2RAM.
>>
>> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
>> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
>> implementation in mind for later (we reuse code for .resume/.suspend)?
> In fact, Anchal has taken over my initial work and she may want to chime
> in here.
Great, could you please let us know what is the progress and further plans
on that, so we do not work on the same code and can coordinate our
efforts somehow? Anchal, could you please shed some light on this?
>
> That said, I'd be very happy to review patches if you come up with your
> own ones, so feel free to add me in that case.
Sure, thank you!
>
>>> Cheers,
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>>
>>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>>
>> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
>
>>>>
>>>> -boris
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xenproject.org
>>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>>
>> Thank you,
>> Oleksandr
> Thanks,
> Munehisa

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-20  3:50               ` [Xen-devel] " Munehisa Kamata
  2019-03-21 19:25                 ` Anchal Agarwal
@ 2019-03-22 10:44                 ` Oleksandr Andrushchenko
  2019-03-22 10:44                 ` [Xen-devel] " Oleksandr Andrushchenko
  2 siblings, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-22 10:44 UTC (permalink / raw)
  To: Munehisa Kamata, Oleksandr Andrushchenko, Julien Grall,
	Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem, anchalag
  Cc: eduval, Volodymyr Babchuk


On 3/20/19 5:50 AM, Munehisa Kamata wrote:
> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
>> +Amazon
>> pls see inline
> Hi Oleksandr,
>
> Let me add some comments as the original author of the series.
Thank you for your work!
>
>> On 3/14/19 9:00 PM, Julien Grall wrote:
>>> Hi,
>>>
>>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>>
>>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>>> This leads to the number of consequences:
>>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>>> cannot
>>>>>>>>>       be cleanly done as the backend still holds those (it was not
>>>>>>>>> told to
>>>>>>>>>       free the resources)
>>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>>> communication
>>>>>>>>>       means with the backned were destroyed by the frontend, thus
>>>>>>>>>       making the frontend appear to the guest OS as functional, but
>>>>>>>>>       not really.
>>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>>> connectivity?
>>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>>> there is
>>>>>>> no way its state machine can help recovering.
>>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>>> but this certainly works for me. After resume netfront gets
>>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>>> Ah, the difference can be of the way we get the guest enter
>>>>> the suspend state. I am making my guest to suspend with:
>>>>> echo mem > /sys/power/state
>>>>> And then I use an interrupt to the guest (this is a test code)
>>>>> to wake it up.
>>>>> Could you please share your exact use-case when the guest enters suspend
>>>>> and what you do to resume it?
>>>>
>>>> xl save / xl restore
>>>>
>>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>>> use-case
>>>>> as it simply doesn't know we want him to.
>>>>
>>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
>> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>>
>>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>>
>> First of all, let me make it clear that we are interested in hibernation long term, so it would be
>> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
>> step work, e.g. first S2RAM and later on hibernation.
>> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
>> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
>> Suspend-to-RAM for ARM [3] and we made number of assumptions:
>> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
>> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
>> feature if you will.
>> 2. With suspend/resume there is no intention to migrate VMs to any other host.
>> 3. Most probably configuration of the back/front won't change between suspend/resume.
>> But long term we are also thinking for supporting suspend/resume in its broader meaning,
>> e.g. what is probably what you mean by suspend/resume.
> AFAIK .suspend and .resume callbacks in frontend drivers are
> specifically for xl save/restore case rather than the normal "system"
> suspend. i.e. The former is Boris' case and something I called "Xen
> suspend" in the patch series, the latter should be your interest and
> called "ACPI path" here, and I referred to as "PM suspend". They are
> very different code paths, see drivers/xen/manage.c for details of
> Xen suspend.
Yes, I saw that code, thank you
>
>> Given that, we think that we don't need Xen support to save grants, page tables and other
>> VM's context on suspend at least at the first stage as we are implementing not a fully
>> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
>> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
>> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
>> we are not changing back/front configuration, we can even live with empty .resume/.suspend
>> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
>> use-case at the first system start (cold boot). And indeed, tests show that waking domains
>> in the right order do allow that.
>> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
> If you just (re)implement .suspend and .resume so without taking care
> of Xen suspend, you can easily break the existing functionality. The
> patch series introduced .freeze and .restore callbacks for both PM
> suspend and hibernation, and kept .suspend (not implemented in most
> frontend though) and .resume with no changes for Xen suspend.
>
> Note that xenbus has mapped freeze/thaw/restore events to suspend,
> resume and cancel callbacks to handle "checkpoint" case[4]. This was a
> bit tricky and led me to the design to have the separate set of
> callbacks at each frontend driver level[5]. You might need to consider
> a similar approach even if your immediate interest at the moment is PM
> suspend.
For the immediate task we have at the moment we think we can re-use
your work and implement .suspend/.resume based on it (we are targeting
S2RAM as the first stage).
But long term - we do support the idea of fully implemented
suspend and *hibernate* functionality as you describe it.
So, yes, we are also thinking about that.
>
>> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
>> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
>> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
>> for now (remember our very specific use-case which can survive suspend without callbacks
>> implemented).
>> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
>> be postponed for S2RAM.
>>
>> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
>> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
>> implementation in mind for later (we reuse code for .resume/.suspend)?
> In fact, Anchal has taken over my initial work and she may want to chime
> in here.
Great, could you please let us know what is the progress and further plans
on that, so we do not work on the same code and can coordinate our
efforts somehow? Anchal, could you please shed some light on this?
>
> That said, I'd be very happy to review patches if you come up with your
> own ones, so feel free to add me in that case.
Sure, thank you!
>
>>> Cheers,
>>>
>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>>
>>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>>
>> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
>
>>>>
>>>> -boris
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xenproject.org
>>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>>
>> Thank you,
>> Oleksandr
> Thanks,
> Munehisa
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-22 10:44                 ` [Xen-devel] " Oleksandr Andrushchenko
@ 2019-03-25 17:30                   ` Anchal Agarwal
  2019-03-27  6:40                     ` [Xen-devel] " Oleksandr Andrushchenko
  2019-03-27  6:40                     ` Oleksandr Andrushchenko
  0 siblings, 2 replies; 38+ messages in thread
From: Anchal Agarwal @ 2019-03-25 17:30 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: jgross, sstabellini, eduval, Oleksandr Andrushchenko, netdev,
	Munehisa Kamata, linux-kernel, Julien Grall, xen-devel,
	Boris Ostrovsky, Volodymyr Babchuk, davem

On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:
> 
> On 3/20/19 5:50 AM, Munehisa Kamata wrote:
> > On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> >> +Amazon
> >> pls see inline
> > Hi Oleksandr,
> >
> > Let me add some comments as the original author of the series.
> Thank you for your work!
Hi Oleksandr,
> >
> >> On 3/14/19 9:00 PM, Julien Grall wrote:
> >>> Hi,
> >>>
> >>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> >>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> >>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> >>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> >>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> >>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>>>>>>>>
> >>>>>>>>> Currently on driver resume we remove all the network queues and
> >>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
> >>>>>>>>> and never signaling the backend of this frontend's state change.
> >>>>>>>>> This leads to the number of consequences:
> >>>>>>>>> - when frontend withdraws granted references to the rings etc. it
> >>>>>>>>> cannot
> >>>>>>>>>  ???????? be cleanly done as the backend still holds those (it was not
> >>>>>>>>> told to
> >>>>>>>>>  ???????? free the resources)
> >>>>>>>>> - it is not possible to resume driver operation as all the
> >>>>>>>>> communication
> >>>>>>>>>  ???????? means with the backned were destroyed by the frontend, thus
> >>>>>>>>>  ???????? making the frontend appear to the guest OS as functional, but
> >>>>>>>>>  ???????? not really.
> >>>>>>>> What do you mean? Are you saying that after resume you lose
> >>>>>>>> connectivity?
> >>>>>>> Exactly, if you take a look at the .resume callback as it is now
> >>>>>>> what it does it destroys the rings etc. and never notifies the backend
> >>>>>>> of that, e.g. it stays in, say, connected state with communication
> >>>>>>> channels destroyed. It never goes into any other Xen bus state, so
> >>>>>>> there is
> >>>>>>> no way its state machine can help recovering.
> >>>>>> My tree is about a month old so perhaps there is some sort of regression
> >>>>>> but this certainly works for me. After resume netfront gets
> >>>>>> XenbusStateInitWait from backend which causes xennet_connect().
> >>>>> Ah, the difference can be of the way we get the guest enter
> >>>>> the suspend state. I am making my guest to suspend with:
> >>>>> echo mem > /sys/power/state
> >>>>> And then I use an interrupt to the guest (this is a test code)
> >>>>> to wake it up.
> >>>>> Could you please share your exact use-case when the guest enters suspend
> >>>>> and what you do to resume it?
> >>>>
> >>>> xl save / xl restore
> >>>>
> >>>>> I can see no way backend may want enter XenbusStateInitWait in my
> >>>>> use-case
> >>>>> as it simply doesn't know we want him to.
> >>>>
> >>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
> >>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
> >> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
> >>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
> >>>
> >>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
> >>>
> >> First of all, let me make it clear that we are interested in hibernation long term, so it would be
> >> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
> >> step work, e.g. first S2RAM and later on hibernation.
> >> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
> >> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
> >> Suspend-to-RAM for ARM [3] and we made number of assumptions:
> >> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
> >> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
> >> feature if you will.
> >> 2. With suspend/resume there is no intention to migrate VMs to any other host.
> >> 3. Most probably configuration of the back/front won't change between suspend/resume.
> >> But long term we are also thinking for supporting suspend/resume in its broader meaning,
> >> e.g. what is probably what you mean by suspend/resume.
> > AFAIK .suspend and .resume callbacks in frontend drivers are
> > specifically for xl save/restore case rather than the normal "system"
> > suspend. i.e. The former is Boris' case and something I called "Xen
> > suspend" in the patch series, the latter should be your interest and
> > called "ACPI path" here, and I referred to as "PM suspend". They are
> > very different code paths, see drivers/xen/manage.c for details of
> > Xen suspend.
> Yes, I saw that code, thank you
> >
> >> Given that, we think that we don't need Xen support to save grants, page tables and other
> >> VM's context on suspend at least at the first stage as we are implementing not a fully
> >> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
> >> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
> >> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
> >> we are not changing back/front configuration, we can even live with empty .resume/.suspend
> >> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
> >> use-case at the first system start (cold boot). And indeed, tests show that waking domains
> >> in the right order do allow that.
> >> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
> > If you just (re)implement .suspend and .resume so without taking care
> > of Xen suspend, you can easily break the existing functionality. The
> > patch series introduced .freeze and .restore callbacks for both PM
> > suspend and hibernation, and kept .suspend (not implemented in most
> > frontend though) and .resume with no changes for Xen suspend.
> >
> > Note that xenbus has mapped freeze/thaw/restore events to suspend,
> > resume and cancel callbacks to handle "checkpoint" case[4]. This was a
> > bit tricky and led me to the design to have the separate set of
> > callbacks at each frontend driver level[5]. You might need to consider
> > a similar approach even if your immediate interest at the moment is PM
> > suspend.
> For the immediate task we have at the moment we think we can re-use
> your work and implement .suspend/.resume based on it (we are targeting
> S2RAM as the first stage).
> But long term - we do support the idea of fully implemented
> suspend and *hibernate* functionality as you describe it.
> So, yes, we are also thinking about that.
> >
> >> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
> >> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
> >> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
> >> for now (remember our very specific use-case which can survive suspend without callbacks
> >> implemented).
> >> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
> >> be postponed for S2RAM.
> >>
> >> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
> >> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
> >> implementation in mind for later (we reuse code for .resume/.suspend)?
> > In fact, Anchal has taken over my initial work and she may want to chime
> > in here.
> Great, could you please let us know what is the progress and further plans
> on that, so we do not work on the same code and can coordinate our
> efforts somehow? Anchal, could you please shed some light on this?
> >
Looks like my previous email did not make it to mailing list. May be some issues with my 
email server settings. Giving it another shot.
Yes, I am working on those patches and plan to re-post them in an effort to upstream.
I agree with Munehisa here on considering the patches that are already out there as
I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION from xen
suspend and resume. There may be minor fixes here and there however, the overall
idea will still remain the same. As the previous patches there will be support for
only xen-blkfront and xen-netfront in the initial patchset.
> > That said, I'd be very happy to review patches if you come up with your
> > own ones, so feel free to add me in that case.
> Sure, thank you!
> >
> >>> Cheers,
> >>>
> >>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
> >>>
> >>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
> >>>
> >> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> > [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> > [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
> >
> >>>>
> >>>> -boris
> >>>>
> >>>> _______________________________________________
> >>>> Xen-devel mailing list
> >>>> Xen-devel@lists.xenproject.org
> >>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
> >>>>
> >> Thank you,
> >> Oleksandr
> > Thanks,
> > Munehisa
Thanks,
Anchal

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-25 17:30                   ` Anchal Agarwal
@ 2019-03-27  6:40                     ` Oleksandr Andrushchenko
  2019-03-28 23:19                       ` Anchal Agarwal
  2019-03-27  6:40                     ` Oleksandr Andrushchenko
  1 sibling, 1 reply; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-27  6:40 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Munehisa Kamata, Oleksandr_Andrushchenko, Julien Grall,
	Boris Ostrovsky, netdev, xen-devel, linux-kernel, jgross,
	sstabellini, davem, Volodymyr Babchuk, eduval

On 3/25/19 7:30 PM, Anchal Agarwal wrote:
> On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:
>> On 3/20/19 5:50 AM, Munehisa Kamata wrote:
>>> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
>>>> +Amazon
>>>> pls see inline
>>> Hi Oleksandr,
>>>
>>> Let me add some comments as the original author of the series.
>> Thank you for your work!
> Hi Oleksandr,
>>>> On 3/14/19 9:00 PM, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>>>>
>>>>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>>>>> This leads to the number of consequences:
>>>>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>>>>> cannot
>>>>>>>>>>>   ???????? be cleanly done as the backend still holds those (it was not
>>>>>>>>>>> told to
>>>>>>>>>>>   ???????? free the resources)
>>>>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>>>>> communication
>>>>>>>>>>>   ???????? means with the backned were destroyed by the frontend, thus
>>>>>>>>>>>   ???????? making the frontend appear to the guest OS as functional, but
>>>>>>>>>>>   ???????? not really.
>>>>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>>>>> connectivity?
>>>>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>>>>> there is
>>>>>>>>> no way its state machine can help recovering.
>>>>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>>>>> but this certainly works for me. After resume netfront gets
>>>>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>>>>> Ah, the difference can be of the way we get the guest enter
>>>>>>> the suspend state. I am making my guest to suspend with:
>>>>>>> echo mem > /sys/power/state
>>>>>>> And then I use an interrupt to the guest (this is a test code)
>>>>>>> to wake it up.
>>>>>>> Could you please share your exact use-case when the guest enters suspend
>>>>>>> and what you do to resume it?
>>>>>> xl save / xl restore
>>>>>>
>>>>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>>>>> use-case
>>>>>>> as it simply doesn't know we want him to.
>>>>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>>>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
>>>> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>>>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>>>>
>>>>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>>>>
>>>> First of all, let me make it clear that we are interested in hibernation long term, so it would be
>>>> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
>>>> step work, e.g. first S2RAM and later on hibernation.
>>>> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
>>>> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
>>>> Suspend-to-RAM for ARM [3] and we made number of assumptions:
>>>> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
>>>> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
>>>> feature if you will.
>>>> 2. With suspend/resume there is no intention to migrate VMs to any other host.
>>>> 3. Most probably configuration of the back/front won't change between suspend/resume.
>>>> But long term we are also thinking for supporting suspend/resume in its broader meaning,
>>>> e.g. what is probably what you mean by suspend/resume.
>>> AFAIK .suspend and .resume callbacks in frontend drivers are
>>> specifically for xl save/restore case rather than the normal "system"
>>> suspend. i.e. The former is Boris' case and something I called "Xen
>>> suspend" in the patch series, the latter should be your interest and
>>> called "ACPI path" here, and I referred to as "PM suspend". They are
>>> very different code paths, see drivers/xen/manage.c for details of
>>> Xen suspend.
>> Yes, I saw that code, thank you
>>>> Given that, we think that we don't need Xen support to save grants, page tables and other
>>>> VM's context on suspend at least at the first stage as we are implementing not a fully
>>>> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
>>>> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
>>>> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
>>>> we are not changing back/front configuration, we can even live with empty .resume/.suspend
>>>> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
>>>> use-case at the first system start (cold boot). And indeed, tests show that waking domains
>>>> in the right order do allow that.
>>>> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
>>> If you just (re)implement .suspend and .resume so without taking care
>>> of Xen suspend, you can easily break the existing functionality. The
>>> patch series introduced .freeze and .restore callbacks for both PM
>>> suspend and hibernation, and kept .suspend (not implemented in most
>>> frontend though) and .resume with no changes for Xen suspend.
>>>
>>> Note that xenbus has mapped freeze/thaw/restore events to suspend,
>>> resume and cancel callbacks to handle "checkpoint" case[4]. This was a
>>> bit tricky and led me to the design to have the separate set of
>>> callbacks at each frontend driver level[5]. You might need to consider
>>> a similar approach even if your immediate interest at the moment is PM
>>> suspend.
>> For the immediate task we have at the moment we think we can re-use
>> your work and implement .suspend/.resume based on it (we are targeting
>> S2RAM as the first stage).
>> But long term - we do support the idea of fully implemented
>> suspend and *hibernate* functionality as you describe it.
>> So, yes, we are also thinking about that.
>>>> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
>>>> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
>>>> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
>>>> for now (remember our very specific use-case which can survive suspend without callbacks
>>>> implemented).
>>>> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
>>>> be postponed for S2RAM.
>>>>
>>>> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
>>>> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
>>>> implementation in mind for later (we reuse code for .resume/.suspend)?
>>> In fact, Anchal has taken over my initial work and she may want to chime
>>> in here.
>> Great, could you please let us know what is the progress and further plans
>> on that, so we do not work on the same code and can coordinate our
>> efforts somehow? Anchal, could you please shed some light on this?
> Looks like my previous email did not make it to mailing list. May be some issues with my
> email server settings. Giving it another shot.
> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
This is really great, looking forward to it: any date in your mind
when this can happen?
> I agree with Munehisa here on considering the patches that are already out there as
> I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION from xen
> suspend and resume. There may be minor fixes here and there however, the overall
> idea will still remain the same.
Ok, so I'll plan my efforts accordingly
>   As the previous patches there will be support for
> only xen-blkfront and xen-netfront in the initial patchset.
>>> That said, I'd be very happy to review patches if you come up with your
>>> own ones, so feel free to add me in that case.
>> Sure, thank you!
>>>>> Cheers,
>>>>>
>>>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>>>>
>>>>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>>>>
>>>> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
>>> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
>>> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
>>>
>>>>>> -boris
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-devel mailing list
>>>>>> Xen-devel@lists.xenproject.org
>>>>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>>>>
>>>> Thank you,
>>>> Oleksandr
>>> Thanks,
>>> Munehisa
> Thanks,
> Anchal
Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-25 17:30                   ` Anchal Agarwal
  2019-03-27  6:40                     ` [Xen-devel] " Oleksandr Andrushchenko
@ 2019-03-27  6:40                     ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-03-27  6:40 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: jgross, sstabellini, eduval, Oleksandr_Andrushchenko, netdev,
	Munehisa Kamata, linux-kernel, Julien Grall, xen-devel,
	Boris Ostrovsky, Volodymyr Babchuk, davem

On 3/25/19 7:30 PM, Anchal Agarwal wrote:
> On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:
>> On 3/20/19 5:50 AM, Munehisa Kamata wrote:
>>> On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
>>>> +Amazon
>>>> pls see inline
>>> Hi Oleksandr,
>>>
>>> Let me add some comments as the original author of the series.
>> Thank you for your work!
> Hi Oleksandr,
>>>> On 3/14/19 9:00 PM, Julien Grall wrote:
>>>>> Hi,
>>>>>
>>>>> On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
>>>>>> On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
>>>>>>> On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
>>>>>>>> On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>> On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
>>>>>>>>>> On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
>>>>>>>>>>> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>>>>>>>>>>>
>>>>>>>>>>> Currently on driver resume we remove all the network queues and
>>>>>>>>>>> destroy shared Tx/Rx rings leaving the driver in its current state
>>>>>>>>>>> and never signaling the backend of this frontend's state change.
>>>>>>>>>>> This leads to the number of consequences:
>>>>>>>>>>> - when frontend withdraws granted references to the rings etc. it
>>>>>>>>>>> cannot
>>>>>>>>>>>   ???????? be cleanly done as the backend still holds those (it was not
>>>>>>>>>>> told to
>>>>>>>>>>>   ???????? free the resources)
>>>>>>>>>>> - it is not possible to resume driver operation as all the
>>>>>>>>>>> communication
>>>>>>>>>>>   ???????? means with the backned were destroyed by the frontend, thus
>>>>>>>>>>>   ???????? making the frontend appear to the guest OS as functional, but
>>>>>>>>>>>   ???????? not really.
>>>>>>>>>> What do you mean? Are you saying that after resume you lose
>>>>>>>>>> connectivity?
>>>>>>>>> Exactly, if you take a look at the .resume callback as it is now
>>>>>>>>> what it does it destroys the rings etc. and never notifies the backend
>>>>>>>>> of that, e.g. it stays in, say, connected state with communication
>>>>>>>>> channels destroyed. It never goes into any other Xen bus state, so
>>>>>>>>> there is
>>>>>>>>> no way its state machine can help recovering.
>>>>>>>> My tree is about a month old so perhaps there is some sort of regression
>>>>>>>> but this certainly works for me. After resume netfront gets
>>>>>>>> XenbusStateInitWait from backend which causes xennet_connect().
>>>>>>> Ah, the difference can be of the way we get the guest enter
>>>>>>> the suspend state. I am making my guest to suspend with:
>>>>>>> echo mem > /sys/power/state
>>>>>>> And then I use an interrupt to the guest (this is a test code)
>>>>>>> to wake it up.
>>>>>>> Could you please share your exact use-case when the guest enters suspend
>>>>>>> and what you do to resume it?
>>>>>> xl save / xl restore
>>>>>>
>>>>>>> I can see no way backend may want enter XenbusStateInitWait in my
>>>>>>> use-case
>>>>>>> as it simply doesn't know we want him to.
>>>>>> Yours looks like ACPI path, I don't know how well it was tested TBH.
>>>>> I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
>>>> Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
>>>>> Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
>>>>>
>>>>> I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
>>>>>
>>>> First of all, let me make it clear that we are interested in hibernation long term, so it would be
>>>> desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
>>>> step work, e.g. first S2RAM and later on hibernation.
>>>> Let me clarify the immediate use-case that we have, so it is easier to understand what we want
>>>> and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
>>>> Suspend-to-RAM for ARM [3] and we made number of assumptions:
>>>> 1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
>>>> of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
>>>> feature if you will.
>>>> 2. With suspend/resume there is no intention to migrate VMs to any other host.
>>>> 3. Most probably configuration of the back/front won't change between suspend/resume.
>>>> But long term we are also thinking for supporting suspend/resume in its broader meaning,
>>>> e.g. what is probably what you mean by suspend/resume.
>>> AFAIK .suspend and .resume callbacks in frontend drivers are
>>> specifically for xl save/restore case rather than the normal "system"
>>> suspend. i.e. The former is Boris' case and something I called "Xen
>>> suspend" in the patch series, the latter should be your interest and
>>> called "ACPI path" here, and I referred to as "PM suspend". They are
>>> very different code paths, see drivers/xen/manage.c for details of
>>> Xen suspend.
>> Yes, I saw that code, thank you
>>>> Given that, we think that we don't need Xen support to save grants, page tables and other
>>>> VM's context on suspend at least at the first stage as we are implementing not a fully
>>>> blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
>>>> suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
>>>> piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
>>>> we are not changing back/front configuration, we can even live with empty .resume/.suspend
>>>> frontend's callbacks because event channels, rings etc. are "statically" allocated in our
>>>> use-case at the first system start (cold boot). And indeed, tests show that waking domains
>>>> in the right order do allow that.
>>>> So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
>>> If you just (re)implement .suspend and .resume so without taking care
>>> of Xen suspend, you can easily break the existing functionality. The
>>> patch series introduced .freeze and .restore callbacks for both PM
>>> suspend and hibernation, and kept .suspend (not implemented in most
>>> frontend though) and .resume with no changes for Xen suspend.
>>>
>>> Note that xenbus has mapped freeze/thaw/restore events to suspend,
>>> resume and cancel callbacks to handle "checkpoint" case[4]. This was a
>>> bit tricky and led me to the design to have the separate set of
>>> callbacks at each frontend driver level[5]. You might need to consider
>>> a similar approach even if your immediate interest at the moment is PM
>>> suspend.
>> For the immediate task we have at the moment we think we can re-use
>> your work and implement .suspend/.resume based on it (we are targeting
>> S2RAM as the first stage).
>> But long term - we do support the idea of fully implemented
>> suspend and *hibernate* functionality as you describe it.
>> So, yes, we are also thinking about that.
>>>> even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
>>>> then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
>>>> can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
>>>> for now (remember our very specific use-case which can survive suspend without callbacks
>>>> implemented).
>>>> IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
>>>> be postponed for S2RAM.
>>>>
>>>> Julien/Juergen/Boris/Amazon - could you please express your view on the above?
>>>> Is it acceptable that for now we only take re-worked netfront patch from [3] with full
>>>> implementation in mind for later (we reuse code for .resume/.suspend)?
>>> In fact, Anchal has taken over my initial work and she may want to chime
>>> in here.
>> Great, could you please let us know what is the progress and further plans
>> on that, so we do not work on the same code and can coordinate our
>> efforts somehow? Anchal, could you please shed some light on this?
> Looks like my previous email did not make it to mailing list. May be some issues with my
> email server settings. Giving it another shot.
> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
This is really great, looking forward to it: any date in your mind
when this can happen?
> I agree with Munehisa here on considering the patches that are already out there as
> I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION from xen
> suspend and resume. There may be minor fixes here and there however, the overall
> idea will still remain the same.
Ok, so I'll plan my efforts accordingly
>   As the previous patches there will be support for
> only xen-blkfront and xen-netfront in the initial patchset.
>>> That said, I'd be very happy to review patches if you come up with your
>>> own ones, so feel free to add me in that case.
>> Sure, thank you!
>>>>> Cheers,
>>>>>
>>>>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
>>>>>
>>>>> [2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
>>>>>
>>>> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
>>> [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
>>> [5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
>>>
>>>>>> -boris
>>>>>>
>>>>>> _______________________________________________
>>>>>> Xen-devel mailing list
>>>>>> Xen-devel@lists.xenproject.org
>>>>>> https://lists.xenproject.org/mailman/listinfo/xen-devel
>>>>>>
>>>> Thank you,
>>>> Oleksandr
>>> Thanks,
>>> Munehisa
> Thanks,
> Anchal
Thank you,
Oleksandr

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-27  6:40                     ` [Xen-devel] " Oleksandr Andrushchenko
@ 2019-03-28 23:19                       ` Anchal Agarwal
  2019-05-16  6:26                         ` Oleksandr Andrushchenko
  2019-05-16  6:26                           ` Oleksandr Andrushchenko
  0 siblings, 2 replies; 38+ messages in thread
From: Anchal Agarwal @ 2019-03-28 23:19 UTC (permalink / raw)
  To: Oleksandr Andrushchenko
  Cc: jgross, sstabellini, Oleksandr_Andrushchenko, netdev,
	Munehisa Kamata, linux-kernel, Anchal Agarwal, Julien Grall,
	xen-devel, Boris Ostrovsky, Volodymyr Babchuk, davem

On Wed, Mar 27, 2019 at 08:40:20AM +0200, Oleksandr Andrushchenko wrote:
> On 3/25/19 7:30 PM, Anchal Agarwal wrote:
> >On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:
> >>On 3/20/19 5:50 AM, Munehisa Kamata wrote:
> >>>On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:
> >>>>+Amazon
> >>>>pls see inline
> >>>Hi Oleksandr,
> >>>
> >>>Let me add some comments as the original author of the series.
> >>Thank you for your work!
> >Hi Oleksandr,
> >>>>On 3/14/19 9:00 PM, Julien Grall wrote:
> >>>>>Hi,
> >>>>>
> >>>>>On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
> >>>>>>On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>On 3/14/19 5:02 PM, Boris Ostrovsky wrote:
> >>>>>>>>On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>>>On 3/14/19 4:47 PM, Boris Ostrovsky wrote:
> >>>>>>>>>>On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:
> >>>>>>>>>>>From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> >>>>>>>>>>>
> >>>>>>>>>>>Currently on driver resume we remove all the network queues and
> >>>>>>>>>>>destroy shared Tx/Rx rings leaving the driver in its current state
> >>>>>>>>>>>and never signaling the backend of this frontend's state change.
> >>>>>>>>>>>This leads to the number of consequences:
> >>>>>>>>>>>- when frontend withdraws granted references to the rings etc. it
> >>>>>>>>>>>cannot
> >>>>>>>>>>>  ???????? be cleanly done as the backend still holds those (it was not
> >>>>>>>>>>>told to
> >>>>>>>>>>>  ???????? free the resources)
> >>>>>>>>>>>- it is not possible to resume driver operation as all the
> >>>>>>>>>>>communication
> >>>>>>>>>>>  ???????? means with the backned were destroyed by the frontend, thus
> >>>>>>>>>>>  ???????? making the frontend appear to the guest OS as functional, but
> >>>>>>>>>>>  ???????? not really.
> >>>>>>>>>>What do you mean? Are you saying that after resume you lose
> >>>>>>>>>>connectivity?
> >>>>>>>>>Exactly, if you take a look at the .resume callback as it is now
> >>>>>>>>>what it does it destroys the rings etc. and never notifies the backend
> >>>>>>>>>of that, e.g. it stays in, say, connected state with communication
> >>>>>>>>>channels destroyed. It never goes into any other Xen bus state, so
> >>>>>>>>>there is
> >>>>>>>>>no way its state machine can help recovering.
> >>>>>>>>My tree is about a month old so perhaps there is some sort of regression
> >>>>>>>>but this certainly works for me. After resume netfront gets
> >>>>>>>>XenbusStateInitWait from backend which causes xennet_connect().
> >>>>>>>Ah, the difference can be of the way we get the guest enter
> >>>>>>>the suspend state. I am making my guest to suspend with:
> >>>>>>>echo mem > /sys/power/state
> >>>>>>>And then I use an interrupt to the guest (this is a test code)
> >>>>>>>to wake it up.
> >>>>>>>Could you please share your exact use-case when the guest enters suspend
> >>>>>>>and what you do to resume it?
> >>>>>>xl save / xl restore
> >>>>>>
> >>>>>>>I can see no way backend may want enter XenbusStateInitWait in my
> >>>>>>>use-case
> >>>>>>>as it simply doesn't know we want him to.
> >>>>>>Yours looks like ACPI path, I don't know how well it was tested TBH.
> >>>>>I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
> >>>>Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
> >>>>>Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).
> >>>>>
> >>>>>I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
> >>>>>
> >>>>First of all, let me make it clear that we are interested in hibernation long term, so it would be
> >>>>desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
> >>>>step work, e.g. first S2RAM and later on hibernation.
> >>>>Let me clarify the immediate use-case that we have, so it is easier to understand what we want
> >>>>and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
> >>>>Suspend-to-RAM for ARM [3] and we made number of assumptions:
> >>>>1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
> >>>>of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
> >>>>feature if you will.
> >>>>2. With suspend/resume there is no intention to migrate VMs to any other host.
> >>>>3. Most probably configuration of the back/front won't change between suspend/resume.
> >>>>But long term we are also thinking for supporting suspend/resume in its broader meaning,
> >>>>e.g. what is probably what you mean by suspend/resume.
> >>>AFAIK .suspend and .resume callbacks in frontend drivers are
> >>>specifically for xl save/restore case rather than the normal "system"
> >>>suspend. i.e. The former is Boris' case and something I called "Xen
> >>>suspend" in the patch series, the latter should be your interest and
> >>>called "ACPI path" here, and I referred to as "PM suspend". They are
> >>>very different code paths, see drivers/xen/manage.c for details of
> >>>Xen suspend.
> >>Yes, I saw that code, thank you
> >>>>Given that, we think that we don't need Xen support to save grants, page tables and other
> >>>>VM's context on suspend at least at the first stage as we are implementing not a fully
> >>>>blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
> >>>>suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
> >>>>piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
> >>>>we are not changing back/front configuration, we can even live with empty .resume/.suspend
> >>>>frontend's callbacks because event channels, rings etc. are "statically" allocated in our
> >>>>use-case at the first system start (cold boot). And indeed, tests show that waking domains
> >>>>in the right order do allow that.
> >>>>So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
> >>>If you just (re)implement .suspend and .resume so without taking care
> >>>of Xen suspend, you can easily break the existing functionality. The
> >>>patch series introduced .freeze and .restore callbacks for both PM
> >>>suspend and hibernation, and kept .suspend (not implemented in most
> >>>frontend though) and .resume with no changes for Xen suspend.
> >>>
> >>>Note that xenbus has mapped freeze/thaw/restore events to suspend,
> >>>resume and cancel callbacks to handle "checkpoint" case[4]. This was a
> >>>bit tricky and led me to the design to have the separate set of
> >>>callbacks at each frontend driver level[5]. You might need to consider
> >>>a similar approach even if your immediate interest at the moment is PM
> >>>suspend.
> >>For the immediate task we have at the moment we think we can re-use
> >>your work and implement .suspend/.resume based on it (we are targeting
> >>S2RAM as the first stage).
> >>But long term - we do support the idea of fully implemented
> >>suspend and *hibernate* functionality as you describe it.
> >>So, yes, we are also thinking about that.
> >>>>even freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]
> >>>>then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
> >>>>can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
> >>>>for now (remember our very specific use-case which can survive suspend without callbacks
> >>>>implemented).
> >>>>IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
> >>>>be postponed for S2RAM.
> >>>>
> >>>>Julien/Juergen/Boris/Amazon - could you please express your view on the above?
> >>>>Is it acceptable that for now we only take re-worked netfront patch from [3] with full
> >>>>implementation in mind for later (we reuse code for .resume/.suspend)?
> >>>In fact, Anchal has taken over my initial work and she may want to chime
> >>>in here.
> >>Great, could you please let us know what is the progress and further plans
> >>on that, so we do not work on the same code and can coordinate our
> >>efforts somehow? Anchal, could you please shed some light on this?
> >Looks like my previous email did not make it to mailing list. May be some issues with my
> >email server settings. Giving it another shot.
> >Yes, I am working on those patches and plan to re-post them in an effort to upstream.
> This is really great, looking forward to it: any date in your mind
> when this can happen?
Not a specific date but may be in few weeks. I am currently swamped at work.
> >I agree with Munehisa here on considering the patches that are already out there as
> >I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION from xen
> >suspend and resume. There may be minor fixes here and there however, the overall
> >idea will still remain the same.
> Ok, so I'll plan my efforts accordingly
> >  As the previous patches there will be support for
> >only xen-blkfront and xen-netfront in the initial patchset.
> >>>That said, I'd be very happy to review patches if you come up with your
> >>>own ones, so feel free to add me in that case.
> >>Sure, thank you!
> >>>>>Cheers,
> >>>>>
> >>>>>[1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
> >>>>>
> >>>>>[2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
> >>>>>
> >>>>[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
> >>>[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6b
> >>>[5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
> >>>
> >>>>>>-boris
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>Xen-devel mailing list
> >>>>>>Xen-devel@lists.xenproject.org
> >>>>>>https://lists.xenproject.org/mailman/listinfo/xen-devel
> >>>>>>
> >>>>Thank you,
> >>>>Oleksandr
> >>>Thanks,
> >>>Munehisa
> >Thanks,
> >Anchal
> Thank you,
> Oleksandr

Thanks,
Anchal

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-28 23:19                       ` Anchal Agarwal
@ 2019-05-16  6:26                           ` Oleksandr Andrushchenko
  2019-05-16  6:26                           ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-05-16  6:26 UTC (permalink / raw)
  To: Anchal Agarwal, Oleksandr_Andrushchenko
  Cc: Anchal Agarwal, Munehisa Kamata, Julien Grall, Boris Ostrovsky,
	netdev, xen-devel, linux-kernel, jgross, sstabellini, davem,
	Volodymyr Babchuk, Artem Mygaiev

Hello, Anchal!

On 3/29/19 1:19 AM, Anchal Agarwal wrote:
[snip]
>>>> Great, could you please let us know what is the progress and further plans
>>>> on that, so we do not work on the same code and can coordinate our
>>>> efforts somehow? Anchal, could you please shed some light on this?
>>> Looks like my previous email did not make it to mailing list. May be some issues with my
>>> email server settings. Giving it another shot.
>>> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
>> This is really great, looking forward to it: any date in your mind
>> when this can happen?
> Not a specific date but may be in few weeks. I am currently swamped at work.
>
Any progress on this?

Thank you,
Oleksandr

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-03-28 23:19                       ` Anchal Agarwal
@ 2019-05-16  6:26                         ` Oleksandr Andrushchenko
  2019-05-16  6:26                           ` Oleksandr Andrushchenko
  1 sibling, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-05-16  6:26 UTC (permalink / raw)
  To: Anchal Agarwal, Oleksandr_Andrushchenko
  Cc: jgross, Artem Mygaiev, sstabellini, netdev, Munehisa Kamata,
	linux-kernel, Anchal Agarwal, Julien Grall, xen-devel,
	Boris Ostrovsky, Volodymyr Babchuk, davem

Hello, Anchal!

On 3/29/19 1:19 AM, Anchal Agarwal wrote:
[snip]
>>>> Great, could you please let us know what is the progress and further plans
>>>> on that, so we do not work on the same code and can coordinate our
>>>> efforts somehow? Anchal, could you please shed some light on this?
>>> Looks like my previous email did not make it to mailing list. May be some issues with my
>>> email server settings. Giving it another shot.
>>> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
>> This is really great, looking forward to it: any date in your mind
>> when this can happen?
> Not a specific date but may be in few weeks. I am currently swamped at work.
>
Any progress on this?

Thank you,
Oleksandr

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
@ 2019-05-16  6:26                           ` Oleksandr Andrushchenko
  0 siblings, 0 replies; 38+ messages in thread
From: Oleksandr Andrushchenko @ 2019-05-16  6:26 UTC (permalink / raw)
  To: Anchal Agarwal, Oleksandr_Andrushchenko
  Cc: jgross, Artem Mygaiev, sstabellini, netdev, Munehisa Kamata,
	linux-kernel, Anchal Agarwal, Julien Grall, xen-devel,
	Boris Ostrovsky, Volodymyr Babchuk, davem

Hello, Anchal!

On 3/29/19 1:19 AM, Anchal Agarwal wrote:
[snip]
>>>> Great, could you please let us know what is the progress and further plans
>>>> on that, so we do not work on the same code and can coordinate our
>>>> efforts somehow? Anchal, could you please shed some light on this?
>>> Looks like my previous email did not make it to mailing list. May be some issues with my
>>> email server settings. Giving it another shot.
>>> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
>> This is really great, looking forward to it: any date in your mind
>> when this can happen?
> Not a specific date but may be in few weeks. I am currently swamped at work.
>
Any progress on this?

Thank you,
Oleksandr

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-05-16  6:26                           ` Oleksandr Andrushchenko
@ 2019-05-30 12:32                             ` Agarwal, Anchal
  -1 siblings, 0 replies; 38+ messages in thread
From: Agarwal, Anchal @ 2019-05-30 12:32 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Anchal Agarwal, Oleksandr_Andrushchenko
  Cc: Kamata, Munehisa, Julien Grall, Boris Ostrovsky, netdev,
	xen-devel, linux-kernel, jgross, sstabellini, davem,
	Volodymyr Babchuk, Artem Mygaiev

Hi Oleksandr,

    Hello, Anchal!
    
    On 3/29/19 1:19 AM, Anchal Agarwal wrote:
    [snip]
    >>>> Great, could you please let us know what is the progress and further plans
    >>>> on that, so we do not work on the same code and can coordinate our
    >>>> efforts somehow? Anchal, could you please shed some light on this?
    >>> Looks like my previous email did not make it to mailing list. May be some issues with my
    >>> email server settings. Giving it another shot.
    >>> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
    >> This is really great, looking forward to it: any date in your mind
    >> when this can happen?
    > Not a specific date but may be in few weeks. I am currently swamped at work.
    >
    Any progress on this?

Yes, but at a snail's pace.
    
    Thank you,
    Oleksandr

Thanks,
Anchal    


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH] xen/netfront: Remove unneeded .resume callback
  2019-05-16  6:26                           ` Oleksandr Andrushchenko
  (?)
  (?)
@ 2019-05-30 12:32                           ` Agarwal, Anchal
  -1 siblings, 0 replies; 38+ messages in thread
From: Agarwal, Anchal @ 2019-05-30 12:32 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Anchal Agarwal, Oleksandr_Andrushchenko
  Cc: jgross, Artem Mygaiev, sstabellini, netdev, Kamata, Munehisa,
	linux-kernel, Julien Grall, xen-devel, Boris Ostrovsky,
	Volodymyr Babchuk, davem

Hi Oleksandr,

    Hello, Anchal!
    
    On 3/29/19 1:19 AM, Anchal Agarwal wrote:
    [snip]
    >>>> Great, could you please let us know what is the progress and further plans
    >>>> on that, so we do not work on the same code and can coordinate our
    >>>> efforts somehow? Anchal, could you please shed some light on this?
    >>> Looks like my previous email did not make it to mailing list. May be some issues with my
    >>> email server settings. Giving it another shot.
    >>> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
    >> This is really great, looking forward to it: any date in your mind
    >> when this can happen?
    > Not a specific date but may be in few weeks. I am currently swamped at work.
    >
    Any progress on this?

Yes, but at a snail's pace.
    
    Thank you,
    Oleksandr

Thanks,
Anchal    

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Xen-devel] [PATCH] xen/netfront: Remove unneeded .resume callback
@ 2019-05-30 12:32                             ` Agarwal, Anchal
  0 siblings, 0 replies; 38+ messages in thread
From: Agarwal, Anchal @ 2019-05-30 12:32 UTC (permalink / raw)
  To: Oleksandr Andrushchenko, Anchal Agarwal, Oleksandr_Andrushchenko
  Cc: jgross, Artem Mygaiev, sstabellini, netdev, Kamata, Munehisa,
	linux-kernel, Julien Grall, xen-devel, Boris Ostrovsky,
	Volodymyr Babchuk, davem

Hi Oleksandr,

    Hello, Anchal!
    
    On 3/29/19 1:19 AM, Anchal Agarwal wrote:
    [snip]
    >>>> Great, could you please let us know what is the progress and further plans
    >>>> on that, so we do not work on the same code and can coordinate our
    >>>> efforts somehow? Anchal, could you please shed some light on this?
    >>> Looks like my previous email did not make it to mailing list. May be some issues with my
    >>> email server settings. Giving it another shot.
    >>> Yes, I am working on those patches and plan to re-post them in an effort to upstream.
    >> This is really great, looking forward to it: any date in your mind
    >> when this can happen?
    > Not a specific date but may be in few weeks. I am currently swamped at work.
    >
    Any progress on this?

Yes, but at a snail's pace.
    
    Thank you,
    Oleksandr

Thanks,
Anchal    

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2019-05-30 12:32 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-14 13:17 [Xen-devel][PATCH] xen/netfront: Remove unneeded .resume callback Oleksandr Andrushchenko
2019-03-14 13:50 ` [PATCH] " Oleksandr Andrushchenko
2019-03-14 13:50 ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
2019-03-14 14:47 ` Boris Ostrovsky
2019-03-14 14:52   ` [PATCH] " Oleksandr Andrushchenko
2019-03-14 14:52   ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
2019-03-14 15:02     ` [PATCH] " Boris Ostrovsky
2019-03-14 15:02     ` [Xen-devel][PATCH] " Boris Ostrovsky
2019-03-14 15:10       ` [PATCH] " Oleksandr Andrushchenko
2019-03-14 15:10       ` [Xen-devel][PATCH] " Oleksandr Andrushchenko
2019-03-14 15:40         ` Boris Ostrovsky
2019-03-14 16:33           ` Oleksandr Andrushchenko
2019-03-14 18:16             ` Boris Ostrovsky
2019-03-14 18:20               ` Oleksandr Andrushchenko
2019-03-14 18:20               ` [PATCH] " Oleksandr Andrushchenko
2019-03-14 18:16             ` Boris Ostrovsky
2019-03-14 16:33           ` Oleksandr Andrushchenko
2019-03-14 19:00           ` [Xen-devel] " Julien Grall
2019-03-18 10:02             ` Oleksandr Andrushchenko
2019-03-18 10:02             ` [Xen-devel] " Oleksandr Andrushchenko
2019-03-20  3:50               ` Munehisa Kamata
2019-03-20  3:50               ` [Xen-devel] " Munehisa Kamata
2019-03-21 19:25                 ` Anchal Agarwal
2019-03-22 10:44                 ` Oleksandr Andrushchenko
2019-03-22 10:44                 ` [Xen-devel] " Oleksandr Andrushchenko
2019-03-25 17:30                   ` Anchal Agarwal
2019-03-27  6:40                     ` [Xen-devel] " Oleksandr Andrushchenko
2019-03-28 23:19                       ` Anchal Agarwal
2019-05-16  6:26                         ` Oleksandr Andrushchenko
2019-05-16  6:26                         ` [Xen-devel] " Oleksandr Andrushchenko
2019-05-16  6:26                           ` Oleksandr Andrushchenko
2019-05-30 12:32                           ` Agarwal, Anchal
2019-05-30 12:32                             ` Agarwal, Anchal
2019-05-30 12:32                           ` Agarwal, Anchal
2019-03-27  6:40                     ` Oleksandr Andrushchenko
2019-03-14 19:00           ` Julien Grall
2019-03-14 15:40         ` Boris Ostrovsky
2019-03-14 14:47 ` Boris Ostrovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.