* [PATCH 00/12] Fix PM hibernation in Xen guests
@ 2020-05-19 23:24 ` Anchal Agarwal
  0 siblings, 0 replies; 73+ messages in thread
From: Anchal Agarwal @ 2020-05-19 23:24 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, x86, boris.ostrovsky, jgross, linux-pm,
	linux-mm, kamatam, sstabellini, konrad.wilk, roger.pau, axboe,
	davem, rjw, len.brown, pavel, peterz, eduval, sblbir, anchalag,
	xen-devel, vkuznets, netdev, linux-kernel, dwmw, benh

Hello,
This series fixes PM hibernation for HVM guests running on the Xen hypervisor.
A running guest can now be hibernated and resumed successfully at a later
time. The fixes are added to the block and network device drivers, i.e.
xen-blkfront and xen-netfront. Any other driver that needs S4 support and
does not have it yet can follow the same method of introducing
freeze/thaw/restore callbacks.
The patches have been tested against the upstream kernel and Xen 4.11.
Large-scale testing was also done on Xen-based Amazon EC2 instances. All of
this testing involved running a memory-exhausting workload in the background.

Guest hibernation does not require any support from the hypervisor, so the
guest has complete control over its state. Infrastructure restrictions on
saving guest state can be overcome by guest-initiated hibernation.

These patches were sent out as an RFC before, and all the feedback has been
incorporated. The last RFC (v3) can be found here:
https://lkml.org/lkml/2020/2/14/2789

Known issues:
1. KASLR causes intermittent hibernation failures. The VM fails to resume and
has to be restarted. I will investigate this issue separately; it shouldn't
be a blocker for this patch series.
2. During hibernation, I sometimes observed that freezing of tasks fails due
to busy XFS workqueues [xfs-cil/xfs-sync]. This is also intermittent, maybe 1
out of 200 runs, and hibernation is aborted in this case. Retrying hibernation
may work. This is a known issue with hibernation and some filesystems such as
XFS; it has been discussed by the community for years with no effective
resolution at this point.
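
For test runs that hit the intermittent freeze failure, a retry wrapper along
these lines may be convenient. This is only a sketch: hibernate_with_retry,
its parameters and the idea that the trigger raises OSError when hibernation
is aborted are assumptions for illustration, not part of this series.

```python
import time


def hibernate_with_retry(trigger, attempts=3, delay_s=5.0, sleep=time.sleep):
    """Call trigger() until it succeeds; return the attempt number that worked.

    trigger is any callable that starts hibernation and raises OSError when
    the attempt is aborted (assumed behaviour, e.g. a failed sysfs write).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            trigger()
            return attempt
        except OSError:
            # Give busy workqueues a moment to drain, then try again.
            if attempt == attempts:
                raise
            sleep(delay_s)
```

The sleep function is injectable so that the wrapper can be exercised without
real delays.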

Testing how-to:
---------------
1. Set up the Xen hypervisor on a physical machine [I used Ubuntu 16.04 +
upstream xen-4.11].
2. Bring up an HVM guest with a kernel compiled with the hibernation patches
[I used Ubuntu 18.04 netboot bionic images and also Amazon Linux on-prem images].
3. Create a swap file at least as large as RAM.
4. Update the grub parameters and reboot.
5. Trigger pm-hibernation from within the VM.

Example:
Set up a file-backed swap space. Swap file size >= total memory on the system:
sudo dd if=/dev/zero of=/swap bs=$(( 1024 * 1024 )) count=4096 # 4096MiB
sudo chmod 600 /swap
sudo mkswap /swap
sudo swapon /swap
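
The size requirement can be checked before hibernating. The following is a
minimal sketch, assuming MemTotal in /proc/meminfo is the right figure to
compare against; the function names are hypothetical helpers, not part of
the patches.

```python
def mem_total_bytes(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The line looks like "MemTotal:        4194304 kB" (units of 1024 bytes).
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def swap_file_is_large_enough(swap_bytes, meminfo_text):
    """True if a swap file of swap_bytes is at least as large as total memory."""
    return swap_bytes >= mem_total_bytes(meminfo_text)


# Example use on a live system (paths as in the commands above):
#   import os
#   with open("/proc/meminfo") as f:
#       ok = swap_file_is_large_enough(os.path.getsize("/swap"), f.read())
```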

Update resume device/resume offset in grub if using swap file:
resume=/dev/xvda1 resume_offset=200704 no_console_suspend=1
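
Note that resume_offset is expressed in units of the page size, while the
FIBMAP ioctl used by the script later in this message reports a block number
in units of the filesystem block size. The two agree when both are 4096
bytes. Where they differ, a conversion like the sketch below may be needed;
the helper names are hypothetical.

```python
def resume_offset_pages(physical_block, fs_block_size, page_size):
    """Convert a FIBMAP block number into a resume_offset in page units."""
    byte_offset = physical_block * fs_block_size
    if byte_offset % page_size:
        raise ValueError("swap file does not start on a page boundary")
    return byte_offset // page_size


def resume_params(device, offset_pages):
    """Format the kernel command-line fragment for swap-file resume."""
    return "resume={} resume_offset={}".format(device, offset_pages)
```

The page size is available from os.sysconf("SC_PAGE_SIZE"); the filesystem
block size has to be queried separately, for example with the FIGETBSZ ioctl.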

Execute:
--------
sudo pm-hibernate
OR
echo reboot > /sys/power/disk && echo disk > /sys/power/state
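
The sysfs route can also be scripted. In the sketch below the mode is written
to /sys/power/disk first, because writing "disk" to /sys/power/state is what
starts hibernation and that write only returns after resume. The function
names and the injectable writer are assumptions made so the ordering can be
checked without hibernating a machine.

```python
def _write_sysfs(path, value):
    """Write a value to a sysfs attribute (needs root on a real system)."""
    with open(path, "w") as f:
        f.write(value)


def hibernate(mode="reboot", write=_write_sysfs):
    """Select the hibernation mode, then start hibernation."""
    # How the machine powers down once the image has been saved.
    write("/sys/power/disk", mode)
    # Starts hibernation; on a real system this returns after resume.
    write("/sys/power/state", "disk")
```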

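Compute resume offset code:
"
#!/usr/bin/env python3
import array
import fcntl
import sys

# FIBMAP ioctl: maps a block number within a file to a block on the device.
FIBMAP = 0x01

# The swap file path is the first argument; FIBMAP needs root privileges.
with open(sys.argv[1], 'rb') as f:
    # Input is logical block 0 of the file (an int); the kernel overwrites
    # it in place with the corresponding physical block number.
    buf = array.array('i', [0])
    fcntl.ioctl(f.fileno(), FIBMAP, buf, True)
    print(buf[0])
"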


Anchal Agarwal (5):
  x86/xen: Introduce new function to map HYPERVISOR_shared_info on
    Resume
  genirq: Shutdown irq chips in suspend/resume during hibernation
  xen: Introduce wrapper for save/restore sched clock offset
  xen: Update sched clock offset to avoid system instability in
    hibernation
  PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA

Munehisa Kamata (7):
  xen/manage: keep track of the on-going suspend mode
  xenbus: add freeze/thaw/restore callbacks support
  x86/xen: add system core suspend and resume callbacks
  xen-blkfront: add callbacks for PM suspend and hibernation
  xen-netfront: add callbacks for PM suspend and hibernation
  xen/time: introduce xen_{save,restore}_steal_clock
  x86/xen: save and restore steal clock

 arch/x86/xen/enlighten_hvm.c      |   8 ++
 arch/x86/xen/suspend.c            |  72 ++++++++++++++++++
 arch/x86/xen/time.c               |  18 ++++-
 arch/x86/xen/xen-ops.h            |   3 +
 drivers/block/xen-blkfront.c      | 122 ++++++++++++++++++++++++++++--
 drivers/net/xen-netfront.c        |  98 +++++++++++++++++++++++-
 drivers/xen/events/events_base.c  |   1 +
 drivers/xen/manage.c              |  73 ++++++++++++++++++
 drivers/xen/time.c                |  29 ++++++-
 drivers/xen/xenbus/xenbus_probe.c |  99 +++++++++++++++++++-----
 include/linux/irq.h               |   2 +
 include/xen/xen-ops.h             |   8 ++
 include/xen/xenbus.h              |   3 +
 kernel/irq/chip.c                 |   2 +-
 kernel/irq/internals.h            |   1 +
 kernel/irq/pm.c                   |  31 +++++---
 kernel/power/user.c               |   6 +-
 17 files changed, 536 insertions(+), 40 deletions(-)

-- 
2.24.1.AMZN


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
@ 2020-05-20 10:09 kbuild test robot
  0 siblings, 0 replies; 73+ messages in thread
From: kbuild test robot @ 2020-05-20 10:09 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 7344 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <ad580b4d5b76c18fe2fe409704f25622e01af361.1589926004.git.anchalag@amazon.com>
References: <ad580b4d5b76c18fe2fe409704f25622e01af361.1589926004.git.anchalag@amazon.com>
TO: Anchal Agarwal <anchalag@amazon.com>
TO: tglx(a)linutronix.de
TO: mingo(a)redhat.com
TO: bp(a)alien8.de
TO: hpa(a)zytor.com
TO: x86(a)kernel.org
TO: boris.ostrovsky(a)oracle.com
TO: jgross(a)suse.com
TO: linux-pm(a)vger.kernel.org
TO: linux-mm(a)kvack.org
TO: kamatam(a)amazon.com
TO: sstabellini(a)kernel.org
TO: konrad.wilk(a)oracle.com
TO: roger.pau(a)citrix.com
TO: axboe(a)kernel.dk
TO: davem(a)davemloft.net
TO: rjw(a)rjwysocki.net
TO: len.brown(a)intel.com
TO: pavel(a)ucw.cz
TO: peterz(a)infradead.org
TO: eduval(a)amazon.com
TO: sblbir(a)amazon.com
TO: anchalag(a)amazon.com
TO: xen-devel(a)lists.xenproject.org
TO: vkuznets(a)redhat.com
TO: netdev(a)vger.kernel.org
TO: linux-kernel(a)vger.kernel.org
TO: dwmw(a)amazon.co.uk
TO: benh(a)kernel.crashing.org

Hi Anchal,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.7-rc6]
[cannot apply to xen-tip/linux-next tip/irq/core tip/auto-latest next-20200519]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Anchal-Agarwal/Fix-PM-hibernation-in-Xen-guests/20200520-073211
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 03fb3acae4be8a6b680ffedb220a8b6c07260b40
config: x86_64-allmodconfig (attached as .config)
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-193-gb8fad4bc-dirty
        # save the attached .config to linux build tree
        make C=1 ARCH=x86_64 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
:::::: branch date: 11 hours ago
:::::: commit date: 11 hours ago

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> drivers/block/xen-blkfront.c:2700:0: sparse: sparse: missing terminating " character
   drivers/block/xen-blkfront.c:2701:0: sparse: sparse: missing terminating " character
   drivers/block/xen-blkfront.c:2700:25: sparse: sparse: Expected ) in function call
   drivers/block/xen-blkfront.c:2700:25: sparse: sparse: got The

# https://github.com/0day-ci/linux/commit/1997467d18e784a64ee0fe00875492e9605f6147
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 1997467d18e784a64ee0fe00875492e9605f6147
vim +2700 drivers/block/xen-blkfront.c

9f27ee59503865 Jeremy Fitzhardinge 2007-07-17  2672  
1997467d18e784 Munehisa Kamata     2020-05-19  2673  static int blkfront_freeze(struct xenbus_device *dev)
1997467d18e784 Munehisa Kamata     2020-05-19  2674  {
1997467d18e784 Munehisa Kamata     2020-05-19  2675  	unsigned int i;
1997467d18e784 Munehisa Kamata     2020-05-19  2676  	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
1997467d18e784 Munehisa Kamata     2020-05-19  2677  	struct blkfront_ring_info *rinfo;
1997467d18e784 Munehisa Kamata     2020-05-19  2678  	/* This would be reasonable timeout as used in xenbus_dev_shutdown() */
1997467d18e784 Munehisa Kamata     2020-05-19  2679  	unsigned int timeout = 5 * HZ;
1997467d18e784 Munehisa Kamata     2020-05-19  2680  	unsigned long flags;
1997467d18e784 Munehisa Kamata     2020-05-19  2681  	int err = 0;
1997467d18e784 Munehisa Kamata     2020-05-19  2682  
1997467d18e784 Munehisa Kamata     2020-05-19  2683  	info->connected = BLKIF_STATE_FREEZING;
1997467d18e784 Munehisa Kamata     2020-05-19  2684  
1997467d18e784 Munehisa Kamata     2020-05-19  2685  	blk_mq_freeze_queue(info->rq);
1997467d18e784 Munehisa Kamata     2020-05-19  2686  	blk_mq_quiesce_queue(info->rq);
1997467d18e784 Munehisa Kamata     2020-05-19  2687  
1997467d18e784 Munehisa Kamata     2020-05-19  2688  	for_each_rinfo(info, rinfo, i) {
1997467d18e784 Munehisa Kamata     2020-05-19  2689  	    /* No more gnttab callback work. */
1997467d18e784 Munehisa Kamata     2020-05-19  2690  	    gnttab_cancel_free_callback(&rinfo->callback);
1997467d18e784 Munehisa Kamata     2020-05-19  2691  	    /* Flush gnttab callback work. Must be done with no locks held. */
1997467d18e784 Munehisa Kamata     2020-05-19  2692  	    flush_work(&rinfo->work);
1997467d18e784 Munehisa Kamata     2020-05-19  2693  	}
1997467d18e784 Munehisa Kamata     2020-05-19  2694  
1997467d18e784 Munehisa Kamata     2020-05-19  2695  	for_each_rinfo(info, rinfo, i) {
1997467d18e784 Munehisa Kamata     2020-05-19  2696  	    spin_lock_irqsave(&rinfo->ring_lock, flags);
1997467d18e784 Munehisa Kamata     2020-05-19  2697  	    if (RING_FULL(&rinfo->ring)
1997467d18e784 Munehisa Kamata     2020-05-19  2698  		    || RING_HAS_UNCONSUMED_RESPONSES(&rinfo->ring)) {
1997467d18e784 Munehisa Kamata     2020-05-19  2699  		xenbus_dev_error(dev, err, "Hibernation Failed.
1997467d18e784 Munehisa Kamata     2020-05-19 @2700  			The ring is still busy");
1997467d18e784 Munehisa Kamata     2020-05-19  2701  		info->connected = BLKIF_STATE_CONNECTED;
1997467d18e784 Munehisa Kamata     2020-05-19  2702  		spin_unlock_irqrestore(&rinfo->ring_lock, flags);
1997467d18e784 Munehisa Kamata     2020-05-19  2703  		return -EBUSY;
1997467d18e784 Munehisa Kamata     2020-05-19  2704  	}
1997467d18e784 Munehisa Kamata     2020-05-19  2705  	    spin_unlock_irqrestore(&rinfo->ring_lock, flags);
1997467d18e784 Munehisa Kamata     2020-05-19  2706  	}
1997467d18e784 Munehisa Kamata     2020-05-19  2707  	/* Kick the backend to disconnect */
1997467d18e784 Munehisa Kamata     2020-05-19  2708  	xenbus_switch_state(dev, XenbusStateClosing);
1997467d18e784 Munehisa Kamata     2020-05-19  2709  
1997467d18e784 Munehisa Kamata     2020-05-19  2710  	/*
1997467d18e784 Munehisa Kamata     2020-05-19  2711  	 * We don't want to move forward before the frontend is diconnected
1997467d18e784 Munehisa Kamata     2020-05-19  2712  	 * from the backend cleanly.
1997467d18e784 Munehisa Kamata     2020-05-19  2713  	 */
1997467d18e784 Munehisa Kamata     2020-05-19  2714  	timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
1997467d18e784 Munehisa Kamata     2020-05-19  2715  					      timeout);
1997467d18e784 Munehisa Kamata     2020-05-19  2716  	if (!timeout) {
1997467d18e784 Munehisa Kamata     2020-05-19  2717  		err = -EBUSY;
1997467d18e784 Munehisa Kamata     2020-05-19  2718  		xenbus_dev_error(dev, err, "Freezing timed out;"
1997467d18e784 Munehisa Kamata     2020-05-19  2719  				 "the device may become inconsistent state");
1997467d18e784 Munehisa Kamata     2020-05-19  2720  	}
1997467d18e784 Munehisa Kamata     2020-05-19  2721  
1997467d18e784 Munehisa Kamata     2020-05-19  2722  	return err;
1997467d18e784 Munehisa Kamata     2020-05-19  2723  }
1997467d18e784 Munehisa Kamata     2020-05-19  2724  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 74002 bytes --]


Thread overview: 73+ messages
2020-05-19 23:24 [PATCH 00/12] Fix PM hibernation in Xen guests Anchal Agarwal
2020-05-19 23:24 ` Anchal Agarwal
2020-05-19 23:24 ` [PATCH 01/12] xen/manage: keep track of the on-going suspend mode Anchal Agarwal
2020-05-19 23:24   ` Anchal Agarwal
2020-05-30 22:26   ` Boris Ostrovsky
2020-06-01 21:00     ` Agarwal, Anchal
2020-06-01 22:39       ` Boris Ostrovsky
2020-05-19 23:25 ` [PATCH 02/12] xenbus: add freeze/thaw/restore callbacks support Anchal Agarwal
2020-05-19 23:25   ` Anchal Agarwal
2020-05-30 22:56   ` Boris Ostrovsky
2020-06-01 23:36     ` Agarwal, Anchal
2020-05-19 23:25 ` [PATCH 03/12] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume Anchal Agarwal
2020-05-19 23:25   ` Anchal Agarwal
2020-05-30 23:02   ` Boris Ostrovsky
2020-06-04 23:03     ` Anchal Agarwal
2020-06-04 23:03       ` Anchal Agarwal
2020-06-05 21:39       ` Boris Ostrovsky
2020-06-05 21:39         ` Boris Ostrovsky
2020-06-08 16:52         ` Anchal Agarwal
2020-06-08 16:52           ` Anchal Agarwal
2020-06-08 18:49           ` Boris Ostrovsky
2020-06-08 18:49             ` Boris Ostrovsky
2020-05-19 23:26 ` [PATCH 04/12] x86/xen: add system core suspend and resume callbacks Anchal Agarwal
2020-05-19 23:26   ` Anchal Agarwal
2020-05-30 23:10   ` Boris Ostrovsky
2020-06-03 22:40     ` Agarwal, Anchal
2020-06-05 21:24       ` Boris Ostrovsky
2020-06-08 17:09         ` Anchal Agarwal
2020-06-08 17:09           ` Anchal Agarwal
2020-05-19 23:26 ` [PATCH 05/12] genirq: Shutdown irq chips in suspend/resume during hibernation Anchal Agarwal
2020-05-19 23:34   ` Anchal Agarwal
2020-05-19 23:34   ` Anchal Agarwal
2020-05-19 23:26   ` Anchal Agarwal
2020-05-19 23:29   ` Singh, Balbir
2020-05-19 23:29     ` Singh, Balbir
2020-05-19 23:36     ` Agarwal, Anchal
2020-05-19 23:36       ` Agarwal, Anchal
2020-05-30 23:17   ` Boris Ostrovsky
2020-06-01 20:46     ` Agarwal, Anchal
2020-05-19 23:27 ` [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation Anchal Agarwal
2020-05-19 23:27   ` Anchal Agarwal
2020-05-20  5:00   ` kbuild test robot
2020-05-20  5:00     ` kbuild test robot
2020-05-20  5:00     ` kbuild test robot
2020-05-20  5:07   ` kbuild test robot
2020-05-20  5:07     ` kbuild test robot
2020-05-21 23:48   ` Anchal Agarwal
2020-05-21 23:48     ` Anchal Agarwal
2020-05-22  1:43     ` Singh, Balbir
2020-05-22  1:43       ` Singh, Balbir
2020-05-23 12:32   ` kbuild test robot
2020-05-23 12:32     ` kbuild test robot
2020-05-28 12:30   ` Roger Pau Monné
2020-05-28 12:30     ` Roger Pau Monné
2020-05-19 23:28 ` [PATCH 07/12] xen-netfront: " Anchal Agarwal
2020-05-19 23:28   ` Anchal Agarwal
2020-05-19 23:28 ` [PATCH 08/12] xen/time: introduce xen_{save,restore}_steal_clock Anchal Agarwal
2020-05-19 23:28   ` Anchal Agarwal
2020-05-30 23:32   ` Boris Ostrovsky
2020-05-19 23:28 ` [PATCH 09/12] x86/xen: save and restore steal clock Anchal Agarwal
2020-05-19 23:28   ` Anchal Agarwal
2020-05-30 23:44   ` Boris Ostrovsky
2020-06-04 18:33     ` Anchal Agarwal
2020-06-04 18:33       ` Anchal Agarwal
2020-05-19 23:29 ` [PATCH 10/12] xen: Introduce wrapper for save/restore sched clock offset Anchal Agarwal
2020-05-19 23:29   ` Anchal Agarwal
2020-05-19 23:29 ` [PATCH 11/12] xen: Update sched clock offset to avoid system instability in hibernation Anchal Agarwal
2020-05-19 23:29   ` Anchal Agarwal
2020-05-19 23:29 ` [PATCH 12/12] PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA Anchal Agarwal
2020-05-19 23:29   ` Anchal Agarwal
2020-05-28 17:59 ` [PATCH 00/12] Fix PM hibernation in Xen guests Agarwal, Anchal
2020-05-28 17:59   ` Agarwal, Anchal
2020-05-20 10:09 [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation kbuild test robot
