linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger@xen.org>
To: Anchal Agarwal <anchalag@amazon.com>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	hpa@zytor.com, x86@kernel.org, boris.ostrovsky@oracle.com,
	jgross@suse.com, linux-pm@vger.kernel.org, linux-mm@kvack.org,
	kamatam@amazon.com, sstabellini@kernel.org,
	konrad.wilk@oracle.com, axboe@kernel.dk, davem@davemloft.net,
	rjw@rjwysocki.net, len.brown@intel.com, pavel@ucw.cz,
	peterz@infradead.org, eduval@amazon.com, sblbir@amazon.com,
	xen-devel@lists.xenproject.org, vkuznets@redhat.com,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	dwmw@amazon.co.uk, benh@kernel.crashing.org
Subject: Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
Date: Thu, 28 May 2020 14:30:12 +0200	[thread overview]
Message-ID: <20200528122956.GF1195@Air-de-Roger> (raw)
In-Reply-To: <ad580b4d5b76c18fe2fe409704f25622e01af361.1589926004.git.anchalag@amazon.com>

On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> From: Munehisa Kamata <kamatam@amazon.com>
> 
> S4 power transition states are much different than xen
> suspend/resume. Former is visible to the guest and frontend drivers should
> be aware of the state transitions and should be able to take appropriate
> actions when needed. In transition to S4 we need to make sure that at least
> all the in-flight blkif requests get completed, since they probably contain
> bits of the guest's memory image and that's not going to get saved any
> other way. Hence, re-issuing of in-flight requests as in case of xen resume
> will not work here. This is in contrast to xen-suspend where we need to
> freeze with as little processing as possible to avoid dirtying RAM late in
> the migration cycle and we know that in-flight data can wait.
> 
> Add freeze, thaw and restore callbacks for PM suspend and hibernation
> support. All frontend drivers that needs to use PM_HIBERNATION/PM_SUSPEND
> events, need to implement these xenbus_driver callbacks. The freeze handler
> stops block-layer queue and disconnect the frontend from the backend while
> freeing ring_info and associated resources. Before disconnecting from the
> backend, we need to prevent any new IO from being queued and wait for existing
> IO to complete. Freeze/unfreeze of the queues will guarantee that there are no
> requests in use on the shared ring. However, for sanity we should check
> state of the ring before disconnecting to make sure that there are no
> outstanding requests to be processed on the ring. The restore handler
> re-allocates ring_info, unquiesces and unfreezes the queue and re-connect to
> the backend, so that rest of the kernel can continue to use the block device
> transparently.
> 
> Note:For older backends,if a backend doesn't have commit'12ea729645ace'
> xen/blkback: unmap all persistent grants when frontend gets disconnected,
> the frontend may see massive amount of grant table warning when freeing
> resources.
> [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
> 
> In this case, persistent grants would need to be disabled.
> 
> [Anchal Changelog: Removed timeout/request during blkfront freeze.
> Reworked the whole patch to work with blk-mq and incorporate upstream's
> comments]

Please tag versions using vX and it would be helpful if you could list
the specific changes that you performed between versions. There where
3 RFC versions IIRC, and there's no log of the changes between them.

> 
> Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
> Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
> ---
>  drivers/block/xen-blkfront.c | 122 +++++++++++++++++++++++++++++++++--
>  1 file changed, 115 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 3b889ea950c2..464863ed7093 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -48,6 +48,8 @@
>  #include <linux/list.h>
>  #include <linux/workqueue.h>
>  #include <linux/sched/mm.h>
> +#include <linux/completion.h>
> +#include <linux/delay.h>
>  
>  #include <xen/xen.h>
>  #include <xen/xenbus.h>
> @@ -80,6 +82,8 @@ enum blkif_state {
>  	BLKIF_STATE_DISCONNECTED,
>  	BLKIF_STATE_CONNECTED,
>  	BLKIF_STATE_SUSPENDED,
> +	BLKIF_STATE_FREEZING,
> +	BLKIF_STATE_FROZEN

Nit: adding a terminating ',' would prevent further additions from
having to modify this line.

>  };
>  
>  struct grant {
> @@ -219,6 +223,7 @@ struct blkfront_info
>  	struct list_head requests;
>  	struct bio_list bio_list;
>  	struct list_head info_list;
> +	struct completion wait_backend_disconnected;
>  };
>  
>  static unsigned int nr_minors;
> @@ -1005,6 +1010,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
>  	info->sector_size = sector_size;
>  	info->physical_sector_size = physical_sector_size;
>  	blkif_set_queue_limits(info);
> +	init_completion(&info->wait_backend_disconnected);
>  
>  	return 0;
>  }
> @@ -1057,7 +1063,7 @@ static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
>  		case XEN_SCSI_DISK5_MAJOR:
>  		case XEN_SCSI_DISK6_MAJOR:
>  		case XEN_SCSI_DISK7_MAJOR:
> -			*offset = (*minor / PARTS_PER_DISK) + 
> +			*offset = (*minor / PARTS_PER_DISK) +
>  				((major - XEN_SCSI_DISK1_MAJOR + 1) * 16) +
>  				EMULATED_SD_DISK_NAME_OFFSET;
>  			*minor = *minor +
> @@ -1072,7 +1078,7 @@ static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
>  		case XEN_SCSI_DISK13_MAJOR:
>  		case XEN_SCSI_DISK14_MAJOR:
>  		case XEN_SCSI_DISK15_MAJOR:
> -			*offset = (*minor / PARTS_PER_DISK) + 
> +			*offset = (*minor / PARTS_PER_DISK) +

Unrelated changes, please split to a pre-patch.

>  				((major - XEN_SCSI_DISK8_MAJOR + 8) * 16) +
>  				EMULATED_SD_DISK_NAME_OFFSET;
>  			*minor = *minor +
> @@ -1353,6 +1359,8 @@ static void blkif_free(struct blkfront_info *info, int suspend)
>  	unsigned int i;
>  	struct blkfront_ring_info *rinfo;
>  
> +	if (info->connected == BLKIF_STATE_FREEZING)
> +		goto free_rings;
>  	/* Prevent new requests being issued until we fix things up. */
>  	info->connected = suspend ?
>  		BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
> @@ -1360,6 +1368,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
>  	if (info->rq)
>  		blk_mq_stop_hw_queues(info->rq);
>  
> +free_rings:
>  	for_each_rinfo(info, rinfo, i)
>  		blkif_free_ring(rinfo);
>  
> @@ -1563,8 +1572,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
>  	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
>  	struct blkfront_info *info = rinfo->dev_info;
>  
> -	if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
> -		return IRQ_HANDLED;
> +	if (unlikely(info->connected != BLKIF_STATE_CONNECTED
> +		    && info->connected != BLKIF_STATE_FREEZING)){

Extra tab and missing space between '){'. Also my preference would be
for the && to go at the end of the previous line, like it's done
elsewhere in the file.

> +	    return IRQ_HANDLED;
> +	}
>  
>  	spin_lock_irqsave(&rinfo->ring_lock, flags);
>   again:
> @@ -2027,6 +2038,7 @@ static int blkif_recover(struct blkfront_info *info)
>  	unsigned int segs;
>  	struct blkfront_ring_info *rinfo;
>  
> +	bool frozen = info->connected == BLKIF_STATE_FROZEN;

Please put this together with the rest of the variable definitions,
and leave the empty line as a split between variable definitions and
code. I've already requested this on RFC v3 but you seem to have
dropped some of the requests I've made there.

>  	blkfront_gather_backend_features(info);
>  	/* Reset limits changed by blk_mq_update_nr_hw_queues(). */
>  	blkif_set_queue_limits(info);
> @@ -2048,6 +2060,9 @@ static int blkif_recover(struct blkfront_info *info)
>  		kick_pending_request_queues(rinfo);
>  	}
>  
> +	if (frozen)
> +		return 0;
> +
>  	list_for_each_entry_safe(req, n, &info->requests, queuelist) {
>  		/* Requeue pending requests (flush or discard) */
>  		list_del_init(&req->queuelist);
> @@ -2364,6 +2379,7 @@ static void blkfront_connect(struct blkfront_info *info)
>  
>  		return;
>  	case BLKIF_STATE_SUSPENDED:
> +	case BLKIF_STATE_FROZEN:
>  		/*
>  		 * If we are recovering from suspension, we need to wait
>  		 * for the backend to announce it's features before
> @@ -2481,12 +2497,36 @@ static void blkback_changed(struct xenbus_device *dev,
>  		break;
>  
>  	case XenbusStateClosed:
> -		if (dev->state == XenbusStateClosed)
> +		if (dev->state == XenbusStateClosed) {
> +			if (info->connected == BLKIF_STATE_FREEZING) {
> +				blkif_free(info, 0);
> +				info->connected = BLKIF_STATE_FROZEN;
> +				complete(&info->wait_backend_disconnected);
> +				break;

There's no need for the break here, you can rely on the break below.

> +			}
> +
>  			break;
> +		}
> +
> +		/*
> +		 * We may somehow receive backend's Closed again while thawing
> +		 * or restoring and it causes thawing or restoring to fail.
> +		 * Ignore such unexpected state regardless of the backend state.
> +		 */
> +		if (info->connected == BLKIF_STATE_FROZEN) {

I think you can join this with the previous dev->state == XenbusStateClosed?

Also, won't the device be in the Closed state already if it's in state
frozen?

> +			dev_dbg(&dev->dev,
> +					"ignore the backend's Closed state: %s",
> +					dev->nodename);
> +			break;
> +		}
>  		/* fall through */
>  	case XenbusStateClosing:
> -		if (info)
> -			blkfront_closing(info);
> +		if (info) {
> +			if (info->connected == BLKIF_STATE_FREEZING)
> +				xenbus_frontend_closed(dev);
> +			else
> +				blkfront_closing(info);
> +		}
>  		break;
>  	}
>  }
> @@ -2630,6 +2670,71 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
>  	mutex_unlock(&blkfront_mutex);
>  }
>  
> +static int blkfront_freeze(struct xenbus_device *dev)
> +{
> +	unsigned int i;
> +	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> +	struct blkfront_ring_info *rinfo;
> +	/* This would be reasonable timeout as used in xenbus_dev_shutdown() */
> +	unsigned int timeout = 5 * HZ;
> +	unsigned long flags;
> +	int err = 0;
> +
> +	info->connected = BLKIF_STATE_FREEZING;
> +
> +	blk_mq_freeze_queue(info->rq);
> +	blk_mq_quiesce_queue(info->rq);
> +
> +	for_each_rinfo(info, rinfo, i) {
> +	    /* No more gnttab callback work. */
> +	    gnttab_cancel_free_callback(&rinfo->callback);
> +	    /* Flush gnttab callback work. Must be done with no locks held. */
> +	    flush_work(&rinfo->work);
> +	}
> +
> +	for_each_rinfo(info, rinfo, i) {
> +	    spin_lock_irqsave(&rinfo->ring_lock, flags);
> +	    if (RING_FULL(&rinfo->ring)
> +		    || RING_HAS_UNCONSUMED_RESPONSES(&rinfo->ring)) {

'||' should go at the end of the previous line.

> +		xenbus_dev_error(dev, err, "Hibernation Failed.
> +			The ring is still busy");
> +		info->connected = BLKIF_STATE_CONNECTED;
> +		spin_unlock_irqrestore(&rinfo->ring_lock, flags);

You need to unfreeze the queues here, or else the device will be in a
blocked state AFAICT.

> +		return -EBUSY;
> +	}
> +	    spin_unlock_irqrestore(&rinfo->ring_lock, flags);
> +	}

This block has indentation all messed up.

> +	/* Kick the backend to disconnect */
> +	xenbus_switch_state(dev, XenbusStateClosing);
> +
> +	/*
> +	 * We don't want to move forward before the frontend is diconnected
> +	 * from the backend cleanly.
> +	 */
> +	timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
> +					      timeout);
> +	if (!timeout) {
> +		err = -EBUSY;

Note err is only used here, and I think could just be dropped.

> +		xenbus_dev_error(dev, err, "Freezing timed out;"
> +				 "the device may become inconsistent state");

Leaving the device in this state is quite bad, as it's in a closed
state and with the queues frozen. You should make an attempt to
restore things to a working state.

> +	}
> +
> +	return err;
> +}
> +
> +static int blkfront_restore(struct xenbus_device *dev)
> +{
> +	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> +	int err = 0;
> +
> +	err = talk_to_blkback(dev, info);
> +	blk_mq_unquiesce_queue(info->rq);
> +	blk_mq_unfreeze_queue(info->rq);
> +	if (!err)
> +	    blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);

Bad indentation. Also shouldn't you first update the queues and then
unfreeze them?

Thanks, Roger.

  parent reply	other threads:[~2020-05-28 13:10 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-19 23:24 [PATCH 00/12] Fix PM hibernation in Xen guests Anchal Agarwal
2020-05-19 23:24 ` [PATCH 01/12] xen/manage: keep track of the on-going suspend mode Anchal Agarwal
2020-05-30 22:26   ` Boris Ostrovsky
2020-06-01 21:00     ` Agarwal, Anchal
2020-06-01 22:39       ` Boris Ostrovsky
2020-05-19 23:25 ` [PATCH 02/12] xenbus: add freeze/thaw/restore callbacks support Anchal Agarwal
2020-05-30 22:56   ` Boris Ostrovsky
2020-06-01 23:36     ` Agarwal, Anchal
2020-05-19 23:25 ` [PATCH 03/12] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume Anchal Agarwal
2020-05-30 23:02   ` Boris Ostrovsky
2020-06-04 23:03     ` Anchal Agarwal
2020-06-05 21:39       ` Boris Ostrovsky
2020-06-08 16:52         ` Anchal Agarwal
2020-06-08 18:49           ` Boris Ostrovsky
2020-05-19 23:26 ` [PATCH 04/12] x86/xen: add system core suspend and resume callbacks Anchal Agarwal
2020-05-30 23:10   ` Boris Ostrovsky
2020-06-03 22:40     ` Agarwal, Anchal
2020-06-05 21:24       ` Boris Ostrovsky
2020-06-08 17:09         ` Anchal Agarwal
2020-05-19 23:26 ` [PATCH 05/12] genirq: Shutdown irq chips in suspend/resume during hibernation Anchal Agarwal
2020-05-19 23:29   ` Singh, Balbir
2020-05-19 23:36     ` Agarwal, Anchal
2020-05-19 23:34   ` Anchal Agarwal
2020-05-30 23:17   ` Boris Ostrovsky
2020-06-01 20:46     ` Agarwal, Anchal
2020-05-19 23:27 ` [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation Anchal Agarwal
2020-05-20  5:00   ` kbuild test robot
2020-05-20  5:07   ` kbuild test robot
2020-05-21 23:48   ` Anchal Agarwal
2020-05-22  1:43     ` Singh, Balbir
2020-05-28 12:30   ` Roger Pau Monné [this message]
2020-05-19 23:28 ` [PATCH 07/12] xen-netfront: " Anchal Agarwal
2020-05-19 23:28 ` [PATCH 08/12] xen/time: introduce xen_{save,restore}_steal_clock Anchal Agarwal
2020-05-30 23:32   ` Boris Ostrovsky
2020-05-19 23:28 ` [PATCH 09/12] x86/xen: save and restore steal clock Anchal Agarwal
2020-05-30 23:44   ` Boris Ostrovsky
2020-06-04 18:33     ` Anchal Agarwal
2020-05-19 23:29 ` [PATCH 10/12] xen: Introduce wrapper for save/restore sched clock offset Anchal Agarwal
2020-05-19 23:29 ` [PATCH 11/12] xen: Update sched clock offset to avoid system instability in hibernation Anchal Agarwal
2020-05-19 23:29 ` [PATCH 12/12] PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA Anchal Agarwal
2020-05-28 17:59 ` [PATCH 00/12] Fix PM hibernation in Xen guests Agarwal, Anchal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200528122956.GF1195@Air-de-Roger \
    --to=roger@xen.org \
    --cc=anchalag@amazon.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=davem@davemloft.net \
    --cc=dwmw@amazon.co.uk \
    --cc=eduval@amazon.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=kamatam@amazon.com \
    --cc=konrad.wilk@oracle.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=peterz@infradead.org \
    --cc=rjw@rjwysocki.net \
    --cc=sblbir@amazon.com \
    --cc=sstabellini@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).