netdev.vger.kernel.org archive mirror
* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
@ 2020-06-03 23:33 Agarwal, Anchal
  2020-06-04  7:05 ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Agarwal, Anchal @ 2020-06-03 23:33 UTC (permalink / raw)
  To: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, roger.pau,
	axboe, davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo,
	Singh, Balbir, xen-devel, vkuznets, netdev, linux-kernel,
	Woodhouse, David, benh

    On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
    > From: Munehisa Kamata <kamatam@amazon.com>
    > 
    > S4 power transition states are very different from xen
    > suspend/resume. The former is visible to the guest and frontend drivers
    > should be aware of the state transitions and should be able to take
    > appropriate actions when needed. In the transition to S4 we need to make
    > sure that at least all the in-flight blkif requests get completed, since
    > they probably contain bits of the guest's memory image and that's not
    > going to get saved any other way. Hence, re-issuing in-flight requests
    > as in the case of xen resume will not work here. This is in contrast to
    > xen-suspend, where we need to freeze with as little processing as
    > possible to avoid dirtying RAM late in the migration cycle, and we know
    > that in-flight data can wait.
    > 
    > Add freeze, thaw and restore callbacks for PM suspend and hibernation
    > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
    > events need to implement these xenbus_driver callbacks. The freeze
    > handler stops the block-layer queue and disconnects the frontend from
    > the backend while freeing ring_info and associated resources. Before
    > disconnecting from the backend, we need to prevent any new IO from
    > being queued and wait for existing IO to complete. Freeze/unfreeze of
    > the queues will guarantee that there are no requests in use on the
    > shared ring. However, for sanity we should check the state of the ring
    > before disconnecting to make sure that there are no outstanding
    > requests to be processed on the ring. The restore handler re-allocates
    > ring_info, unquiesces and unfreezes the queue and re-connects to the
    > backend, so that the rest of the kernel can continue to use the block
    > device transparently.
    > 
    > Note: For older backends, if a backend doesn't have commit 12ea729645ace
    > ("xen/blkback: unmap all persistent grants when frontend gets
    > disconnected"), the frontend may see a massive amount of grant table
    > warnings when freeing resources.
    > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
    > [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
    > 
    > In this case, persistent grants would need to be disabled.
    > 
    > [Anchal Changelog: Removed timeout/request during blkfront freeze.
    > Reworked the whole patch to work with blk-mq and incorporate upstream's
    > comments]

    Please tag versions using vX and it would be helpful if you could list
    the specific changes that you performed between versions. There were
    3 RFC versions IIRC, and there's no log of the changes between them.

I will elaborate on "upstream's comments" in my changelog in my next round of patches.
    > 
    > Signed-off-by: Anchal Agarwal <anchalag@amazon.com>
    > Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
    > ---
    >  drivers/block/xen-blkfront.c | 122 +++++++++++++++++++++++++++++++++--
    >  1 file changed, 115 insertions(+), 7 deletions(-)
    > 
    > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
    > index 3b889ea950c2..464863ed7093 100644
    > --- a/drivers/block/xen-blkfront.c
    > +++ b/drivers/block/xen-blkfront.c
    > @@ -48,6 +48,8 @@
    >  #include <linux/list.h>
    >  #include <linux/workqueue.h>
    >  #include <linux/sched/mm.h>
    > +#include <linux/completion.h>
    > +#include <linux/delay.h>
    > 
    >  #include <xen/xen.h>
    >  #include <xen/xenbus.h>
    > @@ -80,6 +82,8 @@ enum blkif_state {
    >       BLKIF_STATE_DISCONNECTED,
    >       BLKIF_STATE_CONNECTED,
    >       BLKIF_STATE_SUSPENDED,
    > +     BLKIF_STATE_FREEZING,
    > +     BLKIF_STATE_FROZEN

    Nit: adding a terminating ',' would prevent further additions from
    having to modify this line.
ACK.
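For reference, the resulting enum with the terminating comma, which keeps any future state addition to a one-line diff:

    enum blkif_state {
            BLKIF_STATE_DISCONNECTED,
            BLKIF_STATE_CONNECTED,
            BLKIF_STATE_SUSPENDED,
            BLKIF_STATE_FREEZING,
            BLKIF_STATE_FROZEN,
    };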
    >  };
    > 
    >  struct grant {
    > @@ -219,6 +223,7 @@ struct blkfront_info
    >       struct list_head requests;
    >       struct bio_list bio_list;
    >       struct list_head info_list;
    > +     struct completion wait_backend_disconnected;
    >  };
    > 
    >  static unsigned int nr_minors;
    > @@ -1005,6 +1010,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
    >       info->sector_size = sector_size;
    >       info->physical_sector_size = physical_sector_size;
    >       blkif_set_queue_limits(info);
    > +     init_completion(&info->wait_backend_disconnected);
    > 
    >       return 0;
    >  }
    > @@ -1057,7 +1063,7 @@ static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
    >               case XEN_SCSI_DISK5_MAJOR:
    >               case XEN_SCSI_DISK6_MAJOR:
    >               case XEN_SCSI_DISK7_MAJOR:
    > -                     *offset = (*minor / PARTS_PER_DISK) +
    > +                     *offset = (*minor / PARTS_PER_DISK) +
    >                               ((major - XEN_SCSI_DISK1_MAJOR + 1) * 16) +
    >                               EMULATED_SD_DISK_NAME_OFFSET;
    >                       *minor = *minor +
    > @@ -1072,7 +1078,7 @@ static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
    >               case XEN_SCSI_DISK13_MAJOR:
    >               case XEN_SCSI_DISK14_MAJOR:
    >               case XEN_SCSI_DISK15_MAJOR:
    > -                     *offset = (*minor / PARTS_PER_DISK) +
    > +                     *offset = (*minor / PARTS_PER_DISK) +

    Unrelated changes, please split to a pre-patch.

I must admit, this may have occurred due to some issues with my local vim settings. I will fix this.
    >                               ((major - XEN_SCSI_DISK8_MAJOR + 8) * 16) +
    >                               EMULATED_SD_DISK_NAME_OFFSET;
    >                       *minor = *minor +
    > @@ -1353,6 +1359,8 @@ static void blkif_free(struct blkfront_info *info, int suspend)
    >       unsigned int i;
    >       struct blkfront_ring_info *rinfo;
    > 
    > +     if (info->connected == BLKIF_STATE_FREEZING)
    > +             goto free_rings;
    >       /* Prevent new requests being issued until we fix things up. */
    >       info->connected = suspend ?
    >               BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
    > @@ -1360,6 +1368,7 @@ static void blkif_free(struct blkfront_info *info, int suspend)
    >       if (info->rq)
    >               blk_mq_stop_hw_queues(info->rq);
    > 
    > +free_rings:
    >       for_each_rinfo(info, rinfo, i)
    >               blkif_free_ring(rinfo);
    > 
    > @@ -1563,8 +1572,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
    >       struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
    >       struct blkfront_info *info = rinfo->dev_info;
    > 
    > -     if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
    > -             return IRQ_HANDLED;
    > +     if (unlikely(info->connected != BLKIF_STATE_CONNECTED
    > +                 && info->connected != BLKIF_STATE_FREEZING)){

    Extra tab and missing space between '){'. Also my preference would be
    for the && to go at the end of the previous line, like it's done
    elsewhere in the file.
Ok.
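That is, with both style fixes applied:

    if (unlikely(info->connected != BLKIF_STATE_CONNECTED &&
                 info->connected != BLKIF_STATE_FREEZING)) {
            return IRQ_HANDLED;
    }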
    > +         return IRQ_HANDLED;
    > +     }
    > 
    >       spin_lock_irqsave(&rinfo->ring_lock, flags);
    >   again:
    > @@ -2027,6 +2038,7 @@ static int blkif_recover(struct blkfront_info *info)
    >       unsigned int segs;
    >       struct blkfront_ring_info *rinfo;
    > 
    > +     bool frozen = info->connected == BLKIF_STATE_FROZEN;

    Please put this together with the rest of the variable definitions,
    and leave the empty line as a split between variable definitions and
    code. I've already requested this on RFC v3 but you seem to have
    dropped some of the requests I've made there.

I am sorry if I missed this. I think I fixed everything you commented on, but maybe this got left out. Will fix.
I will cross-check RFC v3 to see if I missed anything else.
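For reference, the requested placement (showing only the declarations visible in the quoted hunk):

    unsigned int segs;
    struct blkfront_ring_info *rinfo;
    bool frozen = info->connected == BLKIF_STATE_FROZEN;

    blkfront_gather_backend_features(info);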

    >       blkfront_gather_backend_features(info);
    >       /* Reset limits changed by blk_mq_update_nr_hw_queues(). */
    >       blkif_set_queue_limits(info);
    > @@ -2048,6 +2060,9 @@ static int blkif_recover(struct blkfront_info *info)
    >               kick_pending_request_queues(rinfo);
    >       }
    > 
    > +     if (frozen)
    > +             return 0;
    > +
    >       list_for_each_entry_safe(req, n, &info->requests, queuelist) {
    >               /* Requeue pending requests (flush or discard) */
    >               list_del_init(&req->queuelist);
    > @@ -2364,6 +2379,7 @@ static void blkfront_connect(struct blkfront_info *info)
    > 
    >               return;
    >       case BLKIF_STATE_SUSPENDED:
    > +     case BLKIF_STATE_FROZEN:
    >               /*
    >                * If we are recovering from suspension, we need to wait
    >                * for the backend to announce it's features before
    > @@ -2481,12 +2497,36 @@ static void blkback_changed(struct xenbus_device *dev,
    >               break;
    > 
    >       case XenbusStateClosed:
    > -             if (dev->state == XenbusStateClosed)
    > +             if (dev->state == XenbusStateClosed) {
    > +                     if (info->connected == BLKIF_STATE_FREEZING) {
    > +                             blkif_free(info, 0);
    > +                             info->connected = BLKIF_STATE_FROZEN;
    > +                             complete(&info->wait_backend_disconnected);
    > +                             break;

    There's no need for the break here, you can rely on the break below.
Ack.
    > +                     }
    > +
    >                       break;
    > +             }
    > +
    > +             /*
    > +              * We may somehow receive backend's Closed again while thawing
    > +              * or restoring and it causes thawing or restoring to fail.
    > +              * Ignore such unexpected state regardless of the backend state.
    > +              */
    > +             if (info->connected == BLKIF_STATE_FROZEN) {

    I think you can join this with the previous dev->state == XenbusStateClosed?

    Also, won't the device be in the Closed state already if it's in state
    frozen?
Yes, but I think this is mostly due to a hypothetical case where the backend switches to the Closed state during thawing.
I am not entirely sure if that could happen. Could use some expertise here.
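For concreteness, a sketch of the joined check (this sets aside the open question of whether FROZEN can ever be seen with dev->state not yet Closed):

    case XenbusStateClosed:
            if (dev->state == XenbusStateClosed) {
                    if (info->connected == BLKIF_STATE_FREEZING) {
                            blkif_free(info, 0);
                            info->connected = BLKIF_STATE_FROZEN;
                            complete(&info->wait_backend_disconnected);
                    }
                    /* a single break covers both the freezing and normal paths */
                    break;
            }
            /* fall through */
    case XenbusStateClosing: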

    > +                     dev_dbg(&dev->dev,
    > +                                     "ignore the backend's Closed state: %s",
    > +                                     dev->nodename);
    > +                     break;
    > +             }
    >               /* fall through */
    >       case XenbusStateClosing:
    > -             if (info)
    > -                     blkfront_closing(info);
    > +             if (info) {
    > +                     if (info->connected == BLKIF_STATE_FREEZING)
    > +                             xenbus_frontend_closed(dev);
    > +                     else
    > +                             blkfront_closing(info);
    > +             }
    >               break;
    >       }
    >  }
    > @@ -2630,6 +2670,71 @@ static void blkif_release(struct gendisk *disk, fmode_t mode)
    >       mutex_unlock(&blkfront_mutex);
    >  }
    > 
    > +static int blkfront_freeze(struct xenbus_device *dev)
    > +{
    > +     unsigned int i;
    > +     struct blkfront_info *info = dev_get_drvdata(&dev->dev);
    > +     struct blkfront_ring_info *rinfo;
    > +     /* This would be reasonable timeout as used in xenbus_dev_shutdown() */
    > +     unsigned int timeout = 5 * HZ;
    > +     unsigned long flags;
    > +     int err = 0;
    > +
    > +     info->connected = BLKIF_STATE_FREEZING;
    > +
    > +     blk_mq_freeze_queue(info->rq);
    > +     blk_mq_quiesce_queue(info->rq);
    > +
    > +     for_each_rinfo(info, rinfo, i) {
    > +         /* No more gnttab callback work. */
    > +         gnttab_cancel_free_callback(&rinfo->callback);
    > +         /* Flush gnttab callback work. Must be done with no locks held. */
    > +         flush_work(&rinfo->work);
    > +     }
    > +
    > +     for_each_rinfo(info, rinfo, i) {
    > +         spin_lock_irqsave(&rinfo->ring_lock, flags);
    > +         if (RING_FULL(&rinfo->ring)
    > +                 || RING_HAS_UNCONSUMED_RESPONSES(&rinfo->ring)) {

    '||' should go at the end of the previous line.
Ack.
    > +             xenbus_dev_error(dev, err, "Hibernation Failed.
    > +                     The ring is still busy");
    > +             info->connected = BLKIF_STATE_CONNECTED;
    > +             spin_unlock_irqrestore(&rinfo->ring_lock, flags);

    You need to unfreeze the queues here, or else the device will be in a
    blocked state AFAICT.
Yes, I was under the wrong assumption that if the freeze fails everything would thaw back correctly; apparently not.
I will fix this.
    > +             return -EBUSY;
    > +     }
    > +         spin_unlock_irqrestore(&rinfo->ring_lock, flags);
    > +     }

    This block has indentation all messed up.
Too many of these :(
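For the record, here is roughly how I understand the block should look with the fixes above applied ('||' at the end of the line, consistent indentation, and unfreezing on error); the unquiesce/unfreeze calls are my assumption of what undoes the freeze done at entry:

    for_each_rinfo(info, rinfo, i) {
            spin_lock_irqsave(&rinfo->ring_lock, flags);
            if (RING_FULL(&rinfo->ring) ||
                RING_HAS_UNCONSUMED_RESPONSES(&rinfo->ring)) {
                    xenbus_dev_error(dev, -EBUSY,
                                     "Hibernation failed: the ring is still busy");
                    info->connected = BLKIF_STATE_CONNECTED;
                    spin_unlock_irqrestore(&rinfo->ring_lock, flags);
                    /* Undo the freeze done at entry so the device keeps working. */
                    blk_mq_unquiesce_queue(info->rq);
                    blk_mq_unfreeze_queue(info->rq);
                    return -EBUSY;
            }
            spin_unlock_irqrestore(&rinfo->ring_lock, flags);
    }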
    > +     /* Kick the backend to disconnect */
    > +     xenbus_switch_state(dev, XenbusStateClosing);
    > +
    > +     /*
    > +      * We don't want to move forward before the frontend is diconnected
    > +      * from the backend cleanly.
    > +      */
    > +     timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
    > +                                           timeout);
    > +     if (!timeout) {
    > +             err = -EBUSY;

    Note err is only used here, and I think could just be dropped.

This err is what's being returned from the function. Am I missing anything?

    > +             xenbus_dev_error(dev, err, "Freezing timed out;"
    > +                              "the device may become inconsistent state");

    Leaving the device in this state is quite bad, as it's in a closed
    state and with the queues frozen. You should make an attempt to
    restore things to a working state.

You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.

    > +     }
    > +
    > +     return err;
    > +}
    > +
    > +static int blkfront_restore(struct xenbus_device *dev)
    > +{
    > +     struct blkfront_info *info = dev_get_drvdata(&dev->dev);
    > +     int err = 0;
    > +
    > +     err = talk_to_blkback(dev, info);
    > +     blk_mq_unquiesce_queue(info->rq);
    > +     blk_mq_unfreeze_queue(info->rq);
    > +     if (!err)
    > +         blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);

    Bad indentation. Also shouldn't you first update the queues and then
    unfreeze them?
Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes the queue
So I don't think the order could be reversed.

    Thanks, Roger.

Thanks,
Anchal




* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
  2020-06-03 23:33 [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation Agarwal, Anchal
@ 2020-06-04  7:05 ` Roger Pau Monné
  2020-06-16 21:49   ` Anchal Agarwal
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-04  7:05 UTC (permalink / raw)
  To: Agarwal, Anchal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

Hello,

On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
>     On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
>     > From: Munehisa Kamata <kamatam@amazon.com>
>     > 
>     > S4 power transition states are very different from xen
>     > suspend/resume. The former is visible to the guest and frontend drivers
>     > should be aware of the state transitions and should be able to take
>     > appropriate actions when needed. In the transition to S4 we need to make
>     > sure that at least all the in-flight blkif requests get completed, since
>     > they probably contain bits of the guest's memory image and that's not
>     > going to get saved any other way. Hence, re-issuing in-flight requests
>     > as in the case of xen resume will not work here. This is in contrast to
>     > xen-suspend, where we need to freeze with as little processing as
>     > possible to avoid dirtying RAM late in the migration cycle, and we know
>     > that in-flight data can wait.
>     > 
>     > Add freeze, thaw and restore callbacks for PM suspend and hibernation
>     > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
>     > events need to implement these xenbus_driver callbacks. The freeze
>     > handler stops the block-layer queue and disconnects the frontend from
>     > the backend while freeing ring_info and associated resources. Before
>     > disconnecting from the backend, we need to prevent any new IO from
>     > being queued and wait for existing IO to complete. Freeze/unfreeze of
>     > the queues will guarantee that there are no requests in use on the
>     > shared ring. However, for sanity we should check the state of the ring
>     > before disconnecting to make sure that there are no outstanding
>     > requests to be processed on the ring. The restore handler re-allocates
>     > ring_info, unquiesces and unfreezes the queue and re-connects to the
>     > backend, so that the rest of the kernel can continue to use the block
>     > device transparently.
>     > 
>     > Note: For older backends, if a backend doesn't have commit 12ea729645ace
>     > ("xen/blkback: unmap all persistent grants when frontend gets
>     > disconnected"), the frontend may see a massive amount of grant table
>     > warnings when freeing resources.
>     > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
>     > [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
>     > 
>     > In this case, persistent grants would need to be disabled.
>     > 
>     > [Anchal Changelog: Removed timeout/request during blkfront freeze.
>     > Reworked the whole patch to work with blk-mq and incorporate upstream's
>     > comments]
> 
>     Please tag versions using vX and it would be helpful if you could list
>     the specific changes that you performed between versions. There were
>     3 RFC versions IIRC, and there's no log of the changes between them.
> 
> I will elaborate on "upstream's comments" in my changelog in my next round of patches.

Sorry for being picky, but can you please make sure your email client
properly quotes previous emails on reply. Note the lack of '>' added
to the quoted parts of your reply.

>     > +                     }
>     > +
>     >                       break;
>     > +             }
>     > +
>     > +             /*
>     > +              * We may somehow receive backend's Closed again while thawing
>     > +              * or restoring and it causes thawing or restoring to fail.
>     > +              * Ignore such unexpected state regardless of the backend state.
>     > +              */
>     > +             if (info->connected == BLKIF_STATE_FROZEN) {
> 
>     I think you can join this with the previous dev->state == XenbusStateClosed?
> 
>     Also, won't the device be in the Closed state already if it's in state
>     frozen?
> Yes, but I think this is mostly due to a hypothetical case where the backend switches to the Closed state during thawing.
> I am not entirely sure if that could happen. Could use some expertise here.

I think the frontend seeing the backend in the closed state during
restore would be a bug that should prevent the frontend from
resuming.

>     > +     /* Kick the backend to disconnect */
>     > +     xenbus_switch_state(dev, XenbusStateClosing);
>     > +
>     > +     /*
>     > +      * We don't want to move forward before the frontend is diconnected
>     > +      * from the backend cleanly.
>     > +      */
>     > +     timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
>     > +                                           timeout);
>     > +     if (!timeout) {
>     > +             err = -EBUSY;
> 
>     Note err is only used here, and I think could just be dropped.
> 
> This err is what's being returned from the function. Am I missing anything?

Just 'return -EBUSY;' directly, and remove the top level variable. You
can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
that important.
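I.e. something along the lines of:

    if (!wait_for_completion_timeout(&info->wait_backend_disconnected,
                                     timeout)) {
            xenbus_dev_error(dev, -EBUSY, "Freezing timed out; "
                             "the device may be left in an inconsistent state");
            return -EBUSY;
    }

    return 0;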

>     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
>     > +                              "the device may become inconsistent state");
> 
>     Leaving the device in this state is quite bad, as it's in a closed
>     state and with the queues frozen. You should make an attempt to
>     restore things to a working state.
> 
> You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
> Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
> I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.

You can manually force this state, and then check that it will behave
correctly. I would expect that on a failure to disconnect from the
backend you should switch the frontend to the 'Init' state in order to
try to reconnect to the backend when possible.

>     > +     }
>     > +
>     > +     return err;
>     > +}
>     > +
>     > +static int blkfront_restore(struct xenbus_device *dev)
>     > +{
>     > +     struct blkfront_info *info = dev_get_drvdata(&dev->dev);
>     > +     int err = 0;
>     > +
>     > +     err = talk_to_blkback(dev, info);
>     > +     blk_mq_unquiesce_queue(info->rq);
>     > +     blk_mq_unfreeze_queue(info->rq);
>     > +     if (!err)
>     > +         blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
> 
>     Bad indentation. Also shouldn't you first update the queues and then
>     unfreeze them?
> Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes the queue
> So I don't think the order could be reversed.

Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
correct to unfreeze the queues without having updated them. Also the
freezing/unfreezing uses a refcount, so I think it's perfectly fine to
call blk_mq_update_nr_hw_queues first and then unfreeze the queues.

Also note that talk_to_blkback returning an error should likely
prevent any unfreezing, as the queues won't be updated to match the
parameters of the backend.
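Something like the following sketch, where the queues stay frozen on error and get updated before they are unfrozen:

    static int blkfront_restore(struct xenbus_device *dev)
    {
            struct blkfront_info *info = dev_get_drvdata(&dev->dev);
            int err;

            err = talk_to_blkback(dev, info);
            if (err)
                    return err;

            blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
            blk_mq_unquiesce_queue(info->rq);
            blk_mq_unfreeze_queue(info->rq);

            return 0;
    }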

Thanks, Roger.


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
  2020-06-04  7:05 ` Roger Pau Monné
@ 2020-06-16 21:49   ` Anchal Agarwal
  2020-06-16 22:30     ` Anchal Agarwal
  2020-06-17  8:35     ` Roger Pau Monné
  0 siblings, 2 replies; 14+ messages in thread
From: Anchal Agarwal @ 2020-06-16 21:49 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> Hello,
> 
> On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> >     On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> >     > From: Munehisa Kamata <kamatam@amazon.com>
> >     >
> >     > S4 power transition states are very different from xen
> >     > suspend/resume. The former is visible to the guest and frontend drivers
> >     > should be aware of the state transitions and should be able to take
> >     > appropriate actions when needed. In the transition to S4 we need to make
> >     > sure that at least all the in-flight blkif requests get completed, since
> >     > they probably contain bits of the guest's memory image and that's not
> >     > going to get saved any other way. Hence, re-issuing in-flight requests
> >     > as in the case of xen resume will not work here. This is in contrast to
> >     > xen-suspend, where we need to freeze with as little processing as
> >     > possible to avoid dirtying RAM late in the migration cycle, and we know
> >     > that in-flight data can wait.
> >     >
> >     > Add freeze, thaw and restore callbacks for PM suspend and hibernation
> >     > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
> >     > events need to implement these xenbus_driver callbacks. The freeze
> >     > handler stops the block-layer queue and disconnects the frontend from
> >     > the backend while freeing ring_info and associated resources. Before
> >     > disconnecting from the backend, we need to prevent any new IO from
> >     > being queued and wait for existing IO to complete. Freeze/unfreeze of
> >     > the queues will guarantee that there are no requests in use on the
> >     > shared ring. However, for sanity we should check the state of the ring
> >     > before disconnecting to make sure that there are no outstanding
> >     > requests to be processed on the ring. The restore handler re-allocates
> >     > ring_info, unquiesces and unfreezes the queue and re-connects to the
> >     > backend, so that the rest of the kernel can continue to use the block
> >     > device transparently.
> >     >
> >     > Note: For older backends, if a backend doesn't have commit 12ea729645ace
> >     > ("xen/blkback: unmap all persistent grants when frontend gets
> >     > disconnected"), the frontend may see a massive amount of grant table
> >     > warnings when freeing resources.
> >     > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> >     > [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
> >     >
> >     > In this case, persistent grants would need to be disabled.
> >     >
> >     > [Anchal Changelog: Removed timeout/request during blkfront freeze.
> >     > Reworked the whole patch to work with blk-mq and incorporate upstream's
> >     > comments]
> >
> >     Please tag versions using vX and it would be helpful if you could list
> >     the specific changes that you performed between versions. There were
> >     3 RFC versions IIRC, and there's no log of the changes between them.
> >
> > I will elaborate on "upstream's comments" in my changelog in my next round of patches.
> 
> Sorry for being picky, but can you please make sure your email client
> properly quotes previous emails on reply. Note the lack of '>' added
> to the quoted parts of your reply.
That was probably just my Outlook client. Note taken.
> 
> >     > +                     }
> >     > +
> >     >                       break;
> >     > +             }
> >     > +
> >     > +             /*
> >     > +              * We may somehow receive backend's Closed again while thawing
> >     > +              * or restoring and it causes thawing or restoring to fail.
> >     > +              * Ignore such unexpected state regardless of the backend state.
> >     > +              */
> >     > +             if (info->connected == BLKIF_STATE_FROZEN) {
> >
> >     I think you can join this with the previous dev->state == XenbusStateClosed?
> >
> >     Also, won't the device be in the Closed state already if it's in state
> >     frozen?
> > Yes, but I think this is mostly due to a hypothetical case where the backend switches to the Closed state during thawing.
> > I am not entirely sure if that could happen. Could use some expertise here.
> 
> I think the frontend seeing the backend in the closed state during
> restore would be a bug that should prevent the frontend from
> resuming.
> 
> >     > +     /* Kick the backend to disconnect */
> >     > +     xenbus_switch_state(dev, XenbusStateClosing);
> >     > +
> >     > +     /*
> >     > +      * We don't want to move forward before the frontend is diconnected
> >     > +      * from the backend cleanly.
> >     > +      */
> >     > +     timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
> >     > +                                           timeout);
> >     > +     if (!timeout) {
> >     > +             err = -EBUSY;
> >
> >     Note err is only used here, and I think could just be dropped.
> >
> > This err is what's being returned from the function. Am I missing anything?
> 
> Just 'return -EBUSY;' directly, and remove the top level variable. You
> can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
> that important.
> 
> >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> >     > +                              "the device may become inconsistent state");
> >
> >     Leaving the device in this state is quite bad, as it's in a closed
> >     state and with the queues frozen. You should make an attempt to
> >     restore things to a working state.
> >
> > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
> > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
> > I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.
> 
> You can manually force this state, and then check that it will behave
> correctly. I would expect that on a failure to disconnect from the
> backend you should switch the frontend to the 'Init' state in order to
> try to reconnect to the backend when possible.
> 
From what I understand, forcing this manually means failing the freeze without
a disconnect and trying to revive the connection by unfreezing the
queues -> reconnecting to the backend [which never got disconnected]. Maybe even
tearing down things manually, because I am not sure what state the frontend
will see if the backend fails to disconnect at any point in time. I assumed connected.
Then again, if it's "CONNECTED" I may not need to tear down everything and start
from the Initialising state, because that may not work.

So I am not so sure about the backend's state. Let's say xen_blkif_disconnect fails;
I don't see it getting handled in the backend, so what will the backend's state be?
Will it still switch the xenbus state to 'Closed'? If not, what will the frontend see
if it tries to read the backend's state through xenbus_read_driver_state?

So the flow would be like:
Frontend marks XenbusStateClosing
Backend marks its state as XenbusStateClosing
    Frontend marks XenbusStateClosed
    Backend disconnects, calls xen_blkif_disconnect
       Backend fails to disconnect, the above function returns EBUSY
       What will the state of the backend be here?
       Frontend did not tear down the rings if the backend does not switch the
       state to 'Closed' in case of failure.

If the backend stays in the CONNECTED state, then even if we mark it Initialised in the frontend, the backend
won't be calling connect(). {From reading the code in frontend_changed}
IMU, Initialising will fail since the backend's dev->state != XenbusStateClosed; plus
we did not tear down anything, so calling talk_to_blkback may not be needed.

Does that sound correct?
> >     > +     }
> >     > +
> >     > +     return err;
> >     > +}
> >     > +
> >     > +static int blkfront_restore(struct xenbus_device *dev)
> >     > +{
> >     > +     struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> >     > +     int err = 0;
> >     > +
> >     > +     err = talk_to_blkback(dev, info);
> >     > +     blk_mq_unquiesce_queue(info->rq);
> >     > +     blk_mq_unfreeze_queue(info->rq);
> >     > +     if (!err)
> >     > +         blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
> >
> >     Bad indentation. Also shouldn't you first update the queues and then
> >     unfreeze them?
> > Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes the queue
> > So I don't think the order could be reversed.
> 
> Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
> correct to unfreeze the queues without having updated them. Also the
> freezing/unfreezing uses a refcount, so I think it's perfectly fine to
> call blk_mq_update_nr_hw_queues first and then unfreeze the queues.
> 
> Also note that talk_to_blkback returning an error should likely
> prevent any unfreezing, as the queues won't be updated to match the
> parameters of the backend.
>
I think you are right here. Will send out fixes in V2
> Thanks, Roger.
> 
Thanks,
Anchal


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
  2020-06-16 21:49   ` Anchal Agarwal
@ 2020-06-16 22:30     ` Anchal Agarwal
  2020-06-17  8:38       ` Roger Pau Monné
  2020-06-17  8:35     ` Roger Pau Monné
  1 sibling, 1 reply; 14+ messages in thread
From: Anchal Agarwal @ 2020-06-16 22:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > Hello,
> > 
> > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > >     On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> > >     > From: Munehisa Kamata <kamatam@amazon.com>
> > >     >
> > >     > S4 power transition states are very different from xen
> > >     > suspend/resume. The former is visible to the guest and frontend drivers
> > >     > should be aware of the state transitions and should be able to take
> > >     > appropriate actions when needed. In the transition to S4 we need to make
> > >     > sure that at least all the in-flight blkif requests get completed, since
> > >     > they probably contain bits of the guest's memory image and that's not
> > >     > going to get saved any other way. Hence, re-issuing in-flight requests
> > >     > as in the case of xen resume will not work here. This is in contrast to
> > >     > xen-suspend, where we need to freeze with as little processing as
> > >     > possible to avoid dirtying RAM late in the migration cycle, and we know
> > >     > that in-flight data can wait.
> > >     >
> > >     > Add freeze, thaw and restore callbacks for PM suspend and hibernation
> > >     > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
> > >     > events need to implement these xenbus_driver callbacks. The freeze
> > >     > handler stops the block-layer queue and disconnects the frontend from
> > >     > the backend while freeing ring_info and associated resources. Before
> > >     > disconnecting from the backend, we need to prevent any new IO from
> > >     > being queued and wait for existing IO to complete. Freeze/unfreeze of
> > >     > the queues will guarantee that there are no requests in use on the
> > >     > shared ring. However, for sanity we should check the state of the ring
> > >     > before disconnecting to make sure that there are no outstanding
> > >     > requests to be processed on the ring. The restore handler re-allocates
> > >     > ring_info, unquiesces and unfreezes the queue and re-connects to the
> > >     > backend, so that the rest of the kernel can continue to use the block
> > >     > device transparently.
> > >     >
> > >     > Note: For older backends, if a backend doesn't have commit 12ea729645ace
> > >     > ("xen/blkback: unmap all persistent grants when frontend gets
> > >     > disconnected"), the frontend may see a massive amount of grant table
> > >     > warnings when freeing resources.
> > >     > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> > >     > [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use!
> > >     >
> > >     > In this case, persistent grants would need to be disabled.
> > >     >
> > >     > [Anchal Changelog: Removed timeout/request during blkfront freeze.
> > >     > Reworked the whole patch to work with blk-mq and incorporate upstream's
> > >     > comments]
> > >
> > >     Please tag versions using vX and it would be helpful if you could list
> > >     the specific changes that you performed between versions. There were
> > >     3 RFC versions IIRC, and there's no log of the changes between them.
> > >
> > > I will elaborate on "upstream's comments" in my changelog in my next round of patches.
> > 
> > Sorry for being picky, but can you please make sure your email client
> > properly quotes previous emails on reply. Note the lack of '>' added
> > to the quoted parts of your reply.
> That was probably just my Outlook client. Note taken.
> > 
> > >     > +                     }
> > >     > +
> > >     >                       break;
> > >     > +             }
> > >     > +
> > >     > +             /*
> > >     > +              * We may somehow receive backend's Closed again while thawing
> > >     > +              * or restoring and it causes thawing or restoring to fail.
> > >     > +              * Ignore such unexpected state regardless of the backend state.
> > >     > +              */
> > >     > +             if (info->connected == BLKIF_STATE_FROZEN) {
> > >
> > >     I think you can join this with the previous dev->state == XenbusStateClosed?
> > >
> > >     Also, won't the device be in the Closed state already if it's in state
> > >     frozen?
> > > Yes, but I think this is mostly due to a hypothetical case where the backend switches to the Closed state during thawing.
> > > I am not entirely sure if that could happen. Could use some expertise here.
> > 
> > I think the frontend seeing the backend in the closed state during
> > restore would be a bug that should prevent the frontend from
> > resuming.
> > 
> > >     > +     /* Kick the backend to disconnect */
> > >     > +     xenbus_switch_state(dev, XenbusStateClosing);
> > >     > +
> > >     > +     /*
> > >     > +      * We don't want to move forward before the frontend is diconnected
> > >     > +      * from the backend cleanly.
> > >     > +      */
> > >     > +     timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
> > >     > +                                           timeout);
> > >     > +     if (!timeout) {
> > >     > +             err = -EBUSY;
> > >
> > >     Note err is only used here, and I think could just be dropped.
> > >
> > > This err is what's being returned from the function. Am I missing anything?
> > 
> > Just 'return -EBUSY;' directly, and remove the top level variable. You
> > can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
> > that important.
> > 
> > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > >     > +                              "the device may become inconsistent state");
> > >
> > >     Leaving the device in this state is quite bad, as it's in a closed
> > >     state and with the queues frozen. You should make an attempt to
> > >     restore things to a working state.
> > >
> > > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > > leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
> > > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
> > > I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.
> > 
> > You can manually force this state, and then check that it will behave
> > correctly. I would expect that on a failure to disconnect from the
> > backend you should switch the frontend to the 'Init' state in order to
> > try to reconnect to the backend when possible.
> > 
> From what I understand, forcing this manually means failing the freeze without
> a disconnect and trying to revive the connection by unfreezing the
> queues -> reconnecting to the backend [which never got disconnected]. Maybe even
> tearing down things manually, because I am not sure what state the frontend
> will see if the backend fails to disconnect at any point in time. I assumed connected.
> Then again, if it's "CONNECTED" I may not need to tear down everything and start
> from the Initialising state, because that may not work.
> 
> So I am not so sure about the backend's state. Let's say xen_blkif_disconnect fails;
> I don't see it getting handled in the backend, so what will the backend's state be?
> Will it still switch the xenbus state to 'Closed'? If not, what will the frontend see
> if it tries to read the backend's state through xenbus_read_driver_state?
> 
> So the flow would be like:
> Frontend marks XenbusStateClosing
> Backend marks its state as XenbusStateClosing
>     Frontend marks XenbusStateClosed
>     Backend disconnects, calls xen_blkif_disconnect
>        Backend fails to disconnect, the above function returns EBUSY
>        What will the state of the backend be here?
>        Frontend did not tear down the rings if the backend does not switch the
>        state to 'Closed' in case of failure.
> 
> If the backend stays in the CONNECTED state, then even if we mark it Initialised in the frontend, the backend
> won't be calling connect(). {From reading the code in frontend_changed}
> IMU, Initialising will fail since the backend's dev->state != XenbusStateClosed; plus
> we did not tear down anything, so calling talk_to_blkback may not be needed.
> 
> Does that sound correct?
Sent that too quickly; I also meant to add that the XenbusStateInitialised state should be OK
only if we expect the backend to stay in the "Connected" state. Also, I experimented
with that notion. I am a little worried about the correctness here.
Can the backend come to an Unknown state somehow?
> > >     > +     }
> > >     > +
> > >     > +     return err;
> > >     > +}
> > >     > +
> > >     > +static int blkfront_restore(struct xenbus_device *dev)
> > >     > +{
> > >     > +     struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> > >     > +     int err = 0;
> > >     > +
> > >     > +     err = talk_to_blkback(dev, info);
> > >     > +     blk_mq_unquiesce_queue(info->rq);
> > >     > +     blk_mq_unfreeze_queue(info->rq);
> > >     > +     if (!err)
> > >     > +         blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
> > >
> > >     Bad indentation. Also shouldn't you first update the queues and then
> > >     unfreeze them?
> > > Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes the queue
> > > So I don't think the order could be reversed.
> > 
> > Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
> > correct to unfreeze the queues without having updated them. Also the
> > freezing/unfreezing uses a refcount, so I think it's perfectly fine to
> > call blk_mq_update_nr_hw_queues first and then unfreeze the queues.
> > 
> > Also note that talk_to_blkback returning an error should likely
> > prevent any unfreezing, as the queues won't be updated to match the
> > parameters of the backend.
> >
> I think you are right here. Will send out fixes in V2
> > Thanks, Roger.
> > 
> Thanks,
> Anchal
> 


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
  2020-06-16 21:49   ` Anchal Agarwal
  2020-06-16 22:30     ` Anchal Agarwal
@ 2020-06-17  8:35     ` Roger Pau Monné
  2020-06-19 23:43       ` Anchal Agarwal
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-17  8:35 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > >     > +                              "the device may become inconsistent state");
> > >
> > >     Leaving the device in this state is quite bad, as it's in a closed
> > >     state and with the queues frozen. You should make an attempt to
> > >     restore things to a working state.
> > >
> > > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > > leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
> > > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
> > > I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.
> > 
> > You can manually force this state, and then check that it will behave
> > correctly. I would expect that on a failure to disconnect from the
> > backend you should switch the frontend to the 'Init' state in order to
> > try to reconnect to the backend when possible.
> > 
> From what I understand, forcing this manually means failing the freeze without
> a disconnect and trying to revive the connection by unfreezing the
> queues -> reconnecting to the backend [which never got disconnected]. Maybe even
> tearing down things manually, because I am not sure what state the frontend
> will see if the backend fails to disconnect at any point in time. I assumed connected.
> Then again, if it's "CONNECTED" I may not need to tear down everything and start
> from the Initialising state, because that may not work.
> 
> So I am not so sure about the backend's state. Let's say xen_blkif_disconnect fails;
> I don't see it getting handled in the backend, so what will the backend's state be?
> Will it still switch the xenbus state to 'Closed'? If not, what will the frontend see
> if it tries to read the backend's state through xenbus_read_driver_state?
> 
> So the flow would be like:
> Frontend marks XenbusStateClosing
> Backend marks its state as XenbusStateClosing
>     Frontend marks XenbusStateClosed
>     Backend disconnects, calls xen_blkif_disconnect
>        Backend fails to disconnect, the above function returns EBUSY
>        What will the state of the backend be here?

Backend should stay in state 'Closing' then, until it can finish
tearing down.

>        Frontend did not tear down the rings if the backend does not switch the
>        state to 'Closed' in case of failure.
> 
> If the backend stays in the CONNECTED state, then even if we mark it Initialised in the frontend, the backend

Backend will stay in state 'Closing' I think.

> won't be calling connect(). {From reading the code in frontend_changed}
> IMU, Initialising will fail since the backend's dev->state != XenbusStateClosed; plus
> we did not tear down anything, so calling talk_to_blkback may not be needed.
> 
> Does that sound correct?

I think switching to the initial state in order to attempt a
reconnection would be our best bet here.

Thanks, Roger.


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
  2020-06-16 22:30     ` Anchal Agarwal
@ 2020-06-17  8:38       ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-17  8:38 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Tue, Jun 16, 2020 at 10:30:03PM +0000, Anchal Agarwal wrote:
> On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > >     On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> > > >     > From: Munehisa Kamata <kamatam@amazon.com>
> > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > >     > +                              "the device may become inconsistent state");
> > > >
> > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > >     state and with the queues frozen. You should make an attempt to
> > > >     restore things to a working state.
> > > >
> > > > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > > > leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
> > > > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
> > > > I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.
> > > 
> > > You can manually force this state, and then check that it will behave
> > > correctly. I would expect that on a failure to disconnect from the
> > > backend you should switch the frontend to the 'Init' state in order to
> > > try to reconnect to the backend when possible.
> > > 
> > From what I understand forcing manually is, failing the freeze without
> > disconnect and try to revive the connection by unfreezing the
> > queues->reconnecting to backend [which never got diconnected]. May be even
> > tearing down things manually because I am not sure what state will frontend
> > see if backend fails to to disconnect at any point in time. I assumed connected.
> > Then again if its "CONNECTED" I may not need to tear down everything and start
> > from Initialising state because that may not work.
> > 
> > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > I don't see it getting handled in the backend then what will be backend's state?
> > Will it still switch xenbus state to 'Closed'? If not what will frontend see, 
> > if it tries to read backend's state through xenbus_read_driver_state ?
> > 
> > So the flow be like:
> > Front end marks XenbusStateClosing
> > Backend marks its state as XenbusStateClosing
> >     Frontend marks XenbusStateClosed
> >     Backend disconnects calls xen_blkif_disconnect
> >        Backend fails to disconnect, the above function returns EBUSY
> >        What will be state of backend here? 
> >        Frontend did not tear down the rings if backend does not switches the
> >        state to 'Closed' in case of failure.
> > 
> > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > won't be calling connect(). {From reading code in frontend_changed}
> > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > we did not tear down anything so calling talk_to_blkback may not be needed
> > 
> > Does that sound correct?
> Sent that too quickly; I also meant to add that the XenbusStateInitialised state should be OK
> only if we expect the backend to stay in the "Connected" state. Also, I experimented
> with that notion. I am a little worried about the correctness here.
> Can the backend come to an Unknown state somehow?

Not really, there's no such thing as an Unknown state.

There are no guarantees about what a backend can do really, so it
could indeed switch to an unrecognized state, but that would be a
bug in the backend.

Roger.


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
  2020-06-17  8:35     ` Roger Pau Monné
@ 2020-06-19 23:43       ` Anchal Agarwal
  2020-06-22  8:38         ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Anchal Agarwal @ 2020-06-19 23:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > >     > +                              "the device may become inconsistent state");
> > > >
> > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > >     state and with the queues frozen. You should make an attempt to
> > > >     restore things to a working state.
> > > >
> > > > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > > > leave it in this state; however, I am still trying to find a good way to know whether the backend is still connected after the timeout.
> > > > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so
> > > > I may be looking for an alternate perspective here. Intentionally thawing everything back is one thing I could think of.
> > >
> > > You can manually force this state, and then check that it will behave
> > > correctly. I would expect that on a failure to disconnect from the
> > > backend you should switch the frontend to the 'Init' state in order to
> > > try to reconnect to the backend when possible.
> > >
> > From what I understand forcing manually is, failing the freeze without
> > disconnect and try to revive the connection by unfreezing the
> > queues->reconnecting to backend [which never got diconnected]. May be even
> > tearing down things manually because I am not sure what state will frontend
> > see if backend fails to to disconnect at any point in time. I assumed connected.
> > Then again if its "CONNECTED" I may not need to tear down everything and start
> > from Initialising state because that may not work.
> >
> > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > I don't see it getting handled in the backend then what will be backend's state?
> > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > if it tries to read backend's state through xenbus_read_driver_state ?
> >
> > So the flow be like:
> > Front end marks XenbusStateClosing
> > Backend marks its state as XenbusStateClosing
> >     Frontend marks XenbusStateClosed
> >     Backend disconnects calls xen_blkif_disconnect
> >        Backend fails to disconnect, the above function returns EBUSY
> >        What will be state of backend here?
> 
> Backend should stay in state 'Closing' then, until it can finish
> tearing down.
> 
It disconnects the ring after switching to the Connected state too.
> >        Frontend did not tear down the rings if backend does not switches the
> >        state to 'Closed' in case of failure.
> >
> > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> 
> Backend will stay in state 'Closing' I think.
> 
> > won't be calling connect(). {From reading code in frontend_changed}
> > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > we did not tear down anything so calling talk_to_blkback may not be needed
> >
> > Does that sound correct?
> 
> I think switching to the initial state in order to try to attempt a
> reconnection would be our best bet here.
>
It does not seem to work correctly; I get hung tasks all over and all the
requests to the filesystem get stuck. The backend does show its state as
connected after xenbus_dev_suspend fails, but I think there may be something
missing. I don't seem to get IO interrupts thereafter, i.e. the function
blkif_interrupt is never hit. I think just marking it Initialized may not be
the only thing needed. Here is a short description of what I am trying to do.
So, on timeout:
    Switch the xenbus state to "Initialized"
    Unquiesce/unfreeze the queues
    Mark info->connected = BLKIF_STATE_CONNECTED
    Return EBUSY

I even allowed blkfront_connect to switch the state to "Connected" rather
than doing it explicitly as mentioned above, without re-allocating or
re-registering the device, just to make sure the blkfront_info object has
all the right values. Do you see anything missing here?
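
To make that attempt concrete, here is a minimal sketch of the error path I
am describing, assuming the BLKIF_STATE_FREEZING state and the
wait_backend_disconnected completion introduced by this series; the fallback
taken after the timeout is exactly the experiment under discussion, not
settled code:

static int blkfront_freeze(struct xenbus_device *dev)
{
        struct blkfront_info *info = dev_get_drvdata(&dev->dev);
        unsigned int timeout = 5 * HZ;

        info->connected = BLKIF_STATE_FREEZING;

        /* Prevent new I/O and drain whatever is already queued. */
        blk_mq_freeze_queue(info->rq);
        blk_mq_quiesce_queue(info->rq);

        /* Ask the backend to disconnect. */
        xenbus_switch_state(dev, XenbusStateClosing);

        if (!wait_for_completion_timeout(&info->wait_backend_disconnected,
                                         timeout)) {
                /*
                 * Timed out: instead of leaving the device closed and
                 * frozen, try to fall back to a working state. Moving
                 * to Initialised lets blkback_changed() drive a normal
                 * reconnection, and blkfront_connect() will then set
                 * info->connected itself.
                 */
                xenbus_switch_state(dev, XenbusStateInitialised);
                blk_mq_unquiesce_queue(info->rq);
                blk_mq_unfreeze_queue(info->rq);
                return -EBUSY;
        }

        return 0;
}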

Also, while wrapping my brain around this recovery, one of the reasons I can
see for the backend failing to disconnect is in-flight I/O requests. There
cannot be pending I/O on the shared ring, because that check is already done
before we switch the bus state to Closing. Also, the queues are frozen, so
there will be no new I/O. The only situation I can think of is that there is
too much memory state to be written and modified for it to complete within
the given timeout, so the disconnect may fail. In that case, the timeout
needs to be configurable by the user, since hibernation might otherwise
always fail depending on the infrastructure or the workload running during
hibernation.
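
If the timeout does become user-configurable, the usual kernel idiom would
be a module parameter; a short sketch (the parameter name is made up for
illustration only):

static unsigned int freeze_timeout_secs = 5;
module_param(freeze_timeout_secs, uint, 0644);
MODULE_PARM_DESC(freeze_timeout_secs,
                 "Seconds to wait for backend disconnect during PM freeze");

The wait in the freeze handler would then use freeze_timeout_secs * HZ
instead of a hard-coded 5 * HZ.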

Thanks,
Anchal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-19 23:43       ` Anchal Agarwal
@ 2020-06-22  8:38         ` Roger Pau Monné
  2020-06-23  0:43           ` Anchal Agarwal
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-22  8:38 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Fri, Jun 19, 2020 at 11:43:12PM +0000, Anchal Agarwal wrote:
> On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > >     > +                              "the device may become inconsistent state");
> > > > >
> > > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > > >     state and with the queues frozen. You should make an attempt to
> > > > >     restore things to a working state.
> > > > >
> > > > > You mean if backend closed after timeout? Is there a way to know that? I understand it's not good to
> > > > > leave it in this state however, I am still trying to find if there is a good way to know if backend is still connected after timeout.
> > > > > Hence the message " the device may become inconsistent state".  I didn't see a timeout not even once on my end so that's why
> > > > > I may be looking for an alternate perspective here. may be need to thaw everything back intentionally is one thing I could think of.
> > > >
> > > > You can manually force this state, and then check that it will behave
> > > > correctly. I would expect that on a failure to disconnect from the
> > > > backend you should switch the frontend to the 'Init' state in order to
> > > > try to reconnect to the backend when possible.
> > > >
> > > From what I understand forcing manually is, failing the freeze without
> > > disconnect and try to revive the connection by unfreezing the
> > > queues->reconnecting to backend [which never got diconnected]. May be even
> > > tearing down things manually because I am not sure what state will frontend
> > > see if backend fails to to disconnect at any point in time. I assumed connected.
> > > Then again if its "CONNECTED" I may not need to tear down everything and start
> > > from Initialising state because that may not work.
> > >
> > > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > > I don't see it getting handled in the backend then what will be backend's state?
> > > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > > if it tries to read backend's state through xenbus_read_driver_state ?
> > >
> > > So the flow be like:
> > > Front end marks XenbusStateClosing
> > > Backend marks its state as XenbusStateClosing
> > >     Frontend marks XenbusStateClosed
> > >     Backend disconnects calls xen_blkif_disconnect
> > >        Backend fails to disconnect, the above function returns EBUSY
> > >        What will be state of backend here?
> > 
> > Backend should stay in state 'Closing' then, until it can finish
> > tearing down.
> > 
> It disconnects the ring after switching to connected state too. 
> > >        Frontend did not tear down the rings if backend does not switches the
> > >        state to 'Closed' in case of failure.
> > >
> > > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > 
> > Backend will stay in state 'Closing' I think.
> > 
> > > won't be calling connect(). {From reading code in frontend_changed}
> > > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > > we did not tear down anything so calling talk_to_blkback may not be needed
> > >
> > > Does that sound correct?
> > 
> > I think switching to the initial state in order to try to attempt a
> > reconnection would be our best bet here.
> >
> It does not seems to work correctly, I get hung tasks all over and all the
> requests to filesystem gets stuck. Backend does shows the state as connected
> after xenbus_dev_suspend fails but I think there may be something missing.
> I don't seem to get IO interrupts thereafter i.e hitting the function blkif_interrupts.
> I think just marking it initialised may not be the only thing.
> Here is a short description of what I am trying to do:
> So, on timeout:
>     Switch XenBusState to "Initialized"
>     unquiesce/unfreeze the queues and return
>     mark info->connected = BLKIF_STATE_CONNECTED

If the xenbus state is Initialized, isn't it wrong to set
info->connected == CONNECTED?

You should tear down all the internal state (like a proper close).
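
For clarity, a hedged sketch of what such a "proper close" could look like,
reusing blkfront's existing blkif_free() teardown helper; whether to call it
from the failed-freeze path is exactly what is being discussed here:

static void blkfront_force_close(struct blkfront_info *info)
{
        struct xenbus_device *dev = info->xbdev;

        xenbus_switch_state(dev, XenbusStateClosed);
        /* Free the rings, grants and event channels. */
        blkif_free(info, 0);
}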

>     return EBUSY
> 
> I even allowed blkfront_connect to switch state to "CONNECTED" rather me doing
> it explicitly as mentioned above without re-allocating/re-registering the device
> just to make sure bklfront_info object has all the right values.
> Do you see anythign missing here?

I'm afraid you will have to do a little bit of debugging here to
figure out what's going on. You can add printk's in several places to
see which path is taken, and why blkfront ends up in such a state.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-22  8:38         ` Roger Pau Monné
@ 2020-06-23  0:43           ` Anchal Agarwal
  2020-06-23  8:19             ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Anchal Agarwal @ 2020-06-23  0:43 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Mon, Jun 22, 2020 at 10:38:46AM +0200, Roger Pau Monné wrote:
> On Fri, Jun 19, 2020 at 11:43:12PM +0000, Anchal Agarwal wrote:
> > On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > > On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > >     > +                              "the device may become inconsistent state");
> > > > > >
> > > > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > > > >     state and with the queues frozen. You should make an attempt to
> > > > > >     restore things to a working state.
> > > > > >
> > > > > > You mean if backend closed after timeout? Is there a way to know that? I understand it's not good to
> > > > > > leave it in this state however, I am still trying to find if there is a good way to know if backend is still connected after timeout.
> > > > > > Hence the message " the device may become inconsistent state".  I didn't see a timeout not even once on my end so that's why
> > > > > > I may be looking for an alternate perspective here. may be need to thaw everything back intentionally is one thing I could think of.
> > > > >
> > > > > You can manually force this state, and then check that it will behave
> > > > > correctly. I would expect that on a failure to disconnect from the
> > > > > backend you should switch the frontend to the 'Init' state in order to
> > > > > try to reconnect to the backend when possible.
> > > > >
> > > > From what I understand forcing manually is, failing the freeze without
> > > > disconnect and try to revive the connection by unfreezing the
> > > > queues->reconnecting to backend [which never got diconnected]. May be even
> > > > tearing down things manually because I am not sure what state will frontend
> > > > see if backend fails to to disconnect at any point in time. I assumed connected.
> > > > Then again if its "CONNECTED" I may not need to tear down everything and start
> > > > from Initialising state because that may not work.
> > > >
> > > > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > > > I don't see it getting handled in the backend then what will be backend's state?
> > > > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > > > if it tries to read backend's state through xenbus_read_driver_state ?
> > > >
> > > > So the flow be like:
> > > > Front end marks XenbusStateClosing
> > > > Backend marks its state as XenbusStateClosing
> > > >     Frontend marks XenbusStateClosed
> > > >     Backend disconnects calls xen_blkif_disconnect
> > > >        Backend fails to disconnect, the above function returns EBUSY
> > > >        What will be state of backend here?
> > >
> > > Backend should stay in state 'Closing' then, until it can finish
> > > tearing down.
> > >
> > It disconnects the ring after switching to connected state too.
> > > >        Frontend did not tear down the rings if backend does not switches the
> > > >        state to 'Closed' in case of failure.
> > > >
> > > > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > >
> > > Backend will stay in state 'Closing' I think.
> > >
> > > > won't be calling connect(). {From reading code in frontend_changed}
> > > > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > > > we did not tear down anything so calling talk_to_blkback may not be needed
> > > >
> > > > Does that sound correct?
> > >
> > > I think switching to the initial state in order to try to attempt a
> > > reconnection would be our best bet here.
> > >
> > It does not seems to work correctly, I get hung tasks all over and all the
> > requests to filesystem gets stuck. Backend does shows the state as connected
> > after xenbus_dev_suspend fails but I think there may be something missing.
> > I don't seem to get IO interrupts thereafter i.e hitting the function blkif_interrupts.
> > I think just marking it initialised may not be the only thing.
> > Here is a short description of what I am trying to do:
> > So, on timeout:
> >     Switch XenBusState to "Initialized"
> >     unquiesce/unfreeze the queues and return
> >     mark info->connected = BLKIF_STATE_CONNECTED
> 
> If xenbus state is Initialized isn't it wrong to set info->connected
> == CONNECTED?
>
Yes, you are right. Earlier I was marking it explicitly, but that was not
right; the connect path for blkfront will do that.
> You should tear down all the internal state (like a proper close)?
> 
Isn't that similar to the disconnect that failed during freeze in the first
place? Do you mean retry the close, but this time reconnect after the close,
basically doing everything you would do at "restore"?

Also, I experimented with that and it works intermittently. I want to take a
step back on this issue and ask a few questions here:

1. Is fixing this recovery a blocker for me sending in a V2 version?

2. In our 2-3 years of supporting this feature at large scale, we haven't
seen this issue where the backend fails to disconnect. What we are trying to
do here is create a hypothetical situation where we leave the backend in the
Closing state and see how it recovers. The reason why I think it "may not"
occur, and that the timeout of 5*HZ is sufficient, is that we haven't come
across even a single use case where it caused hibernation to fail.
The reason why I think "it may" occur is if we are running a really
memory-intensive workload and the ring is busy and unable to complete all
the requests within the given timeout (see the idleness-check sketch after
this list). This is very unlikely though.

3. Also, I do not think this is straightforward to fix in a way that makes
hibernation work flawlessly in subsequent invocations. I am open to
all suggestions.
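
As an illustration of the "ring is busy" case in point 2, here is a hedged
sketch of an idleness check built from the standard macros in
include/xen/interface/io/ring.h (blkfront_ring_info is the driver's per-ring
structure; whether such a helper is needed at all is an open question):

static bool blkif_ring_idle(struct blkfront_ring_info *rinfo)
{
        /* Every request we produced has had its response consumed. */
        return rinfo->ring.req_prod_pvt == rinfo->ring.rsp_cons &&
               !RING_HAS_UNCONSUMED_RESPONSES(&rinfo->ring);
}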

Thanks,
Anchal
> >     return EBUSY
> >
> > I even allowed blkfront_connect to switch state to "CONNECTED" rather me doing
> > it explicitly as mentioned above without re-allocating/re-registering the device
> > just to make sure bklfront_info object has all the right values.
> > Do you see anythign missing here?
> 
> I'm afraid you will have to do a little bit of debugging here to
> figure out what's going on. You can add printk's to several places to
> see which path is taken, and why blkfront ends in such state.
>
> Thanks, Roger.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-23  0:43           ` Anchal Agarwal
@ 2020-06-23  8:19             ` Roger Pau Monné
  2020-06-25 18:36               ` Anchal Agarwal
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-23  8:19 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Tue, Jun 23, 2020 at 12:43:14AM +0000, Anchal Agarwal wrote:
> On Mon, Jun 22, 2020 at 10:38:46AM +0200, Roger Pau Monné wrote:
> > On Fri, Jun 19, 2020 at 11:43:12PM +0000, Anchal Agarwal wrote:
> > > On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > > > On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > > > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > > >     > +                              "the device may become inconsistent state");
> > > > > > >
> > > > > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > > > > >     state and with the queues frozen. You should make an attempt to
> > > > > > >     restore things to a working state.
> > > > > > >
> > > > > > > You mean if backend closed after timeout? Is there a way to know that? I understand it's not good to
> > > > > > > leave it in this state however, I am still trying to find if there is a good way to know if backend is still connected after timeout.
> > > > > > > Hence the message " the device may become inconsistent state".  I didn't see a timeout not even once on my end so that's why
> > > > > > > I may be looking for an alternate perspective here. may be need to thaw everything back intentionally is one thing I could think of.
> > > > > >
> > > > > > You can manually force this state, and then check that it will behave
> > > > > > correctly. I would expect that on a failure to disconnect from the
> > > > > > backend you should switch the frontend to the 'Init' state in order to
> > > > > > try to reconnect to the backend when possible.
> > > > > >
> > > > > From what I understand forcing manually is, failing the freeze without
> > > > > disconnect and try to revive the connection by unfreezing the
> > > > > queues->reconnecting to backend [which never got diconnected]. May be even
> > > > > tearing down things manually because I am not sure what state will frontend
> > > > > see if backend fails to to disconnect at any point in time. I assumed connected.
> > > > > Then again if its "CONNECTED" I may not need to tear down everything and start
> > > > > from Initialising state because that may not work.
> > > > >
> > > > > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > > > > I don't see it getting handled in the backend then what will be backend's state?
> > > > > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > > > > if it tries to read backend's state through xenbus_read_driver_state ?
> > > > >
> > > > > So the flow be like:
> > > > > Front end marks XenbusStateClosing
> > > > > Backend marks its state as XenbusStateClosing
> > > > >     Frontend marks XenbusStateClosed
> > > > >     Backend disconnects calls xen_blkif_disconnect
> > > > >        Backend fails to disconnect, the above function returns EBUSY
> > > > >        What will be state of backend here?
> > > >
> > > > Backend should stay in state 'Closing' then, until it can finish
> > > > tearing down.
> > > >
> > > It disconnects the ring after switching to connected state too.
> > > > >        Frontend did not tear down the rings if backend does not switches the
> > > > >        state to 'Closed' in case of failure.
> > > > >
> > > > > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > > >
> > > > Backend will stay in state 'Closing' I think.
> > > >
> > > > > won't be calling connect(). {From reading code in frontend_changed}
> > > > > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > > > > we did not tear down anything so calling talk_to_blkback may not be needed
> > > > >
> > > > > Does that sound correct?
> > > >
> > > > I think switching to the initial state in order to try to attempt a
> > > > reconnection would be our best bet here.
> > > >
> > > It does not seems to work correctly, I get hung tasks all over and all the
> > > requests to filesystem gets stuck. Backend does shows the state as connected
> > > after xenbus_dev_suspend fails but I think there may be something missing.
> > > I don't seem to get IO interrupts thereafter i.e hitting the function blkif_interrupts.
> > > I think just marking it initialised may not be the only thing.
> > > Here is a short description of what I am trying to do:
> > > So, on timeout:
> > >     Switch XenBusState to "Initialized"
> > >     unquiesce/unfreeze the queues and return
> > >     mark info->connected = BLKIF_STATE_CONNECTED
> > 
> > If xenbus state is Initialized isn't it wrong to set info->connected
> > == CONNECTED?
> >
> Yes, you are right earlier I was marking it explicitly but that was not right,
> the connect path for blkfront will do that.
> > You should tear down all the internal state (like a proper close)?
> > 
> Isn't that similar to disconnecting in the first place that failed during
> freeze? Do you mean re-try to close but this time re-connect after close
> basically do everything you would at "restore"?

Last time I checked blkfront supported reconnections (ie: disconnect
from a backend and connect again). I was assuming we could apply the
same here on timeout, and just follow the same path where the frontend
waits indefinitely for the backend to close and then attempts to
reconnect.
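
A sketch of that fallback, again assuming the wait_backend_disconnected
completion from this series: wait for the bounded timeout first, and on
expiry keep waiting for the backend instead of giving up:

static void blkfront_wait_backend_closed(struct blkfront_info *info)
{
        if (!wait_for_completion_timeout(&info->wait_backend_disconnected,
                                         5 * HZ)) {
                dev_warn(&info->xbdev->dev,
                         "backend slow to disconnect, waiting\n");
                wait_for_completion(&info->wait_backend_disconnected);
        }
}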

> Also, I experimented with that and it works intermittently. I want to take a
> step back on this issue and ask few questions here:
> 1. Is fixing this recovery a blocker for me sending in a V2 version?

At the end of the day it's your feature. I would certainly prefer for it
to work as well as possible; since this is recovery in case of failure,
just make sure it does something sane (ie: crash/close the frontend)
and add a TODO note.

> 2. In our 2-3 years of supporting this feature at large scale we haven't seen this issue
> where backend fails to disconnect. What we are trying to do here is create a
> hypothetical situation where we leave backend in Closing state and try and see how it
> recovers. The reason why I think it "may not" occur and the timeout of 5HZ is
> sufficient is because we haven't come across even a single use-case where it
> caused hibernation to fail.
> The reason why I think "it may" occur is if we are running a really memory
> intensive workload and ring is busy and is unable to complete all the requests
> in the given timeout. This is very unlikely though.

As said above, I would generally prefer the code to handle possible
failures as gracefully as possible, and hence I think it would be nice
here to fall back to the normal disconnect path and just wait for the
backend to close.

You likely have this very well tuned to your own environment and
workloads; since this will now be upstream, others might have more
contended systems where it could start to fail.

> 3) Also, I do not think this may be straight forward to fix and expect
> hibernation to work flawlessly in subsequent invocations. I am open to 
> all suggestions.

Right, adding a TODO would seem appropriate then.

Roger.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-23  8:19             ` Roger Pau Monné
@ 2020-06-25 18:36               ` Anchal Agarwal
  2020-06-26  9:12                 ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Anchal Agarwal @ 2020-06-25 18:36 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Tue, Jun 23, 2020 at 10:19:03AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 23, 2020 at 12:43:14AM +0000, Anchal Agarwal wrote:
> > On Mon, Jun 22, 2020 at 10:38:46AM +0200, Roger Pau Monné wrote:
> > > On Fri, Jun 19, 2020 at 11:43:12PM +0000, Anchal Agarwal wrote:
> > > > On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > > > > On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > > > > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > > > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > > > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > > > >     > +                              "the device may become inconsistent state");
> > > > > > > >
> > > > > > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > > > > > >     state and with the queues frozen. You should make an attempt to
> > > > > > > >     restore things to a working state.
> > > > > > > >
> > > > > > > > You mean if backend closed after timeout? Is there a way to know that? I understand it's not good to
> > > > > > > > leave it in this state however, I am still trying to find if there is a good way to know if backend is still connected after timeout.
> > > > > > > > Hence the message " the device may become inconsistent state".  I didn't see a timeout not even once on my end so that's why
> > > > > > > > I may be looking for an alternate perspective here. may be need to thaw everything back intentionally is one thing I could think of.
> > > > > > >
> > > > > > > You can manually force this state, and then check that it will behave
> > > > > > > correctly. I would expect that on a failure to disconnect from the
> > > > > > > backend you should switch the frontend to the 'Init' state in order to
> > > > > > > try to reconnect to the backend when possible.
> > > > > > >
> > > > > > From what I understand forcing manually is, failing the freeze without
> > > > > > disconnect and try to revive the connection by unfreezing the
> > > > > > queues->reconnecting to backend [which never got diconnected]. May be even
> > > > > > tearing down things manually because I am not sure what state will frontend
> > > > > > see if backend fails to to disconnect at any point in time. I assumed connected.
> > > > > > Then again if its "CONNECTED" I may not need to tear down everything and start
> > > > > > from Initialising state because that may not work.
> > > > > >
> > > > > > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > > > > > I don't see it getting handled in the backend then what will be backend's state?
> > > > > > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > > > > > if it tries to read backend's state through xenbus_read_driver_state ?
> > > > > >
> > > > > > So the flow be like:
> > > > > > Front end marks XenbusStateClosing
> > > > > > Backend marks its state as XenbusStateClosing
> > > > > >     Frontend marks XenbusStateClosed
> > > > > >     Backend disconnects calls xen_blkif_disconnect
> > > > > >        Backend fails to disconnect, the above function returns EBUSY
> > > > > >        What will be state of backend here?
> > > > >
> > > > > Backend should stay in state 'Closing' then, until it can finish
> > > > > tearing down.
> > > > >
> > > > It disconnects the ring after switching to connected state too.
> > > > > >        Frontend did not tear down the rings if backend does not switches the
> > > > > >        state to 'Closed' in case of failure.
> > > > > >
> > > > > > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > > > >
> > > > > Backend will stay in state 'Closing' I think.
> > > > >
> > > > > > won't be calling connect(). {From reading code in frontend_changed}
> > > > > > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > > > > > we did not tear down anything so calling talk_to_blkback may not be needed
> > > > > >
> > > > > > Does that sound correct?
> > > > >
> > > > > I think switching to the initial state in order to try to attempt a
> > > > > reconnection would be our best bet here.
> > > > >
> > > > It does not seems to work correctly, I get hung tasks all over and all the
> > > > requests to filesystem gets stuck. Backend does shows the state as connected
> > > > after xenbus_dev_suspend fails but I think there may be something missing.
> > > > I don't seem to get IO interrupts thereafter i.e hitting the function blkif_interrupts.
> > > > I think just marking it initialised may not be the only thing.
> > > > Here is a short description of what I am trying to do:
> > > > So, on timeout:
> > > >     Switch XenBusState to "Initialized"
> > > >     unquiesce/unfreeze the queues and return
> > > >     mark info->connected = BLKIF_STATE_CONNECTED
> > >
> > > If xenbus state is Initialized isn't it wrong to set info->connected
> > > == CONNECTED?
> > >
> > Yes, you are right earlier I was marking it explicitly but that was not right,
> > the connect path for blkfront will do that.
> > > You should tear down all the internal state (like a proper close)?
> > >
> > Isn't that similar to disconnecting in the first place that failed during
> > freeze? Do you mean re-try to close but this time re-connect after close
> > basically do everything you would at "restore"?
> 
> Last time I checked blkfront supported reconnections (ie: disconnect
> from a backend and connect again). I was assuming we could apply the
> same here on timeout, and just follow the same path where the frontend
> waits indefinitely for the backend to close and then attempts to
> reconnect.
> 
> > Also, I experimented with that and it works intermittently. I want to take a
> > step back on this issue and ask few questions here:
> > 1. Is fixing this recovery a blocker for me sending in a V2 version?
> 
> At the end of day it's your feature. I would certainly prefer for it
> to work as good as possible, this being a recovery in case of failure
> just make sure it does something sane (ie: crash/close the frontend)
> and add a TODO note.
> 
> > 2. In our 2-3 years of supporting this feature at large scale we haven't seen this issue
> > where backend fails to disconnect. What we are trying to do here is create a
> > hypothetical situation where we leave backend in Closing state and try and see how it
> > recovers. The reason why I think it "may not" occur and the timeout of 5HZ is
> > sufficient is because we haven't come across even a single use-case where it
> > caused hibernation to fail.
> > The reason why I think "it may" occur is if we are running a really memory
> > intensive workload and ring is busy and is unable to complete all the requests
> > in the given timeout. This is very unlikely though.
> 
> As said above I would generally prefer for code to handle possible
> failures the best way, and hence I think here it would be nice to
> fallback to the normal disconnect path and just wait for the backend
> to close.
>
Do you mind shedding some light on what that path may be? If it's
straightforward to fix, I would like to debug it a bit more. Maybe I am
missing some of the context here.

I was of the view that we may just want to mark the frontend Closed, which
should do the job of freeing resources, and then follow the same flow as
blkfront_restore. That does not seem to work correctly 100% of the time.

> You likely have this very well tuned to your own environment and
> workloads, since this will now be upstream others might have more
> contended systems where it could start to fail.
> 
I agree; however, this is also based on the testing I did with hundreds of
runs outside of EC2, running a few tests of my own.
> > 3) Also, I do not think this may be straight forward to fix and expect
> > hibernation to work flawlessly in subsequent invocations. I am open to
> > all suggestions.
> 
> Right, adding a TODO would seem appropriate then.
>
Just to double-check: I will send in a V2 with this marked as a TODO?
> Roger.

Thanks,
Anchal

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-25 18:36               ` Anchal Agarwal
@ 2020-06-26  9:12                 ` Roger Pau Monné
  2020-06-29 19:20                   ` Anchal Agarwal
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-26  9:12 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Thu, Jun 25, 2020 at 06:36:59PM +0000, Anchal Agarwal wrote:
> On Tue, Jun 23, 2020 at 10:19:03AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 23, 2020 at 12:43:14AM +0000, Anchal Agarwal wrote:
> > > On Mon, Jun 22, 2020 at 10:38:46AM +0200, Roger Pau Monné wrote:
> > > > On Fri, Jun 19, 2020 at 11:43:12PM +0000, Anchal Agarwal wrote:
> > > > > On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > > > > > On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > > > > > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > > > > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > > > > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > > > > >     > +                              "the device may become inconsistent state");
> > > > > > > > >
> > > > > > > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > > > > > > >     state and with the queues frozen. You should make an attempt to
> > > > > > > > >     restore things to a working state.
> > > > > > > > >
> > > > > > > > > You mean if backend closed after timeout? Is there a way to know that? I understand it's not good to
> > > > > > > > > leave it in this state however, I am still trying to find if there is a good way to know if backend is still connected after timeout.
> > > > > > > > > Hence the message " the device may become inconsistent state".  I didn't see a timeout not even once on my end so that's why
> > > > > > > > > I may be looking for an alternate perspective here. may be need to thaw everything back intentionally is one thing I could think of.
> > > > > > > >
> > > > > > > > You can manually force this state, and then check that it will behave
> > > > > > > > correctly. I would expect that on a failure to disconnect from the
> > > > > > > > backend you should switch the frontend to the 'Init' state in order to
> > > > > > > > try to reconnect to the backend when possible.
> > > > > > > >
> > > > > > > From what I understand forcing manually is, failing the freeze without
> > > > > > > disconnect and try to revive the connection by unfreezing the
> > > > > > > queues->reconnecting to backend [which never got diconnected]. May be even
> > > > > > > tearing down things manually because I am not sure what state will frontend
> > > > > > > see if backend fails to to disconnect at any point in time. I assumed connected.
> > > > > > > Then again if its "CONNECTED" I may not need to tear down everything and start
> > > > > > > from Initialising state because that may not work.
> > > > > > >
> > > > > > > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > > > > > > I don't see it getting handled in the backend then what will be backend's state?
> > > > > > > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > > > > > > if it tries to read backend's state through xenbus_read_driver_state ?
> > > > > > >
> > > > > > > So the flow be like:
> > > > > > > Front end marks XenbusStateClosing
> > > > > > > Backend marks its state as XenbusStateClosing
> > > > > > >     Frontend marks XenbusStateClosed
> > > > > > >     Backend disconnects calls xen_blkif_disconnect
> > > > > > >        Backend fails to disconnect, the above function returns EBUSY
> > > > > > >        What will be state of backend here?
> > > > > >
> > > > > > Backend should stay in state 'Closing' then, until it can finish
> > > > > > tearing down.
> > > > > >
> > > > > It disconnects the ring after switching to connected state too.
> > > > > > >        Frontend did not tear down the rings if backend does not switches the
> > > > > > >        state to 'Closed' in case of failure.
> > > > > > >
> > > > > > > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > > > > >
> > > > > > Backend will stay in state 'Closing' I think.
> > > > > >
> > > > > > > won't be calling connect(). {From reading code in frontend_changed}
> > > > > > > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > > > > > > we did not tear down anything so calling talk_to_blkback may not be needed
> > > > > > >
> > > > > > > Does that sound correct?
> > > > > >
> > > > > > I think switching to the initial state in order to try to attempt a
> > > > > > reconnection would be our best bet here.
> > > > > >
> > > > > It does not seems to work correctly, I get hung tasks all over and all the
> > > > > requests to filesystem gets stuck. Backend does shows the state as connected
> > > > > after xenbus_dev_suspend fails but I think there may be something missing.
> > > > > I don't seem to get IO interrupts thereafter i.e hitting the function blkif_interrupts.
> > > > > I think just marking it initialised may not be the only thing.
> > > > > Here is a short description of what I am trying to do:
> > > > > So, on timeout:
> > > > >     Switch XenBusState to "Initialized"
> > > > >     unquiesce/unfreeze the queues and return
> > > > >     mark info->connected = BLKIF_STATE_CONNECTED
> > > >
> > > > If xenbus state is Initialized isn't it wrong to set info->connected
> > > > == CONNECTED?
> > > >
> > > Yes, you are right earlier I was marking it explicitly but that was not right,
> > > the connect path for blkfront will do that.
> > > > You should tear down all the internal state (like a proper close)?
> > > >
> > > Isn't that similar to disconnecting in the first place that failed during
> > > freeze? Do you mean re-try to close but this time re-connect after close
> > > basically do everything you would at "restore"?
> > 
> > Last time I checked blkfront supported reconnections (ie: disconnect
> > from a backend and connect again). I was assuming we could apply the
> > same here on timeout, and just follow the same path where the frontend
> > waits indefinitely for the backend to close and then attempts to
> > reconnect.
> > 
> > > Also, I experimented with that and it works intermittently. I want to take a
> > > step back on this issue and ask few questions here:
> > > 1. Is fixing this recovery a blocker for me sending in a V2 version?
> > 
> > At the end of day it's your feature. I would certainly prefer for it
> > to work as good as possible, this being a recovery in case of failure
> > just make sure it does something sane (ie: crash/close the frontend)
> > and add a TODO note.
> > 
> > > 2. In our 2-3 years of supporting this feature at large scale we haven't seen this issue
> > > where backend fails to disconnect. What we are trying to do here is create a
> > > hypothetical situation where we leave backend in Closing state and try and see how it
> > > recovers. The reason why I think it "may not" occur and the timeout of 5HZ is
> > > sufficient is because we haven't come across even a single use-case where it
> > > caused hibernation to fail.
> > > The reason why I think "it may" occur is if we are running a really memory
> > > intensive workload and ring is busy and is unable to complete all the requests
> > > in the given timeout. This is very unlikely though.
> > 
> > As said above I would generally prefer for code to handle possible
> > failures the best way, and hence I think here it would be nice to
> > fallback to the normal disconnect path and just wait for the backend
> > to close.
> >
> Do you mind throwing some light in here, what that path may be, if its
> straight forward to fix I would like to debug it a bit more. May be I am
> missing some of the context here.

So the frontend should do:

- Switch to Closed state (and cleanup everything required).
- Wait for backend to switch to Closed state (must be done
  asynchronously, handled in blkback_changed).
- Switch frontend to XenbusStateInitialising, that will in turn force
  the backend to switch to XenbusStateInitWait.
- After that it should just follow the normal connection procedure.

I think the part that's missing is the frontend doing the state change
to XenbusStateInitialising when the backend switches to the Closed
state.
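
Roughly, that missing piece could look like the following fragment of
blkback_changed() (the BLKIF_STATE_FREEZING check is an assumption based on
this series, not existing code):

static void blkback_changed(struct xenbus_device *dev,
                            enum xenbus_state backend_state)
{
        struct blkfront_info *info = dev_get_drvdata(&dev->dev);

        switch (backend_state) {
        case XenbusStateClosed:
                if (info->connected == BLKIF_STATE_FREEZING) {
                        /*
                         * The backend finished tearing down after a
                         * failed freeze: restart the handshake, which
                         * moves the backend to InitWait and lets the
                         * normal connection procedure run.
                         */
                        xenbus_switch_state(dev, XenbusStateInitialising);
                        break;
                }
                /* Existing Closed/hot-unplug handling goes here. */
                break;
        default:
                /* Other transitions as in the current driver. */
                break;
        }
}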

> I was of the view we may just want to mark frontend closed which should do 
> the job of freeing resources and then following the same flow as
> blkfront_restore. That does not seems to work correctly 100% of the time.

I think the missing part is that you must wait for the backend to
switch to the Closed state, or else the switch to
XenbusStateInitialising won't be picked up correctly by the backend
(because it's still doing its cleanup).

Using blkfront_restore might be an option, but you need to assert the
backend is in the initial state before using that path.

> > You likely have this very well tuned to your own environment and
> > workloads, since this will now be upstream others might have more
> > contended systems where it could start to fail.
> > 
> I agree, however, this is also from the testing I did with 100 of runs 
> outside of EC2 running few tests of my own. 
> > > 3) Also, I do not think this may be straight forward to fix and expect
> > > hibernation to work flawlessly in subsequent invocations. I am open to
> > > all suggestions.
> > 
> > Right, adding a TODO would seem appropriate then.
> >
> Just to double check, I will send in a V2 with this marked as TO-DO?

I think that's fine. Please clearly describe what's missing, so
others know what they might have to implement.

Thanks, Roger.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-26  9:12                 ` Roger Pau Monné
@ 2020-06-29 19:20                   ` Anchal Agarwal
  2020-06-30  8:30                     ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Anchal Agarwal @ 2020-06-29 19:20 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Fri, Jun 26, 2020 at 11:12:39AM +0200, Roger Pau Monné wrote:
> On Thu, Jun 25, 2020 at 06:36:59PM +0000, Anchal Agarwal wrote:
> > On Tue, Jun 23, 2020 at 10:19:03AM +0200, Roger Pau Monné wrote:
> > > On Tue, Jun 23, 2020 at 12:43:14AM +0000, Anchal Agarwal wrote:
> > > > On Mon, Jun 22, 2020 at 10:38:46AM +0200, Roger Pau Monné wrote:
> > > > > On Fri, Jun 19, 2020 at 11:43:12PM +0000, Anchal Agarwal wrote:
> > > > > > On Wed, Jun 17, 2020 at 10:35:28AM +0200, Roger Pau Monné wrote:
> > > > > > > On Tue, Jun 16, 2020 at 09:49:25PM +0000, Anchal Agarwal wrote:
> > > > > > > > On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
> > > > > > > > > On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> > > > > > > > > >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> > > > > > > > > >     > +                              "the device may become inconsistent state");
> > > > > > > > > >
> > > > > > > > > >     Leaving the device in this state is quite bad, as it's in a closed
> > > > > > > > > >     state and with the queues frozen. You should make an attempt to
> > > > > > > > > >     restore things to a working state.
> > > > > > > > > >
> > > > > > > > > > You mean if backend closed after timeout? Is there a way to know that? I understand it's not good to
> > > > > > > > > > leave it in this state however, I am still trying to find if there is a good way to know if backend is still connected after timeout.
> > > > > > > > > > Hence the message " the device may become inconsistent state".  I didn't see a timeout not even once on my end so that's why
> > > > > > > > > > I may be looking for an alternate perspective here. may be need to thaw everything back intentionally is one thing I could think of.
> > > > > > > > >
> > > > > > > > > You can manually force this state, and then check that it will behave
> > > > > > > > > correctly. I would expect that on a failure to disconnect from the
> > > > > > > > > backend you should switch the frontend to the 'Init' state in order to
> > > > > > > > > try to reconnect to the backend when possible.
> > > > > > > > >
> > > > > > > > From what I understand forcing manually is, failing the freeze without
> > > > > > > > disconnect and try to revive the connection by unfreezing the
> > > > > > > > queues->reconnecting to backend [which never got diconnected]. May be even
> > > > > > > > tearing down things manually because I am not sure what state will frontend
> > > > > > > > see if backend fails to to disconnect at any point in time. I assumed connected.
> > > > > > > > Then again if its "CONNECTED" I may not need to tear down everything and start
> > > > > > > > from Initialising state because that may not work.
> > > > > > > >
> > > > > > > > So I am not so sure about backend's state so much, lets say if  xen_blkif_disconnect fail,
> > > > > > > > I don't see it getting handled in the backend then what will be backend's state?
> > > > > > > > Will it still switch xenbus state to 'Closed'? If not what will frontend see,
> > > > > > > > if it tries to read backend's state through xenbus_read_driver_state ?
> > > > > > > >
> > > > > > > > So the flow be like:
> > > > > > > > Front end marks XenbusStateClosing
> > > > > > > > Backend marks its state as XenbusStateClosing
> > > > > > > >     Frontend marks XenbusStateClosed
> > > > > > > >     Backend disconnects calls xen_blkif_disconnect
> > > > > > > >        Backend fails to disconnect, the above function returns EBUSY
> > > > > > > >        What will be state of backend here?
> > > > > > >
> > > > > > > Backend should stay in state 'Closing' then, until it can finish
> > > > > > > tearing down.
> > > > > > >
> > > > > > It disconnects the ring after switching to connected state too.
> > > > > > > >        Frontend did not tear down the rings if backend does not switches the
> > > > > > > >        state to 'Closed' in case of failure.
> > > > > > > >
> > > > > > > > If backend stays in CONNECTED state, then even if we mark it Initialised in frontend, backend
> > > > > > >
> > > > > > > Backend will stay in state 'Closing' I think.
> > > > > > >
> > > > > > > > won't be calling connect(). {From reading code in frontend_changed}
> > > > > > > > IMU, Initialising will fail since backend dev->state != XenbusStateClosed plus
> > > > > > > > we did not tear down anything so calling talk_to_blkback may not be needed
> > > > > > > >
> > > > > > > > Does that sound correct?
> > > > > > >
> > > > > > > I think switching to the initial state in order to try to attempt a
> > > > > > > reconnection would be our best bet here.
> > > > > > >
> > > > > > It does not seems to work correctly, I get hung tasks all over and all the
> > > > > > requests to filesystem gets stuck. Backend does shows the state as connected
> > > > > > after xenbus_dev_suspend fails but I think there may be something missing.
> > > > > > I don't seem to get IO interrupts thereafter i.e hitting the function blkif_interrupts.
> > > > > > I think just marking it initialised may not be the only thing.
> > > > > > Here is a short description of what I am trying to do:
> > > > > > So, on timeout:
> > > > > >     Switch XenBusState to "Initialized"
> > > > > >     unquiesce/unfreeze the queues and return
> > > > > >     mark info->connected = BLKIF_STATE_CONNECTED
> > > > >
> > > > > If xenbus state is Initialized isn't it wrong to set info->connected
> > > > > == CONNECTED?
> > > > >
> > > > Yes, you are right earlier I was marking it explicitly but that was not right,
> > > > the connect path for blkfront will do that.
> > > > > You should tear down all the internal state (like a proper close)?
> > > > >
> > > > Isn't that similar to disconnecting in the first place that failed during
> > > > freeze? Do you mean re-try to close but this time re-connect after close
> > > > basically do everything you would at "restore"?
> > >
> > > Last time I checked blkfront supported reconnections (ie: disconnect
> > > from a backend and connect again). I was assuming we could apply the
> > > same here on timeout, and just follow the same path where the frontend
> > > waits indefinitely for the backend to close and then attempts to
> > > reconnect.
> > >
> > > > Also, I experimented with that and it works intermittently. I want to take a
> > > > step back on this issue and ask few questions here:
> > > > 1. Is fixing this recovery a blocker for me sending in a V2 version?
> > >
> > > At the end of day it's your feature. I would certainly prefer for it
> > > to work as good as possible, this being a recovery in case of failure
> > > just make sure it does something sane (ie: crash/close the frontend)
> > > and add a TODO note.
> > >
> > > > 2. In our 2-3 years of supporting this feature at large scale we haven't seen this issue
> > > > where backend fails to disconnect. What we are trying to do here is create a
> > > > hypothetical situation where we leave backend in Closing state and try and see how it
> > > > recovers. The reason why I think it "may not" occur and the timeout of 5HZ is
> > > > sufficient is because we haven't come across even a single use-case where it
> > > > caused hibernation to fail.
> > > > The reason why I think "it may" occur is if we are running a really memory
> > > > intensive workload and ring is busy and is unable to complete all the requests
> > > > in the given timeout. This is very unlikely though.
> > >
> > > As said above I would generally prefer for the code to handle possible
> > > failures the best way it can, and hence I think here it would be nice to
> > > fall back to the normal disconnect path and just wait for the backend
> > > to close.
> > >
> > Do you mind throwing some light on what that path may be? If it's
> > straightforward to fix I would like to debug it a bit more. Maybe I am
> > missing some of the context here.
> 
> So the frontend should do:
> 
> - Switch to Closed state (and cleanup everything required).
> - Wait for backend to switch to Closed state (must be done
>   asynchronously, handled in blkback_changed).
> - Switch frontend to XenbusStateInitialising, that will in turn force
>   the backend to switch to XenbusStateInitWait.
> - After that it should just follow the normal connection procedure.
> 
> I think the part that's missing is the frontend doing the state change
> to XenbusStateInitialising when the backend switches to the Closed
> state.
> 
> > I was of the view that we may just want to mark the frontend closed, which
> > should do the job of freeing resources, and then follow the same flow as
> > blkfront_restore. That does not seem to work correctly 100% of the time.
> 
> I think the missing part is that you must wait for the backend to
> switch to the Closed state, or else the switch to
> XenbusStateInitialising won't be picked up correctly by the backend
> (because it's still doing its cleanup).
> 
> Using blkfront_restore might be an option, but you need to assert the
> backend is in the initial state before using that path.
>
Yes, I agree, and I make sure that XenbusStateInitialising only triggers
on the frontend once the backend is disconnected. An msleep in a loop is not
that graceful but works.
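
For reference, the wait looks roughly like this (an untested sketch, not
the exact patch code; the retry count and sleep interval are made up):

    /* Poll the backend state until it reaches Closed, then restart the
     * handshake.  Runs in the frontend; 'dev' is the frontend's
     * struct xenbus_device. */
    int retries = 100;

    while (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed &&
           retries--)
            msleep(100);

    if (retries < 0)
            return -EBUSY;  /* backend never reached Closed */

    xenbus_switch_state(dev, XenbusStateInitialising);
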
The frontend only switches to XenbusStateInitialising once it sees the backend
as Closed. The issue here, which may require more debugging, is:
1. Hibernate the instance -> Closing fails; this is an artificially created
situation, by not marking the frontend Closed in the first place during freezing.
2. System comes back up fine, restored to 'backend connected'.
3. Re-run (1) again without reboot.
4. The re-run in (3) fails to recover; basically freezing does not fail at all,
   which is weird because it should time out as it passes through the same path.
   It hits a BUG in talk_to_blkback() and the instance crashes.

Anyway, I just wanted to paint a picture that there may be something more
happening here which needs persistent debugging.
> > > You likely have this very well tuned to your own environment and
> > > workloads, since this will now be upstream others might have more
> > > contended systems where it could start to fail.
> > >
> > I agree; however, this is also from the testing I did with hundreds of runs
> > outside of EC2, running a few tests of my own.
> > > > 3. Also, I do not think this will be straightforward to fix in a way that
> > > > makes hibernation work flawlessly in subsequent invocations. I am open to
> > > > all suggestions.
> > >
> > > Right, adding a TODO would seem appropriate then.
> > >
> > Just to double check: I will send in a V2 with this marked as a TODO?
> 
> I think that's fine. Please clearly describe what's missing, so
> others know what they might have to implement.
> 
Ack.
> Thanks, Roger.
> 
Thanks,
Anchal


* Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
  2020-06-29 19:20                   ` Anchal Agarwal
@ 2020-06-30  8:30                     ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2020-06-30  8:30 UTC (permalink / raw)
  To: Anchal Agarwal
  Cc: Boris Ostrovsky, tglx, mingo, bp, hpa, x86, jgross, linux-pm,
	linux-mm, Kamata, Munehisa, sstabellini, konrad.wilk, axboe,
	davem, rjw, len.brown, pavel, peterz, Valentin, Eduardo, Singh,
	Balbir, xen-devel, vkuznets, netdev, linux-kernel, Woodhouse,
	David, benh

On Mon, Jun 29, 2020 at 07:20:35PM +0000, Anchal Agarwal wrote:
> On Fri, Jun 26, 2020 at 11:12:39AM +0200, Roger Pau Monné wrote:
> > So the frontend should do:
> > 
> > - Switch to Closed state (and cleanup everything required).
> > - Wait for backend to switch to Closed state (must be done
> >   asynchronously, handled in blkback_changed).
> > - Switch frontend to XenbusStateInitialising, that will in turn force
> >   the backend to switch to XenbusStateInitWait.
> > - After that it should just follow the normal connection procedure.
> > 
> > I think the part that's missing is the frontend doing the state change
> > to XenbusStateInitialising when the backend switches to the Closed
> > state.
> > 
> > > I was of the view that we may just want to mark the frontend closed, which
> > > should do the job of freeing resources, and then follow the same flow as
> > > blkfront_restore. That does not seem to work correctly 100% of the time.
> > 
> > I think the missing part is that you must wait for the backend to
> > switch to the Closed state, or else the switch to
> > XenbusStateInitialising won't be picked up correctly by the backend
> > (because it's still doing its cleanup).
> > 
> > Using blkfront_restore might be an option, but you need to assert the
> > backend is in the initial state before using that path.
> >
> Yes, I agree, and I make sure that XenbusStateInitialising only triggers
> on the frontend once the backend is disconnected. An msleep in a loop is not
> that graceful but works.
> The frontend only switches to XenbusStateInitialising once it sees the backend
> as Closed. The issue here, which may require more debugging, is:
> 1. Hibernate the instance -> Closing fails; this is an artificially created
> situation, by not marking the frontend Closed in the first place during freezing.
> 2. System comes back up fine, restored to 'backend connected'.

I'm not sure I'm following what is happening here. What should happen
IMO is that the backend will eventually reach the Closed state, ie:
the frontend has initiated the disconnection from the backend by
setting the Closing state, and the backend will have to eventually
reach the Closed state.

At that point the frontend can initiate a reconnection by switching to
the Initialising state.
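
Something along these lines might do it, as a rough and untested sketch
(the 'recovering' flag is made up here to mark the failed-freeze path,
it's not something blkfront has today):

    /* Rough sketch: helper called from blkback_changed() on a backend
     * state change.  'recovering' is a made-up flag, set when the
     * freeze handler timed out and left the disconnect halfway done. */
    static void blkfront_attempt_reconnect(struct xenbus_device *dev,
                                           struct blkfront_info *info,
                                           enum xenbus_state backend_state)
    {
            if (backend_state != XenbusStateClosed || !info->recovering)
                    return;

            info->recovering = false;
            /* The backend is fully Closed: restart the handshake.
             * Switching to Initialising makes the backend move to
             * InitWait, after which the normal connection path runs. */
            xenbus_switch_state(dev, XenbusStateInitialising);
    }
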

> 3. Re-run (1) again without reboot.
> 4. The re-run in (3) fails to recover; basically freezing does not fail at all,
>    which is weird because it should time out as it passes through the same path.
>    It hits a BUG in talk_to_blkback() and the instance crashes.

It's hard to tell exactly. I guess you would have to figure out what
makes the frontend not get stuck at the same place as on the first
attempt.

Roger.


end of thread

Thread overview: 14+ messages
2020-06-03 23:33 [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation] Agarwal, Anchal
2020-06-04  7:05 ` Roger Pau Monné
2020-06-16 21:49   ` Anchal Agarwal
2020-06-16 22:30     ` Anchal Agarwal
2020-06-17  8:38       ` Roger Pau Monné
2020-06-17  8:35     ` Roger Pau Monné
2020-06-19 23:43       ` Anchal Agarwal
2020-06-22  8:38         ` Roger Pau Monné
2020-06-23  0:43           ` Anchal Agarwal
2020-06-23  8:19             ` Roger Pau Monné
2020-06-25 18:36               ` Anchal Agarwal
2020-06-26  9:12                 ` Roger Pau Monné
2020-06-29 19:20                   ` Anchal Agarwal
2020-06-30  8:30                     ` Roger Pau Monné
