From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=hIPo=75=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.9 required=3.0 tests=DKIM_ADSP_ALL,DKIM_INVALID,
	DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,
	SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 994E7C433E0
	for <linux-mm@archiver.kernel.org>; Tue, 16 Jun 2020 21:49:49 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 2563F2082E
	for <linux-mm@archiver.kernel.org>; Tue, 16 Jun 2020 21:49:48 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=fail reason="signature verification failed" (1024-bit key) header.d=amazon.com header.i=@amazon.com header.b="cTtwnkJh"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2563F2082E
Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=amazon.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 618FA6B0003; Tue, 16 Jun 2020 17:49:48 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 5C7A26B0005; Tue, 16 Jun 2020 17:49:48 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 4B6316B000C; Tue, 16 Jun 2020 17:49:48 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0169.hostedemail.com [216.40.44.169])
	by kanga.kvack.org (Postfix) with ESMTP id 3520A6B0003
	for <linux-mm@kvack.org>; Tue, 16 Jun 2020 17:49:48 -0400 (EDT)
Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay05.hostedemail.com (Postfix) with ESMTP id B0ADF181AC9BF
	for <linux-mm@kvack.org>; Tue, 16 Jun 2020 21:49:47 +0000 (UTC)
X-FDA: 76936417614.23.side81_3c1095c26e02
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin23.hostedemail.com (Postfix) with ESMTP id 715AC37608
	for <linux-mm@kvack.org>; Tue, 16 Jun 2020 21:49:47 +0000 (UTC)
X-HE-Tag: side81_3c1095c26e02
X-Filterd-Recvd-Size: 13814
Received: from smtp-fw-9101.amazon.com (smtp-fw-9101.amazon.com [207.171.184.25])
	by imf42.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Tue, 16 Jun 2020 21:49:46 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
  d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209;
  t=1592344187; x=1623880187;
  h=date:from:to:cc:message-id:references:mime-version:
   content-transfer-encoding:in-reply-to:subject;
  bh=2xZWar77dXkth8tk0V/E/5zPJfDMTckPDO14OJN6cFs=;
  b=cTtwnkJhSYrfmjffV8dWk1+nGirffm3PLrhoDu2YBm9UqBA1llNrkp5B
   KPmyE+s7lUVSbUU05igwzZAPbsr+9jkXmuH7cmQ4c2nriPA9nYCBGGS9l
   q7YEesMms2thgxBqdnG3WWBNRBhgpKdLtGBFUY5mHDPAoBdZlkGPlrBT+
   4=;
IronPort-SDR: mrYnVTXGTnnqHewBUIddvzzRXlflYMjMyxE96i0WnUJPOgd3L5PuAQgZhdT+1V4jG9PhbPquY9
 XRYaFMbAvvQw==
X-IronPort-AV: E=Sophos;i="5.73,519,1583193600"; 
   d="scan'208";a="44534346"
Subject: Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation
Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-2c-87a10be6.us-west-2.amazon.com) ([10.47.23.38])
  by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 16 Jun 2020 21:49:43 +0000
Received: from EX13MTAUEE002.ant.amazon.com (pdx4-ws-svc-p6-lb7-vlan2.pdx.amazon.com [10.170.41.162])
	by email-inbound-relay-2c-87a10be6.us-west-2.amazon.com (Postfix) with ESMTPS id 8C216A07C3;
	Tue, 16 Jun 2020 21:49:41 +0000 (UTC)
Received: from EX13D08UEE004.ant.amazon.com (10.43.62.182) by
 EX13MTAUEE002.ant.amazon.com (10.43.62.24) with Microsoft SMTP Server (TLS)
 id 15.0.1497.2; Tue, 16 Jun 2020 21:49:26 +0000
Received: from EX13MTAUEA002.ant.amazon.com (10.43.61.77) by
 EX13D08UEE004.ant.amazon.com (10.43.62.182) with Microsoft SMTP Server (TLS)
 id 15.0.1497.2; Tue, 16 Jun 2020 21:49:25 +0000
Received: from dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com
 (172.22.96.68) by mail-relay.amazon.com (10.43.61.169) with Microsoft SMTP
 Server id 15.0.1497.2 via Frontend Transport; Tue, 16 Jun 2020 21:49:25 +0000
Received: by dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com (Postfix, from userid 4335130)
	id 872B240139; Tue, 16 Jun 2020 21:49:25 +0000 (UTC)
Date: Tue, 16 Jun 2020 21:49:25 +0000
From: Anchal Agarwal <anchalag@amazon.com>
To: Roger Pau =?iso-8859-1?Q?Monn=E9?= <roger.pau@citrix.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>, "tglx@linutronix.de"
	<tglx@linutronix.de>, "mingo@redhat.com" <mingo@redhat.com>, "bp@alien8.de"
	<bp@alien8.de>, "hpa@zytor.com" <hpa@zytor.com>, "x86@kernel.org"
	<x86@kernel.org>, "jgross@suse.com" <jgross@suse.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>, "linux-mm@kvack.org"
	<linux-mm@kvack.org>, "Kamata, Munehisa" <kamatam@amazon.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>, "konrad.wilk@oracle.com"
	<konrad.wilk@oracle.com>, "axboe@kernel.dk" <axboe@kernel.dk>,
	"davem@davemloft.net" <davem@davemloft.net>, "rjw@rjwysocki.net"
	<rjw@rjwysocki.net>, "len.brown@intel.com" <len.brown@intel.com>,
	"pavel@ucw.cz" <pavel@ucw.cz>, "peterz@infradead.org" <peterz@infradead.org>,
	"Valentin, Eduardo" <eduval@amazon.com>, "Singh, Balbir" <sblbir@amazon.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>, "netdev@vger.kernel.org"
	<netdev@vger.kernel.org>, "linux-kernel@vger.kernel.org"
	<linux-kernel@vger.kernel.org>, "Woodhouse, David" <dwmw@amazon.co.uk>,
	"benh@kernel.crashing.org" <benh@kernel.crashing.org>
Message-ID: <20200616214925.GA21684@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
References: <7FD7505E-79AA-43F6-8D5F-7A2567F333AB@amazon.com>
 <20200604070548.GH1195@Air-de-Roger>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
In-Reply-To: <20200604070548.GH1195@Air-de-Roger>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Rspamd-Queue-Id: 715AC37608
X-Spamd-Result: default: False [0.00 / 100.00]
X-Rspamd-Server: rspam05
Content-Transfer-Encoding: quoted-printable
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monn=E9 wrote:
> Hello,
>=20
> On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> >     On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> >     > From: Munehisa Kamata <kamatam@amazon.com>
> >     >
> >     > S4 power transition states are much different than xen
> >     > suspend/resume. Former is visible to the guest and frontend dri=
vers should
> >     > be aware of the state transitions and should be able to take ap=
propriate
> >     > actions when needed. In transition to S4 we need to make sure t=
hat at least
> >     > all the in-flight blkif requests get completed, since they prob=
ably contain
> >     > bits of the guest's memory image and that's not going to get sa=
ved any
> >     > other way. Hence, re-issuing of in-flight requests as in case o=
f xen resume
> >     > will not work here. This is in contrast to xen-suspend where we=
 need to
> >     > freeze with as little processing as possible to avoid dirtying =
RAM late in
> >     > the migration cycle and we know that in-flight data can wait.
> >     >
> >     > Add freeze, thaw and restore callbacks for PM suspend and hiber=
nation
> >     > support. All frontend drivers that needs to use PM_HIBERNATION/=
PM_SUSPEND
> >     > events, need to implement these xenbus_driver callbacks. The fr=
eeze handler
> >     > stops block-layer queue and disconnect the frontend from the ba=
ckend while
> >     > freeing ring_info and associated resources. Before disconnectin=
g from the
> >     > backend, we need to prevent any new IO from being queued and wa=
it for existing
> >     > IO to complete. Freeze/unfreeze of the queues will guarantee th=
at there are no
> >     > requests in use on the shared ring. However, for sanity we shou=
ld check
> >     > state of the ring before disconnecting to make sure that there =
are no
> >     > outstanding requests to be processed on the ring. The restore h=
andler
> >     > re-allocates ring_info, unquiesces and unfreezes the queue and =
re-connect to
> >     > the backend, so that rest of the kernel can continue to use the=
 block device
> >     > transparently.
> >     >
> >     > Note:For older backends,if a backend doesn't have commit'12ea72=
9645ace'
> >     > xen/blkback: unmap all persistent grants when frontend gets dis=
connected,
> >     > the frontend may see massive amount of grant table warning when=
 freeing
> >     > resources.
> >     > [   36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> >     > [   36.855089] xen:grant_table: WARNING:e.g. 0x112 still in use=
!
> >     >
> >     > In this case, persistent grants would need to be disabled.
> >     >
> >     > [Anchal Changelog: Removed timeout/request during blkfront free=
ze.
> >     > Reworked the whole patch to work with blk-mq and incorporate up=
stream's
> >     > comments]
> >
> >     Please tag versions using vX and it would be helpful if you could=
 list
> >     the specific changes that you performed between versions. There w=
here
> >     3 RFC versions IIRC, and there's no log of the changes between th=
em.
> >
> > I will elaborate on "upstream's comments" in my changelog in my next =
round of patches.
>=20
> Sorry for being picky, but can you please make sure your email client
> properly quotes previous emails on reply. Note the lack of '>' added
> to the quoted parts of your reply.
That was probably just my Outlook client. Noted.
>=20
> >     > +                     }
> >     > +
> >     >                       break;
> >     > +             }
> >     > +
> >     > +             /*
> >     > +              * We may somehow receive backend's Closed again =
while thawing
> >     > +              * or restoring and it causes thawing or restorin=
g to fail.
> >     > +              * Ignore such unexpected state regardless of the=
 backend state.
> >     > +              */
> >     > +             if (info->connected =3D=3D BLKIF_STATE_FROZEN) {
> >
> >     I think you can join this with the previous dev->state =3D=3D Xen=
busStateClosed?
> >
> >     Also, won't the device be in the Closed state already if it's in =
state
> >     frozen?
> > Yes but I think this mostly due to a hypothetical case if during thaw=
ing backend switches to Closed state.
> > I am not entirely sure if that could happen. Could use some expertise=
 here.
>=20
> I think the frontend seeing the backend in the closed state during
> restore would be a bug that should prevent the frontend from
> resuming.
>=20
> >     > +     /* Kick the backend to disconnect */
> >     > +     xenbus_switch_state(dev, XenbusStateClosing);
> >     > +
> >     > +     /*
> >     > +      * We don't want to move forward before the frontend is d=
iconnected
> >     > +      * from the backend cleanly.
> >     > +      */
> >     > +     timeout =3D wait_for_completion_timeout(&info->wait_backe=
nd_disconnected,
> >     > +                                           timeout);
> >     > +     if (!timeout) {
> >     > +             err =3D -EBUSY;
> >
> >     Note err is only used here, and I think could just be dropped.
> >
> > This err is what's being returned from the function. Am I missing any=
thing?
>=20
> Just 'return -EBUSY;' directly, and remove the top level variable. You
> can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
> that important.
>=20
> >     > +             xenbus_dev_error(dev, err, "Freezing timed out;"
> >     > +                              "the device may become inconsist=
ent state");
> >
> >     Leaving the device in this state is quite bad, as it's in a close=
d
> >     state and with the queues frozen. You should make an attempt to
> >     restore things to a working state.
> >
> > You mean if backend closed after timeout? Is there a way to know that=
? I understand it's not good to
> > leave it in this state however, I am still trying to find if there is=
 a good way to know if backend is still connected after timeout.
> > Hence the message " the device may become inconsistent state".  I did=
n't see a timeout not even once on my end so that's why
> > I may be looking for an alternate perspective here. may be need to th=
aw everything back intentionally is one thing I could think of.
>=20
> You can manually force this state, and then check that it will behave
> correctly. I would expect that on a failure to disconnect from the
> backend you should switch the frontend to the 'Init' state in order to
> try to reconnect to the backend when possible.
>=20
As I understand it, forcing this manually means failing the freeze
without disconnecting, and then trying to revive the connection by
unfreezing the queues and reconnecting to the backend [which never got
disconnected]. It may even mean tearing things down manually, because I
am not sure what state the frontend will see if the backend fails to
disconnect at some point. I assumed Connected. Then again, if it is
"CONNECTED", I may not need to tear everything down and start from the
Initialising state, because that may not work.

I am not sure about the backend's state, though. Say
xen_blkif_disconnect fails: I don't see that being handled in the
backend, so what will the backend's state be? Will it still switch its
xenbus state to 'Closed'? If not, what will the frontend see if it tries
to read the backend's state through xenbus_read_driver_state?

So the flow would be:
Frontend marks XenbusStateClosing
Backend marks its state as XenbusStateClosing
    Frontend marks XenbusStateClosed
    Backend disconnects, calling xen_blkif_disconnect
       Backend fails to disconnect, the above function returns EBUSY
       What will the state of the backend be here?
       Frontend does not tear down the rings if the backend does not
       switch its state to 'Closed' on failure.

If the backend stays in the CONNECTED state, then even if we mark the
frontend Initialised, the backend won't call connect() {from reading the
code in frontend_changed}. In my understanding, Initialising will fail
since the backend's dev->state !=3D XenbusStateClosed; plus, we did not
tear anything down, so calling talk_to_blkback may not be needed.

Does that sound correct?
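
To make the flow above concrete, here is a minimal user-space sketch of
it. This is a toy model, not kernel code: struct model, model_freeze()
and the XB_* names are hypothetical, and leaving the backend in Closing
when the disconnect fails is purely an assumption (that is exactly the
open question). The failure branch follows Roger's suggestion of going
back to Initialising so that a reconnect can be attempted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified mirror of the xenbus states discussed here. */
enum xb_state { XB_INITIALISING, XB_CONNECTED, XB_CLOSING, XB_CLOSED };

/* Toy view of one frontend/backend pair; not a real kernel structure. */
struct model {
	enum xb_state frontend;
	enum xb_state backend;
	bool rings_torn_down;
	bool queues_frozen;
};

/*
 * Walk the freeze handshake. disconnect_ok stands in for whether the
 * backend's disconnect succeeds before the frontend's wait times out.
 */
static int model_freeze(struct model *m, bool disconnect_ok)
{
	m->queues_frozen = true;	/* no new IO while freezing */
	m->frontend = XB_CLOSING;	/* kick the backend */
	m->backend = XB_CLOSING;	/* backend acknowledges */
	m->frontend = XB_CLOSED;

	if (!disconnect_ok) {
		/*
		 * Assumption: the backend stays in Closing. Do not tear
		 * down the rings; unfreeze and fall back to Initialising.
		 */
		m->queues_frozen = false;
		m->frontend = XB_INITIALISING;
		return -EBUSY;
	}

	m->backend = XB_CLOSED;
	m->rings_torn_down = true;	/* safe only after Closed */
	return 0;
}
```

Calling model_freeze() with disconnect_ok set to false leaves the rings
intact and the queues unfrozen, which is the state I would want to
verify by forcing the timeout manually.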
> >     > +     }
> >     > +
> >     > +     return err;
> >     > +}
> >     > +
> >     > +static int blkfront_restore(struct xenbus_device *dev)
> >     > +{
> >     > +     struct blkfront_info *info =3D dev_get_drvdata(&dev->dev)=
;
> >     > +     int err =3D 0;
> >     > +
> >     > +     err =3D talk_to_blkback(dev, info);
> >     > +     blk_mq_unquiesce_queue(info->rq);
> >     > +     blk_mq_unfreeze_queue(info->rq);
> >     > +     if (!err)
> >     > +         blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_r=
ings);
> >
> >     Bad indentation. Also shouldn't you first update the queues and t=
hen
> >     unfreeze them?
> > Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes t=
he queue
> > So I don't think the order could be reversed.
>=20
> Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
> correct to unfreeze the queues without having updated them. Also the
> freezing/unfreezing uses a refcount, so I think it's perfectly fine to
> call blk_mq_update_nr_hw_queues first and then unfreeze the queues.
>=20
> Also note that talk_to_blkback returning an error should likely
> prevent any unfreezing, as the queues won't be updated to match the
> parameters of the backend.
>
I think you are right here. I will send out fixes in v2.
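
For reference, a small user-space sketch of the ordering being
discussed. It is only a model: struct toy_queue and the toy_* helpers
are hypothetical stand-ins, the freeze refcount is reduced to a plain
counter, and connect_err stands in for the return value of
talk_to_blkback(). It shows that an update which freezes internally can
run while the queue is already frozen, and that an error leaves the
queue frozen.

```c
#include <assert.h>

/* Toy request queue; freeze_depth > 0 means the queue is frozen. */
struct toy_queue {
	int freeze_depth;
	int nr_hw_queues;
};

static void toy_freeze(struct toy_queue *q)
{
	q->freeze_depth++;
}

static void toy_unfreeze(struct toy_queue *q)
{
	assert(q->freeze_depth > 0);	/* unbalanced unfreeze is a bug */
	q->freeze_depth--;
}

/* Models an update that takes its own freeze reference internally. */
static void toy_update_nr_hw_queues(struct toy_queue *q, int nr)
{
	toy_freeze(q);
	q->nr_hw_queues = nr;
	toy_unfreeze(q);
}

/*
 * Restore ordering from the review: bail out on a connect error without
 * unfreezing, otherwise update the queues first and unfreeze last.
 */
static int toy_restore(struct toy_queue *q, int connect_err, int nr_rings)
{
	if (connect_err)
		return connect_err;	/* queue stays frozen */

	toy_update_nr_hw_queues(q, nr_rings);
	toy_unfreeze(q);		/* drop the reference taken at freeze */
	return 0;
}
```

Starting from freeze_depth of 1 (frozen by the freeze handler), a
successful toy_restore() ends at depth 0 with the new queue count, and a
failing one leaves the depth at 1 with the old count.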
> Thanks, Roger.
>=20
Thanks,
Anchal