From: Anchal Agarwal
To: Roger Pau Monné
CC: Boris Ostrovsky, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
    hpa@zytor.com, x86@kernel.org, jgross@suse.com, linux-pm@vger.kernel.org,
    linux-mm@kvack.org, "Kamata, Munehisa", sstabellini@kernel.org,
    konrad.wilk@oracle.com, axboe@kernel.dk, davem@davemloft.net,
    rjw@rjwysocki.net, len.brown@intel.com, pavel@ucw.cz, peterz@infradead.org,
    "Valentin, Eduardo", "Singh, Balbir", xen-devel@lists.xenproject.org,
    vkuznets@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
    "Woodhouse, David", benh@kernel.crashing.org
Subject: Re: [PATCH 06/12] xen-blkfront: add callbacks for PM suspend and hibernation]
Date: Tue, 16 Jun 2020 21:49:25 +0000
Message-ID: <20200616214925.GA21684@dev-dsk-anchalag-2a-9c2d1d96.us-west-2.amazon.com>
In-Reply-To: <20200604070548.GH1195@Air-de-Roger>
References: <7FD7505E-79AA-43F6-8D5F-7A2567F333AB@amazon.com> <20200604070548.GH1195@Air-de-Roger>

On Thu, Jun 04, 2020 at 09:05:48AM +0200, Roger Pau Monné wrote:
>
> Hello,
>
> On Wed, Jun 03, 2020 at 11:33:52PM +0000, Agarwal, Anchal wrote:
> >
> > On Tue, May 19, 2020 at 11:27:50PM +0000, Anchal Agarwal wrote:
> > > From: Munehisa Kamata
> > >
> > > S4 power transition states are quite different from xen
> > > suspend/resume. The former is visible to the guest, and frontend drivers
> > > should be aware of the state transitions and should be able to take
> > > appropriate actions when needed. In transition to S4 we need to make sure
> > > that at least all the in-flight blkif requests get completed, since they
> > > probably contain bits of the guest's memory image and that's not going to
> > > get saved any other way. Hence, re-issuing of in-flight requests as in case
> > > of xen resume will not work here. This is in contrast to xen-suspend where
> > > we need to freeze with as little processing as possible to avoid dirtying
> > > RAM late in the migration cycle and we know that in-flight data can wait.
> > >
> > > Add freeze, thaw and restore callbacks for PM suspend and hibernation
> > > support. All frontend drivers that need to use PM_HIBERNATION/PM_SUSPEND
> > > events need to implement these xenbus_driver callbacks. The freeze handler
> > > stops the block-layer queue and disconnects the frontend from the backend
> > > while freeing ring_info and associated resources. Before disconnecting from
> > > the backend, we need to prevent any new IO from being queued and wait for
> > > existing IO to complete. Freeze/unfreeze of the queues will guarantee that
> > > there are no requests in use on the shared ring. However, for sanity we
> > > should check the state of the ring before disconnecting to make sure that
> > > there are no outstanding requests to be processed on the ring.
> > > The restore handler re-allocates ring_info, unquiesces and unfreezes the
> > > queue and re-connects to the backend, so that the rest of the kernel can
> > > continue to use the block device transparently.
> > >
> > > Note: For older backends, if a backend doesn't have commit '12ea729645ace'
> > > xen/blkback: unmap all persistent grants when frontend gets disconnected,
> > > the frontend may see a massive amount of grant table warnings when freeing
> > > resources.
> > > [ 36.852659] deferring g.e. 0xf9 (pfn 0xffffffffffffffff)
> > > [ 36.855089] xen:grant_table: WARNING: g.e. 0x112 still in use!
> > >
> > > In this case, persistent grants would need to be disabled.
> > >
> > > [Anchal Changelog: Removed timeout/request during blkfront freeze.
> > > Reworked the whole patch to work with blk-mq and incorporate upstream's
> > > comments]
> >
> > Please tag versions using vX and it would be helpful if you could list
> > the specific changes that you performed between versions. There were
> > 3 RFC versions IIRC, and there's no log of the changes between them.
> >
> > I will elaborate on "upstream's comments" in my changelog in my next round of patches.
>
> Sorry for being picky, but can you please make sure your email client
> properly quotes previous emails on reply. Note the lack of '>' added
> to the quoted parts of your reply.
That was probably just my Outlook. Note taken.
>
> > > +			}
> > > +
> > >  			break;
> > > +		}
> > > +
> > > +		/*
> > > +		 * We may somehow receive backend's Closed again while thawing
> > > +		 * or restoring and it causes thawing or restoring to fail.
> > > +		 * Ignore such unexpected state regardless of the backend state.
> > > +		 */
> > > +		if (info->connected == BLKIF_STATE_FROZEN) {
> >
> > I think you can join this with the previous dev->state == XenbusStateClosed?
> >
> > Also, won't the device be in the Closed state already if it's in state
> > frozen?
> > Yes, but I think this is mostly for a hypothetical case where the backend switches to the Closed state during thawing.
> > I am not entirely sure if that could happen. Could use some expertise here.
>
> I think the frontend seeing the backend in the closed state during
> restore would be a bug that should prevent the frontend from
> resuming.
>
> > > +	/* Kick the backend to disconnect */
> > > +	xenbus_switch_state(dev, XenbusStateClosing);
> > > +
> > > +	/*
> > > +	 * We don't want to move forward before the frontend is disconnected
> > > +	 * from the backend cleanly.
> > > +	 */
> > > +	timeout = wait_for_completion_timeout(&info->wait_backend_disconnected,
> > > +					      timeout);
> > > +	if (!timeout) {
> > > +		err = -EBUSY;
> >
> > Note err is only used here, and I think could just be dropped.
> >
> > This err is what's being returned from the function. Am I missing anything?
>
> Just 'return -EBUSY;' directly, and remove the top level variable. You
> can also use -EBUSY directly in the xenbus_dev_error call. Anyway, not
> that important.
>
> > > +		xenbus_dev_error(dev, err, "Freezing timed out;"
> > > +				 "the device may become inconsistent state");
> >
> > Leaving the device in this state is quite bad, as it's in a closed
> > state and with the queues frozen. You should make an attempt to
> > restore things to a working state.
> >
> > You mean if the backend closed after the timeout? Is there a way to know that? I understand it's not good to
> > leave it in this state; however, I am still trying to find a good way to know if the backend is still connected after the timeout.
> > Hence the message "the device may become inconsistent state". I didn't see a timeout even once on my end, so that's why
> > I may be looking for an alternate perspective here. Maybe thawing everything back intentionally is one thing I could think of.
>
> You can manually force this state, and then check that it will behave
> correctly.
> I would expect that on a failure to disconnect from the
> backend you should switch the frontend to the 'Init' state in order to
> try to reconnect to the backend when possible.
>
From what I understand, forcing this manually means failing the freeze
without disconnecting, then trying to revive the connection by unfreezing
the queues and reconnecting to the backend [which never got disconnected].
It may even mean tearing things down manually, because I am not sure what
state the frontend will see if the backend fails to disconnect at some
point. I assumed Connected. Then again, if it is Connected I may not need
to tear everything down and start from the Initialising state, because that
may not work.

I am not sure about the backend's state. Say xen_blkif_disconnect fails; I
don't see that being handled in the backend, so what will the backend's
state be? Will it still switch its xenbus state to 'Closed'? If not, what
will the frontend see if it reads the backend's state through
xenbus_read_driver_state?

So the flow would be:
  Frontend marks XenbusStateClosing
  Backend marks its state as XenbusStateClosing
    Frontend marks XenbusStateClosed
    Backend disconnects, calls xen_blkif_disconnect
      Backend fails to disconnect, the above function returns -EBUSY
        What will the state of the backend be here?
        The frontend does not tear down the rings if the backend does not
        switch its state to 'Closed' in case of failure.

If the backend stays in the Connected state, then even if we mark the
frontend Initialised, the backend won't call connect() [from reading the
code in frontend_changed]. IMU, Initialising will fail since the backend's
dev->state != XenbusStateClosed, and since we did not tear anything down,
calling talk_to_blkback may not be needed.

Does that sound correct?
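To make the rollback idea above concrete, here is a rough user-space C sketch,
not driver code. Everything named toy_* is made up for illustration, and it
assumes the frontend can actually read a meaningful backend state once the
wait has finished or timed out, which is exactly the open question above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the xenbus states that matter here. */
enum toy_backend_state { TOY_CONNECTED, TOY_CLOSING, TOY_CLOSED };

/* Hypothetical frontend bookkeeping, not struct blkfront_info. */
struct toy_frontend {
	bool queue_frozen;
	bool rings_allocated;
};

/*
 * seen_after_wait is the backend state the frontend reads once the wait
 * for the disconnect has completed or timed out (an assumption).
 * Returns 0 when the freeze completed, -EBUSY when it was rolled back.
 */
int toy_freeze(struct toy_frontend *fe, enum toy_backend_state seen_after_wait)
{
	assert(fe != NULL);

	fe->queue_frozen = true;	/* no new I/O while we wait */

	if (seen_after_wait == TOY_CLOSED) {
		/* Clean disconnect: safe to tear the rings down. */
		fe->rings_allocated = false;
		return 0;
	}

	/* Backend never reached Closed: keep the rings and resume I/O. */
	fe->queue_frozen = false;
	return -EBUSY;
}
```

In this sketch a failed freeze leaves the rings untouched and only unfreezes
the queue, which matches the "nothing was torn down, so no need to call
talk_to_blkback" reasoning above. Whether the real backend can be trusted to
still be Connected in that case is not something the sketch answers.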
> > > +	}
> > > +
> > > +	return err;
> > > +}
> > > +
> > > +static int blkfront_restore(struct xenbus_device *dev)
> > > +{
> > > +	struct blkfront_info *info = dev_get_drvdata(&dev->dev);
> > > +	int err = 0;
> > > +
> > > +	err = talk_to_blkback(dev, info);
> > > +	blk_mq_unquiesce_queue(info->rq);
> > > +	blk_mq_unfreeze_queue(info->rq);
> > > +	if (!err)
> > > +	    blk_mq_update_nr_hw_queues(&info->tag_set, info->nr_rings);
> >
> > Bad indentation. Also shouldn't you first update the queues and then
> > unfreeze them?
> > Please correct me if I am wrong, blk_mq_update_nr_hw_queues freezes the queue
> > So I don't think the order could be reversed.
>
> Regardless of what blk_mq_update_nr_hw_queues does, I don't think it's
> correct to unfreeze the queues without having updated them. Also the
> freezing/unfreezing uses a refcount, so I think it's perfectly fine to
> call blk_mq_update_nr_hw_queues first and then unfreeze the queues.
>
> Also note that talk_to_blkback returning an error should likely
> prevent any unfreezing, as the queues won't be updated to match the
> parameters of the backend.
>
I think you are right here. Will send out fixes in V2
> Thanks, Roger.
>
Thanks,
Anchal
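
For reference, the ordering Roger suggests for blkfront_restore can be modelled
in a few lines of user-space C. This is only a sketch: the toy_* names are
hypothetical, the freeze counter stands in for the refcount mentioned above,
and the assumption that the update step freezes and unfreezes internally is
taken from this thread rather than checked against the block layer.

```c
#include <assert.h>

/* Hypothetical model of a request queue with a freeze refcount. */
struct toy_queue {
	int freeze_depth;	/* > 0 means the queue is frozen */
	int nr_hw_queues;
};

static void toy_freeze_queue(struct toy_queue *q)
{
	q->freeze_depth++;
}

static void toy_unfreeze_queue(struct toy_queue *q)
{
	assert(q->freeze_depth > 0);
	q->freeze_depth--;
}

/* Assumed to freeze internally; the refcount makes nesting harmless. */
static void toy_update_nr_hw_queues(struct toy_queue *q, int nr)
{
	toy_freeze_queue(q);
	q->nr_hw_queues = nr;
	toy_unfreeze_queue(q);
}

/*
 * connect_err models the return value of talk_to_blkback.
 * On error the queue stays frozen and keeps its old parameters.
 */
int toy_restore(struct toy_queue *q, int connect_err, int nr_rings)
{
	if (connect_err)
		return connect_err;

	toy_update_nr_hw_queues(q, nr_rings);	/* update first */
	toy_unfreeze_queue(q);			/* then drop the freeze taken at suspend */
	return 0;
}
```

With a queue that enters restore at freeze depth 1, a successful reconnect
leaves it unfrozen with the new queue count, while a failed reconnect leaves
it frozen and unchanged.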