From: "Durrant, Paul"
To: Roger Pau Monné
CC: "linux-kernel@vger.kernel.org", "xen-devel@lists.xenproject.org", "Juergen Gross", Stefano Stabellini, "Boris Ostrovsky"
Subject: RE: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
Date: Mon, 9 Dec 2019 14:41:40 +0000
In-Reply-To: <20191209142852.GW980@Air-de-Roger>

> -----Original Message-----
> From: Roger Pau Monné
> Sent: 09 December 2019 14:29
> To: Durrant, Paul
> Cc: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org;
> Juergen Gross; Stefano Stabellini; Boris Ostrovsky
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
> to closed
>
> On Mon, Dec 09, 2019 at 12:40:47PM +0000, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné
> > > Sent: 09 December 2019 12:26
> > > To: Durrant, Paul
> > > Cc: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > Juergen Gross; Stefano Stabellini; Boris Ostrovsky
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > forced to closed
> > >
> > > On Mon, Dec 09, 2019 at 12:01:38PM +0000, Durrant, Paul wrote:
> > > > > -----Original Message-----
> > > > > From: Roger Pau Monné
> > > > > Sent: 09 December 2019 11:39
> > > > > To: Durrant, Paul
> > > > > Cc: linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org;
> > > > > Juergen Gross; Stefano Stabellini; Boris Ostrovsky
> > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > > > forced to closed
> > > > >
> > > > > On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
> > > > > > Only force state to closed in the case when the toolstack may
> > > > > > need to clean up. This can be detected by checking whether the
> > > > > > state in xenstore has been set to closing prior to device
> > > > > > removal.
> > > > >
> > > > > I'm not sure I see the point of this, I would expect that a
> > > > > failure to probe or the removal of the device would leave the
> > > > > xenbus state as closed, which is consistent with the actual
> > > > > driver state.
> > > > >
> > > > > Can you explain what's the benefit of leaving a device without a
> > > > > driver in such unknown state?
> > > > >
> > > >
> > > > If probe fails then I think it should leave the state alone. If the
> > > > state is moved to closed then basically you just killed that
> > > > connection to the guest (as the frontend will normally close down
> > > > when it sees this change) so, if the probe failure was due to a bug
> > > > in blkback or, e.g., a transient resource issue then it's game over
> > > > as far as that guest goes.
> > >
> > > But the connection can be restarted by switching the backend to the
> > > init state again.
> >
> > Too late. The frontend saw closed and you already lost.
> >
> > >
> > > > The ultimate goal here is PV backend re-load that is completely
> > > > transparent to the guest. Modifying anything in xenstore compromises
> > > > that so we need to be careful.
> > >
> > > That's a fine goal, but not switching to closed state in
> > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > without a matching backend and with the state not set to closed.
> > >
> >
> > Why is this a problem? With this series fully applied a (block) backend
> > can come and go without needing to change the state. Relying on guests
> > to DTRT is not a sustainable option for a cloud deployment.
> >
> > > Ie: that would be fine if you explicitly state this is some kind of
> > > internal blkback reload, but not for the general case where blkback
> > > has been unbound. I think we need someway to difference a blkback
> > > reload vs a unbound.
> > >
> >
> > Why do we need that though? Why is it advantageous for a backend to go
> > to closed. No PV backends cope with an unbind as-is, and a toolstack
> > initiated unplug will always set state to 5 anyway.
> > So TBH any state transition done directly in the xenbus code looks
> > wrong to me anyway (but appears to be a necessary evil to keep the
> > toolstack working in the event it spawns a backend where there is
> > actually no driver present, or it doesn't come online).
>
> IMO the normal flow for unbind would be to attempt to close open
> connections and then remove the driver: leaving frontends connected
> without any attached backends is not correct, and will just block the
> guest frontend until requests start timing out.
>
> I can see the reasoning for doing that for the purpose of updating a
> blkback module without guests noticing, but I would prefer that
> leaving connections open was an option that could be given when
> unbinding (or maybe a driver option in sysfs?), so that the default
> behaviour would be to try to close everything when unbinding if
> possible.

Well, unbind is pretty useless now IMO since bind doesn't work, and a
transition straight to closed is just plain wrong anyway. But we could
have a flag that the backend driver sets to say that it supports
transparent re-bind, and that flag would gate this code. Would that make
you feel more comfortable?

If you want unbind to actually do a proper unplug then that's extra work
and not really something I want to tackle (and re-bind would still need
to be toolstack initiated, as something would have to re-create the
xenstore area).

  Paul

>
> Thanks, Roger.
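
For illustration only, the flag idea from the reply above could be sketched roughly as below. This is not code from the patch series: `struct backend_driver_caps`, its `transparent_rebind` field and `should_force_closed()` are hypothetical names invented for this sketch. The enum values are assumed to mirror `enum xenbus_state` in Xen's public io/xenbus.h, where state 5 is Closing.

```c
#include <stdbool.h>

/* Assumed to mirror enum xenbus_state in Xen's public io/xenbus.h. */
enum xenbus_state {
	XenbusStateUnknown = 0,
	XenbusStateInitialising = 1,
	XenbusStateInitWait = 2,
	XenbusStateInitialised = 3,
	XenbusStateConnected = 4,
	XenbusStateClosing = 5,
	XenbusStateClosed = 6,
	XenbusStateReconfiguring = 7,
	XenbusStateReconfigured = 8
};

/* Hypothetical per-driver capability flag; not an existing kernel field. */
struct backend_driver_caps {
	bool transparent_rebind;
};

/*
 * Decide whether device removal (or a failed probe) should force the
 * state in xenstore to Closed.
 *
 * Drivers that do not claim transparent re-bind keep the old behaviour
 * and are always closed. Drivers that do claim it are only closed when
 * the state has already been set to Closing, i.e. when the toolstack
 * may need to clean up.
 */
bool should_force_closed(const struct backend_driver_caps *caps,
			 enum xenbus_state state_in_xenstore)
{
	if (!caps->transparent_rebind)
		return true;

	return state_in_xenstore == XenbusStateClosing;
}
```

With a split like this, the default for existing backends would stay "close on remove", and only a backend that opts in would leave a Connected frontend untouched across a reload.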