On Tue, Apr 13, 2021 at 04:25:12PM +0100, Michael Brown wrote:
> The logic in connect() is currently written with the assumption that
> xenbus_watch_pathfmt() will return an error for a node that does not
> exist.  This assumption is incorrect: xenstore does allow a watch to
> be registered for a nonexistent node (and will send notifications
> should the node be subsequently created).
>
> As of commit 1f2565780 ("xen-netback: remove 'hotplug-status' once it
> has served its purpose"), this leads to a failure when a domU
> transitions into XenbusStateConnected more than once.  On the first
> domU transition into Connected state, the "hotplug-status" node will
> be deleted by the hotplug_status_changed() callback in dom0.  On the
> second or subsequent domU transition into Connected state, the
> hotplug_status_changed() callback will therefore never be invoked, and
> so the backend will remain stuck in InitWait.
>
> This failure prevents scenarios such as reloading the xen-netfront
> module within a domU, or booting a domU via iPXE.  There is
> unfortunately no way for the domU to work around this dom0 bug.
>
> Fix by explicitly checking for existence of the "hotplug-status" node,
> thereby creating the behaviour that was previously assumed to exist.

This change is wrong. The 'hotplug-status' node is created _only_ by a
hotplug script, and only when that script is executed. When the kernel
waits for the hotplug script to be executed, it waits for the node to
_appear_, not to _change_. So this change basically makes the kernel
not wait for the hotplug script at all.

Furthermore, there is an additional side effect: in the case of a
driver domain, 'xl devd' may be started after the backend node is
created (this may happen if you start the frontend domain in parallel
with the backend). In this case, 'xl devd' will see the vif backend
already in XenbusStateConnected state and will not execute the hotplug
script at all.
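
As a rough illustration of the argument above, here is a toy user-space
model in C. It is not kernel code: every name in it is hypothetical, and
it assumes that the backend postpones announcing Connected only while a
watch on "hotplug-status" is registered. Under that assumption, the
existence check lets the backend report Connected before the hotplug
script has run, whereas an unconditional watch waits for the node to
appear.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_backend {
	bool node_exists;	/* "hotplug-status" is present in the store */
	bool have_watch;	/* a watch on that node is registered */
	bool connected;		/* backend has announced Connected */
	bool script_ran;	/* the hotplug script has been executed */
};

/* Old behaviour: always register; watching a missing node is allowed. */
static void toy_connect_always_watch(struct toy_backend *be)
{
	be->have_watch = true;
	/* Connected is deferred until the watch fires (assumption). */
}

/* Patched behaviour: register the watch only if the node exists. */
static void toy_connect_watch_if_exists(struct toy_backend *be)
{
	if (be->node_exists) {
		be->have_watch = true;
	} else {
		be->have_watch = false;
		be->connected = true;	/* nothing left to wait for (assumption) */
	}
}

/* The hotplug script creates the node; a registered watch then fires. */
static void toy_run_hotplug_script(struct toy_backend *be)
{
	be->script_ran = true;
	be->node_exists = true;
	if (be->have_watch)
		be->connected = true;
}

/* Returns true if Connected was announced before the script had run. */
static bool toy_connected_before_script(void (*connect_fn)(struct toy_backend *))
{
	struct toy_backend be = { false, false, false, false };
	bool early;

	connect_fn(&be);
	early = be.connected && !be.script_ran;
	toy_run_hotplug_script(&be);
	return early;
}

/* Usage example: compare the two registration strategies. */
static void toy_self_check(void)
{
	assert(!toy_connected_before_script(toy_connect_always_watch));
	assert(toy_connected_before_script(toy_connect_watch_if_exists));
}
```

Calling toy_self_check() from a small main() exercises both variants.
The model deliberately leaves out the node deletion introduced by commit
1f2565780 and the 'xl devd' ordering issue.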

I think the proper fix is to re-register the watch when necessary,
instead of not registering it at all.

> Signed-off-by: Michael Brown
> ---
>  drivers/net/xen-netback/xenbus.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index a5439c130130..d24b7a7993aa 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -824,11 +824,15 @@ static void connect(struct backend_info *be)
>  	xenvif_carrier_on(be->vif);
>
>  	unregister_hotplug_status_watch(be);
> -	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, NULL,
> -				   hotplug_status_changed,
> -				   "%s/%s", dev->nodename, "hotplug-status");
> -	if (!err)
> +	if (xenbus_exists(XBT_NIL, dev->nodename, "hotplug-status")) {
> +		err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
> +					   NULL, hotplug_status_changed,
> +					   "%s/%s", dev->nodename,
> +					   "hotplug-status");
> +		if (err)
> +			goto err;
>  		be->have_hotplug_status_watch = 1;
> +	}
>
>  	netif_tx_wake_all_queues(be->vif->dev);
>

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab