On Mon, May 10, 2021 at 07:47:01PM +0100, Michael Brown wrote:
> On 10/05/2021 19:32, Marek Marczykowski-Górecki wrote:
> > On Tue, Apr 13, 2021 at 04:25:12PM +0100, Michael Brown wrote:
> > > The logic in connect() is currently written with the assumption that
> > > xenbus_watch_pathfmt() will return an error for a node that does not
> > > exist.  This assumption is incorrect: xenstore does allow a watch to
> > > be registered for a nonexistent node (and will send notifications
> > > should the node be subsequently created).
> > >
> > > As of commit 1f2565780 ("xen-netback: remove 'hotplug-status' once it
> > > has served its purpose"), this leads to a failure when a domU
> > > transitions into XenbusStateConnected more than once.  On the first
> > > domU transition into Connected state, the "hotplug-status" node will
> > > be deleted by the hotplug_status_changed() callback in dom0.  On the
> > > second or subsequent domU transition into Connected state, the
> > > hotplug_status_changed() callback will therefore never be invoked, and
> > > so the backend will remain stuck in InitWait.
> > >
> > > This failure prevents scenarios such as reloading the xen-netfront
> > > module within a domU, or booting a domU via iPXE.  There is
> > > unfortunately no way for the domU to work around this dom0 bug.
> > >
> > > Fix by explicitly checking for existence of the "hotplug-status" node,
> > > thereby creating the behaviour that was previously assumed to exist.
> >
> > This change is wrong. The 'hotplug-status' node is created _only_ by a
> > hotplug script and done so when it's executed. When kernel waits for
> > hotplug script to be executed it waits for the node to _appear_, not
> > _change_. So, this change basically made the kernel not waiting for the
> > hotplug script at all.
>
> That doesn't sound plausible to me.  In the setup as you describe, how is
> the kernel expected to differentiate between "hotplug script has not yet
> created the node" and "hotplug script does not exist and will therefore
> never create any node"?

Is the latter valid at all? From what I can see in the toolstack, it
always sets some hotplug script (if not specified explicitly - then
"vif-bridge"),

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab