On Tue, May 18, 2021 at 07:57:16AM +0100, Paul Durrant wrote:
> On 17/05/2021 22:43, Marek Marczykowski-Górecki wrote:
> > On Tue, May 11, 2021 at 12:46:38PM +0000, Durrant, Paul wrote:
> > > I really can't remember any detail. Perhaps try reverting both
> > > patches then and check that the unbind/rmmod/modprobe/bind sequence
> > > still works (and the backend actually makes it into connected state).
> >
> > Ok, I've tried this. I've reverted both commits, then used your test
> > script from the 9476654bd5e8ad42abe8ee9f9e90069ff8e60c17:
> >
> >     This has been tested by running iperf as a server in the test VM and
> >     then running a client against it in a continuous loop, whilst also
> >     running:
> >
> >     while true;
> >       do echo vif-$DOMID-$VIF >unbind;
> >       echo down;
> >       rmmod xen-netback;
> >       echo unloaded;
> >       modprobe xen-netback;
> >       cd $(pwd);
> >       brctl addif xenbr0 vif$DOMID.$VIF;
> >       ip link set vif$DOMID.$VIF up;
> >       echo up;
> >       sleep 5;
> >       done
> >
> >     in dom0 from /sys/bus/xen-backend/drivers/vif to continuously unbind,
> >     unload, re-load, re-bind and re-plumb the backend.
> >
> > In fact, the need to call `brctl` and `ip link` manually is exactly
> > because the hotplug script isn't executed. When I execute it manually,
> > the backend properly gets back to working. So, removing 'hotplug-status'
> > was in the correct place (netback_remove). The missing part is the toolstack
> > calling the hotplug script on xen-netback re-bind.
> >
>
> Why is that missing? We're going behind the back of the toolstack to do the
> unbind and bind so why should we expect it to re-execute a hotplug script?

Ok, then simply execute the whole hotplug script (instead of its subset)
after re-loading the backend module and everything will be fine.

For example like this:

    XENBUS_PATH=backend/vif/$DOMID/$VIF \
    XENBUS_TYPE=vif \
    XENBUS_BASE_PATH=backend \
    script=/etc/xen/scripts/vif-bridge \
    vif=vif.$DOMID.$VIF \
    /etc/xen/scripts/vif-bridge online

(...)
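The hotplug invocation above could be wrapped in a small function so the test loop calls it instead of `brctl` and `ip link`. This is only a sketch under stated assumptions: the function name `run_vif_hotplug` and the `HOTPLUG_SCRIPT` override are hypothetical, and the interface name is passed in explicitly because the thread spells it both as vif$DOMID.$VIF (test loop) and vif.$DOMID.$VIF (example above).

```shell
#!/bin/sh
# Sketch: run the full vif hotplug script with the environment it expects.
# HOTPLUG_SCRIPT is a hypothetical override; the default path is the one
# quoted in the thread.
HOTPLUG_SCRIPT=${HOTPLUG_SCRIPT:-/etc/xen/scripts/vif-bridge}

# run_vif_hotplug DOMID VIF IFNAME ACTION
#   DOMID, VIF  domain id and device id of the backend
#   IFNAME      network interface name handed to the script as $vif
#   ACTION      argument passed through to the script, e.g. "online"
run_vif_hotplug() {
    domid=$1
    devid=$2
    ifname=$3
    action=$4

    # The assignments below apply only to the environment of the
    # hotplug script invocation on the last line.
    XENBUS_PATH="backend/vif/$domid/$devid" \
    XENBUS_TYPE=vif \
    XENBUS_BASE_PATH=backend \
    script="$HOTPLUG_SCRIPT" \
    vif="$ifname" \
    "$HOTPLUG_SCRIPT" "$action"
}
```

In the test loop this would replace the two manual re-plumbing commands after `modprobe xen-netback`, for instance as `run_vif_hotplug "$DOMID" "$VIF" "vif$DOMID.$VIF" online`.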
> > In short: if device gets XenbusStateInitWait for the first time (ddev ==
> > NULL case), it goes to add_device() which executes the hotplug script
> > and stores the device.
> > Then, if device goes to XenbusStateClosed + online==0 state, then it
> > executes hotplug script again (with "offline" parameter) and forgets the
> > device. If you unbind the driver, the device stays in
> > XenbusStateConnected state (in xenstore), and after you bind it again,
> > it goes to XenbusStateInitWait. I don't think it goes through
> > XenbusStateClosed, and online stays at 1 too, so libxl doesn't execute
> > the hotplug script again.
>
> This is pretty key. The frontend should not notice an unbind/bind i.e. there
> should be no evidence of it happening by examining states in xenstore (from
> the guest side).

If you update the backend module, I think the frontend needs at least to
re-evaluate feature-* nodes. In case of applying just a bug fix, they
should not change (in theory), but technically that would be the correct
thing to do.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab