On Mon, Jan 06, 2020 at 05:03:40PM +0100, Marek Marczykowski-Górecki wrote: > On Mon, Jan 06, 2020 at 03:40:22PM +0000, Ian Jackson wrote: > > Adding Roger to the CC. > > > > Marek Marczykowski-Górecki writes ("Re: [PATCH] libxl: create backend/ xenstore dir for driver domains"): > > > On Mon, Jan 06, 2020 at 02:20:46PM +0000, Ian Jackson wrote: > > > > Marek Marczykowski-Górecki writes ("[PATCH] libxl: create backend/ xenstore dir for driver domains"): > > > > > Cleaning up backend xenstore entries is a responsibility of the backend. > > > > > When backend lives outside of dom0, the domain needs proper permissions > > > > > to do it. Normally it is given permission to remove the device dir > > > > > itself, but not the dir containing it (named after frontend ID). After a > > > > > whole those empty leftover directories accumulate to the point xenstore > > > > > returning E2BIG on listing them. > > > > > > > > > > Fix this by giving backend domain write access also to backend/ > > > > > directory itself when c_info->driver_domain option is set. The code > > > > > removing relevant dir is already there (just lacked permissions to do so). > > > > > > > > > > Note this also allows the backend domain to create new entries, > > > > > pretending to host backend devices it don't have. But since libxl uses > > > > > /libxl/ xenstore dir for this information (still outside of backend > > > > > domain control), this shouldn't be an issue. > > > > > > > > This seems quite hazardous to me. The reasoning you use to show that > > > > this iws OK seems fragile, and in general it doesn't feel right to > > > > give the particular backend such wide scope. > > > > > > > > Can we find another way to address this problem ? I think the > > > > containing directory should be removed by the toolstack. Why is this > > > > difficult ? (I presume there is a reason or you would have done it > > > > that way...) > > > > > > It was done this way previously and caused issues, see this commit: > > > > > > commit 546678c6a60f64fb186640460dfa69a837c8fba5 > > > Author: Roger Pau Monne > > > Date: Wed Sep 23 12:06:56 2015 +0200 > > > > > > libxl: fix the cleanup of the backend path when using driver domains > > > > Thanks. > > > > > With the current libxl implementation the control domain will > > > remove both the frontend and the backend xenstore paths of a > > > device that's handled by a driver domain. This is incorrect, > > > since the driver domain possibly needs to access the backend > > > path in order to perform the disconnection and cleanup of the > > > device. > > > > > > Fix this by making sure the control domain only cleans the > > > frontend path, leaving the backend path to be cleaned by the > > > driver domain. Note that if the device is not handled by a > > > driver domain the control domain will perform the removal of > > > both the frontend and the backend paths. > > > > Hmm. I see my Ack on that. Nevertheless maybe it is wrong. > > > > Looking at it afresh, I think maybe the right answer is: > > > > * If the driver domain is expected to be working properly, the > > toolstack should wait for the driver domain to complete the device > > shutdown, before removing the backend node. Indeed, the toolstack > > ought to wait for this before actually destroying the guest in Xen, > > by the usual logic for clean domain shutdown. > > I think that's not enough. .../state = 6 is set by the kernel, but > xl devd in the driver domain may want to cleanup things (hotplug scripts > etc). And indeed libxl__device_destroy() is called from > device_hotplug_done(), not device_backend_callback(). > > Alternatively, toolstack could wait for the actual backend node to be > removed (by the driver domain), and then cleanup the parent directory (if > empty). I don't find it particularly appealing, as every contact with > libxl async code reduce overall happiness... > > > * There needs to be a way to deal with a broken/unresponsive driver > > domain. That will involve not waiting for the backend so must > > involve simply deleting the backend from xenstore. > > It's already there: if driver domain fails to set .../state = 6 within > a timeout, toolstack will forcibly remove the entry. > > > Is the distinction here between "xl shutdown" and "xl destroy", on the > > actual guest domain, good enough ? Hopefully if the driver domain > > sees the backend directory simply vanish it can destructively tear > > everything down ? > > In the past this lead to multiple issues, where hotplug script didn't > know which device actually was removed. In some cases I needed to > workaround this by saving xenstore dump into a file in an "online" > hotplug script, but it is very ugly solution. Any opinion on the above? In the above context (plus the fact that the toolstack use /libxl to enumerate devices), I still think giving driver domain write access to the backend/ node is the right solution for this problem. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing?