Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)

From: Laine Stump @ 2020-02-03 22:19 UTC
  To: libvir-list; +Cc: qemu-devel

Although I've never experienced it, due to not running Windows guests, 
I've recently learned that a Windows guest permits a user (hopefully 
only one with local admin privileges??!) to "hot-unplug" any PCI device. 
I've also learned that some hypervisor admins don't want to permit 
admins of the virtual machines they're managing to unplug PCI devices. I 
believe this is impossible to prevent on an i440fx-based machinetype, 
and can only be done on a q35-based machinetype by assigning the devices 
to the root bus (so that they are seen as integrated devices) rather 
than to a pcie-root-port. But when libvirt is assigning PCI addresses to 
devices in a q35-based guest, it will *always* assign a PCIe device to a
pcie-root-port specifically so that hotplug is possible (this was done 
to maintain functional parity with i440fx guests, where all PCI slots 
support hotplug).

To make the above-mentioned admins happy, we need to make it possible to 
(easily) create guest configurations for q35-based virtual machines 
where the PCI devices can't be hot-unplugged by the guest OS.

Thinking in the context of a management platform (e.g. OpenStack or 
oVirt) that goes through libvirt to use QEMU (and forgetting about
i440fx, concentrating only on q35), I can think of a few different ways 
this could be done:

1) Rather than leaving the task of assigning the PCI addresses of
devices to libvirt (which is what essentially *all* management apps that
use libvirt currently do), the management application could itself
directly assign the PCI addresses of all devices to be slots on pcie.0.

This is problematic because once a management application has taken over 
the PCI address assignment of a single device, it must learn the rules 
of what type of device can be plugged into what type of PCI controller 
(including plugging in new controllers when necessary), and keep track 
of which slots on which PCI controllers are already in use - effectively 
tossing that part of libvirt's functionality / embedded knowledge / 
usefulness to management applications out the window. It's even more of 
a problem for management applications that have no provision for 
manually assigning PCI addresses - virt-manager for example only 
supports this by using "XML mode" where the froopy point-click UI is 
swapped out for an edit window where the user is simply presented with 
the full XML for a device and allowed to tweak it around as they see fit 
(including duplicate addresses, plugging the wrong kind of device into 
the wrong slot, referencing non-existent controllers, etc). (NB: you
could argue that management could just take over PCI address assignment
in the case of wanting hotplug disabled, and only care about / support
pcie.0. That makes the task much easier, since you just ignore the
existence of any other PCI controllers, leaving you with a homogeneous
array of 32 slots x 8 functions, but it becomes much more complicated if
you want to allow a mix of hotpluggable and non-hotpluggable devices,
and you *know* someone will.)
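
As a rough illustration of the "only care about pcie.0" shortcut in the
NB above, here is a minimal Python sketch of an allocator that treats
bus 0 as a flat array of slots. The class name, the address string
format, and the set of reserved slots are all assumptions made for the
example; the slots actually taken by built-in devices depend on the
machine type, and none of this is libvirt code.

```python
# Hypothetical sketch: a management app handing out pcie.0 addresses itself.
# RESERVED_SLOTS is an assumption for illustration only; the slots really
# occupied by built-in devices depend on the machine type and its config.
RESERVED_SLOTS = {0x00, 0x1F}


class RootBusAllocator:
    """Hands out function 0 of the next free slot on bus 0 (pcie.0)."""

    def __init__(self, reserved=RESERVED_SLOTS):
        self.used = set(reserved)  # copy, so the default set is not mutated

    def allocate(self):
        for slot in range(32):  # a PCI bus has 32 slots
            if slot not in self.used:
                self.used.add(slot)
                return f"0000:00:{slot:02x}.0"
        raise RuntimeError("no free slot left on pcie.0")


alloc = RootBusAllocator()
first = alloc.allocate()
second = alloc.allocate()
print(first, second)
```

Even this toy version already has to know which slots are off limits,
and it says nothing about multifunction devices or about mixing in
hotpluggable devices on other controllers, which is where the
bookkeeping described above starts to grow.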

2) libvirt could gain a knob "somewhere" in the domain XML to force a 
single device, or all devices, to be assigned to a PCI address on pcie.0 
rather than on a pcie-root-port. This could be thought of as a "hint" 
about device placement, as well as extra validation in the case that a 
PCI address has been manually assigned. So, for example, let's say a 
"hotplug='disable'" option is added somewhere at the top level of the 
domain (maybe "<hotplug enable='no'/>" inside <features> or something 
like that); when PCI addresses are assigned by libvirt, it would attempt 
to find a slot on a controller that didn't support hotplug. And/or a 
similar knob could be added to each device. In both cases, the setting 
would be used both when assigning PCI addresses and also to validate
user-provided PCI addresses to ensure that the desired criterion was met
(otherwise someone would manually select a PCI address on a controller
that supported hotplug, but then set "hotplug='disable'" and expect
hotplug to be magically disabled on the slot).

Some of you will remember that I proposed such a knob for libvirt a few 
years ago when we were first fleshing out support for QEMU's PCI Express 
controllers and the Q35 machinetype, and it was rejected as "libvirt 
dictating policy". Of course at that time there weren't actual users 
demanding the functionality, and now there are. Aside from that, all I 
can say is that it isn't libvirt dictating this policy, it's the user of 
libvirt, and libvirt is just following directions :-) (and that I really 
really dislike the idea of a forced handover of the entire task of 
assigning/managing device PCI addresses to management apps just because 
they decide they want to disable guest-initiated hotplug).
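
The validation half of option 2 could be sketched roughly as follows,
in Python for brevity. The dictionary layout, the function name
check_hotplug_hint, and the "hotplug" key are hypothetical, and the set
of controller models treated as non-hotpluggable is an illustrative
assumption rather than libvirt's actual table.

```python
# Controller models assumed NOT to allow hotplug of the devices plugged
# into them. Illustrative assumption only, not libvirt's own table.
NON_HOTPLUG_MODELS = {"pcie-root"}


def check_hotplug_hint(device, controllers):
    """Return an error string if a device that asked for hotplug to be
    disabled sits on a controller that supports hotplug, else None."""
    if device.get("hotplug", "enable") != "disable":
        return None  # no hint given, nothing to validate
    model = controllers.get(device["bus"])
    if model is None:
        return f"{device['name']}: bus {device['bus']} has no controller"
    if model not in NON_HOTPLUG_MODELS:
        return f"{device['name']}: bus {device['bus']} ({model}) supports hotplug"
    return None


# bus index -> controller model
controllers = {0: "pcie-root", 1: "pcie-root-port"}
ok = {"name": "net0", "bus": 0, "hotplug": "disable"}
bad = {"name": "disk0", "bus": 1, "hotplug": "disable"}

print(check_hotplug_hint(ok, controllers))
print(check_hotplug_hint(bad, controllers))
```

The same predicate could in principle drive address assignment as well:
when no address is given, only controllers that pass the check would be
candidates for the device.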

3) qemu could add a "hotpluggable=no" commandline option to all PCI 
devices (including vfio-pci) and then do whatever is necessary to make 
sure this is honored in the emulated hardware (is it possible to set 
this on a per-slot basis in a PCI controller? Or must it be done for an 
entire controller? I suppose it's not as much of an issue for 
pcie-root-port, as long as you're not using multiple functions). libvirt 
would then need to add this option to the XML for each device, and 
management applications would need to set it - it would essentially look 
the same to the management application, but it would be implemented 
differently - instead of libvirt using that flag to make a choice about 
which slot to assign, it would assign PCI addresses in the same manner 
as before, and use the libvirt XML flag to set a QEMU commandline flag 
for the device.

The upside of this is that we would be disabling hotplug by "disabling 
hotplug" rather than by "assigning the device to a slot that 
coincidentally doesn't support hotplug", making it all more orthogonal - 
everything else in a guest's config could remain exactly the same while 
enabling/disabling hotplug. (Another upside is that it could possibly be 
made to work for i440fx machine types, but we're not supposed to care 
about that any more, so I won't mention it :-)) The downside is that it 
requires a new feature in QEMU (whose difficulty/feasibility I have 0 
knowledge of), so there are 3 layers of work rather than 2.
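
To make the division of labour in option 3 concrete, here is a small
Python sketch of how a per-device flag might be passed straight through
to the QEMU command line without affecting address assignment. The
"hotpluggable=no" property is the one proposed above and is not an
existing QEMU option; the helper name device_arg and the example bus
and netdev names are likewise made up for illustration.

```python
def device_arg(driver, bus, addr, hotpluggable=True, **props):
    """Build the value of one QEMU -device option as a string.

    'hotpluggable' is the property proposed in this mail. It is NOT an
    existing QEMU option; its name and values are placeholders.
    """
    parts = [driver, f"bus={bus}", f"addr={addr:#x}"]
    parts += [f"{key}={value}" for key, value in props.items()]
    if not hotpluggable:
        parts.append("hotpluggable=no")
    return ",".join(parts)


# Same device, same bus, same address; only the flag differs.
plain = device_arg("virtio-net-pci", "pci.1", 0, netdev="hostnet0")
locked = device_arg("virtio-net-pci", "pci.1", 0,
                    hotpluggable=False, netdev="hostnet0")
print(plain)
print(locked)
```

The point of the sketch is the orthogonality mentioned above: the bus
and address arguments are identical in both calls, and only the extra
property changes.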

So does anyone have any different (and hopefully better) idea of how to 
do this? Arguments for/against the 3 possibilities I've listed here?
