* Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
@ 2020-02-03 22:19 Laine Stump
2020-02-04 10:24 ` Michael S. Tsirkin
2020-02-04 18:43 ` Daniel P. Berrangé
0 siblings, 2 replies; 8+ messages in thread
From: Laine Stump @ 2020-02-03 22:19 UTC (permalink / raw)
To: libvir-list; +Cc: qemu-devel
Although I've never experienced it, due to not running Windows guests,
I've recently learned that a Windows guest permits a user (hopefully
only one with local admin privileges??!) to "hot-unplug" any PCI device.
I've also learned that some hypervisor admins don't want to permit
admins of the virtual machines they're managing to unplug PCI devices. I
believe this is impossible to prevent on an i440fx-based machinetype,
and can only be done on a q35-based machinetype by assigning the devices
to the root bus (so that they are seen as integrated devices) rather
than to a pcie-root-port. But when libvirt is assigning PCI addresses to
devices in a q35-base guest, it will *always* assign a PCIe device to a
pcie-root-port specifically so that hotplug is possible (this was done
to maintain functional parity with i440fx guests, where all PCI slots
support hotplug).
To make the above-mentioned admins happy, we need to make it possible to
(easily) create guest configurations for q35-based virtual machines
where the PCI devices can't be hot-unplugged by the guest OS.
Thinking in the context of a management platform (e.g. OpenStack or
ovirt) that goes through libvirt to use QEMU (and forgetting about
i440fx, concentrating only on q35), I can think of a few different ways
this could be done:
1) Rather than leaving the task of assignung the PCI addresses of
devices to libvirt (which is what essentially *all* management apps that
use libvirt currently do), the management application could itself
directly assign the PCI addressed of all devices to be slots on pcie.0.
This is problematic because once a management application has taken over
the PCI address assignment of a single device, it must learn the rules
of what type of device can be plugged into what type of PCI controller
(including plugging in new controllers when necessary), and keep track
of which slots on which PCI controllers are already in use - effectively
tossing that part of libvirt's functionality / embedded knowledge /
usefulness to management applications out the window. It's even more of
a problem for management applications that have no provision for
manually assigning PCI addresses - virt-manager for example only
supports this by using "XML mode" where the froopy point-click UI is
swapped out for an edit window where the user is simply presented with
the full XML for a device and allowed to tweak it around as they see fit
(including duplicate addresses, plugging the wrong kind of device into
the wrong slot, referencing non-existent controllers, etc). (NB: you
could argue that management could just take over PCI address assignment
in the case of wanting hotplug disabled, and only care about / support
pcie.0 (which makes the task much easier, since you just ignore the
existence of any other PCI controllers, leaving you with a homogenous
array of 32 slot x 8 functions, but becomes much more complicated if you
want to allow a mix of hotpluggable and non-hotpluggable devices, and
you *know* someone will)
2) libvirt could gain a knob "somewhere" in the domain XML to force a
single device, or all devices, to be assigned to a PCI address on pcie.0
rather than on a pcie-root-port. This could be thought of as a "hint"
about device placement, as well as extra validation in the case that a
PCI address has been manually assigned. So, for example, let's say a
"hotplug='disable'" option is added somewhere at the top level of the
domain (maybe "<hotplug enable='no'/>" inside <features> or something
like that); when PCI addresses are assigned by libvirt, it would attempt
to find a slot on a controller that didn't support hotplug. And/or a
similar knob could be added to each device. In both cases, the setting
would be used both when assigning PCI addresses and also to validate
user-provided PCI addresses to assure that the desired criterion was met
(otherwise someone would manually select a PCI address on a controller
that supported hotplug, but then set "hotplug='disabled'" and expect
hotplug to be magically disabled on the slot).
Some of you will remember that I proposed such a knob for libvirt a few
years ago when we were first fleshing out support for QEMU's PCI Express
controllers and the Q35 machinetype, and it was rejected as "libvirt
dictating policy". Of course at that time there weren't actual users
demanding the functionality, and now there are. Aside from that, all I
can say is that it isn't libvirt dictating this policy, it's the user of
libvirt, and libvirt is just following directions :-) (and that I really
really dislike the idea of a forced handover of the entire task of
assigning/managing device PCI addresses to management apps just because
they decide they want to disable guest-initiated hotplug
3) qemu could add a "hotpluggable=no" commandline option to all PCI
devices (including vfio-pci) and then do whatever is necessary to make
sure this is honored in the emulated hardware (is it possible to set
this on a per-slot basis in a PCI controller? Or must it be done for an
entire controller? I suppose it's not as much of an issue for
pcie-root-port, as long as you're not using multiple functions). libvirt
would then need to add this option to the XML for each device, and
management applications would need to set it - it would essentially look
the same to the management application, but it would be implemented
differently - instead of libvirt using that flag to make a choice about
which slot to assign, it would assign PCI addresses in the same manner
as before, and use the libvirt XML flag to set a QEMU commandline flag
for the device.
The upside of this is that we would be disabling hotplug by "disabling
hotplug" rather than by "assigning the device to a slot that
coincidentally doesn't support hotplug", making it all more orthogonal -
everything else in a guest's config could remain exactly the same while
enabling/disabling hotplug. (Another upside is that it could possibly be
made to work for i440fx machine types, but we're not supposed to care
about that any more, so I won't mention it :-)) The downside is that it
requires a new feature in QEMU (whose difficulty/feasibility I have 0
knowledge of), so there are 3 layers of work rather than 2.
So does anyone have any different (and hopefully better) idea of how to
do this? Arguments for/against the 3 possibilities I've listed here?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-03 22:19 Disabling PCI "hot-unplug" for a guest (and/or a single PCI device) Laine Stump
@ 2020-02-04 10:24 ` Michael S. Tsirkin
2020-02-04 16:13 ` Julia Suvorova
2020-02-04 18:43 ` Daniel P. Berrangé
1 sibling, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2020-02-04 10:24 UTC (permalink / raw)
To: Laine Stump; +Cc: libvir-list, qemu-devel
On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
> 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
> (including vfio-pci) and then do whatever is necessary to make sure this is
> honored in the emulated hardware (is it possible to set this on a per-slot
> basis in a PCI controller? Or must it be done for an entire controller?
I think it's possible on a per-slot basis, yes.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-04 10:24 ` Michael S. Tsirkin
@ 2020-02-04 16:13 ` Julia Suvorova
2020-02-04 16:35 ` Michael S. Tsirkin
0 siblings, 1 reply; 8+ messages in thread
From: Julia Suvorova @ 2020-02-04 16:13 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: libvir-list, qemu-devel, Laine Stump
On Tue, Feb 4, 2020 at 11:26 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
> > 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
> > (including vfio-pci) and then do whatever is necessary to make sure this is
> > honored in the emulated hardware (is it possible to set this on a per-slot
> > basis in a PCI controller? Or must it be done for an entire controller?
>
> I think it's possible on a per-slot basis, yes.
There's a "Hot-Plug Capable" option in Slot Capability register, so we
can switch it off. But it's only for pcie devices, can't say anything
about conventional pci.
Best regards, Julia Suvorova.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-04 16:13 ` Julia Suvorova
@ 2020-02-04 16:35 ` Michael S. Tsirkin
2020-02-05 11:36 ` Daniel P. Berrangé
0 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2020-02-04 16:35 UTC (permalink / raw)
To: Julia Suvorova; +Cc: libvir-list, qemu-devel, Laine Stump
On Tue, Feb 04, 2020 at 05:13:54PM +0100, Julia Suvorova wrote:
> On Tue, Feb 4, 2020 at 11:26 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
> > > 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
> > > (including vfio-pci) and then do whatever is necessary to make sure this is
> > > honored in the emulated hardware (is it possible to set this on a per-slot
> > > basis in a PCI controller? Or must it be done for an entire controller?
> >
> > I think it's possible on a per-slot basis, yes.
>
> There's a "Hot-Plug Capable" option in Slot Capability register, so we
> can switch it off. But it's only for pcie devices, can't say anything
> about conventional pci.
>
> Best regards, Julia Suvorova.
For conventional PCI, we can drop SHPC capability and remove
the eject method from ACPI.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-03 22:19 Disabling PCI "hot-unplug" for a guest (and/or a single PCI device) Laine Stump
2020-02-04 10:24 ` Michael S. Tsirkin
@ 2020-02-04 18:43 ` Daniel P. Berrangé
2020-02-05 16:29 ` Laine Stump
1 sibling, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2020-02-04 18:43 UTC (permalink / raw)
To: Laine Stump; +Cc: libvir-list, qemu-devel
On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
> Although I've never experienced it, due to not running Windows guests, I've
> recently learned that a Windows guest permits a user (hopefully only one
> with local admin privileges??!) to "hot-unplug" any PCI device. I've also
> learned that some hypervisor admins don't want to permit admins of the
> virtual machines they're managing to unplug PCI devices. I believe this is
> impossible to prevent on an i440fx-based machinetype, and can only be done
> on a q35-based machinetype by assigning the devices to the root bus (so that
> they are seen as integrated devices) rather than to a pcie-root-port. But
> when libvirt is assigning PCI addresses to devices in a q35-base guest, it
> will *always* assign a PCIe device to a pcie-root-port specifically so that
> hotplug is possible (this was done to maintain functional parity with i440fx
> guests, where all PCI slots support hotplug).
After speaking with Alex & Laine on IRC, I learnt some further relevant
points
- In a "typical" physical machine PCI slots will not be marked
as hotpluggable - /sys/bus/pci/slots only has 2 entries on
my HP DL180, corresponding to unused physical PCI slots
- QEMU is hardcoded to report all pci & pcie-root-ports as hotpluggable.
So /sys/bus/pci/slots on i440fx has 31 entries, one for every
device, while on q35 it has one entry for every pcie-root-port
IIUC.
- It is conceptually possible to enhance pcie-root-port device
to allow its hotplug capability to be toggled. Alternatively
a parallel pcie-root-port-nohotplug device could be created.
The end result would be the same from guest POV
- The vfio-pci device has a companion vfio-pci-nohotplug
device. The difference is simply whether the QEMU DEviceClass
has the "hotpluggable" attribute set, and is separate from
whether the PCI(e) root port has hotplug enabled
The last point here about vfio-pci si particularly important,
as it shows libvirt needs to be capable of tracking hotpluggability
independantly on the PCI port and the PCI device attached to the
port.
> 1) Rather than leaving the task of assignung the PCI addresses of devices to
> libvirt (which is what essentially *all* management apps that use libvirt
> currently do), the management application could itself directly assign the
> PCI addressed of all devices to be slots on pcie.0.
[snip]
This is essentially a hack to work around the fact the the pcie-root-port
is hardcoded to report itself as hotpluggable.
As such I don't consider this is serious long term solution. If you
absolutely cannot wait for a newer libvirt/QEMU, this solution could
be used as a quick hack for mgmt apps, but long term we need todo
better.
> 2) libvirt could gain a knob "somewhere" in the domain XML to force a single
> device, or all devices, to be assigned to a PCI address on pcie.0 rather
> than on a pcie-root-port. This could be thought of as a "hint" about device
> placement, as well as extra validation in the case that a PCI address has
> been manually assigned. So, for example, let's say a "hotplug='disable'"
> option is added somewhere at the top level of the domain (maybe "<hotplug
> enable='no'/>" inside <features> or something like that); when PCI addresses
> are assigned by libvirt, it would attempt to find a slot on a controller
> that didn't support hotplug. And/or a similar knob could be added to each
> device. In both cases, the setting would be used both when assigning PCI
> addresses and also to validate user-provided PCI addresses to assure that
> the desired criterion was met (otherwise someone would manually select a PCI
> address on a controller that supported hotplug, but then set
> "hotplug='disabled'" and expect hotplug to be magically disabled on the
> slot).
Essentially this is the using "hotpluggable=yes|no" on the device as
a policy knob to control device placement, to again workaround the
fact that pcie-root-port are hardcoded to always report themselves
as hotpluggable.
So I'd think this should be ruled out for the same reason as
option 1.
It has a second downside though. As we see from vfio-pci-nohotplug,
there is a valid use case for a "hotpluggable=yes|no" attribute on
a device for controlling a specific hardware config choice in QEMU.
We don't want to overload this attribute to both control use of
vfio-pci-nohotplug, and also be a policy knob for device placement.
> 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
> (including vfio-pci) and then do whatever is necessary to make sure this is
> honored in the emulated hardware (is it possible to set this on a per-slot
> basis in a PCI controller? Or must it be done for an entire controller? I
> suppose it's not as much of an issue for pcie-root-port, as long as you're
> not using multiple functions). libvirt would then need to add this option to
> the XML for each device, and management applications would need to set it -
> it would essentially look the same to the management application, but it
> would be implemented differently - instead of libvirt using that flag to
> make a choice about which slot to assign, it would assign PCI addresses in
> the same manner as before, and use the libvirt XML flag to set a QEMU
> commandline flag for the device.
I think this, or something close to it is the desirable way forward
here, as it gives us more explicit control over what the emulated
hardware actually advertizes. So instead of trying to workaround
limitations of QEMU, we'd be working /with/ QEMU to improve its
feature offerings.
In QEMU pcie-root-port either needs to gain a hotpluggable=yes|no
attribute, or a second pcie-root-port-nohotplug needs adding.
Withever of these two approaches are taken in QEMU, this can be
controlled from libvirt via an attribute on the controller.eg
<controller type='pci' model='pcie-root-port' hotpluggable="no|yes"/>
This hotpluggable attribute can be mapped to whichever CLI syntax
QEMU wants to support.
This alone is probably sufficient for the Windows problem motivating
this thread.
There already exists the vfio-pci-nohotplug device, but this is not
exposed by libvirt. So we can add an attribute to <hostdev> to
control its use.
The remaining question is whether there's any compelling reason to
add non-hotpluggable variants of other devices, virtio-net-pci-nohotplug ?
I'd probably /not/ do this, unless there's a clear compelling benefit
it gives which can't be achieved already via the pcie-root-port
hotpluggability controls.
For management applications, with Q35 we already recommend that they
explicitly add *many*
<controller type='pci' model='pcie-root-port'/>
to new guests. Enough to cover all the initial cold-plugged devices,
and enough spare ports to enable future hotplug of extra devices.
OpenStack for example will add 32 pcie-root-ports, so that Q35 has
approximately the same hotplug capacity as i440fx would have offered.
To control hotplug, management apps simply need tweak what they're
doing with pcie-root-ports with an extra attribute
eg, consider there were 4 devices on the initially booted VM which
need hotplug disabled, and we still want freedom to hotplug 2
extra devices at runtime.
<controller type='pci' model='pcie-root-port' hotplug="no"/>
<controller type='pci' model='pcie-root-port' hotplug="no"/>
<controller type='pci' model='pcie-root-port' hotplug="no"/>
<controller type='pci' model='pcie-root-port' hotplug="no"/>
<controller type='pci' model='pcie-root-port' hotplug="yes"/>
<controller type='pci' model='pcie-root-port' hotplug="yes"/>
This is quite easy, as applications still do *not* have to taken on
responsibility for full PCI device addressing. They merely need to
be able to count how many PCI devices they're using. The only "gotcha"
is if they forget about the auto-added USB, VGA and Balloon devices,
but that's not a big deal.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-04 16:35 ` Michael S. Tsirkin
@ 2020-02-05 11:36 ` Daniel P. Berrangé
2020-02-05 13:10 ` Michael S. Tsirkin
0 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrangé @ 2020-02-05 11:36 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: libvir-list, Julia Suvorova, qemu-devel, Laine Stump
On Tue, Feb 04, 2020 at 11:35:37AM -0500, Michael S. Tsirkin wrote:
> On Tue, Feb 04, 2020 at 05:13:54PM +0100, Julia Suvorova wrote:
> > On Tue, Feb 4, 2020 at 11:26 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
> > > > 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
> > > > (including vfio-pci) and then do whatever is necessary to make sure this is
> > > > honored in the emulated hardware (is it possible to set this on a per-slot
> > > > basis in a PCI controller? Or must it be done for an entire controller?
> > >
> > > I think it's possible on a per-slot basis, yes.
> >
> > There's a "Hot-Plug Capable" option in Slot Capability register, so we
> > can switch it off. But it's only for pcie devices, can't say anything
> > about conventional pci.
> >
> > Best regards, Julia Suvorova.
>
> For conventional PCI, we can drop SHPC capability and remove
> the eject method from ACPI.
Before considering this, is there any compelling reason to care about
this for PCI ? Currently with i440fx there's no direct representation
of the 32 slots as objects in either QEMU or libvirt. So extending this
to allow disabling hotplug for i440fx PCI slots is going to need much
more config work for QEMU, libvirt and mgmt apps. Personally I'd only
do this for PCIe until there's a clear requirement given for legacy PCI
support too.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-05 11:36 ` Daniel P. Berrangé
@ 2020-02-05 13:10 ` Michael S. Tsirkin
0 siblings, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2020-02-05 13:10 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: libvir-list, Julia Suvorova, qemu-devel, Laine Stump
On Wed, Feb 05, 2020 at 11:36:37AM +0000, Daniel P. Berrangé wrote:
> On Tue, Feb 04, 2020 at 11:35:37AM -0500, Michael S. Tsirkin wrote:
> > On Tue, Feb 04, 2020 at 05:13:54PM +0100, Julia Suvorova wrote:
> > > On Tue, Feb 4, 2020 at 11:26 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
> > > > > 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
> > > > > (including vfio-pci) and then do whatever is necessary to make sure this is
> > > > > honored in the emulated hardware (is it possible to set this on a per-slot
> > > > > basis in a PCI controller? Or must it be done for an entire controller?
> > > >
> > > > I think it's possible on a per-slot basis, yes.
> > >
> > > There's a "Hot-Plug Capable" option in Slot Capability register, so we
> > > can switch it off. But it's only for pcie devices, can't say anything
> > > about conventional pci.
> > >
> > > Best regards, Julia Suvorova.
> >
> > For conventional PCI, we can drop SHPC capability and remove
> > the eject method from ACPI.
>
> Before considering this, is there any compelling reason to care about
> this for PCI ?
Not that I know. I simply answered Julia's question.
> Currently with i440fx there's no direct representation
> of the 32 slots as objects in either QEMU or libvirt. So extending this
> to allow disabling hotplug for i440fx PCI slots is going to need much
> more config work for QEMU, libvirt and mgmt apps. Personally I'd only
> do this for PCIe until there's a clear requirement given for legacy PCI
> support too.
Makes sense to me.
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Disabling PCI "hot-unplug" for a guest (and/or a single PCI device)
2020-02-04 18:43 ` Daniel P. Berrangé
@ 2020-02-05 16:29 ` Laine Stump
0 siblings, 0 replies; 8+ messages in thread
From: Laine Stump @ 2020-02-05 16:29 UTC (permalink / raw)
To: Daniel P. Berrangé; +Cc: libvir-list, qemu-devel
On 2/4/20 1:43 PM, Daniel P. Berrangé wrote:
> On Mon, Feb 03, 2020 at 05:19:51PM -0500, Laine Stump wrote:
>> Although I've never experienced it, due to not running Windows guests, I've
>> recently learned that a Windows guest permits a user (hopefully only one
>> with local admin privileges??!) to "hot-unplug" any PCI device. I've also
>> learned that some hypervisor admins don't want to permit admins of the
>> virtual machines they're managing to unplug PCI devices. I believe this is
>> impossible to prevent on an i440fx-based machinetype, and can only be done
>> on a q35-based machinetype by assigning the devices to the root bus (so that
>> they are seen as integrated devices) rather than to a pcie-root-port. But
>> when libvirt is assigning PCI addresses to devices in a q35-base guest, it
>> will *always* assign a PCIe device to a pcie-root-port specifically so that
>> hotplug is possible (this was done to maintain functional parity with i440fx
>> guests, where all PCI slots support hotplug).
>
> After speaking with Alex & Laine on IRC, I learnt some further relevant
> points
>
> - In a "typical" physical machine PCI slots will not be marked
> as hotpluggable - /sys/bus/pci/slots only has 2 entries on
> my HP DL180, corresponding to unused physical PCI slots
>
> - QEMU is hardcoded to report all pci & pcie-root-ports as hotpluggable.
> So /sys/bus/pci/slots on i440fx has 31 entries, one for every
> device, while on q35 it has one entry for every pcie-root-port
> IIUC.
>
> - It is conceptually possible to enhance pcie-root-port device
> to allow its hotplug capability to be toggled. Alternatively
> a parallel pcie-root-port-nohotplug device could be created.
> The end result would be the same from guest POV
>
> - The vfio-pci device has a companion vfio-pci-nohotplug
> device. The difference is simply whether the QEMU DEviceClass
> has the "hotpluggable" attribute set, and is separate from
> whether the PCI(e) root port has hotplug enabled
>
>
> The last point here about vfio-pci si particularly important,
> as it shows libvirt needs to be capable of tracking hotpluggability
> independantly on the PCI port and the PCI device attached to the
> port.
>
>
>> 1) Rather than leaving the task of assignung the PCI addresses of devices to
>> libvirt (which is what essentially *all* management apps that use libvirt
>> currently do), the management application could itself directly assign the
>> PCI addressed of all devices to be slots on pcie.0.
>
> [snip]
>
> This is essentially a hack to work around the fact the the pcie-root-port
> is hardcoded to report itself as hotpluggable.
>
> As such I don't consider this is serious long term solution. If you
> absolutely cannot wait for a newer libvirt/QEMU, this solution could
> be used as a quick hack for mgmt apps, but long term we need todo
> better.
>
>> 2) libvirt could gain a knob "somewhere" in the domain XML to force a single
>> device, or all devices, to be assigned to a PCI address on pcie.0 rather
>> than on a pcie-root-port. This could be thought of as a "hint" about device
>> placement, as well as extra validation in the case that a PCI address has
>> been manually assigned. So, for example, let's say a "hotplug='disable'"
>> option is added somewhere at the top level of the domain (maybe "<hotplug
>> enable='no'/>" inside <features> or something like that); when PCI addresses
>> are assigned by libvirt, it would attempt to find a slot on a controller
>> that didn't support hotplug. And/or a similar knob could be added to each
>> device. In both cases, the setting would be used both when assigning PCI
>> addresses and also to validate user-provided PCI addresses to assure that
>> the desired criterion was met (otherwise someone would manually select a PCI
>> address on a controller that supported hotplug, but then set
>> "hotplug='disabled'" and expect hotplug to be magically disabled on the
>> slot).
>
> Essentially this is the using "hotpluggable=yes|no" on the device as
> a policy knob to control device placement, to again workaround the
> fact that pcie-root-port are hardcoded to always report themselves
> as hotpluggable.
>
> So I'd think this should be ruled out for the same reason as
> option 1.
>
> It has a second downside though. As we see from vfio-pci-nohotplug,
> there is a valid use case for a "hotpluggable=yes|no" attribute on
> a device for controlling a specific hardware config choice in QEMU.
>
> We don't want to overload this attribute to both control use of
> vfio-pci-nohotplug, and also be a policy knob for device placement.
>
>
>> 3) qemu could add a "hotpluggable=no" commandline option to all PCI devices
>> (including vfio-pci) and then do whatever is necessary to make sure this is
>> honored in the emulated hardware (is it possible to set this on a per-slot
>> basis in a PCI controller? Or must it be done for an entire controller? I
>> suppose it's not as much of an issue for pcie-root-port, as long as you're
>> not using multiple functions). libvirt would then need to add this option to
>> the XML for each device, and management applications would need to set it -
>> it would essentially look the same to the management application, but it
>> would be implemented differently - instead of libvirt using that flag to
>> make a choice about which slot to assign, it would assign PCI addresses in
>> the same manner as before, and use the libvirt XML flag to set a QEMU
>> commandline flag for the device.
>
> I think this, or something close to it is the desirable way forward
> here, as it gives us more explicit control over what the emulated
> hardware actually advertizes. So instead of trying to workaround
> limitations of QEMU, we'd be working /with/ QEMU to improve its
> feature offerings.
>
> In QEMU pcie-root-port either needs to gain a hotpluggable=yes|no
> attribute, or a second pcie-root-port-nohotplug needs adding.
>
> Withever of these two approaches are taken in QEMU, this can be
> controlled from libvirt via an attribute on the controller.eg
>
> <controller type='pci' model='pcie-root-port' hotpluggable="no|yes"/>
>
> This hotpluggable attribute can be mapped to whichever CLI syntax
> QEMU wants to support.
>
> This alone is probably sufficient for the Windows problem motivating
> this thread.
>
> There already exists the vfio-pci-nohotplug device, but this is not
> exposed by libvirt. So we can add an attribute to <hostdev> to
> control its use.
>
> The remaining question is whether there's any compelling reason to
> add non-hotpluggable variants of other devices, virtio-net-pci-nohotplug ?
>
> I'd probably /not/ do this, unless there's a clear compelling benefit
> it gives which can't be achieved already via the pcie-root-port
> hotpluggability controls.
>
>
> For management applications, with Q35 we already recommend that they
> explicitly add *many*
>
> <controller type='pci' model='pcie-root-port'/>
>
> to new guests. Enough to cover all the initial cold-plugged devices,
> and enough spare ports to enable future hotplug of extra devices.
> OpenStack for example will add 32 pcie-root-ports, so that Q35 has
> approximately the same hotplug capacity as i440fx would have offered.
>
> To control hotplug, management apps simply need tweak what they're
> doing with pcie-root-ports with an extra attribute
>
> eg, consider there were 4 devices on the initially booted VM which
> need hotplug disabled, and we still want freedom to hotplug 2
> extra devices at runtime.
>
> <controller type='pci' model='pcie-root-port' hotplug="no"/>
> <controller type='pci' model='pcie-root-port' hotplug="no"/>
> <controller type='pci' model='pcie-root-port' hotplug="no"/>
> <controller type='pci' model='pcie-root-port' hotplug="no"/>
> <controller type='pci' model='pcie-root-port' hotplug="yes"/>
> <controller type='pci' model='pcie-root-port' hotplug="yes"/>
>
> This is quite easy,
Not quite so easy as you might think :-). The problem is that, in our
fervor to make Q35 guests "as similar as possible" to 440fx guests, we
have made the PCI address assignment code search out a hotplug-capable
slot for each unassigned device, and if no hotpluggable slot is
available it automatically adds a new root port so that there is a
hotplug-capable slot. With the current set of VIR_PCI_CONNECT_TYPE
flags, unless we lie to that code and tell it that these new root ports
support hotplug, the the devices will be assigned first to the two
available hotpluggable root ports (in your example above), and then when
those are both used, it will start adding new root ports - the ports
with hotplug='no' will never be used.
On the other hand, if we change the code to *not* require a
hotplug-capable port, then the new devices will just be assigned to
slots on the root bus.
So the address assignment code is going to need to be re-jiggered. I
guess we would need to create yet another PCI controller capability for
use by the address assignment code - "Not
Hot-Pluggable-But-Still-Okay-For-Auto-Assign-When-The-Guest-Is-Powered-Off".
So, a mental exercise - let's say we make a new virDomainPCIConnectFlag
called VIR_PCI_CONNECT_AUTO_ASSIGNABLE (that's a bit easier to say than
what I had in the previous paragraph). That connect type would be added
to all the controllers that currently have CONNECT_HOTPLUGGABLE set:
pci-root
pci-bridge
pcie-root
pcie-pci-bridge
pcie-root-port
pcie-downstream-port
We then add a new controller model
VIR_DOMAIN_CONTROLLER_PCIE_ROOT_PORT_NO_HOTPLUG and set it as
AUTO_ASSIGNABLE, but *not* HOTPLUGGABLE.
Now when a device is added to "cold" config, instead of requiring
HOTPLUGGABLE, we require AUTO_ASSIGNABLE, which will get the same
assignment as before the change (except that the NO_HOTPLUG root ports
will also be used). If we actually hotplug a device, then we continue to
require the HOTPLUGGABLE connect flag.
So that much works, but we still have the problem that in order to add a
device to the "cold" config that can later be hotplugged, we have to
make sure all of the hotplug='no' controllers are in use.
I guess that will work, but it's getting pretty obtuse and complicated
just to avoid adding a single flag that says "this device shouldn't be
put in a hotplug-capable slot" directly to the XML for the device that
you don't want hotplugged. It would be *much* simpler if qemu could
provide an attribute for endpoint devices that would disable hotplug for
that device, rather than requiring the option to be set on the
controller that the device is plugged into. Then libvirt could just have
that attribute added to the XML, and the management app would just
simply add "hotplug='no'" to each device. It's too bad that the method
vfio-pci-nohotplug uses apparently doesn't work for PCIe...
> as applications still do *not* have to taken on
> responsibility for full PCI device addressing. They merely need to
> be able to count how many PCI devices they're using. The only "gotcha"
> is if they forget about the auto-added USB, VGA and Balloon devices,
> but that's not a big deal.
>
> Regards,
> Daniel
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-02-05 16:34 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-03 22:19 Disabling PCI "hot-unplug" for a guest (and/or a single PCI device) Laine Stump
2020-02-04 10:24 ` Michael S. Tsirkin
2020-02-04 16:13 ` Julia Suvorova
2020-02-04 16:35 ` Michael S. Tsirkin
2020-02-05 11:36 ` Daniel P. Berrangé
2020-02-05 13:10 ` Michael S. Tsirkin
2020-02-04 18:43 ` Daniel P. Berrangé
2020-02-05 16:29 ` Laine Stump
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.