* [Qemu-devel] pci-assign fails with read error on config-space file
@ 2016-10-28 11:28 Henning Schild
2016-10-28 15:22 ` Laszlo Ersek
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Henning Schild @ 2016-10-28 11:28 UTC (permalink / raw)
To: libvir-list, libvirt-users, qemu-devel, qemu-discuss; +Cc: Henning Schild
Hey,
i am running an unusual setup where i assign pci devices behind the
back of libvirt. I have two options to do that:
1. a wrapper script for qemu that takes care of suid-root and appends
arguments for pci-assign
2. virsh qemu-monitor-command ... 'device_add pci-assign...'
I know i should probably not be doing this, it is a workaround to
introduce fine-grained pci-assignment in an openstack setup, where
vendor and device id are not enough to pick the right device for a vm.
In both cases qemu will crash with the following output:
> qemu: hardware error: pci read failed, ret = 0 errno = 22
followed by the usual machine state dump. With strace i found it to be
a failing read on the config space file of my device.
/sys/bus/pci/devices/0000:xx:xx.x/config
A few reads out of that file succeeded, as well as accesses on vendor
etc.
Manually launching a qemu with the pci-assign works without a problem,
so i "blame" libvirt and the cgroup environment the qemu ends up in.
So i put a bash into the exact same cgroup setup - next to a running
qemu, expecting a dd or hexdump on the config-space file to fail. But
from that bash i can read the file without a problem.
Has anyone seen that problem before? Right now i do not know what i
am missing; maybe qemu is hitting some limits configured for the
cgroups. I can not use pci-assign from libvirt, but if i did, would it
configure cgroups in a different way or relax some limits?
What would be a good next step to debug that? Right now i am looking at
kernel event traces, but the machine is pretty big and so is the trace.
That assignment used to work and i do not know how it broke, i have
tried combinations of several kernels, versions of libvirt and qemu.
(kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and 2.7)
All combinations show the same problem, even the ones that work on
other machines. So when it comes to software versions the problem could
well be caused by a software update of another component, that i
got with the package manager and did not compile myself. It is a debian
8.6 with all recent updates installed. My guess would be that systemd
could have an influence on cgroups or limits causing such a problem.
regards,
Henning
* Re: [Qemu-devel] pci-assign fails with read error on config-space file
2016-10-28 11:28 [Qemu-devel] pci-assign fails with read error on config-space file Henning Schild
@ 2016-10-28 15:22 ` Laszlo Ersek
2016-11-02 9:40 ` Henning Schild
2016-10-28 15:25 ` [Qemu-devel] [libvirt-users] " Laine Stump
2016-11-02 9:54 ` [Qemu-devel] " Daniel P. Berrange
2 siblings, 1 reply; 8+ messages in thread
From: Laszlo Ersek @ 2016-10-28 15:22 UTC (permalink / raw)
To: Henning Schild
Cc: libvir-list, libvirt-users, qemu-devel, qemu-discuss, Alex Williamson
On 10/28/16 13:28, Henning Schild wrote:
> Hey,
>
> i am running an unusual setup where i assign pci devices behind the
> back of libvirt. I have two options to do that:
> 1. a wrapper script for qemu that takes care of suid-root and appends
> arguments for pci-assign
> 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
>
> I know i should probably not be doing this, it is a workaround to
> introduce fine-grained pci-assignment in an openstack setup, where
> vendor and device id are not enough to pick the right device for a vm.
(1) The libvirt domain XML identifies the host PCI device to assign by
full PCI address (see the <source> element:
<http://libvirt.org/formatdomain.html#elementsHostDev>); it does not
filter with vendor/device ID.
So, I believe your comment refers to the pci-stub host kernel driver not
being flexible enough for binding vs. not binding different instances of
the same vendor/device ID.
If that's the case, would you be helped by the following host kernel patch?
[PATCH] PCI: pci-stub: accept exceptions to the ID- and class-based matching
<http://www.spinics.net/lists/linux-pci/msg55497.html>
(2) Is there any reason (other than (1)) that you are using the legacy /
deprecated pci-assign method, rather than VFIO?
I suggest evaluating whether the "pci-stub.except=..." kernel parameter
would help your use case, and whether (consequently) you could move to a
fully libvirt + VFIO based config.
Thanks
Laszlo
* Re: [Qemu-devel] [libvirt-users] pci-assign fails with read error on config-space file
2016-10-28 11:28 [Qemu-devel] pci-assign fails with read error on config-space file Henning Schild
2016-10-28 15:22 ` Laszlo Ersek
@ 2016-10-28 15:25 ` Laine Stump
2016-10-28 17:08 ` Alex Williamson
2016-11-02 10:34 ` Henning Schild
2016-11-02 9:54 ` [Qemu-devel] " Daniel P. Berrange
2 siblings, 2 replies; 8+ messages in thread
From: Laine Stump @ 2016-10-28 15:25 UTC (permalink / raw)
To: libvir-list, libvirt-users, qemu-devel, qemu-discuss; +Cc: Henning Schild
On 10/28/2016 07:28 AM, Henning Schild wrote:
> Hey,
>
> i am running an unusual setup where i assign pci devices behind the
> back of libvirt. I have two options to do that:
> 1. a wrapper script for qemu that takes care of suid-root and appends
> arguments for pci-assign
> 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
With any reasonably modern version of Linux/qemu/libvirt, you should not
be using pci-assign, but should use vfio-pci instead. pci-assign is old,
unmaintained, and deprecated (and any other bad words you can think of).
Also, have you done anything to lock the guest's memory in host RAM?
This is necessary so that the source/destination of DMA reads/writes is
always present. It is done automatically by libvirt as required *when
libvirt knows that a device is being assigned to the guest*, but if
you're going behind libvirt's back, you need to take care of that
yourself (or alternately, don't go behind libvirt's back, which is the
greatly preferred alternative!)
>
> I know i should probably not be doing this,
Yes, that is a serious understatement :-) And I suspect that it isn't
necessary.
> it is a workaround to
> introduce fine-grained pci-assignment in an openstack setup, where
> vendor and device id are not enough to pick the right device for a vm.
libvirt selects the device according to its PCI address, not vendor and
device id. Is that not "fine-grained" enough? (And does OpenStack not
let you select devices based on their PCI address?)
>
> In both cases qemu will crash with the following output:
>
>> qemu: hardware error: pci read failed, ret = 0 errno = 22
> followed by the usual machine state dump. With strace i found it to be
> a failing read on the config space file of my device.
> /sys/bus/pci/devices/0000:xx:xx.x/config
> A few reads out of that file succeeded, as well as accesses on vendor
> etc.
>
> Manually launching a qemu with the pci-assign works without a problem,
> so i "blame" libvirt and the cgroup environment the qemu ends up in.
> So i put a bash into the exact same cgroup setup - next to a running
> qemu, expecting a dd or hexdump on the config-space file to fail. But
> from that bash i can read the file without a problem.
>
> Has anyone seen that problem before?
No, because nobody else (that I've ever heard of) is doing what you are
doing. You're going around behind the back of libvirt (and OpenStack)
to do device assignment with a method that was replaced with something
newer/better/etc about 3 years ago, and in the process are likely
missing a lot of the details that would otherwise be automatically
handled by libvirt.
> Right now i do not know what i
> am missing, maybe qemu is hitting some limits configured for the
> cgroups or whatever. I can not use pci-assign from libvirt, but if i
> did would it configure cgroups in a different way or relax some limits?
>
> What would be a good next step to debug that? Right now i am looking at
> kernel event traces, but the machine is pretty big and so is the trace.
My recommendation would be this:
1) look at OpenStack to see if it allows selecting the device to assign
by PCI address. If so, use that (it will just tell libvirt "assign this
device", and libvirt will automatically use VFIO for the device
assignment if it's available (which it will be))
2) if (1) is a dead end (i.e. OpenStack doesn't allow you to select based
on PCI address), use your "sneaky backdoor method" to do "virsh
attach-device somexmlfile.xml", where somexmlfile.xml has a proper
<hostdev> element to select and assign the host device you want. Again,
libvirt will automatically figure out if VFIO can be used, and will
properly set up everything necessary related to cgroups, locked memory, etc.
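As an illustration of option 2, here is a minimal Python sketch that builds a
<hostdev> element for one host PCI address. It is not taken from the thread:
the helper name hostdev_xml and the address 0000:06:02.0 are made up for the
example, and managed='yes' is an assumption about how you want libvirt to
handle driver binding. The element and attribute names follow the libvirt
domain XML format for PCI host devices.

```python
import re
import xml.etree.ElementTree as ET

# Full PCI address: domain:bus:slot.function, e.g. 0000:06:02.0
_PCI_ADDR = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def hostdev_xml(pci_address: str, managed: bool = True) -> str:
    """Build a libvirt <hostdev> element selecting a device by PCI address.

    Hypothetical helper for illustration only.
    """
    match = _PCI_ADDR.match(pci_address)
    if match is None:
        raise ValueError(f"not a full PCI address: {pci_address!r}")
    domain, bus, slot, function = match.groups()

    hostdev = ET.Element(
        "hostdev",
        {
            "mode": "subsystem",
            "type": "pci",
            # Assumption: let libvirt detach/reattach the host driver.
            "managed": "yes" if managed else "no",
        },
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{domain}",
            "bus": f"0x{bus}",
            "slot": f"0x{slot}",
            "function": f"0x{function}",
        },
    )
    return ET.tostring(hostdev, encoding="unicode")


# The address below is a made-up example, not one from the thread.
print(hostdev_xml("0000:06:02.0"))
```

The output can be written to somexmlfile.xml and passed to virsh
attach-device together with the domain name. The device is selected purely by
address, so two devices with the same vendor and device ID stay
distinguishable.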
>
> That assignment used to work and i do not know how it broke, i have
> tried combinations of several kernels, versions of libvirt and qemu.
> (kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and 2.7)
> All combinations show the same problem, even the ones that work on
> other machines. So when it comes to software versions the problem could
> well be caused by a software update of another component, that i
> got with the package manager and did not compile myself. It is a debian
> 8.6 with all recent updates installed. My guess would be that systemd
> could have an influence on cgroups or limits causing such a problem.
That you would need to think of such things points out that your current
setup is fragile and ultimately unmaintainable. Please consider
"coloring inside the lines" :-) (We'd be happy to help if there are any
hangups along the way, either on the libvirt-users mailing list or in
the #virt channel on irc.oftc.net).
* Re: [Qemu-devel] [libvirt-users] pci-assign fails with read error on config-space file
2016-10-28 15:25 ` [Qemu-devel] [libvirt-users] " Laine Stump
@ 2016-10-28 17:08 ` Alex Williamson
2016-11-02 10:34 ` Henning Schild
1 sibling, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2016-10-28 17:08 UTC (permalink / raw)
To: Laine Stump
Cc: libvir-list, libvirt-users, qemu-devel, qemu-discuss, Henning Schild
On Fri, 28 Oct 2016 11:25:55 -0400
Laine Stump <laine@redhat.com> wrote:
> On 10/28/2016 07:28 AM, Henning Schild wrote:
> > Hey,
> >
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and appends
> > arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
>
> With any reasonably modern version of Linux/qemu/libvirt, you should not
> be using pci-assign, but should use vfio-pci instead. pci-assign is old,
> unmaintained, and deprecated (and any other bad words you can think of).
>
> Also, have you done anything to lock the guest's memory in host RAM?
> This is necessary so that the source/destination of DMA reads/writes is
> always present. It is done automatically by libvirt as required *when
> libvirt knows that a device is being assigned to the guest*, but if
> you're going behind libvirt's back, you need to take care of that
> yourself (or alternately, don't go behind libvirt's back, which is the
> greatly preferred alternative!)
Note that pci-assign doesn't care about user locked memory limits, so
that much is not required for this deprecated use case, but I fully
agree that going behind libvirt's back is ill-advised and that
pci-assign is deprecated and likely broken. Maybe we should even
consider removing it in the QEMU 2.8 release cycle. Use vfio-pci and
the mechanisms provided in libvirt to attach the device to the VM.
If these don't work, file bugs and improve the environment to meet your
needs rather than working around it. I can't figure out from the
original report what specifically about the environment prevents
use of libvirt <hostdev> entries. Thanks,
Alex
* Re: [Qemu-devel] pci-assign fails with read error on config-space file
2016-10-28 15:22 ` Laszlo Ersek
@ 2016-11-02 9:40 ` Henning Schild
0 siblings, 0 replies; 8+ messages in thread
From: Henning Schild @ 2016-11-02 9:40 UTC (permalink / raw)
To: Laszlo Ersek
Cc: libvir-list, libvirt-users, qemu-devel, qemu-discuss, Alex Williamson
On Fri, 28 Oct 2016 17:22:41 +0200
Laszlo Ersek <lersek@redhat.com> wrote:
> On 10/28/16 13:28, Henning Schild wrote:
> > Hey,
> >
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and
> > appends arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
> >
> > I know i should probably not be doing this, it is a workaround to
> > introduce fine-grained pci-assignment in an openstack setup, where
> > vendor and device id are not enough to pick the right device for a
> > vm.
>
> (1) The libvirt domain XML identifies the host PCI device to assign by
> full PCI address (see the <source> element:
> <http://libvirt.org/formatdomain.html#elementsHostDev>); it does not
> filter with vendor/device ID.
>
> So, I believe your comment refers to the pci-stub host kernel driver
> not being flexible enough for binding vs. not binding different
> instances of the same vendor/device ID.
My comment referred to OpenStack. The version we are using assigns PCI
devices purely by device and vendor ID. pci-stub is no problem at
all; you can always bind/unbind by address.
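For readers unfamiliar with binding by address, a rough Python sketch of the
sysfs writes involved follows. It is an illustration under assumptions, not
the procedure used in this setup: it assumes a kernel that provides the
driver_override attribute, a loaded pci-stub module, and root privileges. The
function names and the address 0000:06:02.0 are made up.

```python
import os

SYSFS_PCI = "/sys/bus/pci"


def rebind_plan(pci_address: str, driver: str = "pci-stub"):
    """Return the ordered (path, value) sysfs writes that move one
    device, identified by its full PCI address, to `driver`.

    Hypothetical helper; it only computes paths and touches nothing.
    """
    dev = os.path.join(SYSFS_PCI, "devices", pci_address)
    return [
        # 1. Detach the device from whatever driver currently owns it.
        (os.path.join(dev, "driver", "unbind"), pci_address),
        # 2. Restrict matching so only `driver` may claim this device.
        (os.path.join(dev, "driver_override"), driver),
        # 3. Ask `driver` to bind this specific device.
        (os.path.join(SYSFS_PCI, "drivers", driver, "bind"), pci_address),
    ]


def apply_plan(plan):
    """Perform the writes. Needs root; skips unbind if no driver is bound."""
    for path, value in plan:
        if path.endswith("/unbind") and not os.path.exists(path):
            continue  # device has no driver bound at the moment
        with open(path, "w") as attr:
            attr.write(value)


# Made-up address; printing the plan is safe and changes nothing.
for path, value in rebind_plan("0000:06:02.0"):
    print(f"echo {value} > {path}")
```

Every step is keyed on the address, so other devices with the same vendor and
device ID are left alone.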
> If that's the case, would you be helped by the following host kernel
> patch?
>
> [PATCH] PCI: pci-stub: accept exceptions to the ID- and class-based
> matching
>
> <http://www.spinics.net/lists/linux-pci/msg55497.html>
>
> (2) Is there any reason (other than (1)) that you are using the
> legacy / deprecated pci-assign method, rather than VFIO?
>
> I suggest to evaluate whether the "pci-stub.except=..." kernel
> parameter helped your use case, and if (consequently) you could move
> to a fully libvirt + VFIO based config.
I would like to do that in the long run and will look into the options.
But for now i was hoping for a quick answer to make the hacky version
work again.
Thanks,
Henning
* Re: [Qemu-devel] pci-assign fails with read error on config-space file
2016-10-28 11:28 [Qemu-devel] pci-assign fails with read error on config-space file Henning Schild
2016-10-28 15:22 ` Laszlo Ersek
2016-10-28 15:25 ` [Qemu-devel] [libvirt-users] " Laine Stump
@ 2016-11-02 9:54 ` Daniel P. Berrange
2016-11-02 11:45 ` Henning Schild
2 siblings, 1 reply; 8+ messages in thread
From: Daniel P. Berrange @ 2016-11-02 9:54 UTC (permalink / raw)
To: Henning Schild; +Cc: libvir-list, libvirt-users, qemu-devel, qemu-discuss
On Fri, Oct 28, 2016 at 01:28:19PM +0200, Henning Schild wrote:
> Hey,
>
> i am running an unusual setup where i assign pci devices behind the
> back of libvirt. I have two options to do that:
> 1. a wrapper script for qemu that takes care of suid-root and appends
> arguments for pci-assign
> 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
>
> I know i should probably not be doing this, it is a workaround to
> introduce fine-grained pci-assignment in an openstack setup, where
> vendor and device id are not enough to pick the right device for a vm.
>
> In both cases qemu will crash with the following output:
>
> > qemu: hardware error: pci read failed, ret = 0 errno = 22
>
> followed by the usual machine state dump. With strace i found it to be
> a failing read on the config space file of my device.
> /sys/bus/pci/devices/0000:xx:xx.x/config
> A few reads out of that file succeeded, as well as accesses on vendor
> etc.
errno == 22 means EINVAL, so it feels unlikely to be a permissions
problem unless the kernel or QEMU is reporting the wrong errno.
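For reference, a small Python sketch (not part of the original mail) that maps
a numeric errno to its symbolic name and message. The helper name
describe_errno is made up. On Linux, 22 maps to EINVAL, while a permissions
failure would normally show up as EACCES or EPERM.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# 22 is the value from the QEMU error message quoted above.
print(describe_errno(22))
# The codes one would expect from a permissions problem instead.
print(describe_errno(errno.EACCES))
print(describe_errno(errno.EPERM))
```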
> Manually launching a qemu with the pci-assign works without a problem,
> so i "blame" libvirt and the cgroup environment the qemu ends up in.
The 'config' file is a plain file, so it is not affected by cgroups - the
devices controller only restricts access to device nodes (block and
character).
When libvirt runs QEMU, it runs it as the unprivileged qemu:qemu user/group,
so perhaps it is a permissions thing, despite the fact that you're
getting EINVAL, not EACCES.
It would be interesting to know just what part of the config space
QEMU was trying to read, I guess, to better understand why it might
be failing.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
* Re: [Qemu-devel] [libvirt-users] pci-assign fails with read error on config-space file
2016-10-28 15:25 ` [Qemu-devel] [libvirt-users] " Laine Stump
2016-10-28 17:08 ` Alex Williamson
@ 2016-11-02 10:34 ` Henning Schild
1 sibling, 0 replies; 8+ messages in thread
From: Henning Schild @ 2016-11-02 10:34 UTC (permalink / raw)
To: Laine Stump; +Cc: libvir-list, libvirt-users, qemu-devel, qemu-discuss
On Fri, 28 Oct 2016 11:25:55 -0400
Laine Stump <laine@redhat.com> wrote:
> On 10/28/2016 07:28 AM, Henning Schild wrote:
> > Hey,
> >
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and
> > appends arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
>
> With any reasonably modern version of Linux/qemu/libvirt, you should
> not be using pci-assign, but should use vfio-pci instead. pci-assign
> is old, unmaintained, and deprecated (and any other bad words you can
> think of).
>
> Also, have you done anything to lock the guest's memory in host RAM?
> This is necessary so that the source/destination of DMA reads/writes
> is always present. It is done automatically by libvirt as required
> *when libvirt knows that a device is being assigned to the guest*,
> but if you're going behind libvirt's back, you need to take care of
> that yourself (or alternately, don't go behind libvirt's back, which
> is the greatly preferred alternative!)
Memory locking is taken care of with "-realtime mlock=on".
> >
> > I know i should probably not be doing this,
>
>
> Yes, that is a serious understatement :-) And I suspect that it isn't
> necessary.
I know, but that was never the question ;).
> > it is a workaround to
> > introduce fine-grained pci-assignment in an openstack setup, where
> > vendor and device id are not enough to pick the right device for a
> > vm.
>
> libvirt selects the device according to its PCI address, not vendor
> and device id. Is that not "fine-grained" enough? (And does OpenStack
> not let you select devices based on their PCI address?)
The workaround is indeed for the version of OpenStack we are using.
Recent versions might have support for more fine-grained assignment,
but updating OpenStack is not something i would like to do right now.
Another item on the TODO-list that i would like to keep separate from
the problem at hand.
> >
> > In both cases qemu will crash with the following output:
> >
> >> qemu: hardware error: pci read failed, ret = 0 errno = 22
> > followed by the usual machine state dump. With strace i found it to
> > be a failing read on the config space file of my device.
> > /sys/bus/pci/devices/0000:xx:xx.x/config
> > A few reads out of that file succeeded, as well as accesses on
> > vendor etc.
> >
> > Manually launching a qemu with the pci-assign works without a
> > problem, so i "blame" libvirt and the cgroup environment the qemu
> > ends up in. So i put a bash into the exact same cgroup setup - next
> > to a running qemu, expecting a dd or hexdump on the config-space
> > file to fail. But from that bash i can read the file without a
> > problem.
> >
> > Has anyone seen that problem before?
>
> No, because nobody else (that I've ever heard) is doing what you are
> doing. You're going around behind the back of libvirt (and
> OpenStack) to do device assignment with a method that was replaced
> with something newer/better/etc about 3 years ago, and in the process
> are likely missing a lot of the details that would otherwise be
> automatically handled by libvirt.
Sure, and my question was aiming at what exactly i could be missing.
That is just to fix a system that used to work and get a better
understanding of "a lot of the details that would otherwise be
automatically handled by libvirt".
>
> > Right now i do not know what i
> > am missing, maybe qemu is hitting some limits configured for the
> > cgroups or whatever. I can not use pci-assign from libvirt, but if i
> > did would it configure cgroups in a different way or relax some
> > limits?
> >
> > What would be a good next step to debug that? Right now i am
> > looking at kernel event traces, but the machine is pretty big and
> > so is the trace.
>
>
> My recommendation would be this:
>
> 1) look at OpenStack to see if it allows selecting the device to
> assign by PCI address. If so, use that (it will just tell libvirt
> "assign this device", and libvirt will automatically use VFIO for the
> device assignment if it's available (which it will be))
The version currently in use does not allow that.
> 2) if (1) is a deadend (i.e. OpenStack doesn't allow you to select
> based on PCI address), use your "sneaky backdoor method" to do "virsh
> attach-device somexmlfile.xml", where somexmlfile.xml has a proper
> <hostdev> element to select and assign the host device you want.
> Again, libvirt will automatically figure out if VFIO can be used, and
> will properly setup everything necessary related to cgroups, locked
> memory, etc.
Thanks! I will try the sneaky .xml method. In that case i will only
have to play tricks on OpenStack and will hopefully get all the libvirt
details taken care of.
>
> >
> > That assignment used to work and i do not know how it broke, i have
> > tried combinations of several kernels, versions of libvirt and qemu.
> > (kernel 3.18 and 4.4, libvirt 1.3.2 and 2.0.0, and qemu 2.2.1 and
> > 2.7) All combinations show the same problem, even the ones that
> > work on other machines. So when it comes to software versions the
> > problem could well be caused by a software update of another
> > component, that i got with the package manager and did not compile
> > myself. It is a debian 8.6 with all recent updates installed. My
> > guess would be that systemd could have an influence on cgroups or
> > limits causing such a problem.
>
> That you would need to think of such things points out that your
> current setup is fragile and ultimately unmaintainable. Please
> consider "coloring inside the lines" :-) (We'd be happy to help if
> there are any hangups along the way, either on the libvirt-users
> mailing list or in the #virt channel on irc.oftc.net).
It is a legacy reference/demo/proof-of-concept setup for
realtime-enabled VMs, that somehow broke. PCI assignment was used for
NICs when guests did not support virtio.
https://archive.fosdem.org/2016/schedule/event/virt_iaas_real_time_cloud/
Since it is a hack and unmaintainable and does not scale, we do not use
it anymore. But i was curious why it suddenly stopped working in that
old demo setup.
regards,
Henning
* Re: [Qemu-devel] pci-assign fails with read error on config-space file
2016-11-02 9:54 ` [Qemu-devel] " Daniel P. Berrange
@ 2016-11-02 11:45 ` Henning Schild
0 siblings, 0 replies; 8+ messages in thread
From: Henning Schild @ 2016-11-02 11:45 UTC (permalink / raw)
To: Daniel P. Berrange; +Cc: libvir-list, libvirt-users, qemu-devel, qemu-discuss
On Wed, 2 Nov 2016 09:54:16 +0000
"Daniel P. Berrange" <berrange@redhat.com> wrote:
> On Fri, Oct 28, 2016 at 01:28:19PM +0200, Henning Schild wrote:
> > Hey,
> >
> > i am running an unusual setup where i assign pci devices behind the
> > back of libvirt. I have two options to do that:
> > 1. a wrapper script for qemu that takes care of suid-root and
> > appends arguments for pci-assign
> > 2. virsh qemu-monitor-command ... 'device_add pci-assign...'
> >
> > I know i should probably not be doing this, it is a workaround to
> > introduce fine-grained pci-assignment in an openstack setup, where
> > vendor and device id are not enough to pick the right device for a
> > vm.
> >
> > In both cases qemu will crash with the following output:
> >
> > > qemu: hardware error: pci read failed, ret = 0 errno = 22
> >
> > followed by the usual machine state dump. With strace i found it to
> > be a failing read on the config space file of my device.
> > /sys/bus/pci/devices/0000:xx:xx.x/config
> > A few reads out of that file succeeded, as well as accesses on
> > vendor etc.
>
> errno == 22, means EINVAL, so it feels unlikely to be a permissions
> problem unless the kernel or QEMU is reporting the wrong errno.
>
> > Manually launching a qemu with the pci-assign works without a
> > problem, so i "blame" libvirt and the cgroup environment the qemu
> > ends up in.
>
> The 'config' file is a plain file, so not affected by cgroups - that
> only affects block devices.
>
> When libvirt runs QEMU, it runs unprivileged qemu:qemu user/group,
> so perhaps it is a permissions thing, despite the fact that you're
> getting EINVAL, not EACCES.
If the wrapper qemu decides to assign a PCI device it will use a
suid-root qemu to do so. So it is not EACCES; as i said, other reads
worked fine.
> It would be interesting to know just what part of the config space
> QEMU was trying to read I guess, to better understand why it might
> be failing
I should have said that before: it is a one-byte read at offset 64, so
just behind the regular cfg-space.
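To reproduce that access outside of QEMU, a one-byte positional read can be
sketched in Python as below. This is an illustration, not the code QEMU runs;
the helper name read_config_byte is made up, and the path has to be filled in
with the real device's config file. One observation, offered as a guess: a
read that returns 0 bytes signals end-of-file and does not set errno, so the
"ret = 0 errno = 22" in the message may be a short read paired with a stale
errno value rather than a genuine EINVAL.

```python
import os


def read_config_byte(path: str, offset: int = 64):
    """Read one byte at `offset` from a sysfs config-space file.

    Returns (data, err): `data` is b"" on end-of-file, `err` is the
    errno of a failed read or None. Hypothetical helper for debugging.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # pread reads at an absolute offset without seeking first.
            data = os.pread(fd, 1, offset)
        except OSError as exc:
            return b"", exc.errno
        return data, None
    finally:
        os.close(fd)


# Usage (path elided as in the thread; substitute the real device):
#   read_config_byte("/sys/bus/pci/devices/0000:xx:xx.x/config", 64)
# Running it as root and again as the user QEMU runs as, and comparing
# the two results, would show whether privileges change the outcome.
```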
regards,
Henning