* vfio issue in qemu 2.5
@ 2016-02-04 15:01 Goutham GS
  2016-02-04 17:02 ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Goutham GS @ 2016-02-04 15:01 UTC (permalink / raw)
  To: kvm

Hi All,

We are facing a vfio issue on qemu 2.5. Really appreciate any help or
pointers. Details are as below:

We are using qemu 2.5 compiled out of git commit
0b0571dd246871f18b7d64b5279511e91e2a7bf6 and are using Linux Kernel 3.18.19
for both host and the VM. We are also using KVM VM with pci-assign'ed SRIOV
VF interfaces.

The issue happens once in a while when a running VM is rebooted. On boot,
the VM hits the following error and stops.

qemu-system-x86_64: -device
vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: vfio: failed to set
iommu for container: Bad address
qemu-system-x86_64: -device
vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: vfio: failed to setup
container for group 40
qemu-system-x86_64: -device
vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: vfio: failed to get
group 40
qemu-system-x86_64: -device
vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: Device initialization
failed

Strange thing is, once this error is hit, no further VMs can be spawned on
the host and all of them run into the same problem. However a reboot of the
host appears to solve the issue.

I have attached the relevant logs.

Regards,
Goutham.

[-- Attachment #2: vfio_issue.tgz --]
[-- Type: application/x-compressed, Size: 2208 bytes --]

* Re: vfio issue in qemu 2.5
  2016-02-04 15:01 vfio issue in qemu 2.5 Goutham GS
@ 2016-02-04 17:02 ` Alex Williamson
  2016-02-04 18:23   ` Goutham GS
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2016-02-04 17:02 UTC (permalink / raw)
  To: Goutham GS; +Cc: kvm

On Thu, 4 Feb 2016 20:31:17 +0530
"Goutham GS" <goutham@zadarastorage.com> wrote:

> Hi All,
> 
> We are facing a vfio issue on qemu 2.5. Really appreciate any help or
> pointers. Details are as below:
> 
> We are using qemu 2.5 compiled out of git commit
> 0b0571dd246871f18b7d64b5279511e91e2a7bf6 and are using Linux Kernel
> 3.18.19 for both host and the VM. We are also using KVM VM with
> pci-assign'ed SRIOV VF interfaces.
> 
> The issue happens once in a while when a running VM is rebooted. On
> boot, the VM hits the following error and stops.
> 
> qemu-system-x86_64: -device
> vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: vfio: failed to
> set iommu for container: Bad address
> qemu-system-x86_64: -device
> vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: vfio: failed to
> setup container for group 40
> qemu-system-x86_64: -device
> vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: vfio: failed to
> get group 40
> qemu-system-x86_64: -device
> vfio-pci,host=04:00.7,id=hostdev2,bus=pci.0,addr=0x9: Device
> initialization failed
> 
> Strange thing is, once this error is hit, no further VMs can be
> spawned on the host and all of them run into the same problem.
> However a reboot of the host appears to solve the issue.
> 
> I have attached the relevant logs.

Is it possible to try a newer kernel on the host?  "Bad address" is
-EFAULT, but I'm not actually able to spot a return path for the
VFIO_SET_IOMMU ioctl that returns -EFAULT.  Is there anything in dmesg
when this triggers?  Thanks,

Alex
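
For reference, the "Bad address" text in the QEMU error is the C library's
message string for EFAULT. A minimal Python sketch, assuming a Linux host
with the usual libc message strings, shows the mapping. It only looks up the
errno text and says nothing about which kernel path returned it.

```python
import errno
import os

# EFAULT is the errno that strerror() renders as "Bad address" on Linux.
message = os.strerror(errno.EFAULT)
print(errno.EFAULT, message)
```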

* RE: vfio issue in qemu 2.5
  2016-02-04 17:02 ` Alex Williamson
@ 2016-02-04 18:23   ` Goutham GS
  2016-02-05  5:19     ` Goutham GS
  0 siblings, 1 reply; 6+ messages in thread
From: Goutham GS @ 2016-02-04 18:23 UTC (permalink / raw)
  To: 'Alex Williamson'; +Cc: kvm

Hello Alex,

Thanks for your quick response.

Unfortunately we are tied to this kernel. We could probably move to 3.19 if
we were sure of the benefits, but at the moment we are not.

Regarding dmesg, there is nothing on the host and the VM never came up to
the point where we could collect dmesg.

Regards,
Goutham.

* RE: vfio issue in qemu 2.5
  2016-02-04 18:23   ` Goutham GS
@ 2016-02-05  5:19     ` Goutham GS
  2016-02-05  6:02       ` Alex Williamson
  0 siblings, 1 reply; 6+ messages in thread
From: Goutham GS @ 2016-02-05  5:19 UTC (permalink / raw)
  To: 'Alex Williamson'; +Cc: kvm

Hello Alex,

Sorry, I got the timestamps wrong yesterday.

I see the following messages in kern.log at the time of the issue:

Jan 31 03:46:39 qa2-sn2 kernel: [228419.858857] vfio-pci 0000:04:01.4:
enabling device (0000 -> 0002)
Jan 31 03:46:39 qa2-sn2 kernel: [228419.970492] vfio-pci 0000:04:02.7:
enabling device (0000 -> 0002)
Jan 31 03:46:39 qa2-sn2 kernel: [228420.082435] IOMMU: no free domain ids
Jan 31 03:46:39 qa2-sn2 kernel: [228420.124440] IOMMU: no free domain ids

Does this say anything?

Regards,
Goutham.
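
To check a host's kernel log for the same message, something like the
following Python sketch could be used. The function name and the regular
expression are illustrative assumptions; the pattern is based only on the
syslog line layout quoted above (with the wrapped lines rejoined), so it may
need adjusting for other log formats.

```python
import re

# Pattern for the kernel message quoted in this thread. The prefix layout
# ("kernel: [uptime]") is assumed from the sample lines, not from any spec.
NO_FREE_IDS = re.compile(r"kernel: \[\s*(\d+\.\d+)\] IOMMU: no free domain ids")


def find_domain_id_exhaustion(lines):
    """Return the kernel uptime stamps of lines reporting the message."""
    stamps = []
    for line in lines:
        match = NO_FREE_IDS.search(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


sample = [
    "Jan 31 03:46:39 qa2-sn2 kernel: [228419.970492] vfio-pci 0000:04:02.7: "
    "enabling device (0000 -> 0002)",
    "Jan 31 03:46:39 qa2-sn2 kernel: [228420.082435] IOMMU: no free domain ids",
    "Jan 31 03:46:39 qa2-sn2 kernel: [228420.124440] IOMMU: no free domain ids",
]
print(find_domain_id_exhaustion(sample))
```

In practice the lines would come from reading kern.log or the output of
dmesg rather than from a hard-coded list.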

* Re: vfio issue in qemu 2.5
  2016-02-05  5:19     ` Goutham GS
@ 2016-02-05  6:02       ` Alex Williamson
  2016-02-05  6:30         ` Goutham GS
  0 siblings, 1 reply; 6+ messages in thread
From: Alex Williamson @ 2016-02-05  6:02 UTC (permalink / raw)
  To: Goutham GS; +Cc: kvm

On Fri, 5 Feb 2016 10:49:23 +0530
"Goutham GS" <goutham@zadarastorage.com> wrote:

> Hello Alex,
> 
> Sorry, I got the timestamps wrong yesterday.
> 
> I see the following messages in kern.log at the time of the issue:
> 
> Jan 31 03:46:39 qa2-sn2 kernel: [228419.858857] vfio-pci 0000:04:01.4:
> enabling device (0000 -> 0002)
> Jan 31 03:46:39 qa2-sn2 kernel: [228419.970492] vfio-pci 0000:04:02.7:
> enabling device (0000 -> 0002)
> Jan 31 03:46:39 qa2-sn2 kernel: [228420.082435] IOMMU: no free domain ids
> Jan 31 03:46:39 qa2-sn2 kernel: [228420.124440] IOMMU: no free domain ids
> 
> Does this say anything?

That's kind of a big deal.  Kernel v3.17 introduced a domain ID leak
that didn't get fully fixed until kernel v4.2.  To make matters worse,
even though the fixes were tagged for stable, the last one doesn't seem
to have been backported to any pre-4.2 stable series.  Long story
short, you'll need to move to v4.2 or backport this to your kernel:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=46ebb7af7b93792de65e124e1ab8b89a108a41f2

Thanks,
Alex
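
The version range described above (leak introduced in v3.17, fully fixed in
v4.2) can be written as a quick check. This is only a rough sketch: the
helper names are hypothetical, and it compares mainline version numbers
only, so it cannot tell whether a particular distribution or stable kernel
already carries a backport of the fix.

```python
import re


def parse_version(release):
    """Extract (major, minor) from a string such as '3.18.19'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_leak_window(release):
    """True if the release is >= 3.17 and < 4.2, the range named above."""
    return (3, 17) <= parse_version(release) < (4, 2)


print(in_leak_window("3.18.19"))
```

On a live system the release string could be taken from
platform.release() in the Python standard library.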

* RE: vfio issue in qemu 2.5
  2016-02-05  6:02       ` Alex Williamson
@ 2016-02-05  6:30         ` Goutham GS
  0 siblings, 0 replies; 6+ messages in thread
From: Goutham GS @ 2016-02-05  6:30 UTC (permalink / raw)
  To: 'Alex Williamson'; +Cc: kvm

Hello Alex,

Thanks a tonne. We will backport the fix.

Regards,
Goutham.
