* pci_enable_msix()  failure
From: Alex Lyakas @ 2012-11-20 18:04 UTC
  To: linux-pci

Hello list, I was advised to post this question by Jesse Barnes; I also 
posted to KVM list.

I am running Ubuntu-Precise 3.2.0-29-generic #46, with stock KVM ("QEMU 
emulator version 1.0 (qemu-kvm-1.0)") on a Dell R510 server. I have one 
dual-port Intel 82599 NIC, and I spawn 32 VFs from each port. I spawn 
virtual machines with KVM; each VM has 4 VFs attached (two from each PF).

Once in a while, in particular when I spawn multiple VMs in parallel, I hit 
an issue where one of the VFs does not have an IRQ assigned to it. I check 
this in /proc/interrupts, looking for entries like "kvm:0000:03:14.6". In 
some cases, the entry for a particular VF is missing. As a result, the VF 
within the VM is non-functional.
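
The /proc/interrupts check described above could be scripted roughly as 
follows. This is only a sketch: the function names are hypothetical, and it 
assumes the label always has the "kvm:<domain>:<bus>:<slot>.<func>" form 
shown in the example entry.

```python
import re

# Assumed label format, taken from the "kvm:0000:03:14.6" example above.
KVM_IRQ_LABEL = re.compile(
    r"kvm:([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])"
)


def read_interrupts(path="/proc/interrupts"):
    """Return the current contents of /proc/interrupts as one string."""
    with open(path) as f:
        return f.read()


def assigned_vfs(interrupts_text):
    """Return the PCI addresses that have at least one kvm: IRQ entry."""
    return {m.group(1).lower() for m in KVM_IRQ_LABEL.finditer(interrupts_text)}


def missing_vfs(expected, interrupts_text):
    """Return the expected VF addresses with no kvm: entry, in input order."""
    present = assigned_vfs(interrupts_text)
    return [addr for addr in expected if addr.lower() not in present]
```

Calling missing_vfs(["0000:03:14.6"], read_interrupts()) on the host after 
the VMs have started would list any VF that was left without an IRQ entry.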

I debugged this issue further by adding prints to the kvm.ko code. The 
failure happens in the kvm_vm_ioctl_assigned_device/KVM_ASSIGN_DEV_IRQ path, 
which calls assigned_device_enable_host_msix(), which in turn calls 
pci_enable_msix(), which fails with EINVAL or ENOMEM. This path is called 
twice for each VF.

I see that the first pci_enable_msix() call returns -12 (ENOMEM) or -22 
(EINVAL), and when kvm_vm_ioctl_set_msix_nr() is called again, it sees that 
adev->entries_nr != 0 and fails the call with EINVAL. I can reproduce it only 
when spawning about 8 or 10 VMs in parallel, and even then it doesn't happen 
every time. So this does not look like a resource shortage problem, but like 
a race somewhere.
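
For reading such debug prints, the raw return values can be mapped back to 
errno names. A minimal sketch using Python's standard errno module; the 
helper name errno_name is hypothetical.

```python
import errno


def errno_name(ret):
    """Map a negative kernel-style return value (e.g. -12) to its errno name."""
    if ret >= 0:
        return "OK"
    # errno.errorcode maps positive errno numbers to their symbolic names.
    return errno.errorcode.get(-ret, "UNKNOWN")


for value in (-12, -22):
    print(value, errno_name(value))
```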

I tested this with several versions of the ixgbe driver, including the 
in-tree version that comes with Precise. It reproduces with all of them.

Can anybody please advise on how to debug this issue further?

Thanks,
Alex. 



* Re: pci_enable_msix()  failure
From: Alex Lyakas @ 2012-11-21 13:48 UTC
  To: Don Dutile; +Cc: linux-pci

Hi Don,

I did indeed have irqbalance turned on. I retested without it and was still 
able to reproduce the issue. However, the failure now happens in the call to 
request_threaded_irq() within KVM's assigned_device_enable_host_msix() 
function: request_threaded_irq() returns -16 (EBUSY). Does this help to 
debug the issue further?

Thanks,
Alex.


-----Original Message----- 
From: Don Dutile
Sent: 20 November, 2012 10:13 PM
To: Alex Lyakas
Subject: Re: pci_enable_msix() failure

Is your system running with irqbalance on?
If so, can you test with it off?



* Re: pci_enable_msix() failure
From: Bjorn Helgaas @ 2012-11-29  0:18 UTC
  To: Alex Lyakas; +Cc: linux-pci

On Tue, Nov 20, 2012 at 11:04 AM, Alex Lyakas <alex@zadarastorage.com> wrote:
> Hello list, I was advised to post this question by Jesse Barnes; I also
> posted to KVM list.

When you post a question to more than one list, you should always send
a *single* message with all the lists being copied.  That way
everybody can tell what progress is being made, and we can avoid
duplicating effort.

I see via a Google search that you've gotten responses on the KVM
list, e.g., http://www.spinics.net/lists/kvm/msg82997.html, so I
assume you don't need any more help from linux-pci.

If that's not the case, please add linux-pci to the CC: list of the
thread where you're working on this.

